* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
ntp: Clamp PLL update interval
D: portions of the Linux Security Module (LSM) framework and security modules
N: Petr Vandrovec
-E: vandrove@vc.cvut.cz
+E: petr@vandrovec.name
D: Small contributions to ncpfs
D: Matrox framebuffer driver
-S: Chudenicka 8
-S: 10200 Prague 10, Hostivar
-S: Czech Republic
+S: 21513 Conradia Ct
+S: Cupertino, CA 95014
+S: USA
N: Thibaut Varene
E: T-Bone@parisc-linux.org
<sect1><title>Atomic and pointer manipulation</title>
!Iarch/x86/include/asm/atomic.h
-!Iarch/x86/include/asm/unaligned.h
</sect1>
<sect1><title>Delaying, scheduling, and timer routines</title>
</para>
<sect1><title>String Conversions</title>
-!Ilib/vsprintf.c
!Elib/vsprintf.c
</sect1>
<sect1><title>String Manipulation</title>
all the readers who were traversing the list when we deleted the
element are finished. We use <function>call_rcu()</function> to
register a callback which will actually destroy the object once
- the readers are finished.
+ all pre-existing readers are finished. Alternatively,
+ <function>synchronize_rcu()</function> may be used to block until
+ all pre-existing are finished.
</para>
<para>
But how does Read Copy Update know when the readers are
- object_put(obj);
+ list_del_rcu(&obj->list);
cache_num--;
-+ call_rcu(&obj->rcu, cache_delete_rcu, obj);
++ call_rcu(&obj->rcu, cache_delete_rcu);
}
/* Must be holding cache_lock */
if (++cache_num > MAX_CACHE_SIZE) {
struct object *i, *outcast = NULL;
list_for_each_entry(i, &cache, list) {
-@@ -85,6 +94,7 @@
- obj->popularity = 0;
- atomic_set(&obj->refcnt, 1); /* The cache holds a reference */
- spin_lock_init(&obj->lock);
-+ INIT_RCU_HEAD(&obj->rcu);
-
- spin_lock_irqsave(&cache_lock, flags);
- __cache_add(obj);
@@ -104,12 +114,11 @@
struct object *cache_find(int id)
{
</sect1>
</chapter>
+ <chapter id="apiref">
+ <title>Mutex API reference</title>
+!Iinclude/linux/mutex.h
+!Ekernel/mutex.c
+ </chapter>
+
<chapter id="references">
<title>Further reading</title>
include:
a. Keeping a count of the number of data-structure elements
- used by the RCU-protected data structure, including those
- waiting for a grace period to elapse. Enforce a limit
- on this number, stalling updates as needed to allow
- previously deferred frees to complete.
-
- Alternatively, limit only the number awaiting deferred
- free rather than the total number of elements.
+ used by the RCU-protected data structure, including
+ those waiting for a grace period to elapse. Enforce a
+ limit on this number, stalling updates as needed to allow
+ previously deferred frees to complete. Alternatively,
+ limit only the number awaiting deferred free rather than
+ the total number of elements.
+
+ One way to stall the updates is to acquire the update-side
+ mutex. (Don't try this with a spinlock -- other CPUs
+ spinning on the lock could prevent the grace period
+ from ever ending.) Another way to stall the updates
+ is for the updates to use a wrapper function around
+ the memory allocator, so that this wrapper function
+ simulates OOM when there is too much memory awaiting an
+ RCU grace period. There are of course many other
+ variations on this theme.
b. Limiting update rate. For example, if updates occur only
once per hour, then no explicit rate limiting is required,
and the compiler to freely reorder code into and out of RCU
read-side critical sections. It is the responsibility of the
RCU update-side primitives to deal with this.
+
+17. Use CONFIG_PROVE_RCU, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and
+ the __rcu sparse checks to validate your RCU code. These
+ can help find problems as follows:
+
+ CONFIG_PROVE_RCU: check that accesses to RCU-protected data
+ structures are carried out under the proper RCU
+ read-side critical section, while holding the right
+ combination of locks, or whatever other conditions
+ are appropriate.
+
+ CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
+ same object to call_rcu() (or friends) before an RCU
+ grace period has elapsed since the last time that you
+ passed that same object to call_rcu() (or friends).
+
+ __rcu sparse checks: tag the pointer to the RCU-protected data
+ structure with __rcu, and sparse will warn you if you
+ access that pointer without the services of one of the
+ variants of rcu_dereference().
+
+ These debugging aids can help you find problems that are
+ otherwise extremely difficult to spot.
o For !CONFIG_PREEMPT kernels, a CPU looping anywhere in the kernel
without invoking schedule().
+o A CPU-bound real-time task in a CONFIG_PREEMPT kernel, which might
+ happen to preempt a low-priority task in the middle of an RCU
+ read-side critical section. This is especially damaging if
+ that low-priority task is not permitted to run on any other CPU,
+ in which case the next RCU grace period can never complete, which
+ will eventually cause the system to run out of memory and hang.
+ While the system is in the process of running itself out of
+ memory, you might see stall-warning messages.
+
+o A CPU-bound real-time task in a CONFIG_PREEMPT_RT kernel that
+ is running at a higher priority than the RCU softirq threads.
+ This will prevent RCU callbacks from ever being invoked,
+ and in a CONFIG_TREE_PREEMPT_RCU kernel will further prevent
+ RCU grace periods from ever completing. Either way, the
+ system will eventually run out of memory and hang. In the
+ CONFIG_TREE_PREEMPT_RCU case, you might see stall-warning
+ messages.
+
o A bug in the RCU implementation.
o A hardware failure. This is quite unlikely, but has occurred
of RCU callbacks is ready to invoke, then the remainder will
be deferred.
+o "ci" is the number of RCU callbacks that have been invoked for
+ this CPU. Note that ci+ql is the number of callbacks that have
+ been registered in absence of CPU-hotplug activity.
+
+o "co" is the number of RCU callbacks that have been orphaned due to
+ this CPU going offline.
+
+o "ca" is the number of RCU callbacks that have been adopted due to
+ other CPUs going offline. Note that ci+co-ca+ql is the number of
+ RCU callbacks registered on this CPU.
+
There is also an rcu/rcudata.csv file with the same information in
comma-separated-variable spreadsheet format.
o "jfq" is the number of jiffies remaining for this grace period
before force_quiescent_state() is invoked to help push things
- along. Note that CPUs in dyntick-idle mode thoughout the grace
+ along. Note that CPUs in dyntick-idle mode throughout the grace
period will not report on their own, but rather must be check by
some other CPU via force_quiescent_state().
--- /dev/null
+CFQ ioscheduler tunables
+========================
+
+slice_idle
+----------
+This specifies how long CFQ should idle for next request on certain cfq queues
+(for sequential workloads) and service trees (for random workloads) before
+queue is expired and CFQ selects next queue to dispatch from.
+
+By default slice_idle is a non-zero value. That means by default we idle on
+queues/service trees. This can be very helpful on highly seeky media like
+single spindle SATA/SAS disks where we can cut down on overall number of
+seeks and see improved throughput.
+
+Setting slice_idle to 0 will remove all the idling on queues/service tree
+level and one should see an overall improved throughput on faster storage
+devices like multiple SATA/SAS disks in hardware RAID configuration. The down
+side is that isolation provided from WRITES also goes down and notion of
+IO priority becomes weaker.
+
+So depending on storage and workload, it might be useful to set slice_idle=0.
+In general I think for SATA/SAS disks and software RAID of SATA/SAS disks
+keeping slice_idle enabled should be useful. For any configurations where
+there are multiple spindles behind single LUN (Host based hardware RAID
+controller or for storage arrays), setting slice_idle=0 might end up in better
+throughput and acceptable latencies.
+
+CFQ IOPS Mode for group scheduling
+===================================
+Basic CFQ design is to provide priority based time slices. Higher priority
+process gets bigger time slice and lower priority process gets smaller time
+slice. Measuring time becomes harder if storage is fast and supports NCQ and
+it would be better to dispatch multiple requests from multiple cfq queues in
+request queue at a time. In such scenario, it is not possible to measure time
+consumed by single queue accurately.
+
+What is possible though is to measure number of requests dispatched from a
+single queue and also allow dispatch from multiple cfq queue at the same time.
+This effectively becomes the fairness in terms of IOPS (IO operations per
+second).
+
+If one sets slice_idle=0 and if storage supports NCQ, CFQ internally switches
+to IOPS mode and starts providing fairness in terms of number of requests
+dispatched. Note that this mode switching takes effect only for group
+scheduling. For non-cgroup users nothing should change.
CFQ sysfs tunable
=================
/sys/block/<disk>/queue/iosched/group_isolation
+-----------------------------------------------
If group_isolation=1, it provides stronger isolation between groups at the
expense of throughput. By default group_isolation is 0. In general that
and one wants stronger isolation between groups, then set group_isolation=1
but this will come at cost of reduced throughput.
+/sys/block/<disk>/queue/iosched/slice_idle
+------------------------------------------
+On a faster hardware CFQ can be slow, especially with sequential workload.
+This happens because CFQ idles on a single queue and single queue might not
+drive deeper request queue depths to keep the storage busy. In such scenarios
+one can try setting slice_idle=0 and that would switch CFQ to IOPS
+(IO operations per second) mode on NCQ supporting hardware.
+
+That means CFQ will not idle between cfq queues of a cfq group and hence be
+able to driver higher queue depth and achieve better throughput. That also
+means that cfq provides fairness among groups in terms of IOPS and not in
+terms of disk time.
+
+/sys/block/<disk>/queue/iosched/group_idle
+------------------------------------------
+If one disables idling on individual cfq queues and cfq service trees by
+setting slice_idle=0, group_idle kicks in. That means CFQ will still idle
+on the group in an attempt to provide fairness among groups.
+
+By default group_idle is same as slice_idle and does not do anything if
+slice_idle is enabled.
+
+One can experience an overall throughput drop if you have created multiple
+groups and put applications in that group which are not driving enough
+IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
+on individual groups and throughput should improve.
+
What works
==========
- Currently only sync IO queues are support. All the buffered writes are
identifier (rather than the kernel's). The actual value is
architecture and platform dependent.
-3) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
+3) /sys/devices/system/cpu/cpuX/topology/book_id:
+
+ the book ID of cpuX. Typically it is the hardware platform's
+ identifier (rather than the kernel's). The actual value is
+ architecture and platform dependent.
+
+4) /sys/devices/system/cpu/cpuX/topology/thread_siblings:
internel kernel map of cpuX's hardware threads within the same
core as cpuX
-4) /sys/devices/system/cpu/cpuX/topology/core_siblings:
+5) /sys/devices/system/cpu/cpuX/topology/core_siblings:
internal kernel map of cpuX's hardware threads within the same
physical_package_id.
+6) /sys/devices/system/cpu/cpuX/topology/book_siblings:
+
+ internal kernel map of cpuX's hardware threads within the same
+ book_id.
+
To implement it in an architecture-neutral way, a new source file,
-drivers/base/topology.c, is to export the 4 attributes.
+drivers/base/topology.c, is to export the 4 or 6 attributes. The two book
+related sysfs files will only be created if CONFIG_SCHED_BOOK is selected.
For an architecture to support this feature, it must define some of
these macros in include/asm-XXX/topology.h:
#define topology_physical_package_id(cpu)
#define topology_core_id(cpu)
+#define topology_book_id(cpu)
#define topology_thread_cpumask(cpu)
#define topology_core_cpumask(cpu)
+#define topology_book_cpumask(cpu)
The type of **_id is int.
The type of siblings is (const) struct cpumask *.
3) thread_siblings: just the given CPU
4) core_siblings: just the given CPU
+For architectures that don't support books (CONFIG_SCHED_BOOK) there are no
+default definitions for topology_book_id() and topology_book_cpumask().
+
Additionally, CPU topology information is provided under
/sys/devices/system/cpu and includes these files. The internal
source for the output is in brackets ("[]").
----------------------------
-What: Support for VMware's guest paravirtuliazation technique [VMI] will be
- dropped.
-When: 2.6.37 or earlier.
-Why: With the recent innovations in CPU hardware acceleration technologies
- from Intel and AMD, VMware ran a few experiments to compare these
- techniques to guest paravirtualization technique on VMware's platform.
- These hardware assisted virtualization techniques have outperformed the
- performance benefits provided by VMI in most of the workloads. VMware
- expects that these hardware features will be ubiquitous in a couple of
- years, as a result, VMware has started a phased retirement of this
- feature from the hypervisor. We will be removing this feature from the
- Kernel too. Right now we are targeting 2.6.37 but can retire earlier if
- technical reasons (read opportunity to remove major chunk of pvops)
- arise.
-
- Please note that VMI has always been an optimization and non-VMI kernels
- still work fine on VMware's platform.
- Latest versions of VMware's product which support VMI are,
- Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence
- releases for these products will continue supporting VMI.
-
- For more details about VMI retirement take a look at this,
- http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html
-
-Who: Alok N Kataria <akataria@vmware.com>
-
-----------------------------
-
What: Support for lcd_switch and display_get in asus-laptop driver
When: March 2010
Why: These two features use non-standard interfaces. There are the
If you want to initialize a structure with an invalid GPIO number, use
some negative number (perhaps "-EINVAL"); that will never be valid. To
-test if a number could reference a GPIO, you may use this predicate:
+test if such number from such a structure could reference a GPIO, you
+may use this predicate:
int gpio_is_valid(int number);
A number that's not valid will be rejected by calls which may request
or free GPIOs (see below). Other numbers may also be rejected; for
-example, a number might be valid but unused on a given board.
-
-Whether a platform supports multiple GPIO controllers is currently a
-platform-specific implementation issue.
+example, a number might be valid but temporarily unused on a given board.
+Whether a platform supports multiple GPIO controllers is a platform-specific
+implementation issue, as are whether that support can leave "holes" in the space
+of GPIO numbers, and whether new controllers can be added at runtime. Such issues
+can affect things including whether adjacent GPIO numbers are both valid.
Using GPIOs
-----------
ARCH_REQUIRE_GPIOLIB or ARCH_WANT_OPTIONAL_GPIOLIB
and arrange that its <asm/gpio.h> includes <asm-generic/gpio.h> and defines
three functions: gpio_get_value(), gpio_set_value(), and gpio_cansleep().
-They may also want to provide a custom value for ARCH_NR_GPIOS.
-ARCH_REQUIRE_GPIOLIB means that the gpio-lib code will always get compiled
+It may also provide a custom value for ARCH_NR_GPIOS, so that it better
+reflects the number of GPIOs in actual use on that platform, without
+wasting static table space. (It should count both built-in/SoC GPIOs and
+also ones on GPIO expanders.
+
+ARCH_REQUIRE_GPIOLIB means that the gpiolib code will always get compiled
into the kernel on that architecture.
-ARCH_WANT_OPTIONAL_GPIOLIB means the gpio-lib code defaults to off and the user
+ARCH_WANT_OPTIONAL_GPIOLIB means the gpiolib code defaults to off and the user
can enable it and build it into the kernel optionally.
If neither of these options are selected, the platform does not support
I2C devices get this attribute created automatically.
RO
-update_rate The rate at which the chip will update readings.
+update_interval The interval at which the chip will update readings.
Unit: millisecond
RW
- Some devices have a variable update rate. This attribute
- can be used to change the update rate to the desired
- frequency.
+ Some devices have a variable update rate or interval.
+ This attribute can be used to change it to the desired value.
************
section titled <section title> from <filename>.
Spaces are allowed in <section title>; do not quote the <section title>.
+!C<filename> is replaced by nothing, but makes the tools check that
+all DOC: sections and documented functions, symbols, etc. are used.
+This makes sense to use when you use !F/!P only and want to verify
+that all documentation is included.
+
Tim.
*/ <twaugh@redhat.com>
[ARM] imx_timer1,OSTS,netx_timer,mpu_timer2,
pxa_timer,timer3,32k_counter,timer0_1
[AVR32] avr32
- [X86-32] pit,hpet,tsc,vmi-timer;
+ [X86-32] pit,hpet,tsc;
scx200_hrt on Geode; cyclone on IBM x440
[MIPS] MIPS
[PARISC] cr16
Reserves a hole at the top of the kernel virtual
address space.
+ reservelow= [X86]
+ Format: nn[K]
+ Set the amount of memory to reserve for BIOS at
+ the bottom of the address space.
+
reset_devices [KNL] Force drivers to reset the underlying device
during initialization.
disables clocksource verification at runtime.
Used to enable high-resolution timer mode on older
hardware, and in virtualized environment.
+ [x86] noirqtime: Do not use TSC to do irq accounting.
+ Used to run time disable IRQ_TIME_ACCOUNTING on any
+ platforms where RDTSC is slow and this accounting
+ can add overhead.
turbografx.map[2|3]= [HW,JOY]
TurboGraFX parallel port interface
registration and unregistration.
Probe handlers are run with preemption disabled. Depending on the
-architecture, handlers may also run with interrupts disabled. In any
-case, your handler should not yield the CPU (e.g., by attempting to
-acquire a semaphore).
+architecture and optimization state, handlers may also run with
+interrupts disabled (e.g., kretprobe handlers and optimized kprobe
+handlers run without interrupt disabled on x86/x86-64). In any case,
+your handler should not yield the CPU (e.g., by attempting to acquire
+a semaphore).
Since a return probe is implemented by replacing the return
address with the trampoline's address, stack backtraces and calls
mutex semantics are sufficient for your code, then there are a couple
of advantages of mutexes:
- - 'struct mutex' is smaller on most architectures: .e.g on x86,
+ - 'struct mutex' is smaller on most architectures: E.g. on x86,
'struct semaphore' is 20 bytes, 'struct mutex' is 16 bytes.
A smaller structure size means less RAM footprint, and better
CPU-cache utilization.
void mutex_lock_nested(struct mutex *lock, unsigned int subclass);
int mutex_lock_interruptible_nested(struct mutex *lock,
unsigned int subclass);
+ int atomic_dec_and_mutex_lock(atomic_t *cnt, struct mutex *lock);
Linux* Base Driver for the Intel(R) PRO/1000 Family of Adapters
===============================================================
-September 26, 2006
-
+Intel Gigabit Linux driver.
+Copyright(c) 1999 - 2010 Intel Corporation.
Contents
========
-- In This Release
- Identifying Your Adapter
-- Building and Installation
- Command Line Parameters
- Speed and Duplex Configuration
- Additional Configurations
-- Known Issues
- Support
-
-In This Release
-===============
-
-This file describes the Linux* Base Driver for the Intel(R) PRO/1000 Family
-of Adapters. This driver includes support for Itanium(R)2-based systems.
-
-For questions related to hardware requirements, refer to the documentation
-supplied with your Intel PRO/1000 adapter. All hardware requirements listed
-apply to use with Linux.
-
-The following features are now available in supported kernels:
- - Native VLANs
- - Channel Bonding (teaming)
- - SNMP
-
-Channel Bonding documentation can be found in the Linux kernel source:
-/Documentation/networking/bonding.txt
-
-The driver information previously displayed in the /proc filesystem is not
-supported in this release. Alternatively, you can use ethtool (version 1.6
-or later), lspci, and ifconfig to obtain the same information.
-
-Instructions on updating ethtool can be found in the section "Additional
-Configurations" later in this document.
-
-NOTE: The Intel(R) 82562v 10/100 Network Connection only provides 10/100
-support.
-
-
Identifying Your Adapter
========================
For more information on how to identify your adapter, go to the Adapter &
Driver ID Guide at:
- http://support.intel.com/support/network/adapter/pro100/21397.htm
+ http://support.intel.com/support/go/network/adapter/idguide.htm
For the latest Intel network drivers for Linux, refer to the following
website. In the search field, enter your adapter name or type, or use the
networking link on the left to search for your adapter:
- http://downloadfinder.intel.com/scripts-df/support_intel.asp
-
+ http://support.intel.com/support/go/network/adapter/home.htm
Command Line Parameters
=======================
-If the driver is built as a module, the following optional parameters
-are used by entering them on the command line with the modprobe command
-using this syntax:
-
- modprobe e1000 [<option>=<VAL1>,<VAL2>,...]
-
-For example, with two PRO/1000 PCI adapters, entering:
-
- modprobe e1000 TxDescriptors=80,128
-
-loads the e1000 driver with 80 TX descriptors for the first adapter and
-128 TX descriptors for the second adapter.
-
The default value for each parameter is generally the recommended setting,
unless otherwise noted.
parameters, see the application note at:
http://www.intel.com/design/network/applnots/ap450.htm
- A descriptor describes a data buffer and attributes related to
- the data buffer. This information is accessed by the hardware.
-
-
AutoNeg
-------
(Supported only on adapters with copper connections)
NOTE: Refer to the Speed and Duplex section of this readme for more
information on the AutoNeg parameter.
-
Duplex
------
(Supported only on adapters with copper connections)
link partner is forced (either full or half), Duplex defaults to half-
duplex.
-
FlowControl
-----------
Valid Range: 0-3 (0=none, 1=Rx only, 2=Tx only, 3=Rx&Tx)
This parameter controls the automatic generation(Tx) and response(Rx)
to Ethernet PAUSE frames.
-
InterruptThrottleRate
---------------------
(not supported on Intel(R) 82542, 82543 or 82544-based adapters)
-Valid Range: 0,1,3,100-100000 (0=off, 1=dynamic, 3=dynamic conservative)
+Valid Range: 0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative,
+ 4=simplified balancing)
Default Value: 3
The driver can limit the amount of interrupts per second that the adapter
-will generate for incoming packets. It does this by writing a value to the
-adapter that is based on the maximum amount of interrupts that the adapter
+will generate for incoming packets. It does this by writing a value to the
+adapter that is based on the maximum amount of interrupts that the adapter
will generate per second.
Setting InterruptThrottleRate to a value greater or equal to 100
load on the system and can lower CPU utilization under heavy load,
but will increase latency as packets are not processed as quickly.
-The default behaviour of the driver previously assumed a static
-InterruptThrottleRate value of 8000, providing a good fallback value for
-all traffic types,but lacking in small packet performance and latency.
-The hardware can handle many more small packets per second however, and
+The default behaviour of the driver previously assumed a static
+InterruptThrottleRate value of 8000, providing a good fallback value for
+all traffic types,but lacking in small packet performance and latency.
+The hardware can handle many more small packets per second however, and
for this reason an adaptive interrupt moderation algorithm was implemented.
Since 7.3.x, the driver has two adaptive modes (setting 1 or 3) in which
-it dynamically adjusts the InterruptThrottleRate value based on the traffic
+it dynamically adjusts the InterruptThrottleRate value based on the traffic
that it receives. After determining the type of incoming traffic in the last
-timeframe, it will adjust the InterruptThrottleRate to an appropriate value
+timeframe, it will adjust the InterruptThrottleRate to an appropriate value
for that traffic.
The algorithm classifies the incoming traffic every interval into
-classes. Once the class is determined, the InterruptThrottleRate value is
-adjusted to suit that traffic type the best. There are three classes defined:
+classes. Once the class is determined, the InterruptThrottleRate value is
+adjusted to suit that traffic type the best. There are three classes defined:
"Bulk traffic", for large amounts of packets of normal size; "Low latency",
for small amounts of traffic and/or a significant percentage of small
-packets; and "Lowest latency", for almost completely small packets or
+packets; and "Lowest latency", for almost completely small packets or
minimal traffic.
-In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
-for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
-latency" or "Lowest latency" class, the InterruptThrottleRate is increased
+In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
+for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
+latency" or "Lowest latency" class, the InterruptThrottleRate is increased
stepwise to 20000. This default mode is suitable for most applications.
For situations where low latency is vital such as cluster or
grid computing, the algorithm can reduce latency even more when
InterruptThrottleRate is set to mode 1. In this mode, which operates
-the same as mode 3, the InterruptThrottleRate will be increased stepwise to
+the same as mode 3, the InterruptThrottleRate will be increased stepwise to
70000 for traffic in class "Lowest latency".
+In simplified mode the interrupt rate is based on the ratio of Tx and
+Rx traffic. If the bytes per second rate is approximately equal, the
+interrupt rate will drop as low as 2000 interrupts per second. If the
+traffic is mostly transmit or mostly receive, the interrupt rate could
+be as high as 8000.
+
Setting InterruptThrottleRate to 0 turns off any interrupt moderation
and may improve small packet latency, but is generally not suitable
for bulk throughput traffic.
be platform-specific. If CPU utilization is not a concern, use
RX_POLLING (NAPI) and default driver settings.
-
-
RxDescriptors
-------------
Valid Range: 80-256 for 82542 and 82543-based adapters
incoming packets, at the expense of increased system memory utilization.
Each descriptor is 16 bytes. A receive buffer is also allocated for each
-descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending
+descriptor and can be either 2048, 4096, 8192, or 16384 bytes, depending
on the MTU setting. The maximum MTU size is 16110.
-NOTE: MTU designates the frame size. It only needs to be set for Jumbo
- Frames. Depending on the available system resources, the request
- for a higher number of receive descriptors may be denied. In this
+NOTE: MTU designates the frame size. It only needs to be set for Jumbo
+ Frames. Depending on the available system resources, the request
+ for a higher number of receive descriptors may be denied. In this
case, use a lower number.
-
RxIntDelay
----------
Valid Range: 0-65535 (0=off)
restoring the network connection. To eliminate the potential
for the hang ensure that RxIntDelay is set to 0.
-
RxAbsIntDelay
-------------
(This parameter is supported only on 82540, 82545 and later adapters.)
along with RxIntDelay, may improve traffic throughput in specific network
conditions.
-
Speed
-----
(This parameter is supported only on adapters with copper connections.)
partner is set to auto-negotiate, the board will auto-detect the correct
speed. Duplex should also be set when Speed is set to either 10 or 100.
-
TxDescriptors
-------------
Valid Range: 80-256 for 82542 and 82543-based adapters
higher number of transmit descriptors may be denied. In this case,
use a lower number.
+TxDescriptorStep
+----------------
+Valid Range: 1 (use every Tx Descriptor)
+ 4 (use every 4th Tx Descriptor)
+
+Default Value: 1 (use every Tx Descriptor)
+
+On certain non-Intel architectures, it has been observed that intense TX
+traffic bursts of short packets may result in an improper descriptor
+writeback. If this occurs, the driver will report a "TX Timeout" and reset
+the adapter, after which the transmit flow will restart, though data may
+have stalled for as much as 10 seconds before it resumes.
+
+The improper writeback does not occur on the first descriptor in a system
+memory cache-line, which is typically 32 bytes, or 4 descriptors long.
+
+Setting TxDescriptorStep to a value of 4 will ensure that all TX descriptors
+are aligned to the start of a system memory cache line, and so this problem
+will not occur.
+
+NOTES: Setting TxDescriptorStep to 4 effectively reduces the number of
+ TxDescriptors available for transmits to 1/4 of the normal allocation.
+ This has a possible negative performance impact, which may be
+ compensated for by allocating more descriptors using the TxDescriptors
+ module parameter.
+
+ There are other conditions which may result in "TX Timeout", which will
+ not be resolved by the use of the TxDescriptorStep parameter. As the
+ issue addressed by this parameter has never been observed on Intel
+ Architecture platforms, it should not be used on Intel platforms.
TxIntDelay
----------
system is reporting dropped transmits, this value may be set too high
causing the driver to run out of available transmit descriptors.
-
TxAbsIntDelay
-------------
(This parameter is supported only on 82540, 82545 and later adapters.)
A value of '1' indicates that the driver should enable IP checksum
offload for received packets (both UDP and TCP) to the adapter hardware.
+Copybreak
+---------
+Valid Range: 0-xxxxxxx (0=off)
+Default Value: 256
+Usage: insmod e1000.ko copybreak=128
+
+Driver copies all packets below or equaling this size to a fresh Rx
+buffer before handing it up the stack.
+
+This parameter is different than other parameters, in that it is a
+single (not 1,1,1 etc.) parameter applied to all driver instances and
+it is also available during runtime at
+/sys/module/e1000/parameters/copybreak
+
+SmartPowerDownEnable
+--------------------
+Valid Range: 0-1
+Default Value: 0 (disabled)
+
+Allows PHY to turn off in lower power states. The user can turn off
+this parameter in supported chipsets.
+
+KumeranLockLoss
+---------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+This workaround skips resetting the PHY at shutdown for the initial
+silicon releases of ICH8 systems.
Speed and Duplex Configuration
==============================
parameter should not be used. Instead, use the Speed and Duplex parameters
previously mentioned to force the adapter to the same speed and duplex.
-
Additional Configurations
=========================
- Configuring the Driver on Different Distributions
- -------------------------------------------------
- Configuring a network driver to load properly when the system is started
- is distribution dependent. Typically, the configuration process involves
- adding an alias line to /etc/modules.conf or /etc/modprobe.conf as well
- as editing other system startup scripts and/or configuration files. Many
- popular Linux distributions ship with tools to make these changes for you.
- To learn the proper way to configure a network device for your system,
- refer to your distribution documentation. If during this process you are
- asked for the driver or module name, the name for the Linux Base Driver
- for the Intel(R) PRO/1000 Family of Adapters is e1000.
-
- As an example, if you install the e1000 driver for two PRO/1000 adapters
- (eth0 and eth1) and set the speed and duplex to 10full and 100half, add
- the following to modules.conf or or modprobe.conf:
-
- alias eth0 e1000
- alias eth1 e1000
- options e1000 Speed=10,100 Duplex=2,1
-
- Viewing Link Messages
- ---------------------
- Link messages will not be displayed to the console if the distribution is
- restricting system messages. In order to see network driver link messages
- on your console, set dmesg to eight by entering the following:
-
- dmesg -n 8
-
- NOTE: This setting is not saved across reboots.
-
Jumbo Frames
------------
Jumbo Frames support is enabled by changing the MTU to a value larger than
setting in a different location.
Notes:
-
- - To enable Jumbo Frames, increase the MTU size on the interface beyond
- 1500.
+ Degradation in throughput performance may be observed in some Jumbo frames
+ environments. If this is observed, increasing the application's socket buffer
+ size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values may help.
+ See the specific application manual and /usr/src/linux*/Documentation/
+ networking/ip-sysctl.txt for more details.
- The maximum MTU setting for Jumbo Frames is 16110. This value coincides
with the maximum Jumbo Frames size of 16128.
- Using Jumbo Frames at 10 or 100 Mbps may result in poor performance or
loss of link.
- - Some Intel gigabit adapters that support Jumbo Frames have a frame size
- limit of 9238 bytes, with a corresponding MTU size limit of 9216 bytes.
- The adapters with this limitation are based on the Intel(R) 82571EB,
- 82572EI, 82573L and 80003ES2LAN controller. These correspond to the
- following product names:
- Intel(R) PRO/1000 PT Server Adapter
- Intel(R) PRO/1000 PT Desktop Adapter
- Intel(R) PRO/1000 PT Network Connection
- Intel(R) PRO/1000 PT Dual Port Server Adapter
- Intel(R) PRO/1000 PT Dual Port Network Connection
- Intel(R) PRO/1000 PF Server Adapter
- Intel(R) PRO/1000 PF Network Connection
- Intel(R) PRO/1000 PF Dual Port Server Adapter
- Intel(R) PRO/1000 PB Server Connection
- Intel(R) PRO/1000 PL Network Connection
- Intel(R) PRO/1000 EB Network Connection with I/O Acceleration
- Intel(R) PRO/1000 EB Backplane Connection with I/O Acceleration
- Intel(R) PRO/1000 PT Quad Port Server Adapter
-
- Adapters based on the Intel(R) 82542 and 82573V/E controller do not
support Jumbo Frames. These correspond to the following product names:
Intel(R) PRO/1000 Gigabit Server Adapter
Intel(R) PRO/1000 PM Network Connection
- - The following adapters do not support Jumbo Frames:
- Intel(R) 82562V 10/100 Network Connection
- Intel(R) 82566DM Gigabit Network Connection
- Intel(R) 82566DC Gigabit Network Connection
- Intel(R) 82566MM Gigabit Network Connection
- Intel(R) 82566MC Gigabit Network Connection
- Intel(R) 82562GT 10/100 Network Connection
- Intel(R) 82562G 10/100 Network Connection
-
-
Ethtool
-------
The driver utilizes the ethtool interface for driver configuration and
The latest release of ethtool can be found from
http://sourceforge.net/projects/gkernel.
- NOTE: Ethtool 1.6 only supports a limited set of ethtool options. Support
- for a more complete ethtool feature set can be enabled by upgrading
- ethtool to ethtool-1.8.1.
-
Enabling Wake on LAN* (WoL)
---------------------------
- WoL is configured through the Ethtool* utility. Ethtool is included with
- all versions of Red Hat after Red Hat 7.2. For other Linux distributions,
- download and install Ethtool from the following website:
- http://sourceforge.net/projects/gkernel.
-
- For instructions on enabling WoL with Ethtool, refer to the website listed
- above.
+ WoL is configured through the Ethtool* utility.
WoL will be enabled on the system during the next shut down or reboot.
For this driver version, in order to enable WoL, the e1000 driver must be
loaded when shutting down or rebooting the system.
- Wake On LAN is only supported on port A for the following devices:
- Intel(R) PRO/1000 PT Dual Port Network Connection
- Intel(R) PRO/1000 PT Dual Port Server Connection
- Intel(R) PRO/1000 PT Dual Port Server Adapter
- Intel(R) PRO/1000 PF Dual Port Server Adapter
- Intel(R) PRO/1000 PT Quad Port Server Adapter
-
- NAPI
- ----
- NAPI (Rx polling mode) is enabled in the e1000 driver.
-
- See www.cyberus.ca/~hadi/usenix-paper.tgz for more information on NAPI.
-
-
-Known Issues
-============
-
-Dropped Receive Packets on Half-duplex 10/100 Networks
-------------------------------------------------------
-If you have an Intel PCI Express adapter running at 10mbps or 100mbps, half-
-duplex, you may observe occasional dropped receive packets. There are no
-workarounds for this problem in this network configuration. The network must
-be updated to operate in full-duplex, and/or 1000mbps only.
-
-Jumbo Frames System Requirement
--------------------------------
-Memory allocation failures have been observed on Linux systems with 64 MB
-of RAM or less that are running Jumbo Frames. If you are using Jumbo
-Frames, your system may require more than the advertised minimum
-requirement of 64 MB of system memory.
-
-Performance Degradation with Jumbo Frames
------------------------------------------
-Degradation in throughput performance may be observed in some Jumbo frames
-environments. If this is observed, increasing the application's socket
-buffer size and/or increasing the /proc/sys/net/ipv4/tcp_*mem entry values
-may help. See the specific application manual and
-/usr/src/linux*/Documentation/
-networking/ip-sysctl.txt for more details.
-
-Jumbo Frames on Foundry BigIron 8000 switch
--------------------------------------------
-There is a known issue using Jumbo frames when connected to a Foundry
-BigIron 8000 switch. This is a 3rd party limitation. If you experience
-loss of packets, lower the MTU size.
-
-Allocating Rx Buffers when Using Jumbo Frames
----------------------------------------------
-Allocating Rx buffers when using Jumbo Frames on 2.6.x kernels may fail if
-the available memory is heavily fragmented. This issue may be seen with PCI-X
-adapters or with packet split disabled. This can be reduced or eliminated
-by changing the amount of available memory for receive buffer allocation, by
-increasing /proc/sys/vm/min_free_kbytes.
-
-Multiple Interfaces on Same Ethernet Broadcast Network
-------------------------------------------------------
-Due to the default ARP behavior on Linux, it is not possible to have
-one system on two IP networks in the same Ethernet broadcast domain
-(non-partitioned switch) behave as expected. All Ethernet interfaces
-will respond to IP traffic for any IP address assigned to the system.
-This results in unbalanced receive traffic.
-
-If you have multiple interfaces in a server, either turn on ARP
-filtering by entering:
-
- echo 1 > /proc/sys/net/ipv4/conf/all/arp_filter
-(this only works if your kernel's version is higher than 2.4.5),
-
-NOTE: This setting is not saved across reboots. The configuration
-change can be made permanent by adding the line:
- net.ipv4.conf.all.arp_filter = 1
-to the file /etc/sysctl.conf
-
- or,
-
-install the interfaces in separate broadcast domains (either in
-different switches or in a switch partitioned to VLANs).
-
-82541/82547 can't link or are slow to link with some link partners
------------------------------------------------------------------
-There is a known compatibility issue with 82541/82547 and some
-low-end switches where the link will not be established, or will
-be slow to establish. In particular, these switches are known to
-be incompatible with 82541/82547:
-
- Planex FXG-08TE
- I-O Data ETG-SH8
-
-To workaround this issue, the driver can be compiled with an override
-of the PHY's master/slave setting. Forcing master or forcing slave
-mode will improve time-to-link.
-
- # make CFLAGS_EXTRA=-DE1000_MASTER_SLAVE=<n>
-
-Where <n> is:
-
- 0 = Hardware default
- 1 = Master mode
- 2 = Slave mode
- 3 = Auto master/slave
-
-Disable rx flow control with ethtool
-------------------------------------
-In order to disable receive flow control using ethtool, you must turn
-off auto-negotiation on the same command line.
-
-For example:
-
- ethtool -A eth? autoneg off rx off
-
-Unplugging network cable while ethtool -p is running
-----------------------------------------------------
-In kernel versions 2.5.50 and later (including 2.6 kernel), unplugging
-the network cable while ethtool -p is running will cause the system to
-become unresponsive to keyboard commands, except for control-alt-delete.
-Restarting the system appears to be the only remedy.
-
-
Support
=======
--- /dev/null
+Linux* Driver for Intel(R) Network Connection
+===============================================================
+
+Intel Gigabit Linux driver.
+Copyright(c) 1999 - 2010 Intel Corporation.
+
+Contents
+========
+
+- Identifying Your Adapter
+- Command Line Parameters
+- Additional Configurations
+- Support
+
+Identifying Your Adapter
+========================
+
+The e1000e driver supports all PCI Express Intel(R) Gigabit Network
+Connections, except those that are 82575, 82576 and 82580-based*.
+
+* NOTE: The Intel(R) PRO/1000 P Dual Port Server Adapter is supported by
+ the e1000 driver, not the e1000e driver due to the 82546 part being used
+ behind a PCI Express bridge.
+
+For more information on how to identify your adapter, go to the Adapter &
+Driver ID Guide at:
+
+ http://support.intel.com/support/go/network/adapter/idguide.htm
+
+For the latest Intel network drivers for Linux, refer to the following
+website. In the search field, enter your adapter name or type, or use the
+networking link on the left to search for your adapter:
+
+ http://support.intel.com/support/go/network/adapter/home.htm
+
+Command Line Parameters
+=======================
+
+The default value for each parameter is generally the recommended setting,
+unless otherwise noted.
+
+NOTES: For more information about the InterruptThrottleRate,
+ RxIntDelay, TxIntDelay, RxAbsIntDelay, and TxAbsIntDelay
+ parameters, see the application note at:
+ http://www.intel.com/design/network/applnots/ap450.htm
+
+InterruptThrottleRate
+---------------------
+Valid Range: 0,1,3,4,100-100000 (0=off, 1=dynamic, 3=dynamic conservative,
+ 4=simplified balancing)
+Default Value: 3
+
+The driver can limit the amount of interrupts per second that the adapter
+will generate for incoming packets. It does this by writing a value to the
+adapter that is based on the maximum amount of interrupts that the adapter
+will generate per second.
+
+Setting InterruptThrottleRate to a value greater or equal to 100
+will program the adapter to send out a maximum of that many interrupts
+per second, even if more packets have come in. This reduces interrupt
+load on the system and can lower CPU utilization under heavy load,
+but will increase latency as packets are not processed as quickly.
+
+The driver has two adaptive modes (setting 1 or 3) in which
+it dynamically adjusts the InterruptThrottleRate value based on the traffic
+that it receives. After determining the type of incoming traffic in the last
+timeframe, it will adjust the InterruptThrottleRate to an appropriate value
+for that traffic.
+
+The algorithm classifies the incoming traffic every interval into
+classes. Once the class is determined, the InterruptThrottleRate value is
+adjusted to suit that traffic type the best. There are three classes defined:
+"Bulk traffic", for large amounts of packets of normal size; "Low latency",
+for small amounts of traffic and/or a significant percentage of small
+packets; and "Lowest latency", for almost completely small packets or
+minimal traffic.
+
+In dynamic conservative mode, the InterruptThrottleRate value is set to 4000
+for traffic that falls in class "Bulk traffic". If traffic falls in the "Low
+latency" or "Lowest latency" class, the InterruptThrottleRate is increased
+stepwise to 20000. This default mode is suitable for most applications.
+
+For situations where low latency is vital such as cluster or
+grid computing, the algorithm can reduce latency even more when
+InterruptThrottleRate is set to mode 1. In this mode, which operates
+the same as mode 3, the InterruptThrottleRate will be increased stepwise to
+70000 for traffic in class "Lowest latency".
+
+In simplified mode the interrupt rate is based on the ratio of Tx and
+Rx traffic. If the bytes per second rate is approximately equal the
+interrupt rate will drop as low as 2000 interrupts per second. If the
+traffic is mostly transmit or mostly receive, the interrupt rate could
+be as high as 8000.
+
+Setting InterruptThrottleRate to 0 turns off any interrupt moderation
+and may improve small packet latency, but is generally not suitable
+for bulk throughput traffic.
+
+NOTE: InterruptThrottleRate takes precedence over the TxAbsIntDelay and
+ RxAbsIntDelay parameters. In other words, minimizing the receive
+ and/or transmit absolute delays does not force the controller to
+ generate more interrupts than what the Interrupt Throttle Rate
+ allows.
+
+NOTE: When e1000e is loaded with default settings and multiple adapters
+ are in use simultaneously, the CPU utilization may increase non-
+ linearly. In order to limit the CPU utilization without impacting
+ the overall throughput, we recommend that you load the driver as
+ follows:
+
+ modprobe e1000e InterruptThrottleRate=3000,3000,3000
+
+ This sets the InterruptThrottleRate to 3000 interrupts/sec for
+ the first, second, and third instances of the driver. The range
+ of 2000 to 3000 interrupts per second works on a majority of
+ systems and is a good starting point, but the optimal value will
+ be platform-specific. If CPU utilization is not a concern, use
+ RX_POLLING (NAPI) and default driver settings.
+
+RxIntDelay
+----------
+Valid Range: 0-65535 (0=off)
+Default Value: 0
+
+This value delays the generation of receive interrupts in units of 1.024
+microseconds. Receive interrupt reduction can improve CPU efficiency if
+properly tuned for specific network traffic. Increasing this value adds
+extra latency to frame reception and can end up decreasing the throughput
+of TCP traffic. If the system is reporting dropped receives, this value
+may be set too high, causing the driver to run out of available receive
+descriptors.
+
+CAUTION: When setting RxIntDelay to a value other than 0, adapters may
+ hang (stop transmitting) under certain network conditions. If
+ this occurs a NETDEV WATCHDOG message is logged in the system
+ event log. In addition, the controller is automatically reset,
+ restoring the network connection. To eliminate the potential
+ for the hang ensure that RxIntDelay is set to 0.
+
+RxAbsIntDelay
+-------------
+Valid Range: 0-65535 (0=off)
+Default Value: 8
+
+This value, in units of 1.024 microseconds, limits the delay in which a
+receive interrupt is generated. Useful only if RxIntDelay is non-zero,
+this value ensures that an interrupt is generated after the initial
+packet is received within the set amount of time. Proper tuning,
+along with RxIntDelay, may improve traffic throughput in specific network
+conditions.
+
+TxIntDelay
+----------
+Valid Range: 0-65535 (0=off)
+Default Value: 8
+
+This value delays the generation of transmit interrupts in units of
+1.024 microseconds. Transmit interrupt reduction can improve CPU
+efficiency if properly tuned for specific network traffic. If the
+system is reporting dropped transmits, this value may be set too high
+causing the driver to run out of available transmit descriptors.
+
+TxAbsIntDelay
+-------------
+Valid Range: 0-65535 (0=off)
+Default Value: 32
+
+This value, in units of 1.024 microseconds, limits the delay in which a
+transmit interrupt is generated. Useful only if TxIntDelay is non-zero,
+this value ensures that an interrupt is generated after the initial
+packet is sent on the wire within the set amount of time. Proper tuning,
+along with TxIntDelay, may improve traffic throughput in specific
+network conditions.
+
+Copybreak
+---------
+Valid Range: 0-xxxxxxx (0=off)
+Default Value: 256
+
+Driver copies all packets below or equaling this size to a fresh Rx
+buffer before handing it up the stack.
+
+This parameter is different than other parameters, in that it is a
+single (not 1,1,1 etc.) parameter applied to all driver instances and
+it is also available during runtime at
+/sys/module/e1000e/parameters/copybreak
+
+SmartPowerDownEnable
+--------------------
+Valid Range: 0-1
+Default Value: 0 (disabled)
+
+Allows PHY to turn off in lower power states. The user can set this parameter
+in supported chipsets.
+
+KumeranLockLoss
+---------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+This workaround skips resetting the PHY at shutdown for the initial
+silicon releases of ICH8 systems.
+
+IntMode
+-------
+Valid Range: 0-2 (0=legacy, 1=MSI, 2=MSI-X)
+Default Value: 2
+
+Allows changing the interrupt mode at module load time, without requiring a
+recompile. If the driver load fails to enable a specific interrupt mode, the
+driver will try other interrupt modes, from least to most compatible. The
+interrupt order is MSI-X, MSI, Legacy. If specifying MSI (IntMode=1)
+interrupts, only MSI and Legacy will be attempted.
+
+CrcStripping
+------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+Strip the CRC from received packets before sending up the network stack. If
+you have a machine with a BMC enabled but cannot receive IPMI traffic after
+loading or enabling the driver, try disabling this feature.
+
+WriteProtectNVM
+---------------
+Valid Range: 0-1
+Default Value: 1 (enabled)
+
+Set the hardware to ignore all write/erase cycles to the GbE region in the
+ICHx NVM (non-volatile memory). This feature can be disabled by the
+WriteProtectNVM module parameter (enabled by default) only after a hardware
+reset, but the machine must be power cycled before trying to enable writes.
+
+Note: the kernel boot option iomem=relaxed may need to be set if the kernel
+config option CONFIG_STRICT_DEVMEM=y, if the root user wants to write the
+NVM from user space via ethtool.
+
+Additional Configurations
+=========================
+
+ Jumbo Frames
+ ------------
+ Jumbo Frames support is enabled by changing the MTU to a value larger than
+ the default of 1500. Use the ifconfig command to increase the MTU size.
+ For example:
+
+ ifconfig eth<x> mtu 9000 up
+
+ This setting is not saved across reboots.
+
+ Notes:
+
+ - The maximum MTU setting for Jumbo Frames is 9216. This value coincides
+ with the maximum Jumbo Frames size of 9234 bytes.
+
+ - Using Jumbo Frames at 10 or 100 Mbps is not supported and may result in
+ poor performance or loss of link.
+
+ - Some adapters limit Jumbo Frames sized packets to a maximum of
+ 4096 bytes and some adapters do not support Jumbo Frames.
+
+
+ Ethtool
+ -------
+ The driver utilizes the ethtool interface for driver configuration and
+ diagnostics, as well as displaying statistical information. We
+ strongly recommend downloading the latest version of Ethtool at:
+
+ http://sourceforge.net/projects/gkernel.
+
+ Speed and Duplex
+ ----------------
+ Speed and Duplex are configured through the Ethtool* utility. For
+ instructions, refer to the Ethtool man page.
+
+ Enabling Wake on LAN* (WoL)
+ ---------------------------
+ WoL is configured through the Ethtool* utility. For instructions on
+ enabling WoL with Ethtool, refer to the Ethtool man page.
+
+ WoL will be enabled on the system during the next shut down or reboot.
+ For this driver version, in order to enable WoL, the e1000e driver must be
+ loaded when shutting down or rebooting the system.
+
+ In most cases Wake On LAN is only supported on port A for multiple port
+ adapters. To verify if a port supports Wake on LAN run ethtool eth<X>.
+
+
+Support
+=======
+
+For general information, go to the Intel support website at:
+
+ www.intel.com/support/
+
+or the Intel Wired Networking project hosted by Sourceforge at:
+
+ http://sourceforge.net/projects/e1000
+
+If an issue is identified with the released source code on the supported
+kernel with a supported adapter, email the specific information related
+to the issue to e1000-devel@lists.sf.net
Linux* Base Driver for Intel(R) Network Connection
==================================================
-November 24, 2009
+Intel Gigabit Linux driver.
+Copyright(c) 1999 - 2010 Intel Corporation.
Contents
========
-- In This Release
- Identifying Your Adapter
- Known Issues/Troubleshooting
- Support
-In This Release
-===============
-
This file describes the ixgbevf Linux* Base Driver for Intel Network
Connection.
For more information on how to identify your adapter, go to the Adapter &
Driver ID Guide at:
- http://support.intel.com/support/network/sb/CS-008441.htm
+ http://support.intel.com/support/go/network/adapter/idguide.htm
Known Issues/Troubleshooting
============================
If an issue is identified with the released source code on the supported
kernel with a supported adapter, email the specific information related
to the issue to e1000-devel@lists.sf.net
-
-License
-=======
-
-Intel 10 Gigabit Linux driver.
-Copyright(c) 1999 - 2009 Intel Corporation.
-
-This program is free software; you can redistribute it and/or modify it
-under the terms and conditions of the GNU General Public License,
-version 2, as published by the Free Software Foundation.
-
-This program is distributed in the hope it will be useful, but WITHOUT
-ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
-more details.
-
-You should have received a copy of the GNU General Public License along with
-this program; if not, write to the Free Software Foundation, Inc.,
-51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
-
-The full GNU General Public License is included in this distribution in
-the file called "COPYING".
-
-Trademarks
-==========
-
-Intel, Itanium, and Pentium are trademarks or registered trademarks of
-Intel Corporation or its subsidiaries in the United States and other
-countries.
-
-* Other names and brands may be claimed as the property of others.
current limit is controllable).
(C) 2008 Wolfson Microelectronics PLC.
-Author: Liam Girdwood <lg@opensource.wolfsonmicro.com>
+Author: Liam Girdwood <lrg@slimlogic.co.uk>
Nomenclature
Conexant 5066
=============
laptop Basic Laptop config (default)
+ hp-laptop HP laptops, e g G60
dell-laptop Dell laptops
dell-vostro Dell Vostro
olpc-xo-1_5 OLPC XO 1.5
}
if (opt_unpoison && !hwpoison_forget_fd) {
- sprintf(buf, "%s/renew-pfn", hwpoison_debug_fs);
+ sprintf(buf, "%s/unpoison-pfn", hwpoison_debug_fs);
hwpoison_forget_fd = checked_open(buf, O_WRONLY);
}
}
--- /dev/null
+
+Concurrency Managed Workqueue (cmwq)
+
+September, 2010 Tejun Heo <tj@kernel.org>
+ Florian Mickler <florian@mickler.org>
+
+CONTENTS
+
+1. Introduction
+2. Why cmwq?
+3. The Design
+4. Application Programming Interface (API)
+5. Example Execution Scenarios
+6. Guidelines
+
+
+1. Introduction
+
+There are many cases where an asynchronous process execution context
+is needed and the workqueue (wq) API is the most commonly used
+mechanism for such cases.
+
+When such an asynchronous execution context is needed, a work item
+describing which function to execute is put on a queue. An
+independent thread serves as the asynchronous execution context. The
+queue is called workqueue and the thread is called worker.
+
+While there are work items on the workqueue the worker executes the
+functions associated with the work items one after the other. When
+there is no work item left on the workqueue the worker becomes idle.
+When a new work item gets queued, the worker begins executing again.
+
+
+2. Why cmwq?
+
+In the original wq implementation, a multi threaded (MT) wq had one
+worker thread per CPU and a single threaded (ST) wq had one worker
+thread system-wide. A single MT wq needed to keep around the same
+number of workers as the number of CPUs. The kernel grew a lot of MT
+wq users over the years and with the number of CPU cores continuously
+rising, some systems saturated the default 32k PID space just booting
+up.
+
+Although MT wq wasted a lot of resource, the level of concurrency
+provided was unsatisfactory. The limitation was common to both ST and
+MT wq albeit less severe on MT. Each wq maintained its own separate
+worker pool. A MT wq could provide only one execution context per CPU
+while a ST wq one for the whole system. Work items had to compete for
+those very limited execution contexts leading to various problems
+including proneness to deadlocks around the single execution context.
+
+The tension between the provided level of concurrency and resource
+usage also forced its users to make unnecessary tradeoffs like libata
+choosing to use ST wq for polling PIOs and accepting an unnecessary
+limitation that no two polling PIOs can progress at the same time. As
+MT wq don't provide much better concurrency, users which require
+higher level of concurrency, like async or fscache, had to implement
+their own thread pool.
+
+Concurrency Managed Workqueue (cmwq) is a reimplementation of wq with
+focus on the following goals.
+
+* Maintain compatibility with the original workqueue API.
+
+* Use per-CPU unified worker pools shared by all wq to provide
+ flexible level of concurrency on demand without wasting a lot of
+ resource.
+
+* Automatically regulate worker pool and level of concurrency so that
+ the API users don't need to worry about such details.
+
+
+3. The Design
+
+In order to ease the asynchronous execution of functions a new
+abstraction, the work item, is introduced.
+
+A work item is a simple struct that holds a pointer to the function
+that is to be executed asynchronously. Whenever a driver or subsystem
+wants a function to be executed asynchronously it has to set up a work
+item pointing to that function and queue that work item on a
+workqueue.
+
+Special purpose threads, called worker threads, execute the functions
+off of the queue, one after the other. If no work is queued, the
+worker threads become idle. These worker threads are managed in so
+called thread-pools.
+
+The cmwq design differentiates between the user-facing workqueues that
+subsystems and drivers queue work items on and the backend mechanism
+which manages thread-pool and processes the queued work items.
+
+The backend is called gcwq. There is one gcwq for each possible CPU
+and one gcwq to serve work items queued on unbound workqueues.
+
+Subsystems and drivers can create and queue work items through special
+workqueue API functions as they see fit. They can influence some
+aspects of the way the work items are executed by setting flags on the
+workqueue they are putting the work item on. These flags include
+things like CPU locality, reentrancy, concurrency limits and more. To
+get a detailed overview refer to the API description of
+alloc_workqueue() below.
+
+When a work item is queued to a workqueue, the target gcwq is
+determined according to the queue parameters and workqueue attributes
+and appended on the shared worklist of the gcwq. For example, unless
+specifically overridden, a work item of a bound workqueue will be
+queued on the worklist of exactly that gcwq that is associated to the
+CPU the issuer is running on.
+
+For any worker pool implementation, managing the concurrency level
+(how many execution contexts are active) is an important issue. cmwq
+tries to keep the concurrency at a minimal but sufficient level.
+Minimal to save resources and sufficient in that the system is used at
+its full capacity.
+
+Each gcwq bound to an actual CPU implements concurrency management by
+hooking into the scheduler. The gcwq is notified whenever an active
+worker wakes up or sleeps and keeps track of the number of the
+currently runnable workers. Generally, work items are not expected to
+hog a CPU and consume many cycles. That means maintaining just enough
+concurrency to prevent work processing from stalling should be
+optimal. As long as there are one or more runnable workers on the
+CPU, the gcwq doesn't start execution of a new work, but, when the
+last running worker goes to sleep, it immediately schedules a new
+worker so that the CPU doesn't sit idle while there are pending work
+items. This allows using a minimal number of workers without losing
+execution bandwidth.
+
+Keeping idle workers around doesn't cost other than the memory space
+for kthreads, so cmwq holds onto idle ones for a while before killing
+them.
+
+For an unbound wq, the above concurrency management doesn't apply and
+the gcwq for the pseudo unbound CPU tries to start executing all work
+items as soon as possible. The responsibility of regulating
+concurrency level is on the users. There is also a flag to mark a
+bound wq to ignore the concurrency management. Please refer to the
+API section for details.
+
+Forward progress guarantee relies on that workers can be created when
+more execution contexts are necessary, which in turn is guaranteed
+through the use of rescue workers. All work items which might be used
+on code paths that handle memory reclaim are required to be queued on
+wq's that have a rescue-worker reserved for execution under memory
+pressure. Else it is possible that the thread-pool deadlocks waiting
+for execution contexts to free up.
+
+
+4. Application Programming Interface (API)
+
+alloc_workqueue() allocates a wq. The original create_*workqueue()
+functions are deprecated and scheduled for removal. alloc_workqueue()
+takes three arguments - @name, @flags and @max_active. @name is the
+name of the wq and also used as the name of the rescuer thread if
+there is one.
+
+A wq no longer manages execution resources but serves as a domain for
+forward progress guarantee, flush and work item attributes. @flags
+and @max_active control how work items are assigned execution
+resources, scheduled and executed.
+
+@flags:
+
+ WQ_NON_REENTRANT
+
+ By default, a wq guarantees non-reentrance only on the same
+ CPU. A work item may not be executed concurrently on the same
+ CPU by multiple workers but is allowed to be executed
+ concurrently on multiple CPUs. This flag makes sure
+ non-reentrance is enforced across all CPUs. Work items queued
+ to a non-reentrant wq are guaranteed to be executed by at most
+ one worker system-wide at any given time.
+
+ WQ_UNBOUND
+
+ Work items queued to an unbound wq are served by a special
+ gcwq which hosts workers which are not bound to any specific
+ CPU. This makes the wq behave as a simple execution context
+ provider without concurrency management. The unbound gcwq
+ tries to start execution of work items as soon as possible.
+ Unbound wq sacrifices locality but is useful for the following
+ cases.
+
+ * Wide fluctuation in the concurrency level requirement is
+ expected and using bound wq may end up creating large number
+ of mostly unused workers across different CPUs as the issuer
+ hops through different CPUs.
+
+ * Long running CPU intensive workloads which can be better
+ managed by the system scheduler.
+
+ WQ_FREEZEABLE
+
+ A freezeable wq participates in the freeze phase of the system
+ suspend operations. Work items on the wq are drained and no
+ new work item starts execution until thawed.
+
+ WQ_RESCUER
+
+ All wq which might be used in the memory reclaim paths _MUST_
+ have this flag set. This reserves one worker exclusively for
+ the execution of this wq under memory pressure.
+
+ WQ_HIGHPRI
+
+ Work items of a highpri wq are queued at the head of the
+ worklist of the target gcwq and start execution regardless of
+ the current concurrency level. In other words, highpri work
+ items will always start execution as soon as execution
+ resource is available.
+
+ Ordering among highpri work items is preserved - a highpri
+ work item queued after another highpri work item will start
+ execution after the earlier highpri work item starts.
+
+ Although highpri work items are not held back by other
+ runnable work items, they still contribute to the concurrency
+ level. Highpri work items in runnable state will prevent
+ non-highpri work items from starting execution.
+
+ This flag is meaningless for unbound wq.
+
+ WQ_CPU_INTENSIVE
+
+ Work items of a CPU intensive wq do not contribute to the
+ concurrency level. In other words, runnable CPU intensive
+ work items will not prevent other work items from starting
+ execution. This is useful for bound work items which are
+ expected to hog CPU cycles so that their execution is
+ regulated by the system scheduler.
+
+ Although CPU intensive work items don't contribute to the
+ concurrency level, start of their executions is still
+ regulated by the concurrency management and runnable
+ non-CPU-intensive work items can delay execution of CPU
+ intensive work items.
+
+ This flag is meaningless for unbound wq.
+
+ WQ_HIGHPRI | WQ_CPU_INTENSIVE
+
+ This combination makes the wq avoid interaction with
+ concurrency management completely and behave as a simple
+ per-CPU execution context provider. Work items queued on a
+ highpri CPU-intensive wq start execution as soon as resources
+ are available and don't affect execution of other work items.
+
+@max_active:
+
+@max_active determines the maximum number of execution contexts per
+CPU which can be assigned to the work items of a wq. For example,
+with @max_active of 16, at most 16 work items of the wq can be
+executing at the same time per CPU.
+
+Currently, for a bound wq, the maximum limit for @max_active is 512
+and the default value used when 0 is specified is 256. For an unbound
+wq, the limit is higher of 512 and 4 * num_possible_cpus(). These
+values are chosen sufficiently high such that they are not the
+limiting factor while providing protection in runaway cases.
+
+The number of active work items of a wq is usually regulated by the
+users of the wq, more specifically, by how many work items the users
+may queue at the same time. Unless there is a specific need for
+throttling the number of active work items, specifying '0' is
+recommended.
+
+Some users depend on the strict execution ordering of ST wq. The
+combination of @max_active of 1 and WQ_UNBOUND is used to achieve this
+behavior. Work items on such wq are always queued to the unbound gcwq
+and only one work item can be active at any given time thus achieving
+the same ordering property as ST wq.
+
+
+5. Example Execution Scenarios
+
+The following example execution scenarios try to illustrate how cmwq
+behave under different configurations.
+
+ Work items w0, w1, w2 are queued to a bound wq q0 on the same CPU.
+ w0 burns CPU for 5ms then sleeps for 10ms then burns CPU for 5ms
+ again before finishing. w1 and w2 burn CPU for 5ms then sleep for
+ 10ms.
+
+Ignoring all other tasks, works and processing overhead, and assuming
+simple FIFO scheduling, the following is one highly simplified version
+of possible sequences of events with the original wq.
+
+ TIME IN MSECS EVENT
+ 0 w0 starts and burns CPU
+ 5 w0 sleeps
+ 15 w0 wakes up and burns CPU
+ 20 w0 finishes
+ 20 w1 starts and burns CPU
+ 25 w1 sleeps
+ 35 w1 wakes up and finishes
+ 35 w2 starts and burns CPU
+ 40 w2 sleeps
+ 50 w2 wakes up and finishes
+
+And with cmwq with @max_active >= 3,
+
+ TIME IN MSECS EVENT
+ 0 w0 starts and burns CPU
+ 5 w0 sleeps
+ 5 w1 starts and burns CPU
+ 10 w1 sleeps
+ 10 w2 starts and burns CPU
+ 15 w2 sleeps
+ 15 w0 wakes up and burns CPU
+ 20 w0 finishes
+ 20 w1 wakes up and finishes
+ 25 w2 wakes up and finishes
+
+If @max_active == 2,
+
+ TIME IN MSECS EVENT
+ 0 w0 starts and burns CPU
+ 5 w0 sleeps
+ 5 w1 starts and burns CPU
+ 10 w1 sleeps
+ 15 w0 wakes up and burns CPU
+ 20 w0 finishes
+ 20 w1 wakes up and finishes
+ 20 w2 starts and burns CPU
+ 25 w2 sleeps
+ 35 w2 wakes up and finishes
+
+Now, let's assume w1 and w2 are queued to a different wq q1 which has
+WQ_HIGHPRI set,
+
+ TIME IN MSECS EVENT
+ 0 w1 and w2 start and burn CPU
+ 5 w1 sleeps
+ 10 w2 sleeps
+ 10 w0 starts and burns CPU
+ 15 w0 sleeps
+ 15 w1 wakes up and finishes
+ 20 w2 wakes up and finishes
+ 25 w0 wakes up and burns CPU
+ 30 w0 finishes
+
+If q1 has WQ_CPU_INTENSIVE set,
+
+ TIME IN MSECS EVENT
+ 0 w0 starts and burns CPU
+ 5 w0 sleeps
+ 5 w1 and w2 start and burn CPU
+ 10 w1 sleeps
+ 15 w2 sleeps
+ 15 w0 wakes up and burns CPU
+ 20 w0 finishes
+ 20 w1 wakes up and finishes
+ 25 w2 wakes up and finishes
+
+
+6. Guidelines
+
+* Do not forget to use WQ_RESCUER if a wq may process work items which
+ are used during memory reclaim. Each wq with WQ_RESCUER set has one
+ rescuer thread reserved for it. If there is dependency among
+ multiple work items used during memory reclaim, they should be
+ queued to separate wq each with WQ_RESCUER.
+
+* Unless strict ordering is required, there is no need to use ST wq.
+
+* Unless there is a specific need, using 0 for @max_active is
+ recommended. In most use cases, concurrency level usually stays
+ well under the default limit.
+
+* A wq serves as a domain for forward progress guarantee (WQ_RESCUER),
+ flush and work item attributes. Work items which are not involved
+ in memory reclaim and don't need to be flushed as a part of a group
+ of work items, and don't require any special attribute, can use one
+ of the system wq. There is no difference in execution
+ characteristics between using a dedicated wq and a system wq.
+
+* Unless work items are expected to consume a huge amount of CPU
+ cycles, using a bound wq is usually beneficial due to the increased
+ level of locality in wq operations and work item execution.
S: Maintained
F: arch/arm/mach-s3c6410/
+ARM/S5P ARM ARCHITECTURES
+M: Kukjin Kim <kgene.kim@samsung.com>
+L: linux-arm-kernel@lists.infradead.org (moderated for non-subscribers)
+L: linux-samsung-soc@vger.kernel.org (moderated for non-subscribers)
+S: Maintained
+F: arch/arm/mach-s5p*/
+
+ARM/SAMSUNG S5P SERIES FIMC SUPPORT
+M: Kyungmin Park <kyungmin.park@samsung.com>
+M: Sylwester Nawrocki <s.nawrocki@samsung.com>
+L: linux-arm-kernel@lists.infradead.org
+L: linux-media@vger.kernel.org
+S: Maintained
+F: arch/arm/plat-s5p/dev-fimc*
+F: arch/arm/plat-samsung/include/plat/*fimc*
+F: drivers/media/video/s5p-fimc/
+
ARM/SHMOBILE ARM ARCHITECTURE
M: Paul Mundt <lethal@linux-sh.org>
M: Magnus Damm <magnus.damm@gmail.com>
M: Jay Cliburn <jcliburn@gmail.com>
M: Chris Snook <chris.snook@gmail.com>
M: Jie Yang <jie.yang@atheros.com>
-L: atl1-devel@lists.sourceforge.net
+L: netdev@vger.kernel.org
W: http://sourceforge.net/projects/atl1
W: http://atl1.sourceforge.net
S: Maintained
F: include/linux/cfag12864b.h
AVR32 ARCHITECTURE
-M: Haavard Skinnemoen <hskinnemoen@atmel.com>
+M: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
W: http://www.atmel.com/products/AVR32/
W: http://avr32linux.org/
W: http://avrfreaks.net/
F: arch/avr32/
AVR32/AT32AP MACHINE SUPPORT
-M: Haavard Skinnemoen <hskinnemoen@atmel.com>
+M: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
S: Supported
F: arch/avr32/mach-at32ap/
F: Documentation/video4linux/cafe_ccic
F: drivers/media/video/cafe_ccic*
+CAIF NETWORK LAYER
+M: Sjur Braendeland <sjur.brandeland@stericsson.com>
+L: netdev@vger.kernel.org
+S: Supported
+F: Documentation/networking/caif/
+F: drivers/net/caif/
+F: include/linux/caif/
+F: include/net/caif/
+F: net/caif/
+
CALGARY x86-64 IOMMU
M: Muli Ben-Yehuda <muli@il.ibm.com>
M: "Jon D. Mason" <jdmason@kudzu.us>
S: Supported
F: Documentation/filesystems/ceph.txt
F: fs/ceph
+F: net/ceph
+F: include/linux/ceph
CERTIFIED WIRELESS USB (WUSB) SUBSYSTEM:
M: David Vrabel <david.vrabel@csr.com>
S: Maintained
F: drivers/platform/x86/eeepc-laptop.c
+EFIFB FRAMEBUFFER DRIVER
+L: linux-fbdev@vger.kernel.org
+M: Peter Jones <pjones@redhat.com>
+S: Maintained
+F: drivers/video/efifb.c
+
EFS FILESYSTEM
W: http://aeschi.ch.eu.org/efs/
S: Orphan
F: drivers/scsi/gdt*
GENERIC GPIO I2C DRIVER
-M: Haavard Skinnemoen <hskinnemoen@atmel.com>
+M: Haavard Skinnemoen <hskinnemoen@gmail.com>
S: Supported
F: drivers/i2c/busses/i2c-gpio.c
F: include/linux/i2c-gpio.h
F: drivers/media/video/gspca/
HARDWARE MONITORING
+M: Jean Delvare <khali@linux-fr.org>
+M: Guenter Roeck <guenter.roeck@ericsson.com>
L: lm-sensors@lm-sensors.org
W: http://www.lm-sensors.org/
-S: Orphan
+T: quilt kernel.org/pub/linux/kernel/people/jdelvare/linux-2.6/jdelvare-hwmon/
+T: quilt kernel.org/pub/linux/kernel/people/groeck/linux-staging/
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging.git
+S: Maintained
F: Documentation/hwmon/
F: drivers/hwmon/
F: include/linux/hwmon*.h
F: arch/x86/kernel/hpet.c
F: arch/x86/include/asm/hpet.h
-HPET: ACPI
-M: Bob Picco <bob.picco@hp.com>
-S: Maintained
-F: drivers/char/hpet.c
-
HPFS FILESYSTEM
M: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
W: http://artax.karlin.mff.cuni.cz/~mikulas/vyplody/hpfs/index-e.cgi
S: Maintained
F: drivers/net/ixp2000/
-INTEL ETHERNET DRIVERS (e100/e1000/e1000e/igb/igbvf/ixgb/ixgbe)
+INTEL ETHERNET DRIVERS (e100/e1000/e1000e/igb/igbvf/ixgb/ixgbe/ixgbevf)
M: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
M: Jesse Brandeburg <jesse.brandeburg@intel.com>
M: Bruce Allan <bruce.w.allan@intel.com>
-M: Alex Duyck <alexander.h.duyck@intel.com>
+M: Carolyn Wyborny <carolyn.wyborny@intel.com>
+M: Don Skidmore <donald.c.skidmore@intel.com>
+M: Greg Rose <gregory.v.rose@intel.com>
M: PJ Waskiewicz <peter.p.waskiewicz.jr@intel.com>
+M: Alex Duyck <alexander.h.duyck@intel.com>
M: John Ronciak <john.ronciak@intel.com>
L: e1000-devel@lists.sourceforge.net
W: http://e1000.sourceforge.net/
S: Supported
+F: Documentation/networking/e100.txt
+F: Documentation/networking/e1000.txt
+F: Documentation/networking/e1000e.txt
+F: Documentation/networking/igb.txt
+F: Documentation/networking/igbvf.txt
+F: Documentation/networking/ixgb.txt
+F: Documentation/networking/ixgbe.txt
+F: Documentation/networking/ixgbevf.txt
F: drivers/net/e100.c
F: drivers/net/e1000/
F: drivers/net/e1000e/
F: drivers/net/igbvf/
F: drivers/net/ixgb/
F: drivers/net/ixgbe/
+F: drivers/net/ixgbevf/
INTEL PRO/WIRELESS 2100 NETWORK CONNECTION SUPPORT
L: linux-wireless@vger.kernel.org
IOC3 SERIAL DRIVER
M: Pat Gefre <pfg@sgi.com>
-L: linux-mips@linux-mips.org
+L: linux-serial@vger.kernel.org
S: Maintained
F: drivers/serial/ioc3_serial.c
KEXEC
M: Eric Biederman <ebiederm@xmission.com>
-W: http://ftp.kernel.org/pub/linux/kernel/people/horms/kexec-tools/
+W: http://kernel.org/pub/linux/utils/kernel/kexec/
L: kexec@lists.infradead.org
S: Maintained
F: include/linux/kexec.h
S: Supported
MATROX FRAMEBUFFER DRIVER
-M: Petr Vandrovec <vandrove@vc.cvut.cz>
L: linux-fbdev@vger.kernel.org
-S: Maintained
+S: Orphan
F: drivers/video/matrox/matroxfb_*
F: include/linux/matroxfb.h
F: drivers/char/mxser.*
MSI LAPTOP SUPPORT
-M: Lennart Poettering <mzxreary@0pointer.de>
+M: Lee, Chun-Yi <jlee@novell.com>
L: platform-driver-x86@vger.kernel.org
-W: https://tango.0pointer.de/mailman/listinfo/s270-linux
-W: http://0pointer.de/lennart/tchibo.html
S: Maintained
F: drivers/platform/x86/msi-laptop.c
F: drivers/mfd/
MULTIMEDIA CARD (MMC), SECURE DIGITAL (SD) AND SDIO SUBSYSTEM
-S: Orphan
+M: Chris Ball <cjb@laptop.org>
L: linux-mmc@vger.kernel.org
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc.git
+S: Maintained
F: drivers/mmc/
F: include/linux/mmc/
F: include/linux/isicom.h
MUSB MULTIPOINT HIGH SPEED DUAL-ROLE CONTROLLER
-M: Felipe Balbi <felipe.balbi@nokia.com>
+M: Felipe Balbi <balbi@ti.com>
L: linux-usb@vger.kernel.org
T: git git://gitorious.org/usb/usb.git
S: Maintained
F: drivers/net/natsemi.c
NCP FILESYSTEM
-M: Petr Vandrovec <vandrove@vc.cvut.cz>
-S: Maintained
+M: Petr Vandrovec <petr@vandrovec.name>
+S: Odd Fixes
F: fs/ncpfs/
NCR DUAL 700 SCSI DRIVER (MICROCHANNEL)
F: drivers/char/hw_random/omap-rng.c
OMAP USB SUPPORT
-M: Felipe Balbi <felipe.balbi@nokia.com>
+M: Felipe Balbi <balbi@ti.com>
M: David Brownell <dbrownell@users.sourceforge.net>
L: linux-usb@vger.kernel.org
L: linux-omap@vger.kernel.org
F: include/linux/qnx4_fs.h
F: include/linux/qnxtypes.h
+RADOS BLOCK DEVICE (RBD)
+F: include/linux/qnxtypes.h
+M: Yehuda Sadeh <yehuda@hq.newdream.net>
+M: Sage Weil <sage@newdream.net>
+M: ceph-devel@vger.kernel.org
+S: Supported
+F: drivers/block/rbd.c
+F: drivers/block/rbd_types.h
+
RADEON FRAMEBUFFER DISPLAY DRIVER
M: Benjamin Herrenschmidt <benh@kernel.crashing.org>
L: linux-fbdev@vger.kernel.org
M: Josh Triplett <josh@freedesktop.org>
M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
S: Supported
+T: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
F: Documentation/RCU/torture.txt
F: kernel/rcutorture.c
M: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
W: http://www.rdrop.com/users/paulmck/rclock/
S: Supported
+T: git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu.git
F: Documentation/RCU/
F: include/linux/rcu*
F: include/linux/srcu*
F: kernel/srcu*
X: kernel/rcutorture.c
-REAL TIME CLOCK DRIVER
+REAL TIME CLOCK DRIVER (LEGACY)
M: Paul Gortmaker <p_gortmaker@yahoo.com>
S: Maintained
-F: Documentation/rtc.txt
-F: drivers/rtc/
-F: include/linux/rtc.h
+F: drivers/char/rtc.c
REAL TIME CLOCK (RTC) SUBSYSTEM
M: Alessandro Zummo <a.zummo@towertech.it>
F: drivers/media/video/*7146*
F: include/media/*7146*
+SAMSUNG AUDIO (ASoC) DRIVERS
+M: Jassi Brar <jassi.brar@samsung.com>
+L: alsa-devel@alsa-project.org (moderated for non-subscribers)
+S: Supported
+F: sound/soc/s3c24xx
+
TLG2300 VIDEO4LINUX-2 DRIVER
M: Huang Shijie <shijie8@gmail.com>
M: Kang Yong <kangyong@telegent.com>
F: drivers/mmc/host/sdricoh_cs.c
SECURE DIGITAL HOST CONTROLLER INTERFACE (SDHCI) DRIVER
-S: Orphan
+M: Chris Ball <cjb@laptop.org>
L: linux-mmc@vger.kernel.org
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc.git
+S: Maintained
F: drivers/mmc/host/sdhci.*
SECURE DIGITAL HOST CONTROLLER INTERFACE, OPEN FIRMWARE BINDINGS (SDHCI-OF)
WOLFSON MICROELECTRONICS DRIVERS
M: Mark Brown <broonie@opensource.wolfsonmicro.com>
M: Ian Lartey <ian@opensource.wolfsonmicro.com>
+M: Dimitris Papastamos <dp@opensource.wolfsonmicro.com>
+T: git git://opensource.wolfsonmicro.com/linux-2.6-asoc
T: git git://opensource.wolfsonmicro.com/linux-2.6-audioplus
-W: http://opensource.wolfsonmicro.com/node/8
+W: http://opensource.wolfsonmicro.com/content/linux-drivers-wolfson-devices
S: Supported
F: Documentation/hwmon/wm83??
F: drivers/leds/leds-wm83*.c
VERSION = 2
PATCHLEVEL = 6
SUBLEVEL = 36
-EXTRAVERSION = -rc3
-NAME = Sheep on Meth
+EXTRAVERSION =
+NAME = Flesh-Eating Bats with Fangs
# *DOCUMENTATION*
# To see a list of typical targets execute "make help"
ifdef CONFIG_FUNCTION_TRACER
KBUILD_CFLAGS += -pg
+ifdef CONFIG_DYNAMIC_FTRACE
+ ifdef CONFIG_HAVE_C_RECORDMCOUNT
+ BUILD_C_RECORDMCOUNT := y
+ export BUILD_C_RECORDMCOUNT
+ endif
+endif
endif
# We trigger additional mismatches with less inlining
# conserve stack if available
KBUILD_CFLAGS += $(call cc-option,-fconserve-stack)
+# check for 'asm goto'
+ifeq ($(shell $(CONFIG_SHELL) $(srctree)/scripts/gcc-goto.sh $(CC)), y)
+ KBUILD_CFLAGS += -DCC_HAVE_ASM_GOTO
+endif
+
# Add user supplied CPPFLAGS, AFLAGS and CFLAGS as the last assignments
# But warn user when we do so
warn-assign = \
config KPROBES
bool "Kprobes"
- depends on KALLSYMS && MODULES
+ depends on MODULES
depends on HAVE_KPROBES
+ select KALLSYMS
help
Kprobes allows you to trap at almost any kernel address and
execute a callback function. register_kprobe() establishes
def_bool y
depends on KPROBES && HAVE_OPTPROBES
depends on !PREEMPT
- select KALLSYMS_ALL
config HAVE_EFFICIENT_UNALIGNED_ACCESS
bool
subsystem. Also has support for calculating CPU cycle events
to determine how many clock cycles in a given period.
+config HAVE_ARCH_JUMP_LABEL
+ bool
+
source "kernel/gcov/Kconfig"
select HAVE_IDE
select HAVE_OPROFILE
select HAVE_SYSCALL_WRAPPERS
+ select HAVE_IRQ_WORK
select HAVE_PERF_EVENTS
select HAVE_DMA_ATTRS
help
/* ??? Ought to use this in arch/alpha/kernel/signal.c too. */
#ifndef CONFIG_SMP
+#include <linux/sched.h>
+
extern void __load_new_mm_context(struct mm_struct *);
static inline void
flush_icache_user_range(struct vm_area_struct *vma, struct page *page,
#ifndef __ASM_ALPHA_PERF_EVENT_H
#define __ASM_ALPHA_PERF_EVENT_H
-/* Alpha only supports software events through this interface. */
-extern void set_perf_event_pending(void);
-
-#define PERF_EVENT_INDEX_OFFSET 0
-
#ifdef CONFIG_PERF_EVENTS
extern void init_hw_perf_events(void);
#else
#define __NR_pwritev 491
#define __NR_rt_tgsigqueueinfo 492
#define __NR_perf_event_open 493
+#define __NR_fanotify_init 494
+#define __NR_fanotify_mark 495
+#define __NR_prlimit64 496
#ifdef __KERNEL__
-#define NR_SYSCALLS 494
+#define NR_SYSCALLS 497
#define __ARCH_WANT_IPC_PARSE_VERSION
#define __ARCH_WANT_OLD_READDIR
#define __ARCH_WANT_SYS_OLD_GETRLIMIT
#define __ARCH_WANT_SYS_OLDUMOUNT
#define __ARCH_WANT_SYS_SIGPENDING
+#define __ARCH_WANT_SYS_RT_SIGSUSPEND
/* "Conditional" syscalls. What we want is
ldq $20, HAE_REG($19); \
stq $21, HAE_CACHE($19); \
stq $21, 0($20); \
- ldq $0, 0($sp); \
- ldq $1, 8($sp); \
99:; \
ldq $19, 72($sp); \
ldq $20, 80($sp); \
cmovne $26, 0, $19 /* $19 = 0 => non-restartable */
ldq $0, SP_OFF($sp)
and $0, 8, $0
- beq $0, restore_all
-ret_from_reschedule:
+ beq $0, ret_to_kernel
+ret_to_user:
/* Make sure need_resched and sigpending don't change between
sampling and the rti. */
lda $16, 7
call_pal PAL_swpipl
ldl $5, TI_FLAGS($8)
and $5, _TIF_WORK_MASK, $2
- bne $5, work_pending
+ bne $2, work_pending
restore_all:
RESTORE_ALL
call_pal PAL_rti
+ret_to_kernel:
+ lda $16, 7
+ call_pal PAL_swpipl
+ br restore_all
+
.align 3
$syscall_error:
/*
* $8: current.
* $19: The old syscall number, or zero if this is not a return
* from a syscall that errored and is possibly restartable.
- * $20: Error indication.
+ * $20: The old a3 value
*/
.align 4
$work_notifysig:
mov $sp, $16
- br $1, do_switch_stack
+ bsr $1, do_switch_stack
mov $sp, $17
mov $5, $18
+ mov $19, $9 /* save old syscall number */
+ mov $20, $10 /* save old a3 */
+ and $5, _TIF_SIGPENDING, $2
+ cmovne $2, 0, $9 /* we don't want double syscall restarts */
jsr $26, do_notify_resume
+ mov $9, $19
+ mov $10, $20
bsr $1, undo_switch_stack
- br restore_all
+ br ret_to_user
.end work_pending
/*
beq $1, 1f
ldq $27, 0($2)
1: jsr $26, ($27), sys_gettimeofday
+ret_from_straced:
ldgp $gp, 0($26)
/* check return.. */
/* We don't actually care for a3 success widgetry in the kernel.
Not for positive errno values. */
stq $0, 0($sp) /* $0 */
- br restore_all
+ br ret_to_kernel
.end kernel_thread
/*
.ent sys_sigreturn
sys_sigreturn:
.prologue 0
+ lda $9, ret_from_straced
+ cmpult $26, $9, $9
mov $sp, $17
lda $18, -SWITCH_STACK_SIZE($sp)
lda $sp, -SWITCH_STACK_SIZE($sp)
jsr $26, do_sigreturn
- br $1, undo_switch_stack
+ bne $9, 1f
+ jsr $26, syscall_trace
+1: br $1, undo_switch_stack
br ret_from_sys_call
.end sys_sigreturn
.ent sys_rt_sigreturn
sys_rt_sigreturn:
.prologue 0
+ lda $9, ret_from_straced
+ cmpult $26, $9, $9
mov $sp, $17
lda $18, -SWITCH_STACK_SIZE($sp)
lda $sp, -SWITCH_STACK_SIZE($sp)
jsr $26, do_rt_sigreturn
- br $1, undo_switch_stack
+ bne $9, 1f
+ jsr $26, syscall_trace
+1: br $1, undo_switch_stack
br ret_from_sys_call
.end sys_rt_sigreturn
.align 4
- .globl sys_sigsuspend
- .ent sys_sigsuspend
-sys_sigsuspend:
- .prologue 0
- mov $sp, $17
- br $1, do_switch_stack
- mov $sp, $18
- subq $sp, 16, $sp
- stq $26, 0($sp)
- jsr $26, do_sigsuspend
- ldq $26, 0($sp)
- lda $sp, SWITCH_STACK_SIZE+16($sp)
- ret
-.end sys_sigsuspend
-
- .align 4
- .globl sys_rt_sigsuspend
- .ent sys_rt_sigsuspend
-sys_rt_sigsuspend:
- .prologue 0
- mov $sp, $18
- br $1, do_switch_stack
- mov $sp, $19
- subq $sp, 16, $sp
- stq $26, 0($sp)
- jsr $26, do_rt_sigsuspend
- ldq $26, 0($sp)
- lda $sp, SWITCH_STACK_SIZE+16($sp)
- ret
-.end sys_rt_sigsuspend
-
- .align 4
.globl sys_sethae
.ent sys_sethae
sys_sethae:
.end sys_execve
.align 4
- .globl osf_sigprocmask
- .ent osf_sigprocmask
-osf_sigprocmask:
- .prologue 0
- mov $sp, $18
- jmp $31, sys_osf_sigprocmask
-.end osf_sigprocmask
-
- .align 4
.globl alpha_ni_syscall
.ent alpha_ni_syscall
alpha_ni_syscall:
ev6_parse_cbox(u64 c_addr, u64 c1_syn, u64 c2_syn,
u64 c_stat, u64 c_sts, int print)
{
- char *sourcename[] = { "UNKNOWN", "UNKNOWN", "UNKNOWN",
- "MEMORY", "BCACHE", "DCACHE",
- "BCACHE PROBE", "BCACHE PROBE" };
- char *streamname[] = { "D", "I" };
- char *bitsname[] = { "SINGLE", "DOUBLE" };
+ static const char * const sourcename[] = {
+ "UNKNOWN", "UNKNOWN", "UNKNOWN",
+ "MEMORY", "BCACHE", "DCACHE",
+ "BCACHE PROBE", "BCACHE PROBE"
+ };
+ static const char * const streamname[] = { "D", "I" };
+ static const char * const bitsname[] = { "SINGLE", "DOUBLE" };
int status = MCHK_DISPOSITION_REPORT;
int source = -1, stream = -1, bits = -1;
static void
marvel_print_pox_trans_sum(u64 trans_sum)
{
- char *pcix_cmd[] = { "Interrupt Acknowledge",
- "Special Cycle",
- "I/O Read",
- "I/O Write",
- "Reserved",
- "Reserved / Device ID Message",
- "Memory Read",
- "Memory Write",
- "Reserved / Alias to Memory Read Block",
- "Reserved / Alias to Memory Write Block",
- "Configuration Read",
- "Configuration Write",
- "Memory Read Multiple / Split Completion",
- "Dual Address Cycle",
- "Memory Read Line / Memory Read Block",
- "Memory Write and Invalidate / Memory Write Block"
+ static const char * const pcix_cmd[] = {
+ "Interrupt Acknowledge",
+ "Special Cycle",
+ "I/O Read",
+ "I/O Write",
+ "Reserved",
+ "Reserved / Device ID Message",
+ "Memory Read",
+ "Memory Write",
+ "Reserved / Alias to Memory Read Block",
+ "Reserved / Alias to Memory Write Block",
+ "Configuration Read",
+ "Configuration Write",
+ "Memory Read Multiple / Split Completion",
+ "Dual Address Cycle",
+ "Memory Read Line / Memory Read Block",
+ "Memory Write and Invalidate / Memory Write Block"
};
#define IO7__POX_TRANSUM__PCI_ADDR__S (0)
int status = MCHK_DISPOSITION_REPORT;
#ifdef CONFIG_VERBOSE_MCHECK
- char *serror_src[] = {"GPCI", "APCI", "AGP HP", "AGP LP"};
- char *serror_cmd[] = {"DMA Read", "DMA RMW", "SGTE Read", "Reserved"};
+ static const char * const serror_src[] = {
+ "GPCI", "APCI", "AGP HP", "AGP LP"
+ };
+ static const char * const serror_cmd[] = {
+ "DMA Read", "DMA RMW", "SGTE Read", "Reserved"
+ };
#endif /* CONFIG_VERBOSE_MCHECK */
#define TITAN__PCHIP_SERROR__LOST_UECC (1UL << 0)
int status = MCHK_DISPOSITION_REPORT;
#ifdef CONFIG_VERBOSE_MCHECK
- char *perror_cmd[] = { "Interrupt Acknowledge", "Special Cycle",
- "I/O Read", "I/O Write",
- "Reserved", "Reserved",
- "Memory Read", "Memory Write",
- "Reserved", "Reserved",
- "Configuration Read", "Configuration Write",
- "Memory Read Multiple", "Dual Address Cycle",
- "Memory Read Line","Memory Write and Invalidate"
+ static const char * const perror_cmd[] = {
+ "Interrupt Acknowledge", "Special Cycle",
+ "I/O Read", "I/O Write",
+ "Reserved", "Reserved",
+ "Memory Read", "Memory Write",
+ "Reserved", "Reserved",
+ "Configuration Read", "Configuration Write",
+ "Memory Read Multiple", "Dual Address Cycle",
+ "Memory Read Line", "Memory Write and Invalidate"
};
#endif /* CONFIG_VERBOSE_MCHECK */
int cmd, len;
unsigned long addr;
- char *agperror_cmd[] = { "Read (low-priority)", "Read (high-priority)",
- "Write (low-priority)",
- "Write (high-priority)",
- "Reserved", "Reserved",
- "Flush", "Fence"
+ static const char * const agperror_cmd[] = {
+ "Read (low-priority)", "Read (high-priority)",
+ "Write (low-priority)", "Write (high-priority)",
+ "Reserved", "Reserved",
+ "Flush", "Fence"
};
#endif /* CONFIG_VERBOSE_MCHECK */
#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/smp.h>
-#include <linux/smp_lock.h>
#include <linux/stddef.h>
#include <linux/syscalls.h>
#include <linux/unistd.h>
{
struct mm_struct *mm;
- lock_kernel();
mm = current->mm;
mm->end_code = bss_start + bss_len;
mm->start_brk = bss_start + bss_len;
printk("set_program_attributes(%lx %lx %lx %lx)\n",
text_start, text_len, bss_start, bss_len);
#endif
- unlock_kernel();
return 0;
}
long error;
int __user *min_buf_size_ptr;
- lock_kernel();
switch (code) {
case PL_SET:
if (get_user(error, &args->set.nbytes))
error = -EOPNOTSUPP;
break;
};
- unlock_kernel();
return error;
}
SYSCALL_DEFINE3(osf_sysinfo, int, command, char __user *, buf, long, count)
{
- char *sysinfo_table[] = {
+ const char *sysinfo_table[] = {
utsname()->sysname,
utsname()->nodename,
utsname()->release,
"dummy", /* secure RPC domain */
};
unsigned long offset;
- char *res;
+ const char *res;
long len, err = -EINVAL;
offset = command-1;
{
struct pci_dev *pdev = to_pci_dev(container_of(kobj,
struct device, kobj));
- struct resource *res = (struct resource *)attr->private;
+ struct resource *res = attr->private;
enum pci_mmap_state mmap_type;
struct pci_bus_region bar;
int i;
new_raw_count) != prev_raw_count)
goto again;
- delta = (new_raw_count - (prev_raw_count & alpha_pmu->pmc_count_mask[idx])) + ovf;
+ delta = (new_raw_count - (prev_raw_count & alpha_pmu->pmc_count_mask[idx])) + ovf;
/* It is possible on very rare occasions that the PMC has overflowed
* but the interrupt is yet to come. Detect and fix this situation.
struct hw_perf_event *hwc = &pe->hw;
int idx = hwc->idx;
- if (cpuc->current_idx[j] != PMC_NO_INDEX) {
- cpuc->idx_mask |= (1<<cpuc->current_idx[j]);
- continue;
+ if (cpuc->current_idx[j] == PMC_NO_INDEX) {
+ alpha_perf_event_set_period(pe, hwc, idx);
+ cpuc->current_idx[j] = idx;
}
- alpha_perf_event_set_period(pe, hwc, idx);
- cpuc->current_idx[j] = idx;
- cpuc->idx_mask |= (1<<cpuc->current_idx[j]);
+ if (!(hwc->state & PERF_HES_STOPPED))
+ cpuc->idx_mask |= (1<<cpuc->current_idx[j]);
}
cpuc->config = cpuc->event[0]->hw.config_base;
}
* - this function is called from outside this module via the pmu struct
* returned from perf event initialisation.
*/
-static int alpha_pmu_enable(struct perf_event *event)
+static int alpha_pmu_add(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct hw_perf_event *hwc = &event->hw;
int n0;
int ret;
- unsigned long flags;
+ unsigned long irq_flags;
/*
* The Sparc code has the IRQ disable first followed by the perf
* nevertheless we disable the PMCs first to enable a potential
* final PMI to occur before we disable interrupts.
*/
- perf_disable();
- local_irq_save(flags);
+ perf_pmu_disable(event->pmu);
+ local_irq_save(irq_flags);
/* Default to error to be returned */
ret = -EAGAIN;
}
}
- local_irq_restore(flags);
- perf_enable();
+ hwc->state = PERF_HES_UPTODATE;
+ if (!(flags & PERF_EF_START))
+ hwc->state |= PERF_HES_STOPPED;
+
+ local_irq_restore(irq_flags);
+ perf_pmu_enable(event->pmu);
return ret;
}
* - this function is called from outside this module via the pmu struct
* returned from perf event initialisation.
*/
-static void alpha_pmu_disable(struct perf_event *event)
+static void alpha_pmu_del(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
- unsigned long flags;
+ unsigned long irq_flags;
int j;
- perf_disable();
- local_irq_save(flags);
+ perf_pmu_disable(event->pmu);
+ local_irq_save(irq_flags);
for (j = 0; j < cpuc->n_events; j++) {
if (event == cpuc->event[j]) {
}
}
- local_irq_restore(flags);
- perf_enable();
+ local_irq_restore(irq_flags);
+ perf_pmu_enable(event->pmu);
}
}
-static void alpha_pmu_unthrottle(struct perf_event *event)
+static void alpha_pmu_stop(struct perf_event *event, int flags)
+{
+ struct hw_perf_event *hwc = &event->hw;
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+
+ if (!(hwc->state & PERF_HES_STOPPED)) {
+ cpuc->idx_mask &= ~(1UL<<hwc->idx);
+ hwc->state |= PERF_HES_STOPPED;
+ }
+
+ if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+ alpha_perf_event_update(event, hwc, hwc->idx, 0);
+ hwc->state |= PERF_HES_UPTODATE;
+ }
+
+ if (cpuc->enabled)
+ wrperfmon(PERFMON_CMD_DISABLE, (1UL<<hwc->idx));
+}
+
+
+static void alpha_pmu_start(struct perf_event *event, int flags)
{
struct hw_perf_event *hwc = &event->hw;
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
+ return;
+
+ if (flags & PERF_EF_RELOAD) {
+ WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
+ alpha_perf_event_set_period(event, hwc, hwc->idx);
+ }
+
+ hwc->state = 0;
+
cpuc->idx_mask |= 1UL<<hwc->idx;
- wrperfmon(PERFMON_CMD_ENABLE, (1UL<<hwc->idx));
+ if (cpuc->enabled)
+ wrperfmon(PERFMON_CMD_ENABLE, (1UL<<hwc->idx));
}
return 0;
}
-static const struct pmu pmu = {
- .enable = alpha_pmu_enable,
- .disable = alpha_pmu_disable,
- .read = alpha_pmu_read,
- .unthrottle = alpha_pmu_unthrottle,
-};
-
-
/*
* Main entry point to initialise a HW performance event.
*/
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+static int alpha_pmu_event_init(struct perf_event *event)
{
int err;
+ switch (event->attr.type) {
+ case PERF_TYPE_RAW:
+ case PERF_TYPE_HARDWARE:
+ case PERF_TYPE_HW_CACHE:
+ break;
+
+ default:
+ return -ENOENT;
+ }
+
if (!alpha_pmu)
- return ERR_PTR(-ENODEV);
+ return -ENODEV;
/* Do the real initialisation work. */
err = __hw_perf_event_init(event);
- if (err)
- return ERR_PTR(err);
-
- return &pmu;
+ return err;
}
-
-
/*
* Main entry point - enable HW performance counters.
*/
-void hw_perf_enable(void)
+static void alpha_pmu_enable(struct pmu *pmu)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
* Main entry point - disable HW performance counters.
*/
-void hw_perf_disable(void)
+static void alpha_pmu_disable(struct pmu *pmu)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
wrperfmon(PERFMON_CMD_DISABLE, cpuc->idx_mask);
}
+static struct pmu pmu = {
+ .pmu_enable = alpha_pmu_enable,
+ .pmu_disable = alpha_pmu_disable,
+ .event_init = alpha_pmu_event_init,
+ .add = alpha_pmu_add,
+ .del = alpha_pmu_del,
+ .start = alpha_pmu_start,
+ .stop = alpha_pmu_stop,
+ .read = alpha_pmu_read,
+};
+
/*
* Main entry point - don't know when this is called but it
wrperfmon(PERFMON_CMD_DISABLE, cpuc->idx_mask);
/* la_ptr is the counter that overflowed. */
- if (unlikely(la_ptr >= perf_max_events)) {
+ if (unlikely(la_ptr >= alpha_pmu->num_pmcs)) {
/* This should never occur! */
irq_err_count++;
pr_warning("PMI: silly index %ld\n", la_ptr);
/* Interrupts coming too quickly; "throttle" the
* counter, i.e., disable it for a little while.
*/
- cpuc->idx_mask &= ~(1UL<<idx);
+ alpha_pmu_stop(event, 0);
}
}
wrperfmon(PERFMON_CMD_ENABLE, cpuc->idx_mask);
/* And set up PMU specification */
alpha_pmu = &ev67_pmu;
- perf_max_events = alpha_pmu->num_pmcs;
+
+ perf_pmu_register(&pmu);
}
dest[27] = pt->r27;
dest[28] = pt->r28;
dest[29] = pt->gp;
- dest[30] = rdusp();
+ dest[30] = ti == current_thread_info() ? rdusp() : ti->pcb.usp;
dest[31] = pt->pc;
/* Once upon a time this was the PS value. Which is stupid
/*
* The OSF/1 sigprocmask calling sequence is different from the
* C sigprocmask() sequence..
- *
- * how:
- * 1 - SIG_BLOCK
- * 2 - SIG_UNBLOCK
- * 3 - SIG_SETMASK
- *
- * We change the range to -1 .. 1 in order to let gcc easily
- * use the conditional move instructions.
- *
- * Note that we don't need to acquire the kernel lock for SMP
- * operation, as all of this is local to this thread.
*/
-SYSCALL_DEFINE3(osf_sigprocmask, int, how, unsigned long, newmask,
- struct pt_regs *, regs)
+SYSCALL_DEFINE2(osf_sigprocmask, int, how, unsigned long, newmask)
{
- unsigned long oldmask = -EINVAL;
-
- if ((unsigned long)how-1 <= 2) {
- long sign = how-2; /* -1 .. 1 */
- unsigned long block, unblock;
-
- newmask &= _BLOCKABLE;
- spin_lock_irq(¤t->sighand->siglock);
- oldmask = current->blocked.sig[0];
-
- unblock = oldmask & ~newmask;
- block = oldmask | newmask;
- if (!sign)
- block = unblock;
- if (sign <= 0)
- newmask = block;
- if (_NSIG_WORDS > 1 && sign > 0)
- sigemptyset(¤t->blocked);
- current->blocked.sig[0] = newmask;
- recalc_sigpending();
- spin_unlock_irq(¤t->sighand->siglock);
-
- regs->r0 = 0; /* special no error return */
+ sigset_t oldmask;
+ sigset_t mask;
+ unsigned long res;
+
+ siginitset(&mask, newmask & _BLOCKABLE);
+ res = sigprocmask(how, &mask, &oldmask);
+ if (!res) {
+ force_successful_syscall_return();
+ res = oldmask.sig[0];
}
- return oldmask;
+ return res;
}
SYSCALL_DEFINE3(osf_sigaction, int, sig,
old_sigset_t mask;
if (!access_ok(VERIFY_READ, act, sizeof(*act)) ||
__get_user(new_ka.sa.sa_handler, &act->sa_handler) ||
- __get_user(new_ka.sa.sa_flags, &act->sa_flags))
+ __get_user(new_ka.sa.sa_flags, &act->sa_flags) ||
+ __get_user(mask, &act->sa_mask))
return -EFAULT;
- __get_user(mask, &act->sa_mask);
siginitset(&new_ka.sa.sa_mask, mask);
new_ka.ka_restorer = NULL;
}
if (!ret && oact) {
if (!access_ok(VERIFY_WRITE, oact, sizeof(*oact)) ||
__put_user(old_ka.sa.sa_handler, &oact->sa_handler) ||
- __put_user(old_ka.sa.sa_flags, &oact->sa_flags))
+ __put_user(old_ka.sa.sa_flags, &oact->sa_flags) ||
+ __put_user(old_ka.sa.sa_mask.sig[0], &oact->sa_mask))
return -EFAULT;
- __put_user(old_ka.sa.sa_mask.sig[0], &oact->sa_mask);
}
return ret;
/*
* Atomically swap in the new signal mask, and wait for a signal.
*/
-asmlinkage int
-do_sigsuspend(old_sigset_t mask, struct pt_regs *regs, struct switch_stack *sw)
+SYSCALL_DEFINE1(sigsuspend, old_sigset_t, mask)
{
mask &= _BLOCKABLE;
spin_lock_irq(¤t->sighand->siglock);
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
- /* Indicate EINTR on return from any possible signal handler,
- which will not come back through here, but via sigreturn. */
- regs->r0 = EINTR;
- regs->r19 = 1;
-
- current->state = TASK_INTERRUPTIBLE;
- schedule();
- set_thread_flag(TIF_RESTORE_SIGMASK);
- return -ERESTARTNOHAND;
-}
-
-asmlinkage int
-do_rt_sigsuspend(sigset_t __user *uset, size_t sigsetsize,
- struct pt_regs *regs, struct switch_stack *sw)
-{
- sigset_t set;
-
- /* XXX: Don't preclude handling different sized sigset_t's. */
- if (sigsetsize != sizeof(sigset_t))
- return -EINVAL;
- if (copy_from_user(&set, uset, sizeof(set)))
- return -EFAULT;
-
- sigdelsetmask(&set, ~_BLOCKABLE);
- spin_lock_irq(¤t->sighand->siglock);
- current->saved_sigmask = current->blocked;
- current->blocked = set;
- recalc_sigpending();
- spin_unlock_irq(¤t->sighand->siglock);
-
- /* Indicate EINTR on return from any possible signal handler,
- which will not come back through here, but via sigreturn. */
- regs->r0 = EINTR;
- regs->r19 = 1;
-
current->state = TASK_INTERRUPTIBLE;
schedule();
set_thread_flag(TIF_RESTORE_SIGMASK);
unsigned long usp;
long i, err = __get_user(regs->pc, &sc->sc_pc);
+ current_thread_info()->restart_block.fn = do_no_restart_syscall;
+
sw->r26 = (unsigned long) ret_from_sys_call;
err |= __get_user(regs->r0, sc->sc_regs+0);
regs->pc -= 4;
break;
case ERESTART_RESTARTBLOCK:
- current_thread_info()->restart_block.fn = do_no_restart_syscall;
regs->r0 = EINTR;
break;
}
srm_env_t *entry;
char *page;
- entry = (srm_env_t *)m->private;
+ entry = m->private;
page = (char *)__get_free_page(GFP_USER);
if (!page)
return -ENOMEM;
.quad sys_open /* 45 */
.quad alpha_ni_syscall
.quad sys_getxgid
- .quad osf_sigprocmask
+ .quad sys_osf_sigprocmask
.quad alpha_ni_syscall
.quad alpha_ni_syscall /* 50 */
.quad sys_acct
.quad sys_pwritev
.quad sys_rt_tgsigqueueinfo
.quad sys_perf_event_open
+ .quad sys_fanotify_init
+ .quad sys_fanotify_mark /* 495 */
+ .quad sys_prlimit64
.size sys_call_table, . - sys_call_table
.type sys_call_table, @object
#include <linux/init.h>
#include <linux/bcd.h>
#include <linux/profile.h>
-#include <linux/perf_event.h>
+#include <linux/irq_work.h>
#include <asm/uaccess.h>
#include <asm/io.h>
unsigned long est_cycle_freq;
-#ifdef CONFIG_PERF_EVENTS
+#ifdef CONFIG_IRQ_WORK
-DEFINE_PER_CPU(u8, perf_event_pending);
+DEFINE_PER_CPU(u8, irq_work_pending);
-#define set_perf_event_pending_flag() __get_cpu_var(perf_event_pending) = 1
-#define test_perf_event_pending() __get_cpu_var(perf_event_pending)
-#define clear_perf_event_pending() __get_cpu_var(perf_event_pending) = 0
+#define set_irq_work_pending_flag() __get_cpu_var(irq_work_pending) = 1
+#define test_irq_work_pending() __get_cpu_var(irq_work_pending)
+#define clear_irq_work_pending() __get_cpu_var(irq_work_pending) = 0
-void set_perf_event_pending(void)
+void set_irq_work_pending(void)
{
- set_perf_event_pending_flag();
+ set_irq_work_pending_flag();
}
-#else /* CONFIG_PERF_EVENTS */
+#else /* CONFIG_IRQ_WORK */
-#define test_perf_event_pending() 0
-#define clear_perf_event_pending()
+#define test_irq_work_pending() 0
+#define clear_irq_work_pending()
-#endif /* CONFIG_PERF_EVENTS */
+#endif /* CONFIG_IRQ_WORK */
static inline __u32 rpcc(void)
write_sequnlock(&xtime_lock);
+ if (test_irq_work_pending()) {
+ clear_irq_work_pending();
+ irq_work_run();
+ }
+
#ifndef CONFIG_SMP
while (nticks--)
update_process_times(user_mode(get_irq_regs()));
#endif
- if (test_perf_event_pending()) {
- clear_perf_event_pending();
- perf_event_do_pending();
- }
-
return IRQ_HANDLED;
}
#include <linux/sched.h>
#include <linux/tty.h>
#include <linux/delay.h>
-#include <linux/smp_lock.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/kallsyms.h>
return;
}
- lock_kernel();
printk("Bad unaligned kernel access at %016lx: %p %lx %lu\n",
pc, va, opcode, reg);
do_exit(SIGSEGV);
* Yikes! No one to forward the exception to.
* Since the registers are in a weird format, dump them ourselves.
*/
- lock_kernel();
printk("%s(%d): unhandled unaligned exception\n",
current->comm, task_pid_nr(current));
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_LZO
select HAVE_KERNEL_LZMA
+ select HAVE_IRQ_WORK
select HAVE_PERF_EVENTS
select PERF_USE_VMALLOC
select HAVE_REGS_AND_STACK_ACCESS_API
bool "Atmel AT91"
select ARCH_REQUIRE_GPIOLIB
select HAVE_CLK
- select ARCH_USES_GETTIMEOFFSET
help
This enables support for systems based on the Atmel AT91RM9200,
AT91SAM9 and AT91CAP9 processors.
ACTLR register. Note that setting specific bits in the ACTLR register
may not be available in non-secure mode.
+config ARM_ERRATA_742230
+ bool "ARM errata: DMB operation may be faulty"
+ depends on CPU_V7 && SMP
+ help
+ This option enables the workaround for the 742230 Cortex-A9
+ (r1p0..r2p2) erratum. Under rare circumstances, a DMB instruction
+ between two write operations may not ensure the correct visibility
+ ordering of the two writes. This workaround sets a specific bit in
+ the diagnostic register of the Cortex-A9 which causes the DMB
+ instruction to behave as a DSB, ensuring the correct behaviour of
+ the two writes.
+
+config ARM_ERRATA_742231
+ bool "ARM errata: Incorrect hazard handling in the SCU may lead to data corruption"
+ depends on CPU_V7 && SMP
+ help
+ This option enables the workaround for the 742231 Cortex-A9
+ (r2p0..r2p2) erratum. Under certain conditions, specific to the
+ Cortex-A9 MPCore micro-architecture, two CPUs working in SMP mode,
+ accessing some data located in the same cache line, may get corrupted
+ data due to bad handling of the address hazard when the line gets
+ replaced from one of the CPUs at the same time as another CPU is
+ accessing it. This workaround sets specific bits in the diagnostic
+ register of the Cortex-A9 which reduces the linefill issuing
+ capabilities of the processor.
+
config PL310_ERRATA_588369
bool "Clean & Invalidate maintenance operations do not invalidate clean lines"
depends on CACHE_L2X0 && ARCH_OMAP4
invalidated are not, resulting in an incoherency in the system page
tables. The workaround changes the TLB flushing routines to invalidate
entries regardless of the ASID.
+
+config ARM_ERRATA_743622
+ bool "ARM errata: Faulty hazard checking in the Store Buffer may lead to data corruption"
+ depends on CPU_V7
+ help
+ This option enables the workaround for the 743622 Cortex-A9
+ (r2p0..r2p2) erratum. Under very rare conditions, a faulty
+ optimisation in the Cortex-A9 Store Buffer may lead to data
+ corruption. This workaround sets a specific bit in the diagnostic
+ register of the Cortex-A9 which disables the Store Buffer
+ optimisation, preventing the defect from occurring. This has no
+ visible impact on the overall performance or power consumption of the
+ processor.
+
endmenu
source "arch/arm/common/Kconfig"
0xf8000000. This assumes the zImage being placed in the first 128MB
from start of memory.
-config ZRELADDR
- hex "Physical address of the decompressed kernel image"
- depends on !AUTO_ZRELADDR
- default 0x00008000 if ARCH_BCMRING ||\
- ARCH_CNS3XXX ||\
- ARCH_DOVE ||\
- ARCH_EBSA110 ||\
- ARCH_FOOTBRIDGE ||\
- ARCH_INTEGRATOR ||\
- ARCH_IOP13XX ||\
- ARCH_IOP33X ||\
- ARCH_IXP2000 ||\
- ARCH_IXP23XX ||\
- ARCH_IXP4XX ||\
- ARCH_KIRKWOOD ||\
- ARCH_KS8695 ||\
- ARCH_LOKI ||\
- ARCH_MMP ||\
- ARCH_MV78XX0 ||\
- ARCH_NOMADIK ||\
- ARCH_NUC93X ||\
- ARCH_NS9XXX ||\
- ARCH_ORION5X ||\
- ARCH_SPEAR3XX ||\
- ARCH_SPEAR6XX ||\
- ARCH_TEGRA ||\
- ARCH_U8500 ||\
- ARCH_VERSATILE ||\
- ARCH_W90X900
- default 0x08008000 if ARCH_MX1 ||\
- ARCH_SHARK
- default 0x10008000 if ARCH_MSM ||\
- ARCH_OMAP1 ||\
- ARCH_RPC
- default 0x20008000 if ARCH_S5P6440 ||\
- ARCH_S5P6442 ||\
- ARCH_S5PC100 ||\
- ARCH_S5PV210
- default 0x30008000 if ARCH_S3C2410 ||\
- ARCH_S3C2400 ||\
- ARCH_S3C2412 ||\
- ARCH_S3C2416 ||\
- ARCH_S3C2440 ||\
- ARCH_S3C2443
- default 0x40008000 if ARCH_STMP378X ||\
- ARCH_STMP37XX ||\
- ARCH_SH7372 ||\
- ARCH_SH7377 ||\
- ARCH_S5PV310
- default 0x50008000 if ARCH_S3C64XX ||\
- ARCH_SH7367
- default 0x60008000 if ARCH_VEXPRESS
- default 0x80008000 if ARCH_MX25 ||\
- ARCH_MX3 ||\
- ARCH_NETX ||\
- ARCH_OMAP2PLUS ||\
- ARCH_PNX4008
- default 0x90008000 if ARCH_MX5 ||\
- ARCH_MX91231
- default 0xa0008000 if ARCH_IOP32X ||\
- ARCH_PXA ||\
- MACH_MX27
- default 0xc0008000 if ARCH_LH7A40X ||\
- MACH_MX21
- default 0xf0008000 if ARCH_AAEC2000 ||\
- ARCH_L7200
- default 0xc0028000 if ARCH_CLPS711X
- default 0x70008000 if ARCH_AT91 && (ARCH_AT91CAP9 || ARCH_AT91SAM9G45)
- default 0x20008000 if ARCH_AT91 && !(ARCH_AT91CAP9 || ARCH_AT91SAM9G45)
- default 0xc0008000 if ARCH_DAVINCI && ARCH_DAVINCI_DA8XX
- default 0x80008000 if ARCH_DAVINCI && !ARCH_DAVINCI_DA8XX
- default 0x00008000 if ARCH_EP93XX && EP93XX_SDCE3_SYNC_PHYS_OFFSET
- default 0xc0008000 if ARCH_EP93XX && EP93XX_SDCE0_PHYS_OFFSET
- default 0xd0008000 if ARCH_EP93XX && EP93XX_SDCE1_PHYS_OFFSET
- default 0xe0008000 if ARCH_EP93XX && EP93XX_SDCE2_PHYS_OFFSET
- default 0xf0008000 if ARCH_EP93XX && EP93XX_SDCE3_ASYNC_PHYS_OFFSET
- default 0x00008000 if ARCH_GEMINI && GEMINI_MEM_SWAP
- default 0x10008000 if ARCH_GEMINI && !GEMINI_MEM_SWAP
- default 0x70008000 if ARCH_REALVIEW && REALVIEW_HIGH_PHYS_OFFSET
- default 0x00008000 if ARCH_REALVIEW && !REALVIEW_HIGH_PHYS_OFFSET
- default 0xc0208000 if ARCH_SA1100 && SA1111
- default 0xc0008000 if ARCH_SA1100 && !SA1111
- default 0x30108000 if ARCH_S3C2410 && PM_H1940
- default 0x28E08000 if ARCH_U300 && MACH_U300_SINGLE_RAM
- default 0x48008000 if ARCH_U300 && !MACH_U300_SINGLE_RAM
- help
- ZRELADDR is the physical address where the decompressed kernel
- image will be placed. ZRELADDR has to be specified when the
- assumption of AUTO_ZRELADDR is not valid, or when ZBOOT_ROM is
- selected.
-
endmenu
menu "CPU Power Management"
MKIMAGE := $(srctree)/scripts/mkuboot.sh
ifneq ($(MACHINE),)
--include $(srctree)/$(MACHINE)/Makefile.boot
+include $(srctree)/$(MACHINE)/Makefile.boot
endif
# Note: the following conditions must always be true:
+# ZRELADDR == virt_to_phys(PAGE_OFFSET + TEXT_OFFSET)
# PARAMS_PHYS must be within 4MB of ZRELADDR
# INITRD_PHYS must be in RAM
+ZRELADDR := $(zreladdr-y)
PARAMS_PHYS := $(params_phys-y)
INITRD_PHYS := $(initrd_phys-y)
-export INITRD_PHYS PARAMS_PHYS
+export ZRELADDR INITRD_PHYS PARAMS_PHYS
targets := Image zImage xipImage bootpImage uImage
ifeq ($(CONFIG_ZBOOT_ROM),y)
$(obj)/uImage: LOADADDR=$(CONFIG_ZBOOT_ROM_TEXT)
else
-$(obj)/uImage: LOADADDR=$(CONFIG_ZRELADDR)
+$(obj)/uImage: LOADADDR=$(ZRELADDR)
endif
ifeq ($(CONFIG_THUMB2_KERNEL),y)
EXTRA_CFLAGS := -fpic -fno-builtin
EXTRA_AFLAGS := -Wa,-march=all
+# Supply ZRELADDR to the decompressor via a linker symbol.
+ifneq ($(CONFIG_AUTO_ZRELADDR),y)
+LDFLAGS_vmlinux := --defsym zreladdr=$(ZRELADDR)
+endif
ifeq ($(CONFIG_CPU_ENDIAN_BE8),y)
LDFLAGS_vmlinux += --be8
endif
$(obj)/font.c: $(FONTC)
$(call cmd,shipped)
-$(obj)/vmlinux.lds: $(obj)/vmlinux.lds.in arch/arm/boot/Makefile .config
+$(obj)/vmlinux.lds: $(obj)/vmlinux.lds.in arch/arm/boot/Makefile $(KCONFIG_CONFIG)
@sed "$(SEDFLAGS)" < $< > $@
and r4, pc, #0xf8000000
add r4, r4, #TEXT_OFFSET
#else
- ldr r4, =CONFIG_ZRELADDR
+ ldr r4, =zreladdr
#endif
subs r0, r0, r1 @ calculate the delta offset
return 0;
}
+int dma_needs_bounce(struct device *dev, dma_addr_t dma_addr, size_t size)
+{
+ dev_dbg(dev, "%s: dma_addr %08x, size %08x\n",
+ __func__, dma_addr, size);
+ return (dev->bus == &pci_bus_type) &&
+ ((dma_addr + size - PHYS_OFFSET) >= SZ_64M);
+}
+
+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+ if (mask >= PHYS_OFFSET + SZ_64M - 1)
+ return 0;
+
+ return -EIO;
+}
+
int __init it8152_pci_setup(int nr, struct pci_sys_data *sys)
{
it8152_io.start = IT8152_IO_BASE + 0x12000;
* DMA access and 1 if the buffer needs to be bounced.
*
*/
-#ifdef CONFIG_SA1111
extern int dma_needs_bounce(struct device*, dma_addr_t, size_t);
-#else
-static inline int dma_needs_bounce(struct device *dev, dma_addr_t addr,
- size_t size)
-{
- return 0;
-}
-#endif
/*
* The DMA API, implemented by dmabounce.c. See below for descriptions.
#ifndef __ARM_PERF_EVENT_H__
#define __ARM_PERF_EVENT_H__
-/*
- * NOP: on *most* (read: all supported) ARM platforms, the performance
- * counter interrupts are regular interrupts and not an NMI. This
- * means that when we receive the interrupt we can call
- * perf_event_do_pending() that handles all of the work with
- * interrupts enabled.
- */
-static inline void
-set_perf_event_pending(void)
-{
-}
-
/* ARM performance counters start from 1 (in the cp15 accesses) so use the
* same indexes here for consistency. */
#define PERF_EVENT_INDEX_OFFSET 1
#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
#define pgprot_dmacoherent(prot) \
__pgprot_modify(prot, L_PTE_MT_MASK|L_PTE_EXEC, L_PTE_MT_BUFFERABLE)
+#define __HAVE_PHYS_MEM_ACCESS_PROT
+struct file;
+extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+ unsigned long size, pgprot_t vma_prot);
#else
#define pgprot_dmacoherent(prot) \
__pgprot_modify(prot, L_PTE_MT_MASK|L_PTE_EXEC, L_PTE_MT_UNCACHED)
#define __NR_perf_event_open (__NR_SYSCALL_BASE+364)
#define __NR_recvmmsg (__NR_SYSCALL_BASE+365)
#define __NR_accept4 (__NR_SYSCALL_BASE+366)
+#define __NR_fanotify_init (__NR_SYSCALL_BASE+367)
+#define __NR_fanotify_mark (__NR_SYSCALL_BASE+368)
+#define __NR_prlimit64 (__NR_SYSCALL_BASE+369)
/*
* The following SWIs are ARM private.
CALL(sys_perf_event_open)
/* 365 */ CALL(sys_recvmmsg)
CALL(sys_accept4)
+ CALL(sys_fanotify_init)
+ CALL(sys_fanotify_mark)
+ CALL(sys_prlimit64)
#ifndef syscalls_counted
.equ syscalls_padding, ((NR_syscalls + 3) & ~3) - NR_syscalls
#define syscalls_counted
beq no_work_pending
mov r0, sp @ 'regs'
mov r2, why @ 'syscall'
+ tst r1, #_TIF_SIGPENDING @ delivering a signal?
+ movne why, #0 @ prevent further restarts
bl do_notify_resume
b ret_slow_syscall @ Check work again
sys_sigreturn_wrapper:
add r0, sp, #S_OFF
+ mov why, #0 @ prevent syscall restart handling
b sys_sigreturn
ENDPROC(sys_sigreturn_wrapper)
sys_rt_sigreturn_wrapper:
add r0, sp, #S_OFF
+ mov why, #0 @ prevent syscall restart handling
b sys_rt_sigreturn
ENDPROC(sys_rt_sigreturn_wrapper)
{
/*
* MSR : cccc 0011 0x10 xxxx xxxx xxxx xxxx xxxx
- * Undef : cccc 0011 0x00 xxxx xxxx xxxx xxxx xxxx
+ * Undef : cccc 0011 0100 xxxx xxxx xxxx xxxx xxxx
* ALU op with S bit and Rd == 15 :
* cccc 001x xxx1 xxxx 1111 xxxx xxxx xxxx
*/
- if ((insn & 0x0f900000) == 0x03200000 || /* MSR & Undef */
+ if ((insn & 0x0fb00000) == 0x03200000 || /* MSR */
+ (insn & 0x0ff00000) == 0x03400000 || /* Undef */
(insn & 0x0e10f000) == 0x0210f000) /* ALU s-bit, R15 */
return INSN_REJECTED;
* *S (bit 20) updates condition codes
* ADC/SBC/RSC reads the C flag
*/
- insn &= 0xfff00fff; /* Rn = r0, Rd = r0 */
+ insn &= 0xffff0fff; /* Rd = r0 */
asi->insn[0] = insn;
asi->insn_handler = (insn & (1 << 20)) ? /* S-bit */
emulate_alu_imm_rwflags : emulate_alu_imm_rflags;
}
EXPORT_SYMBOL_GPL(armpmu_get_max_events);
+int perf_num_counters(void)
+{
+ return armpmu_get_max_events();
+}
+EXPORT_SYMBOL_GPL(perf_num_counters);
+
#define HW_OP_UNSUPPORTED 0xFFFF
#define C(_x) \
}
static void
-armpmu_disable(struct perf_event *event)
+armpmu_read(struct perf_event *event)
{
- struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
- int idx = hwc->idx;
-
- WARN_ON(idx < 0);
-
- clear_bit(idx, cpuc->active_mask);
- armpmu->disable(hwc, idx);
-
- barrier();
- armpmu_event_update(event, hwc, idx);
- cpuc->events[idx] = NULL;
- clear_bit(idx, cpuc->used_mask);
+ /* Don't read disabled counters! */
+ if (hwc->idx < 0)
+ return;
- perf_event_update_userpage(event);
+ armpmu_event_update(event, hwc, hwc->idx);
}
static void
-armpmu_read(struct perf_event *event)
+armpmu_stop(struct perf_event *event, int flags)
{
struct hw_perf_event *hwc = &event->hw;
- /* Don't read disabled counters! */
- if (hwc->idx < 0)
+ if (!armpmu)
return;
- armpmu_event_update(event, hwc, hwc->idx);
+ /*
+ * ARM pmu always has to update the counter, so ignore
+ * PERF_EF_UPDATE, see comments in armpmu_start().
+ */
+ if (!(hwc->state & PERF_HES_STOPPED)) {
+ armpmu->disable(hwc, hwc->idx);
+ barrier(); /* why? */
+ armpmu_event_update(event, hwc, hwc->idx);
+ hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+ }
}
static void
-armpmu_unthrottle(struct perf_event *event)
+armpmu_start(struct perf_event *event, int flags)
{
struct hw_perf_event *hwc = &event->hw;
+ if (!armpmu)
+ return;
+
+ /*
+ * ARM pmu always has to reprogram the period, so ignore
+ * PERF_EF_RELOAD, see the comment below.
+ */
+ if (flags & PERF_EF_RELOAD)
+ WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
+
+ hwc->state = 0;
/*
* Set the period again. Some counters can't be stopped, so when we
- * were throttled we simply disabled the IRQ source and the counter
+ * were stopped we simply disabled the IRQ source and the counter
* may have been left counting. If we don't do this step then we may
* get an interrupt too soon or *way* too late if the overflow has
* happened since disabling.
armpmu->enable(hwc, hwc->idx);
}
+static void
+armpmu_del(struct perf_event *event, int flags)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct hw_perf_event *hwc = &event->hw;
+ int idx = hwc->idx;
+
+ WARN_ON(idx < 0);
+
+ clear_bit(idx, cpuc->active_mask);
+ armpmu_stop(event, PERF_EF_UPDATE);
+ cpuc->events[idx] = NULL;
+ clear_bit(idx, cpuc->used_mask);
+
+ perf_event_update_userpage(event);
+}
+
static int
-armpmu_enable(struct perf_event *event)
+armpmu_add(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
int idx;
int err = 0;
+ perf_pmu_disable(event->pmu);
+
/* If we don't have a space for the counter then finish early. */
idx = armpmu->get_event_idx(cpuc, hwc);
if (idx < 0) {
cpuc->events[idx] = event;
set_bit(idx, cpuc->active_mask);
- /* Set the period for the event. */
- armpmu_event_set_period(event, hwc, idx);
-
- /* Enable the event. */
- armpmu->enable(hwc, idx);
+ hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+ if (flags & PERF_EF_START)
+ armpmu_start(event, PERF_EF_RELOAD);
/* Propagate our changes to the userspace mapping. */
perf_event_update_userpage(event);
out:
+ perf_pmu_enable(event->pmu);
return err;
}
-static struct pmu pmu = {
- .enable = armpmu_enable,
- .disable = armpmu_disable,
- .unthrottle = armpmu_unthrottle,
- .read = armpmu_read,
-};
+static struct pmu pmu;
static int
validate_event(struct cpu_hw_events *cpuc,
{
struct hw_perf_event fake_event = event->hw;
- if (event->pmu && event->pmu != &pmu)
- return 0;
+ if (event->pmu != &pmu || event->state <= PERF_EVENT_STATE_OFF)
+ return 1;
return armpmu->get_event_idx(cpuc, &fake_event) >= 0;
}
return err;
}
-const struct pmu *
-hw_perf_event_init(struct perf_event *event)
+static int armpmu_event_init(struct perf_event *event)
{
int err = 0;
+ switch (event->attr.type) {
+ case PERF_TYPE_RAW:
+ case PERF_TYPE_HARDWARE:
+ case PERF_TYPE_HW_CACHE:
+ break;
+
+ default:
+ return -ENOENT;
+ }
+
if (!armpmu)
- return ERR_PTR(-ENODEV);
+ return -ENODEV;
event->destroy = hw_perf_event_destroy;
if (!atomic_inc_not_zero(&active_events)) {
- if (atomic_read(&active_events) > perf_max_events) {
+ if (atomic_read(&active_events) > armpmu->num_events) {
atomic_dec(&active_events);
- return ERR_PTR(-ENOSPC);
+ return -ENOSPC;
}
mutex_lock(&pmu_reserve_mutex);
}
if (err)
- return ERR_PTR(err);
+ return err;
err = __hw_perf_event_init(event);
if (err)
hw_perf_event_destroy(event);
- return err ? ERR_PTR(err) : &pmu;
+ return err;
}
-void
-hw_perf_enable(void)
+static void armpmu_enable(struct pmu *pmu)
{
/* Enable all of the perf events on hardware. */
int idx;
armpmu->start();
}
-void
-hw_perf_disable(void)
+static void armpmu_disable(struct pmu *pmu)
{
if (armpmu)
armpmu->stop();
}
+static struct pmu pmu = {
+ .pmu_enable = armpmu_enable,
+ .pmu_disable = armpmu_disable,
+ .event_init = armpmu_event_init,
+ .add = armpmu_add,
+ .del = armpmu_del,
+ .start = armpmu_start,
+ .stop = armpmu_stop,
+ .read = armpmu_read,
+};
+
/*
* ARMv6 Performance counter handling code.
*
/*
* Handle the pending perf events.
*
- * Note: this call *must* be run with interrupts enabled. For
- * platforms that can have the PMU interrupts raised as a PMI, this
+ * Note: this call *must* be run with interrupts disabled. For
+ * platforms that can have the PMU interrupts raised as an NMI, this
* will not work.
*/
- perf_event_do_pending();
+ irq_work_run();
return IRQ_HANDLED;
}
/*
* Handle the pending perf events.
*
- * Note: this call *must* be run with interrupts enabled. For
- * platforms that can have the PMU interrupts raised as a PMI, this
+ * Note: this call *must* be run with interrupts disabled. For
+ * platforms that can have the PMU interrupts raised as an NMI, this
* will not work.
*/
- perf_event_do_pending();
+ irq_work_run();
return IRQ_HANDLED;
}
armpmu->disable(hwc, idx);
}
- perf_event_do_pending();
+ irq_work_run();
/*
* Re-enable the PMU.
armpmu->disable(hwc, idx);
}
- perf_event_do_pending();
+ irq_work_run();
/*
* Re-enable the PMU.
armpmu = &armv6pmu;
memcpy(armpmu_perf_cache_map, armv6_perf_cache_map,
sizeof(armv6_perf_cache_map));
- perf_max_events = armv6pmu.num_events;
break;
case 0xB020: /* ARM11mpcore */
armpmu = &armv6mpcore_pmu;
memcpy(armpmu_perf_cache_map,
armv6mpcore_perf_cache_map,
sizeof(armv6mpcore_perf_cache_map));
- perf_max_events = armv6mpcore_pmu.num_events;
break;
case 0xC080: /* Cortex-A8 */
armv7pmu.id = ARM_PERF_PMU_ID_CA8;
/* Reset PMNC and read the nb of CNTx counters
supported */
armv7pmu.num_events = armv7_reset_read_pmnc();
- perf_max_events = armv7pmu.num_events;
break;
case 0xC090: /* Cortex-A9 */
armv7pmu.id = ARM_PERF_PMU_ID_CA9;
/* Reset PMNC and read the nb of CNTx counters
supported */
armv7pmu.num_events = armv7_reset_read_pmnc();
- perf_max_events = armv7pmu.num_events;
break;
}
/* Intel CPUs [xscale]. */
armpmu = &xscale1pmu;
memcpy(armpmu_perf_cache_map, xscale_perf_cache_map,
sizeof(xscale_perf_cache_map));
- perf_max_events = xscale1pmu.num_events;
break;
case 2:
armpmu = &xscale2pmu;
memcpy(armpmu_perf_cache_map, xscale_perf_cache_map,
sizeof(xscale_perf_cache_map));
- perf_max_events = xscale2pmu.num_events;
break;
}
}
arm_pmu_names[armpmu->id], armpmu->num_events);
} else {
pr_info("no hardware support available\n");
- perf_max_events = -1;
}
+ perf_pmu_register(&pmu);
+
return 0;
}
arch_initcall(init_hw_perf_events);
/*
* Callchain handling code.
*/
-static inline void
-callchain_store(struct perf_callchain_entry *entry,
- u64 ip)
-{
- if (entry->nr < PERF_MAX_STACK_DEPTH)
- entry->ip[entry->nr++] = ip;
-}
/*
* The registers we're interested in are at the end of the variable
if (__copy_from_user_inatomic(&buftail, tail, sizeof(buftail)))
return NULL;
- callchain_store(entry, buftail.lr);
+ perf_callchain_store(entry, buftail.lr);
/*
* Frame pointers should strictly progress back up the stack
return buftail.fp - 1;
}
-static void
-perf_callchain_user(struct pt_regs *regs,
- struct perf_callchain_entry *entry)
+void
+perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
{
struct frame_tail *tail;
- callchain_store(entry, PERF_CONTEXT_USER);
-
- if (!user_mode(regs))
- regs = task_pt_regs(current);
tail = (struct frame_tail *)regs->ARM_fp - 1;
void *data)
{
struct perf_callchain_entry *entry = data;
- callchain_store(entry, fr->pc);
+ perf_callchain_store(entry, fr->pc);
return 0;
}
-static void
-perf_callchain_kernel(struct pt_regs *regs,
- struct perf_callchain_entry *entry)
+void
+perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
{
struct stackframe fr;
- callchain_store(entry, PERF_CONTEXT_KERNEL);
fr.fp = regs->ARM_fp;
fr.sp = regs->ARM_sp;
fr.lr = regs->ARM_lr;
fr.pc = regs->ARM_pc;
walk_stackframe(&fr, callchain_trace, entry);
}
-
-static void
-perf_do_callchain(struct pt_regs *regs,
- struct perf_callchain_entry *entry)
-{
- int is_user;
-
- if (!regs)
- return;
-
- is_user = user_mode(regs);
-
- if (!current || !current->pid)
- return;
-
- if (is_user && current->state != TASK_RUNNING)
- return;
-
- if (!is_user)
- perf_callchain_kernel(regs, entry);
-
- if (current->mm)
- perf_callchain_user(regs, entry);
-}
-
-static DEFINE_PER_CPU(struct perf_callchain_entry, pmc_irq_entry);
-
-struct perf_callchain_entry *
-perf_callchain(struct pt_regs *regs)
-{
- struct perf_callchain_entry *entry = &__get_cpu_var(pmc_irq_entry);
-
- entry->nr = 0;
- perf_do_callchain(regs, entry);
- return entry;
-}
.pmc_mask = 1 << AT91SAM9G45_ID_SSC1,
.type = CLK_TYPE_PERIPHERAL,
};
-static struct clk tcb_clk = {
- .name = "tcb_clk",
+static struct clk tcb0_clk = {
+ .name = "tcb0_clk",
.pmc_mask = 1 << AT91SAM9G45_ID_TCB,
.type = CLK_TYPE_PERIPHERAL,
};
.parent = &uhphs_clk,
};
+/* One additional fake clock for second TC block */
+static struct clk tcb1_clk = {
+ .name = "tcb1_clk",
+ .pmc_mask = 0,
+ .type = CLK_TYPE_PERIPHERAL,
+ .parent = &tcb0_clk,
+};
+
static struct clk *periph_clocks[] __initdata = {
&pioA_clk,
&pioB_clk,
&spi1_clk,
&ssc0_clk,
&ssc1_clk,
- &tcb_clk,
+ &tcb0_clk,
&pwm_clk,
&tsc_clk,
&dma_clk,
&mmc1_clk,
// irq0
&ohci_clk,
+ &tcb1_clk,
};
/*
.end = AT91_BASE_SYS + AT91_DMA + SZ_512 - 1,
.flags = IORESOURCE_MEM,
},
- [2] = {
+ [1] = {
.start = AT91SAM9G45_ID_DMA,
.end = AT91SAM9G45_ID_DMA,
.flags = IORESOURCE_IRQ,
.sda_is_open_drain = 1,
.scl_pin = AT91_PIN_PA21,
.scl_is_open_drain = 1,
- .udelay = 2, /* ~100 kHz */
+ .udelay = 5, /* ~100 kHz */
};
static struct platform_device at91sam9g45_twi0_device = {
.sda_is_open_drain = 1,
.scl_pin = AT91_PIN_PB11,
.scl_is_open_drain = 1,
- .udelay = 2, /* ~100 kHz */
+ .udelay = 5, /* ~100 kHz */
};
static struct platform_device at91sam9g45_twi1_device = {
static void __init at91_add_device_tc(void)
{
/* this chip has one clock and irq for all six TC channels */
- at91_clock_associate("tcb_clk", &at91sam9g45_tcb0_device.dev, "t0_clk");
+ at91_clock_associate("tcb0_clk", &at91sam9g45_tcb0_device.dev, "t0_clk");
platform_device_register(&at91sam9g45_tcb0_device);
- at91_clock_associate("tcb_clk", &at91sam9g45_tcb1_device.dev, "t0_clk");
+ at91_clock_associate("tcb1_clk", &at91sam9g45_tcb1_device.dev, "t0_clk");
platform_device_register(&at91sam9g45_tcb1_device);
}
#else
.start = AT91_PIN_PC11,
.end = AT91_PIN_PC11,
.flags = IORESOURCE_IRQ
+ | IORESOURCE_IRQ_LOWEDGE | IORESOURCE_IRQ_HIGHEDGE,
}
};
static struct dm9000_plat_data dm9000_platdata = {
- .flags = DM9000_PLATF_16BITONLY,
+ .flags = DM9000_PLATF_16BITONLY | DM9000_PLATF_NO_EEPROM,
};
static struct platform_device dm9000_device = {
/*
- * MCI (SD/MMC)
- */
-static struct at91_mmc_data __initdata ek_mmc_data = {
- .wire4 = 1,
-// .det_pin = ... not connected
-// .wp_pin = ... not connected
-// .vcc_pin = ... not connected
-};
-
-
-/*
* NAND flash
*/
static struct mtd_partition __initdata ek_nand_partition[] = {
at91_add_device_nand(&ek_nand_data);
}
+/*
+ * SPI related devices
+ */
+#if defined(CONFIG_SPI_ATMEL) || defined(CONFIG_SPI_ATMEL_MODULE)
/*
* ADS7846 Touchscreen
#endif
};
+#else /* CONFIG_SPI_ATMEL_* */
+/* spi0 and mmc/sd share the same PIO pins: cannot be used at the same time */
+
+/*
+ * MCI (SD/MMC)
+ * det_pin, wp_pin and vcc_pin are not connected
+ */
+static struct at91_mmc_data __initdata ek_mmc_data = {
+ .wire4 = 1,
+};
+
+#endif /* CONFIG_SPI_ATMEL_* */
+
/*
* LCD Controller
int __init clk_register(struct clk *clk)
{
if (clk_is_peripheral(clk)) {
- clk->parent = &mck;
+ if (!clk->parent)
+ clk->parent = &mck;
clk->mode = pmc_periph_mode;
list_add_tail(&clk->node, &clocks);
}
static inline void arch_idle(void)
{
-#ifndef CONFIG_DEBUG_KERNEL
/*
* Disable the processor clock. The processor will be automatically
* re-enabled by an interrupt or by a reset.
*/
at91_sys_write(AT91_PMC_SCDR, AT91_PMC_PCK);
-#else
+#ifndef CONFIG_CPU_ARM920T
/*
* Set the processor (CP15) into 'Wait for Interrupt' mode.
- * Unlike disabling the processor clock via the PMC (above)
- * this allows the processor to be woken via JTAG.
+ * Post-RM9200 processors need this in conjunction with the above
+ * to save power when idle.
*/
cpu_do_idle();
#endif
memset(&gDMA, 0, sizeof(gDMA));
- init_MUTEX_LOCKED(&gDMA.lock);
+ sema_init(&gDMA.lock, 0);
init_waitqueue_head(&gDMA.freeChannelQ);
/* Initialize the Hardware */
{
memset(memMap, 0, sizeof(*memMap));
- init_MUTEX(&memMap->lock);
+ sema_init(&memMap->lock, 1);
return 0;
}
.virtual = SRAM_VIRT,
.pfn = __phys_to_pfn(0x00010000),
.length = SZ_32K,
- /* MT_MEMORY_NONCACHED requires supersection alignment */
- .type = MT_DEVICE,
+ .type = MT_MEMORY_NONCACHED,
},
};
.virtual = SRAM_VIRT,
.pfn = __phys_to_pfn(0x00010000),
.length = SZ_32K,
- /* MT_MEMORY_NONCACHED requires supersection alignment */
- .type = MT_DEVICE,
+ .type = MT_MEMORY_NONCACHED,
},
};
.virtual = SRAM_VIRT,
.pfn = __phys_to_pfn(0x00008000),
.length = SZ_16K,
- /* MT_MEMORY_NONCACHED requires supersection alignment */
- .type = MT_DEVICE,
+ .type = MT_MEMORY_NONCACHED,
},
};
.virtual = SRAM_VIRT,
.pfn = __phys_to_pfn(0x00010000),
.length = SZ_32K,
- /* MT_MEMORY_NONCACHED requires supersection alignment */
- .type = MT_DEVICE,
+ .type = MT_MEMORY_NONCACHED,
},
};
#define IO_SPACE_LIMIT 0xffffffff
-#define __io(a) ((void __iomem *)(((a) - DOVE_PCIE0_IO_PHYS_BASE) +\
- DOVE_PCIE0_IO_VIRT_BASE))
-#define __mem_pci(a) (a)
+#define __io(a) ((void __iomem *)(((a) - DOVE_PCIE0_IO_BUS_BASE) + \
+ DOVE_PCIE0_IO_VIRT_BASE))
+#define __mem_pci(a) (a)
#endif
clkdev_add_table(clocks, ARRAY_SIZE(clocks));
return 0;
}
-arch_initcall(ep93xx_clock_init);
+postcore_initcall(ep93xx_clock_init);
v &= ~(M2P_CONTROL_STALL_IRQ_EN | M2P_CONTROL_NFB_IRQ_EN);
m2p_set_control(ch, v);
- while (m2p_channel_state(ch) == STATE_ON)
+ while (m2p_channel_state(ch) >= STATE_ON)
cpu_relax();
m2p_set_control(ch, 0x0);
select IMX_HAVE_PLATFORM_IMX_I2C
select IMX_HAVE_PLATFORM_IMX_UART
select IMX_HAVE_PLATFORM_MXC_NAND
+ select MXC_ULPI if USB_ULPI
help
Include support for Eukrea CPUIMX27 platform. This includes
specific configurations for the module and its peripherals.
i2c_register_board_info(0, eukrea_cpuimx27_i2c_devices,
ARRAY_SIZE(eukrea_cpuimx27_i2c_devices));
- imx27_add_i2c_imx1(&cpuimx27_i2c1_data);
+ imx27_add_i2c_imx0(&cpuimx27_i2c1_data);
platform_add_devices(platform_devices, ARRAY_SIZE(platform_devices));
return pci_scan_bus(sys->busnr, &ixp4xx_ops, sys);
}
+int dma_set_coherent_mask(struct device *dev, u64 mask)
+{
+ if (mask >= SZ_64M - 1)
+ return 0;
+
+ return -EIO;
+}
+
EXPORT_SYMBOL(ixp4xx_pci_read);
EXPORT_SYMBOL(ixp4xx_pci_write);
#define PCIBIOS_MAX_MEM 0x4BFFFFFF
#endif
+#define ARCH_HAS_DMA_SET_COHERENT_MASK
+
#define pcibios_assign_all_busses() 1
/* Register locations and bits */
#define KIRKWOOD_PCIE1_IO_PHYS_BASE 0xf3000000
#define KIRKWOOD_PCIE1_IO_VIRT_BASE 0xfef00000
-#define KIRKWOOD_PCIE1_IO_BUS_BASE 0x00000000
+#define KIRKWOOD_PCIE1_IO_BUS_BASE 0x00100000
#define KIRKWOOD_PCIE1_IO_SIZE SZ_1M
#define KIRKWOOD_PCIE_IO_PHYS_BASE 0xf2000000
* IORESOURCE_IO
*/
pp->res[0].name = "PCIe 0 I/O Space";
- pp->res[0].start = KIRKWOOD_PCIE_IO_PHYS_BASE;
+ pp->res[0].start = KIRKWOOD_PCIE_IO_BUS_BASE;
pp->res[0].end = pp->res[0].start + KIRKWOOD_PCIE_IO_SIZE - 1;
pp->res[0].flags = IORESOURCE_IO;
* IORESOURCE_IO
*/
pp->res[0].name = "PCIe 1 I/O Space";
- pp->res[0].start = KIRKWOOD_PCIE1_IO_PHYS_BASE;
+ pp->res[0].start = KIRKWOOD_PCIE1_IO_BUS_BASE;
pp->res[0].end = pp->res[0].start + KIRKWOOD_PCIE1_IO_SIZE - 1;
pp->res[0].flags = IORESOURCE_IO;
#ifndef __ASM_MACH_SYSTEM_H
#define __ASM_MACH_SYSTEM_H
+#include <mach/cputype.h>
+
static inline void arch_idle(void)
{
cpu_do_idle();
static inline void arch_reset(char mode, const char *cmd)
{
- cpu_reset(0);
+ if (cpu_is_pxa168())
+ cpu_reset(0xffff0000);
+ else
+ cpu_reset(0);
}
#endif /* __ASM_MACH_SYSTEM_H */
* Add platform devices present on this baseboard and init
* them from CPU side as far as required to use them later on
*/
-void __init eukrea_mbimxsd_baseboard_init(void)
+void __init eukrea_mbimxsd25_baseboard_init(void)
{
if (mxc_iomux_v3_setup_multiple_pads(eukrea_mbimxsd_pads,
ARRAY_SIZE(eukrea_mbimxsd_pads)))
if (!otg_mode_host)
mxc_register_device(&otg_udc_device, &otg_device_pdata);
-#ifdef CONFIG_MACH_EUKREA_MBIMXSD_BASEBOARD
- eukrea_mbimxsd_baseboard_init();
+#ifdef CONFIG_MACH_EUKREA_MBIMXSD25_BASEBOARD
+ eukrea_mbimxsd25_baseboard_init();
#endif
}
aad = &clk_consumer[(pdr0 >> 16) & 0xf];
if (aad->sel)
- fref = fref * 2 / 3;
+ fref = fref * 3 / 4;
return fref / aad->arm;
}
{
unsigned long pdr0 = __raw_readl(CCM_BASE + CCM_PDR0);
struct arm_ahb_div *aad;
- unsigned long fref = get_rate_mpll();
+ unsigned long fref = get_rate_arm();
aad = &clk_consumer[(pdr0 >> 16) & 0xf];
return get_rate_ahb(NULL) >> 1;
}
-static unsigned long get_3_3_div(unsigned long in)
-{
- return (((in >> 3) & 0x7) + 1) * ((in & 0x7) + 1);
-}
-
static unsigned long get_rate_uart(struct clk *clk)
{
unsigned long pdr3 = __raw_readl(CCM_BASE + CCM_PDR3);
unsigned long pdr4 = __raw_readl(CCM_BASE + CCM_PDR4);
- unsigned long div = get_3_3_div(pdr4 >> 10);
+ unsigned long div = ((pdr4 >> 10) & 0x3f) + 1;
if (pdr3 & (1 << 14))
return get_rate_arm() / div;
break;
}
- return rate / get_3_3_div(div);
+ return rate / (div + 1);
}
static unsigned long get_rate_mshc(struct clk *clk)
else
rate = get_rate_ppll();
- return rate / get_3_3_div((pdr2 >> 16) & 0x3f);
+ return rate / (((pdr2 >> 16) & 0x3f) + 1);
}
static unsigned long get_rate_otg(struct clk *clk)
else
rate = get_rate_ppll();
- return rate / get_3_3_div((pdr4 >> 22) & 0x3f);
+ return rate / (((pdr4 >> 22) & 0x3f) + 1);
}
static unsigned long get_rate_ipg_per(struct clk *clk)
{
unsigned long pdr0 = __raw_readl(CCM_BASE + CCM_PDR0);
unsigned long pdr4 = __raw_readl(CCM_BASE + CCM_PDR4);
- unsigned long div1, div2;
+ unsigned long div;
if (pdr0 & (1 << 26)) {
- div1 = (pdr4 >> 19) & 0x7;
- div2 = (pdr4 >> 16) & 0x7;
- return get_rate_arm() / ((div1 + 1) * (div2 + 1));
+ div = (pdr4 >> 16) & 0x3f;
+ return get_rate_arm() / (div + 1);
} else {
- div1 = (pdr0 >> 12) & 0x7;
- return get_rate_ahb(NULL) / div1;
+ div = (pdr0 >> 12) & 0x7;
+ return get_rate_ahb(NULL) / (div + 1);
}
}
+static unsigned long get_rate_hsp(struct clk *clk)
+{
+ unsigned long hsp_podf = (__raw_readl(CCM_BASE + CCM_PDR0) >> 20) & 0x03;
+ unsigned long fref = get_rate_mpll();
+
+ if (fref > 400 * 1000 * 1000) {
+ switch (hsp_podf) {
+ case 0:
+ return fref >> 2;
+ case 1:
+ return fref >> 3;
+ case 2:
+ return fref / 3;
+ }
+ } else {
+ switch (hsp_podf) {
+ case 0:
+ case 2:
+ return fref / 3;
+ case 1:
+ return fref / 6;
+ }
+ }
+
+ return 0;
+}
+
static int clk_cgr_enable(struct clk *clk)
{
u32 reg;
DEFINE_CLOCK(i2c2_clk, 1, CCM_CGR1, 12, get_rate_ipg_per, NULL);
DEFINE_CLOCK(i2c3_clk, 2, CCM_CGR1, 14, get_rate_ipg_per, NULL);
DEFINE_CLOCK(iomuxc_clk, 0, CCM_CGR1, 16, NULL, NULL);
-DEFINE_CLOCK(ipu_clk, 0, CCM_CGR1, 18, get_rate_ahb, NULL);
+DEFINE_CLOCK(ipu_clk, 0, CCM_CGR1, 18, get_rate_hsp, NULL);
DEFINE_CLOCK(kpp_clk, 0, CCM_CGR1, 20, get_rate_ipg, NULL);
DEFINE_CLOCK(mlb_clk, 0, CCM_CGR1, 22, get_rate_ahb, NULL);
DEFINE_CLOCK(mshc_clk, 0, CCM_CGR1, 24, get_rate_mshc, NULL);
int __init mx35_clocks_init()
{
- unsigned int ll = 0;
+ unsigned int cgr2 = 3 << 26, cgr3 = 0;
#if defined(CONFIG_DEBUG_LL) && !defined(CONFIG_DEBUG_ICEDCC)
- ll = (3 << 16);
+ cgr2 |= 3 << 16;
#endif
clkdev_add_table(lookups, ARRAY_SIZE(lookups));
__raw_writel((3 << 18), CCM_BASE + CCM_CGR0);
__raw_writel((3 << 2) | (3 << 4) | (3 << 6) | (3 << 8) | (3 << 16),
CCM_BASE + CCM_CGR1);
- __raw_writel((3 << 26) | ll, CCM_BASE + CCM_CGR2);
- __raw_writel(0, CCM_BASE + CCM_CGR3);
+
+ /*
+ * Check if we came up in internal boot mode. If yes, we need some
+ * extra clocks turned on, otherwise the MX35 boot ROM code will
+ * hang after a watchdog reset.
+ */
+ if (!(__raw_readl(CCM_BASE + CCM_RCSR) & (3 << 10))) {
+ /* Additionally turn on UART1, SCC, and IIM clocks */
+ cgr2 |= 3 << 16 | 3 << 4;
+ cgr3 |= 3 << 2;
+ }
+
+ __raw_writel(cgr2, CCM_BASE + CCM_CGR2);
+ __raw_writel(cgr3, CCM_BASE + CCM_CGR3);
mxc_timer_init(&gpt_clk,
MX35_IO_ADDRESS(MX35_GPT1_BASE_ADDR), MX35_INT_GPT);
* Add platform devices present on this baseboard and init
* them from CPU side as far as required to use them later on
*/
-void __init eukrea_mbimxsd_baseboard_init(void)
+void __init eukrea_mbimxsd35_baseboard_init(void)
{
if (mxc_iomux_v3_setup_multiple_pads(eukrea_mbimxsd_pads,
ARRAY_SIZE(eukrea_mbimxsd_pads)))
if (!otg_mode_host)
mxc_register_device(&mxc_otg_udc_device, &otg_device_pdata);
-#ifdef CONFIG_MACH_EUKREA_MBIMXSD_BASEBOARD
- eukrea_mbimxsd_baseboard_init();
+#ifdef CONFIG_MACH_EUKREA_MBIMXSD35_BASEBOARD
+ eukrea_mbimxsd35_baseboard_init();
#endif
}
{
u32 reg;
reg = __raw_readl(clk->enable_reg);
- reg &= ~(MXC_CCM_CCGRx_MOD_OFF << clk->enable_shift);
+ reg &= ~(MXC_CCM_CCGRx_CG_MASK << clk->enable_shift);
__raw_writel(reg, clk->enable_reg);
}
freqs.cpu = policy->cpu;
if (freq_debug)
- pr_debug(KERN_INFO "Changing CPU frequency to %d Mhz, "
- "(SDRAM %d Mhz)\n",
+ pr_debug("Changing CPU frequency to %d Mhz, (SDRAM %d Mhz)\n",
freqs.new / 1000, (pxa_freq_settings[idx].div2) ?
(new_freq_mem / 2000) : (new_freq_mem / 1000));
return 0;
}
-static __init int pxa_cpufreq_init(struct cpufreq_policy *policy)
+static int pxa_cpufreq_init(struct cpufreq_policy *policy)
{
int i;
unsigned int freq;
return 0;
}
-static __init int pxa3xx_cpufreq_init(struct cpufreq_policy *policy)
+static int pxa3xx_cpufreq_init(struct cpufreq_policy *policy)
{
int ret = -EINVAL;
* <= 0x2 for pxa21x/pxa25x/pxa26x/pxa27x
* == 0x3 for pxa300/pxa310/pxa320
*/
+#if defined(CONFIG_PXA25x) || defined(CONFIG_PXA27x)
#define __cpu_is_pxa2xx(id) \
({ \
unsigned int _id = (id) >> 13 & 0x7; \
_id <= 0x2; \
})
+#else
+#define __cpu_is_pxa2xx(id) (0)
+#endif
+#ifdef CONFIG_PXA3xx
#define __cpu_is_pxa3xx(id) \
({ \
unsigned int _id = (id) >> 13 & 0x7; \
_id == 0x3; \
})
+#else
+#define __cpu_is_pxa3xx(id) (0)
+#endif
+#if defined(CONFIG_CPU_PXA930) || defined(CONFIG_CPU_PXA935)
#define __cpu_is_pxa93x(id) \
({ \
unsigned int _id = (id) >> 4 & 0xfff; \
_id == 0x683 || _id == 0x693; \
})
+#else
+#define __cpu_is_pxa93x(id) (0)
+#endif
#define cpu_is_pxa2xx() \
({ \
#define PCIBIOS_MIN_IO 0
#define PCIBIOS_MIN_MEM 0
#define pcibios_assign_all_busses() 1
+#define ARCH_HAS_DMA_SET_COHERENT_MASK
#endif
-
#endif /* _ASM_ARCH_HARDWARE_H */
#ifndef __ASM_ARM_ARCH_IO_H
#define __ASM_ARM_ARCH_IO_H
+#include <mach/hardware.h>
+
#define IO_SPACE_LIMIT 0xffffffff
/*
#define GPIO46_CI_DD_7 MFP_CFG_DRV(GPIO46, AF0, DS04X)
#define GPIO47_CI_DD_8 MFP_CFG_DRV(GPIO47, AF1, DS04X)
#define GPIO48_CI_DD_9 MFP_CFG_DRV(GPIO48, AF1, DS04X)
-#define GPIO52_CI_HSYNC MFP_CFG_DRV(GPIO52, AF0, DS04X)
-#define GPIO51_CI_VSYNC MFP_CFG_DRV(GPIO51, AF0, DS04X)
#define GPIO49_CI_MCLK MFP_CFG_DRV(GPIO49, AF0, DS04X)
#define GPIO50_CI_PCLK MFP_CFG_DRV(GPIO50, AF0, DS04X)
+#define GPIO51_CI_HSYNC MFP_CFG_DRV(GPIO51, AF0, DS04X)
+#define GPIO52_CI_VSYNC MFP_CFG_DRV(GPIO52, AF0, DS04X)
/* KEYPAD */
#define GPIO3_KP_DKIN_6 MFP_CFG_LPM(GPIO3, AF2, FLOAT)
},
};
+static struct i2c_pxa_platform_data palm27x_i2c_power_info = {
+ .use_pio = 1,
+};
+
void __init palm27x_pmic_init(void)
{
i2c_register_board_info(1, ARRAY_AND_SIZE(palm27x_pi2c_board_info));
- pxa27x_set_i2c_power_info(NULL);
+ pxa27x_set_i2c_power_info(&palm27x_i2c_power_info);
}
#endif
#if defined(CONFIG_MMC_PXA) || defined(CONFIG_MMC_PXA_MODULE)
static struct pxamci_platform_data vpac270_mci_platform_data = {
.ocr_mask = MMC_VDD_32_33 | MMC_VDD_33_34,
+ .gpio_power = -1,
.gpio_card_detect = GPIO53_VPAC270_SD_DETECT_N,
.gpio_card_ro = GPIO52_VPAC270_SD_READONLY,
.detect_delay_ms = 200,
#include <mach/map.h>
#include <mach/gpio-bank-c.h>
#include <mach/spi-clocks.h>
+#include <mach/irqs.h>
#include <plat/s3c64xx-spi.h>
#include <plat/gpio-cfg.h>
-#include <plat/irqs.h>
+#include <plat/devs.h>
static char *spi_src_clks[] = {
[S3C64XX_SPI_SRCCLK_PCLK] = "pclk",
#include <plat/devs.h>
#include <plat/regs-serial.h>
-#define UCON S3C2410_UCON_DEFAULT | S3C2410_UCON_UCLK
-#define ULCON S3C2410_LCON_CS8 | S3C2410_LCON_PNONE | S3C2410_LCON_STOPB
-#define UFCON S3C2410_UFCON_RXTRIG8 | S3C2410_UFCON_FIFOMODE
+#define UCON (S3C2410_UCON_DEFAULT | S3C2410_UCON_UCLK)
+#define ULCON (S3C2410_LCON_CS8 | S3C2410_LCON_PNONE | S3C2410_LCON_STOPB)
+#define UFCON (S3C2410_UFCON_RXTRIG8 | S3C2410_UFCON_FIFOMODE)
static struct s3c2410_uartcfg real6410_uartcfgs[] __initdata = {
[0] = {
- .hwport = 0,
- .flags = 0,
- .ucon = UCON,
- .ulcon = ULCON,
- .ufcon = UFCON,
+ .hwport = 0,
+ .flags = 0,
+ .ucon = UCON,
+ .ulcon = ULCON,
+ .ufcon = UFCON,
},
[1] = {
- .hwport = 1,
- .flags = 0,
- .ucon = UCON,
- .ulcon = ULCON,
- .ufcon = UFCON,
+ .hwport = 1,
+ .flags = 0,
+ .ucon = UCON,
+ .ulcon = ULCON,
+ .ufcon = UFCON,
},
[2] = {
- .hwport = 2,
- .flags = 0,
- .ucon = UCON,
- .ulcon = ULCON,
- .ufcon = UFCON,
+ .hwport = 2,
+ .flags = 0,
+ .ucon = UCON,
+ .ulcon = ULCON,
+ .ufcon = UFCON,
},
[3] = {
- .hwport = 3,
- .flags = 0,
- .ucon = UCON,
- .ulcon = ULCON,
- .ufcon = UFCON,
+ .hwport = 3,
+ .flags = 0,
+ .ucon = UCON,
+ .ulcon = ULCON,
+ .ufcon = UFCON,
},
};
/* DM9000AEP 10/100 ethernet controller */
static struct resource real6410_dm9k_resource[] = {
- [0] = {
- .start = S3C64XX_PA_XM0CSN1,
- .end = S3C64XX_PA_XM0CSN1 + 1,
- .flags = IORESOURCE_MEM
- },
- [1] = {
- .start = S3C64XX_PA_XM0CSN1 + 4,
- .end = S3C64XX_PA_XM0CSN1 + 5,
- .flags = IORESOURCE_MEM
- },
- [2] = {
- .start = S3C_EINT(7),
- .end = S3C_EINT(7),
- .flags = IORESOURCE_IRQ,
- }
+ [0] = {
+ .start = S3C64XX_PA_XM0CSN1,
+ .end = S3C64XX_PA_XM0CSN1 + 1,
+ .flags = IORESOURCE_MEM
+ },
+ [1] = {
+ .start = S3C64XX_PA_XM0CSN1 + 4,
+ .end = S3C64XX_PA_XM0CSN1 + 5,
+ .flags = IORESOURCE_MEM
+ },
+ [2] = {
+ .start = S3C_EINT(7),
+ .end = S3C_EINT(7),
+ .flags = IORESOURCE_IRQ | IORESOURCE_IRQ_HIGHLEVEL
+ }
};
static struct dm9000_plat_data real6410_dm9k_pdata = {
- .flags = (DM9000_PLATF_16BITONLY | DM9000_PLATF_NO_EEPROM),
+ .flags = (DM9000_PLATF_16BITONLY | DM9000_PLATF_NO_EEPROM),
};
static struct platform_device real6410_device_eth = {
- .name = "dm9000",
- .id = -1,
- .num_resources = ARRAY_SIZE(real6410_dm9k_resource),
- .resource = real6410_dm9k_resource,
- .dev = {
- .platform_data = &real6410_dm9k_pdata,
- },
+ .name = "dm9000",
+ .id = -1,
+ .num_resources = ARRAY_SIZE(real6410_dm9k_resource),
+ .resource = real6410_dm9k_resource,
+ .dev = {
+ .platform_data = &real6410_dm9k_pdata,
+ },
};
static struct platform_device *real6410_devices[] __initdata = {
/* set timing for nCS1 suitable for ethernet chip */
__raw_writel((0 << S3C64XX_SROM_BCX__PMC__SHIFT) |
- (6 << S3C64XX_SROM_BCX__TACP__SHIFT) |
- (4 << S3C64XX_SROM_BCX__TCAH__SHIFT) |
- (1 << S3C64XX_SROM_BCX__TCOH__SHIFT) |
- (13 << S3C64XX_SROM_BCX__TACC__SHIFT) |
- (4 << S3C64XX_SROM_BCX__TCOS__SHIFT) |
- (0 << S3C64XX_SROM_BCX__TACS__SHIFT), S3C64XX_SROM_BC1);
+ (6 << S3C64XX_SROM_BCX__TACP__SHIFT) |
+ (4 << S3C64XX_SROM_BCX__TCAH__SHIFT) |
+ (1 << S3C64XX_SROM_BCX__TCOH__SHIFT) |
+ (13 << S3C64XX_SROM_BCX__TACC__SHIFT) |
+ (4 << S3C64XX_SROM_BCX__TCOS__SHIFT) |
+ (0 << S3C64XX_SROM_BCX__TACS__SHIFT), S3C64XX_SROM_BC1);
platform_add_devices(real6410_devices, ARRAY_SIZE(real6410_devices));
}
#include <linux/sysdev.h>
#include <linux/serial_core.h>
#include <linux/platform_device.h>
+#include <linux/sched.h>
#include <asm/mach/arch.h>
#include <asm/mach/map.h>
#include <linux/sysdev.h>
#include <linux/serial_core.h>
#include <linux/platform_device.h>
+#include <linux/sched.h>
#include <asm/mach/arch.h>
#include <asm/mach/map.h>
#include <linux/sysdev.h>
#include <linux/serial_core.h>
#include <linux/platform_device.h>
+#include <linux/sched.h>
#include <asm/mach/arch.h>
#include <asm/mach/map.h>
return s5p_gatectrl(S5P_CLKGATE_IP3, clk, enable);
}
-static int s5pv210_clk_ip4_ctrl(struct clk *clk, int enable)
-{
- return s5p_gatectrl(S5P_CLKGATE_IP4, clk, enable);
-}
-
static int s5pv210_clk_mask0_ctrl(struct clk *clk, int enable)
{
return s5p_gatectrl(S5P_CLK_SRC_MASK0, clk, enable);
.enable = s5pv210_clk_ip0_ctrl,
.ctrlbit = (1<<29),
}, {
+ .name = "fimc",
+ .id = 0,
+ .parent = &clk_hclk_dsys.clk,
+ .enable = s5pv210_clk_ip0_ctrl,
+ .ctrlbit = (1 << 24),
+ }, {
+ .name = "fimc",
+ .id = 1,
+ .parent = &clk_hclk_dsys.clk,
+ .enable = s5pv210_clk_ip0_ctrl,
+ .ctrlbit = (1 << 25),
+ }, {
+ .name = "fimc",
+ .id = 2,
+ .parent = &clk_hclk_dsys.clk,
+ .enable = s5pv210_clk_ip0_ctrl,
+ .ctrlbit = (1 << 26),
+ }, {
.name = "otg",
.id = -1,
.parent = &clk_hclk_psys.clk,
.id = 1,
.parent = &clk_pclk_psys.clk,
.enable = s5pv210_clk_ip3_ctrl,
- .ctrlbit = (1<<8),
+ .ctrlbit = (1 << 10),
}, {
.name = "i2c",
.id = 2,
#include <linux/io.h>
#include <linux/sysdev.h>
#include <linux/platform_device.h>
+#include <linux/sched.h>
#include <asm/mach/arch.h>
#include <asm/mach/map.h>
{
.virtual = (unsigned long)S5P_VA_SYSTIMER,
.pfn = __phys_to_pfn(S5PV210_PA_SYSTIMER),
- .length = SZ_1M,
+ .length = SZ_4K,
.type = MT_DEVICE,
}, {
.virtual = (unsigned long)VA_VIC2,
#
# Common objects
-obj-y := timer.o console.o clock.o
+obj-y := timer.o console.o clock.o pm_runtime.o
# CPU objects
obj-$(CONFIG_ARCH_SH7367) += setup-sh7367.o clock-sh7367.o intc-sh7367.o
#include <linux/platform_device.h>
#include <linux/delay.h>
#include <linux/mfd/sh_mobile_sdhi.h>
+#include <linux/mfd/tmio.h>
#include <linux/mmc/host.h>
#include <linux/mtd/mtd.h>
#include <linux/mtd/partitions.h>
#include <linux/sh_clk.h>
#include <linux/gpio.h>
#include <linux/input.h>
+#include <linux/leds.h>
#include <linux/input/sh_keysc.h>
#include <linux/usb/r8a66597.h>
.dma_slave_tx = SHDMA_SLAVE_SDHI1_TX,
.dma_slave_rx = SHDMA_SLAVE_SDHI1_RX,
.tmio_ocr_mask = MMC_VDD_165_195,
+ .tmio_flags = TMIO_MMC_WRPROTECT_DISABLE,
};
static struct resource sdhi1_resources[] = {
static struct platform_device fsi_device = {
.name = "sh_fsi2",
- .id = 0,
+ .id = -1,
.num_resources = ARRAY_SIZE(fsi_resources),
.resource = fsi_resources,
.dev = {
},
};
+static struct gpio_led ap4evb_leds[] = {
+ {
+ .name = "led4",
+ .gpio = GPIO_PORT185,
+ .default_state = LEDS_GPIO_DEFSTATE_ON,
+ },
+ {
+ .name = "led2",
+ .gpio = GPIO_PORT186,
+ .default_state = LEDS_GPIO_DEFSTATE_ON,
+ },
+ {
+ .name = "led3",
+ .gpio = GPIO_PORT187,
+ .default_state = LEDS_GPIO_DEFSTATE_ON,
+ },
+ {
+ .name = "led1",
+ .gpio = GPIO_PORT188,
+ .default_state = LEDS_GPIO_DEFSTATE_ON,
+ }
+};
+
+static struct gpio_led_platform_data ap4evb_leds_pdata = {
+ .num_leds = ARRAY_SIZE(ap4evb_leds),
+ .leds = ap4evb_leds,
+};
+
+static struct platform_device leds_device = {
+ .name = "leds-gpio",
+ .id = 0,
+ .dev = {
+ .platform_data = &ap4evb_leds_pdata,
+ },
+};
+
static struct platform_device *ap4evb_devices[] __initdata = {
+ &leds_device,
&nor_flash_device,
&smc911x_device,
&sdhi0_device,
gpio_request(GPIO_FN_CS5A, NULL);
gpio_request(GPIO_FN_IRQ6_39, NULL);
- /* enable LED 1 - 4 */
- gpio_request(GPIO_PORT185, NULL);
- gpio_request(GPIO_PORT186, NULL);
- gpio_request(GPIO_PORT187, NULL);
- gpio_request(GPIO_PORT188, NULL);
- gpio_direction_output(GPIO_PORT185, 1);
- gpio_direction_output(GPIO_PORT186, 1);
- gpio_direction_output(GPIO_PORT187, 1);
- gpio_direction_output(GPIO_PORT188, 1);
- gpio_export(GPIO_PORT185, 0);
- gpio_export(GPIO_PORT186, 0);
- gpio_export(GPIO_PORT187, 0);
- gpio_export(GPIO_PORT188, 0);
-
/* enable Debug switch (S6) */
gpio_request(GPIO_PORT32, NULL);
gpio_request(GPIO_PORT33, NULL);
struct clk pllc2_clk = {
.ops = &pllc2_clk_ops,
- .flags = CLK_ENABLE_ON_INIT,
.parent = &extal1_div2_clk,
.freq_table = pllc2_freq_table,
.parent_table = pllc2_parent,
enum { MSTP001,
MSTP131, MSTP130,
- MSTP129, MSTP128,
+ MSTP129, MSTP128, MSTP127, MSTP126,
MSTP118, MSTP117, MSTP116,
MSTP106, MSTP101, MSTP100,
MSTP223,
[MSTP130] = MSTP(&div4_clks[DIV4_B], SMSTPCR1, 30, 0), /* VEU2 */
[MSTP129] = MSTP(&div4_clks[DIV4_B], SMSTPCR1, 29, 0), /* VEU1 */
[MSTP128] = MSTP(&div4_clks[DIV4_B], SMSTPCR1, 28, 0), /* VEU0 */
+ [MSTP127] = MSTP(&div4_clks[DIV4_B], SMSTPCR1, 27, 0), /* CEU */
+ [MSTP126] = MSTP(&div4_clks[DIV4_B], SMSTPCR1, 26, 0), /* CSI2 */
[MSTP118] = MSTP(&div4_clks[DIV4_B], SMSTPCR1, 18, 0), /* DSITX */
[MSTP117] = MSTP(&div4_clks[DIV4_B], SMSTPCR1, 17, 0), /* LCDC1 */
[MSTP116] = MSTP(&div6_clks[DIV6_SUB], SMSTPCR1, 16, 0), /* IIC0 */
[MSTP201] = MSTP(&div6_clks[DIV6_SUB], SMSTPCR2, 1, 0), /* SCIFA3 */
[MSTP200] = MSTP(&div6_clks[DIV6_SUB], SMSTPCR2, 0, 0), /* SCIFA4 */
[MSTP329] = MSTP(&r_clk, SMSTPCR3, 29, 0), /* CMT10 */
- [MSTP328] = MSTP(&div6_clks[DIV6_SPU], SMSTPCR3, 28, CLK_ENABLE_ON_INIT), /* FSIA */
+ [MSTP328] = MSTP(&div6_clks[DIV6_SPU], SMSTPCR3, 28, 0), /* FSIA */
[MSTP323] = MSTP(&div6_clks[DIV6_SUB], SMSTPCR3, 23, 0), /* IIC1 */
[MSTP322] = MSTP(&div6_clks[DIV6_SUB], SMSTPCR3, 22, 0), /* USB0 */
[MSTP314] = MSTP(&div4_clks[DIV4_HP], SMSTPCR3, 14, 0), /* SDHI0 */
CLKDEV_DEV_ID("uio_pdrv_genirq.3", &mstp_clks[MSTP130]), /* VEU2 */
CLKDEV_DEV_ID("uio_pdrv_genirq.2", &mstp_clks[MSTP129]), /* VEU1 */
CLKDEV_DEV_ID("uio_pdrv_genirq.1", &mstp_clks[MSTP128]), /* VEU0 */
+ CLKDEV_DEV_ID("sh_mobile_ceu.0", &mstp_clks[MSTP127]), /* CEU */
+ CLKDEV_DEV_ID("sh-mobile-csi2.0", &mstp_clks[MSTP126]), /* CSI2 */
CLKDEV_DEV_ID("sh-mipi-dsi.0", &mstp_clks[MSTP118]), /* DSITX */
CLKDEV_DEV_ID("sh_mobile_lcdc_fb.1", &mstp_clks[MSTP117]), /* LCDC1 */
CLKDEV_DEV_ID("i2c-sh_mobile.0", &mstp_clks[MSTP116]), /* IIC0 */
/*
- * SH-Mobile Timer
+ * SH-Mobile Clock Framework
*
* Copyright (C) 2010 Magnus Damm
*
+ * Used together with arch/arm/common/clkdev.c and drivers/sh/clk.c.
+ *
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; version 2 of the License.
--- /dev/null
+/*
+ * arch/arm/mach-shmobile/pm_runtime.c
+ *
+ * Runtime PM support code for SuperH Mobile ARM
+ *
+ * Copyright (C) 2009-2010 Magnus Damm
+ *
+ * This file is subject to the terms and conditions of the GNU General Public
+ * License. See the file "COPYING" in the main directory of this archive
+ * for more details.
+ */
+
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/io.h>
+#include <linux/pm_runtime.h>
+#include <linux/platform_device.h>
+#include <linux/clk.h>
+#include <linux/sh_clk.h>
+#include <linux/bitmap.h>
+
+#ifdef CONFIG_PM_RUNTIME
+#define BIT_ONCE 0
+#define BIT_ACTIVE 1
+#define BIT_CLK_ENABLED 2
+
+struct pm_runtime_data {
+ unsigned long flags;
+ struct clk *clk;
+};
+
+static void __devres_release(struct device *dev, void *res)
+{
+ struct pm_runtime_data *prd = res;
+
+ dev_dbg(dev, "__devres_release()\n");
+
+ if (test_bit(BIT_CLK_ENABLED, &prd->flags))
+ clk_disable(prd->clk);
+
+ if (test_bit(BIT_ACTIVE, &prd->flags))
+ clk_put(prd->clk);
+}
+
+static struct pm_runtime_data *__to_prd(struct device *dev)
+{
+ return devres_find(dev, __devres_release, NULL, NULL);
+}
+
+static void platform_pm_runtime_init(struct device *dev,
+ struct pm_runtime_data *prd)
+{
+ if (prd && !test_and_set_bit(BIT_ONCE, &prd->flags)) {
+ prd->clk = clk_get(dev, NULL);
+ if (!IS_ERR(prd->clk)) {
+ set_bit(BIT_ACTIVE, &prd->flags);
+ dev_info(dev, "clocks managed by runtime pm\n");
+ }
+ }
+}
+
+static void platform_pm_runtime_bug(struct device *dev,
+ struct pm_runtime_data *prd)
+{
+ if (prd && !test_and_set_bit(BIT_ONCE, &prd->flags))
+ dev_err(dev, "runtime pm suspend before resume\n");
+}
+
+int platform_pm_runtime_suspend(struct device *dev)
+{
+ struct pm_runtime_data *prd = __to_prd(dev);
+
+ dev_dbg(dev, "platform_pm_runtime_suspend()\n");
+
+ platform_pm_runtime_bug(dev, prd);
+
+ if (prd && test_bit(BIT_ACTIVE, &prd->flags)) {
+ clk_disable(prd->clk);
+ clear_bit(BIT_CLK_ENABLED, &prd->flags);
+ }
+
+ return 0;
+}
+
+int platform_pm_runtime_resume(struct device *dev)
+{
+ struct pm_runtime_data *prd = __to_prd(dev);
+
+ dev_dbg(dev, "platform_pm_runtime_resume()\n");
+
+ platform_pm_runtime_init(dev, prd);
+
+ if (prd && test_bit(BIT_ACTIVE, &prd->flags)) {
+ clk_enable(prd->clk);
+ set_bit(BIT_CLK_ENABLED, &prd->flags);
+ }
+
+ return 0;
+}
+
+int platform_pm_runtime_idle(struct device *dev)
+{
+ /* suspend synchronously to disable clocks immediately */
+ return pm_runtime_suspend(dev);
+}
+
+static int platform_bus_notify(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct device *dev = data;
+ struct pm_runtime_data *prd;
+
+ dev_dbg(dev, "platform_bus_notify() %ld !\n", action);
+
+ if (action == BUS_NOTIFY_BIND_DRIVER) {
+ prd = devres_alloc(__devres_release, sizeof(*prd), GFP_KERNEL);
+ if (prd)
+ devres_add(dev, prd);
+ else
+ dev_err(dev, "unable to alloc memory for runtime pm\n");
+ }
+
+ return 0;
+}
+
+#else /* CONFIG_PM_RUNTIME */
+
+static int platform_bus_notify(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+ struct device *dev = data;
+ struct clk *clk;
+
+ dev_dbg(dev, "platform_bus_notify() %ld !\n", action);
+
+ switch (action) {
+ case BUS_NOTIFY_BIND_DRIVER:
+ clk = clk_get(dev, NULL);
+ if (!IS_ERR(clk)) {
+ clk_enable(clk);
+ clk_put(clk);
+ dev_info(dev, "runtime pm disabled, clock forced on\n");
+ }
+ break;
+ case BUS_NOTIFY_UNBOUND_DRIVER:
+ clk = clk_get(dev, NULL);
+ if (!IS_ERR(clk)) {
+ clk_disable(clk);
+ clk_put(clk);
+ dev_info(dev, "runtime pm disabled, clock forced off\n");
+ }
+ break;
+ }
+
+ return 0;
+}
+
+#endif /* CONFIG_PM_RUNTIME */
+
+static struct notifier_block platform_bus_notifier = {
+ .notifier_call = platform_bus_notify
+};
+
+static int __init sh_pm_runtime_init(void)
+{
+ bus_register_notifier(&platform_bus_type, &platform_bus_notifier);
+ return 0;
+}
+core_initcall(sh_pm_runtime_init);
extern int gpio_get_value(unsigned gpio);
extern void gpio_set_value(unsigned gpio, int value);
+#define gpio_get_value_cansleep gpio_get_value
+#define gpio_set_value_cansleep gpio_set_value
+
/* wrappers to sleep-enable the previous two functions */
static inline unsigned gpio_to_irq(unsigned gpio)
{
}
#if 0
-static void ct_ca9x4_timer_init(void)
+static void __init ct_ca9x4_timer_init(void)
{
writel(0, MMIO_P2V(CT_CA9X4_TIMER0) + TIMER_CTRL);
writel(0, MMIO_P2V(CT_CA9X4_TIMER1) + TIMER_CTRL);
.resource = pmu_resources,
};
-static void ct_ca9x4_init(void)
+static void __init ct_ca9x4_init(void)
{
int i;
#ifdef CONFIG_CACHE_L2X0
- l2x0_init(MMIO_P2V(CT_CA9X4_L2CC), 0x00000000, 0xfe0fffff);
+ void __iomem *l2x0_base = MMIO_P2V(CT_CA9X4_L2CC);
+
+ /* set RAM latencies to 1 cycle for this core tile. */
+ writel(0, l2x0_base + L2X0_TAG_LATENCY_CTRL);
+ writel(0, l2x0_base + L2X0_DATA_LATENCY_CTRL);
+
+ l2x0_init(l2x0_base, 0x00400000, 0xfe0fffff);
#endif
clkdev_add_table(lookups, ARRAY_SIZE(lookups));
}
-static void v2m_timer_init(void)
+static void __init v2m_timer_init(void)
{
writel(0, MMIO_P2V(V2M_TIMER0) + TIMER_CTRL);
writel(0, MMIO_P2V(V2M_TIMER1) + TIMER_CTRL);
# ARMv6k
config CPU_32v6K
bool "Support ARM V6K processor extensions" if !SMP
- depends on CPU_V6
+ depends on CPU_V6 || CPU_V7
default y if SMP && !(ARCH_MX3 || ARCH_OMAP2)
help
Say Y here if your ARMv6 processor supports the 'K' extension.
if (ai_usermode & UM_SIGNAL)
force_sig(SIGBUS, current);
- else
- set_cr(cr_no_alignment);
+ else {
+ /*
+ * We're about to disable the alignment trap and return to
+ * user space. But if an interrupt occurs before actually
+ * reaching user space, then the IRQ vector entry code will
+ * notice that we were still in kernel space and therefore
+ * the alignment trap won't be re-enabled in that case as it
+ * is presumed to be always on from kernel space.
+ * Let's prevent that race by disabling interrupts here (they
+ * are disabled on the way back to user space anyway in
+ * entry-common.S) and disable the alignment trap only if
+ * there is no work pending for this thread.
+ */
+ raw_local_irq_disable();
+ if (!(current_thread_info()->flags & _TIF_WORK_MASK))
+ set_cr(cr_no_alignment);
+ }
return 0;
}
}
} while (size -= PAGE_SIZE);
+ dsb();
+
return (void *)c->vm_start;
}
return NULL;
/*
* Don't allow RAM to be mapped - this causes problems with ARMv6+
*/
- if (WARN_ON(pfn_valid(pfn)))
- return NULL;
+ if (pfn_valid(pfn)) {
+ printk(KERN_WARNING "BUG: Your driver calls ioremap() on system memory. This leads\n"
+ KERN_WARNING "to architecturally unpredictable behaviour on ARMv6+, and ioremap()\n"
+ KERN_WARNING "will fail in the next kernel release. Please fix your driver.\n");
+ WARN_ON(1);
+ }
type = get_mem_type(mtype);
if (!type)
#include <linux/nodemask.h>
#include <linux/memblock.h>
#include <linux/sort.h>
+#include <linux/fs.h>
#include <asm/cputype.h>
#include <asm/sections.h>
.domain = DOMAIN_USER,
},
[MT_MEMORY] = {
+ .prot_pte = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY |
+ L_PTE_WRITE | L_PTE_EXEC,
+ .prot_l1 = PMD_TYPE_TABLE,
.prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE,
.domain = DOMAIN_KERNEL,
},
.domain = DOMAIN_KERNEL,
},
[MT_MEMORY_NONCACHED] = {
+ .prot_pte = L_PTE_PRESENT | L_PTE_YOUNG | L_PTE_DIRTY |
+ L_PTE_WRITE | L_PTE_EXEC | L_PTE_MT_BUFFERABLE,
+ .prot_l1 = PMD_TYPE_TABLE,
.prot_sect = PMD_TYPE_SECT | PMD_SECT_AP_WRITE,
.domain = DOMAIN_KERNEL,
},
* Enable CPU-specific coherency if supported.
* (Only available on XSC3 at the moment.)
*/
- if (arch_is_coherent() && cpu_is_xsc3())
+ if (arch_is_coherent() && cpu_is_xsc3()) {
mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
-
+ mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
+ mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
+ mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
+ }
/*
* ARMv6 and above have extended page tables.
*/
mem_types[MT_DEVICE_CACHED].prot_sect |= PMD_SECT_S;
mem_types[MT_DEVICE_CACHED].prot_pte |= L_PTE_SHARED;
mem_types[MT_MEMORY].prot_sect |= PMD_SECT_S;
+ mem_types[MT_MEMORY].prot_pte |= L_PTE_SHARED;
mem_types[MT_MEMORY_NONCACHED].prot_sect |= PMD_SECT_S;
+ mem_types[MT_MEMORY_NONCACHED].prot_pte |= L_PTE_SHARED;
#endif
}
mem_types[MT_LOW_VECTORS].prot_l1 |= ecc_mask;
mem_types[MT_HIGH_VECTORS].prot_l1 |= ecc_mask;
mem_types[MT_MEMORY].prot_sect |= ecc_mask | cp->pmd;
+ mem_types[MT_MEMORY].prot_pte |= kern_pgprot;
+ mem_types[MT_MEMORY_NONCACHED].prot_sect |= ecc_mask;
mem_types[MT_ROM].prot_sect |= cp->pmd;
switch (cp->pmd) {
}
}
+#ifdef CONFIG_ARM_DMA_MEM_BUFFERABLE
+pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
+ unsigned long size, pgprot_t vma_prot)
+{
+ if (!pfn_valid(pfn))
+ return pgprot_noncached(vma_prot);
+ else if (file->f_flags & O_SYNC)
+ return pgprot_writecombine(vma_prot);
+ return vma_prot;
+}
+EXPORT_SYMBOL(phys_mem_access_prot);
+#endif
+
#define vectors_base() (vectors_high() ? 0xffff0000 : 0)
static void __init *early_alloc(unsigned long sz)
* It is assumed that:
* - cache type register is implemented
*/
-__v7_setup:
+__v7_ca9mp_setup:
#ifdef CONFIG_SMP
mrc p15, 0, r0, c1, c0, 1
tst r0, #(1 << 6) @ SMP/nAMP mode enabled?
orreq r0, r0, #(1 << 6) | (1 << 0) @ Enable SMP/nAMP mode and
mcreq p15, 0, r0, c1, c0, 1 @ TLB ops broadcasting
#endif
+__v7_setup:
adr r12, __v7_setup_stack @ the local stack
stmia r12, {r0-r5, r7, r9, r11, lr}
bl v7_flush_dcache_all
mrc p15, 0, r0, c0, c0, 0 @ read main ID register
and r10, r0, #0xff000000 @ ARM?
teq r10, #0x41000000
- bne 2f
+ bne 3f
and r5, r0, #0x00f00000 @ variant
and r6, r0, #0x0000000f @ revision
- orr r0, r6, r5, lsr #20-4 @ combine variant and revision
+ orr r6, r6, r5, lsr #20-4 @ combine variant and revision
+ ubfx r0, r0, #4, #12 @ primary part number
+ /* Cortex-A8 Errata */
+ ldr r10, =0x00000c08 @ Cortex-A8 primary part number
+ teq r0, r10
+ bne 2f
#ifdef CONFIG_ARM_ERRATA_430973
teq r5, #0x00100000 @ only present in r1p*
mrceq p15, 0, r10, c1, c0, 1 @ read aux control register
mcreq p15, 0, r10, c1, c0, 1 @ write aux control register
#endif
#ifdef CONFIG_ARM_ERRATA_458693
- teq r0, #0x20 @ only present in r2p0
+ teq r6, #0x20 @ only present in r2p0
mrceq p15, 0, r10, c1, c0, 1 @ read aux control register
orreq r10, r10, #(1 << 5) @ set L1NEON to 1
orreq r10, r10, #(1 << 9) @ set PLDNOP to 1
mcreq p15, 0, r10, c1, c0, 1 @ write aux control register
#endif
#ifdef CONFIG_ARM_ERRATA_460075
- teq r0, #0x20 @ only present in r2p0
+ teq r6, #0x20 @ only present in r2p0
mrceq p15, 1, r10, c9, c0, 2 @ read L2 cache aux ctrl register
tsteq r10, #1 << 22
orreq r10, r10, #(1 << 22) @ set the Write Allocate disable bit
mcreq p15, 1, r10, c9, c0, 2 @ write the L2 cache aux ctrl register
#endif
+ b 3f
-2: mov r10, #0
+ /* Cortex-A9 Errata */
+2: ldr r10, =0x00000c09 @ Cortex-A9 primary part number
+ teq r0, r10
+ bne 3f
+#ifdef CONFIG_ARM_ERRATA_742230
+ cmp r6, #0x22 @ only present up to r2p2
+ mrcle p15, 0, r10, c15, c0, 1 @ read diagnostic register
+ orrle r10, r10, #1 << 4 @ set bit #4
+ mcrle p15, 0, r10, c15, c0, 1 @ write diagnostic register
+#endif
+#ifdef CONFIG_ARM_ERRATA_742231
+ teq r6, #0x20 @ present in r2p0
+ teqne r6, #0x21 @ present in r2p1
+ teqne r6, #0x22 @ present in r2p2
+ mrceq p15, 0, r10, c15, c0, 1 @ read diagnostic register
+ orreq r10, r10, #1 << 12 @ set bit #12
+ orreq r10, r10, #1 << 22 @ set bit #22
+ mcreq p15, 0, r10, c15, c0, 1 @ write diagnostic register
+#endif
+#ifdef CONFIG_ARM_ERRATA_743622
+ teq r6, #0x20 @ present in r2p0
+ teqne r6, #0x21 @ present in r2p1
+ teqne r6, #0x22 @ present in r2p2
+ mrceq p15, 0, r10, c15, c0, 1 @ read diagnostic register
+ orreq r10, r10, #1 << 6 @ set bit #6
+ mcreq p15, 0, r10, c15, c0, 1 @ write diagnostic register
+#endif
+
+3: mov r10, #0
#ifdef HARVARD_CACHE
mcr p15, 0, r10, c7, c5, 0 @ I+BTB cache invalidate
#endif
.section ".proc.info.init", #alloc, #execinstr
+ .type __v7_ca9mp_proc_info, #object
+__v7_ca9mp_proc_info:
+ .long 0x410fc090 @ Required ID value
+ .long 0xff0ffff0 @ Mask for ID
+ .long PMD_TYPE_SECT | \
+ PMD_SECT_AP_WRITE | \
+ PMD_SECT_AP_READ | \
+ PMD_FLAGS
+ .long PMD_TYPE_SECT | \
+ PMD_SECT_XN | \
+ PMD_SECT_AP_WRITE | \
+ PMD_SECT_AP_READ
+ b __v7_ca9mp_setup
+ .long cpu_arch_name
+ .long cpu_elf_name
+ .long HWCAP_SWP|HWCAP_HALF|HWCAP_THUMB|HWCAP_FAST_MULT|HWCAP_EDSP|HWCAP_TLS
+ .long cpu_v7_name
+ .long v7_processor_functions
+ .long v7wbi_tlb_fns
+ .long v6_user_fns
+ .long v7_cache_fns
+ .size __v7_ca9mp_proc_info, . - __v7_ca9mp_proc_info
+
/*
* Match any ARMv7 processor core.
*/
oprofilefs.o oprofile_stats.o \
timer_int.o )
+ifeq ($(CONFIG_HW_PERF_EVENTS),y)
+DRIVER_OBJS += $(addprefix ../../../drivers/oprofile/, oprofile_perf.o)
+endif
+
oprofile-y := $(DRIVER_OBJS) common.o
#include <asm/ptrace.h>
#ifdef CONFIG_HW_PERF_EVENTS
-/*
- * Per performance monitor configuration as set via oprofilefs.
- */
-struct op_counter_config {
- unsigned long count;
- unsigned long enabled;
- unsigned long event;
- unsigned long unit_mask;
- unsigned long kernel;
- unsigned long user;
- struct perf_event_attr attr;
-};
-
-static int op_arm_enabled;
-static DEFINE_MUTEX(op_arm_mutex);
-
-static struct op_counter_config *counter_config;
-static struct perf_event **perf_events[nr_cpumask_bits];
-static int perf_num_counters;
-
-/*
- * Overflow callback for oprofile.
- */
-static void op_overflow_handler(struct perf_event *event, int unused,
- struct perf_sample_data *data, struct pt_regs *regs)
-{
- int id;
- u32 cpu = smp_processor_id();
-
- for (id = 0; id < perf_num_counters; ++id)
- if (perf_events[cpu][id] == event)
- break;
-
- if (id != perf_num_counters)
- oprofile_add_sample(regs, id);
- else
- pr_warning("oprofile: ignoring spurious overflow "
- "on cpu %u\n", cpu);
-}
-
-/*
- * Called by op_arm_setup to create perf attributes to mirror the oprofile
- * settings in counter_config. Attributes are created as `pinned' events and
- * so are permanently scheduled on the PMU.
- */
-static void op_perf_setup(void)
-{
- int i;
- u32 size = sizeof(struct perf_event_attr);
- struct perf_event_attr *attr;
-
- for (i = 0; i < perf_num_counters; ++i) {
- attr = &counter_config[i].attr;
- memset(attr, 0, size);
- attr->type = PERF_TYPE_RAW;
- attr->size = size;
- attr->config = counter_config[i].event;
- attr->sample_period = counter_config[i].count;
- attr->pinned = 1;
- }
-}
-
-static int op_create_counter(int cpu, int event)
-{
- int ret = 0;
- struct perf_event *pevent;
-
- if (!counter_config[event].enabled || (perf_events[cpu][event] != NULL))
- return ret;
-
- pevent = perf_event_create_kernel_counter(&counter_config[event].attr,
- cpu, -1,
- op_overflow_handler);
-
- if (IS_ERR(pevent)) {
- ret = PTR_ERR(pevent);
- } else if (pevent->state != PERF_EVENT_STATE_ACTIVE) {
- pr_warning("oprofile: failed to enable event %d "
- "on CPU %d\n", event, cpu);
- ret = -EBUSY;
- } else {
- perf_events[cpu][event] = pevent;
- }
-
- return ret;
-}
-
-static void op_destroy_counter(int cpu, int event)
-{
- struct perf_event *pevent = perf_events[cpu][event];
-
- if (pevent) {
- perf_event_release_kernel(pevent);
- perf_events[cpu][event] = NULL;
- }
-}
-
-/*
- * Called by op_arm_start to create active perf events based on the
- * perviously configured attributes.
- */
-static int op_perf_start(void)
-{
- int cpu, event, ret = 0;
-
- for_each_online_cpu(cpu) {
- for (event = 0; event < perf_num_counters; ++event) {
- ret = op_create_counter(cpu, event);
- if (ret)
- goto out;
- }
- }
-
-out:
- return ret;
-}
-
-/*
- * Called by op_arm_stop at the end of a profiling run.
- */
-static void op_perf_stop(void)
+char *op_name_from_perf_id(void)
{
- int cpu, event;
+ enum arm_perf_pmu_ids id = armpmu_get_pmu_id();
- for_each_online_cpu(cpu)
- for (event = 0; event < perf_num_counters; ++event)
- op_destroy_counter(cpu, event);
-}
-
-
-static char *op_name_from_perf_id(enum arm_perf_pmu_ids id)
-{
switch (id) {
case ARM_PERF_PMU_ID_XSCALE1:
return "arm/xscale1";
}
}
-static int op_arm_create_files(struct super_block *sb, struct dentry *root)
-{
- unsigned int i;
-
- for (i = 0; i < perf_num_counters; i++) {
- struct dentry *dir;
- char buf[4];
-
- snprintf(buf, sizeof buf, "%d", i);
- dir = oprofilefs_mkdir(sb, root, buf);
- oprofilefs_create_ulong(sb, dir, "enabled", &counter_config[i].enabled);
- oprofilefs_create_ulong(sb, dir, "event", &counter_config[i].event);
- oprofilefs_create_ulong(sb, dir, "count", &counter_config[i].count);
- oprofilefs_create_ulong(sb, dir, "unit_mask", &counter_config[i].unit_mask);
- oprofilefs_create_ulong(sb, dir, "kernel", &counter_config[i].kernel);
- oprofilefs_create_ulong(sb, dir, "user", &counter_config[i].user);
- }
-
- return 0;
-}
-
-static int op_arm_setup(void)
-{
- spin_lock(&oprofilefs_lock);
- op_perf_setup();
- spin_unlock(&oprofilefs_lock);
- return 0;
-}
-
-static int op_arm_start(void)
-{
- int ret = -EBUSY;
-
- mutex_lock(&op_arm_mutex);
- if (!op_arm_enabled) {
- ret = 0;
- op_perf_start();
- op_arm_enabled = 1;
- }
- mutex_unlock(&op_arm_mutex);
- return ret;
-}
-
-static void op_arm_stop(void)
-{
- mutex_lock(&op_arm_mutex);
- if (op_arm_enabled)
- op_perf_stop();
- op_arm_enabled = 0;
- mutex_unlock(&op_arm_mutex);
-}
-
-#ifdef CONFIG_PM
-static int op_arm_suspend(struct platform_device *dev, pm_message_t state)
-{
- mutex_lock(&op_arm_mutex);
- if (op_arm_enabled)
- op_perf_stop();
- mutex_unlock(&op_arm_mutex);
- return 0;
-}
-
-static int op_arm_resume(struct platform_device *dev)
-{
- mutex_lock(&op_arm_mutex);
- if (op_arm_enabled && op_perf_start())
- op_arm_enabled = 0;
- mutex_unlock(&op_arm_mutex);
- return 0;
-}
-
-static struct platform_driver oprofile_driver = {
- .driver = {
- .name = "arm-oprofile",
- },
- .resume = op_arm_resume,
- .suspend = op_arm_suspend,
-};
-
-static struct platform_device *oprofile_pdev;
-
-static int __init init_driverfs(void)
-{
- int ret;
-
- ret = platform_driver_register(&oprofile_driver);
- if (ret)
- goto out;
-
- oprofile_pdev = platform_device_register_simple(
- oprofile_driver.driver.name, 0, NULL, 0);
- if (IS_ERR(oprofile_pdev)) {
- ret = PTR_ERR(oprofile_pdev);
- platform_driver_unregister(&oprofile_driver);
- }
-
-out:
- return ret;
-}
-
-static void exit_driverfs(void)
-{
- platform_device_unregister(oprofile_pdev);
- platform_driver_unregister(&oprofile_driver);
-}
-#else
-static int __init init_driverfs(void) { return 0; }
-#define exit_driverfs() do { } while (0)
-#endif /* CONFIG_PM */
-
static int report_trace(struct stackframe *frame, void *d)
{
unsigned int *depth = d;
int __init oprofile_arch_init(struct oprofile_operations *ops)
{
- int cpu, ret = 0;
-
- perf_num_counters = armpmu_get_max_events();
-
- counter_config = kcalloc(perf_num_counters,
- sizeof(struct op_counter_config), GFP_KERNEL);
-
- if (!counter_config) {
- pr_info("oprofile: failed to allocate %d "
- "counters\n", perf_num_counters);
- return -ENOMEM;
- }
-
- ret = init_driverfs();
- if (ret) {
- kfree(counter_config);
- return ret;
- }
-
- for_each_possible_cpu(cpu) {
- perf_events[cpu] = kcalloc(perf_num_counters,
- sizeof(struct perf_event *), GFP_KERNEL);
- if (!perf_events[cpu]) {
- pr_info("oprofile: failed to allocate %d perf events "
- "for cpu %d\n", perf_num_counters, cpu);
- while (--cpu >= 0)
- kfree(perf_events[cpu]);
- return -ENOMEM;
- }
- }
-
ops->backtrace = arm_backtrace;
- ops->create_files = op_arm_create_files;
- ops->setup = op_arm_setup;
- ops->start = op_arm_start;
- ops->stop = op_arm_stop;
- ops->shutdown = op_arm_stop;
- ops->cpu_type = op_name_from_perf_id(armpmu_get_pmu_id());
-
- if (!ops->cpu_type)
- ret = -ENODEV;
- else
- pr_info("oprofile: using %s\n", ops->cpu_type);
- return ret;
+ return oprofile_perf_init(ops);
}
-void oprofile_arch_exit(void)
+void __exit oprofile_arch_exit(void)
{
- int cpu, id;
- struct perf_event *event;
-
- if (*perf_events) {
- exit_driverfs();
- for_each_possible_cpu(cpu) {
- for (id = 0; id < perf_num_counters; ++id) {
- event = perf_events[cpu][id];
- if (event != NULL)
- perf_event_release_kernel(event);
- }
- kfree(perf_events[cpu]);
- }
- }
-
- if (counter_config)
- kfree(counter_config);
+ oprofile_perf_exit();
}
#else
int __init oprofile_arch_init(struct oprofile_operations *ops)
pr_info("oprofile: hardware counters not available\n");
return -ENODEV;
}
-void oprofile_arch_exit(void) {}
+void __exit oprofile_arch_exit(void) {}
#endif /* CONFIG_HW_PERF_EVENTS */
config ARCH_MX5
bool "MX5-based"
select CPU_V7
+ select ARM_L1_CACHE_SHIFT_6
help
This enables support for systems based on the Freescale i.MX51 family
* mach-mx5/eukrea_mbimx51-baseboard.c for cpuimx51
*/
-extern void eukrea_mbimx25_baseboard_init(void);
+extern void eukrea_mbimxsd25_baseboard_init(void);
extern void eukrea_mbimx27_baseboard_init(void);
-extern void eukrea_mbimx35_baseboard_init(void);
+extern void eukrea_mbimxsd35_baseboard_init(void);
extern void eukrea_mbimx51_baseboard_init(void);
#endif
return -EAGAIN;
for (i = 0; i < 4; i++) {
- v = is_idle ? __raw_readl(TZIC_ENSET0(i)) : wakeup_intr[i];
- __raw_writel(v, TZIC_WAKEUP0(i));
+ v = is_idle ? __raw_readl(tzic_base + TZIC_ENSET0(i)) :
+ wakeup_intr[i];
+ __raw_writel(v, tzic_base + TZIC_WAKEUP0(i));
}
return 0;
/*
- * linux/arch/arm/mach-nomadik/timer.c
+ * linux/arch/arm/plat-nomadik/timer.c
*
* Copyright (C) 2008 STMicroelectronics
* Copyright (C) 2010 Alessandro Rubini
cr = readl(mtu_base + MTU_CR(1));
writel(0, mtu_base + MTU_LR(1));
writel(cr | MTU_CRn_ENA, mtu_base + MTU_CR(1));
- writel(0x2, mtu_base + MTU_IMSC);
+ writel(1 << 1, mtu_base + MTU_IMSC);
break;
case CLOCK_EVT_MODE_SHUTDOWN:
case CLOCK_EVT_MODE_UNUSED:
{
unsigned long rate;
struct clk *clk0;
- struct clk *clk1;
- u32 cr;
+ u32 cr = MTU_CRn_32BITS;
clk0 = clk_get_sys("mtu0", NULL);
BUG_ON(IS_ERR(clk0));
- clk1 = clk_get_sys("mtu1", NULL);
- BUG_ON(IS_ERR(clk1));
-
clk_enable(clk0);
- clk_enable(clk1);
/*
- * Tick rate is 2.4MHz for Nomadik and 110MHz for ux500:
- * use a divide-by-16 counter if it's more than 16MHz
+ * Tick rate is 2.4MHz for Nomadik and 2.4Mhz, 100MHz or 133 MHz
+ * for ux500.
+ * Use a divide-by-16 counter if the tick rate is more than 32MHz.
+ * At 32 MHz, the timer (with 32 bit counter) can be programmed
+ * to wake-up at a max 127s a head in time. Dividing a 2.4 MHz timer
+ * with 16 gives too low timer resolution.
*/
- cr = MTU_CRn_32BITS;;
rate = clk_get_rate(clk0);
- if (rate > 16 << 20) {
+ if (rate > 32000000) {
rate /= 16;
cr |= MTU_CRn_PRESCALE_16;
} else {
pr_err("timer: failed to initialize clock source %s\n",
nmdk_clksrc.name);
- /* Timer 1 is used for events, fix according to rate */
- cr = MTU_CRn_32BITS;
- rate = clk_get_rate(clk1);
- if (rate > 16 << 20) {
- rate /= 16;
- cr |= MTU_CRn_PRESCALE_16;
- } else {
- cr |= MTU_CRn_PRESCALE_1;
- }
+ /* Timer 1 is used for events */
+
clockevents_calc_mult_shift(&nmdk_clkevt, rate, MTU_MIN_RANGE);
writel(cr | MTU_CRn_ONESHOT, mtu_base + MTU_CR(1)); /* off, currently */
config OMAP_DEBUG_LEDS
bool
depends on OMAP_DEBUG_DEVICES
- default y if LEDS
+ default y if LEDS_CLASS
config OMAP_RESET_CLOCKS
bool "Reset unused clocks during boot"
if ((start <= da) && (da < start + bytes)) {
dev_dbg(obj->dev, "%s: %08x<=%08x(%x)\n",
__func__, start, da, bytes);
+ iotlb_load_cr(obj, &cr);
iommu_write_reg(obj, 1, MMU_FLUSH_ENTRY);
}
}
/* Writing zero to RSYNC_ERR clears the IRQ */
MCBSP_WRITE(mcbsp_rx, SPCR1, MCBSP_READ_CACHE(mcbsp_rx, SPCR1));
} else {
- complete(&mcbsp_rx->tx_irq_completion);
+ complete(&mcbsp_rx->rx_irq_completion);
}
return IRQ_HANDLED;
if (omap_sram_size == 0)
return;
- if (cpu_is_omap24xx()) {
- omap_sram_io_desc[0].virtual = OMAP2_SRAM_VA;
-
- base = OMAP2_SRAM_PA;
- base = ROUND_DOWN(base, PAGE_SIZE);
- omap_sram_io_desc[0].pfn = __phys_to_pfn(base);
- }
-
if (cpu_is_omap34xx()) {
- omap_sram_io_desc[0].virtual = OMAP3_SRAM_VA;
- base = OMAP3_SRAM_PA;
- base = ROUND_DOWN(base, PAGE_SIZE);
- omap_sram_io_desc[0].pfn = __phys_to_pfn(base);
-
/*
* SRAM must be marked as non-cached on OMAP3 since the
* CORE DPLL M2 divider change code (in SRAM) runs with the
omap_sram_io_desc[0].type = MT_MEMORY_NONCACHED;
}
- if (cpu_is_omap44xx()) {
- omap_sram_io_desc[0].virtual = OMAP4_SRAM_VA;
- base = OMAP4_SRAM_PA;
- base = ROUND_DOWN(base, PAGE_SIZE);
- omap_sram_io_desc[0].pfn = __phys_to_pfn(base);
- }
- omap_sram_io_desc[0].length = 1024 * 1024; /* Use section desc */
+ omap_sram_io_desc[0].virtual = omap_sram_base;
+ base = omap_sram_start;
+ base = ROUND_DOWN(base, PAGE_SIZE);
+ omap_sram_io_desc[0].pfn = __phys_to_pfn(base);
+ omap_sram_io_desc[0].length = ROUND_DOWN(omap_sram_size, PAGE_SIZE);
iotable_init(omap_sram_io_desc, ARRAY_SIZE(omap_sram_io_desc));
printk(KERN_INFO "SRAM: Mapped pa 0x%08lx to va 0x%08lx size: 0x%lx\n",
static int __devinit pwm_probe(struct platform_device *pdev)
{
- struct platform_device_id *id = platform_get_device_id(pdev);
+ const struct platform_device_id *id = platform_get_device_id(pdev);
struct pwm_device *pwm, *secondary = NULL;
struct resource *r;
int ret = 0;
*/
#include <linux/kernel.h>
+#include <linux/dma-mapping.h>
#include <linux/platform_device.h>
#include <linux/interrupt.h>
#include <linux/ioport.h>
static struct resource s5p_fimc0_resource[] = {
[0] = {
.start = S5P_PA_FIMC0,
- .end = S5P_PA_FIMC0 + SZ_1M - 1,
+ .end = S5P_PA_FIMC0 + SZ_4K - 1,
.flags = IORESOURCE_MEM,
},
[1] = {
},
};
+static u64 s5p_fimc0_dma_mask = DMA_BIT_MASK(32);
+
struct platform_device s5p_device_fimc0 = {
.name = "s5p-fimc",
.id = 0,
.num_resources = ARRAY_SIZE(s5p_fimc0_resource),
.resource = s5p_fimc0_resource,
+ .dev = {
+ .dma_mask = &s5p_fimc0_dma_mask,
+ .coherent_dma_mask = DMA_BIT_MASK(32),
+ },
};
*/
#include <linux/kernel.h>
+#include <linux/dma-mapping.h>
#include <linux/platform_device.h>
#include <linux/interrupt.h>
#include <linux/ioport.h>
static struct resource s5p_fimc1_resource[] = {
[0] = {
.start = S5P_PA_FIMC1,
- .end = S5P_PA_FIMC1 + SZ_1M - 1,
+ .end = S5P_PA_FIMC1 + SZ_4K - 1,
.flags = IORESOURCE_MEM,
},
[1] = {
},
};
+static u64 s5p_fimc1_dma_mask = DMA_BIT_MASK(32);
+
struct platform_device s5p_device_fimc1 = {
.name = "s5p-fimc",
.id = 1,
.num_resources = ARRAY_SIZE(s5p_fimc1_resource),
.resource = s5p_fimc1_resource,
+ .dev = {
+ .dma_mask = &s5p_fimc1_dma_mask,
+ .coherent_dma_mask = DMA_BIT_MASK(32),
+ },
};
*/
#include <linux/kernel.h>
+#include <linux/dma-mapping.h>
#include <linux/platform_device.h>
#include <linux/interrupt.h>
#include <linux/ioport.h>
static struct resource s5p_fimc2_resource[] = {
[0] = {
.start = S5P_PA_FIMC2,
- .end = S5P_PA_FIMC2 + SZ_1M - 1,
+ .end = S5P_PA_FIMC2 + SZ_4K - 1,
.flags = IORESOURCE_MEM,
},
[1] = {
},
};
+static u64 s5p_fimc2_dma_mask = DMA_BIT_MASK(32);
+
struct platform_device s5p_device_fimc2 = {
.name = "s5p-fimc",
.id = 2,
.num_resources = ARRAY_SIZE(s5p_fimc2_resource),
.resource = s5p_fimc2_resource,
+ .dev = {
+ .dma_mask = &s5p_fimc2_dma_mask,
+ .coherent_dma_mask = DMA_BIT_MASK(32),
+ },
};
static int s3c_adc_resume(struct platform_device *pdev)
{
struct adc_device *adc = platform_get_drvdata(pdev);
- unsigned long flags;
clk_enable(adc->clk);
enable_irq(adc->irq);
#include <plat/clock.h>
#include <plat/cpu.h>
+#include <linux/serial_core.h>
+#include <plat/regs-serial.h> /* for s3c24xx_uart_devs */
+
/* clock information */
static LIST_HEAD(clocks);
return 0;
}
+static int dev_is_s3c_uart(struct device *dev)
+{
+ struct platform_device **pdev = s3c24xx_uart_devs;
+ int i;
+ for (i = 0; i < ARRAY_SIZE(s3c24xx_uart_devs); i++, pdev++)
+ if (*pdev && dev == &(*pdev)->dev)
+ return 1;
+ return 0;
+}
+
+/*
+ * Serial drivers call get_clock() very early, before platform bus
+ * has been set up, this requires a special check to let them get
+ * a proper clock
+ */
+
+static int dev_is_platform_device(struct device *dev)
+{
+ return dev->bus == &platform_bus_type ||
+ (dev->bus == NULL && dev_is_s3c_uart(dev));
+}
+
/* Clock API calls */
struct clk *clk_get(struct device *dev, const char *id)
struct clk *clk = ERR_PTR(-ENOENT);
int idno;
- if (dev == NULL || dev->bus != &platform_bus_type)
+ if (dev == NULL || !dev_is_platform_device(dev))
idno = -1;
else
idno = to_platform_device(dev)->id;
if (!chip)
return -EINVAL;
- off = chip->chip.base - pin;
+ off = pin - chip->chip.base;
shift = off * 2;
reg = chip->base + 0x0C;
drvstr = __raw_readl(reg);
- drvstr = 0xffff & (0x3 << shift);
drvstr = drvstr >> shift;
+ drvstr &= 0x3;
return (__force s5p_gpio_drvstr_t)drvstr;
}
if (!chip)
return -EINVAL;
- off = chip->chip.base - pin;
+ off = pin - chip->chip.base;
shift = off * 2;
reg = chip->base + 0x0C;
tmp = __raw_readl(reg);
+ tmp &= ~(0x3 << shift);
tmp |= drvstr << shift;
__raw_writel(tmp, reg);
/* Define values for the drvstr available for each gpio pin.
*
* These values control the value of the output signal driver strength,
- * configurable on most pins on the S5C series.
+ * configurable on most pins on the S5P series.
*/
-#define S5P_GPIO_DRVSTR_LV1 ((__force s5p_gpio_drvstr_t)0x00)
-#define S5P_GPIO_DRVSTR_LV2 ((__force s5p_gpio_drvstr_t)0x01)
-#define S5P_GPIO_DRVSTR_LV3 ((__force s5p_gpio_drvstr_t)0x10)
-#define S5P_GPIO_DRVSTR_LV4 ((__force s5p_gpio_drvstr_t)0x11)
+#define S5P_GPIO_DRVSTR_LV1 ((__force s5p_gpio_drvstr_t)0x0)
+#define S5P_GPIO_DRVSTR_LV2 ((__force s5p_gpio_drvstr_t)0x2)
+#define S5P_GPIO_DRVSTR_LV3 ((__force s5p_gpio_drvstr_t)0x1)
+#define S5P_GPIO_DRVSTR_LV4 ((__force s5p_gpio_drvstr_t)0x3)
/**
* s5c_gpio_get_drvstr() - get the driver streght value of a gpio pin
#
# http://www.arm.linux.org.uk/developer/machines/?action=new
#
-# Last update: Mon Jul 12 21:10:14 2010
+# Last update: Thu Sep 9 22:43:01 2010
#
# machine_is_xxx CONFIG_xxxx MACH_TYPE_xxx number
#
gw2388 MACH_GW2388 GW2388 2635
jadecpu MACH_JADECPU JADECPU 2636
carlisle MACH_CARLISLE CARLISLE 2637
-lux_sf9 MACH_LUX_SFT9 LUX_SFT9 2638
+lux_sf9 MACH_LUX_SF9 LUX_SF9 2638
nemid_tb MACH_NEMID_TB NEMID_TB 2639
terrier MACH_TERRIER TERRIER 2640
turbot MACH_TURBOT TURBOT 2641
netviz MACH_NETVIZ NETVIZ 2964
flexibity MACH_FLEXIBITY FLEXIBITY 2965
wlan_computer MACH_WLAN_COMPUTER WLAN_COMPUTER 2966
+lpc24xx MACH_LPC24XX LPC24XX 2967
+spica MACH_SPICA SPICA 2968
+gpsdisplay MACH_GPSDISPLAY GPSDISPLAY 2969
+bipnet MACH_BIPNET BIPNET 2970
+overo_ctu_inertial MACH_OVERO_CTU_INERTIAL OVERO_CTU_INERTIAL 2971
+davinci_dm355_mmm MACH_DAVINCI_DM355_MMM DAVINCI_DM355_MMM 2972
+pc9260_v2 MACH_PC9260_V2 PC9260_V2 2973
+ptx7545 MACH_PTX7545 PTX7545 2974
+tm_efdc MACH_TM_EFDC TM_EFDC 2975
+omap3_waldo1 MACH_OMAP3_WALDO1 OMAP3_WALDO1 2977
+flyer MACH_FLYER FLYER 2978
+tornado3240 MACH_TORNADO3240 TORNADO3240 2979
+soli_01 MACH_SOLI_01 SOLI_01 2980
+omapl138_europalc MACH_OMAPL138_EUROPALC OMAPL138_EUROPALC 2981
+helios_v1 MACH_HELIOS_V1 HELIOS_V1 2982
+netspace_lite_v2 MACH_NETSPACE_LITE_V2 NETSPACE_LITE_V2 2983
+ssc MACH_SSC SSC 2984
+premierwave_en MACH_PREMIERWAVE_EN PREMIERWAVE_EN 2985
+wasabi MACH_WASABI WASABI 2986
+vivow MACH_VIVOW VIVOW 2987
+mx50_rdp MACH_MX50_RDP MX50_RDP 2988
+universal MACH_UNIVERSAL UNIVERSAL 2989
+real6410 MACH_REAL6410 REAL6410 2990
+spx_sakura MACH_SPX_SAKURA SPX_SAKURA 2991
+ij3k_2440 MACH_IJ3K_2440 IJ3K_2440 2992
+omap3_bc10 MACH_OMAP3_BC10 OMAP3_BC10 2993
+thebe MACH_THEBE THEBE 2994
+rv082 MACH_RV082 RV082 2995
+armlguest MACH_ARMLGUEST ARMLGUEST 2996
+tjinc1000 MACH_TJINC1000 TJINC1000 2997
+dockstar MACH_DOCKSTAR DOCKSTAR 2998
+ax8008 MACH_AX8008 AX8008 2999
+gnet_sgce MACH_GNET_SGCE GNET_SGCE 3000
+pxwnas_500_1000 MACH_PXWNAS_500_1000 PXWNAS_500_1000 3001
+ea20 MACH_EA20 EA20 3002
+awm2 MACH_AWM2 AWM2 3003
+ti8148evm MACH_TI8148EVM TI8148EVM 3004
+tegra_seaboard MACH_TEGRA_SEABOARD TEGRA_SEABOARD 3005
+linkstation_chlv2 MACH_LINKSTATION_CHLV2 LINKSTATION_CHLV2 3006
+tera_pro2_rack MACH_TERA_PRO2_RACK TERA_PRO2_RACK 3007
+rubys MACH_RUBYS RUBYS 3008
+aquarius MACH_AQUARIUS AQUARIUS 3009
+mx53_ard MACH_MX53_ARD MX53_ARD 3010
+mx53_smd MACH_MX53_SMD MX53_SMD 3011
+lswxl MACH_LSWXL LSWXL 3012
+dove_avng_v3 MACH_DOVE_AVNG_V3 DOVE_AVNG_V3 3013
+sdi_ess_9263 MACH_SDI_ESS_9263 SDI_ESS_9263 3014
+jocpu550 MACH_JOCPU550 JOCPU550 3015
+msm8x60_rumi3 MACH_MSM8X60_RUMI3 MSM8X60_RUMI3 3016
+msm8x60_ffa MACH_MSM8X60_FFA MSM8X60_FFA 3017
+yanomami MACH_YANOMAMI YANOMAMI 3018
+gta04 MACH_GTA04 GTA04 3019
+cm_a510 MACH_CM_A510 CM_A510 3020
+omap3_rfs200 MACH_OMAP3_RFS200 OMAP3_RFS200 3021
+kx33xx MACH_KX33XX KX33XX 3022
+ptx7510 MACH_PTX7510 PTX7510 3023
+top9000 MACH_TOP9000 TOP9000 3024
+teenote MACH_TEENOTE TEENOTE 3025
+ts3 MACH_TS3 TS3 3026
+a0 MACH_A0 A0 3027
+fsm9xxx_surf MACH_FSM9XXX_SURF FSM9XXX_SURF 3028
+fsm9xxx_ffa MACH_FSM9XXX_FFA FSM9XXX_FFA 3029
+frrhwcdma60w MACH_FRRHWCDMA60W FRRHWCDMA60W 3030
+remus MACH_REMUS REMUS 3031
+at91cap7xdk MACH_AT91CAP7XDK AT91CAP7XDK 3032
+at91cap7stk MACH_AT91CAP7STK AT91CAP7STK 3033
+kt_sbc_sam9_1 MACH_KT_SBC_SAM9_1 KT_SBC_SAM9_1 3034
+oratisrouter MACH_ORATISROUTER ORATISROUTER 3035
+armada_xp_db MACH_ARMADA_XP_DB ARMADA_XP_DB 3036
+spdm MACH_SPDM SPDM 3037
+gtib MACH_GTIB GTIB 3038
+dgm3240 MACH_DGM3240 DGM3240 3039
+atlas_i_lpe MACH_ATLAS_I_LPE ATLAS_I_LPE 3040
+htcmega MACH_HTCMEGA HTCMEGA 3041
+tricorder MACH_TRICORDER TRICORDER 3042
+tx28 MACH_TX28 TX28 3043
+bstbrd MACH_BSTBRD BSTBRD 3044
+pwb3090 MACH_PWB3090 PWB3090 3045
+idea6410 MACH_IDEA6410 IDEA6410 3046
+qbc9263 MACH_QBC9263 QBC9263 3047
+borabora MACH_BORABORA BORABORA 3048
+valdez MACH_VALDEZ VALDEZ 3049
+ls9g20 MACH_LS9G20 LS9G20 3050
+mios_v1 MACH_MIOS_V1 MIOS_V1 3051
+s5pc110_crespo MACH_S5PC110_CRESPO S5PC110_CRESPO 3052
+controltek9g20 MACH_CONTROLTEK9G20 CONTROLTEK9G20 3053
+tin307 MACH_TIN307 TIN307 3054
+tin510 MACH_TIN510 TIN510 3055
+bluecheese MACH_BLUECHEESE BLUECHEESE 3057
+tem3x30 MACH_TEM3X30 TEM3X30 3058
+harvest_desoto MACH_HARVEST_DESOTO HARVEST_DESOTO 3059
+msm8x60_qrdc MACH_MSM8X60_QRDC MSM8X60_QRDC 3060
+spear900 MACH_SPEAR900 SPEAR900 3061
+pcontrol_g20 MACH_PCONTROL_G20 PCONTROL_G20 3062
vfree(module->arch.syminfo);
module->arch.syminfo = NULL;
- return module_bug_finalize(hdr, sechdrs, module);
+ return 0;
}
void module_arch_cleanup(struct module *module)
{
- module_bug_cleanup(module);
}
default y
select HAVE_IDE
select HAVE_ARCH_TRACEHOOK
+ select HAVE_IRQ_WORK
select HAVE_PERF_EVENTS
config ZONE_DMA
struct user_context *user = current->thread.user;
unsigned long tbr, psr;
+ /* Always make any pending restarted system calls return -EINTR */
+ current_thread_info()->restart_block.fn = do_no_restart_syscall;
+
tbr = user->i.tbr;
psr = user->i.psr;
if (copy_from_user(user, &sc->sc_context, sizeof(sc->sc_context)))
struct sigframe __user *frame;
int rsig;
+ set_fs(USER_DS);
+
frame = get_sigframe(ka, sizeof(*frame));
if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
(unsigned long) (frame->retcode + 2));
}
- /* set up registers for signal handler */
- __frame->sp = (unsigned long) frame;
- __frame->lr = (unsigned long) &frame->retcode;
- __frame->gr8 = sig;
-
+ /* Set up registers for the signal handler */
if (current->personality & FDPIC_FUNCPTRS) {
struct fdpic_func_descriptor __user *funcptr =
(struct fdpic_func_descriptor __user *) ka->sa.sa_handler;
- __get_user(__frame->pc, &funcptr->text);
- __get_user(__frame->gr15, &funcptr->GOT);
+ struct fdpic_func_descriptor desc;
+ if (copy_from_user(&desc, funcptr, sizeof(desc)))
+ goto give_sigsegv;
+ __frame->pc = desc.text;
+ __frame->gr15 = desc.GOT;
} else {
__frame->pc = (unsigned long) ka->sa.sa_handler;
__frame->gr15 = 0;
}
- set_fs(USER_DS);
+ __frame->sp = (unsigned long) frame;
+ __frame->lr = (unsigned long) &frame->retcode;
+ __frame->gr8 = sig;
/* the tracer may want to single-step inside the handler */
if (test_thread_flag(TIF_SINGLESTEP))
return 0;
give_sigsegv:
- force_sig(SIGSEGV, current);
+ force_sigsegv(sig, current);
return -EFAULT;
} /* end setup_frame() */
struct rt_sigframe __user *frame;
int rsig;
+ set_fs(USER_DS);
+
frame = get_sigframe(ka, sizeof(*frame));
if (!access_ok(VERIFY_WRITE, frame, sizeof(*frame)))
}
/* Set up registers for signal handler */
- __frame->sp = (unsigned long) frame;
- __frame->lr = (unsigned long) &frame->retcode;
- __frame->gr8 = sig;
- __frame->gr9 = (unsigned long) &frame->info;
-
if (current->personality & FDPIC_FUNCPTRS) {
struct fdpic_func_descriptor __user *funcptr =
(struct fdpic_func_descriptor __user *) ka->sa.sa_handler;
- __get_user(__frame->pc, &funcptr->text);
- __get_user(__frame->gr15, &funcptr->GOT);
+ struct fdpic_func_descriptor desc;
+ if (copy_from_user(&desc, funcptr, sizeof(desc)))
+ goto give_sigsegv;
+ __frame->pc = desc.text;
+ __frame->gr15 = desc.GOT;
} else {
__frame->pc = (unsigned long) ka->sa.sa_handler;
__frame->gr15 = 0;
}
- set_fs(USER_DS);
+ __frame->sp = (unsigned long) frame;
+ __frame->lr = (unsigned long) &frame->retcode;
+ __frame->gr8 = sig;
+ __frame->gr9 = (unsigned long) &frame->info;
/* the tracer may want to single-step inside the handler */
if (test_thread_flag(TIF_SINGLESTEP))
return 0;
give_sigsegv:
- force_sig(SIGSEGV, current);
+ force_sigsegv(sig, current);
return -EFAULT;
} /* end setup_rt_frame() */
int ret;
/* Are we from a system call? */
- if (in_syscall(__frame)) {
+ if (__frame->syscallno != -1) {
/* If so, check system call restarting.. */
switch (__frame->gr8) {
case -ERESTART_RESTARTBLOCK:
__frame->gr8 = __frame->orig_gr8;
__frame->pc -= 4;
}
+ __frame->syscallno = -1;
}
/* Set up the stack frame */
break;
case -ERESTART_RESTARTBLOCK:
- __frame->gr8 = __NR_restart_syscall;
+ __frame->gr7 = __NR_restart_syscall;
__frame->pc -= 4;
break;
}
+ __frame->syscallno = -1;
}
/* if there's no signal to deliver, we just put the saved sigmask
lib-y := \
__ashldi3.o __lshrdi3.o __muldi3.o __ashrdi3.o __negdi2.o __ucmpdi2.o \
checksum.o memcpy.o memset.o atomic-ops.o atomic64-ops.o \
- outsl_ns.o outsl_sw.o insl_ns.o insl_sw.o cache.o perf_event.o
+ outsl_ns.o outsl_sw.o insl_ns.o insl_sw.o cache.o
const Elf_Shdr *sechdrs,
struct module *me)
{
- return module_bug_finalize(hdr, sechdrs, me);
+ return 0;
}
void module_arch_cleanup(struct module *mod)
{
- module_bug_cleanup(mod);
}
}
static __inline__ void __user *
-compat_alloc_user_space (long len)
+arch_compat_alloc_user_space (long len)
{
struct pt_regs *regs = task_pt_regs(current);
return (void __user *) (((regs->r12 & 0xffffffff) & -16) - len);
* David Mosberger-Tang <davidm@hpl.hp.com>
*/
-
-#include <linux/threads.h>
-#include <linux/irq.h>
-
-#include <asm/processor.h>
-
/*
* No irq_cpustat_t for IA-64. The data is held in the per-CPU data structure.
*/
#define local_softirq_pending() (local_cpu_data->softirq_pending)
+#include <linux/threads.h>
+#include <linux/irq.h>
+
+#include <asm/processor.h>
+
extern void __iomem *ipi_base_addr;
void ack_bad_irq(unsigned int irq);
void default_idle(void);
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
-extern void account_system_vtime(struct task_struct *);
-#endif
-
#endif /* __KERNEL__ */
#endif /* __ASSEMBLY__ */
;;
RSM_PSR_I(p0, r18, r19) // mask interrupt delivery
- mov ar.ccv=0
andcm r14=r14,r17 // filter out SIGKILL & SIGSTOP
+ mov r8=EINVAL // default to EINVAL
#ifdef CONFIG_SMP
- mov r17=1
+ // __ticket_spin_trylock(r31)
+ ld4 r17=[r31]
;;
- cmpxchg4.acq r18=[r31],r17,ar.ccv // try to acquire the lock
- mov r8=EINVAL // default to EINVAL
+ mov.m ar.ccv=r17
+ extr.u r9=r17,17,15
+ adds r19=1,r17
+ extr.u r18=r17,0,15
+ ;;
+ cmp.eq p6,p7=r9,r18
;;
+(p6) cmpxchg4.acq r9=[r31],r19,ar.ccv
+(p6) dep.z r20=r19,1,15 // next serving ticket for unlock
+(p7) br.cond.spnt.many .lock_contention
+ ;;
+ cmp4.eq p0,p7=r9,r17
+ adds r31=2,r31
+(p7) br.cond.spnt.many .lock_contention
ld8 r3=[r2] // re-read current->blocked now that we hold the lock
- cmp4.ne p6,p0=r18,r0
-(p6) br.cond.spnt.many .lock_contention
;;
#else
ld8 r3=[r2] // re-read current->blocked now that we hold the lock
- mov r8=EINVAL // default to EINVAL
#endif
add r18=IA64_TASK_PENDING_OFFSET+IA64_SIGPENDING_SIGNAL_OFFSET,r16
add r19=IA64_TASK_SIGNAL_OFFSET,r16
(p6) br.cond.spnt.few 1b // yes -> retry
#ifdef CONFIG_SMP
- st4.rel [r31]=r0 // release the lock
+ // __ticket_spin_unlock(r31)
+ st2.rel [r31]=r20
+ mov r20=0 // i must not leak kernel bits...
#endif
SSM_PSR_I(p0, p9, r31)
;;
.sig_pending:
#ifdef CONFIG_SMP
- st4.rel [r31]=r0 // release the lock
+ // __ticket_spin_unlock(r31)
+ st2.rel [r31]=r20 // release the lock
#endif
SSM_PSR_I(p0, p9, r17)
;;
* These are used to set parameters in the core dumps.
*/
#define ELF_CLASS ELFCLASS32
-#if defined(__LITTLE_ENDIAN)
+#if defined(__LITTLE_ENDIAN__)
#define ELF_DATA ELFDATA2LSB
-#elif defined(__BIG_ENDIAN)
+#elif defined(__BIG_ENDIAN__)
#define ELF_DATA ELFDATA2MSB
#else
#error no endian defined
#undef __HAVE_ARCH_SIG_BITOPS
struct pt_regs;
-extern int do_signal(struct pt_regs *regs, sigset_t *oldset);
#define ptrace_signal_deliver(regs, cookie) do { } while (0)
#define __ARCH_WANT_SYS_OLD_GETRLIMIT /*will be unused*/
#define __ARCH_WANT_SYS_OLDUMOUNT
#define __ARCH_WANT_SYS_RT_SIGACTION
+#define __ARCH_WANT_SYS_RT_SIGSUSPEND
#define __IGNORE_lchown
#define __IGNORE_setuid
--- /dev/null
+vmlinux.lds
work_notifysig: ; deal with pending signals and
; notify-resume requests
mv r0, sp ; arg1 : struct pt_regs *regs
- ldi r1, #0 ; arg2 : sigset_t *oldset
- mv r2, r9 ; arg3 : __u32 thread_info_flags
+ mv r1, r9 ; arg2 : __u32 thread_info_flags
bl do_notify_resume
- bra restore_all
+ bra resume_userspace
; perform syscall exit tracing
ALIGN
if (access_process_vm(child, pc&~3, &insn, sizeof(insn), 0)
!= sizeof(insn))
- break;
+ return -EIO;
compute_next_pc(insn, pc, &next_pc, child);
if (next_pc & 0x80000000)
- break;
+ return -EIO;
if (embed_debug_trap(child, next_pc))
- break;
+ return -EIO;
invalidate_cache();
+ return 0;
}
void user_disable_single_step(struct task_struct *child)
#define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP)))
-int do_signal(struct pt_regs *, sigset_t *);
-
-asmlinkage int
-sys_rt_sigsuspend(sigset_t __user *unewset, size_t sigsetsize,
- unsigned long r2, unsigned long r3, unsigned long r4,
- unsigned long r5, unsigned long r6, struct pt_regs *regs)
-{
- sigset_t newset;
-
- /* XXX: Don't preclude handling different sized sigset_t's. */
- if (sigsetsize != sizeof(sigset_t))
- return -EINVAL;
-
- if (copy_from_user(&newset, unewset, sizeof(newset)))
- return -EFAULT;
- sigdelsetmask(&newset, sigmask(SIGKILL)|sigmask(SIGSTOP));
-
- spin_lock_irq(¤t->sighand->siglock);
- current->saved_sigmask = current->blocked;
- current->blocked = newset;
- recalc_sigpending();
- spin_unlock_irq(¤t->sighand->siglock);
-
- current->state = TASK_INTERRUPTIBLE;
- schedule();
- set_thread_flag(TIF_RESTORE_SIGMASK);
- return -ERESTARTNOHAND;
-}
-
asmlinkage int
sys_sigaltstack(const stack_t __user *uss, stack_t __user *uoss,
unsigned long r2, unsigned long r3, unsigned long r4,
return (void __user *)((sp - frame_size) & -8ul);
}
-static void setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
+static int setup_rt_frame(int sig, struct k_sigaction *ka, siginfo_t *info,
sigset_t *set, struct pt_regs *regs)
{
struct rt_sigframe __user *frame;
current->comm, current->pid, frame, regs->pc);
#endif
- return;
+ return 0;
give_sigsegv:
force_sigsegv(sig, current);
+ return -EFAULT;
+}
+
+static int prev_insn(struct pt_regs *regs)
+{
+ u16 inst;
+ if (get_user(inst, (u16 __user *)(regs->bpc - 2)))
+ return -EFAULT;
+ if ((inst & 0xfff0) == 0x10f0) /* trap ? */
+ regs->bpc -= 2;
+ else
+ regs->bpc -= 4;
+ regs->syscall_nr = -1;
+ return 0;
}
/*
* OK, we're invoking a handler
*/
-static void
+static int
handle_signal(unsigned long sig, struct k_sigaction *ka, siginfo_t *info,
sigset_t *oldset, struct pt_regs *regs)
{
- unsigned short inst;
-
/* Are we from a system call? */
if (regs->syscall_nr >= 0) {
/* If so, check system call restarting.. */
/* fallthrough */
case -ERESTARTNOINTR:
regs->r0 = regs->orig_r0;
- inst = *(unsigned short *)(regs->bpc - 2);
- if ((inst & 0xfff0) == 0x10f0) /* trap ? */
- regs->bpc -= 2;
- else
- regs->bpc -= 4;
+ if (prev_insn(regs) < 0)
+ return -EFAULT;
}
}
/* Set up the stack frame */
- setup_rt_frame(sig, ka, info, oldset, regs);
+ if (setup_rt_frame(sig, ka, info, oldset, regs))
+ return -EFAULT;
spin_lock_irq(¤t->sighand->siglock);
sigorsets(¤t->blocked,¤t->blocked,&ka->sa.sa_mask);
sigaddset(¤t->blocked,sig);
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
+ return 0;
}
/*
* want to handle. Thus you cannot kill init even with a SIGKILL even by
* mistake.
*/
-int do_signal(struct pt_regs *regs, sigset_t *oldset)
+static void do_signal(struct pt_regs *regs)
{
siginfo_t info;
int signr;
struct k_sigaction ka;
- unsigned short inst;
+ sigset_t *oldset;
/*
* We want the common case to go fast, which
* if so.
*/
if (!user_mode(regs))
- return 1;
+ return;
if (try_to_freeze())
goto no_signal;
- if (!oldset)
+ if (test_thread_flag(TIF_RESTORE_SIGMASK))
+ oldset = ¤t->saved_sigmask;
+ else
oldset = ¤t->blocked;
signr = get_signal_to_deliver(&info, &ka, regs, NULL);
*/
/* Whee! Actually deliver the signal. */
- handle_signal(signr, &ka, &info, oldset, regs);
- return 1;
+ if (handle_signal(signr, &ka, &info, oldset, regs) == 0)
+ clear_thread_flag(TIF_RESTORE_SIGMASK);
+
+ return;
}
no_signal:
regs->r0 == -ERESTARTSYS ||
regs->r0 == -ERESTARTNOINTR) {
regs->r0 = regs->orig_r0;
- inst = *(unsigned short *)(regs->bpc - 2);
- if ((inst & 0xfff0) == 0x10f0) /* trap ? */
- regs->bpc -= 2;
- else
- regs->bpc -= 4;
- }
- if (regs->r0 == -ERESTART_RESTARTBLOCK){
+ prev_insn(regs);
+ } else if (regs->r0 == -ERESTART_RESTARTBLOCK){
regs->r0 = regs->orig_r0;
regs->r7 = __NR_restart_syscall;
- inst = *(unsigned short *)(regs->bpc - 2);
- if ((inst & 0xfff0) == 0x10f0) /* trap ? */
- regs->bpc -= 2;
- else
- regs->bpc -= 4;
+ prev_insn(regs);
}
}
- return 0;
+ if (test_thread_flag(TIF_RESTORE_SIGMASK)) {
+ clear_thread_flag(TIF_RESTORE_SIGMASK);
+ sigprocmask(SIG_SETMASK, ¤t->saved_sigmask, NULL);
+ }
}
/*
* notification of userspace execution resumption
* - triggered by current->work.notify_resume
*/
-void do_notify_resume(struct pt_regs *regs, sigset_t *oldset,
- __u32 thread_info_flags)
+void do_notify_resume(struct pt_regs *regs, __u32 thread_info_flags)
{
/* Pending single-step? */
if (thread_info_flags & _TIF_SINGLESTEP)
/* deal with pending signal delivery */
if (thread_info_flags & _TIF_SIGPENDING)
- do_signal(regs,oldset);
+ do_signal(regs);
if (thread_info_flags & _TIF_NOTIFY_RESUME) {
clear_thread_flag(TIF_NOTIFY_RESUME);
#define __NR_set_thread_area 334
#define __NR_atomic_cmpxchg_32 335
#define __NR_atomic_barrier 336
+#define __NR_fanotify_init 337
+#define __NR_fanotify_mark 338
+#define __NR_prlimit64 339
#ifdef __KERNEL__
-#define NR_syscalls 337
+#define NR_syscalls 340
#define __ARCH_WANT_IPC_PARSE_VERSION
#define __ARCH_WANT_OLD_READDIR
.long sys_set_thread_area
.long sys_atomic_cmpxchg_32 /* 335 */
.long sys_atomic_barrier
+ .long sys_fanotify_init
+ .long sys_fanotify_mark
+ .long sys_prlimit64
void mac_mksound( unsigned int freq, unsigned int length )
{
__u32 cfreq = ( freq << 5 ) / 468;
- __u32 flags;
+ unsigned long flags;
int i;
if ( mac_special_bell == NULL )
*/
static void mac_quadra_start_bell( unsigned int freq, unsigned int length, unsigned int volume )
{
- __u32 flags;
+ unsigned long flags;
/* if the bell is already ringing, ring longer */
if ( mac_bell_duration > 0 )
static void mac_quadra_ring_bell( unsigned long ignored )
{
int i, count = mac_asc_samplespersec / HZ;
- __u32 flags;
+ unsigned long flags;
/*
* we neither want a sound buffer overflow nor underflow, so we need to match
.long sys_set_thread_area
.long sys_atomic_cmpxchg_32 /* 335 */
.long sys_atomic_barrier
+ .long sys_fanotify_init
+ .long sys_fanotify_mark
+ .long sys_prlimit64
.rept NR_syscalls-(.-sys_call_table)/4
.long sys_ni_syscall
include arch/mips/Kbuild.platforms
obj-y := $(platform-y)
+# make clean traverses $(obj-) without having included .config, so
+# everything ends up here
+obj- := $(platform-)
+
# mips object files
# The object files are linked as core-y files would be linked
select HAVE_KPROBES
select HAVE_KRETPROBES
select RTC_LIB if !MACH_LOONGSON
+ select GENERIC_ATOMIC64 if !64BIT
mainmenu "Linux/MIPS Kernel Configuration"
config GENERIC_ISA_DMA
bool
select ZONE_DMA if GENERIC_ISA_DMA_SUPPORT_BROKEN=n
+ select ISA_DMA_API
config GENERIC_ISA_DMA_SUPPORT_BROKEN
bool
select GENERIC_ISA_DMA
+config ISA_DMA_API
+ bool
+
config GENERIC_GPIO
bool
select SYS_SUPPORTS_SMP
select SMP_UP
help
- This is a kernel model which is also known a VSMP or lately
- has been marketesed into SMVP.
+ This is a kernel model which is known a VSMP but lately has been
+ marketesed into SMVP.
+ Virtual SMP uses the processor's VPEs to implement virtual
+ processors. In currently available configuration of the 34K processor
+ this allows for a dual processor. Both processors will share the same
+ primary caches; each will obtain the half of the TLB for it's own
+ exclusive use. For a layman this model can be described as similar to
+ what Intel calls Hyperthreading.
+
+ For further information see http://www.linux-mips.org/wiki/34K#VSMP
config MIPS_MT_SMTC
bool "SMTC: Use all TCs on all VPEs for SMP"
help
This is a kernel model which is known a SMTC or lately has been
marketesed into SMVP.
+ is presenting the available TC's of the core as processors to Linux.
+ On currently available 34K processors this means a Linux system will
+ see up to 5 processors. The implementation of the SMTC kernel differs
+ significantly from VSMP and cannot efficiently coexist in the same
+ kernel binary so the choice between VSMP and SMTC is a compile time
+ decision.
+
+ For further information see http://www.linux-mips.org/wiki/34K#SMTC
endchoice
char **prom_argv;
char **prom_envp;
-void prom_init_cmdline(void)
+void __init prom_init_cmdline(void)
{
int i;
}
}
-int prom_get_ethernet_addr(char *ethernet_addr)
+int __init prom_get_ethernet_addr(char *ethernet_addr)
{
char *ethaddr_str;
return 0;
}
-EXPORT_SYMBOL(prom_get_ethernet_addr);
void __init prom_free_prom_memory(void)
{
hostprogs-y := calc_vmlinuz_load_addr
VMLINUZ_LOAD_ADDRESS = $(shell $(obj)/calc_vmlinuz_load_addr \
- $(objtree)/$(KBUILD_IMAGE) $(VMLINUX_LOAD_ADDRESS))
+ $(obj)/vmlinux.bin $(VMLINUX_LOAD_ADDRESS))
vmlinuzobjs-y += $(obj)/piggy.o
vmlinuz.srec: vmlinuz
$(call cmd,objcopy)
-clean-files := $(objtree)/vmlinuz.*
+clean-files := $(objtree)/vmlinuz $(objtree)/vmlinuz.{32,ecoff,bin,srec}
def_bool y
select SPARSEMEM_STATIC
depends on CPU_CAVIUM_OCTEON
+
+config CAVIUM_OCTEON_HELPER
+ def_bool y
+ depends on OCTEON_ETHERNET || PCI
return NOTIFY_OK; /* Let default notifier send signals */
}
-static int cnmips_cu2_setup(void)
+static int __init cnmips_cu2_setup(void)
{
return cu2_notifier(cnmips_cu2_call, 0);
}
obj-y += cvmx-bootmem.o cvmx-l2c.o cvmx-sysinfo.o octeon-model.o
-obj-$(CONFIG_PCI) += cvmx-helper-errata.o cvmx-helper-jtag.o
+obj-$(CONFIG_CAVIUM_OCTEON_HELPER) += cvmx-helper-errata.o cvmx-helper-jtag.o
#
# DECstation family
#
-platform-$(CONFIG_MACH_DECSTATION) = dec/
+platform-$(CONFIG_MACH_DECSTATION) += dec/
cflags-$(CONFIG_MACH_DECSTATION) += \
-I$(srctree)/arch/mips/include/asm/mach-dec
libs-$(CONFIG_MACH_DECSTATION) += arch/mips/dec/prom/
*/
#define atomic64_add_negative(i, v) (atomic64_add_return(i, (v)) < 0)
+#else /* !CONFIG_64BIT */
+
+#include <asm-generic/atomic64.h>
+
#endif /* CONFIG_64BIT */
/*
return (u32)(unsigned long)uptr;
}
-static inline void __user *compat_alloc_user_space(long len)
+static inline void __user *arch_compat_alloc_user_space(long len)
{
struct pt_regs *regs = (struct pt_regs *)
((unsigned long) current_thread_info() + THREAD_SIZE - 32) - 1;
#define cu2_notifier(fn, pri) \
({ \
- static struct notifier_block fn##_nb __cpuinitdata = { \
+ static struct notifier_block fn##_nb = { \
.notifier_call = fn, \
.priority = pri \
}; \
*/
#ifdef CONFIG_32BIT
+#include <linux/types.h>
struct flock {
short l_type;
*/
struct gic_intr_map {
unsigned int cpunum; /* Directed to this CPU */
+#define GIC_UNUSED 0xdead /* Dummy data */
unsigned int pin; /* Directed to this Pin */
unsigned int polarity; /* Polarity : +/- */
unsigned int trigtype; /* Trigger : Edge/Levl */
#ifndef __ASM_MACH_TX49XX_KMALLOC_H
#define __ASM_MACH_TX49XX_KMALLOC_H
-#define ARCH_KMALLOC_MINALIGN L1_CACHE_BYTES
+#define ARCH_DMA_MINALIGN L1_CACHE_BYTES
#endif /* __ASM_MACH_TX49XX_KMALLOC_H */
#define GIC_EXT_INTR(x) x
-/* Dummy data */
-#define X 0xdead
-
/* External Interrupts used for IPI */
#define GIC_IPI_EXT_INTR_RESCHED_VPE0 16
#define GIC_IPI_EXT_INTR_CALLFNC_VPE0 17
((unsigned long)(x) - PAGE_OFFSET + PHYS_OFFSET)
#endif
#define __va(x) ((void *)((unsigned long)(x) + PAGE_OFFSET - PHYS_OFFSET))
+
+/*
+ * RELOC_HIDE was originally added by 6007b903dfe5f1d13e0c711ac2894bdd4a61b1ad
+ * (lmo) rsp. 8431fd094d625b94d364fe393076ccef88e6ce18 (kernel.org). The
+ * discussion can be found in lkml posting
+ * <a2ebde260608230500o3407b108hc03debb9da6e62c@mail.gmail.com> which is
+ * archived at http://lists.linuxcoding.com/kernel/2006-q3/msg17360.html
+ *
+ * It is unclear if the misscompilations mentioned in
+ * http://lkml.org/lkml/2010/8/8/138 also affect MIPS so we keep this one
+ * until GCC 3.x has been retired before we can apply
+ * https://patchwork.linux-mips.org/patch/1541/
+ */
+
#define __pa_symbol(x) __pa(RELOC_HIDE((unsigned long)(x), 0))
#define pfn_to_kaddr(pfn) __va((pfn) << PAGE_SHIFT)
#ifdef __ARCH_SI_TRAPNO
int _trapno; /* TRAP # which caused the signal */
#endif
+ short _addr_lsb;
} _sigfault;
/* SIGPOLL, SIGXFSZ (To do ...) */
#define _TIF_LOAD_WATCH (1<<TIF_LOAD_WATCH)
/* work to do on interrupt/exception return */
-#define _TIF_WORK_MASK (0x0000ffef & ~_TIF_SECCOMP)
+#define _TIF_WORK_MASK (0x0000ffef & \
+ ~(_TIF_SECCOMP | _TIF_SYSCALL_AUDIT))
/* work to do on any return to u-space */
#define _TIF_ALLWORK_MASK (0x8000ffff & ~_TIF_SECCOMP)
#define __NR_perf_event_open (__NR_Linux + 333)
#define __NR_accept4 (__NR_Linux + 334)
#define __NR_recvmmsg (__NR_Linux + 335)
+#define __NR_fanotify_init (__NR_Linux + 336)
+#define __NR_fanotify_mark (__NR_Linux + 337)
+#define __NR_prlimit64 (__NR_Linux + 338)
/*
* Offset of the last Linux o32 flavoured syscall
*/
-#define __NR_Linux_syscalls 335
+#define __NR_Linux_syscalls 338
#endif /* _MIPS_SIM == _MIPS_SIM_ABI32 */
#define __NR_O32_Linux 4000
-#define __NR_O32_Linux_syscalls 335
+#define __NR_O32_Linux_syscalls 338
#if _MIPS_SIM == _MIPS_SIM_ABI64
#define __NR_perf_event_open (__NR_Linux + 292)
#define __NR_accept4 (__NR_Linux + 293)
#define __NR_recvmmsg (__NR_Linux + 294)
+#define __NR_fanotify_init (__NR_Linux + 295)
+#define __NR_fanotify_mark (__NR_Linux + 296)
+#define __NR_prlimit64 (__NR_Linux + 297)
/*
* Offset of the last Linux 64-bit flavoured syscall
*/
-#define __NR_Linux_syscalls 294
+#define __NR_Linux_syscalls 297
#endif /* _MIPS_SIM == _MIPS_SIM_ABI64 */
#define __NR_64_Linux 5000
-#define __NR_64_Linux_syscalls 294
+#define __NR_64_Linux_syscalls 297
#if _MIPS_SIM == _MIPS_SIM_NABI32
#define __NR_accept4 (__NR_Linux + 297)
#define __NR_recvmmsg (__NR_Linux + 298)
#define __NR_getdents64 (__NR_Linux + 299)
+#define __NR_fanotify_init (__NR_Linux + 300)
+#define __NR_fanotify_mark (__NR_Linux + 301)
+#define __NR_prlimit64 (__NR_Linux + 302)
/*
* Offset of the last N32 flavoured syscall
*/
-#define __NR_Linux_syscalls 299
+#define __NR_Linux_syscalls 302
#endif /* _MIPS_SIM == _MIPS_SIM_NABI32 */
#define __NR_N32_Linux 6000
-#define __NR_N32_Linux_syscalls 299
+#define __NR_N32_Linux_syscalls 302
#ifdef __KERNEL__
-core-$(CONFIG_MACH_JZ4740) += arch/mips/jz4740/
+platform-$(CONFIG_MACH_JZ4740) += jz4740/
cflags-$(CONFIG_MACH_JZ4740) += -I$(srctree)/arch/mips/include/asm/mach-jz4740
load-$(CONFIG_MACH_JZ4740) += 0xffffffff80010000
return -EFAULT;
}
- regs->regs[0] = 0;
switch (insn.i_format.opcode) {
/*
* jr and jalr are in r_format format.
#include <asm/io.h>
#include <asm/gic.h>
#include <asm/gcmpregs.h>
-#include <asm/mips-boards/maltaint.h>
#include <asm/irq.h>
#include <linux/hardirq.h>
#include <asm-generic/bitops/find.h>
int i;
irq -= _irqbase;
- pr_debug(KERN_DEBUG "%s(%d) called\n", __func__, irq);
+ pr_debug("%s(%d) called\n", __func__, irq);
cpumask_and(&tmp, cpumask, cpu_online_mask);
if (cpus_empty(tmp))
return -1;
/* Setup specifics */
for (i = 0; i < mapsize; i++) {
cpu = intrmap[i].cpunum;
- if (cpu == X)
+ if (cpu == GIC_UNUSED)
continue;
if (cpu == 0 && i != 0 && intrmap[i].flags == 0)
continue;
struct pt_regs *regs = args->regs;
int trap = (regs->cp0_cause & 0x7c) >> 2;
- /* Userpace events, ignore. */
+ /* Userspace events, ignore. */
if (user_mode(regs))
return NOTIFY_DONE;
memset(&tz, 0, sizeof(tz));
if ((ret.retval = sp_syscall(__NR_gettimeofday, (int)&tv,
(int)&tz, 0, 0)) == 0)
- ret.retval = tv.tv_sec;
+ ret.retval = tv.tv_sec;
break;
case MTSP_SYSCALL_EXIT:
{
return sys_lookup_dcookie(merge_64(a0, a1), buf, len);
}
+
+SYSCALL_DEFINE6(32_fanotify_mark, int, fanotify_fd, unsigned int, flags,
+ u64, a3, u64, a4, int, dfd, const char __user *, pathname)
+{
+ return sys_fanotify_mark(fanotify_fd, flags, merge_64(a3, a4),
+ dfd, pathname);
+}
if (!check_same_owner(p) && !capable(CAP_SYS_NICE))
goto out_unlock;
- retval = security_task_setscheduler(p, 0, NULL);
+ retval = security_task_setscheduler(p)
if (retval)
goto out_unlock;
{
/* do the secure computing check first */
if (!entryexit)
- secure_computing(regs->regs[0]);
+ secure_computing(regs->regs[2]);
if (unlikely(current->audit_context) && entryexit)
audit_syscall_exit(AUDITSC_RESULT(regs->regs[2]),
out:
if (unlikely(current->audit_context) && !entryexit)
- audit_syscall_entry(audit_arch(), regs->regs[0],
+ audit_syscall_entry(audit_arch(), regs->regs[2],
regs->regs[4], regs->regs[5],
regs->regs[6], regs->regs[7]);
}
sw t0, PT_R7(sp) # set error flag
beqz t0, 1f
+ lw t1, PT_R2(sp) # syscall number
negu v0 # error
- sw v0, PT_R0(sp) # set flag for syscall
- # restarting
+ sw t1, PT_R0(sp) # save it for syscall restarting
1: sw v0, PT_R2(sp) # result
o32_syscall_exit:
sw t0, PT_R7(sp) # set error flag
beqz t0, 1f
+ lw t1, PT_R2(sp) # syscall number
negu v0 # error
- sw v0, PT_R0(sp) # set flag for syscall
- # restarting
+ sw t1, PT_R0(sp) # save it for syscall restarting
1: sw v0, PT_R2(sp) # result
j syscall_exit
* We probably should handle this case a bit more drastic.
*/
bad_stack:
- negu v0 # error
- sw v0, PT_R0(sp)
+ li v0, EFAULT
sw v0, PT_R2(sp)
li t0, 1 # set error flag
sw t0, PT_R7(sp)
sys sys_rt_tgsigqueueinfo 4
sys sys_perf_event_open 5
sys sys_accept4 4
- sys sys_recvmmsg 5
+ sys sys_recvmmsg 5 /* 4335 */
+ sys sys_fanotify_init 2
+ sys sys_fanotify_mark 6
+ sys sys_prlimit64 4
.endm
/* We pre-compute the number of _instruction_ bytes needed to
sd t0, PT_R7(sp) # set error flag
beqz t0, 1f
+ ld t1, PT_R2(sp) # syscall number
dnegu v0 # error
- sd v0, PT_R0(sp) # set flag for syscall
- # restarting
+ sd t1, PT_R0(sp) # save it for syscall restarting
1: sd v0, PT_R2(sp) # result
n64_syscall_exit:
sd t0, PT_R7(sp) # set error flag
beqz t0, 1f
+ ld t1, PT_R2(sp) # syscall number
dnegu v0 # error
- sd v0, PT_R0(sp) # set flag for syscall restarting
+ sd t1, PT_R0(sp) # save it for syscall restarting
1: sd v0, PT_R2(sp) # result
j syscall_exit
PTR sys_pipe2
PTR sys_inotify_init1
PTR sys_preadv
- PTR sys_pwritev /* 5390 */
+ PTR sys_pwritev /* 5290 */
PTR sys_rt_tgsigqueueinfo
PTR sys_perf_event_open
PTR sys_accept4
- PTR sys_recvmmsg
+ PTR sys_recvmmsg
+ PTR sys_fanotify_init /* 5295 */
+ PTR sys_fanotify_mark
+ PTR sys_prlimit64
.size sys_call_table,.-sys_call_table
sd t0, PT_R7(sp) # set error flag
beqz t0, 1f
+ ld t1, PT_R2(sp) # syscall number
dnegu v0 # error
- sd v0, PT_R0(sp) # set flag for syscall restarting
+ sd t1, PT_R0(sp) # save it for syscall restarting
1: sd v0, PT_R2(sp) # result
local_irq_disable # make sure need_resched and
sd t0, PT_R7(sp) # set error flag
beqz t0, 1f
+ ld t1, PT_R2(sp) # syscall number
dnegu v0 # error
- sd v0, PT_R0(sp) # set flag for syscall restarting
+ sd t1, PT_R0(sp) # save it for syscall restarting
1: sd v0, PT_R2(sp) # result
j syscall_exit
PTR sys_cacheflush
PTR sys_cachectl
PTR sys_sysmips
- PTR sys_io_setup /* 6200 */
+ PTR compat_sys_io_setup /* 6200 */
PTR sys_io_destroy
- PTR sys_io_getevents
- PTR sys_io_submit
+ PTR compat_sys_io_getevents
+ PTR compat_sys_io_submit
PTR sys_io_cancel
PTR sys_exit_group /* 6205 */
PTR sys_lookup_dcookie
PTR sys_perf_event_open
PTR sys_accept4
PTR compat_sys_recvmmsg
- PTR sys_getdents
+ PTR sys_getdents64
+ PTR sys_fanotify_init /* 6300 */
+ PTR sys_fanotify_mark
+ PTR sys_prlimit64
.size sysn32_call_table,.-sysn32_call_table
sd t0, PT_R7(sp) # set error flag
beqz t0, 1f
+ ld t1, PT_R2(sp) # syscall number
dnegu v0 # error
- sd v0, PT_R0(sp) # flag for syscall restarting
+ sd t1, PT_R0(sp) # save it for syscall restarting
1: sd v0, PT_R2(sp) # result
o32_syscall_exit:
sd t0, PT_R7(sp) # set error flag
beqz t0, 1f
+ ld t1, PT_R2(sp) # syscall number
dnegu v0 # error
- sd v0, PT_R0(sp) # set flag for syscall restarting
+ sd t1, PT_R0(sp) # save it for syscall restarting
1: sd v0, PT_R2(sp) # result
j syscall_exit
* The stackpointer for a call with more than 4 arguments is bad.
*/
bad_stack:
- dnegu v0 # error
- sd v0, PT_R0(sp)
+ li v0, EFAULT
sd v0, PT_R2(sp)
li t0, 1 # set error flag
sd t0, PT_R7(sp)
PTR compat_sys_futex
PTR compat_sys_sched_setaffinity
PTR compat_sys_sched_getaffinity /* 4240 */
- PTR sys_io_setup
+ PTR compat_sys_io_setup
PTR sys_io_destroy
- PTR sys_io_getevents
- PTR sys_io_submit
+ PTR compat_sys_io_getevents
+ PTR compat_sys_io_submit
PTR sys_io_cancel /* 4245 */
PTR sys_exit_group
PTR sys32_lookup_dcookie
PTR compat_sys_rt_tgsigqueueinfo
PTR sys_perf_event_open
PTR sys_accept4
- PTR compat_sys_recvmmsg
+ PTR compat_sys_recvmmsg /* 4335 */
+ PTR sys_fanotify_init
+ PTR sys_32_fanotify_mark
+ PTR sys_prlimit64
.size sys_call_table,.-sys_call_table
{
struct rt_sigframe __user *frame;
sigset_t set;
- stack_t st;
int sig;
frame = (struct rt_sigframe __user *) regs.regs[29];
else if (sig)
force_sig(sig, current);
- if (__copy_from_user(&st, &frame->rs_uc.uc_stack, sizeof(st)))
- goto badframe;
/* It is more difficult to avoid calling this function than to
call it and ignore errors. */
- do_sigaltstack((stack_t __user *)&st, NULL, regs.regs[29]);
+ do_sigaltstack(&frame->rs_uc.uc_stack, NULL, regs.regs[29]);
/*
* Don't let your children do this ...
struct mips_abi *abi = current->thread.abi;
void *vdso = current->mm->context.vdso;
- switch(regs->regs[0]) {
- case ERESTART_RESTARTBLOCK:
- case ERESTARTNOHAND:
- regs->regs[2] = EINTR;
- break;
- case ERESTARTSYS:
- if (!(ka->sa.sa_flags & SA_RESTART)) {
+ if (regs->regs[0]) {
+ switch(regs->regs[2]) {
+ case ERESTART_RESTARTBLOCK:
+ case ERESTARTNOHAND:
regs->regs[2] = EINTR;
break;
+ case ERESTARTSYS:
+ if (!(ka->sa.sa_flags & SA_RESTART)) {
+ regs->regs[2] = EINTR;
+ break;
+ }
+ /* fallthrough */
+ case ERESTARTNOINTR:
+ regs->regs[7] = regs->regs[26];
+ regs->regs[2] = regs->regs[0];
+ regs->cp0_epc -= 4;
}
- /* fallthrough */
- case ERESTARTNOINTR: /* Userland will reload $v0. */
- regs->regs[7] = regs->regs[26];
- regs->cp0_epc -= 8;
- }
- regs->regs[0] = 0; /* Don't deal with this again. */
+ regs->regs[0] = 0; /* Don't deal with this again. */
+ }
if (sig_uses_siginfo(ka))
ret = abi->setup_rt_frame(vdso + abi->rt_signal_return_offset,
ret = abi->setup_frame(vdso + abi->signal_return_offset,
ka, regs, sig, oldset);
+ if (ret)
+ return ret;
+
spin_lock_irq(¤t->sighand->siglock);
sigorsets(¤t->blocked, ¤t->blocked, &ka->sa.sa_mask);
if (!(ka->sa.sa_flags & SA_NODEFER))
return;
}
- /*
- * Who's code doesn't conform to the restartable syscall convention
- * dies here!!! The li instruction, a single machine instruction,
- * must directly be followed by the syscall instruction.
- */
if (regs->regs[0]) {
if (regs->regs[2] == ERESTARTNOHAND ||
regs->regs[2] == ERESTARTSYS ||
regs->regs[2] == ERESTARTNOINTR) {
+ regs->regs[2] = regs->regs[0];
regs->regs[7] = regs->regs[26];
- regs->cp0_epc -= 8;
+ regs->cp0_epc -= 4;
}
if (regs->regs[2] == ERESTART_RESTARTBLOCK) {
regs->regs[2] = current->thread.abi->restart;
asmlinkage void sysn32_rt_sigreturn(nabi_no_regargs struct pt_regs regs)
{
struct rt_sigframe_n32 __user *frame;
+ mm_segment_t old_fs;
sigset_t set;
stack_t st;
s32 sp;
/* It is more difficult to avoid calling this function than to
call it and ignore errors. */
+ old_fs = get_fs();
+ set_fs(KERNEL_DS);
do_sigaltstack((stack_t __user *)&st, NULL, regs.regs[29]);
+ set_fs(old_fs);
+
/*
* Don't let your children do this ...
unsigned long value;
unsigned int res;
- regs->regs[0] = 0;
-
/*
* This load never faults.
*/
static gfp_t massage_gfp_flags(const struct device *dev, gfp_t gfp)
{
+ gfp_t dma_flag;
+
/* ignore region specifiers */
gfp &= ~(__GFP_DMA | __GFP_DMA32 | __GFP_HIGHMEM);
-#ifdef CONFIG_ZONE_DMA
+#ifdef CONFIG_ISA
if (dev == NULL)
- gfp |= __GFP_DMA;
- else if (dev->coherent_dma_mask < DMA_BIT_MASK(24))
- gfp |= __GFP_DMA;
+ dma_flag = __GFP_DMA;
else
#endif
-#ifdef CONFIG_ZONE_DMA32
+#if defined(CONFIG_ZONE_DMA32) && defined(CONFIG_ZONE_DMA)
if (dev->coherent_dma_mask < DMA_BIT_MASK(32))
- gfp |= __GFP_DMA32;
+ dma_flag = __GFP_DMA;
+ else if (dev->coherent_dma_mask < DMA_BIT_MASK(64))
+ dma_flag = __GFP_DMA32;
+ else
+#endif
+#if defined(CONFIG_ZONE_DMA32) && !defined(CONFIG_ZONE_DMA)
+ if (dev->coherent_dma_mask < DMA_BIT_MASK(64))
+ dma_flag = __GFP_DMA32;
+ else
+#endif
+#if defined(CONFIG_ZONE_DMA) && !defined(CONFIG_ZONE_DMA32)
+ if (dev->coherent_dma_mask < DMA_BIT_MASK(64))
+ dma_flag = __GFP_DMA;
else
#endif
- ;
+ dma_flag = 0;
/* Don't invoke OOM killer */
gfp |= __GFP_NORETRY;
- return gfp;
+ return gfp | dma_flag;
}
void *dma_alloc_noncoherent(struct device *dev, size_t size,
#define tc_lsize 32
extern unsigned long icache_way_size, dcache_way_size;
-unsigned long tcache_size;
+static unsigned long tcache_size;
#include <asm/r4kcache.h>
*/
#define GIC_CPU_NMI GIC_MAP_TO_NMI_MSK
+#define X GIC_UNUSED
+
static struct gic_intr_map gic_intr_map[GIC_NUM_INTRS] = {
{ X, X, X, X, 0 },
{ X, X, X, X, 0 },
{ X, X, X, X, 0 },
/* The remainder of this table is initialised by fill_ipi_map */
};
+#undef X
/*
* GCMP needs to be detected before any SMP initialisation
if (!((pcicvalue == PCIM_H_EA) ||
(pcicvalue == PCIM_H_IA_FIX) ||
(pcicvalue == PCIM_H_IA_RR))) {
- pr_err(KERN_ERR "PCI init error!!!\n");
+ pr_err("PCI init error!!!\n");
/* Not in Host Mode, return ERROR */
return -1;
}
*/
#include <linux/kernel.h>
+#include <asm/processor.h>
#include <asm/reboot.h>
#include <glb.h>
void pnx8550_machine_restart(char *command)
{
- char head[] = "************* Machine restart *************";
- char foot[] = "*******************************************";
-
- printk("\n\n");
- printk("%s\n", head);
- if (command != NULL)
- printk("* %s\n", command);
- printk("%s\n", foot);
-
PNX8550_RST_CTL = PNX8550_RST_DO_SW_RST;
}
void pnx8550_machine_halt(void)
{
- printk("*** Machine halt. (Not implemented) ***\n");
-}
-
-void pnx8550_machine_power_off(void)
-{
- printk("*** Machine power off. (Not implemented) ***\n");
+ while (1) {
+ if (cpu_wait)
+ cpu_wait();
+ }
}
extern void __init board_setup(void);
extern void pnx8550_machine_restart(char *);
extern void pnx8550_machine_halt(void);
-extern void pnx8550_machine_power_off(void);
extern struct resource ioport_resource;
extern struct resource iomem_resource;
extern char *prom_getcmdline(void);
_machine_restart = pnx8550_machine_restart;
_machine_halt = pnx8550_machine_halt;
- pm_power_off = pnx8550_machine_power_off;
+ pm_power_off = pnx8550_machine_halt;
/* Clear the Global 2 Register, PCI Inta Output Enable Registers
Bit 1:Enable DAC Powerdown
config MN10300
def_bool y
select HAVE_OPROFILE
- select HAVE_ARCH_TRACEHOOK
config AM33
def_bool y
choice
prompt "GDB stub port"
- default GDBSTUB_TTYSM0
+ default GDBSTUB_ON_TTYSM0
depends on GDBSTUB
help
Select the serial port used for GDB-stub.
#include <asm-generic/bitops/hweight.h>
#define ext2_set_bit_atomic(lock, nr, addr) \
- test_and_set_bit((nr) ^ 0x18, (addr))
+ test_and_set_bit((nr), (addr))
#define ext2_clear_bit_atomic(lock, nr, addr) \
- test_and_clear_bit((nr) ^ 0x18, (addr))
+ test_and_clear_bit((nr), (addr))
#include <asm-generic/bitops/ext2-non-atomic.h>
#include <asm-generic/bitops/minix-le.h>
/* These should not be considered constants from userland. */
#define SIGRTMIN 32
-#define SIGRTMAX (_NSIG-1)
+#define SIGRTMAX _NSIG
/*
* SA_FLAGS values:
._intr = &SC0ICR,
._rxb = &SC0RXB,
._txb = &SC0TXB,
- .rx_name = "ttySM0/Rx",
- .tx_name = "ttySM0/Tx",
+ .rx_name = "ttySM0:Rx",
+ .tx_name = "ttySM0:Tx",
#ifdef CONFIG_MN10300_TTYSM0_TIMER8
- .tm_name = "ttySM0/Timer8",
+ .tm_name = "ttySM0:Timer8",
._tmxmd = &TM8MD,
._tmxbr = &TM8BR,
._tmicr = &TM8ICR,
.tm_irq = TM8IRQ,
.div_timer = MNSCx_DIV_TIMER_16BIT,
#else /* CONFIG_MN10300_TTYSM0_TIMER2 */
- .tm_name = "ttySM0/Timer2",
+ .tm_name = "ttySM0:Timer2",
._tmxmd = &TM2MD,
._tmxbr = (volatile u16 *) &TM2BR,
._tmicr = &TM2ICR,
._intr = &SC1ICR,
._rxb = &SC1RXB,
._txb = &SC1TXB,
- .rx_name = "ttySM1/Rx",
- .tx_name = "ttySM1/Tx",
+ .rx_name = "ttySM1:Rx",
+ .tx_name = "ttySM1:Tx",
#ifdef CONFIG_MN10300_TTYSM1_TIMER9
- .tm_name = "ttySM1/Timer9",
+ .tm_name = "ttySM1:Timer9",
._tmxmd = &TM9MD,
._tmxbr = &TM9BR,
._tmicr = &TM9ICR,
.tm_irq = TM9IRQ,
.div_timer = MNSCx_DIV_TIMER_16BIT,
#else /* CONFIG_MN10300_TTYSM1_TIMER3 */
- .tm_name = "ttySM1/Timer3",
+ .tm_name = "ttySM1:Timer3",
._tmxmd = &TM3MD,
._tmxbr = (volatile u16 *) &TM3BR,
._tmicr = &TM3ICR,
.uart.lock =
__SPIN_LOCK_UNLOCKED(mn10300_serial_port_sif2.uart.lock),
.name = "ttySM2",
- .rx_name = "ttySM2/Rx",
- .tx_name = "ttySM2/Tx",
- .tm_name = "ttySM2/Timer10",
+ .rx_name = "ttySM2:Rx",
+ .tx_name = "ttySM2:Tx",
+ .tm_name = "ttySM2:Timer10",
._iobase = &SC2CTR,
._control = &SC2CTR,
._status = &SC2STR,
const Elf_Shdr *sechdrs,
struct module *me)
{
- return module_bug_finalize(hdr, sechdrs, me);
+ return 0;
}
/*
*/
void module_arch_cleanup(struct module *mod)
{
- module_bug_cleanup(mod);
}
old_sigset_t mask;
if (verify_area(VERIFY_READ, act, sizeof(*act)) ||
__get_user(new_ka.sa.sa_handler, &act->sa_handler) ||
- __get_user(new_ka.sa.sa_restorer, &act->sa_restorer))
+ __get_user(new_ka.sa.sa_restorer, &act->sa_restorer) ||
+ __get_user(new_ka.sa.sa_flags, &act->sa_flags) ||
+ __get_user(mask, &act->sa_mask))
return -EFAULT;
- __get_user(new_ka.sa.sa_flags, &act->sa_flags);
- __get_user(mask, &act->sa_mask);
siginitset(&new_ka.sa.sa_mask, mask);
}
if (!ret && oact) {
if (verify_area(VERIFY_WRITE, oact, sizeof(*oact)) ||
__put_user(old_ka.sa.sa_handler, &oact->sa_handler) ||
- __put_user(old_ka.sa.sa_restorer, &oact->sa_restorer))
+ __put_user(old_ka.sa.sa_restorer, &oact->sa_restorer) ||
+ __put_user(old_ka.sa.sa_flags, &oact->sa_flags) ||
+ __put_user(old_ka.sa.sa_mask.sig[0], &oact->sa_mask))
return -EFAULT;
- __put_user(old_ka.sa.sa_flags, &oact->sa_flags);
- __put_user(old_ka.sa.sa_mask.sig[0], &oact->sa_mask);
}
return ret;
{
unsigned int err = 0;
+ /* Always make any pending restarted system calls return -EINTR */
+ current_thread_info()->restart_block.fn = do_no_restart_syscall;
+
if (is_using_fpu(current))
fpu_kill_state(current);
regs->d0 = sig;
regs->d1 = (unsigned long) &frame->sc;
- set_fs(USER_DS);
-
/* the tracer may want to single-step inside the handler */
if (test_thread_flag(TIF_SINGLESTEP))
ptrace_notify(SIGTRAP);
return 0;
give_sigsegv:
- force_sig(SIGSEGV, current);
+ force_sigsegv(sig, current);
return -EFAULT;
}
regs->d0 = sig;
regs->d1 = (long) &frame->info;
- set_fs(USER_DS);
-
/* the tracer may want to single-step inside the handler */
if (test_thread_flag(TIF_SINGLESTEP))
ptrace_notify(SIGTRAP);
return 0;
give_sigsegv:
- force_sig(SIGSEGV, current);
+ force_sigsegv(sig, current);
return -EFAULT;
}
+static inline void stepback(struct pt_regs *regs)
+{
+ regs->pc -= 2;
+ regs->orig_d0 = -1;
+}
+
/*
* handle the actual delivery of a signal to userspace
*/
/* fallthrough */
case -ERESTARTNOINTR:
regs->d0 = regs->orig_d0;
- regs->pc -= 2;
+ stepback(regs);
}
}
case -ERESTARTSYS:
case -ERESTARTNOINTR:
regs->d0 = regs->orig_d0;
- regs->pc -= 2;
+ stepback(regs);
break;
case -ERESTART_RESTARTBLOCK:
regs->d0 = __NR_restart_syscall;
- regs->pc -= 2;
+ stepback(regs);
break;
}
}
# Makefile for the MN10300-specific memory management code
#
+cacheflush-y := cache.o cache-mn10300.o
+cacheflush-$(CONFIG_MN10300_CACHE_WBACK) += cache-flush-mn10300.o
+
+cacheflush-$(CONFIG_MN10300_CACHE_DISABLED) := cache-disabled.o
+
obj-y := \
init.o fault.o pgtable.o extable.o tlb-mn10300.o mmu-context.o \
- misalignment.o dma-alloc.o
-
-ifneq ($(CONFIG_MN10300_CACHE_DISABLED),y)
-obj-y += cache.o cache-mn10300.o
-ifeq ($(CONFIG_MN10300_CACHE_WBACK),y)
-obj-y += cache-flush-mn10300.o
-endif
-endif
+ misalignment.o dma-alloc.o $(cacheflush-y)
-/* Performance event handling
+/* Handle the cache being disabled
*
- * Copyright (C) 2009 Red Hat, Inc. All Rights Reserved.
+ * Copyright (C) 2010 Red Hat, Inc. All Rights Reserved.
* Written by David Howells (dhowells@redhat.com)
*
* This program is free software; you can redistribute it and/or
* as published by the Free Software Foundation; either version
* 2 of the Licence, or (at your option) any later version.
*/
-
-#include <linux/perf_event.h>
+#include <linux/mm.h>
/*
- * mark the performance event as pending
+ * allow userspace to flush the instruction cache
*/
-void set_perf_event_pending(void)
+asmlinkage long sys_cacheflush(unsigned long start, unsigned long end)
{
+ if (end < start)
+ return -EINVAL;
+ return 0;
}
void flush_icache_range(unsigned long start, unsigned long end)
{
#ifdef CONFIG_MN10300_CACHE_WBACK
- unsigned long addr, size, off;
+ unsigned long addr, size, base, off;
struct page *page;
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *ppte, pte;
+ if (end > 0x80000000UL) {
+ /* addresses above 0xa0000000 do not go through the cache */
+ if (end > 0xa0000000UL) {
+ end = 0xa0000000UL;
+ if (start >= end)
+ return;
+ }
+
+ /* kernel addresses between 0x80000000 and 0x9fffffff do not
+ * require page tables, so we just map such addresses directly */
+ base = (start >= 0x80000000UL) ? start : 0x80000000UL;
+ mn10300_dcache_flush_range(base, end);
+ if (base == start)
+ goto invalidate;
+ end = base;
+ }
+
for (; start < end; start += size) {
/* work out how much of the page to flush */
off = start & (PAGE_SIZE - 1);
}
#endif
+invalidate:
mn10300_icache_inv();
}
EXPORT_SYMBOL(flush_icache_range);
select RTC_DRV_GENERIC
select INIT_ALL_POSSIBLE
select BUG
+ select HAVE_IRQ_WORK
select HAVE_PERF_EVENTS
select GENERIC_ATOMIC64 if !64BIT
help
return (u32)(unsigned long)uptr;
}
-static __inline__ void __user *compat_alloc_user_space(long len)
+static __inline__ void __user *arch_compat_alloc_user_space(long len)
{
struct pt_regs *regs = ¤t->thread.regs;
return (void __user *)regs->gr[30];
#ifndef __ASM_PARISC_PERF_EVENT_H
#define __ASM_PARISC_PERF_EVENT_H
-/* parisc only supports software events through this interface. */
-static inline void set_perf_event_pending(void) { }
+/* Empty, just to avoid compiling error */
#endif /* __ASM_PARISC_PERF_EVENT_H */
nsyms = newptr - (Elf_Sym *)symhdr->sh_addr;
DEBUGP("NEW num_symtab %lu\n", nsyms);
symhdr->sh_size = nsyms * sizeof(Elf_Sym);
- return module_bug_finalize(hdr, sechdrs, me);
+ return 0;
}
void module_arch_cleanup(struct module *mod)
{
deregister_unwind_table(mod);
- module_bug_cleanup(mod);
}
select HAVE_OPROFILE
select HAVE_SYSCALL_WRAPPERS if PPC64
select GENERIC_ATOMIC64 if PPC32
+ select HAVE_IRQ_WORK
select HAVE_PERF_EVENTS
select HAVE_REGS_AND_STACK_ACCESS_API
select HAVE_HW_BREAKPOINT if PERF_EVENTS && PPC_BOOK3S_64
return (u32)(unsigned long)uptr;
}
-static inline void __user *compat_alloc_user_space(long len)
+static inline void __user *arch_compat_alloc_user_space(long len)
{
struct pt_regs *regs = current->thread.regs;
unsigned long usp = regs->gpr[1];
#ifndef __ARCH_POWERPC_ASM_FSLDMA_H__
#define __ARCH_POWERPC_ASM_FSLDMA_H__
+#include <linux/slab.h>
#include <linux/dmaengine.h>
/*
u8 soft_enabled; /* irq soft-enable flag */
u8 hard_enabled; /* set if irqs are enabled in MSR */
u8 io_sync; /* writel() needs spin_unlock sync */
- u8 perf_event_pending; /* PM interrupt while soft-disabled */
+ u8 irq_work_pending; /* IRQ_WORK interrupt while soft-disable */
/* Stuff for accurate time accounting */
u64 user_time; /* accumulated usermode TB ticks */
#define PTRRELOC(x) ((typeof(x)) add_reloc_offset((unsigned long)(x)))
-#ifdef CONFIG_VIRT_CPU_ACCOUNTING
-extern void account_system_vtime(struct task_struct *);
-#endif
-
extern struct dentry *powerpc_debugfs_root;
#endif /* __KERNEL__ */
const Elf_Shdr *sechdrs, struct module *me)
{
const Elf_Shdr *sect;
- int err;
-
- err = module_bug_finalize(hdr, sechdrs, me);
- if (err)
- return err;
/* Apply feature fixups */
sect = find_section(hdr, sechdrs, "__ftr_fixup");
void module_arch_cleanup(struct module *mod)
{
- module_bug_cleanup(mod);
}
#include "ppc32.h"
#endif
-/*
- * Store another value in a callchain_entry.
- */
-static inline void callchain_store(struct perf_callchain_entry *entry, u64 ip)
-{
- unsigned int nr = entry->nr;
-
- if (nr < PERF_MAX_STACK_DEPTH) {
- entry->ip[nr] = ip;
- entry->nr = nr + 1;
- }
-}
/*
* Is sp valid as the address of the next kernel stack frame after prev_sp?
return 0;
}
-static void perf_callchain_kernel(struct pt_regs *regs,
- struct perf_callchain_entry *entry)
+void
+perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
{
unsigned long sp, next_sp;
unsigned long next_ip;
lr = regs->link;
sp = regs->gpr[1];
- callchain_store(entry, PERF_CONTEXT_KERNEL);
- callchain_store(entry, regs->nip);
+ perf_callchain_store(entry, regs->nip);
if (!validate_sp(sp, current, STACK_FRAME_OVERHEAD))
return;
next_ip = regs->nip;
lr = regs->link;
level = 0;
- callchain_store(entry, PERF_CONTEXT_KERNEL);
+ perf_callchain_store(entry, PERF_CONTEXT_KERNEL);
} else {
if (level == 0)
++level;
}
- callchain_store(entry, next_ip);
+ perf_callchain_store(entry, next_ip);
if (!valid_next_sp(next_sp, sp))
return;
sp = next_sp;
puc == (unsigned long) &sf->uc;
}
-static void perf_callchain_user_64(struct pt_regs *regs,
- struct perf_callchain_entry *entry)
+static void perf_callchain_user_64(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
{
unsigned long sp, next_sp;
unsigned long next_ip;
next_ip = regs->nip;
lr = regs->link;
sp = regs->gpr[1];
- callchain_store(entry, PERF_CONTEXT_USER);
- callchain_store(entry, next_ip);
+ perf_callchain_store(entry, next_ip);
for (;;) {
fp = (unsigned long __user *) sp;
read_user_stack_64(&uregs[PT_R1], &sp))
return;
level = 0;
- callchain_store(entry, PERF_CONTEXT_USER);
- callchain_store(entry, next_ip);
+ perf_callchain_store(entry, PERF_CONTEXT_USER);
+ perf_callchain_store(entry, next_ip);
continue;
}
if (level == 0)
next_ip = lr;
- callchain_store(entry, next_ip);
+ perf_callchain_store(entry, next_ip);
++level;
sp = next_sp;
}
return __get_user_inatomic(*ret, ptr);
}
-static inline void perf_callchain_user_64(struct pt_regs *regs,
- struct perf_callchain_entry *entry)
+static inline void perf_callchain_user_64(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
{
}
return mctx->mc_gregs;
}
-static void perf_callchain_user_32(struct pt_regs *regs,
- struct perf_callchain_entry *entry)
+static void perf_callchain_user_32(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
{
unsigned int sp, next_sp;
unsigned int next_ip;
next_ip = regs->nip;
lr = regs->link;
sp = regs->gpr[1];
- callchain_store(entry, PERF_CONTEXT_USER);
- callchain_store(entry, next_ip);
+ perf_callchain_store(entry, next_ip);
while (entry->nr < PERF_MAX_STACK_DEPTH) {
fp = (unsigned int __user *) (unsigned long) sp;
read_user_stack_32(&uregs[PT_R1], &sp))
return;
level = 0;
- callchain_store(entry, PERF_CONTEXT_USER);
- callchain_store(entry, next_ip);
+ perf_callchain_store(entry, PERF_CONTEXT_USER);
+ perf_callchain_store(entry, next_ip);
continue;
}
if (level == 0)
next_ip = lr;
- callchain_store(entry, next_ip);
+ perf_callchain_store(entry, next_ip);
++level;
sp = next_sp;
}
}
-/*
- * Since we can't get PMU interrupts inside a PMU interrupt handler,
- * we don't need separate irq and nmi entries here.
- */
-static DEFINE_PER_CPU(struct perf_callchain_entry, cpu_perf_callchain);
-
-struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
+void
+perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
{
- struct perf_callchain_entry *entry = &__get_cpu_var(cpu_perf_callchain);
-
- entry->nr = 0;
-
- if (!user_mode(regs)) {
- perf_callchain_kernel(regs, entry);
- if (current->mm)
- regs = task_pt_regs(current);
- else
- regs = NULL;
- }
-
- if (regs) {
- if (current_is_64bit())
- perf_callchain_user_64(regs, entry);
- else
- perf_callchain_user_32(regs, entry);
- }
-
- return entry;
+ if (current_is_64bit())
+ perf_callchain_user_64(entry, regs);
+ else
+ perf_callchain_user_32(entry, regs);
}
{
s64 val, delta, prev;
+ if (event->hw.state & PERF_HES_STOPPED)
+ return;
+
if (!event->hw.idx)
return;
/*
* Disable all events to prevent PMU interrupts and to allow
* events to be added or removed.
*/
-void hw_perf_disable(void)
+static void power_pmu_disable(struct pmu *pmu)
{
struct cpu_hw_events *cpuhw;
unsigned long flags;
* If we were previously disabled and events were added, then
* put the new config on the PMU.
*/
-void hw_perf_enable(void)
+static void power_pmu_enable(struct pmu *pmu)
{
struct perf_event *event;
struct cpu_hw_events *cpuhw;
}
local64_set(&event->hw.prev_count, val);
event->hw.idx = idx;
+ if (event->hw.state & PERF_HES_STOPPED)
+ val = 0;
write_pmc(idx, val);
perf_event_update_userpage(event);
}
* re-enable the PMU in order to get hw_perf_enable to do the
* actual work of reconfiguring the PMU.
*/
-static int power_pmu_enable(struct perf_event *event)
+static int power_pmu_add(struct perf_event *event, int ef_flags)
{
struct cpu_hw_events *cpuhw;
unsigned long flags;
int ret = -EAGAIN;
local_irq_save(flags);
- perf_disable();
+ perf_pmu_disable(event->pmu);
/*
* Add the event to the list (if there is room)
cpuhw->events[n0] = event->hw.config;
cpuhw->flags[n0] = event->hw.event_base;
+ if (!(ef_flags & PERF_EF_START))
+ event->hw.state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
/*
* If group events scheduling transaction was started,
* skip the schedulability test here, it will be peformed
ret = 0;
out:
- perf_enable();
+ perf_pmu_enable(event->pmu);
local_irq_restore(flags);
return ret;
}
/*
* Remove a event from the PMU.
*/
-static void power_pmu_disable(struct perf_event *event)
+static void power_pmu_del(struct perf_event *event, int ef_flags)
{
struct cpu_hw_events *cpuhw;
long i;
unsigned long flags;
local_irq_save(flags);
- perf_disable();
+ perf_pmu_disable(event->pmu);
power_pmu_read(event);
cpuhw->mmcr[0] &= ~(MMCR0_PMXE | MMCR0_FCECE);
}
- perf_enable();
+ perf_pmu_enable(event->pmu);
local_irq_restore(flags);
}
/*
- * Re-enable interrupts on a event after they were throttled
- * because they were coming too fast.
+ * POWER-PMU does not support disabling individual counters, hence
+ * program their cycle counter to their max value and ignore the interrupts.
*/
-static void power_pmu_unthrottle(struct perf_event *event)
+
+static void power_pmu_start(struct perf_event *event, int ef_flags)
+{
+ unsigned long flags;
+ s64 left;
+
+ if (!event->hw.idx || !event->hw.sample_period)
+ return;
+
+ if (!(event->hw.state & PERF_HES_STOPPED))
+ return;
+
+ if (ef_flags & PERF_EF_RELOAD)
+ WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+
+ local_irq_save(flags);
+ perf_pmu_disable(event->pmu);
+
+ event->hw.state = 0;
+ left = local64_read(&event->hw.period_left);
+ write_pmc(event->hw.idx, left);
+
+ perf_event_update_userpage(event);
+ perf_pmu_enable(event->pmu);
+ local_irq_restore(flags);
+}
+
+static void power_pmu_stop(struct perf_event *event, int ef_flags)
{
- s64 val, left;
unsigned long flags;
if (!event->hw.idx || !event->hw.sample_period)
return;
+
+ if (event->hw.state & PERF_HES_STOPPED)
+ return;
+
local_irq_save(flags);
- perf_disable();
+ perf_pmu_disable(event->pmu);
+
power_pmu_read(event);
- left = event->hw.sample_period;
- event->hw.last_period = left;
- val = 0;
- if (left < 0x80000000L)
- val = 0x80000000L - left;
- write_pmc(event->hw.idx, val);
- local64_set(&event->hw.prev_count, val);
- local64_set(&event->hw.period_left, left);
+ event->hw.state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+ write_pmc(event->hw.idx, 0);
+
perf_event_update_userpage(event);
- perf_enable();
+ perf_pmu_enable(event->pmu);
local_irq_restore(flags);
}
* Set the flag to make pmu::enable() not perform the
* schedulability test, it will be performed at commit time
*/
-void power_pmu_start_txn(const struct pmu *pmu)
+void power_pmu_start_txn(struct pmu *pmu)
{
struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
+ perf_pmu_disable(pmu);
cpuhw->group_flag |= PERF_EVENT_TXN;
cpuhw->n_txn_start = cpuhw->n_events;
}
* Clear the flag and pmu::enable() will perform the
* schedulability test.
*/
-void power_pmu_cancel_txn(const struct pmu *pmu)
+void power_pmu_cancel_txn(struct pmu *pmu)
{
struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
cpuhw->group_flag &= ~PERF_EVENT_TXN;
+ perf_pmu_enable(pmu);
}
/*
* Perform the group schedulability test as a whole
* Return 0 if success
*/
-int power_pmu_commit_txn(const struct pmu *pmu)
+int power_pmu_commit_txn(struct pmu *pmu)
{
struct cpu_hw_events *cpuhw;
long i, n;
cpuhw->event[i]->hw.config = cpuhw->events[i];
cpuhw->group_flag &= ~PERF_EVENT_TXN;
+ perf_pmu_enable(pmu);
return 0;
}
-struct pmu power_pmu = {
- .enable = power_pmu_enable,
- .disable = power_pmu_disable,
- .read = power_pmu_read,
- .unthrottle = power_pmu_unthrottle,
- .start_txn = power_pmu_start_txn,
- .cancel_txn = power_pmu_cancel_txn,
- .commit_txn = power_pmu_commit_txn,
-};
-
/*
* Return 1 if we might be able to put event on a limited PMC,
* or 0 if not.
return 0;
}
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+static int power_pmu_event_init(struct perf_event *event)
{
u64 ev;
unsigned long flags;
struct cpu_hw_events *cpuhw;
if (!ppmu)
- return ERR_PTR(-ENXIO);
+ return -ENOENT;
+
switch (event->attr.type) {
case PERF_TYPE_HARDWARE:
ev = event->attr.config;
if (ev >= ppmu->n_generic || ppmu->generic_events[ev] == 0)
- return ERR_PTR(-EOPNOTSUPP);
+ return -EOPNOTSUPP;
ev = ppmu->generic_events[ev];
break;
case PERF_TYPE_HW_CACHE:
err = hw_perf_cache_event(event->attr.config, &ev);
if (err)
- return ERR_PTR(err);
+ return err;
break;
case PERF_TYPE_RAW:
ev = event->attr.config;
break;
default:
- return ERR_PTR(-EINVAL);
+ return -ENOENT;
}
+
event->hw.config_base = ev;
event->hw.idx = 0;
* XXX we should check if the task is an idle task.
*/
flags = 0;
- if (event->ctx->task)
+ if (event->attach_state & PERF_ATTACH_TASK)
flags |= PPMU_ONLY_COUNT_RUN;
/*
*/
ev = normal_pmc_alternative(ev, flags);
if (!ev)
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
}
}
n = collect_events(event->group_leader, ppmu->n_counter - 1,
ctrs, events, cflags);
if (n < 0)
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
}
events[n] = ev;
ctrs[n] = event;
cflags[n] = flags;
if (check_excludes(ctrs, cflags, n, 1))
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
cpuhw = &get_cpu_var(cpu_hw_events);
err = power_check_constraints(cpuhw, events, cflags, n + 1);
put_cpu_var(cpu_hw_events);
if (err)
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
event->hw.config = events[n];
event->hw.event_base = cflags[n];
}
event->destroy = hw_perf_event_destroy;
- if (err)
- return ERR_PTR(err);
- return &power_pmu;
+ return err;
}
+struct pmu power_pmu = {
+ .pmu_enable = power_pmu_enable,
+ .pmu_disable = power_pmu_disable,
+ .event_init = power_pmu_event_init,
+ .add = power_pmu_add,
+ .del = power_pmu_del,
+ .start = power_pmu_start,
+ .stop = power_pmu_stop,
+ .read = power_pmu_read,
+ .start_txn = power_pmu_start_txn,
+ .cancel_txn = power_pmu_cancel_txn,
+ .commit_txn = power_pmu_commit_txn,
+};
+
/*
* A counter has overflowed; update its count and record
* things if requested. Note that interrupts are hard-disabled
s64 prev, delta, left;
int record = 0;
+ if (event->hw.state & PERF_HES_STOPPED) {
+ write_pmc(event->hw.idx, 0);
+ return;
+ }
+
/* we don't have to worry about interrupts here */
prev = local64_read(&event->hw.prev_count);
delta = (val - prev) & 0xfffffffful;
val = 0x80000000LL - left;
}
+ write_pmc(event->hw.idx, val);
+ local64_set(&event->hw.prev_count, val);
+ local64_set(&event->hw.period_left, left);
+ perf_event_update_userpage(event);
+
/*
* Finally record data if requested.
*/
if (event->attr.sample_type & PERF_SAMPLE_ADDR)
perf_get_data_addr(regs, &data.addr);
- if (perf_event_overflow(event, nmi, &data, regs)) {
- /*
- * Interrupts are coming too fast - throttle them
- * by setting the event to 0, so it will be
- * at least 2^30 cycles until the next interrupt
- * (assuming each event counts at most 2 counts
- * per cycle).
- */
- val = 0;
- left = ~0ULL >> 1;
- }
+ if (perf_event_overflow(event, nmi, &data, regs))
+ power_pmu_stop(event, 0);
}
-
- write_pmc(event->hw.idx, val);
- local64_set(&event->hw.prev_count, val);
- local64_set(&event->hw.period_left, left);
- perf_event_update_userpage(event);
}
/*
freeze_events_kernel = MMCR0_FCHV;
#endif /* CONFIG_PPC64 */
+ perf_pmu_register(&power_pmu);
perf_cpu_notifier(power_pmu_notifier);
return 0;
{
s64 val, delta, prev;
+ if (event->hw.state & PERF_HES_STOPPED)
+ return;
+
/*
* Performance monitor interrupts come even when interrupts
* are soft-disabled, as long as interrupts are hard-enabled.
* Disable all events to prevent PMU interrupts and to allow
* events to be added or removed.
*/
-void hw_perf_disable(void)
+static void fsl_emb_pmu_disable(struct pmu *pmu)
{
struct cpu_hw_events *cpuhw;
unsigned long flags;
* If we were previously disabled and events were added, then
* put the new config on the PMU.
*/
-void hw_perf_enable(void)
+static void fsl_emb_pmu_enable(struct pmu *pmu)
{
struct cpu_hw_events *cpuhw;
unsigned long flags;
return n;
}
-/* perf must be disabled, context locked on entry */
-static int fsl_emb_pmu_enable(struct perf_event *event)
+/* context locked on entry */
+static int fsl_emb_pmu_add(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuhw;
int ret = -EAGAIN;
u64 val;
int i;
+ perf_pmu_disable(event->pmu);
cpuhw = &get_cpu_var(cpu_hw_events);
if (event->hw.config & FSL_EMB_EVENT_RESTRICTED)
val = 0x80000000L - left;
}
local64_set(&event->hw.prev_count, val);
+
+ if (!(flags & PERF_EF_START)) {
+ event->hw.state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+ val = 0;
+ }
+
write_pmc(i, val);
perf_event_update_userpage(event);
ret = 0;
out:
put_cpu_var(cpu_hw_events);
+ perf_pmu_enable(event->pmu);
return ret;
}
-/* perf must be disabled, context locked on entry */
-static void fsl_emb_pmu_disable(struct perf_event *event)
+/* context locked on entry */
+static void fsl_emb_pmu_del(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuhw;
int i = event->hw.idx;
+ perf_pmu_disable(event->pmu);
if (i < 0)
goto out;
cpuhw->n_events--;
out:
+ perf_pmu_enable(event->pmu);
put_cpu_var(cpu_hw_events);
}
-/*
- * Re-enable interrupts on a event after they were throttled
- * because they were coming too fast.
- *
- * Context is locked on entry, but perf is not disabled.
- */
-static void fsl_emb_pmu_unthrottle(struct perf_event *event)
+static void fsl_emb_pmu_start(struct perf_event *event, int ef_flags)
{
- s64 val, left;
unsigned long flags;
+ s64 left;
if (event->hw.idx < 0 || !event->hw.sample_period)
return;
+
+ if (!(event->hw.state & PERF_HES_STOPPED))
+ return;
+
+ if (ef_flags & PERF_EF_RELOAD)
+ WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+
local_irq_save(flags);
- perf_disable();
- fsl_emb_pmu_read(event);
- left = event->hw.sample_period;
- event->hw.last_period = left;
- val = 0;
- if (left < 0x80000000L)
- val = 0x80000000L - left;
- write_pmc(event->hw.idx, val);
- local64_set(&event->hw.prev_count, val);
- local64_set(&event->hw.period_left, left);
+ perf_pmu_disable(event->pmu);
+
+ event->hw.state = 0;
+ left = local64_read(&event->hw.period_left);
+ write_pmc(event->hw.idx, left);
+
perf_event_update_userpage(event);
- perf_enable();
+ perf_pmu_enable(event->pmu);
local_irq_restore(flags);
}
-static struct pmu fsl_emb_pmu = {
- .enable = fsl_emb_pmu_enable,
- .disable = fsl_emb_pmu_disable,
- .read = fsl_emb_pmu_read,
- .unthrottle = fsl_emb_pmu_unthrottle,
-};
+static void fsl_emb_pmu_stop(struct perf_event *event, int ef_flags)
+{
+ unsigned long flags;
+
+ if (event->hw.idx < 0 || !event->hw.sample_period)
+ return;
+
+ if (event->hw.state & PERF_HES_STOPPED)
+ return;
+
+ local_irq_save(flags);
+ perf_pmu_disable(event->pmu);
+
+ fsl_emb_pmu_read(event);
+ event->hw.state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+ write_pmc(event->hw.idx, 0);
+
+ perf_event_update_userpage(event);
+ perf_pmu_enable(event->pmu);
+ local_irq_restore(flags);
+}
/*
* Release the PMU if this is the last perf_event.
return 0;
}
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+static int fsl_emb_pmu_event_init(struct perf_event *event)
{
u64 ev;
struct perf_event *events[MAX_HWEVENTS];
case PERF_TYPE_HARDWARE:
ev = event->attr.config;
if (ev >= ppmu->n_generic || ppmu->generic_events[ev] == 0)
- return ERR_PTR(-EOPNOTSUPP);
+ return -EOPNOTSUPP;
ev = ppmu->generic_events[ev];
break;
case PERF_TYPE_HW_CACHE:
err = hw_perf_cache_event(event->attr.config, &ev);
if (err)
- return ERR_PTR(err);
+ return err;
break;
case PERF_TYPE_RAW:
break;
default:
- return ERR_PTR(-EINVAL);
+ return -ENOENT;
}
event->hw.config = ppmu->xlate_event(ev);
if (!(event->hw.config & FSL_EMB_EVENT_VALID))
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
/*
* If this is in a group, check if it can go on with all the
n = collect_events(event->group_leader,
ppmu->n_counter - 1, events);
if (n < 0)
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
}
if (event->hw.config & FSL_EMB_EVENT_RESTRICTED) {
}
if (num_restricted >= ppmu->n_restricted)
- return ERR_PTR(-EINVAL);
+ return -EINVAL;
}
event->hw.idx = -1;
if (event->attr.exclude_kernel)
event->hw.config_base |= PMLCA_FCS;
if (event->attr.exclude_idle)
- return ERR_PTR(-ENOTSUPP);
+ return -ENOTSUPP;
event->hw.last_period = event->hw.sample_period;
local64_set(&event->hw.period_left, event->hw.last_period);
}
event->destroy = hw_perf_event_destroy;
- if (err)
- return ERR_PTR(err);
- return &fsl_emb_pmu;
+ return err;
}
+static struct pmu fsl_emb_pmu = {
+ .pmu_enable = fsl_emb_pmu_enable,
+ .pmu_disable = fsl_emb_pmu_disable,
+ .event_init = fsl_emb_pmu_event_init,
+ .add = fsl_emb_pmu_add,
+ .del = fsl_emb_pmu_del,
+ .start = fsl_emb_pmu_start,
+ .stop = fsl_emb_pmu_stop,
+ .read = fsl_emb_pmu_read,
+};
+
/*
* A counter has overflowed; update its count and record
* things if requested. Note that interrupts are hard-disabled
s64 prev, delta, left;
int record = 0;
+ if (event->hw.state & PERF_HES_STOPPED) {
+ write_pmc(event->hw.idx, 0);
+ return;
+ }
+
/* we don't have to worry about interrupts here */
prev = local64_read(&event->hw.prev_count);
delta = (val - prev) & 0xfffffffful;
val = 0x80000000LL - left;
}
+ write_pmc(event->hw.idx, val);
+ local64_set(&event->hw.prev_count, val);
+ local64_set(&event->hw.period_left, left);
+ perf_event_update_userpage(event);
+
/*
* Finally record data if requested.
*/
perf_sample_data_init(&data, 0);
data.period = event->hw.last_period;
- if (perf_event_overflow(event, nmi, &data, regs)) {
- /*
- * Interrupts are coming too fast - throttle them
- * by setting the event to 0, so it will be
- * at least 2^30 cycles until the next interrupt
- * (assuming each event counts at most 2 counts
- * per cycle).
- */
- val = 0;
- left = ~0ULL >> 1;
- }
+ if (perf_event_overflow(event, nmi, &data, regs))
+ fsl_emb_pmu_stop(event, 0);
}
-
- write_pmc(event->hw.idx, val);
- local64_set(&event->hw.prev_count, val);
- local64_set(&event->hw.period_left, left);
- perf_event_update_userpage(event);
}
static void perf_event_interrupt(struct pt_regs *regs)
pr_info("%s performance monitor hardware support registered\n",
pmu->name);
+ perf_pmu_register(&fsl_emb_pmu);
+
return 0;
}
ti->local_flags &= ~_TLF_RESTORE_SIGMASK;
sigprocmask(SIG_SETMASK, ¤t->saved_sigmask, NULL);
}
+ regs->trap = 0;
return 0; /* no signals delivered */
}
ret = handle_rt_signal64(signr, &ka, &info, oldset, regs);
}
+ regs->trap = 0;
if (ret) {
spin_lock_irq(¤t->sighand->siglock);
sigorsets(¤t->blocked, ¤t->blocked,
if (!sig)
save_r2 = (unsigned int)regs->gpr[2];
err = restore_general_regs(regs, sr);
+ regs->trap = 0;
err |= __get_user(msr, &sr->mc_gregs[PT_MSR]);
if (!sig)
regs->gpr[2] = (unsigned long) save_r2;
regs->nip = (unsigned long) ka->sa.sa_handler;
/* enter the signal handler in big-endian mode */
regs->msr &= ~MSR_LE;
- regs->trap = 0;
return 1;
badframe:
regs->nip = (unsigned long) ka->sa.sa_handler;
/* enter the signal handler in big-endian mode */
regs->msr &= ~MSR_LE;
- regs->trap = 0;
return 1;
err |= __get_user(regs->xer, &sc->gp_regs[PT_XER]);
err |= __get_user(regs->ccr, &sc->gp_regs[PT_CCR]);
/* skip SOFTE */
- err |= __get_user(regs->trap, &sc->gp_regs[PT_TRAP]);
+ regs->trap = 0;
err |= __get_user(regs->dar, &sc->gp_regs[PT_DAR]);
err |= __get_user(regs->dsisr, &sc->gp_regs[PT_DSISR]);
err |= __get_user(regs->result, &sc->gp_regs[PT_RESULT]);
#include <linux/posix-timers.h>
#include <linux/irq.h>
#include <linux/delay.h>
-#include <linux/perf_event.h>
+#include <linux/irq_work.h>
#include <asm/trace.h>
#include <asm/io.h>
}
#endif /* CONFIG_PPC_ISERIES */
-#ifdef CONFIG_PERF_EVENTS
+#ifdef CONFIG_IRQ_WORK
/*
* 64-bit uses a byte in the PACA, 32-bit uses a per-cpu variable...
*/
#ifdef CONFIG_PPC64
-static inline unsigned long test_perf_event_pending(void)
+static inline unsigned long test_irq_work_pending(void)
{
unsigned long x;
asm volatile("lbz %0,%1(13)"
: "=r" (x)
- : "i" (offsetof(struct paca_struct, perf_event_pending)));
+ : "i" (offsetof(struct paca_struct, irq_work_pending)));
return x;
}
-static inline void set_perf_event_pending_flag(void)
+static inline void set_irq_work_pending_flag(void)
{
asm volatile("stb %0,%1(13)" : :
"r" (1),
- "i" (offsetof(struct paca_struct, perf_event_pending)));
+ "i" (offsetof(struct paca_struct, irq_work_pending)));
}
-static inline void clear_perf_event_pending(void)
+static inline void clear_irq_work_pending(void)
{
asm volatile("stb %0,%1(13)" : :
"r" (0),
- "i" (offsetof(struct paca_struct, perf_event_pending)));
+ "i" (offsetof(struct paca_struct, irq_work_pending)));
}
#else /* 32-bit */
-DEFINE_PER_CPU(u8, perf_event_pending);
+DEFINE_PER_CPU(u8, irq_work_pending);
-#define set_perf_event_pending_flag() __get_cpu_var(perf_event_pending) = 1
-#define test_perf_event_pending() __get_cpu_var(perf_event_pending)
-#define clear_perf_event_pending() __get_cpu_var(perf_event_pending) = 0
+#define set_irq_work_pending_flag() __get_cpu_var(irq_work_pending) = 1
+#define test_irq_work_pending() __get_cpu_var(irq_work_pending)
+#define clear_irq_work_pending() __get_cpu_var(irq_work_pending) = 0
#endif /* 32 vs 64 bit */
-void set_perf_event_pending(void)
+void set_irq_work_pending(void)
{
preempt_disable();
- set_perf_event_pending_flag();
+ set_irq_work_pending_flag();
set_dec(1);
preempt_enable();
}
-#else /* CONFIG_PERF_EVENTS */
+#else /* CONFIG_IRQ_WORK */
-#define test_perf_event_pending() 0
-#define clear_perf_event_pending()
+#define test_irq_work_pending() 0
+#define clear_irq_work_pending()
-#endif /* CONFIG_PERF_EVENTS */
+#endif /* CONFIG_IRQ_WORK */
/*
* For iSeries shared processors, we have to let the hypervisor
calculate_steal_time();
- if (test_perf_event_pending()) {
- clear_perf_event_pending();
- perf_event_do_pending();
+ if (test_irq_work_pending()) {
+ clear_irq_work_pending();
+ irq_work_run();
}
#ifdef CONFIG_PPC_ISERIES
int id_match = 0;
if (dev == NULL || id == NULL)
- return NULL;
+ return clk;
mutex_lock(&clocks_mutex);
list_for_each_entry(p, &clocks, node) {
if (bus_range == NULL || len < 2 * sizeof(int)) {
printk(KERN_WARNING EFIKA_PLATFORM_NAME
": Can't get bus-range for %s\n", pcictrl->full_name);
- return;
+ goto out_put;
}
if (bus_range[1] == bus_range[0])
printk(" controlled by %s\n", pcictrl->full_name);
printk("\n");
- hose = pcibios_alloc_controller(of_node_get(pcictrl));
+ hose = pcibios_alloc_controller(pcictrl);
if (!hose) {
printk(KERN_WARNING EFIKA_PLATFORM_NAME
": Can't allocate PCI controller structure for %s\n",
pcictrl->full_name);
- return;
+ goto out_put;
}
hose->first_busno = bus_range[0];
hose->ops = &rtas_pci_ops;
pci_process_bridge_OF_ranges(hose, pcictrl, 0);
+ return;
+out_put:
+ of_node_put(pcictrl);
}
#else
clrbits32(&simple_gpio->simple_dvo, sync | out);
clrbits8(&wkup_gpio->wkup_dvo, reset);
- /* wait at lease 1 us */
- udelay(2);
+ /* wait for 1 us */
+ udelay(1);
/* Deassert reset */
setbits8(&wkup_gpio->wkup_dvo, reset);
+ /* wait at least 200ns */
+ /* 7 ~= (200ns * timebase) / ns2sec */
+ __delay(7);
+
/* Restore pin-muxing */
out_be32(&simple_gpio->port_config, mux);
select HAVE_KVM if 64BIT
select HAVE_ARCH_TRACEHOOK
select INIT_ALL_POSSIBLE
+ select HAVE_IRQ_WORK
select HAVE_PERF_EVENTS
select HAVE_KERNEL_GZIP
select HAVE_KERNEL_BZIP2
can be controlled through /sys/devices/system/cpu/cpu#.
Say N if you want to disable CPU hotplug.
+config SCHED_BOOK
+ bool "Book scheduler support"
+ depends on SMP
+ help
+ Book scheduler support improves the CPU scheduler's decision making
+ when dealing with machines that have several books.
+
config MATHEMU
bool "IEEE FPU emulation"
depends on MARCH_G5
#endif
-static inline void __user *compat_alloc_user_space(long len)
+static inline void __user *arch_compat_alloc_user_space(long len)
{
unsigned long stack;
#ifndef __ASM_HARDIRQ_H
#define __ASM_HARDIRQ_H
-#include <linux/threads.h>
-#include <linux/sched.h>
-#include <linux/cache.h>
-#include <linux/interrupt.h>
#include <asm/lowcore.h>
#define local_softirq_pending() (S390_lowcore.softirq_pending)
* Copyright 2009 Martin Schwidefsky, IBM Corporation.
*/
-static inline void set_perf_event_pending(void) {}
-static inline void clear_perf_event_pending(void) {}
+/* Empty, just to avoid compiling error */
#define PERF_EVENT_INDEX_OFFSET 0
extern void account_vtime(struct task_struct *, struct task_struct *);
extern void account_tick_vtime(struct task_struct *);
-extern void account_system_vtime(struct task_struct *);
#ifdef CONFIG_PFAULT
extern void pfault_irq_init(void);
#include <linux/cpumask.h>
-#define mc_capable() (1)
-
-const struct cpumask *cpu_coregroup_mask(unsigned int cpu);
-
extern unsigned char cpu_core_id[NR_CPUS];
extern cpumask_t cpu_core_map[NR_CPUS];
+static inline const struct cpumask *cpu_coregroup_mask(unsigned int cpu)
+{
+ return &cpu_core_map[cpu];
+}
+
#define topology_core_id(cpu) (cpu_core_id[cpu])
#define topology_core_cpumask(cpu) (&cpu_core_map[cpu])
+#define mc_capable() (1)
+
+#ifdef CONFIG_SCHED_BOOK
+
+extern unsigned char cpu_book_id[NR_CPUS];
+extern cpumask_t cpu_book_map[NR_CPUS];
+
+static inline const struct cpumask *cpu_book_mask(unsigned int cpu)
+{
+ return &cpu_book_map[cpu];
+}
+
+#define topology_book_id(cpu) (cpu_book_id[cpu])
+#define topology_book_cpumask(cpu) (&cpu_book_map[cpu])
+
+#endif /* CONFIG_SCHED_BOOK */
int topology_set_cpu_management(int fc);
void topology_schedule_update(void);
};
#endif
+#define SD_BOOK_INIT SD_CPU_INIT
+
#include <asm-generic/topology.h>
#endif /* _ASM_S390_TOPOLOGY_H */
{
vfree(me->arch.syminfo);
me->arch.syminfo = NULL;
- return module_bug_finalize(hdr, sechdrs, me);
+ return 0;
}
void module_arch_cleanup(struct module *mod)
{
- module_bug_cleanup(mod);
}
union tl_entry tle[0];
};
-struct core_info {
- struct core_info *next;
+struct mask_info {
+ struct mask_info *next;
unsigned char id;
cpumask_t mask;
};
static int topology_enabled;
static void topology_work_fn(struct work_struct *work);
static struct tl_info *tl_info;
-static struct core_info core_info;
static int machine_has_topology;
static struct timer_list topology_timer;
static void set_topology_timer(void);
/* topology_lock protects the core linked list */
static DEFINE_SPINLOCK(topology_lock);
+static struct mask_info core_info;
cpumask_t cpu_core_map[NR_CPUS];
unsigned char cpu_core_id[NR_CPUS];
-static cpumask_t cpu_coregroup_map(unsigned int cpu)
+#ifdef CONFIG_SCHED_BOOK
+static struct mask_info book_info;
+cpumask_t cpu_book_map[NR_CPUS];
+unsigned char cpu_book_id[NR_CPUS];
+#endif
+
+static cpumask_t cpu_group_map(struct mask_info *info, unsigned int cpu)
{
- struct core_info *core = &core_info;
- unsigned long flags;
cpumask_t mask;
cpus_clear(mask);
if (!topology_enabled || !machine_has_topology)
return cpu_possible_map;
- spin_lock_irqsave(&topology_lock, flags);
- while (core) {
- if (cpu_isset(cpu, core->mask)) {
- mask = core->mask;
+ while (info) {
+ if (cpu_isset(cpu, info->mask)) {
+ mask = info->mask;
break;
}
- core = core->next;
+ info = info->next;
}
- spin_unlock_irqrestore(&topology_lock, flags);
if (cpus_empty(mask))
mask = cpumask_of_cpu(cpu);
return mask;
}
-const struct cpumask *cpu_coregroup_mask(unsigned int cpu)
-{
- return &cpu_core_map[cpu];
-}
-
-static void add_cpus_to_core(struct tl_cpu *tl_cpu, struct core_info *core)
+static void add_cpus_to_mask(struct tl_cpu *tl_cpu, struct mask_info *book,
+ struct mask_info *core)
{
unsigned int cpu;
rcpu = CPU_BITS - 1 - cpu + tl_cpu->origin;
for_each_present_cpu(lcpu) {
- if (cpu_logical_map(lcpu) == rcpu) {
- cpu_set(lcpu, core->mask);
- cpu_core_id[lcpu] = core->id;
- smp_cpu_polarization[lcpu] = tl_cpu->pp;
- }
+ if (cpu_logical_map(lcpu) != rcpu)
+ continue;
+#ifdef CONFIG_SCHED_BOOK
+ cpu_set(lcpu, book->mask);
+ cpu_book_id[lcpu] = book->id;
+#endif
+ cpu_set(lcpu, core->mask);
+ cpu_core_id[lcpu] = core->id;
+ smp_cpu_polarization[lcpu] = tl_cpu->pp;
}
}
}
-static void clear_cores(void)
+static void clear_masks(void)
{
- struct core_info *core = &core_info;
+ struct mask_info *info;
- while (core) {
- cpus_clear(core->mask);
- core = core->next;
+ info = &core_info;
+ while (info) {
+ cpus_clear(info->mask);
+ info = info->next;
+ }
+#ifdef CONFIG_SCHED_BOOK
+ info = &book_info;
+ while (info) {
+ cpus_clear(info->mask);
+ info = info->next;
}
+#endif
}
static union tl_entry *next_tle(union tl_entry *tle)
static void tl_to_cores(struct tl_info *info)
{
+#ifdef CONFIG_SCHED_BOOK
+ struct mask_info *book = &book_info;
+#else
+ struct mask_info *book = NULL;
+#endif
+ struct mask_info *core = &core_info;
union tl_entry *tle, *end;
- struct core_info *core = &core_info;
+
spin_lock_irq(&topology_lock);
- clear_cores();
+ clear_masks();
tle = info->tle;
end = (union tl_entry *)((unsigned long)info + info->length);
while (tle < end) {
switch (tle->nl) {
- case 5:
- case 4:
- case 3:
+#ifdef CONFIG_SCHED_BOOK
case 2:
+ book = book->next;
+ book->id = tle->container.id;
break;
+#endif
case 1:
core = core->next;
core->id = tle->container.id;
break;
case 0:
- add_cpus_to_core(&tle->cpu, core);
+ add_cpus_to_mask(&tle->cpu, book, core);
break;
default:
- clear_cores();
+ clear_masks();
machine_has_topology = 0;
goto out;
}
static void update_cpu_core_map(void)
{
+ unsigned long flags;
int cpu;
- for_each_possible_cpu(cpu)
- cpu_core_map[cpu] = cpu_coregroup_map(cpu);
+ spin_lock_irqsave(&topology_lock, flags);
+ for_each_possible_cpu(cpu) {
+ cpu_core_map[cpu] = cpu_group_map(&core_info, cpu);
+#ifdef CONFIG_SCHED_BOOK
+ cpu_book_map[cpu] = cpu_group_map(&book_info, cpu);
+#endif
+ }
+ spin_unlock_irqrestore(&topology_lock, flags);
+}
+
+static void store_topology(struct tl_info *info)
+{
+#ifdef CONFIG_SCHED_BOOK
+ int rc;
+
+ rc = stsi(info, 15, 1, 3);
+ if (rc != -ENOSYS)
+ return;
+#endif
+ stsi(info, 15, 1, 2);
}
int arch_update_cpu_topology(void)
topology_update_polarization_simple();
return 0;
}
- stsi(info, 15, 1, 2);
+ store_topology(info);
tl_to_cores(info);
update_cpu_core_map();
for_each_online_cpu(cpu) {
}
__initcall(init_topology_update);
+static void alloc_masks(struct tl_info *info, struct mask_info *mask, int offset)
+{
+ int i, nr_masks;
+
+ nr_masks = info->mag[NR_MAG - offset];
+ for (i = 0; i < info->mnest - offset; i++)
+ nr_masks *= info->mag[NR_MAG - offset - 1 - i];
+ nr_masks = max(nr_masks, 1);
+ for (i = 0; i < nr_masks; i++) {
+ mask->next = alloc_bootmem(sizeof(struct mask_info));
+ mask = mask->next;
+ }
+}
+
void __init s390_init_cpu_topology(void)
{
unsigned long long facility_bits;
struct tl_info *info;
- struct core_info *core;
- int nr_cores;
int i;
if (stfle(&facility_bits, 1) <= 0)
tl_info = alloc_bootmem_pages(PAGE_SIZE);
info = tl_info;
- stsi(info, 15, 1, 2);
-
- nr_cores = info->mag[NR_MAG - 2];
- for (i = 0; i < info->mnest - 2; i++)
- nr_cores *= info->mag[NR_MAG - 3 - i];
-
+ store_topology(info);
pr_info("The CPU configuration topology of the machine is:");
for (i = 0; i < NR_MAG; i++)
printk(" %d", info->mag[i]);
printk(" / %d\n", info->mnest);
-
- core = &core_info;
- for (i = 0; i < nr_cores; i++) {
- core->next = alloc_bootmem(sizeof(struct core_info));
- core = core->next;
- if (!core)
- goto error;
- }
- return;
-error:
- machine_has_topology = 0;
+ alloc_masks(info, &core_info, 2);
+#ifdef CONFIG_SCHED_BOOK
+ alloc_masks(info, &book_info, 3);
+#endif
}
select HAVE_ARCH_TRACEHOOK
select HAVE_DMA_API_DEBUG
select HAVE_DMA_ATTRS
+ select HAVE_IRQ_WORK
select HAVE_PERF_EVENTS
select PERF_USE_VMALLOC
select HAVE_KERNEL_GZIP
select PM
select PM_RUNTIME
+config CPU_HAS_PMU
+ depends on CPU_SH4 || CPU_SH4A
+ default y
+ bool
+
if SUPERH32
choice
LLSC, this should be more efficient than the other alternative of
disabling interrupts around the atomic sequence.
+config HW_PERF_EVENTS
+ bool "Enable hardware performance counter support for perf events"
+ depends on PERF_EVENTS && CPU_HAS_PMU
+ default y
+ help
+ Enable hardware performance counter support for perf events. If
+ disabled, perf events will use software events only.
+
source "drivers/sh/Kconfig"
endmenu
extern int reserve_pmc_hardware(void);
extern void release_pmc_hardware(void);
-static inline void set_perf_event_pending(void)
-{
- /* Nothing to see here, move along. */
-}
-
-#define PERF_EVENT_INDEX_OFFSET 0
-
#endif /* __ASM_SH_PERF_EVENT_H */
int ret = 0;
ret |= module_dwarf_finalize(hdr, sechdrs, me);
- ret |= module_bug_finalize(hdr, sechdrs, me);
return ret;
}
void module_arch_cleanup(struct module *mod)
{
- module_bug_cleanup(mod);
module_dwarf_cleanup(mod);
}
#include <asm/unwinder.h>
#include <asm/ptrace.h>
-static inline void callchain_store(struct perf_callchain_entry *entry, u64 ip)
-{
- if (entry->nr < PERF_MAX_STACK_DEPTH)
- entry->ip[entry->nr++] = ip;
-}
static void callchain_warning(void *data, char *msg)
{
struct perf_callchain_entry *entry = data;
if (reliable)
- callchain_store(entry, addr);
+ perf_callchain_store(entry, addr);
}
static const struct stacktrace_ops callchain_ops = {
.address = callchain_address,
};
-static void
-perf_callchain_kernel(struct pt_regs *regs, struct perf_callchain_entry *entry)
+void
+perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
{
- callchain_store(entry, PERF_CONTEXT_KERNEL);
- callchain_store(entry, regs->pc);
+ perf_callchain_store(entry, regs->pc);
unwind_stack(NULL, regs, NULL, &callchain_ops, entry);
}
-
-static void
-perf_do_callchain(struct pt_regs *regs, struct perf_callchain_entry *entry)
-{
- int is_user;
-
- if (!regs)
- return;
-
- is_user = user_mode(regs);
-
- if (is_user && current->state != TASK_RUNNING)
- return;
-
- /*
- * Only the kernel side is implemented for now.
- */
- if (!is_user)
- perf_callchain_kernel(regs, entry);
-}
-
-/*
- * No need for separate IRQ and NMI entries.
- */
-static DEFINE_PER_CPU(struct perf_callchain_entry, callchain);
-
-struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
-{
- struct perf_callchain_entry *entry = &__get_cpu_var(callchain);
-
- entry->nr = 0;
-
- perf_do_callchain(regs, entry);
-
- return entry;
-}
return !!sh_pmu;
}
+const char *perf_pmu_name(void)
+{
+ if (!sh_pmu)
+ return NULL;
+
+ return sh_pmu->name;
+}
+EXPORT_SYMBOL_GPL(perf_pmu_name);
+
+int perf_num_counters(void)
+{
+ if (!sh_pmu)
+ return 0;
+
+ return sh_pmu->num_events;
+}
+EXPORT_SYMBOL_GPL(perf_num_counters);
+
/*
* Release the PMU if this is the last perf_event.
*/
local64_add(delta, &event->count);
}
-static void sh_pmu_disable(struct perf_event *event)
+static void sh_pmu_stop(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
- clear_bit(idx, cpuc->active_mask);
- sh_pmu->disable(hwc, idx);
+ if (!(event->hw.state & PERF_HES_STOPPED)) {
+ sh_pmu->disable(hwc, idx);
+ cpuc->events[idx] = NULL;
+ event->hw.state |= PERF_HES_STOPPED;
+ }
+
+ if ((flags & PERF_EF_UPDATE) && !(event->hw.state & PERF_HES_UPTODATE)) {
+ sh_perf_event_update(event, &event->hw, idx);
+ event->hw.state |= PERF_HES_UPTODATE;
+ }
+}
+
+static void sh_pmu_start(struct perf_event *event, int flags)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ struct hw_perf_event *hwc = &event->hw;
+ int idx = hwc->idx;
+
+ if (WARN_ON_ONCE(idx == -1))
+ return;
+
+ if (flags & PERF_EF_RELOAD)
+ WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
- barrier();
+ cpuc->events[idx] = event;
+ event->hw.state = 0;
+ sh_pmu->enable(hwc, idx);
+}
- sh_perf_event_update(event, &event->hw, idx);
+static void sh_pmu_del(struct perf_event *event, int flags)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
- cpuc->events[idx] = NULL;
- clear_bit(idx, cpuc->used_mask);
+ sh_pmu_stop(event, PERF_EF_UPDATE);
+ __clear_bit(event->hw.idx, cpuc->used_mask);
perf_event_update_userpage(event);
}
-static int sh_pmu_enable(struct perf_event *event)
+static int sh_pmu_add(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
+ int ret = -EAGAIN;
+
+ perf_pmu_disable(event->pmu);
- if (test_and_set_bit(idx, cpuc->used_mask)) {
+ if (__test_and_set_bit(idx, cpuc->used_mask)) {
idx = find_first_zero_bit(cpuc->used_mask, sh_pmu->num_events);
if (idx == sh_pmu->num_events)
- return -EAGAIN;
+ goto out;
- set_bit(idx, cpuc->used_mask);
+ __set_bit(idx, cpuc->used_mask);
hwc->idx = idx;
}
sh_pmu->disable(hwc, idx);
- cpuc->events[idx] = event;
- set_bit(idx, cpuc->active_mask);
-
- sh_pmu->enable(hwc, idx);
+ event->hw.state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+ if (flags & PERF_EF_START)
+ sh_pmu_start(event, PERF_EF_RELOAD);
perf_event_update_userpage(event);
-
- return 0;
+ ret = 0;
+out:
+ perf_pmu_enable(event->pmu);
+ return ret;
}
static void sh_pmu_read(struct perf_event *event)
sh_perf_event_update(event, &event->hw, event->hw.idx);
}
-static const struct pmu pmu = {
- .enable = sh_pmu_enable,
- .disable = sh_pmu_disable,
- .read = sh_pmu_read,
-};
-
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+static int sh_pmu_event_init(struct perf_event *event)
{
- int err = __hw_perf_event_init(event);
+ int err;
+
+ switch (event->attr.type) {
+ case PERF_TYPE_RAW:
+ case PERF_TYPE_HW_CACHE:
+ case PERF_TYPE_HARDWARE:
+ err = __hw_perf_event_init(event);
+ break;
+
+ default:
+ return -ENOENT;
+ }
+
if (unlikely(err)) {
if (event->destroy)
event->destroy(event);
- return ERR_PTR(err);
}
- return &pmu;
+ return err;
+}
+
+static void sh_pmu_enable(struct pmu *pmu)
+{
+ if (!sh_pmu_initialized())
+ return;
+
+ sh_pmu->enable_all();
+}
+
+static void sh_pmu_disable(struct pmu *pmu)
+{
+ if (!sh_pmu_initialized())
+ return;
+
+ sh_pmu->disable_all();
}
+static struct pmu pmu = {
+ .pmu_enable = sh_pmu_enable,
+ .pmu_disable = sh_pmu_disable,
+ .event_init = sh_pmu_event_init,
+ .add = sh_pmu_add,
+ .del = sh_pmu_del,
+ .start = sh_pmu_start,
+ .stop = sh_pmu_stop,
+ .read = sh_pmu_read,
+};
+
static void sh_pmu_setup(int cpu)
{
struct cpu_hw_events *cpuhw = &per_cpu(cpu_hw_events, cpu);
return NOTIFY_OK;
}
-void hw_perf_enable(void)
-{
- if (!sh_pmu_initialized())
- return;
-
- sh_pmu->enable_all();
-}
-
-void hw_perf_disable(void)
-{
- if (!sh_pmu_initialized())
- return;
-
- sh_pmu->disable_all();
-}
-
-int __cpuinit register_sh_pmu(struct sh_pmu *pmu)
+int __cpuinit register_sh_pmu(struct sh_pmu *_pmu)
{
if (sh_pmu)
return -EBUSY;
- sh_pmu = pmu;
+ sh_pmu = _pmu;
- pr_info("Performance Events: %s support registered\n", pmu->name);
+ pr_info("Performance Events: %s support registered\n", _pmu->name);
- WARN_ON(pmu->num_events > MAX_HWEVENTS);
+ WARN_ON(_pmu->num_events > MAX_HWEVENTS);
+ perf_pmu_register(&pmu);
perf_cpu_notifier(sh_pmu_notifier);
return 0;
}
oprofilefs.o oprofile_stats.o \
timer_int.o )
+ifeq ($(CONFIG_HW_PERF_EVENTS),y)
+DRIVER_OBJS += $(addprefix ../../../drivers/oprofile/, oprofile_perf.o)
+endif
+
oprofile-y := $(DRIVER_OBJS) common.o backtrace.o
#include <linux/init.h>
#include <linux/errno.h>
#include <linux/smp.h>
+#include <linux/perf_event.h>
#include <asm/processor.h>
-#include "op_impl.h"
-
-static struct op_sh_model *model;
-
-static struct op_counter_config ctr[20];
+#ifdef CONFIG_HW_PERF_EVENTS
extern void sh_backtrace(struct pt_regs * const regs, unsigned int depth);
-static int op_sh_setup(void)
-{
- /* Pre-compute the values to stuff in the hardware registers. */
- model->reg_setup(ctr);
-
- /* Configure the registers on all cpus. */
- on_each_cpu(model->cpu_setup, NULL, 1);
-
- return 0;
-}
-
-static int op_sh_create_files(struct super_block *sb, struct dentry *root)
+char *op_name_from_perf_id(void)
{
- int i, ret = 0;
+ const char *pmu;
+ char buf[20];
+ int size;
- for (i = 0; i < model->num_counters; i++) {
- struct dentry *dir;
- char buf[4];
+ pmu = perf_pmu_name();
+ if (!pmu)
+ return NULL;
- snprintf(buf, sizeof(buf), "%d", i);
- dir = oprofilefs_mkdir(sb, root, buf);
+ size = snprintf(buf, sizeof(buf), "sh/%s", pmu);
+ if (size > -1 && size < sizeof(buf))
+ return buf;
- ret |= oprofilefs_create_ulong(sb, dir, "enabled", &ctr[i].enabled);
- ret |= oprofilefs_create_ulong(sb, dir, "event", &ctr[i].event);
- ret |= oprofilefs_create_ulong(sb, dir, "kernel", &ctr[i].kernel);
- ret |= oprofilefs_create_ulong(sb, dir, "user", &ctr[i].user);
-
- if (model->create_files)
- ret |= model->create_files(sb, dir);
- else
- ret |= oprofilefs_create_ulong(sb, dir, "count", &ctr[i].count);
-
- /* Dummy entries */
- ret |= oprofilefs_create_ulong(sb, dir, "unit_mask", &ctr[i].unit_mask);
- }
-
- return ret;
+ return NULL;
}
-static int op_sh_start(void)
+int __init oprofile_arch_init(struct oprofile_operations *ops)
{
- /* Enable performance monitoring for all counters. */
- on_each_cpu(model->cpu_start, NULL, 1);
+ ops->backtrace = sh_backtrace;
- return 0;
+ return oprofile_perf_init(ops);
}
-static void op_sh_stop(void)
+void __exit oprofile_arch_exit(void)
{
- /* Disable performance monitoring for all counters. */
- on_each_cpu(model->cpu_stop, NULL, 1);
+ oprofile_perf_exit();
}
-
+#else
int __init oprofile_arch_init(struct oprofile_operations *ops)
{
- struct op_sh_model *lmodel = NULL;
- int ret;
-
- /*
- * Always assign the backtrace op. If the counter initialization
- * fails, we fall back to the timer which will still make use of
- * this.
- */
- ops->backtrace = sh_backtrace;
-
- /*
- * XXX
- *
- * All of the SH7750/SH-4A counters have been converted to perf,
- * this infrastructure hook is left for other users until they've
- * had a chance to convert over, at which point all of this
- * will be deleted.
- */
-
- if (!lmodel)
- return -ENODEV;
- if (!(current_cpu_data.flags & CPU_HAS_PERF_COUNTER))
- return -ENODEV;
-
- ret = lmodel->init();
- if (unlikely(ret != 0))
- return ret;
-
- model = lmodel;
-
- ops->setup = op_sh_setup;
- ops->create_files = op_sh_create_files;
- ops->start = op_sh_start;
- ops->stop = op_sh_stop;
- ops->cpu_type = lmodel->cpu_type;
-
- printk(KERN_INFO "oprofile: using %s performance monitoring.\n",
- lmodel->cpu_type);
-
- return 0;
-}
-
-void oprofile_arch_exit(void)
-{
- if (model && model->exit)
- model->exit();
+ pr_info("oprofile: hardware counters not available\n");
+ return -ENODEV;
}
+void __exit oprofile_arch_exit(void) {}
+#endif /* CONFIG_HW_PERF_EVENTS */
+++ /dev/null
-#ifndef __OP_IMPL_H
-#define __OP_IMPL_H
-
-/* Per-counter configuration as set via oprofilefs. */
-struct op_counter_config {
- unsigned long enabled;
- unsigned long event;
-
- unsigned long count;
-
- /* Dummy values for userspace tool compliance */
- unsigned long kernel;
- unsigned long user;
- unsigned long unit_mask;
-};
-
-/* Per-architecture configury and hooks. */
-struct op_sh_model {
- void (*reg_setup)(struct op_counter_config *);
- int (*create_files)(struct super_block *sb, struct dentry *dir);
- void (*cpu_setup)(void *dummy);
- int (*init)(void);
- void (*exit)(void);
- void (*cpu_start)(void *args);
- void (*cpu_stop)(void *args);
- char *cpu_type;
- unsigned char num_counters;
-};
-
-/* arch/sh/oprofile/common.c */
-extern void sh_backtrace(struct pt_regs * const regs, unsigned int depth);
-
-#endif /* __OP_IMPL_H */
select ARCH_WANT_OPTIONAL_GPIOLIB
select RTC_CLASS
select RTC_DRV_M48T59
+ select HAVE_IRQ_WORK
select HAVE_PERF_EVENTS
select PERF_USE_VMALLOC
select HAVE_DMA_ATTRS
select HAVE_DMA_API_DEBUG
+ select HAVE_ARCH_JUMP_LABEL
config SPARC32
def_bool !64BIT
select RTC_DRV_BQ4802
select RTC_DRV_SUN4V
select RTC_DRV_STARFIRE
+ select HAVE_IRQ_WORK
select HAVE_PERF_EVENTS
select PERF_USE_VMALLOC
return (u32)(unsigned long)uptr;
}
-static inline void __user *compat_alloc_user_space(long len)
+static inline void __user *arch_compat_alloc_user_space(long len)
{
struct pt_regs *regs = current_thread_info()->kregs;
unsigned long usp = regs->u_regs[UREG_I6];
--- /dev/null
+#ifndef _ASM_SPARC_JUMP_LABEL_H
+#define _ASM_SPARC_JUMP_LABEL_H
+
+#ifdef __KERNEL__
+
+#include <linux/types.h>
+#include <asm/system.h>
+
+#define JUMP_LABEL_NOP_SIZE 4
+
+#define JUMP_LABEL(key, label) \
+ do { \
+ asm goto("1:\n\t" \
+ "nop\n\t" \
+ "nop\n\t" \
+ ".pushsection __jump_table, \"a\"\n\t"\
+ ".word 1b, %l[" #label "], %c0\n\t" \
+ ".popsection \n\t" \
+ : : "i" (key) : : label);\
+ } while (0)
+
+#endif /* __KERNEL__ */
+
+typedef u32 jump_label_t;
+
+struct jump_entry {
+ jump_label_t code;
+ jump_label_t target;
+ jump_label_t key;
+};
+
+#endif
#ifndef __ASM_SPARC_PERF_EVENT_H
#define __ASM_SPARC_PERF_EVENT_H
-extern void set_perf_event_pending(void);
-
-#define PERF_EVENT_INDEX_OFFSET 0
-
#ifdef CONFIG_PERF_EVENTS
#include <asm/ptrace.h>
pc--$(CONFIG_PERF_EVENTS) := perf_event.o
obj-$(CONFIG_SPARC64) += $(pc--y)
+
+obj-$(CONFIG_SPARC64) += jump_label.o
--- /dev/null
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/mutex.h>
+#include <linux/cpu.h>
+
+#include <linux/jump_label.h>
+#include <linux/memory.h>
+
+#ifdef HAVE_JUMP_LABEL
+
+void arch_jump_label_transform(struct jump_entry *entry,
+ enum jump_label_type type)
+{
+ u32 val;
+ u32 *insn = (u32 *) (unsigned long) entry->code;
+
+ if (type == JUMP_LABEL_ENABLE) {
+ s32 off = (s32)entry->target - (s32)entry->code;
+
+#ifdef CONFIG_SPARC64
+ /* ba,pt %xcc, . + (off << 2) */
+ val = 0x10680000 | ((u32) off >> 2);
+#else
+ /* ba . + (off << 2) */
+ val = 0x10800000 | ((u32) off >> 2);
+#endif
+ } else {
+ val = 0x01000000;
+ }
+
+ get_online_cpus();
+ mutex_lock(&text_mutex);
+ *insn = val;
+ flushi(insn);
+ mutex_unlock(&text_mutex);
+ put_online_cpus();
+}
+
+void arch_jump_label_text_poke_early(jump_label_t addr)
+{
+ u32 *insn_p = (u32 *) (unsigned long) addr;
+
+ *insn_p = 0x01000000;
+ flushi(insn_p);
+}
+
+#endif
#include <asm/spitfire.h>
#ifdef CONFIG_SPARC64
+
+#include <linux/jump_label.h>
+
static void *module_map(unsigned long size)
{
struct vm_struct *area;
const Elf_Shdr *sechdrs,
struct module *me)
{
+ /* make jump label nops */
+ jump_label_apply_nops(me);
+
/* Cheetah's I-cache is fully coherent. */
if (tlb_type == spitfire) {
unsigned long va;
#include <linux/init.h>
#include <linux/irq.h>
-#include <linux/perf_event.h>
+#include <linux/irq_work.h>
#include <linux/ftrace.h>
#include <asm/pil.h>
old_regs = set_irq_regs(regs);
irq_enter();
-#ifdef CONFIG_PERF_EVENTS
- perf_event_do_pending();
+#ifdef CONFIG_IRQ_WORK
+ irq_work_run();
#endif
irq_exit();
set_irq_regs(old_regs);
}
-void set_perf_event_pending(void)
+void arch_irq_work_raise(void)
{
set_softint(1 << PIL_DEFERRED_PCR_WORK);
}
enc = perf_event_get_enc(cpuc->events[i]);
pcr &= ~mask_for_index(idx);
- pcr |= event_encoding(enc, idx);
+ if (hwc->state & PERF_HES_STOPPED)
+ pcr |= nop_for_index(idx);
+ else
+ pcr |= event_encoding(enc, idx);
}
out:
return pcr;
}
-void hw_perf_enable(void)
+static void sparc_pmu_enable(struct pmu *pmu)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
u64 pcr;
pcr_ops->write(cpuc->pcr);
}
-void hw_perf_disable(void)
+static void sparc_pmu_disable(struct pmu *pmu)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
u64 val;
pcr_ops->write(cpuc->pcr);
}
-static void sparc_pmu_disable(struct perf_event *event)
+static int active_event_index(struct cpu_hw_events *cpuc,
+ struct perf_event *event)
+{
+ int i;
+
+ for (i = 0; i < cpuc->n_events; i++) {
+ if (cpuc->event[i] == event)
+ break;
+ }
+ BUG_ON(i == cpuc->n_events);
+ return cpuc->current_idx[i];
+}
+
+static void sparc_pmu_start(struct perf_event *event, int flags)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ int idx = active_event_index(cpuc, event);
+
+ if (flags & PERF_EF_RELOAD) {
+ WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+ sparc_perf_event_set_period(event, &event->hw, idx);
+ }
+
+ event->hw.state = 0;
+
+ sparc_pmu_enable_event(cpuc, &event->hw, idx);
+}
+
+static void sparc_pmu_stop(struct perf_event *event, int flags)
+{
+ struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ int idx = active_event_index(cpuc, event);
+
+ if (!(event->hw.state & PERF_HES_STOPPED)) {
+ sparc_pmu_disable_event(cpuc, &event->hw, idx);
+ event->hw.state |= PERF_HES_STOPPED;
+ }
+
+ if (!(event->hw.state & PERF_HES_UPTODATE) && (flags & PERF_EF_UPDATE)) {
+ sparc_perf_event_update(event, &event->hw, idx);
+ event->hw.state |= PERF_HES_UPTODATE;
+ }
+}
+
+static void sparc_pmu_del(struct perf_event *event, int _flags)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
- struct hw_perf_event *hwc = &event->hw;
unsigned long flags;
int i;
local_irq_save(flags);
- perf_disable();
+ perf_pmu_disable(event->pmu);
for (i = 0; i < cpuc->n_events; i++) {
if (event == cpuc->event[i]) {
- int idx = cpuc->current_idx[i];
+ /* Absorb the final count and turn off the
+ * event.
+ */
+ sparc_pmu_stop(event, PERF_EF_UPDATE);
/* Shift remaining entries down into
* the existing slot.
cpuc->current_idx[i];
}
- /* Absorb the final count and turn off the
- * event.
- */
- sparc_pmu_disable_event(cpuc, hwc, idx);
- barrier();
- sparc_perf_event_update(event, hwc, idx);
-
perf_event_update_userpage(event);
cpuc->n_events--;
}
}
- perf_enable();
+ perf_pmu_enable(event->pmu);
local_irq_restore(flags);
}
-static int active_event_index(struct cpu_hw_events *cpuc,
- struct perf_event *event)
-{
- int i;
-
- for (i = 0; i < cpuc->n_events; i++) {
- if (cpuc->event[i] == event)
- break;
- }
- BUG_ON(i == cpuc->n_events);
- return cpuc->current_idx[i];
-}
-
static void sparc_pmu_read(struct perf_event *event)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
sparc_perf_event_update(event, hwc, idx);
}
-static void sparc_pmu_unthrottle(struct perf_event *event)
-{
- struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
- int idx = active_event_index(cpuc, event);
- struct hw_perf_event *hwc = &event->hw;
-
- sparc_pmu_enable_event(cpuc, hwc, idx);
-}
-
static atomic_t active_events = ATOMIC_INIT(0);
static DEFINE_MUTEX(pmc_grab_mutex);
if (!n_ev)
return 0;
- if (n_ev > perf_max_events)
+ if (n_ev > MAX_HWEVENTS)
return -1;
msk0 = perf_event_get_msk(events[0]);
return n;
}
-static int sparc_pmu_enable(struct perf_event *event)
+static int sparc_pmu_add(struct perf_event *event, int ef_flags)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int n0, ret = -EAGAIN;
unsigned long flags;
local_irq_save(flags);
- perf_disable();
+ perf_pmu_disable(event->pmu);
n0 = cpuc->n_events;
- if (n0 >= perf_max_events)
+ if (n0 >= MAX_HWEVENTS)
goto out;
cpuc->event[n0] = event;
cpuc->events[n0] = event->hw.event_base;
cpuc->current_idx[n0] = PIC_NO_INDEX;
+ event->hw.state = PERF_HES_UPTODATE;
+ if (!(ef_flags & PERF_EF_START))
+ event->hw.state |= PERF_HES_STOPPED;
+
/*
* If group events scheduling transaction was started,
* skip the schedulability test here, it will be peformed
ret = 0;
out:
- perf_enable();
+ perf_pmu_enable(event->pmu);
local_irq_restore(flags);
return ret;
}
-static int __hw_perf_event_init(struct perf_event *event)
+static int sparc_pmu_event_init(struct perf_event *event)
{
struct perf_event_attr *attr = &event->attr;
struct perf_event *evts[MAX_HWEVENTS];
if (atomic_read(&nmi_active) < 0)
return -ENODEV;
- if (attr->type == PERF_TYPE_HARDWARE) {
+ switch (attr->type) {
+ case PERF_TYPE_HARDWARE:
if (attr->config >= sparc_pmu->max_events)
return -EINVAL;
pmap = sparc_pmu->event_map(attr->config);
- } else if (attr->type == PERF_TYPE_HW_CACHE) {
+ break;
+
+ case PERF_TYPE_HW_CACHE:
pmap = sparc_map_cache_event(attr->config);
if (IS_ERR(pmap))
return PTR_ERR(pmap);
- } else
- return -EOPNOTSUPP;
+ break;
+
+ case PERF_TYPE_RAW:
+ pmap = NULL;
+ break;
+
+ default:
+ return -ENOENT;
+
+ }
+
+ if (pmap) {
+ hwc->event_base = perf_event_encode(pmap);
+ } else {
+ /*
+ * User gives us "(encoding << 16) | pic_mask" for
+ * PERF_TYPE_RAW events.
+ */
+ hwc->event_base = attr->config;
+ }
/* We save the enable bits in the config_base. */
hwc->config_base = sparc_pmu->irq_bit;
if (!attr->exclude_hv)
hwc->config_base |= sparc_pmu->hv_bit;
- hwc->event_base = perf_event_encode(pmap);
-
n = 0;
if (event->group_leader != event) {
n = collect_events(event->group_leader,
- perf_max_events - 1,
+ MAX_HWEVENTS - 1,
evts, events, current_idx_dmy);
if (n < 0)
return -EINVAL;
* Set the flag to make pmu::enable() not perform the
* schedulability test, it will be performed at commit time
*/
-static void sparc_pmu_start_txn(const struct pmu *pmu)
+static void sparc_pmu_start_txn(struct pmu *pmu)
{
struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
+ perf_pmu_disable(pmu);
cpuhw->group_flag |= PERF_EVENT_TXN;
}
* Clear the flag and pmu::enable() will perform the
* schedulability test.
*/
-static void sparc_pmu_cancel_txn(const struct pmu *pmu)
+static void sparc_pmu_cancel_txn(struct pmu *pmu)
{
struct cpu_hw_events *cpuhw = &__get_cpu_var(cpu_hw_events);
cpuhw->group_flag &= ~PERF_EVENT_TXN;
+ perf_pmu_enable(pmu);
}
/*
* Perform the group schedulability test as a whole
* Return 0 if success
*/
-static int sparc_pmu_commit_txn(const struct pmu *pmu)
+static int sparc_pmu_commit_txn(struct pmu *pmu)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int n;
return -EAGAIN;
cpuc->group_flag &= ~PERF_EVENT_TXN;
+ perf_pmu_enable(pmu);
return 0;
}
-static const struct pmu pmu = {
- .enable = sparc_pmu_enable,
- .disable = sparc_pmu_disable,
+static struct pmu pmu = {
+ .pmu_enable = sparc_pmu_enable,
+ .pmu_disable = sparc_pmu_disable,
+ .event_init = sparc_pmu_event_init,
+ .add = sparc_pmu_add,
+ .del = sparc_pmu_del,
+ .start = sparc_pmu_start,
+ .stop = sparc_pmu_stop,
.read = sparc_pmu_read,
- .unthrottle = sparc_pmu_unthrottle,
.start_txn = sparc_pmu_start_txn,
.cancel_txn = sparc_pmu_cancel_txn,
.commit_txn = sparc_pmu_commit_txn,
};
-const struct pmu *hw_perf_event_init(struct perf_event *event)
-{
- int err = __hw_perf_event_init(event);
-
- if (err)
- return ERR_PTR(err);
- return &pmu;
-}
-
void perf_event_print_debug(void)
{
unsigned long flags;
continue;
if (perf_event_overflow(event, 1, &data, regs))
- sparc_pmu_disable_event(cpuc, hwc, idx);
+ sparc_pmu_stop(event, 0);
}
return NOTIFY_STOP;
pr_cont("Supported PMU type is '%s'\n", sparc_pmu_type);
- /* All sparc64 PMUs currently have 2 events. */
- perf_max_events = 2;
-
+ perf_pmu_register(&pmu);
register_die_notifier(&perf_event_nmi_notifier);
}
-static inline void callchain_store(struct perf_callchain_entry *entry, u64 ip)
-{
- if (entry->nr < PERF_MAX_STACK_DEPTH)
- entry->ip[entry->nr++] = ip;
-}
-
-static void perf_callchain_kernel(struct pt_regs *regs,
- struct perf_callchain_entry *entry)
+void perf_callchain_kernel(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
{
unsigned long ksp, fp;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
int graph = 0;
#endif
- callchain_store(entry, PERF_CONTEXT_KERNEL);
- callchain_store(entry, regs->tpc);
+ stack_trace_flush();
+
+ perf_callchain_store(entry, regs->tpc);
ksp = regs->u_regs[UREG_I6];
fp = ksp + STACK_BIAS;
pc = sf->callers_pc;
fp = (unsigned long)sf->fp + STACK_BIAS;
}
- callchain_store(entry, pc);
+ perf_callchain_store(entry, pc);
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
if ((pc + 8UL) == (unsigned long) &return_to_handler) {
int index = current->curr_ret_stack;
if (current->ret_stack && index >= graph) {
pc = current->ret_stack[index - graph].ret;
- callchain_store(entry, pc);
+ perf_callchain_store(entry, pc);
graph++;
}
}
} while (entry->nr < PERF_MAX_STACK_DEPTH);
}
-static void perf_callchain_user_64(struct pt_regs *regs,
- struct perf_callchain_entry *entry)
+static void perf_callchain_user_64(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
{
unsigned long ufp;
- callchain_store(entry, PERF_CONTEXT_USER);
- callchain_store(entry, regs->tpc);
+ perf_callchain_store(entry, regs->tpc);
ufp = regs->u_regs[UREG_I6] + STACK_BIAS;
do {
pc = sf.callers_pc;
ufp = (unsigned long)sf.fp + STACK_BIAS;
- callchain_store(entry, pc);
+ perf_callchain_store(entry, pc);
} while (entry->nr < PERF_MAX_STACK_DEPTH);
}
-static void perf_callchain_user_32(struct pt_regs *regs,
- struct perf_callchain_entry *entry)
+static void perf_callchain_user_32(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
{
unsigned long ufp;
- callchain_store(entry, PERF_CONTEXT_USER);
- callchain_store(entry, regs->tpc);
+ perf_callchain_store(entry, regs->tpc);
ufp = regs->u_regs[UREG_I6] & 0xffffffffUL;
do {
pc = sf.callers_pc;
ufp = (unsigned long)sf.fp;
- callchain_store(entry, pc);
+ perf_callchain_store(entry, pc);
} while (entry->nr < PERF_MAX_STACK_DEPTH);
}
-/* Like powerpc we can't get PMU interrupts within the PMU handler,
- * so no need for separate NMI and IRQ chains as on x86.
- */
-static DEFINE_PER_CPU(struct perf_callchain_entry, callchain);
-
-struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
+void
+perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
{
- struct perf_callchain_entry *entry = &__get_cpu_var(callchain);
-
- entry->nr = 0;
- if (!user_mode(regs)) {
- stack_trace_flush();
- perf_callchain_kernel(regs, entry);
- if (current->mm)
- regs = task_pt_regs(current);
- else
- regs = NULL;
- }
- if (regs) {
- flushw_user();
- if (test_thread_flag(TIF_32BIT))
- perf_callchain_user_32(regs, entry);
- else
- perf_callchain_user_64(regs, entry);
- }
- return entry;
+ flushw_user();
+ if (test_thread_flag(TIF_32BIT))
+ perf_callchain_user_32(entry, regs);
+ else
+ perf_callchain_user_64(entry, regs);
}
return err;
}
-static void setup_frame32(struct k_sigaction *ka, struct pt_regs *regs,
- int signo, sigset_t *oldset)
+/* The I-cache flush instruction only works in the primary ASI, which
+ * right now is the nucleus, aka. kernel space.
+ *
+ * Therefore we have to kick the instructions out using the kernel
+ * side linear mapping of the physical address backing the user
+ * instructions.
+ */
+static void flush_signal_insns(unsigned long address)
+{
+ unsigned long pstate, paddr;
+ pte_t *ptep, pte;
+ pgd_t *pgdp;
+ pud_t *pudp;
+ pmd_t *pmdp;
+
+ /* Commit all stores of the instructions we are about to flush. */
+ wmb();
+
+ /* Disable cross-call reception. In this way even a very wide
+ * munmap() on another cpu can't tear down the page table
+ * hierarchy from underneath us, since that can't complete
+ * until the IPI tlb flush returns.
+ */
+
+ __asm__ __volatile__("rdpr %%pstate, %0" : "=r" (pstate));
+ __asm__ __volatile__("wrpr %0, %1, %%pstate"
+ : : "r" (pstate), "i" (PSTATE_IE));
+
+ pgdp = pgd_offset(current->mm, address);
+ if (pgd_none(*pgdp))
+ goto out_irqs_on;
+ pudp = pud_offset(pgdp, address);
+ if (pud_none(*pudp))
+ goto out_irqs_on;
+ pmdp = pmd_offset(pudp, address);
+ if (pmd_none(*pmdp))
+ goto out_irqs_on;
+
+ ptep = pte_offset_map(pmdp, address);
+ pte = *ptep;
+ if (!pte_present(pte))
+ goto out_unmap;
+
+ paddr = (unsigned long) page_address(pte_page(pte));
+
+ __asm__ __volatile__("flush %0 + %1"
+ : /* no outputs */
+ : "r" (paddr),
+ "r" (address & (PAGE_SIZE - 1))
+ : "memory");
+
+out_unmap:
+ pte_unmap(ptep);
+out_irqs_on:
+ __asm__ __volatile__("wrpr %0, 0x0, %%pstate" : : "r" (pstate));
+
+}
+
+static int setup_frame32(struct k_sigaction *ka, struct pt_regs *regs,
+ int signo, sigset_t *oldset)
{
struct signal_frame32 __user *sf;
int sigframe_size;
if (ka->ka_restorer) {
regs->u_regs[UREG_I7] = (unsigned long)ka->ka_restorer;
} else {
- /* Flush instruction space. */
unsigned long address = ((unsigned long)&(sf->insns[0]));
- pgd_t *pgdp = pgd_offset(current->mm, address);
- pud_t *pudp = pud_offset(pgdp, address);
- pmd_t *pmdp = pmd_offset(pudp, address);
- pte_t *ptep;
- pte_t pte;
regs->u_regs[UREG_I7] = (unsigned long) (&(sf->insns[0]) - 2);
if (err)
goto sigsegv;
- preempt_disable();
- ptep = pte_offset_map(pmdp, address);
- pte = *ptep;
- if (pte_present(pte)) {
- unsigned long page = (unsigned long)
- page_address(pte_page(pte));
-
- wmb();
- __asm__ __volatile__("flush %0 + %1"
- : /* no outputs */
- : "r" (page),
- "r" (address & (PAGE_SIZE - 1))
- : "memory");
- }
- pte_unmap(ptep);
- preempt_enable();
+ flush_signal_insns(address);
}
- return;
+ return 0;
sigill:
do_exit(SIGILL);
+ return -EINVAL;
+
sigsegv:
force_sigsegv(signo, current);
+ return -EFAULT;
}
-static void setup_rt_frame32(struct k_sigaction *ka, struct pt_regs *regs,
- unsigned long signr, sigset_t *oldset,
- siginfo_t *info)
+static int setup_rt_frame32(struct k_sigaction *ka, struct pt_regs *regs,
+ unsigned long signr, sigset_t *oldset,
+ siginfo_t *info)
{
struct rt_signal_frame32 __user *sf;
int sigframe_size;
if (ka->ka_restorer)
regs->u_regs[UREG_I7] = (unsigned long)ka->ka_restorer;
else {
- /* Flush instruction space. */
unsigned long address = ((unsigned long)&(sf->insns[0]));
- pgd_t *pgdp = pgd_offset(current->mm, address);
- pud_t *pudp = pud_offset(pgdp, address);
- pmd_t *pmdp = pmd_offset(pudp, address);
- pte_t *ptep;
regs->u_regs[UREG_I7] = (unsigned long) (&(sf->insns[0]) - 2);
if (err)
goto sigsegv;
- preempt_disable();
- ptep = pte_offset_map(pmdp, address);
- if (pte_present(*ptep)) {
- unsigned long page = (unsigned long)
- page_address(pte_page(*ptep));
-
- wmb();
- __asm__ __volatile__("flush %0 + %1"
- : /* no outputs */
- : "r" (page),
- "r" (address & (PAGE_SIZE - 1))
- : "memory");
- }
- pte_unmap(ptep);
- preempt_enable();
+ flush_signal_insns(address);
}
- return;
+ return 0;
sigill:
do_exit(SIGILL);
+ return -EINVAL;
+
sigsegv:
force_sigsegv(signr, current);
+ return -EFAULT;
}
-static inline void handle_signal32(unsigned long signr, struct k_sigaction *ka,
- siginfo_t *info,
- sigset_t *oldset, struct pt_regs *regs)
+static inline int handle_signal32(unsigned long signr, struct k_sigaction *ka,
+ siginfo_t *info,
+ sigset_t *oldset, struct pt_regs *regs)
{
+ int err;
+
if (ka->sa.sa_flags & SA_SIGINFO)
- setup_rt_frame32(ka, regs, signr, oldset, info);
+ err = setup_rt_frame32(ka, regs, signr, oldset, info);
else
- setup_frame32(ka, regs, signr, oldset);
+ err = setup_frame32(ka, regs, signr, oldset);
+
+ if (err)
+ return err;
spin_lock_irq(¤t->sighand->siglock);
sigorsets(¤t->blocked,¤t->blocked,&ka->sa.sa_mask);
sigaddset(¤t->blocked,signr);
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
+
+ tracehook_signal_handler(signr, info, ka, regs, 0);
+
+ return 0;
}
static inline void syscall_restart32(unsigned long orig_i0, struct pt_regs *regs,
if (signr > 0) {
if (restart_syscall)
syscall_restart32(orig_i0, regs, &ka.sa);
- handle_signal32(signr, &ka, &info, oldset, regs);
-
- /* A signal was successfully delivered; the saved
- * sigmask will have been stored in the signal frame,
- * and will be restored by sigreturn, so we can simply
- * clear the TS_RESTORE_SIGMASK flag.
- */
- current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
-
- tracehook_signal_handler(signr, &info, &ka, regs, 0);
+ if (handle_signal32(signr, &ka, &info, oldset, regs) == 0) {
+ /* A signal was successfully delivered; the saved
+ * sigmask will have been stored in the signal frame,
+ * and will be restored by sigreturn, so we can simply
+ * clear the TS_RESTORE_SIGMASK flag.
+ */
+ current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
+ }
return;
}
if (restart_syscall &&
regs->u_regs[UREG_I0] = orig_i0;
regs->tpc -= 4;
regs->tnpc -= 4;
+ pt_regs_clear_syscall(regs);
}
if (restart_syscall &&
regs->u_regs[UREG_I0] == ERESTART_RESTARTBLOCK) {
regs->u_regs[UREG_G1] = __NR_restart_syscall;
regs->tpc -= 4;
regs->tnpc -= 4;
+ pt_regs_clear_syscall(regs);
}
/* If there's no signal to deliver, we just put the saved sigmask
return err;
}
-static void setup_frame(struct k_sigaction *ka, struct pt_regs *regs,
- int signo, sigset_t *oldset)
+static int setup_frame(struct k_sigaction *ka, struct pt_regs *regs,
+ int signo, sigset_t *oldset)
{
struct signal_frame __user *sf;
int sigframe_size, err;
/* Flush instruction space. */
flush_sig_insns(current->mm, (unsigned long) &(sf->insns[0]));
}
- return;
+ return 0;
sigill_and_return:
do_exit(SIGILL);
+ return -EINVAL;
+
sigsegv:
force_sigsegv(signo, current);
+ return -EFAULT;
}
-static void setup_rt_frame(struct k_sigaction *ka, struct pt_regs *regs,
- int signo, sigset_t *oldset, siginfo_t *info)
+static int setup_rt_frame(struct k_sigaction *ka, struct pt_regs *regs,
+ int signo, sigset_t *oldset, siginfo_t *info)
{
struct rt_signal_frame __user *sf;
int sigframe_size;
/* Flush instruction space. */
flush_sig_insns(current->mm, (unsigned long) &(sf->insns[0]));
}
- return;
+ return 0;
sigill:
do_exit(SIGILL);
+ return -EINVAL;
+
sigsegv:
force_sigsegv(signo, current);
+ return -EFAULT;
}
-static inline void
+static inline int
handle_signal(unsigned long signr, struct k_sigaction *ka,
siginfo_t *info, sigset_t *oldset, struct pt_regs *regs)
{
+ int err;
+
if (ka->sa.sa_flags & SA_SIGINFO)
- setup_rt_frame(ka, regs, signr, oldset, info);
+ err = setup_rt_frame(ka, regs, signr, oldset, info);
else
- setup_frame(ka, regs, signr, oldset);
+ err = setup_frame(ka, regs, signr, oldset);
+
+ if (err)
+ return err;
spin_lock_irq(¤t->sighand->siglock);
sigorsets(¤t->blocked,¤t->blocked,&ka->sa.sa_mask);
sigaddset(¤t->blocked, signr);
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
+
+ tracehook_signal_handler(signr, info, ka, regs, 0);
+
+ return 0;
}
static inline void syscall_restart(unsigned long orig_i0, struct pt_regs *regs,
if (signr > 0) {
if (restart_syscall)
syscall_restart(orig_i0, regs, &ka.sa);
- handle_signal(signr, &ka, &info, oldset, regs);
-
- /* a signal was successfully delivered; the saved
- * sigmask will have been stored in the signal frame,
- * and will be restored by sigreturn, so we can simply
- * clear the TIF_RESTORE_SIGMASK flag.
- */
- if (test_thread_flag(TIF_RESTORE_SIGMASK))
- clear_thread_flag(TIF_RESTORE_SIGMASK);
-
- tracehook_signal_handler(signr, &info, &ka, regs, 0);
+ if (handle_signal(signr, &ka, &info, oldset, regs) == 0) {
+ /* a signal was successfully delivered; the saved
+ * sigmask will have been stored in the signal frame,
+ * and will be restored by sigreturn, so we can simply
+ * clear the TIF_RESTORE_SIGMASK flag.
+ */
+ if (test_thread_flag(TIF_RESTORE_SIGMASK))
+ clear_thread_flag(TIF_RESTORE_SIGMASK);
+ }
return;
}
if (restart_syscall &&
regs->u_regs[UREG_I0] = orig_i0;
regs->pc -= 4;
regs->npc -= 4;
+ pt_regs_clear_syscall(regs);
}
if (restart_syscall &&
regs->u_regs[UREG_I0] == ERESTART_RESTARTBLOCK) {
regs->u_regs[UREG_G1] = __NR_restart_syscall;
regs->pc -= 4;
regs->npc -= 4;
+ pt_regs_clear_syscall(regs);
}
/* if there's no signal to deliver, we just put the saved sigmask
return (void __user *) sp;
}
-static inline void
+static inline int
setup_rt_frame(struct k_sigaction *ka, struct pt_regs *regs,
int signo, sigset_t *oldset, siginfo_t *info)
{
}
/* 4. return to kernel instructions */
regs->u_regs[UREG_I7] = (unsigned long)ka->ka_restorer;
- return;
+ return 0;
sigill:
do_exit(SIGILL);
+ return -EINVAL;
+
sigsegv:
force_sigsegv(signo, current);
+ return -EFAULT;
}
-static inline void handle_signal(unsigned long signr, struct k_sigaction *ka,
- siginfo_t *info,
- sigset_t *oldset, struct pt_regs *regs)
+static inline int handle_signal(unsigned long signr, struct k_sigaction *ka,
+ siginfo_t *info,
+ sigset_t *oldset, struct pt_regs *regs)
{
- setup_rt_frame(ka, regs, signr, oldset,
- (ka->sa.sa_flags & SA_SIGINFO) ? info : NULL);
+ int err;
+
+ err = setup_rt_frame(ka, regs, signr, oldset,
+ (ka->sa.sa_flags & SA_SIGINFO) ? info : NULL);
+ if (err)
+ return err;
spin_lock_irq(¤t->sighand->siglock);
sigorsets(¤t->blocked,¤t->blocked,&ka->sa.sa_mask);
if (!(ka->sa.sa_flags & SA_NOMASK))
sigaddset(¤t->blocked,signr);
recalc_sigpending();
spin_unlock_irq(¤t->sighand->siglock);
+
+ tracehook_signal_handler(signr, info, ka, regs, 0);
+
+ return 0;
}
static inline void syscall_restart(unsigned long orig_i0, struct pt_regs *regs,
if (signr > 0) {
if (restart_syscall)
syscall_restart(orig_i0, regs, &ka.sa);
- handle_signal(signr, &ka, &info, oldset, regs);
-
- /* A signal was successfully delivered; the saved
- * sigmask will have been stored in the signal frame,
- * and will be restored by sigreturn, so we can simply
- * clear the TS_RESTORE_SIGMASK flag.
- */
- current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
-
- tracehook_signal_handler(signr, &info, &ka, regs, 0);
+ if (handle_signal(signr, &ka, &info, oldset, regs) == 0) {
+ /* A signal was successfully delivered; the saved
+ * sigmask will have been stored in the signal frame,
+ * and will be restored by sigreturn, so we can simply
+ * clear the TS_RESTORE_SIGMASK flag.
+ */
+ current_thread_info()->status &= ~TS_RESTORE_SIGMASK;
+ }
return;
}
if (restart_syscall &&
regs->u_regs[UREG_I0] = orig_i0;
regs->tpc -= 4;
regs->tnpc -= 4;
+ pt_regs_clear_syscall(regs);
}
if (restart_syscall &&
regs->u_regs[UREG_I0] == ERESTART_RESTARTBLOCK) {
regs->u_regs[UREG_G1] = __NR_restart_syscall;
regs->tpc -= 4;
regs->tnpc -= 4;
+ pt_regs_clear_syscall(regs);
}
/* If there's no signal to deliver, we just put the saved sigmask
{
siginfo_t info;
- lock_kernel();
#ifdef DEBUG_SPARC_BREAKPOINT
printk ("TRAP: Entering kernel PC=%x, nPC=%x\n", regs->pc, regs->npc);
#endif
#ifdef DEBUG_SPARC_BREAKPOINT
printk ("TRAP: Returning to space: PC=%x nPC=%x\n", regs->pc, regs->npc);
#endif
- unlock_kernel();
}
asmlinkage int
{
enum direction dir;
- lock_kernel();
if(!(current->thread.flags & SPARC_FLAG_UNALIGNED) ||
(((insn >> 30) & 3) != 3))
goto kill_user;
kill_user:
user_mna_trap_fault(regs, insn);
out:
- unlock_kernel();
+ ;
}
struct thread_info *tp = current_thread_info();
int window;
- lock_kernel();
flush_user_windows();
for(window = 0; window < tp->w_saved; window++) {
unsigned long sp = tp->rwbuf_stkptrs[window];
do_exit(SIGILL);
}
tp->w_saved = 0;
- unlock_kernel();
}
/** Is the PROC_STATUS SPR supported? */
#define CHIP_HAS_PROC_STATUS_SPR() 0
+/** Is the DSTREAM_PF SPR supported? */
+#define CHIP_HAS_DSTREAM_PF() 0
+
/** Log of the number of mshims we have. */
#define CHIP_LOG_NUM_MSHIMS() 2
/** Is the PROC_STATUS SPR supported? */
#define CHIP_HAS_PROC_STATUS_SPR() 1
+/** Is the DSTREAM_PF SPR supported? */
+#define CHIP_HAS_DSTREAM_PF() 0
+
/** Log of the number of mshims we have. */
#define CHIP_LOG_NUM_MSHIMS() 2
return (long)(int)(long __force)uptr;
}
-static inline void __user *compat_alloc_user_space(long len)
+static inline void __user *arch_compat_alloc_user_space(long len)
{
struct pt_regs *regs = task_pt_regs(current);
return (void __user *)regs->sp - len;
struct compat_sigaction;
struct compat_siginfo;
struct compat_sigaltstack;
-long compat_sys_execve(char __user *path, compat_uptr_t __user *argv,
- compat_uptr_t __user *envp);
+long compat_sys_execve(const char __user *path,
+ const compat_uptr_t __user *argv,
+ const compat_uptr_t __user *envp);
long compat_sys_rt_sigaction(int sig, struct compat_sigaction __user *act,
struct compat_sigaction __user *oact,
size_t sigsetsize);
#define iowrite32 writel
#define iowrite64 writeq
-static inline void *memcpy_fromio(void *dst, void *src, int len)
+static inline void memcpy_fromio(void *dst, const volatile void __iomem *src,
+ size_t len)
{
int x;
BUG_ON((unsigned long)src & 0x3);
for (x = 0; x < len; x += 4)
*(u32 *)(dst + x) = readl(src + x);
- return dst;
}
-static inline void *memcpy_toio(void *dst, void *src, int len)
+static inline void memcpy_toio(volatile void __iomem *dst, const void *src,
+ size_t len)
{
int x;
BUG_ON((unsigned long)dst & 0x3);
for (x = 0; x < len; x += 4)
writel(*(u32 *)(src + x), dst + x);
- return dst;
}
/*
/* Any other miscellaneous processor state bits */
unsigned long proc_status;
#endif
+#if !CHIP_HAS_FIXED_INTVEC_BASE()
+ /* Interrupt base for PL0 interrupts */
+ unsigned long interrupt_vector_base;
+#endif
+#if CHIP_HAS_TILE_RTF_HWM()
+ /* Tile cache retry fifo high-water mark */
+ unsigned long tile_rtf_hwm;
+#endif
+#if CHIP_HAS_DSTREAM_PF()
+ /* Data stream prefetch control */
+ unsigned long dstream_pf;
+#endif
#ifdef CONFIG_HARDWALL
/* Is this task tied to an activated hardwall? */
struct hardwall_info *hardwall;
/*
* This struct defines the way the registers are stored on the stack during a
- * system call/exception. It should be a multiple of 8 bytes to preserve
- * normal stack alignment rules.
- *
- * Must track <sys/ucontext.h> and <sys/procfs.h>
+ * system call or exception. "struct sigcontext" has the same shape.
*/
struct pt_regs {
/* Saved main processor registers; 56..63 are special. */
#endif /* __ASSEMBLY__ */
-/* Flag bits in pt_regs.flags */
-#define PT_FLAGS_DISABLE_IRQ 1 /* on return to kernel, disable irqs */
-#define PT_FLAGS_CALLER_SAVES 2 /* caller-save registers are valid */
-#define PT_FLAGS_RESTORE_REGS 4 /* restore callee-save regs on return */
-
#define PTRACE_GETREGS 12
#define PTRACE_SETREGS 13
#define PTRACE_GETFPREGS 14
#ifdef __KERNEL__
+/* Flag bits in pt_regs.flags */
+#define PT_FLAGS_DISABLE_IRQ 1 /* on return to kernel, disable irqs */
+#define PT_FLAGS_CALLER_SAVES 2 /* caller-save registers are valid */
+#define PT_FLAGS_RESTORE_REGS 4 /* restore callee-save regs on return */
+
#ifndef __ASSEMBLY__
#define instruction_pointer(regs) ((regs)->pc)
#ifndef _ASM_TILE_SIGCONTEXT_H
#define _ASM_TILE_SIGCONTEXT_H
-/* NOTE: we can't include <linux/ptrace.h> due to #include dependencies. */
-#include <asm/ptrace.h>
-
-/* Must track <sys/ucontext.h> */
+#include <arch/abi.h>
+/*
+ * struct sigcontext has the same shape as struct pt_regs,
+ * but is simplified since we know the fault is from userspace.
+ */
struct sigcontext {
- struct pt_regs regs;
+ uint_reg_t gregs[53]; /* General-purpose registers. */
+ uint_reg_t tp; /* Aliases gregs[TREG_TP]. */
+ uint_reg_t sp; /* Aliases gregs[TREG_SP]. */
+ uint_reg_t lr; /* Aliases gregs[TREG_LR]. */
+ uint_reg_t pc; /* Program counter. */
+ uint_reg_t ics; /* In Interrupt Critical Section? */
+ uint_reg_t faultnum; /* Fault number. */
+ uint_reg_t pad[5];
};
#endif /* _ASM_TILE_SIGCONTEXT_H */
#include <asm-generic/signal.h>
#if defined(__KERNEL__) && !defined(__ASSEMBLY__)
+struct pt_regs;
int restore_sigcontext(struct pt_regs *, struct sigcontext __user *, long *);
int setup_sigcontext(struct sigcontext __user *, struct pt_regs *);
void do_signal(struct pt_regs *regs);
long _sys_fork(struct pt_regs *regs);
long sys_vfork(void);
long _sys_vfork(struct pt_regs *regs);
-long sys_execve(char __user *filename, char __user * __user *argv,
- char __user * __user *envp);
-long _sys_execve(char __user *filename, char __user * __user *argv,
- char __user * __user *envp, struct pt_regs *regs);
+long sys_execve(const char __user *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp);
+long _sys_execve(const char __user *filename,
+ const char __user *const __user *argv,
+ const char __user *const __user *envp, struct pt_regs *regs);
/* kernel/signal.c */
long sys_sigaltstack(const stack_t __user *, stack_t __user *);
#endif
#ifdef CONFIG_COMPAT
-long compat_sys_execve(char __user *path, compat_uptr_t __user *argv,
- compat_uptr_t __user *envp);
-long _compat_sys_execve(char __user *path, compat_uptr_t __user *argv,
- compat_uptr_t __user *envp, struct pt_regs *regs);
+long compat_sys_execve(const char __user *path,
+ const compat_uptr_t __user *argv,
+ const compat_uptr_t __user *envp);
+long _compat_sys_execve(const char __user *path,
+ const compat_uptr_t __user *argv,
+ const compat_uptr_t __user *envp,
+ struct pt_regs *regs);
long compat_sys_sigaltstack(const struct compat_sigaltstack __user *uss_ptr,
struct compat_sigaltstack __user *uoss_ptr);
long _compat_sys_sigaltstack(const struct compat_sigaltstack __user *uss_ptr,
}
STD_ENDPROC(handle_ill)
- .pushsection .rodata, "a"
- .align 8
-bpt_code:
- bpt
- ENDPROC(bpt_code)
- .popsection
-
/* Various stub interrupt handlers and syscall handlers */
STD_ENTRY_LOCAL(_kernel_double_fault)
#if CHIP_HAS_PROC_STATUS_SPR()
t->proc_status = __insn_mfspr(SPR_PROC_STATUS);
#endif
+#if !CHIP_HAS_FIXED_INTVEC_BASE()
+ t->interrupt_vector_base = __insn_mfspr(SPR_INTERRUPT_VECTOR_BASE_0);
+#endif
+#if CHIP_HAS_TILE_RTF_HWM()
+ t->tile_rtf_hwm = __insn_mfspr(SPR_TILE_RTF_HWM);
+#endif
+#if CHIP_HAS_DSTREAM_PF()
+ t->dstream_pf = __insn_mfspr(SPR_DSTREAM_PF);
+#endif
}
static void restore_arch_state(const struct thread_struct *t)
#if CHIP_HAS_PROC_STATUS_SPR()
__insn_mtspr(SPR_PROC_STATUS, t->proc_status);
#endif
+#if !CHIP_HAS_FIXED_INTVEC_BASE()
+ __insn_mtspr(SPR_INTERRUPT_VECTOR_BASE_0, t->interrupt_vector_base);
+#endif
#if CHIP_HAS_TILE_RTF_HWM()
- /*
- * Clear this whenever we switch back to a process in case
- * the previous process was monkeying with it. Even if enabled
- * in CBOX_MSR1 via TILE_RTF_HWM_MIN, it's still just a
- * performance hint, so isn't worth a full save/restore.
- */
- __insn_mtspr(SPR_TILE_RTF_HWM, 0);
+ __insn_mtspr(SPR_TILE_RTF_HWM, t->tile_rtf_hwm);
+#endif
+#if CHIP_HAS_DSTREAM_PF()
+ __insn_mtspr(SPR_DSTREAM_PF, t->dstream_pf);
#endif
}
}
#ifdef CONFIG_COMPAT
-long _compat_sys_execve(char __user *path, compat_uptr_t __user *argv,
- compat_uptr_t __user *envp, struct pt_regs *regs)
+long _compat_sys_execve(const char __user *path,
+ const compat_uptr_t __user *argv,
+ const compat_uptr_t __user *envp, struct pt_regs *regs)
{
long error;
char *filename;
regs->regs[51], regs->regs[52], regs->tp);
pr_err(" sp : "REGFMT" lr : "REGFMT"\n", regs->sp, regs->lr);
#else
- for (i = 0; i < 52; i += 3)
+ for (i = 0; i < 52; i += 4)
pr_err(" r%-2d: "REGFMT" r%-2d: "REGFMT
" r%-2d: "REGFMT" r%-2d: "REGFMT"\n",
i, regs->regs[i], i+1, regs->regs[i+1],
/* Always make any pending restarted system calls return -EINTR */
current_thread_info()->restart_block.fn = do_no_restart_syscall;
+ /*
+ * Enforce that sigcontext is like pt_regs, and doesn't mess
+ * up our stack alignment rules.
+ */
+ BUILD_BUG_ON(sizeof(struct sigcontext) != sizeof(struct pt_regs));
+ BUILD_BUG_ON(sizeof(struct sigcontext) % 8 != 0);
+
for (i = 0; i < sizeof(struct pt_regs)/sizeof(long); ++i)
- err |= __get_user(((long *)regs)[i],
- &((long __user *)(&sc->regs))[i]);
+ err |= __get_user(regs->regs[i], &sc->gregs[i]);
regs->faultnum = INT_SWINT_1_SIGRETURN;
- err |= __get_user(*pr0, &sc->regs.regs[0]);
+ err |= __get_user(*pr0, &sc->gregs[0]);
return err;
}
int i, err = 0;
for (i = 0; i < sizeof(struct pt_regs)/sizeof(long); ++i)
- err |= __put_user(((long *)regs)[i],
- &((long __user *)(&sc->regs))[i]);
+ err |= __put_user(regs->regs[i], &sc->gregs[i]);
return err;
}
* Set up registers for signal handler.
* Registers that we don't modify keep the value they had from
* user-space at the time we took the signal.
+ * We always pass siginfo and mcontext, regardless of SA_SIGINFO,
+ * since some things rely on this (e.g. glibc's debug/segfault.c).
*/
regs->pc = (unsigned long) ka->sa.sa_handler;
regs->ex1 = PL_ICS_EX1(USER_PL, 1); /* set crit sec in handler */
regs->sp = (unsigned long) frame;
regs->lr = restorer;
regs->regs[0] = (unsigned long) usig;
-
- if (ka->sa.sa_flags & SA_SIGINFO) {
- /* Need extra arguments, so mark to restore caller-saves. */
- regs->regs[1] = (unsigned long) &frame->info;
- regs->regs[2] = (unsigned long) &frame->uc;
- regs->flags |= PT_FLAGS_CALLER_SAVES;
- }
+ regs->regs[1] = (unsigned long) &frame->info;
+ regs->regs[2] = (unsigned long) &frame->uc;
+ regs->flags |= PT_FLAGS_CALLER_SAVES;
/*
* Notify any tracer that was single-stepping it.
pr_err(" <received signal %d>\n",
frame->info.si_signo);
}
- return &frame->uc.uc_mcontext.regs;
+ return (struct pt_regs *)&frame->uc.uc_mcontext;
}
return NULL;
}
" This is used to specify the host mixer device to the hostaudio driver.\n"\
" The default is \"" HOSTAUDIO_DEV_MIXER "\".\n\n"
+module_param(dsp, charp, 0644);
+MODULE_PARM_DESC(dsp, DSP_HELP);
+module_param(mixer, charp, 0644);
+MODULE_PARM_DESC(mixer, MIXER_HELP);
+
#ifndef MODULE
static int set_dsp(char *name, int *add)
{
}
__uml_setup("mixer=", set_mixer, "mixer=<mixer device>\n" MIXER_HELP);
-
-#else /*MODULE*/
-
-module_param(dsp, charp, 0644);
-MODULE_PARM_DESC(dsp, DSP_HELP);
-
-module_param(mixer, charp, 0644);
-MODULE_PARM_DESC(mixer, MIXER_HELP);
-
#endif
/* /dev/dsp file operations */
netif_wake_queue(dev);
}
-static int uml_net_set_mac(struct net_device *dev, void *addr)
-{
- struct uml_net_private *lp = netdev_priv(dev);
- struct sockaddr *hwaddr = addr;
-
- spin_lock_irq(&lp->lock);
- eth_mac_addr(dev, hwaddr->sa_data);
- spin_unlock_irq(&lp->lock);
-
- return 0;
-}
-
static int uml_net_change_mtu(struct net_device *dev, int new_mtu)
{
dev->mtu = new_mtu;
.ndo_start_xmit = uml_net_start_xmit,
.ndo_set_multicast_list = uml_net_set_multicast_list,
.ndo_tx_timeout = uml_net_tx_timeout,
- .ndo_set_mac_address = uml_net_set_mac,
+ .ndo_set_mac_address = eth_mac_addr,
.ndo_change_mtu = uml_net_change_mtu,
.ndo_validate_addr = eth_validate_addr,
};
((*transport->user->init)(&lp->user, dev) != 0))
goto out_unregister;
- eth_mac_addr(dev, device->mac);
+ /* don't use eth_mac_addr, it will not work here */
+ memcpy(dev->dev_addr, device->mac, ETH_ALEN);
dev->mtu = transport->user->mtu;
dev->netdev_ops = ¨_netdev_ops;
dev->ethtool_ops = ¨_net_ethtool_ops;
struct scatterlist sg[MAX_SG];
struct request *request;
int start_sg, end_sg;
+ sector_t rq_pos;
};
#define DEFAULT_COW { \
.request = NULL, \
.start_sg = 0, \
.end_sg = 0, \
+ .rq_pos = 0, \
}
/* Protected by ubd_lock */
{
struct io_thread_req *io_req;
struct request *req;
- sector_t sector;
int n;
while(1){
return;
dev->request = req;
+ dev->rq_pos = blk_rq_pos(req);
dev->start_sg = 0;
dev->end_sg = blk_rq_map_sg(q, req, dev->sg);
}
req = dev->request;
- sector = blk_rq_pos(req);
while(dev->start_sg < dev->end_sg){
struct scatterlist *sg = &dev->sg[dev->start_sg];
return;
}
prepare_request(req, io_req,
- (unsigned long long)sector << 9,
+ (unsigned long long)dev->rq_pos << 9,
sg->offset, sg->length, sg_page(sg));
- sector += sg->length >> 9;
n = os_write_file(thread_fd, &io_req,
sizeof(struct io_thread_req *));
if(n != sizeof(struct io_thread_req *)){
return;
}
+ dev->rq_pos += sg->length >> 9;
dev->start_sg++;
}
dev->end_sg = 0;
return error;
}
-long um_execve(const char *file, char __user *__user *argv, char __user *__user *env)
+long um_execve(const char *file, const char __user *const __user *argv, const char __user *const __user *env)
{
long err;
return err;
}
-long sys_execve(const char __user *file, char __user *__user *argv,
- char __user *__user *env)
+long sys_execve(const char __user *file, const char __user *const __user *argv,
+ const char __user *const __user *env)
{
long error;
char *filename;
-extern long um_execve(const char *file, char __user *__user *argv, char __user *__user *env);
+extern long um_execve(const char *file, const char __user *const __user *argv, const char __user *const __user *env);
fs = get_fs();
set_fs(KERNEL_DS);
- ret = um_execve(filename, (char __user *__user *)argv,
- (char __user *__user *) envp);
+ ret = um_execve(filename, (const char __user *const __user *)argv,
+ (const char __user *const __user *) envp);
set_fs(fs);
return ret;
select HAVE_IDE
select HAVE_OPROFILE
select HAVE_PERF_EVENTS if (!M386 && !M486)
+ select HAVE_IRQ_WORK
select HAVE_IOREMAP_PROT
select HAVE_KPROBES
select ARCH_WANT_OPTIONAL_GPIOLIB
select HAVE_KRETPROBES
select HAVE_OPTPROBES
select HAVE_FTRACE_MCOUNT_RECORD
+ select HAVE_C_RECORDMCOUNT
select HAVE_DYNAMIC_FTRACE
select HAVE_FUNCTION_TRACER
select HAVE_FUNCTION_GRAPH_TRACER
select ANON_INODES
select HAVE_ARCH_KMEMCHECK
select HAVE_USER_RETURN_NOTIFIER
+ select HAVE_ARCH_JUMP_LABEL
+ select HAVE_TEXT_POKE_SMP
config INSTRUCTION_DECODER
def_bool (KPROBES || PERF_EVENTS)
source "arch/x86/xen/Kconfig"
-config VMI
- bool "VMI Guest support (DEPRECATED)"
- select PARAVIRT
- depends on X86_32
- ---help---
- VMI provides a paravirtualized interface to the VMware ESX server
- (it could be used by other hypervisors in theory too, but is not
- at the moment), by linking the kernel to a GPL-ed ROM module
- provided by the hypervisor.
-
- As of September 2009, VMware has started a phased retirement
- of this feature from VMware's products. Please see
- feature-removal-schedule.txt for details. If you are
- planning to enable this option, please note that you cannot
- live migrate a VMI enabled VM to a future VMware product,
- which doesn't support VMI. So if you expect your kernel to
- seamlessly migrate to newer VMware products, keep this
- disabled.
-
config KVM_CLOCK
bool "KVM paravirtualized clock"
select PARAVIRT
bool "GART IOMMU support" if EMBEDDED
default y
select SWIOTLB
- depends on X86_64 && PCI && K8_NB
+ depends on X86_64 && PCI && AMD_NB
---help---
Support for full DMA access of devices with 32bit memory access only
on systems with more than 3GB. This is usually needed for USB,
making when dealing with multi-core CPU chips at a cost of slightly
increased overhead in some places. If unsure say N here.
+config IRQ_TIME_ACCOUNTING
+ bool "Fine granularity task level IRQ time accounting"
+ default n
+ ---help---
+ Select this option to enable fine granularity task irq time
+ accounting. This is done by reading a timestamp on each
+ transitions between softirq and hardirq state, so there can be a
+ small performance impact.
+
+ If in doubt, say N here.
+
source "kernel/Kconfig.preempt"
config X86_UP_APIC
config ARCH_PHYS_ADDR_T_64BIT
def_bool X86_64 || X86_PAE
+config ARCH_DMA_ADDR_T_64BIT
+ def_bool X86_64 || HIGHMEM64G
+
config DIRECT_GBPAGES
bool "Enable 1GB pages for kernel pagetables" if EMBEDDED
default y
Set whether the default state of memory_corruption_check is
on or off.
-config X86_RESERVE_LOW_64K
- bool "Reserve low 64K of RAM on AMI/Phoenix BIOSen"
- default y
+config X86_RESERVE_LOW
+ int "Amount of low memory, in kilobytes, to reserve for the BIOS"
+ default 64
+ range 4 640
---help---
- Reserve the first 64K of physical RAM on BIOSes that are known
- to potentially corrupt that memory range. A numbers of BIOSes are
- known to utilize this area during suspend/resume, so it must not
- be used by the kernel.
+ Specify the amount of low memory to reserve for the BIOS.
+
+ The first page contains BIOS data structures that the kernel
+ must not use, so that page must always be reserved.
+
+ By default we reserve the first 64K of physical RAM, as a
+ number of BIOSes are known to corrupt that memory range
+ during events such as suspend/resume or monitor cable
+ insertion, so it must not be used by the kernel.
- Set this to N if you are absolutely sure that you trust the BIOS
- to get all its memory reservations and usages right.
+ You can set this to 4 if you are absolutely sure that you
+ trust the BIOS to get all its memory reservations and usages
+ right. If you know your BIOS have problems beyond the
+ default 64K area, you can set this to 640 to avoid using the
+ entire low memory range.
- If you have doubts about the BIOS (e.g. suspend/resume does not
- work or there's kernel crashes after certain hardware hotplug
- events) and it's not AMI or Phoenix, then you might want to enable
- X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check typical
- corruption patterns.
+ If you have doubts about the BIOS (e.g. suspend/resume does
+ not work or there's kernel crashes after certain hardware
+ hotplug events) then you might want to enable
+ X86_CHECK_BIOS_CORRUPTION=y to allow the kernel to check
+ typical corruption patterns.
- Say Y if unsure.
+ Leave this to the default value of 64 if you are unsure.
config MATH_EMULATION
bool
bool "Direct"
config PCI_GOOLPC
- bool "OLPC"
+ bool "OLPC XO-1"
depends on OLPC
config PCI_GOANY
config OLPC
bool "One Laptop Per Child support"
select GPIOLIB
+ select OLPC_OPENFIRMWARE
---help---
Add support for detecting the unique features of the OLPC
XO hardware.
+config OLPC_XO1
+ tristate "OLPC XO-1 support"
+ depends on OLPC && PCI
+ ---help---
+ Add support for non-essential features of the OLPC XO-1 laptop.
+
config OLPC_OPENFIRMWARE
bool "Support for OLPC's Open Firmware"
depends on !X86_64 && !X86_PAE
- default y if OLPC
+ default n
help
This option adds support for the implementation of Open Firmware
that is used on the OLPC XO-1 Children's Machine.
endif # X86_32
-config K8_NB
+config AMD_NB
def_bool y
depends on CPU_SUP_AMD && PCI
def_bool y
depends on X86_32
+config HAVE_TEXT_POKE_SMP
+ bool
+ select STOP_MACHINE if SMP
+
source "net/Kconfig"
source "drivers/Kconfig"
with klogd/syslogd or the X server. You should normally N here,
unless you want to debug such a crash.
+config EARLY_PRINTK_MRST
+ bool "Early printk for MRST platform support"
+ depends on EARLY_PRINTK && X86_MRST
+
config EARLY_PRINTK_DBGP
bool "Early printk via EHCI debug port"
depends on EARLY_PRINTK && PCI
ifdef CONFIG_CC_STACKPROTECTOR
cc_has_sp := $(srctree)/scripts/gcc-x86_$(BITS)-has-stack-protector.sh
- ifeq ($(shell $(CONFIG_SHELL) $(cc_has_sp) $(CC) $(biarch)),y)
+ ifeq ($(shell $(CONFIG_SHELL) $(cc_has_sp) $(CC) $(KBUILD_CPPFLAGS) $(biarch)),y)
stackp-y := -fstack-protector
KBUILD_CFLAGS += $(stackp-y)
else
# is .cfi_signal_frame supported too?
cfi-sigframe := $(call as-instr,.cfi_startproc\n.cfi_signal_frame\n.cfi_endproc,-DCONFIG_AS_CFI_SIGNAL_FRAME=1)
cfi-sections := $(call as-instr,.cfi_sections .debug_frame,-DCONFIG_AS_CFI_SECTIONS=1)
-KBUILD_AFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections)
-KBUILD_CFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections)
+
+# does binutils support specific instructions?
+asinstr := $(call as-instr,fxsaveq (%rax),-DCONFIG_AS_FXSAVEQ=1)
+
+KBUILD_AFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr)
+KBUILD_CFLAGS += $(cfi) $(cfi-sigframe) $(cfi-sections) $(asinstr)
LDFLAGS := -m elf_$(UTS_MACHINE)
if (arg[pos] == ',')
pos++;
- if (!strncmp(arg, "ttyS", 4)) {
+ /*
+ * make sure we have
+ * "serial,0x3f8,115200"
+ * "serial,ttyS0,115200"
+ * "ttyS0,115200"
+ */
+ if (pos == 7 && !strncmp(arg + pos, "0x", 2)) {
+ port = simple_strtoull(arg + pos, &e, 16);
+ if (port == 0 || arg + pos == e)
+ port = DEFAULT_SERIAL_PORT;
+ else
+ pos = e - arg;
+ } else if (!strncmp(arg + pos, "ttyS", 4)) {
static const int bases[] = { 0x3f8, 0x2f8 };
int idx = 0;
#include <asm/ia32.h>
#undef WARN_OLD
-#undef CORE_DUMP /* probably broken */
+#undef CORE_DUMP /* definitely broken */
static int load_aout_binary(struct linux_binprm *, struct pt_regs *regs);
static int load_aout_library(struct file *);
* macros to write out all the necessary info.
*/
-static int dump_write(struct file *file, const void *addr, int nr)
-{
- return file->f_op->write(file, addr, nr, &file->f_pos) == nr;
-}
+#include <linux/coredump.h>
#define DUMP_WRITE(addr, nr) \
if (!dump_write(file, (void *)(addr), (nr))) \
goto end_coredump;
-#define DUMP_SEEK(offset) \
- if (file->f_op->llseek) { \
- if (file->f_op->llseek(file, (offset), 0) != (offset)) \
- goto end_coredump; \
- } else \
- file->f_pos = (offset)
+#define DUMP_SEEK(offset) \
+ if (!dump_seek(file, offset)) \
+ goto end_coredump;
#define START_DATA() (u.u_tsize << PAGE_SHIFT)
#define START_STACK(u) (u.start_stack)
dump_size = dump.u_ssize << PAGE_SHIFT;
DUMP_WRITE(dump_start, dump_size);
}
- /*
- * Finally dump the task struct. Not be used by gdb, but
- * could be useful
- */
- set_fs(KERNEL_DS);
- DUMP_WRITE(current, sizeof(*current));
end_coredump:
set_fs(fs);
return has_dumped;
/*
* Reload arg registers from stack in case ptrace changed them.
* We don't reload %eax because syscall_trace_enter() returned
- * the value it wants us to use in the table lookup.
+ * the %rax value we should see. Instead, we just truncate that
+ * value to 32 bits again as we did on entry from user mode.
+ * If it's a new value set by user_regset during entry tracing,
+ * this matches the normal truncation of the user-mode value.
+ * If it's -1 to make us punt the syscall, then (u32)-1 is still
+ * an appropriately invalid value.
*/
.macro LOAD_ARGS32 offset, _r9=0
.if \_r9
movl \offset+48(%rsp),%edx
movl \offset+56(%rsp),%esi
movl \offset+64(%rsp),%edi
+ movl %eax,%eax /* zero extension */
.endm
.macro CFI_STARTPROC32 simple
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%r10)
CFI_REMEMBER_STATE
jnz sysenter_tracesys
- cmpl $(IA32_NR_syscalls-1),%eax
+ cmpq $(IA32_NR_syscalls-1),%rax
ja ia32_badsys
sysenter_do_call:
IA32_ARG_FIXUP
movl $AUDIT_ARCH_I386,%edi /* 1st arg: audit arch */
call audit_syscall_entry
movl RAX-ARGOFFSET(%rsp),%eax /* reload syscall number */
- cmpl $(IA32_NR_syscalls-1),%eax
+ cmpq $(IA32_NR_syscalls-1),%rax
ja ia32_badsys
movl %ebx,%edi /* reload 1st syscall arg */
movl RCX-ARGOFFSET(%rsp),%esi /* reload 2nd syscall arg */
call syscall_trace_enter
LOAD_ARGS32 ARGOFFSET /* reload args from stack in case ptrace changed it */
RESTORE_REST
- cmpl $(IA32_NR_syscalls-1),%eax
+ cmpq $(IA32_NR_syscalls-1),%rax
ja int_ret_from_sys_call /* sysenter_tracesys has set RAX(%rsp) */
jmp sysenter_do_call
CFI_ENDPROC
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%r10)
CFI_REMEMBER_STATE
jnz cstar_tracesys
- cmpl $IA32_NR_syscalls-1,%eax
+ cmpq $IA32_NR_syscalls-1,%rax
ja ia32_badsys
cstar_do_call:
IA32_ARG_FIXUP 1
LOAD_ARGS32 ARGOFFSET, 1 /* reload args from stack in case ptrace changed it */
RESTORE_REST
xchgl %ebp,%r9d
- cmpl $(IA32_NR_syscalls-1),%eax
+ cmpq $(IA32_NR_syscalls-1),%rax
ja int_ret_from_sys_call /* cstar_tracesys has set RAX(%rsp) */
jmp cstar_do_call
END(ia32_cstar_target)
orl $TS_COMPAT,TI_status(%r10)
testl $_TIF_WORK_SYSCALL_ENTRY,TI_flags(%r10)
jnz ia32_tracesys
- cmpl $(IA32_NR_syscalls-1),%eax
+ cmpq $(IA32_NR_syscalls-1),%rax
ja ia32_badsys
ia32_do_call:
IA32_ARG_FIXUP
call syscall_trace_enter
LOAD_ARGS32 ARGOFFSET /* reload args from stack in case ptrace changed it */
RESTORE_REST
- cmpl $(IA32_NR_syscalls-1),%eax
+ cmpq $(IA32_NR_syscalls-1),%rax
ja int_ret_from_sys_call /* ia32_tracesys has set RAX(%rsp) */
jmp ia32_do_call
END(ia32_syscall)
#include <linux/types.h>
#include <linux/stddef.h>
#include <linux/stringify.h>
+#include <linux/jump_label.h>
#include <asm/asm.h>
/*
#define __parainstructions_end NULL
#endif
+extern void *text_poke_early(void *addr, const void *opcode, size_t len);
+
/*
* Clear and restore the kernel write-protection flag on the local CPU.
* Allows the kernel to edit read-only pages.
extern void *text_poke(void *addr, const void *opcode, size_t len);
extern void *text_poke_smp(void *addr, const void *opcode, size_t len);
+#if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)
+#define IDEAL_NOP_SIZE_5 5
+extern unsigned char ideal_nop5[IDEAL_NOP_SIZE_5];
+extern void arch_init_ideal_nop5(void);
+#else
+static inline void arch_init_ideal_nop5(void) {}
+#endif
+
#endif /* _ASM_X86_ALTERNATIVE_H */
/*
- * Copyright (C) 2007-2009 Advanced Micro Devices, Inc.
+ * Copyright (C) 2007-2010 Advanced Micro Devices, Inc.
* Author: Joerg Roedel <joerg.roedel@amd.com>
* Leo Duran <leo.duran@amd.com>
*
/*
- * Copyright (C) 2009 Advanced Micro Devices, Inc.
+ * Copyright (C) 2009-2010 Advanced Micro Devices, Inc.
* Author: Joerg Roedel <joerg.roedel@amd.com>
*
* This program is free software; you can redistribute it and/or modify it
#endif /* !CONFIG_AMD_IOMMU_STATS */
+static inline bool is_rd890_iommu(struct pci_dev *pdev)
+{
+ return (pdev->vendor == PCI_VENDOR_ID_ATI) &&
+ (pdev->device == PCI_DEVICE_ID_RD890_IOMMU);
+}
+
#endif /* _ASM_X86_AMD_IOMMU_PROTO_H */
/*
- * Copyright (C) 2007-2009 Advanced Micro Devices, Inc.
+ * Copyright (C) 2007-2010 Advanced Micro Devices, Inc.
* Author: Joerg Roedel <joerg.roedel@amd.com>
* Leo Duran <leo.duran@amd.com>
*
/* capabilities of that IOMMU read from ACPI */
u32 cap;
+ /* flags read from acpi table */
+ u8 acpi_flags;
+
/*
* Capability pointer. There could be more than one IOMMU per PCI
* device function if there are more than one AMD IOMMU capability
/* default dma_ops domain for that IOMMU */
struct dma_ops_domain *default_dom;
+
+ /*
+ * We can't rely on the BIOS to restore all values on reinit, so we
+ * need to stash them
+ */
+
+ /* The iommu BAR */
+ u32 stored_addr_lo;
+ u32 stored_addr_hi;
+
+ /*
+ * Each iommu has 6 l1s, each of which is documented as having 0x12
+ * registers
+ */
+ u32 stored_l1[6][0x12];
+
+ /* The l2 indirect registers */
+ u32 stored_l2[0x83];
};
/*
-#ifndef _ASM_X86_K8_H
-#define _ASM_X86_K8_H
+#ifndef _ASM_X86_AMD_NB_H
+#define _ASM_X86_AMD_NB_H
#include <linux/pci.h>
struct bootnode;
extern int early_is_k8_nb(u32 value);
-extern struct pci_dev **k8_northbridges;
-extern int num_k8_northbridges;
extern int cache_k8_northbridges(void);
extern void k8_flush_garts(void);
extern int k8_get_nodes(struct bootnode *nodes);
extern int k8_numa_init(unsigned long start_pfn, unsigned long end_pfn);
extern int k8_scan_nodes(void);
-#ifdef CONFIG_K8_NB
-extern int num_k8_northbridges;
+struct k8_northbridge_info {
+ u16 num;
+ u8 gart_supported;
+ struct pci_dev **nb_misc;
+};
+extern struct k8_northbridge_info k8_northbridges;
+
+#ifdef CONFIG_AMD_NB
static inline struct pci_dev *node_to_k8_nb_misc(int node)
{
- return (node < num_k8_northbridges) ? k8_northbridges[node] : NULL;
+ return (node < k8_northbridges.num) ? k8_northbridges.nb_misc[node] : NULL;
}
#else
-#define num_k8_northbridges 0
static inline struct pci_dev *node_to_k8_nb_misc(int node)
{
#endif
-#endif /* _ASM_X86_K8_H */
+#endif /* _ASM_X86_AMD_NB_H */
extern unsigned long apbt_quick_calibrate(void);
extern int arch_setup_apbt_irqs(int irq, int trigger, int mask, int cpu);
extern void apbt_setup_secondary_clock(void);
-extern unsigned int boot_cpu_id;
extern struct sfi_timer_table_entry *sfi_get_mtmr(int hint);
extern void sfi_free_mtmr(struct sfi_timer_table_entry *mtmr);
static __always_inline int constant_test_bit(unsigned int nr, const volatile unsigned long *addr)
{
return ((1UL << (nr % BITS_PER_LONG)) &
- (((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;
+ (addr[nr / BITS_PER_LONG])) != 0;
}
static inline int variable_test_bit(int nr, volatile const unsigned long *addr)
return (u32)(unsigned long)uptr;
}
-static inline void __user *compat_alloc_user_space(long len)
+static inline void __user *arch_compat_alloc_user_space(long len)
{
struct pt_regs *regs = task_pt_regs(current);
return (void __user *)regs->sp - len;
DECLARE_PER_CPU(int, cpu_state);
-extern unsigned int boot_cpu_id;
#endif /* _ASM_X86_CPU_H */
#define X86_FEATURE_3DNOWPREFETCH (6*32+ 8) /* 3DNow prefetch instructions */
#define X86_FEATURE_OSVW (6*32+ 9) /* OS Visible Workaround */
#define X86_FEATURE_IBS (6*32+10) /* Instruction Based Sampling */
-#define X86_FEATURE_SSE5 (6*32+11) /* SSE-5 */
+#define X86_FEATURE_XOP (6*32+11) /* extended AVX instructions */
#define X86_FEATURE_SKINIT (6*32+12) /* SKINIT/STGI instructions */
#define X86_FEATURE_WDT (6*32+13) /* Watchdog timer */
+#define X86_FEATURE_LWP (6*32+15) /* Light Weight Profiling */
+#define X86_FEATURE_FMA4 (6*32+16) /* 4 operands MAC instructions */
#define X86_FEATURE_NODEID_MSR (6*32+19) /* NodeId MSR */
+#define X86_FEATURE_TBM (6*32+21) /* trailing bit manipulations */
+#define X86_FEATURE_TOPOEXT (6*32+22) /* topology extensions CPUID leafs */
/*
* Auxiliary flags: Linux defined - For features scattered in various
#define X86_FEATURE_XSAVEOPT (7*32+ 4) /* Optimized Xsave */
#define X86_FEATURE_PLN (7*32+ 5) /* Intel Power Limit Notification */
#define X86_FEATURE_PTS (7*32+ 6) /* Intel Package Thermal Status */
+#define X86_FEATURE_DTS (7*32+ 7) /* Digital Thermal Sensor */
/* Virtualization flags: Linux defined, word 8 */
#define X86_FEATURE_TPR_SHADOW (8*32+ 0) /* Intel TPR Shadow */
#define X86_FEATURE_LBRV (8*32+ 6) /* AMD LBR Virtualization support */
#define X86_FEATURE_SVML (8*32+ 7) /* "svm_lock" AMD SVM locking MSR */
#define X86_FEATURE_NRIPS (8*32+ 8) /* "nrip_save" AMD SVM next_rip save */
+#define X86_FEATURE_TSCRATEMSR (8*32+ 9) /* "tsc_scale" AMD TSC scaling support */
+#define X86_FEATURE_VMCBCLEAN (8*32+10) /* "vmcb_clean" AMD VMCB clean bits support */
+#define X86_FEATURE_FLUSHBYASID (8*32+11) /* AMD flush-by-ASID support */
+#define X86_FEATURE_DECODEASSISTS (8*32+12) /* AMD Decode Assists support */
+#define X86_FEATURE_PAUSEFILTER (8*32+13) /* AMD filtered pause intercept */
+#define X86_FEATURE_PFTHRESHOLD (8*32+14) /* AMD pause filter threshold */
+
/* Intel-defined CPU features, CPUID level 0x00000007:0 (ebx), word 9 */
#define X86_FEATURE_FSGSBASE (9*32+ 0) /* {RD/WR}{FS/GS}BASE instructions*/
#endif /* CONFIG_X86_64 */
+#if __GNUC__ >= 4
/*
* Static testing of CPU features. Used the same as boot_cpu_has().
* These are only valid after alternatives have run, but will statically
*/
static __always_inline __pure bool __static_cpu_has(u16 bit)
{
-#if __GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5)
+#if __GNUC__ > 4 || __GNUC_MINOR__ >= 5
asm goto("1: jmp %l[t_no]\n"
"2:\n"
".section .altinstructions,\"a\"\n"
#endif
}
-#if __GNUC__ >= 4
#define static_cpu_has(bit) \
( \
__builtin_constant_p(boot_cpu_has(bit)) ? \
CFI_ADJUST_CFA_OFFSET -8
.endm
+ .macro pushfq_cfi
+ pushfq
+ CFI_ADJUST_CFA_OFFSET 8
+ .endm
+
+ .macro popfq_cfi
+ popfq
+ CFI_ADJUST_CFA_OFFSET -8
+ .endm
+
.macro movq_cfi reg offset=0
movq %\reg, \offset(%rsp)
CFI_REL_OFFSET \reg, \offset
CFI_ADJUST_CFA_OFFSET -4
.endm
+ .macro pushfl_cfi
+ pushfl
+ CFI_ADJUST_CFA_OFFSET 4
+ .endm
+
+ .macro popfl_cfi
+ popfl
+ CFI_ADJUST_CFA_OFFSET -4
+ .endm
+
.macro movl_cfi reg offset=0
movl %\reg, \offset(%esp)
CFI_REL_OFFSET \reg, \offset
BUILD_INTERRUPT(error_interrupt,ERROR_APIC_VECTOR)
BUILD_INTERRUPT(spurious_interrupt,SPURIOUS_APIC_VECTOR)
-#ifdef CONFIG_PERF_EVENTS
-BUILD_INTERRUPT(perf_pending_interrupt, LOCAL_PENDING_VECTOR)
+#ifdef CONFIG_IRQ_WORK
+BUILD_INTERRUPT(irq_work_interrupt, IRQ_WORK_VECTOR)
#endif
#ifdef CONFIG_X86_THERMAL_VECTOR
BUG_ON(vaddr >= FIXADDR_TOP || vaddr < FIXADDR_START);
return __virt_to_fix(vaddr);
}
+
+/* Return an pointer with offset calculated */
+static inline unsigned long __set_fixmap_offset(enum fixed_addresses idx,
+ phys_addr_t phys, pgprot_t flags)
+{
+ __set_fixmap(idx, phys, flags);
+ return fix_to_virt(idx) + (phys & (PAGE_SIZE - 1));
+}
+
+#define set_fixmap_offset(idx, phys) \
+ __set_fixmap_offset(idx, phys, PAGE_KERNEL)
+
+#define set_fixmap_offset_nocache(idx, phys) \
+ __set_fixmap_offset(idx, phys, PAGE_KERNEL_NOCACHE)
+
#endif /* !__ASSEMBLY__ */
#endif /* _ASM_X86_FIXMAP_H */
#define GARTEN (1<<0)
#define DISGARTCPU (1<<4)
#define DISGARTIO (1<<5)
+#define DISTLBWALKPRB (1<<6)
/* GART cache control register bits. */
#define INVGART (1<<0)
#define AMD64_GARTAPERTUREBASE 0x94
#define AMD64_GARTTABLEBASE 0x98
#define AMD64_GARTCACHECTL 0x9c
-#define AMD64_GARTEN (1<<0)
#ifdef CONFIG_GART_IOMMU
extern int gart_iommu_aperture;
extern int agp_amd64_init(void);
+static inline void gart_set_size_and_enable(struct pci_dev *dev, u32 order)
+{
+ u32 ctl;
+
+ /*
+ * Don't enable translation but enable GART IO and CPU accesses.
+ * Also, set DISTLBWALKPRB since GART tables memory is UC.
+ */
+ ctl = DISTLBWALKPRB | order << 1;
+
+ pci_write_config_dword(dev, AMD64_GARTAPERTURECTL, ctl);
+}
+
static inline void enable_gart_translation(struct pci_dev *dev, u64 addr)
{
u32 tmp, ctl;
#endif
unsigned int x86_platform_ipis; /* arch dependent */
unsigned int apic_perf_irqs;
- unsigned int apic_pending_irqs;
+ unsigned int apic_irq_work_irqs;
#ifdef CONFIG_SMP
unsigned int irq_resched_count;
unsigned int irq_call_count;
extern u8 hpet_blockid;
extern int hpet_force_user;
extern u8 hpet_msi_disable;
-extern u8 hpet_readback_cmp;
extern int is_hpet_enabled(void);
extern int hpet_enable(void);
extern void hpet_disable(void);
#include <linux/list.h>
/* Available HW breakpoint length encodings */
-#define X86_BREAKPOINT_LEN_X 0x00
+#define X86_BREAKPOINT_LEN_X 0x40
#define X86_BREAKPOINT_LEN_1 0x40
#define X86_BREAKPOINT_LEN_2 0x44
#define X86_BREAKPOINT_LEN_4 0x4c
extern void apic_timer_interrupt(void);
extern void x86_platform_ipi(void);
extern void error_interrupt(void);
-extern void perf_pending_interrupt(void);
+extern void irq_work_interrupt(void);
extern void spurious_interrupt(void);
extern void thermal_interrupt(void);
extern int restore_i387_xstate_ia32(void __user *buf);
#endif
+#ifdef CONFIG_MATH_EMULATION
+extern void finit_soft_fpu(struct i387_soft_struct *soft);
+#else
+static inline void finit_soft_fpu(struct i387_soft_struct *soft) {}
+#endif
+
#define X87_FSW_ES (1 << 7) /* Exception Summary */
static __always_inline __pure bool use_xsaveopt(void)
return static_cpu_has(X86_FEATURE_XSAVE);
}
+static __always_inline __pure bool use_fxsr(void)
+{
+ return static_cpu_has(X86_FEATURE_FXSR);
+}
+
extern void __sanitize_i387_state(struct task_struct *);
static inline void sanitize_i387_state(struct task_struct *tsk)
}
#ifdef CONFIG_X86_64
-
-/* Ignore delayed exceptions from user space */
-static inline void tolerant_fwait(void)
-{
- asm volatile("1: fwait\n"
- "2:\n"
- _ASM_EXTABLE(1b, 2b));
-}
-
static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
{
int err;
+ /* See comment in fxsave() below. */
asm volatile("1: rex64/fxrstor (%[fx])\n\t"
"2:\n"
".section .fixup,\"ax\"\n"
".previous\n"
_ASM_EXTABLE(1b, 3b)
: [err] "=r" (err)
-#if 0 /* See comment in fxsave() below. */
- : [fx] "r" (fx), "m" (*fx), "0" (0));
-#else
- : [fx] "cdaSDb" (fx), "m" (*fx), "0" (0));
-#endif
+ : [fx] "R" (fx), "m" (*fx), "0" (0));
return err;
}
-/* AMD CPUs don't save/restore FDP/FIP/FOP unless an exception
- is pending. Clear the x87 state here by setting it to fixed
- values. The kernel data segment can be sometimes 0 and sometimes
- new user value. Both should be ok.
- Use the PDA as safe address because it should be already in L1. */
-static inline void fpu_clear(struct fpu *fpu)
-{
- struct xsave_struct *xstate = &fpu->state->xsave;
- struct i387_fxsave_struct *fx = &fpu->state->fxsave;
-
- /*
- * xsave header may indicate the init state of the FP.
- */
- if (use_xsave() &&
- !(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
- return;
-
- if (unlikely(fx->swd & X87_FSW_ES))
- asm volatile("fnclex");
- alternative_input(ASM_NOP8 ASM_NOP2,
- " emms\n" /* clear stack tags */
- " fildl %%gs:0", /* load to clear state */
- X86_FEATURE_FXSAVE_LEAK);
-}
-
-static inline void clear_fpu_state(struct task_struct *tsk)
-{
- fpu_clear(&tsk->thread.fpu);
-}
-
static inline int fxsave_user(struct i387_fxsave_struct __user *fx)
{
int err;
if (unlikely(err))
return -EFAULT;
+ /* See comment in fxsave() below. */
asm volatile("1: rex64/fxsave (%[fx])\n\t"
"2:\n"
".section .fixup,\"ax\"\n"
".previous\n"
_ASM_EXTABLE(1b, 3b)
: [err] "=r" (err), "=m" (*fx)
-#if 0 /* See comment in fxsave() below. */
- : [fx] "r" (fx), "0" (0));
-#else
- : [fx] "cdaSDb" (fx), "0" (0));
-#endif
+ : [fx] "R" (fx), "0" (0));
if (unlikely(err) &&
__clear_user(fx, sizeof(struct i387_fxsave_struct)))
err = -EFAULT;
uses any extended registers for addressing, a second REX prefix
will be generated (to the assembler, rex64 followed by semicolon
is a separate instruction), and hence the 64-bitness is lost. */
-#if 0
+
+#ifdef CONFIG_AS_FXSAVEQ
/* Using "fxsaveq %0" would be the ideal choice, but is only supported
starting with gas 2.16. */
__asm__ __volatile__("fxsaveq %0"
: "=m" (fpu->state->fxsave));
-#elif 0
+#else
/* Using, as a workaround, the properly prefixed form below isn't
accepted by any binutils version so far released, complaining that
the same type of prefix is used twice if an extended register is
- needed for addressing (fix submitted to mainline 2005-11-21). */
- __asm__ __volatile__("rex64/fxsave %0"
- : "=m" (fpu->state->fxsave));
-#else
- /* This, however, we can work around by forcing the compiler to select
+ needed for addressing (fix submitted to mainline 2005-11-21).
+ asm volatile("rex64/fxsave %0"
+ : "=m" (fpu->state->fxsave));
+ This, however, we can work around by forcing the compiler to select
an addressing mode that doesn't require extended registers. */
- __asm__ __volatile__("rex64/fxsave (%1)"
- : "=m" (fpu->state->fxsave)
- : "cdaSDb" (&fpu->state->fxsave));
+ asm volatile("rex64/fxsave (%[fx])"
+ : "=m" (fpu->state->fxsave)
+ : [fx] "R" (&fpu->state->fxsave));
#endif
}
-static inline void fpu_save_init(struct fpu *fpu)
-{
- if (use_xsave())
- fpu_xsave(fpu);
- else
- fpu_fxsave(fpu);
-
- fpu_clear(fpu);
-}
-
-static inline void __save_init_fpu(struct task_struct *tsk)
-{
- fpu_save_init(&tsk->thread.fpu);
- task_thread_info(tsk)->status &= ~TS_USEDFPU;
-}
-
#else /* CONFIG_X86_32 */
-#ifdef CONFIG_MATH_EMULATION
-extern void finit_soft_fpu(struct i387_soft_struct *soft);
-#else
-static inline void finit_soft_fpu(struct i387_soft_struct *soft) {}
-#endif
-
-static inline void tolerant_fwait(void)
-{
- asm volatile("fnclex ; fwait");
-}
-
/* perform fxrstor iff the processor has extended states, otherwise frstor */
static inline int fxrstor_checking(struct i387_fxsave_struct *fx)
{
return 0;
}
+static inline void fpu_fxsave(struct fpu *fpu)
+{
+ asm volatile("fxsave %[fx]"
+ : [fx] "=m" (fpu->state->fxsave));
+}
+
+#endif /* CONFIG_X86_64 */
+
/* We need a safe address that is cheap to find and that is already
in L1 during context switch. The best choices are unfortunately
different for UP and SMP */
static inline void fpu_save_init(struct fpu *fpu)
{
if (use_xsave()) {
- struct xsave_struct *xstate = &fpu->state->xsave;
- struct i387_fxsave_struct *fx = &fpu->state->fxsave;
-
fpu_xsave(fpu);
/*
* xsave header may indicate the init state of the FP.
*/
- if (!(xstate->xsave_hdr.xstate_bv & XSTATE_FP))
- goto end;
-
- if (unlikely(fx->swd & X87_FSW_ES))
- asm volatile("fnclex");
-
- /*
- * we can do a simple return here or be paranoid :)
- */
- goto clear_state;
+ if (!(fpu->state->xsave.xsave_hdr.xstate_bv & XSTATE_FP))
+ return;
+ } else if (use_fxsr()) {
+ fpu_fxsave(fpu);
+ } else {
+ asm volatile("fsave %[fx]; fwait"
+ : [fx] "=m" (fpu->state->fsave));
+ return;
}
- /* Use more nops than strictly needed in case the compiler
- varies code */
- alternative_input(
- "fnsave %[fx] ;fwait;" GENERIC_NOP8 GENERIC_NOP4,
- "fxsave %[fx]\n"
- "bt $7,%[fsw] ; jnc 1f ; fnclex\n1:",
- X86_FEATURE_FXSR,
- [fx] "m" (fpu->state->fxsave),
- [fsw] "m" (fpu->state->fxsave.swd) : "memory");
-clear_state:
+ if (unlikely(fpu->state->fxsave.swd & X87_FSW_ES))
+ asm volatile("fnclex");
+
/* AMD K7/K8 CPUs don't save/restore FDP/FIP/FOP unless an exception
is pending. Clear the x87 state here by setting it to fixed
values. safe_address is a random variable that should be in L1 */
alternative_input(
- GENERIC_NOP8 GENERIC_NOP2,
+ ASM_NOP8 ASM_NOP2,
"emms\n\t" /* clear stack tags */
- "fildl %[addr]", /* set F?P to defined value */
+ "fildl %P[addr]", /* set F?P to defined value */
X86_FEATURE_FXSAVE_LEAK,
[addr] "m" (safe_address));
-end:
- ;
}
static inline void __save_init_fpu(struct task_struct *tsk)
task_thread_info(tsk)->status &= ~TS_USEDFPU;
}
-
-#endif /* CONFIG_X86_64 */
-
static inline int fpu_fxrstor_checking(struct fpu *fpu)
{
return fxrstor_checking(&fpu->state->fxsave);
static inline void __clear_fpu(struct task_struct *tsk)
{
if (task_thread_info(tsk)->status & TS_USEDFPU) {
- tolerant_fwait();
+ /* Ignore delayed exceptions from user space */
+ asm volatile("1: fwait\n"
+ "2:\n"
+ _ASM_EXTABLE(1b, 2b));
task_thread_info(tsk)->status &= ~TS_USEDFPU;
stts();
}
stts();
}
-#ifdef CONFIG_X86_64
-
-static inline void save_init_fpu(struct task_struct *tsk)
-{
- __save_init_fpu(tsk);
- stts();
-}
-
-#define unlazy_fpu __unlazy_fpu
-#define clear_fpu __clear_fpu
-
-#else /* CONFIG_X86_32 */
-
/*
* These disable preemption on their own and are safe
*/
preempt_enable();
}
-#endif /* CONFIG_X86_64 */
-
/*
* i387 state interaction
*/
#endif /* __ASSEMBLY__ */
-#define PSHUFB_XMM5_XMM0 .byte 0x66, 0x0f, 0x38, 0x00, 0xc5
-#define PSHUFB_XMM5_XMM6 .byte 0x66, 0x0f, 0x38, 0x00, 0xf5
-
#endif /* _ASM_X86_I387_H */
extern void iounmap(volatile void __iomem *addr);
+extern void set_iounmap_nonlazy(void);
#ifdef __KERNEL__
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
-void *
+void __iomem *
iomap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot);
void
-iounmap_atomic(void *kvaddr, enum km_type type);
+iounmap_atomic(void __iomem *kvaddr, enum km_type type);
int
iomap_create_wc(resource_size_t base, unsigned long size, pgprot_t *prot);
#define IRTE_DEST(dest) ((x2apic_mode) ? dest : dest << 8)
+#ifdef CONFIG_INTR_REMAP
+static inline void prepare_irte(struct irte *irte, int vector,
+ unsigned int dest)
+{
+ memset(irte, 0, sizeof(*irte));
+
+ irte->present = 1;
+ irte->dst_mode = apic->irq_dest_mode;
+ /*
+ * Trigger mode in the IRTE will always be edge, and for IO-APIC, the
+ * actual level or edge trigger will be setup in the IO-APIC
+ * RTE. This will help simplify level triggered irq migration.
+ * For more details, see the comments (in io_apic.c) explainig IO-APIC
+ * irq migration in the presence of interrupt-remapping.
+ */
+ irte->trigger_mode = 0;
+ irte->dlvry_mode = apic->irq_delivery_mode;
+ irte->vector = vector;
+ irte->dest_id = IRTE_DEST(dest);
+ irte->redir_hint = 1;
+}
+#else
+static void prepare_irte(struct irte *irte, int vector, unsigned int dest)
+{
+}
+#endif
+
#endif /* _ASM_X86_IRQ_REMAPPING_H */
#define X86_PLATFORM_IPI_VECTOR 0xed
/*
- * Performance monitoring pending work vector:
+ * IRQ work vector:
*/
-#define LOCAL_PENDING_VECTOR 0xec
+#define IRQ_WORK_VECTOR 0xec
#define UV_BAU_MESSAGE 0xea
--- /dev/null
+#ifndef _ASM_X86_JUMP_LABEL_H
+#define _ASM_X86_JUMP_LABEL_H
+
+#ifdef __KERNEL__
+
+#include <linux/types.h>
+#include <asm/nops.h>
+
+#define JUMP_LABEL_NOP_SIZE 5
+
+# define JUMP_LABEL_INITIAL_NOP ".byte 0xe9 \n\t .long 0\n\t"
+
+# define JUMP_LABEL(key, label) \
+ do { \
+ asm goto("1:" \
+ JUMP_LABEL_INITIAL_NOP \
+ ".pushsection __jump_table, \"a\" \n\t"\
+ _ASM_PTR "1b, %l[" #label "], %c0 \n\t" \
+ ".popsection \n\t" \
+ : : "i" (key) : : label); \
+ } while (0)
+
+#endif /* __KERNEL__ */
+
+#ifdef CONFIG_X86_64
+typedef u64 jump_label_t;
+#else
+typedef u32 jump_label_t;
+#endif
+
+struct jump_entry {
+ jump_label_t code;
+ jump_label_t target;
+ jump_label_t key;
+};
+
+#endif
struct operand {
enum { OP_REG, OP_MEM, OP_IMM, OP_NONE } type;
unsigned int bytes;
- unsigned long orig_val, *ptr;
+ union {
+ unsigned long orig_val;
+ u64 orig_val64;
+ };
+ unsigned long *ptr;
union {
unsigned long val;
+ u64 val64;
char valptr[sizeof(unsigned long) + 2];
};
};
return (struct kvm_mmu_page *)page_private(page);
}
-static inline u16 kvm_read_fs(void)
-{
- u16 seg;
- asm("mov %%fs, %0" : "=g"(seg));
- return seg;
-}
-
-static inline u16 kvm_read_gs(void)
-{
- u16 seg;
- asm("mov %%gs, %0" : "=g"(seg));
- return seg;
-}
-
static inline u16 kvm_read_ldt(void)
{
u16 ldt;
return ldt;
}
-static inline void kvm_load_fs(u16 sel)
-{
- asm("mov %0, %%fs" : : "rm"(sel));
-}
-
-static inline void kvm_load_gs(u16 sel)
-{
- asm("mov %0, %%gs" : : "rm"(sel));
-}
-
static inline void kvm_load_ldt(u16 sel)
{
asm("lldt %0" : : "rm"(sel));
*/
#ifndef _ASM_X86_MRST_H
#define _ASM_X86_MRST_H
+
+#include <linux/sfi.h>
+
extern int pci_mrst_init(void);
int __init sfi_parse_mrtc(struct sfi_table_header *table);
};
extern enum mrst_cpu_type __mrst_cpu_chip;
-static enum mrst_cpu_type mrst_identify_cpu(void)
+static inline enum mrst_cpu_type mrst_identify_cpu(void)
{
return __mrst_cpu_chip;
}
#define SFI_MTMR_MAX_NUM 8
#define SFI_MRTC_MAX 8
+extern struct console early_mrst_console;
+extern void mrst_early_console_init(void);
+
+extern struct console early_hsu_console;
+extern void hsu_early_console_init(void);
#endif /* _ASM_X86_MRST_H */
--- /dev/null
+#ifndef _ASM_X86_MWAIT_H
+#define _ASM_X86_MWAIT_H
+
+#define MWAIT_SUBSTATE_MASK 0xf
+#define MWAIT_CSTATE_MASK 0xf
+#define MWAIT_SUBSTATE_SIZE 4
+#define MWAIT_MAX_NUM_CSTATES 8
+
+#define CPUID_MWAIT_LEAF 5
+#define CPUID5_ECX_EXTENSIONS_SUPPORTED 0x1
+#define CPUID5_ECX_INTERRUPT_BREAK 0x2
+
+#define MWAIT_ECX_INTERRUPT_BREAK 0x1
+
+#endif /* _ASM_X86_MWAIT_H */
/* install OFW's pde permanently into the kernel's pgtable */
extern void setup_olpc_ofw_pgd(void);
+/* check if OFW was detected during boot */
+extern bool olpc_ofw_present(void);
+
#else /* !CONFIG_OLPC_OPENFIRMWARE */
static inline void olpc_ofw_detect(void) { }
static inline void setup_olpc_ofw_pgd(void) { }
+static inline bool olpc_ofw_present(void) { return false; }
#endif /* !CONFIG_OLPC_OPENFIRMWARE */
#define PAGE_SIZE (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK (~(PAGE_SIZE-1))
-#define __PHYSICAL_MASK ((phys_addr_t)(1ULL << __PHYSICAL_MASK_SHIFT) - 1)
+#define __PHYSICAL_MASK ((phys_addr_t)((1ULL << __PHYSICAL_MASK_SHIFT) - 1))
#define __VIRTUAL_MASK ((1UL << __VIRTUAL_MASK_SHIFT) - 1)
/* Cast PAGE_MASK to a signed type so that it is sign-extended if
PVOP_VCALL2(pv_mmu_ops.alloc_pmd, mm, pfn);
}
-static inline void paravirt_alloc_pmd_clone(unsigned long pfn, unsigned long clonepfn,
- unsigned long start, unsigned long count)
-{
- PVOP_VCALL4(pv_mmu_ops.alloc_pmd_clone, pfn, clonepfn, start, count);
-}
static inline void paravirt_release_pmd(unsigned long pfn)
{
PVOP_VCALL1(pv_mmu_ops.release_pmd, pfn);
*/
void (*alloc_pte)(struct mm_struct *mm, unsigned long pfn);
void (*alloc_pmd)(struct mm_struct *mm, unsigned long pfn);
- void (*alloc_pmd_clone)(unsigned long pfn, unsigned long clonepfn, unsigned long start, unsigned long count);
void (*alloc_pud)(struct mm_struct *mm, unsigned long pfn);
void (*release_pte)(unsigned long pfn);
void (*release_pmd)(unsigned long pfn);
#define P4_ESCR_EMASK(v) ((v) << P4_ESCR_EVENTMASK_SHIFT)
#define P4_ESCR_TAG(v) ((v) << P4_ESCR_TAG_SHIFT)
-/* Non HT mask */
-#define P4_ESCR_MASK \
- (P4_ESCR_EVENT_MASK | \
- P4_ESCR_EVENTMASK_MASK | \
- P4_ESCR_TAG_MASK | \
- P4_ESCR_TAG_ENABLE | \
- P4_ESCR_T0_OS | \
- P4_ESCR_T0_USR)
-
-/* HT mask */
-#define P4_ESCR_MASK_HT \
- (P4_ESCR_MASK | P4_ESCR_T1_OS | P4_ESCR_T1_USR)
-
#define P4_CCCR_OVF 0x80000000U
#define P4_CCCR_CASCADE 0x40000000U
#define P4_CCCR_OVF_PMI_T0 0x04000000U
#define P4_CCCR_THRESHOLD(v) ((v) << P4_CCCR_THRESHOLD_SHIFT)
#define P4_CCCR_ESEL(v) ((v) << P4_CCCR_ESCR_SELECT_SHIFT)
-/* Non HT mask */
-#define P4_CCCR_MASK \
- (P4_CCCR_OVF | \
- P4_CCCR_CASCADE | \
- P4_CCCR_OVF_PMI_T0 | \
- P4_CCCR_FORCE_OVF | \
- P4_CCCR_EDGE | \
- P4_CCCR_THRESHOLD_MASK | \
- P4_CCCR_COMPLEMENT | \
- P4_CCCR_COMPARE | \
- P4_CCCR_ESCR_SELECT_MASK | \
- P4_CCCR_ENABLE)
-
-/* HT mask */
-#define P4_CCCR_MASK_HT \
- (P4_CCCR_MASK | P4_CCCR_OVF_PMI_T1 | P4_CCCR_THREAD_ANY)
-
#define P4_GEN_ESCR_EMASK(class, name, bit) \
class##__##name = ((1 << bit) << P4_ESCR_EVENTMASK_SHIFT)
#define P4_ESCR_EMASK_BIT(class, name) class##__##name
#define P4_CONFIG_HT_SHIFT 63
#define P4_CONFIG_HT (1ULL << P4_CONFIG_HT_SHIFT)
+/*
+ * The bits we allow to pass for RAW events
+ */
+#define P4_CONFIG_MASK_ESCR \
+ P4_ESCR_EVENT_MASK | \
+ P4_ESCR_EVENTMASK_MASK | \
+ P4_ESCR_TAG_MASK | \
+ P4_ESCR_TAG_ENABLE
+
+#define P4_CONFIG_MASK_CCCR \
+ P4_CCCR_EDGE | \
+ P4_CCCR_THRESHOLD_MASK | \
+ P4_CCCR_COMPLEMENT | \
+ P4_CCCR_COMPARE | \
+ P4_CCCR_THREAD_ANY | \
+ P4_CCCR_RESERVED
+
+/* some dangerous bits are reserved for kernel internals */
+#define P4_CONFIG_MASK \
+ (p4_config_pack_escr(P4_CONFIG_MASK_ESCR)) | \
+ (p4_config_pack_cccr(P4_CONFIG_MASK_CCCR))
+
static inline bool p4_is_event_cascaded(u64 config)
{
u32 cccr = p4_config_unpack_cccr(config);
extern spinlock_t pgd_lock;
extern struct list_head pgd_list;
+extern struct mm_struct *pgd_page_get_mm(struct page *page);
+
#ifdef CONFIG_PARAVIRT
#include <asm/paravirt.h>
#else /* !CONFIG_PARAVIRT */
pte_update(mm, addr, ptep);
}
+#define flush_tlb_fix_spurious_fault(vma, address)
+
/*
* clone_pgd_range(pgd_t *dst, pgd_t *src, int count);
*
native_set_pgd(pgd, native_make_pgd(0));
}
+extern void sync_global_pgds(unsigned long start, unsigned long end);
+
/*
* Conversion functions: convert a page and protection to a page entry,
* and a page entry and page directory to the page they refer to.
u16 phys_proc_id;
/* Core id: */
u16 cpu_core_id;
+ /* Compute unit id */
+ u8 compute_unit_id;
/* Index into per_cpu list: */
u16 cpu_index;
#endif
static inline void set_in_cr4(unsigned long mask)
{
- unsigned cr4;
+ unsigned long cr4;
mmu_cr4_features |= mask;
cr4 = read_cr4();
static inline void clear_in_cr4(unsigned long mask)
{
- unsigned cr4;
+ unsigned long cr4;
mmu_cr4_features &= ~mask;
cr4 = read_cr4();
extern unsigned long idle_nomwait;
extern bool c1e_detected;
-/*
- * on systems with caches, caches must be flashed as the absolute
- * last instruction before going into a suspended halt. Otherwise,
- * dirty data can linger in the cache and become stale on resume,
- * leading to strange errors.
- *
- * perform a variety of operations to guarantee that the compiler
- * will not reorder instructions. wbinvd itself is serializing
- * so the processor will not reorder.
- *
- * Systems without cache can just go into halt.
- */
-static inline void wbinvd_halt(void)
-{
- mb();
- /* check for clflush to determine if wbinvd is legal */
- if (cpu_has_clflush)
- asm volatile("cli; wbinvd; 1: hlt; jmp 1b" : : : "memory");
- else
- while (1)
- halt();
-}
-
extern void enable_sep_cpu(void);
extern int sysenter_setup(void);
: : "i" (sz)); \
}
+/* Helper for reserving space for arrays of things */
+#define RESERVE_BRK_ARRAY(type, name, entries) \
+ type *name; \
+ RESERVE_BRK(name, sizeof(type) * entries)
+
#ifdef __i386__
void __init i386_start_kernel(void);
+++ /dev/null
-/*
- * VMI interface definition
- *
- * Copyright (C) 2005, VMware, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT. See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Maintained by: Zachary Amsden zach@vmware.com
- *
- */
-#include <linux/types.h>
-
-/*
- *---------------------------------------------------------------------
- *
- * VMI Option ROM API
- *
- *---------------------------------------------------------------------
- */
-#define VMI_SIGNATURE 0x696d5663 /* "cVmi" */
-
-#define PCI_VENDOR_ID_VMWARE 0x15AD
-#define PCI_DEVICE_ID_VMWARE_VMI 0x0801
-
-/*
- * We use two version numbers for compatibility, with the major
- * number signifying interface breakages, and the minor number
- * interface extensions.
- */
-#define VMI_API_REV_MAJOR 3
-#define VMI_API_REV_MINOR 0
-
-#define VMI_CALL_CPUID 0
-#define VMI_CALL_WRMSR 1
-#define VMI_CALL_RDMSR 2
-#define VMI_CALL_SetGDT 3
-#define VMI_CALL_SetLDT 4
-#define VMI_CALL_SetIDT 5
-#define VMI_CALL_SetTR 6
-#define VMI_CALL_GetGDT 7
-#define VMI_CALL_GetLDT 8
-#define VMI_CALL_GetIDT 9
-#define VMI_CALL_GetTR 10
-#define VMI_CALL_WriteGDTEntry 11
-#define VMI_CALL_WriteLDTEntry 12
-#define VMI_CALL_WriteIDTEntry 13
-#define VMI_CALL_UpdateKernelStack 14
-#define VMI_CALL_SetCR0 15
-#define VMI_CALL_SetCR2 16
-#define VMI_CALL_SetCR3 17
-#define VMI_CALL_SetCR4 18
-#define VMI_CALL_GetCR0 19
-#define VMI_CALL_GetCR2 20
-#define VMI_CALL_GetCR3 21
-#define VMI_CALL_GetCR4 22
-#define VMI_CALL_WBINVD 23
-#define VMI_CALL_SetDR 24
-#define VMI_CALL_GetDR 25
-#define VMI_CALL_RDPMC 26
-#define VMI_CALL_RDTSC 27
-#define VMI_CALL_CLTS 28
-#define VMI_CALL_EnableInterrupts 29
-#define VMI_CALL_DisableInterrupts 30
-#define VMI_CALL_GetInterruptMask 31
-#define VMI_CALL_SetInterruptMask 32
-#define VMI_CALL_IRET 33
-#define VMI_CALL_SYSEXIT 34
-#define VMI_CALL_Halt 35
-#define VMI_CALL_Reboot 36
-#define VMI_CALL_Shutdown 37
-#define VMI_CALL_SetPxE 38
-#define VMI_CALL_SetPxELong 39
-#define VMI_CALL_UpdatePxE 40
-#define VMI_CALL_UpdatePxELong 41
-#define VMI_CALL_MachineToPhysical 42
-#define VMI_CALL_PhysicalToMachine 43
-#define VMI_CALL_AllocatePage 44
-#define VMI_CALL_ReleasePage 45
-#define VMI_CALL_InvalPage 46
-#define VMI_CALL_FlushTLB 47
-#define VMI_CALL_SetLinearMapping 48
-
-#define VMI_CALL_SetIOPLMask 61
-#define VMI_CALL_SetInitialAPState 62
-#define VMI_CALL_APICWrite 63
-#define VMI_CALL_APICRead 64
-#define VMI_CALL_IODelay 65
-#define VMI_CALL_SetLazyMode 73
-
-/*
- *---------------------------------------------------------------------
- *
- * MMU operation flags
- *
- *---------------------------------------------------------------------
- */
-
-/* Flags used by VMI_{Allocate|Release}Page call */
-#define VMI_PAGE_PAE 0x10 /* Allocate PAE shadow */
-#define VMI_PAGE_CLONE 0x20 /* Clone from another shadow */
-#define VMI_PAGE_ZEROED 0x40 /* Page is pre-zeroed */
-
-
-/* Flags shared by Allocate|Release Page and PTE updates */
-#define VMI_PAGE_PT 0x01
-#define VMI_PAGE_PD 0x02
-#define VMI_PAGE_PDP 0x04
-#define VMI_PAGE_PML4 0x08
-
-#define VMI_PAGE_NORMAL 0x00 /* for debugging */
-
-/* Flags used by PTE updates */
-#define VMI_PAGE_CURRENT_AS 0x10 /* implies VMI_PAGE_VA_MASK is valid */
-#define VMI_PAGE_DEFER 0x20 /* may queue update until TLB inval */
-#define VMI_PAGE_VA_MASK 0xfffff000
-
-#ifdef CONFIG_X86_PAE
-#define VMI_PAGE_L1 (VMI_PAGE_PT | VMI_PAGE_PAE | VMI_PAGE_ZEROED)
-#define VMI_PAGE_L2 (VMI_PAGE_PD | VMI_PAGE_PAE | VMI_PAGE_ZEROED)
-#else
-#define VMI_PAGE_L1 (VMI_PAGE_PT | VMI_PAGE_ZEROED)
-#define VMI_PAGE_L2 (VMI_PAGE_PD | VMI_PAGE_ZEROED)
-#endif
-
-/* Flags used by VMI_FlushTLB call */
-#define VMI_FLUSH_TLB 0x01
-#define VMI_FLUSH_GLOBAL 0x02
-
-/*
- *---------------------------------------------------------------------
- *
- * VMI relocation definitions for ROM call get_reloc
- *
- *---------------------------------------------------------------------
- */
-
-/* VMI Relocation types */
-#define VMI_RELOCATION_NONE 0
-#define VMI_RELOCATION_CALL_REL 1
-#define VMI_RELOCATION_JUMP_REL 2
-#define VMI_RELOCATION_NOP 3
-
-#ifndef __ASSEMBLY__
-struct vmi_relocation_info {
- unsigned char *eip;
- unsigned char type;
- unsigned char reserved[3];
-};
-#endif
-
-
-/*
- *---------------------------------------------------------------------
- *
- * Generic ROM structures and definitions
- *
- *---------------------------------------------------------------------
- */
-
-#ifndef __ASSEMBLY__
-
-struct vrom_header {
- u16 rom_signature; /* option ROM signature */
- u8 rom_length; /* ROM length in 512 byte chunks */
- u8 rom_entry[4]; /* 16-bit code entry point */
- u8 rom_pad0; /* 4-byte align pad */
- u32 vrom_signature; /* VROM identification signature */
- u8 api_version_min;/* Minor version of API */
- u8 api_version_maj;/* Major version of API */
- u8 jump_slots; /* Number of jump slots */
- u8 reserved1; /* Reserved for expansion */
- u32 virtual_top; /* Hypervisor virtual address start */
- u16 reserved2; /* Reserved for expansion */
- u16 license_offs; /* Offset to License string */
- u16 pci_header_offs;/* Offset to PCI OPROM header */
- u16 pnp_header_offs;/* Offset to PnP OPROM header */
- u32 rom_pad3; /* PnP reserverd / VMI reserved */
- u8 reserved[96]; /* Reserved for headers */
- char vmi_init[8]; /* VMI_Init jump point */
- char get_reloc[8]; /* VMI_GetRelocationInfo jump point */
-} __attribute__((packed));
-
-struct pnp_header {
- char sig[4];
- char rev;
- char size;
- short next;
- short res;
- long devID;
- unsigned short manufacturer_offset;
- unsigned short product_offset;
-} __attribute__((packed));
-
-struct pci_header {
- char sig[4];
- short vendorID;
- short deviceID;
- short vpdData;
- short size;
- char rev;
- char class;
- char subclass;
- char interface;
- short chunks;
- char rom_version_min;
- char rom_version_maj;
- char codetype;
- char lastRom;
- short reserved;
-} __attribute__((packed));
-
-/* Function prototypes for bootstrapping */
-#ifdef CONFIG_VMI
-extern void vmi_init(void);
-extern void vmi_activate(void);
-extern void vmi_bringup(void);
-#else
-static inline void vmi_init(void) {}
-static inline void vmi_activate(void) {}
-static inline void vmi_bringup(void) {}
-#endif
-
-/* State needed to start an application processor in an SMP system. */
-struct vmi_ap_state {
- u32 cr0;
- u32 cr2;
- u32 cr3;
- u32 cr4;
-
- u64 efer;
-
- u32 eip;
- u32 eflags;
- u32 eax;
- u32 ebx;
- u32 ecx;
- u32 edx;
- u32 esp;
- u32 ebp;
- u32 esi;
- u32 edi;
- u16 cs;
- u16 ss;
- u16 ds;
- u16 es;
- u16 fs;
- u16 gs;
- u16 ldtr;
-
- u16 gdtr_limit;
- u32 gdtr_base;
- u32 idtr_base;
- u16 idtr_limit;
-};
-
-#endif
+++ /dev/null
-/*
- * VMI Time wrappers
- *
- * Copyright (C) 2006, VMware, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT. See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Send feedback to dhecht@vmware.com
- *
- */
-
-#ifndef _ASM_X86_VMI_TIME_H
-#define _ASM_X86_VMI_TIME_H
-
-/*
- * Raw VMI call indices for timer functions
- */
-#define VMI_CALL_GetCycleFrequency 66
-#define VMI_CALL_GetCycleCounter 67
-#define VMI_CALL_SetAlarm 68
-#define VMI_CALL_CancelAlarm 69
-#define VMI_CALL_GetWallclockTime 70
-#define VMI_CALL_WallclockUpdated 71
-
-/* Cached VMI timer operations */
-extern struct vmi_timer_ops {
- u64 (*get_cycle_frequency)(void);
- u64 (*get_cycle_counter)(int);
- u64 (*get_wallclock)(void);
- int (*wallclock_updated)(void);
- void (*set_alarm)(u32 flags, u64 expiry, u64 period);
- void (*cancel_alarm)(u32 flags);
-} vmi_timer_ops;
-
-/* Prototypes */
-extern void __init vmi_time_init(void);
-extern unsigned long vmi_get_wallclock(void);
-extern int vmi_set_wallclock(unsigned long now);
-extern unsigned long long vmi_sched_clock(void);
-extern unsigned long vmi_tsc_khz(void);
-
-#ifdef CONFIG_X86_LOCAL_APIC
-extern void __devinit vmi_time_bsp_init(void);
-extern void __devinit vmi_time_ap_init(void);
-#endif
-
-/*
- * When run under a hypervisor, a vcpu is always in one of three states:
- * running, halted, or ready. The vcpu is in the 'running' state if it
- * is executing. When the vcpu executes the halt interface, the vcpu
- * enters the 'halted' state and remains halted until there is some work
- * pending for the vcpu (e.g. an alarm expires, host I/O completes on
- * behalf of virtual I/O). At this point, the vcpu enters the 'ready'
- * state (waiting for the hypervisor to reschedule it). Finally, at any
- * time when the vcpu is not in the 'running' state nor the 'halted'
- * state, it is in the 'ready' state.
- *
- * Real time is advances while the vcpu is 'running', 'ready', or
- * 'halted'. Stolen time is the time in which the vcpu is in the
- * 'ready' state. Available time is the remaining time -- the vcpu is
- * either 'running' or 'halted'.
- *
- * All three views of time are accessible through the VMI cycle
- * counters.
- */
-
-/* The cycle counters. */
-#define VMI_CYCLES_REAL 0
-#define VMI_CYCLES_AVAILABLE 1
-#define VMI_CYCLES_STOLEN 2
-
-/* The alarm interface 'flags' bits */
-#define VMI_ALARM_COUNTERS 2
-
-#define VMI_ALARM_COUNTER_MASK 0x000000ff
-
-#define VMI_ALARM_WIRED_IRQ0 0x00000000
-#define VMI_ALARM_WIRED_LVTT 0x00010000
-
-#define VMI_ALARM_IS_ONESHOT 0x00000000
-#define VMI_ALARM_IS_PERIODIC 0x00000100
-
-#define CONFIG_VMI_ALARM_HZ 100
-
-#endif /* _ASM_X86_VMI_TIME_H */
CFLAGS_REMOVE_tsc.o = -pg
CFLAGS_REMOVE_rtc.o = -pg
CFLAGS_REMOVE_paravirt-spinlocks.o = -pg
+CFLAGS_REMOVE_pvclock.o = -pg
+CFLAGS_REMOVE_kvmclock.o = -pg
CFLAGS_REMOVE_ftrace.o = -pg
CFLAGS_REMOVE_early_printk.o = -pg
endif
obj-y := process_$(BITS).o signal.o entry_$(BITS).o
obj-y += traps.o irq.o irq_$(BITS).o dumpstack_$(BITS).o
obj-y += time.o ioport.o ldt.o dumpstack.o
-obj-y += setup.o x86_init.o i8259.o irqinit.o
+obj-y += setup.o x86_init.o i8259.o irqinit.o jump_label.o
+obj-$(CONFIG_IRQ_WORK) += irq_work.o
obj-$(CONFIG_X86_VISWS) += visws_quirks.o
obj-$(CONFIG_X86_32) += probe_roms_32.o
obj-$(CONFIG_X86_32) += sys_i386_32.o i386_ksyms_32.o
obj-$(CONFIG_KGDB) += kgdb.o
obj-$(CONFIG_VM86) += vm86_32.o
obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
+obj-$(CONFIG_EARLY_PRINTK_MRST) += early_printk_mrst.o
obj-$(CONFIG_HPET_TIMER) += hpet.o
obj-$(CONFIG_APB_TIMER) += apb_timer.o
-obj-$(CONFIG_K8_NB) += k8.o
+obj-$(CONFIG_AMD_NB) += amd_nb.o
obj-$(CONFIG_DEBUG_RODATA_TEST) += test_rodata.o
obj-$(CONFIG_DEBUG_NX_TEST) += test_nx.o
-obj-$(CONFIG_VMI) += vmi_32.o vmiclock_32.o
obj-$(CONFIG_KVM_GUEST) += kvm.o
obj-$(CONFIG_KVM_CLOCK) += kvmclock.o
obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o
scx200-y += scx200_32.o
obj-$(CONFIG_OLPC) += olpc.o
+obj-$(CONFIG_OLPC_XO1) += olpc-xo1.o
obj-$(CONFIG_OLPC_OPENFIRMWARE) += olpc_ofw.o
obj-$(CONFIG_X86_MRST) += mrst.o
# 64 bit specific files
ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_X86_UV) += tlb_uv.o bios_uv.o uv_irq.o uv_sysfs.o uv_time.o
- obj-$(CONFIG_X86_PM_TIMER) += pmtimer_64.o
obj-$(CONFIG_AUDIT) += audit_64.o
obj-$(CONFIG_GART_IOMMU) += pci-gart_64.o aperture_64.o
#include <acpi/processor.h>
#include <asm/acpi.h>
+#include <asm/mwait.h>
/*
* Initialize bm_flags based on the CPU cache properties
unsigned int ecx;
} states[ACPI_PROCESSOR_MAX_POWER];
};
-static struct cstate_entry *cpu_cstate_entry; /* per CPU ptr */
+static struct cstate_entry __percpu *cpu_cstate_entry; /* per CPU ptr */
static short mwait_supported[ACPI_PROCESSOR_MAX_POWER];
-#define MWAIT_SUBSTATE_MASK (0xf)
-#define MWAIT_CSTATE_MASK (0xf)
-#define MWAIT_SUBSTATE_SIZE (4)
-
-#define CPUID_MWAIT_LEAF (5)
-#define CPUID5_ECX_EXTENSIONS_SUPPORTED (0x1)
-#define CPUID5_ECX_INTERRUPT_BREAK (0x2)
-
-#define MWAIT_ECX_INTERRUPT_BREAK (0x1)
-
#define NATIVE_CSTATE_BEYOND_HALT (2)
static long acpi_processor_ffh_cstate_probe_cpu(void *_cx)
extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
extern s32 __smp_locks[], __smp_locks_end[];
-static void *text_poke_early(void *addr, const void *opcode, size_t len);
+void *text_poke_early(void *addr, const void *opcode, size_t len);
/* Replace instructions with better alternatives for this CPU type.
This runs before SMP is initialized to avoid SMP problems with
* instructions. And on the local CPU you need to be protected again NMI or MCE
* handlers seeing an inconsistent instruction while you patch.
*/
-static void *__init_or_module text_poke_early(void *addr, const void *opcode,
+void *__init_or_module text_poke_early(void *addr, const void *opcode,
size_t len)
{
unsigned long flags;
tpp.len = len;
atomic_set(&stop_machine_first, 1);
wrote_text = 0;
- stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
+ /* Use __stop_machine() because the caller already got online_cpus. */
+ __stop_machine(stop_machine_text_poke, (void *)&tpp, NULL);
return addr;
}
+#if defined(CONFIG_DYNAMIC_FTRACE) || defined(HAVE_JUMP_LABEL)
+
+unsigned char ideal_nop5[IDEAL_NOP_SIZE_5];
+
+void __init arch_init_ideal_nop5(void)
+{
+ extern const unsigned char ftrace_test_p6nop[];
+ extern const unsigned char ftrace_test_nop5[];
+ extern const unsigned char ftrace_test_jmp[];
+ int faulted = 0;
+
+ /*
+ * There is no good nop for all x86 archs.
+ * We will default to using the P6_NOP5, but first we
+ * will test to make sure that the nop will actually
+ * work on this CPU. If it faults, we will then
+ * go to a lesser efficient 5 byte nop. If that fails
+ * we then just use a jmp as our nop. This isn't the most
+ * efficient nop, but we can not use a multi part nop
+ * since we would then risk being preempted in the middle
+ * of that nop, and if we enabled tracing then, it might
+ * cause a system crash.
+ *
+ * TODO: check the cpuid to determine the best nop.
+ */
+ asm volatile (
+ "ftrace_test_jmp:"
+ "jmp ftrace_test_p6nop\n"
+ "nop\n"
+ "nop\n"
+ "nop\n" /* 2 byte jmp + 3 bytes */
+ "ftrace_test_p6nop:"
+ P6_NOP5
+ "jmp 1f\n"
+ "ftrace_test_nop5:"
+ ".byte 0x66,0x66,0x66,0x66,0x90\n"
+ "1:"
+ ".section .fixup, \"ax\"\n"
+ "2: movl $1, %0\n"
+ " jmp ftrace_test_nop5\n"
+ "3: movl $2, %0\n"
+ " jmp 1b\n"
+ ".previous\n"
+ _ASM_EXTABLE(ftrace_test_p6nop, 2b)
+ _ASM_EXTABLE(ftrace_test_nop5, 3b)
+ : "=r"(faulted) : "0" (faulted));
+
+ switch (faulted) {
+ case 0:
+ pr_info("converting mcount calls to 0f 1f 44 00 00\n");
+ memcpy(ideal_nop5, ftrace_test_p6nop, IDEAL_NOP_SIZE_5);
+ break;
+ case 1:
+ pr_info("converting mcount calls to 66 66 66 66 90\n");
+ memcpy(ideal_nop5, ftrace_test_nop5, IDEAL_NOP_SIZE_5);
+ break;
+ case 2:
+ pr_info("converting mcount calls to jmp . + 5\n");
+ memcpy(ideal_nop5, ftrace_test_jmp, IDEAL_NOP_SIZE_5);
+ break;
+ }
+
+}
+#endif
/*
- * Copyright (C) 2007-2009 Advanced Micro Devices, Inc.
+ * Copyright (C) 2007-2010 Advanced Micro Devices, Inc.
* Author: Joerg Roedel <joerg.roedel@amd.com>
* Leo Duran <leo.duran@amd.com>
*
size_t size,
int dir)
{
+ dma_addr_t flush_addr;
dma_addr_t i, start;
unsigned int pages;
(dma_addr + size > dma_dom->aperture_size))
return;
+ flush_addr = dma_addr;
pages = iommu_num_pages(dma_addr, size, PAGE_SIZE);
dma_addr &= PAGE_MASK;
start = dma_addr;
dma_ops_free_addresses(dma_dom, dma_addr, pages);
if (amd_iommu_unmap_flush || dma_dom->need_flush) {
- iommu_flush_pages(&dma_dom->domain, dma_addr, size);
+ iommu_flush_pages(&dma_dom->domain, flush_addr, size);
dma_dom->need_flush = false;
}
}
/*
- * Copyright (C) 2007-2009 Advanced Micro Devices, Inc.
+ * Copyright (C) 2007-2010 Advanced Micro Devices, Inc.
* Author: Joerg Roedel <joerg.roedel@amd.com>
* Leo Duran <leo.duran@amd.com>
*
return 1UL << shift;
}
+/* Access to l1 and l2 indexed register spaces */
+
+static u32 iommu_read_l1(struct amd_iommu *iommu, u16 l1, u8 address)
+{
+ u32 val;
+
+ pci_write_config_dword(iommu->dev, 0xf8, (address | l1 << 16));
+ pci_read_config_dword(iommu->dev, 0xfc, &val);
+ return val;
+}
+
+static void iommu_write_l1(struct amd_iommu *iommu, u16 l1, u8 address, u32 val)
+{
+ pci_write_config_dword(iommu->dev, 0xf8, (address | l1 << 16 | 1 << 31));
+ pci_write_config_dword(iommu->dev, 0xfc, val);
+ pci_write_config_dword(iommu->dev, 0xf8, (address | l1 << 16));
+}
+
+static u32 iommu_read_l2(struct amd_iommu *iommu, u8 address)
+{
+ u32 val;
+
+ pci_write_config_dword(iommu->dev, 0xf0, address);
+ pci_read_config_dword(iommu->dev, 0xf4, &val);
+ return val;
+}
+
+static void iommu_write_l2(struct amd_iommu *iommu, u8 address, u32 val)
+{
+ pci_write_config_dword(iommu->dev, 0xf0, (address | 1 << 8));
+ pci_write_config_dword(iommu->dev, 0xf4, val);
+}
+
/****************************************************************************
*
* AMD IOMMU MMIO register space handling functions
{
int cap_ptr = iommu->cap_ptr;
u32 range, misc;
+ int i, j;
pci_read_config_dword(iommu->dev, cap_ptr + MMIO_CAP_HDR_OFFSET,
&iommu->cap);
iommu->last_device = calc_devid(MMIO_GET_BUS(range),
MMIO_GET_LD(range));
iommu->evt_msi_num = MMIO_MSI_NUM(misc);
+
+ if (!is_rd890_iommu(iommu->dev))
+ return;
+
+ /*
+ * Some rd890 systems may not be fully reconfigured by the BIOS, so
+ * it's necessary for us to store this information so it can be
+ * reprogrammed on resume
+ */
+
+ pci_read_config_dword(iommu->dev, iommu->cap_ptr + 4,
+ &iommu->stored_addr_lo);
+ pci_read_config_dword(iommu->dev, iommu->cap_ptr + 8,
+ &iommu->stored_addr_hi);
+
+ /* Low bit locks writes to configuration space */
+ iommu->stored_addr_lo &= ~1;
+
+ for (i = 0; i < 6; i++)
+ for (j = 0; j < 0x12; j++)
+ iommu->stored_l1[i][j] = iommu_read_l1(iommu, i, j);
+
+ for (i = 0; i < 0x83; i++)
+ iommu->stored_l2[i] = iommu_read_l2(iommu, i);
}
/*
struct ivhd_entry *e;
/*
- * First set the recommended feature enable bits from ACPI
- * into the IOMMU control registers
- */
- h->flags & IVHD_FLAG_HT_TUN_EN_MASK ?
- iommu_feature_enable(iommu, CONTROL_HT_TUN_EN) :
- iommu_feature_disable(iommu, CONTROL_HT_TUN_EN);
-
- h->flags & IVHD_FLAG_PASSPW_EN_MASK ?
- iommu_feature_enable(iommu, CONTROL_PASSPW_EN) :
- iommu_feature_disable(iommu, CONTROL_PASSPW_EN);
-
- h->flags & IVHD_FLAG_RESPASSPW_EN_MASK ?
- iommu_feature_enable(iommu, CONTROL_RESPASSPW_EN) :
- iommu_feature_disable(iommu, CONTROL_RESPASSPW_EN);
-
- h->flags & IVHD_FLAG_ISOC_EN_MASK ?
- iommu_feature_enable(iommu, CONTROL_ISOC_EN) :
- iommu_feature_disable(iommu, CONTROL_ISOC_EN);
-
- /*
- * make IOMMU memory accesses cache coherent
+ * First save the recommended feature enable bits from ACPI
*/
- iommu_feature_enable(iommu, CONTROL_COHERENT_EN);
+ iommu->acpi_flags = h->flags;
/*
* Done. Now parse the device entries
}
}
+static void iommu_init_flags(struct amd_iommu *iommu)
+{
+ iommu->acpi_flags & IVHD_FLAG_HT_TUN_EN_MASK ?
+ iommu_feature_enable(iommu, CONTROL_HT_TUN_EN) :
+ iommu_feature_disable(iommu, CONTROL_HT_TUN_EN);
+
+ iommu->acpi_flags & IVHD_FLAG_PASSPW_EN_MASK ?
+ iommu_feature_enable(iommu, CONTROL_PASSPW_EN) :
+ iommu_feature_disable(iommu, CONTROL_PASSPW_EN);
+
+ iommu->acpi_flags & IVHD_FLAG_RESPASSPW_EN_MASK ?
+ iommu_feature_enable(iommu, CONTROL_RESPASSPW_EN) :
+ iommu_feature_disable(iommu, CONTROL_RESPASSPW_EN);
+
+ iommu->acpi_flags & IVHD_FLAG_ISOC_EN_MASK ?
+ iommu_feature_enable(iommu, CONTROL_ISOC_EN) :
+ iommu_feature_disable(iommu, CONTROL_ISOC_EN);
+
+ /*
+ * make IOMMU memory accesses cache coherent
+ */
+ iommu_feature_enable(iommu, CONTROL_COHERENT_EN);
+}
+
+static void iommu_apply_resume_quirks(struct amd_iommu *iommu)
+{
+ int i, j;
+ u32 ioc_feature_control;
+ struct pci_dev *pdev = NULL;
+
+ /* RD890 BIOSes may not have completely reconfigured the iommu */
+ if (!is_rd890_iommu(iommu->dev))
+ return;
+
+ /*
+ * First, we need to ensure that the iommu is enabled. This is
+ * controlled by a register in the northbridge
+ */
+ pdev = pci_get_bus_and_slot(iommu->dev->bus->number, PCI_DEVFN(0, 0));
+
+ if (!pdev)
+ return;
+
+ /* Select Northbridge indirect register 0x75 and enable writing */
+ pci_write_config_dword(pdev, 0x60, 0x75 | (1 << 7));
+ pci_read_config_dword(pdev, 0x64, &ioc_feature_control);
+
+ /* Enable the iommu */
+ if (!(ioc_feature_control & 0x1))
+ pci_write_config_dword(pdev, 0x64, ioc_feature_control | 1);
+
+ pci_dev_put(pdev);
+
+ /* Restore the iommu BAR */
+ pci_write_config_dword(iommu->dev, iommu->cap_ptr + 4,
+ iommu->stored_addr_lo);
+ pci_write_config_dword(iommu->dev, iommu->cap_ptr + 8,
+ iommu->stored_addr_hi);
+
+ /* Restore the l1 indirect regs for each of the 6 l1s */
+ for (i = 0; i < 6; i++)
+ for (j = 0; j < 0x12; j++)
+ iommu_write_l1(iommu, i, j, iommu->stored_l1[i][j]);
+
+ /* Restore the l2 indirect regs */
+ for (i = 0; i < 0x83; i++)
+ iommu_write_l2(iommu, i, iommu->stored_l2[i]);
+
+ /* Lock PCI setup registers */
+ pci_write_config_dword(iommu->dev, iommu->cap_ptr + 4,
+ iommu->stored_addr_lo | 1);
+}
+
/*
* This function finally enables all IOMMUs found in the system after
* they have been initialized
for_each_iommu(iommu) {
iommu_disable(iommu);
+ iommu_init_flags(iommu);
iommu_set_device_table(iommu);
iommu_enable_command_buffer(iommu);
iommu_enable_event_buffer(iommu);
static int amd_iommu_resume(struct sys_device *dev)
{
+ struct amd_iommu *iommu;
+
+ for_each_iommu(iommu)
+ iommu_apply_resume_quirks(iommu);
+
/* re-load the hardware */
enable_iommus();
#include <linux/errno.h>
#include <linux/module.h>
#include <linux/spinlock.h>
-#include <asm/k8.h>
-
-int num_k8_northbridges;
-EXPORT_SYMBOL(num_k8_northbridges);
+#include <asm/amd_nb.h>
static u32 *flush_words;
struct pci_device_id k8_nb_ids[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB_MISC) },
{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_10H_NB_MISC) },
+ { PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_NB_MISC) },
{}
};
EXPORT_SYMBOL(k8_nb_ids);
-struct pci_dev **k8_northbridges;
+struct k8_northbridge_info k8_northbridges;
EXPORT_SYMBOL(k8_northbridges);
static struct pci_dev *next_k8_northbridge(struct pci_dev *dev)
int i;
struct pci_dev *dev;
- if (num_k8_northbridges)
+ if (k8_northbridges.num)
return 0;
dev = NULL;
while ((dev = next_k8_northbridge(dev)) != NULL)
- num_k8_northbridges++;
+ k8_northbridges.num++;
+
+ /* some CPU families (e.g. family 0x11) do not support GART */
+ if (boot_cpu_data.x86 == 0xf || boot_cpu_data.x86 == 0x10 ||
+ boot_cpu_data.x86 == 0x15)
+ k8_northbridges.gart_supported = 1;
- k8_northbridges = kmalloc((num_k8_northbridges + 1) * sizeof(void *),
- GFP_KERNEL);
- if (!k8_northbridges)
+ k8_northbridges.nb_misc = kmalloc((k8_northbridges.num + 1) *
+ sizeof(void *), GFP_KERNEL);
+ if (!k8_northbridges.nb_misc)
return -ENOMEM;
- if (!num_k8_northbridges) {
- k8_northbridges[0] = NULL;
+ if (!k8_northbridges.num) {
+ k8_northbridges.nb_misc[0] = NULL;
return 0;
}
- flush_words = kmalloc(num_k8_northbridges * sizeof(u32), GFP_KERNEL);
- if (!flush_words) {
- kfree(k8_northbridges);
- return -ENOMEM;
+ if (k8_northbridges.gart_supported) {
+ flush_words = kmalloc(k8_northbridges.num * sizeof(u32),
+ GFP_KERNEL);
+ if (!flush_words) {
+ kfree(k8_northbridges.nb_misc);
+ return -ENOMEM;
+ }
}
dev = NULL;
i = 0;
while ((dev = next_k8_northbridge(dev)) != NULL) {
- k8_northbridges[i] = dev;
- pci_read_config_dword(dev, 0x9c, &flush_words[i++]);
+ k8_northbridges.nb_misc[i] = dev;
+ if (k8_northbridges.gart_supported)
+ pci_read_config_dword(dev, 0x9c, &flush_words[i++]);
}
- k8_northbridges[i] = NULL;
+ k8_northbridges.nb_misc[i] = NULL;
return 0;
}
EXPORT_SYMBOL_GPL(cache_k8_northbridges);
unsigned long flags;
static DEFINE_SPINLOCK(gart_lock);
+ if (!k8_northbridges.gart_supported)
+ return;
+
/* Avoid races between AGP and IOMMU. In theory it's not needed
but I'm not sure if the hardware won't lose flush requests
when another is pending. This whole thing is so expensive anyways
that it doesn't matter to serialize more. -AK */
spin_lock_irqsave(&gart_lock, flags);
flushed = 0;
- for (i = 0; i < num_k8_northbridges; i++) {
- pci_write_config_dword(k8_northbridges[i], 0x9c,
+ for (i = 0; i < k8_northbridges.num; i++) {
+ pci_write_config_dword(k8_northbridges.nb_misc[i], 0x9c,
flush_words[i]|1);
flushed++;
}
- for (i = 0; i < num_k8_northbridges; i++) {
+ for (i = 0; i < k8_northbridges.num; i++) {
u32 w;
/* Make sure the hardware actually executed the flush*/
for (;;) {
- pci_read_config_dword(k8_northbridges[i],
+ pci_read_config_dword(k8_northbridges.nb_misc[i],
0x9c, &w);
if (!(w & 1))
break;
/* Don't register boot CPU clockevent */
cpu = smp_processor_id();
- if (cpu == boot_cpu_id)
+ if (!cpu)
return;
/*
* We need to calculate the scaled math multiplication factor for
}
break;
default:
- pr_debug(KERN_INFO "APBT notified %lu, no action\n", action);
+ pr_debug("APBT notified %lu, no action\n", action);
}
return NOTIFY_OK;
}
pr_debug("APB CS going back %lx:%lx:%lx ",
t2, last_read, t2 - last_read);
bad_count_x3:
- pr_debug(KERN_INFO "tripple check enforced\n");
+ pr_debug("triple check enforced\n");
t0 = apbt_readl(phy_cs_timer_id,
APBTMR_N_CURRENT_VALUE);
udelay(1);
#include <asm/gart.h>
#include <asm/pci-direct.h>
#include <asm/dma.h>
-#include <asm/k8.h>
+#include <asm/amd_nb.h>
#include <asm/x86_init.h>
int gart_iommu_aperture;
continue;
ctl = read_pci_config(bus, slot, 3, AMD64_GARTAPERTURECTL);
- aper_enabled = ctl & AMD64_GARTEN;
+ aper_enabled = ctl & GARTEN;
aper_order = (ctl >> 1) & 7;
aper_size = (32 * 1024 * 1024) << aper_order;
aper_base = read_pci_config(bus, slot, 3, AMD64_GARTAPERTUREBASE) & 0x7fff;
continue;
ctl = read_pci_config(bus, slot, 3, AMD64_GARTAPERTURECTL);
- ctl &= ~AMD64_GARTEN;
+ ctl &= ~GARTEN;
write_pci_config(bus, slot, 3, AMD64_GARTAPERTURECTL, ctl);
}
}
/* Fix up the north bridges */
for (i = 0; i < ARRAY_SIZE(bus_dev_ranges); i++) {
- int bus;
- int dev_base, dev_limit;
+ int bus, dev_base, dev_limit;
+
+ /*
+ * Don't enable translation yet but enable GART IO and CPU
+ * accesses and set DISTLBWALKPRB since GART table memory is UC.
+ */
+ u32 ctl = DISTLBWALKPRB | aper_order << 1;
bus = bus_dev_ranges[i].bus;
dev_base = bus_dev_ranges[i].dev_base;
if (!early_is_k8_nb(read_pci_config(bus, slot, 3, 0x00)))
continue;
- /* Don't enable translation yet. That is done later.
- Assume this BIOS didn't initialise the GART so
- just overwrite all previous bits */
- write_pci_config(bus, slot, 3, AMD64_GARTAPERTURECTL, aper_order << 1);
+ write_pci_config(bus, slot, 3, AMD64_GARTAPERTURECTL, ctl);
write_pci_config(bus, slot, 3, AMD64_GARTAPERTUREBASE, aper_alloc >> 25);
}
}
}
#endif
-#ifndef CONFIG_SMP
- enable_IR_x2apic();
default_setup_apic_routing();
-#endif
verify_local_APIC();
connect_bsp_APIC();
cfg = irq_cfgx;
count = ARRAY_SIZE(irq_cfgx);
- node= cpu_to_node(boot_cpu_id);
+ node = cpu_to_node(0);
for (i = 0; i < count; i++) {
desc = irq_to_desc(i);
old_cfg = old_desc->chip_data;
- memcpy(cfg, old_cfg, sizeof(struct irq_cfg));
+ cfg->vector = old_cfg->vector;
+ cfg->move_in_progress = old_cfg->move_in_progress;
+ cpumask_copy(cfg->domain, old_cfg->domain);
+ cpumask_copy(cfg->old_domain, old_cfg->old_domain);
init_copy_irq_2_pin(old_cfg, cfg, node);
}
-static void free_irq_cfg(struct irq_cfg *old_cfg)
+static void free_irq_cfg(struct irq_cfg *cfg)
{
- kfree(old_cfg);
+ free_cpumask_var(cfg->domain);
+ free_cpumask_var(cfg->old_domain);
+ kfree(cfg);
}
void arch_free_chip_data(struct irq_desc *old_desc, struct irq_desc *desc)
if (index < 0)
panic("Failed to allocate IRTE for ioapic %d\n", apic_id);
- memset(&irte, 0, sizeof(irte));
-
- irte.present = 1;
- irte.dst_mode = apic->irq_dest_mode;
- /*
- * Trigger mode in the IRTE will always be edge, and the
- * actual level or edge trigger will be setup in the IO-APIC
- * RTE. This will help simplify level triggered irq migration.
- * For more details, see the comments above explainig IO-APIC
- * irq migration in the presence of interrupt-remapping.
- */
- irte.trigger_mode = 0;
- irte.dlvry_mode = apic->irq_delivery_mode;
- irte.vector = vector;
- irte.dest_id = IRTE_DEST(destination);
+ prepare_irte(&irte, vector, destination);
/* Set source-id of interrupt request */
set_ioapic_sid(&irte, apic_id);
int notcon = 0;
struct irq_desc *desc;
struct irq_cfg *cfg;
- int node = cpu_to_node(boot_cpu_id);
+ int node = cpu_to_node(0);
apic_printk(APIC_VERBOSE, KERN_DEBUG "init IO_APIC IRQs\n");
void setup_IO_APIC_irq_extra(u32 gsi)
{
int apic_id = 0, pin, idx, irq;
- int node = cpu_to_node(boot_cpu_id);
+ int node = cpu_to_node(0);
struct irq_desc *desc;
struct irq_cfg *cfg;
{
struct irq_desc *desc = irq_to_desc(0);
struct irq_cfg *cfg = desc->chip_data;
- int node = cpu_to_node(boot_cpu_id);
+ int node = cpu_to_node(0);
int apic1, pin1, apic2, pin2;
unsigned long flags;
int no_pin1 = 0;
int create_irq(void)
{
- int node = cpu_to_node(boot_cpu_id);
+ int node = cpu_to_node(0);
unsigned int irq_want;
int irq;
ir_index = map_irq_to_irte_handle(irq, &sub_handle);
BUG_ON(ir_index == -1);
- memset (&irte, 0, sizeof(irte));
-
- irte.present = 1;
- irte.dst_mode = apic->irq_dest_mode;
- irte.trigger_mode = 0; /* edge */
- irte.dlvry_mode = apic->irq_delivery_mode;
- irte.vector = cfg->vector;
- irte.dest_id = IRTE_DEST(dest);
+ prepare_irte(&irte, cfg->vector, dest);
/* Set source-id of interrupt request */
if (pdev)
if (dev)
node = dev_to_node(dev);
else
- node = cpu_to_node(boot_cpu_id);
+ node = cpu_to_node(0);
desc = irq_to_desc_alloc_node(irq, node);
if (!desc) {
*/
void __init default_setup_apic_routing(void)
{
+
+ enable_IR_x2apic();
+
#ifdef CONFIG_X86_X2APIC
if (x2apic_mode
#ifdef CONFIG_X86_UV
for (j = 0; j < 64; j++) {
if (!test_bit(j, &present))
continue;
- uv_blade_info[blade].pnode = (i * 64 + j);
+ pnode = (i * 64 + j);
+ uv_blade_info[blade].pnode = pnode;
uv_blade_info[blade].nr_possible_cpus = 0;
uv_blade_info[blade].nr_online_cpus = 0;
+ max_pnode = max(pnode, max_pnode);
blade++;
}
}
uv_cpu_hub_info(cpu)->scir.offset = uv_scir_offset(apicid);
uv_node_to_blade[nid] = blade;
uv_cpu_to_blade[cpu] = blade;
- max_pnode = max(pnode, max_pnode);
}
/* Add blade/pnode info for nodes without cpus */
pnode = (paddr >> m_val) & pnode_mask;
blade = boot_pnode_to_blade(pnode);
uv_node_to_blade[nid] = blade;
- max_pnode = max(pnode, max_pnode);
}
map_gru_high(max_pnode);
{
#ifdef CONFIG_SMP
/* calling is from identify_secondary_cpu() ? */
- if (c->cpu_index == boot_cpu_id)
+ if (!c->cpu_index)
return;
/*
#endif
/*
- * Fixup core topology information for AMD multi-node processors.
- * Assumption: Number of cores in each internal node is the same.
+ * Fixup core topology information for
+ * (1) AMD multi-node processors
+ * Assumption: Number of cores in each internal node is the same.
+ * (2) AMD processors supporting compute units
*/
#ifdef CONFIG_X86_HT
-static void __cpuinit amd_fixup_dcm(struct cpuinfo_x86 *c)
+static void __cpuinit amd_get_topology(struct cpuinfo_x86 *c)
{
- unsigned long long value;
- u32 nodes, cores_per_node;
+ u32 nodes;
+ u8 node_id;
int cpu = smp_processor_id();
- if (!cpu_has(c, X86_FEATURE_NODEID_MSR))
- return;
+ /* get information required for multi-node processors */
+ if (cpu_has(c, X86_FEATURE_TOPOEXT)) {
+ u32 eax, ebx, ecx, edx;
- /* fixup topology information only once for a core */
- if (cpu_has(c, X86_FEATURE_AMD_DCM))
- return;
+ cpuid(0x8000001e, &eax, &ebx, &ecx, &edx);
+ nodes = ((ecx >> 8) & 7) + 1;
+ node_id = ecx & 7;
- rdmsrl(MSR_FAM10H_NODE_ID, value);
+ /* get compute unit information */
+ smp_num_siblings = ((ebx >> 8) & 3) + 1;
+ c->compute_unit_id = ebx & 0xff;
+ } else if (cpu_has(c, X86_FEATURE_NODEID_MSR)) {
+ u64 value;
- nodes = ((value >> 3) & 7) + 1;
- if (nodes == 1)
+ rdmsrl(MSR_FAM10H_NODE_ID, value);
+ nodes = ((value >> 3) & 7) + 1;
+ node_id = value & 7;
+ } else
return;
- set_cpu_cap(c, X86_FEATURE_AMD_DCM);
- cores_per_node = c->x86_max_cores / nodes;
+ /* fixup multi-node processor information */
+ if (nodes > 1) {
+ u32 cores_per_node;
+
+ set_cpu_cap(c, X86_FEATURE_AMD_DCM);
+ cores_per_node = c->x86_max_cores / nodes;
- /* store NodeID, use llc_shared_map to store sibling info */
- per_cpu(cpu_llc_id, cpu) = value & 7;
+ /* store NodeID, use llc_shared_map to store sibling info */
+ per_cpu(cpu_llc_id, cpu) = node_id;
- /* fixup core id to be in range from 0 to (cores_per_node - 1) */
- c->cpu_core_id = c->cpu_core_id % cores_per_node;
+ /* core id to be in range from 0 to (cores_per_node - 1) */
+ c->cpu_core_id = c->cpu_core_id % cores_per_node;
+ }
}
#endif
c->phys_proc_id = c->initial_apicid >> bits;
/* use socket ID also for last level cache */
per_cpu(cpu_llc_id, cpu) = c->phys_proc_id;
- /* fixup topology information on multi-node processors */
- if ((c->x86 == 0x10) && (c->x86_model == 9))
- amd_fixup_dcm(c);
+ amd_get_topology(c);
#endif
}
set_cpu_cap(c, X86_FEATURE_EXTD_APICID);
}
#endif
+
+ /* We need to do the following only once */
+ if (c != &boot_cpu_data)
+ return;
+
+ if (cpu_has(c, X86_FEATURE_CONSTANT_TSC)) {
+
+ if (c->x86 > 0x10 ||
+ (c->x86 == 0x10 && c->x86_model >= 0x2)) {
+ u64 val;
+
+ rdmsrl(MSR_K7_HWCR, val);
+ if (!(val & BIT(24)))
+ printk(KERN_WARNING FW_BUG "TSC doesn't count "
+ "with P0 frequency!\n");
+ }
+ }
}
static void __cpuinit init_amd(struct cpuinfo_x86 *c)
#endif
if (c->extended_cpuid_level >= 0x80000006) {
- if ((c->x86 >= 0x0f) && (cpuid_edx(0x80000006) & 0xf000))
+ if (cpuid_edx(0x80000006) & 0xf000)
num_cache_leaves = 4;
else
num_cache_leaves = 3;
}
}
-static void __cpuinit get_cpu_cap(struct cpuinfo_x86 *c)
+void __cpuinit get_cpu_cap(struct cpuinfo_x86 *c)
{
u32 tfms, xlvl;
u32 ebx;
this_cpu->c_early_init(c);
#ifdef CONFIG_SMP
- c->cpu_index = boot_cpu_id;
+ c->cpu_index = 0;
#endif
filter_cpuid_features(c, false);
}
}
/*
- * The NOPL instruction is supposed to exist on all CPUs with
- * family >= 6; unfortunately, that's not true in practice because
- * of early VIA chips and (more importantly) broken virtualizers that
- * are not easy to detect. In the latter case it doesn't even *fail*
- * reliably, so probing for it doesn't even work. Disable it completely
+ * The NOPL instruction is supposed to exist on all CPUs of family >= 6;
+ * unfortunately, that's not true in practice because of early VIA
+ * chips and (more importantly) broken virtualizers that are not easy
+ * to detect. In the latter case it doesn't even *fail* reliably, so
+ * probing for it doesn't even work. Disable it completely on 32-bit
* unless we can find a reliable way to detect all the broken cases.
+ * Enable it explicitly on 64-bit for non-constant inputs of cpu_has().
*/
static void __cpuinit detect_nopl(struct cpuinfo_x86 *c)
{
+#ifdef CONFIG_X86_32
clear_cpu_cap(c, X86_FEATURE_NOPL);
+#else
+ set_cpu_cap(c, X86_FEATURE_NOPL);
+#endif
}
static void __cpuinit generic_identify(struct cpuinfo_x86 *c)
clear_all_debug_regs();
dbg_restore_debug_regs();
- /*
- * Force FPU initialization:
- */
- current_thread_info()->status = 0;
- clear_used_math();
- mxcsr_feature_mask_init();
-
fpu_init();
xsave_init();
}
extern const struct cpu_dev *const __x86_cpu_dev_start[],
*const __x86_cpu_dev_end[];
+extern void get_cpu_cap(struct cpuinfo_x86 *c);
extern void cpu_detect_cache_sizes(struct cpuinfo_x86 *c);
+extern void get_cpu_cap(struct cpuinfo_x86 *c);
#endif
return -ENODEV;
out_obj = output.pointer;
- if (out_obj->type != ACPI_TYPE_BUFFER)
- return -ENODEV;
+ if (out_obj->type != ACPI_TYPE_BUFFER) {
+ ret = -ENODEV;
+ goto out_free;
+ }
errors = *((u32 *)out_obj->buffer.pointer) & ~(1 << 0);
- if (errors)
- return -ENODEV;
+ if (errors) {
+ ret = -ENODEV;
+ goto out_free;
+ }
supported = *((u32 *)(out_obj->buffer.pointer + 4));
- if (!(supported & 0x1))
- return -ENODEV;
+ if (!(supported & 0x1)) {
+ ret = -ENODEV;
+ goto out_free;
+ }
out_free:
kfree(output.pointer);
misc_enable &= ~MSR_IA32_MISC_ENABLE_LIMIT_CPUID;
wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
c->cpuid_level = cpuid_eax(0);
+ get_cpu_cap(c);
}
}
{
#ifdef CONFIG_SMP
/* calling is from identify_secondary_cpu() ? */
- if (c->cpu_index == boot_cpu_id)
+ if (!c->cpu_index)
return;
/*
#include <asm/processor.h>
#include <linux/smp.h>
-#include <asm/k8.h>
+#include <asm/amd_nb.h>
#include <asm/smp.h>
#define LVL_1_INST 1
ssize_t (*store)(struct _cpuid4_info *, const char *, size_t count);
};
-#ifdef CONFIG_CPU_SUP_AMD
+#ifdef CONFIG_AMD_NB
/*
* L3 cache descriptors
return;
/* not in virtualized environments */
- if (num_k8_northbridges == 0)
+ if (k8_northbridges.num == 0)
return;
/*
* never freed but this is done only on shutdown so it doesn't matter.
*/
if (!l3_caches) {
- int size = num_k8_northbridges * sizeof(struct amd_l3_cache *);
+ int size = k8_northbridges.num * sizeof(struct amd_l3_cache *);
l3_caches = kzalloc(size, GFP_ATOMIC);
if (!l3_caches)
static struct _cache_attr cache_disable_1 = __ATTR(cache_disable_1, 0644,
show_cache_disable_1, store_cache_disable_1);
-#else /* CONFIG_CPU_SUP_AMD */
+#else /* CONFIG_AMD_NB */
static void __cpuinit
amd_check_l3_disable(struct _cpuid4_info_regs *this_leaf, int index)
{
};
-#endif /* CONFIG_CPU_SUP_AMD */
+#endif /* CONFIG_AMD_NB */
static int
__cpuinit cpuid4_cache_lookup_regs(int index,
static struct attribute *default_l3_attrs[] = {
DEFAULT_SYSFS_CACHE_ATTRS,
-#ifdef CONFIG_CPU_SUP_AMD
+#ifdef CONFIG_AMD_NB
&cache_disable_0.attr,
&cache_disable_1.attr,
#endif
address = (low & MASK_BLKPTR_LO) >> 21;
if (!address)
break;
+
address += MCG_XBLK_ADDR;
} else
++address;
if (rdmsr_safe(address, &low, &high))
break;
- if (!(high & MASK_VALID_HI)) {
- if (block)
- continue;
- else
- break;
- }
+ if (!(high & MASK_VALID_HI))
+ continue;
if (!(high & MASK_CNTP_HI) ||
(high & MASK_LOCKED_HI))
err = -ENOMEM;
goto out;
}
- if (!alloc_cpumask_var(&b->cpus, GFP_KERNEL)) {
+ if (!zalloc_cpumask_var(&b->cpus, GFP_KERNEL)) {
kfree(b);
err = -ENOMEM;
goto out;
#ifndef CONFIG_SMP
cpumask_setall(b->cpus);
#else
- cpumask_copy(b->cpus, c->llc_shared_map);
+ cpumask_set_cpu(cpu, b->cpus);
#endif
per_cpu(threshold_banks, cpu)[bank] = b;
#ifdef CONFIG_SYSFS
/* Add/Remove thermal_throttle interface for CPU device: */
-static __cpuinit int thermal_throttle_add_dev(struct sys_device *sys_dev)
+static __cpuinit int thermal_throttle_add_dev(struct sys_device *sys_dev,
+ unsigned int cpu)
{
int err;
- struct cpuinfo_x86 *c = &cpu_data(smp_processor_id());
+ struct cpuinfo_x86 *c = &cpu_data(cpu);
err = sysfs_create_group(&sys_dev->kobj, &thermal_attr_group);
if (err)
err = sysfs_add_file_to_group(&sys_dev->kobj,
&attr_core_power_limit_count.attr,
thermal_attr_group.name);
- if (cpu_has(c, X86_FEATURE_PTS))
+ if (cpu_has(c, X86_FEATURE_PTS)) {
err = sysfs_add_file_to_group(&sys_dev->kobj,
&attr_package_throttle_count.attr,
thermal_attr_group.name);
err = sysfs_add_file_to_group(&sys_dev->kobj,
&attr_package_power_limit_count.attr,
thermal_attr_group.name);
+ }
return err;
}
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
mutex_lock(&therm_cpu_lock);
- err = thermal_throttle_add_dev(sys_dev);
+ err = thermal_throttle_add_dev(sys_dev, cpu);
mutex_unlock(&therm_cpu_lock);
WARN_ON(err);
break;
#endif
/* connect live CPUs to sysfs */
for_each_online_cpu(cpu) {
- err = thermal_throttle_add_dev(get_cpu_sysdev(cpu));
+ err = thermal_throttle_add_dev(get_cpu_sysdev(cpu), cpu);
WARN_ON(err);
}
#ifdef CONFIG_HOTPLUG_CPU
static void unexpected_thermal_interrupt(void)
{
- printk(KERN_ERR "CPU%d: Unexpected LVT TMR interrupt!\n",
+ printk(KERN_ERR "CPU%d: Unexpected LVT thermal interrupt!\n",
smp_processor_id());
add_taint(TAINT_MACHINE_CHECK);
}
if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
return 0;
- if (boot_cpu_data.x86 < 0xf || boot_cpu_data.x86 > 0x11)
+ if (boot_cpu_data.x86 < 0xf)
return 0;
/* In case some hypervisor doesn't pass SYSCFG through: */
if (rdmsr_safe(MSR_K8_SYSCFG, &l, &h) < 0)
}
}
+/* Get the size of contiguous MTRR range */
+static u64 get_mtrr_size(u64 mask)
+{
+ u64 size;
+
+ mask >>= PAGE_SHIFT;
+ mask |= size_or_mask;
+ size = -mask;
+ size <<= PAGE_SHIFT;
+ return size;
+}
+
/*
- * Returns the effective MTRR type for the region
- * Error returns:
- * - 0xFE - when the range is "not entirely covered" by _any_ var range MTRR
- * - 0xFF - when MTRR is not enabled
+ * Check and return the effective type for MTRR-MTRR type overlap.
+ * Returns 1 if the effective type is UNCACHEABLE, else returns 0
*/
-u8 mtrr_type_lookup(u64 start, u64 end)
+static int check_type_overlap(u8 *prev, u8 *curr)
+{
+ if (*prev == MTRR_TYPE_UNCACHABLE || *curr == MTRR_TYPE_UNCACHABLE) {
+ *prev = MTRR_TYPE_UNCACHABLE;
+ *curr = MTRR_TYPE_UNCACHABLE;
+ return 1;
+ }
+
+ if ((*prev == MTRR_TYPE_WRBACK && *curr == MTRR_TYPE_WRTHROUGH) ||
+ (*prev == MTRR_TYPE_WRTHROUGH && *curr == MTRR_TYPE_WRBACK)) {
+ *prev = MTRR_TYPE_WRTHROUGH;
+ *curr = MTRR_TYPE_WRTHROUGH;
+ }
+
+ if (*prev != *curr) {
+ *prev = MTRR_TYPE_UNCACHABLE;
+ *curr = MTRR_TYPE_UNCACHABLE;
+ return 1;
+ }
+
+ return 0;
+}
+
+/*
+ * Error/Semi-error returns:
+ * 0xFF - when MTRR is not enabled
+ * *repeat == 1 implies [start:end] spanned across MTRR range and type returned
+ * corresponds only to [start:*partial_end].
+ * Caller has to lookup again for [*partial_end:end].
+ */
+static u8 __mtrr_type_lookup(u64 start, u64 end, u64 *partial_end, int *repeat)
{
int i;
u64 base, mask;
u8 prev_match, curr_match;
+ *repeat = 0;
if (!mtrr_state_set)
return 0xFF;
start_state = ((start & mask) == (base & mask));
end_state = ((end & mask) == (base & mask));
- if (start_state != end_state)
- return 0xFE;
+
+ if (start_state != end_state) {
+ /*
+ * We have start:end spanning across an MTRR.
+ * We split the region into
+ * either
+ * (start:mtrr_end) (mtrr_end:end)
+ * or
+ * (start:mtrr_start) (mtrr_start:end)
+ * depending on kind of overlap.
+ * Return the type for first region and a pointer to
+ * the start of second region so that caller will
+ * lookup again on the second region.
+ * Note: This way we handle multiple overlaps as well.
+ */
+ if (start_state)
+ *partial_end = base + get_mtrr_size(mask);
+ else
+ *partial_end = base;
+
+ if (unlikely(*partial_end <= start)) {
+ WARN_ON(1);
+ *partial_end = start + PAGE_SIZE;
+ }
+
+ end = *partial_end - 1; /* end is inclusive */
+ *repeat = 1;
+ }
if ((start & mask) != (base & mask))
continue;
continue;
}
- if (prev_match == MTRR_TYPE_UNCACHABLE ||
- curr_match == MTRR_TYPE_UNCACHABLE) {
- return MTRR_TYPE_UNCACHABLE;
- }
-
- if ((prev_match == MTRR_TYPE_WRBACK &&
- curr_match == MTRR_TYPE_WRTHROUGH) ||
- (prev_match == MTRR_TYPE_WRTHROUGH &&
- curr_match == MTRR_TYPE_WRBACK)) {
- prev_match = MTRR_TYPE_WRTHROUGH;
- curr_match = MTRR_TYPE_WRTHROUGH;
- }
-
- if (prev_match != curr_match)
- return MTRR_TYPE_UNCACHABLE;
+ if (check_type_overlap(&prev_match, &curr_match))
+ return curr_match;
}
if (mtrr_tom2) {
return mtrr_state.def_type;
}
+/*
+ * Returns the effective MTRR type for the region
+ * Error return:
+ * 0xFF - when MTRR is not enabled
+ */
+u8 mtrr_type_lookup(u64 start, u64 end)
+{
+ u8 type, prev_type;
+ int repeat;
+ u64 partial_end;
+
+ type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+
+ /*
+ * Common path is with repeat = 0.
+ * However, we can have cases where [start:end] spans across some
+ * MTRR range. Do repeated lookups for that case here.
+ */
+ while (repeat) {
+ prev_type = type;
+ start = partial_end;
+ type = __mtrr_type_lookup(start, end, &partial_end, &repeat);
+
+ if (check_type_overlap(&prev_type, &type))
+ return type;
+ }
+
+ return type;
+}
+
/* Get the MSR pair relating to a var range */
static void
get_mtrr_var_range(unsigned int index, struct mtrr_var_range *vr)
*/
struct perf_event *events[X86_PMC_IDX_MAX]; /* in counter order */
unsigned long active_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
+ unsigned long running[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
int enabled;
int n_events;
/*
* Setup the hardware configuration for a given attr_type
*/
-static int __hw_perf_event_init(struct perf_event *event)
+static int __x86_pmu_event_init(struct perf_event *event)
{
int err;
}
}
-void hw_perf_disable(void)
+static void x86_pmu_disable(struct pmu *pmu)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
}
}
-static const struct pmu pmu;
+static struct pmu pmu;
static inline int is_x86_event(struct perf_event *event)
{
hwc->last_tag == cpuc->tags[i];
}
-static int x86_pmu_start(struct perf_event *event);
-static void x86_pmu_stop(struct perf_event *event);
+static void x86_pmu_start(struct perf_event *event, int flags);
+static void x86_pmu_stop(struct perf_event *event, int flags);
-void hw_perf_enable(void)
+static void x86_pmu_enable(struct pmu *pmu)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct perf_event *event;
match_prev_assignment(hwc, cpuc, i))
continue;
- x86_pmu_stop(event);
+ /*
+ * Ensure we don't accidentally enable a stopped
+ * counter simply because we rescheduled.
+ */
+ if (hwc->state & PERF_HES_STOPPED)
+ hwc->state |= PERF_HES_ARCH;
+
+ x86_pmu_stop(event, PERF_EF_UPDATE);
}
for (i = 0; i < cpuc->n_events; i++) {
else if (i < n_running)
continue;
- x86_pmu_start(event);
+ if (hwc->state & PERF_HES_ARCH)
+ continue;
+
+ x86_pmu_start(event, PERF_EF_RELOAD);
}
cpuc->n_added = 0;
perf_events_lapic_init();
}
/*
- * activate a single event
+ * Add a single event to the PMU.
*
* The event is added to the group of enabled events
* but only if it can be scehduled with existing events.
- *
- * Called with PMU disabled. If successful and return value 1,
- * then guaranteed to call perf_enable() and hw_perf_enable()
*/
-static int x86_pmu_enable(struct perf_event *event)
+static int x86_pmu_add(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct hw_perf_event *hwc;
hwc = &event->hw;
+ perf_pmu_disable(event->pmu);
n0 = cpuc->n_events;
- n = collect_events(cpuc, event, false);
- if (n < 0)
- return n;
+ ret = n = collect_events(cpuc, event, false);
+ if (ret < 0)
+ goto out;
+
+ hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+ if (!(flags & PERF_EF_START))
+ hwc->state |= PERF_HES_ARCH;
/*
* If group events scheduling transaction was started,
* skip the schedulability test here, it will be peformed
- * at commit time(->commit_txn) as a whole
+ * at commit time (->commit_txn) as a whole
*/
if (cpuc->group_flag & PERF_EVENT_TXN)
- goto out;
+ goto done_collect;
ret = x86_pmu.schedule_events(cpuc, n, assign);
if (ret)
- return ret;
+ goto out;
/*
* copy new assignment, now we know it is possible
* will be used by hw_perf_enable()
*/
memcpy(cpuc->assign, assign, n*sizeof(int));
-out:
+done_collect:
cpuc->n_events = n;
cpuc->n_added += n - n0;
cpuc->n_txn += n - n0;
- return 0;
+ ret = 0;
+out:
+ perf_pmu_enable(event->pmu);
+ return ret;
}
-static int x86_pmu_start(struct perf_event *event)
+static void x86_pmu_start(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int idx = event->hw.idx;
- if (idx == -1)
- return -EAGAIN;
+ if (WARN_ON_ONCE(!(event->hw.state & PERF_HES_STOPPED)))
+ return;
+
+ if (WARN_ON_ONCE(idx == -1))
+ return;
+
+ if (flags & PERF_EF_RELOAD) {
+ WARN_ON_ONCE(!(event->hw.state & PERF_HES_UPTODATE));
+ x86_perf_event_set_period(event);
+ }
+
+ event->hw.state = 0;
- x86_perf_event_set_period(event);
cpuc->events[idx] = event;
__set_bit(idx, cpuc->active_mask);
+ __set_bit(idx, cpuc->running);
x86_pmu.enable(event);
perf_event_update_userpage(event);
-
- return 0;
-}
-
-static void x86_pmu_unthrottle(struct perf_event *event)
-{
- int ret = x86_pmu_start(event);
- WARN_ON_ONCE(ret);
}
void perf_event_print_debug(void)
local_irq_restore(flags);
}
-static void x86_pmu_stop(struct perf_event *event)
+static void x86_pmu_stop(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct hw_perf_event *hwc = &event->hw;
- int idx = hwc->idx;
-
- if (!__test_and_clear_bit(idx, cpuc->active_mask))
- return;
-
- x86_pmu.disable(event);
- /*
- * Drain the remaining delta count out of a event
- * that we are disabling:
- */
- x86_perf_event_update(event);
+ if (__test_and_clear_bit(hwc->idx, cpuc->active_mask)) {
+ x86_pmu.disable(event);
+ cpuc->events[hwc->idx] = NULL;
+ WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
+ hwc->state |= PERF_HES_STOPPED;
+ }
- cpuc->events[idx] = NULL;
+ if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+ /*
+ * Drain the remaining delta count out of a event
+ * that we are disabling:
+ */
+ x86_perf_event_update(event);
+ hwc->state |= PERF_HES_UPTODATE;
+ }
}
-static void x86_pmu_disable(struct perf_event *event)
+static void x86_pmu_del(struct perf_event *event, int flags)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int i;
if (cpuc->group_flag & PERF_EVENT_TXN)
return;
- x86_pmu_stop(event);
+ x86_pmu_stop(event, PERF_EF_UPDATE);
for (i = 0; i < cpuc->n_events; i++) {
if (event == cpuc->event_list[i]) {
struct perf_sample_data data;
struct cpu_hw_events *cpuc;
struct perf_event *event;
- struct hw_perf_event *hwc;
int idx, handled = 0;
u64 val;
cpuc = &__get_cpu_var(cpu_hw_events);
for (idx = 0; idx < x86_pmu.num_counters; idx++) {
- if (!test_bit(idx, cpuc->active_mask))
+ if (!test_bit(idx, cpuc->active_mask)) {
+ /*
+ * Though we deactivated the counter some cpus
+ * might still deliver spurious interrupts still
+ * in flight. Catch them:
+ */
+ if (__test_and_clear_bit(idx, cpuc->running))
+ handled++;
continue;
+ }
event = cpuc->events[idx];
- hwc = &event->hw;
val = x86_perf_event_update(event);
if (val & (1ULL << (x86_pmu.cntval_bits - 1)))
/*
* event overflow
*/
- handled = 1;
+ handled++;
data.period = event->hw.last_period;
if (!x86_perf_event_set_period(event))
continue;
if (perf_event_overflow(event, 1, &data, regs))
- x86_pmu_stop(event);
+ x86_pmu_stop(event, 0);
}
if (handled)
return handled;
}
-void smp_perf_pending_interrupt(struct pt_regs *regs)
-{
- irq_enter();
- ack_APIC_irq();
- inc_irq_stat(apic_pending_irqs);
- perf_event_do_pending();
- irq_exit();
-}
-
-void set_perf_event_pending(void)
-{
-#ifdef CONFIG_X86_LOCAL_APIC
- if (!x86_pmu.apic || !x86_pmu_initialized())
- return;
-
- apic->send_IPI_self(LOCAL_PENDING_VECTOR);
-#endif
-}
-
void perf_events_lapic_init(void)
{
if (!x86_pmu.apic || !x86_pmu_initialized())
apic_write(APIC_LVTPC, APIC_DM_NMI);
}
+struct pmu_nmi_state {
+ unsigned int marked;
+ int handled;
+};
+
+static DEFINE_PER_CPU(struct pmu_nmi_state, pmu_nmi);
+
static int __kprobes
perf_event_nmi_handler(struct notifier_block *self,
unsigned long cmd, void *__args)
{
struct die_args *args = __args;
- struct pt_regs *regs;
+ unsigned int this_nmi;
+ int handled;
if (!atomic_read(&active_events))
return NOTIFY_DONE;
case DIE_NMI:
case DIE_NMI_IPI:
break;
-
+ case DIE_NMIUNKNOWN:
+ this_nmi = percpu_read(irq_stat.__nmi_count);
+ if (this_nmi != __get_cpu_var(pmu_nmi).marked)
+ /* let the kernel handle the unknown nmi */
+ return NOTIFY_DONE;
+ /*
+ * This one is a PMU back-to-back nmi. Two events
+ * trigger 'simultaneously' raising two back-to-back
+ * NMIs. If the first NMI handles both, the latter
+ * will be empty and daze the CPU. So, we drop it to
+ * avoid false-positive 'unknown nmi' messages.
+ */
+ return NOTIFY_STOP;
default:
return NOTIFY_DONE;
}
- regs = args->regs;
-
apic_write(APIC_LVTPC, APIC_DM_NMI);
- /*
- * Can't rely on the handled return value to say it was our NMI, two
- * events could trigger 'simultaneously' raising two back-to-back NMIs.
- *
- * If the first NMI handles both, the latter will be empty and daze
- * the CPU.
- */
- x86_pmu.handle_irq(regs);
+
+ handled = x86_pmu.handle_irq(args->regs);
+ if (!handled)
+ return NOTIFY_DONE;
+
+ this_nmi = percpu_read(irq_stat.__nmi_count);
+ if ((handled > 1) ||
+ /* the next nmi could be a back-to-back nmi */
+ ((__get_cpu_var(pmu_nmi).marked == this_nmi) &&
+ (__get_cpu_var(pmu_nmi).handled > 1))) {
+ /*
+ * We could have two subsequent back-to-back nmis: The
+ * first handles more than one counter, the 2nd
+ * handles only one counter and the 3rd handles no
+ * counter.
+ *
+ * This is the 2nd nmi because the previous was
+ * handling more than one counter. We will mark the
+ * next (3rd) and then drop it if unhandled.
+ */
+ __get_cpu_var(pmu_nmi).marked = this_nmi + 1;
+ __get_cpu_var(pmu_nmi).handled = handled;
+ }
return NOTIFY_STOP;
}
x86_pmu.num_counters = X86_PMC_MAX_GENERIC;
}
x86_pmu.intel_ctrl = (1 << x86_pmu.num_counters) - 1;
- perf_max_events = x86_pmu.num_counters;
if (x86_pmu.num_counters_fixed > X86_PMC_MAX_FIXED) {
WARN(1, KERN_ERR "hw perf events fixed %d > max(%d), clipping!",
pr_info("... fixed-purpose events: %d\n", x86_pmu.num_counters_fixed);
pr_info("... event mask: %016Lx\n", x86_pmu.intel_ctrl);
+ perf_pmu_register(&pmu);
perf_cpu_notifier(x86_pmu_notifier);
}
* Set the flag to make pmu::enable() not perform the
* schedulability test, it will be performed at commit time
*/
-static void x86_pmu_start_txn(const struct pmu *pmu)
+static void x86_pmu_start_txn(struct pmu *pmu)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
+ perf_pmu_disable(pmu);
cpuc->group_flag |= PERF_EVENT_TXN;
cpuc->n_txn = 0;
}
* Clear the flag and pmu::enable() will perform the
* schedulability test.
*/
-static void x86_pmu_cancel_txn(const struct pmu *pmu)
+static void x86_pmu_cancel_txn(struct pmu *pmu)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
*/
cpuc->n_added -= cpuc->n_txn;
cpuc->n_events -= cpuc->n_txn;
+ perf_pmu_enable(pmu);
}
/*
* Perform the group schedulability test as a whole
* Return 0 if success
*/
-static int x86_pmu_commit_txn(const struct pmu *pmu)
+static int x86_pmu_commit_txn(struct pmu *pmu)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
int assign[X86_PMC_IDX_MAX];
memcpy(cpuc->assign, assign, n*sizeof(int));
cpuc->group_flag &= ~PERF_EVENT_TXN;
-
+ perf_pmu_enable(pmu);
return 0;
}
-static const struct pmu pmu = {
- .enable = x86_pmu_enable,
- .disable = x86_pmu_disable,
- .start = x86_pmu_start,
- .stop = x86_pmu_stop,
- .read = x86_pmu_read,
- .unthrottle = x86_pmu_unthrottle,
- .start_txn = x86_pmu_start_txn,
- .cancel_txn = x86_pmu_cancel_txn,
- .commit_txn = x86_pmu_commit_txn,
-};
-
/*
* validate that we can schedule this event
*/
return ret;
}
-const struct pmu *hw_perf_event_init(struct perf_event *event)
+int x86_pmu_event_init(struct perf_event *event)
{
- const struct pmu *tmp;
+ struct pmu *tmp;
int err;
- err = __hw_perf_event_init(event);
+ switch (event->attr.type) {
+ case PERF_TYPE_RAW:
+ case PERF_TYPE_HARDWARE:
+ case PERF_TYPE_HW_CACHE:
+ break;
+
+ default:
+ return -ENOENT;
+ }
+
+ err = __x86_pmu_event_init(event);
if (!err) {
/*
* we temporarily connect event to its pmu
if (err) {
if (event->destroy)
event->destroy(event);
- return ERR_PTR(err);
}
- return &pmu;
+ return err;
}
-/*
- * callchain support
- */
+static struct pmu pmu = {
+ .pmu_enable = x86_pmu_enable,
+ .pmu_disable = x86_pmu_disable,
-static inline
-void callchain_store(struct perf_callchain_entry *entry, u64 ip)
-{
- if (entry->nr < PERF_MAX_STACK_DEPTH)
- entry->ip[entry->nr++] = ip;
-}
+ .event_init = x86_pmu_event_init,
-static DEFINE_PER_CPU(struct perf_callchain_entry, pmc_irq_entry);
-static DEFINE_PER_CPU(struct perf_callchain_entry, pmc_nmi_entry);
+ .add = x86_pmu_add,
+ .del = x86_pmu_del,
+ .start = x86_pmu_start,
+ .stop = x86_pmu_stop,
+ .read = x86_pmu_read,
+ .start_txn = x86_pmu_start_txn,
+ .cancel_txn = x86_pmu_cancel_txn,
+ .commit_txn = x86_pmu_commit_txn,
+};
+
+/*
+ * callchain support
+ */
static void
backtrace_warning_symbol(void *data, char *msg, unsigned long symbol)
{
struct perf_callchain_entry *entry = data;
- callchain_store(entry, addr);
+ perf_callchain_store(entry, addr);
}
static const struct stacktrace_ops backtrace_ops = {
.walk_stack = print_context_stack_bp,
};
-static void
-perf_callchain_kernel(struct pt_regs *regs, struct perf_callchain_entry *entry)
+void
+perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
{
- callchain_store(entry, PERF_CONTEXT_KERNEL);
- callchain_store(entry, regs->ip);
+ if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+ /* TODO: We don't support guest os callchain now */
+ return;
+ }
+
+ perf_callchain_store(entry, regs->ip);
dump_trace(NULL, regs, NULL, regs->bp, &backtrace_ops, entry);
}
if (fp < compat_ptr(regs->sp))
break;
- callchain_store(entry, frame.return_address);
+ perf_callchain_store(entry, frame.return_address);
fp = compat_ptr(frame.next_frame);
}
return 1;
}
#endif
-static void
-perf_callchain_user(struct pt_regs *regs, struct perf_callchain_entry *entry)
+void
+perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
{
struct stack_frame frame;
const void __user *fp;
- if (!user_mode(regs))
- regs = task_pt_regs(current);
+ if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+ /* TODO: We don't support guest os callchain now */
+ return;
+ }
fp = (void __user *)regs->bp;
- callchain_store(entry, PERF_CONTEXT_USER);
- callchain_store(entry, regs->ip);
+ perf_callchain_store(entry, regs->ip);
if (perf_callchain_user32(regs, entry))
return;
if ((unsigned long)fp < regs->sp)
break;
- callchain_store(entry, frame.return_address);
+ perf_callchain_store(entry, frame.return_address);
fp = frame.next_frame;
}
}
-static void
-perf_do_callchain(struct pt_regs *regs, struct perf_callchain_entry *entry)
-{
- int is_user;
-
- if (!regs)
- return;
-
- is_user = user_mode(regs);
-
- if (is_user && current->state != TASK_RUNNING)
- return;
-
- if (!is_user)
- perf_callchain_kernel(regs, entry);
-
- if (current->mm)
- perf_callchain_user(regs, entry);
-}
-
-struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
-{
- struct perf_callchain_entry *entry;
-
- if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
- /* TODO: We don't support guest os callchain now */
- return NULL;
- }
-
- if (in_nmi())
- entry = &__get_cpu_var(pmc_nmi_entry);
- else
- entry = &__get_cpu_var(pmc_irq_entry);
-
- entry->nr = 0;
-
- perf_do_callchain(regs, entry);
-
- return entry;
-}
-
unsigned long perf_instruction_pointer(struct pt_regs *regs)
{
unsigned long ip;
[ C(DTLB) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x0040, /* Data Cache Accesses */
- [ C(RESULT_MISS) ] = 0x0046, /* L1 DTLB and L2 DLTB Miss */
+ [ C(RESULT_MISS) ] = 0x0746, /* L1_DTLB_AND_L2_DLTB_MISS.ALL */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = 0,
[ C(ITLB) ] = {
[ C(OP_READ) ] = {
[ C(RESULT_ACCESS) ] = 0x0080, /* Instruction fecthes */
- [ C(RESULT_MISS) ] = 0x0085, /* Instr. fetch ITLB misses */
+ [ C(RESULT_MISS) ] = 0x0385, /* L1_ITLB_AND_L2_ITLB_MISS.ALL */
},
[ C(OP_WRITE) ] = {
[ C(RESULT_ACCESS) ] = -1,
struct perf_sample_data data;
struct cpu_hw_events *cpuc;
int bit, loops;
- u64 ack, status;
+ u64 status;
+ int handled;
perf_sample_data_init(&data, 0);
cpuc = &__get_cpu_var(cpu_hw_events);
intel_pmu_disable_all();
- intel_pmu_drain_bts_buffer();
+ handled = intel_pmu_drain_bts_buffer();
status = intel_pmu_get_status();
if (!status) {
intel_pmu_enable_all(0);
- return 0;
+ return handled;
}
loops = 0;
again:
+ intel_pmu_ack_status(status);
if (++loops > 100) {
WARN_ONCE(1, "perfevents: irq loop stuck!\n");
perf_event_print_debug();
}
inc_irq_stat(apic_perf_irqs);
- ack = status;
intel_pmu_lbr_read();
/*
* PEBS overflow sets bit 62 in the global status register
*/
- if (__test_and_clear_bit(62, (unsigned long *)&status))
+ if (__test_and_clear_bit(62, (unsigned long *)&status)) {
+ handled++;
x86_pmu.drain_pebs(regs);
+ }
for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
struct perf_event *event = cpuc->events[bit];
+ handled++;
+
if (!test_bit(bit, cpuc->active_mask))
continue;
data.period = event->hw.last_period;
if (perf_event_overflow(event, 1, &data, regs))
- x86_pmu_stop(event);
+ x86_pmu_stop(event, 0);
}
- intel_pmu_ack_status(ack);
-
/*
* Repeat if there is more work to be done:
*/
done:
intel_pmu_enable_all(0);
- return 1;
+ return handled;
}
static struct event_constraint *
update_debugctlmsr(debugctlmsr);
}
-static void intel_pmu_drain_bts_buffer(void)
+static int intel_pmu_drain_bts_buffer(void)
{
struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
struct debug_store *ds = cpuc->ds;
struct pt_regs regs;
if (!event)
- return;
+ return 0;
if (!ds)
- return;
+ return 0;
at = (struct bts_record *)(unsigned long)ds->bts_buffer_base;
top = (struct bts_record *)(unsigned long)ds->bts_index;
if (top <= at)
- return;
+ return 0;
ds->bts_index = ds->bts_buffer_base;
perf_prepare_sample(&header, &data, event, ®s);
if (perf_output_begin(&handle, event, header.size * (top - at), 1, 1))
- return;
+ return 1;
for (; at < top; at++) {
data.ip = at->from;
/* There's new data available. */
event->hw.interrupts++;
event->pending_kill = POLL_IN;
+ return 1;
}
/*
regs.flags &= ~PERF_EFLAGS_EXACT;
if (perf_event_overflow(event, 1, &data, ®s))
- x86_pmu_stop(event);
+ x86_pmu_stop(event, 0);
}
static void intel_pmu_drain_pebs_core(struct pt_regs *iregs)
struct p4_event_bind {
unsigned int opcode; /* Event code and ESCR selector */
unsigned int escr_msr[2]; /* ESCR MSR for this event */
+ unsigned int escr_emask; /* valid ESCR EventMask bits */
+ unsigned int shared; /* event is shared across threads */
char cntr[2][P4_CNTR_LIMIT]; /* counter index (offset), -1 on abscence */
};
[P4_EVENT_TC_DELIVER_MODE] = {
.opcode = P4_OPCODE(P4_EVENT_TC_DELIVER_MODE),
.escr_msr = { MSR_P4_TC_ESCR0, MSR_P4_TC_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_TC_DELIVER_MODE, DD) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_TC_DELIVER_MODE, DB) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_TC_DELIVER_MODE, DI) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_TC_DELIVER_MODE, BD) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_TC_DELIVER_MODE, BB) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_TC_DELIVER_MODE, BI) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_TC_DELIVER_MODE, ID),
+ .shared = 1,
.cntr = { {4, 5, -1}, {6, 7, -1} },
},
[P4_EVENT_BPU_FETCH_REQUEST] = {
.opcode = P4_OPCODE(P4_EVENT_BPU_FETCH_REQUEST),
.escr_msr = { MSR_P4_BPU_ESCR0, MSR_P4_BPU_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_BPU_FETCH_REQUEST, TCMISS),
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_ITLB_REFERENCE] = {
.opcode = P4_OPCODE(P4_EVENT_ITLB_REFERENCE),
.escr_msr = { MSR_P4_ITLB_ESCR0, MSR_P4_ITLB_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_ITLB_REFERENCE, HIT) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_ITLB_REFERENCE, MISS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_ITLB_REFERENCE, HIT_UK),
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_MEMORY_CANCEL] = {
.opcode = P4_OPCODE(P4_EVENT_MEMORY_CANCEL),
.escr_msr = { MSR_P4_DAC_ESCR0, MSR_P4_DAC_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_MEMORY_CANCEL, ST_RB_FULL) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_MEMORY_CANCEL, 64K_CONF),
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_MEMORY_COMPLETE] = {
.opcode = P4_OPCODE(P4_EVENT_MEMORY_COMPLETE),
.escr_msr = { MSR_P4_SAAT_ESCR0 , MSR_P4_SAAT_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_MEMORY_COMPLETE, LSC) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_MEMORY_COMPLETE, SSC),
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_LOAD_PORT_REPLAY] = {
.opcode = P4_OPCODE(P4_EVENT_LOAD_PORT_REPLAY),
.escr_msr = { MSR_P4_SAAT_ESCR0, MSR_P4_SAAT_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_LOAD_PORT_REPLAY, SPLIT_LD),
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_STORE_PORT_REPLAY] = {
.opcode = P4_OPCODE(P4_EVENT_STORE_PORT_REPLAY),
.escr_msr = { MSR_P4_SAAT_ESCR0 , MSR_P4_SAAT_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_STORE_PORT_REPLAY, SPLIT_ST),
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_MOB_LOAD_REPLAY] = {
.opcode = P4_OPCODE(P4_EVENT_MOB_LOAD_REPLAY),
.escr_msr = { MSR_P4_MOB_ESCR0, MSR_P4_MOB_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_MOB_LOAD_REPLAY, NO_STA) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_MOB_LOAD_REPLAY, NO_STD) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_MOB_LOAD_REPLAY, PARTIAL_DATA) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_MOB_LOAD_REPLAY, UNALGN_ADDR),
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_PAGE_WALK_TYPE] = {
.opcode = P4_OPCODE(P4_EVENT_PAGE_WALK_TYPE),
.escr_msr = { MSR_P4_PMH_ESCR0, MSR_P4_PMH_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_PAGE_WALK_TYPE, DTMISS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_PAGE_WALK_TYPE, ITMISS),
+ .shared = 1,
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_BSQ_CACHE_REFERENCE] = {
.opcode = P4_OPCODE(P4_EVENT_BSQ_CACHE_REFERENCE),
.escr_msr = { MSR_P4_BSU_ESCR0, MSR_P4_BSU_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_CACHE_REFERENCE, RD_2ndL_HITS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_CACHE_REFERENCE, RD_2ndL_HITE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_CACHE_REFERENCE, RD_2ndL_HITM) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_CACHE_REFERENCE, RD_3rdL_HITS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_CACHE_REFERENCE, RD_3rdL_HITE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_CACHE_REFERENCE, RD_3rdL_HITM) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_CACHE_REFERENCE, RD_2ndL_MISS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_CACHE_REFERENCE, RD_3rdL_MISS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_CACHE_REFERENCE, WR_2ndL_MISS),
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_IOQ_ALLOCATION] = {
.opcode = P4_OPCODE(P4_EVENT_IOQ_ALLOCATION),
.escr_msr = { MSR_P4_FSB_ESCR0, MSR_P4_FSB_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ALLOCATION, DEFAULT) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ALLOCATION, ALL_READ) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ALLOCATION, ALL_WRITE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ALLOCATION, MEM_UC) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ALLOCATION, MEM_WC) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ALLOCATION, MEM_WT) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ALLOCATION, MEM_WP) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ALLOCATION, MEM_WB) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ALLOCATION, OWN) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ALLOCATION, OTHER) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ALLOCATION, PREFETCH),
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_IOQ_ACTIVE_ENTRIES] = { /* shared ESCR */
.opcode = P4_OPCODE(P4_EVENT_IOQ_ACTIVE_ENTRIES),
.escr_msr = { MSR_P4_FSB_ESCR1, MSR_P4_FSB_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ACTIVE_ENTRIES, DEFAULT) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ACTIVE_ENTRIES, ALL_READ) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ACTIVE_ENTRIES, ALL_WRITE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ACTIVE_ENTRIES, MEM_UC) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ACTIVE_ENTRIES, MEM_WC) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ACTIVE_ENTRIES, MEM_WT) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ACTIVE_ENTRIES, MEM_WP) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ACTIVE_ENTRIES, MEM_WB) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ACTIVE_ENTRIES, OWN) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ACTIVE_ENTRIES, OTHER) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_IOQ_ACTIVE_ENTRIES, PREFETCH),
.cntr = { {2, -1, -1}, {3, -1, -1} },
},
[P4_EVENT_FSB_DATA_ACTIVITY] = {
.opcode = P4_OPCODE(P4_EVENT_FSB_DATA_ACTIVITY),
.escr_msr = { MSR_P4_FSB_ESCR0, MSR_P4_FSB_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_FSB_DATA_ACTIVITY, DRDY_DRV) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_FSB_DATA_ACTIVITY, DRDY_OWN) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_FSB_DATA_ACTIVITY, DRDY_OTHER) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_FSB_DATA_ACTIVITY, DBSY_DRV) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_FSB_DATA_ACTIVITY, DBSY_OWN) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_FSB_DATA_ACTIVITY, DBSY_OTHER),
+ .shared = 1,
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_BSQ_ALLOCATION] = { /* shared ESCR, broken CCCR1 */
.opcode = P4_OPCODE(P4_EVENT_BSQ_ALLOCATION),
.escr_msr = { MSR_P4_BSU_ESCR0, MSR_P4_BSU_ESCR0 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, REQ_TYPE0) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, REQ_TYPE1) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, REQ_LEN0) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, REQ_LEN1) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, REQ_IO_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, REQ_LOCK_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, REQ_CACHE_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, REQ_SPLIT_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, REQ_DEM_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, REQ_ORD_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, MEM_TYPE0) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, MEM_TYPE1) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ALLOCATION, MEM_TYPE2),
.cntr = { {0, -1, -1}, {1, -1, -1} },
},
[P4_EVENT_BSQ_ACTIVE_ENTRIES] = { /* shared ESCR */
.opcode = P4_OPCODE(P4_EVENT_BSQ_ACTIVE_ENTRIES),
.escr_msr = { MSR_P4_BSU_ESCR1 , MSR_P4_BSU_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, REQ_TYPE0) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, REQ_TYPE1) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, REQ_LEN0) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, REQ_LEN1) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, REQ_IO_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, REQ_LOCK_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, REQ_CACHE_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, REQ_SPLIT_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, REQ_DEM_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, REQ_ORD_TYPE) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, MEM_TYPE0) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, MEM_TYPE1) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BSQ_ACTIVE_ENTRIES, MEM_TYPE2),
.cntr = { {2, -1, -1}, {3, -1, -1} },
},
[P4_EVENT_SSE_INPUT_ASSIST] = {
.opcode = P4_OPCODE(P4_EVENT_SSE_INPUT_ASSIST),
.escr_msr = { MSR_P4_FIRM_ESCR0, MSR_P4_FIRM_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_SSE_INPUT_ASSIST, ALL),
+ .shared = 1,
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_PACKED_SP_UOP] = {
.opcode = P4_OPCODE(P4_EVENT_PACKED_SP_UOP),
.escr_msr = { MSR_P4_FIRM_ESCR0, MSR_P4_FIRM_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_PACKED_SP_UOP, ALL),
+ .shared = 1,
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_PACKED_DP_UOP] = {
.opcode = P4_OPCODE(P4_EVENT_PACKED_DP_UOP),
.escr_msr = { MSR_P4_FIRM_ESCR0, MSR_P4_FIRM_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_PACKED_DP_UOP, ALL),
+ .shared = 1,
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_SCALAR_SP_UOP] = {
.opcode = P4_OPCODE(P4_EVENT_SCALAR_SP_UOP),
.escr_msr = { MSR_P4_FIRM_ESCR0, MSR_P4_FIRM_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_SCALAR_SP_UOP, ALL),
+ .shared = 1,
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_SCALAR_DP_UOP] = {
.opcode = P4_OPCODE(P4_EVENT_SCALAR_DP_UOP),
.escr_msr = { MSR_P4_FIRM_ESCR0, MSR_P4_FIRM_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_SCALAR_DP_UOP, ALL),
+ .shared = 1,
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_64BIT_MMX_UOP] = {
.opcode = P4_OPCODE(P4_EVENT_64BIT_MMX_UOP),
.escr_msr = { MSR_P4_FIRM_ESCR0, MSR_P4_FIRM_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_64BIT_MMX_UOP, ALL),
+ .shared = 1,
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_128BIT_MMX_UOP] = {
.opcode = P4_OPCODE(P4_EVENT_128BIT_MMX_UOP),
.escr_msr = { MSR_P4_FIRM_ESCR0, MSR_P4_FIRM_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_128BIT_MMX_UOP, ALL),
+ .shared = 1,
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_X87_FP_UOP] = {
.opcode = P4_OPCODE(P4_EVENT_X87_FP_UOP),
.escr_msr = { MSR_P4_FIRM_ESCR0, MSR_P4_FIRM_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_X87_FP_UOP, ALL),
+ .shared = 1,
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_TC_MISC] = {
.opcode = P4_OPCODE(P4_EVENT_TC_MISC),
.escr_msr = { MSR_P4_TC_ESCR0, MSR_P4_TC_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_TC_MISC, FLUSH),
.cntr = { {4, 5, -1}, {6, 7, -1} },
},
[P4_EVENT_GLOBAL_POWER_EVENTS] = {
.opcode = P4_OPCODE(P4_EVENT_GLOBAL_POWER_EVENTS),
.escr_msr = { MSR_P4_FSB_ESCR0, MSR_P4_FSB_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_GLOBAL_POWER_EVENTS, RUNNING),
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_TC_MS_XFER] = {
.opcode = P4_OPCODE(P4_EVENT_TC_MS_XFER),
.escr_msr = { MSR_P4_MS_ESCR0, MSR_P4_MS_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_TC_MS_XFER, CISC),
.cntr = { {4, 5, -1}, {6, 7, -1} },
},
[P4_EVENT_UOP_QUEUE_WRITES] = {
.opcode = P4_OPCODE(P4_EVENT_UOP_QUEUE_WRITES),
.escr_msr = { MSR_P4_MS_ESCR0, MSR_P4_MS_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_UOP_QUEUE_WRITES, FROM_TC_BUILD) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_UOP_QUEUE_WRITES, FROM_TC_DELIVER) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_UOP_QUEUE_WRITES, FROM_ROM),
.cntr = { {4, 5, -1}, {6, 7, -1} },
},
[P4_EVENT_RETIRED_MISPRED_BRANCH_TYPE] = {
.opcode = P4_OPCODE(P4_EVENT_RETIRED_MISPRED_BRANCH_TYPE),
.escr_msr = { MSR_P4_TBPU_ESCR0 , MSR_P4_TBPU_ESCR0 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_RETIRED_MISPRED_BRANCH_TYPE, CONDITIONAL) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_RETIRED_MISPRED_BRANCH_TYPE, CALL) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_RETIRED_MISPRED_BRANCH_TYPE, RETURN) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_RETIRED_MISPRED_BRANCH_TYPE, INDIRECT),
.cntr = { {4, 5, -1}, {6, 7, -1} },
},
[P4_EVENT_RETIRED_BRANCH_TYPE] = {
.opcode = P4_OPCODE(P4_EVENT_RETIRED_BRANCH_TYPE),
.escr_msr = { MSR_P4_TBPU_ESCR0 , MSR_P4_TBPU_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_RETIRED_BRANCH_TYPE, CONDITIONAL) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_RETIRED_BRANCH_TYPE, CALL) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_RETIRED_BRANCH_TYPE, RETURN) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_RETIRED_BRANCH_TYPE, INDIRECT),
.cntr = { {4, 5, -1}, {6, 7, -1} },
},
[P4_EVENT_RESOURCE_STALL] = {
.opcode = P4_OPCODE(P4_EVENT_RESOURCE_STALL),
.escr_msr = { MSR_P4_ALF_ESCR0, MSR_P4_ALF_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_RESOURCE_STALL, SBFULL),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_WC_BUFFER] = {
.opcode = P4_OPCODE(P4_EVENT_WC_BUFFER),
.escr_msr = { MSR_P4_DAC_ESCR0, MSR_P4_DAC_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_WC_BUFFER, WCB_EVICTS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_WC_BUFFER, WCB_FULL_EVICTS),
+ .shared = 1,
.cntr = { {8, 9, -1}, {10, 11, -1} },
},
[P4_EVENT_B2B_CYCLES] = {
.opcode = P4_OPCODE(P4_EVENT_B2B_CYCLES),
.escr_msr = { MSR_P4_FSB_ESCR0, MSR_P4_FSB_ESCR1 },
+ .escr_emask = 0,
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_BNR] = {
.opcode = P4_OPCODE(P4_EVENT_BNR),
.escr_msr = { MSR_P4_FSB_ESCR0, MSR_P4_FSB_ESCR1 },
+ .escr_emask = 0,
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_SNOOP] = {
.opcode = P4_OPCODE(P4_EVENT_SNOOP),
.escr_msr = { MSR_P4_FSB_ESCR0, MSR_P4_FSB_ESCR1 },
+ .escr_emask = 0,
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_RESPONSE] = {
.opcode = P4_OPCODE(P4_EVENT_RESPONSE),
.escr_msr = { MSR_P4_FSB_ESCR0, MSR_P4_FSB_ESCR1 },
+ .escr_emask = 0,
.cntr = { {0, -1, -1}, {2, -1, -1} },
},
[P4_EVENT_FRONT_END_EVENT] = {
.opcode = P4_OPCODE(P4_EVENT_FRONT_END_EVENT),
.escr_msr = { MSR_P4_CRU_ESCR2, MSR_P4_CRU_ESCR3 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_FRONT_END_EVENT, NBOGUS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_FRONT_END_EVENT, BOGUS),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_EXECUTION_EVENT] = {
.opcode = P4_OPCODE(P4_EVENT_EXECUTION_EVENT),
.escr_msr = { MSR_P4_CRU_ESCR2, MSR_P4_CRU_ESCR3 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS0) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS1) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS2) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, NBOGUS3) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS0) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS1) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS2) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_EXECUTION_EVENT, BOGUS3),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_REPLAY_EVENT] = {
.opcode = P4_OPCODE(P4_EVENT_REPLAY_EVENT),
.escr_msr = { MSR_P4_CRU_ESCR2, MSR_P4_CRU_ESCR3 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_REPLAY_EVENT, NBOGUS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_REPLAY_EVENT, BOGUS),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_INSTR_RETIRED] = {
.opcode = P4_OPCODE(P4_EVENT_INSTR_RETIRED),
.escr_msr = { MSR_P4_CRU_ESCR0, MSR_P4_CRU_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_INSTR_RETIRED, NBOGUSNTAG) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_INSTR_RETIRED, NBOGUSTAG) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_INSTR_RETIRED, BOGUSNTAG) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_INSTR_RETIRED, BOGUSTAG),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_UOPS_RETIRED] = {
.opcode = P4_OPCODE(P4_EVENT_UOPS_RETIRED),
.escr_msr = { MSR_P4_CRU_ESCR0, MSR_P4_CRU_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_UOPS_RETIRED, NBOGUS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_UOPS_RETIRED, BOGUS),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_UOP_TYPE] = {
.opcode = P4_OPCODE(P4_EVENT_UOP_TYPE),
.escr_msr = { MSR_P4_RAT_ESCR0, MSR_P4_RAT_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_UOP_TYPE, TAGLOADS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_UOP_TYPE, TAGSTORES),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_BRANCH_RETIRED] = {
.opcode = P4_OPCODE(P4_EVENT_BRANCH_RETIRED),
.escr_msr = { MSR_P4_CRU_ESCR2, MSR_P4_CRU_ESCR3 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_BRANCH_RETIRED, MMNP) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BRANCH_RETIRED, MMNM) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BRANCH_RETIRED, MMTP) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_BRANCH_RETIRED, MMTM),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_MISPRED_BRANCH_RETIRED] = {
.opcode = P4_OPCODE(P4_EVENT_MISPRED_BRANCH_RETIRED),
.escr_msr = { MSR_P4_CRU_ESCR0, MSR_P4_CRU_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_MISPRED_BRANCH_RETIRED, NBOGUS),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_X87_ASSIST] = {
.opcode = P4_OPCODE(P4_EVENT_X87_ASSIST),
.escr_msr = { MSR_P4_CRU_ESCR2, MSR_P4_CRU_ESCR3 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_X87_ASSIST, FPSU) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_X87_ASSIST, FPSO) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_X87_ASSIST, POAO) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_X87_ASSIST, POAU) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_X87_ASSIST, PREA),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_MACHINE_CLEAR] = {
.opcode = P4_OPCODE(P4_EVENT_MACHINE_CLEAR),
.escr_msr = { MSR_P4_CRU_ESCR2, MSR_P4_CRU_ESCR3 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_MACHINE_CLEAR, CLEAR) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_MACHINE_CLEAR, MOCLEAR) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_MACHINE_CLEAR, SMCLEAR),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
[P4_EVENT_INSTR_COMPLETED] = {
.opcode = P4_OPCODE(P4_EVENT_INSTR_COMPLETED),
.escr_msr = { MSR_P4_CRU_ESCR0, MSR_P4_CRU_ESCR1 },
+ .escr_emask =
+ P4_ESCR_EMASK_BIT(P4_EVENT_INSTR_COMPLETED, NBOGUS) |
+ P4_ESCR_EMASK_BIT(P4_EVENT_INSTR_COMPLETED, BOGUS),
.cntr = { {12, 13, 16}, {14, 15, 17} },
},
};
return config;
}
+/* check cpu model specifics */
+static bool p4_event_match_cpu_model(unsigned int event_idx)
+{
+ /* INSTR_COMPLETED event only exist for model 3, 4, 6 (Prescott) */
+ if (event_idx == P4_EVENT_INSTR_COMPLETED) {
+ if (boot_cpu_data.x86_model != 3 &&
+ boot_cpu_data.x86_model != 4 &&
+ boot_cpu_data.x86_model != 6)
+ return false;
+ }
+
+ /*
+ * For info
+ * - IQ_ESCR0, IQ_ESCR1 only for models 1 and 2
+ */
+
+ return true;
+}
+
static int p4_validate_raw_event(struct perf_event *event)
{
- unsigned int v;
+ unsigned int v, emask;
- /* user data may have out-of-bound event index */
+ /* User data may have out-of-bound event index */
v = p4_config_unpack_event(event->attr.config);
- if (v >= ARRAY_SIZE(p4_event_bind_map)) {
- pr_warning("P4 PMU: Unknown event code: %d\n", v);
+ if (v >= ARRAY_SIZE(p4_event_bind_map))
+ return -EINVAL;
+
+ /* It may be unsupported: */
+ if (!p4_event_match_cpu_model(v))
return -EINVAL;
+
+ /*
+ * NOTE: P4_CCCR_THREAD_ANY has not the same meaning as
+ * in Architectural Performance Monitoring, it means not
+ * on _which_ logical cpu to count but rather _when_, ie it
+ * depends on logical cpu state -- count event if one cpu active,
+ * none, both or any, so we just allow user to pass any value
+ * desired.
+ *
+ * In turn we always set Tx_OS/Tx_USR bits bound to logical
+ * cpu without their propagation to another cpu
+ */
+
+ /*
+ * if an event is shared accross the logical threads
+ * the user needs special permissions to be able to use it
+ */
+ if (p4_event_bind_map[v].shared) {
+ if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+ return -EACCES;
}
+ /* ESCR EventMask bits may be invalid */
+ emask = p4_config_unpack_escr(event->attr.config) & P4_ESCR_EVENTMASK_MASK;
+ if (emask & ~p4_event_bind_map[v].escr_emask)
+ return -EINVAL;
+
/*
- * it may have some screwed PEBS bits
+ * it may have some invalid PEBS bits
*/
- if (p4_config_pebs_has(event->attr.config, P4_PEBS_CONFIG_ENABLE)) {
- pr_warning("P4 PMU: PEBS are not supported yet\n");
+ if (p4_config_pebs_has(event->attr.config, P4_PEBS_CONFIG_ENABLE))
return -EINVAL;
- }
+
v = p4_config_unpack_metric(event->attr.config);
- if (v >= ARRAY_SIZE(p4_pebs_bind_map)) {
- pr_warning("P4 PMU: Unknown metric code: %d\n", v);
+ if (v >= ARRAY_SIZE(p4_pebs_bind_map))
return -EINVAL;
- }
return 0;
}
if (event->attr.type == PERF_TYPE_RAW) {
+ /*
+ * Clear bits we reserve to be managed by kernel itself
+ * and never allowed from a user space
+ */
+ event->attr.config &= P4_CONFIG_MASK;
+
rc = p4_validate_raw_event(event);
if (rc)
goto out;
/*
- * We don't control raw events so it's up to the caller
- * to pass sane values (and we don't count the thread number
- * on HT machine but allow HT-compatible specifics to be
- * passed on)
- *
* Note that for RAW events we allow user to use P4_CCCR_RESERVED
* bits since we keep additional info here (for cache events and etc)
- *
- * XXX: HT wide things should check perf_paranoid_cpu() &&
- * CAP_SYS_ADMIN
*/
- event->hw.config |= event->attr.config &
- (p4_config_pack_escr(P4_ESCR_MASK_HT) |
- p4_config_pack_cccr(P4_CCCR_MASK_HT | P4_CCCR_RESERVED));
-
- event->hw.config &= ~P4_CCCR_FORCE_OVF;
+ event->hw.config |= event->attr.config;
}
rc = x86_setup_perfctr(event);
for (idx = 0; idx < x86_pmu.num_counters; idx++) {
int overflow;
- if (!test_bit(idx, cpuc->active_mask))
+ if (!test_bit(idx, cpuc->active_mask)) {
+ /* catch in-flight IRQs */
+ if (__test_and_clear_bit(idx, cpuc->running))
+ handled++;
continue;
+ }
event = cpuc->events[idx];
hwc = &event->hw;
inc_irq_stat(apic_perf_irqs);
}
- return handled > 0;
+ return handled;
}
/*
{
switch (boot_cpu_data.x86_vendor) {
case X86_VENDOR_AMD:
- if (boot_cpu_data.x86 != 6 && boot_cpu_data.x86 != 15 &&
- boot_cpu_data.x86 != 16 && boot_cpu_data.x86 != 17)
- return;
- wd_ops = &k7_wd_ops;
- break;
+ if (boot_cpu_data.x86 == 6 ||
+ (boot_cpu_data.x86 >= 0xf && boot_cpu_data.x86 <= 0x15))
+ wd_ops = &k7_wd_ops;
+ return;
case X86_VENDOR_INTEL:
/* Work around where perfctr1 doesn't have a working enable
* bit as described in the following errata:
const struct cpuid_bit *cb;
static const struct cpuid_bit __cpuinitconst cpuid_bits[] = {
+ { X86_FEATURE_DTS, CR_EAX, 0, 0x00000006, 0 },
{ X86_FEATURE_IDA, CR_EAX, 1, 0x00000006, 0 },
{ X86_FEATURE_ARAT, CR_EAX, 2, 0x00000006, 0 },
{ X86_FEATURE_PLN, CR_EAX, 4, 0x00000006, 0 },
{ X86_FEATURE_LBRV, CR_EDX, 1, 0x8000000a, 0 },
{ X86_FEATURE_SVML, CR_EDX, 2, 0x8000000a, 0 },
{ X86_FEATURE_NRIPS, CR_EDX, 3, 0x8000000a, 0 },
+ { X86_FEATURE_TSCRATEMSR, CR_EDX, 4, 0x8000000a, 0 },
+ { X86_FEATURE_VMCBCLEAN, CR_EDX, 5, 0x8000000a, 0 },
+ { X86_FEATURE_FLUSHBYASID, CR_EDX, 6, 0x8000000a, 0 },
+ { X86_FEATURE_DECODEASSISTS, CR_EDX, 7, 0x8000000a, 0 },
+ { X86_FEATURE_PAUSEFILTER, CR_EDX,10, 0x8000000a, 0 },
+ { X86_FEATURE_PFTHRESHOLD, CR_EDX,12, 0x8000000a, 0 },
{ 0, 0, 0, 0, 0 }
};
if (!csize)
return 0;
- vaddr = ioremap(pfn << PAGE_SHIFT, PAGE_SIZE);
+ vaddr = ioremap_cache(pfn << PAGE_SHIFT, PAGE_SIZE);
if (!vaddr)
return -ENOMEM;
} else
memcpy(buf, vaddr + offset, csize);
+ set_iounmap_nonlazy();
iounmap(vaddr);
return csize;
}
#include <asm/apic.h>
#include <asm/iommu.h>
#include <asm/gart.h>
-#include <asm/hpet.h>
static void __init fix_hypertransport_config(int num, int slot, int func)
{
}
#if defined(CONFIG_ACPI) && defined(CONFIG_X86_IO_APIC)
-#if defined(CONFIG_ACPI) && defined(CONFIG_X86_IO_APIC)
static u32 __init ati_ixp4x0_rev(int num, int slot, int func)
{
u32 d;
d &= 0xff;
return d;
}
-#endif
static void __init ati_bugs(int num, int slot, int func)
{
}
#endif
-/*
- * Force the read back of the CMP register in hpet_next_event()
- * to work around the problem that the CMP register write seems to be
- * delayed. See hpet_next_event() for details.
- *
- * We do this on all SMBUS incarnations for now until we have more
- * information about the affected chipsets.
- */
-static void __init ati_hpet_bugs(int num, int slot, int func)
-{
-#ifdef CONFIG_HPET_TIMER
- hpet_readback_cmp = 1;
-#endif
-}
-
#define QFLAG_APPLY_ONCE 0x1
#define QFLAG_APPLIED 0x2
#define QFLAG_DONE (QFLAG_APPLY_ONCE|QFLAG_APPLIED)
PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs },
{ PCI_VENDOR_ID_ATI, PCI_DEVICE_ID_ATI_SBX00_SMBUS,
PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_bugs_contd },
- { PCI_VENDOR_ID_ATI, PCI_ANY_ID,
- PCI_CLASS_SERIAL_SMBUS, PCI_ANY_ID, 0, ati_hpet_bugs },
{}
};
#include <xen/hvc-console.h>
#include <asm/pci-direct.h>
#include <asm/fixmap.h>
+#include <asm/mrst.h>
#include <asm/pgtable.h>
#include <linux/usb/ehci_def.h>
if (!strncmp(buf, "xen", 3))
early_console_register(&xenboot_console, keep);
#endif
+#ifdef CONFIG_X86_MRST_EARLY_PRINTK
+ if (!strncmp(buf, "mrst", 4)) {
+ mrst_early_console_init();
+ early_console_register(&early_mrst_console, keep);
+ }
+
+ if (!strncmp(buf, "hsu", 3)) {
+ hsu_early_console_init();
+ early_console_register(&early_hsu_console, keep);
+ }
+
+#endif
buf++;
}
return 0;
--- /dev/null
+/*
+ * early_printk_mrst.c - early consoles for Intel MID platforms
+ *
+ * Copyright (c) 2008-2010, Intel Corporation
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; version 2
+ * of the License.
+ */
+
+/*
+ * This file implements two early consoles named mrst and hsu.
+ * mrst is based on Maxim3110 spi-uart device, it exists in both
+ * Moorestown and Medfield platforms, while hsu is based on a High
+ * Speed UART device which only exists in the Medfield platform
+ */
+
+#include <linux/serial_reg.h>
+#include <linux/serial_mfd.h>
+#include <linux/kmsg_dump.h>
+#include <linux/console.h>
+#include <linux/kernel.h>
+#include <linux/delay.h>
+#include <linux/init.h>
+#include <linux/io.h>
+
+#include <asm/fixmap.h>
+#include <asm/pgtable.h>
+#include <asm/mrst.h>
+
+#define MRST_SPI_TIMEOUT 0x200000
+#define MRST_REGBASE_SPI0 0xff128000
+#define MRST_REGBASE_SPI1 0xff128400
+#define MRST_CLK_SPI0_REG 0xff11d86c
+
+/* Bit fields in CTRLR0 */
+#define SPI_DFS_OFFSET 0
+
+#define SPI_FRF_OFFSET 4
+#define SPI_FRF_SPI 0x0
+#define SPI_FRF_SSP 0x1
+#define SPI_FRF_MICROWIRE 0x2
+#define SPI_FRF_RESV 0x3
+
+#define SPI_MODE_OFFSET 6
+#define SPI_SCPH_OFFSET 6
+#define SPI_SCOL_OFFSET 7
+#define SPI_TMOD_OFFSET 8
+#define SPI_TMOD_TR 0x0 /* xmit & recv */
+#define SPI_TMOD_TO 0x1 /* xmit only */
+#define SPI_TMOD_RO 0x2 /* recv only */
+#define SPI_TMOD_EPROMREAD 0x3 /* eeprom read mode */
+
+#define SPI_SLVOE_OFFSET 10
+#define SPI_SRL_OFFSET 11
+#define SPI_CFS_OFFSET 12
+
+/* Bit fields in SR, 7 bits */
+#define SR_MASK 0x7f /* cover 7 bits */
+#define SR_BUSY (1 << 0)
+#define SR_TF_NOT_FULL (1 << 1)
+#define SR_TF_EMPT (1 << 2)
+#define SR_RF_NOT_EMPT (1 << 3)
+#define SR_RF_FULL (1 << 4)
+#define SR_TX_ERR (1 << 5)
+#define SR_DCOL (1 << 6)
+
+struct dw_spi_reg {
+ u32 ctrl0;
+ u32 ctrl1;
+ u32 ssienr;
+ u32 mwcr;
+ u32 ser;
+ u32 baudr;
+ u32 txfltr;
+ u32 rxfltr;
+ u32 txflr;
+ u32 rxflr;
+ u32 sr;
+ u32 imr;
+ u32 isr;
+ u32 risr;
+ u32 txoicr;
+ u32 rxoicr;
+ u32 rxuicr;
+ u32 msticr;
+ u32 icr;
+ u32 dmacr;
+ u32 dmatdlr;
+ u32 dmardlr;
+ u32 idr;
+ u32 version;
+
+ /* Currently operates as 32 bits, though only the low 16 bits matter */
+ u32 dr;
+} __packed;
+
+#define dw_readl(dw, name) __raw_readl(&(dw)->name)
+#define dw_writel(dw, name, val) __raw_writel((val), &(dw)->name)
+
+/* Default use SPI0 register for mrst, we will detect Penwell and use SPI1 */
+static unsigned long mrst_spi_paddr = MRST_REGBASE_SPI0;
+
+static u32 *pclk_spi0;
+/* Always contains an accessable address, start with 0 */
+static struct dw_spi_reg *pspi;
+
+static struct kmsg_dumper dw_dumper;
+static int dumper_registered;
+
+static void dw_kmsg_dump(struct kmsg_dumper *dumper,
+ enum kmsg_dump_reason reason,
+ const char *s1, unsigned long l1,
+ const char *s2, unsigned long l2)
+{
+ int i;
+
+ /* When run to this, we'd better re-init the HW */
+ mrst_early_console_init();
+
+ for (i = 0; i < l1; i++)
+ early_mrst_console.write(&early_mrst_console, s1 + i, 1);
+ for (i = 0; i < l2; i++)
+ early_mrst_console.write(&early_mrst_console, s2 + i, 1);
+}
+
+/* Set the ratio rate to 115200, 8n1, IRQ disabled */
+static void max3110_write_config(void)
+{
+ u16 config;
+
+ config = 0xc001;
+ dw_writel(pspi, dr, config);
+}
+
+/* Translate char to a eligible word and send to max3110 */
+static void max3110_write_data(char c)
+{
+ u16 data;
+
+ data = 0x8000 | c;
+ dw_writel(pspi, dr, data);
+}
+
+void mrst_early_console_init(void)
+{
+ u32 ctrlr0 = 0;
+ u32 spi0_cdiv;
+ u32 freq; /* Freqency info only need be searched once */
+
+ /* Base clk is 100 MHz, the actual clk = 100M / (clk_divider + 1) */
+ pclk_spi0 = (void *)set_fixmap_offset_nocache(FIX_EARLYCON_MEM_BASE,
+ MRST_CLK_SPI0_REG);
+ spi0_cdiv = ((*pclk_spi0) & 0xe00) >> 9;
+ freq = 100000000 / (spi0_cdiv + 1);
+
+ if (mrst_identify_cpu() == MRST_CPU_CHIP_PENWELL)
+ mrst_spi_paddr = MRST_REGBASE_SPI1;
+
+ pspi = (void *)set_fixmap_offset_nocache(FIX_EARLYCON_MEM_BASE,
+ mrst_spi_paddr);
+
+ /* Disable SPI controller */
+ dw_writel(pspi, ssienr, 0);
+
+ /* Set control param, 8 bits, transmit only mode */
+ ctrlr0 = dw_readl(pspi, ctrl0);
+
+ ctrlr0 &= 0xfcc0;
+ ctrlr0 |= 0xf | (SPI_FRF_SPI << SPI_FRF_OFFSET)
+ | (SPI_TMOD_TO << SPI_TMOD_OFFSET);
+ dw_writel(pspi, ctrl0, ctrlr0);
+
+ /*
+ * Change the spi0 clk to comply with 115200 bps, use 100000 to
+ * calculate the clk dividor to make the clock a little slower
+ * than real baud rate.
+ */
+ dw_writel(pspi, baudr, freq/100000);
+
+ /* Disable all INT for early phase */
+ dw_writel(pspi, imr, 0x0);
+
+ /* Set the cs to spi-uart */
+ dw_writel(pspi, ser, 0x2);
+
+ /* Enable the HW, the last step for HW init */
+ dw_writel(pspi, ssienr, 0x1);
+
+ /* Set the default configuration */
+ max3110_write_config();
+
+ /* Register the kmsg dumper */
+ if (!dumper_registered) {
+ dw_dumper.dump = dw_kmsg_dump;
+ kmsg_dump_register(&dw_dumper);
+ dumper_registered = 1;
+ }
+}
+
+/* Slave select should be called in the read/write function */
+static void early_mrst_spi_putc(char c)
+{
+ unsigned int timeout;
+ u32 sr;
+
+ timeout = MRST_SPI_TIMEOUT;
+ /* Early putc needs to make sure the TX FIFO is not full */
+ while (--timeout) {
+ sr = dw_readl(pspi, sr);
+ if (!(sr & SR_TF_NOT_FULL))
+ cpu_relax();
+ else
+ break;
+ }
+
+ if (!timeout)
+ pr_warning("MRST earlycon: timed out\n");
+ else
+ max3110_write_data(c);
+}
+
+/* Early SPI only uses polling mode */
+static void early_mrst_spi_write(struct console *con, const char *str, unsigned n)
+{
+ int i;
+
+ for (i = 0; i < n && *str; i++) {
+ if (*str == '\n')
+ early_mrst_spi_putc('\r');
+ early_mrst_spi_putc(*str);
+ str++;
+ }
+}
+
+struct console early_mrst_console = {
+ .name = "earlymrst",
+ .write = early_mrst_spi_write,
+ .flags = CON_PRINTBUFFER,
+ .index = -1,
+};
+
+/*
+ * Following is the early console based on Medfield HSU (High
+ * Speed UART) device.
+ */
+#define HSU_PORT2_PADDR 0xffa28180
+
+static void __iomem *phsu;
+
+void hsu_early_console_init(void)
+{
+ u8 lcr;
+
+ phsu = (void *)set_fixmap_offset_nocache(FIX_EARLYCON_MEM_BASE,
+ HSU_PORT2_PADDR);
+
+ /* Disable FIFO */
+ writeb(0x0, phsu + UART_FCR);
+
+ /* Set to default 115200 bps, 8n1 */
+ lcr = readb(phsu + UART_LCR);
+ writeb((0x80 | lcr), phsu + UART_LCR);
+ writeb(0x18, phsu + UART_DLL);
+ writeb(lcr, phsu + UART_LCR);
+ writel(0x3600, phsu + UART_MUL*4);
+
+ writeb(0x8, phsu + UART_MCR);
+ writeb(0x7, phsu + UART_FCR);
+ writeb(0x3, phsu + UART_LCR);
+
+ /* Clear IRQ status */
+ readb(phsu + UART_LSR);
+ readb(phsu + UART_RX);
+ readb(phsu + UART_IIR);
+ readb(phsu + UART_MSR);
+
+ /* Enable FIFO */
+ writeb(0x7, phsu + UART_FCR);
+}
+
+#define BOTH_EMPTY (UART_LSR_TEMT | UART_LSR_THRE)
+
+static void early_hsu_putc(char ch)
+{
+ unsigned int timeout = 10000; /* 10ms */
+ u8 status;
+
+ while (--timeout) {
+ status = readb(phsu + UART_LSR);
+ if (status & BOTH_EMPTY)
+ break;
+ udelay(1);
+ }
+
+ /* Only write the char when there was no timeout */
+ if (timeout)
+ writeb(ch, phsu + UART_TX);
+}
+
+static void early_hsu_write(struct console *con, const char *str, unsigned n)
+{
+ int i;
+
+ for (i = 0; i < n && *str; i++) {
+ if (*str == '\n')
+ early_hsu_putc('\r');
+ early_hsu_putc(*str);
+ str++;
+ }
+}
+
+struct console early_hsu_console = {
+ .name = "earlyhsu",
+ .write = early_hsu_write,
+ .flags = CON_PRINTBUFFER,
+ .index = -1,
+};
/* unfortunately push/pop can't be no-op */
.macro PUSH_GS
- pushl $0
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $0
.endm
.macro POP_GS pop=0
addl $(4 + \pop), %esp
#else /* CONFIG_X86_32_LAZY_GS */
.macro PUSH_GS
- pushl %gs
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %gs
/*CFI_REL_OFFSET gs, 0*/
.endm
.macro POP_GS pop=0
-98: popl %gs
- CFI_ADJUST_CFA_OFFSET -4
+98: popl_cfi %gs
/*CFI_RESTORE gs*/
.if \pop <> 0
add $\pop, %esp
.macro SAVE_ALL
cld
PUSH_GS
- pushl %fs
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %fs
/*CFI_REL_OFFSET fs, 0;*/
- pushl %es
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %es
/*CFI_REL_OFFSET es, 0;*/
- pushl %ds
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ds
/*CFI_REL_OFFSET ds, 0;*/
- pushl %eax
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %eax
CFI_REL_OFFSET eax, 0
- pushl %ebp
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ebp
CFI_REL_OFFSET ebp, 0
- pushl %edi
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %edi
CFI_REL_OFFSET edi, 0
- pushl %esi
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %esi
CFI_REL_OFFSET esi, 0
- pushl %edx
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %edx
CFI_REL_OFFSET edx, 0
- pushl %ecx
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ecx
CFI_REL_OFFSET ecx, 0
- pushl %ebx
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ebx
CFI_REL_OFFSET ebx, 0
movl $(__USER_DS), %edx
movl %edx, %ds
.endm
.macro RESTORE_INT_REGS
- popl %ebx
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %ebx
CFI_RESTORE ebx
- popl %ecx
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %ecx
CFI_RESTORE ecx
- popl %edx
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %edx
CFI_RESTORE edx
- popl %esi
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %esi
CFI_RESTORE esi
- popl %edi
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %edi
CFI_RESTORE edi
- popl %ebp
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %ebp
CFI_RESTORE ebp
- popl %eax
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %eax
CFI_RESTORE eax
.endm
.macro RESTORE_REGS pop=0
RESTORE_INT_REGS
-1: popl %ds
- CFI_ADJUST_CFA_OFFSET -4
+1: popl_cfi %ds
/*CFI_RESTORE ds;*/
-2: popl %es
- CFI_ADJUST_CFA_OFFSET -4
+2: popl_cfi %es
/*CFI_RESTORE es;*/
-3: popl %fs
- CFI_ADJUST_CFA_OFFSET -4
+3: popl_cfi %fs
/*CFI_RESTORE fs;*/
POP_GS \pop
.pushsection .fixup, "ax"
ENTRY(ret_from_fork)
CFI_STARTPROC
- pushl %eax
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %eax
call schedule_tail
GET_THREAD_INFO(%ebp)
- popl %eax
- CFI_ADJUST_CFA_OFFSET -4
- pushl $0x0202 # Reset kernel eflags
- CFI_ADJUST_CFA_OFFSET 4
- popfl
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %eax
+ pushl_cfi $0x0202 # Reset kernel eflags
+ popfl_cfi
jmp syscall_exit
CFI_ENDPROC
END(ret_from_fork)
* enough kernel state to call TRACE_IRQS_OFF can be called - but
* we immediately enable interrupts at that point anyway.
*/
- pushl $(__USER_DS)
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $(__USER_DS)
/*CFI_REL_OFFSET ss, 0*/
- pushl %ebp
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ebp
CFI_REL_OFFSET esp, 0
- pushfl
+ pushfl_cfi
orl $X86_EFLAGS_IF, (%esp)
- CFI_ADJUST_CFA_OFFSET 4
- pushl $(__USER_CS)
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $(__USER_CS)
/*CFI_REL_OFFSET cs, 0*/
/*
* Push current_thread_info()->sysenter_return to the stack.
* A tiny bit of offset fixup is necessary - 4*4 means the 4 words
* pushed above; +8 corresponds to copy_thread's esp0 setting.
*/
- pushl (TI_sysenter_return-THREAD_SIZE+8+4*4)(%esp)
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi (TI_sysenter_return-THREAD_SIZE+8+4*4)(%esp)
CFI_REL_OFFSET eip, 0
- pushl %eax
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %eax
SAVE_ALL
ENABLE_INTERRUPTS(CLBR_NONE)
movl %eax,%edx /* 2nd arg: syscall number */
movl $AUDIT_ARCH_I386,%eax /* 1st arg: audit arch */
call audit_syscall_entry
- pushl %ebx
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ebx
movl PT_EAX(%esp),%eax /* reload syscall number */
jmp sysenter_do_call
# system call handler stub
ENTRY(system_call)
RING0_INT_FRAME # can't unwind into user space anyway
- pushl %eax # save orig_eax
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %eax # save orig_eax
SAVE_ALL
GET_THREAD_INFO(%ebp)
# system call tracing in operation / emulation
je ldt_ss # returning to user-space with LDT SS
restore_nocheck:
RESTORE_REGS 4 # skip orig_eax/error_code
- CFI_ADJUST_CFA_OFFSET -4
irq_return:
INTERRUPT_RETURN
.section .fixup,"ax"
shr $16, %edx
mov %dl, GDT_ESPFIX_SS + 4 /* bits 16..23 */
mov %dh, GDT_ESPFIX_SS + 7 /* bits 24..31 */
- pushl $__ESPFIX_SS
- CFI_ADJUST_CFA_OFFSET 4
- push %eax /* new kernel esp */
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $__ESPFIX_SS
+ pushl_cfi %eax /* new kernel esp */
/* Disable interrupts, but do not irqtrace this section: we
* will soon execute iret and the tracer was already set to
* the irqstate after the iret */
ALIGN
work_notifysig_v86:
- pushl %ecx # save ti_flags for do_notify_resume
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ecx # save ti_flags for do_notify_resume
call save_v86_state # %eax contains pt_regs pointer
- popl %ecx
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %ecx
movl %eax, %esp
#else
movl %esp, %eax
#define PTREGSCALL3(name) \
ALIGN; \
ptregs_##name: \
+ CFI_STARTPROC; \
leal 4(%esp),%eax; \
- pushl %eax; \
+ pushl_cfi %eax; \
movl PT_EDX(%eax),%ecx; \
movl PT_ECX(%eax),%edx; \
movl PT_EBX(%eax),%eax; \
call sys_##name; \
addl $4,%esp; \
- ret
+ CFI_ADJUST_CFA_OFFSET -4; \
+ ret; \
+ CFI_ENDPROC; \
+ENDPROC(ptregs_##name)
PTREGSCALL1(iopl)
PTREGSCALL0(fork)
/* Clone is an oddball. The 4th arg is in %edi */
ALIGN;
ptregs_clone:
+ CFI_STARTPROC
leal 4(%esp),%eax
- pushl %eax
- pushl PT_EDI(%eax)
+ pushl_cfi %eax
+ pushl_cfi PT_EDI(%eax)
movl PT_EDX(%eax),%ecx
movl PT_ECX(%eax),%edx
movl PT_EBX(%eax),%eax
call sys_clone
addl $8,%esp
+ CFI_ADJUST_CFA_OFFSET -8
ret
+ CFI_ENDPROC
+ENDPROC(ptregs_clone)
.macro FIXUP_ESPFIX_STACK
/*
mov GDT_ESPFIX_SS + 7, %ah /* bits 24..31 */
shl $16, %eax
addl %esp, %eax /* the adjusted stack pointer */
- pushl $__KERNEL_DS
- CFI_ADJUST_CFA_OFFSET 4
- pushl %eax
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $__KERNEL_DS
+ pushl_cfi %eax
lss (%esp), %esp /* switch to the normal stack segment */
CFI_ADJUST_CFA_OFFSET -8
.endm
.if vector <> FIRST_EXTERNAL_VECTOR
CFI_ADJUST_CFA_OFFSET -4
.endif
-1: pushl $(~vector+0x80) /* Note: always in signed byte range */
- CFI_ADJUST_CFA_OFFSET 4
+1: pushl_cfi $(~vector+0x80) /* Note: always in signed byte range */
.if ((vector-FIRST_EXTERNAL_VECTOR)%7) <> 6
jmp 2f
.endif
#define BUILD_INTERRUPT3(name, nr, fn) \
ENTRY(name) \
RING0_INT_FRAME; \
- pushl $~(nr); \
- CFI_ADJUST_CFA_OFFSET 4; \
+ pushl_cfi $~(nr); \
SAVE_ALL; \
TRACE_IRQS_OFF \
movl %esp,%eax; \
ENTRY(coprocessor_error)
RING0_INT_FRAME
- pushl $0
- CFI_ADJUST_CFA_OFFSET 4
- pushl $do_coprocessor_error
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $0
+ pushl_cfi $do_coprocessor_error
jmp error_code
CFI_ENDPROC
END(coprocessor_error)
ENTRY(simd_coprocessor_error)
RING0_INT_FRAME
- pushl $0
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $0
#ifdef CONFIG_X86_INVD_BUG
/* AMD 486 bug: invd from userspace calls exception 19 instead of #GP */
-661: pushl $do_general_protection
+661: pushl_cfi $do_general_protection
662:
.section .altinstructions,"a"
.balign 4
664:
.previous
#else
- pushl $do_simd_coprocessor_error
+ pushl_cfi $do_simd_coprocessor_error
#endif
- CFI_ADJUST_CFA_OFFSET 4
jmp error_code
CFI_ENDPROC
END(simd_coprocessor_error)
ENTRY(device_not_available)
RING0_INT_FRAME
- pushl $-1 # mark this as an int
- CFI_ADJUST_CFA_OFFSET 4
- pushl $do_device_not_available
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $-1 # mark this as an int
+ pushl_cfi $do_device_not_available
jmp error_code
CFI_ENDPROC
END(device_not_available)
ENTRY(overflow)
RING0_INT_FRAME
- pushl $0
- CFI_ADJUST_CFA_OFFSET 4
- pushl $do_overflow
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $0
+ pushl_cfi $do_overflow
jmp error_code
CFI_ENDPROC
END(overflow)
ENTRY(bounds)
RING0_INT_FRAME
- pushl $0
- CFI_ADJUST_CFA_OFFSET 4
- pushl $do_bounds
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $0
+ pushl_cfi $do_bounds
jmp error_code
CFI_ENDPROC
END(bounds)
ENTRY(invalid_op)
RING0_INT_FRAME
- pushl $0
- CFI_ADJUST_CFA_OFFSET 4
- pushl $do_invalid_op
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $0
+ pushl_cfi $do_invalid_op
jmp error_code
CFI_ENDPROC
END(invalid_op)
ENTRY(coprocessor_segment_overrun)
RING0_INT_FRAME
- pushl $0
- CFI_ADJUST_CFA_OFFSET 4
- pushl $do_coprocessor_segment_overrun
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $0
+ pushl_cfi $do_coprocessor_segment_overrun
jmp error_code
CFI_ENDPROC
END(coprocessor_segment_overrun)
ENTRY(invalid_TSS)
RING0_EC_FRAME
- pushl $do_invalid_TSS
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $do_invalid_TSS
jmp error_code
CFI_ENDPROC
END(invalid_TSS)
ENTRY(segment_not_present)
RING0_EC_FRAME
- pushl $do_segment_not_present
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $do_segment_not_present
jmp error_code
CFI_ENDPROC
END(segment_not_present)
ENTRY(stack_segment)
RING0_EC_FRAME
- pushl $do_stack_segment
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $do_stack_segment
jmp error_code
CFI_ENDPROC
END(stack_segment)
ENTRY(alignment_check)
RING0_EC_FRAME
- pushl $do_alignment_check
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $do_alignment_check
jmp error_code
CFI_ENDPROC
END(alignment_check)
ENTRY(divide_error)
RING0_INT_FRAME
- pushl $0 # no error code
- CFI_ADJUST_CFA_OFFSET 4
- pushl $do_divide_error
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $0 # no error code
+ pushl_cfi $do_divide_error
jmp error_code
CFI_ENDPROC
END(divide_error)
#ifdef CONFIG_X86_MCE
ENTRY(machine_check)
RING0_INT_FRAME
- pushl $0
- CFI_ADJUST_CFA_OFFSET 4
- pushl machine_check_vector
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $0
+ pushl_cfi machine_check_vector
jmp error_code
CFI_ENDPROC
END(machine_check)
ENTRY(spurious_interrupt_bug)
RING0_INT_FRAME
- pushl $0
- CFI_ADJUST_CFA_OFFSET 4
- pushl $do_spurious_interrupt_bug
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $0
+ pushl_cfi $do_spurious_interrupt_bug
jmp error_code
CFI_ENDPROC
END(spurious_interrupt_bug)
ENTRY(xen_hypervisor_callback)
CFI_STARTPROC
- pushl $0
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $0
SAVE_ALL
TRACE_IRQS_OFF
# We distinguish between categories by maintaining a status value in EAX.
ENTRY(xen_failsafe_callback)
CFI_STARTPROC
- pushl %eax
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %eax
movl $1,%eax
1: mov 4(%esp),%ds
2: mov 8(%esp),%es
3: mov 12(%esp),%fs
4: mov 16(%esp),%gs
testl %eax,%eax
- popl %eax
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %eax
lea 16(%esp),%esp
CFI_ADJUST_CFA_OFFSET -16
jz 5f
addl $16,%esp
jmp iret_exc # EAX != 0 => Category 2 (Bad IRET)
-5: pushl $0 # EAX == 0 => Category 1 (Bad segment)
- CFI_ADJUST_CFA_OFFSET 4
+5: pushl_cfi $0 # EAX == 0 => Category 1 (Bad segment)
SAVE_ALL
jmp ret_from_exception
CFI_ENDPROC
ENTRY(page_fault)
RING0_EC_FRAME
- pushl $do_page_fault
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $do_page_fault
ALIGN
error_code:
/* the function address is in %gs's slot on the stack */
- pushl %fs
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %fs
/*CFI_REL_OFFSET fs, 0*/
- pushl %es
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %es
/*CFI_REL_OFFSET es, 0*/
- pushl %ds
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ds
/*CFI_REL_OFFSET ds, 0*/
- pushl %eax
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %eax
CFI_REL_OFFSET eax, 0
- pushl %ebp
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ebp
CFI_REL_OFFSET ebp, 0
- pushl %edi
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %edi
CFI_REL_OFFSET edi, 0
- pushl %esi
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %esi
CFI_REL_OFFSET esi, 0
- pushl %edx
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %edx
CFI_REL_OFFSET edx, 0
- pushl %ecx
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ecx
CFI_REL_OFFSET ecx, 0
- pushl %ebx
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ebx
CFI_REL_OFFSET ebx, 0
cld
movl $(__KERNEL_PERCPU), %ecx
movl TSS_sysenter_sp0 + \offset(%esp), %esp
CFI_DEF_CFA esp, 0
CFI_UNDEFINED eip
- pushfl
- CFI_ADJUST_CFA_OFFSET 4
- pushl $__KERNEL_CS
- CFI_ADJUST_CFA_OFFSET 4
- pushl $sysenter_past_esp
- CFI_ADJUST_CFA_OFFSET 4
+ pushfl_cfi
+ pushl_cfi $__KERNEL_CS
+ pushl_cfi $sysenter_past_esp
CFI_REL_OFFSET eip, 0
.endm
jne debug_stack_correct
FIX_STACK 12, debug_stack_correct, debug_esp_fix_insn
debug_stack_correct:
- pushl $-1 # mark this as an int
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $-1 # mark this as an int
SAVE_ALL
TRACE_IRQS_OFF
xorl %edx,%edx # error code 0
*/
ENTRY(nmi)
RING0_INT_FRAME
- pushl %eax
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %eax
movl %ss, %eax
cmpw $__ESPFIX_SS, %ax
- popl %eax
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %eax
je nmi_espfix_stack
cmpl $ia32_sysenter_target,(%esp)
je nmi_stack_fixup
- pushl %eax
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %eax
movl %esp,%eax
/* Do not access memory above the end of our stack page,
* it might not exist.
*/
andl $(THREAD_SIZE-1),%eax
cmpl $(THREAD_SIZE-20),%eax
- popl %eax
- CFI_ADJUST_CFA_OFFSET -4
+ popl_cfi %eax
jae nmi_stack_correct
cmpl $ia32_sysenter_target,12(%esp)
je nmi_debug_stack_check
nmi_stack_correct:
/* We have a RING0_INT_FRAME here */
- pushl %eax
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %eax
SAVE_ALL
xorl %edx,%edx # zero error code
movl %esp,%eax # pt_regs pointer
*
* create the pointer to lss back
*/
- pushl %ss
- CFI_ADJUST_CFA_OFFSET 4
- pushl %esp
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %ss
+ pushl_cfi %esp
addl $4, (%esp)
/* copy the iret frame of 12 bytes */
.rept 3
- pushl 16(%esp)
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi 16(%esp)
.endr
- pushl %eax
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi %eax
SAVE_ALL
FIXUP_ESPFIX_STACK # %eax == %esp
xorl %edx,%edx # zero error code
ENTRY(int3)
RING0_INT_FRAME
- pushl $-1 # mark this as an int
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $-1 # mark this as an int
SAVE_ALL
TRACE_IRQS_OFF
xorl %edx,%edx # zero error code
ENTRY(general_protection)
RING0_EC_FRAME
- pushl $do_general_protection
- CFI_ADJUST_CFA_OFFSET 4
+ pushl_cfi $do_general_protection
jmp error_code
CFI_ENDPROC
END(general_protection)
.macro FAKE_STACK_FRAME child_rip
/* push in order ss, rsp, eflags, cs, rip */
xorl %eax, %eax
- pushq $__KERNEL_DS /* ss */
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi $__KERNEL_DS /* ss */
/*CFI_REL_OFFSET ss,0*/
- pushq %rax /* rsp */
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi %rax /* rsp */
CFI_REL_OFFSET rsp,0
- pushq $X86_EFLAGS_IF /* eflags - interrupts on */
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi $X86_EFLAGS_IF /* eflags - interrupts on */
/*CFI_REL_OFFSET rflags,0*/
- pushq $__KERNEL_CS /* cs */
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi $__KERNEL_CS /* cs */
/*CFI_REL_OFFSET cs,0*/
- pushq \child_rip /* rip */
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi \child_rip /* rip */
CFI_REL_OFFSET rip,0
- pushq %rax /* orig rax */
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi %rax /* orig rax */
.endm
.macro UNFAKE_STACK_FRAME
LOCK ; btr $TIF_FORK,TI_flags(%r8)
- push kernel_eflags(%rip)
- CFI_ADJUST_CFA_OFFSET 8
- popf # reset kernel eflags
- CFI_ADJUST_CFA_OFFSET -8
+ pushq_cfi kernel_eflags(%rip)
+ popfq_cfi # reset kernel eflags
call schedule_tail # rdi: 'prev' task parameter
jnc sysret_signal
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
- pushq %rdi
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi %rdi
call schedule
- popq %rdi
- CFI_ADJUST_CFA_OFFSET -8
+ popq_cfi %rdi
jmp sysret_check
/* Handle a signal */
jnc int_very_careful
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
- pushq %rdi
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi %rdi
call schedule
- popq %rdi
- CFI_ADJUST_CFA_OFFSET -8
+ popq_cfi %rdi
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp int_with_check
/* Check for syscall exit trace */
testl $_TIF_WORK_SYSCALL_EXIT,%edx
jz int_signal
- pushq %rdi
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi %rdi
leaq 8(%rsp),%rdi # &ptregs -> arg1
call syscall_trace_leave
- popq %rdi
- CFI_ADJUST_CFA_OFFSET -8
+ popq_cfi %rdi
andl $~(_TIF_WORK_SYSCALL_EXIT|_TIF_SYSCALL_EMU),%edi
jmp int_restore_rest
ENTRY(stub_execve)
CFI_STARTPROC
- popq %r11
- CFI_ADJUST_CFA_OFFSET -8
- CFI_REGISTER rip, r11
+ addq $8, %rsp
+ PARTIAL_FRAME 0
SAVE_REST
FIXUP_TOP_OF_STACK %r11
movq %rsp, %rcx
ENTRY(stub_rt_sigreturn)
CFI_STARTPROC
addq $8, %rsp
- CFI_ADJUST_CFA_OFFSET -8
+ PARTIAL_FRAME 0
SAVE_REST
movq %rsp,%rdi
FIXUP_TOP_OF_STACK %r11
.if vector <> FIRST_EXTERNAL_VECTOR
CFI_ADJUST_CFA_OFFSET -8
.endif
-1: pushq $(~vector+0x80) /* Note: always in signed byte range */
- CFI_ADJUST_CFA_OFFSET 8
+1: pushq_cfi $(~vector+0x80) /* Note: always in signed byte range */
.if ((vector-FIRST_EXTERNAL_VECTOR)%7) <> 6
jmp 2f
.endif
/* 0(%rsp): ~(interrupt number) */
.macro interrupt func
- subq $10*8, %rsp
- CFI_ADJUST_CFA_OFFSET 10*8
+ subq $ORIG_RAX-ARGOFFSET+8, %rsp
+ CFI_ADJUST_CFA_OFFSET ORIG_RAX-ARGOFFSET+8
call save_args
PARTIAL_FRAME 0
call \func
TRACE_IRQS_OFF
decl PER_CPU_VAR(irq_count)
leaveq
+ CFI_RESTORE rbp
CFI_DEF_CFA_REGISTER rsp
CFI_ADJUST_CFA_OFFSET -8
exit_intr:
jnc retint_signal
TRACE_IRQS_ON
ENABLE_INTERRUPTS(CLBR_NONE)
- pushq %rdi
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi %rdi
call schedule
- popq %rdi
- CFI_ADJUST_CFA_OFFSET -8
+ popq_cfi %rdi
GET_THREAD_INFO(%rcx)
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
.macro apicinterrupt num sym do_sym
ENTRY(\sym)
INTR_FRAME
- pushq $~(\num)
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi $~(\num)
interrupt \do_sym
jmp ret_from_intr
CFI_ENDPROC
apicinterrupt SPURIOUS_APIC_VECTOR \
spurious_interrupt smp_spurious_interrupt
-#ifdef CONFIG_PERF_EVENTS
-apicinterrupt LOCAL_PENDING_VECTOR \
- perf_pending_interrupt smp_perf_pending_interrupt
+#ifdef CONFIG_IRQ_WORK
+apicinterrupt IRQ_WORK_VECTOR \
+ irq_work_interrupt smp_irq_work_interrupt
#endif
/*
INTR_FRAME
PARAVIRT_ADJUST_EXCEPTION_FRAME
pushq_cfi $-1 /* ORIG_RAX: no syscall to restart */
- subq $15*8,%rsp
- CFI_ADJUST_CFA_OFFSET 15*8
+ subq $ORIG_RAX-R15, %rsp
+ CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
call error_entry
DEFAULT_FRAME 0
movq %rsp,%rdi /* pt_regs pointer */
ENTRY(\sym)
INTR_FRAME
PARAVIRT_ADJUST_EXCEPTION_FRAME
- pushq $-1 /* ORIG_RAX: no syscall to restart */
- CFI_ADJUST_CFA_OFFSET 8
- subq $15*8, %rsp
+ pushq_cfi $-1 /* ORIG_RAX: no syscall to restart */
+ subq $ORIG_RAX-R15, %rsp
+ CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
call save_paranoid
TRACE_IRQS_OFF
movq %rsp,%rdi /* pt_regs pointer */
ENTRY(\sym)
INTR_FRAME
PARAVIRT_ADJUST_EXCEPTION_FRAME
- pushq $-1 /* ORIG_RAX: no syscall to restart */
- CFI_ADJUST_CFA_OFFSET 8
- subq $15*8, %rsp
+ pushq_cfi $-1 /* ORIG_RAX: no syscall to restart */
+ subq $ORIG_RAX-R15, %rsp
+ CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
call save_paranoid
TRACE_IRQS_OFF
movq %rsp,%rdi /* pt_regs pointer */
ENTRY(\sym)
XCPT_FRAME
PARAVIRT_ADJUST_EXCEPTION_FRAME
- subq $15*8,%rsp
- CFI_ADJUST_CFA_OFFSET 15*8
+ subq $ORIG_RAX-R15, %rsp
+ CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
call error_entry
DEFAULT_FRAME 0
movq %rsp,%rdi /* pt_regs pointer */
ENTRY(\sym)
XCPT_FRAME
PARAVIRT_ADJUST_EXCEPTION_FRAME
- subq $15*8,%rsp
- CFI_ADJUST_CFA_OFFSET 15*8
+ subq $ORIG_RAX-R15, %rsp
+ CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
call save_paranoid
DEFAULT_FRAME 0
TRACE_IRQS_OFF
/* edi: new selector */
ENTRY(native_load_gs_index)
CFI_STARTPROC
- pushf
- CFI_ADJUST_CFA_OFFSET 8
+ pushfq_cfi
DISABLE_INTERRUPTS(CLBR_ANY & ~CLBR_RDI)
SWAPGS
gs_change:
movl %edi,%gs
2: mfence /* workaround */
SWAPGS
- popf
- CFI_ADJUST_CFA_OFFSET -8
+ popfq_cfi
ret
CFI_ENDPROC
END(native_load_gs_index)
/* Call softirq on interrupt stack. Interrupts are off. */
ENTRY(call_softirq)
CFI_STARTPROC
- push %rbp
- CFI_ADJUST_CFA_OFFSET 8
+ pushq_cfi %rbp
CFI_REL_OFFSET rbp,0
mov %rsp,%rbp
CFI_DEF_CFA_REGISTER rbp
push %rbp # backlink for old unwinder
call __do_softirq
leaveq
+ CFI_RESTORE rbp
CFI_DEF_CFA_REGISTER rsp
CFI_ADJUST_CFA_OFFSET -8
decl PER_CPU_VAR(irq_count)
/* ebx: no swapgs flag */
ENTRY(paranoid_exit)
- INTR_FRAME
+ DEFAULT_FRAME
DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
testl %ebx,%ebx /* swapgs needed? */
error_sti:
TRACE_IRQS_OFF
ret
- CFI_ENDPROC
/*
* There are two places in the kernel that can potentially fault with
/* Fix truncated RIP */
movq %rcx,RIP+8(%rsp)
jmp error_swapgs
+ CFI_ENDPROC
END(error_entry)
INTR_FRAME
PARAVIRT_ADJUST_EXCEPTION_FRAME
pushq_cfi $-1
- subq $15*8, %rsp
- CFI_ADJUST_CFA_OFFSET 15*8
+ subq $ORIG_RAX-R15, %rsp
+ CFI_ADJUST_CFA_OFFSET ORIG_RAX-R15
call save_paranoid
DEFAULT_FRAME 0
/* paranoidentry do_nmi, 0; without TRACE_IRQS_OFF */
return mod_code_status;
}
-
-
-
-static unsigned char ftrace_nop[MCOUNT_INSN_SIZE];
-
static unsigned char *ftrace_nop_replace(void)
{
- return ftrace_nop;
+ return ideal_nop5;
}
static int
int __init ftrace_dyn_arch_init(void *data)
{
- extern const unsigned char ftrace_test_p6nop[];
- extern const unsigned char ftrace_test_nop5[];
- extern const unsigned char ftrace_test_jmp[];
- int faulted = 0;
-
- /*
- * There is no good nop for all x86 archs.
- * We will default to using the P6_NOP5, but first we
- * will test to make sure that the nop will actually
- * work on this CPU. If it faults, we will then
- * go to a lesser efficient 5 byte nop. If that fails
- * we then just use a jmp as our nop. This isn't the most
- * efficient nop, but we can not use a multi part nop
- * since we would then risk being preempted in the middle
- * of that nop, and if we enabled tracing then, it might
- * cause a system crash.
- *
- * TODO: check the cpuid to determine the best nop.
- */
- asm volatile (
- "ftrace_test_jmp:"
- "jmp ftrace_test_p6nop\n"
- "nop\n"
- "nop\n"
- "nop\n" /* 2 byte jmp + 3 bytes */
- "ftrace_test_p6nop:"
- P6_NOP5
- "jmp 1f\n"
- "ftrace_test_nop5:"
- ".byte 0x66,0x66,0x66,0x66,0x90\n"
- "1:"
- ".section .fixup, \"ax\"\n"
- "2: movl $1, %0\n"
- " jmp ftrace_test_nop5\n"
- "3: movl $2, %0\n"
- " jmp 1b\n"
- ".previous\n"
- _ASM_EXTABLE(ftrace_test_p6nop, 2b)
- _ASM_EXTABLE(ftrace_test_nop5, 3b)
- : "=r"(faulted) : "0" (faulted));
-
- switch (faulted) {
- case 0:
- pr_info("converting mcount calls to 0f 1f 44 00 00\n");
- memcpy(ftrace_nop, ftrace_test_p6nop, MCOUNT_INSN_SIZE);
- break;
- case 1:
- pr_info("converting mcount calls to 66 66 66 66 90\n");
- memcpy(ftrace_nop, ftrace_test_nop5, MCOUNT_INSN_SIZE);
- break;
- case 2:
- pr_info("converting mcount calls to jmp . + 5\n");
- memcpy(ftrace_nop, ftrace_test_jmp, MCOUNT_INSN_SIZE);
- break;
- }
-
/* The return code is retured via data */
*(unsigned long *)data = 0;
unsigned long hpet_address;
u8 hpet_blockid; /* OS timer block num */
u8 hpet_msi_disable;
-u8 hpet_readback_cmp;
#ifdef CONFIG_PCI_MSI
static unsigned long hpet_num_timers;
* at that point and we would wait for the next hpet interrupt
* forever. We found out that reading the CMP register back
* forces the transfer so we can rely on the comparison with
- * the counter register below.
+ * the counter register below. If the read back from the
+ * compare register does not match the value we programmed
+ * then we might have a real hardware problem. We can not do
+ * much about it here, but at least alert the user/admin with
+ * a prominent warning.
*
- * That works fine on those ATI chipsets, but on newer Intel
- * chipsets (ICH9...) this triggers due to an erratum: Reading
- * the comparator immediately following a write is returning
- * the old value.
+ * An erratum on some chipsets (ICH9,..), results in
+ * comparator read immediately following a write returning old
+ * value. Workaround for this is to read this value second
+ * time, when first read returns old value.
*
- * We restrict the read back to the affected ATI chipsets (set
- * by quirks) and also run it with hpet=verbose for debugging
- * purposes.
+ * In fact the write to the comparator register is delayed up
+ * to two HPET cycles so the workaround we tried to restrict
+ * the readback to those known to be borked ATI chipsets
+ * failed miserably. So we give up on optimizations forever
+ * and penalize all HPET incarnations unconditionally.
*/
- if (hpet_readback_cmp || hpet_verbose) {
- u32 cmp = hpet_readl(HPET_Tn_CMP(timer));
-
- if (cmp != cnt)
+ if (unlikely((u32)hpet_readl(HPET_Tn_CMP(timer)) != cnt)) {
+ if (hpet_readl(HPET_Tn_CMP(timer)) != cnt)
printk_once(KERN_WARNING
- "hpet: compare register read back failed.\n");
+ "hpet: compare register read back failed.\n");
}
return (s32)(hpet_readl(HPET_COUNTER) - cnt) >= 0 ? -ETIME : 0;
{
unsigned int irq;
- irq = create_irq();
+ irq = create_irq_nr(0, -1);
if (!irq)
return -EINVAL;
int arch_bp_generic_fields(int x86_len, int x86_type,
int *gen_len, int *gen_type)
{
- /* Len */
- switch (x86_len) {
- case X86_BREAKPOINT_LEN_X:
+ /* Type */
+ switch (x86_type) {
+ case X86_BREAKPOINT_EXECUTE:
+ if (x86_len != X86_BREAKPOINT_LEN_X)
+ return -EINVAL;
+
+ *gen_type = HW_BREAKPOINT_X;
*gen_len = sizeof(long);
+ return 0;
+ case X86_BREAKPOINT_WRITE:
+ *gen_type = HW_BREAKPOINT_W;
break;
+ case X86_BREAKPOINT_RW:
+ *gen_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ /* Len */
+ switch (x86_len) {
case X86_BREAKPOINT_LEN_1:
*gen_len = HW_BREAKPOINT_LEN_1;
break;
return -EINVAL;
}
- /* Type */
- switch (x86_type) {
- case X86_BREAKPOINT_EXECUTE:
- *gen_type = HW_BREAKPOINT_X;
- break;
- case X86_BREAKPOINT_WRITE:
- *gen_type = HW_BREAKPOINT_W;
- break;
- case X86_BREAKPOINT_RW:
- *gen_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;
- break;
- default:
- return -EINVAL;
- }
-
return 0;
}
ret = -EINVAL;
switch (info->len) {
- case X86_BREAKPOINT_LEN_X:
- align = sizeof(long) -1;
- break;
case X86_BREAKPOINT_LEN_1:
align = 0;
break;
*/
if (!HAVE_HWFP) {
+ /*
+ * Disable xsave as we do not support it if i387
+ * emulation is enabled.
+ */
+ setup_clear_cpu_cap(X86_FEATURE_XSAVE);
+ setup_clear_cpu_cap(X86_FEATURE_XSAVEOPT);
xstate_size = sizeof(struct i387_soft_struct);
return;
}
if (cpu_has_fxsr)
xstate_size = sizeof(struct i387_fxsave_struct);
-#ifdef CONFIG_X86_32
else
xstate_size = sizeof(struct i387_fsave_struct);
-#endif
}
-#ifdef CONFIG_X86_64
/*
* Called at bootup to set up the initial FPU state that is later cloned
* into all processes.
void __cpuinit fpu_init(void)
{
- unsigned long oldcr0 = read_cr0();
-
- set_in_cr4(X86_CR4_OSFXSR);
- set_in_cr4(X86_CR4_OSXMMEXCPT);
+ unsigned long cr0;
+ unsigned long cr4_mask = 0;
- write_cr0(oldcr0 & ~(X86_CR0_TS|X86_CR0_EM)); /* clear TS and EM */
+ if (cpu_has_fxsr)
+ cr4_mask |= X86_CR4_OSFXSR;
+ if (cpu_has_xmm)
+ cr4_mask |= X86_CR4_OSXMMEXCPT;
+ if (cr4_mask)
+ set_in_cr4(cr4_mask);
+
+ cr0 = read_cr0();
+ cr0 &= ~(X86_CR0_TS|X86_CR0_EM); /* clear TS and EM */
+ if (!HAVE_HWFP)
+ cr0 |= X86_CR0_EM;
+ write_cr0(cr0);
if (!smp_processor_id())
init_thread_xstate();
clear_used_math();
}
-#else /* CONFIG_X86_64 */
-
-void __cpuinit fpu_init(void)
-{
- if (!smp_processor_id())
- init_thread_xstate();
-}
-
-#endif /* CONFIG_X86_32 */
-
void fpu_finit(struct fpu *fpu)
{
-#ifdef CONFIG_X86_32
if (!HAVE_HWFP) {
finit_soft_fpu(&fpu->state->soft);
return;
}
-#endif
if (cpu_has_fxsr) {
struct i387_fxsave_struct *fx = &fpu->state->fxsave;
#ifdef CONFIG_X86_64
env->fip = fxsave->rip;
env->foo = fxsave->rdp;
+ /*
+ * should be actually ds/cs at fpu exception time, but
+ * that information is not available in 64bit mode.
+ */
+ env->fcs = task_pt_regs(tsk)->cs;
if (tsk == current) {
- /*
- * should be actually ds/cs at fpu exception time, but
- * that information is not available in 64bit mode.
- */
- asm("mov %%ds, %[fos]" : [fos] "=r" (env->fos));
- asm("mov %%cs, %[fcs]" : [fcs] "=r" (env->fcs));
+ savesegment(ds, env->fos);
} else {
- struct pt_regs *regs = task_pt_regs(tsk);
-
- env->fos = 0xffff0000 | tsk->thread.ds;
- env->fcs = regs->cs;
+ env->fos = tsk->thread.ds;
}
+ env->fos |= 0xffff0000;
#else
env->fip = fxsave->fip;
env->fcs = (u16) fxsave->fcs | ((u32) fxsave->fop << 16);
for_each_online_cpu(j)
seq_printf(p, "%10u ", irq_stats(j)->apic_perf_irqs);
seq_printf(p, " Performance monitoring interrupts\n");
- seq_printf(p, "%*s: ", prec, "PND");
+ seq_printf(p, "%*s: ", prec, "IWI");
for_each_online_cpu(j)
- seq_printf(p, "%10u ", irq_stats(j)->apic_pending_irqs);
- seq_printf(p, " Performance pending work\n");
+ seq_printf(p, "%10u ", irq_stats(j)->apic_irq_work_irqs);
+ seq_printf(p, " IRQ work interrupts\n");
#endif
if (x86_platform_ipi_callback) {
seq_printf(p, "%*s: ", prec, "PLT");
sum += irq_stats(cpu)->apic_timer_irqs;
sum += irq_stats(cpu)->irq_spurious_count;
sum += irq_stats(cpu)->apic_perf_irqs;
- sum += irq_stats(cpu)->apic_pending_irqs;
+ sum += irq_stats(cpu)->apic_irq_work_irqs;
#endif
if (x86_platform_ipi_callback)
sum += irq_stats(cpu)->x86_platform_ipis;
--- /dev/null
+/*
+ * x86 specific code for irq_work
+ *
+ * Copyright (C) 2010 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
+ */
+
+#include <linux/kernel.h>
+#include <linux/irq_work.h>
+#include <linux/hardirq.h>
+#include <asm/apic.h>
+
+void smp_irq_work_interrupt(struct pt_regs *regs)
+{
+ irq_enter();
+ ack_APIC_irq();
+ inc_irq_stat(apic_irq_work_irqs);
+ irq_work_run();
+ irq_exit();
+}
+
+void arch_irq_work_raise(void)
+{
+#ifdef CONFIG_X86_LOCAL_APIC
+ if (!cpu_has_apic)
+ return;
+
+ apic->send_IPI_self(IRQ_WORK_VECTOR);
+ apic_wait_icr_idle();
+#endif
+}
alloc_intr_gate(SPURIOUS_APIC_VECTOR, spurious_interrupt);
alloc_intr_gate(ERROR_APIC_VECTOR, error_interrupt);
- /* Performance monitoring interrupts: */
-# ifdef CONFIG_PERF_EVENTS
- alloc_intr_gate(LOCAL_PENDING_VECTOR, perf_pending_interrupt);
+ /* IRQ work interrupts: */
+# ifdef CONFIG_IRQ_WORK
+ alloc_intr_gate(IRQ_WORK_VECTOR, irq_work_interrupt);
# endif
#endif
--- /dev/null
+/*
+ * jump label x86 support
+ *
+ * Copyright (C) 2009 Jason Baron <jbaron@redhat.com>
+ *
+ */
+#include <linux/jump_label.h>
+#include <linux/memory.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/jhash.h>
+#include <linux/cpu.h>
+#include <asm/kprobes.h>
+#include <asm/alternative.h>
+
+#ifdef HAVE_JUMP_LABEL
+
+union jump_code_union {
+ char code[JUMP_LABEL_NOP_SIZE];
+ struct {
+ char jump;
+ int offset;
+ } __attribute__((packed));
+};
+
+void arch_jump_label_transform(struct jump_entry *entry,
+ enum jump_label_type type)
+{
+ union jump_code_union code;
+
+ if (type == JUMP_LABEL_ENABLE) {
+ code.jump = 0xe9;
+ code.offset = entry->target -
+ (entry->code + JUMP_LABEL_NOP_SIZE);
+ } else
+ memcpy(&code, ideal_nop5, JUMP_LABEL_NOP_SIZE);
+ get_online_cpus();
+ mutex_lock(&text_mutex);
+ text_poke_smp((void *)entry->code, &code, JUMP_LABEL_NOP_SIZE);
+ mutex_unlock(&text_mutex);
+ put_online_cpus();
+}
+
+void arch_jump_label_text_poke_early(jump_label_t addr)
+{
+ text_poke_early((void *)addr, ideal_nop5, JUMP_LABEL_NOP_SIZE);
+}
+
+#endif
return 0;
}
-/* Dummy buffers for kallsyms_lookup */
-static char __dummy_buf[KSYM_NAME_LEN];
-
/* Check if paddr is at an instruction boundary */
static int __kprobes can_probe(unsigned long paddr)
{
struct insn insn;
kprobe_opcode_t buf[MAX_INSN_SIZE];
- if (!kallsyms_lookup(paddr, NULL, &offset, NULL, __dummy_buf))
+ if (!kallsyms_lookup_size_offset(paddr, NULL, &offset))
return 0;
/* Decode instructions */
*(unsigned long *)addr = val;
}
-void __kprobes kprobes_optinsn_template_holder(void)
+static void __used __kprobes kprobes_optinsn_template_holder(void)
{
asm volatile (
".global optprobe_template_entry\n"
}
/* Check whether the address range is reserved */
if (ftrace_text_reserved(src, src + len - 1) ||
- alternatives_text_reserved(src, src + len - 1))
+ alternatives_text_reserved(src, src + len - 1) ||
+ jump_label_text_reserved(src, src + len - 1))
return -EBUSY;
return len;
unsigned long addr, size = 0, offset = 0;
struct insn insn;
kprobe_opcode_t buf[MAX_INSN_SIZE];
- /* Dummy buffers for lookup_symbol_attrs */
- static char __dummy_buf[KSYM_NAME_LEN];
/* Lookup symbol including addr */
- if (!kallsyms_lookup(paddr, &size, &offset, NULL, __dummy_buf))
+ if (!kallsyms_lookup_size_offset(paddr, &size, &offset))
return 0;
/* Check there is enough space for a relative jump. */
if (!page)
goto out;
pud = (pud_t *)page_address(page);
- memset(pud, 0, PAGE_SIZE);
+ clear_page(pud);
set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
}
pud = pud_offset(pgd, addr);
if (!page)
goto out;
pmd = (pmd_t *)page_address(page);
- memset(pmd, 0, PAGE_SIZE);
+ clear_page(pmd);
set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
}
pmd = pmd_offset(pud, addr);
apply_paravirt(pseg, pseg + para->sh_size);
}
- return module_bug_finalize(hdr, sechdrs, me);
+ /* make jump label nops */
+ jump_label_apply_nops(me);
+
+ return 0;
}
void module_arch_cleanup(struct module *mod)
{
alternatives_smp_module_del(mod);
- module_bug_cleanup(mod);
}
--- /dev/null
+/*
+ * Support for features of the OLPC XO-1 laptop
+ *
+ * Copyright (C) 2010 One Laptop per Child
+ * Copyright (C) 2006 Red Hat, Inc.
+ * Copyright (C) 2006 Advanced Micro Devices, Inc.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/module.h>
+#include <linux/pci.h>
+#include <linux/pci_ids.h>
+#include <linux/platform_device.h>
+#include <linux/pm.h>
+
+#include <asm/io.h>
+#include <asm/olpc.h>
+
+#define DRV_NAME "olpc-xo1"
+
+#define PMS_BAR 4
+#define ACPI_BAR 5
+
+/* PMC registers (PMS block) */
+#define PM_SCLK 0x10
+#define PM_IN_SLPCTL 0x20
+#define PM_WKXD 0x34
+#define PM_WKD 0x30
+#define PM_SSC 0x54
+
+/* PM registers (ACPI block) */
+#define PM1_CNT 0x08
+#define PM_GPE0_STS 0x18
+
+static unsigned long acpi_base;
+static unsigned long pms_base;
+
+static void xo1_power_off(void)
+{
+ printk(KERN_INFO "OLPC XO-1 power off sequence...\n");
+
+ /* Enable all of these controls with 0 delay */
+ outl(0x40000000, pms_base + PM_SCLK);
+ outl(0x40000000, pms_base + PM_IN_SLPCTL);
+ outl(0x40000000, pms_base + PM_WKXD);
+ outl(0x40000000, pms_base + PM_WKD);
+
+ /* Clear status bits (possibly unnecessary) */
+ outl(0x0002ffff, pms_base + PM_SSC);
+ outl(0xffffffff, acpi_base + PM_GPE0_STS);
+
+ /* Write SLP_EN bit to start the machinery */
+ outl(0x00002000, acpi_base + PM1_CNT);
+}
+
+/* Read the base addresses from the PCI BAR info */
+static int __devinit setup_bases(struct pci_dev *pdev)
+{
+ int r;
+
+ r = pci_enable_device_io(pdev);
+ if (r) {
+ dev_err(&pdev->dev, "can't enable device IO\n");
+ return r;
+ }
+
+ r = pci_request_region(pdev, ACPI_BAR, DRV_NAME);
+ if (r) {
+ dev_err(&pdev->dev, "can't alloc PCI BAR #%d\n", ACPI_BAR);
+ return r;
+ }
+
+ r = pci_request_region(pdev, PMS_BAR, DRV_NAME);
+ if (r) {
+ dev_err(&pdev->dev, "can't alloc PCI BAR #%d\n", PMS_BAR);
+ pci_release_region(pdev, ACPI_BAR);
+ return r;
+ }
+
+ acpi_base = pci_resource_start(pdev, ACPI_BAR);
+ pms_base = pci_resource_start(pdev, PMS_BAR);
+
+ return 0;
+}
+
+static int __devinit olpc_xo1_probe(struct platform_device *pdev)
+{
+ struct pci_dev *pcidev;
+ int r;
+
+ pcidev = pci_get_device(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CS5536_ISA,
+ NULL);
+ if (!pdev)
+ return -ENODEV;
+
+ r = setup_bases(pcidev);
+ if (r)
+ return r;
+
+ pm_power_off = xo1_power_off;
+
+ printk(KERN_INFO "OLPC XO-1 support registered\n");
+ return 0;
+}
+
+static int __devexit olpc_xo1_remove(struct platform_device *pdev)
+{
+ pm_power_off = NULL;
+ return 0;
+}
+
+static struct platform_driver olpc_xo1_driver = {
+ .driver = {
+ .name = DRV_NAME,
+ .owner = THIS_MODULE,
+ },
+ .probe = olpc_xo1_probe,
+ .remove = __devexit_p(olpc_xo1_remove),
+};
+
+static int __init olpc_xo1_init(void)
+{
+ return platform_driver_register(&olpc_xo1_driver);
+}
+
+static void __exit olpc_xo1_exit(void)
+{
+ platform_driver_unregister(&olpc_xo1_driver);
+}
+
+MODULE_AUTHOR("Daniel Drake <dsd@laptop.org>");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS("platform:olpc-xo1");
+
+module_init(olpc_xo1_init);
+module_exit(olpc_xo1_exit);
#include <linux/spinlock.h>
#include <linux/io.h>
#include <linux/string.h>
+#include <linux/platform_device.h>
#include <asm/geode.h>
#include <asm/setup.h>
unsigned long flags;
int ret = -EIO;
int i;
+ int restarts = 0;
spin_lock_irqsave(&ec_lock, flags);
if (wait_on_obf(0x6c, 1)) {
printk(KERN_ERR "olpc-ec: timeout waiting for"
" EC to provide data!\n");
- goto restart;
+ if (restarts++ < 10)
+ goto restart;
+ goto err;
}
outbuf[i] = inb(0x68);
pr_devel("olpc-ec: received 0x%x\n", outbuf[i]);
}
EXPORT_SYMBOL_GPL(olpc_ec_cmd);
-#ifdef CONFIG_OLPC_OPENFIRMWARE
-static void __init platform_detect(void)
+static bool __init check_ofw_architecture(void)
+{
+ size_t propsize;
+ char olpc_arch[5];
+ const void *args[] = { NULL, "architecture", olpc_arch, (void *)5 };
+ void *res[] = { &propsize };
+
+ if (olpc_ofw("getprop", args, res)) {
+ printk(KERN_ERR "ofw: getprop call failed!\n");
+ return false;
+ }
+ return propsize == 5 && strncmp("OLPC", olpc_arch, 5) == 0;
+}
+
+static u32 __init get_board_revision(void)
{
size_t propsize;
__be32 rev;
if (olpc_ofw("getprop", args, res) || propsize != 4) {
printk(KERN_ERR "ofw: getprop call failed!\n");
- rev = cpu_to_be32(0);
+ return cpu_to_be32(0);
}
- olpc_platform_info.boardrev = be32_to_cpu(rev);
+ return be32_to_cpu(rev);
}
-#else
-static void __init platform_detect(void)
+
+static bool __init platform_detect(void)
{
- /* stopgap until OFW support is added to the kernel */
- olpc_platform_info.boardrev = olpc_board(0xc2);
+ if (!check_ofw_architecture())
+ return false;
+ olpc_platform_info.flags |= OLPC_F_PRESENT;
+ olpc_platform_info.boardrev = get_board_revision();
+ return true;
}
-#endif
-static int __init olpc_init(void)
+static int __init add_xo1_platform_devices(void)
{
- unsigned char *romsig;
+ struct platform_device *pdev;
- /* The ioremap check is dangerous; limit what we run it on */
- if (!is_geode() || cs5535_has_vsa2())
- return 0;
+ pdev = platform_device_register_simple("xo1-rfkill", -1, NULL, 0);
+ if (IS_ERR(pdev))
+ return PTR_ERR(pdev);
- spin_lock_init(&ec_lock);
+ pdev = platform_device_register_simple("olpc-xo1", -1, NULL, 0);
+ if (IS_ERR(pdev))
+ return PTR_ERR(pdev);
- romsig = ioremap(0xffffffc0, 16);
- if (!romsig)
- return 0;
+ return 0;
+}
- if (strncmp(romsig, "CL1 Q", 7))
- goto unmap;
- if (strncmp(romsig+6, romsig+13, 3)) {
- printk(KERN_INFO "OLPC BIOS signature looks invalid. "
- "Assuming not OLPC\n");
- goto unmap;
- }
+static int __init olpc_init(void)
+{
+ int r = 0;
- printk(KERN_INFO "OLPC board with OpenFirmware %.16s\n", romsig);
- olpc_platform_info.flags |= OLPC_F_PRESENT;
+ if (!olpc_ofw_present() || !platform_detect())
+ return 0;
- /* get the platform revision */
- platform_detect();
+ spin_lock_init(&ec_lock);
/* assume B1 and above models always have a DCON */
if (olpc_board_at_least(olpc_board(0xb1)))
(unsigned char *) &olpc_platform_info.ecver, 1);
#ifdef CONFIG_PCI_OLPC
- /* If the VSA exists let it emulate PCI, if not emulate in kernel */
- if (!cs5535_has_vsa2())
+ /* If the VSA exists let it emulate PCI, if not emulate in kernel.
+ * XO-1 only. */
+ if (olpc_platform_info.boardrev < olpc_board_pre(0xd0) &&
+ !cs5535_has_vsa2())
x86_init.pci.arch_init = pci_olpc_init;
#endif
olpc_platform_info.boardrev >> 4,
olpc_platform_info.ecver);
-unmap:
- iounmap(romsig);
+ if (olpc_platform_info.boardrev < olpc_board_pre(0xd0)) { /* XO-1 */
+ r = add_xo1_platform_devices();
+ if (r)
+ return r;
+ }
+
return 0;
}
}
EXPORT_SYMBOL_GPL(__olpc_ofw);
+bool olpc_ofw_present(void)
+{
+ return olpc_ofw_cif != NULL;
+}
+EXPORT_SYMBOL_GPL(olpc_ofw_present);
+
/* OFW cif _should_ be above this address */
#define OFW_MIN 0xff000000
.alloc_pte = paravirt_nop,
.alloc_pmd = paravirt_nop,
- .alloc_pmd_clone = paravirt_nop,
.alloc_pud = paravirt_nop,
.release_pte = paravirt_nop,
.release_pmd = paravirt_nop,
#include <asm/cacheflush.h>
#include <asm/swiotlb.h>
#include <asm/dma.h>
-#include <asm/k8.h>
+#include <asm/amd_nb.h>
#include <asm/x86_init.h>
static unsigned long iommu_bus_base; /* GART remapping area (physical) */
{
int i;
- for (i = 0; i < num_k8_northbridges; i++) {
- struct pci_dev *dev = k8_northbridges[i];
+ if (!k8_northbridges.gart_supported)
+ return;
+
+ for (i = 0; i < k8_northbridges.num; i++) {
+ struct pci_dev *dev = k8_northbridges.nb_misc[i];
enable_gart_translation(dev, __pa(agp_gatt_table));
}
if (!fix_up_north_bridges)
return;
+ if (!k8_northbridges.gart_supported)
+ return;
+
pr_info("PCI-DMA: Restoring GART aperture settings\n");
- for (i = 0; i < num_k8_northbridges; i++) {
- struct pci_dev *dev = k8_northbridges[i];
+ for (i = 0; i < k8_northbridges.num; i++) {
+ struct pci_dev *dev = k8_northbridges.nb_misc[i];
/*
* Don't enable translations just yet. That is the next
* step. Restore the pre-suspend aperture settings.
*/
- pci_write_config_dword(dev, AMD64_GARTAPERTURECTL, aperture_order << 1);
+ gart_set_size_and_enable(dev, aperture_order);
pci_write_config_dword(dev, AMD64_GARTAPERTUREBASE, aperture_alloc >> 25);
}
}
aper_size = aper_base = info->aper_size = 0;
dev = NULL;
- for (i = 0; i < num_k8_northbridges; i++) {
- dev = k8_northbridges[i];
+ for (i = 0; i < k8_northbridges.num; i++) {
+ dev = k8_northbridges.nb_misc[i];
new_aper_base = read_aperture(dev, &new_aper_size);
if (!new_aper_base)
goto nommu;
if (!no_agp)
return;
- for (i = 0; i < num_k8_northbridges; i++) {
+ if (!k8_northbridges.gart_supported)
+ return;
+
+ for (i = 0; i < k8_northbridges.num; i++) {
u32 ctl;
- dev = k8_northbridges[i];
+ dev = k8_northbridges.nb_misc[i];
pci_read_config_dword(dev, AMD64_GARTAPERTURECTL, &ctl);
ctl &= ~GARTEN;
unsigned long scratch;
long i;
- if (num_k8_northbridges == 0)
+ if (!k8_northbridges.gart_supported)
return 0;
#ifndef CONFIG_AGP_AMD64
+++ /dev/null
-/* Ported over from i386 by AK, original copyright was:
- *
- * (C) Dominik Brodowski <linux@brodo.de> 2003
- *
- * Driver to use the Power Management Timer (PMTMR) available in some
- * southbridges as primary timing source for the Linux kernel.
- *
- * Based on parts of linux/drivers/acpi/hardware/hwtimer.c, timer_pit.c,
- * timer_hpet.c, and on Arjan van de Ven's implementation for 2.4.
- *
- * This file is licensed under the GPL v2.
- *
- * Dropped all the hardware bug workarounds for now. Hopefully they
- * are not needed on 64bit chipsets.
- */
-
-#include <linux/jiffies.h>
-#include <linux/kernel.h>
-#include <linux/time.h>
-#include <linux/init.h>
-#include <linux/cpumask.h>
-#include <linux/acpi_pmtmr.h>
-
-#include <asm/io.h>
-#include <asm/proto.h>
-#include <asm/msr.h>
-#include <asm/vsyscall.h>
-
-static inline u32 cyc2us(u32 cycles)
-{
- /* The Power Management Timer ticks at 3.579545 ticks per microsecond.
- * 1 / PM_TIMER_FREQUENCY == 0.27936511 =~ 286/1024 [error: 0.024%]
- *
- * Even with HZ = 100, delta is at maximum 35796 ticks, so it can
- * easily be multiplied with 286 (=0x11E) without having to fear
- * u32 overflows.
- */
- cycles *= 286;
- return (cycles >> 10);
-}
-
-static unsigned pmtimer_wait_tick(void)
-{
- u32 a, b;
- for (a = b = inl(pmtmr_ioport) & ACPI_PM_MASK;
- a == b;
- b = inl(pmtmr_ioport) & ACPI_PM_MASK)
- cpu_relax();
- return b;
-}
-
-/* note: wait time is rounded up to one tick */
-void pmtimer_wait(unsigned us)
-{
- u32 a, b;
- a = pmtimer_wait_tick();
- do {
- b = inl(pmtmr_ioport);
- cpu_relax();
- } while (cyc2us(b - a) < us);
-}
-
-static int __init nopmtimer_setup(char *s)
-{
- pmtmr_ioport = 0;
- return 1;
-}
-
-__setup("nopmtimer", nopmtimer_setup);
load_TLS(next, cpu);
/* Must be after DS reload */
- unlazy_fpu(prev_p);
+ __unlazy_fpu(prev_p);
/* Make sure cpu is ready for new context */
if (preload_fpu)
}
/* we will leave sorting out the final value
when we are ready to reboot, since we might not
- have set up boot_cpu_id or smp_num_cpu */
+ have detected BSP APIC ID or smp_num_cpu */
break;
#endif /* CONFIG_SMP */
#include <asm/dmi.h>
#include <asm/io_apic.h>
#include <asm/ist.h>
-#include <asm/vmi.h>
#include <asm/setup_arch.h>
#include <asm/bios_ebda.h>
#include <asm/cacheflush.h>
#include <asm/percpu.h>
#include <asm/topology.h>
#include <asm/apicdef.h>
-#include <asm/k8.h>
+#include <asm/amd_nb.h>
#ifdef CONFIG_X86_64
#include <asm/numa_64.h>
#endif
#include <asm/mce.h>
+#include <asm/alternative.h>
/*
* end_pfn only includes RAM, while max_pfn_mapped includes all e820 entries.
RESERVE_BRK(dmi_alloc, 65536);
#endif
-unsigned int boot_cpu_id __read_mostly;
static __initdata unsigned long _brk_start = (unsigned long)__brk_base;
unsigned long _brk_end = (unsigned long)__brk_base;
reserve_early_overlap_ok(addr, addr + size, "ibft");
}
-#ifdef CONFIG_X86_RESERVE_LOW_64K
-static int __init dmi_low_memory_corruption(const struct dmi_system_id *d)
-{
- printk(KERN_NOTICE
- "%s detected: BIOS may corrupt low RAM, working around it.\n",
- d->ident);
-
- e820_update_range(0, 0x10000, E820_RAM, E820_RESERVED);
- sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
-
- return 0;
-}
-#endif
-
-/* List of systems that have known low memory corruption BIOS problems */
-static struct dmi_system_id __initdata bad_bios_dmi_table[] = {
-#ifdef CONFIG_X86_RESERVE_LOW_64K
- {
- .callback = dmi_low_memory_corruption,
- .ident = "AMI BIOS",
- .matches = {
- DMI_MATCH(DMI_BIOS_VENDOR, "American Megatrends Inc."),
- },
- },
- {
- .callback = dmi_low_memory_corruption,
- .ident = "Phoenix BIOS",
- .matches = {
- DMI_MATCH(DMI_BIOS_VENDOR, "Phoenix Technologies"),
- },
- },
- {
- .callback = dmi_low_memory_corruption,
- .ident = "Phoenix/MSC BIOS",
- .matches = {
- DMI_MATCH(DMI_BIOS_VENDOR, "Phoenix/MSC"),
- },
- },
- /*
- * AMI BIOS with low memory corruption was found on Intel DG45ID and
- * DG45FC boards.
- * It has a different DMI_BIOS_VENDOR = "Intel Corp.", for now we will
- * match only DMI_BOARD_NAME and see if there is more bad products
- * with this vendor.
- */
- {
- .callback = dmi_low_memory_corruption,
- .ident = "AMI BIOS",
- .matches = {
- DMI_MATCH(DMI_BOARD_NAME, "DG45ID"),
- },
- },
- {
- .callback = dmi_low_memory_corruption,
- .ident = "AMI BIOS",
- .matches = {
- DMI_MATCH(DMI_BOARD_NAME, "DG45FC"),
- },
- },
- /*
- * The Dell Inspiron Mini 1012 has DMI_BIOS_VENDOR = "Dell Inc.", so
- * match on the product name.
- */
- {
- .callback = dmi_low_memory_corruption,
- .ident = "Phoenix BIOS",
- .matches = {
- DMI_MATCH(DMI_PRODUCT_NAME, "Inspiron 1012"),
- },
- },
-#endif
- {}
-};
+static unsigned reserve_low = CONFIG_X86_RESERVE_LOW << 10;
static void __init trim_bios_range(void)
{
* A special case is the first 4Kb of memory;
* This is a BIOS owned area, not kernel ram, but generally
* not listed as such in the E820 table.
+ *
+ * This typically reserves additional memory (64KiB by default)
+ * since some BIOSes are known to corrupt low memory. See the
+ * Kconfig help text for X86_RESERVE_LOW.
*/
- e820_update_range(0, PAGE_SIZE, E820_RAM, E820_RESERVED);
+ e820_update_range(0, ALIGN(reserve_low, PAGE_SIZE),
+ E820_RAM, E820_RESERVED);
+
/*
* special case: Some BIOSen report the PC BIOS
* area (640->1Mb) as ram even though it is not.
sanitize_e820_map(e820.map, ARRAY_SIZE(e820.map), &e820.nr_map);
}
+static int __init parse_reservelow(char *p)
+{
+ unsigned long long size;
+
+ if (!p)
+ return -EINVAL;
+
+ size = memparse(p, &p);
+
+ if (size < 4096)
+ size = 4096;
+
+ if (size > 640*1024)
+ size = 640*1024;
+
+ reserve_low = size;
+
+ return 0;
+}
+
+early_param("reservelow", parse_reservelow);
+
/*
* Determine if we were loaded by an EFI loader. If so, then we have also been
* passed the efi memmap, systab, etc., so we should use these data structures
{
int acpi = 0;
int k8 = 0;
+ unsigned long flags;
#ifdef CONFIG_X86_32
memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data));
printk(KERN_INFO "Command line: %s\n", boot_command_line);
#endif
- /* VMI may relocate the fixmap; do this before touching ioremap area */
- vmi_init();
-
- /* OFW also may relocate the fixmap */
+ /*
+ * If we have OLPC OFW, we might end up relocating the fixmap due to
+ * reserve_top(), so do this before touching the ioremap area.
+ */
olpc_ofw_detect();
early_trap_init();
x86_report_nx();
- /* Must be before kernel pagetables are setup */
- vmi_activate();
-
/* after early param, so could get panic from serial */
reserve_early_setup_data();
dmi_scan_machine();
- dmi_check_system(bad_bios_dmi_table);
-
/*
* VMware detection requires dmi to be available, so this
* needs to be done after dmi_scan_machine, for the BP.
x86_init.oem.banner();
mcheck_init();
+
+ local_irq_save(flags);
+ arch_init_ideal_nop5();
+ local_irq_restore(flags);
}
#ifdef CONFIG_X86_32
* Up to this point, the boot CPU has been using .init.data
* area. Reload any changed state for the boot CPU.
*/
- if (cpu == boot_cpu_id)
+ if (!cpu)
switch_to_new_gdt(cpu);
}
#ifdef CONFIG_X86_LOCAL_APIC
static unsigned long sfi_lapic_addr __initdata = APIC_DEFAULT_PHYS_BASE;
-void __init mp_sfi_register_lapic_address(unsigned long address)
+static void __init mp_sfi_register_lapic_address(unsigned long address)
{
mp_lapic_addr = address;
}
/* All CPUs enumerated by SFI must be present and enabled */
-void __cpuinit mp_sfi_register_lapic(u8 id)
+static void __cpuinit mp_sfi_register_lapic(u8 id)
{
if (MAX_APICS - id <= 0) {
pr_warning("Processor #%d invalid (max %d)\n",
#include <asm/pgtable.h>
#include <asm/tlbflush.h>
#include <asm/mtrr.h>
-#include <asm/vmi.h>
+#include <asm/mwait.h>
#include <asm/apic.h>
#include <asm/setup.h>
#include <asm/uv/uv.h>
__flush_tlb_all();
#endif
- vmi_bringup();
cpu_init();
preempt_disable();
smp_callin();
identify_secondary_cpu(c);
}
+static void __cpuinit link_thread_siblings(int cpu1, int cpu2)
+{
+ struct cpuinfo_x86 *c1 = &cpu_data(cpu1);
+ struct cpuinfo_x86 *c2 = &cpu_data(cpu2);
+
+ cpumask_set_cpu(cpu1, cpu_sibling_mask(cpu2));
+ cpumask_set_cpu(cpu2, cpu_sibling_mask(cpu1));
+ cpumask_set_cpu(cpu1, cpu_core_mask(cpu2));
+ cpumask_set_cpu(cpu2, cpu_core_mask(cpu1));
+ cpumask_set_cpu(cpu1, c2->llc_shared_map);
+ cpumask_set_cpu(cpu2, c1->llc_shared_map);
+}
+
void __cpuinit set_cpu_sibling_map(int cpu)
{
for_each_cpu(i, cpu_sibling_setup_mask) {
struct cpuinfo_x86 *o = &cpu_data(i);
- if (c->phys_proc_id == o->phys_proc_id &&
- c->cpu_core_id == o->cpu_core_id) {
- cpumask_set_cpu(i, cpu_sibling_mask(cpu));
- cpumask_set_cpu(cpu, cpu_sibling_mask(i));
- cpumask_set_cpu(i, cpu_core_mask(cpu));
- cpumask_set_cpu(cpu, cpu_core_mask(i));
- cpumask_set_cpu(i, c->llc_shared_map);
- cpumask_set_cpu(cpu, o->llc_shared_map);
+ if (cpu_has(c, X86_FEATURE_TOPOEXT)) {
+ if (c->phys_proc_id == o->phys_proc_id &&
+ c->compute_unit_id == o->compute_unit_id)
+ link_thread_siblings(cpu, i);
+ } else if (c->phys_proc_id == o->phys_proc_id &&
+ c->cpu_core_id == o->cpu_core_id) {
+ link_thread_siblings(cpu, i);
}
}
} else {
}
set_cpu_sibling_map(0);
- enable_IR_x2apic();
- default_setup_apic_routing();
if (smp_sanity_check(max_cpus) < 0) {
printk(KERN_INFO "SMP disabled\n");
goto out;
}
+ default_setup_apic_routing();
+
preempt_disable();
if (read_apic_id() != boot_cpu_physical_apicid) {
panic("Boot APIC ID in local APIC unexpected (%d vs %d)",
local_irq_disable();
}
+/*
+ * We need to flush the caches before going to sleep, lest we have
+ * dirty data in our caches when we come back up.
+ */
+static inline void mwait_play_dead(void)
+{
+ unsigned int eax, ebx, ecx, edx;
+ unsigned int highest_cstate = 0;
+ unsigned int highest_subcstate = 0;
+ int i;
+ void *mwait_ptr;
+
+ if (!cpu_has(¤t_cpu_data, X86_FEATURE_MWAIT))
+ return;
+ if (!cpu_has(¤t_cpu_data, X86_FEATURE_CLFLSH))
+ return;
+ if (current_cpu_data.cpuid_level < CPUID_MWAIT_LEAF)
+ return;
+
+ eax = CPUID_MWAIT_LEAF;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+
+ /*
+ * eax will be 0 if EDX enumeration is not valid.
+ * Initialized below to cstate, sub_cstate value when EDX is valid.
+ */
+ if (!(ecx & CPUID5_ECX_EXTENSIONS_SUPPORTED)) {
+ eax = 0;
+ } else {
+ edx >>= MWAIT_SUBSTATE_SIZE;
+ for (i = 0; i < 7 && edx; i++, edx >>= MWAIT_SUBSTATE_SIZE) {
+ if (edx & MWAIT_SUBSTATE_MASK) {
+ highest_cstate = i;
+ highest_subcstate = edx & MWAIT_SUBSTATE_MASK;
+ }
+ }
+ eax = (highest_cstate << MWAIT_SUBSTATE_SIZE) |
+ (highest_subcstate - 1);
+ }
+
+ /*
+ * This should be a memory location in a cache line which is
+ * unlikely to be touched by other processors. The actual
+ * content is immaterial as it is not actually modified in any way.
+ */
+ mwait_ptr = ¤t_thread_info()->flags;
+
+ wbinvd();
+
+ while (1) {
+ /*
+ * The CLFLUSH is a workaround for erratum AAI65 for
+ * the Xeon 7400 series. It's not clear it is actually
+ * needed, but it should be harmless in either case.
+ * The WBINVD is insufficient due to the spurious-wakeup
+ * case where we return around the loop.
+ */
+ clflush(mwait_ptr);
+ __monitor(mwait_ptr, 0, 0);
+ mb();
+ __mwait(eax, 0);
+ }
+}
+
+static inline void hlt_play_dead(void)
+{
+ if (current_cpu_data.x86 >= 4)
+ wbinvd();
+
+ while (1) {
+ native_halt();
+ }
+}
+
void native_play_dead(void)
{
play_dead_common();
tboot_shutdown(TB_SHUTDOWN_WFS);
- wbinvd_halt();
+
+ mwait_play_dead(); /* Only returns on failure */
+ hlt_play_dead();
}
#else /* ... !CONFIG_HOTPLUG_CPU */
const char *const envp[])
{
long __res;
- asm volatile ("push %%ebx ; movl %2,%%ebx ; int $0x80 ; pop %%ebx"
+ asm volatile ("int $0x80"
: "=a" (__res)
- : "0" (__NR_execve), "ri" (filename), "c" (argv), "d" (envp) : "memory");
+ : "0" (__NR_execve), "b" (filename), "c" (argv), "d" (envp) : "memory");
return __res;
}
/* Copy kernel address range */
clone_pgd_range(trampoline_pg_dir + KERNEL_PGD_BOUNDARY,
swapper_pg_dir + KERNEL_PGD_BOUNDARY,
- min_t(unsigned long, KERNEL_PGD_PTRS,
- KERNEL_PGD_BOUNDARY));
+ KERNEL_PGD_PTRS);
/* Initialize low mappings */
clone_pgd_range(trampoline_pg_dir,
}
EXPORT_SYMBOL_GPL(math_state_restore);
-#ifndef CONFIG_MATH_EMULATION
-void math_emulate(struct math_emu_info *info)
-{
- printk(KERN_EMERG
- "math-emulation not enabled and no coprocessor found.\n");
- printk(KERN_EMERG "killing %s.\n", current->comm);
- force_sig(SIGFPE, current);
- schedule();
-}
-#endif /* CONFIG_MATH_EMULATION */
-
dotraplinkage void __kprobes
do_device_not_available(struct pt_regs *regs, long error_code)
{
-#ifdef CONFIG_X86_32
+#ifdef CONFIG_MATH_EMULATION
if (read_cr0() & X86_CR0_EM) {
struct math_emu_info info = { };
info.regs = regs;
math_emulate(&info);
- } else {
- math_state_restore(); /* interrupts still off */
- conditional_sti(regs);
+ return;
}
-#else
- math_state_restore();
+#endif
+ math_state_restore(); /* interrupts still off */
+#ifdef CONFIG_X86_32
+ conditional_sti(regs);
#endif
}
#endif
#ifdef CONFIG_X86_32
- if (cpu_has_fxsr) {
- printk(KERN_INFO "Enabling fast FPU save and restore... ");
- set_in_cr4(X86_CR4_OSFXSR);
- printk("done.\n");
- }
- if (cpu_has_xmm) {
- printk(KERN_INFO
- "Enabling unmasked SIMD FPU exception support... ");
- set_in_cr4(X86_CR4_OSXMMEXCPT);
- printk("done.\n");
- }
-
set_system_trap_gate(SYSCALL_VECTOR, &system_call);
set_bit(SYSCALL_VECTOR, used_vectors);
#endif
__setup("notsc", notsc_setup);
+static int no_sched_irq_time;
+
static int __init tsc_setup(char *str)
{
if (!strcmp(str, "reliable"))
tsc_clocksource_reliable = 1;
+ if (!strncmp(str, "noirqtime", 9))
+ no_sched_irq_time = 1;
return 1;
}
local_irq_save(flags);
- get_cpu_var(cyc2ns_offset) = 0;
+ __get_cpu_var(cyc2ns_offset) = 0;
offset = cyc2ns_suspend - sched_clock();
for_each_possible_cpu(cpu)
if (!tsc_unstable) {
tsc_unstable = 1;
sched_clock_stable = 0;
+ disable_sched_clock_irqtime();
printk(KERN_INFO "Marking TSC unstable due to %s\n", reason);
/* Change only the rating, when not registered */
if (clocksource_tsc.mult)
clocksource_register_khz(&clocksource_tsc, tsc_khz);
}
-#ifdef CONFIG_X86_64
-/*
- * calibrate_cpu is used on systems with fixed rate TSCs to determine
- * processor frequency
- */
-#define TICK_COUNT 100000000
-static unsigned long __init calibrate_cpu(void)
-{
- int tsc_start, tsc_now;
- int i, no_ctr_free;
- unsigned long evntsel3 = 0, pmc3 = 0, pmc_now = 0;
- unsigned long flags;
-
- for (i = 0; i < 4; i++)
- if (avail_to_resrv_perfctr_nmi_bit(i))
- break;
- no_ctr_free = (i == 4);
- if (no_ctr_free) {
- WARN(1, KERN_WARNING "Warning: AMD perfctrs busy ... "
- "cpu_khz value may be incorrect.\n");
- i = 3;
- rdmsrl(MSR_K7_EVNTSEL3, evntsel3);
- wrmsrl(MSR_K7_EVNTSEL3, 0);
- rdmsrl(MSR_K7_PERFCTR3, pmc3);
- } else {
- reserve_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- reserve_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
- }
- local_irq_save(flags);
- /* start measuring cycles, incrementing from 0 */
- wrmsrl(MSR_K7_PERFCTR0 + i, 0);
- wrmsrl(MSR_K7_EVNTSEL0 + i, 1 << 22 | 3 << 16 | 0x76);
- rdtscl(tsc_start);
- do {
- rdmsrl(MSR_K7_PERFCTR0 + i, pmc_now);
- tsc_now = get_cycles();
- } while ((tsc_now - tsc_start) < TICK_COUNT);
-
- local_irq_restore(flags);
- if (no_ctr_free) {
- wrmsrl(MSR_K7_EVNTSEL3, 0);
- wrmsrl(MSR_K7_PERFCTR3, pmc3);
- wrmsrl(MSR_K7_EVNTSEL3, evntsel3);
- } else {
- release_perfctr_nmi(MSR_K7_PERFCTR0 + i);
- release_evntsel_nmi(MSR_K7_EVNTSEL0 + i);
- }
-
- return pmc_now * tsc_khz / (tsc_now - tsc_start);
-}
-#else
-static inline unsigned long calibrate_cpu(void) { return cpu_khz; }
-#endif
-
void __init tsc_init(void)
{
u64 lpj;
return;
}
- if (cpu_has(&boot_cpu_data, X86_FEATURE_CONSTANT_TSC) &&
- (boot_cpu_data.x86_vendor == X86_VENDOR_AMD))
- cpu_khz = calibrate_cpu();
-
printk("Detected %lu.%03lu MHz processor.\n",
(unsigned long)cpu_khz / 1000,
(unsigned long)cpu_khz % 1000);
/* now allow native_sched_clock() to use rdtsc */
tsc_disabled = 0;
+ if (!no_sched_irq_time)
+ enable_sched_clock_irqtime();
+
lpj = ((u64)tsc_khz * 1000);
do_div(lpj, HZ);
lpj_fine = lpj;
+++ /dev/null
-/*
- * VMI specific paravirt-ops implementation
- *
- * Copyright (C) 2005, VMware, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT. See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- * Send feedback to zach@vmware.com
- *
- */
-
-#include <linux/module.h>
-#include <linux/cpu.h>
-#include <linux/bootmem.h>
-#include <linux/mm.h>
-#include <linux/highmem.h>
-#include <linux/sched.h>
-#include <linux/gfp.h>
-#include <asm/vmi.h>
-#include <asm/io.h>
-#include <asm/fixmap.h>
-#include <asm/apicdef.h>
-#include <asm/apic.h>
-#include <asm/pgalloc.h>
-#include <asm/processor.h>
-#include <asm/timer.h>
-#include <asm/vmi_time.h>
-#include <asm/kmap_types.h>
-#include <asm/setup.h>
-
-/* Convenient for calling VMI functions indirectly in the ROM */
-typedef u32 __attribute__((regparm(1))) (VROMFUNC)(void);
-typedef u64 __attribute__((regparm(2))) (VROMLONGFUNC)(int);
-
-#define call_vrom_func(rom,func) \
- (((VROMFUNC *)(rom->func))())
-
-#define call_vrom_long_func(rom,func,arg) \
- (((VROMLONGFUNC *)(rom->func)) (arg))
-
-static struct vrom_header *vmi_rom;
-static int disable_pge;
-static int disable_pse;
-static int disable_sep;
-static int disable_tsc;
-static int disable_mtrr;
-static int disable_noidle;
-static int disable_vmi_timer;
-
-/* Cached VMI operations */
-static struct {
- void (*cpuid)(void /* non-c */);
- void (*_set_ldt)(u32 selector);
- void (*set_tr)(u32 selector);
- void (*write_idt_entry)(struct desc_struct *, int, u32, u32);
- void (*write_gdt_entry)(struct desc_struct *, int, u32, u32);
- void (*write_ldt_entry)(struct desc_struct *, int, u32, u32);
- void (*set_kernel_stack)(u32 selector, u32 sp0);
- void (*allocate_page)(u32, u32, u32, u32, u32);
- void (*release_page)(u32, u32);
- void (*set_pte)(pte_t, pte_t *, unsigned);
- void (*update_pte)(pte_t *, unsigned);
- void (*set_linear_mapping)(int, void *, u32, u32);
- void (*_flush_tlb)(int);
- void (*set_initial_ap_state)(int, int);
- void (*halt)(void);
- void (*set_lazy_mode)(int mode);
-} vmi_ops;
-
-/* Cached VMI operations */
-struct vmi_timer_ops vmi_timer_ops;
-
-/*
- * VMI patching routines.
- */
-#define MNEM_CALL 0xe8
-#define MNEM_JMP 0xe9
-#define MNEM_RET 0xc3
-
-#define IRQ_PATCH_INT_MASK 0
-#define IRQ_PATCH_DISABLE 5
-
-static inline void patch_offset(void *insnbuf,
- unsigned long ip, unsigned long dest)
-{
- *(unsigned long *)(insnbuf+1) = dest-ip-5;
-}
-
-static unsigned patch_internal(int call, unsigned len, void *insnbuf,
- unsigned long ip)
-{
- u64 reloc;
- struct vmi_relocation_info *const rel = (struct vmi_relocation_info *)&reloc;
- reloc = call_vrom_long_func(vmi_rom, get_reloc, call);
- switch(rel->type) {
- case VMI_RELOCATION_CALL_REL:
- BUG_ON(len < 5);
- *(char *)insnbuf = MNEM_CALL;
- patch_offset(insnbuf, ip, (unsigned long)rel->eip);
- return 5;
-
- case VMI_RELOCATION_JUMP_REL:
- BUG_ON(len < 5);
- *(char *)insnbuf = MNEM_JMP;
- patch_offset(insnbuf, ip, (unsigned long)rel->eip);
- return 5;
-
- case VMI_RELOCATION_NOP:
- /* obliterate the whole thing */
- return 0;
-
- case VMI_RELOCATION_NONE:
- /* leave native code in place */
- break;
-
- default:
- BUG();
- }
- return len;
-}
-
-/*
- * Apply patch if appropriate, return length of new instruction
- * sequence. The callee does nop padding for us.
- */
-static unsigned vmi_patch(u8 type, u16 clobbers, void *insns,
- unsigned long ip, unsigned len)
-{
- switch (type) {
- case PARAVIRT_PATCH(pv_irq_ops.irq_disable):
- return patch_internal(VMI_CALL_DisableInterrupts, len,
- insns, ip);
- case PARAVIRT_PATCH(pv_irq_ops.irq_enable):
- return patch_internal(VMI_CALL_EnableInterrupts, len,
- insns, ip);
- case PARAVIRT_PATCH(pv_irq_ops.restore_fl):
- return patch_internal(VMI_CALL_SetInterruptMask, len,
- insns, ip);
- case PARAVIRT_PATCH(pv_irq_ops.save_fl):
- return patch_internal(VMI_CALL_GetInterruptMask, len,
- insns, ip);
- case PARAVIRT_PATCH(pv_cpu_ops.iret):
- return patch_internal(VMI_CALL_IRET, len, insns, ip);
- case PARAVIRT_PATCH(pv_cpu_ops.irq_enable_sysexit):
- return patch_internal(VMI_CALL_SYSEXIT, len, insns, ip);
- default:
- break;
- }
- return len;
-}
-
-/* CPUID has non-C semantics, and paravirt-ops API doesn't match hardware ISA */
-static void vmi_cpuid(unsigned int *ax, unsigned int *bx,
- unsigned int *cx, unsigned int *dx)
-{
- int override = 0;
- if (*ax == 1)
- override = 1;
- asm volatile ("call *%6"
- : "=a" (*ax),
- "=b" (*bx),
- "=c" (*cx),
- "=d" (*dx)
- : "0" (*ax), "2" (*cx), "r" (vmi_ops.cpuid));
- if (override) {
- if (disable_pse)
- *dx &= ~X86_FEATURE_PSE;
- if (disable_pge)
- *dx &= ~X86_FEATURE_PGE;
- if (disable_sep)
- *dx &= ~X86_FEATURE_SEP;
- if (disable_tsc)
- *dx &= ~X86_FEATURE_TSC;
- if (disable_mtrr)
- *dx &= ~X86_FEATURE_MTRR;
- }
-}
-
-static inline void vmi_maybe_load_tls(struct desc_struct *gdt, int nr, struct desc_struct *new)
-{
- if (gdt[nr].a != new->a || gdt[nr].b != new->b)
- write_gdt_entry(gdt, nr, new, 0);
-}
-
-static void vmi_load_tls(struct thread_struct *t, unsigned int cpu)
-{
- struct desc_struct *gdt = get_cpu_gdt_table(cpu);
- vmi_maybe_load_tls(gdt, GDT_ENTRY_TLS_MIN + 0, &t->tls_array[0]);
- vmi_maybe_load_tls(gdt, GDT_ENTRY_TLS_MIN + 1, &t->tls_array[1]);
- vmi_maybe_load_tls(gdt, GDT_ENTRY_TLS_MIN + 2, &t->tls_array[2]);
-}
-
-static void vmi_set_ldt(const void *addr, unsigned entries)
-{
- unsigned cpu = smp_processor_id();
- struct desc_struct desc;
-
- pack_descriptor(&desc, (unsigned long)addr,
- entries * sizeof(struct desc_struct) - 1,
- DESC_LDT, 0);
- write_gdt_entry(get_cpu_gdt_table(cpu), GDT_ENTRY_LDT, &desc, DESC_LDT);
- vmi_ops._set_ldt(entries ? GDT_ENTRY_LDT*sizeof(struct desc_struct) : 0);
-}
-
-static void vmi_set_tr(void)
-{
- vmi_ops.set_tr(GDT_ENTRY_TSS*sizeof(struct desc_struct));
-}
-
-static void vmi_write_idt_entry(gate_desc *dt, int entry, const gate_desc *g)
-{
- u32 *idt_entry = (u32 *)g;
- vmi_ops.write_idt_entry(dt, entry, idt_entry[0], idt_entry[1]);
-}
-
-static void vmi_write_gdt_entry(struct desc_struct *dt, int entry,
- const void *desc, int type)
-{
- u32 *gdt_entry = (u32 *)desc;
- vmi_ops.write_gdt_entry(dt, entry, gdt_entry[0], gdt_entry[1]);
-}
-
-static void vmi_write_ldt_entry(struct desc_struct *dt, int entry,
- const void *desc)
-{
- u32 *ldt_entry = (u32 *)desc;
- vmi_ops.write_ldt_entry(dt, entry, ldt_entry[0], ldt_entry[1]);
-}
-
-static void vmi_load_sp0(struct tss_struct *tss,
- struct thread_struct *thread)
-{
- tss->x86_tss.sp0 = thread->sp0;
-
- /* This can only happen when SEP is enabled, no need to test "SEP"arately */
- if (unlikely(tss->x86_tss.ss1 != thread->sysenter_cs)) {
- tss->x86_tss.ss1 = thread->sysenter_cs;
- wrmsr(MSR_IA32_SYSENTER_CS, thread->sysenter_cs, 0);
- }
- vmi_ops.set_kernel_stack(__KERNEL_DS, tss->x86_tss.sp0);
-}
-
-static void vmi_flush_tlb_user(void)
-{
- vmi_ops._flush_tlb(VMI_FLUSH_TLB);
-}
-
-static void vmi_flush_tlb_kernel(void)
-{
- vmi_ops._flush_tlb(VMI_FLUSH_TLB | VMI_FLUSH_GLOBAL);
-}
-
-/* Stub to do nothing at all; used for delays and unimplemented calls */
-static void vmi_nop(void)
-{
-}
-
-static void vmi_allocate_pte(struct mm_struct *mm, unsigned long pfn)
-{
- vmi_ops.allocate_page(pfn, VMI_PAGE_L1, 0, 0, 0);
-}
-
-static void vmi_allocate_pmd(struct mm_struct *mm, unsigned long pfn)
-{
- /*
- * This call comes in very early, before mem_map is setup.
- * It is called only for swapper_pg_dir, which already has
- * data on it.
- */
- vmi_ops.allocate_page(pfn, VMI_PAGE_L2, 0, 0, 0);
-}
-
-static void vmi_allocate_pmd_clone(unsigned long pfn, unsigned long clonepfn, unsigned long start, unsigned long count)
-{
- vmi_ops.allocate_page(pfn, VMI_PAGE_L2 | VMI_PAGE_CLONE, clonepfn, start, count);
-}
-
-static void vmi_release_pte(unsigned long pfn)
-{
- vmi_ops.release_page(pfn, VMI_PAGE_L1);
-}
-
-static void vmi_release_pmd(unsigned long pfn)
-{
- vmi_ops.release_page(pfn, VMI_PAGE_L2);
-}
-
-/*
- * We use the pgd_free hook for releasing the pgd page:
- */
-static void vmi_pgd_free(struct mm_struct *mm, pgd_t *pgd)
-{
- unsigned long pfn = __pa(pgd) >> PAGE_SHIFT;
-
- vmi_ops.release_page(pfn, VMI_PAGE_L2);
-}
-
-/*
- * Helper macros for MMU update flags. We can defer updates until a flush
- * or page invalidation only if the update is to the current address space
- * (otherwise, there is no flush). We must check against init_mm, since
- * this could be a kernel update, which usually passes init_mm, although
- * sometimes this check can be skipped if we know the particular function
- * is only called on user mode PTEs. We could change the kernel to pass
- * current->active_mm here, but in particular, I was unsure if changing
- * mm/highmem.c to do this would still be correct on other architectures.
- */
-#define is_current_as(mm, mustbeuser) ((mm) == current->active_mm || \
- (!mustbeuser && (mm) == &init_mm))
-#define vmi_flags_addr(mm, addr, level, user) \
- ((level) | (is_current_as(mm, user) ? \
- (VMI_PAGE_CURRENT_AS | ((addr) & VMI_PAGE_VA_MASK)) : 0))
-#define vmi_flags_addr_defer(mm, addr, level, user) \
- ((level) | (is_current_as(mm, user) ? \
- (VMI_PAGE_DEFER | VMI_PAGE_CURRENT_AS | ((addr) & VMI_PAGE_VA_MASK)) : 0))
-
-static void vmi_update_pte(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
- vmi_ops.update_pte(ptep, vmi_flags_addr(mm, addr, VMI_PAGE_PT, 0));
-}
-
-static void vmi_update_pte_defer(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
- vmi_ops.update_pte(ptep, vmi_flags_addr_defer(mm, addr, VMI_PAGE_PT, 0));
-}
-
-static void vmi_set_pte(pte_t *ptep, pte_t pte)
-{
- /* XXX because of set_pmd_pte, this can be called on PT or PD layers */
- vmi_ops.set_pte(pte, ptep, VMI_PAGE_PT);
-}
-
-static void vmi_set_pte_at(struct mm_struct *mm, unsigned long addr, pte_t *ptep, pte_t pte)
-{
- vmi_ops.set_pte(pte, ptep, vmi_flags_addr(mm, addr, VMI_PAGE_PT, 0));
-}
-
-static void vmi_set_pmd(pmd_t *pmdp, pmd_t pmdval)
-{
-#ifdef CONFIG_X86_PAE
- const pte_t pte = { .pte = pmdval.pmd };
-#else
- const pte_t pte = { pmdval.pud.pgd.pgd };
-#endif
- vmi_ops.set_pte(pte, (pte_t *)pmdp, VMI_PAGE_PD);
-}
-
-#ifdef CONFIG_X86_PAE
-
-static void vmi_set_pte_atomic(pte_t *ptep, pte_t pteval)
-{
- /*
- * XXX This is called from set_pmd_pte, but at both PT
- * and PD layers so the VMI_PAGE_PT flag is wrong. But
- * it is only called for large page mapping changes,
- * the Xen backend, doesn't support large pages, and the
- * ESX backend doesn't depend on the flag.
- */
- set_64bit((unsigned long long *)ptep,pte_val(pteval));
- vmi_ops.update_pte(ptep, VMI_PAGE_PT);
-}
-
-static void vmi_set_pud(pud_t *pudp, pud_t pudval)
-{
- /* Um, eww */
- const pte_t pte = { .pte = pudval.pgd.pgd };
- vmi_ops.set_pte(pte, (pte_t *)pudp, VMI_PAGE_PDP);
-}
-
-static void vmi_pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
-{
- const pte_t pte = { .pte = 0 };
- vmi_ops.set_pte(pte, ptep, vmi_flags_addr(mm, addr, VMI_PAGE_PT, 0));
-}
-
-static void vmi_pmd_clear(pmd_t *pmd)
-{
- const pte_t pte = { .pte = 0 };
- vmi_ops.set_pte(pte, (pte_t *)pmd, VMI_PAGE_PD);
-}
-#endif
-
-#ifdef CONFIG_SMP
-static void __devinit
-vmi_startup_ipi_hook(int phys_apicid, unsigned long start_eip,
- unsigned long start_esp)
-{
- struct vmi_ap_state ap;
-
- /* Default everything to zero. This is fine for most GPRs. */
- memset(&ap, 0, sizeof(struct vmi_ap_state));
-
- ap.gdtr_limit = GDT_SIZE - 1;
- ap.gdtr_base = (unsigned long) get_cpu_gdt_table(phys_apicid);
-
- ap.idtr_limit = IDT_ENTRIES * 8 - 1;
- ap.idtr_base = (unsigned long) idt_table;
-
- ap.ldtr = 0;
-
- ap.cs = __KERNEL_CS;
- ap.eip = (unsigned long) start_eip;
- ap.ss = __KERNEL_DS;
- ap.esp = (unsigned long) start_esp;
-
- ap.ds = __USER_DS;
- ap.es = __USER_DS;
- ap.fs = __KERNEL_PERCPU;
- ap.gs = __KERNEL_STACK_CANARY;
-
- ap.eflags = 0;
-
-#ifdef CONFIG_X86_PAE
- /* efer should match BSP efer. */
- if (cpu_has_nx) {
- unsigned l, h;
- rdmsr(MSR_EFER, l, h);
- ap.efer = (unsigned long long) h << 32 | l;
- }
-#endif
-
- ap.cr3 = __pa(swapper_pg_dir);
- /* Protected mode, paging, AM, WP, NE, MP. */
- ap.cr0 = 0x80050023;
- ap.cr4 = mmu_cr4_features;
- vmi_ops.set_initial_ap_state((u32)&ap, phys_apicid);
-}
-#endif
-
-static void vmi_start_context_switch(struct task_struct *prev)
-{
- paravirt_start_context_switch(prev);
- vmi_ops.set_lazy_mode(2);
-}
-
-static void vmi_end_context_switch(struct task_struct *next)
-{
- vmi_ops.set_lazy_mode(0);
- paravirt_end_context_switch(next);
-}
-
-static void vmi_enter_lazy_mmu(void)
-{
- paravirt_enter_lazy_mmu();
- vmi_ops.set_lazy_mode(1);
-}
-
-static void vmi_leave_lazy_mmu(void)
-{
- vmi_ops.set_lazy_mode(0);
- paravirt_leave_lazy_mmu();
-}
-
-static inline int __init check_vmi_rom(struct vrom_header *rom)
-{
- struct pci_header *pci;
- struct pnp_header *pnp;
- const char *manufacturer = "UNKNOWN";
- const char *product = "UNKNOWN";
- const char *license = "unspecified";
-
- if (rom->rom_signature != 0xaa55)
- return 0;
- if (rom->vrom_signature != VMI_SIGNATURE)
- return 0;
- if (rom->api_version_maj != VMI_API_REV_MAJOR ||
- rom->api_version_min+1 < VMI_API_REV_MINOR+1) {
- printk(KERN_WARNING "VMI: Found mismatched rom version %d.%d\n",
- rom->api_version_maj,
- rom->api_version_min);
- return 0;
- }
-
- /*
- * Relying on the VMI_SIGNATURE field is not 100% safe, so check
- * the PCI header and device type to make sure this is really a
- * VMI device.
- */
- if (!rom->pci_header_offs) {
- printk(KERN_WARNING "VMI: ROM does not contain PCI header.\n");
- return 0;
- }
-
- pci = (struct pci_header *)((char *)rom+rom->pci_header_offs);
- if (pci->vendorID != PCI_VENDOR_ID_VMWARE ||
- pci->deviceID != PCI_DEVICE_ID_VMWARE_VMI) {
- /* Allow it to run... anyways, but warn */
- printk(KERN_WARNING "VMI: ROM from unknown manufacturer\n");
- }
-
- if (rom->pnp_header_offs) {
- pnp = (struct pnp_header *)((char *)rom+rom->pnp_header_offs);
- if (pnp->manufacturer_offset)
- manufacturer = (const char *)rom+pnp->manufacturer_offset;
- if (pnp->product_offset)
- product = (const char *)rom+pnp->product_offset;
- }
-
- if (rom->license_offs)
- license = (char *)rom+rom->license_offs;
-
- printk(KERN_INFO "VMI: Found %s %s, API version %d.%d, ROM version %d.%d\n",
- manufacturer, product,
- rom->api_version_maj, rom->api_version_min,
- pci->rom_version_maj, pci->rom_version_min);
-
- /* Don't allow BSD/MIT here for now because we don't want to end up
- with any binary only shim layers */
- if (strcmp(license, "GPL") && strcmp(license, "GPL v2")) {
- printk(KERN_WARNING "VMI: Non GPL license `%s' found for ROM. Not used.\n",
- license);
- return 0;
- }
-
- return 1;
-}
-
-/*
- * Probe for the VMI option ROM
- */
-static inline int __init probe_vmi_rom(void)
-{
- unsigned long base;
-
- /* VMI ROM is in option ROM area, check signature */
- for (base = 0xC0000; base < 0xE0000; base += 2048) {
- struct vrom_header *romstart;
- romstart = (struct vrom_header *)isa_bus_to_virt(base);
- if (check_vmi_rom(romstart)) {
- vmi_rom = romstart;
- return 1;
- }
- }
- return 0;
-}
-
-/*
- * VMI setup common to all processors
- */
-void vmi_bringup(void)
-{
- /* We must establish the lowmem mapping for MMU ops to work */
- if (vmi_ops.set_linear_mapping)
- vmi_ops.set_linear_mapping(0, (void *)__PAGE_OFFSET, MAXMEM_PFN, 0);
-}
-
-/*
- * Return a pointer to a VMI function or NULL if unimplemented
- */
-static void *vmi_get_function(int vmicall)
-{
- u64 reloc;
- const struct vmi_relocation_info *rel = (struct vmi_relocation_info *)&reloc;
- reloc = call_vrom_long_func(vmi_rom, get_reloc, vmicall);
- BUG_ON(rel->type == VMI_RELOCATION_JUMP_REL);
- if (rel->type == VMI_RELOCATION_CALL_REL)
- return (void *)rel->eip;
- else
- return NULL;
-}
-
-/*
- * Helper macro for making the VMI paravirt-ops fill code readable.
- * For unimplemented operations, fall back to default, unless nop
- * is returned by the ROM.
- */
-#define para_fill(opname, vmicall) \
-do { \
- reloc = call_vrom_long_func(vmi_rom, get_reloc, \
- VMI_CALL_##vmicall); \
- if (rel->type == VMI_RELOCATION_CALL_REL) \
- opname = (void *)rel->eip; \
- else if (rel->type == VMI_RELOCATION_NOP) \
- opname = (void *)vmi_nop; \
- else if (rel->type != VMI_RELOCATION_NONE) \
- printk(KERN_WARNING "VMI: Unknown relocation " \
- "type %d for " #vmicall"\n",\
- rel->type); \
-} while (0)
-
-/*
- * Helper macro for making the VMI paravirt-ops fill code readable.
- * For cached operations which do not match the VMI ROM ABI and must
- * go through a tranlation stub. Ignore NOPs, since it is not clear
- * a NOP * VMI function corresponds to a NOP paravirt-op when the
- * functions are not in 1-1 correspondence.
- */
-#define para_wrap(opname, wrapper, cache, vmicall) \
-do { \
- reloc = call_vrom_long_func(vmi_rom, get_reloc, \
- VMI_CALL_##vmicall); \
- BUG_ON(rel->type == VMI_RELOCATION_JUMP_REL); \
- if (rel->type == VMI_RELOCATION_CALL_REL) { \
- opname = wrapper; \
- vmi_ops.cache = (void *)rel->eip; \
- } \
-} while (0)
-
-/*
- * Activate the VMI interface and switch into paravirtualized mode
- */
-static inline int __init activate_vmi(void)
-{
- short kernel_cs;
- u64 reloc;
- const struct vmi_relocation_info *rel = (struct vmi_relocation_info *)&reloc;
-
- /*
- * Prevent page tables from being allocated in highmem, even if
- * CONFIG_HIGHPTE is enabled.
- */
- __userpte_alloc_gfp &= ~__GFP_HIGHMEM;
-
- if (call_vrom_func(vmi_rom, vmi_init) != 0) {
- printk(KERN_ERR "VMI ROM failed to initialize!");
- return 0;
- }
- savesegment(cs, kernel_cs);
-
- pv_info.paravirt_enabled = 1;
- pv_info.kernel_rpl = kernel_cs & SEGMENT_RPL_MASK;
- pv_info.name = "vmi [deprecated]";
-
- pv_init_ops.patch = vmi_patch;
-
- /*
- * Many of these operations are ABI compatible with VMI.
- * This means we can fill in the paravirt-ops with direct
- * pointers into the VMI ROM. If the calling convention for
- * these operations changes, this code needs to be updated.
- *
- * Exceptions
- * CPUID paravirt-op uses pointers, not the native ISA
- * halt has no VMI equivalent; all VMI halts are "safe"
- * no MSR support yet - just trap and emulate. VMI uses the
- * same ABI as the native ISA, but Linux wants exceptions
- * from bogus MSR read / write handled
- * rdpmc is not yet used in Linux
- */
-
- /* CPUID is special, so very special it gets wrapped like a present */
- para_wrap(pv_cpu_ops.cpuid, vmi_cpuid, cpuid, CPUID);
-
- para_fill(pv_cpu_ops.clts, CLTS);
- para_fill(pv_cpu_ops.get_debugreg, GetDR);
- para_fill(pv_cpu_ops.set_debugreg, SetDR);
- para_fill(pv_cpu_ops.read_cr0, GetCR0);
- para_fill(pv_mmu_ops.read_cr2, GetCR2);
- para_fill(pv_mmu_ops.read_cr3, GetCR3);
- para_fill(pv_cpu_ops.read_cr4, GetCR4);
- para_fill(pv_cpu_ops.write_cr0, SetCR0);
- para_fill(pv_mmu_ops.write_cr2, SetCR2);
- para_fill(pv_mmu_ops.write_cr3, SetCR3);
- para_fill(pv_cpu_ops.write_cr4, SetCR4);
-
- para_fill(pv_irq_ops.save_fl.func, GetInterruptMask);
- para_fill(pv_irq_ops.restore_fl.func, SetInterruptMask);
- para_fill(pv_irq_ops.irq_disable.func, DisableInterrupts);
- para_fill(pv_irq_ops.irq_enable.func, EnableInterrupts);
-
- para_fill(pv_cpu_ops.wbinvd, WBINVD);
- para_fill(pv_cpu_ops.read_tsc, RDTSC);
-
- /* The following we emulate with trap and emulate for now */
- /* paravirt_ops.read_msr = vmi_rdmsr */
- /* paravirt_ops.write_msr = vmi_wrmsr */
- /* paravirt_ops.rdpmc = vmi_rdpmc */
-
- /* TR interface doesn't pass TR value, wrap */
- para_wrap(pv_cpu_ops.load_tr_desc, vmi_set_tr, set_tr, SetTR);
-
- /* LDT is special, too */
- para_wrap(pv_cpu_ops.set_ldt, vmi_set_ldt, _set_ldt, SetLDT);
-
- para_fill(pv_cpu_ops.load_gdt, SetGDT);
- para_fill(pv_cpu_ops.load_idt, SetIDT);
- para_fill(pv_cpu_ops.store_gdt, GetGDT);
- para_fill(pv_cpu_ops.store_idt, GetIDT);
- para_fill(pv_cpu_ops.store_tr, GetTR);
- pv_cpu_ops.load_tls = vmi_load_tls;
- para_wrap(pv_cpu_ops.write_ldt_entry, vmi_write_ldt_entry,
- write_ldt_entry, WriteLDTEntry);
- para_wrap(pv_cpu_ops.write_gdt_entry, vmi_write_gdt_entry,
- write_gdt_entry, WriteGDTEntry);
- para_wrap(pv_cpu_ops.write_idt_entry, vmi_write_idt_entry,
- write_idt_entry, WriteIDTEntry);
- para_wrap(pv_cpu_ops.load_sp0, vmi_load_sp0, set_kernel_stack, UpdateKernelStack);
- para_fill(pv_cpu_ops.set_iopl_mask, SetIOPLMask);
- para_fill(pv_cpu_ops.io_delay, IODelay);
-
- para_wrap(pv_cpu_ops.start_context_switch, vmi_start_context_switch,
- set_lazy_mode, SetLazyMode);
- para_wrap(pv_cpu_ops.end_context_switch, vmi_end_context_switch,
- set_lazy_mode, SetLazyMode);
-
- para_wrap(pv_mmu_ops.lazy_mode.enter, vmi_enter_lazy_mmu,
- set_lazy_mode, SetLazyMode);
- para_wrap(pv_mmu_ops.lazy_mode.leave, vmi_leave_lazy_mmu,
- set_lazy_mode, SetLazyMode);
-
- /* user and kernel flush are just handled with different flags to FlushTLB */
- para_wrap(pv_mmu_ops.flush_tlb_user, vmi_flush_tlb_user, _flush_tlb, FlushTLB);
- para_wrap(pv_mmu_ops.flush_tlb_kernel, vmi_flush_tlb_kernel, _flush_tlb, FlushTLB);
- para_fill(pv_mmu_ops.flush_tlb_single, InvalPage);
-
- /*
- * Until a standard flag format can be agreed on, we need to
- * implement these as wrappers in Linux. Get the VMI ROM
- * function pointers for the two backend calls.
- */
-#ifdef CONFIG_X86_PAE
- vmi_ops.set_pte = vmi_get_function(VMI_CALL_SetPxELong);
- vmi_ops.update_pte = vmi_get_function(VMI_CALL_UpdatePxELong);
-#else
- vmi_ops.set_pte = vmi_get_function(VMI_CALL_SetPxE);
- vmi_ops.update_pte = vmi_get_function(VMI_CALL_UpdatePxE);
-#endif
-
- if (vmi_ops.set_pte) {
- pv_mmu_ops.set_pte = vmi_set_pte;
- pv_mmu_ops.set_pte_at = vmi_set_pte_at;
- pv_mmu_ops.set_pmd = vmi_set_pmd;
-#ifdef CONFIG_X86_PAE
- pv_mmu_ops.set_pte_atomic = vmi_set_pte_atomic;
- pv_mmu_ops.set_pud = vmi_set_pud;
- pv_mmu_ops.pte_clear = vmi_pte_clear;
- pv_mmu_ops.pmd_clear = vmi_pmd_clear;
-#endif
- }
-
- if (vmi_ops.update_pte) {
- pv_mmu_ops.pte_update = vmi_update_pte;
- pv_mmu_ops.pte_update_defer = vmi_update_pte_defer;
- }
-
- vmi_ops.allocate_page = vmi_get_function(VMI_CALL_AllocatePage);
- if (vmi_ops.allocate_page) {
- pv_mmu_ops.alloc_pte = vmi_allocate_pte;
- pv_mmu_ops.alloc_pmd = vmi_allocate_pmd;
- pv_mmu_ops.alloc_pmd_clone = vmi_allocate_pmd_clone;
- }
-
- vmi_ops.release_page = vmi_get_function(VMI_CALL_ReleasePage);
- if (vmi_ops.release_page) {
- pv_mmu_ops.release_pte = vmi_release_pte;
- pv_mmu_ops.release_pmd = vmi_release_pmd;
- pv_mmu_ops.pgd_free = vmi_pgd_free;
- }
-
- /* Set linear is needed in all cases */
- vmi_ops.set_linear_mapping = vmi_get_function(VMI_CALL_SetLinearMapping);
-
- /*
- * These MUST always be patched. Don't support indirect jumps
- * through these operations, as the VMI interface may use either
- * a jump or a call to get to these operations, depending on
- * the backend. They are performance critical anyway, so requiring
- * a patch is not a big problem.
- */
- pv_cpu_ops.irq_enable_sysexit = (void *)0xfeedbab0;
- pv_cpu_ops.iret = (void *)0xbadbab0;
-
-#ifdef CONFIG_SMP
- para_wrap(pv_apic_ops.startup_ipi_hook, vmi_startup_ipi_hook, set_initial_ap_state, SetInitialAPState);
-#endif
-
-#ifdef CONFIG_X86_LOCAL_APIC
- para_fill(apic->read, APICRead);
- para_fill(apic->write, APICWrite);
-#endif
-
- /*
- * Check for VMI timer functionality by probing for a cycle frequency method
- */
- reloc = call_vrom_long_func(vmi_rom, get_reloc, VMI_CALL_GetCycleFrequency);
- if (!disable_vmi_timer && rel->type != VMI_RELOCATION_NONE) {
- vmi_timer_ops.get_cycle_frequency = (void *)rel->eip;
- vmi_timer_ops.get_cycle_counter =
- vmi_get_function(VMI_CALL_GetCycleCounter);
- vmi_timer_ops.get_wallclock =
- vmi_get_function(VMI_CALL_GetWallclockTime);
- vmi_timer_ops.wallclock_updated =
- vmi_get_function(VMI_CALL_WallclockUpdated);
- vmi_timer_ops.set_alarm = vmi_get_function(VMI_CALL_SetAlarm);
- vmi_timer_ops.cancel_alarm =
- vmi_get_function(VMI_CALL_CancelAlarm);
- x86_init.timers.timer_init = vmi_time_init;
-#ifdef CONFIG_X86_LOCAL_APIC
- x86_init.timers.setup_percpu_clockev = vmi_time_bsp_init;
- x86_cpuinit.setup_percpu_clockev = vmi_time_ap_init;
-#endif
- pv_time_ops.sched_clock = vmi_sched_clock;
- x86_platform.calibrate_tsc = vmi_tsc_khz;
- x86_platform.get_wallclock = vmi_get_wallclock;
- x86_platform.set_wallclock = vmi_set_wallclock;
-
- /* We have true wallclock functions; disable CMOS clock sync */
- no_sync_cmos_clock = 1;
- } else {
- disable_noidle = 1;
- disable_vmi_timer = 1;
- }
-
- para_fill(pv_irq_ops.safe_halt, Halt);
-
- /*
- * Alternative instruction rewriting doesn't happen soon enough
- * to convert VMI_IRET to a call instead of a jump; so we have
- * to do this before IRQs get reenabled. Fortunately, it is
- * idempotent.
- */
- apply_paravirt(__parainstructions, __parainstructions_end);
-
- vmi_bringup();
-
- return 1;
-}
-
-#undef para_fill
-
-void __init vmi_init(void)
-{
- if (!vmi_rom)
- probe_vmi_rom();
- else
- check_vmi_rom(vmi_rom);
-
- /* In case probing for or validating the ROM failed, basil */
- if (!vmi_rom)
- return;
-
- reserve_top_address(-vmi_rom->virtual_top);
-
-#ifdef CONFIG_X86_IO_APIC
- /* This is virtual hardware; timer routing is wired correctly */
- no_timer_check = 1;
-#endif
-}
-
-void __init vmi_activate(void)
-{
- unsigned long flags;
-
- if (!vmi_rom)
- return;
-
- local_irq_save(flags);
- activate_vmi();
- local_irq_restore(flags & X86_EFLAGS_IF);
-}
-
-static int __init parse_vmi(char *arg)
-{
- if (!arg)
- return -EINVAL;
-
- if (!strcmp(arg, "disable_pge")) {
- clear_cpu_cap(&boot_cpu_data, X86_FEATURE_PGE);
- disable_pge = 1;
- } else if (!strcmp(arg, "disable_pse")) {
- clear_cpu_cap(&boot_cpu_data, X86_FEATURE_PSE);
- disable_pse = 1;
- } else if (!strcmp(arg, "disable_sep")) {
- clear_cpu_cap(&boot_cpu_data, X86_FEATURE_SEP);
- disable_sep = 1;
- } else if (!strcmp(arg, "disable_tsc")) {
- clear_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC);
- disable_tsc = 1;
- } else if (!strcmp(arg, "disable_mtrr")) {
- clear_cpu_cap(&boot_cpu_data, X86_FEATURE_MTRR);
- disable_mtrr = 1;
- } else if (!strcmp(arg, "disable_timer")) {
- disable_vmi_timer = 1;
- disable_noidle = 1;
- } else if (!strcmp(arg, "disable_noidle"))
- disable_noidle = 1;
- return 0;
-}
-
-early_param("vmi", parse_vmi);
+++ /dev/null
-/*
- * VMI paravirtual timer support routines.
- *
- * Copyright (C) 2007, VMware, Inc.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License as published by
- * the Free Software Foundation; either version 2 of the License, or
- * (at your option) any later version.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or
- * NON INFRINGEMENT. See the GNU General Public License for more
- * details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
- *
- */
-
-#include <linux/smp.h>
-#include <linux/interrupt.h>
-#include <linux/cpumask.h>
-#include <linux/clocksource.h>
-#include <linux/clockchips.h>
-
-#include <asm/vmi.h>
-#include <asm/vmi_time.h>
-#include <asm/apicdef.h>
-#include <asm/apic.h>
-#include <asm/timer.h>
-#include <asm/i8253.h>
-#include <asm/irq_vectors.h>
-
-#define VMI_ONESHOT (VMI_ALARM_IS_ONESHOT | VMI_CYCLES_REAL | vmi_get_alarm_wiring())
-#define VMI_PERIODIC (VMI_ALARM_IS_PERIODIC | VMI_CYCLES_REAL | vmi_get_alarm_wiring())
-
-static DEFINE_PER_CPU(struct clock_event_device, local_events);
-
-static inline u32 vmi_counter(u32 flags)
-{
- /* Given VMI_ONESHOT or VMI_PERIODIC, return the corresponding
- * cycle counter. */
- return flags & VMI_ALARM_COUNTER_MASK;
-}
-
-/* paravirt_ops.get_wallclock = vmi_get_wallclock */
-unsigned long vmi_get_wallclock(void)
-{
- unsigned long long wallclock;
- wallclock = vmi_timer_ops.get_wallclock(); // nsec
- (void)do_div(wallclock, 1000000000); // sec
-
- return wallclock;
-}
-
-/* paravirt_ops.set_wallclock = vmi_set_wallclock */
-int vmi_set_wallclock(unsigned long now)
-{
- return 0;
-}
-
-/* paravirt_ops.sched_clock = vmi_sched_clock */
-unsigned long long vmi_sched_clock(void)
-{
- return cycles_2_ns(vmi_timer_ops.get_cycle_counter(VMI_CYCLES_AVAILABLE));
-}
-
-/* x86_platform.calibrate_tsc = vmi_tsc_khz */
-unsigned long vmi_tsc_khz(void)
-{
- unsigned long long khz;
- khz = vmi_timer_ops.get_cycle_frequency();
- (void)do_div(khz, 1000);
- return khz;
-}
-
-static inline unsigned int vmi_get_timer_vector(void)
-{
- return IRQ0_VECTOR;
-}
-
-/** vmi clockchip */
-#ifdef CONFIG_X86_LOCAL_APIC
-static unsigned int startup_timer_irq(unsigned int irq)
-{
- unsigned long val = apic_read(APIC_LVTT);
- apic_write(APIC_LVTT, vmi_get_timer_vector());
-
- return (val & APIC_SEND_PENDING);
-}
-
-static void mask_timer_irq(unsigned int irq)
-{
- unsigned long val = apic_read(APIC_LVTT);
- apic_write(APIC_LVTT, val | APIC_LVT_MASKED);
-}
-
-static void unmask_timer_irq(unsigned int irq)
-{
- unsigned long val = apic_read(APIC_LVTT);
- apic_write(APIC_LVTT, val & ~APIC_LVT_MASKED);
-}
-
-static void ack_timer_irq(unsigned int irq)
-{
- ack_APIC_irq();
-}
-
-static struct irq_chip vmi_chip __read_mostly = {
- .name = "VMI-LOCAL",
- .startup = startup_timer_irq,
- .mask = mask_timer_irq,
- .unmask = unmask_timer_irq,
- .ack = ack_timer_irq
-};
-#endif
-
-/** vmi clockevent */
-#define VMI_ALARM_WIRED_IRQ0 0x00000000
-#define VMI_ALARM_WIRED_LVTT 0x00010000
-static int vmi_wiring = VMI_ALARM_WIRED_IRQ0;
-
-static inline int vmi_get_alarm_wiring(void)
-{
- return vmi_wiring;
-}
-
-static void vmi_timer_set_mode(enum clock_event_mode mode,
- struct clock_event_device *evt)
-{
- cycle_t now, cycles_per_hz;
- BUG_ON(!irqs_disabled());
-
- switch (mode) {
- case CLOCK_EVT_MODE_ONESHOT:
- case CLOCK_EVT_MODE_RESUME:
- break;
- case CLOCK_EVT_MODE_PERIODIC:
- cycles_per_hz = vmi_timer_ops.get_cycle_frequency();
- (void)do_div(cycles_per_hz, HZ);
- now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_PERIODIC));
- vmi_timer_ops.set_alarm(VMI_PERIODIC, now, cycles_per_hz);
- break;
- case CLOCK_EVT_MODE_UNUSED:
- case CLOCK_EVT_MODE_SHUTDOWN:
- switch (evt->mode) {
- case CLOCK_EVT_MODE_ONESHOT:
- vmi_timer_ops.cancel_alarm(VMI_ONESHOT);
- break;
- case CLOCK_EVT_MODE_PERIODIC:
- vmi_timer_ops.cancel_alarm(VMI_PERIODIC);
- break;
- default:
- break;
- }
- break;
- default:
- break;
- }
-}
-
-static int vmi_timer_next_event(unsigned long delta,
- struct clock_event_device *evt)
-{
- /* Unfortunately, set_next_event interface only passes relative
- * expiry, but we want absolute expiry. It'd be better if were
- * were passed an absolute expiry, since a bunch of time may
- * have been stolen between the time the delta is computed and
- * when we set the alarm below. */
- cycle_t now = vmi_timer_ops.get_cycle_counter(vmi_counter(VMI_ONESHOT));
-
- BUG_ON(evt->mode != CLOCK_EVT_MODE_ONESHOT);
- vmi_timer_ops.set_alarm(VMI_ONESHOT, now + delta, 0);
- return 0;
-}
-
-static struct clock_event_device vmi_clockevent = {
- .name = "vmi-timer",
- .features = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
- .shift = 22,
- .set_mode = vmi_timer_set_mode,
- .set_next_event = vmi_timer_next_event,
- .rating = 1000,
- .irq = 0,
-};
-
-static irqreturn_t vmi_timer_interrupt(int irq, void *dev_id)
-{
- struct clock_event_device *evt = &__get_cpu_var(local_events);
- evt->event_handler(evt);
- return IRQ_HANDLED;
-}
-
-static struct irqaction vmi_clock_action = {
- .name = "vmi-timer",
- .handler = vmi_timer_interrupt,
- .flags = IRQF_DISABLED | IRQF_NOBALANCING | IRQF_TIMER,
-};
-
-static void __devinit vmi_time_init_clockevent(void)
-{
- cycle_t cycles_per_msec;
- struct clock_event_device *evt;
-
- int cpu = smp_processor_id();
- evt = &__get_cpu_var(local_events);
-
- /* Use cycles_per_msec since div_sc params are 32-bits. */
- cycles_per_msec = vmi_timer_ops.get_cycle_frequency();
- (void)do_div(cycles_per_msec, 1000);
-
- memcpy(evt, &vmi_clockevent, sizeof(*evt));
- /* Must pick .shift such that .mult fits in 32-bits. Choosing
- * .shift to be 22 allows 2^(32-22) cycles per nano-seconds
- * before overflow. */
- evt->mult = div_sc(cycles_per_msec, NSEC_PER_MSEC, evt->shift);
- /* Upper bound is clockevent's use of ulong for cycle deltas. */
- evt->max_delta_ns = clockevent_delta2ns(ULONG_MAX, evt);
- evt->min_delta_ns = clockevent_delta2ns(1, evt);
- evt->cpumask = cpumask_of(cpu);
-
- printk(KERN_WARNING "vmi: registering clock event %s. mult=%u shift=%u\n",
- evt->name, evt->mult, evt->shift);
- clockevents_register_device(evt);
-}
-
-void __init vmi_time_init(void)
-{
- unsigned int cpu;
- /* Disable PIT: BIOSes start PIT CH0 with 18.2hz peridic. */
- outb_pit(0x3a, PIT_MODE); /* binary, mode 5, LSB/MSB, ch 0 */
-
- vmi_time_init_clockevent();
- setup_irq(0, &vmi_clock_action);
- for_each_possible_cpu(cpu)
- per_cpu(vector_irq, cpu)[vmi_get_timer_vector()] = 0;
-}
-
-#ifdef CONFIG_X86_LOCAL_APIC
-void __devinit vmi_time_bsp_init(void)
-{
- /*
- * On APIC systems, we want local timers to fire on each cpu. We do
- * this by programming LVTT to deliver timer events to the IRQ handler
- * for IRQ-0, since we can't re-use the APIC local timer handler
- * without interfering with that code.
- */
- clockevents_notify(CLOCK_EVT_NOTIFY_SUSPEND, NULL);
- local_irq_disable();
-#ifdef CONFIG_SMP
- /*
- * XXX handle_percpu_irq only defined for SMP; we need to switch over
- * to using it, since this is a local interrupt, which each CPU must
- * handle individually without locking out or dropping simultaneous
- * local timers on other CPUs. We also don't want to trigger the
- * quirk workaround code for interrupts which gets invoked from
- * handle_percpu_irq via eoi, so we use our own IRQ chip.
- */
- set_irq_chip_and_handler_name(0, &vmi_chip, handle_percpu_irq, "lvtt");
-#else
- set_irq_chip_and_handler_name(0, &vmi_chip, handle_edge_irq, "lvtt");
-#endif
- vmi_wiring = VMI_ALARM_WIRED_LVTT;
- apic_write(APIC_LVTT, vmi_get_timer_vector());
- local_irq_enable();
- clockevents_notify(CLOCK_EVT_NOTIFY_RESUME, NULL);
-}
-
-void __devinit vmi_time_ap_init(void)
-{
- vmi_time_init_clockevent();
- apic_write(APIC_LVTT, vmi_get_timer_vector());
-}
-#endif
-
-/** vmi clocksource */
-static struct clocksource clocksource_vmi;
-
-static cycle_t read_real_cycles(struct clocksource *cs)
-{
- cycle_t ret = (cycle_t)vmi_timer_ops.get_cycle_counter(VMI_CYCLES_REAL);
- return max(ret, clocksource_vmi.cycle_last);
-}
-
-static struct clocksource clocksource_vmi = {
- .name = "vmi-timer",
- .rating = 450,
- .read = read_real_cycles,
- .mask = CLOCKSOURCE_MASK(64),
- .mult = 0, /* to be set */
- .shift = 22,
- .flags = CLOCK_SOURCE_IS_CONTINUOUS,
-};
-
-static int __init init_vmi_clocksource(void)
-{
- cycle_t cycles_per_msec;
-
- if (!vmi_timer_ops.get_cycle_frequency)
- return 0;
- /* Use khz2mult rather than hz2mult since hz arg is only 32-bits. */
- cycles_per_msec = vmi_timer_ops.get_cycle_frequency();
- (void)do_div(cycles_per_msec, 1000);
-
- /* Note that clocksource.{mult, shift} converts in the opposite direction
- * as clockevents. */
- clocksource_vmi.mult = clocksource_khz2mult(cycles_per_msec,
- clocksource_vmi.shift);
-
- printk(KERN_WARNING "vmi: registering clock source khz=%lld\n", cycles_per_msec);
- return clocksource_register(&clocksource_vmi);
-
-}
-module_init(init_vmi_clocksource);
struct x86_emulate_ops *ops)
{
struct decode_cache *c = &ctxt->decode;
- u64 old = c->dst.orig_val;
+ u64 old = c->dst.orig_val64;
if (((u32) (old >> 0) != (u32) c->regs[VCPU_REGS_RAX]) ||
((u32) (old >> 32) != (u32) c->regs[VCPU_REGS_RDX])) {
-
c->regs[VCPU_REGS_RAX] = (u32) (old >> 0);
c->regs[VCPU_REGS_RDX] = (u32) (old >> 32);
ctxt->eflags &= ~EFLG_ZF;
} else {
- c->dst.val = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
- (u32) c->regs[VCPU_REGS_RBX];
+ c->dst.val64 = ((u64)c->regs[VCPU_REGS_RCX] << 32) |
+ (u32) c->regs[VCPU_REGS_RBX];
ctxt->eflags |= EFLG_ZF;
}
c->src.valptr, c->src.bytes);
if (rc != X86EMUL_CONTINUE)
goto done;
- c->src.orig_val = c->src.val;
+ c->src.orig_val64 = c->src.val64;
}
if (c->src2.type == OP_MEM) {
if (!found)
found = s->kvm->bsp_vcpu;
+ if (!found)
+ return;
+
kvm_vcpu_kick(found);
}
}
u8 irr; /* interrupt request register */
u8 imr; /* interrupt mask register */
u8 isr; /* interrupt service register */
- u8 isr_ack; /* interrupt ack detection */
u8 priority_add; /* highest irq priority */
u8 irq_base;
u8 read_reg_select;
u8 init4; /* true if 4 byte init */
u8 elcr; /* PIIX edge/trigger selection */
u8 elcr_mask;
+ u8 isr_ack; /* interrupt ack detection */
struct kvm_pic *pics_state;
};
vcpu->arch.apic = apic;
- apic->regs_page = alloc_page(GFP_KERNEL);
+ apic->regs_page = alloc_page(GFP_KERNEL|__GFP_ZERO);
if (apic->regs_page == NULL) {
printk(KERN_ERR "malloc apic regs error for vcpu %x\n",
vcpu->vcpu_id);
goto nomem_free_apic;
}
apic->regs = page_address(apic->regs_page);
- memset(apic->regs, 0, PAGE_SIZE);
apic->vcpu = vcpu;
hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
control->iopm_base_pa = iopm_base;
control->msrpm_base_pa = __pa(svm->msrpm);
- control->tsc_offset = 0;
control->int_ctl = V_INTR_MASKING_MASK;
init_seg(&save->es);
svm->vmcb_pa = page_to_pfn(page) << PAGE_SHIFT;
svm->asid_generation = 0;
init_vmcb(svm);
+ svm->vmcb->control.tsc_offset = 0-native_read_tsc();
err = fx_init(&svm->vcpu);
if (err)
sync_lapic_to_cr8(vcpu);
save_host_msrs(vcpu);
- fs_selector = kvm_read_fs();
- gs_selector = kvm_read_gs();
+ savesegment(fs, fs_selector);
+ savesegment(gs, gs_selector);
ldt_selector = kvm_read_ldt();
svm->vmcb->save.cr2 = vcpu->arch.cr2;
/* required for live migration with NPT */
vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp;
vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip;
- kvm_load_fs(fs_selector);
- kvm_load_gs(gs_selector);
- kvm_load_ldt(ldt_selector);
load_host_msrs(vcpu);
+ loadsegment(fs, fs_selector);
+#ifdef CONFIG_X86_64
+ load_gs_index(gs_selector);
+ wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gs);
+#else
+ loadsegment(gs, gs_selector);
+#endif
+ kvm_load_ldt(ldt_selector);
reload_tss(vcpu);
*/
vmx->host_state.ldt_sel = kvm_read_ldt();
vmx->host_state.gs_ldt_reload_needed = vmx->host_state.ldt_sel;
- vmx->host_state.fs_sel = kvm_read_fs();
+ savesegment(fs, vmx->host_state.fs_sel);
if (!(vmx->host_state.fs_sel & 7)) {
vmcs_write16(HOST_FS_SELECTOR, vmx->host_state.fs_sel);
vmx->host_state.fs_reload_needed = 0;
vmcs_write16(HOST_FS_SELECTOR, 0);
vmx->host_state.fs_reload_needed = 1;
}
- vmx->host_state.gs_sel = kvm_read_gs();
+ savesegment(gs, vmx->host_state.gs_sel);
if (!(vmx->host_state.gs_sel & 7))
vmcs_write16(HOST_GS_SELECTOR, vmx->host_state.gs_sel);
else {
static void __vmx_load_host_state(struct vcpu_vmx *vmx)
{
- unsigned long flags;
-
if (!vmx->host_state.loaded)
return;
++vmx->vcpu.stat.host_state_reload;
vmx->host_state.loaded = 0;
if (vmx->host_state.fs_reload_needed)
- kvm_load_fs(vmx->host_state.fs_sel);
+ loadsegment(fs, vmx->host_state.fs_sel);
if (vmx->host_state.gs_ldt_reload_needed) {
kvm_load_ldt(vmx->host_state.ldt_sel);
- /*
- * If we have to reload gs, we must take care to
- * preserve our gs base.
- */
- local_irq_save(flags);
- kvm_load_gs(vmx->host_state.gs_sel);
#ifdef CONFIG_X86_64
- wrmsrl(MSR_GS_BASE, vmcs_readl(HOST_GS_BASE));
+ load_gs_index(vmx->host_state.gs_sel);
+ wrmsrl(MSR_KERNEL_GS_BASE, current->thread.gs);
+#else
+ loadsegment(gs, vmx->host_state.gs_sel);
#endif
- local_irq_restore(flags);
}
reload_tss();
#ifdef CONFIG_X86_64
vmcs_write16(HOST_CS_SELECTOR, __KERNEL_CS); /* 22.2.4 */
vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS); /* 22.2.4 */
vmcs_write16(HOST_ES_SELECTOR, __KERNEL_DS); /* 22.2.4 */
- vmcs_write16(HOST_FS_SELECTOR, kvm_read_fs()); /* 22.2.4 */
- vmcs_write16(HOST_GS_SELECTOR, kvm_read_gs()); /* 22.2.4 */
+ vmcs_write16(HOST_FS_SELECTOR, 0); /* 22.2.4 */
+ vmcs_write16(HOST_GS_SELECTOR, 0); /* 22.2.4 */
vmcs_write16(HOST_SS_SELECTOR, __KERNEL_DS); /* 22.2.4 */
#ifdef CONFIG_X86_64
rdmsrl(MSR_FS_BASE, a);
0 /* Reserved */ | F(CX16) | 0 /* xTPR Update, PDCM */ |
0 /* Reserved, DCA */ | F(XMM4_1) |
F(XMM4_2) | F(X2APIC) | F(MOVBE) | F(POPCNT) |
- 0 /* Reserved, AES */ | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX);
+ 0 /* Reserved*/ | F(AES) | F(XSAVE) | 0 /* OSXSAVE */ | F(AVX) |
+ F(F16C);
/* cpuid 0x80000001.ecx */
const u32 kvm_supported_word6_x86_features =
F(LAHF_LM) | F(CMP_LEGACY) | F(SVM) | 0 /* ExtApicSpace */ |
F(CR8_LEGACY) | F(ABM) | F(SSE4A) | F(MISALIGNSSE) |
- F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(SSE5) |
- 0 /* SKINIT */ | 0 /* WDT */;
+ F(3DNOWPREFETCH) | 0 /* OSVW */ | 0 /* IBS */ | F(XOP) |
+ 0 /* SKINIT, WDT, LWP */ | F(FMA4) | F(TBM);
/* all calls to cpuid_count() should be made on the same cpu */
get_cpu();
}
/*
- * For a single GDT entry which changes, we do the lazy thing: alter our GDT,
- * then tell the Host to reload the entire thing. This operation is so rare
- * that this naive implementation is reasonable.
+ * For a single GDT entry which changes, we simply change our copy and
+ * then tell the host about it.
*/
static void lguest_write_gdt_entry(struct desc_struct *dt, int entrynum,
const void *desc, int type)
}
/*
- * OK, I lied. There are three "thread local storage" GDT entries which change
+ * There are three "thread local storage" GDT entries which change
* on every context switch (these three entries are how glibc implements
- * __thread variables). So we have a hypercall specifically for this case.
+ * __thread variables). As an optimization, we have a hypercall
+ * specifically for this case.
+ *
+ * Wouldn't it be nicer to have a general LOAD_GDT_ENTRIES hypercall
+ * which took a range of entries?
*/
static void lguest_load_tls(struct thread_struct *t, unsigned int cpu)
{
void *memmove(void *dest, const void *src, size_t n)
{
- int d0, d1, d2;
-
- if (dest < src) {
- memcpy(dest, src, n);
- } else {
- __asm__ __volatile__(
- "std\n\t"
- "rep\n\t"
- "movsb\n\t"
- "cld"
- : "=&c" (d0), "=&S" (d1), "=&D" (d2)
- :"0" (n),
- "1" (n-1+src),
- "2" (n-1+dest)
- :"memory");
- }
- return dest;
+ int d0,d1,d2,d3,d4,d5;
+ char *ret = dest;
+
+ __asm__ __volatile__(
+ /* Handle more 16bytes in loop */
+ "cmp $0x10, %0\n\t"
+ "jb 1f\n\t"
+
+ /* Decide forward/backward copy mode */
+ "cmp %2, %1\n\t"
+ "jb 2f\n\t"
+
+ /*
+ * movs instruction have many startup latency
+ * so we handle small size by general register.
+ */
+ "cmp $680, %0\n\t"
+ "jb 3f\n\t"
+ /*
+ * movs instruction is only good for aligned case.
+ */
+ "mov %1, %3\n\t"
+ "xor %2, %3\n\t"
+ "and $0xff, %3\n\t"
+ "jz 4f\n\t"
+ "3:\n\t"
+ "sub $0x10, %0\n\t"
+
+ /*
+ * We gobble 16byts forward in each loop.
+ */
+ "3:\n\t"
+ "sub $0x10, %0\n\t"
+ "mov 0*4(%1), %3\n\t"
+ "mov 1*4(%1), %4\n\t"
+ "mov %3, 0*4(%2)\n\t"
+ "mov %4, 1*4(%2)\n\t"
+ "mov 2*4(%1), %3\n\t"
+ "mov 3*4(%1), %4\n\t"
+ "mov %3, 2*4(%2)\n\t"
+ "mov %4, 3*4(%2)\n\t"
+ "lea 0x10(%1), %1\n\t"
+ "lea 0x10(%2), %2\n\t"
+ "jae 3b\n\t"
+ "add $0x10, %0\n\t"
+ "jmp 1f\n\t"
+
+ /*
+ * Handle data forward by movs.
+ */
+ ".p2align 4\n\t"
+ "4:\n\t"
+ "mov -4(%1, %0), %3\n\t"
+ "lea -4(%2, %0), %4\n\t"
+ "shr $2, %0\n\t"
+ "rep movsl\n\t"
+ "mov %3, (%4)\n\t"
+ "jmp 11f\n\t"
+ /*
+ * Handle data backward by movs.
+ */
+ ".p2align 4\n\t"
+ "6:\n\t"
+ "mov (%1), %3\n\t"
+ "mov %2, %4\n\t"
+ "lea -4(%1, %0), %1\n\t"
+ "lea -4(%2, %0), %2\n\t"
+ "shr $2, %0\n\t"
+ "std\n\t"
+ "rep movsl\n\t"
+ "mov %3,(%4)\n\t"
+ "cld\n\t"
+ "jmp 11f\n\t"
+
+ /*
+ * Start to prepare for backward copy.
+ */
+ ".p2align 4\n\t"
+ "2:\n\t"
+ "cmp $680, %0\n\t"
+ "jb 5f\n\t"
+ "mov %1, %3\n\t"
+ "xor %2, %3\n\t"
+ "and $0xff, %3\n\t"
+ "jz 6b\n\t"
+
+ /*
+ * Calculate copy position to tail.
+ */
+ "5:\n\t"
+ "add %0, %1\n\t"
+ "add %0, %2\n\t"
+ "sub $0x10, %0\n\t"
+
+ /*
+ * We gobble 16byts backward in each loop.
+ */
+ "7:\n\t"
+ "sub $0x10, %0\n\t"
+
+ "mov -1*4(%1), %3\n\t"
+ "mov -2*4(%1), %4\n\t"
+ "mov %3, -1*4(%2)\n\t"
+ "mov %4, -2*4(%2)\n\t"
+ "mov -3*4(%1), %3\n\t"
+ "mov -4*4(%1), %4\n\t"
+ "mov %3, -3*4(%2)\n\t"
+ "mov %4, -4*4(%2)\n\t"
+ "lea -0x10(%1), %1\n\t"
+ "lea -0x10(%2), %2\n\t"
+ "jae 7b\n\t"
+ /*
+ * Calculate copy position to head.
+ */
+ "add $0x10, %0\n\t"
+ "sub %0, %1\n\t"
+ "sub %0, %2\n\t"
+
+ /*
+ * Move data from 8 bytes to 15 bytes.
+ */
+ ".p2align 4\n\t"
+ "1:\n\t"
+ "cmp $8, %0\n\t"
+ "jb 8f\n\t"
+ "mov 0*4(%1), %3\n\t"
+ "mov 1*4(%1), %4\n\t"
+ "mov -2*4(%1, %0), %5\n\t"
+ "mov -1*4(%1, %0), %1\n\t"
+
+ "mov %3, 0*4(%2)\n\t"
+ "mov %4, 1*4(%2)\n\t"
+ "mov %5, -2*4(%2, %0)\n\t"
+ "mov %1, -1*4(%2, %0)\n\t"
+ "jmp 11f\n\t"
+
+ /*
+ * Move data from 4 bytes to 7 bytes.
+ */
+ ".p2align 4\n\t"
+ "8:\n\t"
+ "cmp $4, %0\n\t"
+ "jb 9f\n\t"
+ "mov 0*4(%1), %3\n\t"
+ "mov -1*4(%1, %0), %4\n\t"
+ "mov %3, 0*4(%2)\n\t"
+ "mov %4, -1*4(%2, %0)\n\t"
+ "jmp 11f\n\t"
+
+ /*
+ * Move data from 2 bytes to 3 bytes.
+ */
+ ".p2align 4\n\t"
+ "9:\n\t"
+ "cmp $2, %0\n\t"
+ "jb 10f\n\t"
+ "movw 0*2(%1), %%dx\n\t"
+ "movw -1*2(%1, %0), %%bx\n\t"
+ "movw %%dx, 0*2(%2)\n\t"
+ "movw %%bx, -1*2(%2, %0)\n\t"
+ "jmp 11f\n\t"
+
+ /*
+ * Move data for 1 byte.
+ */
+ ".p2align 4\n\t"
+ "10:\n\t"
+ "cmp $1, %0\n\t"
+ "jb 11f\n\t"
+ "movb (%1), %%cl\n\t"
+ "movb %%cl, (%2)\n\t"
+ ".p2align 4\n\t"
+ "11:"
+ : "=&c" (d0), "=&S" (d1), "=&D" (d2),
+ "=r" (d3),"=r" (d4), "=r"(d5)
+ :"0" (n),
+ "1" (src),
+ "2" (dest)
+ :"memory");
+
+ return ret;
+
}
EXPORT_SYMBOL(memmove);
ENTRY(__memcpy)
ENTRY(memcpy)
CFI_STARTPROC
+ movq %rdi, %rax
/*
- * Put the number of full 64-byte blocks into %ecx.
- * Tail portion is handled at the end:
+ * Use 32bit CMP here to avoid long NOP padding.
*/
- movq %rdi, %rax
- movl %edx, %ecx
- shrl $6, %ecx
- jz .Lhandle_tail
+ cmp $0x20, %edx
+ jb .Lhandle_tail
- .p2align 4
-.Lloop_64:
/*
- * We decrement the loop index here - and the zero-flag is
- * checked at the end of the loop (instructions inbetween do
- * not change the zero flag):
+ * We check whether memory false dependece could occur,
+ * then jump to corresponding copy mode.
*/
- decl %ecx
+ cmp %dil, %sil
+ jl .Lcopy_backward
+ subl $0x20, %edx
+.Lcopy_forward_loop:
+ subq $0x20, %rdx
/*
- * Move in blocks of 4x16 bytes:
+ * Move in blocks of 4x8 bytes:
*/
- movq 0*8(%rsi), %r11
- movq 1*8(%rsi), %r8
- movq %r11, 0*8(%rdi)
- movq %r8, 1*8(%rdi)
-
- movq 2*8(%rsi), %r9
- movq 3*8(%rsi), %r10
- movq %r9, 2*8(%rdi)
- movq %r10, 3*8(%rdi)
-
- movq 4*8(%rsi), %r11
- movq 5*8(%rsi), %r8
- movq %r11, 4*8(%rdi)
- movq %r8, 5*8(%rdi)
-
- movq 6*8(%rsi), %r9
- movq 7*8(%rsi), %r10
- movq %r9, 6*8(%rdi)
- movq %r10, 7*8(%rdi)
-
- leaq 64(%rsi), %rsi
- leaq 64(%rdi), %rdi
-
- jnz .Lloop_64
+ movq 0*8(%rsi), %r8
+ movq 1*8(%rsi), %r9
+ movq 2*8(%rsi), %r10
+ movq 3*8(%rsi), %r11
+ leaq 4*8(%rsi), %rsi
+
+ movq %r8, 0*8(%rdi)
+ movq %r9, 1*8(%rdi)
+ movq %r10, 2*8(%rdi)
+ movq %r11, 3*8(%rdi)
+ leaq 4*8(%rdi), %rdi
+ jae .Lcopy_forward_loop
+ addq $0x20, %rdx
+ jmp .Lhandle_tail
+
+.Lcopy_backward:
+ /*
+ * Calculate copy position to tail.
+ */
+ addq %rdx, %rsi
+ addq %rdx, %rdi
+ subq $0x20, %rdx
+ /*
+ * At most 3 ALU operations in one cycle,
+ * so append NOPS in the same 16bytes trunk.
+ */
+ .p2align 4
+.Lcopy_backward_loop:
+ subq $0x20, %rdx
+ movq -1*8(%rsi), %r8
+ movq -2*8(%rsi), %r9
+ movq -3*8(%rsi), %r10
+ movq -4*8(%rsi), %r11
+ leaq -4*8(%rsi), %rsi
+ movq %r8, -1*8(%rdi)
+ movq %r9, -2*8(%rdi)
+ movq %r10, -3*8(%rdi)
+ movq %r11, -4*8(%rdi)
+ leaq -4*8(%rdi), %rdi
+ jae .Lcopy_backward_loop
+ /*
+ * Calculate copy position to head.
+ */
+ addq $0x20, %rdx
+ subq %rdx, %rsi
+ subq %rdx, %rdi
.Lhandle_tail:
- movl %edx, %ecx
- andl $63, %ecx
- shrl $3, %ecx
- jz .Lhandle_7
+ cmpq $16, %rdx
+ jb .Lless_16bytes
+ /*
+ * Move data from 16 bytes to 31 bytes.
+ */
+ movq 0*8(%rsi), %r8
+ movq 1*8(%rsi), %r9
+ movq -2*8(%rsi, %rdx), %r10
+ movq -1*8(%rsi, %rdx), %r11
+ movq %r8, 0*8(%rdi)
+ movq %r9, 1*8(%rdi)
+ movq %r10, -2*8(%rdi, %rdx)
+ movq %r11, -1*8(%rdi, %rdx)
+ retq
.p2align 4
-.Lloop_8:
- decl %ecx
- movq (%rsi), %r8
- movq %r8, (%rdi)
- leaq 8(%rdi), %rdi
- leaq 8(%rsi), %rsi
- jnz .Lloop_8
-
-.Lhandle_7:
- movl %edx, %ecx
- andl $7, %ecx
- jz .Lend
+.Lless_16bytes:
+ cmpq $8, %rdx
+ jb .Lless_8bytes
+ /*
+ * Move data from 8 bytes to 15 bytes.
+ */
+ movq 0*8(%rsi), %r8
+ movq -1*8(%rsi, %rdx), %r9
+ movq %r8, 0*8(%rdi)
+ movq %r9, -1*8(%rdi, %rdx)
+ retq
+ .p2align 4
+.Lless_8bytes:
+ cmpq $4, %rdx
+ jb .Lless_3bytes
+ /*
+ * Move data from 4 bytes to 7 bytes.
+ */
+ movl (%rsi), %ecx
+ movl -4(%rsi, %rdx), %r8d
+ movl %ecx, (%rdi)
+ movl %r8d, -4(%rdi, %rdx)
+ retq
.p2align 4
+.Lless_3bytes:
+ cmpl $0, %edx
+ je .Lend
+ /*
+ * Move data from 1 bytes to 3 bytes.
+ */
.Lloop_1:
movb (%rsi), %r8b
movb %r8b, (%rdi)
incq %rdi
incq %rsi
- decl %ecx
+ decl %edx
jnz .Lloop_1
.Lend:
- ret
+ retq
CFI_ENDPROC
ENDPROC(memcpy)
ENDPROC(__memcpy)
#undef memmove
void *memmove(void *dest, const void *src, size_t count)
{
- if (dest < src) {
- return memcpy(dest, src, count);
- } else {
- char *p = dest + count;
- const char *s = src + count;
- while (count--)
- *--p = *--s;
- }
- return dest;
+ unsigned long d0,d1,d2,d3,d4,d5,d6,d7;
+ char *ret;
+
+ __asm__ __volatile__(
+ /* Handle more 32bytes in loop */
+ "mov %2, %3\n\t"
+ "cmp $0x20, %0\n\t"
+ "jb 1f\n\t"
+
+ /* Decide forward/backward copy mode */
+ "cmp %2, %1\n\t"
+ "jb 2f\n\t"
+
+ /*
+ * movsq instruction have many startup latency
+ * so we handle small size by general register.
+ */
+ "cmp $680, %0\n\t"
+ "jb 3f\n\t"
+ /*
+ * movsq instruction is only good for aligned case.
+ */
+ "cmpb %%dil, %%sil\n\t"
+ "je 4f\n\t"
+ "3:\n\t"
+ "sub $0x20, %0\n\t"
+ /*
+ * We gobble 32byts forward in each loop.
+ */
+ "5:\n\t"
+ "sub $0x20, %0\n\t"
+ "movq 0*8(%1), %4\n\t"
+ "movq 1*8(%1), %5\n\t"
+ "movq 2*8(%1), %6\n\t"
+ "movq 3*8(%1), %7\n\t"
+ "leaq 4*8(%1), %1\n\t"
+
+ "movq %4, 0*8(%2)\n\t"
+ "movq %5, 1*8(%2)\n\t"
+ "movq %6, 2*8(%2)\n\t"
+ "movq %7, 3*8(%2)\n\t"
+ "leaq 4*8(%2), %2\n\t"
+ "jae 5b\n\t"
+ "addq $0x20, %0\n\t"
+ "jmp 1f\n\t"
+ /*
+ * Handle data forward by movsq.
+ */
+ ".p2align 4\n\t"
+ "4:\n\t"
+ "movq %0, %8\n\t"
+ "movq -8(%1, %0), %4\n\t"
+ "lea -8(%2, %0), %5\n\t"
+ "shrq $3, %8\n\t"
+ "rep movsq\n\t"
+ "movq %4, (%5)\n\t"
+ "jmp 13f\n\t"
+ /*
+ * Handle data backward by movsq.
+ */
+ ".p2align 4\n\t"
+ "7:\n\t"
+ "movq %0, %8\n\t"
+ "movq (%1), %4\n\t"
+ "movq %2, %5\n\t"
+ "leaq -8(%1, %0), %1\n\t"
+ "leaq -8(%2, %0), %2\n\t"
+ "shrq $3, %8\n\t"
+ "std\n\t"
+ "rep movsq\n\t"
+ "cld\n\t"
+ "movq %4, (%5)\n\t"
+ "jmp 13f\n\t"
+
+ /*
+ * Start to prepare for backward copy.
+ */
+ ".p2align 4\n\t"
+ "2:\n\t"
+ "cmp $680, %0\n\t"
+ "jb 6f \n\t"
+ "cmp %%dil, %%sil\n\t"
+ "je 7b \n\t"
+ "6:\n\t"
+ /*
+ * Calculate copy position to tail.
+ */
+ "addq %0, %1\n\t"
+ "addq %0, %2\n\t"
+ "subq $0x20, %0\n\t"
+ /*
+ * We gobble 32byts backward in each loop.
+ */
+ "8:\n\t"
+ "subq $0x20, %0\n\t"
+ "movq -1*8(%1), %4\n\t"
+ "movq -2*8(%1), %5\n\t"
+ "movq -3*8(%1), %6\n\t"
+ "movq -4*8(%1), %7\n\t"
+ "leaq -4*8(%1), %1\n\t"
+
+ "movq %4, -1*8(%2)\n\t"
+ "movq %5, -2*8(%2)\n\t"
+ "movq %6, -3*8(%2)\n\t"
+ "movq %7, -4*8(%2)\n\t"
+ "leaq -4*8(%2), %2\n\t"
+ "jae 8b\n\t"
+ /*
+ * Calculate copy position to head.
+ */
+ "addq $0x20, %0\n\t"
+ "subq %0, %1\n\t"
+ "subq %0, %2\n\t"
+ "1:\n\t"
+ "cmpq $16, %0\n\t"
+ "jb 9f\n\t"
+ /*
+ * Move data from 16 bytes to 31 bytes.
+ */
+ "movq 0*8(%1), %4\n\t"
+ "movq 1*8(%1), %5\n\t"
+ "movq -2*8(%1, %0), %6\n\t"
+ "movq -1*8(%1, %0), %7\n\t"
+ "movq %4, 0*8(%2)\n\t"
+ "movq %5, 1*8(%2)\n\t"
+ "movq %6, -2*8(%2, %0)\n\t"
+ "movq %7, -1*8(%2, %0)\n\t"
+ "jmp 13f\n\t"
+ ".p2align 4\n\t"
+ "9:\n\t"
+ "cmpq $8, %0\n\t"
+ "jb 10f\n\t"
+ /*
+ * Move data from 8 bytes to 15 bytes.
+ */
+ "movq 0*8(%1), %4\n\t"
+ "movq -1*8(%1, %0), %5\n\t"
+ "movq %4, 0*8(%2)\n\t"
+ "movq %5, -1*8(%2, %0)\n\t"
+ "jmp 13f\n\t"
+ "10:\n\t"
+ "cmpq $4, %0\n\t"
+ "jb 11f\n\t"
+ /*
+ * Move data from 4 bytes to 7 bytes.
+ */
+ "movl (%1), %4d\n\t"
+ "movl -4(%1, %0), %5d\n\t"
+ "movl %4d, (%2)\n\t"
+ "movl %5d, -4(%2, %0)\n\t"
+ "jmp 13f\n\t"
+ "11:\n\t"
+ "cmp $2, %0\n\t"
+ "jb 12f\n\t"
+ /*
+ * Move data from 2 bytes to 3 bytes.
+ */
+ "movw (%1), %4w\n\t"
+ "movw -2(%1, %0), %5w\n\t"
+ "movw %4w, (%2)\n\t"
+ "movw %5w, -2(%2, %0)\n\t"
+ "jmp 13f\n\t"
+ "12:\n\t"
+ "cmp $1, %0\n\t"
+ "jb 13f\n\t"
+ /*
+ * Move data for 1 byte.
+ */
+ "movb (%1), %4b\n\t"
+ "movb %4b, (%2)\n\t"
+ "13:\n\t"
+ : "=&d" (d0), "=&S" (d1), "=&D" (d2), "=&a" (ret) ,
+ "=r"(d3), "=r"(d4), "=r"(d5), "=r"(d6), "=&c" (d7)
+ :"0" (count),
+ "1" (src),
+ "2" (dest)
+ :"memory");
+
+ return ret;
+
}
EXPORT_SYMBOL(memmove);
spin_lock_irqsave(&pgd_lock, flags);
list_for_each_entry(page, &pgd_list, lru) {
- if (!vmalloc_sync_one(page_address(page), address))
+ spinlock_t *pgt_lock;
+ pmd_t *ret;
+
+ pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
+
+ spin_lock(pgt_lock);
+ ret = vmalloc_sync_one(page_address(page), address);
+ spin_unlock(pgt_lock);
+
+ if (!ret)
break;
}
spin_unlock_irqrestore(&pgd_lock, flags);
if (!(address >= VMALLOC_START && address < VMALLOC_END))
return -1;
+ WARN_ON_ONCE(in_nmi());
+
/*
* Synchronize this task's top level page-table
* with the 'reference' page table.
void vmalloc_sync_all(void)
{
- unsigned long address;
-
- for (address = VMALLOC_START & PGDIR_MASK; address <= VMALLOC_END;
- address += PGDIR_SIZE) {
-
- const pgd_t *pgd_ref = pgd_offset_k(address);
- unsigned long flags;
- struct page *page;
-
- if (pgd_none(*pgd_ref))
- continue;
-
- spin_lock_irqsave(&pgd_lock, flags);
- list_for_each_entry(page, &pgd_list, lru) {
- pgd_t *pgd;
- pgd = (pgd_t *)page_address(page) + pgd_index(address);
- if (pgd_none(*pgd))
- set_pgd(pgd, *pgd_ref);
- else
- BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
- }
- spin_unlock_irqrestore(&pgd_lock, flags);
- }
+ sync_global_pgds(VMALLOC_START & PGDIR_MASK, VMALLOC_END);
}
/*
if (!(address >= VMALLOC_START && address < VMALLOC_END))
return -1;
+ WARN_ON_ONCE(in_nmi());
+
/*
* Copy kernel mappings over when needed. This can also
* happen within a race in page table update. In the later
if (pmd_large(*pmd))
return spurious_fault_check(error_code, (pte_t *) pmd);
+ /*
+ * Note: don't use pte_present() here, since it returns true
+ * if the _PAGE_PROTNONE bit is set. However, this aliases the
+ * _PAGE_GLOBAL bit, which for kernel pages give false positives
+ * when CONFIG_DEBUG_PAGEALLOC is used.
+ */
pte = pte_offset_kernel(pmd, address);
- if (!pte_present(*pte))
+ if (!(pte_flags(*pte) & _PAGE_PRESENT))
return 0;
ret = spurious_fault_check(error_code, pte);
panic("alloc_low_page: ran out of memory");
adr = __va(pfn * PAGE_SIZE);
- memset(adr, 0, PAGE_SIZE);
+ clear_page(adr);
return adr;
}
static inline void save_pg_dir(void)
{
- memcpy(swsusp_pg_dir, swapper_pg_dir, PAGE_SIZE);
+ copy_page(swsusp_pg_dir, swapper_pg_dir);
}
#else /* !CONFIG_ACPI_SLEEP */
static inline void save_pg_dir(void)
__setup("noexec32=", nonx32_setup);
/*
+ * When memory was added/removed make sure all the processes MM have
+ * suitable PGD entries in the local PGD level page.
+ */
+void sync_global_pgds(unsigned long start, unsigned long end)
+{
+ unsigned long address;
+
+ for (address = start; address <= end; address += PGDIR_SIZE) {
+ const pgd_t *pgd_ref = pgd_offset_k(address);
+ unsigned long flags;
+ struct page *page;
+
+ if (pgd_none(*pgd_ref))
+ continue;
+
+ spin_lock_irqsave(&pgd_lock, flags);
+ list_for_each_entry(page, &pgd_list, lru) {
+ pgd_t *pgd;
+ spinlock_t *pgt_lock;
+
+ pgd = (pgd_t *)page_address(page) + pgd_index(address);
+ pgt_lock = &pgd_page_get_mm(page)->page_table_lock;
+ spin_lock(pgt_lock);
+
+ if (pgd_none(*pgd))
+ set_pgd(pgd, *pgd_ref);
+ else
+ BUG_ON(pgd_page_vaddr(*pgd)
+ != pgd_page_vaddr(*pgd_ref));
+
+ spin_unlock(pgt_lock);
+ }
+ spin_unlock_irqrestore(&pgd_lock, flags);
+ }
+}
+
+/*
* NOTE: This function is marked __ref because it calls __init function
* (alloc_bootmem_pages). It's safe to do it ONLY when after_bootmem == 0.
*/
panic("alloc_low_page: ran out of memory");
adr = early_memremap(pfn * PAGE_SIZE, PAGE_SIZE);
- memset(adr, 0, PAGE_SIZE);
+ clear_page(adr);
*phys = pfn * PAGE_SIZE;
return adr;
}
unsigned long end,
unsigned long page_size_mask)
{
-
+ bool pgd_changed = false;
unsigned long next, last_map_addr = end;
+ unsigned long addr;
start = (unsigned long)__va(start);
end = (unsigned long)__va(end);
+ addr = start;
for (; start < end; start = next) {
pgd_t *pgd = pgd_offset_k(start);
spin_lock(&init_mm.page_table_lock);
pgd_populate(&init_mm, pgd, __va(pud_phys));
spin_unlock(&init_mm.page_table_lock);
+ pgd_changed = true;
}
+
+ if (pgd_changed)
+ sync_global_pgds(addr, end);
+
__flush_tlb_all();
return last_map_addr;
}
}
+ sync_global_pgds((unsigned long)start_page, end);
return 0;
}
/*
* Map 'pfn' using fixed map 'type' and protections 'prot'
*/
-void *
+void __iomem *
iomap_atomic_prot_pfn(unsigned long pfn, enum km_type type, pgprot_t prot)
{
/*
if (!pat_enabled && pgprot_val(prot) == pgprot_val(PAGE_KERNEL_WC))
prot = PAGE_KERNEL_UC_MINUS;
- return kmap_atomic_prot_pfn(pfn, type, prot);
+ return (void __force __iomem *) kmap_atomic_prot_pfn(pfn, type, prot);
}
EXPORT_SYMBOL_GPL(iomap_atomic_prot_pfn);
void
-iounmap_atomic(void *kvaddr, enum km_type type)
+iounmap_atomic(void __iomem *kvaddr, enum km_type type)
{
unsigned long vaddr = (unsigned long) kvaddr & PAGE_MASK;
enum fixed_addresses idx = type + KM_TYPE_NR*smp_processor_id();
#include <asm/numa.h>
#include <asm/mpspec.h>
#include <asm/apic.h>
-#include <asm/k8.h>
+#include <asm/amd_nb.h>
static struct bootnode __initdata nodes[8];
static nodemask_t __initdata nodes_parsed = NODE_MASK_NONE;
static __init void early_get_boot_cpu_id(void)
{
/*
- * need to get boot_cpu_id so can use that to create apicid_to_node
- * in k8_scan_nodes()
+ * need to get the APIC ID of the BSP so can use that to
+ * create apicid_to_node in k8_scan_nodes()
*/
#ifdef CONFIG_X86_MPPARSE
/*
bits = boot_cpu_data.x86_coreid_bits;
cores = (1<<bits);
apicid_base = 0;
- /* need to get boot_cpu_id early for system with apicid lifting */
+ /* get the APIC ID of the BSP early for systems with apicid lifting */
early_get_boot_cpu_id();
if (boot_cpu_physical_apicid > 0) {
pr_info("BSP APIC ID: %02x\n", boot_cpu_physical_apicid);
if (!pte)
return false;
+ WARN_ON_ONCE(in_nmi());
+
if (error_code & 2)
kmemcheck_access(regs, address, KMEMCHECK_WRITE);
else
b == 0xf0 || b == 0xf2 || b == 0xf3
/* Group 2 */
|| b == 0x2e || b == 0x36 || b == 0x3e || b == 0x26
- || b == 0x64 || b == 0x65 || b == 0x2e || b == 0x3e
+ || b == 0x64 || b == 0x65
/* Group 3 */
|| b == 0x66
/* Group 4 */
#include <asm/dma.h>
#include <asm/numa.h>
#include <asm/acpi.h>
-#include <asm/k8.h>
+#include <asm/amd_nb.h>
struct pglist_data *node_data[MAX_NUMNODES] __read_mostly;
EXPORT_SYMBOL(node_data);
#define UNSHARED_PTRS_PER_PGD \
(SHARED_KERNEL_PMD ? KERNEL_PGD_BOUNDARY : PTRS_PER_PGD)
-static void pgd_ctor(pgd_t *pgd)
+
+static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
+{
+ BUILD_BUG_ON(sizeof(virt_to_page(pgd)->index) < sizeof(mm));
+ virt_to_page(pgd)->index = (pgoff_t)mm;
+}
+
+struct mm_struct *pgd_page_get_mm(struct page *page)
+{
+ return (struct mm_struct *)page->index;
+}
+
+static void pgd_ctor(struct mm_struct *mm, pgd_t *pgd)
{
/* If the pgd points to a shared pagetable level (either the
ptes in non-PAE, or shared PMD in PAE), then just copy the
clone_pgd_range(pgd + KERNEL_PGD_BOUNDARY,
swapper_pg_dir + KERNEL_PGD_BOUNDARY,
KERNEL_PGD_PTRS);
- paravirt_alloc_pmd_clone(__pa(pgd) >> PAGE_SHIFT,
- __pa(swapper_pg_dir) >> PAGE_SHIFT,
- KERNEL_PGD_BOUNDARY,
- KERNEL_PGD_PTRS);
}
/* list required to sync kernel mapping updates */
- if (!SHARED_KERNEL_PMD)
+ if (!SHARED_KERNEL_PMD) {
+ pgd_set_mm(pgd, mm);
pgd_list_add(pgd);
+ }
}
static void pgd_dtor(pgd_t *pgd)
*/
spin_lock_irqsave(&pgd_lock, flags);
- pgd_ctor(pgd);
+ pgd_ctor(mm, pgd);
pgd_prepopulate_pmd(mm, pgd, pmds);
spin_unlock_irqrestore(&pgd_lock, flags);
return -1;
}
- for_each_node_mask(i, nodes_parsed)
- e820_register_active_regions(i, nodes[i].start >> PAGE_SHIFT,
- nodes[i].end >> PAGE_SHIFT);
+ for (i = 0; i < num_node_memblks; i++)
+ e820_register_active_regions(memblk_nodeid[i],
+ node_memblk_range[i].start >> PAGE_SHIFT,
+ node_memblk_range[i].end >> PAGE_SHIFT);
+
/* for out of order entries in SRAT */
sort_node_map();
if (!nodes_cover_memory(nodes)) {
#include <linux/smp.h>
#include <linux/interrupt.h>
#include <linux/module.h>
+#include <linux/cpu.h>
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
want false sharing in the per cpu data segment. */
static union smp_flush_state flush_state[NUM_INVALIDATE_TLB_VECTORS];
+static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
+
/*
* We cannot call mmdrop() because we are in interrupt context,
* instead update mm->cpu_vm_mask.
union smp_flush_state *f;
/* Caller has disabled preemption */
- sender = smp_processor_id() % NUM_INVALIDATE_TLB_VECTORS;
+ sender = this_cpu_read(tlb_vector_offset);
f = &flush_state[sender];
/*
flush_tlb_others_ipi(cpumask, mm, va);
}
+static void __cpuinit calculate_tlb_offset(void)
+{
+ int cpu, node, nr_node_vecs;
+ /*
+ * we are changing tlb_vector_offset for each CPU in runtime, but this
+ * will not cause inconsistency, as the write is atomic under X86. we
+ * might see more lock contentions in a short time, but after all CPU's
+ * tlb_vector_offset are changed, everything should go normal
+ *
+ * Note: if NUM_INVALIDATE_TLB_VECTORS % nr_online_nodes !=0, we might
+ * waste some vectors.
+ **/
+ if (nr_online_nodes > NUM_INVALIDATE_TLB_VECTORS)
+ nr_node_vecs = 1;
+ else
+ nr_node_vecs = NUM_INVALIDATE_TLB_VECTORS/nr_online_nodes;
+
+ for_each_online_node(node) {
+ int node_offset = (node % NUM_INVALIDATE_TLB_VECTORS) *
+ nr_node_vecs;
+ int cpu_offset = 0;
+ for_each_cpu(cpu, cpumask_of_node(node)) {
+ per_cpu(tlb_vector_offset, cpu) = node_offset +
+ cpu_offset;
+ cpu_offset++;
+ cpu_offset = cpu_offset % nr_node_vecs;
+ }
+ }
+}
+
+static int tlb_cpuhp_notify(struct notifier_block *n,
+ unsigned long action, void *hcpu)
+{
+ switch (action & 0xf) {
+ case CPU_ONLINE:
+ case CPU_DEAD:
+ calculate_tlb_offset();
+ }
+ return NOTIFY_OK;
+}
+
static int __cpuinit init_smp_flush(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(flush_state); i++)
raw_spin_lock_init(&flush_state[i].tlbstate_lock);
+ calculate_tlb_offset();
+ hotcpu_notifier(tlb_cpuhp_notify, 0);
return 0;
}
core_initcall(init_smp_flush);
#include <asm/ptrace.h>
#include <asm/uaccess.h>
#include <asm/stacktrace.h>
+#include <linux/compat.h>
static void backtrace_warning_symbol(void *data, char *msg,
unsigned long symbol)
.walk_stack = print_context_stack,
};
-struct frame_head {
- struct frame_head *bp;
- unsigned long ret;
-} __attribute__((packed));
-
-static struct frame_head *dump_user_backtrace(struct frame_head *head)
+#ifdef CONFIG_COMPAT
+static struct stack_frame_ia32 *
+dump_user_backtrace_32(struct stack_frame_ia32 *head)
{
- struct frame_head bufhead[2];
+ struct stack_frame_ia32 bufhead[2];
+ struct stack_frame_ia32 *fp;
/* Also check accessibility of one struct frame_head beyond */
if (!access_ok(VERIFY_READ, head, sizeof(bufhead)))
if (__copy_from_user_inatomic(bufhead, head, sizeof(bufhead)))
return NULL;
- oprofile_add_trace(bufhead[0].ret);
+ fp = (struct stack_frame_ia32 *) compat_ptr(bufhead[0].next_frame);
+
+ oprofile_add_trace(bufhead[0].return_address);
+
+ /* frame pointers should strictly progress back up the stack
+ * (towards higher addresses) */
+ if (head >= fp)
+ return NULL;
+
+ return fp;
+}
+
+static inline int
+x86_backtrace_32(struct pt_regs * const regs, unsigned int depth)
+{
+ struct stack_frame_ia32 *head;
+
+ /* User process is 32-bit */
+ if (!current || !test_thread_flag(TIF_IA32))
+ return 0;
+
+ head = (struct stack_frame_ia32 *) regs->bp;
+ while (depth-- && head)
+ head = dump_user_backtrace_32(head);
+
+ return 1;
+}
+
+#else
+static inline int
+x86_backtrace_32(struct pt_regs * const regs, unsigned int depth)
+{
+ return 0;
+}
+#endif /* CONFIG_COMPAT */
+
+static struct stack_frame *dump_user_backtrace(struct stack_frame *head)
+{
+ struct stack_frame bufhead[2];
+
+ /* Also check accessibility of one struct stack_frame beyond */
+ if (!access_ok(VERIFY_READ, head, sizeof(bufhead)))
+ return NULL;
+ if (__copy_from_user_inatomic(bufhead, head, sizeof(bufhead)))
+ return NULL;
+
+ oprofile_add_trace(bufhead[0].return_address);
/* frame pointers should strictly progress back up the stack
* (towards higher addresses) */
- if (head >= bufhead[0].bp)
+ if (head >= bufhead[0].next_frame)
return NULL;
- return bufhead[0].bp;
+ return bufhead[0].next_frame;
}
void
x86_backtrace(struct pt_regs * const regs, unsigned int depth)
{
- struct frame_head *head = (struct frame_head *)frame_pointer(regs);
+ struct stack_frame *head = (struct stack_frame *)frame_pointer(regs);
if (!user_mode_vm(regs)) {
unsigned long stack = kernel_stack_pointer(regs);
return;
}
+ if (x86_backtrace_32(regs, depth))
+ return;
+
while (depth-- && head)
head = dump_user_backtrace(head);
}
int error;
error = sysdev_class_register(&oprofile_sysclass);
- if (!error)
- error = sysdev_register(&device_oprofile);
+ if (error)
+ return error;
+
+ error = sysdev_register(&device_oprofile);
+ if (error)
+ sysdev_class_unregister(&oprofile_sysclass);
+
return error;
}
}
#else
-#define init_sysfs() do { } while (0)
-#define exit_sysfs() do { } while (0)
+
+static inline int init_sysfs(void) { return 0; }
+static inline void exit_sysfs(void) { }
+
#endif /* CONFIG_PM */
static int __init p4_init(char **cpu_type)
case 14:
*cpu_type = "i386/core";
break;
- case 15: case 23:
+ case 0x0f:
+ case 0x16:
+ case 0x17:
+ case 0x1d:
*cpu_type = "i386/core_2";
break;
case 0x1a:
return 1;
}
-/* in order to get sysfs right */
-static int using_nmi;
-
int __init op_nmi_init(struct oprofile_operations *ops)
{
__u8 vendor = boot_cpu_data.x86_vendor;
mux_init(ops);
- init_sysfs();
- using_nmi = 1;
+ ret = init_sysfs();
+ if (ret)
+ return ret;
+
printk(KERN_INFO "oprofile: using NMI interrupt.\n");
return 0;
}
void op_nmi_exit(void)
{
- if (using_nmi)
- exit_sysfs();
+ exit_sysfs();
}
int __init pci_olpc_init(void)
{
- printk(KERN_INFO "PCI: Using configuration type OLPC\n");
+ printk(KERN_INFO "PCI: Using configuration type OLPC XO-1\n");
raw_pci_ops = &pci_olpc_conf;
is_lx = is_geode_lx();
return 0;
.alloc_pte = xen_alloc_pte_init,
.release_pte = xen_release_pte_init,
.alloc_pmd = xen_alloc_pmd_init,
- .alloc_pmd_clone = paravirt_nop,
.release_pmd = xen_release_pmd_init,
#ifdef CONFIG_X86_64
__init void xen_hvm_init_time_ops(void)
{
/* vector callback is needed otherwise we cannot receive interrupts
- * on cpu > 0 */
- if (!xen_have_vector_callback && num_present_cpus() > 1)
+ * on cpu > 0 and at this point we don't know how many cpus are
+ * available */
+ if (!xen_have_vector_callback)
return;
if (!xen_feature(XENFEAT_hvm_safe_pvclock)) {
printk(KERN_INFO "Xen doesn't support pvclock on HVM,"
/* Currently we do not support hierarchy deeper than two level (0,1) */
if (parent != cgroup->top_cgroup)
- return ERR_PTR(-EINVAL);
+ return ERR_PTR(-EPERM);
blkcg = kzalloc(sizeof(*blkcg), GFP_KERNEL);
if (!blkcg)
int el_ret;
unsigned int bytes = bio->bi_size;
const unsigned short prio = bio_prio(bio);
- const bool sync = (bio->bi_rw & REQ_SYNC);
- const bool unplug = (bio->bi_rw & REQ_UNPLUG);
- const unsigned int ff = bio->bi_rw & REQ_FAILFAST_MASK;
+ const bool sync = !!(bio->bi_rw & REQ_SYNC);
+ const bool unplug = !!(bio->bi_rw & REQ_UNPLUG);
+ const unsigned long ff = bio->bi_rw & REQ_FAILFAST_MASK;
int rw_flags;
if ((bio->bi_rw & REQ_HARDBARRIER) &&
return PTR_ERR(bio);
if (rq_data_dir(rq) == WRITE)
- bio->bi_rw |= (1 << REQ_WRITE);
+ bio->bi_rw |= REQ_WRITE;
if (do_copy)
rq->cmd_flags |= REQ_COPY_USER;
return 0;
/*
+ * Don't merge file system requests and discard requests
+ */
+ if ((req->cmd_flags & REQ_DISCARD) != (next->cmd_flags & REQ_DISCARD))
+ return 0;
+
+ /*
+ * Don't merge discard requests and secure discard requests
+ */
+ if ((req->cmd_flags & REQ_SECURE) != (next->cmd_flags & REQ_SECURE))
+ return 0;
+
+ /*
* not contiguous
*/
if (blk_rq_pos(req) + blk_rq_sectors(req) != blk_rq_pos(next))
kobject_uevent(&q->kobj, KOBJ_REMOVE);
kobject_del(&q->kobj);
blk_trace_remove_sysfs(disk_to_dev(disk));
+ kobject_put(&dev->kobj);
return ret;
}
static inline int blk_cpu_to_group(int cpu)
{
+ int group = NR_CPUS;
#ifdef CONFIG_SCHED_MC
const struct cpumask *mask = cpu_coregroup_mask(cpu);
- return cpumask_first(mask);
+ group = cpumask_first(mask);
#elif defined(CONFIG_SCHED_SMT)
- return cpumask_first(topology_thread_cpumask(cpu));
+ group = cpumask_first(topology_thread_cpumask(cpu));
#else
return cpu;
#endif
+ if (likely(group < NR_CPUS))
+ return group;
+ return cpu;
}
/*
/*
* fill in all the output members
*/
- hdr->device_status = status_byte(rq->errors);
+ hdr->device_status = rq->errors & 0xff;
hdr->transport_status = host_byte(rq->errors);
hdr->driver_status = driver_byte(rq->errors);
hdr->info = 0;
static int cfq_slice_async = HZ / 25;
static const int cfq_slice_async_rq = 2;
static int cfq_slice_idle = HZ / 125;
+static int cfq_group_idle = HZ / 125;
static const int cfq_target_latency = HZ * 3/10; /* 300 ms */
static const int cfq_hist_divisor = 4;
struct cfq_queue *new_cfqq;
struct cfq_group *cfqg;
struct cfq_group *orig_cfqg;
+ /* Number of sectors dispatched from queue in single dispatch round */
+ unsigned long nr_sectors;
};
/*
struct hlist_node cfqd_node;
atomic_t ref;
#endif
+ /* number of requests that are on the dispatch list or inside driver */
+ int dispatched;
};
/*
unsigned int cfq_slice[2];
unsigned int cfq_slice_async_rq;
unsigned int cfq_slice_idle;
+ unsigned int cfq_group_idle;
unsigned int cfq_latency;
unsigned int cfq_group_isolation;
&cfqg->service_trees[i][j]: NULL) \
+static inline bool iops_mode(struct cfq_data *cfqd)
+{
+ /*
+ * If we are not idling on queues and it is a NCQ drive, parallel
+ * execution of requests is on and measuring time is not possible
+ * in most of the cases until and unless we drive shallower queue
+ * depths and that becomes a performance bottleneck. In such cases
+ * switch to start providing fairness in terms of number of IOs.
+ */
+ if (!cfqd->cfq_slice_idle && cfqd->hw_tag)
+ return true;
+ else
+ return false;
+}
+
static inline enum wl_prio_t cfqq_prio(struct cfq_queue *cfqq)
{
if (cfq_class_idle(cfqq))
slice_used = cfqq->allocated_slice;
}
- cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u", slice_used);
return slice_used;
}
struct cfq_queue *cfqq)
{
struct cfq_rb_root *st = &cfqd->grp_service_tree;
- unsigned int used_sl, charge_sl;
+ unsigned int used_sl, charge;
int nr_sync = cfqg->nr_cfqq - cfqg_busy_async_queues(cfqd, cfqg)
- cfqg->service_tree_idle.count;
BUG_ON(nr_sync < 0);
- used_sl = charge_sl = cfq_cfqq_slice_usage(cfqq);
+ used_sl = charge = cfq_cfqq_slice_usage(cfqq);
- if (!cfq_cfqq_sync(cfqq) && !nr_sync)
- charge_sl = cfqq->allocated_slice;
+ if (iops_mode(cfqd))
+ charge = cfqq->slice_dispatch;
+ else if (!cfq_cfqq_sync(cfqq) && !nr_sync)
+ charge = cfqq->allocated_slice;
/* Can't update vdisktime while group is on service tree */
cfq_rb_erase(&cfqg->rb_node, st);
- cfqg->vdisktime += cfq_scale_slice(charge_sl, cfqg);
+ cfqg->vdisktime += cfq_scale_slice(charge, cfqg);
__cfq_group_service_tree_add(st, cfqg);
/* This group is being expired. Save the context */
cfq_log_cfqg(cfqd, cfqg, "served: vt=%llu min_vt=%llu", cfqg->vdisktime,
st->min_vdisktime);
+ cfq_log_cfqq(cfqq->cfqd, cfqq, "sl_used=%u disp=%u charge=%u iops=%u"
+ " sect=%u", used_sl, cfqq->slice_dispatch, charge,
+ iops_mode(cfqd), cfqq->nr_sectors);
cfq_blkiocg_update_timeslice_used(&cfqg->blkg, used_sl);
cfq_blkiocg_set_start_empty_time(&cfqg->blkg);
}
*/
atomic_set(&cfqg->ref, 1);
- /* Add group onto cgroup list */
- sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
- cfq_blkiocg_add_blkio_group(blkcg, &cfqg->blkg, (void *)cfqd,
+ /*
+ * Add group onto cgroup list. It might happen that bdi->dev is
+ * not initiliazed yet. Initialize this new group without major
+ * and minor info and this info will be filled in once a new thread
+ * comes for IO. See code above.
+ */
+ if (bdi->dev) {
+ sscanf(dev_name(bdi->dev), "%u:%u", &major, &minor);
+ cfq_blkiocg_add_blkio_group(blkcg, &cfqg->blkg, (void *)cfqd,
MKDEV(major, minor));
+ } else
+ cfq_blkiocg_add_blkio_group(blkcg, &cfqg->blkg, (void *)cfqd,
+ 0);
+
cfqg->weight = blkcg_get_weight(blkcg, cfqg->blkg.dev);
/* Add group on cfqd list */
cfqq->allocated_slice = 0;
cfqq->slice_end = 0;
cfqq->slice_dispatch = 0;
+ cfqq->nr_sectors = 0;
cfq_clear_cfqq_wait_request(cfqq);
cfq_clear_cfqq_must_dispatch(cfqq);
BUG_ON(!service_tree);
BUG_ON(!service_tree->count);
+ if (!cfqd->cfq_slice_idle)
+ return false;
+
/* We never do for idle class queues. */
if (prio == IDLE_WORKLOAD)
return false;
{
struct cfq_queue *cfqq = cfqd->active_queue;
struct cfq_io_context *cic;
- unsigned long sl;
+ unsigned long sl, group_idle = 0;
/*
* SSD device without seek penalty, disable idling. But only do so
/*
* idle is disabled, either manually or by past process history
*/
- if (!cfqd->cfq_slice_idle || !cfq_should_idle(cfqd, cfqq))
- return;
+ if (!cfq_should_idle(cfqd, cfqq)) {
+ /* no queue idling. Check for group idling */
+ if (cfqd->cfq_group_idle)
+ group_idle = cfqd->cfq_group_idle;
+ else
+ return;
+ }
/*
* still active requests from this queue, don't idle
return;
}
+ /* There are other queues in the group, don't do group idle */
+ if (group_idle && cfqq->cfqg->nr_cfqq > 1)
+ return;
+
cfq_mark_cfqq_wait_request(cfqq);
- sl = cfqd->cfq_slice_idle;
+ if (group_idle)
+ sl = cfqd->cfq_group_idle;
+ else
+ sl = cfqd->cfq_slice_idle;
mod_timer(&cfqd->idle_slice_timer, jiffies + sl);
cfq_blkiocg_update_set_idle_time_stats(&cfqq->cfqg->blkg);
- cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu", sl);
+ cfq_log_cfqq(cfqd, cfqq, "arm_idle: %lu group_idle: %d", sl,
+ group_idle ? 1 : 0);
}
/*
cfqq->next_rq = cfq_find_next_rq(cfqd, cfqq, rq);
cfq_remove_request(rq);
cfqq->dispatched++;
+ (RQ_CFQG(rq))->dispatched++;
elv_dispatch_sort(q, rq);
cfqd->rq_in_flight[cfq_cfqq_sync(cfqq)]++;
+ cfqq->nr_sectors += blk_rq_sectors(rq);
cfq_blkiocg_update_dispatch_stats(&cfqq->cfqg->blkg, blk_rq_bytes(rq),
rq_data_dir(rq), rq_is_sync(rq));
}
cfqq = NULL;
goto keep_queue;
} else
- goto expire;
+ goto check_group_idle;
}
/*
* flight or is idling for a new request, allow either of these
* conditions to happen (or time out) before selecting a new queue.
*/
- if (timer_pending(&cfqd->idle_slice_timer) ||
- (cfqq->dispatched && cfq_should_idle(cfqd, cfqq))) {
+ if (timer_pending(&cfqd->idle_slice_timer)) {
+ cfqq = NULL;
+ goto keep_queue;
+ }
+
+ if (cfqq->dispatched && cfq_should_idle(cfqd, cfqq)) {
+ cfqq = NULL;
+ goto keep_queue;
+ }
+
+ /*
+ * If group idle is enabled and there are requests dispatched from
+ * this group, wait for requests to complete.
+ */
+check_group_idle:
+ if (cfqd->cfq_group_idle && cfqq->cfqg->nr_cfqq == 1
+ && cfqq->cfqg->dispatched) {
cfqq = NULL;
goto keep_queue;
}
WARN_ON(!cfqq->dispatched);
cfqd->rq_in_driver--;
cfqq->dispatched--;
+ (RQ_CFQG(rq))->dispatched--;
cfq_blkiocg_update_completion_stats(&cfqq->cfqg->blkg,
rq_start_time_ns(rq), rq_io_start_time_ns(rq),
rq_data_dir(rq), rq_is_sync(rq));
* the queue.
*/
if (cfq_should_wait_busy(cfqd, cfqq)) {
- cfqq->slice_end = jiffies + cfqd->cfq_slice_idle;
+ unsigned long extend_sl = cfqd->cfq_slice_idle;
+ if (!cfqd->cfq_slice_idle)
+ extend_sl = cfqd->cfq_group_idle;
+ cfqq->slice_end = jiffies + extend_sl;
cfq_mark_cfqq_wait_busy(cfqq);
cfq_log_cfqq(cfqd, cfqq, "will busy wait");
}
cfqd->cfq_slice[1] = cfq_slice_sync;
cfqd->cfq_slice_async_rq = cfq_slice_async_rq;
cfqd->cfq_slice_idle = cfq_slice_idle;
+ cfqd->cfq_group_idle = cfq_group_idle;
cfqd->cfq_latency = 1;
cfqd->cfq_group_isolation = 0;
cfqd->hw_tag = -1;
SHOW_FUNCTION(cfq_back_seek_max_show, cfqd->cfq_back_max, 0);
SHOW_FUNCTION(cfq_back_seek_penalty_show, cfqd->cfq_back_penalty, 0);
SHOW_FUNCTION(cfq_slice_idle_show, cfqd->cfq_slice_idle, 1);
+SHOW_FUNCTION(cfq_group_idle_show, cfqd->cfq_group_idle, 1);
SHOW_FUNCTION(cfq_slice_sync_show, cfqd->cfq_slice[1], 1);
SHOW_FUNCTION(cfq_slice_async_show, cfqd->cfq_slice[0], 1);
SHOW_FUNCTION(cfq_slice_async_rq_show, cfqd->cfq_slice_async_rq, 0);
STORE_FUNCTION(cfq_back_seek_penalty_store, &cfqd->cfq_back_penalty, 1,
UINT_MAX, 0);
STORE_FUNCTION(cfq_slice_idle_store, &cfqd->cfq_slice_idle, 0, UINT_MAX, 1);
+STORE_FUNCTION(cfq_group_idle_store, &cfqd->cfq_group_idle, 0, UINT_MAX, 1);
STORE_FUNCTION(cfq_slice_sync_store, &cfqd->cfq_slice[1], 1, UINT_MAX, 1);
STORE_FUNCTION(cfq_slice_async_store, &cfqd->cfq_slice[0], 1, UINT_MAX, 1);
STORE_FUNCTION(cfq_slice_async_rq_store, &cfqd->cfq_slice_async_rq, 1,
CFQ_ATTR(slice_async),
CFQ_ATTR(slice_async_rq),
CFQ_ATTR(slice_idle),
+ CFQ_ATTR(group_idle),
CFQ_ATTR(low_latency),
CFQ_ATTR(group_isolation),
__ATTR_NULL
if (!cfq_slice_idle)
cfq_slice_idle = 1;
+#ifdef CONFIG_CFQ_GROUP_IOSCHED
+ if (!cfq_group_idle)
+ cfq_group_idle = 1;
+#else
+ cfq_group_idle = 0;
+#endif
if (cfq_slab_setup())
return -ENOMEM;
}
}
kobject_uevent(&e->kobj, KOBJ_ADD);
+ e->registered = 1;
}
return error;
}
{
kobject_uevent(&e->kobj, KOBJ_REMOVE);
kobject_del(&e->kobj);
+ e->registered = 0;
}
void elv_unregister_queue(struct request_queue *q)
{
struct elevator_queue *old_elevator, *e;
void *data;
+ int err;
/*
* Allocate new elevator
*/
e = elevator_alloc(q, new_e);
if (!e)
- return 0;
+ return -ENOMEM;
data = elevator_init_queue(q, e);
if (!data) {
kobject_put(&e->kobj);
- return 0;
+ return -ENOMEM;
}
/*
spin_unlock_irq(q->queue_lock);
- __elv_unregister_queue(old_elevator);
+ if (old_elevator->registered) {
+ __elv_unregister_queue(old_elevator);
- if (elv_register_queue(q))
- goto fail_register;
+ err = elv_register_queue(q);
+ if (err)
+ goto fail_register;
+ }
/*
* finally exit old elevator and turn off BYPASS.
blk_add_trace_msg(q, "elv switch: %s", e->elevator_type->elevator_name);
- return 1;
+ return 0;
fail_register:
/*
queue_flag_clear(QUEUE_FLAG_ELVSWITCH, q);
spin_unlock_irq(q->queue_lock);
- return 0;
+ return err;
}
-ssize_t elv_iosched_store(struct request_queue *q, const char *name,
- size_t count)
+/*
+ * Switch this queue to the given IO scheduler.
+ */
+int elevator_change(struct request_queue *q, const char *name)
{
char elevator_name[ELV_NAME_MAX];
struct elevator_type *e;
if (!q->elevator)
- return count;
+ return -ENXIO;
strlcpy(elevator_name, name, sizeof(elevator_name));
e = elevator_get(strstrip(elevator_name));
if (!strcmp(elevator_name, q->elevator->elevator_type->elevator_name)) {
elevator_put(e);
- return count;
+ return 0;
}
- if (!elevator_switch(q, e))
- printk(KERN_ERR "elevator: switch to %s failed\n",
- elevator_name);
- return count;
+ return elevator_switch(q, e);
+}
+EXPORT_SYMBOL(elevator_change);
+
+ssize_t elv_iosched_store(struct request_queue *q, const char *name,
+ size_t count)
+{
+ int ret;
+
+ if (!q->elevator)
+ return count;
+
+ ret = elevator_change(q, name);
+ if (!ret)
+ return count;
+
+ printk(KERN_ERR "elevator: switch to %s failed\n", name);
+ return ret;
}
ssize_t elv_iosched_show(struct request_queue *q, char *name)
obj-y += net/
obj-$(CONFIG_ATM) += atm/
obj-$(CONFIG_FUSION) += message/
-obj-$(CONFIG_FIREWIRE) += firewire/
+obj-y += firewire/
obj-y += ieee1394/
obj-$(CONFIG_UIO) += uio/
obj-y += cdrom/
Be aware that using this interface can confuse your Embedded
Controller in a way that a normal reboot is not enough. You then
- have to power of your system, and remove the laptop battery for
+ have to power off your system, and remove the laptop battery for
some seconds.
An Embedded Controller typically is available on laptops and reads
sensor values like battery state and temperature.
#include <linux/slab.h>
#include <acpi/acpi_bus.h>
#include <acpi/acpi_drivers.h>
+#include <asm/mwait.h>
#define ACPI_PROCESSOR_AGGREGATOR_CLASS "acpi_pad"
#define ACPI_PROCESSOR_AGGREGATOR_DEVICE_NAME "Processor Aggregator"
#define ACPI_PROCESSOR_AGGREGATOR_NOTIFY 0x80
static DEFINE_MUTEX(isolated_cpus_lock);
-#define MWAIT_SUBSTATE_MASK (0xf)
-#define MWAIT_CSTATE_MASK (0xf)
-#define MWAIT_SUBSTATE_SIZE (4)
-#define CPUID_MWAIT_LEAF (5)
-#define CPUID5_ECX_EXTENSIONS_SUPPORTED (0x1)
-#define CPUID5_ECX_INTERRUPT_BREAK (0x2)
static unsigned long power_saving_mwait_eax;
static unsigned char tsc_detected_unstable;
device_remove_file(&device->dev, &dev_attr_rrtime);
}
-/* Query firmware how many CPUs should be idle */
-static int acpi_pad_pur(acpi_handle handle, int *num_cpus)
+/*
+ * Query firmware how many CPUs should be idle
+ * return -1 on failure
+ */
+static int acpi_pad_pur(acpi_handle handle)
{
struct acpi_buffer buffer = {ACPI_ALLOCATE_BUFFER, NULL};
union acpi_object *package;
- int rev, num, ret = -EINVAL;
+ int num = -1;
if (ACPI_FAILURE(acpi_evaluate_object(handle, "_PUR", NULL, &buffer)))
- return -EINVAL;
+ return num;
if (!buffer.length || !buffer.pointer)
- return -EINVAL;
+ return num;
package = buffer.pointer;
- if (package->type != ACPI_TYPE_PACKAGE || package->package.count != 2)
- goto out;
- rev = package->package.elements[0].integer.value;
- num = package->package.elements[1].integer.value;
- if (rev != 1 || num < 0)
- goto out;
- *num_cpus = num;
- ret = 0;
-out:
+
+ if (package->type == ACPI_TYPE_PACKAGE &&
+ package->package.count == 2 &&
+ package->package.elements[0].integer.value == 1) /* rev 1 */
+
+ num = package->package.elements[1].integer.value;
+
kfree(buffer.pointer);
- return ret;
+ return num;
}
/* Notify firmware how many CPUs are idle */
uint32_t idle_cpus;
mutex_lock(&isolated_cpus_lock);
- if (acpi_pad_pur(handle, &num_cpus)) {
+ num_cpus = acpi_pad_pur(handle);
+ if (num_cpus < 0) {
mutex_unlock(&isolated_cpus_lock);
return;
}
ACPI_BITMASK_POWER_BUTTON_STATUS | \
ACPI_BITMASK_SLEEP_BUTTON_STATUS | \
ACPI_BITMASK_RT_CLOCK_STATUS | \
+ ACPI_BITMASK_PCIEXP_WAKE_DISABLE | \
ACPI_BITMASK_WAKE_STATUS)
#define ACPI_BITMASK_TIMER_ENABLE 0x0001
*
* DESCRIPTION: Reacquire the interpreter execution region from within the
* interpreter code. Failure to enter the interpreter region is a
- * fatal system error. Used in conjuction with
+ * fatal system error. Used in conjunction with
* relinquish_interpreter
*
******************************************************************************/
/*
* 16-, 32-, and 64-bit cases must use the move macros that perform
- * endian conversion and/or accomodate hardware that cannot perform
+ * endian conversion and/or accommodate hardware that cannot perform
* misaligned memory transfers
*/
case ACPI_RSC_MOVE16:
depends on ACPI_APEI
help
ERST is a way provided by APEI to save and retrieve hardware
- error infomation to and from a persistent store. Enable this
+ error information to and from a persistent store. Enable this
if you want to debugging and testing the ERST kernel support
and firmware implementation.
int apei_resources_request(struct apei_resources *resources,
const char *desc)
{
- struct apei_res *res, *res_bak;
+ struct apei_res *res, *res_bak = NULL;
struct resource *r;
+ int rc;
- apei_resources_sub(resources, &apei_resources_all);
+ rc = apei_resources_sub(resources, &apei_resources_all);
+ if (rc)
+ return rc;
+ rc = -EINVAL;
list_for_each_entry(res, &resources->iomem, list) {
r = request_mem_region(res->start, res->end - res->start,
desc);
}
}
- apei_resources_merge(&apei_resources_all, resources);
+ rc = apei_resources_merge(&apei_resources_all, resources);
+ if (rc) {
+ pr_err(APEI_PFX "Fail to merge resources!\n");
+ goto err_unmap_ioport;
+ }
return 0;
err_unmap_ioport:
break;
release_mem_region(res->start, res->end - res->start);
}
- return -EINVAL;
+ return rc;
}
EXPORT_SYMBOL_GPL(apei_resources_request);
void apei_resources_release(struct apei_resources *resources)
{
+ int rc;
struct apei_res *res;
list_for_each_entry(res, &resources->iomem, list)
list_for_each_entry(res, &resources->ioport, list)
release_region(res->start, res->end - res->start);
- apei_resources_sub(&apei_resources_all, resources);
+ rc = apei_resources_sub(&apei_resources_all, resources);
+ if (rc)
+ pr_err(APEI_PFX "Fail to sub resources!\n");
}
EXPORT_SYMBOL_GPL(apei_resources_release);
static int einj_check_table(struct acpi_table_einj *einj_tab)
{
- if (einj_tab->header_length != sizeof(struct acpi_table_einj))
+ if ((einj_tab->header_length !=
+ (sizeof(struct acpi_table_einj) - sizeof(einj_tab->header)))
+ && (einj_tab->header_length != sizeof(struct acpi_table_einj)))
return -EINVAL;
if (einj_tab->header.length < sizeof(struct acpi_table_einj))
return -EINVAL;
* APEI Error Record Serialization Table debug support
*
* ERST is a way provided by APEI to save and retrieve hardware error
- * infomation to and from a persistent store. This file provide the
+ * information to and from a persistent store. This file provide the
* debugging/testing support for ERST kernel support and firmware
* implementation.
*
goto out;
}
if (len > erst_dbg_buf_len) {
- kfree(erst_dbg_buf);
+ void *p;
rc = -ENOMEM;
- erst_dbg_buf = kmalloc(len, GFP_KERNEL);
- if (!erst_dbg_buf)
+ p = kmalloc(len, GFP_KERNEL);
+ if (!p)
goto out;
+ kfree(erst_dbg_buf);
+ erst_dbg_buf = p;
erst_dbg_buf_len = len;
goto retry;
}
if (mutex_lock_interruptible(&erst_dbg_mutex))
return -EINTR;
if (usize > erst_dbg_buf_len) {
- kfree(erst_dbg_buf);
+ void *p;
rc = -ENOMEM;
- erst_dbg_buf = kmalloc(usize, GFP_KERNEL);
- if (!erst_dbg_buf)
+ p = kmalloc(usize, GFP_KERNEL);
+ if (!p)
goto out;
+ kfree(erst_dbg_buf);
+ erst_dbg_buf = p;
erst_dbg_buf_len = usize;
}
rc = copy_from_user(erst_dbg_buf, ubuf, usize);
* APEI Error Record Serialization Table support
*
* ERST is a way provided by APEI to save and retrieve hardware error
- * infomation to and from a persistent store.
+ * information to and from a persistent store.
*
* For more information about ERST, please refer to ACPI Specification
* version 4.0, section 17.4.
{
int rc;
u64 offset;
+ void *src, *dst;
+
+ /* ioremap does not work in interrupt context */
+ if (in_interrupt()) {
+ pr_warning(ERST_PFX
+ "MOVE_DATA can not be used in interrupt context");
+ return -EBUSY;
+ }
rc = __apei_exec_read_register(entry, &offset);
if (rc)
return rc;
- memmove((void *)ctx->dst_base + offset,
- (void *)ctx->src_base + offset,
- ctx->var2);
+
+ src = ioremap(ctx->src_base + offset, ctx->var2);
+ if (!src)
+ return -ENOMEM;
+ dst = ioremap(ctx->dst_base + offset, ctx->var2);
+ if (!dst)
+ return -ENOMEM;
+
+ memmove(dst, src, ctx->var2);
+
+ iounmap(src);
+ iounmap(dst);
return 0;
}
static int erst_check_table(struct acpi_table_erst *erst_tab)
{
- if (erst_tab->header_length != sizeof(struct acpi_table_erst))
+ if ((erst_tab->header_length !=
+ (sizeof(struct acpi_table_erst) - sizeof(erst_tab->header)))
+ && (erst_tab->header_length != sizeof(struct acpi_table_einj)))
return -EINVAL;
if (erst_tab->header.length < sizeof(struct acpi_table_erst))
return -EINVAL;
struct ghes *ghes = NULL;
int rc = -EINVAL;
- generic = ghes_dev->dev.platform_data;
+ generic = *(struct acpi_hest_generic **)ghes_dev->dev.platform_data;
if (!generic->enabled)
return -ENODEV;
static int hest_parse_ghes(struct acpi_hest_header *hest_hdr, void *data)
{
- struct acpi_hest_generic *generic;
struct platform_device *ghes_dev;
struct ghes_arr *ghes_arr = data;
int rc;
if (hest_hdr->type != ACPI_HEST_TYPE_GENERIC_ERROR)
return 0;
- generic = (struct acpi_hest_generic *)hest_hdr;
- if (!generic->enabled)
+
+ if (!((struct acpi_hest_generic *)hest_hdr)->enabled)
return 0;
ghes_dev = platform_device_alloc("GHES", hest_hdr->source_id);
if (!ghes_dev)
return -ENOMEM;
- ghes_dev->dev.platform_data = generic;
+
+ rc = platform_device_add_data(ghes_dev, &hest_hdr, sizeof(void *));
+ if (rc)
+ goto err;
+
rc = platform_device_add(ghes_dev);
if (rc)
goto err;
list_add_tail_rcu(&map->list, &acpi_iomaps);
spin_unlock_irqrestore(&acpi_iomaps_lock, flags);
- return vaddr + (paddr - pg_off);
+ return map->vaddr + (paddr - map->paddr);
err_unmap:
iounmap(vaddr);
return NULL;
POWER_SUPPLY_PROP_CYCLE_COUNT,
POWER_SUPPLY_PROP_VOLTAGE_MIN_DESIGN,
POWER_SUPPLY_PROP_VOLTAGE_NOW,
- POWER_SUPPLY_PROP_CURRENT_NOW,
POWER_SUPPLY_PROP_POWER_NOW,
POWER_SUPPLY_PROP_ENERGY_FULL_DESIGN,
POWER_SUPPLY_PROP_ENERGY_FULL,
{
printk(KERN_NOTICE PREFIX "DMI detected: %s\n", d->ident);
acpi_osi_setup("!Windows 2006");
+ acpi_osi_setup("!Windows 2006 SP1");
+ acpi_osi_setup("!Windows 2006 SP2");
return 0;
}
static int __init dmi_disable_osi_win7(const struct dmi_system_id *d)
},
},
{
+ /*
+ * There have a NVIF method in MSI GX723 DSDT need call by Nvidia
+ * driver (e.g. nouveau) when user press brightness hotkey.
+ * Currently, nouveau driver didn't do the job and it causes there
+ * have a infinite while loop in DSDT when user press hotkey.
+ * We add MSI GX723's dmi information to this table for workaround
+ * this issue.
+ * Will remove MSI GX723 from the table after nouveau grows support.
+ */
+ .callback = dmi_disable_osi_vista,
+ .ident = "MSI GX723",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Micro-Star International"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "GX723"),
+ },
+ },
+ {
.callback = dmi_disable_osi_vista,
.ident = "Sony VGN-NS10J_S",
.matches = {
},
},
{
+ .callback = dmi_disable_osi_vista,
+ .ident = "Toshiba Satellite L355",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"),
+ DMI_MATCH(DMI_PRODUCT_VERSION, "Satellite L355"),
+ },
+ },
+ {
.callback = dmi_disable_osi_win7,
.ident = "ASUS K50IJ",
.matches = {
DMI_MATCH(DMI_PRODUCT_NAME, "K50IJ"),
},
},
+ {
+ .callback = dmi_disable_osi_vista,
+ .ident = "Toshiba P305D",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "Satellite P305D"),
+ },
+ },
/*
* BIOS invocation of _OSI(Linux) is almost always a BIOS bug.
static int set_power_nocheck(const struct dmi_system_id *id)
{
printk(KERN_NOTICE PREFIX "%s detected - "
- "disable power check in power transistion\n", id->ident);
+ "disable power check in power transition\n", id->ident);
acpi_power_nocheck = 1;
return 0;
}
static struct dmi_system_id dsdt_dmi_table[] __initdata = {
/*
- * Insyde BIOS on some TOSHIBA machines corrupt the DSDT.
+ * Invoke DSDT corruption work-around on all Toshiba Satellite.
* https://bugzilla.kernel.org/show_bug.cgi?id=14679
*/
{
.callback = set_copy_dsdt,
- .ident = "TOSHIBA Satellite A505",
+ .ident = "TOSHIBA Satellite",
.matches = {
DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"),
- DMI_MATCH(DMI_PRODUCT_NAME, "Satellite A505"),
- },
- },
- {
- .callback = set_copy_dsdt,
- .ident = "TOSHIBA Satellite L505D",
- .matches = {
- DMI_MATCH(DMI_SYS_VENDOR, "TOSHIBA"),
- DMI_MATCH(DMI_PRODUCT_NAME, "Satellite L505D"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "Satellite"),
},
},
{}
/*
* If the laptop falls into the DMI check table, the power state check
- * will be disabled in the course of device power transistion.
+ * will be disabled in the course of device power transition.
*/
dmi_check_system(power_nocheck_dmi_table);
acpi_bus_unregister_driver(&acpi_fan_driver);
+#ifdef CONFIG_ACPI_PROCFS
remove_proc_entry(ACPI_FAN_CLASS, acpi_root_dir);
+#endif
return;
}
static struct dmi_system_id __cpuinitdata processor_idle_dmi_table[] = {
{
- set_no_mwait, "IFL91 board", {
- DMI_MATCH(DMI_BIOS_VENDOR, "COMPAL"),
- DMI_MATCH(DMI_SYS_VENDOR, "ZEPTO"),
- DMI_MATCH(DMI_PRODUCT_VERSION, "3215W"),
- DMI_MATCH(DMI_BOARD_NAME, "IFL91") }, NULL},
- {
set_no_mwait, "Extensa 5220", {
DMI_MATCH(DMI_BIOS_VENDOR, "Phoenix Technologies LTD"),
DMI_MATCH(DMI_SYS_VENDOR, "Acer"),
acpi_walk_namespace(ACPI_TYPE_PROCESSOR, ACPI_ROOT_OBJECT,
ACPI_UINT32_MAX,
early_init_pdc, NULL, NULL, NULL);
+ acpi_get_devices("ACPI0007", early_init_pdc, NULL, NULL);
}
printk(KERN_DEBUG "ACPI: %s registered with cpuidle\n",
acpi_idle_driver.name);
} else {
- printk(KERN_DEBUG "ACPI: acpi_idle yielding to %s",
+ printk(KERN_DEBUG "ACPI: acpi_idle yielding to %s\n",
cpuidle_get_driver()->name);
}
if (!try_module_get(calling_module))
return -EINVAL;
- /* is_done is set to negative if an error occured,
- * and to postitive if _no_ error occured, but SMM
+ /* is_done is set to negative if an error occurred,
+ * and to postitive if _no_ error occurred, but SMM
* was already notified. This avoids double notification
* which might lead to unexpected results...
*/
return 0;
}
+static int __init init_nvs_nosave(const struct dmi_system_id *d)
+{
+ acpi_nvs_nosave();
+ return 0;
+}
+
static struct dmi_system_id __initdata acpisleep_dmi_table[] = {
{
.callback = init_old_suspend_ordering,
DMI_MATCH(DMI_BOARD_NAME, "CF51-2L"),
},
},
+ {
+ .callback = init_nvs_nosave,
+ .ident = "Sony Vaio VGN-SR11M",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Sony Corporation"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "VGN-SR11M"),
+ },
+ },
+ {
+ .callback = init_nvs_nosave,
+ .ident = "Everex StepNote Series",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Everex Systems, Inc."),
+ DMI_MATCH(DMI_PRODUCT_NAME, "Everex StepNote Series"),
+ },
+ },
{},
};
#endif /* CONFIG_SUSPEND */
ACPI_DEBUG_INIT(ACPI_LV_EVENTS),
};
-static int param_get_debug_layer(char *buffer, struct kernel_param *kp)
+static int param_get_debug_layer(char *buffer, const struct kernel_param *kp)
{
int result = 0;
int i;
return result;
}
-static int param_get_debug_level(char *buffer, struct kernel_param *kp)
+static int param_get_debug_level(char *buffer, const struct kernel_param *kp)
{
int result = 0;
int i;
return result;
}
-module_param_call(debug_layer, param_set_uint, param_get_debug_layer,
- &acpi_dbg_layer, 0644);
-module_param_call(debug_level, param_set_uint, param_get_debug_level,
- &acpi_dbg_level, 0644);
+static struct kernel_param_ops param_ops_debug_layer = {
+ .set = param_set_uint,
+ .get = param_get_debug_layer,
+};
+
+static struct kernel_param_ops param_ops_debug_level = {
+ .set = param_set_uint,
+ .get = param_get_debug_level,
+};
+
+module_param_cb(debug_layer, ¶m_ops_debug_layer, &acpi_dbg_layer, 0644);
+module_param_cb(debug_level, ¶m_ops_debug_level, &acpi_dbg_level, 0644);
static char trace_method_name[6];
module_param_string(trace_method_name, trace_method_name, 6, 0644);
"support\n"));
*cap |= ACPI_VIDEO_BACKLIGHT;
if (ACPI_FAILURE(acpi_get_handle(handle, "_BQC", &h_dummy)))
- printk(KERN_WARNING FW_BUG PREFIX "ACPI brightness "
- "control misses _BQC function\n");
+ printk(KERN_WARNING FW_BUG PREFIX "No _BQC method, "
+ "cannot determine initial brightness\n");
/* We have backlight support, no need to scan further */
return AE_CTRL_TERMINATE;
}
static int ahci_pci_device_resume(struct pci_dev *pdev);
#endif
+static struct scsi_host_template ahci_sht = {
+ AHCI_SHT("ahci"),
+};
+
static struct ata_port_operations ahci_vt8251_ops = {
.inherits = &ahci_ops,
.hardreset = ahci_vt8251_hardreset,
{ PCI_VDEVICE(INTEL, 0x1c05), board_ahci }, /* CPT RAID */
{ PCI_VDEVICE(INTEL, 0x1c06), board_ahci }, /* CPT RAID */
{ PCI_VDEVICE(INTEL, 0x1c07), board_ahci }, /* CPT RAID */
+ { PCI_VDEVICE(INTEL, 0x1d02), board_ahci }, /* PBG AHCI */
+ { PCI_VDEVICE(INTEL, 0x1d04), board_ahci }, /* PBG RAID */
+ { PCI_VDEVICE(INTEL, 0x1d06), board_ahci }, /* PBG RAID */
/* JMicron 360/1/3/5/6, match class to avoid IDE function */
{ PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
extern int ahci_ignore_sss;
-extern struct scsi_host_template ahci_sht;
+extern struct device_attribute *ahci_shost_attrs[];
+extern struct device_attribute *ahci_sdev_attrs[];
+
+#define AHCI_SHT(drv_name) \
+ ATA_NCQ_SHT(drv_name), \
+ .can_queue = AHCI_MAX_CMDS - 1, \
+ .sg_tablesize = AHCI_MAX_SG, \
+ .dma_boundary = AHCI_DMA_BOUNDARY, \
+ .shost_attrs = ahci_shost_attrs, \
+ .sdev_attrs = ahci_sdev_attrs
+
extern struct ata_port_operations ahci_ops;
void ahci_save_initial_config(struct device *dev,
#include <linux/ahci_platform.h>
#include "ahci.h"
+static struct scsi_host_template ahci_platform_sht = {
+ AHCI_SHT("ahci_platform"),
+};
+
static int __init ahci_probe(struct platform_device *pdev)
{
struct device *dev = &pdev->dev;
ahci_print_info(host, "platform");
rc = ata_host_activate(host, irq, ahci_interrupt, IRQF_SHARED,
- &ahci_sht);
+ &ahci_platform_sht);
if (rc)
goto err0;
{ 0x8086, 0x1c08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
/* SATA Controller IDE (CPT) */
{ 0x8086, 0x1c09, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
+ /* SATA Controller IDE (PBG) */
+ { 0x8086, 0x1d00, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_sata },
+ /* SATA Controller IDE (PBG) */
+ { 0x8086, 0x1d08, PCI_ANY_ID, PCI_ANY_ID, 0, 0, ich8_2port_sata },
{ } /* terminate list */
};
static DEVICE_ATTR(em_buffer, S_IWUSR | S_IRUGO,
ahci_read_em_buffer, ahci_store_em_buffer);
-static struct device_attribute *ahci_shost_attrs[] = {
+struct device_attribute *ahci_shost_attrs[] = {
&dev_attr_link_power_management_policy,
&dev_attr_em_message_type,
&dev_attr_em_message,
&dev_attr_em_buffer,
NULL
};
+EXPORT_SYMBOL_GPL(ahci_shost_attrs);
-static struct device_attribute *ahci_sdev_attrs[] = {
+struct device_attribute *ahci_sdev_attrs[] = {
&dev_attr_sw_activity,
&dev_attr_unload_heads,
NULL
};
-
-struct scsi_host_template ahci_sht = {
- ATA_NCQ_SHT("ahci"),
- .can_queue = AHCI_MAX_CMDS - 1,
- .sg_tablesize = AHCI_MAX_SG,
- .dma_boundary = AHCI_DMA_BOUNDARY,
- .shost_attrs = ahci_shost_attrs,
- .sdev_attrs = ahci_sdev_attrs,
-};
-EXPORT_SYMBOL_GPL(ahci_sht);
+EXPORT_SYMBOL_GPL(ahci_sdev_attrs);
struct ata_port_operations ahci_ops = {
.inherits = &sata_pmp_port_ops,
/* issue the first D2H Register FIS */
msecs = 0;
now = jiffies;
- if (time_after(now, deadline))
+ if (time_after(deadline, now))
msecs = jiffies_to_msecs(deadline - now);
tf.ctl |= ATA_SRST;
*/
int ata_host_suspend(struct ata_host *host, pm_message_t mesg)
{
+ unsigned int ehi_flags = ATA_EHI_QUIET;
int rc;
/*
*/
ata_lpm_enable(host);
- rc = ata_host_request_pm(host, mesg, 0, ATA_EHI_QUIET, 1);
+ /*
+ * On some hardware, device fails to respond after spun down
+ * for suspend. As the device won't be used before being
+ * resumed, we don't need to touch the device. Ask EH to skip
+ * the usual stuff and proceed directly to suspend.
+ *
+ * http://thread.gmane.org/gmane.linux.ide/46764
+ */
+ if (mesg.event == PM_EVENT_SUSPEND)
+ ehi_flags |= ATA_EHI_NO_AUTOPSY | ATA_EHI_NO_RECOVERY;
+
+ rc = ata_host_request_pm(host, mesg, 0, ehi_flags, 1);
if (rc == 0)
host->dev->power.power_state = mesg;
return rc;
if (link->flags & ATA_LFLAG_DISABLED)
return 1;
+ /* skip if explicitly requested */
+ if (ehc->i.flags & ATA_EHI_NO_RECOVERY)
+ return 1;
+
/* thaw frozen port and recover failed devices */
if ((ap->pflags & ATA_PFLAG_FROZEN) || ata_link_nr_enabled(link))
return 0;
if (ioaddr->ctl_addr)
iowrite8(tf->ctl, ioaddr->ctl_addr);
ap->last_ctl = tf->ctl;
+ ata_wait_idle(ap);
}
if (is_addr && (tf->flags & ATA_TFLAG_LBA48)) {
iowrite8(tf->device, ioaddr->device_addr);
VPRINTK("device 0x%X\n", tf->device);
}
+
+ ata_wait_idle(ap);
}
EXPORT_SYMBOL_GPL(ata_sff_tf_load);
int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc,
u8 status, int in_wq)
{
- struct ata_eh_info *ehi = &ap->link.eh_info;
+ struct ata_link *link = qc->dev->link;
+ struct ata_eh_info *ehi = &link->eh_info;
unsigned long flags = 0;
int poll_next;
}
EXPORT_SYMBOL_GPL(ata_sff_hsm_move);
-void ata_sff_queue_pio_task(struct ata_port *ap, unsigned long delay)
+void ata_sff_queue_pio_task(struct ata_link *link, unsigned long delay)
{
+ struct ata_port *ap = link->ap;
+
+ WARN_ON((ap->sff_pio_task_link != NULL) &&
+ (ap->sff_pio_task_link != link));
+ ap->sff_pio_task_link = link;
+
/* may fail if ata_sff_flush_pio_task() in progress */
queue_delayed_work(ata_sff_wq, &ap->sff_pio_task,
msecs_to_jiffies(delay));
{
struct ata_port *ap =
container_of(work, struct ata_port, sff_pio_task.work);
+ struct ata_link *link = ap->sff_pio_task_link;
struct ata_queued_cmd *qc;
u8 status;
int poll_next;
+ BUG_ON(ap->sff_pio_task_link == NULL);
/* qc can be NULL if timeout occurred */
- qc = ata_qc_from_tag(ap, ap->link.active_tag);
- if (!qc)
+ qc = ata_qc_from_tag(ap, link->active_tag);
+ if (!qc) {
+ ap->sff_pio_task_link = NULL;
return;
+ }
fsm_start:
WARN_ON_ONCE(ap->hsm_task_state == HSM_ST_IDLE);
msleep(2);
status = ata_sff_busy_wait(ap, ATA_BUSY, 10);
if (status & ATA_BUSY) {
- ata_sff_queue_pio_task(ap, ATA_SHORT_PAUSE);
+ ata_sff_queue_pio_task(link, ATA_SHORT_PAUSE);
return;
}
}
+ /*
+ * hsm_move() may trigger another command to be processed.
+ * clean the link beforehand.
+ */
+ ap->sff_pio_task_link = NULL;
/* move the HSM */
poll_next = ata_sff_hsm_move(ap, qc, status, 1);
unsigned int ata_sff_qc_issue(struct ata_queued_cmd *qc)
{
struct ata_port *ap = qc->ap;
+ struct ata_link *link = qc->dev->link;
/* Use polling pio if the LLD doesn't handle
* interrupt driven pio and atapi CDB interrupt.
ap->hsm_task_state = HSM_ST_LAST;
if (qc->tf.flags & ATA_TFLAG_POLLING)
- ata_sff_queue_pio_task(ap, 0);
+ ata_sff_queue_pio_task(link, 0);
break;
if (qc->tf.flags & ATA_TFLAG_WRITE) {
/* PIO data out protocol */
ap->hsm_task_state = HSM_ST_FIRST;
- ata_sff_queue_pio_task(ap, 0);
+ ata_sff_queue_pio_task(link, 0);
/* always send first data block using the
* ata_sff_pio_task() codepath.
ap->hsm_task_state = HSM_ST;
if (qc->tf.flags & ATA_TFLAG_POLLING)
- ata_sff_queue_pio_task(ap, 0);
+ ata_sff_queue_pio_task(link, 0);
/* if polling, ata_sff_pio_task() handles the
* rest. otherwise, interrupt handler takes
/* send cdb by polling if no cdb interrupt */
if ((!(qc->dev->flags & ATA_DFLAG_CDB_INTR)) ||
(qc->tf.flags & ATA_TFLAG_POLLING))
- ata_sff_queue_pio_task(ap, 0);
+ ata_sff_queue_pio_task(link, 0);
break;
default:
unsigned int ata_bmdma_qc_issue(struct ata_queued_cmd *qc)
{
struct ata_port *ap = qc->ap;
+ struct ata_link *link = qc->dev->link;
/* defer PIO handling to sff_qc_issue */
if (!ata_is_dma(qc->tf.protocol))
/* send cdb by polling if no cdb interrupt */
if (!(qc->dev->flags & ATA_DFLAG_CDB_INTR))
- ata_sff_queue_pio_task(ap, 0);
+ ata_sff_queue_pio_task(link, 0);
break;
default:
struct pci_dev *pdev = to_pci_dev(ap->host->dev);
/* Odd numbered device ids are the units with enable bits (the -R cards) */
- if (pdev->device % 1 && !pci_test_config_bits(pdev, &artop_enable_bits[ap->port_no]))
+ if ((pdev->device & 1) &&
+ !pci_test_config_bits(pdev, &artop_enable_bits[ap->port_no]))
return -ENOENT;
return ata_sff_prereset(link, deadline);
tf->lbam,
tf->lbah);
}
+
+ ata_wait_idle(ap);
}
static int via_port_start(struct ata_port *ap)
}
if (qc->tf.flags & ATA_TFLAG_POLLING)
- ata_sff_queue_pio_task(ap, 0);
+ ata_sff_queue_pio_task(link, 0);
return 0;
}
{
struct atm_dev *dev;
IADEV *iadev;
- unsigned long flags;
int ret;
iadev = kzalloc(sizeof(*iadev), GFP_KERNEL);
ia_dev[iadev_count] = iadev;
_ia_dev[iadev_count] = dev;
iadev_count++;
- spin_lock_init(&iadev->misc_lock);
- /* First fixes first. I don't want to think about this now. */
- spin_lock_irqsave(&iadev->misc_lock, flags);
if (ia_init(dev) || ia_start(dev)) {
IF_INIT(printk("IA register failed!\n");)
iadev_count--;
ia_dev[iadev_count] = NULL;
_ia_dev[iadev_count] = NULL;
- spin_unlock_irqrestore(&iadev->misc_lock, flags);
ret = -EINVAL;
goto err_out_deregister_dev;
}
- spin_unlock_irqrestore(&iadev->misc_lock, flags);
IF_EVENT(printk("iadev_count = %d\n", iadev_count);)
iadev->next_board = ia_boards;
struct dle_q rx_dle_q;
struct free_desc_q *rx_free_desc_qhead;
struct sk_buff_head rx_dma_q;
- spinlock_t rx_lock, misc_lock;
+ spinlock_t rx_lock;
struct atm_vcc **rx_open; /* list of all open VCs */
u16 num_rx_desc, rx_buf_sz, rxing;
u32 rx_pkt_ram, rx_tmp_cnt;
struct atm_dev *atmdev = container_of(dev, struct atm_dev, class_dev);
struct solos_card *card = atmdev->dev_data;
struct sk_buff *skb;
+ unsigned int len;
spin_lock(&card->cli_queue_lock);
skb = skb_dequeue(&card->cli_queue[SOLOS_CHAN(atmdev)]);
if(skb == NULL)
return sprintf(buf, "No data.\n");
- memcpy(buf, skb->data, skb->len);
- dev_dbg(&card->dev->dev, "len: %d\n", skb->len);
+ len = skb->len;
+ memcpy(buf, skb->data, len);
+ dev_dbg(&card->dev->dev, "len: %d\n", len);
kfree_skb(skb);
- return skb->len;
+ return len;
}
static int send_command(struct solos_card *card, int dev, const char *buf, size_t size)
{
dev->power.status = DPM_ON;
init_completion(&dev->power.completion);
+ complete_all(&dev->power.completion);
dev->power.wakeup_count = 0;
pm_runtime_init(dev);
}
return sprintf(buf, "%d\n", topology_##name(cpu)); \
}
-#if defined(topology_thread_cpumask) || defined(topology_core_cpumask)
+#if defined(topology_thread_cpumask) || defined(topology_core_cpumask) || \
+ defined(topology_book_cpumask)
static ssize_t show_cpumap(int type, const struct cpumask *mask, char *buf)
{
ptrdiff_t len = PTR_ALIGN(buf + PAGE_SIZE - 1, PAGE_SIZE) - buf;
define_one_ro_named(core_siblings, show_core_cpumask);
define_one_ro_named(core_siblings_list, show_core_cpumask_list);
+#ifdef CONFIG_SCHED_BOOK
+define_id_show_func(book_id);
+define_one_ro(book_id);
+define_siblings_show_func(book_cpumask);
+define_one_ro_named(book_siblings, show_book_cpumask);
+define_one_ro_named(book_siblings_list, show_book_cpumask_list);
+#endif
+
static struct attribute *default_attrs[] = {
&attr_physical_package_id.attr,
&attr_core_id.attr,
&attr_thread_siblings_list.attr,
&attr_core_siblings.attr,
&attr_core_siblings_list.attr,
+#ifdef CONFIG_SCHED_BOOK
+ &attr_book_id.attr,
+ &attr_book_siblings.attr,
+ &attr_book_siblings_list.attr,
+#endif
NULL
};
If unsure, say N.
+config BLK_DEV_RBD
+ tristate "Rados block device (RBD)"
+ depends on INET && EXPERIMENTAL && BLOCK
+ select CEPH_LIB
+ select LIBCRC32C
+ select CRYPTO_AES
+ select CRYPTO
+ default n
+ help
+ Say Y here if you want include the Rados block device, which stripes
+ a block device over objects stored in the Ceph distributed object
+ store.
+
+ More information at http://ceph.newdream.net/.
+
+ If unsure, say N.
+
endif # BLK_DEV
obj-$(CONFIG_XEN_BLKDEV_FRONTEND) += xen-blkfront.o
obj-$(CONFIG_BLK_DEV_DRBD) += drbd/
+obj-$(CONFIG_BLK_DEV_RBD) += rbd.o
swim_mod-objs := swim.o swim_asm.o
spin_lock_irqsave(&h->lock, flags);
addQ(&h->reqQ, c);
h->Qdepth++;
+ if (h->Qdepth > h->maxQsinceinit)
+ h->maxQsinceinit = h->Qdepth;
start_io(h);
spin_unlock_irqrestore(&h->lock, flags);
}
misc_fw_support = readl(&cfgtable->misc_fw_support);
use_doorbell = misc_fw_support & MISC_FW_DOORBELL_RESET;
+ /* The doorbell reset seems to cause lockups on some Smart
+ * Arrays (e.g. P410, P410i, maybe others). Until this is
+ * fixed or at least isolated, avoid the doorbell reset.
+ */
+ use_doorbell = 0;
+
rc = cciss_controller_hard_reset(pdev, vaddr, use_doorbell);
if (rc)
goto unmap_cfgtable;
h->scatter_list = kmalloc(h->max_commands *
sizeof(struct scatterlist *),
GFP_KERNEL);
+ if (!h->scatter_list)
+ goto clean4;
+
for (k = 0; k < h->nr_cmds; k++) {
h->scatter_list[k] = kmalloc(sizeof(struct scatterlist) *
h->maxsgentries,
clean4:
kfree(h->cmd_pool_bits);
/* Free up sg elements */
- for (k = 0; k < h->nr_cmds; k++)
+ for (k-- ; k >= 0; k--)
kfree(h->scatter_list[k]);
kfree(h->scatter_list);
cciss_free_sg_chain_blocks(h->cmd_sg_list, h->nr_cmds);
pos = ((loff_t) bio->bi_sector << 9) + lo->lo_offset;
if (bio_rw(bio) == WRITE) {
- bool barrier = (bio->bi_rw & REQ_HARDBARRIER);
+ bool barrier = !!(bio->bi_rw & REQ_HARDBARRIER);
struct file *file = lo->lo_backing_file;
if (barrier) {
host->breq->queuedata = host;
/* mflash is random device, thanx for the noop */
- elevator_exit(host->breq->elevator);
- err = elevator_init(host->breq, "noop");
+ err = elevator_change(host->breq, "noop");
if (err) {
printk(KERN_ERR "%s:%d (elevator_init) fail\n",
__func__, __LINE__);
pkt_shrink_pktlist(pd);
}
-static struct pktcdvd_device *pkt_find_dev_from_minor(int dev_minor)
+static struct pktcdvd_device *pkt_find_dev_from_minor(unsigned int dev_minor)
{
if (dev_minor >= MAX_WRITERS)
return NULL;
memcpy(buf, dev->bounce_buf+offset, size);
offset += size;
flush_kernel_dcache_page(bvec->bv_page);
- bvec_kunmap_irq(bvec, &flags);
+ bvec_kunmap_irq(buf, &flags);
i++;
}
}
--- /dev/null
+/*
+ rbd.c -- Export ceph rados objects as a Linux block device
+
+
+ based on drivers/block/osdblk.c:
+
+ Copyright 2009 Red Hat, Inc.
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; see the file COPYING. If not, write to
+ the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
+
+
+
+ Instructions for use
+ --------------------
+
+ 1) Map a Linux block device to an existing rbd image.
+
+ Usage: <mon ip addr> <options> <pool name> <rbd image name> [snap name]
+
+ $ echo "192.168.0.1 name=admin rbd foo" > /sys/class/rbd/add
+
+ The snapshot name can be "-" or omitted to map the image read/write.
+
+ 2) List all active blkdev<->object mappings.
+
+ In this example, we have performed step #1 twice, creating two blkdevs,
+ mapped to two separate rados objects in the rados rbd pool
+
+ $ cat /sys/class/rbd/list
+ #id major client_name pool name snap KB
+ 0 254 client4143 rbd foo - 1024000
+
+ The columns, in order, are:
+ - blkdev unique id
+ - blkdev assigned major
+ - rados client id
+ - rados pool name
+ - rados block device name
+ - mapped snapshot ("-" if none)
+ - device size in KB
+
+
+ 3) Create a snapshot.
+
+ Usage: <blkdev id> <snapname>
+
+ $ echo "0 mysnap" > /sys/class/rbd/snap_create
+
+
+ 4) Listing a snapshot.
+
+ $ cat /sys/class/rbd/snaps_list
+ #id snap KB
+ 0 - 1024000 (*)
+ 0 foo 1024000
+
+ The columns, in order, are:
+ - blkdev unique id
+ - snapshot name, '-' means none (active read/write version)
+ - size of device at time of snapshot
+ - the (*) indicates this is the active version
+
+ 5) Rollback to snapshot.
+
+ Usage: <blkdev id> <snapname>
+
+ $ echo "0 mysnap" > /sys/class/rbd/snap_rollback
+
+
+ 6) Mapping an image using snapshot.
+
+ A snapshot mapping is read-only. This is being done by passing
+ snap=<snapname> to the options when adding a device.
+
+ $ echo "192.168.0.1 name=admin,snap=mysnap rbd foo" > /sys/class/rbd/add
+
+
+ 7) Remove an active blkdev<->rbd image mapping.
+
+ In this example, we remove the mapping with blkdev unique id 1.
+
+ $ echo 1 > /sys/class/rbd/remove
+
+
+ NOTE: The actual creation and deletion of rados objects is outside the scope
+ of this driver.
+
+ */
+
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/osd_client.h>
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/decode.h>
+
+#include <linux/kernel.h>
+#include <linux/device.h>
+#include <linux/module.h>
+#include <linux/fs.h>
+#include <linux/blkdev.h>
+
+#include "rbd_types.h"
+
+#define DRV_NAME "rbd"
+#define DRV_NAME_LONG "rbd (rados block device)"
+
+#define RBD_MINORS_PER_MAJOR 256 /* max minors per blkdev */
+
+#define RBD_MAX_MD_NAME_LEN (96 + sizeof(RBD_SUFFIX))
+#define RBD_MAX_POOL_NAME_LEN 64
+#define RBD_MAX_SNAP_NAME_LEN 32
+#define RBD_MAX_OPT_LEN 1024
+
+#define RBD_SNAP_HEAD_NAME "-"
+
+#define DEV_NAME_LEN 32
+
+/*
+ * block device image metadata (in-memory version)
+ */
+struct rbd_image_header {
+ u64 image_size;
+ char block_name[32];
+ __u8 obj_order;
+ __u8 crypt_type;
+ __u8 comp_type;
+ struct rw_semaphore snap_rwsem;
+ struct ceph_snap_context *snapc;
+ size_t snap_names_len;
+ u64 snap_seq;
+ u32 total_snaps;
+
+ char *snap_names;
+ u64 *snap_sizes;
+};
+
+/*
+ * an instance of the client. multiple devices may share a client.
+ */
+struct rbd_client {
+ struct ceph_client *client;
+ struct kref kref;
+ struct list_head node;
+};
+
+/*
+ * a single io request
+ */
+struct rbd_request {
+ struct request *rq; /* blk layer request */
+ struct bio *bio; /* cloned bio */
+ struct page **pages; /* list of used pages */
+ u64 len;
+};
+
+/*
+ * a single device
+ */
+struct rbd_device {
+ int id; /* blkdev unique id */
+
+ int major; /* blkdev assigned major */
+ struct gendisk *disk; /* blkdev's gendisk and rq */
+ struct request_queue *q;
+
+ struct ceph_client *client;
+ struct rbd_client *rbd_client;
+
+ char name[DEV_NAME_LEN]; /* blkdev name, e.g. rbd3 */
+
+ spinlock_t lock; /* queue lock */
+
+ struct rbd_image_header header;
+ char obj[RBD_MAX_OBJ_NAME_LEN]; /* rbd image name */
+ int obj_len;
+ char obj_md_name[RBD_MAX_MD_NAME_LEN]; /* hdr nm. */
+ char pool_name[RBD_MAX_POOL_NAME_LEN];
+ int poolid;
+
+ char snap_name[RBD_MAX_SNAP_NAME_LEN];
+ u32 cur_snap; /* index+1 of current snapshot within snap context
+ 0 - for the head */
+ int read_only;
+
+ struct list_head node;
+};
+
+static spinlock_t node_lock; /* protects client get/put */
+
+static struct class *class_rbd; /* /sys/class/rbd */
+static DEFINE_MUTEX(ctl_mutex); /* Serialize open/close/setup/teardown */
+static LIST_HEAD(rbd_dev_list); /* devices */
+static LIST_HEAD(rbd_client_list); /* clients */
+
+
+static int rbd_open(struct block_device *bdev, fmode_t mode)
+{
+ struct gendisk *disk = bdev->bd_disk;
+ struct rbd_device *rbd_dev = disk->private_data;
+
+ set_device_ro(bdev, rbd_dev->read_only);
+
+ if ((mode & FMODE_WRITE) && rbd_dev->read_only)
+ return -EROFS;
+
+ return 0;
+}
+
+static const struct block_device_operations rbd_bd_ops = {
+ .owner = THIS_MODULE,
+ .open = rbd_open,
+};
+
+/*
+ * Initialize an rbd client instance.
+ * We own *opt.
+ */
+static struct rbd_client *rbd_client_create(struct ceph_options *opt)
+{
+ struct rbd_client *rbdc;
+ int ret = -ENOMEM;
+
+ dout("rbd_client_create\n");
+ rbdc = kmalloc(sizeof(struct rbd_client), GFP_KERNEL);
+ if (!rbdc)
+ goto out_opt;
+
+ kref_init(&rbdc->kref);
+ INIT_LIST_HEAD(&rbdc->node);
+
+ rbdc->client = ceph_create_client(opt, rbdc);
+ if (IS_ERR(rbdc->client))
+ goto out_rbdc;
+ opt = NULL; /* Now rbdc->client is responsible for opt */
+
+ ret = ceph_open_session(rbdc->client);
+ if (ret < 0)
+ goto out_err;
+
+ spin_lock(&node_lock);
+ list_add_tail(&rbdc->node, &rbd_client_list);
+ spin_unlock(&node_lock);
+
+ dout("rbd_client_create created %p\n", rbdc);
+ return rbdc;
+
+out_err:
+ ceph_destroy_client(rbdc->client);
+out_rbdc:
+ kfree(rbdc);
+out_opt:
+ if (opt)
+ ceph_destroy_options(opt);
+ return ERR_PTR(ret);
+}
+
+/*
+ * Find a ceph client with specific addr and configuration.
+ */
+static struct rbd_client *__rbd_client_find(struct ceph_options *opt)
+{
+ struct rbd_client *client_node;
+
+ if (opt->flags & CEPH_OPT_NOSHARE)
+ return NULL;
+
+ list_for_each_entry(client_node, &rbd_client_list, node)
+ if (ceph_compare_options(opt, client_node->client) == 0)
+ return client_node;
+ return NULL;
+}
+
+/*
+ * Get a ceph client with specific addr and configuration, if one does
+ * not exist create it.
+ */
+static int rbd_get_client(struct rbd_device *rbd_dev, const char *mon_addr,
+ char *options)
+{
+ struct rbd_client *rbdc;
+ struct ceph_options *opt;
+ int ret;
+
+ ret = ceph_parse_options(&opt, options, mon_addr,
+ mon_addr + strlen(mon_addr), NULL, NULL);
+ if (ret < 0)
+ return ret;
+
+ spin_lock(&node_lock);
+ rbdc = __rbd_client_find(opt);
+ if (rbdc) {
+ ceph_destroy_options(opt);
+
+ /* using an existing client */
+ kref_get(&rbdc->kref);
+ rbd_dev->rbd_client = rbdc;
+ rbd_dev->client = rbdc->client;
+ spin_unlock(&node_lock);
+ return 0;
+ }
+ spin_unlock(&node_lock);
+
+ rbdc = rbd_client_create(opt);
+ if (IS_ERR(rbdc))
+ return PTR_ERR(rbdc);
+
+ rbd_dev->rbd_client = rbdc;
+ rbd_dev->client = rbdc->client;
+ return 0;
+}
+
+/*
+ * Destroy ceph client
+ */
+static void rbd_client_release(struct kref *kref)
+{
+ struct rbd_client *rbdc = container_of(kref, struct rbd_client, kref);
+
+ dout("rbd_release_client %p\n", rbdc);
+ spin_lock(&node_lock);
+ list_del(&rbdc->node);
+ spin_unlock(&node_lock);
+
+ ceph_destroy_client(rbdc->client);
+ kfree(rbdc);
+}
+
+/*
+ * Drop reference to ceph client node. If it's not referenced anymore, release
+ * it.
+ */
+static void rbd_put_client(struct rbd_device *rbd_dev)
+{
+ kref_put(&rbd_dev->rbd_client->kref, rbd_client_release);
+ rbd_dev->rbd_client = NULL;
+ rbd_dev->client = NULL;
+}
+
+
+/*
+ * Create a new header structure, translate header format from the on-disk
+ * header.
+ */
+static int rbd_header_from_disk(struct rbd_image_header *header,
+ struct rbd_image_header_ondisk *ondisk,
+ int allocated_snaps,
+ gfp_t gfp_flags)
+{
+ int i;
+ u32 snap_count = le32_to_cpu(ondisk->snap_count);
+ int ret = -ENOMEM;
+
+ init_rwsem(&header->snap_rwsem);
+
+ header->snap_names_len = le64_to_cpu(ondisk->snap_names_len);
+ header->snapc = kmalloc(sizeof(struct ceph_snap_context) +
+ snap_count *
+ sizeof(struct rbd_image_snap_ondisk),
+ gfp_flags);
+ if (!header->snapc)
+ return -ENOMEM;
+ if (snap_count) {
+ header->snap_names = kmalloc(header->snap_names_len,
+ GFP_KERNEL);
+ if (!header->snap_names)
+ goto err_snapc;
+ header->snap_sizes = kmalloc(snap_count * sizeof(u64),
+ GFP_KERNEL);
+ if (!header->snap_sizes)
+ goto err_names;
+ } else {
+ header->snap_names = NULL;
+ header->snap_sizes = NULL;
+ }
+ memcpy(header->block_name, ondisk->block_name,
+ sizeof(ondisk->block_name));
+
+ header->image_size = le64_to_cpu(ondisk->image_size);
+ header->obj_order = ondisk->options.order;
+ header->crypt_type = ondisk->options.crypt_type;
+ header->comp_type = ondisk->options.comp_type;
+
+ atomic_set(&header->snapc->nref, 1);
+ header->snap_seq = le64_to_cpu(ondisk->snap_seq);
+ header->snapc->num_snaps = snap_count;
+ header->total_snaps = snap_count;
+
+ if (snap_count &&
+ allocated_snaps == snap_count) {
+ for (i = 0; i < snap_count; i++) {
+ header->snapc->snaps[i] =
+ le64_to_cpu(ondisk->snaps[i].id);
+ header->snap_sizes[i] =
+ le64_to_cpu(ondisk->snaps[i].image_size);
+ }
+
+ /* copy snapshot names */
+ memcpy(header->snap_names, &ondisk->snaps[i],
+ header->snap_names_len);
+ }
+
+ return 0;
+
+err_names:
+ kfree(header->snap_names);
+err_snapc:
+ kfree(header->snapc);
+ return ret;
+}
+
+static int snap_index(struct rbd_image_header *header, int snap_num)
+{
+ return header->total_snaps - snap_num;
+}
+
+static u64 cur_snap_id(struct rbd_device *rbd_dev)
+{
+ struct rbd_image_header *header = &rbd_dev->header;
+
+ if (!rbd_dev->cur_snap)
+ return 0;
+
+ return header->snapc->snaps[snap_index(header, rbd_dev->cur_snap)];
+}
+
+static int snap_by_name(struct rbd_image_header *header, const char *snap_name,
+ u64 *seq, u64 *size)
+{
+ int i;
+ char *p = header->snap_names;
+
+ for (i = 0; i < header->total_snaps; i++, p += strlen(p) + 1) {
+ if (strcmp(snap_name, p) == 0)
+ break;
+ }
+ if (i == header->total_snaps)
+ return -ENOENT;
+ if (seq)
+ *seq = header->snapc->snaps[i];
+
+ if (size)
+ *size = header->snap_sizes[i];
+
+ return i;
+}
+
+static int rbd_header_set_snap(struct rbd_device *dev,
+ const char *snap_name,
+ u64 *size)
+{
+ struct rbd_image_header *header = &dev->header;
+ struct ceph_snap_context *snapc = header->snapc;
+ int ret = -ENOENT;
+
+ down_write(&header->snap_rwsem);
+
+ if (!snap_name ||
+ !*snap_name ||
+ strcmp(snap_name, "-") == 0 ||
+ strcmp(snap_name, RBD_SNAP_HEAD_NAME) == 0) {
+ if (header->total_snaps)
+ snapc->seq = header->snap_seq;
+ else
+ snapc->seq = 0;
+ dev->cur_snap = 0;
+ dev->read_only = 0;
+ if (size)
+ *size = header->image_size;
+ } else {
+ ret = snap_by_name(header, snap_name, &snapc->seq, size);
+ if (ret < 0)
+ goto done;
+
+ dev->cur_snap = header->total_snaps - ret;
+ dev->read_only = 1;
+ }
+
+ ret = 0;
+done:
+ up_write(&header->snap_rwsem);
+ return ret;
+}
+
+static void rbd_header_free(struct rbd_image_header *header)
+{
+ kfree(header->snapc);
+ kfree(header->snap_names);
+ kfree(header->snap_sizes);
+}
+
+/*
+ * get the actual striped segment name, offset and length
+ */
+static u64 rbd_get_segment(struct rbd_image_header *header,
+ const char *block_name,
+ u64 ofs, u64 len,
+ char *seg_name, u64 *segofs)
+{
+ u64 seg = ofs >> header->obj_order;
+
+ if (seg_name)
+ snprintf(seg_name, RBD_MAX_SEG_NAME_LEN,
+ "%s.%012llx", block_name, seg);
+
+ ofs = ofs & ((1 << header->obj_order) - 1);
+ len = min_t(u64, len, (1 << header->obj_order) - ofs);
+
+ if (segofs)
+ *segofs = ofs;
+
+ return len;
+}
+
+/*
+ * bio helpers
+ */
+
+static void bio_chain_put(struct bio *chain)
+{
+ struct bio *tmp;
+
+ while (chain) {
+ tmp = chain;
+ chain = chain->bi_next;
+ bio_put(tmp);
+ }
+}
+
+/*
+ * zeros a bio chain, starting at specific offset
+ */
+static void zero_bio_chain(struct bio *chain, int start_ofs)
+{
+ struct bio_vec *bv;
+ unsigned long flags;
+ void *buf;
+ int i;
+ int pos = 0;
+
+ while (chain) {
+ bio_for_each_segment(bv, chain, i) {
+ if (pos + bv->bv_len > start_ofs) {
+ int remainder = max(start_ofs - pos, 0);
+ buf = bvec_kmap_irq(bv, &flags);
+ memset(buf + remainder, 0,
+ bv->bv_len - remainder);
+ bvec_kunmap_irq(buf, &flags);
+ }
+ pos += bv->bv_len;
+ }
+
+ chain = chain->bi_next;
+ }
+}
+
+/*
+ * bio_chain_clone - clone a chain of bios up to a certain length.
+ * might return a bio_pair that will need to be released.
+ */
+static struct bio *bio_chain_clone(struct bio **old, struct bio **next,
+ struct bio_pair **bp,
+ int len, gfp_t gfpmask)
+{
+ struct bio *tmp, *old_chain = *old, *new_chain = NULL, *tail = NULL;
+ int total = 0;
+
+ if (*bp) {
+ bio_pair_release(*bp);
+ *bp = NULL;
+ }
+
+ while (old_chain && (total < len)) {
+ tmp = bio_kmalloc(gfpmask, old_chain->bi_max_vecs);
+ if (!tmp)
+ goto err_out;
+
+ if (total + old_chain->bi_size > len) {
+ struct bio_pair *bp;
+
+ /*
+ * this split can only happen with a single paged bio,
+ * split_bio will BUG_ON if this is not the case
+ */
+ dout("bio_chain_clone split! total=%d remaining=%d"
+ "bi_size=%d\n",
+ (int)total, (int)len-total,
+ (int)old_chain->bi_size);
+
+ /* split the bio. We'll release it either in the next
+ call, or it will have to be released outside */
+ bp = bio_split(old_chain, (len - total) / 512ULL);
+ if (!bp)
+ goto err_out;
+
+ __bio_clone(tmp, &bp->bio1);
+
+ *next = &bp->bio2;
+ } else {
+ __bio_clone(tmp, old_chain);
+ *next = old_chain->bi_next;
+ }
+
+ tmp->bi_bdev = NULL;
+ gfpmask &= ~__GFP_WAIT;
+ tmp->bi_next = NULL;
+
+ if (!new_chain) {
+ new_chain = tail = tmp;
+ } else {
+ tail->bi_next = tmp;
+ tail = tmp;
+ }
+ old_chain = old_chain->bi_next;
+
+ total += tmp->bi_size;
+ }
+
+ BUG_ON(total < len);
+
+ if (tail)
+ tail->bi_next = NULL;
+
+ *old = old_chain;
+
+ return new_chain;
+
+err_out:
+ dout("bio_chain_clone with err\n");
+ bio_chain_put(new_chain);
+ return NULL;
+}
+
+/*
+ * helpers for osd request op vectors.
+ */
+static int rbd_create_rw_ops(struct ceph_osd_req_op **ops,
+ int num_ops,
+ int opcode,
+ u32 payload_len)
+{
+ *ops = kzalloc(sizeof(struct ceph_osd_req_op) * (num_ops + 1),
+ GFP_NOIO);
+ if (!*ops)
+ return -ENOMEM;
+ (*ops)[0].op = opcode;
+ /*
+ * op extent offset and length will be set later on
+ * in calc_raw_layout()
+ */
+ (*ops)[0].payload_len = payload_len;
+ return 0;
+}
+
+static void rbd_destroy_ops(struct ceph_osd_req_op *ops)
+{
+ kfree(ops);
+}
+
+/*
+ * Send ceph osd request
+ */
+static int rbd_do_request(struct request *rq,
+ struct rbd_device *dev,
+ struct ceph_snap_context *snapc,
+ u64 snapid,
+ const char *obj, u64 ofs, u64 len,
+ struct bio *bio,
+ struct page **pages,
+ int num_pages,
+ int flags,
+ struct ceph_osd_req_op *ops,
+ int num_reply,
+ void (*rbd_cb)(struct ceph_osd_request *req,
+ struct ceph_msg *msg))
+{
+ struct ceph_osd_request *req;
+ struct ceph_file_layout *layout;
+ int ret;
+ u64 bno;
+ struct timespec mtime = CURRENT_TIME;
+ struct rbd_request *req_data;
+ struct ceph_osd_request_head *reqhead;
+ struct rbd_image_header *header = &dev->header;
+
+ ret = -ENOMEM;
+ req_data = kzalloc(sizeof(*req_data), GFP_NOIO);
+ if (!req_data)
+ goto done;
+
+ dout("rbd_do_request len=%lld ofs=%lld\n", len, ofs);
+
+ down_read(&header->snap_rwsem);
+
+ req = ceph_osdc_alloc_request(&dev->client->osdc, flags,
+ snapc,
+ ops,
+ false,
+ GFP_NOIO, pages, bio);
+ if (IS_ERR(req)) {
+ up_read(&header->snap_rwsem);
+ ret = PTR_ERR(req);
+ goto done_pages;
+ }
+
+ req->r_callback = rbd_cb;
+
+ req_data->rq = rq;
+ req_data->bio = bio;
+ req_data->pages = pages;
+ req_data->len = len;
+
+ req->r_priv = req_data;
+
+ reqhead = req->r_request->front.iov_base;
+ reqhead->snapid = cpu_to_le64(CEPH_NOSNAP);
+
+ strncpy(req->r_oid, obj, sizeof(req->r_oid));
+ req->r_oid_len = strlen(req->r_oid);
+
+ layout = &req->r_file_layout;
+ memset(layout, 0, sizeof(*layout));
+ layout->fl_stripe_unit = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
+ layout->fl_stripe_count = cpu_to_le32(1);
+ layout->fl_object_size = cpu_to_le32(1 << RBD_MAX_OBJ_ORDER);
+ layout->fl_pg_preferred = cpu_to_le32(-1);
+ layout->fl_pg_pool = cpu_to_le32(dev->poolid);
+ ceph_calc_raw_layout(&dev->client->osdc, layout, snapid,
+ ofs, &len, &bno, req, ops);
+
+ ceph_osdc_build_request(req, ofs, &len,
+ ops,
+ snapc,
+ &mtime,
+ req->r_oid, req->r_oid_len);
+ up_read(&header->snap_rwsem);
+
+ ret = ceph_osdc_start_request(&dev->client->osdc, req, false);
+ if (ret < 0)
+ goto done_err;
+
+ if (!rbd_cb) {
+ ret = ceph_osdc_wait_request(&dev->client->osdc, req);
+ ceph_osdc_put_request(req);
+ }
+ return ret;
+
+done_err:
+ bio_chain_put(req_data->bio);
+ ceph_osdc_put_request(req);
+done_pages:
+ kfree(req_data);
+done:
+ if (rq)
+ blk_end_request(rq, ret, len);
+ return ret;
+}
+
+/*
+ * Ceph osd op callback
+ */
+static void rbd_req_cb(struct ceph_osd_request *req, struct ceph_msg *msg)
+{
+ struct rbd_request *req_data = req->r_priv;
+ struct ceph_osd_reply_head *replyhead;
+ struct ceph_osd_op *op;
+ __s32 rc;
+ u64 bytes;
+ int read_op;
+
+ /* parse reply */
+ replyhead = msg->front.iov_base;
+ WARN_ON(le32_to_cpu(replyhead->num_ops) == 0);
+ op = (void *)(replyhead + 1);
+ rc = le32_to_cpu(replyhead->result);
+ bytes = le64_to_cpu(op->extent.length);
+ read_op = (le32_to_cpu(op->op) == CEPH_OSD_OP_READ);
+
+ dout("rbd_req_cb bytes=%lld readop=%d rc=%d\n", bytes, read_op, rc);
+
+ if (rc == -ENOENT && read_op) {
+ zero_bio_chain(req_data->bio, 0);
+ rc = 0;
+ } else if (rc == 0 && read_op && bytes < req_data->len) {
+ zero_bio_chain(req_data->bio, bytes);
+ bytes = req_data->len;
+ }
+
+ blk_end_request(req_data->rq, rc, bytes);
+
+ if (req_data->bio)
+ bio_chain_put(req_data->bio);
+
+ ceph_osdc_put_request(req);
+ kfree(req_data);
+}
+
+/*
+ * Do a synchronous ceph osd operation
+ */
+static int rbd_req_sync_op(struct rbd_device *dev,
+ struct ceph_snap_context *snapc,
+ u64 snapid,
+ int opcode,
+ int flags,
+ struct ceph_osd_req_op *orig_ops,
+ int num_reply,
+ const char *obj,
+ u64 ofs, u64 len,
+ char *buf)
+{
+ int ret;
+ struct page **pages;
+ int num_pages;
+ struct ceph_osd_req_op *ops = orig_ops;
+ u32 payload_len;
+
+ num_pages = calc_pages_for(ofs , len);
+ pages = ceph_alloc_page_vector(num_pages, GFP_KERNEL);
+ if (IS_ERR(pages))
+ return PTR_ERR(pages);
+
+ if (!orig_ops) {
+ payload_len = (flags & CEPH_OSD_FLAG_WRITE ? len : 0);
+ ret = rbd_create_rw_ops(&ops, 1, opcode, payload_len);
+ if (ret < 0)
+ goto done;
+
+ if ((flags & CEPH_OSD_FLAG_WRITE) && buf) {
+ ret = ceph_copy_to_page_vector(pages, buf, ofs, len);
+ if (ret < 0)
+ goto done_ops;
+ }
+ }
+
+ ret = rbd_do_request(NULL, dev, snapc, snapid,
+ obj, ofs, len, NULL,
+ pages, num_pages,
+ flags,
+ ops,
+ 2,
+ NULL);
+ if (ret < 0)
+ goto done_ops;
+
+ if ((flags & CEPH_OSD_FLAG_READ) && buf)
+ ret = ceph_copy_from_page_vector(pages, buf, ofs, ret);
+
+done_ops:
+ if (!orig_ops)
+ rbd_destroy_ops(ops);
+done:
+ ceph_release_page_vector(pages, num_pages);
+ return ret;
+}
+
+/*
+ * Do an asynchronous ceph osd operation
+ */
+static int rbd_do_op(struct request *rq,
+ struct rbd_device *rbd_dev ,
+ struct ceph_snap_context *snapc,
+ u64 snapid,
+ int opcode, int flags, int num_reply,
+ u64 ofs, u64 len,
+ struct bio *bio)
+{
+ char *seg_name;
+ u64 seg_ofs;
+ u64 seg_len;
+ int ret;
+ struct ceph_osd_req_op *ops;
+ u32 payload_len;
+
+ seg_name = kmalloc(RBD_MAX_SEG_NAME_LEN + 1, GFP_NOIO);
+ if (!seg_name)
+ return -ENOMEM;
+
+ seg_len = rbd_get_segment(&rbd_dev->header,
+ rbd_dev->header.block_name,
+ ofs, len,
+ seg_name, &seg_ofs);
+
+ payload_len = (flags & CEPH_OSD_FLAG_WRITE ? seg_len : 0);
+
+ ret = rbd_create_rw_ops(&ops, 1, opcode, payload_len);
+ if (ret < 0)
+ goto done;
+
+ /* we've taken care of segment sizes earlier when we
+ cloned the bios. We should never have a segment
+ truncated at this point */
+ BUG_ON(seg_len < len);
+
+ ret = rbd_do_request(rq, rbd_dev, snapc, snapid,
+ seg_name, seg_ofs, seg_len,
+ bio,
+ NULL, 0,
+ flags,
+ ops,
+ num_reply,
+ rbd_req_cb);
+done:
+ kfree(seg_name);
+ return ret;
+}
+
+/*
+ * Request async osd write
+ */
+static int rbd_req_write(struct request *rq,
+ struct rbd_device *rbd_dev,
+ struct ceph_snap_context *snapc,
+ u64 ofs, u64 len,
+ struct bio *bio)
+{
+ return rbd_do_op(rq, rbd_dev, snapc, CEPH_NOSNAP,
+ CEPH_OSD_OP_WRITE,
+ CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK,
+ 2,
+ ofs, len, bio);
+}
+
+/*
+ * Request async osd read
+ */
+static int rbd_req_read(struct request *rq,
+ struct rbd_device *rbd_dev,
+ u64 snapid,
+ u64 ofs, u64 len,
+ struct bio *bio)
+{
+ return rbd_do_op(rq, rbd_dev, NULL,
+ (snapid ? snapid : CEPH_NOSNAP),
+ CEPH_OSD_OP_READ,
+ CEPH_OSD_FLAG_READ,
+ 2,
+ ofs, len, bio);
+}
+
+/*
+ * Request sync osd read
+ */
+static int rbd_req_sync_read(struct rbd_device *dev,
+ struct ceph_snap_context *snapc,
+ u64 snapid,
+ const char *obj,
+ u64 ofs, u64 len,
+ char *buf)
+{
+ return rbd_req_sync_op(dev, NULL,
+ (snapid ? snapid : CEPH_NOSNAP),
+ CEPH_OSD_OP_READ,
+ CEPH_OSD_FLAG_READ,
+ NULL,
+ 1, obj, ofs, len, buf);
+}
+
+/*
+ * Request sync osd read
+ */
+static int rbd_req_sync_rollback_obj(struct rbd_device *dev,
+ u64 snapid,
+ const char *obj)
+{
+ struct ceph_osd_req_op *ops;
+ int ret = rbd_create_rw_ops(&ops, 1, CEPH_OSD_OP_ROLLBACK, 0);
+ if (ret < 0)
+ return ret;
+
+ ops[0].snap.snapid = snapid;
+
+ ret = rbd_req_sync_op(dev, NULL,
+ CEPH_NOSNAP,
+ 0,
+ CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK,
+ ops,
+ 1, obj, 0, 0, NULL);
+
+ rbd_destroy_ops(ops);
+
+ if (ret < 0)
+ return ret;
+
+ return ret;
+}
+
+/*
+ * Request sync osd read
+ */
+static int rbd_req_sync_exec(struct rbd_device *dev,
+ const char *obj,
+ const char *cls,
+ const char *method,
+ const char *data,
+ int len)
+{
+ struct ceph_osd_req_op *ops;
+ int cls_len = strlen(cls);
+ int method_len = strlen(method);
+ int ret = rbd_create_rw_ops(&ops, 1, CEPH_OSD_OP_CALL,
+ cls_len + method_len + len);
+ if (ret < 0)
+ return ret;
+
+ ops[0].cls.class_name = cls;
+ ops[0].cls.class_len = (__u8)cls_len;
+ ops[0].cls.method_name = method;
+ ops[0].cls.method_len = (__u8)method_len;
+ ops[0].cls.argc = 0;
+ ops[0].cls.indata = data;
+ ops[0].cls.indata_len = len;
+
+ ret = rbd_req_sync_op(dev, NULL,
+ CEPH_NOSNAP,
+ 0,
+ CEPH_OSD_FLAG_WRITE | CEPH_OSD_FLAG_ONDISK,
+ ops,
+ 1, obj, 0, 0, NULL);
+
+ rbd_destroy_ops(ops);
+
+ dout("cls_exec returned %d\n", ret);
+ return ret;
+}
+
+/*
+ * block device queue callback
+ */
+static void rbd_rq_fn(struct request_queue *q)
+{
+ struct rbd_device *rbd_dev = q->queuedata;
+ struct request *rq;
+ struct bio_pair *bp = NULL;
+
+ rq = blk_fetch_request(q);
+
+ while (1) {
+ struct bio *bio;
+ struct bio *rq_bio, *next_bio = NULL;
+ bool do_write;
+ int size, op_size = 0;
+ u64 ofs;
+
+ /* peek at request from block layer */
+ if (!rq)
+ break;
+
+ dout("fetched request\n");
+
+ /* filter out block requests we don't understand */
+ if ((rq->cmd_type != REQ_TYPE_FS)) {
+ __blk_end_request_all(rq, 0);
+ goto next;
+ }
+
+ /* deduce our operation (read, write) */
+ do_write = (rq_data_dir(rq) == WRITE);
+
+ size = blk_rq_bytes(rq);
+ ofs = blk_rq_pos(rq) * 512ULL;
+ rq_bio = rq->bio;
+ if (do_write && rbd_dev->read_only) {
+ __blk_end_request_all(rq, -EROFS);
+ goto next;
+ }
+
+ spin_unlock_irq(q->queue_lock);
+
+ dout("%s 0x%x bytes at 0x%llx\n",
+ do_write ? "write" : "read",
+ size, blk_rq_pos(rq) * 512ULL);
+
+ do {
+ /* a bio clone to be passed down to OSD req */
+ dout("rq->bio->bi_vcnt=%d\n", rq->bio->bi_vcnt);
+ op_size = rbd_get_segment(&rbd_dev->header,
+ rbd_dev->header.block_name,
+ ofs, size,
+ NULL, NULL);
+ bio = bio_chain_clone(&rq_bio, &next_bio, &bp,
+ op_size, GFP_ATOMIC);
+ if (!bio) {
+ spin_lock_irq(q->queue_lock);
+ __blk_end_request_all(rq, -ENOMEM);
+ goto next;
+ }
+
+ /* init OSD command: write or read */
+ if (do_write)
+ rbd_req_write(rq, rbd_dev,
+ rbd_dev->header.snapc,
+ ofs,
+ op_size, bio);
+ else
+ rbd_req_read(rq, rbd_dev,
+ cur_snap_id(rbd_dev),
+ ofs,
+ op_size, bio);
+
+ size -= op_size;
+ ofs += op_size;
+
+ rq_bio = next_bio;
+ } while (size > 0);
+
+ if (bp)
+ bio_pair_release(bp);
+
+ spin_lock_irq(q->queue_lock);
+next:
+ rq = blk_fetch_request(q);
+ }
+}
+
+/*
+ * a queue callback. Makes sure that we don't create a bio that spans across
+ * multiple osd objects. One exception would be with a single page bios,
+ * which we handle later at bio_chain_clone
+ */
+static int rbd_merge_bvec(struct request_queue *q, struct bvec_merge_data *bmd,
+ struct bio_vec *bvec)
+{
+ struct rbd_device *rbd_dev = q->queuedata;
+ unsigned int chunk_sectors = 1 << (rbd_dev->header.obj_order - 9);
+ sector_t sector = bmd->bi_sector + get_start_sect(bmd->bi_bdev);
+ unsigned int bio_sectors = bmd->bi_size >> 9;
+ int max;
+
+ max = (chunk_sectors - ((sector & (chunk_sectors - 1))
+ + bio_sectors)) << 9;
+ if (max < 0)
+ max = 0; /* bio_add cannot handle a negative return */
+ if (max <= bvec->bv_len && bio_sectors == 0)
+ return bvec->bv_len;
+ return max;
+}
+
+static void rbd_free_disk(struct rbd_device *rbd_dev)
+{
+ struct gendisk *disk = rbd_dev->disk;
+
+ if (!disk)
+ return;
+
+ rbd_header_free(&rbd_dev->header);
+
+ if (disk->flags & GENHD_FL_UP)
+ del_gendisk(disk);
+ if (disk->queue)
+ blk_cleanup_queue(disk->queue);
+ put_disk(disk);
+}
+
+/*
+ * reload the ondisk the header
+ */
+static int rbd_read_header(struct rbd_device *rbd_dev,
+ struct rbd_image_header *header)
+{
+ ssize_t rc;
+ struct rbd_image_header_ondisk *dh;
+ int snap_count = 0;
+ u64 snap_names_len = 0;
+
+ while (1) {
+ int len = sizeof(*dh) +
+ snap_count * sizeof(struct rbd_image_snap_ondisk) +
+ snap_names_len;
+
+ rc = -ENOMEM;
+ dh = kmalloc(len, GFP_KERNEL);
+ if (!dh)
+ return -ENOMEM;
+
+ rc = rbd_req_sync_read(rbd_dev,
+ NULL, CEPH_NOSNAP,
+ rbd_dev->obj_md_name,
+ 0, len,
+ (char *)dh);
+ if (rc < 0)
+ goto out_dh;
+
+ rc = rbd_header_from_disk(header, dh, snap_count, GFP_KERNEL);
+ if (rc < 0)
+ goto out_dh;
+
+ if (snap_count != header->total_snaps) {
+ snap_count = header->total_snaps;
+ snap_names_len = header->snap_names_len;
+ rbd_header_free(header);
+ kfree(dh);
+ continue;
+ }
+ break;
+ }
+
+out_dh:
+ kfree(dh);
+ return rc;
+}
+
+/*
+ * create a snapshot
+ */
+static int rbd_header_add_snap(struct rbd_device *dev,
+ const char *snap_name,
+ gfp_t gfp_flags)
+{
+ int name_len = strlen(snap_name);
+ u64 new_snapid;
+ int ret;
+ void *data, *data_start, *data_end;
+
+ /* we should create a snapshot only if we're pointing at the head */
+ if (dev->cur_snap)
+ return -EINVAL;
+
+ ret = ceph_monc_create_snapid(&dev->client->monc, dev->poolid,
+ &new_snapid);
+ dout("created snapid=%lld\n", new_snapid);
+ if (ret < 0)
+ return ret;
+
+ data = kmalloc(name_len + 16, gfp_flags);
+ if (!data)
+ return -ENOMEM;
+
+ data_start = data;
+ data_end = data + name_len + 16;
+
+ ceph_encode_string_safe(&data, data_end, snap_name, name_len, bad);
+ ceph_encode_64_safe(&data, data_end, new_snapid, bad);
+
+ ret = rbd_req_sync_exec(dev, dev->obj_md_name, "rbd", "snap_add",
+ data_start, data - data_start);
+
+ kfree(data_start);
+
+ if (ret < 0)
+ return ret;
+
+ dev->header.snapc->seq = new_snapid;
+
+ return 0;
+bad:
+ return -ERANGE;
+}
+
+/*
+ * only read the first part of the ondisk header, without the snaps info
+ */
+static int rbd_update_snaps(struct rbd_device *rbd_dev)
+{
+ int ret;
+ struct rbd_image_header h;
+ u64 snap_seq;
+
+ ret = rbd_read_header(rbd_dev, &h);
+ if (ret < 0)
+ return ret;
+
+ down_write(&rbd_dev->header.snap_rwsem);
+
+ snap_seq = rbd_dev->header.snapc->seq;
+
+ kfree(rbd_dev->header.snapc);
+ kfree(rbd_dev->header.snap_names);
+ kfree(rbd_dev->header.snap_sizes);
+
+ rbd_dev->header.total_snaps = h.total_snaps;
+ rbd_dev->header.snapc = h.snapc;
+ rbd_dev->header.snap_names = h.snap_names;
+ rbd_dev->header.snap_sizes = h.snap_sizes;
+ rbd_dev->header.snapc->seq = snap_seq;
+
+ up_write(&rbd_dev->header.snap_rwsem);
+
+ return 0;
+}
+
+static int rbd_init_disk(struct rbd_device *rbd_dev)
+{
+ struct gendisk *disk;
+ struct request_queue *q;
+ int rc;
+ u64 total_size = 0;
+
+ /* contact OSD, request size info about the object being mapped */
+ rc = rbd_read_header(rbd_dev, &rbd_dev->header);
+ if (rc)
+ return rc;
+
+ rc = rbd_header_set_snap(rbd_dev, rbd_dev->snap_name, &total_size);
+ if (rc)
+ return rc;
+
+ /* create gendisk info */
+ rc = -ENOMEM;
+ disk = alloc_disk(RBD_MINORS_PER_MAJOR);
+ if (!disk)
+ goto out;
+
+ sprintf(disk->disk_name, DRV_NAME "%d", rbd_dev->id);
+ disk->major = rbd_dev->major;
+ disk->first_minor = 0;
+ disk->fops = &rbd_bd_ops;
+ disk->private_data = rbd_dev;
+
+ /* init rq */
+ rc = -ENOMEM;
+ q = blk_init_queue(rbd_rq_fn, &rbd_dev->lock);
+ if (!q)
+ goto out_disk;
+ blk_queue_merge_bvec(q, rbd_merge_bvec);
+ disk->queue = q;
+
+ q->queuedata = rbd_dev;
+
+ rbd_dev->disk = disk;
+ rbd_dev->q = q;
+
+ /* finally, announce the disk to the world */
+ set_capacity(disk, total_size / 512ULL);
+ add_disk(disk);
+
+ pr_info("%s: added with size 0x%llx\n",
+ disk->disk_name, (unsigned long long)total_size);
+ return 0;
+
+out_disk:
+ put_disk(disk);
+out:
+ return rc;
+}
+
+/********************************************************************
+ * /sys/class/rbd/
+ * add map rados objects to blkdev
+ * remove unmap rados objects
+ * list show mappings
+ *******************************************************************/
+
+static void class_rbd_release(struct class *cls)
+{
+ kfree(cls);
+}
+
+static ssize_t class_rbd_list(struct class *c,
+ struct class_attribute *attr,
+ char *data)
+{
+ int n = 0;
+ struct list_head *tmp;
+ int max = PAGE_SIZE;
+
+ mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+ n += snprintf(data, max,
+ "#id\tmajor\tclient_name\tpool\tname\tsnap\tKB\n");
+
+ list_for_each(tmp, &rbd_dev_list) {
+ struct rbd_device *rbd_dev;
+
+ rbd_dev = list_entry(tmp, struct rbd_device, node);
+ n += snprintf(data+n, max-n,
+ "%d\t%d\tclient%lld\t%s\t%s\t%s\t%lld\n",
+ rbd_dev->id,
+ rbd_dev->major,
+ ceph_client_id(rbd_dev->client),
+ rbd_dev->pool_name,
+ rbd_dev->obj, rbd_dev->snap_name,
+ rbd_dev->header.image_size >> 10);
+ if (n == max)
+ break;
+ }
+
+ mutex_unlock(&ctl_mutex);
+ return n;
+}
+
+static ssize_t class_rbd_add(struct class *c,
+ struct class_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct ceph_osd_client *osdc;
+ struct rbd_device *rbd_dev;
+ ssize_t rc = -ENOMEM;
+ int irc, new_id = 0;
+ struct list_head *tmp;
+ char *mon_dev_name;
+ char *options;
+
+ if (!try_module_get(THIS_MODULE))
+ return -ENODEV;
+
+ mon_dev_name = kmalloc(RBD_MAX_OPT_LEN, GFP_KERNEL);
+ if (!mon_dev_name)
+ goto err_out_mod;
+
+ options = kmalloc(RBD_MAX_OPT_LEN, GFP_KERNEL);
+ if (!options)
+ goto err_mon_dev;
+
+ /* new rbd_device object */
+ rbd_dev = kzalloc(sizeof(*rbd_dev), GFP_KERNEL);
+ if (!rbd_dev)
+ goto err_out_opt;
+
+ /* static rbd_device initialization */
+ spin_lock_init(&rbd_dev->lock);
+ INIT_LIST_HEAD(&rbd_dev->node);
+
+ /* generate unique id: find highest unique id, add one */
+ mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+ list_for_each(tmp, &rbd_dev_list) {
+ struct rbd_device *rbd_dev;
+
+ rbd_dev = list_entry(tmp, struct rbd_device, node);
+ if (rbd_dev->id >= new_id)
+ new_id = rbd_dev->id + 1;
+ }
+
+ rbd_dev->id = new_id;
+
+ /* add to global list */
+ list_add_tail(&rbd_dev->node, &rbd_dev_list);
+
+ /* parse add command */
+ if (sscanf(buf, "%" __stringify(RBD_MAX_OPT_LEN) "s "
+ "%" __stringify(RBD_MAX_OPT_LEN) "s "
+ "%" __stringify(RBD_MAX_POOL_NAME_LEN) "s "
+ "%" __stringify(RBD_MAX_OBJ_NAME_LEN) "s"
+ "%" __stringify(RBD_MAX_SNAP_NAME_LEN) "s",
+ mon_dev_name, options, rbd_dev->pool_name,
+ rbd_dev->obj, rbd_dev->snap_name) < 4) {
+ rc = -EINVAL;
+ goto err_out_slot;
+ }
+
+ if (rbd_dev->snap_name[0] == 0)
+ rbd_dev->snap_name[0] = '-';
+
+ rbd_dev->obj_len = strlen(rbd_dev->obj);
+ snprintf(rbd_dev->obj_md_name, sizeof(rbd_dev->obj_md_name), "%s%s",
+ rbd_dev->obj, RBD_SUFFIX);
+
+ /* initialize rest of new object */
+ snprintf(rbd_dev->name, DEV_NAME_LEN, DRV_NAME "%d", rbd_dev->id);
+ rc = rbd_get_client(rbd_dev, mon_dev_name, options);
+ if (rc < 0)
+ goto err_out_slot;
+
+ mutex_unlock(&ctl_mutex);
+
+ /* pick the pool */
+ osdc = &rbd_dev->client->osdc;
+ rc = ceph_pg_poolid_by_name(osdc->osdmap, rbd_dev->pool_name);
+ if (rc < 0)
+ goto err_out_client;
+ rbd_dev->poolid = rc;
+
+ /* register our block device */
+ irc = register_blkdev(0, rbd_dev->name);
+ if (irc < 0) {
+ rc = irc;
+ goto err_out_client;
+ }
+ rbd_dev->major = irc;
+
+ /* set up and announce blkdev mapping */
+ rc = rbd_init_disk(rbd_dev);
+ if (rc)
+ goto err_out_blkdev;
+
+ return count;
+
+err_out_blkdev:
+ unregister_blkdev(rbd_dev->major, rbd_dev->name);
+err_out_client:
+ rbd_put_client(rbd_dev);
+ mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+err_out_slot:
+ list_del_init(&rbd_dev->node);
+ mutex_unlock(&ctl_mutex);
+
+ kfree(rbd_dev);
+err_out_opt:
+ kfree(options);
+err_mon_dev:
+ kfree(mon_dev_name);
+err_out_mod:
+ dout("Error adding device %s\n", buf);
+ module_put(THIS_MODULE);
+ return rc;
+}
+
+static struct rbd_device *__rbd_get_dev(unsigned long id)
+{
+ struct list_head *tmp;
+ struct rbd_device *rbd_dev;
+
+ list_for_each(tmp, &rbd_dev_list) {
+ rbd_dev = list_entry(tmp, struct rbd_device, node);
+ if (rbd_dev->id == id)
+ return rbd_dev;
+ }
+ return NULL;
+}
+
+static ssize_t class_rbd_remove(struct class *c,
+ struct class_attribute *attr,
+ const char *buf,
+ size_t count)
+{
+ struct rbd_device *rbd_dev = NULL;
+ int target_id, rc;
+ unsigned long ul;
+
+ rc = strict_strtoul(buf, 10, &ul);
+ if (rc)
+ return rc;
+
+ /* convert to int; abort if we lost anything in the conversion */
+ target_id = (int) ul;
+ if (target_id != ul)
+ return -EINVAL;
+
+ /* remove object from list immediately */
+ mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+ rbd_dev = __rbd_get_dev(target_id);
+ if (rbd_dev)
+ list_del_init(&rbd_dev->node);
+
+ mutex_unlock(&ctl_mutex);
+
+ if (!rbd_dev)
+ return -ENOENT;
+
+ rbd_put_client(rbd_dev);
+
+ /* clean up and free blkdev */
+ rbd_free_disk(rbd_dev);
+ unregister_blkdev(rbd_dev->major, rbd_dev->name);
+ kfree(rbd_dev);
+
+ /* release module ref */
+ module_put(THIS_MODULE);
+
+ return count;
+}
+
+static ssize_t class_rbd_snaps_list(struct class *c,
+ struct class_attribute *attr,
+ char *data)
+{
+ struct rbd_device *rbd_dev = NULL;
+ struct list_head *tmp;
+ struct rbd_image_header *header;
+ int i, n = 0, max = PAGE_SIZE;
+ int ret;
+
+ mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+ n += snprintf(data, max, "#id\tsnap\tKB\n");
+
+ list_for_each(tmp, &rbd_dev_list) {
+ char *names, *p;
+ struct ceph_snap_context *snapc;
+
+ rbd_dev = list_entry(tmp, struct rbd_device, node);
+ header = &rbd_dev->header;
+
+ down_read(&header->snap_rwsem);
+
+ names = header->snap_names;
+ snapc = header->snapc;
+
+ n += snprintf(data + n, max - n, "%d\t%s\t%lld%s\n",
+ rbd_dev->id, RBD_SNAP_HEAD_NAME,
+ header->image_size >> 10,
+ (!rbd_dev->cur_snap ? " (*)" : ""));
+ if (n == max)
+ break;
+
+ p = names;
+ for (i = 0; i < header->total_snaps; i++, p += strlen(p) + 1) {
+ n += snprintf(data + n, max - n, "%d\t%s\t%lld%s\n",
+ rbd_dev->id, p, header->snap_sizes[i] >> 10,
+ (rbd_dev->cur_snap &&
+ (snap_index(header, i) == rbd_dev->cur_snap) ?
+ " (*)" : ""));
+ if (n == max)
+ break;
+ }
+
+ up_read(&header->snap_rwsem);
+ }
+
+
+ ret = n;
+ mutex_unlock(&ctl_mutex);
+ return ret;
+}
+
+static ssize_t class_rbd_snaps_refresh(struct class *c,
+ struct class_attribute *attr,
+ const char *buf,
+ size_t count)
+{
+ struct rbd_device *rbd_dev = NULL;
+ int target_id, rc;
+ unsigned long ul;
+ int ret = count;
+
+ rc = strict_strtoul(buf, 10, &ul);
+ if (rc)
+ return rc;
+
+ /* convert to int; abort if we lost anything in the conversion */
+ target_id = (int) ul;
+ if (target_id != ul)
+ return -EINVAL;
+
+ mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+ rbd_dev = __rbd_get_dev(target_id);
+ if (!rbd_dev) {
+ ret = -ENOENT;
+ goto done;
+ }
+
+ rc = rbd_update_snaps(rbd_dev);
+ if (rc < 0)
+ ret = rc;
+
+done:
+ mutex_unlock(&ctl_mutex);
+ return ret;
+}
+
+static ssize_t class_rbd_snap_create(struct class *c,
+ struct class_attribute *attr,
+ const char *buf,
+ size_t count)
+{
+ struct rbd_device *rbd_dev = NULL;
+ int target_id, ret;
+ char *name;
+
+ name = kmalloc(RBD_MAX_SNAP_NAME_LEN + 1, GFP_KERNEL);
+ if (!name)
+ return -ENOMEM;
+
+ /* parse snaps add command */
+ if (sscanf(buf, "%d "
+ "%" __stringify(RBD_MAX_SNAP_NAME_LEN) "s",
+ &target_id,
+ name) != 2) {
+ ret = -EINVAL;
+ goto done;
+ }
+
+ mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+ rbd_dev = __rbd_get_dev(target_id);
+ if (!rbd_dev) {
+ ret = -ENOENT;
+ goto done_unlock;
+ }
+
+ ret = rbd_header_add_snap(rbd_dev,
+ name, GFP_KERNEL);
+ if (ret < 0)
+ goto done_unlock;
+
+ ret = rbd_update_snaps(rbd_dev);
+ if (ret < 0)
+ goto done_unlock;
+
+ ret = count;
+done_unlock:
+ mutex_unlock(&ctl_mutex);
+done:
+ kfree(name);
+ return ret;
+}
+
+static ssize_t class_rbd_rollback(struct class *c,
+ struct class_attribute *attr,
+ const char *buf,
+ size_t count)
+{
+ struct rbd_device *rbd_dev = NULL;
+ int target_id, ret;
+ u64 snapid;
+ char snap_name[RBD_MAX_SNAP_NAME_LEN];
+ u64 cur_ofs;
+ char *seg_name;
+
+ /* parse snaps add command */
+ if (sscanf(buf, "%d "
+ "%" __stringify(RBD_MAX_SNAP_NAME_LEN) "s",
+ &target_id,
+ snap_name) != 2) {
+ return -EINVAL;
+ }
+
+ ret = -ENOMEM;
+ seg_name = kmalloc(RBD_MAX_SEG_NAME_LEN + 1, GFP_NOIO);
+ if (!seg_name)
+ return ret;
+
+ mutex_lock_nested(&ctl_mutex, SINGLE_DEPTH_NESTING);
+
+ rbd_dev = __rbd_get_dev(target_id);
+ if (!rbd_dev) {
+ ret = -ENOENT;
+ goto done_unlock;
+ }
+
+ ret = snap_by_name(&rbd_dev->header, snap_name, &snapid, NULL);
+ if (ret < 0)
+ goto done_unlock;
+
+ dout("snapid=%lld\n", snapid);
+
+ cur_ofs = 0;
+ while (cur_ofs < rbd_dev->header.image_size) {
+ cur_ofs += rbd_get_segment(&rbd_dev->header,
+ rbd_dev->obj,
+ cur_ofs, (u64)-1,
+ seg_name, NULL);
+ dout("seg_name=%s\n", seg_name);
+
+ ret = rbd_req_sync_rollback_obj(rbd_dev, snapid, seg_name);
+ if (ret < 0)
+ pr_warning("could not roll back obj %s err=%d\n",
+ seg_name, ret);
+ }
+
+ ret = rbd_update_snaps(rbd_dev);
+ if (ret < 0)
+ goto done_unlock;
+
+ ret = count;
+
+done_unlock:
+ mutex_unlock(&ctl_mutex);
+ kfree(seg_name);
+
+ return ret;
+}
+
+static struct class_attribute class_rbd_attrs[] = {
+ __ATTR(add, 0200, NULL, class_rbd_add),
+ __ATTR(remove, 0200, NULL, class_rbd_remove),
+ __ATTR(list, 0444, class_rbd_list, NULL),
+ __ATTR(snaps_refresh, 0200, NULL, class_rbd_snaps_refresh),
+ __ATTR(snap_create, 0200, NULL, class_rbd_snap_create),
+ __ATTR(snaps_list, 0444, class_rbd_snaps_list, NULL),
+ __ATTR(snap_rollback, 0200, NULL, class_rbd_rollback),
+ __ATTR_NULL
+};
+
+/*
+ * create control files in sysfs
+ * /sys/class/rbd/...
+ */
+static int rbd_sysfs_init(void)
+{
+ int ret = -ENOMEM;
+
+ class_rbd = kzalloc(sizeof(*class_rbd), GFP_KERNEL);
+ if (!class_rbd)
+ goto out;
+
+ class_rbd->name = DRV_NAME;
+ class_rbd->owner = THIS_MODULE;
+ class_rbd->class_release = class_rbd_release;
+ class_rbd->class_attrs = class_rbd_attrs;
+
+ ret = class_register(class_rbd);
+ if (ret)
+ goto out_class;
+ return 0;
+
+out_class:
+ kfree(class_rbd);
+ class_rbd = NULL;
+ pr_err(DRV_NAME ": failed to create class rbd\n");
+out:
+ return ret;
+}
+
+static void rbd_sysfs_cleanup(void)
+{
+ if (class_rbd)
+ class_destroy(class_rbd);
+ class_rbd = NULL;
+}
+
+int __init rbd_init(void)
+{
+ int rc;
+
+ rc = rbd_sysfs_init();
+ if (rc)
+ return rc;
+ spin_lock_init(&node_lock);
+ pr_info("loaded " DRV_NAME_LONG "\n");
+ return 0;
+}
+
+void __exit rbd_exit(void)
+{
+ rbd_sysfs_cleanup();
+}
+
+module_init(rbd_init);
+module_exit(rbd_exit);
+
+MODULE_AUTHOR("Sage Weil <sage@newdream.net>");
+MODULE_AUTHOR("Yehuda Sadeh <yehuda@hq.newdream.net>");
+MODULE_DESCRIPTION("rados block device");
+
+/* following authorship retained from original osdblk.c */
+MODULE_AUTHOR("Jeff Garzik <jeff@garzik.org>");
+
+MODULE_LICENSE("GPL");
--- /dev/null
+/*
+ * Ceph - scalable distributed file system
+ *
+ * Copyright (C) 2004-2010 Sage Weil <sage@newdream.net>
+ *
+ * This is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License version 2.1, as published by the Free Software
+ * Foundation. See file COPYING.
+ *
+ */
+
+#ifndef CEPH_RBD_TYPES_H
+#define CEPH_RBD_TYPES_H
+
+#include <linux/types.h>
+
+/*
+ * rbd image 'foo' consists of objects
+ * foo.rbd - image metadata
+ * foo.00000000
+ * foo.00000001
+ * ... - data
+ */
+
+#define RBD_SUFFIX ".rbd"
+#define RBD_DIRECTORY "rbd_directory"
+#define RBD_INFO "rbd_info"
+
+#define RBD_DEFAULT_OBJ_ORDER 22 /* 4MB */
+#define RBD_MIN_OBJ_ORDER 16
+#define RBD_MAX_OBJ_ORDER 30
+
+#define RBD_MAX_OBJ_NAME_LEN 96
+#define RBD_MAX_SEG_NAME_LEN 128
+
+#define RBD_COMP_NONE 0
+#define RBD_CRYPT_NONE 0
+
+#define RBD_HEADER_TEXT "<<< Rados Block Device Image >>>\n"
+#define RBD_HEADER_SIGNATURE "RBD"
+#define RBD_HEADER_VERSION "001.005"
+
+struct rbd_info {
+ __le64 max_id;
+} __attribute__ ((packed));
+
+struct rbd_image_snap_ondisk {
+ __le64 id;
+ __le64 image_size;
+} __attribute__((packed));
+
+struct rbd_image_header_ondisk {
+ char text[40];
+ char block_name[24];
+ char signature[4];
+ char version[8];
+ struct {
+ __u8 order;
+ __u8 crypt_type;
+ __u8 comp_type;
+ __u8 unused;
+ } __attribute__((packed)) options;
+ __le64 image_size;
+ __le64 snap_seq;
+ __le32 snap_count;
+ __le32 reserved;
+ __le64 snap_names_len;
+ struct rbd_image_snap_ondisk snaps[0];
+} __attribute__((packed));
+
+
+#endif
#include <linux/spinlock.h>
#include <linux/slab.h>
#include <linux/blkdev.h>
-#include <linux/smp_lock.h>
#include <linux/hdreg.h>
#include <linux/virtio.h>
#include <linux/virtio_blk.h>
struct virtio_blk *vblk = disk->private_data;
struct request *req;
struct bio *bio;
+ int err;
bio = bio_map_kern(vblk->disk->queue, id_str, VIRTIO_BLK_ID_BYTES,
GFP_KERNEL);
}
req->cmd_type = REQ_TYPE_SPECIAL;
- return blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
+ err = blk_execute_rq(vblk->disk->queue, vblk->disk, req, false);
+ blk_put_request(req);
+
+ return err;
}
-static int virtblk_locked_ioctl(struct block_device *bdev, fmode_t mode,
- unsigned cmd, unsigned long data)
+static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
+ unsigned int cmd, unsigned long data)
{
struct gendisk *disk = bdev->bd_disk;
struct virtio_blk *vblk = disk->private_data;
(void __user *)data);
}
-static int virtblk_ioctl(struct block_device *bdev, fmode_t mode,
- unsigned int cmd, unsigned long param)
-{
- int ret;
-
- lock_kernel();
- ret = virtblk_locked_ioctl(bdev, mode, cmd, param);
- unlock_kernel();
-
- return ret;
-}
-
/* We provide getgeo only to please some old bootloader/partitioning tools */
static int virtblk_getgeo(struct block_device *bd, struct hd_geometry *geo)
{
config AGP_AMD64
tristate "AMD Opteron/Athlon64 on-CPU GART support"
- depends on AGP && X86 && K8_NB
+ depends on AGP && X86 && AMD_NB
help
This option gives you AGP support for the GLX component of
X using the on-CPU northbridge of the AMD Athlon64/Opteron CPUs.
#include <linux/mmzone.h>
#include <asm/page.h> /* PAGE_SIZE */
#include <asm/e820.h>
-#include <asm/k8.h>
+#include <asm/amd_nb.h>
#include <asm/gart.h>
#include "agp.h"
u32 temp;
struct aper_size_info_32 *values;
- dev = k8_northbridges[0];
+ dev = k8_northbridges.nb_misc[0];
if (dev==NULL)
return 0;
unsigned long gatt_bus = virt_to_phys(agp_bridge->gatt_table_real);
int i;
+ if (!k8_northbridges.gart_supported)
+ return 0;
+
/* Configure AGP regs in each x86-64 host bridge. */
- for (i = 0; i < num_k8_northbridges; i++) {
+ for (i = 0; i < k8_northbridges.num; i++) {
agp_bridge->gart_bus_addr =
- amd64_configure(k8_northbridges[i], gatt_bus);
+ amd64_configure(k8_northbridges.nb_misc[i],
+ gatt_bus);
}
k8_flush_garts();
return 0;
{
u32 tmp;
int i;
- for (i = 0; i < num_k8_northbridges; i++) {
- struct pci_dev *dev = k8_northbridges[i];
+
+ if (!k8_northbridges.gart_supported)
+ return;
+
+ for (i = 0; i < k8_northbridges.num; i++) {
+ struct pci_dev *dev = k8_northbridges.nb_misc[i];
/* disable gart translation */
pci_read_config_dword(dev, AMD64_GARTAPERTURECTL, &tmp);
- tmp &= ~AMD64_GARTEN;
+ tmp &= ~GARTEN;
pci_write_config_dword(dev, AMD64_GARTAPERTURECTL, tmp);
}
}
if (order < 0 || !agp_aperture_valid(aper, (32*1024*1024)<<order))
return -1;
- pci_write_config_dword(nb, AMD64_GARTAPERTURECTL, order << 1);
+ gart_set_size_and_enable(nb, order);
pci_write_config_dword(nb, AMD64_GARTAPERTUREBASE, aper >> 25);
return 0;
}
-static __devinit int cache_nbs (struct pci_dev *pdev, u32 cap_ptr)
+static __devinit int cache_nbs(struct pci_dev *pdev, u32 cap_ptr)
{
int i;
if (cache_k8_northbridges() < 0)
return -ENODEV;
+ if (!k8_northbridges.gart_supported)
+ return -ENODEV;
+
i = 0;
- for (i = 0; i < num_k8_northbridges; i++) {
- struct pci_dev *dev = k8_northbridges[i];
+ for (i = 0; i < k8_northbridges.num; i++) {
+ struct pci_dev *dev = k8_northbridges.nb_misc[i];
if (fix_northbridge(dev, pdev, cap_ptr) < 0) {
dev_err(&dev->dev, "no usable aperture found\n");
#ifdef __x86_64__
}
/* shadow x86-64 registers into ULi registers */
- pci_read_config_dword (k8_northbridges[0], AMD64_GARTAPERTUREBASE, &httfea);
+ pci_read_config_dword (k8_northbridges.nb_misc[0], AMD64_GARTAPERTUREBASE,
+ &httfea);
/* if x86-64 aperture base is beyond 4G, exit here */
if ((httfea & 0x7fff) >> (32 - 25)) {
pci_write_config_dword(dev1, NVIDIA_X86_64_1_APSIZE, tmp);
/* shadow x86-64 registers into NVIDIA registers */
- pci_read_config_dword (k8_northbridges[0], AMD64_GARTAPERTUREBASE, &apbase);
+ pci_read_config_dword (k8_northbridges.nb_misc[0], AMD64_GARTAPERTUREBASE,
+ &apbase);
/* if x86-64 aperture base is beyond 4G, exit here */
if ( (apbase & 0x7fff) >> (32 - 25) ) {
bridge->driver->cache_flush();
#ifdef CONFIG_X86
- set_memory_uc((unsigned long)table, 1 << page_order);
+ if (set_memory_uc((unsigned long)table, 1 << page_order))
+ printk(KERN_WARNING "Could not set GATT table memory to UC!");
+
bridge->gatt_table = (void *)table;
#else
bridge->gatt_table = ioremap_nocache(virt_to_phys(table),
"G45/G43", NULL, &intel_i965_driver },
{ PCI_DEVICE_ID_INTEL_B43_HB, PCI_DEVICE_ID_INTEL_B43_IG,
"B43", NULL, &intel_i965_driver },
+ { PCI_DEVICE_ID_INTEL_B43_1_HB, PCI_DEVICE_ID_INTEL_B43_1_IG,
+ "B43", NULL, &intel_i965_driver },
{ PCI_DEVICE_ID_INTEL_G41_HB, PCI_DEVICE_ID_INTEL_G41_IG,
"G41", NULL, &intel_i965_driver },
{ PCI_DEVICE_ID_INTEL_IRONLAKE_D_HB, PCI_DEVICE_ID_INTEL_IRONLAKE_D_IG,
#define PCI_DEVICE_ID_INTEL_Q33_IG 0x29D2
#define PCI_DEVICE_ID_INTEL_B43_HB 0x2E40
#define PCI_DEVICE_ID_INTEL_B43_IG 0x2E42
+#define PCI_DEVICE_ID_INTEL_B43_1_HB 0x2E90
+#define PCI_DEVICE_ID_INTEL_B43_1_IG 0x2E92
#define PCI_DEVICE_ID_INTEL_GM45_HB 0x2A40
#define PCI_DEVICE_ID_INTEL_GM45_IG 0x2A42
#define PCI_DEVICE_ID_INTEL_EAGLELAKE_HB 0x2E00
#ifdef CONFIG_PCI
static int pci_registered;
#endif
+#ifdef CONFIG_ACPI
+static int pnp_registered;
+#endif
#ifdef CONFIG_PPC_OF
static int of_registered;
#endif
{
struct acpi_device *acpi_dev;
struct smi_info *info;
- struct resource *res;
+ struct resource *res, *res_second;
acpi_handle handle;
acpi_status status;
unsigned long long tmp;
info->io.addr_data = res->start;
info->io.regspacing = DEFAULT_REGSPACING;
- res = pnp_get_resource(dev,
+ res_second = pnp_get_resource(dev,
(info->io.addr_type == IPMI_IO_ADDR_SPACE) ?
IORESOURCE_IO : IORESOURCE_MEM,
1);
- if (res) {
- if (res->start > info->io.addr_data)
- info->io.regspacing = res->start - info->io.addr_data;
+ if (res_second) {
+ if (res_second->start > info->io.addr_data)
+ info->io.regspacing = res_second->start - info->io.addr_data;
}
info->io.regsize = DEFAULT_REGSPACING;
info->io.regshift = 0;
#ifdef CONFIG_ACPI
pnp_register_driver(&ipmi_pnp_driver);
+ pnp_registered = 1;
#endif
#ifdef CONFIG_DMI
pci_unregister_driver(&ipmi_pci_driver);
#endif
#ifdef CONFIG_ACPI
- pnp_unregister_driver(&ipmi_pnp_driver);
+ if (pnp_registered)
+ pnp_unregister_driver(&ipmi_pnp_driver);
#endif
#ifdef CONFIG_PPC_OF
/*
* capabilities for /dev/zero
* - permits private mappings, "copies" are taken of the source of zeros
+ * - no writeback happens
*/
static struct backing_dev_info zero_bdi = {
.name = "char/mem",
- .capabilities = BDI_CAP_MAP_COPY,
+ .capabilities = BDI_CAP_MAP_COPY | BDI_CAP_NO_ACCT_AND_WRITEBACK,
};
static const struct file_operations full_fops = {
#define TPM_MAX_PROTECTED_ORDINAL 12
#define TPM_PROTECTED_ORDINAL_MASK 0xFF
+/*
+ * Bug workaround - some TPM's don't flush the most
+ * recently changed pcr on suspend, so force the flush
+ * with an extend to the selected _unused_ non-volatile pcr.
+ */
+static int tpm_suspend_pcr;
+module_param_named(suspend_pcr, tpm_suspend_pcr, uint, 0644);
+MODULE_PARM_DESC(suspend_pcr,
+ "PCR to use for dummy writes to faciltate flush on suspend.");
+
static LIST_HEAD(tpm_chip_list);
static DEFINE_SPINLOCK(driver_lock);
static DECLARE_BITMAP(dev_mask, TPM_NUM_DEVICES);
.ordinal = TPM_ORD_SAVESTATE
};
-/* Bug workaround - some TPM's don't flush the most
- * recently changed pcr on suspend, so force the flush
- * with an extend to the selected _unused_ non-volatile pcr.
- */
-static int tpm_suspend_pcr;
-static int __init tpm_suspend_setup(char *str)
-{
- get_option(&str, &tpm_suspend_pcr);
- return 1;
-}
-__setup("tpm_suspend_pcr=", tpm_suspend_setup);
-
/*
* We are about to suspend. Save the TPM state
* so that it can be restored.
/* Used for exporting per-port information to debugfs */
struct dentry *debugfs_dir;
+ /* List of all the devices we're handling */
+ struct list_head portdevs;
+
/* Number of devices this driver is handling */
unsigned int index;
* ports for that device (vdev->priv).
*/
struct ports_device {
+ /* Next portdev in the list, head is in the pdrvdata struct */
+ struct list_head list;
+
/*
* Workqueue handlers where we process deferred work after
* notification
struct console cons;
/* Each port associates with a separate char device */
- struct cdev cdev;
+ struct cdev *cdev;
struct device *dev;
+ /* Reference-counting to handle port hot-unplugs and file operations */
+ struct kref kref;
+
/* A waitqueue for poll() or blocking read operations */
wait_queue_head_t waitqueue;
/* The 'name' of the port that we expose via sysfs properties */
char *name;
+ /* We can notify apps of host connect / disconnect events via SIGIO */
+ struct fasync_struct *async_queue;
+
/* The 'id' to identify the port with the Host */
u32 id;
return port;
}
+static struct port *find_port_by_devt_in_portdev(struct ports_device *portdev,
+ dev_t dev)
+{
+ struct port *port;
+ unsigned long flags;
+
+ spin_lock_irqsave(&portdev->ports_lock, flags);
+ list_for_each_entry(port, &portdev->ports, list)
+ if (port->cdev->dev == dev)
+ goto out;
+ port = NULL;
+out:
+ spin_unlock_irqrestore(&portdev->ports_lock, flags);
+
+ return port;
+}
+
+static struct port *find_port_by_devt(dev_t dev)
+{
+ struct ports_device *portdev;
+ struct port *port;
+ unsigned long flags;
+
+ spin_lock_irqsave(&pdrvdata_lock, flags);
+ list_for_each_entry(portdev, &pdrvdata.portdevs, list) {
+ port = find_port_by_devt_in_portdev(portdev, dev);
+ if (port)
+ goto out;
+ }
+ port = NULL;
+out:
+ spin_unlock_irqrestore(&pdrvdata_lock, flags);
+ return port;
+}
+
static struct port *find_port_by_id(struct ports_device *portdev, u32 id)
{
struct port *port;
static ssize_t send_control_msg(struct port *port, unsigned int event,
unsigned int value)
{
- return __send_control_msg(port->portdev, port->id, event, value);
+ /* Did the port get unplugged before userspace closed it? */
+ if (port->portdev)
+ return __send_control_msg(port->portdev, port->id, event, value);
+ return 0;
}
/* Callers must take the port->outvq_lock */
/*
* Wait till the host acknowledges it pushed out the data we
- * sent. This is done for ports in blocking mode or for data
- * from the hvc_console; the tty operations are performed with
- * spinlocks held so we can't sleep here.
+ * sent. This is done for data from the hvc_console; the tty
+ * operations are performed with spinlocks held so we can't
+ * sleep here. An alternative would be to copy the data to a
+ * buffer and relax the spinning requirement. The downside is
+ * we need to kmalloc a GFP_ATOMIC buffer each time the
+ * console driver writes something out.
*/
while (!virtqueue_get_buf(out_vq, &len))
cpu_relax();
/* The condition that must be true for polling to end */
static bool will_read_block(struct port *port)
{
+ if (!port->guest_connected) {
+ /* Port got hot-unplugged. Let's exit. */
+ return false;
+ }
return !port_has_data(port) && port->host_connected;
}
if (ret < 0)
return ret;
}
+ /* Port got hot-unplugged. */
+ if (!port->guest_connected)
+ return -ENODEV;
/*
* We could've received a disconnection message while we were
* waiting for more data.
ssize_t ret;
bool nonblock;
+ /* Userspace could be out to fool us */
+ if (!count)
+ return 0;
+
port = filp->private_data;
nonblock = filp->f_flags & O_NONBLOCK;
if (ret < 0)
return ret;
}
+ /* Port got hot-unplugged. */
+ if (!port->guest_connected)
+ return -ENODEV;
count = min((size_t)(32 * 1024), count);
goto free_buf;
}
+ /*
+ * We now ask send_buf() to not spin for generic ports -- we
+ * can re-use the same code path that non-blocking file
+ * descriptors take for blocking file descriptors since the
+ * wait is already done and we're certain the write will go
+ * through to the host.
+ */
+ nonblock = true;
ret = send_buf(port, buf, count, nonblock);
if (nonblock && ret > 0)
port = filp->private_data;
poll_wait(filp, &port->waitqueue, wait);
+ if (!port->guest_connected) {
+ /* Port got unplugged */
+ return POLLHUP;
+ }
ret = 0;
- if (port->inbuf)
+ if (!will_read_block(port))
ret |= POLLIN | POLLRDNORM;
if (!will_write_block(port))
ret |= POLLOUT;
return ret;
}
+static void remove_port(struct kref *kref);
+
static int port_fops_release(struct inode *inode, struct file *filp)
{
struct port *port;
reclaim_consumed_buffers(port);
spin_unlock_irq(&port->outvq_lock);
+ /*
+ * Locks aren't necessary here as a port can't be opened after
+ * unplug, and if a port isn't unplugged, a kref would already
+ * exist for the port. Plus, taking ports_lock here would
+ * create a dependency on other locks taken by functions
+ * inside remove_port if we're the last holder of the port,
+ * creating many problems.
+ */
+ kref_put(&port->kref, remove_port);
+
return 0;
}
{
struct cdev *cdev = inode->i_cdev;
struct port *port;
+ int ret;
- port = container_of(cdev, struct port, cdev);
+ port = find_port_by_devt(cdev->dev);
filp->private_data = port;
+ /* Prevent against a port getting hot-unplugged at the same time */
+ spin_lock_irq(&port->portdev->ports_lock);
+ kref_get(&port->kref);
+ spin_unlock_irq(&port->portdev->ports_lock);
+
/*
* Don't allow opening of console port devices -- that's done
* via /dev/hvc
*/
- if (is_console_port(port))
- return -ENXIO;
+ if (is_console_port(port)) {
+ ret = -ENXIO;
+ goto out;
+ }
/* Allow only one process to open a particular port at a time */
spin_lock_irq(&port->inbuf_lock);
if (port->guest_connected) {
spin_unlock_irq(&port->inbuf_lock);
- return -EMFILE;
+ ret = -EMFILE;
+ goto out;
}
port->guest_connected = true;
reclaim_consumed_buffers(port);
spin_unlock_irq(&port->outvq_lock);
+ nonseekable_open(inode, filp);
+
/* Notify host of port being opened */
send_control_msg(filp->private_data, VIRTIO_CONSOLE_PORT_OPEN, 1);
return 0;
+out:
+ kref_put(&port->kref, remove_port);
+ return ret;
+}
+
+static int port_fops_fasync(int fd, struct file *filp, int mode)
+{
+ struct port *port;
+
+ port = filp->private_data;
+ return fasync_helper(fd, filp, mode, &port->async_queue);
}
/*
.write = port_fops_write,
.poll = port_fops_poll,
.release = port_fops_release,
+ .fasync = port_fops_fasync,
+ .llseek = no_llseek,
};
/*
return nr_added_bufs;
}
+static void send_sigio_to_port(struct port *port)
+{
+ if (port->async_queue && port->guest_connected)
+ kill_fasync(&port->async_queue, SIGIO, POLL_OUT);
+}
+
static int add_port(struct ports_device *portdev, u32 id)
{
char debugfs_name[16];
err = -ENOMEM;
goto fail;
}
+ kref_init(&port->kref);
port->portdev = portdev;
port->id = id;
port->name = NULL;
port->inbuf = NULL;
port->cons.hvc = NULL;
+ port->async_queue = NULL;
port->cons.ws.ws_row = port->cons.ws.ws_col = 0;
port->in_vq = portdev->in_vqs[port->id];
port->out_vq = portdev->out_vqs[port->id];
- cdev_init(&port->cdev, &port_fops);
+ port->cdev = cdev_alloc();
+ if (!port->cdev) {
+ dev_err(&port->portdev->vdev->dev, "Error allocating cdev\n");
+ err = -ENOMEM;
+ goto free_port;
+ }
+ port->cdev->ops = &port_fops;
devt = MKDEV(portdev->chr_major, id);
- err = cdev_add(&port->cdev, devt, 1);
+ err = cdev_add(port->cdev, devt, 1);
if (err < 0) {
dev_err(&port->portdev->vdev->dev,
"Error %d adding cdev for port %u\n", err, id);
- goto free_port;
+ goto free_cdev;
}
port->dev = device_create(pdrvdata.class, &port->portdev->vdev->dev,
devt, port, "vport%up%u",
free_device:
device_destroy(pdrvdata.class, port->dev->devt);
free_cdev:
- cdev_del(&port->cdev);
+ cdev_del(port->cdev);
free_port:
kfree(port);
fail:
return err;
}
-/* Remove all port-specific data. */
-static int remove_port(struct port *port)
+/* No users remain, remove all port-specific data. */
+static void remove_port(struct kref *kref)
+{
+ struct port *port;
+
+ port = container_of(kref, struct port, kref);
+
+ sysfs_remove_group(&port->dev->kobj, &port_attribute_group);
+ device_destroy(pdrvdata.class, port->dev->devt);
+ cdev_del(port->cdev);
+
+ kfree(port->name);
+
+ debugfs_remove(port->debugfs_file);
+
+ kfree(port);
+}
+
+/*
+ * Port got unplugged. Remove port from portdev's list and drop the
+ * kref reference. If no userspace has this port opened, it will
+ * result in immediate removal the port.
+ */
+static void unplug_port(struct port *port)
{
struct port_buffer *buf;
+ spin_lock_irq(&port->portdev->ports_lock);
+ list_del(&port->list);
+ spin_unlock_irq(&port->portdev->ports_lock);
+
if (port->guest_connected) {
port->guest_connected = false;
port->host_connected = false;
wake_up_interruptible(&port->waitqueue);
- send_control_msg(port, VIRTIO_CONSOLE_PORT_OPEN, 0);
- }
- spin_lock_irq(&port->portdev->ports_lock);
- list_del(&port->list);
- spin_unlock_irq(&port->portdev->ports_lock);
+ /* Let the app know the port is going down. */
+ send_sigio_to_port(port);
+ }
if (is_console_port(port)) {
spin_lock_irq(&pdrvdata_lock);
hvc_remove(port->cons.hvc);
#endif
}
- sysfs_remove_group(&port->dev->kobj, &port_attribute_group);
- device_destroy(pdrvdata.class, port->dev->devt);
- cdev_del(&port->cdev);
/* Remove unused data this port might have received. */
discard_port_data(port);
while ((buf = virtqueue_detach_unused_buf(port->in_vq)))
free_buf(buf);
- kfree(port->name);
-
- debugfs_remove(port->debugfs_file);
+ /*
+ * We should just assume the device itself has gone off --
+ * else a close on an open port later will try to send out a
+ * control message.
+ */
+ port->portdev = NULL;
- kfree(port);
- return 0;
+ /*
+ * Locks around here are not necessary - a port can't be
+ * opened after we removed the port struct from ports_list
+ * above.
+ */
+ kref_put(&port->kref, remove_port);
}
/* Any private messages that the Host and Guest want to share */
add_port(portdev, cpkt->id);
break;
case VIRTIO_CONSOLE_PORT_REMOVE:
- remove_port(port);
+ unplug_port(port);
break;
case VIRTIO_CONSOLE_CONSOLE_PORT:
if (!cpkt->value)
spin_lock_irq(&port->outvq_lock);
reclaim_consumed_buffers(port);
spin_unlock_irq(&port->outvq_lock);
+
+ /*
+ * If the guest is connected, it'll be interested in
+ * knowing the host connection state changed.
+ */
+ send_sigio_to_port(port);
break;
case VIRTIO_CONSOLE_PORT_NAME:
/*
wake_up_interruptible(&port->waitqueue);
+ /* Send a SIGIO indicating new data in case the process asked for it */
+ send_sigio_to_port(port);
+
if (is_console_port(port) && hvc_poll(port->cons.hvc))
hvc_kick();
}
add_port(portdev, 0);
}
+ spin_lock_irq(&pdrvdata_lock);
+ list_add_tail(&portdev->list, &pdrvdata.portdevs);
+ spin_unlock_irq(&pdrvdata_lock);
+
__send_control_msg(portdev, VIRTIO_CONSOLE_BAD_ID,
VIRTIO_CONSOLE_DEVICE_READY, 1);
return 0;
{
struct ports_device *portdev;
struct port *port, *port2;
- struct port_buffer *buf;
- unsigned int len;
portdev = vdev->priv;
+ spin_lock_irq(&pdrvdata_lock);
+ list_del(&portdev->list);
+ spin_unlock_irq(&pdrvdata_lock);
+
+ /* Disable interrupts for vqs */
+ vdev->config->reset(vdev);
+ /* Finish up work that's lined up */
cancel_work_sync(&portdev->control_work);
list_for_each_entry_safe(port, port2, &portdev->ports, list)
- remove_port(port);
+ unplug_port(port);
unregister_chrdev(portdev->chr_major, "virtio-portsdev");
- while ((buf = virtqueue_get_buf(portdev->c_ivq, &len)))
- free_buf(buf);
+ /*
+ * When yanking out a device, we immediately lose the
+ * (device-side) queues. So there's no point in keeping the
+ * guest side around till we drop our final reference. This
+ * also means that any ports which are in an open state will
+ * have to just stop using the port, as the vqs are going
+ * away.
+ */
+ if (use_multiport(portdev)) {
+ struct port_buffer *buf;
+ unsigned int len;
- while ((buf = virtqueue_detach_unused_buf(portdev->c_ivq)))
- free_buf(buf);
+ while ((buf = virtqueue_get_buf(portdev->c_ivq, &len)))
+ free_buf(buf);
+
+ while ((buf = virtqueue_detach_unused_buf(portdev->c_ivq)))
+ free_buf(buf);
+ }
vdev->config->del_vqs(vdev);
kfree(portdev->in_vqs);
PTR_ERR(pdrvdata.debugfs_dir));
}
INIT_LIST_HEAD(&pdrvdata.consoles);
+ INIT_LIST_HEAD(&pdrvdata.portdevs);
return register_virtio_driver(&virtio_console);
}
case KIOCSOUND:
if (!perm)
goto eperm;
- /* FIXME: This is an old broken API but we need to keep it
- supported and somehow separate the historic advertised
- tick rate from any real one */
+ /*
+ * The use of PIT_TICK_RATE is historic, it used to be
+ * the platform-dependent CLOCK_TICK_RATE between 2.6.12
+ * and 2.6.36, which was a minor but unfortunate ABI
+ * change.
+ */
if (arg)
- arg = CLOCK_TICK_RATE / arg;
+ arg = PIT_TICK_RATE / arg;
kd_mksound(arg, 0);
break;
*/
ticks = HZ * ((arg >> 16) & 0xffff) / 1000;
count = ticks ? (arg & 0xffff) : 0;
- /* FIXME: This is an old broken API but we need to keep it
- supported and somehow separate the historic advertised
- tick rate from any real one */
if (count)
- count = CLOCK_TICK_RATE / count;
+ count = PIT_TICK_RATE / count;
kd_mksound(count, ticks);
break;
}
* Limiting Performance Impact
* ---------------------------
* C states, especially those with large exit latencies, can have a real
- * noticable impact on workloads, which is not acceptable for most sysadmins,
+ * noticeable impact on workloads, which is not acceptable for most sysadmins,
* and in addition, less performance has a power price of its own.
*
* As a general rule of thumb, menu assumes that the following heuristic
static LIST_HEAD(dca_domains);
+static BLOCKING_NOTIFIER_HEAD(dca_provider_chain);
+
+static int dca_providers_blocked;
+
static struct pci_bus *dca_pci_rc_from_dev(struct device *dev)
{
struct pci_dev *pdev = to_pci_dev(dev);
kfree(domain);
}
+static int dca_provider_ioat_ver_3_0(struct device *dev)
+{
+ struct pci_dev *pdev = to_pci_dev(dev);
+
+ return ((pdev->vendor == PCI_VENDOR_ID_INTEL) &&
+ ((pdev->device == PCI_DEVICE_ID_INTEL_IOAT_TBG0) ||
+ (pdev->device == PCI_DEVICE_ID_INTEL_IOAT_TBG1) ||
+ (pdev->device == PCI_DEVICE_ID_INTEL_IOAT_TBG2) ||
+ (pdev->device == PCI_DEVICE_ID_INTEL_IOAT_TBG3) ||
+ (pdev->device == PCI_DEVICE_ID_INTEL_IOAT_TBG4) ||
+ (pdev->device == PCI_DEVICE_ID_INTEL_IOAT_TBG5) ||
+ (pdev->device == PCI_DEVICE_ID_INTEL_IOAT_TBG6) ||
+ (pdev->device == PCI_DEVICE_ID_INTEL_IOAT_TBG7)));
+}
+
+static void unregister_dca_providers(void)
+{
+ struct dca_provider *dca, *_dca;
+ struct list_head unregistered_providers;
+ struct dca_domain *domain;
+ unsigned long flags;
+
+ blocking_notifier_call_chain(&dca_provider_chain,
+ DCA_PROVIDER_REMOVE, NULL);
+
+ INIT_LIST_HEAD(&unregistered_providers);
+
+ spin_lock_irqsave(&dca_lock, flags);
+
+ if (list_empty(&dca_domains)) {
+ spin_unlock_irqrestore(&dca_lock, flags);
+ return;
+ }
+
+ /* at this point only one domain in the list is expected */
+ domain = list_first_entry(&dca_domains, struct dca_domain, node);
+ if (!domain)
+ return;
+
+ list_for_each_entry_safe(dca, _dca, &domain->dca_providers, node) {
+ list_del(&dca->node);
+ list_add(&dca->node, &unregistered_providers);
+ }
+
+ dca_free_domain(domain);
+
+ spin_unlock_irqrestore(&dca_lock, flags);
+
+ list_for_each_entry_safe(dca, _dca, &unregistered_providers, node) {
+ dca_sysfs_remove_provider(dca);
+ list_del(&dca->node);
+ }
+}
+
static struct dca_domain *dca_find_domain(struct pci_bus *rc)
{
struct dca_domain *domain;
domain = dca_find_domain(rc);
if (!domain) {
- domain = dca_allocate_domain(rc);
- if (domain)
- list_add(&domain->node, &dca_domains);
+ if (dca_provider_ioat_ver_3_0(dev) && !list_empty(&dca_domains)) {
+ dca_providers_blocked = 1;
+ } else {
+ domain = dca_allocate_domain(rc);
+ if (domain)
+ list_add(&domain->node, &dca_domains);
+ }
}
return domain;
}
EXPORT_SYMBOL_GPL(free_dca_provider);
-static BLOCKING_NOTIFIER_HEAD(dca_provider_chain);
-
/**
* register_dca_provider - register a dca provider
* @dca - struct created by alloc_dca_provider()
unsigned long flags;
struct dca_domain *domain;
+ spin_lock_irqsave(&dca_lock, flags);
+ if (dca_providers_blocked) {
+ spin_unlock_irqrestore(&dca_lock, flags);
+ return -ENODEV;
+ }
+ spin_unlock_irqrestore(&dca_lock, flags);
+
err = dca_sysfs_add_provider(dca, dev);
if (err)
return err;
spin_lock_irqsave(&dca_lock, flags);
domain = dca_get_domain(dev);
if (!domain) {
- spin_unlock_irqrestore(&dca_lock, flags);
+ if (dca_providers_blocked) {
+ spin_unlock_irqrestore(&dca_lock, flags);
+ dca_sysfs_remove_provider(dca);
+ unregister_dca_providers();
+ } else {
+ spin_unlock_irqrestore(&dca_lock, flags);
+ }
return -ENODEV;
}
list_add(&dca->node, &domain->dca_providers);
dma->device_issue_pending = ioat2_issue_pending;
dma->device_alloc_chan_resources = ioat2_alloc_chan_resources;
dma->device_free_chan_resources = ioat2_free_chan_resources;
- dma->device_tx_status = ioat_tx_status;
+ dma->device_tx_status = ioat_dma_tx_status;
err = ioat_probe(device);
if (err)
static void mv_xor_device_clear_eoc_cause(struct mv_xor_chan *chan)
{
- u32 val = (1 << (1 + (chan->idx * 16)));
+ u32 val = ~(1 << (chan->idx * 16));
dev_dbg(chan->device->common.dev, "%s, val 0x%08x\n", __func__, val);
__raw_writel(val, XOR_INTR_CAUSE(chan));
}
sh_chan = to_sh_chan(chan);
param = chan->private;
- slave_addr = param->config->addr;
/* Someone calling slave DMA on a public channel? */
if (!param || !sg_len) {
return NULL;
}
+ slave_addr = param->config->addr;
+
/*
* if (param != NULL), this is a successfully requested slave channel,
* therefore param->config != NULL too.
there're four debug levels (x=0,1,2,3 from low to high).
Usually you should select 'N'.
- config EDAC_DECODE_MCE
+config EDAC_DECODE_MCE
tristate "Decode MCEs in human-readable form (only on AMD for now)"
depends on CPU_SUP_AMD && X86_MCE
default y
which occur really early upon boot, before the module infrastructure
has been initialized.
+config EDAC_MCE_INJ
+ tristate "Simple MCE injection interface over /sysfs"
+ depends on EDAC_DECODE_MCE
+ default n
+ help
+ This is a simple interface to inject MCEs over /sysfs and test
+ the MCE decoding code in EDAC.
+
+ This is currently AMD-only.
+
config EDAC_MM_EDAC
tristate "Main Memory EDAC (Error Detection And Correction) reporting"
help
config EDAC_AMD64
tristate "AMD64 (Opteron, Athlon64) K8, F10h, F11h"
- depends on EDAC_MM_EDAC && K8_NB && X86_64 && PCI && EDAC_DECODE_MCE
+ depends on EDAC_MM_EDAC && AMD_NB && X86_64 && PCI && EDAC_DECODE_MCE
help
Support for error detection and correction on the AMD 64
Families of Memory Controllers (K8, F10h and F11h)
config EDAC_AMD64_ERROR_INJECTION
- bool "Sysfs Error Injection facilities"
+ bool "Sysfs HW Error injection facilities"
depends on EDAC_AMD64
help
Recent Opterons (Family 10h and later) provide for Memory Error
edac_core-objs += edac_pci.o edac_pci_sysfs.o
endif
+obj-$(CONFIG_EDAC_MCE_INJ) += mce_amd_inj.o
+
+edac_mce_amd-objs := mce_amd.o
obj-$(CONFIG_EDAC_DECODE_MCE) += edac_mce_amd.o
obj-$(CONFIG_EDAC_AMD76X) += amd76x_edac.o
#include "amd64_edac.h"
-#include <asm/k8.h>
+#include <asm/amd_nb.h>
static struct edac_pci_ctl_info *amd64_ctl_pci;
amd64_handle_ue(mci, info);
}
-void amd64_decode_bus_error(int node_id, struct err_regs *regs)
+void amd64_decode_bus_error(int node_id, struct mce *m, u32 nbcfg)
{
struct mem_ctl_info *mci = mci_lookup[node_id];
+ struct err_regs regs;
- __amd64_decode_bus_error(mci, regs);
+ regs.nbsl = (u32) m->status;
+ regs.nbsh = (u32)(m->status >> 32);
+ regs.nbeal = (u32) m->addr;
+ regs.nbeah = (u32)(m->addr >> 32);
+ regs.nbcfg = nbcfg;
+
+ __amd64_decode_bus_error(mci, ®s);
/*
* Check the UE bit of the NB status high register, if set generate some
*
* FIXME: this should go somewhere else, if at all.
*/
- if (regs->nbsh & K8_NBSH_UC_ERR && !report_gart_errors)
+ if (regs.nbsh & K8_NBSH_UC_ERR && !report_gart_errors)
edac_mc_handle_ue_no_info(mci, "UE bit is set");
}
* to finish initialization of the MC instances.
*/
err = -ENODEV;
- for (nb = 0; nb < num_k8_northbridges; nb++) {
+ for (nb = 0; nb < k8_northbridges.num; nb++) {
if (!pvt_lookup[nb])
continue;
#include <linux/edac.h>
#include <asm/msr.h>
#include "edac_core.h"
-#include "edac_mce_amd.h"
+#include "mce_amd.h"
#define amd64_printk(level, fmt, arg...) \
edac_printk(level, "amd64", fmt, ##arg)
extern const char *to_msgs[2];
extern const char *pp_msgs[4];
extern const char *ii_msgs[4];
-extern const char *ext_msgs[32];
extern const char *htlink_msgs[8];
#ifdef CONFIG_EDAC_DEBUG
-#define NUM_DBG_ATTRS 9
+#define NUM_DBG_ATTRS 5
#else
#define NUM_DBG_ATTRS 0
#endif
#include "amd64_edac.h"
-/*
- * accept a hex value and store it into the virtual error register file, field:
- * nbeal and nbeah. Assume virtual error values have already been set for: NBSL,
- * NBSH and NBCFG. Then proceed to map the error values to a MC, CSROW and
- * CHANNEL
- */
-static ssize_t amd64_nbea_store(struct mem_ctl_info *mci, const char *data,
- size_t count)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
- unsigned long long value;
- int ret = 0;
-
- ret = strict_strtoull(data, 16, &value);
- if (ret != -EINVAL) {
- debugf0("received NBEA= 0x%llx\n", value);
-
- /* place the value into the virtual error packet */
- pvt->ctl_error_info.nbeal = (u32) value;
- value >>= 32;
- pvt->ctl_error_info.nbeah = (u32) value;
-
- /* Process the Mapping request */
- /* TODO: Add race prevention */
- amd_decode_nb_mce(pvt->mc_node_id, &pvt->ctl_error_info, 1);
-
- return count;
- }
- return ret;
-}
-
-/* display back what the last NBEA (MCA NB Address (MC4_ADDR)) was written */
-static ssize_t amd64_nbea_show(struct mem_ctl_info *mci, char *data)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
- u64 value;
-
- value = pvt->ctl_error_info.nbeah;
- value <<= 32;
- value |= pvt->ctl_error_info.nbeal;
-
- return sprintf(data, "%llx\n", value);
-}
-
-/* store the NBSL (MCA NB Status Low (MC4_STATUS)) value user desires */
-static ssize_t amd64_nbsl_store(struct mem_ctl_info *mci, const char *data,
- size_t count)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
- unsigned long value;
- int ret = 0;
-
- ret = strict_strtoul(data, 16, &value);
- if (ret != -EINVAL) {
- debugf0("received NBSL= 0x%lx\n", value);
-
- pvt->ctl_error_info.nbsl = (u32) value;
-
- return count;
- }
- return ret;
-}
-
-/* display back what the last NBSL value written */
-static ssize_t amd64_nbsl_show(struct mem_ctl_info *mci, char *data)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
- u32 value;
-
- value = pvt->ctl_error_info.nbsl;
-
- return sprintf(data, "%x\n", value);
-}
-
-/* store the NBSH (MCA NB Status High) value user desires */
-static ssize_t amd64_nbsh_store(struct mem_ctl_info *mci, const char *data,
- size_t count)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
- unsigned long value;
- int ret = 0;
-
- ret = strict_strtoul(data, 16, &value);
- if (ret != -EINVAL) {
- debugf0("received NBSH= 0x%lx\n", value);
-
- pvt->ctl_error_info.nbsh = (u32) value;
-
- return count;
- }
- return ret;
-}
-
-/* display back what the last NBSH value written */
-static ssize_t amd64_nbsh_show(struct mem_ctl_info *mci, char *data)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
- u32 value;
-
- value = pvt->ctl_error_info.nbsh;
-
- return sprintf(data, "%x\n", value);
+#define EDAC_DCT_ATTR_SHOW(reg) \
+static ssize_t amd64_##reg##_show(struct mem_ctl_info *mci, char *data) \
+{ \
+ struct amd64_pvt *pvt = mci->pvt_info; \
+ return sprintf(data, "0x%016llx\n", (u64)pvt->reg); \
}
-/* accept and store the NBCFG (MCA NB Configuration) value user desires */
-static ssize_t amd64_nbcfg_store(struct mem_ctl_info *mci,
- const char *data, size_t count)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
- unsigned long value;
- int ret = 0;
-
- ret = strict_strtoul(data, 16, &value);
- if (ret != -EINVAL) {
- debugf0("received NBCFG= 0x%lx\n", value);
-
- pvt->ctl_error_info.nbcfg = (u32) value;
-
- return count;
- }
- return ret;
-}
-
-/* various show routines for the controls of a MCI */
-static ssize_t amd64_nbcfg_show(struct mem_ctl_info *mci, char *data)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
-
- return sprintf(data, "%x\n", pvt->ctl_error_info.nbcfg);
-}
-
-
-static ssize_t amd64_dhar_show(struct mem_ctl_info *mci, char *data)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
-
- return sprintf(data, "%x\n", pvt->dhar);
-}
-
-
-static ssize_t amd64_dbam_show(struct mem_ctl_info *mci, char *data)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
-
- return sprintf(data, "%x\n", pvt->dbam0);
-}
-
-
-static ssize_t amd64_topmem_show(struct mem_ctl_info *mci, char *data)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
-
- return sprintf(data, "%llx\n", pvt->top_mem);
-}
-
-
-static ssize_t amd64_topmem2_show(struct mem_ctl_info *mci, char *data)
-{
- struct amd64_pvt *pvt = mci->pvt_info;
-
- return sprintf(data, "%llx\n", pvt->top_mem2);
-}
+EDAC_DCT_ATTR_SHOW(dhar);
+EDAC_DCT_ATTR_SHOW(dbam0);
+EDAC_DCT_ATTR_SHOW(top_mem);
+EDAC_DCT_ATTR_SHOW(top_mem2);
static ssize_t amd64_hole_show(struct mem_ctl_info *mci, char *data)
{
{
.attr = {
- .name = "nbea_ctl",
- .mode = (S_IRUGO | S_IWUSR)
- },
- .show = amd64_nbea_show,
- .store = amd64_nbea_store,
- },
- {
- .attr = {
- .name = "nbsl_ctl",
- .mode = (S_IRUGO | S_IWUSR)
- },
- .show = amd64_nbsl_show,
- .store = amd64_nbsl_store,
- },
- {
- .attr = {
- .name = "nbsh_ctl",
- .mode = (S_IRUGO | S_IWUSR)
- },
- .show = amd64_nbsh_show,
- .store = amd64_nbsh_store,
- },
- {
- .attr = {
- .name = "nbcfg_ctl",
- .mode = (S_IRUGO | S_IWUSR)
- },
- .show = amd64_nbcfg_show,
- .store = amd64_nbcfg_store,
- },
- {
- .attr = {
.name = "dhar",
.mode = (S_IRUGO)
},
.name = "dbam",
.mode = (S_IRUGO)
},
- .show = amd64_dbam_show,
+ .show = amd64_dbam0_show,
.store = NULL,
},
{
.name = "topmem",
.mode = (S_IRUGO)
},
- .show = amd64_topmem_show,
+ .show = amd64_top_mem_show,
.store = NULL,
},
{
.name = "topmem2",
.mode = (S_IRUGO)
},
- .show = amd64_topmem2_show,
+ .show = amd64_top_mem2_show,
.store = NULL,
},
{
#include <linux/ctype.h>
#include <linux/module.h>
#include <linux/slab.h>
+#include <linux/edac.h>
#include "edac_core.h"
#include "edac_module.h"
debugf1("%s()\n", __func__);
/* get the /sys/devices/system/edac reference */
- edac_class = edac_get_edac_class();
+ edac_class = edac_get_sysfs_class();
if (edac_class == NULL) {
debugf1("%s() no edac_class error\n", __func__);
err = -ENODEV;
if (!try_module_get(edac_dev->owner)) {
err = -ENODEV;
- goto err_out;
+ goto err_mod_get;
}
/* register */
err_kobj_reg:
module_put(edac_dev->owner);
+err_mod_get:
+ edac_put_sysfs_class();
+
err_out:
return err;
}
* edac_device_unregister_sysfs_main_kobj:
* the '..../edac/<name>' kobject
*/
-void edac_device_unregister_sysfs_main_kobj(
- struct edac_device_ctl_info *edac_dev)
+void edac_device_unregister_sysfs_main_kobj(struct edac_device_ctl_info *dev)
{
debugf0("%s()\n", __func__);
debugf4("%s() name of kobject is: %s\n",
- __func__, kobject_name(&edac_dev->kobj));
+ __func__, kobject_name(&dev->kobj));
/*
* Unregister the edac device's kobject and
* a) module_put() this module
* b) 'kfree' the memory
*/
- kobject_put(&edac_dev->kobj);
+ kobject_put(&dev->kobj);
+ edac_put_sysfs_class();
}
/* edac_dev -> instance information */
{
int status;
+ if (mci->op_state != OP_RUNNING_POLL)
+ return;
+
status = cancel_delayed_work(&mci->work);
if (status == 0) {
debugf0("%s() not canceled, flush the queue\n",
#include <linux/ctype.h>
#include <linux/slab.h>
+#include <linux/edac.h>
#include <linux/bug.h>
#include "edac_core.h"
*/
int edac_sysfs_setup_mc_kset(void)
{
- int err = 0;
+ int err = -EINVAL;
struct sysdev_class *edac_class;
debugf1("%s()\n", __func__);
/* get the /sys/devices/system/edac class reference */
- edac_class = edac_get_edac_class();
+ edac_class = edac_get_sysfs_class();
if (edac_class == NULL) {
debugf1("%s() no edac_class error=%d\n", __func__, err);
goto fail_out;
if (!mc_kset) {
err = -ENOMEM;
debugf1("%s() Failed to register '.../edac/mc'\n", __func__);
- goto fail_out;
+ goto fail_kset;
}
debugf1("%s() Registered '.../edac/mc' kobject\n", __func__);
return 0;
+fail_kset:
+ edac_put_sysfs_class();
- /* error unwind stack */
fail_out:
return err;
}
void edac_sysfs_teardown_mc_kset(void)
{
kset_unregister(mc_kset);
+ edac_put_sysfs_class();
}
+++ /dev/null
-#include <linux/module.h>
-#include "edac_mce_amd.h"
-
-static bool report_gart_errors;
-static void (*nb_bus_decoder)(int node_id, struct err_regs *regs);
-
-void amd_report_gart_errors(bool v)
-{
- report_gart_errors = v;
-}
-EXPORT_SYMBOL_GPL(amd_report_gart_errors);
-
-void amd_register_ecc_decoder(void (*f)(int, struct err_regs *))
-{
- nb_bus_decoder = f;
-}
-EXPORT_SYMBOL_GPL(amd_register_ecc_decoder);
-
-void amd_unregister_ecc_decoder(void (*f)(int, struct err_regs *))
-{
- if (nb_bus_decoder) {
- WARN_ON(nb_bus_decoder != f);
-
- nb_bus_decoder = NULL;
- }
-}
-EXPORT_SYMBOL_GPL(amd_unregister_ecc_decoder);
-
-/*
- * string representation for the different MCA reported error types, see F3x48
- * or MSR0000_0411.
- */
-const char *tt_msgs[] = { /* transaction type */
- "instruction",
- "data",
- "generic",
- "reserved"
-};
-EXPORT_SYMBOL_GPL(tt_msgs);
-
-const char *ll_msgs[] = { /* cache level */
- "L0",
- "L1",
- "L2",
- "L3/generic"
-};
-EXPORT_SYMBOL_GPL(ll_msgs);
-
-const char *rrrr_msgs[] = {
- "generic",
- "generic read",
- "generic write",
- "data read",
- "data write",
- "inst fetch",
- "prefetch",
- "evict",
- "snoop",
- "reserved RRRR= 9",
- "reserved RRRR= 10",
- "reserved RRRR= 11",
- "reserved RRRR= 12",
- "reserved RRRR= 13",
- "reserved RRRR= 14",
- "reserved RRRR= 15"
-};
-EXPORT_SYMBOL_GPL(rrrr_msgs);
-
-const char *pp_msgs[] = { /* participating processor */
- "local node originated (SRC)",
- "local node responded to request (RES)",
- "local node observed as 3rd party (OBS)",
- "generic"
-};
-EXPORT_SYMBOL_GPL(pp_msgs);
-
-const char *to_msgs[] = {
- "no timeout",
- "timed out"
-};
-EXPORT_SYMBOL_GPL(to_msgs);
-
-const char *ii_msgs[] = { /* memory or i/o */
- "mem access",
- "reserved",
- "i/o access",
- "generic"
-};
-EXPORT_SYMBOL_GPL(ii_msgs);
-
-/*
- * Map the 4 or 5 (family-specific) bits of Extended Error code to the
- * string table.
- */
-const char *ext_msgs[] = {
- "K8 ECC error", /* 0_0000b */
- "CRC error on link", /* 0_0001b */
- "Sync error packets on link", /* 0_0010b */
- "Master Abort during link operation", /* 0_0011b */
- "Target Abort during link operation", /* 0_0100b */
- "Invalid GART PTE entry during table walk", /* 0_0101b */
- "Unsupported atomic RMW command received", /* 0_0110b */
- "WDT error: NB transaction timeout", /* 0_0111b */
- "ECC/ChipKill ECC error", /* 0_1000b */
- "SVM DEV Error", /* 0_1001b */
- "Link Data error", /* 0_1010b */
- "Link/L3/Probe Filter Protocol error", /* 0_1011b */
- "NB Internal Arrays Parity error", /* 0_1100b */
- "DRAM Address/Control Parity error", /* 0_1101b */
- "Link Transmission error", /* 0_1110b */
- "GART/DEV Table Walk Data error" /* 0_1111b */
- "Res 0x100 error", /* 1_0000b */
- "Res 0x101 error", /* 1_0001b */
- "Res 0x102 error", /* 1_0010b */
- "Res 0x103 error", /* 1_0011b */
- "Res 0x104 error", /* 1_0100b */
- "Res 0x105 error", /* 1_0101b */
- "Res 0x106 error", /* 1_0110b */
- "Res 0x107 error", /* 1_0111b */
- "Res 0x108 error", /* 1_1000b */
- "Res 0x109 error", /* 1_1001b */
- "Res 0x10A error", /* 1_1010b */
- "Res 0x10B error", /* 1_1011b */
- "ECC error in L3 Cache Data", /* 1_1100b */
- "L3 Cache Tag error", /* 1_1101b */
- "L3 Cache LRU Parity error", /* 1_1110b */
- "Probe Filter error" /* 1_1111b */
-};
-EXPORT_SYMBOL_GPL(ext_msgs);
-
-static void amd_decode_dc_mce(u64 mc0_status)
-{
- u32 ec = mc0_status & 0xffff;
- u32 xec = (mc0_status >> 16) & 0xf;
-
- pr_emerg("Data Cache Error");
-
- if (xec == 1 && TLB_ERROR(ec))
- pr_cont(": %s TLB multimatch.\n", LL_MSG(ec));
- else if (xec == 0) {
- if (mc0_status & (1ULL << 40))
- pr_cont(" during Data Scrub.\n");
- else if (TLB_ERROR(ec))
- pr_cont(": %s TLB parity error.\n", LL_MSG(ec));
- else if (MEM_ERROR(ec)) {
- u8 ll = ec & 0x3;
- u8 tt = (ec >> 2) & 0x3;
- u8 rrrr = (ec >> 4) & 0xf;
-
- /* see F10h BKDG (31116), Table 92. */
- if (ll == 0x1) {
- if (tt != 0x1)
- goto wrong_dc_mce;
-
- pr_cont(": Data/Tag %s error.\n", RRRR_MSG(ec));
-
- } else if (ll == 0x2 && rrrr == 0x3)
- pr_cont(" during L1 linefill from L2.\n");
- else
- goto wrong_dc_mce;
- } else if (BUS_ERROR(ec) && boot_cpu_data.x86 == 0xf)
- pr_cont(" during system linefill.\n");
- else
- goto wrong_dc_mce;
- } else
- goto wrong_dc_mce;
-
- return;
-
-wrong_dc_mce:
- pr_warning("Corrupted DC MCE info?\n");
-}
-
-static void amd_decode_ic_mce(u64 mc1_status)
-{
- u32 ec = mc1_status & 0xffff;
- u32 xec = (mc1_status >> 16) & 0xf;
-
- pr_emerg("Instruction Cache Error");
-
- if (xec == 1 && TLB_ERROR(ec))
- pr_cont(": %s TLB multimatch.\n", LL_MSG(ec));
- else if (xec == 0) {
- if (TLB_ERROR(ec))
- pr_cont(": %s TLB Parity error.\n", LL_MSG(ec));
- else if (BUS_ERROR(ec)) {
- if (boot_cpu_data.x86 == 0xf &&
- (mc1_status & (1ULL << 58)))
- pr_cont(" during system linefill.\n");
- else
- pr_cont(" during attempted NB data read.\n");
- } else if (MEM_ERROR(ec)) {
- u8 ll = ec & 0x3;
- u8 rrrr = (ec >> 4) & 0xf;
-
- if (ll == 0x2)
- pr_cont(" during a linefill from L2.\n");
- else if (ll == 0x1) {
-
- switch (rrrr) {
- case 0x5:
- pr_cont(": Parity error during "
- "data load.\n");
- break;
-
- case 0x7:
- pr_cont(": Copyback Parity/Victim"
- " error.\n");
- break;
-
- case 0x8:
- pr_cont(": Tag Snoop error.\n");
- break;
-
- default:
- goto wrong_ic_mce;
- break;
- }
- }
- } else
- goto wrong_ic_mce;
- } else
- goto wrong_ic_mce;
-
- return;
-
-wrong_ic_mce:
- pr_warning("Corrupted IC MCE info?\n");
-}
-
-static void amd_decode_bu_mce(u64 mc2_status)
-{
- u32 ec = mc2_status & 0xffff;
- u32 xec = (mc2_status >> 16) & 0xf;
-
- pr_emerg("Bus Unit Error");
-
- if (xec == 0x1)
- pr_cont(" in the write data buffers.\n");
- else if (xec == 0x3)
- pr_cont(" in the victim data buffers.\n");
- else if (xec == 0x2 && MEM_ERROR(ec))
- pr_cont(": %s error in the L2 cache tags.\n", RRRR_MSG(ec));
- else if (xec == 0x0) {
- if (TLB_ERROR(ec))
- pr_cont(": %s error in a Page Descriptor Cache or "
- "Guest TLB.\n", TT_MSG(ec));
- else if (BUS_ERROR(ec))
- pr_cont(": %s/ECC error in data read from NB: %s.\n",
- RRRR_MSG(ec), PP_MSG(ec));
- else if (MEM_ERROR(ec)) {
- u8 rrrr = (ec >> 4) & 0xf;
-
- if (rrrr >= 0x7)
- pr_cont(": %s error during data copyback.\n",
- RRRR_MSG(ec));
- else if (rrrr <= 0x1)
- pr_cont(": %s parity/ECC error during data "
- "access from L2.\n", RRRR_MSG(ec));
- else
- goto wrong_bu_mce;
- } else
- goto wrong_bu_mce;
- } else
- goto wrong_bu_mce;
-
- return;
-
-wrong_bu_mce:
- pr_warning("Corrupted BU MCE info?\n");
-}
-
-static void amd_decode_ls_mce(u64 mc3_status)
-{
- u32 ec = mc3_status & 0xffff;
- u32 xec = (mc3_status >> 16) & 0xf;
-
- pr_emerg("Load Store Error");
-
- if (xec == 0x0) {
- u8 rrrr = (ec >> 4) & 0xf;
-
- if (!BUS_ERROR(ec) || (rrrr != 0x3 && rrrr != 0x4))
- goto wrong_ls_mce;
-
- pr_cont(" during %s.\n", RRRR_MSG(ec));
- }
- return;
-
-wrong_ls_mce:
- pr_warning("Corrupted LS MCE info?\n");
-}
-
-void amd_decode_nb_mce(int node_id, struct err_regs *regs, int handle_errors)
-{
- u32 ec = ERROR_CODE(regs->nbsl);
-
- if (!handle_errors)
- return;
-
- /*
- * GART TLB error reporting is disabled by default. Bail out early.
- */
- if (TLB_ERROR(ec) && !report_gart_errors)
- return;
-
- pr_emerg("Northbridge Error, node %d", node_id);
-
- /*
- * F10h, revD can disable ErrCpu[3:0] so check that first and also the
- * value encoding has changed so interpret those differently
- */
- if ((boot_cpu_data.x86 == 0x10) &&
- (boot_cpu_data.x86_model > 7)) {
- if (regs->nbsh & K8_NBSH_ERR_CPU_VAL)
- pr_cont(", core: %u\n", (u8)(regs->nbsh & 0xf));
- } else {
- u8 assoc_cpus = regs->nbsh & 0xf;
-
- if (assoc_cpus > 0)
- pr_cont(", core: %d", fls(assoc_cpus) - 1);
-
- pr_cont("\n");
- }
-
- pr_emerg("%s.\n", EXT_ERR_MSG(regs->nbsl));
-
- if (BUS_ERROR(ec) && nb_bus_decoder)
- nb_bus_decoder(node_id, regs);
-}
-EXPORT_SYMBOL_GPL(amd_decode_nb_mce);
-
-static void amd_decode_fr_mce(u64 mc5_status)
-{
- /* we have only one error signature so match all fields at once. */
- if ((mc5_status & 0xffff) == 0x0f0f)
- pr_emerg(" FR Error: CPU Watchdog timer expire.\n");
- else
- pr_warning("Corrupted FR MCE info?\n");
-}
-
-static inline void amd_decode_err_code(unsigned int ec)
-{
- if (TLB_ERROR(ec)) {
- pr_emerg("Transaction: %s, Cache Level %s\n",
- TT_MSG(ec), LL_MSG(ec));
- } else if (MEM_ERROR(ec)) {
- pr_emerg("Transaction: %s, Type: %s, Cache Level: %s",
- RRRR_MSG(ec), TT_MSG(ec), LL_MSG(ec));
- } else if (BUS_ERROR(ec)) {
- pr_emerg("Transaction type: %s(%s), %s, Cache Level: %s, "
- "Participating Processor: %s\n",
- RRRR_MSG(ec), II_MSG(ec), TO_MSG(ec), LL_MSG(ec),
- PP_MSG(ec));
- } else
- pr_warning("Huh? Unknown MCE error 0x%x\n", ec);
-}
-
-static int amd_decode_mce(struct notifier_block *nb, unsigned long val,
- void *data)
-{
- struct mce *m = (struct mce *)data;
- struct err_regs regs;
- int node, ecc;
-
- pr_emerg("MC%d_STATUS: ", m->bank);
-
- pr_cont("%sorrected error, other errors lost: %s, "
- "CPU context corrupt: %s",
- ((m->status & MCI_STATUS_UC) ? "Unc" : "C"),
- ((m->status & MCI_STATUS_OVER) ? "yes" : "no"),
- ((m->status & MCI_STATUS_PCC) ? "yes" : "no"));
-
- /* do the two bits[14:13] together */
- ecc = (m->status >> 45) & 0x3;
- if (ecc)
- pr_cont(", %sECC Error", ((ecc == 2) ? "C" : "U"));
-
- pr_cont("\n");
-
- switch (m->bank) {
- case 0:
- amd_decode_dc_mce(m->status);
- break;
-
- case 1:
- amd_decode_ic_mce(m->status);
- break;
-
- case 2:
- amd_decode_bu_mce(m->status);
- break;
-
- case 3:
- amd_decode_ls_mce(m->status);
- break;
-
- case 4:
- regs.nbsl = (u32) m->status;
- regs.nbsh = (u32)(m->status >> 32);
- regs.nbeal = (u32) m->addr;
- regs.nbeah = (u32)(m->addr >> 32);
- node = amd_get_nb_id(m->extcpu);
-
- amd_decode_nb_mce(node, ®s, 1);
- break;
-
- case 5:
- amd_decode_fr_mce(m->status);
- break;
-
- default:
- break;
- }
-
- amd_decode_err_code(m->status & 0xffff);
-
- return NOTIFY_STOP;
-}
-
-static struct notifier_block amd_mce_dec_nb = {
- .notifier_call = amd_decode_mce,
-};
-
-static int __init mce_amd_init(void)
-{
- /*
- * We can decode MCEs for K8, F10h and F11h CPUs:
- */
- if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
- return 0;
-
- if (boot_cpu_data.x86 < 0xf || boot_cpu_data.x86 > 0x11)
- return 0;
-
- atomic_notifier_chain_register(&x86_mce_decoder_chain, &amd_mce_dec_nb);
-
- return 0;
-}
-early_initcall(mce_amd_init);
-
-#ifdef MODULE
-static void __exit mce_amd_exit(void)
-{
- atomic_notifier_chain_unregister(&x86_mce_decoder_chain, &amd_mce_dec_nb);
-}
-
-MODULE_DESCRIPTION("AMD MCE decoder");
-MODULE_ALIAS("edac-mce-amd");
-MODULE_LICENSE("GPL");
-module_exit(mce_amd_exit);
-#endif
struct workqueue_struct *edac_workqueue;
/*
- * sysfs object: /sys/devices/system/edac
- * need to export to other files in this modules
- */
-static struct sysdev_class edac_class = {
- .name = "edac",
-};
-static int edac_class_valid;
-
-/*
* edac_op_state_to_string()
*/
char *edac_op_state_to_string(int opstate)
}
/*
- * edac_get_edac_class()
- *
- * return pointer to the edac class of 'edac'
- */
-struct sysdev_class *edac_get_edac_class(void)
-{
- struct sysdev_class *classptr = NULL;
-
- if (edac_class_valid)
- classptr = &edac_class;
-
- return classptr;
-}
-
-/*
- * edac_register_sysfs_edac_name()
- *
- * register the 'edac' into /sys/devices/system
- *
- * return:
- * 0 success
- * !0 error
- */
-static int edac_register_sysfs_edac_name(void)
-{
- int err;
-
- /* create the /sys/devices/system/edac directory */
- err = sysdev_class_register(&edac_class);
-
- if (err) {
- debugf1("%s() error=%d\n", __func__, err);
- return err;
- }
-
- edac_class_valid = 1;
- return 0;
-}
-
-/*
- * sysdev_class_unregister()
- *
- * unregister the 'edac' from /sys/devices/system
- */
-static void edac_unregister_sysfs_edac_name(void)
-{
- /* only if currently registered, then unregister it */
- if (edac_class_valid)
- sysdev_class_unregister(&edac_class);
-
- edac_class_valid = 0;
-}
-
-/*
* edac_workqueue_setup
* initialize the edac work queue for polling operations
*/
edac_pci_clear_parity_errors();
/*
- * perform the registration of the /sys/devices/system/edac class object
- */
- if (edac_register_sysfs_edac_name()) {
- edac_printk(KERN_ERR, EDAC_MC,
- "Error initializing 'edac' kobject\n");
- err = -ENODEV;
- goto error;
- }
-
- /*
* now set up the mc_kset under the edac class object
*/
err = edac_sysfs_setup_mc_kset();
if (err)
- goto sysfs_setup_fail;
+ goto error;
/* Setup/Initialize the workq for this core */
err = edac_workqueue_setup();
workq_fail:
edac_sysfs_teardown_mc_kset();
-sysfs_setup_fail:
- edac_unregister_sysfs_edac_name();
-
error:
return err;
}
/* tear down the various subsystems */
edac_workqueue_teardown();
edac_sysfs_teardown_mc_kset();
- edac_unregister_sysfs_edac_name();
}
/*
struct edac_device_ctl_info *edac_dev);
extern int edac_device_create_sysfs(struct edac_device_ctl_info *edac_dev);
extern void edac_device_remove_sysfs(struct edac_device_ctl_info *edac_dev);
-extern struct sysdev_class *edac_get_edac_class(void);
/* edac core workqueue: single CPU mode */
extern struct workqueue_struct *edac_workqueue;
*
*/
#include <linux/module.h>
-#include <linux/sysdev.h>
+#include <linux/edac.h>
#include <linux/slab.h>
#include <linux/ctype.h>
/* First time, so create the main kobject and its
* controls and atributes
*/
- edac_class = edac_get_edac_class();
+ edac_class = edac_get_sysfs_class();
if (edac_class == NULL) {
debugf1("%s() no edac_class\n", __func__);
err = -ENODEV;
if (!try_module_get(THIS_MODULE)) {
debugf1("%s() try_module_get() failed\n", __func__);
err = -ENODEV;
- goto decrement_count_fail;
+ goto mod_get_fail;
}
edac_pci_top_main_kobj = kzalloc(sizeof(struct kobject), GFP_KERNEL);
kzalloc_fail:
module_put(THIS_MODULE);
+mod_get_fail:
+ edac_put_sysfs_class();
+
decrement_count_fail:
/* if are on this error exit, nothing to tear down */
atomic_dec(&edac_pci_sysfs_refcount);
__func__);
kobject_put(edac_pci_top_main_kobj);
}
+ edac_put_sysfs_class();
}
/*
*
* Author: Dave Jiang <djiang@mvista.com>
*
- * 2007 (c) MontaVista Software, Inc. This file is licensed under
- * the terms of the GNU General Public License version 2. This program
- * is licensed "as is" without any warranty of any kind, whether express
- * or implied.
+ * 2007 (c) MontaVista Software, Inc.
+ * 2010 (c) Advanced Micro Devices Inc.
+ * Borislav Petkov <borislav.petkov@amd.com>
+ *
+ * This file is licensed under the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
*
*/
#include <linux/module.h>
int edac_err_assert = 0;
EXPORT_SYMBOL_GPL(edac_err_assert);
+static atomic_t edac_class_valid = ATOMIC_INIT(0);
+
/*
* called to determine if there is an EDAC driver interested in
* knowing an event (such as NMI) occurred
edac_err_assert++;
}
EXPORT_SYMBOL_GPL(edac_atomic_assert_error);
+
+/*
+ * sysfs object: /sys/devices/system/edac
+ * need to export to other files
+ */
+struct sysdev_class edac_class = {
+ .name = "edac",
+};
+EXPORT_SYMBOL_GPL(edac_class);
+
+/* return pointer to the 'edac' node in sysfs */
+struct sysdev_class *edac_get_sysfs_class(void)
+{
+ int err = 0;
+
+ if (atomic_read(&edac_class_valid))
+ goto out;
+
+ /* create the /sys/devices/system/edac directory */
+ err = sysdev_class_register(&edac_class);
+ if (err) {
+ printk(KERN_ERR "Error registering toplevel EDAC sysfs dir\n");
+ return NULL;
+ }
+
+out:
+ atomic_inc(&edac_class_valid);
+ return &edac_class;
+}
+EXPORT_SYMBOL_GPL(edac_get_sysfs_class);
+
+void edac_put_sysfs_class(void)
+{
+ /* last user unregisters it */
+ if (atomic_dec_and_test(&edac_class_valid))
+ sysdev_class_unregister(&edac_class);
+}
+EXPORT_SYMBOL_GPL(edac_put_sysfs_class);
ATTR_COUNTER(0),
ATTR_COUNTER(1),
ATTR_COUNTER(2),
+ { .attr = { .name = NULL } }
};
static struct mcidev_sysfs_group i7core_udimm_counters = {
--- /dev/null
+#include <linux/module.h>
+#include <linux/slab.h>
+
+#include "mce_amd.h"
+
+static struct amd_decoder_ops *fam_ops;
+
+static u8 nb_err_cpumask = 0xf;
+
+static bool report_gart_errors;
+static void (*nb_bus_decoder)(int node_id, struct mce *m, u32 nbcfg);
+
+void amd_report_gart_errors(bool v)
+{
+ report_gart_errors = v;
+}
+EXPORT_SYMBOL_GPL(amd_report_gart_errors);
+
+void amd_register_ecc_decoder(void (*f)(int, struct mce *, u32))
+{
+ nb_bus_decoder = f;
+}
+EXPORT_SYMBOL_GPL(amd_register_ecc_decoder);
+
+void amd_unregister_ecc_decoder(void (*f)(int, struct mce *, u32))
+{
+ if (nb_bus_decoder) {
+ WARN_ON(nb_bus_decoder != f);
+
+ nb_bus_decoder = NULL;
+ }
+}
+EXPORT_SYMBOL_GPL(amd_unregister_ecc_decoder);
+
+/*
+ * string representation for the different MCA reported error types, see F3x48
+ * or MSR0000_0411.
+ */
+
+/* transaction type */
+const char *tt_msgs[] = { "INSN", "DATA", "GEN", "RESV" };
+EXPORT_SYMBOL_GPL(tt_msgs);
+
+/* cache level */
+const char *ll_msgs[] = { "RESV", "L1", "L2", "L3/GEN" };
+EXPORT_SYMBOL_GPL(ll_msgs);
+
+/* memory transaction type */
+const char *rrrr_msgs[] = {
+ "GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP"
+};
+EXPORT_SYMBOL_GPL(rrrr_msgs);
+
+/* participating processor */
+const char *pp_msgs[] = { "SRC", "RES", "OBS", "GEN" };
+EXPORT_SYMBOL_GPL(pp_msgs);
+
+/* request timeout */
+const char *to_msgs[] = { "no timeout", "timed out" };
+EXPORT_SYMBOL_GPL(to_msgs);
+
+/* memory or i/o */
+const char *ii_msgs[] = { "MEM", "RESV", "IO", "GEN" };
+EXPORT_SYMBOL_GPL(ii_msgs);
+
+static const char *f10h_nb_mce_desc[] = {
+ "HT link data error",
+ "Protocol error (link, L3, probe filter, etc.)",
+ "Parity error in NB-internal arrays",
+ "Link Retry due to IO link transmission error",
+ "L3 ECC data cache error",
+ "ECC error in L3 cache tag",
+ "L3 LRU parity bits error",
+ "ECC Error in the Probe Filter directory"
+};
+
+static bool f12h_dc_mce(u16 ec)
+{
+ bool ret = false;
+
+ if (MEM_ERROR(ec)) {
+ u8 ll = ec & 0x3;
+ ret = true;
+
+ if (ll == LL_L2)
+ pr_cont("during L1 linefill from L2.\n");
+ else if (ll == LL_L1)
+ pr_cont("Data/Tag %s error.\n", RRRR_MSG(ec));
+ else
+ ret = false;
+ }
+ return ret;
+}
+
+static bool f10h_dc_mce(u16 ec)
+{
+ u8 r4 = (ec >> 4) & 0xf;
+ u8 ll = ec & 0x3;
+
+ if (r4 == R4_GEN && ll == LL_L1) {
+ pr_cont("during data scrub.\n");
+ return true;
+ }
+ return f12h_dc_mce(ec);
+}
+
+static bool k8_dc_mce(u16 ec)
+{
+ if (BUS_ERROR(ec)) {
+ pr_cont("during system linefill.\n");
+ return true;
+ }
+
+ return f10h_dc_mce(ec);
+}
+
+static bool f14h_dc_mce(u16 ec)
+{
+ u8 r4 = (ec >> 4) & 0xf;
+ u8 ll = ec & 0x3;
+ u8 tt = (ec >> 2) & 0x3;
+ u8 ii = tt;
+ bool ret = true;
+
+ if (MEM_ERROR(ec)) {
+
+ if (tt != TT_DATA || ll != LL_L1)
+ return false;
+
+ switch (r4) {
+ case R4_DRD:
+ case R4_DWR:
+ pr_cont("Data/Tag parity error due to %s.\n",
+ (r4 == R4_DRD ? "load/hw prf" : "store"));
+ break;
+ case R4_EVICT:
+ pr_cont("Copyback parity error on a tag miss.\n");
+ break;
+ case R4_SNOOP:
+ pr_cont("Tag parity error during snoop.\n");
+ break;
+ default:
+ ret = false;
+ }
+ } else if (BUS_ERROR(ec)) {
+
+ if ((ii != II_MEM && ii != II_IO) || ll != LL_LG)
+ return false;
+
+ pr_cont("System read data error on a ");
+
+ switch (r4) {
+ case R4_RD:
+ pr_cont("TLB reload.\n");
+ break;
+ case R4_DWR:
+ pr_cont("store.\n");
+ break;
+ case R4_DRD:
+ pr_cont("load.\n");
+ break;
+ default:
+ ret = false;
+ }
+ } else {
+ ret = false;
+ }
+
+ return ret;
+}
+
+static void amd_decode_dc_mce(struct mce *m)
+{
+ u16 ec = m->status & 0xffff;
+ u8 xec = (m->status >> 16) & 0xf;
+
+ pr_emerg(HW_ERR "Data Cache Error: ");
+
+ /* TLB error signatures are the same across families */
+ if (TLB_ERROR(ec)) {
+ u8 tt = (ec >> 2) & 0x3;
+
+ if (tt == TT_DATA) {
+ pr_cont("%s TLB %s.\n", LL_MSG(ec),
+ (xec ? "multimatch" : "parity error"));
+ return;
+ }
+ else
+ goto wrong_dc_mce;
+ }
+
+ if (!fam_ops->dc_mce(ec))
+ goto wrong_dc_mce;
+
+ return;
+
+wrong_dc_mce:
+ pr_emerg(HW_ERR "Corrupted DC MCE info?\n");
+}
+
+static bool k8_ic_mce(u16 ec)
+{
+ u8 ll = ec & 0x3;
+ u8 r4 = (ec >> 4) & 0xf;
+ bool ret = true;
+
+ if (!MEM_ERROR(ec))
+ return false;
+
+ if (ll == 0x2)
+ pr_cont("during a linefill from L2.\n");
+ else if (ll == 0x1) {
+ switch (r4) {
+ case R4_IRD:
+ pr_cont("Parity error during data load.\n");
+ break;
+
+ case R4_EVICT:
+ pr_cont("Copyback Parity/Victim error.\n");
+ break;
+
+ case R4_SNOOP:
+ pr_cont("Tag Snoop error.\n");
+ break;
+
+ default:
+ ret = false;
+ break;
+ }
+ } else
+ ret = false;
+
+ return ret;
+}
+
+static bool f14h_ic_mce(u16 ec)
+{
+ u8 ll = ec & 0x3;
+ u8 tt = (ec >> 2) & 0x3;
+ u8 r4 = (ec >> 4) & 0xf;
+ bool ret = true;
+
+ if (MEM_ERROR(ec)) {
+ if (tt != 0 || ll != 1)
+ ret = false;
+
+ if (r4 == R4_IRD)
+ pr_cont("Data/tag array parity error for a tag hit.\n");
+ else if (r4 == R4_SNOOP)
+ pr_cont("Tag error during snoop/victimization.\n");
+ else
+ ret = false;
+ }
+ return ret;
+}
+
+static void amd_decode_ic_mce(struct mce *m)
+{
+ u16 ec = m->status & 0xffff;
+ u8 xec = (m->status >> 16) & 0xf;
+
+ pr_emerg(HW_ERR "Instruction Cache Error: ");
+
+ if (TLB_ERROR(ec))
+ pr_cont("%s TLB %s.\n", LL_MSG(ec),
+ (xec ? "multimatch" : "parity error"));
+ else if (BUS_ERROR(ec)) {
+ bool k8 = (boot_cpu_data.x86 == 0xf && (m->status & BIT_64(58)));
+
+ pr_cont("during %s.\n", (k8 ? "system linefill" : "NB data read"));
+ } else if (fam_ops->ic_mce(ec))
+ ;
+ else
+ pr_emerg(HW_ERR "Corrupted IC MCE info?\n");
+}
+
+static void amd_decode_bu_mce(struct mce *m)
+{
+ u32 ec = m->status & 0xffff;
+ u32 xec = (m->status >> 16) & 0xf;
+
+ pr_emerg(HW_ERR "Bus Unit Error");
+
+ if (xec == 0x1)
+ pr_cont(" in the write data buffers.\n");
+ else if (xec == 0x3)
+ pr_cont(" in the victim data buffers.\n");
+ else if (xec == 0x2 && MEM_ERROR(ec))
+ pr_cont(": %s error in the L2 cache tags.\n", RRRR_MSG(ec));
+ else if (xec == 0x0) {
+ if (TLB_ERROR(ec))
+ pr_cont(": %s error in a Page Descriptor Cache or "
+ "Guest TLB.\n", TT_MSG(ec));
+ else if (BUS_ERROR(ec))
+ pr_cont(": %s/ECC error in data read from NB: %s.\n",
+ RRRR_MSG(ec), PP_MSG(ec));
+ else if (MEM_ERROR(ec)) {
+ u8 rrrr = (ec >> 4) & 0xf;
+
+ if (rrrr >= 0x7)
+ pr_cont(": %s error during data copyback.\n",
+ RRRR_MSG(ec));
+ else if (rrrr <= 0x1)
+ pr_cont(": %s parity/ECC error during data "
+ "access from L2.\n", RRRR_MSG(ec));
+ else
+ goto wrong_bu_mce;
+ } else
+ goto wrong_bu_mce;
+ } else
+ goto wrong_bu_mce;
+
+ return;
+
+wrong_bu_mce:
+ pr_emerg(HW_ERR "Corrupted BU MCE info?\n");
+}
+
+static void amd_decode_ls_mce(struct mce *m)
+{
+ u16 ec = m->status & 0xffff;
+ u8 xec = (m->status >> 16) & 0xf;
+
+ if (boot_cpu_data.x86 == 0x14) {
+ pr_emerg("You shouldn't be seeing an LS MCE on this cpu family,"
+ " please report on LKML.\n");
+ return;
+ }
+
+ pr_emerg(HW_ERR "Load Store Error");
+
+ if (xec == 0x0) {
+ u8 r4 = (ec >> 4) & 0xf;
+
+ if (!BUS_ERROR(ec) || (r4 != R4_DRD && r4 != R4_DWR))
+ goto wrong_ls_mce;
+
+ pr_cont(" during %s.\n", RRRR_MSG(ec));
+ } else
+ goto wrong_ls_mce;
+
+ return;
+
+wrong_ls_mce:
+ pr_emerg(HW_ERR "Corrupted LS MCE info?\n");
+}
+
+static bool k8_nb_mce(u16 ec, u8 xec)
+{
+ bool ret = true;
+
+ switch (xec) {
+ case 0x1:
+ pr_cont("CRC error detected on HT link.\n");
+ break;
+
+ case 0x5:
+ pr_cont("Invalid GART PTE entry during GART table walk.\n");
+ break;
+
+ case 0x6:
+ pr_cont("Unsupported atomic RMW received from an IO link.\n");
+ break;
+
+ case 0x0:
+ case 0x8:
+ if (boot_cpu_data.x86 == 0x11)
+ return false;
+
+ pr_cont("DRAM ECC error detected on the NB.\n");
+ break;
+
+ case 0xd:
+ pr_cont("Parity error on the DRAM addr/ctl signals.\n");
+ break;
+
+ default:
+ ret = false;
+ break;
+ }
+
+ return ret;
+}
+
+static bool f10h_nb_mce(u16 ec, u8 xec)
+{
+ bool ret = true;
+ u8 offset = 0;
+
+ if (k8_nb_mce(ec, xec))
+ return true;
+
+ switch(xec) {
+ case 0xa ... 0xc:
+ offset = 10;
+ break;
+
+ case 0xe:
+ offset = 11;
+ break;
+
+ case 0xf:
+ if (TLB_ERROR(ec))
+ pr_cont("GART Table Walk data error.\n");
+ else if (BUS_ERROR(ec))
+ pr_cont("DMA Exclusion Vector Table Walk error.\n");
+ else
+ ret = false;
+
+ goto out;
+ break;
+
+ case 0x1c ... 0x1f:
+ offset = 24;
+ break;
+
+ default:
+ ret = false;
+
+ goto out;
+ break;
+ }
+
+ pr_cont("%s.\n", f10h_nb_mce_desc[xec - offset]);
+
+out:
+ return ret;
+}
+
+static bool nb_noop_mce(u16 ec, u8 xec)
+{
+ return false;
+}
+
+void amd_decode_nb_mce(int node_id, struct mce *m, u32 nbcfg)
+{
+ u8 xec = (m->status >> 16) & 0x1f;
+ u16 ec = m->status & 0xffff;
+ u32 nbsh = (u32)(m->status >> 32);
+
+ pr_emerg(HW_ERR "Northbridge Error, node %d: ", node_id);
+
+ /*
+ * F10h, revD can disable ErrCpu[3:0] so check that first and also the
+ * value encoding has changed so interpret those differently
+ */
+ if ((boot_cpu_data.x86 == 0x10) &&
+ (boot_cpu_data.x86_model > 7)) {
+ if (nbsh & K8_NBSH_ERR_CPU_VAL)
+ pr_cont(", core: %u", (u8)(nbsh & nb_err_cpumask));
+ } else {
+ u8 assoc_cpus = nbsh & nb_err_cpumask;
+
+ if (assoc_cpus > 0)
+ pr_cont(", core: %d", fls(assoc_cpus) - 1);
+ }
+
+ switch (xec) {
+ case 0x2:
+ pr_cont("Sync error (sync packets on HT link detected).\n");
+ return;
+
+ case 0x3:
+ pr_cont("HT Master abort.\n");
+ return;
+
+ case 0x4:
+ pr_cont("HT Target abort.\n");
+ return;
+
+ case 0x7:
+ pr_cont("NB Watchdog timeout.\n");
+ return;
+
+ case 0x9:
+ pr_cont("SVM DMA Exclusion Vector error.\n");
+ return;
+
+ default:
+ break;
+ }
+
+ if (!fam_ops->nb_mce(ec, xec))
+ goto wrong_nb_mce;
+
+ if (boot_cpu_data.x86 == 0xf || boot_cpu_data.x86 == 0x10)
+ if ((xec == 0x8 || xec == 0x0) && nb_bus_decoder)
+ nb_bus_decoder(node_id, m, nbcfg);
+
+ return;
+
+wrong_nb_mce:
+ pr_emerg(HW_ERR "Corrupted NB MCE info?\n");
+}
+EXPORT_SYMBOL_GPL(amd_decode_nb_mce);
+
+static void amd_decode_fr_mce(struct mce *m)
+{
+ if (boot_cpu_data.x86 == 0xf ||
+ boot_cpu_data.x86 == 0x11)
+ goto wrong_fr_mce;
+
+ /* we have only one error signature so match all fields at once. */
+ if ((m->status & 0xffff) == 0x0f0f) {
+ pr_emerg(HW_ERR "FR Error: CPU Watchdog timer expire.\n");
+ return;
+ }
+
+wrong_fr_mce:
+ pr_emerg(HW_ERR "Corrupted FR MCE info?\n");
+}
+
+static inline void amd_decode_err_code(u16 ec)
+{
+ if (TLB_ERROR(ec)) {
+ pr_emerg(HW_ERR "Transaction: %s, Cache Level: %s\n",
+ TT_MSG(ec), LL_MSG(ec));
+ } else if (MEM_ERROR(ec)) {
+ pr_emerg(HW_ERR "Transaction: %s, Type: %s, Cache Level: %s\n",
+ RRRR_MSG(ec), TT_MSG(ec), LL_MSG(ec));
+ } else if (BUS_ERROR(ec)) {
+ pr_emerg(HW_ERR "Transaction: %s (%s), %s, Cache Level: %s, "
+ "Participating Processor: %s\n",
+ RRRR_MSG(ec), II_MSG(ec), TO_MSG(ec), LL_MSG(ec),
+ PP_MSG(ec));
+ } else
+ pr_emerg(HW_ERR "Huh? Unknown MCE error 0x%x\n", ec);
+}
+
+/*
+ * Filter out unwanted MCE signatures here.
+ */
+static bool amd_filter_mce(struct mce *m)
+{
+ u8 xec = (m->status >> 16) & 0x1f;
+
+ /*
+ * NB GART TLB error reporting is disabled by default.
+ */
+ if (m->bank == 4 && xec == 0x5 && !report_gart_errors)
+ return true;
+
+ return false;
+}
+
+int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
+{
+ struct mce *m = (struct mce *)data;
+ int node, ecc;
+
+ if (amd_filter_mce(m))
+ return NOTIFY_STOP;
+
+ pr_emerg(HW_ERR "MC%d_STATUS: ", m->bank);
+
+ pr_cont("%sorrected error, other errors lost: %s, "
+ "CPU context corrupt: %s",
+ ((m->status & MCI_STATUS_UC) ? "Unc" : "C"),
+ ((m->status & MCI_STATUS_OVER) ? "yes" : "no"),
+ ((m->status & MCI_STATUS_PCC) ? "yes" : "no"));
+
+ /* do the two bits[14:13] together */
+ ecc = (m->status >> 45) & 0x3;
+ if (ecc)
+ pr_cont(", %sECC Error", ((ecc == 2) ? "C" : "U"));
+
+ pr_cont("\n");
+
+ switch (m->bank) {
+ case 0:
+ amd_decode_dc_mce(m);
+ break;
+
+ case 1:
+ amd_decode_ic_mce(m);
+ break;
+
+ case 2:
+ amd_decode_bu_mce(m);
+ break;
+
+ case 3:
+ amd_decode_ls_mce(m);
+ break;
+
+ case 4:
+ node = amd_get_nb_id(m->extcpu);
+ amd_decode_nb_mce(node, m, 0);
+ break;
+
+ case 5:
+ amd_decode_fr_mce(m);
+ break;
+
+ default:
+ break;
+ }
+
+ amd_decode_err_code(m->status & 0xffff);
+
+ return NOTIFY_STOP;
+}
+EXPORT_SYMBOL_GPL(amd_decode_mce);
+
+static struct notifier_block amd_mce_dec_nb = {
+ .notifier_call = amd_decode_mce,
+};
+
+static int __init mce_amd_init(void)
+{
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+ return 0;
+
+ if ((boot_cpu_data.x86 < 0xf || boot_cpu_data.x86 > 0x12) &&
+ (boot_cpu_data.x86 != 0x14 || boot_cpu_data.x86_model > 0xf))
+ return 0;
+
+ fam_ops = kzalloc(sizeof(struct amd_decoder_ops), GFP_KERNEL);
+ if (!fam_ops)
+ return -ENOMEM;
+
+ switch (boot_cpu_data.x86) {
+ case 0xf:
+ fam_ops->dc_mce = k8_dc_mce;
+ fam_ops->ic_mce = k8_ic_mce;
+ fam_ops->nb_mce = k8_nb_mce;
+ break;
+
+ case 0x10:
+ fam_ops->dc_mce = f10h_dc_mce;
+ fam_ops->ic_mce = k8_ic_mce;
+ fam_ops->nb_mce = f10h_nb_mce;
+ break;
+
+ case 0x11:
+ fam_ops->dc_mce = k8_dc_mce;
+ fam_ops->ic_mce = k8_ic_mce;
+ fam_ops->nb_mce = f10h_nb_mce;
+ break;
+
+ case 0x12:
+ fam_ops->dc_mce = f12h_dc_mce;
+ fam_ops->ic_mce = k8_ic_mce;
+ fam_ops->nb_mce = nb_noop_mce;
+ break;
+
+ case 0x14:
+ nb_err_cpumask = 0x3;
+ fam_ops->dc_mce = f14h_dc_mce;
+ fam_ops->ic_mce = f14h_ic_mce;
+ fam_ops->nb_mce = nb_noop_mce;
+ break;
+
+ default:
+ printk(KERN_WARNING "Huh? What family is that: %d?!\n",
+ boot_cpu_data.x86);
+ kfree(fam_ops);
+ return -EINVAL;
+ }
+
+ pr_info("MCE: In-kernel MCE decoding enabled.\n");
+
+ atomic_notifier_chain_register(&x86_mce_decoder_chain, &amd_mce_dec_nb);
+
+ return 0;
+}
+early_initcall(mce_amd_init);
+
+#ifdef MODULE
+static void __exit mce_amd_exit(void)
+{
+ atomic_notifier_chain_unregister(&x86_mce_decoder_chain, &amd_mce_dec_nb);
+ kfree(fam_ops);
+}
+
+MODULE_DESCRIPTION("AMD MCE decoder");
+MODULE_ALIAS("edac-mce-amd");
+MODULE_LICENSE("GPL");
+module_exit(mce_amd_exit);
+#endif
#ifndef _EDAC_MCE_AMD_H
#define _EDAC_MCE_AMD_H
+#include <linux/notifier.h>
+
#include <asm/mce.h>
+#define BIT_64(n) (U64_C(1) << (n))
+
#define ERROR_CODE(x) ((x) & 0xffff)
#define EXT_ERROR_CODE(x) (((x) >> 16) & 0x1f)
-#define EXT_ERR_MSG(x) ext_msgs[EXT_ERROR_CODE(x)]
#define LOW_SYNDROME(x) (((x) >> 15) & 0xff)
#define HIGH_SYNDROME(x) (((x) >> 24) & 0xff)
#define II_MSG(x) ii_msgs[II(x)]
#define LL(x) (((x) >> 0) & 0x3)
#define LL_MSG(x) ll_msgs[LL(x)]
-#define RRRR(x) (((x) >> 4) & 0xf)
-#define RRRR_MSG(x) rrrr_msgs[RRRR(x)]
#define TO(x) (((x) >> 8) & 0x1)
#define TO_MSG(x) to_msgs[TO(x)]
#define PP(x) (((x) >> 9) & 0x3)
#define PP_MSG(x) pp_msgs[PP(x)]
+#define RRRR(x) (((x) >> 4) & 0xf)
+#define RRRR_MSG(x) ((RRRR(x) < 9) ? rrrr_msgs[RRRR(x)] : "Wrong R4!")
+
#define K8_NBSH 0x4C
#define K8_NBSH_VALID_BIT BIT(31)
#define K8_NBSH_UECC BIT(13)
#define K8_NBSH_ERR_SCRUBER BIT(8)
+enum tt_ids {
+ TT_INSTR = 0,
+ TT_DATA,
+ TT_GEN,
+ TT_RESV,
+};
+
+enum ll_ids {
+ LL_RESV = 0,
+ LL_L1,
+ LL_L2,
+ LL_LG,
+};
+
+enum ii_ids {
+ II_MEM = 0,
+ II_RESV,
+ II_IO,
+ II_GEN,
+};
+
+enum rrrr_ids {
+ R4_GEN = 0,
+ R4_RD,
+ R4_WR,
+ R4_DRD,
+ R4_DWR,
+ R4_IRD,
+ R4_PREF,
+ R4_EVICT,
+ R4_SNOOP,
+};
+
extern const char *tt_msgs[];
extern const char *ll_msgs[];
extern const char *rrrr_msgs[];
extern const char *pp_msgs[];
extern const char *to_msgs[];
extern const char *ii_msgs[];
-extern const char *ext_msgs[];
/*
* relevant NB regs
u32 nbeal;
};
+/*
+ * per-family decoder ops
+ */
+struct amd_decoder_ops {
+ bool (*dc_mce)(u16);
+ bool (*ic_mce)(u16);
+ bool (*nb_mce)(u16, u8);
+};
void amd_report_gart_errors(bool);
-void amd_register_ecc_decoder(void (*f)(int, struct err_regs *));
-void amd_unregister_ecc_decoder(void (*f)(int, struct err_regs *));
-void amd_decode_nb_mce(int, struct err_regs *, int);
+void amd_register_ecc_decoder(void (*f)(int, struct mce *, u32));
+void amd_unregister_ecc_decoder(void (*f)(int, struct mce *, u32));
+void amd_decode_nb_mce(int, struct mce *, u32);
+int amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data);
#endif /* _EDAC_MCE_AMD_H */
--- /dev/null
+/*
+ * A simple MCE injection facility for testing the MCE decoding code. This
+ * driver should be built as module so that it can be loaded on production
+ * kernels for testing purposes.
+ *
+ * This file may be distributed under the terms of the GNU General Public
+ * License version 2.
+ *
+ * Copyright (c) 2010: Borislav Petkov <borislav.petkov@amd.com>
+ * Advanced Micro Devices Inc.
+ */
+
+#include <linux/kobject.h>
+#include <linux/sysdev.h>
+#include <linux/edac.h>
+#include <asm/mce.h>
+
+#include "mce_amd.h"
+
+struct edac_mce_attr {
+ struct attribute attr;
+ ssize_t (*show) (struct kobject *kobj, struct edac_mce_attr *attr, char *buf);
+ ssize_t (*store)(struct kobject *kobj, struct edac_mce_attr *attr,
+ const char *buf, size_t count);
+};
+
+#define EDAC_MCE_ATTR(_name, _mode, _show, _store) \
+static struct edac_mce_attr mce_attr_##_name = __ATTR(_name, _mode, _show, _store)
+
+static struct kobject *mce_kobj;
+
+/*
+ * Collect all the MCi_XXX settings
+ */
+static struct mce i_mce;
+
+#define MCE_INJECT_STORE(reg) \
+static ssize_t edac_inject_##reg##_store(struct kobject *kobj, \
+ struct edac_mce_attr *attr, \
+ const char *data, size_t count)\
+{ \
+ int ret = 0; \
+ unsigned long value; \
+ \
+ ret = strict_strtoul(data, 16, &value); \
+ if (ret < 0) \
+ printk(KERN_ERR "Error writing MCE " #reg " field.\n"); \
+ \
+ i_mce.reg = value; \
+ \
+ return count; \
+}
+
+MCE_INJECT_STORE(status);
+MCE_INJECT_STORE(misc);
+MCE_INJECT_STORE(addr);
+
+#define MCE_INJECT_SHOW(reg) \
+static ssize_t edac_inject_##reg##_show(struct kobject *kobj, \
+ struct edac_mce_attr *attr, \
+ char *buf) \
+{ \
+ return sprintf(buf, "0x%016llx\n", i_mce.reg); \
+}
+
+MCE_INJECT_SHOW(status);
+MCE_INJECT_SHOW(misc);
+MCE_INJECT_SHOW(addr);
+
+EDAC_MCE_ATTR(status, 0644, edac_inject_status_show, edac_inject_status_store);
+EDAC_MCE_ATTR(misc, 0644, edac_inject_misc_show, edac_inject_misc_store);
+EDAC_MCE_ATTR(addr, 0644, edac_inject_addr_show, edac_inject_addr_store);
+
+/*
+ * This denotes into which bank we're injecting and triggers
+ * the injection, at the same time.
+ */
+static ssize_t edac_inject_bank_store(struct kobject *kobj,
+ struct edac_mce_attr *attr,
+ const char *data, size_t count)
+{
+ int ret = 0;
+ unsigned long value;
+
+ ret = strict_strtoul(data, 10, &value);
+ if (ret < 0) {
+ printk(KERN_ERR "Invalid bank value!\n");
+ return -EINVAL;
+ }
+
+ if (value > 5) {
+ printk(KERN_ERR "Non-existant MCE bank: %lu\n", value);
+ return -EINVAL;
+ }
+
+ i_mce.bank = value;
+
+ amd_decode_mce(NULL, 0, &i_mce);
+
+ return count;
+}
+
+static ssize_t edac_inject_bank_show(struct kobject *kobj,
+ struct edac_mce_attr *attr, char *buf)
+{
+ return sprintf(buf, "%d\n", i_mce.bank);
+}
+
+EDAC_MCE_ATTR(bank, 0644, edac_inject_bank_show, edac_inject_bank_store);
+
+static struct edac_mce_attr *sysfs_attrs[] = { &mce_attr_status, &mce_attr_misc,
+ &mce_attr_addr, &mce_attr_bank
+};
+
+static int __init edac_init_mce_inject(void)
+{
+ struct sysdev_class *edac_class = NULL;
+ int i, err = 0;
+
+ edac_class = edac_get_sysfs_class();
+ if (!edac_class)
+ return -EINVAL;
+
+ mce_kobj = kobject_create_and_add("mce", &edac_class->kset.kobj);
+ if (!mce_kobj) {
+ printk(KERN_ERR "Error creating a mce kset.\n");
+ err = -ENOMEM;
+ goto err_mce_kobj;
+ }
+
+ for (i = 0; i < ARRAY_SIZE(sysfs_attrs); i++) {
+ err = sysfs_create_file(mce_kobj, &sysfs_attrs[i]->attr);
+ if (err) {
+ printk(KERN_ERR "Error creating %s in sysfs.\n",
+ sysfs_attrs[i]->attr.name);
+ goto err_sysfs_create;
+ }
+ }
+ return 0;
+
+err_sysfs_create:
+ while (i-- >= 0)
+ sysfs_remove_file(mce_kobj, &sysfs_attrs[i]->attr);
+
+ kobject_del(mce_kobj);
+
+err_mce_kobj:
+ edac_put_sysfs_class();
+
+ return err;
+}
+
+static void __exit edac_exit_mce_inject(void)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(sysfs_attrs); i++)
+ sysfs_remove_file(mce_kobj, &sysfs_attrs[i]->attr);
+
+ kobject_del(mce_kobj);
+
+ edac_put_sysfs_class();
+}
+
+module_init(edac_init_mce_inject);
+module_exit(edac_exit_mce_inject);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Borislav Petkov <borislav.petkov@amd.com>");
+MODULE_AUTHOR("AMD Inc.");
+MODULE_DESCRIPTION("MCE injection facility for testing MCE decoding");
{PCI_VENDOR_ID_JMICRON, PCI_DEVICE_ID_JMICRON_JMB38X_FW, QUIRK_NO_MSI},
{PCI_VENDOR_ID_NEC, PCI_ANY_ID, QUIRK_CYCLE_TIMER},
{PCI_VENDOR_ID_VIA, PCI_ANY_ID, QUIRK_CYCLE_TIMER},
+ {PCI_VENDOR_ID_RICOH, PCI_ANY_ID, QUIRK_CYCLE_TIMER},
{PCI_VENDOR_ID_APPLE, PCI_DEVICE_ID_APPLE_UNI_N_FW, QUIRK_BE_HEADERS},
};
const struct pci_device_id *ent)
{
struct fw_ohci *ohci;
- u32 bus_options, max_receive, link_speed, version, link_enh;
+ u32 bus_options, max_receive, link_speed, version;
u64 guid;
int i, err, n_ir, n_it;
size_t size;
if (param_quirks)
ohci->quirks = param_quirks;
- /* TI OHCI-Lynx and compatible: set recommended configuration bits. */
- if (dev->vendor == PCI_VENDOR_ID_TI) {
- pci_read_config_dword(dev, PCI_CFG_TI_LinkEnh, &link_enh);
-
- /* adjust latency of ATx FIFO: use 1.7 KB threshold */
- link_enh &= ~TI_LinkEnh_atx_thresh_mask;
- link_enh |= TI_LinkEnh_atx_thresh_1_7K;
-
- /* use priority arbitration for asynchronous responses */
- link_enh |= TI_LinkEnh_enab_unfair;
-
- /* required for aPhyEnhanceEnable to work */
- link_enh |= TI_LinkEnh_enab_accel;
-
- pci_write_config_dword(dev, PCI_CFG_TI_LinkEnh, link_enh);
- }
-
ar_context_init(&ohci->ar_request_ctx, ohci,
OHCI1394_AsReqRcvContextControlSet);
#define OHCI1394_phy_tcode 0xe
-/* TI extensions */
-
-#define PCI_CFG_TI_LinkEnh 0xf4
-#define TI_LinkEnh_enab_accel 0x00000002
-#define TI_LinkEnh_enab_unfair 0x00000080
-#define TI_LinkEnh_atx_thresh_mask 0x00003000
-#define TI_LinkEnh_atx_thresh_1_7K 0x00001000
-
#endif /* _FIREWIRE_OHCI_H */
config ISCSI_IBFT
tristate "iSCSI Boot Firmware Table Attributes module"
select ISCSI_BOOT_SYSFS
- depends on ISCSI_IBFT_FIND && SCSI
+ depends on ISCSI_IBFT_FIND && SCSI && SCSI_LOWLEVEL
default n
help
This option enables support for detection and exposing of iSCSI
return err;
}
-static int sx150x_init_hw(struct sx150x_chip *chip,
- struct sx150x_platform_data *pdata)
+static int sx150x_reset(struct sx150x_chip *chip)
{
- int err = 0;
+ int err;
- err = i2c_smbus_write_word_data(chip->client,
+ err = i2c_smbus_write_byte_data(chip->client,
chip->dev_cfg->reg_reset,
- 0x3412);
+ 0x12);
if (err < 0)
return err;
+ err = i2c_smbus_write_byte_data(chip->client,
+ chip->dev_cfg->reg_reset,
+ 0x34);
+ return err;
+}
+
+static int sx150x_init_hw(struct sx150x_chip *chip,
+ struct sx150x_platform_data *pdata)
+{
+ int err = 0;
+
+ if (pdata->reset_during_probe) {
+ err = sx150x_reset(chip);
+ if (err < 0)
+ return err;
+ }
+
err = sx150x_i2c_write(chip->client,
chip->dev_cfg->reg_misc,
0x01);
* user_data: A pointer the data that is copied to the buffer.
* size: The Number of bytes to copy.
*/
-extern int drm_buffer_copy_from_user(struct drm_buffer *buf,
- void __user *user_data, int size)
+int drm_buffer_copy_from_user(struct drm_buffer *buf,
+ void __user *user_data, int size)
{
int nr_pages = size / PAGE_SIZE + 1;
int idx;
{
int idx = drm_buffer_index(buf);
int page = drm_buffer_page(buf);
- void *obj = 0;
+ void *obj = NULL;
if (idx + objsize <= PAGE_SIZE) {
obj = &buf->data[page][idx];
if (connector->funcs->force)
connector->funcs->force(connector);
} else {
- connector->status = connector->funcs->detect(connector);
- drm_helper_hpd_irq_event(dev);
+ connector->status = connector->funcs->detect(connector, true);
+ drm_kms_helper_poll_enable(dev);
}
if (connector->status == connector_status_disconnected) {
mode_changed = true;
if (mode_changed) {
- old_fb = set->crtc->fb;
- set->crtc->fb = set->fb;
set->crtc->enabled = (set->mode != NULL);
if (set->mode != NULL) {
DRM_DEBUG_KMS("attempting to set mode from"
" userspace\n");
drm_mode_debug_printmodeline(set->mode);
+ old_fb = set->crtc->fb;
+ set->crtc->fb = set->fb;
if (!drm_crtc_helper_set_mode(set->crtc, set->mode,
set->x, set->y,
old_fb)) {
!(connector->polled & DRM_CONNECTOR_POLL_HPD))
continue;
- status = connector->funcs->detect(connector);
+ status = connector->funcs->detect(connector, false);
if (old_status != status)
changed = true;
}
return -ENOMEM;
kref_init(&obj->refcount);
- kref_init(&obj->handlecount);
+ atomic_set(&obj->handle_count, 0);
obj->size = size;
atomic_inc(&dev->object_count);
}
EXPORT_SYMBOL(drm_gem_object_free);
-/**
- * Called after the last reference to the object has been lost.
- * Must be called without holding struct_mutex
- *
- * Frees the object
- */
-void
-drm_gem_object_free_unlocked(struct kref *kref)
-{
- struct drm_gem_object *obj = (struct drm_gem_object *) kref;
- struct drm_device *dev = obj->dev;
-
- if (dev->driver->gem_free_object_unlocked != NULL)
- dev->driver->gem_free_object_unlocked(obj);
- else if (dev->driver->gem_free_object != NULL) {
- mutex_lock(&dev->struct_mutex);
- dev->driver->gem_free_object(obj);
- mutex_unlock(&dev->struct_mutex);
- }
-}
-EXPORT_SYMBOL(drm_gem_object_free_unlocked);
-
static void drm_gem_object_ref_bug(struct kref *list_kref)
{
BUG();
* called before drm_gem_object_free or we'll be touching
* freed memory
*/
-void
-drm_gem_object_handle_free(struct kref *kref)
+void drm_gem_object_handle_free(struct drm_gem_object *obj)
{
- struct drm_gem_object *obj = container_of(kref,
- struct drm_gem_object,
- handlecount);
struct drm_device *dev = obj->dev;
/* Remove any name for this object */
struct drm_gem_object *obj = vma->vm_private_data;
drm_gem_object_reference(obj);
+
+ mutex_lock(&obj->dev->struct_mutex);
+ drm_vm_open_locked(vma);
+ mutex_unlock(&obj->dev->struct_mutex);
}
EXPORT_SYMBOL(drm_gem_vm_open);
{
struct drm_gem_object *obj = vma->vm_private_data;
- drm_gem_object_unreference_unlocked(obj);
+ mutex_lock(&obj->dev->struct_mutex);
+ drm_vm_close_locked(vma);
+ drm_gem_object_unreference(obj);
+ mutex_unlock(&obj->dev->struct_mutex);
}
EXPORT_SYMBOL(drm_gem_vm_close);
seq_printf(m, "%6d %8zd %7d %8d\n",
obj->name, obj->size,
- atomic_read(&obj->handlecount.refcount),
+ atomic_read(&obj->handle_count),
atomic_read(&obj->refcount.refcount));
return 0;
}
dev->hose = pdev->sysdata;
#endif
+ mutex_lock(&drm_global_mutex);
+
if ((ret = drm_fill_in_dev(dev, ent, driver))) {
printk(KERN_ERR "DRM: Fill_in_dev failed.\n");
goto err_g2;
driver->name, driver->major, driver->minor, driver->patchlevel,
driver->date, pci_name(pdev), dev->primary->index);
+ mutex_unlock(&drm_global_mutex);
return 0;
err_g4:
pci_disable_device(pdev);
err_g1:
kfree(dev);
+ mutex_unlock(&drm_global_mutex);
return ret;
}
EXPORT_SYMBOL(drm_get_pci_dev);
dev->platformdev = platdev;
dev->dev = &platdev->dev;
+ mutex_lock(&drm_global_mutex);
+
ret = drm_fill_in_dev(dev, NULL, driver);
if (ret) {
list_add_tail(&dev->driver_item, &driver->device_list);
+ mutex_unlock(&drm_global_mutex);
+
DRM_INFO("Initialized %s %d.%d.%d %s on minor %d\n",
driver->name, driver->major, driver->minor, driver->patchlevel,
driver->date, dev->primary->index);
drm_put_minor(&dev->control);
err_g1:
kfree(dev);
+ mutex_unlock(&drm_global_mutex);
return ret;
}
EXPORT_SYMBOL(drm_get_platform_dev);
struct drm_connector *connector = to_drm_connector(device);
enum drm_connector_status status;
- status = connector->funcs->detect(connector);
+ status = connector->funcs->detect(connector, true);
return snprintf(buf, PAGE_SIZE, "%s\n",
drm_get_connector_status_name(status));
}
mutex_unlock(&dev->struct_mutex);
}
-/**
- * \c close method for all virtual memory types.
- *
- * \param vma virtual memory area.
- *
- * Search the \p vma private data entry in drm_device::vmalist, unlink it, and
- * free it.
- */
-static void drm_vm_close(struct vm_area_struct *vma)
+void drm_vm_close_locked(struct vm_area_struct *vma)
{
struct drm_file *priv = vma->vm_file->private_data;
struct drm_device *dev = priv->minor->dev;
vma->vm_start, vma->vm_end - vma->vm_start);
atomic_dec(&dev->vma_count);
- mutex_lock(&dev->struct_mutex);
list_for_each_entry_safe(pt, temp, &dev->vmalist, head) {
if (pt->vma == vma) {
list_del(&pt->head);
break;
}
}
+}
+
+/**
+ * \c close method for all virtual memory types.
+ *
+ * \param vma virtual memory area.
+ *
+ * Search the \p vma private data entry in drm_device::vmalist, unlink it, and
+ * free it.
+ */
+static void drm_vm_close(struct vm_area_struct *vma)
+{
+ struct drm_file *priv = vma->vm_file->private_data;
+ struct drm_device *dev = priv->minor->dev;
+
+ mutex_lock(&dev->struct_mutex);
+ drm_vm_close_locked(vma);
mutex_unlock(&dev->struct_mutex);
}
static const struct file_operations i810_buffer_fops = {
.open = drm_open,
.release = drm_release,
- .unlocked_ioctl = drm_ioctl,
+ .unlocked_ioctl = i810_ioctl,
.mmap = i810_mmap_buffers,
.fasync = drm_fasync,
};
static const struct file_operations i830_buffer_fops = {
.open = drm_open,
.release = drm_release,
- .unlocked_ioctl = drm_ioctl,
+ .unlocked_ioctl = i830_ioctl,
.mmap = i830_mmap_buffers,
.fasync = drm_fasync,
};
}
}
- div_u64(diff, diff1);
+ diff = div_u64(diff, diff1);
ret = ((m * diff) + c);
- div_u64(ret, 10);
+ ret = div_u64(ret, 10);
dev_priv->last_count1 = total_count;
dev_priv->last_time1 = now;
/* More magic constants... */
diff = diff * 1181;
- div_u64(diff, diffms * 10);
+ diff = div_u64(diff, diffms * 10);
dev_priv->gfx_power = diff;
}
dev_priv->mchdev_lock = &mchdev_lock;
spin_unlock(&mchdev_lock);
+ /* XXX Prevent module unload due to memory corruption bugs. */
+ __module_get(THIS_MODULE);
+
return 0;
out_workqueue_free:
INTEL_VGA_DEVICE(0x2e22, &intel_g45_info), /* G45_G */
INTEL_VGA_DEVICE(0x2e32, &intel_g45_info), /* G41_G */
INTEL_VGA_DEVICE(0x2e42, &intel_g45_info), /* B43_G */
+ INTEL_VGA_DEVICE(0x2e92, &intel_g45_info), /* B43_G.1 */
INTEL_VGA_DEVICE(0xa001, &intel_pineview_info),
INTEL_VGA_DEVICE(0xa011, &intel_pineview_info),
INTEL_VGA_DEVICE(0x0042, &intel_ironlake_d_info),
return -ENOMEM;
ret = drm_gem_handle_create(file_priv, obj, &handle);
+ /* drop reference from allocate - handle holds it now */
+ drm_gem_object_unreference_unlocked(obj);
if (ret) {
- drm_gem_object_unreference_unlocked(obj);
return ret;
}
- /* Sink the floating reference from kref_init(handlecount) */
- drm_gem_object_handle_unreference_unlocked(obj);
-
args->handle = handle;
return 0;
}
return -ENOENT;
obj_priv = to_intel_bo(obj);
- /* Bounds check source.
- *
- * XXX: This could use review for overflow issues...
- */
- if (args->offset > obj->size || args->size > obj->size ||
- args->offset + args->size > obj->size) {
- drm_gem_object_unreference_unlocked(obj);
- return -EINVAL;
+ /* Bounds check source. */
+ if (args->offset > obj->size || args->size > obj->size - args->offset) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ if (!access_ok(VERIFY_WRITE,
+ (char __user *)(uintptr_t)args->data_ptr,
+ args->size)) {
+ ret = -EFAULT;
+ goto err;
}
if (i915_gem_object_needs_bit17_swizzle(obj)) {
file_priv);
}
+err:
drm_gem_object_unreference_unlocked(obj);
-
return ret;
}
user_data = (char __user *) (uintptr_t) args->data_ptr;
remain = args->size;
- if (!access_ok(VERIFY_READ, user_data, remain))
- return -EFAULT;
mutex_lock(&dev->struct_mutex);
return -ENOENT;
obj_priv = to_intel_bo(obj);
- /* Bounds check destination.
- *
- * XXX: This could use review for overflow issues...
- */
- if (args->offset > obj->size || args->size > obj->size ||
- args->offset + args->size > obj->size) {
- drm_gem_object_unreference_unlocked(obj);
- return -EINVAL;
+ /* Bounds check destination. */
+ if (args->offset > obj->size || args->size > obj->size - args->offset) {
+ ret = -EINVAL;
+ goto err;
+ }
+
+ if (!access_ok(VERIFY_READ,
+ (char __user *)(uintptr_t)args->data_ptr,
+ args->size)) {
+ ret = -EFAULT;
+ goto err;
}
/* We can only do the GTT pwrite on untiled buffers, as otherwise
DRM_INFO("pwrite failed %d\n", ret);
#endif
+err:
drm_gem_object_unreference_unlocked(obj);
-
return ret;
}
reg->obj = obj;
- if (IS_GEN6(dev))
+ switch (INTEL_INFO(dev)->gen) {
+ case 6:
sandybridge_write_fence_reg(reg);
- else if (IS_I965G(dev))
+ break;
+ case 5:
+ case 4:
i965_write_fence_reg(reg);
- else if (IS_I9XX(dev))
+ break;
+ case 3:
i915_write_fence_reg(reg);
- else
+ break;
+ case 2:
i830_write_fence_reg(reg);
+ break;
+ }
trace_i915_gem_object_get_fence(obj, obj_priv->fence_reg,
obj_priv->tiling_mode);
struct drm_i915_gem_object *obj_priv = to_intel_bo(obj);
struct drm_i915_fence_reg *reg =
&dev_priv->fence_regs[obj_priv->fence_reg];
+ uint32_t fence_reg;
- if (IS_GEN6(dev)) {
+ switch (INTEL_INFO(dev)->gen) {
+ case 6:
I915_WRITE64(FENCE_REG_SANDYBRIDGE_0 +
(obj_priv->fence_reg * 8), 0);
- } else if (IS_I965G(dev)) {
+ break;
+ case 5:
+ case 4:
I915_WRITE64(FENCE_REG_965_0 + (obj_priv->fence_reg * 8), 0);
- } else {
- uint32_t fence_reg;
-
- if (obj_priv->fence_reg < 8)
- fence_reg = FENCE_REG_830_0 + obj_priv->fence_reg * 4;
+ break;
+ case 3:
+ if (obj_priv->fence_reg >= 8)
+ fence_reg = FENCE_REG_945_8 + (obj_priv->fence_reg - 8) * 4;
else
- fence_reg = FENCE_REG_945_8 + (obj_priv->fence_reg -
- 8) * 4;
+ case 2:
+ fence_reg = FENCE_REG_830_0 + obj_priv->fence_reg * 4;
I915_WRITE(fence_reg, 0);
+ break;
}
reg->obj = NULL;
(int) reloc->offset,
reloc->read_domains,
reloc->write_domain);
+ drm_gem_object_unreference(target_obj);
+ i915_gem_object_unpin(obj);
return -EINVAL;
}
if (reloc->write_domain & I915_GEM_DOMAIN_CPU ||
struct list_head *unwind)
{
list_add(&obj_priv->evict_list, unwind);
+ drm_gem_object_reference(&obj_priv->base);
return drm_mm_scan_add_block(obj_priv->gtt_space);
}
{
drm_i915_private_t *dev_priv = dev->dev_private;
struct list_head eviction_list, unwind_list;
- struct drm_i915_gem_object *obj_priv, *tmp_obj_priv;
+ struct drm_i915_gem_object *obj_priv;
struct list_head *render_iter, *bsd_iter;
int ret = 0;
list_for_each_entry(obj_priv, &unwind_list, evict_list) {
ret = drm_mm_scan_remove_block(obj_priv->gtt_space);
BUG_ON(ret);
+ drm_gem_object_unreference(&obj_priv->base);
}
/* We expect the caller to unpin, evict all and try again, or give up.
return -ENOSPC;
found:
+ /* drm_mm doesn't allow any other other operations while
+ * scanning, therefore store to be evicted objects on a
+ * temporary list. */
INIT_LIST_HEAD(&eviction_list);
- list_for_each_entry_safe(obj_priv, tmp_obj_priv,
- &unwind_list, evict_list) {
+ while (!list_empty(&unwind_list)) {
+ obj_priv = list_first_entry(&unwind_list,
+ struct drm_i915_gem_object,
+ evict_list);
if (drm_mm_scan_remove_block(obj_priv->gtt_space)) {
- /* drm_mm doesn't allow any other other operations while
- * scanning, therefore store to be evicted objects on a
- * temporary list. */
list_move(&obj_priv->evict_list, &eviction_list);
+ continue;
}
+ list_del(&obj_priv->evict_list);
+ drm_gem_object_unreference(&obj_priv->base);
}
/* Unbinding will emit any required flushes */
- list_for_each_entry_safe(obj_priv, tmp_obj_priv,
- &eviction_list, evict_list) {
-#if WATCH_LRU
- DRM_INFO("%s: evicting %p\n", __func__, obj);
-#endif
- ret = i915_gem_object_unbind(&obj_priv->base);
- if (ret)
- return ret;
+ while (!list_empty(&eviction_list)) {
+ obj_priv = list_first_entry(&eviction_list,
+ struct drm_i915_gem_object,
+ evict_list);
+ if (ret == 0)
+ ret = i915_gem_object_unbind(&obj_priv->base);
+ list_del(&obj_priv->evict_list);
+ drm_gem_object_unreference(&obj_priv->base);
}
- /* The just created free hole should be on the top of the free stack
- * maintained by drm_mm, so this BUG_ON actually executes in O(1).
- * Furthermore all accessed data has just recently been used, so it
- * should be really fast, too. */
- BUG_ON(!drm_mm_search_free(&dev_priv->mm.gtt_space, min_size,
- alignment, 0));
-
- return 0;
+ return ret;
}
int
i915_seqno_passed(i915_get_gem_seqno(dev,
&dev_priv->render_ring),
i915_get_tail_request(dev)->seqno)) {
+ bool missed_wakeup = false;
+
dev_priv->hangcheck_count = 0;
/* Issue a wake-up to catch stuck h/w. */
- if (dev_priv->render_ring.waiting_gem_seqno |
- dev_priv->bsd_ring.waiting_gem_seqno) {
- DRM_ERROR("Hangcheck timer elapsed... GPU idle, missed IRQ.\n");
- if (dev_priv->render_ring.waiting_gem_seqno)
- DRM_WAKEUP(&dev_priv->render_ring.irq_queue);
- if (dev_priv->bsd_ring.waiting_gem_seqno)
- DRM_WAKEUP(&dev_priv->bsd_ring.irq_queue);
+ if (dev_priv->render_ring.waiting_gem_seqno &&
+ waitqueue_active(&dev_priv->render_ring.irq_queue)) {
+ DRM_WAKEUP(&dev_priv->render_ring.irq_queue);
+ missed_wakeup = true;
+ }
+
+ if (dev_priv->bsd_ring.waiting_gem_seqno &&
+ waitqueue_active(&dev_priv->bsd_ring.irq_queue)) {
+ DRM_WAKEUP(&dev_priv->bsd_ring.irq_queue);
+ missed_wakeup = true;
}
+
+ if (missed_wakeup)
+ DRM_ERROR("Hangcheck timer elapsed... GPU idle, missed IRQ.\n");
return;
}
#define WM1_LP_SR_EN (1<<31)
#define WM1_LP_LATENCY_SHIFT 24
#define WM1_LP_LATENCY_MASK (0x7f<<24)
+#define WM1_LP_FBC_LP1_MASK (0xf<<20)
+#define WM1_LP_FBC_LP1_SHIFT 20
#define WM1_LP_SR_MASK (0x1ff<<8)
#define WM1_LP_SR_SHIFT 8
#define WM1_LP_CURSOR_MASK (0x3f)
+#define WM2_LP_ILK 0x4510c
+#define WM2_LP_EN (1<<31)
+#define WM3_LP_ILK 0x45110
+#define WM3_LP_EN (1<<31)
+#define WM1S_LP_ILK 0x45120
+#define WM1S_LP_EN (1<<31)
/* Memory latency timer register */
#define MLTR_ILK 0x11222
dev_priv->saveSWF2[i] = I915_READ(SWF30 + (i << 2));
/* Fences */
- if (IS_I965G(dev)) {
+ switch (INTEL_INFO(dev)->gen) {
+ case 6:
+ for (i = 0; i < 16; i++)
+ dev_priv->saveFENCE[i] = I915_READ64(FENCE_REG_SANDYBRIDGE_0 + (i * 8));
+ break;
+ case 5:
+ case 4:
for (i = 0; i < 16; i++)
dev_priv->saveFENCE[i] = I915_READ64(FENCE_REG_965_0 + (i * 8));
- } else {
- for (i = 0; i < 8; i++)
- dev_priv->saveFENCE[i] = I915_READ(FENCE_REG_830_0 + (i * 4));
-
+ break;
+ case 3:
if (IS_I945G(dev) || IS_I945GM(dev) || IS_G33(dev))
for (i = 0; i < 8; i++)
dev_priv->saveFENCE[i+8] = I915_READ(FENCE_REG_945_8 + (i * 4));
+ case 2:
+ for (i = 0; i < 8; i++)
+ dev_priv->saveFENCE[i] = I915_READ(FENCE_REG_830_0 + (i * 4));
+ break;
+
}
return 0;
I915_WRITE(HWS_PGA, dev_priv->saveHWS);
/* Fences */
- if (IS_I965G(dev)) {
+ switch (INTEL_INFO(dev)->gen) {
+ case 6:
+ for (i = 0; i < 16; i++)
+ I915_WRITE64(FENCE_REG_SANDYBRIDGE_0 + (i * 8), dev_priv->saveFENCE[i]);
+ break;
+ case 5:
+ case 4:
for (i = 0; i < 16; i++)
I915_WRITE64(FENCE_REG_965_0 + (i * 8), dev_priv->saveFENCE[i]);
- } else {
- for (i = 0; i < 8; i++)
- I915_WRITE(FENCE_REG_830_0 + (i * 4), dev_priv->saveFENCE[i]);
+ break;
+ case 3:
+ case 2:
if (IS_I945G(dev) || IS_I945GM(dev) || IS_G33(dev))
for (i = 0; i < 8; i++)
I915_WRITE(FENCE_REG_945_8 + (i * 4), dev_priv->saveFENCE[i+8]);
+ for (i = 0; i < 8; i++)
+ I915_WRITE(FENCE_REG_830_0 + (i * 4), dev_priv->saveFENCE[i]);
+ break;
}
i915_restore_display(dev);
if (wait_for((I915_READ(PCH_ADPA) & ADPA_CRT_HOTPLUG_FORCE_TRIGGER) == 0,
1000, 1))
- DRM_ERROR("timed out waiting for FORCE_TRIGGER");
+ DRM_DEBUG_KMS("timed out waiting for FORCE_TRIGGER");
if (turn_off_dac) {
I915_WRITE(PCH_ADPA, temp);
if (wait_for((I915_READ(PORT_HOTPLUG_EN) &
CRT_HOTPLUG_FORCE_DETECT) == 0,
1000, 1))
- DRM_ERROR("timed out waiting for FORCE_DETECT to go off");
+ DRM_DEBUG_KMS("timed out waiting for FORCE_DETECT to go off");
}
stat = I915_READ(PORT_HOTPLUG_STAT);
return status;
}
-static enum drm_connector_status intel_crt_detect(struct drm_connector *connector)
+static enum drm_connector_status
+intel_crt_detect(struct drm_connector *connector, bool force)
{
struct drm_device *dev = connector->dev;
struct drm_encoder *encoder = intel_attached_encoder(connector);
if (intel_crt_detect_ddc(encoder))
return connector_status_connected;
+ if (!force)
+ return connector->status;
+
/* for pre-945g platforms use load detect */
if (encoder->crtc && encoder->crtc->enabled) {
status = intel_crt_load_detect(encoder->crtc, intel_encoder);
DRM_DEBUG_KMS("vblank wait timed out\n");
}
-/**
- * intel_wait_for_vblank_off - wait for vblank after disabling a pipe
+/*
+ * intel_wait_for_pipe_off - wait for pipe to turn off
* @dev: drm device
* @pipe: pipe to wait for
*
* spinning on the vblank interrupt status bit, since we won't actually
* see an interrupt when the pipe is disabled.
*
- * So this function waits for the display line value to settle (it
- * usually ends up stopping at the start of the next frame).
+ * On Gen4 and above:
+ * wait for the pipe register state bit to turn off
+ *
+ * Otherwise:
+ * wait for the display line value to settle (it usually
+ * ends up stopping at the start of the next frame).
+ *
*/
-void intel_wait_for_vblank_off(struct drm_device *dev, int pipe)
+static void intel_wait_for_pipe_off(struct drm_device *dev, int pipe)
{
struct drm_i915_private *dev_priv = dev->dev_private;
- int pipedsl_reg = (pipe == 0 ? PIPEADSL : PIPEBDSL);
- unsigned long timeout = jiffies + msecs_to_jiffies(100);
- u32 last_line;
-
- /* Wait for the display line to settle */
- do {
- last_line = I915_READ(pipedsl_reg) & DSL_LINEMASK;
- mdelay(5);
- } while (((I915_READ(pipedsl_reg) & DSL_LINEMASK) != last_line) &&
- time_after(timeout, jiffies));
-
- if (time_after(jiffies, timeout))
- DRM_DEBUG_KMS("vblank wait timed out\n");
+
+ if (INTEL_INFO(dev)->gen >= 4) {
+ int pipeconf_reg = (pipe == 0 ? PIPEACONF : PIPEBCONF);
+
+ /* Wait for the Pipe State to go off */
+ if (wait_for((I915_READ(pipeconf_reg) & I965_PIPECONF_ACTIVE) == 0,
+ 100, 0))
+ DRM_DEBUG_KMS("pipe_off wait timed out\n");
+ } else {
+ u32 last_line;
+ int pipedsl_reg = (pipe == 0 ? PIPEADSL : PIPEBDSL);
+ unsigned long timeout = jiffies + msecs_to_jiffies(100);
+
+ /* Wait for the display line to settle */
+ do {
+ last_line = I915_READ(pipedsl_reg) & DSL_LINEMASK;
+ mdelay(5);
+ } while (((I915_READ(pipedsl_reg) & DSL_LINEMASK) != last_line) &&
+ time_after(timeout, jiffies));
+ if (time_after(jiffies, timeout))
+ DRM_DEBUG_KMS("pipe_off wait timed out\n");
+ }
}
/* Parameters have changed, update FBC info */
I915_READ(dspbase_reg);
}
- /* Wait for vblank for the disable to take effect */
- intel_wait_for_vblank_off(dev, pipe);
-
/* Don't disable pipe A or pipe A PLLs if needed */
if (pipeconf_reg == PIPEACONF &&
- (dev_priv->quirks & QUIRK_PIPEA_FORCE))
+ (dev_priv->quirks & QUIRK_PIPEA_FORCE)) {
+ /* Wait for vblank for the disable to take effect */
+ intel_wait_for_vblank(dev, pipe);
goto skip_pipe_off;
+ }
/* Next, disable display pipes */
temp = I915_READ(pipeconf_reg);
I915_READ(pipeconf_reg);
}
- /* Wait for vblank for the disable to take effect. */
- intel_wait_for_vblank_off(dev, pipe);
+ /* Wait for the pipe to turn off */
+ intel_wait_for_pipe_off(dev, pipe);
temp = I915_READ(dpll_reg);
if ((temp & DPLL_VCO_ENABLE) != 0) {
struct drm_display_mode *adjusted_mode)
{
struct drm_device *dev = crtc->dev;
+
if (HAS_PCH_SPLIT(dev)) {
/* FDI link clock is fixed at 2.7G */
if (mode->clock * 3 > IRONLAKE_FDI_FREQ * 4)
return false;
}
+
+ /* XXX some encoders set the crtcinfo, others don't.
+ * Obviously we need some form of conflict resolution here...
+ */
+ if (adjusted_mode->crtc_htotal == 0)
+ drm_mode_set_crtcinfo(adjusted_mode, 0);
+
return true;
}
/* Don't promote wm_size to unsigned... */
if (wm_size > (long)wm->max_wm)
wm_size = wm->max_wm;
- if (wm_size <= 0) {
+ if (wm_size <= 0)
wm_size = wm->default_wm;
- DRM_ERROR("Insufficient FIFO for plane, expect flickering:"
- " entries required = %ld, available = %lu.\n",
- entries_required + wm->guard_size,
- wm->fifo_size);
- }
-
return wm_size;
}
reg_value = I915_READ(WM1_LP_ILK);
reg_value &= ~(WM1_LP_LATENCY_MASK | WM1_LP_SR_MASK |
WM1_LP_CURSOR_MASK);
- reg_value |= WM1_LP_SR_EN |
- (ilk_sr_latency << WM1_LP_LATENCY_SHIFT) |
+ reg_value |= (ilk_sr_latency << WM1_LP_LATENCY_SHIFT) |
(sr_wm << WM1_LP_SR_SHIFT) | cursor_wm;
I915_WRITE(WM1_LP_ILK, reg_value);
I915_WRITE(DISP_ARB_CTL,
(I915_READ(DISP_ARB_CTL) |
DISP_FBC_WM_DIS));
+ I915_WRITE(WM3_LP_ILK, 0);
+ I915_WRITE(WM2_LP_ILK, 0);
+ I915_WRITE(WM1_LP_ILK, 0);
}
/*
* Based on the document from hardware guys the following bits
ILK_DPFC_DIS2 |
ILK_CLK_FBC);
}
- if (IS_GEN6(dev))
- return;
+ return;
} else if (IS_G4X(dev)) {
uint32_t dspclk_gate;
I915_WRITE(RENCLK_GATE_D1, 0);
OUT_RING(MI_FLUSH);
ADVANCE_LP_RING();
}
- } else {
+ } else
DRM_DEBUG_KMS("Failed to allocate render context."
- "Disable RC6\n");
- return;
- }
+ "Disable RC6\n");
}
if (I915_HAS_RC6(dev) && drm_core_check_feature(dev, DRIVER_MODESET)) {
intel_dp_set_link_train(struct intel_dp *intel_dp,
uint32_t dp_reg_value,
uint8_t dp_train_pat,
- uint8_t train_set[4],
- bool first)
+ uint8_t train_set[4])
{
struct drm_device *dev = intel_dp->base.enc.dev;
struct drm_i915_private *dev_priv = dev->dev_private;
- struct intel_crtc *intel_crtc = to_intel_crtc(intel_dp->base.enc.crtc);
int ret;
I915_WRITE(intel_dp->output_reg, dp_reg_value);
POSTING_READ(intel_dp->output_reg);
- if (first)
- intel_wait_for_vblank(dev, intel_crtc->pipe);
intel_dp_aux_native_write_1(intel_dp,
DP_TRAINING_PATTERN_SET,
uint8_t voltage;
bool clock_recovery = false;
bool channel_eq = false;
- bool first = true;
int tries;
u32 reg;
uint32_t DP = intel_dp->DP;
+ struct intel_crtc *intel_crtc = to_intel_crtc(intel_dp->base.enc.crtc);
+
+ /* Enable output, wait for it to become active */
+ I915_WRITE(intel_dp->output_reg, intel_dp->DP);
+ POSTING_READ(intel_dp->output_reg);
+ intel_wait_for_vblank(dev, intel_crtc->pipe);
/* Write the link configuration data */
intel_dp_aux_native_write(intel_dp, DP_LINK_BW_SET,
reg = DP | DP_LINK_TRAIN_PAT_1;
if (!intel_dp_set_link_train(intel_dp, reg,
- DP_TRAINING_PATTERN_1, train_set, first))
+ DP_TRAINING_PATTERN_1, train_set))
break;
- first = false;
/* Set training pattern 1 */
udelay(100);
/* channel eq pattern */
if (!intel_dp_set_link_train(intel_dp, reg,
- DP_TRAINING_PATTERN_2, train_set,
- false))
+ DP_TRAINING_PATTERN_2, train_set))
break;
udelay(400);
* \return false if DP port is disconnected.
*/
static enum drm_connector_status
-intel_dp_detect(struct drm_connector *connector)
+intel_dp_detect(struct drm_connector *connector, bool force)
{
struct drm_encoder *encoder = intel_attached_encoder(connector);
struct intel_dp *intel_dp = enc_to_intel_dp(encoder);
struct drm_crtc *crtc);
int intel_get_pipe_from_crtc_id(struct drm_device *dev, void *data,
struct drm_file *file_priv);
-extern void intel_wait_for_vblank_off(struct drm_device *dev, int pipe);
extern void intel_wait_for_vblank(struct drm_device *dev, int pipe);
extern struct drm_crtc *intel_get_crtc_from_pipe(struct drm_device *dev, int pipe);
extern struct drm_crtc *intel_get_load_detect_pipe(struct intel_encoder *intel_encoder,
*
* Unimplemented.
*/
-static enum drm_connector_status intel_dvo_detect(struct drm_connector *connector)
+static enum drm_connector_status
+intel_dvo_detect(struct drm_connector *connector, bool force)
{
struct drm_encoder *encoder = intel_attached_encoder(connector);
struct intel_dvo *intel_dvo = enc_to_intel_dvo(encoder);
drm_fb_helper_fini(&ifbdev->helper);
drm_framebuffer_cleanup(&ifb->base);
- if (ifb->obj)
+ if (ifb->obj) {
drm_gem_object_unreference(ifb->obj);
+ ifb->obj = NULL;
+ }
return 0;
}
}
static enum drm_connector_status
-intel_hdmi_detect(struct drm_connector *connector)
+intel_hdmi_detect(struct drm_connector *connector, bool force)
{
struct drm_encoder *encoder = intel_attached_encoder(connector);
struct intel_hdmi *intel_hdmi = enc_to_intel_hdmi(encoder);
* connected and closed means disconnected. We also send hotplug events as
* needed, using lid status notification from the input layer.
*/
-static enum drm_connector_status intel_lvds_detect(struct drm_connector *connector)
+static enum drm_connector_status
+intel_lvds_detect(struct drm_connector *connector, bool force)
{
struct drm_device *dev = connector->dev;
enum drm_connector_status status = connector_status_connected;
* the LID nofication event.
*/
if (connector)
- connector->status = connector->funcs->detect(connector);
+ connector->status = connector->funcs->detect(connector,
+ false);
+
/* Don't force modeset on machines where it causes a GPU lockup */
if (dmi_check_system(intel_no_modeset_on_lid))
return NOTIFY_OK;
intel_encoder->clone_mask = (1 << INTEL_LVDS_CLONE_BIT);
intel_encoder->crtc_mask = (1 << 1);
- if (IS_I965G(dev))
- intel_encoder->crtc_mask |= (1 << 0);
drm_encoder_helper_add(encoder, &intel_lvds_helper_funcs);
drm_connector_helper_add(connector, &intel_lvds_connector_helper_funcs);
connector->display_info.subpixel_order = SubPixelHorizontalRGB;
if (!analog_connector)
return false;
- if (analog_connector->funcs->detect(analog_connector) ==
+ if (analog_connector->funcs->detect(analog_connector, false) ==
connector_status_disconnected)
return false;
return status;
}
-static enum drm_connector_status intel_sdvo_detect(struct drm_connector *connector)
+static enum drm_connector_status
+intel_sdvo_detect(struct drm_connector *connector, bool force)
{
uint16_t response;
struct drm_encoder *encoder = intel_attached_encoder(connector);
return true;
err:
- intel_sdvo_destroy_enhance_property(connector);
- kfree(intel_sdvo_connector);
+ intel_sdvo_destroy(connector);
return false;
}
return true;
err:
- intel_sdvo_destroy_enhance_property(connector);
- kfree(intel_sdvo_connector);
+ intel_sdvo_destroy(connector);
return false;
}
uint16_t response;
} enhancements;
- if (!intel_sdvo_get_value(intel_sdvo,
- SDVO_CMD_GET_SUPPORTED_ENHANCEMENTS,
- &enhancements, sizeof(enhancements)))
- return false;
-
+ enhancements.response = 0;
+ intel_sdvo_get_value(intel_sdvo,
+ SDVO_CMD_GET_SUPPORTED_ENHANCEMENTS,
+ &enhancements, sizeof(enhancements));
if (enhancements.response == 0) {
DRM_DEBUG_KMS("No enhancement is supported\n");
return true;
* we have a pipe programmed in order to probe the TV.
*/
static enum drm_connector_status
-intel_tv_detect(struct drm_connector *connector)
+intel_tv_detect(struct drm_connector *connector, bool force)
{
struct drm_display_mode mode;
struct drm_encoder *encoder = intel_attached_encoder(connector);
if (encoder->crtc && encoder->crtc->enabled) {
type = intel_tv_detect_type(intel_tv);
- } else {
+ } else if (force) {
struct drm_crtc *crtc;
int dpms_mode;
intel_release_load_detect_pipe(&intel_tv->base, connector,
dpms_mode);
} else
- type = -1;
- }
-
- intel_tv->type = type;
+ return connector_status_unknown;
+ } else
+ return connector->status;
if (type < 0)
return connector_status_disconnected;
}
static enum drm_connector_status
-nouveau_connector_detect(struct drm_connector *connector)
+nouveau_connector_detect(struct drm_connector *connector, bool force)
{
struct drm_device *dev = connector->dev;
struct nouveau_connector *nv_connector = nouveau_connector(connector);
}
static enum drm_connector_status
-nouveau_connector_detect_lvds(struct drm_connector *connector)
+nouveau_connector_detect_lvds(struct drm_connector *connector, bool force)
{
struct drm_device *dev = connector->dev;
struct drm_nouveau_private *dev_priv = dev->dev_private;
/* Try retrieving EDID via DDC */
if (!dev_priv->vbios.fp_no_ddc) {
- status = nouveau_connector_detect(connector);
+ status = nouveau_connector_detect(connector, force);
if (status == connector_status_connected)
goto out;
}
if (nv_encoder->dcb->type == OUTPUT_LVDS &&
(nv_encoder->dcb->lvdsconf.use_straps_for_mode ||
dev_priv->vbios.fp_no_ddc) && nouveau_bios_fp_mode(dev, NULL)) {
- nv_connector->native_mode = drm_mode_create(dev);
- nouveau_bios_fp_mode(dev, nv_connector->native_mode);
+ struct drm_display_mode mode;
+
+ nouveau_bios_fp_mode(dev, &mode);
+ nv_connector->native_mode = drm_mode_duplicate(dev, &mode);
}
/* Find the native mode if this is a digital panel, if we didn't
goto out;
ret = drm_gem_handle_create(file_priv, nvbo->gem, &req->info.handle);
+ /* drop reference from allocate - handle holds it now */
+ drm_gem_object_unreference_unlocked(nvbo->gem);
out:
- drm_gem_object_handle_unreference_unlocked(nvbo->gem);
-
- if (ret)
- drm_gem_object_unreference_unlocked(nvbo->gem);
return ret;
}
#define SW_I2C_CNTL_WRITE1BIT 6
//==============================VESA definition Portion===============================
-#define VESA_OEM_PRODUCT_REV '01.00'
+#define VESA_OEM_PRODUCT_REV "01.00"
#define VESA_MODE_ATTRIBUTE_MODE_SUPPORT 0xBB //refer to VBE spec p.32, no TTY support
#define VESA_MODE_WIN_ATTRIBUTE 7
#define VESA_WIN_SIZE 64
pll->algo = PLL_ALGO_LEGACY;
pll->flags |= RADEON_PLL_PREFER_CLOSEST_LOWER;
}
- /* There is some evidence (often anecdotal) that RV515 LVDS
+ /* There is some evidence (often anecdotal) that RV515/RV620 LVDS
* (on some boards at least) prefers the legacy algo. I'm not
* sure whether this should handled generically or on a
* case-by-case quirk basis. Both algos should work fine in the
* majority of cases.
*/
if ((radeon_encoder->active_device & (ATOM_DEVICE_LCD_SUPPORT)) &&
- (rdev->family == CHIP_RV515)) {
+ ((rdev->family == CHIP_RV515) ||
+ (rdev->family == CHIP_RV620))) {
/* allow the user to overrride just in case */
if (radeon_new_pll == 1)
pll->algo = PLL_ALGO_NEW;
WREG32(RCU_IND_INDEX, 0x203);
efuse_straps_3 = RREG32(RCU_IND_DATA);
- efuse_box_bit_127_124 = (u8)(efuse_straps_3 & 0xF0000000) >> 28;
+ efuse_box_bit_127_124 = (u8)((efuse_straps_3 & 0xF0000000) >> 28);
switch(efuse_box_bit_127_124) {
case 0x0:
EVERGREEN_MAX_BACKENDS_MASK));
break;
}
- } else
- gb_backend_map =
- evergreen_get_tile_pipe_to_backend_map(rdev,
- rdev->config.evergreen.max_tile_pipes,
- rdev->config.evergreen.max_backends,
- ((EVERGREEN_MAX_BACKENDS_MASK <<
- rdev->config.evergreen.max_backends) &
- EVERGREEN_MAX_BACKENDS_MASK));
+ } else {
+ switch (rdev->family) {
+ case CHIP_CYPRESS:
+ case CHIP_HEMLOCK:
+ gb_backend_map = 0x66442200;
+ break;
+ case CHIP_JUNIPER:
+ gb_backend_map = 0x00006420;
+ break;
+ default:
+ gb_backend_map =
+ evergreen_get_tile_pipe_to_backend_map(rdev,
+ rdev->config.evergreen.max_tile_pipes,
+ rdev->config.evergreen.max_backends,
+ ((EVERGREEN_MAX_BACKENDS_MASK <<
+ rdev->config.evergreen.max_backends) &
+ EVERGREEN_MAX_BACKENDS_MASK));
+ }
+ }
rdev->config.evergreen.tile_config = gb_addr_config;
WREG32(GB_BACKEND_MAP, gb_backend_map);
rdev->mc.mc_vram_size = RREG32(CONFIG_MEMSIZE) * 1024 * 1024;
rdev->mc.real_vram_size = RREG32(CONFIG_MEMSIZE) * 1024 * 1024;
rdev->mc.visible_vram_size = rdev->mc.aper_size;
+ rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
r600_vram_gtt_location(rdev, &rdev->mc);
radeon_update_bandwidth_info(rdev);
{
u32 tmp;
- WREG32(CP_INT_CNTL, 0);
+ WREG32(CP_INT_CNTL, CNTX_BUSY_INT_ENABLE | CNTX_EMPTY_INT_ENABLE);
WREG32(GRBM_INT_CNTL, 0);
WREG32(INT_MASK + EVERGREEN_CRTC0_REGISTER_OFFSET, 0);
WREG32(INT_MASK + EVERGREEN_CRTC1_REGISTER_OFFSET, 0);
return r;
}
rdev->cp.ready = true;
+ rdev->mc.active_vram_size = rdev->mc.real_vram_size;
return 0;
}
void r100_cp_disable(struct radeon_device *rdev)
{
/* Disable ring */
+ rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
rdev->cp.ready = false;
WREG32(RADEON_CP_CSQ_MODE, 0);
WREG32(RADEON_CP_CSQ_CNTL, 0);
return false;
}
elapsed = jiffies_to_msecs(cjiffies - lockup->last_jiffies);
- if (elapsed >= 3000) {
- /* very likely the improbable case where current
- * rptr is equal to last recorded, a while ago, rptr
- * this is more likely a false positive update tracking
- * information which should force us to be recall at
- * latter point
- */
- lockup->last_cp_rptr = cp->rptr;
- lockup->last_jiffies = jiffies;
- return false;
- }
- if (elapsed >= 1000) {
+ if (elapsed >= 10000) {
dev_err(rdev->dev, "GPU lockup CP stall for more than %lumsec\n", elapsed);
return true;
}
/* FIXME we don't use the second aperture yet when we could use it */
if (rdev->mc.visible_vram_size > rdev->mc.aper_size)
rdev->mc.visible_vram_size = rdev->mc.aper_size;
+ rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
config_aper_size = RREG32(RADEON_CONFIG_APER_SIZE);
if (rdev->flags & RADEON_IS_IGP) {
uint32_t tom;
unsigned long size;
unsigned prim_walk;
unsigned nverts;
+ unsigned num_cb = track->num_cb;
- for (i = 0; i < track->num_cb; i++) {
+ if (!track->zb_cb_clear && !track->color_channel_mask &&
+ !track->blend_read_enable)
+ num_cb = 0;
+
+ for (i = 0; i < num_cb; i++) {
if (track->cb[i].robj == NULL) {
- if (!(track->zb_cb_clear || track->color_channel_mask ||
- track->blend_read_enable)) {
- continue;
- }
DRM_ERROR("[drm] No buffer for color buffer %d !\n", i);
return -EINVAL;
}
rdev->mc.mc_vram_size = RREG32(CONFIG_MEMSIZE);
rdev->mc.real_vram_size = RREG32(CONFIG_MEMSIZE);
rdev->mc.visible_vram_size = rdev->mc.aper_size;
+ rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
r600_vram_gtt_location(rdev, &rdev->mc);
if (rdev->flags & RADEON_IS_IGP) {
*/
void r600_cp_stop(struct radeon_device *rdev)
{
+ rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
WREG32(R_0086D8_CP_ME_CNTL, S_0086D8_CP_ME_HALT(1));
}
if (i < rdev->usec_timeout) {
DRM_INFO("ib test succeeded in %u usecs\n", i);
} else {
- DRM_ERROR("radeon: ib test failed (sracth(0x%04X)=0x%08X)\n",
+ DRM_ERROR("radeon: ib test failed (scratch(0x%04X)=0x%08X)\n",
scratch, tmp);
r = -EINVAL;
}
{
u32 tmp;
- WREG32(CP_INT_CNTL, 0);
+ WREG32(CP_INT_CNTL, CNTX_BUSY_INT_ENABLE | CNTX_EMPTY_INT_ENABLE);
WREG32(GRBM_INT_CNTL, 0);
WREG32(DxMODE_INT_MASK, 0);
if (ASIC_IS_DCE3(rdev)) {
/* r7xx hw bug. write to HDP_DEBUG1 followed by fb read
* rather than write to HDP_REG_COHERENCY_FLUSH_CNTL
*/
- if ((rdev->family >= CHIP_RV770) && (rdev->family <= CHIP_RV740)) {
+ if ((rdev->family >= CHIP_RV770) && (rdev->family <= CHIP_RV740) &&
+ rdev->vram_scratch.ptr) {
void __iomem *ptr = (void *)rdev->vram_scratch.ptr;
u32 tmp;
+/*
+ * Copyright 2009 Advanced Micro Devices, Inc.
+ * Copyright 2009 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ */
+
#include "drmP.h"
#include "drm.h"
#include "radeon_drm.h"
memcpy(ptr + rdev->r600_blit.ps_offset, r6xx_ps, r6xx_ps_size * 4);
radeon_bo_kunmap(rdev->r600_blit.shader_obj);
radeon_bo_unreserve(rdev->r600_blit.shader_obj);
+ rdev->mc.active_vram_size = rdev->mc.real_vram_size;
return 0;
}
{
int r;
+ rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
if (rdev->r600_blit.shader_obj == NULL)
return;
/* If we can't reserve the bo, unref should be enough to destroy
+/*
+ * Copyright 2009 Advanced Micro Devices, Inc.
+ * Copyright 2009 Red Hat Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) AND/OR ITS SUPPLIERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ */
#ifndef R600_BLIT_SHADERS_H
#define R600_BLIT_SHADERS_H
/* using get ib will give us the offset into the mipmap bo */
word0 = radeon_get_ib_value(p, idx + 3) << 8;
if ((mipmap_size + word0) > radeon_bo_size(mipmap)) {
- dev_warn(p->dev, "mipmap bo too small (%d %d %d %d %d %d -> %d have %ld)\n",
- w0, h0, bpe, blevel, nlevels, word0, mipmap_size, radeon_bo_size(texture));
- return -EINVAL;
+ /*dev_warn(p->dev, "mipmap bo too small (%d %d %d %d %d %d -> %d have %ld)\n",
+ w0, h0, bpe, blevel, nlevels, word0, mipmap_size, radeon_bo_size(texture));*/
}
return 0;
}
* about vram size near mc fb location */
u64 mc_vram_size;
u64 visible_vram_size;
+ u64 active_vram_size;
u64 gtt_size;
u64 gtt_start;
u64 gtt_end;
*connector_type = DRM_MODE_CONNECTOR_DVID;
}
+ /* MSI K9A2GM V2/V3 board has no HDMI or DVI */
+ if ((dev->pdev->device == 0x796e) &&
+ (dev->pdev->subsystem_vendor == 0x1462) &&
+ (dev->pdev->subsystem_device == 0x7302)) {
+ if ((supported_device == ATOM_DEVICE_DFP2_SUPPORT) ||
+ (supported_device == ATOM_DEVICE_DFP3_SUPPORT))
+ return false;
+ }
+
/* a-bit f-i90hd - ciaranm on #radeonhd - this board has no DVI */
if ((dev->pdev->device == 0x7941) &&
(dev->pdev->subsystem_vendor == 0x147b) &&
switch (tv_info->ucTV_BootUpDefaultStandard) {
case ATOM_TV_NTSC:
tv_std = TV_STD_NTSC;
- DRM_INFO("Default TV standard: NTSC\n");
+ DRM_DEBUG_KMS("Default TV standard: NTSC\n");
break;
case ATOM_TV_NTSCJ:
tv_std = TV_STD_NTSC_J;
- DRM_INFO("Default TV standard: NTSC-J\n");
+ DRM_DEBUG_KMS("Default TV standard: NTSC-J\n");
break;
case ATOM_TV_PAL:
tv_std = TV_STD_PAL;
- DRM_INFO("Default TV standard: PAL\n");
+ DRM_DEBUG_KMS("Default TV standard: PAL\n");
break;
case ATOM_TV_PALM:
tv_std = TV_STD_PAL_M;
- DRM_INFO("Default TV standard: PAL-M\n");
+ DRM_DEBUG_KMS("Default TV standard: PAL-M\n");
break;
case ATOM_TV_PALN:
tv_std = TV_STD_PAL_N;
- DRM_INFO("Default TV standard: PAL-N\n");
+ DRM_DEBUG_KMS("Default TV standard: PAL-N\n");
break;
case ATOM_TV_PALCN:
tv_std = TV_STD_PAL_CN;
- DRM_INFO("Default TV standard: PAL-CN\n");
+ DRM_DEBUG_KMS("Default TV standard: PAL-CN\n");
break;
case ATOM_TV_PAL60:
tv_std = TV_STD_PAL_60;
- DRM_INFO("Default TV standard: PAL-60\n");
+ DRM_DEBUG_KMS("Default TV standard: PAL-60\n");
break;
case ATOM_TV_SECAM:
tv_std = TV_STD_SECAM;
- DRM_INFO("Default TV standard: SECAM\n");
+ DRM_DEBUG_KMS("Default TV standard: SECAM\n");
break;
default:
tv_std = TV_STD_NTSC;
- DRM_INFO("Unknown TV standard; defaulting to NTSC\n");
+ DRM_DEBUG_KMS("Unknown TV standard; defaulting to NTSC\n");
break;
}
}
switch (RBIOS8(tv_info + 7) & 0xf) {
case 1:
tv_std = TV_STD_NTSC;
- DRM_INFO("Default TV standard: NTSC\n");
+ DRM_DEBUG_KMS("Default TV standard: NTSC\n");
break;
case 2:
tv_std = TV_STD_PAL;
- DRM_INFO("Default TV standard: PAL\n");
+ DRM_DEBUG_KMS("Default TV standard: PAL\n");
break;
case 3:
tv_std = TV_STD_PAL_M;
- DRM_INFO("Default TV standard: PAL-M\n");
+ DRM_DEBUG_KMS("Default TV standard: PAL-M\n");
break;
case 4:
tv_std = TV_STD_PAL_60;
- DRM_INFO("Default TV standard: PAL-60\n");
+ DRM_DEBUG_KMS("Default TV standard: PAL-60\n");
break;
case 5:
tv_std = TV_STD_NTSC_J;
- DRM_INFO("Default TV standard: NTSC-J\n");
+ DRM_DEBUG_KMS("Default TV standard: NTSC-J\n");
break;
case 6:
tv_std = TV_STD_SCART_PAL;
- DRM_INFO("Default TV standard: SCART-PAL\n");
+ DRM_DEBUG_KMS("Default TV standard: SCART-PAL\n");
break;
default:
tv_std = TV_STD_NTSC;
- DRM_INFO
+ DRM_DEBUG_KMS
("Unknown TV standard; defaulting to NTSC\n");
break;
}
switch ((RBIOS8(tv_info + 9) >> 2) & 0x3) {
case 0:
- DRM_INFO("29.498928713 MHz TV ref clk\n");
+ DRM_DEBUG_KMS("29.498928713 MHz TV ref clk\n");
break;
case 1:
- DRM_INFO("28.636360000 MHz TV ref clk\n");
+ DRM_DEBUG_KMS("28.636360000 MHz TV ref clk\n");
break;
case 2:
- DRM_INFO("14.318180000 MHz TV ref clk\n");
+ DRM_DEBUG_KMS("14.318180000 MHz TV ref clk\n");
break;
case 3:
- DRM_INFO("27.000000000 MHz TV ref clk\n");
+ DRM_DEBUG_KMS("27.000000000 MHz TV ref clk\n");
break;
default:
break;
if (tmds_info) {
ver = RBIOS8(tmds_info);
- DRM_INFO("DFP table revision: %d\n", ver);
+ DRM_DEBUG_KMS("DFP table revision: %d\n", ver);
if (ver == 3) {
n = RBIOS8(tmds_info + 5) + 1;
if (n > 4)
offset = combios_get_table_offset(dev, COMBIOS_EXT_TMDS_INFO_TABLE);
if (offset) {
ver = RBIOS8(offset);
- DRM_INFO("External TMDS Table revision: %d\n", ver);
+ DRM_DEBUG_KMS("External TMDS Table revision: %d\n", ver);
tmds->slave_addr = RBIOS8(offset + 4 + 2);
tmds->slave_addr >>= 1; /* 7 bit addressing */
gpio = RBIOS8(offset + 4 + 3);
/* PowerMac8,1 ? */
/* imac g5 isight */
rdev->mode_info.connector_table = CT_IMAC_G5_ISIGHT;
+ } else if ((rdev->pdev->device == 0x4a48) &&
+ (rdev->pdev->subsystem_vendor == 0x1002) &&
+ (rdev->pdev->subsystem_device == 0x4a48)) {
+ /* Mac X800 */
+ rdev->mode_info.connector_table = CT_MAC_X800;
} else
#endif /* CONFIG_PPC_PMAC */
#ifdef CONFIG_PPC64
CONNECTOR_OBJECT_ID_VGA,
&hpd);
break;
+ case CT_MAC_X800:
+ DRM_INFO("Connector Table: %d (mac x800)\n",
+ rdev->mode_info.connector_table);
+ /* DVI - primary dac, internal tmds */
+ ddc_i2c = combios_setup_i2c_bus(rdev, DDC_DVI, 0, 0);
+ hpd.hpd = RADEON_HPD_1; /* ??? */
+ radeon_add_legacy_encoder(dev,
+ radeon_get_encoder_enum(dev,
+ ATOM_DEVICE_DFP1_SUPPORT,
+ 0),
+ ATOM_DEVICE_DFP1_SUPPORT);
+ radeon_add_legacy_encoder(dev,
+ radeon_get_encoder_enum(dev,
+ ATOM_DEVICE_CRT1_SUPPORT,
+ 1),
+ ATOM_DEVICE_CRT1_SUPPORT);
+ radeon_add_legacy_connector(dev, 0,
+ ATOM_DEVICE_DFP1_SUPPORT |
+ ATOM_DEVICE_CRT1_SUPPORT,
+ DRM_MODE_CONNECTOR_DVII, &ddc_i2c,
+ CONNECTOR_OBJECT_ID_SINGLE_LINK_DVI_I,
+ &hpd);
+ /* DVI - tv dac, dvo */
+ ddc_i2c = combios_setup_i2c_bus(rdev, DDC_MONID, 0, 0);
+ hpd.hpd = RADEON_HPD_2; /* ??? */
+ radeon_add_legacy_encoder(dev,
+ radeon_get_encoder_enum(dev,
+ ATOM_DEVICE_DFP2_SUPPORT,
+ 0),
+ ATOM_DEVICE_DFP2_SUPPORT);
+ radeon_add_legacy_encoder(dev,
+ radeon_get_encoder_enum(dev,
+ ATOM_DEVICE_CRT2_SUPPORT,
+ 2),
+ ATOM_DEVICE_CRT2_SUPPORT);
+ radeon_add_legacy_connector(dev, 1,
+ ATOM_DEVICE_DFP2_SUPPORT |
+ ATOM_DEVICE_CRT2_SUPPORT,
+ DRM_MODE_CONNECTOR_DVII, &ddc_i2c,
+ CONNECTOR_OBJECT_ID_DUAL_LINK_DVI_I,
+ &hpd);
+ break;
default:
DRM_INFO("Connector table: %d (invalid)\n",
rdev->mode_info.connector_table);
return MODE_OK;
}
-static enum drm_connector_status radeon_lvds_detect(struct drm_connector *connector)
+static enum drm_connector_status
+radeon_lvds_detect(struct drm_connector *connector, bool force)
{
struct radeon_connector *radeon_connector = to_radeon_connector(connector);
struct drm_encoder *encoder = radeon_best_single_encoder(connector);
return MODE_OK;
}
-static enum drm_connector_status radeon_vga_detect(struct drm_connector *connector)
+static enum drm_connector_status
+radeon_vga_detect(struct drm_connector *connector, bool force)
{
struct radeon_connector *radeon_connector = to_radeon_connector(connector);
struct drm_encoder *encoder;
return MODE_OK;
}
-static enum drm_connector_status radeon_tv_detect(struct drm_connector *connector)
+static enum drm_connector_status
+radeon_tv_detect(struct drm_connector *connector, bool force)
{
struct drm_encoder *encoder;
struct drm_encoder_helper_funcs *encoder_funcs;
* we have to check if this analog encoder is shared with anyone else (TV)
* if its shared we have to set the other connector to disconnected.
*/
-static enum drm_connector_status radeon_dvi_detect(struct drm_connector *connector)
+static enum drm_connector_status
+radeon_dvi_detect(struct drm_connector *connector, bool force)
{
struct radeon_connector *radeon_connector = to_radeon_connector(connector);
struct drm_encoder *encoder = NULL;
return ret;
}
-static enum drm_connector_status radeon_dp_detect(struct drm_connector *connector)
+static enum drm_connector_status
+radeon_dp_detect(struct drm_connector *connector, bool force)
{
struct radeon_connector *radeon_connector = to_radeon_connector(connector);
enum drm_connector_status ret = connector_status_disconnected;
struct radeon_crtc *radeon_crtc = to_radeon_crtc(crtc);
struct radeon_device *rdev = crtc->dev->dev_private;
int xorigin = 0, yorigin = 0;
+ int w = radeon_crtc->cursor_width;
if (x < 0)
xorigin = -x + 1;
if (yorigin >= CURSOR_HEIGHT)
yorigin = CURSOR_HEIGHT - 1;
- radeon_lock_cursor(crtc, true);
- if (ASIC_IS_DCE4(rdev)) {
- /* cursors are offset into the total surface */
- x += crtc->x;
- y += crtc->y;
- DRM_DEBUG("x %d y %d c->x %d c->y %d\n", x, y, crtc->x, crtc->y);
-
- /* XXX: check if evergreen has the same issues as avivo chips */
- WREG32(EVERGREEN_CUR_POSITION + radeon_crtc->crtc_offset,
- ((xorigin ? 0 : x) << 16) |
- (yorigin ? 0 : y));
- WREG32(EVERGREEN_CUR_HOT_SPOT + radeon_crtc->crtc_offset, (xorigin << 16) | yorigin);
- WREG32(EVERGREEN_CUR_SIZE + radeon_crtc->crtc_offset,
- ((radeon_crtc->cursor_width - 1) << 16) | (radeon_crtc->cursor_height - 1));
- } else if (ASIC_IS_AVIVO(rdev)) {
- int w = radeon_crtc->cursor_width;
+ if (ASIC_IS_AVIVO(rdev)) {
int i = 0;
struct drm_crtc *crtc_p;
if (w <= 0)
w = 1;
}
+ }
+ radeon_lock_cursor(crtc, true);
+ if (ASIC_IS_DCE4(rdev)) {
+ WREG32(EVERGREEN_CUR_POSITION + radeon_crtc->crtc_offset,
+ ((xorigin ? 0 : x) << 16) |
+ (yorigin ? 0 : y));
+ WREG32(EVERGREEN_CUR_HOT_SPOT + radeon_crtc->crtc_offset, (xorigin << 16) | yorigin);
+ WREG32(EVERGREEN_CUR_SIZE + radeon_crtc->crtc_offset,
+ ((w - 1) << 16) | (radeon_crtc->cursor_height - 1));
+ } else if (ASIC_IS_AVIVO(rdev)) {
WREG32(AVIVO_D1CUR_POSITION + radeon_crtc->crtc_offset,
((xorigin ? 0 : x) << 16) |
(yorigin ? 0 : y));
DRM_INFO(" DFP4: %s\n", encoder_names[radeon_encoder->encoder_id]);
if (devices & ATOM_DEVICE_DFP5_SUPPORT)
DRM_INFO(" DFP5: %s\n", encoder_names[radeon_encoder->encoder_id]);
+ if (devices & ATOM_DEVICE_DFP6_SUPPORT)
+ DRM_INFO(" DFP6: %s\n", encoder_names[radeon_encoder->encoder_id]);
if (devices & ATOM_DEVICE_TV1_SUPPORT)
DRM_INFO(" TV1: %s\n", encoder_names[radeon_encoder->encoder_id]);
if (devices & ATOM_DEVICE_CV_SUPPORT)
{
struct radeon_framebuffer *radeon_fb = to_radeon_framebuffer(fb);
- if (radeon_fb->obj)
+ if (radeon_fb->obj) {
drm_gem_object_unreference_unlocked(radeon_fb->obj);
+ }
drm_framebuffer_cleanup(fb);
kfree(radeon_fb);
}
radeon_crtc->rmx_type = radeon_encoder->rmx_type;
else
radeon_crtc->rmx_type = RMX_OFF;
- src_v = crtc->mode.vdisplay;
- dst_v = radeon_crtc->native_mode.vdisplay;
- src_h = crtc->mode.hdisplay;
- dst_h = radeon_crtc->native_mode.vdisplay;
/* copy native mode */
memcpy(&radeon_crtc->native_mode,
&radeon_encoder->native_mode,
sizeof(struct drm_display_mode));
+ src_v = crtc->mode.vdisplay;
+ dst_v = radeon_crtc->native_mode.vdisplay;
+ src_h = crtc->mode.hdisplay;
+ dst_h = radeon_crtc->native_mode.hdisplay;
/* fix up for overscan on hdmi */
if (ASIC_IS_AVIVO(rdev) &&
+ (!(mode->flags & DRM_MODE_FLAG_INTERLACE)) &&
((radeon_encoder->underscan_type == UNDERSCAN_ON) ||
((radeon_encoder->underscan_type == UNDERSCAN_AUTO) &&
drm_detect_hdmi_monitor(radeon_connector->edid) &&
ret = radeon_bo_reserve(rbo, false);
if (likely(ret == 0)) {
radeon_bo_kunmap(rbo);
+ radeon_bo_unpin(rbo);
radeon_bo_unreserve(rbo);
}
drm_gem_object_unreference_unlocked(gobj);
{
struct fb_info *info;
struct radeon_framebuffer *rfb = &rfbdev->rfb;
- struct radeon_bo *rbo;
- int r;
if (rfbdev->helper.fbdev) {
info = rfbdev->helper.fbdev;
}
if (rfb->obj) {
- rbo = rfb->obj->driver_private;
- r = radeon_bo_reserve(rbo, false);
- if (likely(r == 0)) {
- radeon_bo_kunmap(rbo);
- radeon_bo_unpin(rbo);
- radeon_bo_unreserve(rbo);
- }
- drm_gem_object_unreference_unlocked(rfb->obj);
+ radeonfb_destroy_pinned_object(rfb->obj);
+ rfb->obj = NULL;
}
drm_fb_helper_fini(&rfbdev->helper);
drm_framebuffer_cleanup(&rfb->base);
return r;
}
r = drm_gem_handle_create(filp, gobj, &handle);
+ /* drop reference from allocate - handle holds it now */
+ drm_gem_object_unreference_unlocked(gobj);
if (r) {
- drm_gem_object_unreference_unlocked(gobj);
return r;
}
- drm_gem_object_handle_unreference_unlocked(gobj);
args->handle = handle;
return 0;
}
*/
int radeon_driver_firstopen_kms(struct drm_device *dev)
{
+ struct radeon_device *rdev = dev->dev_private;
+
+ if (rdev->powered_down)
+ return -EINVAL;
return 0;
}
/* mostly for macs, but really any system without connector tables */
enum radeon_connector_table {
- CT_NONE,
+ CT_NONE = 0,
CT_GENERIC,
CT_IBOOK,
CT_POWERBOOK_EXTERNAL,
CT_IMAC_G5_ISIGHT,
CT_EMAC,
CT_RN50_POWER,
+ CT_MAC_X800,
};
enum radeon_dvo_chip {
u32 c = 0;
rbo->placement.fpfn = 0;
- rbo->placement.lpfn = 0;
+ rbo->placement.lpfn = rbo->rdev->mc.active_vram_size >> PAGE_SHIFT;
rbo->placement.placement = rbo->placements;
rbo->placement.busy_placement = rbo->placements;
if (domain & RADEON_GEM_DOMAIN_VRAM)
int r;
r = ttm_bo_reserve(&bo->tbo, true, no_wait, false, 0);
- if (unlikely(r != 0)) {
- if (r != -ERESTARTSYS)
- dev_err(bo->rdev->dev, "%p reserve failed for wait\n", bo);
+ if (unlikely(r != 0))
return r;
- }
spin_lock(&bo->tbo.lock);
if (mem_type)
*mem_type = bo->tbo.mem.mem_type;
rdev->mc.real_vram_size = RREG32(RADEON_CONFIG_MEMSIZE);
rdev->mc.mc_vram_size = rdev->mc.real_vram_size;
rdev->mc.visible_vram_size = rdev->mc.aper_size;
+ rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
rdev->mc.igp_sideport_enabled = radeon_atombios_sideport_present(rdev);
base = RREG32_MC(R_000004_MC_FB_LOCATION);
base = G_000004_MC_FB_START(base) << 16;
rdev->mc.aper_base = pci_resource_start(rdev->pdev, 0);
rdev->mc.aper_size = pci_resource_len(rdev->pdev, 0);
rdev->mc.visible_vram_size = rdev->mc.aper_size;
+ rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
base = RREG32_MC(R_000100_MCCFG_FB_LOCATION);
base = G_000100_MC_FB_START(base) << 16;
rdev->mc.igp_sideport_enabled = radeon_atombios_sideport_present(rdev);
*/
void r700_cp_stop(struct radeon_device *rdev)
{
+ rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
WREG32(CP_ME_CNTL, (CP_ME_HALT | CP_PFP_HALT));
}
rdev->mc.mc_vram_size = RREG32(CONFIG_MEMSIZE);
rdev->mc.real_vram_size = RREG32(CONFIG_MEMSIZE);
rdev->mc.visible_vram_size = rdev->mc.aper_size;
+ rdev->mc.active_vram_size = rdev->mc.visible_vram_size;
r600_vram_gtt_location(rdev, &rdev->mc);
radeon_update_bandwidth_info(rdev);
}
/**
+ * Call bo::reserved and with the lru lock held.
+ * Will release GPU memory type usage on destruction.
+ * This is the place to put in driver specific hooks.
+ * Will release the bo::reserved lock and the
+ * lru lock on exit.
+ */
+
+static void ttm_bo_cleanup_memtype_use(struct ttm_buffer_object *bo)
+{
+ struct ttm_bo_global *glob = bo->glob;
+
+ if (bo->ttm) {
+
+ /**
+ * Release the lru_lock, since we don't want to have
+ * an atomic requirement on ttm_tt[unbind|destroy].
+ */
+
+ spin_unlock(&glob->lru_lock);
+ ttm_tt_unbind(bo->ttm);
+ ttm_tt_destroy(bo->ttm);
+ bo->ttm = NULL;
+ spin_lock(&glob->lru_lock);
+ }
+
+ if (bo->mem.mm_node) {
+ drm_mm_put_block(bo->mem.mm_node);
+ bo->mem.mm_node = NULL;
+ }
+
+ atomic_set(&bo->reserved, 0);
+ wake_up_all(&bo->event_queue);
+ spin_unlock(&glob->lru_lock);
+}
+
+
+/**
* If bo idle, remove from delayed- and lru lists, and unref.
* If not idle, and already on delayed list, do nothing.
* If not idle, and not on delayed list, put on delayed list,
int ret;
spin_lock(&bo->lock);
+retry:
(void) ttm_bo_wait(bo, false, false, !remove_all);
if (!bo->sync_obj) {
spin_unlock(&bo->lock);
spin_lock(&glob->lru_lock);
- put_count = ttm_bo_del_from_lru(bo);
+ ret = ttm_bo_reserve_locked(bo, false, !remove_all, false, 0);
+
+ /**
+ * Someone else has the object reserved. Bail and retry.
+ */
- ret = ttm_bo_reserve_locked(bo, false, false, false, 0);
- BUG_ON(ret);
- if (bo->ttm)
- ttm_tt_unbind(bo->ttm);
+ if (unlikely(ret == -EBUSY)) {
+ spin_unlock(&glob->lru_lock);
+ spin_lock(&bo->lock);
+ goto requeue;
+ }
+
+ /**
+ * We can re-check for sync object without taking
+ * the bo::lock since setting the sync object requires
+ * also bo::reserved. A busy object at this point may
+ * be caused by another thread starting an accelerated
+ * eviction.
+ */
+
+ if (unlikely(bo->sync_obj)) {
+ atomic_set(&bo->reserved, 0);
+ wake_up_all(&bo->event_queue);
+ spin_unlock(&glob->lru_lock);
+ spin_lock(&bo->lock);
+ if (remove_all)
+ goto retry;
+ else
+ goto requeue;
+ }
+
+ put_count = ttm_bo_del_from_lru(bo);
if (!list_empty(&bo->ddestroy)) {
list_del_init(&bo->ddestroy);
++put_count;
}
- if (bo->mem.mm_node) {
- drm_mm_put_block(bo->mem.mm_node);
- bo->mem.mm_node = NULL;
- }
- spin_unlock(&glob->lru_lock);
- atomic_set(&bo->reserved, 0);
+ ttm_bo_cleanup_memtype_use(bo);
while (put_count--)
kref_put(&bo->list_kref, ttm_bo_ref_bug);
return 0;
}
-
+requeue:
spin_lock(&glob->lru_lock);
if (list_empty(&bo->ddestroy)) {
void *sync_obj = bo->sync_obj;
INIT_LIST_HEAD(&fbo->lru);
INIT_LIST_HEAD(&fbo->swap);
fbo->vm_node = NULL;
+ atomic_set(&fbo->cpu_writers, 0);
fbo->sync_obj = driver->sync_obj_ref(bo->sync_obj);
kref_init(&fbo->list_kref);
spinlock_t lock;
bool fill_lock;
struct list_head list;
- int gfp_flags;
+ gfp_t gfp_flags;
unsigned npages;
char *name;
unsigned long nfrees;
* This function is reentrant if caller updates count depending on number of
* pages returned in pages array.
*/
-static int ttm_alloc_new_pages(struct list_head *pages, int gfp_flags,
+static int ttm_alloc_new_pages(struct list_head *pages, gfp_t gfp_flags,
int ttm_flags, enum ttm_caching_state cstate, unsigned count)
{
struct page **caching_array;
{
struct ttm_page_pool *pool = ttm_get_pool(flags, cstate);
struct page *p = NULL;
- int gfp_flags = GFP_USER;
+ gfp_t gfp_flags = GFP_USER;
int r;
/* set zero flag for page allocation if required */
return 0;
}
-void ttm_page_alloc_fini()
+void ttm_page_alloc_fini(void)
{
int i;
{0, 0, 0}
};
-static char *vmw_devname = "vmwgfx";
+static int enable_fbdev;
static int vmw_probe(struct pci_dev *, const struct pci_device_id *);
static void vmw_master_init(struct vmw_master *);
static int vmwgfx_pm_notifier(struct notifier_block *nb, unsigned long val,
void *ptr);
+MODULE_PARM_DESC(enable_fbdev, "Enable vmwgfx fbdev");
+module_param_named(enable_fbdev, enable_fbdev, int, 0600);
+
static void vmw_print_capabilities(uint32_t capabilities)
{
DRM_INFO("Capabilities:\n");
{
int ret;
- vmw_kms_save_vga(dev_priv);
-
ret = vmw_fifo_init(dev_priv, &dev_priv->fifo);
if (unlikely(ret != 0)) {
DRM_ERROR("Unable to initialize FIFO.\n");
static void vmw_release_device(struct vmw_private *dev_priv)
{
vmw_fifo_release(dev_priv, &dev_priv->fifo);
- vmw_kms_restore_vga(dev_priv);
}
+int vmw_3d_resource_inc(struct vmw_private *dev_priv)
+{
+ int ret = 0;
+
+ mutex_lock(&dev_priv->release_mutex);
+ if (unlikely(dev_priv->num_3d_resources++ == 0)) {
+ ret = vmw_request_device(dev_priv);
+ if (unlikely(ret != 0))
+ --dev_priv->num_3d_resources;
+ }
+ mutex_unlock(&dev_priv->release_mutex);
+ return ret;
+}
+
+
+void vmw_3d_resource_dec(struct vmw_private *dev_priv)
+{
+ int32_t n3d;
+
+ mutex_lock(&dev_priv->release_mutex);
+ if (unlikely(--dev_priv->num_3d_resources == 0))
+ vmw_release_device(dev_priv);
+ n3d = (int32_t) dev_priv->num_3d_resources;
+ mutex_unlock(&dev_priv->release_mutex);
+
+ BUG_ON(n3d < 0);
+}
static int vmw_driver_load(struct drm_device *dev, unsigned long chipset)
{
dev_priv->last_read_sequence = (uint32_t) -100;
mutex_init(&dev_priv->hw_mutex);
mutex_init(&dev_priv->cmdbuf_mutex);
+ mutex_init(&dev_priv->release_mutex);
rwlock_init(&dev_priv->resource_lock);
idr_init(&dev_priv->context_idr);
idr_init(&dev_priv->surface_idr);
dev_priv->vram_start = pci_resource_start(dev->pdev, 1);
dev_priv->mmio_start = pci_resource_start(dev->pdev, 2);
+ dev_priv->enable_fb = enable_fbdev;
+
mutex_lock(&dev_priv->hw_mutex);
vmw_write(dev_priv, SVGA_REG_ID, SVGA_ID_2);
dev->dev_private = dev_priv;
- if (!dev->devname)
- dev->devname = vmw_devname;
-
- if (dev_priv->capabilities & SVGA_CAP_IRQMASK) {
- ret = drm_irq_install(dev);
- if (unlikely(ret != 0)) {
- DRM_ERROR("Failed installing irq: %d\n", ret);
- goto out_no_irq;
- }
- }
-
ret = pci_request_regions(dev->pdev, "vmwgfx probe");
dev_priv->stealth = (ret != 0);
if (dev_priv->stealth) {
goto out_no_device;
}
}
- ret = vmw_request_device(dev_priv);
+ ret = vmw_kms_init(dev_priv);
if (unlikely(ret != 0))
- goto out_no_device;
- vmw_kms_init(dev_priv);
+ goto out_no_kms;
vmw_overlay_init(dev_priv);
- vmw_fb_init(dev_priv);
+ if (dev_priv->enable_fb) {
+ ret = vmw_3d_resource_inc(dev_priv);
+ if (unlikely(ret != 0))
+ goto out_no_fifo;
+ vmw_kms_save_vga(dev_priv);
+ vmw_fb_init(dev_priv);
+ DRM_INFO("%s", vmw_fifo_have_3d(dev_priv) ?
+ "Detected device 3D availability.\n" :
+ "Detected no device 3D availability.\n");
+ } else {
+ DRM_INFO("Delayed 3D detection since we're not "
+ "running the device in SVGA mode yet.\n");
+ }
+
+ if (dev_priv->capabilities & SVGA_CAP_IRQMASK) {
+ ret = drm_irq_install(dev);
+ if (unlikely(ret != 0)) {
+ DRM_ERROR("Failed installing irq: %d\n", ret);
+ goto out_no_irq;
+ }
+ }
dev_priv->pm_nb.notifier_call = vmwgfx_pm_notifier;
register_pm_notifier(&dev_priv->pm_nb);
- DRM_INFO("%s", vmw_fifo_have_3d(dev_priv) ? "Have 3D\n" : "No 3D\n");
-
return 0;
-out_no_device:
- if (dev_priv->capabilities & SVGA_CAP_IRQMASK)
- drm_irq_uninstall(dev_priv->dev);
- if (dev->devname == vmw_devname)
- dev->devname = NULL;
out_no_irq:
+ if (dev_priv->enable_fb) {
+ vmw_fb_close(dev_priv);
+ vmw_kms_restore_vga(dev_priv);
+ vmw_3d_resource_dec(dev_priv);
+ }
+out_no_fifo:
+ vmw_overlay_close(dev_priv);
+ vmw_kms_close(dev_priv);
+out_no_kms:
+ if (dev_priv->stealth)
+ pci_release_region(dev->pdev, 2);
+ else
+ pci_release_regions(dev->pdev);
+out_no_device:
ttm_object_device_release(&dev_priv->tdev);
out_err4:
iounmap(dev_priv->mmio_virt);
unregister_pm_notifier(&dev_priv->pm_nb);
- vmw_fb_close(dev_priv);
+ if (dev_priv->capabilities & SVGA_CAP_IRQMASK)
+ drm_irq_uninstall(dev_priv->dev);
+ if (dev_priv->enable_fb) {
+ vmw_fb_close(dev_priv);
+ vmw_kms_restore_vga(dev_priv);
+ vmw_3d_resource_dec(dev_priv);
+ }
vmw_kms_close(dev_priv);
vmw_overlay_close(dev_priv);
- vmw_release_device(dev_priv);
if (dev_priv->stealth)
pci_release_region(dev->pdev, 2);
else
pci_release_regions(dev->pdev);
- if (dev_priv->capabilities & SVGA_CAP_IRQMASK)
- drm_irq_uninstall(dev_priv->dev);
- if (dev->devname == vmw_devname)
- dev->devname = NULL;
ttm_object_device_release(&dev_priv->tdev);
iounmap(dev_priv->mmio_virt);
drm_mtrr_del(dev_priv->mmio_mtrr, dev_priv->mmio_start,
struct drm_ioctl_desc *ioctl =
&vmw_ioctls[nr - DRM_COMMAND_BASE];
- if (unlikely(ioctl->cmd != cmd)) {
+ if (unlikely(ioctl->cmd_drv != cmd)) {
DRM_ERROR("Invalid command format, ioctl %d\n",
nr - DRM_COMMAND_BASE);
return -EINVAL;
struct vmw_master *vmaster = vmw_master(file_priv->master);
int ret = 0;
+ if (!dev_priv->enable_fb) {
+ ret = vmw_3d_resource_inc(dev_priv);
+ if (unlikely(ret != 0))
+ return ret;
+ vmw_kms_save_vga(dev_priv);
+ mutex_lock(&dev_priv->hw_mutex);
+ vmw_write(dev_priv, SVGA_REG_TRACES, 0);
+ mutex_unlock(&dev_priv->hw_mutex);
+ }
+
if (active) {
BUG_ON(active != &dev_priv->fbdev_master);
ret = ttm_vt_lock(&active->lock, false, vmw_fp->tfile);
return 0;
out_no_active_lock:
- vmw_release_device(dev_priv);
+ if (!dev_priv->enable_fb) {
+ mutex_lock(&dev_priv->hw_mutex);
+ vmw_write(dev_priv, SVGA_REG_TRACES, 1);
+ mutex_unlock(&dev_priv->hw_mutex);
+ vmw_kms_restore_vga(dev_priv);
+ vmw_3d_resource_dec(dev_priv);
+ }
return ret;
}
ttm_lock_set_kill(&vmaster->lock, true, SIGTERM);
+ if (!dev_priv->enable_fb) {
+ ret = ttm_bo_evict_mm(&dev_priv->bdev, TTM_PL_VRAM);
+ if (unlikely(ret != 0))
+ DRM_ERROR("Unable to clean VRAM on master drop.\n");
+ mutex_lock(&dev_priv->hw_mutex);
+ vmw_write(dev_priv, SVGA_REG_TRACES, 1);
+ mutex_unlock(&dev_priv->hw_mutex);
+ vmw_kms_restore_vga(dev_priv);
+ vmw_3d_resource_dec(dev_priv);
+ }
+
dev_priv->active_master = &dev_priv->fbdev_master;
ttm_lock_set_kill(&dev_priv->fbdev_master.lock, false, SIGTERM);
ttm_vt_unlock(&dev_priv->fbdev_master.lock);
- vmw_fb_on(dev_priv);
+ if (dev_priv->enable_fb)
+ vmw_fb_on(dev_priv);
}
.irq_postinstall = vmw_irq_postinstall,
.irq_uninstall = vmw_irq_uninstall,
.irq_handler = vmw_irq_handler,
+ .get_vblank_counter = vmw_get_vblank_counter,
.reclaim_buffers_locked = NULL,
.get_map_ofs = drm_core_get_map_ofs,
.get_reg_ofs = drm_core_get_reg_ofs,
bool stealth;
bool is_opened;
+ bool enable_fb;
/**
* Master management.
struct vmw_master *active_master;
struct vmw_master fbdev_master;
struct notifier_block pm_nb;
+
+ struct mutex release_mutex;
+ uint32_t num_3d_resources;
};
static inline struct vmw_private *vmw_priv(struct drm_device *dev)
return val;
}
+int vmw_3d_resource_inc(struct vmw_private *dev_priv);
+void vmw_3d_resource_dec(struct vmw_private *dev_priv);
+
/**
* GMR utilities - vmwgfx_gmr.c
*/
unsigned bbp, unsigned depth);
int vmw_kms_update_layout_ioctl(struct drm_device *dev, void *data,
struct drm_file *file_priv);
+u32 vmw_get_vblank_counter(struct drm_device *dev, int crtc);
/**
* Overlay control - vmwgfx_overlay.c
if (unlikely(ret != 0))
goto err_unlock;
+ if (bo->mem.mem_type == TTM_PL_VRAM &&
+ bo->mem.mm_node->start < bo->num_pages)
+ (void) ttm_bo_validate(bo, &vmw_sys_placement, false,
+ false, false);
+
ret = ttm_bo_validate(bo, &ne_placement, false, false, false);
/* Could probably bug on */
mutex_lock(&dev_priv->hw_mutex);
dev_priv->enable_state = vmw_read(dev_priv, SVGA_REG_ENABLE);
dev_priv->config_done_state = vmw_read(dev_priv, SVGA_REG_CONFIG_DONE);
+ dev_priv->traces_state = vmw_read(dev_priv, SVGA_REG_TRACES);
vmw_write(dev_priv, SVGA_REG_ENABLE, 1);
min = 4;
dev_priv->config_done_state);
vmw_write(dev_priv, SVGA_REG_ENABLE,
dev_priv->enable_state);
+ vmw_write(dev_priv, SVGA_REG_TRACES,
+ dev_priv->traces_state);
mutex_unlock(&dev_priv->hw_mutex);
vmw_fence_queue_takedown(&fifo->fence_queue);
save->width = vmw_read(vmw_priv, SVGA_REG_DISPLAY_WIDTH);
save->height = vmw_read(vmw_priv, SVGA_REG_DISPLAY_HEIGHT);
vmw_write(vmw_priv, SVGA_REG_DISPLAY_ID, SVGA_ID_INVALID);
+ if (i == 0 && vmw_priv->num_displays == 1 &&
+ save->width == 0 && save->height == 0) {
+
+ /*
+ * It should be fairly safe to assume that these
+ * values are uninitialized.
+ */
+
+ save->width = vmw_priv->vga_width - save->pos_x;
+ save->height = vmw_priv->vga_height - save->pos_y;
+ }
}
+
return 0;
}
ttm_read_unlock(&vmaster->lock);
return ret;
}
+
+u32 vmw_get_vblank_counter(struct drm_device *dev, int crtc)
+{
+ return 0;
+}
#include "vmwgfx_kms.h"
+#define VMWGFX_LDU_NUM_DU 8
+
#define vmw_crtc_to_ldu(x) \
container_of(x, struct vmw_legacy_display_unit, base.crtc)
#define vmw_encoder_to_ldu(x) \
}
static enum drm_connector_status
- vmw_ldu_connector_detect(struct drm_connector *connector)
+ vmw_ldu_connector_detect(struct drm_connector *connector,
+ bool force)
{
if (vmw_connector_to_ldu(connector)->pref_active)
return connector_status_connected;
drm_connector_init(dev, connector, &vmw_legacy_connector_funcs,
DRM_MODE_CONNECTOR_LVDS);
- connector->status = vmw_ldu_connector_detect(connector);
+ connector->status = vmw_ldu_connector_detect(connector, true);
drm_encoder_init(dev, encoder, &vmw_legacy_encoder_funcs,
DRM_MODE_ENCODER_LVDS);
int vmw_kms_init_legacy_display_system(struct vmw_private *dev_priv)
{
+ struct drm_device *dev = dev_priv->dev;
+ int i;
+ int ret;
+
if (dev_priv->ldu_priv) {
DRM_INFO("ldu system already on\n");
return -EINVAL;
drm_mode_create_dirty_info_property(dev_priv->dev);
- vmw_ldu_init(dev_priv, 0);
- /* for old hardware without multimon only enable one display */
if (dev_priv->capabilities & SVGA_CAP_MULTIMON) {
- vmw_ldu_init(dev_priv, 1);
- vmw_ldu_init(dev_priv, 2);
- vmw_ldu_init(dev_priv, 3);
- vmw_ldu_init(dev_priv, 4);
- vmw_ldu_init(dev_priv, 5);
- vmw_ldu_init(dev_priv, 6);
- vmw_ldu_init(dev_priv, 7);
+ for (i = 0; i < VMWGFX_LDU_NUM_DU; ++i)
+ vmw_ldu_init(dev_priv, i);
+ ret = drm_vblank_init(dev, VMWGFX_LDU_NUM_DU);
+ } else {
+ /* for old hardware without multimon only enable one display */
+ vmw_ldu_init(dev_priv, 0);
+ ret = drm_vblank_init(dev, 1);
}
- return 0;
+ return ret;
}
int vmw_kms_close_legacy_display_system(struct vmw_private *dev_priv)
{
+ struct drm_device *dev = dev_priv->dev;
+
+ drm_vblank_cleanup(dev);
if (!dev_priv->ldu_priv)
return -ENOSYS;
ldu->pref_height = 600;
ldu->pref_active = false;
}
- con->status = vmw_ldu_connector_detect(con);
+ con->status = vmw_ldu_connector_detect(con, true);
}
mutex_unlock(&dev->mode_config.mutex);
cmd->body.cid = cpu_to_le32(res->id);
vmw_fifo_commit(dev_priv, sizeof(*cmd));
+ vmw_3d_resource_dec(dev_priv);
}
static int vmw_context_init(struct vmw_private *dev_priv,
cmd->body.cid = cpu_to_le32(res->id);
vmw_fifo_commit(dev_priv, sizeof(*cmd));
+ (void) vmw_3d_resource_inc(dev_priv);
vmw_resource_activate(res, vmw_hw_context_destroy);
return 0;
}
cmd->body.sid = cpu_to_le32(res->id);
vmw_fifo_commit(dev_priv, sizeof(*cmd));
+ vmw_3d_resource_dec(dev_priv);
}
void vmw_surface_res_free(struct vmw_resource *res)
}
vmw_fifo_commit(dev_priv, submit_size);
+ (void) vmw_3d_resource_inc(dev_priv);
vmw_resource_activate(res, vmw_hw_surface_destroy);
return 0;
}
pr_debug("vgaarb: decoding count now is: %d\n", vga_decode_count);
}
-void __vga_set_legacy_decoding(struct pci_dev *pdev, unsigned int decodes, bool userspace)
+static void __vga_set_legacy_decoding(struct pci_dev *pdev, unsigned int decodes, bool userspace)
{
struct vga_device *vgadev;
unsigned long flags;
USB_DEVICE_ID_CANDO_MULTI_TOUCH) },
{ HID_USB_DEVICE(USB_VENDOR_ID_CANDO,
USB_DEVICE_ID_CANDO_MULTI_TOUCH_11_6) },
+ { HID_USB_DEVICE(USB_VENDOR_ID_CANDO,
+ USB_DEVICE_ID_CANDO_MULTI_TOUCH_15_6) },
{ }
};
MODULE_DEVICE_TABLE(hid, cando_devices);
{ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_ALU_WIRELESS_2009_JIS) },
{ HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_FOUNTAIN_TP_ONLY) },
{ HID_USB_DEVICE(USB_VENDOR_ID_APPLE, USB_DEVICE_ID_APPLE_GEYSER1_TP_ONLY) },
+ { HID_USB_DEVICE(USB_VENDOR_ID_ASUS, USB_DEVICE_ID_ASUS_T91MT) },
+ { HID_USB_DEVICE(USB_VENDOR_ID_ASUS, USB_DEVICE_ID_ASUSTEK_MULTITOUCH_YFO) },
{ HID_USB_DEVICE(USB_VENDOR_ID_BELKIN, USB_DEVICE_ID_FLIP_KVM) },
{ HID_USB_DEVICE(USB_VENDOR_ID_BTC, USB_DEVICE_ID_BTC_EMPREX_REMOTE) },
+ { HID_USB_DEVICE(USB_VENDOR_ID_BTC, USB_DEVICE_ID_BTC_EMPREX_REMOTE_2) },
{ HID_USB_DEVICE(USB_VENDOR_ID_CANDO, USB_DEVICE_ID_CANDO_MULTI_TOUCH) },
{ HID_USB_DEVICE(USB_VENDOR_ID_CANDO, USB_DEVICE_ID_CANDO_MULTI_TOUCH_11_6) },
+ { HID_USB_DEVICE(USB_VENDOR_ID_CANDO, USB_DEVICE_ID_CANDO_MULTI_TOUCH_15_6) },
{ HID_USB_DEVICE(USB_VENDOR_ID_CHERRY, USB_DEVICE_ID_CHERRY_CYMOTION) },
{ HID_USB_DEVICE(USB_VENDOR_ID_CHERRY, USB_DEVICE_ID_CHERRY_CYMOTION_SOLAR) },
{ HID_USB_DEVICE(USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_TACTICAL_PAD) },
{ HID_USB_DEVICE(USB_VENDOR_ID_AIPTEK, USB_DEVICE_ID_AIPTEK_24) },
{ HID_USB_DEVICE(USB_VENDOR_ID_AIRCABLE, USB_DEVICE_ID_AIRCABLE1) },
{ HID_USB_DEVICE(USB_VENDOR_ID_ALCOR, USB_DEVICE_ID_ALCOR_USBRS232) },
- { HID_USB_DEVICE(USB_VENDOR_ID_ASUS, USB_DEVICE_ID_ASUS_T91MT)},
{ HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_LCM)},
{ HID_USB_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_LCM2)},
{ HID_USB_DEVICE(USB_VENDOR_ID_AVERMEDIA, USB_DEVICE_ID_AVER_FM_MR800) },
#define USB_VENDOR_ID_ASUS 0x0486
#define USB_DEVICE_ID_ASUS_T91MT 0x0185
+#define USB_DEVICE_ID_ASUSTEK_MULTITOUCH_YFO 0x0186
#define USB_VENDOR_ID_ASUSTEK 0x0b05
#define USB_DEVICE_ID_ASUSTEK_LCM 0x1726
#define USB_VENDOR_ID_BTC 0x046e
#define USB_DEVICE_ID_BTC_EMPREX_REMOTE 0x5578
+#define USB_DEVICE_ID_BTC_EMPREX_REMOTE_2 0x5577
#define USB_VENDOR_ID_CANDO 0x2087
#define USB_DEVICE_ID_CANDO_MULTI_TOUCH 0x0a01
#define USB_DEVICE_ID_CANDO_MULTI_TOUCH_11_6 0x0b03
+#define USB_DEVICE_ID_CANDO_MULTI_TOUCH_15_6 0x0f01
#define USB_VENDOR_ID_CH 0x068e
#define USB_DEVICE_ID_CH_PRO_PEDALS 0x00f2
#define USB_VENDOR_ID_CHICONY 0x04f2
#define USB_DEVICE_ID_CHICONY_TACTICAL_PAD 0x0418
+#define USB_DEVICE_ID_CHICONY_MULTI_TOUCH 0xb19d
#define USB_VENDOR_ID_CIDC 0x1677
#define USB_VENDOR_ID_TURBOX 0x062a
#define USB_DEVICE_ID_TURBOX_KEYBOARD 0x0201
+#define USB_DEVICE_ID_TURBOX_TOUCHSCREEN_MOSART 0x7100
#define USB_VENDOR_ID_TWINHAN 0x6253
#define USB_DEVICE_ID_TWINHAN_IR_REMOTE 0x0100
#define USB_VENDOR_ID_UCLOGIC 0x5543
#define USB_DEVICE_ID_UCLOGIC_TABLET_PF1209 0x0042
#define USB_DEVICE_ID_UCLOGIC_TABLET_WP4030U 0x0003
+#define USB_DEVICE_ID_UCLOGIC_TABLET_KNA5 0x6001
#define USB_VENDOR_ID_VERNIER 0x08f7
#define USB_DEVICE_ID_VERNIER_LABPRO 0x0001
static const struct hid_device_id mosart_devices[] = {
{ HID_USB_DEVICE(USB_VENDOR_ID_ASUS, USB_DEVICE_ID_ASUS_T91MT) },
+ { HID_USB_DEVICE(USB_VENDOR_ID_ASUS, USB_DEVICE_ID_ASUSTEK_MULTITOUCH_YFO) },
{ }
};
MODULE_DEVICE_TABLE(hid, mosart_devices);
static const struct hid_device_id ts_devices[] = {
{ HID_USB_DEVICE(USB_VENDOR_ID_TOPSEED, USB_DEVICE_ID_TOPSEED_CYBERLINK) },
{ HID_USB_DEVICE(USB_VENDOR_ID_BTC, USB_DEVICE_ID_BTC_EMPREX_REMOTE) },
+ { HID_USB_DEVICE(USB_VENDOR_ID_BTC, USB_DEVICE_ID_BTC_EMPREX_REMOTE_2) },
{ HID_USB_DEVICE(USB_VENDOR_ID_TOPSEED2, USB_DEVICE_ID_TOPSEED2_RF_COMBO) },
{ }
};
int ret = 0;
mutex_lock(&minors_lock);
+
+ if (!hidraw_table[minor]) {
+ ret = -ENODEV;
+ goto out;
+ }
+
dev = hidraw_table[minor]->hid;
if (!dev->hid_output_raw_report) {
mutex_lock(&minors_lock);
dev = hidraw_table[minor];
+ if (!dev) {
+ ret = -ENODEV;
+ goto out;
+ }
switch (cmd) {
case HIDIOCGRDESCSIZE:
ret = -ENOTTY;
}
+out:
mutex_unlock(&minors_lock);
return ret;
}
}
} else {
int skipped_report_id = 0;
+ int report_id = buf[0];
if (buf[0] == 0x0) {
/* Don't send the Report ID */
buf++;
ret = usb_control_msg(dev, usb_sndctrlpipe(dev, 0),
HID_REQ_SET_REPORT,
USB_DIR_OUT | USB_TYPE_CLASS | USB_RECIP_INTERFACE,
- ((report_type + 1) << 8) | *buf,
+ ((report_type + 1) << 8) | report_id,
interface->desc.bInterfaceNumber, buf, count,
USB_CTRL_SET_TIMEOUT);
/* count also the report id, if this was a numbered report. */
{ }
};
+struct usb_interface *usbhid_find_interface(int minor)
+{
+ return usb_find_interface(&hid_driver, minor);
+}
+
static struct hid_driver hid_usb_driver = {
.name = "generic-usb",
.id_table = hid_usb_table,
{ USB_VENDOR_ID_AASHIMA, USB_DEVICE_ID_AASHIMA_PREDATOR, HID_QUIRK_BADPAD },
{ USB_VENDOR_ID_ALPS, USB_DEVICE_ID_IBM_GAMEPAD, HID_QUIRK_BADPAD },
{ USB_VENDOR_ID_CHIC, USB_DEVICE_ID_CHIC_GAMEPAD, HID_QUIRK_BADPAD },
+ { USB_VENDOR_ID_DWAV, USB_DEVICE_ID_EGALAX_TOUCHCONTROLLER, HID_QUIRK_MULTI_INPUT | HID_QUIRK_NOGET },
{ USB_VENDOR_ID_DWAV, USB_DEVICE_ID_DWAV_EGALAX_MULTITOUCH, HID_QUIRK_MULTI_INPUT },
{ USB_VENDOR_ID_MOJO, USB_DEVICE_ID_RETRO_ADAPTER, HID_QUIRK_MULTI_INPUT },
+ { USB_VENDOR_ID_TURBOX, USB_DEVICE_ID_TURBOX_TOUCHSCREEN_MOSART, HID_QUIRK_MULTI_INPUT },
{ USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_DRIVING, HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT },
{ USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_FLYING, HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT },
{ USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_FIGHTING, HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT },
{ USB_VENDOR_ID_TURBOX, USB_DEVICE_ID_TURBOX_KEYBOARD, HID_QUIRK_NOGET },
{ USB_VENDOR_ID_UCLOGIC, USB_DEVICE_ID_UCLOGIC_TABLET_PF1209, HID_QUIRK_MULTI_INPUT },
{ USB_VENDOR_ID_UCLOGIC, USB_DEVICE_ID_UCLOGIC_TABLET_WP4030U, HID_QUIRK_MULTI_INPUT },
+ { USB_VENDOR_ID_UCLOGIC, USB_DEVICE_ID_UCLOGIC_TABLET_KNA5, HID_QUIRK_MULTI_INPUT },
{ USB_VENDOR_ID_WISEGROUP, USB_DEVICE_ID_DUAL_USB_JOYPAD, HID_QUIRK_NOGET | HID_QUIRK_MULTI_INPUT | HID_QUIRK_SKIP_OUTPUT_REPORTS },
{ USB_VENDOR_ID_WISEGROUP, USB_DEVICE_ID_QUAD_USB_JOYPAD, HID_QUIRK_NOGET | HID_QUIRK_MULTI_INPUT },
{ USB_VENDOR_ID_PI_ENGINEERING, USB_DEVICE_ID_PI_ENGINEERING_VEC_USB_FOOTPEDAL, HID_QUIRK_HIDINPUT_FORCE },
+ { USB_VENDOR_ID_CHICONY, USB_DEVICE_ID_CHICONY_MULTI_TOUCH, HID_QUIRK_MULTI_INPUT },
+
{ 0, 0 }
};
struct hiddev *hiddev;
int res;
- intf = usb_find_interface(&hiddev_driver, iminor(inode));
+ intf = usbhid_find_interface(iminor(inode));
if (!intf)
return -ENODEV;
hid = usb_get_intfdata(intf);
(struct hid_device *hid, struct hid_report *report, unsigned char dir);
int usbhid_get_power(struct hid_device *hid);
void usbhid_put_power(struct hid_device *hid);
+struct usb_interface *usbhid_find_interface(int minor);
/* iofl flags */
#define HID_CTRL_RUNNING 1
config SENSORS_PKGTEMP
tristate "Intel processor package temperature sensor"
- depends on X86 && PCI && EXPERIMENTAL
+ depends on X86 && EXPERIMENTAL
help
If you say yes here you get support for the package level temperature
sensor inside your CPU. Check documentation/driver for details.
int chip_type;
char valid; /* !=0 if following fields are valid */
unsigned long last_updated; /* In jiffies */
- unsigned int update_rate; /* In milliseconds */
+ unsigned int update_interval; /* In milliseconds */
/* The chan_select_table contains the possible configurations for
* auto fan control.
*/
static SENSOR_DEVICE_ATTR(temp3_fault, S_IRUGO, show_alarm, NULL, 13);
static SENSOR_DEVICE_ATTR(temp1_crit_alarm, S_IRUGO, show_alarm, NULL, 14);
-/* Update Rate */
-static const unsigned int update_rates[] = {
+/* Update Interval */
+static const unsigned int update_intervals[] = {
16000, 8000, 4000, 2000, 1000, 500, 250, 125,
};
-static ssize_t show_update_rate(struct device *dev,
- struct device_attribute *attr, char *buf)
+static ssize_t show_update_interval(struct device *dev,
+ struct device_attribute *attr, char *buf)
{
struct i2c_client *client = to_i2c_client(dev);
struct adm1031_data *data = i2c_get_clientdata(client);
- return sprintf(buf, "%u\n", data->update_rate);
+ return sprintf(buf, "%u\n", data->update_interval);
}
-static ssize_t set_update_rate(struct device *dev,
- struct device_attribute *attr,
- const char *buf, size_t count)
+static ssize_t set_update_interval(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
{
struct i2c_client *client = to_i2c_client(dev);
struct adm1031_data *data = i2c_get_clientdata(client);
if (err)
return err;
- /* find the nearest update rate from the table */
- for (i = 0; i < ARRAY_SIZE(update_rates) - 1; i++) {
- if (val >= update_rates[i])
+ /*
+ * Find the nearest update interval from the table.
+ * Use it to determine the matching update rate.
+ */
+ for (i = 0; i < ARRAY_SIZE(update_intervals) - 1; i++) {
+ if (val >= update_intervals[i])
break;
}
- /* if not found, we point to the last entry (lowest update rate) */
+ /* if not found, we point to the last entry (lowest update interval) */
/* set the new update rate while preserving other settings */
reg = adm1031_read_value(client, ADM1031_REG_FAN_FILTER);
adm1031_write_value(client, ADM1031_REG_FAN_FILTER, reg);
mutex_lock(&data->update_lock);
- data->update_rate = update_rates[i];
+ data->update_interval = update_intervals[i];
mutex_unlock(&data->update_lock);
return count;
}
-static DEVICE_ATTR(update_rate, S_IRUGO | S_IWUSR, show_update_rate,
- set_update_rate);
+static DEVICE_ATTR(update_interval, S_IRUGO | S_IWUSR, show_update_interval,
+ set_update_interval);
static struct attribute *adm1031_attributes[] = {
&sensor_dev_attr_fan1_input.dev_attr.attr,
&sensor_dev_attr_auto_fan1_min_pwm.dev_attr.attr,
- &dev_attr_update_rate.attr,
+ &dev_attr_update_interval.attr,
&dev_attr_alarms.attr,
NULL
mask = ADM1031_UPDATE_RATE_MASK;
read_val = adm1031_read_value(client, ADM1031_REG_FAN_FILTER);
i = (read_val & mask) >> ADM1031_UPDATE_RATE_SHIFT;
- data->update_rate = update_rates[i];
+ /* Save it as update interval */
+ data->update_interval = update_intervals[i];
}
static struct adm1031_data *adm1031_update_device(struct device *dev)
mutex_lock(&data->update_lock);
- next_update = data->last_updated + msecs_to_jiffies(data->update_rate);
+ next_update = data->last_updated
+ + msecs_to_jiffies(data->update_interval);
if (time_after(jiffies, next_update) || !data->valid) {
dev_dbg(&client->dev, "Starting adm1031 update\n");
#include <linux/pci.h>
#include <asm/msr.h>
#include <asm/processor.h>
+#include <asm/smp.h>
#define DRVNAME "coretemp"
int err;
struct platform_device *pdev;
struct pdev_entry *pdev_entry;
-#ifdef CONFIG_SMP
struct cpuinfo_x86 *c = &cpu_data(cpu);
-#endif
+
+ /*
+ * CPUID.06H.EAX[0] indicates whether the CPU has thermal
+ * sensors. We check this bit only, all the early CPUs
+ * without thermal sensors will be filtered out.
+ */
+ if (!cpu_has(c, X86_FEATURE_DTS)) {
+ printk(KERN_INFO DRVNAME ": CPU (model=0x%x)"
+ " has no thermal sensor.\n", c->x86_model);
+ return 0;
+ }
mutex_lock(&pdev_list_mutex);
static void coretemp_device_remove(unsigned int cpu)
{
- struct pdev_entry *p, *n;
+ struct pdev_entry *p;
+ unsigned int i;
+
mutex_lock(&pdev_list_mutex);
- list_for_each_entry_safe(p, n, &pdev_list, list) {
- if (p->cpu == cpu) {
- platform_device_unregister(p->pdev);
- list_del(&p->list);
- kfree(p);
- }
+ list_for_each_entry(p, &pdev_list, list) {
+ if (p->cpu != cpu)
+ continue;
+
+ platform_device_unregister(p->pdev);
+ list_del(&p->list);
+ mutex_unlock(&pdev_list_mutex);
+ kfree(p);
+ for_each_cpu(i, cpu_sibling_mask(cpu))
+ if (i != cpu && !coretemp_device_add(i))
+ break;
+ return;
}
mutex_unlock(&pdev_list_mutex);
}
if (err)
goto exit;
- for_each_online_cpu(i) {
- struct cpuinfo_x86 *c = &cpu_data(i);
- /*
- * CPUID.06H.EAX[0] indicates whether the CPU has thermal
- * sensors. We check this bit only, all the early CPUs
- * without thermal sensors will be filtered out.
- */
- if (c->cpuid_level >= 6 && (cpuid_eax(0x06) & 0x01))
- coretemp_device_add(i);
- else {
- printk(KERN_INFO DRVNAME ": CPU (model=0x%x)"
- " has no thermal sensor.\n", c->x86_model);
- }
- }
+ for_each_online_cpu(i)
+ coretemp_device_add(i);
+
+#ifndef CONFIG_HOTPLUG_CPU
if (list_empty(&pdev_list)) {
err = -ENODEV;
goto exit_driver_unreg;
}
+#endif
register_hotcpu_notifier(&coretemp_cpu_notifier);
return 0;
-exit_driver_unreg:
#ifndef CONFIG_HOTPLUG_CPU
+exit_driver_unreg:
platform_driver_unregister(&coretemp_driver);
#endif
exit:
res = sysfs_create_group(&client->dev.kobj, &m_thermal_gr);
if (res) {
dev_warn(&client->dev, "create group failed\n");
- hwmon_device_unregister(data->hwmon_dev);
goto thermal_error1;
}
data->hwmon_dev = hwmon_device_register(&client->dev);
/* Super-I/O Function prototypes */
static inline int superio_inb(int base, int reg);
static inline int superio_inw(int base, int reg);
-static inline void superio_enter(int base);
+static inline int superio_enter(int base);
static inline void superio_select(int base, int ld);
static inline void superio_exit(int base);
return val;
}
-static inline void superio_enter(int base)
+static inline int superio_enter(int base)
{
+ /* Don't step on other drivers' I/O space by accident */
+ if (!request_muxed_region(base, 2, DRVNAME)) {
+ printk(KERN_ERR DRVNAME ": I/O address 0x%04x already in use\n",
+ base);
+ return -EBUSY;
+ }
+
/* according to the datasheet the key must be send twice! */
outb(SIO_UNLOCK_KEY, base);
outb(SIO_UNLOCK_KEY, base);
+
+ return 0;
}
static inline void superio_select(int base, int ld)
static inline void superio_exit(int base)
{
outb(SIO_LOCK_KEY, base);
+ release_region(base, 2);
}
static inline int fan_from_reg(u16 reg)
static int __init f71882fg_find(int sioaddr, unsigned short *address,
struct f71882fg_sio_data *sio_data)
{
- int err = -ENODEV;
u16 devid;
-
- /* Don't step on other drivers' I/O space by accident */
- if (!request_region(sioaddr, 2, DRVNAME)) {
- printk(KERN_ERR DRVNAME ": I/O address 0x%04x already in use\n",
- (int)sioaddr);
- return -EBUSY;
- }
-
- superio_enter(sioaddr);
+ int err = superio_enter(sioaddr);
+ if (err)
+ return err;
devid = superio_inw(sioaddr, SIO_REG_MANID);
if (devid != SIO_FINTEK_ID) {
pr_debug(DRVNAME ": Not a Fintek device\n");
+ err = -ENODEV;
goto exit;
}
default:
printk(KERN_INFO DRVNAME ": Unsupported Fintek device: %04x\n",
(unsigned int)devid);
+ err = -ENODEV;
goto exit;
}
if (!(superio_inb(sioaddr, SIO_REG_ENABLE) & 0x01)) {
printk(KERN_WARNING DRVNAME ": Device not activated\n");
+ err = -ENODEV;
goto exit;
}
*address = superio_inw(sioaddr, SIO_REG_ADDR);
if (*address == 0) {
printk(KERN_WARNING DRVNAME ": Base address not set\n");
+ err = -ENODEV;
goto exit;
}
*address &= ~(REGION_LENGTH - 1); /* Ignore 3 LSB */
(int)superio_inb(sioaddr, SIO_REG_DEVREV));
exit:
superio_exit(sioaddr);
- release_region(sioaddr, 2);
return err;
}
#define F75375_REG_PWM2_DROP_DUTY 0x6C
#define FAN_CTRL_LINEAR(nr) (4 + nr)
-#define FAN_CTRL_MODE(nr) (5 + ((nr) * 2))
+#define FAN_CTRL_MODE(nr) (4 + ((nr) * 2))
/*
* Data structures and manipulation thereof
return -EINVAL;
fanmode = f75375_read8(client, F75375_REG_FAN_TIMER);
- fanmode = ~(3 << FAN_CTRL_MODE(nr));
+ fanmode &= ~(3 << FAN_CTRL_MODE(nr));
switch (val) {
case 0: /* Full speed */
mutex_lock(&data->update_lock);
conf = f75375_read8(client, F75375_REG_CONFIG1);
- conf = ~(1 << FAN_CTRL_LINEAR(nr));
+ conf &= ~(1 << FAN_CTRL_LINEAR(nr));
if (val == 0)
conf |= (1 << FAN_CTRL_LINEAR(nr)) ;
AXIS_DMI_MATCH("HPB442x", "HP ProBook 442", xy_rotated_left),
AXIS_DMI_MATCH("HPB452x", "HP ProBook 452", y_inverted),
AXIS_DMI_MATCH("HPB522x", "HP ProBook 522", xy_swap),
+ AXIS_DMI_MATCH("HPB532x", "HP ProBook 532", y_inverted),
+ AXIS_DMI_MATCH("Mini5102", "HP Mini 5102", xy_rotated_left_usd),
{ NULL, }
/* Laptop models without axis info (yet):
* "NC6910" "HP Compaq 6910"
wake_up_interruptible(&lis3_dev.misc_wait);
kill_fasync(&lis3_dev.async_queue, SIGIO, POLL_IN);
out:
- if (lis3_dev.whoami == WAI_8B && lis3_dev.idev &&
+ if (lis3_dev.pdata && lis3_dev.whoami == WAI_8B && lis3_dev.idev &&
lis3_dev.idev->input->users)
return IRQ_WAKE_THREAD;
return IRQ_HANDLED;
* io-apic is not configurable (and generates a warning) but I keep it
* in case of support for other hardware.
*/
- if (dev->whoami == WAI_8B)
+ if (dev->pdata && dev->whoami == WAI_8B)
thread_fn = lis302dl_interrupt_thread1_8b;
else
thread_fn = NULL;
{
struct lis3lv02d *lis3 = i2c_get_clientdata(client);
- if (!lis3->pdata->wakeup_flags)
+ if (!lis3->pdata || !lis3->pdata->wakeup_flags)
lis3lv02d_poweroff(lis3);
return 0;
}
{
struct lis3lv02d *lis3 = i2c_get_clientdata(client);
- if (!lis3->pdata->wakeup_flags)
+ if (!lis3->pdata || !lis3->pdata->wakeup_flags)
lis3lv02d_poweron(lis3);
return 0;
}
{
struct lis3lv02d *lis3 = spi_get_drvdata(spi);
- if (!lis3->pdata->wakeup_flags)
+ if (!lis3->pdata || !lis3->pdata->wakeup_flags)
lis3lv02d_poweroff(&lis3_dev);
return 0;
{
struct lis3lv02d *lis3 = spi_get_drvdata(spi);
- if (!lis3->pdata->wakeup_flags)
+ if (!lis3->pdata || !lis3->pdata->wakeup_flags)
lis3lv02d_poweron(lis3);
return 0;
struct lm95241_data {
struct device *hwmon_dev;
struct mutex update_lock;
- unsigned long last_updated, rate; /* in jiffies */
+ unsigned long last_updated, interval; /* in jiffies */
char valid; /* zero until following fields are valid */
/* registers values */
u8 local_h, local_l; /* local */
show_temp(remote1);
show_temp(remote2);
-static ssize_t show_rate(struct device *dev, struct device_attribute *attr,
+static ssize_t show_interval(struct device *dev, struct device_attribute *attr,
char *buf)
{
struct lm95241_data *data = lm95241_update_device(dev);
- snprintf(buf, PAGE_SIZE - 1, "%lu\n", 1000 * data->rate / HZ);
+ snprintf(buf, PAGE_SIZE - 1, "%lu\n", 1000 * data->interval / HZ);
return strlen(buf);
}
-static ssize_t set_rate(struct device *dev, struct device_attribute *attr,
+static ssize_t set_interval(struct device *dev, struct device_attribute *attr,
const char *buf, size_t count)
{
struct i2c_client *client = to_i2c_client(dev);
struct lm95241_data *data = i2c_get_clientdata(client);
- strict_strtol(buf, 10, &data->rate);
- data->rate = data->rate * HZ / 1000;
+ strict_strtol(buf, 10, &data->interval);
+ data->interval = data->interval * HZ / 1000;
return count;
}
static DEVICE_ATTR(temp3_min, S_IWUSR | S_IRUGO, show_min2, set_min2);
static DEVICE_ATTR(temp2_max, S_IWUSR | S_IRUGO, show_max1, set_max1);
static DEVICE_ATTR(temp3_max, S_IWUSR | S_IRUGO, show_max2, set_max2);
-static DEVICE_ATTR(rate, S_IWUSR | S_IRUGO, show_rate, set_rate);
+static DEVICE_ATTR(update_interval, S_IWUSR | S_IRUGO, show_interval,
+ set_interval);
static struct attribute *lm95241_attributes[] = {
&dev_attr_temp1_input.attr,
&dev_attr_temp3_min.attr,
&dev_attr_temp2_max.attr,
&dev_attr_temp3_max.attr,
- &dev_attr_rate.attr,
+ &dev_attr_update_interval.attr,
NULL
};
{
struct lm95241_data *data = i2c_get_clientdata(client);
- data->rate = HZ; /* 1 sec default */
+ data->interval = HZ; /* 1 sec default */
data->valid = 0;
data->config = CFG_CR0076;
data->model = 0;
mutex_lock(&data->update_lock);
- if (time_after(jiffies, data->last_updated + data->rate) ||
+ if (time_after(jiffies, data->last_updated + data->interval) ||
!data->valid) {
dev_dbg(&client->dev, "Updating lm95241 data.\n");
data->local_h =
#include <linux/list.h>
#include <linux/platform_device.h>
#include <linux/cpu.h>
-#include <linux/pci.h>
#include <asm/msr.h>
#include <asm/processor.h>
err = sysfs_create_group(&pdev->dev.kobj, &pkgtemp_group);
if (err)
- goto exit_free;
+ goto exit_dev;
data->hwmon_dev = hwmon_device_register(&pdev->dev);
if (IS_ERR(data->hwmon_dev)) {
exit_class:
sysfs_remove_group(&pdev->dev.kobj, &pkgtemp_group);
+exit_dev:
+ device_remove_file(&pdev->dev, &sensor_dev_attr_temp1_max.dev_attr);
exit_free:
kfree(data);
exit:
hwmon_device_unregister(data->hwmon_dev);
sysfs_remove_group(&pdev->dev.kobj, &pkgtemp_group);
+ device_remove_file(&pdev->dev, &sensor_dev_attr_temp1_max.dev_attr);
platform_set_drvdata(pdev, NULL);
kfree(data);
return 0;
int err;
struct platform_device *pdev;
struct pdev_entry *pdev_entry;
-#ifdef CONFIG_SMP
struct cpuinfo_x86 *c = &cpu_data(cpu);
-#endif
+
+ if (!cpu_has(c, X86_FEATURE_PTS))
+ return 0;
mutex_lock(&pdev_list_mutex);
#ifdef CONFIG_HOTPLUG_CPU
static void pkgtemp_device_remove(unsigned int cpu)
{
- struct pdev_entry *p, *n;
+ struct pdev_entry *p;
unsigned int i;
int err;
mutex_lock(&pdev_list_mutex);
- list_for_each_entry_safe(p, n, &pdev_list, list) {
+ list_for_each_entry(p, &pdev_list, list) {
if (p->cpu != cpu)
continue;
platform_device_unregister(p->pdev);
list_del(&p->list);
+ mutex_unlock(&pdev_list_mutex);
kfree(p);
for_each_cpu(i, cpu_core_mask(cpu)) {
if (i != cpu) {
break;
}
}
- break;
+ return;
}
mutex_unlock(&pdev_list_mutex);
}
goto exit;
for_each_online_cpu(i) {
- struct cpuinfo_x86 *c = &cpu_data(i);
-
- if (!cpu_has(c, X86_FEATURE_PTS))
- continue;
-
err = pkgtemp_device_add(i);
if (err)
goto exit_devices_unreg;
static inline void
superio_exit(int ioreg)
{
+ outb(0xaa, ioreg);
outb(0x02, ioreg);
outb(0x02, ioreg + 1);
}
dev_dbg(&ofdev->dev, "hw routines for %s registered.\n",
cpm->adap.name);
+ /*
+ * register OF I2C devices
+ */
+ of_i2c_register_devices(&cpm->adap);
+
return 0;
out_shut:
cpm_i2c_shutdown(cpm);
INIT_COMPLETION(dev->cmd_complete);
dev->cmd_err = 0;
- /* Take I2C out of reset, configure it as master and set the
- * start bit */
- flag = DAVINCI_I2C_MDR_IRS | DAVINCI_I2C_MDR_MST | DAVINCI_I2C_MDR_STT;
+ /* Take I2C out of reset and configure it as master */
+ flag = DAVINCI_I2C_MDR_IRS | DAVINCI_I2C_MDR_MST;
/* if the slave address is ten bit address, enable XA bit */
if (msg->flags & I2C_M_TEN)
flag |= DAVINCI_I2C_MDR_XA;
if (!(msg->flags & I2C_M_RD))
flag |= DAVINCI_I2C_MDR_TRX;
- if (stop)
- flag |= DAVINCI_I2C_MDR_STP;
- if (msg->len == 0) {
+ if (msg->len == 0)
flag |= DAVINCI_I2C_MDR_RM;
- flag &= ~DAVINCI_I2C_MDR_STP;
- }
/* Enable receive or transmit interrupts */
w = davinci_i2c_read_reg(dev, DAVINCI_I2C_IMR_REG);
dev->terminate = 0;
- /* write the data into mode register */
+ /*
+ * Write mode register first as needed for correct behaviour
+ * on OMAP-L138, but don't set STT yet to avoid a race with XRDY
+ * occuring before we have loaded DXR
+ */
davinci_i2c_write_reg(dev, DAVINCI_I2C_MDR_REG, flag);
/*
* because transmit-data-ready interrupt can come before
* NACK-interrupt during sending of previous message and
* ICDXR may have wrong data
+ * It also saves us one interrupt, slightly faster
*/
if ((!(msg->flags & I2C_M_RD)) && dev->buf_len) {
davinci_i2c_write_reg(dev, DAVINCI_I2C_DXR_REG, *dev->buf++);
dev->buf_len--;
}
+ /* Set STT to begin transmit now DXR is loaded */
+ flag |= DAVINCI_I2C_MDR_STT;
+ if (stop && msg->len != 0)
+ flag |= DAVINCI_I2C_MDR_STP;
+ davinci_i2c_write_reg(dev, DAVINCI_I2C_MDR_REG, flag);
+
r = wait_for_completion_interruptible_timeout(&dev->cmd_complete,
dev->adapter.timeout);
if (r == 0) {
dev_info(&ofdev->dev, "using %s mode\n",
dev->fast_mode ? "fast (400 kHz)" : "standard (100 kHz)");
+ /* Now register all the child nodes */
+ of_i2c_register_devices(adap);
+
return 0;
error_cleanup:
static int i2c_imx_trx_complete(struct imx_i2c_struct *i2c_imx)
{
- int result;
-
- result = wait_event_interruptible_timeout(i2c_imx->queue,
- i2c_imx->i2csr & I2SR_IIF, HZ / 10);
+ wait_event_timeout(i2c_imx->queue, i2c_imx->i2csr & I2SR_IIF, HZ / 10);
- if (unlikely(result < 0)) {
- dev_dbg(&i2c_imx->adapter.dev, "<%s> result < 0\n", __func__);
- return result;
- } else if (unlikely(!(i2c_imx->i2csr & I2SR_IIF))) {
+ if (unlikely(!(i2c_imx->i2csr & I2SR_IIF))) {
dev_dbg(&i2c_imx->adapter.dev, "<%s> Timeout\n", __func__);
return -ETIMEDOUT;
}
i2c_imx->i2csr = temp;
temp &= ~I2SR_IIF;
writeb(temp, i2c_imx->base + IMX_I2C_I2SR);
- wake_up_interruptible(&i2c_imx->queue);
+ wake_up(&i2c_imx->queue);
return IRQ_HANDLED;
}
dev_err(i2c->dev, "failed to add adapter\n");
goto fail_add;
}
+ of_i2c_register_devices(&i2c->adap);
return result;
return result;
} else if (result == 0) {
dev_dbg(i2c->dev, "%s: timeout\n", __func__);
- result = -ETIMEDOUT;
+ return -ETIMEDOUT;
}
return 0;
if (r == 0)
r = num;
+
+ omap_i2c_wait_for_bb(dev);
out:
omap_i2c_idle(dev);
return r;
static int pca_isa_waitforcompletion(void *pd)
{
- long ret = ~0;
unsigned long timeout;
+ long ret;
if (irq > -1) {
ret = wait_event_timeout(pca_wait,
} else {
/* Do polling */
timeout = jiffies + pca_isa_ops.timeout;
- while (((pca_isa_readbyte(pd, I2C_PCA_CON)
- & I2C_PCA_CON_SI) == 0)
- && (ret = time_before(jiffies, timeout)))
+ do {
+ ret = time_before(jiffies, timeout);
+ if (pca_isa_readbyte(pd, I2C_PCA_CON)
+ & I2C_PCA_CON_SI)
+ break;
udelay(100);
+ } while (ret);
}
+
return ret > 0;
}
static int i2c_pca_pf_waitforcompletion(void *pd)
{
struct i2c_pca_pf_data *i2c = pd;
- long ret = ~0;
unsigned long timeout;
+ long ret;
if (i2c->irq) {
ret = wait_event_timeout(i2c->wait,
} else {
/* Do polling */
timeout = jiffies + i2c->adap.timeout;
- while (((i2c->algo_data.read_byte(i2c, I2C_PCA_CON)
- & I2C_PCA_CON_SI) == 0)
- && (ret = time_before(jiffies, timeout)))
+ do {
+ ret = time_before(jiffies, timeout);
+ if (i2c->algo_data.read_byte(i2c, I2C_PCA_CON)
+ & I2C_PCA_CON_SI)
+ break;
udelay(100);
+ } while (ret);
}
return ret > 0;
unsigned long sda_delay;
if (pdata->sda_delay) {
- sda_delay = (freq / 1000) * pdata->sda_delay;
- sda_delay /= 1000000;
+ sda_delay = clkin * pdata->sda_delay;
+ sda_delay = DIV_ROUND_UP(sda_delay, 1000000);
sda_delay = DIV_ROUND_UP(sda_delay, 5);
if (sda_delay > 3)
sda_delay = 3;
#include <linux/init.h>
#include <linux/idr.h>
#include <linux/mutex.h>
-#include <linux/of_i2c.h>
#include <linux/of_device.h>
#include <linux/completion.h>
#include <linux/hardirq.h>
{
const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
- if (pm_runtime_suspended(dev))
- return 0;
-
- if (pm)
- return pm->suspend ? pm->suspend(dev) : 0;
+ if (pm) {
+ if (pm_runtime_suspended(dev))
+ return 0;
+ else
+ return pm->suspend ? pm->suspend(dev) : 0;
+ }
return i2c_legacy_suspend(dev, PMSG_SUSPEND);
}
else
ret = i2c_legacy_resume(dev);
- if (!ret) {
- pm_runtime_disable(dev);
- pm_runtime_set_active(dev);
- pm_runtime_enable(dev);
- }
-
return ret;
}
{
const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
- if (pm_runtime_suspended(dev))
- return 0;
-
- if (pm)
- return pm->freeze ? pm->freeze(dev) : 0;
+ if (pm) {
+ if (pm_runtime_suspended(dev))
+ return 0;
+ else
+ return pm->freeze ? pm->freeze(dev) : 0;
+ }
return i2c_legacy_suspend(dev, PMSG_FREEZE);
}
{
const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
- if (pm_runtime_suspended(dev))
- return 0;
-
- if (pm)
- return pm->thaw ? pm->thaw(dev) : 0;
+ if (pm) {
+ if (pm_runtime_suspended(dev))
+ return 0;
+ else
+ return pm->thaw ? pm->thaw(dev) : 0;
+ }
return i2c_legacy_resume(dev);
}
{
const struct dev_pm_ops *pm = dev->driver ? dev->driver->pm : NULL;
- if (pm_runtime_suspended(dev))
- return 0;
-
- if (pm)
- return pm->poweroff ? pm->poweroff(dev) : 0;
+ if (pm) {
+ if (pm_runtime_suspended(dev))
+ return 0;
+ else
+ return pm->poweroff ? pm->poweroff(dev) : 0;
+ }
return i2c_legacy_suspend(dev, PMSG_HIBERNATE);
}
if (adap->nr < __i2c_first_dynamic_bus_num)
i2c_scan_static_board_info(adap);
- /* Register devices from the device tree */
- of_i2c_register_devices(adap);
-
/* Notify drivers */
mutex_lock(&core_lock);
bus_for_each_drv(&i2c_bus_type, NULL, adap, __process_new_adapter);
if (hwif == NULL)
continue;
- if (hwif->present)
- hwif_register_devices(hwif);
- }
-
- ide_host_for_each_port(i, hwif, host) {
- if (hwif == NULL)
- continue;
-
ide_sysfs_register_port(hwif);
ide_proc_register_port(hwif);
- if (hwif->present)
+ if (hwif->present) {
ide_proc_port_register_devices(hwif);
+ hwif_register_devices(hwif);
+ }
}
return j ? 0 : -1;
#include <linux/hrtimer.h> /* ktime_get_real() */
#include <trace/events/power.h>
#include <linux/sched.h>
+#include <asm/mwait.h>
#define INTEL_IDLE_VERSION "0.4"
#define PREFIX "intel_idle: "
-#define MWAIT_SUBSTATE_MASK (0xf)
-#define MWAIT_CSTATE_MASK (0xf)
-#define MWAIT_SUBSTATE_SIZE (4)
-#define MWAIT_MAX_NUM_CSTATES 8
-#define CPUID_MWAIT_LEAF (5)
-#define CPUID5_ECX_EXTENSIONS_SUPPORTED (0x1)
-#define CPUID5_ECX_INTERRUPT_BREAK (0x2)
-
static struct cpuidle_driver intel_idle_driver = {
.name = "intel_idle",
.owner = THIS_MODULE,
/* Reliable LAPIC Timer States, bit 1 for C1 etc. */
static unsigned int lapic_timer_reliable_states;
-static struct cpuidle_device *intel_idle_cpuidle_devices;
+static struct cpuidle_device __percpu *intel_idle_cpuidle_devices;
static int intel_idle(struct cpuidle_device *dev, struct cpuidle_state *state);
static struct cpuidle_state *cpuidle_state_table;
.name = "NHM-C3",
.desc = "MWAIT 0x10",
.driver_data = (void *) 0x10,
- .flags = CPUIDLE_FLAG_TIME_VALID,
+ .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 20,
.power_usage = 500,
.target_residency = 80,
.name = "NHM-C6",
.desc = "MWAIT 0x20",
.driver_data = (void *) 0x20,
- .flags = CPUIDLE_FLAG_TIME_VALID,
+ .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 200,
.power_usage = 350,
.target_residency = 800,
.name = "ATM-C4",
.desc = "MWAIT 0x30",
.driver_data = (void *) 0x30,
- .flags = CPUIDLE_FLAG_TIME_VALID,
+ .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
.exit_latency = 100,
.power_usage = 250,
.target_residency = 400,
{ /* MWAIT C5 */ },
{ /* MWAIT C6 */
.name = "ATM-C6",
- .desc = "MWAIT 0x40",
- .driver_data = (void *) 0x40,
- .flags = CPUIDLE_FLAG_TIME_VALID,
- .exit_latency = 200,
+ .desc = "MWAIT 0x52",
+ .driver_data = (void *) 0x52,
+ .flags = CPUIDLE_FLAG_TIME_VALID | CPUIDLE_FLAG_TLB_FLUSHED,
+ .exit_latency = 140,
.power_usage = 150,
- .target_residency = 800,
- .enter = NULL }, /* disabled */
+ .target_residency = 560,
+ .enter = &intel_idle },
};
/**
local_irq_disable();
+ /*
+ * If the state flag indicates that the TLB will be flushed or if this
+ * is the deepest c-state supported, do a voluntary leave mm to avoid
+ * costly and mostly unnecessary wakeups for flushing the user TLB's
+ * associated with the active mm.
+ */
+ if (state->flags & CPUIDLE_FLAG_TLB_FLUSHED ||
+ (&dev->states[dev->state_count - 1] == state))
+ leave_mm(cpu);
+
if (!(lapic_timer_reliable_states & (1 << (cstate))))
clockevents_notify(CLOCK_EVT_NOTIFY_BROADCAST_ENTER, &cpu);
#define T3_MAX_PBL_SIZE 256
#define T3_MAX_RQ_SIZE 1024
#define T3_MAX_QP_DEPTH (T3_MAX_RQ_SIZE-1)
-#define T3_MAX_CQ_DEPTH 262144
+#define T3_MAX_CQ_DEPTH 65536
#define T3_MAX_NUM_STAG (1<<15)
#define T3_MAX_MR_SIZE 0x100000000ULL
#define T3_PAGESIZE_MASK 0xffff000 /* 4KB-128MB */
V_MSS_IDX(mtu_idx) |
V_L2T_IDX(ep->l2t->idx) | V_TX_CHANNEL(ep->l2t->smt_idx);
opt0l = V_TOS((ep->tos >> 2) & M_TOS) | V_RCV_BUFSIZ(rcv_win>>10);
- opt2 = V_FLAVORS_VALID(1) | V_CONG_CONTROL_FLAVOR(cong_flavor);
+ opt2 = F_RX_COALESCE_VALID | V_RX_COALESCE(0) | V_FLAVORS_VALID(1) |
+ V_CONG_CONTROL_FLAVOR(cong_flavor);
skb->priority = CPL_PRIORITY_SETUP;
set_arp_failure_handler(skb, act_open_req_arp_failure);
V_MSS_IDX(mtu_idx) |
V_L2T_IDX(ep->l2t->idx) | V_TX_CHANNEL(ep->l2t->smt_idx);
opt0l = V_TOS((ep->tos >> 2) & M_TOS) | V_RCV_BUFSIZ(rcv_win>>10);
- opt2 = V_FLAVORS_VALID(1) | V_CONG_CONTROL_FLAVOR(cong_flavor);
+ opt2 = F_RX_COALESCE_VALID | V_RX_COALESCE(0) | V_FLAVORS_VALID(1) |
+ V_CONG_CONTROL_FLAVOR(cong_flavor);
rpl = cplhdr(skb);
rpl->wr.wr_hi = htonl(V_WR_OP(FW_WROPCODE_FORWARD));
static void nes_retrans_expired(struct nes_cm_node *cm_node)
{
struct iw_cm_id *cm_id = cm_node->cm_id;
- switch (cm_node->state) {
+ enum nes_cm_node_state state = cm_node->state;
+ cm_node->state = NES_CM_STATE_CLOSED;
+ switch (state) {
case NES_CM_STATE_SYN_RCVD:
case NES_CM_STATE_CLOSING:
rem_ref_cm_node(cm_node->cm_core, cm_node);
case NES_CM_STATE_FIN_WAIT1:
if (cm_node->cm_id)
cm_id->rem_ref(cm_id);
- cm_node->state = NES_CM_STATE_CLOSED;
send_reset(cm_node, NULL);
break;
default:
break;
case NES_CM_STATE_MPAREQ_RCVD:
passive_state = atomic_add_return(1, &cm_node->passive_state);
- if (passive_state == NES_SEND_RESET_EVENT)
- create_event(cm_node, NES_CM_EVENT_RESET);
- cm_node->state = NES_CM_STATE_CLOSED;
dev_kfree_skb_any(skb);
break;
case NES_CM_STATE_ESTABLISHED:
case NES_CM_STATE_CLOSED:
drop_packet(skb);
break;
+ case NES_CM_STATE_FIN_WAIT2:
case NES_CM_STATE_FIN_WAIT1:
case NES_CM_STATE_LAST_ACK:
cm_node->cm_id->rem_ref(cm_node->cm_id);
return -EINVAL;
}
+ passive_state = atomic_add_return(1, &cm_node->passive_state);
+ if (passive_state == NES_SEND_RESET_EVENT) {
+ rem_ref_cm_node(cm_node->cm_core, cm_node);
+ return -ECONNRESET;
+ }
+
/* associate the node with the QP */
nesqp->cm_node = (void *)cm_node;
cm_node->nesqp = nesqp;
printk(KERN_ERR "%s[%u] OFA CM event_handler returned, "
"ret=%d\n", __func__, __LINE__, ret);
- passive_state = atomic_add_return(1, &cm_node->passive_state);
- if (passive_state == NES_SEND_RESET_EVENT)
- create_event(cm_node, NES_CM_EVENT_RESET);
return 0;
}
return; /* Ignore it, wait for close complete */
if (atomic_inc_return(&nesqp->close_timer_started) == 1) {
+ if ((tcp_state == NES_AEQE_TCP_STATE_CLOSE_WAIT) &&
+ (nesqp->ibqp_state == IB_QPS_RTS) &&
+ ((nesadapter->eeprom_version >> 16) != NES_A0)) {
+ spin_lock_irqsave(&nesqp->lock, flags);
+ nesqp->hw_iwarp_state = iwarp_state;
+ nesqp->hw_tcp_state = tcp_state;
+ nesqp->last_aeq = async_event_id;
+ next_iwarp_state = NES_CQP_QP_IWARP_STATE_CLOSING;
+ nesqp->hw_iwarp_state = NES_AEQE_IWARP_STATE_CLOSING;
+ spin_unlock_irqrestore(&nesqp->lock, flags);
+ nes_hw_modify_qp(nesdev, nesqp, next_iwarp_state, 0, 0);
+ nes_cm_disconn(nesqp);
+ }
nesqp->cm_id->add_ref(nesqp->cm_id);
schedule_nes_timer(nesqp->cm_node, (struct sk_buff *)nesqp,
NES_TIMER_TYPE_CLOSE, 1, 0);
nesqp->hwqp.qp_id, atomic_read(&nesqp->refcount),
async_event_id, nesqp->last_aeq, tcp_state);
}
-
break;
case NES_AEQE_AEID_LLP_CLOSE_COMPLETE:
if (nesqp->term_flags) {
#define NES_PHY_TYPE_KR 9
#define NES_MULTICAST_PF_MAX 8
+#define NES_A0 3
enum pci_regs {
NES_INT_STAT = 0x0000,
NES_IDX_MAC_TX_CONFIG + (nesdev->mac_index*0x200));
u32temp |= NES_IDX_MAC_TX_CONFIG_ENABLE_PAUSE;
nes_write_indexed(nesdev,
- NES_IDX_MAC_TX_CONFIG_ENABLE_PAUSE + (nesdev->mac_index*0x200), u32temp);
+ NES_IDX_MAC_TX_CONFIG + (nesdev->mac_index*0x200), u32temp);
nesdev->disable_tx_flow_control = 0;
} else if ((et_pauseparam->tx_pause == 0) && (nesdev->disable_tx_flow_control == 0)) {
u32temp = nes_read_indexed(nesdev,
NES_IDX_MAC_TX_CONFIG + (nesdev->mac_index*0x200));
u32temp &= ~NES_IDX_MAC_TX_CONFIG_ENABLE_PAUSE;
nes_write_indexed(nesdev,
- NES_IDX_MAC_TX_CONFIG_ENABLE_PAUSE + (nesdev->mac_index*0x200), u32temp);
+ NES_IDX_MAC_TX_CONFIG + (nesdev->mac_index*0x200), u32temp);
nesdev->disable_tx_flow_control = 1;
}
if ((et_pauseparam->rx_pause == 1) && (nesdev->disable_rx_flow_control == 1)) {
int minor;
struct input_handle handle;
wait_queue_head_t wait;
- struct evdev_client *grab;
+ struct evdev_client __rcu *grab;
struct list_head client_list;
spinlock_t client_lock; /* protects client_list */
struct mutex mutex;
if ((_IOC_NR(cmd) & ~ABS_MAX) == _IOC_NR(EVIOCGABS(0))) {
+ if (!dev->absinfo)
+ return -EINVAL;
+
t = _IOC_NR(cmd) & ABS_MAX;
abs = dev->absinfo[t];
}
}
- if (_IOC_DIR(cmd) == _IOC_READ) {
+ if (_IOC_DIR(cmd) == _IOC_WRITE) {
if ((_IOC_NR(cmd) & ~ABS_MAX) == _IOC_NR(EVIOCSABS(0))) {
+ if (!dev->absinfo)
+ return -EINVAL;
+
t = _IOC_NR(cmd) & ABS_MAX;
if (copy_from_user(&abs, p, min_t(size_t,
* @dev: input device supporting MT events and finger tracking
* @num_slots: number of slots used by the device
*
- * This function allocates all necessary memory for MT slot handling
- * in the input device, and adds ABS_MT_SLOT to the device capabilities.
+ * This function allocates all necessary memory for MT slot handling in the
+ * input device, and adds ABS_MT_SLOT to the device capabilities. All slots
+ * are initially marked as unused iby setting ABS_MT_TRACKING_ID to -1.
*/
int input_mt_create_slots(struct input_dev *dev, unsigned int num_slots)
{
+ int i;
+
if (!num_slots)
return 0;
dev->mtsize = num_slots;
input_set_abs_params(dev, ABS_MT_SLOT, 0, num_slots - 1, 0, 0);
+ /* Mark slots as 'unused' */
+ for (i = 0; i < num_slots; i++)
+ dev->mt[i].abs[ABS_MT_TRACKING_ID - ABS_MT_FIRST] = -1;
+
return 0;
}
EXPORT_SYMBOL(input_mt_create_slots);
memcpy(joydev->abspam, abspam, len);
+ for (i = 0; i < joydev->nabs; i++)
+ joydev->absmap[joydev->abspam[i]] = i;
+
out:
kfree(abspam);
return retval;
t.endidx = 91;
t.seq = tseq;
t.act.semaphore = &tsem;
- init_MUTEX_LOCKED(&tsem);
+ sema_init(&tsem, 0);
if (hp_sdc_enqueue_transaction(&t)) return -1;
return -ENODEV;
#endif
- init_MUTEX(&i8042tregs);
+ sema_init(&i8042tregs, 1);
if ((ret = hp_sdc_request_timer_irq(&hp_sdc_rtc_isr)))
return ret;
retval = uinput_validate_absbits(dev);
if (retval < 0)
goto exit;
+ if (test_bit(ABS_MT_SLOT, dev->absbit)) {
+ int nslot = input_abs_get_max(dev, ABS_MT_SLOT) + 1;
+ input_mt_create_slots(dev, nslot);
+ input_set_events_per_packet(dev, 6 * nslot);
+ } else if (test_bit(ABS_MT_POSITION_X, dev->absbit)) {
+ input_set_events_per_packet(dev, 60);
+ }
}
udev->state = UIST_SETUP_COMPLETE;
const struct bcm5974_config *cfg,
const struct tp_finger *f)
{
- input_report_abs(input, ABS_MT_TOUCH_MAJOR, raw2int(f->force_major));
- input_report_abs(input, ABS_MT_TOUCH_MINOR, raw2int(f->force_minor));
- input_report_abs(input, ABS_MT_WIDTH_MAJOR, raw2int(f->size_major));
- input_report_abs(input, ABS_MT_WIDTH_MINOR, raw2int(f->size_minor));
+ input_report_abs(input, ABS_MT_TOUCH_MAJOR,
+ raw2int(f->force_major) << 1);
+ input_report_abs(input, ABS_MT_TOUCH_MINOR,
+ raw2int(f->force_minor) << 1);
+ input_report_abs(input, ABS_MT_WIDTH_MAJOR,
+ raw2int(f->size_major) << 1);
+ input_report_abs(input, ABS_MT_WIDTH_MINOR,
+ raw2int(f->size_minor) << 1);
input_report_abs(input, ABS_MT_ORIENTATION,
MAX_FINGER_ORIENTATION - raw2int(f->orientation));
input_report_abs(input, ABS_MT_POSITION_X, raw2int(f->abs_x));
mlc->ostarted = 0;
rwlock_init(&mlc->lock);
- init_MUTEX(&mlc->osem);
+ sema_init(&mlc->osem, 1);
- init_MUTEX(&mlc->isem);
+ sema_init(&mlc->isem, 1);
mlc->icount = -1;
mlc->imatch = 0;
mlc->opercnt = 0;
- init_MUTEX_LOCKED(&(mlc->csem));
+ sema_init(&(mlc->csem), 0);
hil_mlc_clear_di_scratch(mlc);
hil_mlc_clear_di_map(mlc, 0);
ts_sync[1] = 0x0f;
ts_sync[2] = ts_sync[3] = ts_sync[4] = ts_sync[5] = 0;
t_sync.act.semaphore = &s_sync;
- init_MUTEX_LOCKED(&s_sync);
+ sema_init(&s_sync, 0);
hp_sdc_enqueue_transaction(&t_sync);
down(&s_sync); /* Wait for t_sync to complete */
return hp_sdc.dev_err;
}
- init_MUTEX_LOCKED(&tq_init_sem);
+ sema_init(&tq_init_sem, 0);
tq_init.actidx = 0;
tq_init.idx = 1;
static void __exit i8042_exit(void)
{
- platform_driver_unregister(&i8042_driver);
platform_device_unregister(i8042_platform_device);
+ platform_driver_unregister(&i8042_driver);
i8042_platform_exit();
panic_blink = NULL;
static int wacom_open(struct input_dev *dev)
{
struct wacom *wacom = input_get_drvdata(dev);
+ int retval = 0;
- mutex_lock(&wacom->lock);
-
- wacom->irq->dev = wacom->usbdev;
-
- if (usb_autopm_get_interface(wacom->intf) < 0) {
- mutex_unlock(&wacom->lock);
+ if (usb_autopm_get_interface(wacom->intf) < 0)
return -EIO;
- }
+
+ mutex_lock(&wacom->lock);
if (usb_submit_urb(wacom->irq, GFP_KERNEL)) {
- usb_autopm_put_interface(wacom->intf);
- mutex_unlock(&wacom->lock);
- return -EIO;
+ retval = -EIO;
+ goto out;
}
wacom->open = true;
wacom->intf->needs_remote_wakeup = 1;
+out:
mutex_unlock(&wacom->lock);
- return 0;
+ if (retval)
+ usb_autopm_put_interface(wacom->intf);
+ return retval;
}
static void wacom_close(struct input_dev *dev)
wacom->open = false;
wacom->intf->needs_remote_wakeup = 0;
mutex_unlock(&wacom->lock);
+
+ usb_autopm_put_interface(wacom->intf);
}
static int wacom_parse_hid(struct usb_interface *intf, struct hid_descriptor *hid_desc,
if (features->type == WACOM_G4 ||
features->type == WACOM_MO) {
input_report_abs(input, ABS_DISTANCE, data[6] & 0x3f);
- rw = (signed)(data[7] & 0x04) - (data[7] & 0x03);
+ rw = (data[7] & 0x04) - (data[7] & 0x03);
} else {
input_report_abs(input, ABS_DISTANCE, data[7] & 0x3f);
- rw = -(signed)data[6];
+ rw = -(signed char)data[6];
}
input_report_rel(input, REL_WHEEL, rw);
}
/* general pen packet */
if ((data[1] & 0xb8) == 0xa0) {
t = (data[6] << 2) | ((data[7] >> 6) & 3);
- if (features->type >= INTUOS4S && features->type <= INTUOS4L)
+ if ((features->type >= INTUOS4S && features->type <= INTUOS4L) ||
+ features->type == WACOM_21UX2) {
t = (t << 1) | (data[1] & 1);
+ }
input_report_abs(input, ABS_PRESSURE, t);
input_report_abs(input, ABS_TILT_X,
((data[7] << 1) & 0x7e) | (data[8] >> 7));
}
else if(callid>=0x0000 && callid<=0x7FFF)
{
+ int len;
+
pr_debug("%s: Got Incoming Call\n",
sc_adapter[card]->devicename);
- strcpy(setup.phone,&(rcvmsg.msg_data.byte_array[4]));
- strcpy(setup.eazmsn,
- sc_adapter[card]->channel[rcvmsg.phy_link_no-1].dn);
+ len = strlcpy(setup.phone, &(rcvmsg.msg_data.byte_array[4]),
+ sizeof(setup.phone));
+ if (len >= sizeof(setup.phone))
+ continue;
+ len = strlcpy(setup.eazmsn,
+ sc_adapter[card]->channel[rcvmsg.phy_link_no - 1].dn,
+ sizeof(setup.eazmsn));
+ if (len >= sizeof(setup.eazmsn))
+ continue;
setup.si1 = 7;
setup.si2 = 0;
setup.plan = 0;
* Handle a GetMyNumber Rsp
*/
if (IS_CE_MESSAGE(rcvmsg,Call,0,GetMyNumber)){
- strcpy(sc_adapter[card]->channel[rcvmsg.phy_link_no-1].dn,rcvmsg.msg_data.byte_array);
+ strlcpy(sc_adapter[card]->channel[rcvmsg.phy_link_no - 1].dn,
+ rcvmsg.msg_data.byte_array,
+ sizeof(rcvmsg.msg_data.byte_array));
continue;
}
int cmd_level;
int slow_level;
- read_lock(&led_dat->rw_lock);
+ read_lock_irq(&led_dat->rw_lock);
cmd_level = gpio_get_value(led_dat->cmd);
slow_level = gpio_get_value(led_dat->slow);
}
}
- read_unlock(&led_dat->rw_lock);
+ read_unlock_irq(&led_dat->rw_lock);
return ret;
}
enum ns2_led_modes mode)
{
int i;
+ unsigned long flags;
- write_lock(&led_dat->rw_lock);
+ write_lock_irqsave(&led_dat->rw_lock, flags);
for (i = 0; i < ARRAY_SIZE(ns2_led_modval); i++) {
if (mode == ns2_led_modval[i].mode) {
}
}
- write_unlock(&led_dat->rw_lock);
+ write_unlock_irqrestore(&led_dat->rw_lock, flags);
}
static void ns2_led_set(struct led_classdev *led_cdev,
BLOCKING_NOTIFIER_HEAD(adb_client_list);
static int adb_got_sleep;
static int adb_inited;
-static DECLARE_MUTEX(adb_probe_mutex);
+static DEFINE_SEMAPHORE(adb_probe_mutex);
static int sleepy_trackpad;
static int autopoll_devs;
int __adb_probe_sync;
page = bitmap->sb_page;
offset = sizeof(bitmap_super_t);
if (!file)
- read_sb_page(bitmap->mddev,
- bitmap->mddev->bitmap_info.offset,
- page,
- index, count);
+ page = read_sb_page(
+ bitmap->mddev,
+ bitmap->mddev->bitmap_info.offset,
+ page,
+ index, count);
} else if (file) {
page = read_page(file, index, bitmap, count);
offset = 0;
bmask = queue_logical_block_size(rdev->bdev->bd_disk->queue)-1;
if (rdev->sb_size & bmask)
rdev->sb_size = (rdev->sb_size | bmask) + 1;
- }
+ } else
+ max_dev = le32_to_cpu(sb->max_dev);
+
for (i=0; i<max_dev;i++)
sb->dev_roles[i] = cpu_to_le16(0xfffe);
if (mddev->ro && !test_bit(MD_RECOVERY_NEEDED, &mddev->recovery))
return;
if ( ! (
- (mddev->flags && !mddev->external) ||
+ (mddev->flags & ~ (1<<MD_CHANGE_PENDING)) ||
test_bit(MD_RECOVERY_NEEDED, &mddev->recovery) ||
test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
(mddev->external == 0 && mddev->safemode == 1) ||
/* take from bio_init */
bio->bi_next = NULL;
+ bio->bi_flags &= ~(BIO_POOL_MASK-1);
bio->bi_flags |= 1 << BIO_UPTODATE;
+ bio->bi_comp_cpu = -1;
bio->bi_rw = READ;
bio->bi_vcnt = 0;
bio->bi_idx = 0;
!test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
break;
BUG_ON(sync_blocks < (PAGE_SIZE>>9));
- if (len > (sync_blocks<<9))
+ if ((len >> 9) > sync_blocks)
len = sync_blocks<<9;
}
* a keyup event might follow immediately after the keydown.
*/
spin_lock_irqsave(&ir->keylock, flags);
- if (time_is_after_eq_jiffies(ir->keyup_jiffies))
+ if (time_is_before_eq_jiffies(ir->keyup_jiffies))
ir_keyup(ir);
spin_unlock_irqrestore(&ir->keylock, flags);
}
(ir_dev->props && ir_dev->props->driver_type == RC_DRIVER_IR_RAW) ?
" in raw mode" : "");
+ /*
+ * Default delay of 250ms is too short for some protocols, expecially
+ * since the timeout is currently set to 250ms. Increase it to 500ms,
+ * to avoid wrong repetition of the keycodes.
+ */
+ input_dev->rep[REP_DELAY] = 500;
+
return 0;
out_event:
features |= LIRC_CAN_SET_SEND_CARRIER;
if (ir_dev->props->s_tx_duty_cycle)
- features |= LIRC_CAN_SET_REC_DUTY_CYCLE;
+ features |= LIRC_CAN_SET_SEND_DUTY_CYCLE;
}
if (ir_dev->props->s_rx_carrier_range)
"rc%u", (unsigned int)ir->devno);
if (IS_ERR(ir->raw->thread)) {
+ int ret = PTR_ERR(ir->raw->thread);
+
kfree(ir->raw);
ir->raw = NULL;
- return PTR_ERR(ir->raw->thread);
+ return ret;
}
mutex_lock(&ir_raw_handler_lock);
char *tmp = buf;
int i;
- if (ir_dev->props->driver_type == RC_DRIVER_SCANCODE) {
+ if (ir_dev->props && ir_dev->props->driver_type == RC_DRIVER_SCANCODE) {
enabled = ir_dev->rc_tab.ir_type;
allowed = ir_dev->props->allowed_protos;
- } else {
+ } else if (ir_dev->raw) {
enabled = ir_dev->raw->enabled_protocols;
allowed = ir_raw_get_allowed_protocols();
- }
+ } else
+ return sprintf(tmp, "[builtin]\n");
IR_dprintk(1, "allowed - 0x%llx, enabled - 0x%llx\n",
(long long)allowed,
int rc, i, count = 0;
unsigned long flags;
- if (ir_dev->props->driver_type == RC_DRIVER_SCANCODE)
+ if (ir_dev->props && ir_dev->props->driver_type == RC_DRIVER_SCANCODE)
type = ir_dev->rc_tab.ir_type;
- else
+ else if (ir_dev->raw)
type = ir_dev->raw->enabled_protocols;
+ else {
+ IR_dprintk(1, "Protocol switching not supported\n");
+ return -EINVAL;
+ }
while ((tmp = strsep((char **) &data, " \n")) != NULL) {
if (!*tmp)
}
}
- if (ir_dev->props->driver_type == RC_DRIVER_SCANCODE) {
+ if (ir_dev->props && ir_dev->props->driver_type == RC_DRIVER_SCANCODE) {
spin_lock_irqsave(&ir_dev->rc_tab.lock, flags);
ir_dev->rc_tab.ir_type = type;
spin_unlock_irqrestore(&ir_dev->rc_tab.lock, flags);
{ 0x800f0416, KEY_PLAY },
{ 0x800f0418, KEY_PAUSE },
+ { 0x800f046e, KEY_PLAYPAUSE },
{ 0x800f0419, KEY_STOP },
{ 0x800f0417, KEY_RECORD },
{ 0x800f0411, KEY_VOLUMEDOWN },
{ 0x800f0412, KEY_CHANNELUP },
{ 0x800f0413, KEY_CHANNELDOWN },
+ { 0x800f043a, KEY_BRIGHTNESSUP },
+ { 0x800f0480, KEY_BRIGHTNESSDOWN },
{ 0x800f0401, KEY_NUMERIC_1 },
{ 0x800f0402, KEY_NUMERIC_2 },
{ USB_DEVICE(VENDOR_PHILIPS, 0x0613) },
/* Philips eHome Infrared Transceiver */
{ USB_DEVICE(VENDOR_PHILIPS, 0x0815) },
+ /* Philips/Spinel plus IR transceiver for ASUS */
+ { USB_DEVICE(VENDOR_PHILIPS, 0x206c) },
+ /* Philips/Spinel plus IR transceiver for ASUS */
+ { USB_DEVICE(VENDOR_PHILIPS, 0x2088) },
/* Realtek MCE IR Receiver */
{ USB_DEVICE(VENDOR_REALTEK, 0x0161) },
/* SMK/Toshiba G83C0004D410 */
else
dev->props.rc.core.bulk_mode = false;
- /* Need a higher delay, to avoid wrong repeat */
- dev->rc_input_dev->rep[REP_DELAY] = 500;
-
dib0700_rc_setup(dev);
return 0;
return adap->fe == NULL ? -ENODEV : 0;
}
+/* STK7770P */
+static struct dib7000p_config dib7770p_dib7000p_config = {
+ .output_mpeg2_in_188_bytes = 1,
+
+ .agc_config_count = 1,
+ .agc = &dib7070_agc_config,
+ .bw = &dib7070_bw_config_12_mhz,
+ .tuner_is_baseband = 1,
+ .spur_protect = 1,
+
+ .gpio_dir = DIB7000P_GPIO_DEFAULT_DIRECTIONS,
+ .gpio_val = DIB7000P_GPIO_DEFAULT_VALUES,
+ .gpio_pwm_pos = DIB7000P_GPIO_DEFAULT_PWM_POS,
+
+ .hostbus_diversity = 1,
+ .enable_current_mirror = 1,
+ .disable_sample_and_hold = 0,
+};
+
+static int stk7770p_frontend_attach(struct dvb_usb_adapter *adap)
+{
+ struct usb_device_descriptor *p = &adap->dev->udev->descriptor;
+ if (p->idVendor == cpu_to_le16(USB_VID_PINNACLE) &&
+ p->idProduct == cpu_to_le16(USB_PID_PINNACLE_PCTV72E))
+ dib0700_set_gpio(adap->dev, GPIO6, GPIO_OUT, 0);
+ else
+ dib0700_set_gpio(adap->dev, GPIO6, GPIO_OUT, 1);
+ msleep(10);
+ dib0700_set_gpio(adap->dev, GPIO9, GPIO_OUT, 1);
+ dib0700_set_gpio(adap->dev, GPIO4, GPIO_OUT, 1);
+ dib0700_set_gpio(adap->dev, GPIO7, GPIO_OUT, 1);
+ dib0700_set_gpio(adap->dev, GPIO10, GPIO_OUT, 0);
+
+ dib0700_ctrl_clock(adap->dev, 72, 1);
+
+ msleep(10);
+ dib0700_set_gpio(adap->dev, GPIO10, GPIO_OUT, 1);
+ msleep(10);
+ dib0700_set_gpio(adap->dev, GPIO0, GPIO_OUT, 1);
+
+ if (dib7000p_i2c_enumeration(&adap->dev->i2c_adap, 1, 18,
+ &dib7770p_dib7000p_config) != 0) {
+ err("%s: dib7000p_i2c_enumeration failed. Cannot continue\n",
+ __func__);
+ return -ENODEV;
+ }
+
+ adap->fe = dvb_attach(dib7000p_attach, &adap->dev->i2c_adap, 0x80,
+ &dib7770p_dib7000p_config);
+ return adap->fe == NULL ? -ENODEV : 0;
+}
+
/* DIB807x generic */
static struct dibx000_agc_config dib807x_agc_config[2] = {
{
/* 60 */{ USB_DEVICE(USB_VID_TERRATEC, USB_PID_TERRATEC_CINERGY_T_XXS_2) },
{ USB_DEVICE(USB_VID_DIBCOM, USB_PID_DIBCOM_STK807XPVR) },
{ USB_DEVICE(USB_VID_DIBCOM, USB_PID_DIBCOM_STK807XP) },
- { USB_DEVICE(USB_VID_PIXELVIEW, USB_PID_PIXELVIEW_SBTVD) },
+ { USB_DEVICE_VER(USB_VID_PIXELVIEW, USB_PID_PIXELVIEW_SBTVD, 0x000, 0x3f00) },
{ USB_DEVICE(USB_VID_EVOLUTEPC, USB_PID_TVWAY_PLUS) },
/* 65 */{ USB_DEVICE(USB_VID_PINNACLE, USB_PID_PINNACLE_PCTV73ESE) },
{ USB_DEVICE(USB_VID_PINNACLE, USB_PID_PINNACLE_PCTV282E) },
.pid_filter_count = 32,
.pid_filter = stk70x0p_pid_filter,
.pid_filter_ctrl = stk70x0p_pid_filter_ctrl,
- .frontend_attach = stk7070p_frontend_attach,
+ .frontend_attach = stk7770p_frontend_attach,
.tuner_attach = dib7770p_tuner_attach,
DIB0700_DEFAULT_STREAMING_CONFIG(0x02),
}
}
kfree(p);
- if (fw) {
- release_firmware(fw);
- }
+ release_firmware(fw);
return ret;
}
// dprintk( "908: %x, 909: %x\n", reg_908, reg_909);
+ reg_909 |= (state->cfg.disable_sample_and_hold & 1) << 4;
+ reg_908 |= (state->cfg.enable_current_mirror & 1) << 7;
+
dib7000p_write_word(state, 908, reg_908);
dib7000p_write_word(state, 909, reg_909);
}
default:
case GUARD_INTERVAL_1_32: value *= 1; break;
}
- state->div_sync_wait = (value * 3) / 2 + 32; // add 50% SFN margin + compensate for one DVSY-fifo TODO
+ if (state->cfg.diversity_delay == 0)
+ state->div_sync_wait = (value * 3) / 2 + 48; // add 50% SFN margin + compensate for one DVSY-fifo
+ else
+ state->div_sync_wait = (value * 3) / 2 + state->cfg.diversity_delay; // add 50% SFN margin + compensate for one DVSY-fifo
/* deactive the possibility of diversity reception if extended interleaver */
state->div_force_off = !1 && ch->u.ofdm.transmission_mode != TRANSMISSION_MODE_8K;
int (*agc_control) (struct dvb_frontend *, u8 before);
u8 output_mode;
+ u8 disable_sample_and_hold : 1;
+
+ u8 enable_current_mirror : 1;
+ u8 diversity_delay;
+
};
#define DEFAULT_DIB7000P_I2C_ADDRESS 18
*
* @return pointer to descriptor on success, NULL on error.
*/
-struct smscore_buffer_t *smscore_getbuffer(struct smscore_device_t *coredev)
+
+struct smscore_buffer_t *get_entry(struct smscore_device_t *coredev)
{
struct smscore_buffer_t *cb = NULL;
unsigned long flags;
- DEFINE_WAIT(wait);
-
spin_lock_irqsave(&coredev->bufferslock, flags);
-
- /* This function must return a valid buffer, since the buffer list is
- * finite, we check that there is an available buffer, if not, we wait
- * until such buffer become available.
- */
-
- prepare_to_wait(&coredev->buffer_mng_waitq, &wait, TASK_INTERRUPTIBLE);
- if (list_empty(&coredev->buffers)) {
- spin_unlock_irqrestore(&coredev->bufferslock, flags);
- schedule();
- spin_lock_irqsave(&coredev->bufferslock, flags);
+ if (!list_empty(&coredev->buffers)) {
+ cb = (struct smscore_buffer_t *) coredev->buffers.next;
+ list_del(&cb->entry);
}
+ spin_unlock_irqrestore(&coredev->bufferslock, flags);
+ return cb;
+}
- finish_wait(&coredev->buffer_mng_waitq, &wait);
-
- cb = (struct smscore_buffer_t *) coredev->buffers.next;
- list_del(&cb->entry);
+struct smscore_buffer_t *smscore_getbuffer(struct smscore_device_t *coredev)
+{
+ struct smscore_buffer_t *cb = NULL;
- spin_unlock_irqrestore(&coredev->bufferslock, flags);
+ wait_event(coredev->buffer_mng_waitq, (cb = get_entry(coredev)));
return cb;
}
radio->registers[POWERCFG] = POWERCFG_ENABLE;
if (si470x_set_register(radio, POWERCFG) < 0) {
retval = -EIO;
- goto err_all;
+ goto err_video;
}
msleep(110);
EXTRA_CFLAGS += -Idrivers/media/common/tuners
EXTRA_CFLAGS += -Idrivers/media/dvb/dvb-core
EXTRA_CFLAGS += -Idrivers/media/dvb/frontends
+EXTRA_CFLAGS += -Idrivers/media/dvb/dvb-usb
#include <media/v4l2-chip-ident.h>
#include <media/cx25840.h>
+#include "dvb-usb-ids.h"
#include "xc5000.h"
#include "cx231xx.h"
.driver_info = CX231XX_BOARD_CNXT_RDE_250},
{USB_DEVICE(0x0572, 0x58A1),
.driver_info = CX231XX_BOARD_CNXT_RDU_250},
+ {USB_DEVICE_VER(USB_VID_PIXELVIEW, USB_PID_PIXELVIEW_SBTVD, 0x4000,0x4fff),
+ .driver_info = CX231XX_BOARD_UNKNOWN},
{},
};
dev->board.name, dev->model);
/* set the direction for GPIO pins */
- cx231xx_set_gpio_direction(dev, dev->board.tuner_gpio->bit, 1);
- cx231xx_set_gpio_value(dev, dev->board.tuner_gpio->bit, 1);
- cx231xx_set_gpio_direction(dev, dev->board.tuner_sif_gpio, 1);
+ if (dev->board.tuner_gpio) {
+ cx231xx_set_gpio_direction(dev, dev->board.tuner_gpio->bit, 1);
+ cx231xx_set_gpio_value(dev, dev->board.tuner_gpio->bit, 1);
+ cx231xx_set_gpio_direction(dev, dev->board.tuner_sif_gpio, 1);
- /* request some modules if any required */
+ /* request some modules if any required */
- /* reset the Tuner */
- cx231xx_gpio_set(dev, dev->board.tuner_gpio);
+ /* reset the Tuner */
+ cx231xx_gpio_set(dev, dev->board.tuner_gpio);
+ }
/* set the mode to Analog mode initially */
cx231xx_set_mode(dev, CX231XX_ANALOG_MODE);
state->volume = v4l2_ctrl_new_std(&state->hdl,
&cx25840_audio_ctrl_ops, V4L2_CID_AUDIO_VOLUME,
- 0, 65335, 65535 / 100, default_volume);
+ 0, 65535, 65535 / 100, default_volume);
state->mute = v4l2_ctrl_new_std(&state->hdl,
&cx25840_audio_ctrl_ops, V4L2_CID_AUDIO_MUTE,
0, 1, 1, 0);
config VIDEO_CX88_ALSA
tristate "Conexant 2388x DMA audio support"
- depends on VIDEO_CX88 && SND && EXPERIMENTAL
+ depends on VIDEO_CX88 && SND
select SND_PCM
---help---
This is a video4linux driver for direct (DMA) audio on
usb_rcvintpipe(dev, ep->bEndpointAddress),
buffer, buffer_len,
int_irq, (void *)gspca_dev, interval);
+ urb->transfer_flags |= URB_NO_TRANSFER_DMA_MAP;
gspca_dev->int_urb = urb;
ret = usb_submit_urb(urb, GFP_KERNEL);
if (ret < 0) {
(data[33] << 10);
avg_lum >>= 9;
atomic_set(&sd->avg_lum, avg_lum);
- gspca_frame_add(gspca_dev, LAST_PACKET,
- data, len);
+ gspca_frame_add(gspca_dev, LAST_PACKET, NULL, 0);
return;
}
if (gspca_dev->last_packet_type == LAST_PACKET) {
struct fb_vblank vblank;
u32 trace;
+ memset(&vblank, 0, sizeof(struct fb_vblank));
+
vblank.flags = FB_VBLANK_HAVE_COUNT |FB_VBLANK_HAVE_VCOUNT |
FB_VBLANK_HAVE_VSYNC;
trace = read_reg(IVTV_REG_DEC_LINE_FIELD) >> 16;
return -EFAULT;
}
- if (in_buf->vb.size < out_buf->vb.size) {
+ if (in_buf->vb.size > out_buf->vb.size) {
v4l2_err(&dev->v4l2_dev, "Output buffer is too small\n");
return -EINVAL;
}
v4l2_m2m_release(dev->m2m_dev);
del_timer_sync(&dev->timer);
video_unregister_device(dev->vfd);
+ video_device_release(dev->vfd);
v4l2_device_unregister(&dev->v4l2_dev);
kfree(dev);
dev_dbg(&client->dev, "%s left=%d, top=%d, width=%d, height=%d\n",
__func__, rect.left, rect.top, rect.width, rect.height);
+ if (a->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+ return -EINVAL;
+
ret = mt9m111_make_rect(client, &rect);
if (!ret)
mt9m111->rect = rect;
static int mt9m111_cropcap(struct v4l2_subdev *sd, struct v4l2_cropcap *a)
{
+ if (a->type != V4L2_BUF_TYPE_VIDEO_CAPTURE)
+ return -EINVAL;
+
a->bounds.left = MT9M111_MIN_DARK_COLS;
a->bounds.top = MT9M111_MIN_DARK_ROWS;
a->bounds.width = MT9M111_MAX_WIDTH;
a->bounds.height = MT9M111_MAX_HEIGHT;
a->defrect = a->bounds;
- a->type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
a->pixelaspect.numerator = 1;
a->pixelaspect.denominator = 1;
mf->width = mt9m111->rect.width;
mf->height = mt9m111->rect.height;
mf->code = mt9m111->fmt->code;
+ mf->colorspace = mt9m111->fmt->colorspace;
mf->field = V4L2_FIELD_NONE;
return 0;
if (mt9v022->model != V4L2_IDENT_MT9V022IX7ATC)
return -EINVAL;
break;
- case 0:
- /* No format change, only geometry */
- break;
default:
return -EINVAL;
}
spin_lock_irqsave(&pcdev->lock, flags);
+ if (*fb_active == NULL)
+ goto out;
+
vb = &(*fb_active)->vb;
dev_dbg(pcdev->dev, "%s (vb=0x%p) 0x%08lx %d\n", __func__,
vb, vb->baddr, vb->bsize);
*fb_active = buf;
+out:
spin_unlock_irqrestore(&pcdev->lock, flags);
}
if (ret >= 0) {
ret = pvr2_ctrl_range_check(cptr,*valptr);
}
- if (maskptr) *maskptr = ~0;
+ *maskptr = ~0;
} else if (cptr->info->type == pvr2_ctl_bool) {
ret = parse_token(ptr,len,valptr,boolNames,
ARRAY_SIZE(boolNames));
} else if (ret == 0) {
*valptr = (*valptr & 1) ? !0 : 0;
}
- if (maskptr) *maskptr = 1;
+ *maskptr = 1;
} else if (cptr->info->type == pvr2_ctl_enum) {
ret = parse_token(
ptr,len,valptr,
if (ret >= 0) {
ret = pvr2_ctrl_range_check(cptr,*valptr);
}
- if (maskptr) *maskptr = ~0;
+ *maskptr = ~0;
} else if (cptr->info->type == pvr2_ctl_bitmask) {
ret = parse_tlist(
ptr,len,maskptr,valptr,
dbg("ctx->out_order_1p= %d", ctx->out_order_1p);
}
+static void fimc_prepare_dma_offset(struct fimc_ctx *ctx, struct fimc_frame *f)
+{
+ struct samsung_fimc_variant *variant = ctx->fimc_dev->variant;
+
+ f->dma_offset.y_h = f->offs_h;
+ if (!variant->pix_hoff)
+ f->dma_offset.y_h *= (f->fmt->depth >> 3);
+
+ f->dma_offset.y_v = f->offs_v;
+
+ f->dma_offset.cb_h = f->offs_h;
+ f->dma_offset.cb_v = f->offs_v;
+
+ f->dma_offset.cr_h = f->offs_h;
+ f->dma_offset.cr_v = f->offs_v;
+
+ if (!variant->pix_hoff) {
+ if (f->fmt->planes_cnt == 3) {
+ f->dma_offset.cb_h >>= 1;
+ f->dma_offset.cr_h >>= 1;
+ }
+ if (f->fmt->color == S5P_FIMC_YCBCR420) {
+ f->dma_offset.cb_v >>= 1;
+ f->dma_offset.cr_v >>= 1;
+ }
+ }
+
+ dbg("in_offset: color= %d, y_h= %d, y_v= %d",
+ f->fmt->color, f->dma_offset.y_h, f->dma_offset.y_v);
+}
+
/**
* fimc_prepare_config - check dimensions, operation and color mode
* and pre-calculate offset and the scaling coefficients.
{
struct fimc_frame *s_frame, *d_frame;
struct fimc_vid_buffer *buf = NULL;
- struct samsung_fimc_variant *variant = ctx->fimc_dev->variant;
int ret = 0;
s_frame = &ctx->s_frame;
swap(d_frame->width, d_frame->height);
}
- /* Prepare the output offset ratios for scaler. */
- d_frame->dma_offset.y_h = d_frame->offs_h;
- if (!variant->pix_hoff)
- d_frame->dma_offset.y_h *= (d_frame->fmt->depth >> 3);
-
- d_frame->dma_offset.y_v = d_frame->offs_v;
-
- d_frame->dma_offset.cb_h = d_frame->offs_h;
- d_frame->dma_offset.cb_v = d_frame->offs_v;
-
- d_frame->dma_offset.cr_h = d_frame->offs_h;
- d_frame->dma_offset.cr_v = d_frame->offs_v;
+ /* Prepare the DMA offset ratios for scaler. */
+ fimc_prepare_dma_offset(ctx, &ctx->s_frame);
+ fimc_prepare_dma_offset(ctx, &ctx->d_frame);
- if (!variant->pix_hoff && d_frame->fmt->planes_cnt == 3) {
- d_frame->dma_offset.cb_h >>= 1;
- d_frame->dma_offset.cb_v >>= 1;
- d_frame->dma_offset.cr_h >>= 1;
- d_frame->dma_offset.cr_v >>= 1;
- }
-
- dbg("out offset: color= %d, y_h= %d, y_v= %d",
- d_frame->fmt->color,
- d_frame->dma_offset.y_h, d_frame->dma_offset.y_v);
-
- /* Prepare the input offset ratios for scaler. */
- s_frame->dma_offset.y_h = s_frame->offs_h;
- if (!variant->pix_hoff)
- s_frame->dma_offset.y_h *= (s_frame->fmt->depth >> 3);
- s_frame->dma_offset.y_v = s_frame->offs_v;
-
- s_frame->dma_offset.cb_h = s_frame->offs_h;
- s_frame->dma_offset.cb_v = s_frame->offs_v;
-
- s_frame->dma_offset.cr_h = s_frame->offs_h;
- s_frame->dma_offset.cr_v = s_frame->offs_v;
-
- if (!variant->pix_hoff && s_frame->fmt->planes_cnt == 3) {
- s_frame->dma_offset.cb_h >>= 1;
- s_frame->dma_offset.cb_v >>= 1;
- s_frame->dma_offset.cr_h >>= 1;
- s_frame->dma_offset.cr_v >>= 1;
- }
-
- dbg("in offset: color= %d, y_h= %d, y_v= %d",
- s_frame->fmt->color, s_frame->dma_offset.y_h,
- s_frame->dma_offset.y_v);
-
- fimc_set_yuv_order(ctx);
-
- /* Check against the scaler ratio. */
if (s_frame->height > (SCALER_MAX_VRATIO * d_frame->height) ||
s_frame->width > (SCALER_MAX_HRATIO * d_frame->width)) {
err("out of scaler range");
return -EINVAL;
}
+ fimc_set_yuv_order(ctx);
}
/* Input DMA mode is not allowed when the scaler is disabled. */
} else {
v4l2_err(&ctx->fimc_dev->m2m.v4l2_dev,
"Wrong buffer/video queue type (%d)\n", f->type);
- return -EINVAL;
+ ret = -EINVAL;
+ goto s_fmt_out;
}
pix = &f->fmt.pix;
}
fimc->work_queue = create_workqueue(dev_name(&fimc->pdev->dev));
- if (!fimc->work_queue)
+ if (!fimc->work_queue) {
+ ret = -ENOMEM;
goto err_irq;
+ }
ret = fimc_register_m2m_device(fimc);
if (ret)
};
static struct samsung_fimc_variant fimc01_variant_s5pv210 = {
+ .pix_hoff = 1,
.has_inp_rot = 1,
.has_out_rot = 1,
.min_inp_pixsize = 16,
};
static struct samsung_fimc_variant fimc2_variant_s5pv210 = {
+ .pix_hoff = 1,
.min_inp_pixsize = 16,
.min_out_pixsize = 32,
},
[SAA7134_BOARD_BEHOLD_COLUMBUS_TVFM] = {
/* Beholder Intl. Ltd. 2008 */
- /*Dmitry Belimov <d.belimov@gmail.com> */
- .name = "Beholder BeholdTV Columbus TVFM",
+ /* Dmitry Belimov <d.belimov@gmail.com> */
+ .name = "Beholder BeholdTV Columbus TV/FM",
.audio_clock = 0x00187de7,
.tuner_type = TUNER_ALPS_TSBE5_PAL,
- .radio_type = UNSET,
- .tuner_addr = ADDR_UNSET,
- .radio_addr = ADDR_UNSET,
+ .radio_type = TUNER_TEA5767,
+ .tuner_addr = 0xc2 >> 1,
+ .radio_addr = 0xc0 >> 1,
.tda9887_conf = TDA9887_PRESENT,
.gpiomask = 0x000A8004,
.inputs = {{
int saa7164_buffer_dealloc(struct saa7164_tsport *port,
struct saa7164_buffer *buf)
{
- struct saa7164_dev *dev = port->dev;
+ struct saa7164_dev *dev;
- if ((buf == 0) || (port == 0))
+ if (!buf || !port)
return SAA_ERR_BAD_PARAMETER;
+ dev = port->dev;
dprintk(DBGLVL_BUF, "%s() deallocating buffer @ 0x%p\n", __func__, buf);
max(frame->dwFrameInterval[0],
frame->dwDefaultFrameInterval));
+ if (dev->quirks & UVC_QUIRK_RESTRICT_FRAME_RATE) {
+ frame->bFrameIntervalType = 1;
+ frame->dwFrameInterval[0] =
+ frame->dwDefaultFrameInterval;
+ }
+
uvc_trace(UVC_TRACE_DESCR, "- %ux%u (%u.%u fps)\n",
frame->wWidth, frame->wHeight,
10000000/frame->dwDefaultFrameInterval,
.bInterfaceClass = USB_CLASS_VENDOR_SPEC,
.bInterfaceSubClass = 1,
.bInterfaceProtocol = 0 },
+ /* Chicony CNF7129 (Asus EEE 100HE) */
+ { .match_flags = USB_DEVICE_ID_MATCH_DEVICE
+ | USB_DEVICE_ID_MATCH_INT_INFO,
+ .idVendor = 0x04f2,
+ .idProduct = 0xb071,
+ .bInterfaceClass = USB_CLASS_VIDEO,
+ .bInterfaceSubClass = 1,
+ .bInterfaceProtocol = 0,
+ .driver_info = UVC_QUIRK_RESTRICT_FRAME_RATE },
/* Alcor Micro AU3820 (Future Boy PC USB Webcam) */
{ .match_flags = USB_DEVICE_ID_MATCH_DEVICE
| USB_DEVICE_ID_MATCH_INT_INFO,
.bInterfaceProtocol = 0,
.driver_info = UVC_QUIRK_PROBE_MINMAX
| UVC_QUIRK_PROBE_DEF },
+ /* IMC Networks (Medion Akoya) */
+ { .match_flags = USB_DEVICE_ID_MATCH_DEVICE
+ | USB_DEVICE_ID_MATCH_INT_INFO,
+ .idVendor = 0x13d3,
+ .idProduct = 0x5103,
+ .bInterfaceClass = USB_CLASS_VIDEO,
+ .bInterfaceSubClass = 1,
+ .bInterfaceProtocol = 0,
+ .driver_info = UVC_QUIRK_STREAM_NO_FID },
/* Syntek (HP Spartan) */
{ .match_flags = USB_DEVICE_ID_MATCH_DEVICE
| USB_DEVICE_ID_MATCH_INT_INFO,
#define UVC_QUIRK_IGNORE_SELECTOR_UNIT 0x00000020
#define UVC_QUIRK_FIX_BANDWIDTH 0x00000080
#define UVC_QUIRK_PROBE_DEF 0x00000100
+#define UVC_QUIRK_RESTRICT_FRAME_RATE 0x00000200
/* Format flags */
#define UVC_FMT_FLAG_COMPRESSED 0x00000001
struct video_code32 {
char loadwhat[16]; /* name or tag of file being passed */
compat_int_t datasize;
- unsigned char *data;
+ compat_uptr_t data;
};
-static int get_microcode32(struct video_code *kp, struct video_code32 __user *up)
+static struct video_code __user *get_microcode32(struct video_code32 *kp)
{
- if (!access_ok(VERIFY_READ, up, sizeof(struct video_code32)) ||
- copy_from_user(kp->loadwhat, up->loadwhat, sizeof(up->loadwhat)) ||
- get_user(kp->datasize, &up->datasize) ||
- copy_from_user(kp->data, up->data, up->datasize))
- return -EFAULT;
- return 0;
+ struct video_code __user *up;
+
+ up = compat_alloc_user_space(sizeof(*up));
+
+ /*
+ * NOTE! We don't actually care if these fail. If the
+ * user address is invalid, the native ioctl will do
+ * the error handling for us
+ */
+ (void) copy_to_user(up->loadwhat, kp->loadwhat, sizeof(up->loadwhat));
+ (void) put_user(kp->datasize, &up->datasize);
+ (void) put_user(compat_ptr(kp->data), &up->data);
+ return up;
}
#define VIDIOCGTUNER32 _IOWR('v', 4, struct video_tuner32)
struct video_tuner vt;
struct video_buffer vb;
struct video_window vw;
- struct video_code vc;
+ struct video_code32 vc;
struct video_audio va;
#endif
struct v4l2_format v2f;
break;
case VIDIOCSMICROCODE:
- err = get_microcode32(&karg.vc, up);
- compatible_arg = 0;
+ /* Copy the 32-bit "video_code32" to kernel space */
+ if (copy_from_user(&karg.vc, up, sizeof(karg.vc)))
+ return -EFAULT;
+ /* Convert the 32-bit version to a 64-bit version in user space */
+ up = get_microcode32(&karg.vc);
break;
case VIDIOCSFREQ:
}
/* read() method */
- dma_free_coherent(q->dev, mem->size, mem->vaddr, mem->dma_handle);
- mem->vaddr = NULL;
+ if (mem->vaddr) {
+ dma_free_coherent(q->dev, mem->size, mem->vaddr, mem->dma_handle);
+ mem->vaddr = NULL;
+ }
}
EXPORT_SYMBOL_GPL(videobuf_dma_contig_free);
* must free the memory.
*/
static struct scatterlist *videobuf_pages_to_sg(struct page **pages,
- int nr_pages, int offset)
+ int nr_pages, int offset, size_t size)
{
struct scatterlist *sglist;
int i;
/* DMA to highmem pages might not work */
goto highmem;
sg_set_page(&sglist[0], pages[0], PAGE_SIZE - offset, offset);
+ size -= PAGE_SIZE - offset;
for (i = 1; i < nr_pages; i++) {
if (NULL == pages[i])
goto nopage;
if (PageHighMem(pages[i]))
goto highmem;
- sg_set_page(&sglist[i], pages[i], PAGE_SIZE, 0);
+ sg_set_page(&sglist[i], pages[i], min(PAGE_SIZE, size), 0);
+ size -= min(PAGE_SIZE, size);
}
return sglist;
first = (data & PAGE_MASK) >> PAGE_SHIFT;
last = ((data+size-1) & PAGE_MASK) >> PAGE_SHIFT;
- dma->offset = data & ~PAGE_MASK;
+ dma->offset = data & ~PAGE_MASK;
+ dma->size = size;
dma->nr_pages = last-first+1;
dma->pages = kmalloc(dma->nr_pages * sizeof(struct page *), GFP_KERNEL);
if (NULL == dma->pages)
if (dma->pages) {
dma->sglist = videobuf_pages_to_sg(dma->pages, dma->nr_pages,
- dma->offset);
+ dma->offset, dma->size);
}
if (dma->vaddr) {
dma->sglist = videobuf_vmalloc_to_sg(dma->vaddr,
irq_tsc = cache_tsc;
for (i = 0; i < ARRAY_SIZE(max8925_irqs); i++) {
irq_data = &max8925_irqs[i];
+ /* 1 -- disable, 0 -- enable */
switch (irq_data->mask_reg) {
case MAX8925_CHG_IRQ1_MASK:
- irq_chg[0] &= irq_data->enable;
+ irq_chg[0] &= ~irq_data->enable;
break;
case MAX8925_CHG_IRQ2_MASK:
- irq_chg[1] &= irq_data->enable;
+ irq_chg[1] &= ~irq_data->enable;
break;
case MAX8925_ON_OFF_IRQ1_MASK:
- irq_on[0] &= irq_data->enable;
+ irq_on[0] &= ~irq_data->enable;
break;
case MAX8925_ON_OFF_IRQ2_MASK:
- irq_on[1] &= irq_data->enable;
+ irq_on[1] &= ~irq_data->enable;
break;
case MAX8925_RTC_IRQ_MASK:
- irq_rtc &= irq_data->enable;
+ irq_rtc &= ~irq_data->enable;
break;
case MAX8925_TSC_IRQ_MASK:
- irq_tsc &= irq_data->enable;
+ irq_tsc &= ~irq_data->enable;
break;
default:
dev_err(chip->dev, "wrong IRQ\n");
irq = irq - wm831x->irq_base;
- if (irq < WM831X_IRQ_GPIO_1 || irq > WM831X_IRQ_GPIO_11)
- return -EINVAL;
+ if (irq < WM831X_IRQ_GPIO_1 || irq > WM831X_IRQ_GPIO_11) {
+ /* Ignore internal-only IRQs */
+ if (irq >= 0 && irq < WM831X_NUM_IRQS)
+ return 0;
+ else
+ return -EINVAL;
+ }
switch (type) {
case IRQ_TYPE_EDGE_BOTH:
If unsure, say N.
To compile this driver as a module, choose M here: the
- module will be called vmware_balloon.
+ module will be called vmw_balloon.
config ARM_CHARLCD
bool "ARM Ltd. Character LCD Driver"
obj-$(CONFIG_HMC6352) += hmc6352.o
obj-y += eeprom/
obj-y += cb710/
-obj-$(CONFIG_VMWARE_BALLOON) += vmware_balloon.o
+obj-$(CONFIG_VMWARE_BALLOON) += vmw_balloon.o
obj-$(CONFIG_ARM_CHARLCD) += arm-charlcd.o
ddata = i2c_get_clientdata(client);
sysfs_remove_group(&client->dev.kobj, &bh1780_attr_group);
- i2c_set_clientdata(client, NULL);
kfree(ddata);
return 0;
if (host->bus_ops && !host->bus_dead) {
if (host->bus_ops->suspend)
err = host->bus_ops->suspend(host);
+ if (err == -ENOSYS || !host->bus_ops->resume) {
+ /*
+ * We simply "remove" the card in this case.
+ * It will be redetected on resume.
+ */
+ if (host->bus_ops->remove)
+ host->bus_ops->remove(host);
+ mmc_claim_host(host);
+ mmc_detach_bus(host);
+ mmc_release_host(host);
+ host->pm_flags = 0;
+ err = 0;
+ }
}
mmc_bus_put(host);
goto err;
}
- err = mmc_sd_get_cid(host, host->ocr & ocr, card->raw_cid);
-
- if (!err) {
+ if (ocr & R4_MEMORY_PRESENT
+ && mmc_sd_get_cid(host, host->ocr & ocr, card->raw_cid) == 0) {
card->type = MMC_TYPE_SD_COMBO;
if (oldcard && (oldcard->type != MMC_TYPE_SD_COMBO ||
#include <linux/clk.h>
#include <linux/atmel_pdc.h>
#include <linux/gfp.h>
+#include <linux/highmem.h>
#include <linux/mmc/host.h>
while (delay--) {
reg = readw(host->base + MMC_REG_STATUS);
- if (reg & STATUS_CARD_BUS_CLK_RUN)
+ if (reg & STATUS_CARD_BUS_CLK_RUN) {
/* Check twice before cut */
reg = readw(host->base + MMC_REG_STATUS);
if (reg & STATUS_CARD_BUS_CLK_RUN)
return 0;
+ }
if (test_bit(IMXMCI_PEND_STARTED_b, &host->pending_events))
return 0;
int ret = 0;
struct platform_device *pdev = to_platform_device(dev);
struct omap_hsmmc_host *host = platform_get_drvdata(pdev);
- pm_message_t state = PMSG_SUSPEND; /* unused by MMC core */
if (host && host->suspended)
return 0;
}
}
cancel_work_sync(&host->mmc_carddetect_work);
- mmc_host_enable(host->mmc);
ret = mmc_suspend_host(host->mmc);
+ mmc_host_enable(host->mmc);
if (ret == 0) {
omap_hsmmc_disable_irq(host);
OMAP_HSMMC_WRITE(host->base, HCTL,
host->pio_active = XFER_NONE;
#ifdef CONFIG_MMC_S3C_PIODMA
- host->dodma = host->pdata->dma;
+ host->dodma = host->pdata->use_dma;
#endif
host->mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
static void sdhci_s3c_notify_change(struct platform_device *dev, int state)
{
struct sdhci_host *host = platform_get_drvdata(dev);
+ unsigned long flags;
+
if (host) {
- spin_lock(&host->lock);
+ spin_lock_irqsave(&host->lock, flags);
if (state) {
dev_dbg(&dev->dev, "card inserted.\n");
host->flags &= ~SDHCI_DEVICE_DEAD;
host->quirks &= ~SDHCI_QUIRK_BROKEN_CARD_DETECTION;
}
tasklet_schedule(&host->card_tasklet);
- spin_unlock(&host->lock);
+ spin_unlock_irqrestore(&host->lock, flags);
}
}
sdhci_remove_host(host, 1);
for (ptr = 0; ptr < 3; ptr++) {
- clk_disable(sc->clk_bus[ptr]);
- clk_put(sc->clk_bus[ptr]);
+ if (sc->clk_bus[ptr]) {
+ clk_disable(sc->clk_bus[ptr]);
+ clk_put(sc->clk_bus[ptr]);
+ }
}
clk_disable(sc->clk_io);
clk_put(sc->clk_io);
static void tmio_mmc_pio_irq(struct tmio_mmc_host *host)
{
struct mmc_data *data = host->data;
+ void *sg_virt;
unsigned short *buf;
unsigned int count;
unsigned long flags;
return;
}
- buf = (unsigned short *)(tmio_mmc_kmap_atomic(host, &flags) +
- host->sg_off);
+ sg_virt = tmio_mmc_kmap_atomic(host->sg_ptr, &flags);
+ buf = (unsigned short *)(sg_virt + host->sg_off);
count = host->sg_ptr->length - host->sg_off;
if (count > data->blksz)
host->sg_off += count;
- tmio_mmc_kunmap_atomic(host, &flags);
+ tmio_mmc_kunmap_atomic(sg_virt, &flags);
if (host->sg_off == host->sg_ptr->length)
tmio_mmc_next_sg(host);
#define ack_mmc_irqs(host, i) \
do { \
- u32 mask;\
- mask = sd_ctrl_read32((host), CTL_STATUS); \
- mask &= ~((i) & TMIO_MASK_IRQ); \
- sd_ctrl_write32((host), CTL_STATUS, mask); \
+ sd_ctrl_write32((host), CTL_STATUS, ~(i)); \
} while (0)
return --host->sg_len;
}
-static inline char *tmio_mmc_kmap_atomic(struct tmio_mmc_host *host,
+static inline char *tmio_mmc_kmap_atomic(struct scatterlist *sg,
unsigned long *flags)
{
- struct scatterlist *sg = host->sg_ptr;
-
local_irq_save(*flags);
return kmap_atomic(sg_page(sg), KM_BIO_SRC_IRQ) + sg->offset;
}
-static inline void tmio_mmc_kunmap_atomic(struct tmio_mmc_host *host,
+static inline void tmio_mmc_kunmap_atomic(void *virt,
unsigned long *flags)
{
- kunmap_atomic(sg_page(host->sg_ptr), KM_BIO_SRC_IRQ);
+ kunmap_atomic(virt, KM_BIO_SRC_IRQ);
local_irq_restore(*flags);
}
static int __devexit bf5xx_nand_remove(struct platform_device *pdev)
{
struct bf5xx_nand_info *info = to_nand_info(pdev);
- struct mtd_info *mtd = NULL;
platform_set_drvdata(pdev, NULL);
* and their partitions, then go through freeing the
* resources used
*/
- mtd = &info->mtd;
- if (mtd) {
- nand_release(mtd);
- kfree(mtd);
- }
+ nand_release(&info->mtd);
peripheral_free_list(bfin_nfc_pin_req);
bf5xx_nand_dma_remove(info);
struct nand_chip *chip = mtd->priv;
int ret;
- ret = nand_scan_ident(mtd, 1);
+ ret = nand_scan_ident(mtd, 1, NULL);
if (ret)
return ret;
#include <linux/clk.h>
#include <linux/err.h>
#include <linux/io.h>
+#include <linux/irq.h>
+#include <linux/completion.h>
#include <asm/mach/flash.h>
#include <mach/mxc_nand.h>
#define NFC_V1_V2_CONFIG1_BIG (1 << 5)
#define NFC_V1_V2_CONFIG1_RST (1 << 6)
#define NFC_V1_V2_CONFIG1_CE (1 << 7)
-#define NFC_V1_V2_CONFIG1_ONE_CYCLE (1 << 8)
+#define NFC_V2_CONFIG1_ONE_CYCLE (1 << 8)
+#define NFC_V2_CONFIG1_PPB(x) (((x) & 0x3) << 9)
+#define NFC_V2_CONFIG1_FP_INT (1 << 11)
#define NFC_V1_V2_CONFIG2_INT (1 << 15)
int irq;
int eccsize;
- wait_queue_head_t irq_waitq;
+ struct completion op_completion;
uint8_t *data_buf;
unsigned int buf_start;
void (*send_read_id)(struct mxc_nand_host *);
uint16_t (*get_dev_status)(struct mxc_nand_host *);
int (*check_int)(struct mxc_nand_host *);
+ void (*irq_control)(struct mxc_nand_host *, int);
};
/* OOB placement block for use with hardware ecc generation */
{
struct mxc_nand_host *host = dev_id;
- disable_irq_nosync(irq);
+ if (!host->check_int(host))
+ return IRQ_NONE;
- wake_up(&host->irq_waitq);
+ host->irq_control(host, 0);
+
+ complete(&host->op_completion);
return IRQ_HANDLED;
}
if (!(tmp & NFC_V1_V2_CONFIG2_INT))
return 0;
- writew(tmp & ~NFC_V1_V2_CONFIG2_INT, NFC_V1_V2_CONFIG2);
+ if (!cpu_is_mx21())
+ writew(tmp & ~NFC_V1_V2_CONFIG2_INT, NFC_V1_V2_CONFIG2);
return 1;
}
+/*
+ * It has been observed that the i.MX21 cannot read the CONFIG2:INT bit
+ * if interrupts are masked (CONFIG1:INT_MSK is set). To handle this, the
+ * driver can enable/disable the irq line rather than simply masking the
+ * interrupts.
+ */
+static void irq_control_mx21(struct mxc_nand_host *host, int activate)
+{
+ if (activate)
+ enable_irq(host->irq);
+ else
+ disable_irq_nosync(host->irq);
+}
+
+static void irq_control_v1_v2(struct mxc_nand_host *host, int activate)
+{
+ uint16_t tmp;
+
+ tmp = readw(NFC_V1_V2_CONFIG1);
+
+ if (activate)
+ tmp &= ~NFC_V1_V2_CONFIG1_INT_MSK;
+ else
+ tmp |= NFC_V1_V2_CONFIG1_INT_MSK;
+
+ writew(tmp, NFC_V1_V2_CONFIG1);
+}
+
+static void irq_control_v3(struct mxc_nand_host *host, int activate)
+{
+ uint32_t tmp;
+
+ tmp = readl(NFC_V3_CONFIG2);
+
+ if (activate)
+ tmp &= ~NFC_V3_CONFIG2_INT_MSK;
+ else
+ tmp |= NFC_V3_CONFIG2_INT_MSK;
+
+ writel(tmp, NFC_V3_CONFIG2);
+}
+
/* This function polls the NANDFC to wait for the basic operation to
* complete by checking the INT bit of config2 register.
*/
if (useirq) {
if (!host->check_int(host)) {
-
- enable_irq(host->irq);
-
- wait_event(host->irq_waitq, host->check_int(host));
+ INIT_COMPLETION(host->op_completion);
+ host->irq_control(host, 1);
+ wait_for_completion(&host->op_completion);
}
} else {
while (max_retries-- > 0) {
/* Wait for operation to complete */
wait_op_done(host, true);
+ memcpy(host->data_buf, host->main_area0, 16);
+
if (this->options & NAND_BUSWIDTH_16) {
- void __iomem *main_buf = host->main_area0;
/* compress the ID info */
- writeb(readb(main_buf + 2), main_buf + 1);
- writeb(readb(main_buf + 4), main_buf + 2);
- writeb(readb(main_buf + 6), main_buf + 3);
- writeb(readb(main_buf + 8), main_buf + 4);
- writeb(readb(main_buf + 10), main_buf + 5);
+ host->data_buf[1] = host->data_buf[2];
+ host->data_buf[2] = host->data_buf[4];
+ host->data_buf[3] = host->data_buf[6];
+ host->data_buf[4] = host->data_buf[8];
+ host->data_buf[5] = host->data_buf[10];
}
- memcpy(host->data_buf, host->main_area0, 16);
}
static uint16_t get_dev_status_v3(struct mxc_nand_host *host)
{
struct nand_chip *nand_chip = mtd->priv;
struct mxc_nand_host *host = nand_chip->priv;
- uint16_t tmp;
+ uint16_t config1 = 0;
- /* enable interrupt, disable spare enable */
- tmp = readw(NFC_V1_V2_CONFIG1);
- tmp &= ~NFC_V1_V2_CONFIG1_INT_MSK;
- tmp &= ~NFC_V1_V2_CONFIG1_SP_EN;
- if (nand_chip->ecc.mode == NAND_ECC_HW) {
- tmp |= NFC_V1_V2_CONFIG1_ECC_EN;
- } else {
- tmp &= ~NFC_V1_V2_CONFIG1_ECC_EN;
- }
+ if (nand_chip->ecc.mode == NAND_ECC_HW)
+ config1 |= NFC_V1_V2_CONFIG1_ECC_EN;
+
+ if (nfc_is_v21())
+ config1 |= NFC_V2_CONFIG1_FP_INT;
+
+ if (!cpu_is_mx21())
+ config1 |= NFC_V1_V2_CONFIG1_INT_MSK;
if (nfc_is_v21() && mtd->writesize) {
+ uint16_t pages_per_block = mtd->erasesize / mtd->writesize;
+
host->eccsize = get_eccsize(mtd);
if (host->eccsize == 4)
- tmp |= NFC_V2_CONFIG1_ECC_MODE_4;
+ config1 |= NFC_V2_CONFIG1_ECC_MODE_4;
+
+ config1 |= NFC_V2_CONFIG1_PPB(ffs(pages_per_block) - 6);
} else {
host->eccsize = 1;
}
- writew(tmp, NFC_V1_V2_CONFIG1);
+ writew(config1, NFC_V1_V2_CONFIG1);
/* preset operation */
/* Unlock the internal RAM Buffer */
NFC_V3_CONFIG2_2CMD_PHASES |
NFC_V3_CONFIG2_SPAS(mtd->oobsize >> 1) |
NFC_V3_CONFIG2_ST_CMD(0x70) |
+ NFC_V3_CONFIG2_INT_MSK |
NFC_V3_CONFIG2_NUM_ADDR_PHASE0;
if (chip->ecc.mode == NAND_ECC_HW)
host->send_read_id = send_read_id_v1_v2;
host->get_dev_status = get_dev_status_v1_v2;
host->check_int = check_int_v1_v2;
+ if (cpu_is_mx21())
+ host->irq_control = irq_control_mx21;
+ else
+ host->irq_control = irq_control_v1_v2;
}
if (nfc_is_v21()) {
host->send_read_id = send_read_id_v3;
host->check_int = check_int_v3;
host->get_dev_status = get_dev_status_v3;
+ host->irq_control = irq_control_v3;
oob_smallpage = &nandv2_hw_eccoob_smallpage;
oob_largepage = &nandv2_hw_eccoob_largepage;
} else
this->options |= NAND_USE_FLASH_BBT;
}
- init_waitqueue_head(&host->irq_waitq);
+ init_completion(&host->op_completion);
host->irq = platform_get_irq(pdev, 0);
+ /*
+ * mask the interrupt. For i.MX21 explicitely call
+ * irq_control_v1_v2 to use the mask bit. We can't call
+ * disable_irq_nosync() for an interrupt we do not own yet.
+ */
+ if (cpu_is_mx21())
+ irq_control_v1_v2(host, 0);
+ else
+ host->irq_control(host, 0);
+
err = request_irq(host->irq, mxc_nfc_irq, IRQF_DISABLED, DRIVER_NAME, host);
if (err)
goto eirq;
+ host->irq_control(host, 0);
+
+ /*
+ * Now that the interrupt is disabled make sure the interrupt
+ * mask bit is cleared on i.MX21. Otherwise we can't read
+ * the interrupt status bit on this machine.
+ */
+ if (cpu_is_mx21())
+ irq_control_v1_v2(host, 1);
+
/* first scan to find the device and get the page size */
if (nand_scan_ident(mtd, 1, NULL)) {
err = -ENXIO;
prefetch_status = gpmc_read_status(GPMC_PREFETCH_COUNT);
} while (prefetch_status);
/* disable and stop the PFPW engine */
- gpmc_prefetch_reset();
+ gpmc_prefetch_reset(info->gpmc_cs);
dma_unmap_single(&info->pdev->dev, dma_addr, len, dir);
return 0;
goto fail_free_irq;
}
+#ifdef CONFIG_MTD_PARTITIONS
if (mtd_has_cmdlinepart()) {
static const char *probes[] = { "cmdlinepart", NULL };
struct mtd_partition *parts;
}
return add_mtd_partitions(mtd, pdata->parts, pdata->nr_parts);
+#else
+ return 0;
+#endif
fail_free_irq:
free_irq(irq, info);
platform_set_drvdata(pdev, NULL);
del_mtd_device(mtd);
+#ifdef CONFIG_MTD_PARTITIONS
del_mtd_partitions(mtd);
+#endif
irq = platform_get_irq(pdev, 0);
if (irq >= 0)
free_irq(irq, info);
do {
status = readl(base + S5PC110_DMA_TRANS_STATUS);
+ if (status & S5PC110_DMA_TRANS_STATUS_TE) {
+ writel(S5PC110_DMA_TRANS_CMD_TEC,
+ base + S5PC110_DMA_TRANS_CMD);
+ return -EIO;
+ }
} while (!(status & S5PC110_DMA_TRANS_STATUS_TD));
- if (status & S5PC110_DMA_TRANS_STATUS_TE) {
- writel(S5PC110_DMA_TRANS_CMD_TEC, base + S5PC110_DMA_TRANS_CMD);
- writel(S5PC110_DMA_TRANS_CMD_TDC, base + S5PC110_DMA_TRANS_CMD);
- return -EIO;
- }
-
writel(S5PC110_DMA_TRANS_CMD_TDC, base + S5PC110_DMA_TRANS_CMD);
return 0;
unsigned char *buffer, int offset, size_t count)
{
struct onenand_chip *this = mtd->priv;
- void __iomem *bufferram;
void __iomem *p;
void *buf = (void *) buffer;
dma_addr_t dma_src, dma_dst;
int err;
- p = bufferram = this->base + area;
+ p = this->base + area;
if (ONENAND_CURRENT_BUFFERRAM(this)) {
if (area == ONENAND_DATARAM)
p += this->writesize;
normal:
if (count != mtd->writesize) {
/* Copy the bufferram to memory to prevent unaligned access */
- memcpy(this->page_buf, bufferram, mtd->writesize);
+ memcpy(this->page_buf, p, mtd->writesize);
p = this->page_buf + offset;
}
lp->tx_len = lp->exec_box->data[9]; /* Transmit list count */
lp->rx_len = lp->exec_box->data[11]; /* Receive list count */
- init_MUTEX_LOCKED(&lp->cmd_mutex);
+ sema_init(&lp->cmd_mutex, 0);
init_completion(&lp->execution_cmd);
init_completion(&lp->xceiver_cmd);
must_free_region:1, /* Flag: if zero, Cardbus owns the I/O region */
large_frames:1, /* accept large frames */
handling_irq:1; /* private in_irq indicator */
+ /* {get|set}_wol operations are already serialized by rtnl.
+ * no additional locking is required for the enable_wol and acpi_set_WOL()
+ */
int drv_flags;
u16 status_enable;
u16 intr_enable;
}
}
- if (status & RxEarly) { /* Rx early is unused. */
- vortex_rx(dev);
+ if (status & RxEarly) /* Rx early is unused. */
iowrite16(AckIntr | RxEarly, ioaddr + EL3_CMD);
- }
+
if (status & StatsFull) { /* Empty statistics. */
static int DoneDidThat;
if (vortex_debug > 4)
if (status & (HostError | RxEarly | StatsFull | TxComplete | IntReq)) {
if (status == 0xffff)
break;
+ if (status & RxEarly)
+ vortex_rx(dev);
+ spin_unlock(&vp->window_lock);
vortex_error(dev, status);
+ spin_lock(&vp->window_lock);
+ window_set(vp, 7);
}
if (--work_done < 0) {
{
struct vortex_private *vp = netdev_priv(dev);
- spin_lock_irq(&vp->lock);
+ if (!VORTEX_PCI(vp))
+ return;
+
wol->supported = WAKE_MAGIC;
wol->wolopts = 0;
if (vp->enable_wol)
wol->wolopts |= WAKE_MAGIC;
- spin_unlock_irq(&vp->lock);
}
static int vortex_set_wol(struct net_device *dev, struct ethtool_wolinfo *wol)
{
struct vortex_private *vp = netdev_priv(dev);
+
+ if (!VORTEX_PCI(vp))
+ return -EOPNOTSUPP;
+
if (wol->wolopts & ~WAKE_MAGIC)
return -EINVAL;
- spin_lock_irq(&vp->lock);
if (wol->wolopts & WAKE_MAGIC)
vp->enable_wol = 1;
else
vp->enable_wol = 0;
acpi_set_WOL(dev);
- spin_unlock_irq(&vp->lock);
return 0;
}
return;
}
+ if (VORTEX_PCI(vp)->current_state < PCI_D3hot)
+ return;
+
/* Change the power state to D3; RxEnable doesn't take effect. */
pci_set_power_state(VORTEX_PCI(vp), PCI_D3hot);
}
config MV643XX_ETH
tristate "Marvell Discovery (643XX) and Orion ethernet support"
- depends on MV64X60 || PPC32 || PLAT_ORION
+ depends on (MV64X60 || PPC32 || PLAT_ORION) && INET
select INET_LRO
select PHYLIB
help
config PASEMI_MAC
tristate "PA Semi 1/10Gbit MAC"
- depends on PPC_PASEMI && PCI
+ depends on PPC_PASEMI && PCI && INET
select PHYLIB
select INET_LRO
help
rrd_ring->desc = NULL;
rrd_ring->dma = 0;
+
+ adapter->cmb.dma = 0;
+ adapter->cmb.cmb = NULL;
+
+ adapter->smb.dma = 0;
+ adapter->smb.smb = NULL;
}
static void atl1_setup_mac_ctrl(struct atl1_adapter *adapter)
pci_enable_wake(pdev, PCI_D3cold, 0);
atl1_reset_hw(&adapter->hw);
- adapter->cmb.cmb->int_stats = 0;
- if (netif_running(netdev))
+ if (netif_running(netdev)) {
+ adapter->cmb.cmb->int_stats = 0;
atl1_up(adapter);
+ }
netif_device_attach(netdev);
return 0;
b44_tx(bp);
/* spin_unlock(&bp->tx_lock); */
}
+ if (bp->istat & ISTAT_RFO) { /* fast recovery, in ~20msec */
+ bp->istat &= ~ISTAT_RFO;
+ b44_disable_ints(bp);
+ ssb_device_enable(bp->sdev, 0); /* resets ISTAT_RFO */
+ b44_init_rings(bp);
+ b44_init_hw(bp, B44_FULL_RESET_SKIP_PHY);
+ netif_wake_queue(bp->dev);
+ }
+
spin_unlock_irqrestore(&bp->lock, flags);
work_done = 0;
dev->irq = sdev->irq;
SET_ETHTOOL_OPS(dev, &b44_ethtool_ops);
- netif_carrier_off(dev);
-
err = ssb_bus_powerup(sdev->bus, 0);
if (err) {
dev_err(sdev->dev,
goto err_out_powerdown;
}
+ netif_carrier_off(dev);
+
ssb_set_drvdata(sdev, dev);
/* Chip reset provides power to the b44 MAC & PCI cores, which
u64 be_rx_bytes_prev;
u64 be_rx_pkts;
u32 be_rx_rate;
+ u32 be_rx_mcast_pkt;
/* number of non ether type II frames dropped where
* frame len > length field of Mac Hdr */
u32 be_802_3_dropped_frames;
while ((compl = be_mcc_compl_get(adapter))) {
if (compl->flags & CQE_FLAGS_ASYNC_MASK) {
/* Interpret flags as an async trailer */
- BUG_ON(!is_link_state_evt(compl->flags));
-
- /* Interpret compl as a async link evt */
- be_async_link_state_process(adapter,
+ if (is_link_state_evt(compl->flags))
+ be_async_link_state_process(adapter,
(struct be_async_event_link_state *) compl);
} else if (compl->flags & CQE_FLAGS_COMPLETED_MASK) {
*status = be_mcc_compl_process(adapter, compl);
if (msecs > 4000) {
dev_err(&adapter->pdev->dev, "mbox poll timed out\n");
- be_dump_ue(adapter);
+ be_detect_dump_ue(adapter);
return -1;
}
extern int be_cmd_get_phy_info(struct be_adapter *adapter,
struct be_dma_mem *cmd);
extern int be_cmd_set_qos(struct be_adapter *adapter, u32 bps, u32 domain);
-extern void be_dump_ue(struct be_adapter *adapter);
+extern void be_detect_dump_ue(struct be_adapter *adapter);
{DRVSTAT_INFO(be_rx_events)},
{DRVSTAT_INFO(be_tx_compl)},
{DRVSTAT_INFO(be_rx_compl)},
+ {DRVSTAT_INFO(be_rx_mcast_pkt)},
{DRVSTAT_INFO(be_ethrx_post_fail)},
{DRVSTAT_INFO(be_802_3_dropped_frames)},
{DRVSTAT_INFO(be_802_3_malformed_frames)},
#define FLASH_FCoE_BIOS_START_g3 (13631488)
#define FLASH_REDBOOT_START_g3 (262144)
-
-
+/************* Rx Packet Type Encoding **************/
+#define BE_UNICAST_PACKET 0
+#define BE_MULTICAST_PACKET 1
+#define BE_BROADCAST_PACKET 2
+#define BE_RSVD_PACKET 3
/*
* BE descriptors: host memory data structures whose formats
dev_stats->tx_packets = drvr_stats(adapter)->be_tx_pkts;
dev_stats->rx_bytes = drvr_stats(adapter)->be_rx_bytes;
dev_stats->tx_bytes = drvr_stats(adapter)->be_tx_bytes;
+ dev_stats->multicast = drvr_stats(adapter)->be_rx_mcast_pkt;
/* bad pkts received */
dev_stats->rx_errors = port_stats->rx_crc_errors +
/* no space available in linux */
dev_stats->tx_dropped = 0;
- dev_stats->multicast = port_stats->rx_multicast_frames;
dev_stats->collisions = 0;
/* detailed tx_errors */
}
static void be_rx_stats_update(struct be_adapter *adapter,
- u32 pktsize, u16 numfrags)
+ u32 pktsize, u16 numfrags, u8 pkt_type)
{
struct be_drvr_stats *stats = drvr_stats(adapter);
stats->be_rx_frags += numfrags;
stats->be_rx_bytes += pktsize;
stats->be_rx_pkts++;
+
+ if (pkt_type == BE_MULTICAST_PACKET)
+ stats->be_rx_mcast_pkt++;
}
static inline bool do_pkt_csum(struct be_eth_rx_compl *rxcp, bool cso)
u16 rxq_idx, i, j;
u32 pktsize, hdr_len, curr_frag_len, size;
u8 *start;
+ u8 pkt_type;
rxq_idx = AMAP_GET_BITS(struct amap_eth_rx_compl, fragndx, rxcp);
pktsize = AMAP_GET_BITS(struct amap_eth_rx_compl, pktsize, rxcp);
+ pkt_type = AMAP_GET_BITS(struct amap_eth_rx_compl, cast_enc, rxcp);
page_info = get_rx_page_info(adapter, rxq_idx);
BUG_ON(j > MAX_SKB_FRAGS);
done:
- be_rx_stats_update(adapter, pktsize, num_rcvd);
+ be_rx_stats_update(adapter, pktsize, num_rcvd, pkt_type);
}
/* Process the RX completion indicated by rxcp when GRO is disabled */
u32 num_rcvd, pkt_size, remaining, vlanf, curr_frag_len;
u16 i, rxq_idx = 0, vid, j;
u8 vtm;
+ u8 pkt_type;
num_rcvd = AMAP_GET_BITS(struct amap_eth_rx_compl, numfrags, rxcp);
/* Is it a flush compl that has no data */
vlanf = AMAP_GET_BITS(struct amap_eth_rx_compl, vtp, rxcp);
rxq_idx = AMAP_GET_BITS(struct amap_eth_rx_compl, fragndx, rxcp);
vtm = AMAP_GET_BITS(struct amap_eth_rx_compl, vtm, rxcp);
+ pkt_type = AMAP_GET_BITS(struct amap_eth_rx_compl, cast_enc, rxcp);
/* vlanf could be wrongly set in some cards.
* ignore if vtm is not set */
vlan_gro_frags(&eq_obj->napi, adapter->vlan_grp, vid);
}
- be_rx_stats_update(adapter, pkt_size, num_rcvd);
+ be_rx_stats_update(adapter, pkt_size, num_rcvd, pkt_type);
}
static struct be_eth_rx_compl *be_rx_compl_get(struct be_adapter *adapter)
return 1;
}
-static inline bool be_detect_ue(struct be_adapter *adapter)
-{
- u32 online0 = 0, online1 = 0;
-
- pci_read_config_dword(adapter->pdev, PCICFG_ONLINE0, &online0);
-
- pci_read_config_dword(adapter->pdev, PCICFG_ONLINE1, &online1);
-
- if (!online0 || !online1) {
- adapter->ue_detected = true;
- dev_err(&adapter->pdev->dev,
- "UE Detected!! online0=%d online1=%d\n",
- online0, online1);
- return true;
- }
-
- return false;
-}
-
-void be_dump_ue(struct be_adapter *adapter)
+void be_detect_dump_ue(struct be_adapter *adapter)
{
u32 ue_status_lo, ue_status_hi, ue_status_lo_mask, ue_status_hi_mask;
u32 i;
ue_status_lo = (ue_status_lo & (~ue_status_lo_mask));
ue_status_hi = (ue_status_hi & (~ue_status_hi_mask));
+ if (ue_status_lo || ue_status_hi) {
+ adapter->ue_detected = true;
+ dev_err(&adapter->pdev->dev, "UE Detected!!\n");
+ }
+
if (ue_status_lo) {
for (i = 0; ue_status_lo; ue_status_lo >>= 1, i++) {
if (ue_status_lo & 1)
adapter->rx_post_starved = false;
be_post_rx_frags(adapter);
}
- if (!adapter->ue_detected) {
- if (be_detect_ue(adapter))
- be_dump_ue(adapter);
- }
+ if (!adapter->ue_detected)
+ be_detect_dump_ue(adapter);
schedule_delayed_work(&adapter->work, msecs_to_jiffies(1000));
}
if (!(dev->flags & IFF_MASTER))
goto out;
+ if (!pskb_may_pull(skb, sizeof(struct lacpdu)))
+ goto out;
+
read_lock(&bond->lock);
slave = bond_get_slave_by_dev((struct bonding *)netdev_priv(dev),
orig_dev);
goto out;
}
+ if (!pskb_may_pull(skb, arp_hdr_len(bond_dev)))
+ goto out;
+
if (skb->len < sizeof(struct arp_pkt)) {
pr_debug("Packet is too small to be an ARP\n");
goto out;
* so it can wait
*/
bond_for_each_slave(bond, slave, i) {
+ unsigned long trans_start = dev_trans_start(slave->dev);
+
if (slave->link != BOND_LINK_UP) {
- if (time_before_eq(jiffies, dev_trans_start(slave->dev) + delta_in_ticks) &&
- time_before_eq(jiffies, slave->dev->last_rx + delta_in_ticks)) {
+ if (time_in_range(jiffies,
+ trans_start - delta_in_ticks,
+ trans_start + delta_in_ticks) &&
+ time_in_range(jiffies,
+ slave->dev->last_rx - delta_in_ticks,
+ slave->dev->last_rx + delta_in_ticks)) {
slave->link = BOND_LINK_UP;
slave->state = BOND_STATE_ACTIVE;
* when the source ip is 0, so don't take the link down
* if we don't know our ip yet
*/
- if (time_after_eq(jiffies, dev_trans_start(slave->dev) + 2*delta_in_ticks) ||
- (time_after_eq(jiffies, slave->dev->last_rx + 2*delta_in_ticks))) {
+ if (!time_in_range(jiffies,
+ trans_start - delta_in_ticks,
+ trans_start + 2 * delta_in_ticks) ||
+ !time_in_range(jiffies,
+ slave->dev->last_rx - delta_in_ticks,
+ slave->dev->last_rx + 2 * delta_in_ticks)) {
slave->link = BOND_LINK_DOWN;
slave->state = BOND_STATE_BACKUP;
{
struct slave *slave;
int i, commit = 0;
+ unsigned long trans_start;
bond_for_each_slave(bond, slave, i) {
slave->new_link = BOND_LINK_NOCHANGE;
if (slave->link != BOND_LINK_UP) {
- if (time_before_eq(jiffies, slave_last_rx(bond, slave) +
- delta_in_ticks)) {
+ if (time_in_range(jiffies,
+ slave_last_rx(bond, slave) - delta_in_ticks,
+ slave_last_rx(bond, slave) + delta_in_ticks)) {
+
slave->new_link = BOND_LINK_UP;
commit++;
}
* active. This avoids bouncing, as the last receive
* times need a full ARP monitor cycle to be updated.
*/
- if (!time_after_eq(jiffies, slave->jiffies +
- 2 * delta_in_ticks))
+ if (time_in_range(jiffies,
+ slave->jiffies - delta_in_ticks,
+ slave->jiffies + 2 * delta_in_ticks))
continue;
/*
*/
if (slave->state == BOND_STATE_BACKUP &&
!bond->current_arp_slave &&
- time_after(jiffies, slave_last_rx(bond, slave) +
- 3 * delta_in_ticks)) {
+ !time_in_range(jiffies,
+ slave_last_rx(bond, slave) - delta_in_ticks,
+ slave_last_rx(bond, slave) + 3 * delta_in_ticks)) {
+
slave->new_link = BOND_LINK_DOWN;
commit++;
}
* - (more than 2*delta since receive AND
* the bond has an IP address)
*/
+ trans_start = dev_trans_start(slave->dev);
if ((slave->state == BOND_STATE_ACTIVE) &&
- (time_after_eq(jiffies, dev_trans_start(slave->dev) +
- 2 * delta_in_ticks) ||
- (time_after_eq(jiffies, slave_last_rx(bond, slave)
- + 2 * delta_in_ticks)))) {
+ (!time_in_range(jiffies,
+ trans_start - delta_in_ticks,
+ trans_start + 2 * delta_in_ticks) ||
+ !time_in_range(jiffies,
+ slave_last_rx(bond, slave) - delta_in_ticks,
+ slave_last_rx(bond, slave) + 2 * delta_in_ticks))) {
+
slave->new_link = BOND_LINK_DOWN;
commit++;
}
{
struct slave *slave;
int i;
+ unsigned long trans_start;
bond_for_each_slave(bond, slave, i) {
switch (slave->new_link) {
continue;
case BOND_LINK_UP:
+ trans_start = dev_trans_start(slave->dev);
if ((!bond->curr_active_slave &&
- time_before_eq(jiffies,
- dev_trans_start(slave->dev) +
- delta_in_ticks)) ||
+ time_in_range(jiffies,
+ trans_start - delta_in_ticks,
+ trans_start + delta_in_ticks)) ||
bond->curr_active_slave != slave) {
slave->link = BOND_LINK_UP;
bond->current_arp_slave = NULL;
res = dev_alloc_name(bond_dev, "bond%d");
if (res < 0)
goto out;
+ } else {
+ /*
+ * If we're given a name to register
+ * we need to ensure that its not already
+ * registered
+ */
+ res = -EEXIST;
+ if (__dev_get_by_name(net, name) != NULL)
+ goto out;
}
res = register_netdevice(bond_dev);
case CHELSIO_GET_QSET_NUM:{
struct ch_reg edata;
+ memset(&edata, 0, sizeof(struct ch_reg));
+
edata.cmd = CHELSIO_GET_QSET_NUM;
edata.val = pi->nqsets;
if (copy_to_user(useraddr, &edata, sizeof(edata)))
E1000_SCTL = 0x00024, /* SerDes Control - RW */
E1000_FCAL = 0x00028, /* Flow Control Address Low - RW */
E1000_FCAH = 0x0002C, /* Flow Control Address High -RW */
+ E1000_FEXTNVM4 = 0x00024, /* Future Extended NVM 4 - RW */
E1000_FEXTNVM = 0x00028, /* Future Extended NVM - RW */
E1000_FCT = 0x00030, /* Flow Control Type - RW */
E1000_VET = 0x00038, /* VLAN Ether Type - RW */
#define E1000_FEXTNVM_SW_CONFIG 1
#define E1000_FEXTNVM_SW_CONFIG_ICH8M (1 << 27) /* Bit redefined for ICH8M :/ */
+#define E1000_FEXTNVM4_BEACON_DURATION_MASK 0x7
+#define E1000_FEXTNVM4_BEACON_DURATION_8USEC 0x7
+#define E1000_FEXTNVM4_BEACON_DURATION_16USEC 0x3
+
#define PCIE_ICH8_SNOOP_ALL PCIE_NO_SNOOP_ALL
#define E1000_ICH_RAR_ENTRIES 7
/* SMBus Address Phy Register */
#define HV_SMB_ADDR PHY_REG(768, 26)
+#define HV_SMB_ADDR_MASK 0x007F
#define HV_SMB_ADDR_PEC_EN 0x0200
#define HV_SMB_ADDR_VALID 0x0080
static s32 e1000_set_mdio_slow_mode_hv(struct e1000_hw *hw);
static bool e1000_check_mng_mode_ich8lan(struct e1000_hw *hw);
static bool e1000_check_mng_mode_pchlan(struct e1000_hw *hw);
+static s32 e1000_k1_workaround_lv(struct e1000_hw *hw);
+static void e1000_gate_hw_phy_config_ich8lan(struct e1000_hw *hw, bool gate);
static inline u16 __er16flash(struct e1000_hw *hw, unsigned long reg)
{
static s32 e1000_init_phy_params_pchlan(struct e1000_hw *hw)
{
struct e1000_phy_info *phy = &hw->phy;
- u32 ctrl;
+ u32 ctrl, fwsm;
s32 ret_val = 0;
phy->addr = 1;
* disabled, then toggle the LANPHYPC Value bit to force
* the interconnect to PCIe mode.
*/
- if (!(er32(FWSM) & E1000_ICH_FWSM_FW_VALID)) {
+ fwsm = er32(FWSM);
+ if (!(fwsm & E1000_ICH_FWSM_FW_VALID)) {
ctrl = er32(CTRL);
ctrl |= E1000_CTRL_LANPHYPC_OVERRIDE;
ctrl &= ~E1000_CTRL_LANPHYPC_VALUE;
ctrl &= ~E1000_CTRL_LANPHYPC_OVERRIDE;
ew32(CTRL, ctrl);
msleep(50);
+
+ /*
+ * Gate automatic PHY configuration by hardware on
+ * non-managed 82579
+ */
+ if (hw->mac.type == e1000_pch2lan)
+ e1000_gate_hw_phy_config_ich8lan(hw, true);
}
/*
if (ret_val)
goto out;
+ /* Ungate automatic PHY configuration on non-managed 82579 */
+ if ((hw->mac.type == e1000_pch2lan) &&
+ !(fwsm & E1000_ICH_FWSM_FW_VALID)) {
+ msleep(10);
+ e1000_gate_hw_phy_config_ich8lan(hw, false);
+ }
+
phy->id = e1000_phy_unknown;
ret_val = e1000e_get_phy_id(hw);
if (ret_val)
if (mac->type == e1000_ich8lan)
e1000e_set_kmrn_lock_loss_workaround_ich8lan(hw, true);
- /* Disable PHY configuration by hardware, config by software */
- if (mac->type == e1000_pch2lan) {
- u32 extcnf_ctrl = er32(EXTCNF_CTRL);
-
- extcnf_ctrl |= E1000_EXTCNF_CTRL_GATE_PHY_CFG;
- ew32(EXTCNF_CTRL, extcnf_ctrl);
- }
+ /* Gate automatic PHY configuration by hardware on managed 82579 */
+ if ((mac->type == e1000_pch2lan) &&
+ (er32(FWSM) & E1000_ICH_FWSM_FW_VALID))
+ e1000_gate_hw_phy_config_ich8lan(hw, true);
return 0;
}
goto out;
}
+ if (hw->mac.type == e1000_pch2lan) {
+ ret_val = e1000_k1_workaround_lv(hw);
+ if (ret_val)
+ goto out;
+ }
+
/*
* Check if there was DownShift, must be checked
* immediately after link-up
}
/**
+ * e1000_write_smbus_addr - Write SMBus address to PHY needed during Sx states
+ * @hw: pointer to the HW structure
+ *
+ * Assumes semaphore already acquired.
+ *
+ **/
+static s32 e1000_write_smbus_addr(struct e1000_hw *hw)
+{
+ u16 phy_data;
+ u32 strap = er32(STRAP);
+ s32 ret_val = 0;
+
+ strap &= E1000_STRAP_SMBUS_ADDRESS_MASK;
+
+ ret_val = e1000_read_phy_reg_hv_locked(hw, HV_SMB_ADDR, &phy_data);
+ if (ret_val)
+ goto out;
+
+ phy_data &= ~HV_SMB_ADDR_MASK;
+ phy_data |= (strap >> E1000_STRAP_SMBUS_ADDRESS_SHIFT);
+ phy_data |= HV_SMB_ADDR_PEC_EN | HV_SMB_ADDR_VALID;
+ ret_val = e1000_write_phy_reg_hv_locked(hw, HV_SMB_ADDR, phy_data);
+
+out:
+ return ret_val;
+}
+
+/**
* e1000_sw_lcd_config_ich8lan - SW-based LCD Configuration
* @hw: pointer to the HW structure
*
**/
static s32 e1000_sw_lcd_config_ich8lan(struct e1000_hw *hw)
{
- struct e1000_adapter *adapter = hw->adapter;
struct e1000_phy_info *phy = &hw->phy;
u32 i, data, cnf_size, cnf_base_addr, sw_cfg_mask;
s32 ret_val = 0;
if (phy->type != e1000_phy_igp_3)
return ret_val;
- if (adapter->pdev->device == E1000_DEV_ID_ICH8_IGP_AMT) {
+ if ((hw->adapter->pdev->device == E1000_DEV_ID_ICH8_IGP_AMT) ||
+ (hw->adapter->pdev->device == E1000_DEV_ID_ICH8_IGP_C)) {
sw_cfg_mask = E1000_FEXTNVM_SW_CONFIG;
break;
}
cnf_base_addr = data & E1000_EXTCNF_CTRL_EXT_CNF_POINTER_MASK;
cnf_base_addr >>= E1000_EXTCNF_CTRL_EXT_CNF_POINTER_SHIFT;
- if (!(data & E1000_EXTCNF_CTRL_OEM_WRITE_ENABLE) &&
- ((hw->mac.type == e1000_pchlan) ||
- (hw->mac.type == e1000_pch2lan))) {
+ if ((!(data & E1000_EXTCNF_CTRL_OEM_WRITE_ENABLE) &&
+ (hw->mac.type == e1000_pchlan)) ||
+ (hw->mac.type == e1000_pch2lan)) {
/*
* HW configures the SMBus address and LEDs when the
* OEM and LCD Write Enable bits are set in the NVM.
* When both NVM bits are cleared, SW will configure
* them instead.
*/
- data = er32(STRAP);
- data &= E1000_STRAP_SMBUS_ADDRESS_MASK;
- reg_data = data >> E1000_STRAP_SMBUS_ADDRESS_SHIFT;
- reg_data |= HV_SMB_ADDR_PEC_EN | HV_SMB_ADDR_VALID;
- ret_val = e1000_write_phy_reg_hv_locked(hw, HV_SMB_ADDR,
- reg_data);
+ ret_val = e1000_write_smbus_addr(hw);
if (ret_val)
goto out;
goto out;
/* Enable jumbo frame workaround in the PHY */
- e1e_rphy(hw, PHY_REG(769, 20), &data);
- ret_val = e1e_wphy(hw, PHY_REG(769, 20), data & ~(1 << 14));
- if (ret_val)
- goto out;
e1e_rphy(hw, PHY_REG(769, 23), &data);
data &= ~(0x7F << 5);
data |= (0x37 << 5);
goto out;
e1e_rphy(hw, PHY_REG(769, 16), &data);
data &= ~(1 << 13);
- data |= (1 << 12);
ret_val = e1e_wphy(hw, PHY_REG(769, 16), data);
if (ret_val)
goto out;
mac_reg = er32(RCTL);
mac_reg &= ~E1000_RCTL_SECRC;
- ew32(FFLT_DBG, mac_reg);
+ ew32(RCTL, mac_reg);
ret_val = e1000e_read_kmrn_reg(hw,
E1000_KMRNCTRLSTA_CTRL_OFFSET,
goto out;
/* Write PHY register values back to h/w defaults */
- e1e_rphy(hw, PHY_REG(769, 20), &data);
- ret_val = e1e_wphy(hw, PHY_REG(769, 20), data & ~(1 << 14));
- if (ret_val)
- goto out;
e1e_rphy(hw, PHY_REG(769, 23), &data);
data &= ~(0x7F << 5);
ret_val = e1e_wphy(hw, PHY_REG(769, 23), data);
if (ret_val)
goto out;
e1e_rphy(hw, PHY_REG(769, 16), &data);
- data &= ~(1 << 12);
data |= (1 << 13);
ret_val = e1e_wphy(hw, PHY_REG(769, 16), data);
if (ret_val)
}
/**
+ * e1000_k1_gig_workaround_lv - K1 Si workaround
+ * @hw: pointer to the HW structure
+ *
+ * Workaround to set the K1 beacon duration for 82579 parts
+ **/
+static s32 e1000_k1_workaround_lv(struct e1000_hw *hw)
+{
+ s32 ret_val = 0;
+ u16 status_reg = 0;
+ u32 mac_reg;
+
+ if (hw->mac.type != e1000_pch2lan)
+ goto out;
+
+ /* Set K1 beacon duration based on 1Gbps speed or otherwise */
+ ret_val = e1e_rphy(hw, HV_M_STATUS, &status_reg);
+ if (ret_val)
+ goto out;
+
+ if ((status_reg & (HV_M_STATUS_LINK_UP | HV_M_STATUS_AUTONEG_COMPLETE))
+ == (HV_M_STATUS_LINK_UP | HV_M_STATUS_AUTONEG_COMPLETE)) {
+ mac_reg = er32(FEXTNVM4);
+ mac_reg &= ~E1000_FEXTNVM4_BEACON_DURATION_MASK;
+
+ if (status_reg & HV_M_STATUS_SPEED_1000)
+ mac_reg |= E1000_FEXTNVM4_BEACON_DURATION_8USEC;
+ else
+ mac_reg |= E1000_FEXTNVM4_BEACON_DURATION_16USEC;
+
+ ew32(FEXTNVM4, mac_reg);
+ }
+
+out:
+ return ret_val;
+}
+
+/**
+ * e1000_gate_hw_phy_config_ich8lan - disable PHY config via hardware
+ * @hw: pointer to the HW structure
+ * @gate: boolean set to true to gate, false to ungate
+ *
+ * Gate/ungate the automatic PHY configuration via hardware; perform
+ * the configuration via software instead.
+ **/
+static void e1000_gate_hw_phy_config_ich8lan(struct e1000_hw *hw, bool gate)
+{
+ u32 extcnf_ctrl;
+
+ if (hw->mac.type != e1000_pch2lan)
+ return;
+
+ extcnf_ctrl = er32(EXTCNF_CTRL);
+
+ if (gate)
+ extcnf_ctrl |= E1000_EXTCNF_CTRL_GATE_PHY_CFG;
+ else
+ extcnf_ctrl &= ~E1000_EXTCNF_CTRL_GATE_PHY_CFG;
+
+ ew32(EXTCNF_CTRL, extcnf_ctrl);
+ return;
+}
+
+/**
* e1000_lan_init_done_ich8lan - Check for PHY config completion
* @hw: pointer to the HW structure
*
if (e1000_check_reset_block(hw))
goto out;
+ /* Allow time for h/w to get to quiescent state after reset */
+ msleep(10);
+
/* Perform any necessary post-reset workarounds */
switch (hw->mac.type) {
case e1000_pchlan:
/* Configure the LCD with the OEM bits in NVM */
ret_val = e1000_oem_bits_config_ich8lan(hw, true);
+ /* Ungate automatic PHY configuration on non-managed 82579 */
+ if ((hw->mac.type == e1000_pch2lan) &&
+ !(er32(FWSM) & E1000_ICH_FWSM_FW_VALID)) {
+ msleep(10);
+ e1000_gate_hw_phy_config_ich8lan(hw, false);
+ }
+
out:
return ret_val;
}
{
s32 ret_val = 0;
+ /* Gate automatic PHY configuration by hardware on non-managed 82579 */
+ if ((hw->mac.type == e1000_pch2lan) &&
+ !(er32(FWSM) & E1000_ICH_FWSM_FW_VALID))
+ e1000_gate_hw_phy_config_ich8lan(hw, true);
+
ret_val = e1000e_phy_hw_reset_generic(hw);
if (ret_val)
goto out;
* external PHY is reset.
*/
ctrl |= E1000_CTRL_PHY_RST;
+
+ /*
+ * Gate automatic PHY configuration by hardware on
+ * non-managed 82579
+ */
+ if ((hw->mac.type == e1000_pch2lan) &&
+ !(er32(FWSM) & E1000_ICH_FWSM_FW_VALID))
+ e1000_gate_hw_phy_config_ich8lan(hw, true);
}
ret_val = e1000_acquire_swflag_ich8lan(hw);
e_dbg("Issuing a global reset to ich8lan\n");
void e1000e_disable_gig_wol_ich8lan(struct e1000_hw *hw)
{
u32 phy_ctrl;
+ s32 ret_val;
phy_ctrl = er32(PHY_CTRL);
phy_ctrl |= E1000_PHY_CTRL_D0A_LPLU | E1000_PHY_CTRL_GBE_DISABLE;
ew32(PHY_CTRL, phy_ctrl);
- if (hw->mac.type >= e1000_pchlan)
- e1000_phy_hw_reset_ich8lan(hw);
+ if (hw->mac.type >= e1000_pchlan) {
+ e1000_oem_bits_config_ich8lan(hw, true);
+ ret_val = hw->phy.ops.acquire(hw);
+ if (ret_val)
+ return;
+ e1000_write_smbus_addr(hw);
+ hw->phy.ops.release(hw);
+ }
}
/**
u32 psrctl = 0;
u32 pages = 0;
+ /* Workaround Si errata on 82579 - configure jumbo frame flow */
+ if (hw->mac.type == e1000_pch2lan) {
+ s32 ret_val;
+
+ if (adapter->netdev->mtu > ETH_DATA_LEN)
+ ret_val = e1000_lv_jumbo_workaround_ich8lan(hw, true);
+ else
+ ret_val = e1000_lv_jumbo_workaround_ich8lan(hw, false);
+ }
+
/* Program MC offset vector base */
rctl = er32(RCTL);
rctl &= ~(3 << E1000_RCTL_MO_SHIFT);
e1e_wphy(hw, 22, phy_data);
}
- /* Workaround Si errata on 82579 - configure jumbo frame flow */
- if (hw->mac.type == e1000_pch2lan) {
- s32 ret_val;
-
- if (rctl & E1000_RCTL_LPE)
- ret_val = e1000_lv_jumbo_workaround_ich8lan(hw, true);
- else
- ret_val = e1000_lv_jumbo_workaround_ich8lan(hw, false);
- }
-
/* Setup buffer sizes */
rctl &= ~E1000_RCTL_SZ_4096;
rctl |= E1000_RCTL_BSEX;
return -EINVAL;
}
+ /* Jumbo frame workaround on 82579 requires CRC be stripped */
+ if ((adapter->hw.mac.type == e1000_pch2lan) &&
+ !(adapter->flags2 & FLAG2_CRC_STRIPPING) &&
+ (new_mtu > ETH_DATA_LEN)) {
+ e_err("Jumbo Frames not supported on 82579 when CRC "
+ "stripping is disabled.\n");
+ return -EINVAL;
+ }
+
/* 82573 Errata 17 */
if (((adapter->hw.mac.type == e1000_82573) ||
(adapter->hw.mac.type == e1000_82574)) &&
int length = cqe->num_bytes_transfered - 4; /*remove CRC */
skb_put(skb, length);
- skb->ip_summed = CHECKSUM_UNNECESSARY;
skb->protocol = eth_type_trans(skb, dev);
+
+ /* The packet was not an IPV4 packet so a complemented checksum was
+ calculated. The value is found in the Internet Checksum field. */
+ if (cqe->status & EHEA_CQE_BLIND_CKSUM) {
+ skb->ip_summed = CHECKSUM_COMPLETE;
+ skb->csum = csum_unfold(~cqe->inet_checksum_value);
+ } else
+ skb->ip_summed = CHECKSUM_UNNECESSARY;
}
static inline struct sk_buff *get_skb_by_index(struct sk_buff **skb_array,
#define EHEA_CQE_TYPE_RQ 0x60
#define EHEA_CQE_STAT_ERR_MASK 0x700F
#define EHEA_CQE_STAT_FAT_ERR_MASK 0xF
+#define EHEA_CQE_BLIND_CKSUM 0x8000
#define EHEA_CQE_STAT_ERR_TCP 0x4000
#define EHEA_CQE_STAT_ERR_IP 0x2000
#define EHEA_CQE_STAT_ERR_CRC 0x1000
equalizer_t *eql;
master_config_t mc;
+ memset(&mc, 0, sizeof(master_config_t));
+
if (eql_is_master(dev)) {
eql = netdev_priv(dev);
mc.max_slaves = eql->max_slaves;
{
struct fec_enet_private *fep = netdev_priv(dev);
struct phy_device *phy_dev = NULL;
- int ret;
+ char mdio_bus_id[MII_BUS_ID_SIZE];
+ char phy_name[MII_BUS_ID_SIZE + 3];
+ int phy_id;
fep->phy_dev = NULL;
- /* find the first phy */
- phy_dev = phy_find_first(fep->mii_bus);
- if (!phy_dev) {
- printk(KERN_ERR "%s: no PHY found\n", dev->name);
- return -ENODEV;
+ /* check for attached phy */
+ for (phy_id = 0; (phy_id < PHY_MAX_ADDR); phy_id++) {
+ if ((fep->mii_bus->phy_mask & (1 << phy_id)))
+ continue;
+ if (fep->mii_bus->phy_map[phy_id] == NULL)
+ continue;
+ if (fep->mii_bus->phy_map[phy_id]->phy_id == 0)
+ continue;
+ strncpy(mdio_bus_id, fep->mii_bus->id, MII_BUS_ID_SIZE);
+ break;
}
- /* attach the mac to the phy */
- ret = phy_connect_direct(dev, phy_dev,
- &fec_enet_adjust_link, 0,
- PHY_INTERFACE_MODE_MII);
- if (ret) {
- printk(KERN_ERR "%s: Could not attach to PHY\n", dev->name);
- return ret;
+ if (phy_id >= PHY_MAX_ADDR) {
+ printk(KERN_INFO "%s: no PHY, assuming direct connection "
+ "to switch\n", dev->name);
+ strncpy(mdio_bus_id, "0", MII_BUS_ID_SIZE);
+ phy_id = 0;
+ }
+
+ snprintf(phy_name, MII_BUS_ID_SIZE, PHY_ID_FMT, mdio_bus_id, phy_id);
+ phy_dev = phy_connect(dev, phy_name, &fec_enet_adjust_link, 0,
+ PHY_INTERFACE_MODE_MII);
+ if (IS_ERR(phy_dev)) {
+ printk(KERN_ERR "%s: could not attach to PHY\n", dev->name);
+ return PTR_ERR(phy_dev);
}
/* mask with MAC supported features */
fep->mii_bus->read = fec_enet_mdio_read;
fep->mii_bus->write = fec_enet_mdio_write;
fep->mii_bus->reset = fec_enet_mdio_reset;
- snprintf(fep->mii_bus->id, MII_BUS_ID_SIZE, "%x", pdev->id);
+ snprintf(fep->mii_bus->id, MII_BUS_ID_SIZE, "%x", pdev->id + 1);
fep->mii_bus->priv = fep;
fep->mii_bus->parent = &pdev->dev;
if (ret)
goto failed_mii_init;
+ /* Carrier starts down, phylib will bring it up */
+ netif_carrier_off(ndev);
+
ret = register_netdev(ndev);
if (ret)
goto failed_register;
spin_lock_init(&sp->lock);
atomic_set(&sp->refcnt, 1);
- init_MUTEX_LOCKED(&sp->dead_sem);
+ sema_init(&sp->dead_sem, 0);
/* !!! length of the buffers. MTU is IP MTU, not PACLEN! */
spin_lock_init(&ax->buflock);
atomic_set(&ax->refcnt, 1);
- init_MUTEX_LOCKED(&ax->dead_sem);
+ sema_init(&ax->dead_sem, 0);
ax->tty = tty;
tty->disc_data = ax;
if (dev->emac_irq != NO_IRQ)
irq_dispose_mapping(dev->emac_irq);
err_free:
- kfree(ndev);
+ free_netdev(ndev);
err_gone:
/* if we were on the bootlist, remove us as we won't show up and
* wake up all waiters to notify them in case they were waiting
if (dev->emac_irq != NO_IRQ)
irq_dispose_mapping(dev->emac_irq);
- kfree(dev->ndev);
+ free_netdev(dev->ndev);
return 0;
}
dev->tx_skb = NULL;
spin_lock_init(&dev->tx_lock);
- init_MUTEX(&dev->fsm.sem);
+ sema_init(&dev->fsm.sem, 1);
dev->drv = drv;
dev->netdev = ndev;
ks8851_wrreg16(ks, KS_RXQCR,
ks->rc_rxqcr | RXQCR_SDA | RXQCR_ADRFE);
- if (rxlen > 0) {
- skb = netdev_alloc_skb(ks->netdev, rxlen + 2 + 8);
- if (!skb) {
- /* todo - dump frame and move on */
- }
+ if (rxlen > 4) {
+ unsigned int rxalign;
+
+ rxlen -= 4;
+ rxalign = ALIGN(rxlen, 4);
+ skb = netdev_alloc_skb_ip_align(ks->netdev, rxalign);
+ if (skb) {
- /* two bytes to ensure ip is aligned, and four bytes
- * for the status header and 4 bytes of garbage */
- skb_reserve(skb, 2 + 4 + 4);
+ /* 4 bytes of status header + 4 bytes of
+ * garbage: we put them before ethernet
+ * header, so that they are copied,
+ * but ignored.
+ */
- rxpkt = skb_put(skb, rxlen - 4) - 8;
+ rxpkt = skb_put(skb, rxlen) - 8;
- /* align the packet length to 4 bytes, and add 4 bytes
- * as we're getting the rx status header as well */
- ks8851_rdfifo(ks, rxpkt, ALIGN(rxlen, 4) + 8);
+ ks8851_rdfifo(ks, rxpkt, rxalign + 8);
- if (netif_msg_pktdata(ks))
- ks8851_dbg_dumpkkt(ks, rxpkt);
+ if (netif_msg_pktdata(ks))
+ ks8851_dbg_dumpkkt(ks, rxpkt);
- skb->protocol = eth_type_trans(skb, ks->netdev);
- netif_rx(skb);
+ skb->protocol = eth_type_trans(skb, ks->netdev);
+ netif_rx(skb);
- ks->netdev->stats.rx_packets++;
- ks->netdev->stats.rx_bytes += rxlen - 4;
+ ks->netdev->stats.rx_packets++;
+ ks->netdev->stats.rx_bytes += rxlen;
+ }
}
ks8851_wrreg16(ks, KS_RXQCR, ks->rc_rxqcr);
#include <linux/of_device.h>
#include <linux/of_mdio.h>
#include <linux/of_platform.h>
+#include <linux/of_address.h>
#include <linux/skbuff.h>
#include <linux/spinlock.h>
#include <linux/tcp.h> /* needed for sizeof(tcphdr) */
#include <linux/phy.h>
#include <linux/of.h>
#include <linux/of_device.h>
+#include <linux/of_address.h>
#include <linux/slab.h>
#include <linux/of_mdio.h>
if (pkt_offset)
skb_pull(skb, pkt_offset);
- skb->truesize = skb->len + sizeof(struct sk_buff);
skb->protocol = eth_type_trans(skb, netdev);
napi_gro_receive(&sds_ring->napi, skb);
skb_put(skb, lro_length + data_offset);
- skb->truesize = skb->len + sizeof(struct sk_buff) + skb_headroom(skb);
-
skb_pull(skb, l2_hdr_offset);
skb->protocol = eth_type_trans(skb, netdev);
struct niu_parent *parent = np->parent;
struct niu_tcam_entry *tp;
int i, idx, cnt;
- u16 n_entries;
unsigned long flags;
-
+ int ret = 0;
/* put the tcam size here */
nfc->data = tcam_get_size(np);
niu_lock_parent(np, flags);
- n_entries = nfc->rule_cnt;
for (cnt = 0, i = 0; i < nfc->data; i++) {
idx = tcam_get_index(np, i);
tp = &parent->tcam[idx];
if (!tp->valid)
continue;
+ if (cnt == nfc->rule_cnt) {
+ ret = -EMSGSIZE;
+ break;
+ }
rule_locs[cnt] = i;
cnt++;
}
niu_unlock_parent(np, flags);
- if (n_entries != cnt) {
- /* print warning, this should not happen */
- netdev_info(np->dev, "niu%d: In %s(): n_entries[%d] != cnt[%d]!!!\n",
- np->parent->index, __func__, n_entries, cnt);
- }
-
- return 0;
+ return ret;
}
static int niu_get_nfc(struct net_device *dev, struct ethtool_rxnfc *cmd,
unsigned int vcc,
void *priv_data)
{
- int *has_shmem = priv_data;
+ int *priv = priv_data;
+ int try = (*priv & 0x1);
int i;
cistpl_io_t *io = &cfg->io;
i = p_dev->resource[1]->end = 0;
}
- *has_shmem = ((cfg->mem.nwin == 1) &&
- (cfg->mem.win[0].len >= 0x4000));
+ *priv &= ((cfg->mem.nwin == 1) &&
+ (cfg->mem.win[0].len >= 0x4000)) ? 0x10 : ~0x10;
+
p_dev->resource[0]->start = io->win[i].base;
p_dev->resource[0]->end = io->win[i].len;
- p_dev->io_lines = io->flags & CISTPL_IO_LINES_MASK;
+ if (!try)
+ p_dev->io_lines = io->flags & CISTPL_IO_LINES_MASK;
+ else
+ p_dev->io_lines = 16;
if (p_dev->resource[0]->end + p_dev->resource[1]->end >= 32)
return try_io_port(p_dev);
- return 0;
+ return -EINVAL;
+}
+
+static hw_info_t *pcnet_try_config(struct pcmcia_device *link,
+ int *has_shmem, int try)
+{
+ struct net_device *dev = link->priv;
+ hw_info_t *local_hw_info;
+ pcnet_dev_t *info = PRIV(dev);
+ int priv = try;
+ int ret;
+
+ ret = pcmcia_loop_config(link, pcnet_confcheck, &priv);
+ if (ret) {
+ dev_warn(&link->dev, "no useable port range found\n");
+ return NULL;
+ }
+ *has_shmem = (priv & 0x10);
+
+ if (!link->irq)
+ return NULL;
+
+ if (resource_size(link->resource[1]) == 8) {
+ link->conf.Attributes |= CONF_ENABLE_SPKR;
+ link->conf.Status = CCSR_AUDIO_ENA;
+ }
+ if ((link->manf_id == MANFID_IBM) &&
+ (link->card_id == PRODID_IBM_HOME_AND_AWAY))
+ link->conf.ConfigIndex |= 0x10;
+
+ ret = pcmcia_request_configuration(link, &link->conf);
+ if (ret)
+ return NULL;
+
+ dev->irq = link->irq;
+ dev->base_addr = link->resource[0]->start;
+
+ if (info->flags & HAS_MISC_REG) {
+ if ((if_port == 1) || (if_port == 2))
+ dev->if_port = if_port;
+ else
+ dev_notice(&link->dev, "invalid if_port requested\n");
+ } else
+ dev->if_port = 0;
+
+ if ((link->conf.ConfigBase == 0x03c0) &&
+ (link->manf_id == 0x149) && (link->card_id == 0xc1ab)) {
+ dev_info(&link->dev,
+ "this is an AX88190 card - use axnet_cs instead.\n");
+ return NULL;
+ }
+
+ local_hw_info = get_hwinfo(link);
+ if (!local_hw_info)
+ local_hw_info = get_prom(link);
+ if (!local_hw_info)
+ local_hw_info = get_dl10019(link);
+ if (!local_hw_info)
+ local_hw_info = get_ax88190(link);
+ if (!local_hw_info)
+ local_hw_info = get_hwired(link);
+
+ return local_hw_info;
}
static int pcnet_config(struct pcmcia_device *link)
{
struct net_device *dev = link->priv;
pcnet_dev_t *info = PRIV(dev);
- int ret, start_pg, stop_pg, cm_offset;
+ int start_pg, stop_pg, cm_offset;
int has_shmem = 0;
hw_info_t *local_hw_info;
dev_dbg(&link->dev, "pcnet_config\n");
- ret = pcmcia_loop_config(link, pcnet_confcheck, &has_shmem);
- if (ret)
- goto failed;
-
- if (!link->irq)
- goto failed;
-
- if (resource_size(link->resource[1]) == 8) {
- link->conf.Attributes |= CONF_ENABLE_SPKR;
- link->conf.Status = CCSR_AUDIO_ENA;
- }
- if ((link->manf_id == MANFID_IBM) &&
- (link->card_id == PRODID_IBM_HOME_AND_AWAY))
- link->conf.ConfigIndex |= 0x10;
-
- ret = pcmcia_request_configuration(link, &link->conf);
- if (ret)
- goto failed;
- dev->irq = link->irq;
- dev->base_addr = link->resource[0]->start;
- if (info->flags & HAS_MISC_REG) {
- if ((if_port == 1) || (if_port == 2))
- dev->if_port = if_port;
- else
- printk(KERN_NOTICE "pcnet_cs: invalid if_port requested\n");
- } else {
- dev->if_port = 0;
- }
-
- if ((link->conf.ConfigBase == 0x03c0) &&
- (link->manf_id == 0x149) && (link->card_id == 0xc1ab)) {
- printk(KERN_INFO "pcnet_cs: this is an AX88190 card!\n");
- printk(KERN_INFO "pcnet_cs: use axnet_cs instead.\n");
- goto failed;
- }
-
- local_hw_info = get_hwinfo(link);
- if (local_hw_info == NULL)
- local_hw_info = get_prom(link);
- if (local_hw_info == NULL)
- local_hw_info = get_dl10019(link);
- if (local_hw_info == NULL)
- local_hw_info = get_ax88190(link);
- if (local_hw_info == NULL)
- local_hw_info = get_hwired(link);
-
- if (local_hw_info == NULL) {
- printk(KERN_NOTICE "pcnet_cs: unable to read hardware net"
- " address for io base %#3lx\n", dev->base_addr);
- goto failed;
+ local_hw_info = pcnet_try_config(link, &has_shmem, 0);
+ if (!local_hw_info) {
+ /* check whether forcing io_lines to 16 helps... */
+ pcmcia_disable_device(link);
+ local_hw_info = pcnet_try_config(link, &has_shmem, 1);
+ if (local_hw_info == NULL) {
+ dev_notice(&link->dev, "unable to read hardware net"
+ " address for io base %#3lx\n", dev->base_addr);
+ goto failed;
+ }
}
info->flags = local_hw_info->flags;
* may call phy routines that try to grab the same lock, and that may
* lead to a deadlock.
*/
- if (phydev->attached_dev)
+ if (phydev->attached_dev && phydev->adjust_link)
phy_stop_machine(phydev);
if (!mdio_bus_phy_may_suspend(phydev))
return ret;
no_resume:
- if (phydev->attached_dev)
+ if (phydev->attached_dev && phydev->adjust_link)
phy_start_machine(phydev, NULL);
return 0;
tasklet_init(&ap->tsk, ppp_async_process, (unsigned long) ap);
atomic_set(&ap->refcnt, 1);
- init_MUTEX_LOCKED(&ap->dead_sem);
+ sema_init(&ap->dead_sem, 0);
ap->chan.private = ap;
ap->chan.ops = &async_ops;
hdrlen = (ppp->flags & SC_MP_XSHORTSEQ)? MPHDRLEN_SSN: MPHDRLEN;
i = 0;
list_for_each_entry(pch, &ppp->channels, clist) {
- navail += pch->avail = (pch->chan != NULL);
- pch->speed = pch->chan->speed;
+ if (pch->chan) {
+ pch->avail = 1;
+ navail++;
+ pch->speed = pch->chan->speed;
+ } else {
+ pch->avail = 0;
+ }
if (pch->avail) {
if (skb_queue_empty(&pch->file.xq) ||
!pch->had_frag) {
return -ENOMEM;
}
- skb_reserve(skb, 2);
+ skb_reserve(skb, NET_IP_ALIGN);
dma = pci_map_single(pdev, skb->data,
rds_ring->dma_size, PCI_DMA_FROMDEVICE);
if (pkt_offset)
skb_pull(skb, pkt_offset);
- skb->truesize = skb->len + sizeof(struct sk_buff);
skb->protocol = eth_type_trans(skb, netdev);
napi_gro_receive(&sds_ring->napi, skb);
skb_put(skb, lro_length + data_offset);
- skb->truesize = skb->len + sizeof(struct sk_buff) + skb_headroom(skb);
-
skb_pull(skb, l2_hdr_offset);
skb->protocol = eth_type_trans(skb, netdev);
if (pkt_offset)
skb_pull(skb, pkt_offset);
- skb->truesize = skb->len + sizeof(struct sk_buff);
-
if (!qlcnic_check_loopback_buff(skb->data))
adapter->diag_cnt++;
if ((RTL_R8(ChipCmd) & CmdRxEnb) == 0)
return;
- counters = pci_alloc_consistent(tp->pci_dev, sizeof(*counters), &paddr);
+ counters = dma_alloc_coherent(&tp->pci_dev->dev, sizeof(*counters),
+ &paddr, GFP_KERNEL);
if (!counters)
return;
RTL_W32(CounterAddrLow, 0);
RTL_W32(CounterAddrHigh, 0);
- pci_free_consistent(tp->pci_dev, sizeof(*counters), counters, paddr);
+ dma_free_coherent(&tp->pci_dev->dev, sizeof(*counters), counters,
+ paddr);
}
static void rtl8169_get_ethtool_stats(struct net_device *dev,
.hw_start = rtl_hw_start_8168,
.region = 2,
.align = 8,
- .intr_event = SYSErr | LinkChg | RxOverflow |
+ .intr_event = SYSErr | RxFIFOOver | LinkChg | RxOverflow |
TxErr | TxOK | RxOK | RxErr,
.napi_event = TxErr | TxOK | RxOK | RxOverflow,
.features = RTL_FEATURE_GMII | RTL_FEATURE_MSI,
/*
* Rx and Tx desscriptors needs 256 bytes alignment.
- * pci_alloc_consistent provides more.
+ * dma_alloc_coherent provides more.
*/
- tp->TxDescArray = pci_alloc_consistent(pdev, R8169_TX_RING_BYTES,
- &tp->TxPhyAddr);
+ tp->TxDescArray = dma_alloc_coherent(&pdev->dev, R8169_TX_RING_BYTES,
+ &tp->TxPhyAddr, GFP_KERNEL);
if (!tp->TxDescArray)
goto err_pm_runtime_put;
- tp->RxDescArray = pci_alloc_consistent(pdev, R8169_RX_RING_BYTES,
- &tp->RxPhyAddr);
+ tp->RxDescArray = dma_alloc_coherent(&pdev->dev, R8169_RX_RING_BYTES,
+ &tp->RxPhyAddr, GFP_KERNEL);
if (!tp->RxDescArray)
goto err_free_tx_0;
err_release_ring_2:
rtl8169_rx_clear(tp);
err_free_rx_1:
- pci_free_consistent(pdev, R8169_RX_RING_BYTES, tp->RxDescArray,
- tp->RxPhyAddr);
+ dma_free_coherent(&pdev->dev, R8169_RX_RING_BYTES, tp->RxDescArray,
+ tp->RxPhyAddr);
tp->RxDescArray = NULL;
err_free_tx_0:
- pci_free_consistent(pdev, R8169_TX_RING_BYTES, tp->TxDescArray,
- tp->TxPhyAddr);
+ dma_free_coherent(&pdev->dev, R8169_TX_RING_BYTES, tp->TxDescArray,
+ tp->TxPhyAddr);
tp->TxDescArray = NULL;
err_pm_runtime_put:
pm_runtime_put_noidle(&pdev->dev);
{
struct pci_dev *pdev = tp->pci_dev;
- pci_unmap_single(pdev, le64_to_cpu(desc->addr), tp->rx_buf_sz,
+ dma_unmap_single(&pdev->dev, le64_to_cpu(desc->addr), tp->rx_buf_sz,
PCI_DMA_FROMDEVICE);
dev_kfree_skb(*sk_buff);
*sk_buff = NULL;
static struct sk_buff *rtl8169_alloc_rx_skb(struct pci_dev *pdev,
struct net_device *dev,
struct RxDesc *desc, int rx_buf_sz,
- unsigned int align)
+ unsigned int align, gfp_t gfp)
{
struct sk_buff *skb;
dma_addr_t mapping;
pad = align ? align : NET_IP_ALIGN;
- skb = netdev_alloc_skb(dev, rx_buf_sz + pad);
+ skb = __netdev_alloc_skb(dev, rx_buf_sz + pad, gfp);
if (!skb)
goto err_out;
skb_reserve(skb, align ? ((pad - 1) & (unsigned long)skb->data) : pad);
- mapping = pci_map_single(pdev, skb->data, rx_buf_sz,
+ mapping = dma_map_single(&pdev->dev, skb->data, rx_buf_sz,
PCI_DMA_FROMDEVICE);
rtl8169_map_to_asic(desc, mapping, rx_buf_sz);
}
static u32 rtl8169_rx_fill(struct rtl8169_private *tp, struct net_device *dev,
- u32 start, u32 end)
+ u32 start, u32 end, gfp_t gfp)
{
u32 cur;
skb = rtl8169_alloc_rx_skb(tp->pci_dev, dev,
tp->RxDescArray + i,
- tp->rx_buf_sz, tp->align);
+ tp->rx_buf_sz, tp->align, gfp);
if (!skb)
break;
memset(tp->tx_skb, 0x0, NUM_TX_DESC * sizeof(struct ring_info));
memset(tp->Rx_skbuff, 0x0, NUM_RX_DESC * sizeof(struct sk_buff *));
- if (rtl8169_rx_fill(tp, dev, 0, NUM_RX_DESC) != NUM_RX_DESC)
+ if (rtl8169_rx_fill(tp, dev, 0, NUM_RX_DESC, GFP_KERNEL) != NUM_RX_DESC)
goto err_out;
rtl8169_mark_as_last_descriptor(tp->RxDescArray + NUM_RX_DESC - 1);
{
unsigned int len = tx_skb->len;
- pci_unmap_single(pdev, le64_to_cpu(desc->addr), len, PCI_DMA_TODEVICE);
+ dma_unmap_single(&pdev->dev, le64_to_cpu(desc->addr), len,
+ PCI_DMA_TODEVICE);
desc->opts1 = 0x00;
desc->opts2 = 0x00;
desc->addr = 0x00;
txd = tp->TxDescArray + entry;
len = frag->size;
addr = ((void *) page_address(frag->page)) + frag->page_offset;
- mapping = pci_map_single(tp->pci_dev, addr, len, PCI_DMA_TODEVICE);
+ mapping = dma_map_single(&tp->pci_dev->dev, addr, len,
+ PCI_DMA_TODEVICE);
/* anti gcc 2.95.3 bugware (sic) */
status = opts1 | len | (RingEnd * !((entry + 1) % NUM_TX_DESC));
tp->tx_skb[entry].skb = skb;
}
- mapping = pci_map_single(tp->pci_dev, skb->data, len, PCI_DMA_TODEVICE);
+ mapping = dma_map_single(&tp->pci_dev->dev, skb->data, len,
+ PCI_DMA_TODEVICE);
tp->tx_skb[entry].len = len;
txd->addr = cpu_to_le64(mapping);
if (!skb)
goto out;
- pci_dma_sync_single_for_cpu(tp->pci_dev, addr, pkt_size,
- PCI_DMA_FROMDEVICE);
+ dma_sync_single_for_cpu(&tp->pci_dev->dev, addr, pkt_size,
+ PCI_DMA_FROMDEVICE);
skb_copy_from_linear_data(*sk_buff, skb->data, pkt_size);
*sk_buff = skb;
done = true;
rtl8169_rx_csum(skb, desc);
if (rtl8169_try_rx_copy(&skb, tp, pkt_size, addr)) {
- pci_dma_sync_single_for_device(pdev, addr,
+ dma_sync_single_for_device(&pdev->dev, addr,
pkt_size, PCI_DMA_FROMDEVICE);
rtl8169_mark_to_asic(desc, tp->rx_buf_sz);
} else {
- pci_unmap_single(pdev, addr, tp->rx_buf_sz,
+ dma_unmap_single(&pdev->dev, addr, tp->rx_buf_sz,
PCI_DMA_FROMDEVICE);
tp->Rx_skbuff[entry] = NULL;
}
count = cur_rx - tp->cur_rx;
tp->cur_rx = cur_rx;
- delta = rtl8169_rx_fill(tp, dev, tp->dirty_rx, tp->cur_rx);
+ delta = rtl8169_rx_fill(tp, dev, tp->dirty_rx, tp->cur_rx, GFP_ATOMIC);
if (!delta && count)
netif_info(tp, intr, dev, "no Rx buffer allocated\n");
tp->dirty_rx += delta;
}
/* Work around for rx fifo overflow */
- if (unlikely(status & RxFIFOOver) &&
- (tp->mac_version == RTL_GIGA_MAC_VER_11)) {
+ if (unlikely(status & RxFIFOOver)) {
netif_stop_queue(dev);
rtl8169_tx_timeout(dev);
break;
free_irq(dev->irq, dev);
- pci_free_consistent(pdev, R8169_RX_RING_BYTES, tp->RxDescArray,
- tp->RxPhyAddr);
- pci_free_consistent(pdev, R8169_TX_RING_BYTES, tp->TxDescArray,
- tp->TxPhyAddr);
+ dma_free_coherent(&pdev->dev, R8169_RX_RING_BYTES, tp->RxDescArray,
+ tp->RxPhyAddr);
+ dma_free_coherent(&pdev->dev, R8169_TX_RING_BYTES, tp->TxDescArray,
+ tp->TxPhyAddr);
tp->TxDescArray = NULL;
tp->RxDescArray = NULL;
free_pages((unsigned long)rionet_active, rdev->net->hport->sys_size ?
__ilog2(sizeof(void *)) + 4 : 0);
unregister_netdev(ndev);
- kfree(ndev);
+ free_netdev(ndev);
list_for_each_entry_safe(peer, tmp, &rionet_peers, node) {
list_del(&peer->node);
err_out_free_page:
free_page((unsigned long) sp->srings);
err_out_free_dev:
- kfree(dev);
+ free_netdev(dev);
err_out:
return err;
#include <linux/seq_file.h>
#include <linux/mii.h>
#include <linux/slab.h>
+#include <linux/dmi.h>
#include <asm/irq.h>
#include "skge.h"
netif_info(skge, probe, skge->netdev, "addr %pM\n", dev->dev_addr);
}
+static int only_32bit_dma;
+
static int __devinit skge_probe(struct pci_dev *pdev,
const struct pci_device_id *ent)
{
pci_set_master(pdev);
- if (!pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
+ if (!only_32bit_dma && !pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
using_dac = 1;
err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
} else if (!(err = pci_set_dma_mask(pdev, DMA_BIT_MASK(32)))) {
.shutdown = skge_shutdown,
};
+static struct dmi_system_id skge_32bit_dma_boards[] = {
+ {
+ .ident = "Gigabyte nForce boards",
+ .matches = {
+ DMI_MATCH(DMI_BOARD_VENDOR, "Gigabyte Technology Co"),
+ DMI_MATCH(DMI_BOARD_NAME, "nForce"),
+ },
+ },
+ {}
+};
+
static int __init skge_init_module(void)
{
+ if (dmi_check_system(skge_32bit_dma_boards))
+ only_32bit_dma = 1;
skge_debug_init();
return pci_register_driver(&skge_driver);
}
MODULE_LICENSE("GPL");
MODULE_VERSION(SMSC_DRV_VERSION);
+MODULE_ALIAS("platform:smsc911x");
#if USE_DEBUG > 0
static int debug = 16;
if (!netif_running(dev))
return 0;
- spin_lock(&priv->lock);
-
if (priv->shutdown) {
/* Re-open the interface and re-init the MAC/DMA
- and the rings. */
+ and the rings (i.e. on hibernation stage) */
stmmac_open(dev);
- goto out_resume;
+ return 0;
}
+ spin_lock(&priv->lock);
+
/* Power Down bit, into the PM register, is cleared
* automatically as soon as a magic packet or a Wake-up frame
* is received. Anyway, it's better to manually clear
netif_start_queue(dev);
-out_resume:
spin_unlock(&priv->lock);
return 0;
}
desc_idx, *post_ptr);
drop_it_no_recycle:
/* Other statistics kept track of by card. */
- tp->net_stats.rx_dropped++;
+ tp->rx_dropped++;
goto next_pkt;
}
if (len > (tp->dev->mtu + ETH_HLEN) &&
skb->protocol != htons(ETH_P_8021Q)) {
dev_kfree_skb(skb);
- goto next_pkt;
+ goto drop_it_no_recycle;
}
if (desc->type_flags & RXD_FLAG_VLAN &&
stats->rx_missed_errors = old_stats->rx_missed_errors +
get_stat64(&hw_stats->rx_discards);
+ stats->rx_dropped = tp->rx_dropped;
+
return stats;
}
/* begin "everything else" cacheline(s) section */
- struct rtnl_link_stats64 net_stats;
+ unsigned long rx_dropped;
struct rtnl_link_stats64 net_stats_prev;
struct tg3_ethtool_stats estats;
struct tg3_ethtool_stats estats_prev;
NWayState = (1 << 14) | (1 << 13) | (1 << 12),
NWayRestart = (1 << 12),
NonselPortActive = (1 << 9),
+ SelPortActive = (1 << 8),
LinkFailStatus = (1 << 2),
NetCxnErr = (1 << 1),
};
/* 21041 transceiver register settings: TP AUTO, BNC, AUI, TP, TP FD*/
static u16 t21041_csr13[] = { 0xEF01, 0xEF09, 0xEF09, 0xEF01, 0xEF09, };
-static u16 t21041_csr14[] = { 0xFFFF, 0xF7FD, 0xF7FD, 0x6F3F, 0x6F3D, };
+static u16 t21041_csr14[] = { 0xFFFF, 0xF7FD, 0xF7FD, 0x7F3F, 0x7F3D, };
+/* If on-chip autonegotiation is broken, use half-duplex (FF3F) instead */
+static u16 t21041_csr14_brk[] = { 0xFF3F, 0xF7FD, 0xF7FD, 0x7F3F, 0x7F3D, };
static u16 t21041_csr15[] = { 0x0008, 0x0006, 0x000E, 0x0008, 0x0008, };
unsigned int carrier;
unsigned long flags;
+ /* clear port active bits */
+ dw32(SIAStatus, NonselPortActive | SelPortActive);
+
carrier = (status & NetCxnErr) ? 0 : 1;
if (carrier) {
static void de_media_interrupt (struct de_private *de, u32 status)
{
if (status & LinkPass) {
+ /* Ignore if current media is AUI or BNC and we can't use TP */
+ if ((de->media_type == DE_MEDIA_AUI ||
+ de->media_type == DE_MEDIA_BNC) &&
+ (de->media_lock ||
+ !de_ok_to_advertise(de, DE_MEDIA_TP_AUTO)))
+ return;
+ /* If current media is not TP, change it to TP */
+ if ((de->media_type == DE_MEDIA_AUI ||
+ de->media_type == DE_MEDIA_BNC)) {
+ de->media_type = DE_MEDIA_TP_AUTO;
+ de_stop_rxtx(de);
+ de_set_media(de);
+ de_start_rxtx(de);
+ }
de_link_up(de);
mod_timer(&de->media_timer, jiffies + DE_TIMER_LINK);
return;
}
BUG_ON(!(status & LinkFail));
-
- if (netif_carrier_ok(de->dev)) {
+ /* Mark the link as down only if current media is TP */
+ if (netif_carrier_ok(de->dev) && de->media_type != DE_MEDIA_AUI &&
+ de->media_type != DE_MEDIA_BNC) {
de_link_down(de);
mod_timer(&de->media_timer, jiffies + DE_TIMER_NO_LINK);
}
if (de->de21040)
return;
+ dw32(CSR13, 0); /* Reset phy */
pci_read_config_dword(de->pdev, PCIPM, &pmctl);
pmctl |= PM_Sleep;
pci_write_config_dword(de->pdev, PCIPM, pmctl);
return 0; /* nothing to change */
de_link_down(de);
+ mod_timer(&de->media_timer, jiffies + DE_TIMER_NO_LINK);
de_stop_rxtx(de);
de->media_type = new_media;
de->media_lock = media_lock;
de->media_advertise = ecmd->advertising;
de_set_media(de);
+ if (netif_running(de->dev))
+ de_start_rxtx(de);
return 0;
}
for (i = 0; i < DE_MAX_MEDIA; i++) {
if (de->media[i].csr13 == 0xffff)
de->media[i].csr13 = t21041_csr13[i];
- if (de->media[i].csr14 == 0xffff)
- de->media[i].csr14 = t21041_csr14[i];
+ if (de->media[i].csr14 == 0xffff) {
+ /* autonegotiation is broken at least on some chip
+ revisions - rev. 0x21 works, 0x11 does not */
+ if (de->pdev->revision < 0x20)
+ de->media[i].csr14 = t21041_csr14_brk[i];
+ else
+ de->media[i].csr14 = t21041_csr14[i];
+ }
if (de->media[i].csr15 == 0xffff)
de->media[i].csr15 = t21041_csr15[i];
}
dev_err(&dev->dev, "pci_enable_device failed in resume\n");
goto out;
}
+ pci_set_master(pdev);
+ de_init_rings(de);
de_init_hw(de);
out_attach:
netif_device_attach(dev);
struct uart_icount cnow;
struct hso_tiocmget *tiocmget = serial->tiocmget;
+ memset(&icount, 0, sizeof(struct serial_icounter_struct));
+
if (!tiocmget)
return -ENOENT;
spin_lock_irq(&serial->serial_lock);
.ndo_get_stats = &ipheth_stats,
};
-static struct device_type ipheth_type = {
- .name = "wwan",
-};
-
static int ipheth_probe(struct usb_interface *intf,
const struct usb_device_id *id)
{
netdev->netdev_ops = &ipheth_netdev_ops;
netdev->watchdog_timeo = IPHETH_TX_TIMEOUT;
- strcpy(netdev->name, "wwan%d");
+ strcpy(netdev->name, "eth%d");
dev = netdev_priv(netdev);
dev->udev = udev;
SET_NETDEV_DEV(netdev, &intf->dev);
SET_ETHTOOL_OPS(netdev, &ops);
- SET_NETDEV_DEVTYPE(netdev, &ipheth_type);
retval = register_netdev(netdev);
if (retval) {
netif_napi_add(dev, &vptr->napi, velocity_poll, VELOCITY_NAPI_WEIGHT);
dev->features |= NETIF_F_HW_VLAN_TX | NETIF_F_HW_VLAN_FILTER |
- NETIF_F_HW_VLAN_RX | NETIF_F_IP_CSUM | NETIF_F_SG;
+ NETIF_F_HW_VLAN_RX | NETIF_F_IP_CSUM;
ret = register_netdev(dev);
if (ret < 0)
/* Initialize the chardev data structures */
mutex_init(&chan->rlock);
- init_MUTEX(&chan->wsem);
+ sema_init(&chan->wsem, 1);
/* Register the network interface */
if (!(chan->netdev = alloc_hdlcdev(chan))) {
int i, result;
struct device *dev = i2400m_dev(i2400m);
const struct i2400m_msg_hdr *msg_hdr;
- size_t pl_itr, pl_size, skb_len;
+ size_t pl_itr, pl_size;
unsigned long flags;
- unsigned num_pls, single_last;
+ unsigned num_pls, single_last, skb_len;
skb_len = skb->len;
- d_fnstart(4, dev, "(i2400m %p skb %p [size %zu])\n",
+ d_fnstart(4, dev, "(i2400m %p skb %p [size %u])\n",
i2400m, skb, skb_len);
result = -EIO;
msg_hdr = (void *) skb->data;
- result = i2400m_rx_msg_hdr_check(i2400m, msg_hdr, skb->len);
+ result = i2400m_rx_msg_hdr_check(i2400m, msg_hdr, skb_len);
if (result < 0)
goto error_msg_hdr_check;
result = -EIO;
pl_itr = sizeof(*msg_hdr) + /* Check payload descriptor(s) */
num_pls * sizeof(msg_hdr->pld[0]);
pl_itr = ALIGN(pl_itr, I2400M_PL_ALIGN);
- if (pl_itr > skb->len) { /* got all the payload descriptors? */
+ if (pl_itr > skb_len) { /* got all the payload descriptors? */
dev_err(dev, "RX: HW BUG? message too short (%u bytes) for "
"%u payload descriptors (%zu each, total %zu)\n",
- skb->len, num_pls, sizeof(msg_hdr->pld[0]), pl_itr);
+ skb_len, num_pls, sizeof(msg_hdr->pld[0]), pl_itr);
goto error_pl_descr_short;
}
/* Walk each payload payload--check we really got it */
/* work around old gcc warnings */
pl_size = i2400m_pld_size(&msg_hdr->pld[i]);
result = i2400m_rx_pl_descr_check(i2400m, &msg_hdr->pld[i],
- pl_itr, skb->len);
+ pl_itr, skb_len);
if (result < 0)
goto error_pl_descr_check;
single_last = num_pls == 1 || i == num_pls - 1;
if (i < i2400m->rx_pl_min)
i2400m->rx_pl_min = i;
i2400m->rx_num++;
- i2400m->rx_size_acc += skb->len;
- if (skb->len < i2400m->rx_size_min)
- i2400m->rx_size_min = skb->len;
- if (skb->len > i2400m->rx_size_max)
- i2400m->rx_size_max = skb->len;
+ i2400m->rx_size_acc += skb_len;
+ if (skb_len < i2400m->rx_size_min)
+ i2400m->rx_size_min = skb_len;
+ if (skb_len > i2400m->rx_size_max)
+ i2400m->rx_size_max = skb_len;
spin_unlock_irqrestore(&i2400m->rx_lock, flags);
error_pl_descr_check:
error_pl_descr_short:
error_msg_hdr_check:
- d_fnend(4, dev, "(i2400m %p skb %p [size %zu]) = %d\n",
+ d_fnend(4, dev, "(i2400m %p skb %p [size %u]) = %d\n",
i2400m, skb, skb_len, result);
return result;
}
if (conf_is_ht40(conf))
return clockrate * 2;
- return clockrate * 2;
+ return clockrate;
}
static int32_t ath9k_hw_ani_get_listen_time(struct ath_hw *ah)
clear_bit(STATUS_SCAN_HW, &priv->status);
clear_bit(STATUS_SCANNING, &priv->status);
/* inform mac80211 scan aborted */
- queue_work(priv->workqueue, &priv->scan_completed);
+ queue_work(priv->workqueue, &priv->abort_scan);
}
int iwlagn_manage_ibss_station(struct iwl_priv *priv,
if (test_bit(STATUS_EXIT_PENDING, &priv->status))
return -EINVAL;
+ if (test_bit(STATUS_SCANNING, &priv->status)) {
+ IWL_DEBUG_INFO(priv, "scan in progress.\n");
+ return -EINVAL;
+ }
+
if (mode >= IWL_MAX_FORCE_RESET) {
IWL_DEBUG_INFO(priv, "invalid reset request.\n");
return -EINVAL;
clear_bit(STATUS_SCANNING, &priv->status);
/* inform mac80211 scan aborted */
- queue_work(priv->workqueue, &priv->scan_completed);
+ queue_work(priv->workqueue, &priv->abort_scan);
}
static void iwl3945_bg_restart(struct work_struct *data)
.notifier_call = module_load_notify,
};
-
-static void end_sync(void)
-{
- end_cpu_work();
- /* make sure we don't leak task structs */
- process_task_mortuary();
- process_task_mortuary();
-}
-
-
int sync_start(void)
{
int err;
if (!zalloc_cpumask_var(&marked_cpus, GFP_KERNEL))
return -ENOMEM;
- start_cpu_work();
+ mutex_lock(&buffer_mutex);
err = task_handoff_register(&task_free_nb);
if (err)
if (err)
goto out4;
+ start_cpu_work();
+
out:
+ mutex_unlock(&buffer_mutex);
return err;
out4:
profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
out2:
task_handoff_unregister(&task_free_nb);
out1:
- end_sync();
free_cpumask_var(marked_cpus);
goto out;
}
void sync_stop(void)
{
+ /* flush buffers */
+ mutex_lock(&buffer_mutex);
+ end_cpu_work();
unregister_module_notifier(&module_load_nb);
profile_event_unregister(PROFILE_MUNMAP, &munmap_nb);
profile_event_unregister(PROFILE_TASK_EXIT, &task_exit_nb);
task_handoff_unregister(&task_free_nb);
- end_sync();
+ mutex_unlock(&buffer_mutex);
+ flush_scheduled_work();
+
+ /* make sure we don't leak task structs */
+ process_task_mortuary();
+ process_task_mortuary();
+
free_cpumask_var(marked_cpus);
}
cancel_delayed_work(&b->work);
}
-
- flush_scheduled_work();
}
/*
mutex_unlock(&start_mutex);
}
-int oprofile_set_backtrace(unsigned long val)
+int oprofile_set_ulong(unsigned long *addr, unsigned long val)
{
- int err = 0;
+ int err = -EBUSY;
mutex_lock(&start_mutex);
-
- if (oprofile_started) {
- err = -EBUSY;
- goto out;
- }
-
- if (!oprofile_ops.backtrace) {
- err = -EINVAL;
- goto out;
+ if (!oprofile_started) {
+ *addr = val;
+ err = 0;
}
-
- oprofile_backtrace_depth = val;
-
-out:
mutex_unlock(&start_mutex);
+
return err;
}
printk(KERN_INFO "oprofile: using timer interrupt.\n");
err = oprofile_timer_init(&oprofile_ops);
if (err)
- goto out_arch;
+ return err;
}
- err = oprofilefs_register();
- if (err)
- goto out_arch;
- return 0;
-
-out_arch:
- oprofile_arch_exit();
- return err;
+ return oprofilefs_register();
}
int oprofile_timer_init(struct oprofile_operations *ops);
void oprofile_timer_exit(void);
-int oprofile_set_backtrace(unsigned long depth);
+int oprofile_set_ulong(unsigned long *addr, unsigned long val);
int oprofile_set_timeout(unsigned long time);
#endif /* OPROF_H */
if (*offset)
return -EINVAL;
+ if (!oprofile_ops.backtrace)
+ return -EINVAL;
+
retval = oprofilefs_ulong_from_user(&val, buf, count);
if (retval)
return retval;
- retval = oprofile_set_backtrace(val);
-
+ retval = oprofile_set_ulong(&oprofile_backtrace_depth, val);
if (retval)
return retval;
+
return count;
}
--- /dev/null
+/*
+ * Copyright 2010 ARM Ltd.
+ *
+ * Perf-events backend for OProfile.
+ */
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <linux/oprofile.h>
+#include <linux/slab.h>
+
+/*
+ * Per performance monitor configuration as set via oprofilefs.
+ */
+struct op_counter_config {
+ unsigned long count;
+ unsigned long enabled;
+ unsigned long event;
+ unsigned long unit_mask;
+ unsigned long kernel;
+ unsigned long user;
+ struct perf_event_attr attr;
+};
+
+static int oprofile_perf_enabled;
+static DEFINE_MUTEX(oprofile_perf_mutex);
+
+static struct op_counter_config *counter_config;
+static struct perf_event **perf_events[nr_cpumask_bits];
+static int num_counters;
+
+/*
+ * Overflow callback for oprofile.
+ */
+static void op_overflow_handler(struct perf_event *event, int unused,
+ struct perf_sample_data *data, struct pt_regs *regs)
+{
+ int id;
+ u32 cpu = smp_processor_id();
+
+ for (id = 0; id < num_counters; ++id)
+ if (perf_events[cpu][id] == event)
+ break;
+
+ if (id != num_counters)
+ oprofile_add_sample(regs, id);
+ else
+ pr_warning("oprofile: ignoring spurious overflow "
+ "on cpu %u\n", cpu);
+}
+
+/*
+ * Called by oprofile_perf_setup to create perf attributes to mirror the oprofile
+ * settings in counter_config. Attributes are created as `pinned' events and
+ * so are permanently scheduled on the PMU.
+ */
+static void op_perf_setup(void)
+{
+ int i;
+ u32 size = sizeof(struct perf_event_attr);
+ struct perf_event_attr *attr;
+
+ for (i = 0; i < num_counters; ++i) {
+ attr = &counter_config[i].attr;
+ memset(attr, 0, size);
+ attr->type = PERF_TYPE_RAW;
+ attr->size = size;
+ attr->config = counter_config[i].event;
+ attr->sample_period = counter_config[i].count;
+ attr->pinned = 1;
+ }
+}
+
+static int op_create_counter(int cpu, int event)
+{
+ struct perf_event *pevent;
+
+ if (!counter_config[event].enabled || perf_events[cpu][event])
+ return 0;
+
+ pevent = perf_event_create_kernel_counter(&counter_config[event].attr,
+ cpu, NULL,
+ op_overflow_handler);
+
+ if (IS_ERR(pevent))
+ return PTR_ERR(pevent);
+
+ if (pevent->state != PERF_EVENT_STATE_ACTIVE) {
+ perf_event_release_kernel(pevent);
+ pr_warning("oprofile: failed to enable event %d "
+ "on CPU %d\n", event, cpu);
+ return -EBUSY;
+ }
+
+ perf_events[cpu][event] = pevent;
+
+ return 0;
+}
+
+static void op_destroy_counter(int cpu, int event)
+{
+ struct perf_event *pevent = perf_events[cpu][event];
+
+ if (pevent) {
+ perf_event_release_kernel(pevent);
+ perf_events[cpu][event] = NULL;
+ }
+}
+
+/*
+ * Called by oprofile_perf_start to create active perf events based on the
+ * perviously configured attributes.
+ */
+static int op_perf_start(void)
+{
+ int cpu, event, ret = 0;
+
+ for_each_online_cpu(cpu) {
+ for (event = 0; event < num_counters; ++event) {
+ ret = op_create_counter(cpu, event);
+ if (ret)
+ return ret;
+ }
+ }
+
+ return ret;
+}
+
+/*
+ * Called by oprofile_perf_stop at the end of a profiling run.
+ */
+static void op_perf_stop(void)
+{
+ int cpu, event;
+
+ for_each_online_cpu(cpu)
+ for (event = 0; event < num_counters; ++event)
+ op_destroy_counter(cpu, event);
+}
+
+static int oprofile_perf_create_files(struct super_block *sb, struct dentry *root)
+{
+ unsigned int i;
+
+ for (i = 0; i < num_counters; i++) {
+ struct dentry *dir;
+ char buf[4];
+
+ snprintf(buf, sizeof buf, "%d", i);
+ dir = oprofilefs_mkdir(sb, root, buf);
+ oprofilefs_create_ulong(sb, dir, "enabled", &counter_config[i].enabled);
+ oprofilefs_create_ulong(sb, dir, "event", &counter_config[i].event);
+ oprofilefs_create_ulong(sb, dir, "count", &counter_config[i].count);
+ oprofilefs_create_ulong(sb, dir, "unit_mask", &counter_config[i].unit_mask);
+ oprofilefs_create_ulong(sb, dir, "kernel", &counter_config[i].kernel);
+ oprofilefs_create_ulong(sb, dir, "user", &counter_config[i].user);
+ }
+
+ return 0;
+}
+
+static int oprofile_perf_setup(void)
+{
+ spin_lock(&oprofilefs_lock);
+ op_perf_setup();
+ spin_unlock(&oprofilefs_lock);
+ return 0;
+}
+
+static int oprofile_perf_start(void)
+{
+ int ret = -EBUSY;
+
+ mutex_lock(&oprofile_perf_mutex);
+ if (!oprofile_perf_enabled) {
+ ret = 0;
+ op_perf_start();
+ oprofile_perf_enabled = 1;
+ }
+ mutex_unlock(&oprofile_perf_mutex);
+ return ret;
+}
+
+static void oprofile_perf_stop(void)
+{
+ mutex_lock(&oprofile_perf_mutex);
+ if (oprofile_perf_enabled)
+ op_perf_stop();
+ oprofile_perf_enabled = 0;
+ mutex_unlock(&oprofile_perf_mutex);
+}
+
+#ifdef CONFIG_PM
+
+static int oprofile_perf_suspend(struct platform_device *dev, pm_message_t state)
+{
+ mutex_lock(&oprofile_perf_mutex);
+ if (oprofile_perf_enabled)
+ op_perf_stop();
+ mutex_unlock(&oprofile_perf_mutex);
+ return 0;
+}
+
+static int oprofile_perf_resume(struct platform_device *dev)
+{
+ mutex_lock(&oprofile_perf_mutex);
+ if (oprofile_perf_enabled && op_perf_start())
+ oprofile_perf_enabled = 0;
+ mutex_unlock(&oprofile_perf_mutex);
+ return 0;
+}
+
+static struct platform_driver oprofile_driver = {
+ .driver = {
+ .name = "oprofile-perf",
+ },
+ .resume = oprofile_perf_resume,
+ .suspend = oprofile_perf_suspend,
+};
+
+static struct platform_device *oprofile_pdev;
+
+static int __init init_driverfs(void)
+{
+ int ret;
+
+ ret = platform_driver_register(&oprofile_driver);
+ if (ret)
+ return ret;
+
+ oprofile_pdev = platform_device_register_simple(
+ oprofile_driver.driver.name, 0, NULL, 0);
+ if (IS_ERR(oprofile_pdev)) {
+ ret = PTR_ERR(oprofile_pdev);
+ platform_driver_unregister(&oprofile_driver);
+ }
+
+ return ret;
+}
+
+static void exit_driverfs(void)
+{
+ platform_device_unregister(oprofile_pdev);
+ platform_driver_unregister(&oprofile_driver);
+}
+
+#else
+
+static inline int init_driverfs(void) { return 0; }
+static inline void exit_driverfs(void) { }
+
+#endif /* CONFIG_PM */
+
+void oprofile_perf_exit(void)
+{
+ int cpu, id;
+ struct perf_event *event;
+
+ for_each_possible_cpu(cpu) {
+ for (id = 0; id < num_counters; ++id) {
+ event = perf_events[cpu][id];
+ if (event)
+ perf_event_release_kernel(event);
+ }
+
+ kfree(perf_events[cpu]);
+ }
+
+ kfree(counter_config);
+ exit_driverfs();
+}
+
+int __init oprofile_perf_init(struct oprofile_operations *ops)
+{
+ int cpu, ret = 0;
+
+ ret = init_driverfs();
+ if (ret)
+ return ret;
+
+ memset(&perf_events, 0, sizeof(perf_events));
+
+ num_counters = perf_num_counters();
+ if (num_counters <= 0) {
+ pr_info("oprofile: no performance counters\n");
+ ret = -ENODEV;
+ goto out;
+ }
+
+ counter_config = kcalloc(num_counters,
+ sizeof(struct op_counter_config), GFP_KERNEL);
+
+ if (!counter_config) {
+ pr_info("oprofile: failed to allocate %d "
+ "counters\n", num_counters);
+ ret = -ENOMEM;
+ num_counters = 0;
+ goto out;
+ }
+
+ for_each_possible_cpu(cpu) {
+ perf_events[cpu] = kcalloc(num_counters,
+ sizeof(struct perf_event *), GFP_KERNEL);
+ if (!perf_events[cpu]) {
+ pr_info("oprofile: failed to allocate %d perf events "
+ "for cpu %d\n", num_counters, cpu);
+ ret = -ENOMEM;
+ goto out;
+ }
+ }
+
+ ops->create_files = oprofile_perf_create_files;
+ ops->setup = oprofile_perf_setup;
+ ops->start = oprofile_perf_start;
+ ops->stop = oprofile_perf_stop;
+ ops->shutdown = oprofile_perf_stop;
+ ops->cpu_type = op_name_from_perf_id();
+
+ if (!ops->cpu_type)
+ ret = -ENODEV;
+ else
+ pr_info("oprofile: using %s\n", ops->cpu_type);
+
+out:
+ if (ret)
+ oprofile_perf_exit();
+
+ return ret;
+}
static ssize_t ulong_write_file(struct file *file, char const __user *buf, size_t count, loff_t *offset)
{
- unsigned long *value = file->private_data;
+ unsigned long value;
int retval;
if (*offset)
return -EINVAL;
- retval = oprofilefs_ulong_from_user(value, buf, count);
+ retval = oprofilefs_ulong_from_user(&value, buf, count);
+ if (retval)
+ return retval;
+ retval = oprofile_set_ulong(file->private_data, value);
if (retval)
return retval;
+
return count;
}
};
-static struct dentry *__oprofilefs_create_file(struct super_block *sb,
+static int __oprofilefs_create_file(struct super_block *sb,
struct dentry *root, char const *name, const struct file_operations *fops,
- int perm)
+ int perm, void *priv)
{
struct dentry *dentry;
struct inode *inode;
dentry = d_alloc_name(root, name);
if (!dentry)
- return NULL;
+ return -ENOMEM;
inode = oprofilefs_get_inode(sb, S_IFREG | perm);
if (!inode) {
dput(dentry);
- return NULL;
+ return -ENOMEM;
}
inode->i_fop = fops;
d_add(dentry, inode);
- return dentry;
+ dentry->d_inode->i_private = priv;
+ return 0;
}
int oprofilefs_create_ulong(struct super_block *sb, struct dentry *root,
char const *name, unsigned long *val)
{
- struct dentry *d = __oprofilefs_create_file(sb, root, name,
- &ulong_fops, 0644);
- if (!d)
- return -EFAULT;
-
- d->d_inode->i_private = val;
- return 0;
+ return __oprofilefs_create_file(sb, root, name,
+ &ulong_fops, 0644, val);
}
int oprofilefs_create_ro_ulong(struct super_block *sb, struct dentry *root,
char const *name, unsigned long *val)
{
- struct dentry *d = __oprofilefs_create_file(sb, root, name,
- &ulong_ro_fops, 0444);
- if (!d)
- return -EFAULT;
-
- d->d_inode->i_private = val;
- return 0;
+ return __oprofilefs_create_file(sb, root, name,
+ &ulong_ro_fops, 0444, val);
}
int oprofilefs_create_ro_atomic(struct super_block *sb, struct dentry *root,
char const *name, atomic_t *val)
{
- struct dentry *d = __oprofilefs_create_file(sb, root, name,
- &atomic_ro_fops, 0444);
- if (!d)
- return -EFAULT;
-
- d->d_inode->i_private = val;
- return 0;
+ return __oprofilefs_create_file(sb, root, name,
+ &atomic_ro_fops, 0444, val);
}
int oprofilefs_create_file(struct super_block *sb, struct dentry *root,
char const *name, const struct file_operations *fops)
{
- if (!__oprofilefs_create_file(sb, root, name, fops, 0644))
- return -EFAULT;
- return 0;
+ return __oprofilefs_create_file(sb, root, name, fops, 0644, NULL);
}
int oprofilefs_create_file_perm(struct super_block *sb, struct dentry *root,
char const *name, const struct file_operations *fops, int perm)
{
- if (!__oprofilefs_create_file(sb, root, name, fops, perm))
- return -EFAULT;
- return 0;
+ return __oprofilefs_create_file(sb, root, name, fops, perm, NULL);
}
spin_lock_init(&tmp->pardevice_lock);
tmp->ieee1284.mode = IEEE1284_MODE_COMPAT;
tmp->ieee1284.phase = IEEE1284_PH_FWD_IDLE;
- init_MUTEX_LOCKED (&tmp->ieee1284.irq); /* actually a semaphore at 0 */
+ sema_init(&tmp->ieee1284.irq, 0);
tmp->spintime = parport_default_spintime;
atomic_set (&tmp->ref_count, 1);
INIT_LIST_HEAD(&tmp->full_list);
#define DMA_32BIT_PFN IOVA_PFN(DMA_BIT_MASK(32))
#define DMA_64BIT_PFN IOVA_PFN(DMA_BIT_MASK(64))
+/* page table handling */
+#define LEVEL_STRIDE (9)
+#define LEVEL_MASK (((u64)1 << LEVEL_STRIDE) - 1)
+
+static inline int agaw_to_level(int agaw)
+{
+ return agaw + 2;
+}
+
+static inline int agaw_to_width(int agaw)
+{
+ return 30 + agaw * LEVEL_STRIDE;
+}
+
+static inline int width_to_agaw(int width)
+{
+ return (width - 30) / LEVEL_STRIDE;
+}
+
+static inline unsigned int level_to_offset_bits(int level)
+{
+ return (level - 1) * LEVEL_STRIDE;
+}
+
+static inline int pfn_level_offset(unsigned long pfn, int level)
+{
+ return (pfn >> level_to_offset_bits(level)) & LEVEL_MASK;
+}
+
+static inline unsigned long level_mask(int level)
+{
+ return -1UL << level_to_offset_bits(level);
+}
+
+static inline unsigned long level_size(int level)
+{
+ return 1UL << level_to_offset_bits(level);
+}
+
+static inline unsigned long align_to_level(unsigned long pfn, int level)
+{
+ return (pfn + level_size(level) - 1) & level_mask(level);
+}
/* VT-d pages must always be _smaller_ than MM pages. Otherwise things
are never going to work. */
}
-static inline int width_to_agaw(int width);
-
static int __iommu_calculate_agaw(struct intel_iommu *iommu, int max_gaw)
{
unsigned long sagaw;
spin_unlock_irqrestore(&iommu->lock, flags);
}
-/* page table handling */
-#define LEVEL_STRIDE (9)
-#define LEVEL_MASK (((u64)1 << LEVEL_STRIDE) - 1)
-
-static inline int agaw_to_level(int agaw)
-{
- return agaw + 2;
-}
-
-static inline int agaw_to_width(int agaw)
-{
- return 30 + agaw * LEVEL_STRIDE;
-
-}
-
-static inline int width_to_agaw(int width)
-{
- return (width - 30) / LEVEL_STRIDE;
-}
-
-static inline unsigned int level_to_offset_bits(int level)
-{
- return (level - 1) * LEVEL_STRIDE;
-}
-
-static inline int pfn_level_offset(unsigned long pfn, int level)
-{
- return (pfn >> level_to_offset_bits(level)) & LEVEL_MASK;
-}
-
-static inline unsigned long level_mask(int level)
-{
- return -1UL << level_to_offset_bits(level);
-}
-
-static inline unsigned long level_size(int level)
-{
- return 1UL << level_to_offset_bits(level);
-}
-
-static inline unsigned long align_to_level(unsigned long pfn, int level)
-{
- return (pfn + level_size(level) - 1) & level_mask(level);
-}
-
static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
unsigned long pfn)
{
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2a40, quirk_iommu_rwbf);
+#define GGC 0x52
+#define GGC_MEMORY_SIZE_MASK (0xf << 8)
+#define GGC_MEMORY_SIZE_NONE (0x0 << 8)
+#define GGC_MEMORY_SIZE_1M (0x1 << 8)
+#define GGC_MEMORY_SIZE_2M (0x3 << 8)
+#define GGC_MEMORY_VT_ENABLED (0x8 << 8)
+#define GGC_MEMORY_SIZE_2M_VT (0x9 << 8)
+#define GGC_MEMORY_SIZE_3M_VT (0xa << 8)
+#define GGC_MEMORY_SIZE_4M_VT (0xb << 8)
+
+static void __devinit quirk_calpella_no_shadow_gtt(struct pci_dev *dev)
+{
+ unsigned short ggc;
+
+ if (pci_read_config_word(dev, GGC, &ggc))
+ return;
+
+ if (!(ggc & GGC_MEMORY_VT_ENABLED)) {
+ printk(KERN_INFO "DMAR: BIOS has allocated no shadow GTT; disabling IOMMU for graphics\n");
+ dmar_map_gfx = 0;
+ }
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, quirk_calpella_no_shadow_gtt);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, quirk_calpella_no_shadow_gtt);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0062, quirk_calpella_no_shadow_gtt);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x006a, quirk_calpella_no_shadow_gtt);
+
/* On Tylersburg chipsets, some BIOSes have been known to enable the
ISOCH DMAR unit for the Azalia sound device, but not give it any
TLB entries, which causes it to deadlock. Check for that. We do
* the VF BAR size multiplied by the number of VFs. The alignment
* is just the VF BAR size.
*/
-int pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
+resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev, int resno)
{
struct resource tmp;
enum pci_bar_type type;
extern void pci_iov_release(struct pci_dev *dev);
extern int pci_iov_resource_bar(struct pci_dev *dev, int resno,
enum pci_bar_type *type);
-extern int pci_sriov_resource_alignment(struct pci_dev *dev, int resno);
+extern resource_size_t pci_sriov_resource_alignment(struct pci_dev *dev,
+ int resno);
extern void pci_restore_iov_state(struct pci_dev *dev);
extern int pci_iov_bus_range(struct pci_bus *bus);
}
#endif /* CONFIG_PCI_IOV */
-static inline int pci_resource_alignment(struct pci_dev *dev,
+static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
struct resource *res)
{
#ifdef CONFIG_PCI_IOV
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NEC, PCI_DEVICE_ID_NEC_CBUS_3, quirk_isa_dma_hangs);
/*
+ * Intel NM10 "TigerPoint" LPC PM1a_STS.BM_STS must be clear
+ * for some HT machines to use C4 w/o hanging.
+ */
+static void __devinit quirk_tigerpoint_bm_sts(struct pci_dev *dev)
+{
+ u32 pmbase;
+ u16 pm1a;
+
+ pci_read_config_dword(dev, 0x40, &pmbase);
+ pmbase = pmbase & 0xff80;
+ pm1a = inw(pmbase);
+
+ if (pm1a & 0x10) {
+ dev_info(&dev->dev, FW_BUG "TigerPoint LPC.BM_STS cleared\n");
+ outw(0x10, pmbase);
+ }
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_TGP_LPC, quirk_tigerpoint_bm_sts);
+
+/*
* Chipsets where PCI->PCI transfers vanish or hang
*/
static void __devinit quirk_nopcipci(struct pci_dev *dev)
c = p_dev->function_config;
if (!(c->state & CONFIG_LOCKED)) {
- dev_dbg(&s->dev, "Configuration isnt't locked\n");
+ dev_dbg(&p_dev->dev, "Configuration isnt't locked\n");
mutex_unlock(&s->ops_mutex);
return -EACCES;
}
s->win[w].card_start = offset;
ret = s->ops->set_mem_map(s, &s->win[w]);
if (ret)
- dev_warn(&s->dev, "failed to set_mem_map\n");
+ dev_warn(&p_dev->dev, "failed to set_mem_map\n");
mutex_unlock(&s->ops_mutex);
return ret;
} /* pcmcia_map_mem_page */
c = p_dev->function_config;
if (!(s->state & SOCKET_PRESENT)) {
- dev_dbg(&s->dev, "No card present\n");
+ dev_dbg(&p_dev->dev, "No card present\n");
ret = -ENODEV;
goto unlock;
}
if (!(c->state & CONFIG_LOCKED)) {
- dev_dbg(&s->dev, "Configuration isnt't locked\n");
+ dev_dbg(&p_dev->dev, "Configuration isnt't locked\n");
ret = -EACCES;
goto unlock;
}
if (mod->Attributes & (CONF_IRQ_CHANGE_VALID | CONF_VCC_CHANGE_VALID)) {
- dev_dbg(&s->dev,
+ dev_dbg(&p_dev->dev,
"changing Vcc or IRQ is not allowed at this time\n");
ret = -EINVAL;
goto unlock;
if ((mod->Attributes & CONF_VPP1_CHANGE_VALID) &&
(mod->Attributes & CONF_VPP2_CHANGE_VALID)) {
if (mod->Vpp1 != mod->Vpp2) {
- dev_dbg(&s->dev, "Vpp1 and Vpp2 must be the same\n");
+ dev_dbg(&p_dev->dev,
+ "Vpp1 and Vpp2 must be the same\n");
ret = -EINVAL;
goto unlock;
}
s->socket.Vpp = mod->Vpp1;
if (s->ops->set_socket(s, &s->socket)) {
- dev_printk(KERN_WARNING, &s->dev,
+ dev_printk(KERN_WARNING, &p_dev->dev,
"Unable to set VPP\n");
ret = -EIO;
goto unlock;
}
} else if ((mod->Attributes & CONF_VPP1_CHANGE_VALID) ||
(mod->Attributes & CONF_VPP2_CHANGE_VALID)) {
- dev_dbg(&s->dev, "changing Vcc is not allowed at this time\n");
+ dev_dbg(&p_dev->dev,
+ "changing Vcc is not allowed at this time\n");
ret = -EINVAL;
goto unlock;
}
win = &s->win[w];
if (!(p_dev->_win & CLIENT_WIN_REQ(w))) {
- dev_dbg(&s->dev, "not releasing unknown window\n");
+ dev_dbg(&p_dev->dev, "not releasing unknown window\n");
mutex_unlock(&s->ops_mutex);
return -EINVAL;
}
return -ENODEV;
if (req->IntType & INT_CARDBUS) {
- dev_dbg(&s->dev, "IntType may not be INT_CARDBUS\n");
+ dev_dbg(&p_dev->dev, "IntType may not be INT_CARDBUS\n");
return -EINVAL;
}
c = p_dev->function_config;
if (c->state & CONFIG_LOCKED) {
mutex_unlock(&s->ops_mutex);
- dev_dbg(&s->dev, "Configuration is locked\n");
+ dev_dbg(&p_dev->dev, "Configuration is locked\n");
return -EACCES;
}
s->socket.Vpp = req->Vpp;
if (s->ops->set_socket(s, &s->socket)) {
mutex_unlock(&s->ops_mutex);
- dev_printk(KERN_WARNING, &s->dev,
+ dev_printk(KERN_WARNING, &p_dev->dev,
"Unable to set socket state\n");
return -EINVAL;
}
int ret = -EINVAL;
mutex_lock(&s->ops_mutex);
- dev_dbg(&s->dev, "pcmcia_request_io: %pR , %pR", &c->io[0], &c->io[1]);
+ dev_dbg(&p_dev->dev, "pcmcia_request_io: %pR , %pR",
+ &c->io[0], &c->io[1]);
if (!(s->state & SOCKET_PRESENT)) {
- dev_dbg(&s->dev, "pcmcia_request_io: No card present\n");
+ dev_dbg(&p_dev->dev, "pcmcia_request_io: No card present\n");
goto out;
}
if (c->state & CONFIG_LOCKED) {
- dev_dbg(&s->dev, "Configuration is locked\n");
+ dev_dbg(&p_dev->dev, "Configuration is locked\n");
goto out;
}
if (c->state & CONFIG_IO_REQ) {
- dev_dbg(&s->dev, "IO already configured\n");
+ dev_dbg(&p_dev->dev, "IO already configured\n");
goto out;
}
if (c->io[1].end) {
ret = alloc_io_space(s, &c->io[1], p_dev->io_lines);
if (ret) {
+ struct resource tmp = c->io[0];
+ /* release the previously allocated resource */
release_io_space(s, &c->io[0]);
+ /* but preserve the settings, for they worked... */
+ c->io[0].end = resource_size(&tmp);
+ c->io[0].start = tmp.start;
+ c->io[0].flags = tmp.flags;
goto out;
}
} else
c->state |= CONFIG_IO_REQ;
p_dev->_io = 1;
- dev_dbg(&s->dev, "pcmcia_request_io succeeded: %pR , %pR",
+ dev_dbg(&p_dev->dev, "pcmcia_request_io succeeded: %pR , %pR",
&c->io[0], &c->io[1]);
out:
mutex_unlock(&s->ops_mutex);
int w;
if (!(s->state & SOCKET_PRESENT)) {
- dev_dbg(&s->dev, "No card present\n");
+ dev_dbg(&p_dev->dev, "No card present\n");
return -ENODEV;
}
req->Size = s->map_size;
align = (s->features & SS_CAP_MEM_ALIGN) ? req->Size : s->map_size;
if (req->Size & (s->map_size-1)) {
- dev_dbg(&s->dev, "invalid map size\n");
+ dev_dbg(&p_dev->dev, "invalid map size\n");
return -EINVAL;
}
if ((req->Base && (s->features & SS_CAP_STATIC_MAP)) ||
(req->Base & (align-1))) {
- dev_dbg(&s->dev, "invalid base address\n");
+ dev_dbg(&p_dev->dev, "invalid base address\n");
return -EINVAL;
}
if (req->Base)
if (!(s->state & SOCKET_WIN_REQ(w)))
break;
if (w == MAX_WIN) {
- dev_dbg(&s->dev, "all windows are used already\n");
+ dev_dbg(&p_dev->dev, "all windows are used already\n");
mutex_unlock(&s->ops_mutex);
return -EINVAL;
}
win->res = pcmcia_find_mem_region(req->Base, req->Size, align,
0, s);
if (!win->res) {
- dev_dbg(&s->dev, "allocating mem region failed\n");
+ dev_dbg(&p_dev->dev, "allocating mem region failed\n");
mutex_unlock(&s->ops_mutex);
return -EINVAL;
}
win->card_start = 0;
if (s->ops->set_mem_map(s, win) != 0) {
- dev_dbg(&s->dev, "failed to set memory mapping\n");
+ dev_dbg(&p_dev->dev, "failed to set memory mapping\n");
mutex_unlock(&s->ops_mutex);
return -EIO;
}
if (win->res)
request_resource(&iomem_resource, res);
- dev_dbg(&s->dev, "request_window results in %pR\n", res);
+ dev_dbg(&p_dev->dev, "request_window results in %pR\n", res);
mutex_unlock(&s->ops_mutex);
*wh = res;
if (!pci_resource_start(dev, 0)) {
dev_warn(&dev->dev, "refusing to load the driver as the "
"io_base is NULL.\n");
- goto err_out_free_mem;
+ goto err_out_disable;
}
dev_info(&dev->dev, "Cirrus PD6729 PCI to PCMCIA Bridge at 0x%llx "
* TODO:
* - handle CPU hotplug
* - provide turbo enable/disable api
- * - make sure we can write turbo enable/disable reg based on MISC_EN
*
* Related documents:
* - CDI 403777, 403778 - Auburndale EDS vol 1 & 2
#define THM_TC2 0xac
#define THM_DTV 0xb0
#define THM_ITV 0xd8
-#define ITV_ME_SEQNO_MASK 0x000f0000 /* ME should update every ~200ms */
+#define ITV_ME_SEQNO_MASK 0x00ff0000 /* ME should update every ~200ms */
#define ITV_ME_SEQNO_SHIFT (16)
#define ITV_MCH_TEMP_MASK 0x0000ff00
#define ITV_MCH_TEMP_SHIFT (8)
bool gpu_preferred;
bool poll_turbo_status;
bool second_cpu;
+ bool turbo_toggle_allowed;
struct ips_mcp_limits *limits;
/* Optional MCH interfaces for if i915 is in use */
new_limit = cur_limit - 8; /* 1W decrease */
/* Clamp to SKU TDP limit */
- if (((new_limit * 10) / 8) < (ips->orig_turbo_limit & TURBO_TDP_MASK))
+ if (new_limit < (ips->orig_turbo_limit & TURBO_TDP_MASK))
new_limit = ips->orig_turbo_limit & TURBO_TDP_MASK;
thm_writew(THM_MPCPC, (new_limit * 10) / 8);
if (ips->__cpu_turbo_on)
return;
- on_each_cpu(do_enable_cpu_turbo, ips, 1);
+ if (ips->turbo_toggle_allowed)
+ on_each_cpu(do_enable_cpu_turbo, ips, 1);
ips->__cpu_turbo_on = true;
}
if (!ips->__cpu_turbo_on)
return;
- on_each_cpu(do_disable_cpu_turbo, ips, 1);
+ if (ips->turbo_toggle_allowed)
+ on_each_cpu(do_disable_cpu_turbo, ips, 1);
ips->__cpu_turbo_on = false;
}
{
unsigned long flags;
bool ret = false;
+ u32 temp_limit;
+ u32 avg_power;
+ const char *msg = "MCP limit exceeded: ";
spin_lock_irqsave(&ips->turbo_status_lock, flags);
- if (ips->mcp_avg_temp > (ips->mcp_temp_limit * 100))
- ret = true;
- if (ips->cpu_avg_power + ips->mch_avg_power > ips->mcp_power_limit)
+
+ temp_limit = ips->mcp_temp_limit * 100;
+ if (ips->mcp_avg_temp > temp_limit) {
+ dev_info(&ips->dev->dev,
+ "%sAvg temp %u, limit %u\n", msg, ips->mcp_avg_temp,
+ temp_limit);
ret = true;
- spin_unlock_irqrestore(&ips->turbo_status_lock, flags);
+ }
- if (ret)
+ avg_power = ips->cpu_avg_power + ips->mch_avg_power;
+ if (avg_power > ips->mcp_power_limit) {
dev_info(&ips->dev->dev,
- "MCP power or thermal limit exceeded\n");
+ "%sAvg power %u, limit %u\n", msg, avg_power,
+ ips->mcp_power_limit);
+ ret = true;
+ }
+
+ spin_unlock_irqrestore(&ips->turbo_status_lock, flags);
return ret;
}
}
/**
+ * verify_limits - verify BIOS provided limits
+ * @ips: IPS structure
+ *
+ * BIOS can optionally provide non-default limits for power and temp. Check
+ * them here and use the defaults if the BIOS values are not provided or
+ * are otherwise unusable.
+ */
+static void verify_limits(struct ips_driver *ips)
+{
+ if (ips->mcp_power_limit < ips->limits->mcp_power_limit ||
+ ips->mcp_power_limit > 35000)
+ ips->mcp_power_limit = ips->limits->mcp_power_limit;
+
+ if (ips->mcp_temp_limit < ips->limits->core_temp_limit ||
+ ips->mcp_temp_limit < ips->limits->mch_temp_limit ||
+ ips->mcp_temp_limit > 150)
+ ips->mcp_temp_limit = min(ips->limits->core_temp_limit,
+ ips->limits->mch_temp_limit);
+}
+
+/**
* update_turbo_limits - get various limits & settings from regs
* @ips: IPS driver struct
*
u32 hts = thm_readl(THM_HTS);
ips->cpu_turbo_enabled = !(hts & HTS_PCTD_DIS);
- ips->gpu_turbo_enabled = !(hts & HTS_GTD_DIS);
+ /*
+ * Disable turbo for now, until we can figure out why the power figures
+ * are wrong
+ */
+ ips->cpu_turbo_enabled = false;
+
+ if (ips->gpu_busy)
+ ips->gpu_turbo_enabled = !(hts & HTS_GTD_DIS);
+
ips->core_power_limit = thm_readw(THM_MPCPC);
ips->mch_power_limit = thm_readw(THM_MMGPC);
ips->mcp_temp_limit = thm_readw(THM_PTL);
ips->mcp_power_limit = thm_readw(THM_MPPC);
+ verify_limits(ips);
/* Ignore BIOS CPU vs GPU pref */
}
ret = (ret * 1000) / 65535;
*last = val;
- return ret;
+ return 0;
}
static const u16 temp_decay_factor = 2;
kfree(mch_samples);
kfree(cpu_samples);
kfree(mchp_samples);
- kthread_stop(ips->adjust);
return -ENOMEM;
}
ITV_ME_SEQNO_SHIFT;
seqno_timestamp = get_jiffies_64();
- old_cpu_power = thm_readl(THM_CEC) / 65535;
+ old_cpu_power = thm_readl(THM_CEC);
schedule_timeout_interruptible(msecs_to_jiffies(IPS_SAMPLE_PERIOD));
/* Collect an initial average */
STS_GPL_SHIFT;
/* ignore EC CPU vs GPU pref */
ips->cpu_turbo_enabled = !(sts & STS_PCTD_DIS);
- ips->gpu_turbo_enabled = !(sts & STS_GTD_DIS);
+ /*
+ * Disable turbo for now, until we can figure
+ * out why the power figures are wrong
+ */
+ ips->cpu_turbo_enabled = false;
+ if (ips->gpu_busy)
+ ips->gpu_turbo_enabled = !(sts & STS_GTD_DIS);
ips->mcp_temp_limit = (sts & STS_PTL_MASK) >>
STS_PTL_SHIFT;
ips->mcp_power_limit = (tc1 & STS_PPL_MASK) >>
STS_PPL_SHIFT;
+ verify_limits(ips);
spin_unlock(&ips->turbo_status_lock);
thm_writeb(THM_SEC, SEC_ACK);
* turbo manually or we'll get an illegal MSR access, even though
* turbo will still be available.
*/
- if (!(misc_en & IA32_MISC_TURBO_EN))
- ; /* add turbo MSR write allowed flag if necessary */
+ if (misc_en & IA32_MISC_TURBO_EN)
+ ips->turbo_toggle_allowed = true;
+ else
+ ips->turbo_toggle_allowed = false;
if (strstr(boot_cpu_data.x86_model_id, "CPU M"))
limits = &ips_sv_limits;
tdp = turbo_power & TURBO_TDP_MASK;
/* Sanity check TDP against CPU */
- if (limits->mcp_power_limit != (tdp / 8) * 1000) {
- dev_warn(&ips->dev->dev, "Warning: CPU TDP doesn't match expected value (found %d, expected %d)\n",
- tdp / 8, limits->mcp_power_limit / 1000);
+ if (limits->core_power_limit != (tdp / 8) * 1000) {
+ dev_info(&ips->dev->dev, "CPU TDP doesn't match expected value (found %d, expected %d)\n",
+ tdp / 8, limits->core_power_limit / 1000);
+ limits->core_power_limit = (tdp / 8) * 1000;
}
out:
return true;
out_put_busy:
- symbol_put(i915_gpu_turbo_disable);
+ symbol_put(i915_gpu_busy);
out_put_lower:
symbol_put(i915_gpu_lower);
out_put_raise:
/* Save turbo limits & ratios */
rdmsrl(TURBO_POWER_CURRENT_LIMIT, ips->orig_turbo_limit);
- ips_enable_cpu_turbo(ips);
- ips->cpu_turbo_enabled = true;
+ ips_disable_cpu_turbo(ips);
+ ips->cpu_turbo_enabled = false;
- /* Set up the work queue and monitor/adjust threads */
- ips->monitor = kthread_run(ips_monitor, ips, "ips-monitor");
- if (IS_ERR(ips->monitor)) {
+ /* Create thermal adjust thread */
+ ips->adjust = kthread_create(ips_adjust, ips, "ips-adjust");
+ if (IS_ERR(ips->adjust)) {
dev_err(&dev->dev,
- "failed to create thermal monitor thread, aborting\n");
+ "failed to create thermal adjust thread, aborting\n");
ret = -ENOMEM;
goto error_free_irq;
+
}
- ips->adjust = kthread_create(ips_adjust, ips, "ips-adjust");
- if (IS_ERR(ips->adjust)) {
+ /*
+ * Set up the work queue and monitor thread. The monitor thread
+ * will wake up ips_adjust thread.
+ */
+ ips->monitor = kthread_run(ips_monitor, ips, "ips-monitor");
+ if (IS_ERR(ips->monitor)) {
dev_err(&dev->dev,
- "failed to create thermal adjust thread, aborting\n");
+ "failed to create thermal monitor thread, aborting\n");
ret = -ENOMEM;
goto error_thread_cleanup;
}
return ret;
error_thread_cleanup:
- kthread_stop(ips->monitor);
+ kthread_stop(ips->adjust);
error_free_irq:
free_irq(ips->dev->irq, ips);
error_unmap:
TPACPI_Q_IBM('1', 'D', TPACPI_HK_Q_INIMASK), /* X22, X23, X24 */
};
-typedef u16 tpacpi_keymap_t[TPACPI_HOTKEY_MAP_LEN];
+typedef u16 tpacpi_keymap_entry_t;
+typedef tpacpi_keymap_entry_t tpacpi_keymap_t[TPACPI_HOTKEY_MAP_LEN];
static int __init hotkey_init(struct ibm_init_struct *iibm)
{
};
#define TPACPI_HOTKEY_MAP_SIZE sizeof(tpacpi_keymap_t)
-#define TPACPI_HOTKEY_MAP_TYPESIZE sizeof(tpacpi_keymap_t[0])
+#define TPACPI_HOTKEY_MAP_TYPESIZE sizeof(tpacpi_keymap_entry_t)
int res, i;
int status;
empty_design_prop = POWER_SUPPLY_PROP_ENERGY_EMPTY_DESIGN;
now_prop = POWER_SUPPLY_PROP_ENERGY_NOW;
avg_prop = POWER_SUPPLY_PROP_ENERGY_AVG;
+ break;
case SOURCE_VOLTAGE:
full_prop = POWER_SUPPLY_PROP_VOLTAGE_MAX;
empty_prop = POWER_SUPPLY_PROP_VOLTAGE_MIN;
{
u32 data[3];
u8 *p = (u8 *)&data[1];
- int err = intel_scu_ipc_command(IPC_CMD_BATTERY_PROPERTY,
- IPCMSG_BATTERY, NULL, 0, data, 3);
+ int err = intel_scu_ipc_command(IPCMSG_BATTERY,
+ IPC_CMD_BATTERY_PROPERTY, NULL, 0, data, 3);
prop->capacity = data[0];
prop->crnt = *p++;
static int pmic_scu_ipc_set_charger(int charger)
{
- return intel_scu_ipc_simple_command(charger, IPCMSG_BATTERY);
+ return intel_scu_ipc_simple_command(IPCMSG_BATTERY, charger);
}
/**
struct pm8607_regulator_info *info = rdev_get_drvdata(rdev);
int ret = -EINVAL;
- if (info->vol_table && (index < (2 << info->vol_nbits))) {
+ if (info->vol_table && (index < (1 << info->vol_nbits))) {
ret = info->vol_table[index];
if (info->slope_double)
ret <<= 1;
max_uV = max_uV >> 1;
}
if (info->vol_table) {
- for (i = 0; i < (2 << info->vol_nbits); i++) {
+ for (i = 0; i < (1 << info->vol_nbits); i++) {
if (!info->vol_table[i])
break;
if ((min_uV <= info->vol_table[i])
"%s: failed to register regulator %s err %d\n",
__func__, ab3100_regulator_desc[i].name,
err);
- i--;
/* remove the already registered regulators */
- while (i > 0) {
+ while (--i >= 0)
regulator_unregister(ab3100_regulators[i].rdev);
- i--;
- }
return err;
}
if (info->fixed_uV)
return info->fixed_uV;
- if (selector > info->voltages_len)
+ if (selector >= info->voltages_len)
return -EINVAL;
return info->supported_voltages[selector];
static __devinit int ab8500_regulator_probe(struct platform_device *pdev)
{
struct ab8500 *ab8500 = dev_get_drvdata(pdev->dev.parent);
- struct ab8500_platform_data *pdata = dev_get_platdata(ab8500->dev);
+ struct ab8500_platform_data *pdata;
int i, err;
if (!ab8500) {
dev_err(&pdev->dev, "null mfd parent\n");
return -EINVAL;
}
+ pdata = dev_get_platdata(ab8500->dev);
/* register all regulators */
for (i = 0; i < ARRAY_SIZE(ab8500_regulator_info); i++) {
dev_err(&pdev->dev, "failed to register regulator %s\n",
info->desc.name);
/* when we fail, un-register all earlier regulators */
- i--;
- while (i > 0) {
+ while (--i >= 0) {
info = &ab8500_regulator_info[i];
regulator_unregister(info->regulator);
- i--;
}
return err;
}
unsigned int current_level;
unsigned int current_mask;
unsigned int current_offset;
- struct regulator_dev rdev;
+ struct regulator_dev *rdev;
};
static int ad5398_calc_current(struct ad5398_chip_info *chip,
static int __devinit ad5398_probe(struct i2c_client *client,
const struct i2c_device_id *id)
{
- struct regulator_dev *rdev;
struct regulator_init_data *init_data = client->dev.platform_data;
struct ad5398_chip_info *chip;
const struct ad5398_current_data_format *df =
chip->current_offset = df->current_offset;
chip->current_mask = (chip->current_level - 1) << chip->current_offset;
- rdev = regulator_register(&ad5398_reg, &client->dev, init_data, chip);
- if (IS_ERR(rdev)) {
- ret = PTR_ERR(rdev);
+ chip->rdev = regulator_register(&ad5398_reg, &client->dev,
+ init_data, chip);
+ if (IS_ERR(chip->rdev)) {
+ ret = PTR_ERR(chip->rdev);
dev_err(&client->dev, "failed to register %s %s\n",
id->name, ad5398_reg.name);
goto err;
{
struct ad5398_chip_info *chip = i2c_get_clientdata(client);
- regulator_unregister(&chip->rdev);
+ regulator_unregister(chip->rdev);
kfree(chip);
- i2c_set_clientdata(client, NULL);
return 0;
}
constraints->min_uA != constraints->max_uA) {
ret = _regulator_get_current_limit(rdev);
if (ret > 0)
- count += sprintf(buf + count, "at %d uA ", ret / 1000);
+ count += sprintf(buf + count, "at %d mA ", ret / 1000);
}
if (constraints->valid_modes_mask & REGULATOR_MODE_FAST)
dev_set_name(&rdev->dev, "regulator.%d",
atomic_inc_return(®ulator_no) - 1);
ret = device_register(&rdev->dev);
- if (ret != 0)
+ if (ret != 0) {
+ put_device(&rdev->dev);
goto clean;
+ }
dev_set_drvdata(&rdev->dev, rdev);
mutex_init(&pmic->mtx);
for (i = 0; i < 3; i++) {
- pmic->rdev[i] = regulator_register(&isl_rd[0], &i2c->dev,
+ pmic->rdev[i] = regulator_register(&isl_rd[i], &i2c->dev,
init_data, pmic);
if (IS_ERR(pmic->rdev[i])) {
dev_err(&i2c->dev, "failed to register %s\n", id->name);
struct isl_pmic *pmic = i2c_get_clientdata(i2c);
int i;
- i2c_set_clientdata(i2c, NULL);
-
for (i = 0; i < 3; i++)
regulator_unregister(pmic->rdev[i]);
if (max_uV < MAX1586_V6_MIN_UV || max_uV > MAX1586_V6_MAX_UV)
return -EINVAL;
- if (min_uV >= 3000000)
- selector = 3;
- if (min_uV < 3000000)
- selector = 2;
- if (min_uV < 2500000)
- selector = 1;
if (min_uV < 1800000)
selector = 0;
+ else if (min_uV < 2500000)
+ selector = 1;
+ else if (min_uV < 3000000)
+ selector = 2;
+ else if (min_uV >= 3000000)
+ selector = 3;
if (max1586_v6_calc_voltage(selector) > max_uV)
return -EINVAL;
/* set external clock frequency */
info->extclk_freq = pdata->extclk_freq;
max8649_set_bits(info->i2c, MAX8649_SYNC, MAX8649_EXT_MASK,
- info->extclk_freq);
+ info->extclk_freq << 6);
}
if (pdata->ramp_timing) {
if (!max8998)
return -ENOMEM;
- size = sizeof(struct regulator_dev *) * (pdata->num_regulators + 1);
+ size = sizeof(struct regulator_dev *) * pdata->num_regulators;
max8998->rdev = kzalloc(size, GFP_KERNEL);
if (!max8998->rdev) {
kfree(max8998);
}
rdev = max8998->rdev;
+ max8998->dev = &pdev->dev;
max8998->iodev = iodev;
+ max8998->num_regulators = pdata->num_regulators;
platform_set_drvdata(pdev, max8998);
for (i = 0; i < pdata->num_regulators; i++) {
return 0;
err:
- for (i = 0; i <= max8998->num_regulators; i++)
+ for (i = 0; i < max8998->num_regulators; i++)
if (rdev[i])
regulator_unregister(rdev[i]);
struct regulator_dev **rdev = max8998->rdev;
int i;
- for (i = 0; i <= max8998->num_regulators; i++)
+ for (i = 0; i < max8998->num_regulators; i++)
if (rdev[i])
regulator_unregister(rdev[i]);
return error;
}
-/**
- * tps6507x_remove - TPS6507x driver i2c remove handler
- * @client: i2c driver client device structure
- *
- * Unregister TPS driver as an i2c client device driver
- */
static int __devexit tps6507x_pmic_remove(struct platform_device *pdev)
{
struct tps6507x_dev *tps6507x_dev = platform_get_drvdata(pdev);
mask = ((1 << ri->volt_nbits) - 1) << ri->volt_shift;
val = (val & mask) >> ri->volt_shift;
- if (val > ri->desc.n_voltages)
+ if (val >= ri->desc.n_voltages)
BUG();
return ri->voltages[val] * 1000;
if (ret)
return ret;
- return tps6586x_set_bits(parent, ri->go_reg, ri->go_bit);
+ return tps6586x_set_bits(parent, ri->go_reg, 1 << ri->go_bit);
}
static int tps6586x_regulator_enable(struct regulator_dev *rdev)
case REGULATOR_MODE_IDLE:
ret = wm831x_set_bits(wm831x, ctrl_reg,
- WM831X_LDO1_LP_MODE,
- WM831X_LDO1_LP_MODE);
+ WM831X_LDO1_LP_MODE, 0);
if (ret < 0)
return ret;
WM831X_LDO1_ON_MODE);
if (ret < 0)
return ret;
+ break;
case REGULATOR_MODE_STANDBY:
ret = wm831x_set_bits(wm831x, ctrl_reg,
- WM831X_LDO1_LP_MODE, 0);
+ WM831X_LDO1_LP_MODE,
+ WM831X_LDO1_LP_MODE);
if (ret < 0)
return ret;
mode = REGULATOR_MODE_NORMAL;
} else if (!active && !sleep)
mode = REGULATOR_MODE_IDLE;
- else if (!sleep)
+ else if (sleep)
mode = REGULATOR_MODE_STANDBY;
return mode;
err = PTR_ERR(rtc);
return err;
}
+ platform_set_drvdata(pdev, rtc);
return 0;
}
struct rtc_device *rtc = platform_get_drvdata(pdev);
rtc_device_unregister(rtc);
+ platform_set_drvdata(pdev, NULL);
return 0;
}
enable_irq_wake(IRQ_RTC);
bfin_rtc_sync_pending(&pdev->dev);
} else
- bfin_rtc_int_clear(-1);
+ bfin_rtc_int_clear(0);
return 0;
}
{
if (device_may_wakeup(&pdev->dev))
disable_irq_wake(IRQ_RTC);
- else
- bfin_write_RTC_ISTAT(-1);
+
+ /*
+ * Since only some of the RTC bits are maintained externally in the
+ * Vbat domain, we need to wait for the RTC MMRs to be synced into
+ * the core after waking up. This happens every RTC 1HZ. Once that
+ * has happened, we can go ahead and re-enable the important write
+ * complete interrupt event.
+ */
+ while (!(bfin_read_RTC_ISTAT() & RTC_ISTAT_SEC))
+ continue;
+ bfin_rtc_int_set(RTC_ISTAT_WRITE_COMPLETE);
return 0;
}
free_irq(client->irq, client);
out_free:
- i2c_set_clientdata(client, NULL);
kfree(ds3232);
return ret;
}
}
rtc_device_unregister(ds3232->rtc);
- i2c_set_clientdata(client, NULL);
kfree(ds3232);
return 0;
}
t->time.tm_isdst = -1;
t->enabled = !!(reg[M41T80_REG_ALARM_MON] & M41T80_ALMON_AFE);
t->pending = !!(reg[M41T80_REG_FLAGS] & M41T80_FLAGS_AF);
- return rtc_valid_tm(t);
+ return 0;
}
static struct rtc_class_ops m41t80_rtc_ops = {
}
if (request_irq(adev->irq[0], pl031_interrupt,
- IRQF_DISABLED | IRQF_SHARED, "rtc-pl031", ldata)) {
+ IRQF_DISABLED, "rtc-pl031", ldata)) {
ret = -EIO;
goto out_no_irq;
}
s3c_rtc_setaie(alrm->enabled);
- if (alrm->enabled)
- enable_irq_wake(s3c_rtc_alarmno);
- else
- disable_irq_wake(s3c_rtc_alarmno);
-
return 0;
}
ticnt_en_save &= S3C64XX_RTCCON_TICEN;
}
s3c_rtc_enable(pdev, 0);
+
+ if (device_may_wakeup(&pdev->dev))
+ enable_irq_wake(s3c_rtc_alarmno);
+
return 0;
}
tmp = readb(s3c_rtc_base + S3C2410_RTCCON);
writeb(tmp | ticnt_en_save, s3c_rtc_base + S3C2410_RTCCON);
}
+
+ if (device_may_wakeup(&pdev->dev))
+ disable_irq_wake(s3c_rtc_alarmno);
+
return 0;
}
#else
if (!blkdat->request_queue)
return -ENOMEM;
- elevator_exit(blkdat->request_queue->elevator);
- rc = elevator_init(blkdat->request_queue, "noop");
+ rc = elevator_change(blkdat->request_queue, "noop");
if (rc)
goto cleanup_queue;
dev_fsm, dev_fsm_len, GFP_KERNEL);
if (priv->fsm == NULL) {
CTCMY_DBF_DEV(SETUP, dev, "init_fsm error");
- kfree(dev);
+ free_netdev(dev);
return NULL;
}
fsm_newstate(priv->fsm, DEV_STATE_STOPPED);
grp = ctcmpc_init_mpc_group(priv);
if (grp == NULL) {
MPC_DBF_DEV(SETUP, dev, "init_mpc_group error");
- kfree(dev);
+ free_netdev(dev);
return NULL;
}
tasklet_init(&grp->mpc_tasklet2,
enum iscsi_host_param param, char *buf)
{
struct beiscsi_hba *phba = (struct beiscsi_hba *)iscsi_host_priv(shost);
- int len = 0;
- int status;
+ int status = 0;
SE_DEBUG(DBG_LVL_8, "In beiscsi_get_host_param, param= %d\n", param);
switch (param) {
default:
return iscsi_host_get_param(shost, param, buf);
}
- return len;
+ return status;
}
int beiscsi_get_macaddr(char *buf, struct beiscsi_hba *phba)
memset(req, 0, sizeof(*req));
wrb->tag0 |= tag;
- be_wrb_hdr_prepare(wrb, sizeof(*req), true, 1);
+ be_wrb_hdr_prepare(wrb, sizeof(*req), false, 1);
be_cmd_hdr_prepare(&req->hdr, CMD_SUBSYSTEM_ISCSI,
OPCODE_COMMON_ISCSI_TCP_CONNECT_AND_OFFLOAD,
sizeof(*req));
{
struct scsi_sense_hdr sshdr;
- scmd_printk(KERN_INFO, cmd, "");
+ scmd_printk(KERN_INFO, cmd, " ");
scsi_decode_sense_buffer(cmd->sense_buffer, SCSI_SENSE_BUFFERSIZE,
&sshdr);
scsi_show_sense_hdr(&sshdr);
scsi_decode_sense_extras(cmd->sense_buffer, SCSI_SENSE_BUFFERSIZE,
&sshdr);
- scmd_printk(KERN_INFO, cmd, "");
+ scmd_printk(KERN_INFO, cmd, " ");
scsi_show_extd_sense(sshdr.asc, sshdr.ascq);
}
EXPORT_SYMBOL(scsi_print_sense);
void scsi_print_result(struct scsi_cmnd *cmd)
{
- scmd_printk(KERN_INFO, cmd, "");
+ scmd_printk(KERN_INFO, cmd, " ");
scsi_show_result(cmd->result);
}
EXPORT_SYMBOL(scsi_print_result);
misc_fw_support = readl(&cfgtable->misc_fw_support);
use_doorbell = misc_fw_support & MISC_FW_DOORBELL_RESET;
+ /* The doorbell reset seems to cause lockups on some Smart
+ * Arrays (e.g. P410, P410i, maybe others). Until this is
+ * fixed or at least isolated, avoid the doorbell reset.
+ */
+ use_doorbell = 0;
+
rc = hpsa_controller_hard_reset(pdev, vaddr, use_doorbell);
if (rc)
goto unmap_cfgtable;
{
_osd_req_encode_common(or, OSD_ACT_READ, obj, offset, len);
WARN_ON(or->in.bio || or->in.total_bytes);
- WARN_ON(1 == (bio->bi_rw & REQ_WRITE));
+ WARN_ON(bio->bi_rw & REQ_WRITE);
or->in.bio = bio;
or->in.total_bytes = len;
}
qla24xx_disable_vp(vha);
+ vha->flags.delete_progress = 1;
+
fc_remove_host(vha->host);
scsi_remove_host(vha->host);
- qla2x00_free_fcports(vha);
+ if (vha->timer_active) {
+ qla2x00_vp_stop_timer(vha);
+ DEBUG15(printk(KERN_INFO "scsi(%ld): timer for the vport[%d]"
+ " = %p has stopped\n", vha->host_no, vha->vp_idx, vha));
+ }
qla24xx_deallocate_vp_id(vha);
+ /* No pending activities shall be there on the vha now */
+ DEBUG(msleep(random32()%10)); /* Just to see if something falls on
+ * the net we have placed below */
+
+ BUG_ON(atomic_read(&vha->vref_count));
+
+ qla2x00_free_fcports(vha);
+
mutex_lock(&ha->vport_lock);
ha->cur_vport_count--;
clear_bit(vha->vp_idx, ha->vp_idx_map);
mutex_unlock(&ha->vport_lock);
- if (vha->timer_active) {
- qla2x00_vp_stop_timer(vha);
- DEBUG15(printk ("scsi(%ld): timer for the vport[%d] = %p "
- "has stopped\n",
- vha->host_no, vha->vp_idx, vha));
- }
-
if (vha->req->id && !ha->flags.cpu_affinity_enabled) {
if (qla25xx_delete_req_que(vha, vha->req) != QLA_SUCCESS)
qla_printk(KERN_WARNING, ha,
/* #define QL_DEBUG_LEVEL_17 */ /* Output EEH trace messages */
/* #define QL_DEBUG_LEVEL_18 */ /* Output T10 CRC trace messages */
-/* #define QL_PRINTK_BUF */ /* Captures printk to buffer */
-
/*
* Macros use for debugging the driver.
*/
#define MBX_UPDATE_FLASH_ACTIVE 3
struct mutex vport_lock; /* Virtual port synchronization */
+ spinlock_t vport_slock; /* order is hardware_lock, then vport_slock */
struct completion mbx_cmd_comp; /* Serialize mbx access */
struct completion mbx_intr_comp; /* Used for completion notification */
struct completion dcbx_comp; /* For set port config notification */
uint32_t management_server_logged_in :1;
uint32_t process_response_queue :1;
uint32_t difdix_supported:1;
+ uint32_t delete_progress:1;
} flags;
atomic_t loop_state;
struct req_que *req;
int fw_heartbeat_counter;
int seconds_since_last_heartbeat;
+
+ atomic_t vref_count;
} scsi_qla_host_t;
/*
test_bit(LOOP_RESYNC_NEEDED, &ha->dpc_flags) || \
atomic_read(&ha->loop_state) == LOOP_DOWN)
+#define QLA_VHA_MARK_BUSY(__vha, __bail) do { \
+ atomic_inc(&__vha->vref_count); \
+ mb(); \
+ if (__vha->flags.delete_progress) { \
+ atomic_dec(&__vha->vref_count); \
+ __bail = 1; \
+ } else { \
+ __bail = 0; \
+ } \
+} while (0)
+
+#define QLA_VHA_MARK_NOT_BUSY(__vha) do { \
+ atomic_dec(&__vha->vref_count); \
+} while (0)
+
+
#define qla_printk(level, ha, format, arg...) \
dev_printk(level , &((ha)->pdev->dev) , format , ## arg)
{
struct srb_ctx *ctx = sp->ctx;
struct srb_iocb *iocb = ctx->u.iocb_cmd;
+ struct scsi_qla_host *vha = sp->fcport->vha;
del_timer_sync(&iocb->timer);
kfree(iocb);
kfree(ctx);
mempool_free(sp, sp->fcport->vha->hw->srb_mempool);
+
+ QLA_VHA_MARK_NOT_BUSY(vha);
}
inline srb_t *
qla2x00_get_ctx_sp(scsi_qla_host_t *vha, fc_port_t *fcport, size_t size,
unsigned long tmo)
{
- srb_t *sp;
+ srb_t *sp = NULL;
struct qla_hw_data *ha = vha->hw;
struct srb_ctx *ctx;
struct srb_iocb *iocb;
+ uint8_t bail;
+
+ QLA_VHA_MARK_BUSY(vha, bail);
+ if (bail)
+ return NULL;
sp = mempool_alloc(ha->srb_mempool, GFP_KERNEL);
if (!sp)
iocb->timer.function = qla2x00_ctx_sp_timeout;
add_timer(&iocb->timer);
done:
+ if (!sp)
+ QLA_VHA_MARK_NOT_BUSY(vha);
return sp;
}
qla2x00_init_response_q_entries(rsp);
}
+ spin_lock_irqsave(&ha->vport_slock, flags);
/* Clear RSCN queue. */
list_for_each_entry(vp, &ha->vp_list, list) {
vp->rscn_in_ptr = 0;
vp->rscn_out_ptr = 0;
}
+
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
ha->isp_ops->config_rings(vha);
spin_unlock_irqrestore(&ha->hardware_lock, flags);
/* Bypass virtual ports of the same host. */
found = 0;
if (ha->num_vhosts) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
list_for_each_entry_safe(vp, tvp, &ha->vp_list, list) {
if (new_fcport->d_id.b24 == vp->d_id.b24) {
found = 1;
break;
}
}
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
if (found)
continue;
}
struct qla_hw_data *ha = vha->hw;
struct scsi_qla_host *vp;
struct scsi_qla_host *tvp;
+ unsigned long flags = 0;
rval = QLA_SUCCESS;
/* Check for loop ID being already in use. */
found = 0;
fcport = NULL;
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
list_for_each_entry_safe(vp, tvp, &ha->vp_list, list) {
list_for_each_entry(fcport, &vp->vp_fcports, list) {
if (fcport->loop_id == dev->loop_id &&
if (found)
break;
}
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
/* If not in use then it is free to use. */
if (!found) {
qla2x00_update_fcports(scsi_qla_host_t *base_vha)
{
fc_port_t *fcport;
- struct scsi_qla_host *tvp, *vha;
+ struct scsi_qla_host *vha;
+ struct qla_hw_data *ha = base_vha->hw;
+ unsigned long flags;
+ spin_lock_irqsave(&ha->vport_slock, flags);
/* Go with deferred removal of rport references. */
- list_for_each_entry_safe(vha, tvp, &base_vha->hw->vp_list, list)
- list_for_each_entry(fcport, &vha->vp_fcports, list)
+ list_for_each_entry(vha, &base_vha->hw->vp_list, list) {
+ atomic_inc(&vha->vref_count);
+ list_for_each_entry(fcport, &vha->vp_fcports, list) {
if (fcport && fcport->drport &&
- atomic_read(&fcport->state) != FCS_UNCONFIGURED)
+ atomic_read(&fcport->state) != FCS_UNCONFIGURED) {
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
qla2x00_rport_del(fcport);
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ }
+ }
+ atomic_dec(&vha->vref_count);
+ }
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
}
void
{
struct qla_hw_data *ha = vha->hw;
struct scsi_qla_host *vp, *base_vha = pci_get_drvdata(ha->pdev);
- struct scsi_qla_host *tvp;
+ unsigned long flags;
vha->flags.online = 0;
ha->flags.chip_reset_done = 0;
if (atomic_read(&vha->loop_state) != LOOP_DOWN) {
atomic_set(&vha->loop_state, LOOP_DOWN);
qla2x00_mark_all_devices_lost(vha, 0);
- list_for_each_entry_safe(vp, tvp, &base_vha->hw->vp_list, list)
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ list_for_each_entry(vp, &base_vha->hw->vp_list, list) {
+ atomic_inc(&vp->vref_count);
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
qla2x00_mark_all_devices_lost(vp, 0);
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ atomic_dec(&vp->vref_count);
+ }
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
} else {
if (!atomic_read(&vha->loop_down_timer))
atomic_set(&vha->loop_down_timer,
uint8_t status = 0;
struct qla_hw_data *ha = vha->hw;
struct scsi_qla_host *vp;
- struct scsi_qla_host *tvp;
struct req_que *req = ha->req_q_map[0];
+ unsigned long flags;
if (vha->flags.online) {
qla2x00_abort_isp_cleanup(vha);
DEBUG(printk(KERN_INFO
"qla2x00_abort_isp(%ld): succeeded.\n",
vha->host_no));
- list_for_each_entry_safe(vp, tvp, &ha->vp_list, list) {
- if (vp->vp_idx)
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ list_for_each_entry(vp, &ha->vp_list, list) {
+ if (vp->vp_idx) {
+ atomic_inc(&vp->vref_count);
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
qla2x00_vp_abort_isp(vp);
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ atomic_dec(&vp->vref_count);
+ }
}
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
} else {
qla_printk(KERN_INFO, ha,
"qla2x00_abort_isp: **** FAILED ****\n");
struct req_que *req = ha->req_q_map[0];
struct rsp_que *rsp = ha->rsp_q_map[0];
struct scsi_qla_host *vp;
- struct scsi_qla_host *tvp;
+ unsigned long flags;
status = qla2x00_init_rings(vha);
if (!status) {
DEBUG(printk(KERN_INFO
"qla82xx_restart_isp(%ld): succeeded.\n",
vha->host_no));
- list_for_each_entry_safe(vp, tvp, &ha->vp_list, list) {
- if (vp->vp_idx)
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ list_for_each_entry(vp, &ha->vp_list, list) {
+ if (vp->vp_idx) {
+ atomic_inc(&vp->vref_count);
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
qla2x00_vp_abort_isp(vp);
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ atomic_dec(&vp->vref_count);
+ }
}
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
} else {
qla_printk(KERN_INFO, ha,
"qla82xx_restart_isp: **** FAILED ****\n");
cp->result = DID_ERROR << 16;
break;
}
- } else if (!lscsi_status) {
+ } else {
DEBUG2(qla_printk(KERN_INFO, ha,
"scsi(%ld:%d:%d) Dropped frame(s) detected (0x%x "
"of 0x%x bytes).\n", vha->host_no, cp->device->id,
cp->device->lun, resid, scsi_bufflen(cp)));
- cp->result = DID_ERROR << 16;
- break;
+ cp->result = DID_ERROR << 16 | lscsi_status;
+ goto check_scsi_status;
}
cp->result = DID_OK << 16 | lscsi_status;
logit = 0;
+check_scsi_status:
/*
* Check to see if SCSI Status is non zero. If so report SCSI
* Status.
uint16_t stat = le16_to_cpu(rptid_entry->vp_idx);
struct qla_hw_data *ha = vha->hw;
scsi_qla_host_t *vp;
- scsi_qla_host_t *tvp;
+ unsigned long flags;
if (rptid_entry->entry_status != 0)
return;
return;
}
- list_for_each_entry_safe(vp, tvp, &ha->vp_list, list)
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ list_for_each_entry(vp, &ha->vp_list, list)
if (vp_idx == vp->vp_idx)
break;
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
if (!vp)
return;
{
uint32_t vp_id;
struct qla_hw_data *ha = vha->hw;
+ unsigned long flags;
/* Find an empty slot and assign an vp_id */
mutex_lock(&ha->vport_lock);
set_bit(vp_id, ha->vp_idx_map);
ha->num_vhosts++;
vha->vp_idx = vp_id;
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
list_add_tail(&vha->list, &ha->vp_list);
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
mutex_unlock(&ha->vport_lock);
return vp_id;
}
{
uint16_t vp_id;
struct qla_hw_data *ha = vha->hw;
+ unsigned long flags = 0;
mutex_lock(&ha->vport_lock);
+ /*
+ * Wait for all pending activities to finish before removing vport from
+ * the list.
+ * Lock needs to be held for safe removal from the list (it
+ * ensures no active vp_list traversal while the vport is removed
+ * from the queue)
+ */
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ while (atomic_read(&vha->vref_count)) {
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
+ msleep(500);
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ }
+ list_del(&vha->list);
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
vp_id = vha->vp_idx;
ha->num_vhosts--;
clear_bit(vp_id, ha->vp_idx_map);
- list_del(&vha->list);
+
mutex_unlock(&ha->vport_lock);
}
{
scsi_qla_host_t *vha;
struct scsi_qla_host *tvha;
+ unsigned long flags;
+ spin_lock_irqsave(&ha->vport_slock, flags);
/* Locate matching device in database. */
list_for_each_entry_safe(vha, tvha, &ha->vp_list, list) {
- if (!memcmp(port_name, vha->port_name, WWN_SIZE))
+ if (!memcmp(port_name, vha->port_name, WWN_SIZE)) {
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
return vha;
+ }
}
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
return NULL;
}
static void
qla2x00_mark_vp_devices_dead(scsi_qla_host_t *vha)
{
+ /*
+ * !!! NOTE !!!
+ * This function, if called in contexts other than vp create, disable
+ * or delete, please make sure this is synchronized with the
+ * delete thread.
+ */
fc_port_t *fcport;
list_for_each_entry(fcport, &vha->vp_fcports, list) {
"loop_id=0x%04x :%x\n",
vha->host_no, fcport->loop_id, fcport->vp_idx));
- atomic_set(&fcport->state, FCS_DEVICE_DEAD);
qla2x00_mark_device_lost(vha, fcport, 0, 0);
atomic_set(&fcport->state, FCS_UNCONFIGURED);
}
void
qla2x00_alert_all_vps(struct rsp_que *rsp, uint16_t *mb)
{
- scsi_qla_host_t *vha, *tvha;
+ scsi_qla_host_t *vha;
struct qla_hw_data *ha = rsp->hw;
int i = 0;
+ unsigned long flags;
- list_for_each_entry_safe(vha, tvha, &ha->vp_list, list) {
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ list_for_each_entry(vha, &ha->vp_list, list) {
if (vha->vp_idx) {
+ atomic_inc(&vha->vref_count);
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
switch (mb[0]) {
case MBA_LIP_OCCURRED:
case MBA_LOOP_UP:
qla2x00_async_event(vha, rsp, mb);
break;
}
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ atomic_dec(&vha->vref_count);
}
i++;
}
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
}
int
int ret;
struct qla_hw_data *ha = vha->hw;
scsi_qla_host_t *vp;
- struct scsi_qla_host *tvp;
+ unsigned long flags = 0;
if (vha->vp_idx)
return;
if (!(ha->current_topology & ISP_CFG_F))
return;
- list_for_each_entry_safe(vp, tvp, &ha->vp_list, list) {
- if (vp->vp_idx)
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ list_for_each_entry(vp, &ha->vp_list, list) {
+ if (vp->vp_idx) {
+ atomic_inc(&vp->vref_count);
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
ret = qla2x00_do_dpc_vp(vp);
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ atomic_dec(&vp->vref_count);
+ }
}
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
}
int
sufficient_dsds:
req_cnt = 1;
+ if (req->cnt < (req_cnt + 2)) {
+ cnt = (uint16_t)RD_REG_DWORD_RELAXED(
+ ®->req_q_out[0]);
+ if (req->ring_index < cnt)
+ req->cnt = cnt - req->ring_index;
+ else
+ req->cnt = req->length -
+ (req->ring_index - cnt);
+ }
+
+ if (req->cnt < (req_cnt + 2))
+ goto queuing_error;
+
ctx = sp->ctx = mempool_alloc(ha->ctx_mempool, GFP_ATOMIC);
if (!sp->ctx) {
DEBUG(printk(KERN_INFO
set_bit(ISP_ABORT_NEEDED, &vha->dpc_flags);
}
qla2xxx_wake_dpc(vha);
+ ha->flags.fw_hung = 1;
if (ha->flags.mbox_busy) {
- ha->flags.fw_hung = 1;
ha->flags.mbox_int = 1;
DEBUG2(qla_printk(KERN_ERR, ha,
- "Due to fw hung, doing premature "
- "completion of mbx command\n"));
- complete(&ha->mbx_intr_comp);
+ "Due to fw hung, doing premature "
+ "completion of mbx command\n"));
+ if (test_bit(MBX_INTR_WAIT,
+ &ha->mbx_cmd_flags))
+ complete(&ha->mbx_intr_comp);
}
}
- }
+ } else
+ vha->seconds_since_last_heartbeat = 0;
vha->fw_heartbeat_counter = fw_heartbeat_counter;
}
"%s(): Adapter reset needed!\n", __func__);
set_bit(ISP_ABORT_NEEDED, &vha->dpc_flags);
qla2xxx_wake_dpc(vha);
+ ha->flags.fw_hung = 1;
if (ha->flags.mbox_busy) {
- ha->flags.fw_hung = 1;
ha->flags.mbox_int = 1;
DEBUG2(qla_printk(KERN_ERR, ha,
- "Need reset, doing premature "
- "completion of mbx command\n"));
- complete(&ha->mbx_intr_comp);
+ "Need reset, doing premature "
+ "completion of mbx command\n"));
+ if (test_bit(MBX_INTR_WAIT,
+ &ha->mbx_cmd_flags))
+ complete(&ha->mbx_intr_comp);
}
} else {
qla82xx_check_fw_alive(vha);
static void
qla2x00_remove_one(struct pci_dev *pdev)
{
- scsi_qla_host_t *base_vha, *vha, *temp;
+ scsi_qla_host_t *base_vha, *vha;
struct qla_hw_data *ha;
+ unsigned long flags;
base_vha = pci_get_drvdata(pdev);
ha = base_vha->hw;
- list_for_each_entry_safe(vha, temp, &ha->vp_list, list) {
- if (vha && vha->fc_vport)
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ list_for_each_entry(vha, &ha->vp_list, list) {
+ atomic_inc(&vha->vref_count);
+
+ if (vha && vha->fc_vport) {
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
+
fc_vport_terminate(vha->fc_vport);
+
+ spin_lock_irqsave(&ha->vport_slock, flags);
+ }
+
+ atomic_dec(&vha->vref_count);
}
+ spin_unlock_irqrestore(&ha->vport_slock, flags);
set_bit(UNLOADING, &base_vha->dpc_flags);
qla2x00_alloc_work(struct scsi_qla_host *vha, enum qla_work_type type)
{
struct qla_work_evt *e;
+ uint8_t bail;
+
+ QLA_VHA_MARK_BUSY(vha, bail);
+ if (bail)
+ return NULL;
e = kzalloc(sizeof(struct qla_work_evt), GFP_ATOMIC);
- if (!e)
+ if (!e) {
+ QLA_VHA_MARK_NOT_BUSY(vha);
return NULL;
+ }
INIT_LIST_HEAD(&e->list);
e->type = type;
}
if (e->flags & QLA_EVT_FLAG_FREE)
kfree(e);
+
+ /* For each work completed decrement vha ref count */
+ QLA_VHA_MARK_NOT_BUSY(vha);
}
}
/*
* Driver version
*/
-#define QLA2XXX_VERSION "8.03.03-k0"
+#define QLA2XXX_VERSION "8.03.04-k0"
#define QLA_DRIVER_MAJOR_VER 8
#define QLA_DRIVER_MINOR_VER 3
-#define QLA_DRIVER_PATCH_VER 3
+#define QLA_DRIVER_PATCH_VER 4
#define QLA_DRIVER_BETA_VER 0
/* If the user actually wanted this page, we can skip the rest */
if (page == 0)
- return -EINVAL;
+ return 0;
for (i = 0; i < min((int)buf[3], buf_len - 4); i++)
if (buf[i + 4] == page)
goto found;
- if (i < buf[3] && i > buf_len)
+ if (i < buf[3] && i >= buf_len - 4)
/* ran off the end of the buffer, give us benefit of doubt */
goto found;
/* The device claims it doesn't support the requested page */
err_exit:
scsi_release_buffers(cmd);
- scsi_put_command(cmd);
cmd->request->special = NULL;
+ scsi_put_command(cmd);
return error;
}
EXPORT_SYMBOL(scsi_init_io);
SCSI_LOG_HLQUEUE(3, sd_printk(KERN_INFO, sdkp, "sd_release\n"));
- if (atomic_dec_return(&sdkp->openers) && sdev->removable) {
+ if (atomic_dec_return(&sdkp->openers) == 0 && sdev->removable) {
if (scsi_block_when_processing_errors(sdev))
scsi_set_medium_removal(sdev, SCSI_REMOVAL_ALLOW);
}
static void sd_print_sense_hdr(struct scsi_disk *sdkp,
struct scsi_sense_hdr *sshdr)
{
- sd_printk(KERN_INFO, sdkp, "");
+ sd_printk(KERN_INFO, sdkp, " ");
scsi_show_sense_hdr(sshdr);
- sd_printk(KERN_INFO, sdkp, "");
+ sd_printk(KERN_INFO, sdkp, " ");
scsi_show_extd_sense(sshdr->asc, sshdr->ascq);
}
static void sd_print_result(struct scsi_disk *sdkp, int result)
{
- sd_printk(KERN_INFO, sdkp, "");
+ sd_printk(KERN_INFO, sdkp, " ");
scsi_show_result(result);
}
static void sym_print_msg(struct sym_ccb *cp, char *label, u_char *msg)
{
- if (label)
- sym_print_addr(cp->cmd, "%s: ", label);
- else
- sym_print_addr(cp->cmd, "");
+ sym_print_addr(cp->cmd, "%s: ", label);
spi_print_msg(msg);
printf("\n");
switch (np->msgin [2]) {
case M_X_MODIFY_DP:
if (DEBUG_FLAGS & DEBUG_POINTER)
- sym_print_msg(cp, NULL, np->msgin);
+ sym_print_msg(cp, "extended msg ",
+ np->msgin);
tmp = (np->msgin[3]<<24) + (np->msgin[4]<<16) +
(np->msgin[5]<<8) + (np->msgin[6]);
sym_modify_dp(np, tp, cp, tmp);
*/
case M_IGN_RESIDUE:
if (DEBUG_FLAGS & DEBUG_POINTER)
- sym_print_msg(cp, NULL, np->msgin);
+ sym_print_msg(cp, "1 or 2 byte ", np->msgin);
if (cp->host_flags & HF_SENSE)
OUTL_DSP(np, SCRIPTA_BA(np, clrack));
else
spin_unlock_irqrestore(&uap->port.lock, flags);
}
-static void pl010_set_ldisc(struct uart_port *port)
+static void pl010_set_ldisc(struct uart_port *port, int new)
{
- int line = port->line;
-
- if (line >= port->state->port.tty->driver->num)
- return;
-
- if (port->state->port.tty->ldisc->ops->num == N_PPS) {
+ if (new == N_PPS) {
port->flags |= UPF_HARDPPS_CD;
pl010_enable_ms(port);
} else
if (!port) {
printk(KERN_WARNING
"IOC3 serial memory not available for port\n");
+ ret = -ENOMEM;
goto out4;
}
spin_lock_init(&port->ip_lock);
#include <linux/init.h>
#include <linux/console.h>
#include <linux/sysrq.h>
+#include <linux/slab.h>
#include <linux/serial_reg.h>
#include <linux/circ_buf.h>
#include <linux/delay.h>
}
phsu = hsu;
-
hsu_debugfs_init(hsu);
return;
static void serial_hsu_remove(struct pci_dev *pdev)
{
- struct hsu_port *hsu;
- int i;
+ void *priv = pci_get_drvdata(pdev);
+ struct uart_hsu_port *up;
- hsu = pci_get_drvdata(pdev);
- if (!hsu)
+ if (!priv)
return;
- for (i = 0; i < 3; i++)
- uart_remove_one_port(&serial_hsu_reg, &hsu->port[i].port);
+ /* For port 0/1/2, priv is the address of uart_hsu_port */
+ if (pdev->device != 0x081E) {
+ up = priv;
+ uart_remove_one_port(&serial_hsu_reg, &up->port);
+ }
pci_set_drvdata(pdev, NULL);
- free_irq(hsu->irq, hsu);
+ free_irq(pdev->irq, priv);
pci_disable_device(pdev);
}
psc_fifoc = of_iomap(np, 0);
if (!psc_fifoc) {
pr_err("%s: Can't map FIFOC\n", __func__);
+ of_node_put(np);
return -ENODEV;
}
#include <linux/module.h>
#include <linux/ioport.h>
+#include <linux/irq.h>
#include <linux/init.h>
#include <linux/console.h>
#include <linux/sysrq.h>
info->p_dev = link;
link->priv = info;
- link->resource[0]->flags |= IO_DATA_PATH_WIDTH_8;
- link->resource[0]->end = 8;
link->conf.Attributes = CONF_ENABLE_IRQ;
if (do_sound) {
link->conf.Attributes |= CONF_ENABLE_SPKR;
/*====================================================================*/
+static int pfc_config(struct pcmcia_device *p_dev)
+{
+ unsigned int port = 0;
+ struct serial_info *info = p_dev->priv;
+
+ if ((p_dev->resource[1]->end != 0) &&
+ (resource_size(p_dev->resource[1]) == 8)) {
+ port = p_dev->resource[1]->start;
+ info->slave = 1;
+ } else if ((info->manfid == MANFID_OSITECH) &&
+ (resource_size(p_dev->resource[0]) == 0x40)) {
+ port = p_dev->resource[0]->start + 0x28;
+ info->slave = 1;
+ }
+ if (info->slave)
+ return setup_serial(p_dev, info, port, p_dev->irq);
+
+ dev_warn(&p_dev->dev, "no usable port range found, giving up\n");
+ return -ENODEV;
+}
+
static int simple_config_check(struct pcmcia_device *p_dev,
cistpl_cftable_entry_t *cf,
cistpl_cftable_entry_t *dflt,
struct serial_info *info = link->priv;
int i = -ENODEV, try;
- /* If the card is already configured, look up the port and irq */
- if (link->function_config) {
- unsigned int port = 0;
- if ((link->resource[1]->end != 0) &&
- (resource_size(link->resource[1]) == 8)) {
- port = link->resource[1]->end;
- info->slave = 1;
- } else if ((info->manfid == MANFID_OSITECH) &&
- (resource_size(link->resource[0]) == 0x40)) {
- port = link->resource[0]->start + 0x28;
- info->slave = 1;
- }
- if (info->slave) {
- return setup_serial(link, info, port,
- link->irq);
- }
- }
+ link->resource[0]->flags |= IO_DATA_PATH_WIDTH_8;
+ link->resource[0]->end = 8;
/* First pass: look for a config entry that looks normal.
* Two tries: without IO aliases, then with aliases */
if (!pcmcia_loop_config(link, simple_config_check_notpicky, NULL))
goto found_port;
- printk(KERN_NOTICE
- "serial_cs: no usable port range found, giving up\n");
+ dev_warn(&link->dev, "no usable port range found, giving up\n");
return -1;
found_port:
int i, base2 = 0;
/* First, look for a generic full-sized window */
+ link->resource[0]->flags |= IO_DATA_PATH_WIDTH_8;
link->resource[0]->end = info->multi * 8;
if (pcmcia_loop_config(link, multi_config_check, &base2)) {
/* If that didn't work, look for two windows */
info->multi = 2;
if (pcmcia_loop_config(link, multi_config_check_notpicky,
&base2)) {
- printk(KERN_NOTICE "serial_cs: no usable port range"
+ dev_warn(&link->dev, "no usable port range "
"found, giving up\n");
return -ENODEV;
}
}
if (!link->irq)
- dev_warn(&link->dev,
- "serial_cs: no usable IRQ found, continuing...\n");
+ dev_warn(&link->dev, "no usable IRQ found, continuing...\n");
/*
* Apply any configuration quirks.
multifunction cards that ask for appropriate IO port ranges */
if ((info->multi == 0) &&
(link->has_func_id) &&
+ (link->socket->pcmcia_pfc == 0) &&
((link->func_id == CISTPL_FUNCID_MULTI) ||
(link->func_id == CISTPL_FUNCID_SERIAL)))
pcmcia_loop_config(link, serial_check_for_multi, info);
if (info->quirk && info->quirk->multi != -1)
info->multi = info->quirk->multi;
- if (info->multi > 1)
+ dev_info(&link->dev,
+ "trying to set up [0x%04x:0x%04x] (pfc: %d, multi: %d, quirk: %p)\n",
+ link->manf_id, link->card_id,
+ link->socket->pcmcia_pfc, info->multi, info->quirk);
+ if (link->socket->pcmcia_pfc)
+ i = pfc_config(link);
+ else if (info->multi > 1)
i = multi_config(link);
else
i = simple_config(link);
return 0;
failed:
- dev_warn(&link->dev, "serial_cs: failed to initialize\n");
+ dev_warn(&link->dev, "failed to initialize\n");
serial_remove(link);
return -ENODEV;
}
msg->state = NULL;
if (msg->complete)
msg->complete(msg->context);
- /* This message is completed, so let's turn off the clock! */
+ /* This message is completed, so let's turn off the clocks! */
clk_disable(pl022->clk);
+ amba_pclk_disable(pl022->adev);
}
/**
/* Setup the SPI using the per chip configuration */
pl022->cur_chip = spi_get_ctldata(pl022->cur_msg->spi);
/*
- * We enable the clock here, then the clock will be disabled when
+ * We enable the clocks here, then the clocks will be disabled when
* giveback() is called in each method (poll/interrupt/DMA)
*/
+ amba_pclk_enable(pl022->adev);
clk_enable(pl022->clk);
restore_state(pl022);
flush(pl022);
}
/* Disable SSP */
- clk_enable(pl022->clk);
writew((readw(SSP_CR1(pl022->virtbase)) & (~SSP_CR1_MASK_SSE)),
SSP_CR1(pl022->virtbase));
load_ssp_default_config(pl022);
- clk_disable(pl022->clk);
status = request_irq(adev->irq[0], pl022_interrupt_handler, 0, "pl022",
pl022);
goto err_spi_register;
}
dev_dbg(dev, "probe succeded\n");
+ /* Disable the silicon block pclk and clock it when needed */
+ amba_pclk_disable(adev);
return 0;
err_spi_register:
return status;
}
- clk_enable(pl022->clk);
+ amba_pclk_enable(adev);
load_ssp_default_config(pl022);
- clk_disable(pl022->clk);
+ amba_pclk_disable(adev);
dev_dbg(&adev->dev, "suspended\n");
return 0;
}
return amba_driver_register(&pl022_driver);
}
-module_init(pl022_init);
+subsys_initcall(pl022_init);
static void __exit pl022_exit(void)
{
wait_till_not_busy(dws);
}
-static void null_cs_control(u32 command)
-{
-}
-
static int null_writer(struct dw_spi *dws)
{
u8 n_bytes = dws->n_bytes;
struct spi_transfer,
transfer_list);
- if (!last_transfer->cs_change)
+ if (!last_transfer->cs_change && dws->cs_control)
dws->cs_control(MRST_SPI_DEASSERT);
msg->state = NULL;
static irqreturn_t dw_spi_irq(int irq, void *dev_id)
{
struct dw_spi *dws = dev_id;
+ u16 irq_status, irq_mask = 0x3f;
+
+ irq_status = dw_readw(dws, isr) & irq_mask;
+ if (!irq_status)
+ return IRQ_NONE;
if (!dws->cur_msg) {
spi_mask_intr(dws, SPI_INT_TXEI);
*/
if (dws->cs_control) {
if (dws->rx && dws->tx)
- chip->tmode = 0x00;
+ chip->tmode = SPI_TMOD_TR;
else if (dws->rx)
- chip->tmode = 0x02;
+ chip->tmode = SPI_TMOD_RO;
else
- chip->tmode = 0x01;
+ chip->tmode = SPI_TMOD_TO;
- cr0 &= ~(0x3 << SPI_MODE_OFFSET);
+ cr0 &= ~SPI_TMOD_MASK;
cr0 |= (chip->tmode << SPI_TMOD_OFFSET);
}
chip = kzalloc(sizeof(struct chip_data), GFP_KERNEL);
if (!chip)
return -ENOMEM;
-
- chip->cs_control = null_cs_control;
- chip->enable_dma = 0;
}
/*
dws->dma_inited = 0;
dws->dma_addr = (dma_addr_t)(dws->paddr + 0x60);
- ret = request_irq(dws->irq, dw_spi_irq, 0,
+ ret = request_irq(dws->irq, dw_spi_irq, IRQF_SHARED,
"dw_spi", dws);
if (ret < 0) {
dev_err(&master->dev, "can not get IRQ\n");
#include <linux/init.h>
#include <linux/cache.h>
#include <linux/mutex.h>
+#include <linux/of_device.h>
#include <linux/slab.h>
#include <linux/mod_devicetable.h>
#include <linux/spi/spi.h>
const struct spi_device *spi = to_spi_device(dev);
const struct spi_driver *sdrv = to_spi_driver(drv);
+ /* Attempt an OF style match */
+ if (of_driver_match_device(dev, drv))
+ return 1;
+
if (sdrv->id_table)
return !!spi_match_id(sdrv->id_table, spi);
EXPORT_SYMBOL_GPL(spi_register_master);
-static int __unregister(struct device *dev, void *master_dev)
+static int __unregister(struct device *dev, void *null)
{
- /* note: before about 2.6.14-rc1 this would corrupt memory: */
- if (dev != master_dev)
- spi_unregister_device(to_spi_device(dev));
+ spi_unregister_device(to_spi_device(dev));
return 0;
}
{
int dummy;
- dummy = device_for_each_child(master->dev.parent, &master->dev,
- __unregister);
+ dummy = device_for_each_child(&master->dev, NULL, __unregister);
device_unregister(&master->dev);
}
EXPORT_SYMBOL_GPL(spi_unregister_master);
spi_gpio->bitbang.master = spi_master_get(master);
spi_gpio->bitbang.chipselect = spi_gpio_chipselect;
- if ((master_flags & (SPI_MASTER_NO_RX | SPI_MASTER_NO_RX)) == 0) {
+ if ((master_flags & (SPI_MASTER_NO_TX | SPI_MASTER_NO_RX)) == 0) {
spi_gpio->bitbang.txrx_word[SPI_MODE_0] = spi_gpio_txrx_word_mode0;
spi_gpio->bitbang.txrx_word[SPI_MODE_1] = spi_gpio_txrx_word_mode1;
spi_gpio->bitbang.txrx_word[SPI_MODE_2] = spi_gpio_txrx_word_mode2;
xfer_ofs = mspi->xfer_in_progress->len - mspi->count;
- out_be32(&rx_bd->cbd_bufaddr, mspi->rx_dma + xfer_ofs);
+ if (mspi->rx_dma == mspi->dma_dummy_rx)
+ out_be32(&rx_bd->cbd_bufaddr, mspi->rx_dma);
+ else
+ out_be32(&rx_bd->cbd_bufaddr, mspi->rx_dma + xfer_ofs);
out_be16(&rx_bd->cbd_datlen, 0);
out_be16(&rx_bd->cbd_sc, BD_SC_EMPTY | BD_SC_INTRPT | BD_SC_WRAP);
- out_be32(&tx_bd->cbd_bufaddr, mspi->tx_dma + xfer_ofs);
+ if (mspi->tx_dma == mspi->dma_dummy_tx)
+ out_be32(&tx_bd->cbd_bufaddr, mspi->tx_dma);
+ else
+ out_be32(&tx_bd->cbd_bufaddr, mspi->tx_dma + xfer_ofs);
out_be16(&tx_bd->cbd_datlen, xfer_len);
out_be16(&tx_bd->cbd_sc, BD_SC_READY | BD_SC_INTRPT | BD_SC_WRAP |
BD_SC_LAST);
val = readl(regs + S3C64XX_SPI_STATUS);
} while (TX_FIFO_LVL(val, sci) && loops--);
+ if (loops == 0)
+ dev_warn(&sdd->pdev->dev, "Timed out flushing TX FIFO\n");
+
/* Flush RxFIFO*/
loops = msecs_to_loops(1);
do {
break;
} while (loops--);
+ if (loops == 0)
+ dev_warn(&sdd->pdev->dev, "Timed out flushing RX FIFO\n");
+
val = readl(regs + S3C64XX_SPI_CH_CFG);
val &= ~S3C64XX_SPI_CH_SW_RST;
writel(val, regs + S3C64XX_SPI_CH_CFG);
/* millisecs to xfer 'len' bytes @ 'cur_speed' */
ms = xfer->len * 8 * 1000 / sdd->cur_speed;
- ms += 5; /* some tolerance */
+ ms += 10; /* some tolerance */
if (dma_mode) {
val = msecs_to_jiffies(ms) + 10;
val = wait_for_completion_timeout(&sdd->xfer_completion, val);
} else {
+ u32 status;
val = msecs_to_loops(ms);
do {
- val = readl(regs + S3C64XX_SPI_STATUS);
- } while (RX_FIFO_LVL(val, sci) < xfer->len && --val);
+ status = readl(regs + S3C64XX_SPI_STATUS);
+ } while (RX_FIFO_LVL(status, sci) < xfer->len && --val);
}
if (!val)
writel(val, regs + S3C64XX_SPI_CLK_CFG);
}
-void s3c64xx_spi_dma_rxcb(struct s3c2410_dma_chan *chan, void *buf_id,
- int size, enum s3c2410_dma_buffresult res)
+static void s3c64xx_spi_dma_rxcb(struct s3c2410_dma_chan *chan, void *buf_id,
+ int size, enum s3c2410_dma_buffresult res)
{
struct s3c64xx_spi_driver_data *sdd = buf_id;
unsigned long flags;
spin_unlock_irqrestore(&sdd->lock, flags);
}
-void s3c64xx_spi_dma_txcb(struct s3c2410_dma_chan *chan, void *buf_id,
- int size, enum s3c2410_dma_buffresult res)
+static void s3c64xx_spi_dma_txcb(struct s3c2410_dma_chan *chan, void *buf_id,
+ int size, enum s3c2410_dma_buffresult res)
{
struct s3c64xx_spi_driver_data *sdd = buf_id;
unsigned long flags;
list_for_each_entry(xfer, &msg->transfers, transfer_list) {
if (xfer->tx_buf != NULL) {
- xfer->tx_dma = dma_map_single(dev, xfer->tx_buf,
- xfer->len, DMA_TO_DEVICE);
+ xfer->tx_dma = dma_map_single(dev,
+ (void *)xfer->tx_buf, xfer->len,
+ DMA_TO_DEVICE);
if (dma_mapping_error(dev, xfer->tx_dma)) {
dev_err(dev, "dma_map_single Tx failed\n");
xfer->tx_dma = XFER_DMAADDR_INVALID;
return -ENODEV;
}
+ sci = pdev->dev.platform_data;
+ if (!sci->src_clk_name) {
+ dev_err(&pdev->dev,
+ "Board init must call s3c64xx_spi_set_info()\n");
+ return -EINVAL;
+ }
+
/* Check for availability of necessary resource */
dmatx_res = platform_get_resource(pdev, IORESOURCE_DMA, 0);
return -ENOMEM;
}
- sci = pdev->dev.platform_data;
-
platform_set_drvdata(pdev, master);
sdd = spi_master_get_devdata(master);
{
return platform_driver_probe(&s3c64xx_spi_driver, s3c64xx_spi_probe);
}
-module_init(s3c64xx_spi_init);
+subsys_initcall(s3c64xx_spi_init);
static void __exit s3c64xx_spi_exit(void)
{
#include "hash.h"
#include <linux/if_arp.h>
-#include <linux/netfilter_bridge.h>
#define MIN(x, y) ((x) < (y) ? (x) : (y))
return NOTIFY_DONE;
}
-static int batman_skb_recv_finish(struct sk_buff *skb)
-{
- return NF_ACCEPT;
-}
-
/* receive a packet with the batman ethertype coming on a hard
* interface */
int batman_skb_recv(struct sk_buff *skb, struct net_device *dev,
if (atomic_read(&module_state) != MODULE_ACTIVE)
goto err_free;
- /* if netfilter/ebtables wants to block incoming batman
- * packets then give them a chance to do so here */
- ret = NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_IN, skb, dev, NULL,
- batman_skb_recv_finish);
- if (ret != 1)
- goto err_out;
-
/* packet should hold at least type and version */
if (unlikely(skb_headlen(skb) < 2))
goto err_free;
#include "vis.h"
#include "aggregation.h"
-#include <linux/netfilter_bridge.h>
static void send_outstanding_bcast_packet(struct work_struct *work);
/* dev_queue_xmit() returns a negative result on error. However on
* congestion and traffic shaping, it drops and returns NET_XMIT_DROP
- * (which is > 0). This will not be treated as an error.
- * Also, if netfilter/ebtables wants to block outgoing batman
- * packets then giving them a chance to do so here */
+ * (which is > 0). This will not be treated as an error. */
- return NF_HOOK(PF_BRIDGE, NF_BR_LOCAL_OUT, skb, NULL, skb->dev,
- dev_queue_xmit);
+ return dev_queue_xmit(skb);
send_skb_err:
kfree_skb(skb);
return NET_XMIT_DROP;
extern long st_register(struct st_proto_s *);
extern long st_unregister(enum proto_type);
-extern struct platform_device *st_get_plat_device(void);
#endif /* ST_H */
#include "st_ll.h"
#include "st.h"
-#define VERBOSE
/* strings to be used for rfkill entries and by
* ST Core to be used for sysfs debug entry
*/
long err = 0;
unsigned long flags = 0;
- st_kim_ref(&st_gdata);
+ st_kim_ref(&st_gdata, 0);
pr_info("%s(%d) ", __func__, new_proto->type);
if (st_gdata == NULL || new_proto == NULL || new_proto->recv == NULL
|| new_proto->reg_complete_cb == NULL) {
pr_debug("%s: %d ", __func__, type);
- st_kim_ref(&st_gdata);
+ st_kim_ref(&st_gdata, 0);
if (type < ST_BT || type >= ST_MAX) {
pr_err(" protocol %d not supported", type);
return -EPROTONOSUPPORT;
#endif
long len;
- st_kim_ref(&st_gdata);
+ st_kim_ref(&st_gdata, 0);
if (unlikely(skb == NULL || st_gdata == NULL
|| st_gdata->tty == NULL)) {
pr_err("data/tty unavailable to perform write");
struct st_data_s *st_gdata;
pr_info("%s ", __func__);
- st_kim_ref(&st_gdata);
+ st_kim_ref(&st_gdata, 0);
st_gdata->tty = tty;
tty->disc_data = st_gdata;
void st_core_exit(struct st_data_s *);
/* ask for reference from KIM */
-void st_kim_ref(struct st_data_s **);
+void st_kim_ref(struct st_data_s **, int);
#define GPS_STUB_TEST
#ifdef GPS_STUB_TEST
PROTO_ENTRY(ST_GPS, "GPS"),
};
+#define MAX_ST_DEVICES 3 /* Imagine 1 on each UART for now */
+struct platform_device *st_kim_devices[MAX_ST_DEVICES];
/**********************************************************************/
/* internal functions */
/**
+ * st_get_plat_device -
+ * function which returns the reference to the platform device
+ * requested by id. As of now only 1 such device exists (id=0)
+ * the context requesting for reference can get the id to be
+ * requested by a. The protocol driver which is registering or
+ * b. the tty device which is opened.
+ */
+static struct platform_device *st_get_plat_device(int id)
+{
+ return st_kim_devices[id];
+}
+
+/**
* validate_firmware_response -
* function to return whether the firmware response was proper
* in case of error don't complete so that waiting for proper
struct kim_data_s *kim_gdata;
pr_info(" %s ", __func__);
- kim_pdev = st_get_plat_device();
+ kim_pdev = st_get_plat_device(0);
kim_gdata = dev_get_drvdata(&kim_pdev->dev);
if (kim_gdata->gpios[type] == -1) {
* This would enable multiple such platform devices to exist
* on a given platform
*/
-void st_kim_ref(struct st_data_s **core_data)
+void st_kim_ref(struct st_data_s **core_data, int id)
{
struct platform_device *pdev;
struct kim_data_s *kim_gdata;
/* get kim_gdata reference from platform device */
- pdev = st_get_plat_device();
+ pdev = st_get_plat_device(id);
kim_gdata = dev_get_drvdata(&pdev->dev);
*core_data = kim_gdata->core_data;
}
long *gpios = pdev->dev.platform_data;
struct kim_data_s *kim_gdata;
+ st_kim_devices[pdev->id] = pdev;
kim_gdata = kzalloc(sizeof(struct kim_data_s), GFP_ATOMIC);
if (!kim_gdata) {
pr_err("no mem to allocate");
config VIDEO_TM6000
tristate "TV Master TM5600/6000/6010 driver"
- depends on VIDEO_DEV && I2C && INPUT && USB && EXPERIMENTAL
+ depends on VIDEO_DEV && I2C && INPUT && IR_CORE && USB && EXPERIMENTAL
select VIDEO_TUNER
select MEDIA_TUNER_XC2028
select MEDIA_TUNER_XC5000
}
struct tm6000_ir_poll_result {
- u8 rc_data[4];
+ u16 rc_data;
};
struct tm6000_IR {
int polling;
struct delayed_work work;
u8 wait:1;
+ u8 key:1;
struct urb *int_urb;
u8 *urb_data;
- u8 key:1;
int (*get_key) (struct tm6000_IR *, struct tm6000_ir_poll_result *);
if (urb->status != 0)
printk(KERN_INFO "not ready\n");
- else if (urb->actual_length > 0)
+ else if (urb->actual_length > 0) {
memcpy(ir->urb_data, urb->transfer_buffer, urb->actual_length);
- dprintk("data %02x %02x %02x %02x\n", ir->urb_data[0],
- ir->urb_data[1], ir->urb_data[2], ir->urb_data[3]);
+ dprintk("data %02x %02x %02x %02x\n", ir->urb_data[0],
+ ir->urb_data[1], ir->urb_data[2], ir->urb_data[3]);
- ir->key = 1;
+ ir->key = 1;
+ }
rc = usb_submit_urb(urb, GFP_ATOMIC);
}
int rc;
u8 buf[2];
- if (ir->wait && !&dev->int_in) {
- poll_result->rc_data[0] = 0xff;
+ if (ir->wait && !&dev->int_in)
return 0;
- }
if (&dev->int_in) {
- poll_result->rc_data[0] = ir->urb_data[0];
- poll_result->rc_data[1] = ir->urb_data[1];
+ if (ir->ir.ir_type == IR_TYPE_RC5)
+ poll_result->rc_data = ir->urb_data[0];
+ else
+ poll_result->rc_data = ir->urb_data[0] | ir->urb_data[1] << 8;
} else {
tm6000_set_reg(dev, REQ_04_EN_DISABLE_MCU_INT, 2, 0);
msleep(10);
tm6000_set_reg(dev, REQ_04_EN_DISABLE_MCU_INT, 2, 1);
msleep(10);
- rc = tm6000_read_write_usb(dev, USB_DIR_IN | USB_TYPE_VENDOR |
- USB_RECIP_DEVICE, REQ_02_GET_IR_CODE, 0, 0, buf, 1);
+ if (ir->ir.ir_type == IR_TYPE_RC5) {
+ rc = tm6000_read_write_usb(dev, USB_DIR_IN |
+ USB_TYPE_VENDOR | USB_RECIP_DEVICE,
+ REQ_02_GET_IR_CODE, 0, 0, buf, 1);
- msleep(10);
+ msleep(10);
- dprintk("read data=%02x\n", buf[0]);
- if (rc < 0)
- return rc;
+ dprintk("read data=%02x\n", buf[0]);
+ if (rc < 0)
+ return rc;
- poll_result->rc_data[0] = buf[0];
+ poll_result->rc_data = buf[0];
+ } else {
+ rc = tm6000_read_write_usb(dev, USB_DIR_IN |
+ USB_TYPE_VENDOR | USB_RECIP_DEVICE,
+ REQ_02_GET_IR_CODE, 0, 0, buf, 2);
+
+ msleep(10);
+
+ dprintk("read data=%04x\n", buf[0] | buf[1] << 8);
+ if (rc < 0)
+ return rc;
+
+ poll_result->rc_data = buf[0] | buf[1] << 8;
+ }
+ if ((poll_result->rc_data & 0x00ff) != 0xff)
+ ir->key = 1;
}
return 0;
}
return;
}
- dprintk("ir->get_key result data=%02x %02x\n",
- poll_result.rc_data[0], poll_result.rc_data[1]);
+ dprintk("ir->get_key result data=%04x\n", poll_result.rc_data);
- if (poll_result.rc_data[0] != 0xff && ir->key == 1) {
+ if (ir->key) {
ir_input_keydown(ir->input->input_dev, &ir->ir,
- poll_result.rc_data[0] | poll_result.rc_data[1] << 8);
+ (u32)poll_result.rc_data);
ir_input_nokey(ir->input->input_dev, &ir->ir);
ir->key = 0;
DBG_PRT(MSG_LEVEL_DEBUG, KERN_INFO "wpa_ie_len = %d\n", param->u.wpa_associate.wpa_ie_len);
- if (param->u.wpa_associate.wpa_ie &&
- copy_from_user(&abyWPAIE[0], param->u.wpa_associate.wpa_ie, param->u.wpa_associate.wpa_ie_len))
- return -EINVAL;
+ if (param->u.wpa_associate.wpa_ie_len) {
+ if (!param->u.wpa_associate.wpa_ie)
+ return -EINVAL;
+ if (param->u.wpa_associate.wpa_ie_len > sizeof(abyWPAIE))
+ return -EINVAL;
+ if (copy_from_user(&abyWPAIE[0], param->u.wpa_associate.wpa_ie, param->u.wpa_associate.wpa_ie_len))
+ return -EFAULT;
+ }
if (param->u.wpa_associate.mode == 1)
pMgmt->eConfigMode = WMAC_CONFIG_IBSS_STA;
If you are unsure about this, say N here.
config USB_SUSPEND
- bool "USB runtime power management (suspend/resume and wakeup)"
+ bool "USB runtime power management (autosuspend) and wakeup"
depends on USB && PM_RUNTIME
help
If you say Y here, you can use driver calls or the sysfs
- "power/level" file to suspend or resume individual USB
- peripherals and to enable or disable autosuspend (see
+ "power/control" file to enable or disable autosuspend for
+ individual USB peripherals (see
Documentation/usb/power-management.txt for more details).
Also, USB "remote wakeup" signaling is supported, whereby some
int usb_register_dev(struct usb_interface *intf,
struct usb_class_driver *class_driver)
{
- int retval = -EINVAL;
+ int retval;
int minor_base = class_driver->minor_base;
- int minor = 0;
+ int minor;
char name[20];
char *temp;
*/
minor_base = 0;
#endif
- intf->minor = -1;
-
- dbg ("looking for a minor, starting at %d", minor_base);
if (class_driver->fops == NULL)
- goto exit;
+ return -EINVAL;
+ if (intf->minor >= 0)
+ return -EADDRINUSE;
+
+ retval = init_usb_class();
+ if (retval)
+ return retval;
+
+ dev_dbg(&intf->dev, "looking for a minor, starting at %d", minor_base);
down_write(&minor_rwsem);
for (minor = minor_base; minor < MAX_USB_MINORS; ++minor) {
continue;
usb_minors[minor] = class_driver->fops;
-
- retval = 0;
+ intf->minor = minor;
break;
}
up_write(&minor_rwsem);
-
- if (retval)
- goto exit;
-
- retval = init_usb_class();
- if (retval)
- goto exit;
-
- intf->minor = minor;
+ if (intf->minor < 0)
+ return -EXFULL;
/* create a usb class device for this usb interface */
snprintf(name, sizeof(name), class_driver->name, minor - minor_base);
"%s", temp);
if (IS_ERR(intf->usb_dev)) {
down_write(&minor_rwsem);
- usb_minors[intf->minor] = NULL;
+ usb_minors[minor] = NULL;
+ intf->minor = -1;
up_write(&minor_rwsem);
retval = PTR_ERR(intf->usb_dev);
}
-exit:
return retval;
}
EXPORT_SYMBOL_GPL(usb_register_dev);
intf->dev.groups = usb_interface_groups;
intf->dev.dma_mask = dev->dev.dma_mask;
INIT_WORK(&intf->reset_ws, __usb_queue_reset_device);
+ intf->minor = -1;
device_initialize(&intf->dev);
dev_set_name(&intf->dev, "%d-%s:%d.%d",
dev->bus->busnum, dev->devpath,
ehci->broken_periodic = 1;
ehci_info(ehci, "using broken periodic workaround\n");
}
+ if (pdev->device == 0x0806 || pdev->device == 0x0811
+ || pdev->device == 0x0829) {
+ ehci_info(ehci, "disable lpm for langwell/penwell\n");
+ ehci->has_lpm = 0;
+ }
break;
case PCI_VENDOR_ID_TDI:
if (pdev->device == PCI_DEVICE_ID_TDI_EHCI) {
index, transmit ? 'T' : 'R', cppi_ch);
cppi_ch->hw_ep = ep;
cppi_ch->channel.status = MUSB_DMA_STATUS_FREE;
+ cppi_ch->channel.max_len = 0x7fffffff;
DBG(4, "Allocate CPPI%d %cX\n", index, transmit ? 'T' : 'R');
return &cppi_ch->channel;
static int musb_test_mode_open(struct inode *inode, struct file *file)
{
- file->private_data = inode->i_private;
-
return single_open(file, musb_test_mode_show, inode->i_private);
}
static ssize_t musb_test_mode_write(struct file *file,
const char __user *ubuf, size_t count, loff_t *ppos)
{
- struct musb *musb = file->private_data;
+ struct seq_file *s = file->private_data;
+ struct musb *musb = s->private;
u8 test = 0;
char buf[18];
#ifndef CONFIG_MUSB_PIO_ONLY
if (is_dma_capable() && musb_ep->dma) {
struct dma_controller *c = musb->dma_controller;
+ size_t request_size;
+
+ /* setup DMA, then program endpoint CSR */
+ request_size = min_t(size_t, request->length - request->actual,
+ musb_ep->dma->max_len);
use_dma = (request->dma != DMA_ADDR_INVALID);
#ifdef CONFIG_USB_INVENTRA_DMA
{
- size_t request_size;
-
- /* setup DMA, then program endpoint CSR */
- request_size = min_t(size_t, request->length,
- musb_ep->dma->max_len);
if (request_size < musb_ep->packet_sz)
musb_ep->dma->desired_mode = 0;
else
use_dma = use_dma && c->channel_program(
musb_ep->dma, musb_ep->packet_sz,
0,
- request->dma,
- request->length);
+ request->dma + request->actual,
+ request_size);
if (!use_dma) {
c->channel_release(musb_ep->dma);
musb_ep->dma = NULL;
use_dma = use_dma && c->channel_program(
musb_ep->dma, musb_ep->packet_sz,
request->zero,
- request->dma,
- request->length);
+ request->dma + request->actual,
+ request_size);
#endif
}
#endif
request->zero = 0;
}
- /* ... or if not, then complete it. */
- musb_g_giveback(musb_ep, request, 0);
-
- /*
- * Kickstart next transfer if appropriate;
- * the packet that just completed might not
- * be transmitted for hours or days.
- * REVISIT for double buffering...
- * FIXME revisit for stalls too...
- */
- musb_ep_select(mbase, epnum);
- csr = musb_readw(epio, MUSB_TXCSR);
- if (csr & MUSB_TXCSR_FIFONOTEMPTY)
- return;
-
- request = musb_ep->desc ? next_request(musb_ep) : NULL;
- if (!request) {
- DBG(4, "%s idle now\n",
- musb_ep->end_point.name);
- return;
+ if (request->actual == request->length) {
+ musb_g_giveback(musb_ep, request, 0);
+ request = musb_ep->desc ? next_request(musb_ep) : NULL;
+ if (!request) {
+ DBG(4, "%s idle now\n",
+ musb_ep->end_point.name);
+ return;
+ }
}
}
{
const u8 epnum = req->epnum;
struct usb_request *request = &req->request;
- struct musb_ep *musb_ep = &musb->endpoints[epnum].ep_out;
+ struct musb_ep *musb_ep;
void __iomem *epio = musb->endpoints[epnum].regs;
unsigned fifo_count = 0;
- u16 len = musb_ep->packet_sz;
+ u16 len;
u16 csr = musb_readw(epio, MUSB_RXCSR);
+ struct musb_hw_ep *hw_ep = &musb->endpoints[epnum];
+
+ if (hw_ep->is_shared_fifo)
+ musb_ep = &hw_ep->ep_in;
+ else
+ musb_ep = &hw_ep->ep_out;
+
+ len = musb_ep->packet_sz;
/* We shouldn't get here while DMA is active, but we do... */
if (dma_channel_status(musb_ep->dma) == MUSB_DMA_STATUS_BUSY) {
*/
csr |= MUSB_RXCSR_DMAENAB;
-#ifdef USE_MODE1
csr |= MUSB_RXCSR_AUTOCLEAR;
+#ifdef USE_MODE1
/* csr |= MUSB_RXCSR_DMAMODE; */
/* this special sequence (enabling and then
if (request->actual < request->length) {
int transfer_size = 0;
#ifdef USE_MODE1
- transfer_size = min(request->length,
+ transfer_size = min(request->length - request->actual,
channel->max_len);
#else
- transfer_size = len;
+ transfer_size = min(request->length - request->actual,
+ (unsigned)len);
#endif
if (transfer_size <= musb_ep->packet_sz)
musb_ep->dma->desired_mode = 0;
u16 csr;
struct usb_request *request;
void __iomem *mbase = musb->mregs;
- struct musb_ep *musb_ep = &musb->endpoints[epnum].ep_out;
+ struct musb_ep *musb_ep;
void __iomem *epio = musb->endpoints[epnum].regs;
struct dma_channel *dma;
+ struct musb_hw_ep *hw_ep = &musb->endpoints[epnum];
+
+ if (hw_ep->is_shared_fifo)
+ musb_ep = &hw_ep->ep_in;
+ else
+ musb_ep = &hw_ep->ep_out;
musb_ep_select(mbase, epnum);
/*
* Context: controller locked, IRQs blocked.
*/
-static void musb_ep_restart(struct musb *musb, struct musb_request *req)
+void musb_ep_restart(struct musb *musb, struct musb_request *req)
{
DBG(3, "<== %s request %p len %u on hw_ep%d\n",
req->tx ? "TX/IN" : "RX/OUT",
extern void musb_g_giveback(struct musb_ep *, struct usb_request *, int);
+extern void musb_ep_restart(struct musb *, struct musb_request *);
+
#endif /* __MUSB_GADGET_H */
ctrlrequest->wIndex & 0x0f;
struct musb_ep *musb_ep;
struct musb_hw_ep *ep;
+ struct musb_request *request;
void __iomem *regs;
int is_in;
u16 csr;
musb_writew(regs, MUSB_RXCSR, csr);
}
+ /* Maybe start the first request in the queue */
+ request = to_musb_request(
+ next_request(musb_ep));
+ if (!musb_ep->busy && request) {
+ DBG(3, "restarting the request\n");
+ musb_ep_restart(musb, request);
+ }
+
/* select ep0 again */
musb_ep_select(mbase, 0);
} break;
qh->segsize = length;
+ /*
+ * Ensure the data reaches to main memory before starting
+ * DMA transfer
+ */
+ wmb();
+
if (!dma->channel_program(channel, pkt_size, mode,
urb->transfer_dma + offset, length)) {
dma->channel_release(channel);
}
}
-static void twl4030_phy_power(struct twl4030_usb *twl, int on)
+static void __twl4030_phy_power(struct twl4030_usb *twl, int on)
{
- u8 pwr;
+ u8 pwr = twl4030_usb_read(twl, PHY_PWR_CTRL);
+
+ if (on)
+ pwr &= ~PHY_PWR_PHYPWD;
+ else
+ pwr |= PHY_PWR_PHYPWD;
- pwr = twl4030_usb_read(twl, PHY_PWR_CTRL);
+ WARN_ON(twl4030_usb_write_verify(twl, PHY_PWR_CTRL, pwr) < 0);
+}
+
+static void twl4030_phy_power(struct twl4030_usb *twl, int on)
+{
if (on) {
regulator_enable(twl->usb3v1);
regulator_enable(twl->usb1v8);
twl_i2c_write_u8(TWL4030_MODULE_PM_RECEIVER, 0,
VUSB_DEDICATED2);
regulator_enable(twl->usb1v5);
- pwr &= ~PHY_PWR_PHYPWD;
- WARN_ON(twl4030_usb_write_verify(twl, PHY_PWR_CTRL, pwr) < 0);
+ __twl4030_phy_power(twl, 1);
twl4030_usb_write(twl, PHY_CLK_CTRL,
twl4030_usb_read(twl, PHY_CLK_CTRL) |
(PHY_CLK_CTRL_CLOCKGATING_EN |
PHY_CLK_CTRL_CLK32K_EN));
- } else {
- pwr |= PHY_PWR_PHYPWD;
- WARN_ON(twl4030_usb_write_verify(twl, PHY_PWR_CTRL, pwr) < 0);
+ } else {
+ __twl4030_phy_power(twl, 0);
regulator_disable(twl->usb1v5);
regulator_disable(twl->usb1v8);
regulator_disable(twl->usb3v1);
twl4030_phy_power(twl, 0);
twl->asleep = 1;
+ dev_dbg(twl->dev, "%s\n", __func__);
}
-static void twl4030_phy_resume(struct twl4030_usb *twl)
+static void __twl4030_phy_resume(struct twl4030_usb *twl)
{
- if (!twl->asleep)
- return;
-
twl4030_phy_power(twl, 1);
twl4030_i2c_access(twl, 1);
twl4030_usb_set_mode(twl, twl->usb_mode);
if (twl->usb_mode == T2_USB_MODE_ULPI)
twl4030_i2c_access(twl, 0);
+}
+
+static void twl4030_phy_resume(struct twl4030_usb *twl)
+{
+ if (!twl->asleep)
+ return;
+ __twl4030_phy_resume(twl);
twl->asleep = 0;
+ dev_dbg(twl->dev, "%s\n", __func__);
}
static int twl4030_usb_ldo_init(struct twl4030_usb *twl)
twl_i2c_write_u8(TWL4030_MODULE_PM_MASTER, 0xC0, PROTECT_KEY);
twl_i2c_write_u8(TWL4030_MODULE_PM_MASTER, 0x0C, PROTECT_KEY);
- /* put VUSB3V1 LDO in active state */
- twl_i2c_write_u8(TWL4030_MODULE_PM_RECEIVER, 0, VUSB_DEDICATED2);
+ /* Keep VUSB3V1 LDO in sleep state until VBUS/ID change detected*/
+ /*twl_i2c_write_u8(TWL4030_MODULE_PM_RECEIVER, 0, VUSB_DEDICATED2);*/
/* input to VUSB3V1 LDO is from VBAT, not VBUS */
twl_i2c_write_u8(TWL4030_MODULE_PM_RECEIVER, 0x14, VUSB_DEDICATED1);
return IRQ_HANDLED;
}
+static void twl4030_usb_phy_init(struct twl4030_usb *twl)
+{
+ int status;
+
+ status = twl4030_usb_linkstat(twl);
+ if (status >= 0) {
+ if (status == USB_EVENT_NONE) {
+ __twl4030_phy_power(twl, 0);
+ twl->asleep = 1;
+ } else {
+ __twl4030_phy_resume(twl);
+ twl->asleep = 0;
+ }
+
+ blocking_notifier_call_chain(&twl->otg.notifier, status,
+ twl->otg.gadget);
+ }
+ sysfs_notify(&twl->dev->kobj, NULL, "vbus");
+}
+
static int twl4030_set_suspend(struct otg_transceiver *x, int suspend)
{
struct twl4030_usb *twl = xceiv_to_twl(x);
struct twl4030_usb_data *pdata = pdev->dev.platform_data;
struct twl4030_usb *twl;
int status, err;
- u8 pwr;
if (!pdata) {
dev_dbg(&pdev->dev, "platform_data not available\n");
twl->otg.set_peripheral = twl4030_set_peripheral;
twl->otg.set_suspend = twl4030_set_suspend;
twl->usb_mode = pdata->usb_mode;
-
- pwr = twl4030_usb_read(twl, PHY_PWR_CTRL);
-
- twl->asleep = (pwr & PHY_PWR_PHYPWD);
+ twl->asleep = 1;
/* init spinlock for workqueue */
spin_lock_init(&twl->lock);
return status;
}
- /* The IRQ handler just handles changes from the previous states
- * of the ID and VBUS pins ... in probe() we must initialize that
- * previous state. The easy way: fake an IRQ.
- *
- * REVISIT: a real IRQ might have happened already, if PREEMPT is
- * enabled. Else the IRQ may not yet be configured or enabled,
- * because of scheduling delays.
+ /* Power down phy or make it work according to
+ * current link state.
*/
- twl4030_usb_irq(twl->irq, twl);
+ twl4030_usb_phy_init(twl);
dev_info(&pdev->dev, "Initialized TWL4030 USB module\n");
return 0;
case TIOCGICOUNT:
cnow = mos7720_port->icount;
+
+ memset(&icount, 0, sizeof(struct serial_icounter_struct));
+
icount.cts = cnow.cts;
icount.dsr = cnow.dsr;
icount.rng = cnow.rng;
case TIOCGICOUNT:
cnow = mos7840_port->icount;
smp_rmb();
+
+ memset(&icount, 0, sizeof(struct serial_icounter_struct));
+
icount.cts = cnow.cts;
icount.dsr = cnow.dsr;
icount.rng = cnow.rng;
size_t len, total_len = 0;
int err, wmem;
size_t hdr_size;
- struct socket *sock = rcu_dereference(vq->private_data);
+ struct socket *sock;
+
+ sock = rcu_dereference_check(vq->private_data,
+ lockdep_is_held(&vq->mutex));
if (!sock)
return;
int r, nlogs = 0;
while (datalen > 0) {
- if (unlikely(headcount >= VHOST_NET_MAX_SG)) {
+ if (unlikely(seg >= VHOST_NET_MAX_SG)) {
r = -ENOBUFS;
goto err;
}
static void vhost_net_enable_vq(struct vhost_net *n,
struct vhost_virtqueue *vq)
{
- struct socket *sock = vq->private_data;
+ struct socket *sock;
+
+ sock = rcu_dereference_protected(vq->private_data,
+ lockdep_is_held(&vq->mutex));
if (!sock)
return;
if (vq == n->vqs + VHOST_NET_VQ_TX) {
struct socket *sock;
mutex_lock(&vq->mutex);
- sock = vq->private_data;
+ sock = rcu_dereference_protected(vq->private_data,
+ lockdep_is_held(&vq->mutex));
vhost_net_disable_vq(n, vq);
rcu_assign_pointer(vq->private_data, NULL);
mutex_unlock(&vq->mutex);
}
/* start polling new socket */
- oldsock = vq->private_data;
+ oldsock = rcu_dereference_protected(vq->private_data,
+ lockdep_is_held(&vq->mutex));
if (sock != oldsock) {
vhost_net_disable_vq(n, vq);
rcu_assign_pointer(vq->private_data, sock);
return 0;
}
+static void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
+{
+ INIT_LIST_HEAD(&work->node);
+ work->fn = fn;
+ init_waitqueue_head(&work->done);
+ work->flushing = 0;
+ work->queue_seq = work->done_seq = 0;
+}
+
/* Init poll structure */
void vhost_poll_init(struct vhost_poll *poll, vhost_work_fn_t fn,
unsigned long mask, struct vhost_dev *dev)
{
- struct vhost_work *work = &poll->work;
-
init_waitqueue_func_entry(&poll->wait, vhost_poll_wakeup);
init_poll_funcptr(&poll->table, vhost_poll_func);
poll->mask = mask;
poll->dev = dev;
- INIT_LIST_HEAD(&work->node);
- work->fn = fn;
- init_waitqueue_head(&work->done);
- work->flushing = 0;
- work->queue_seq = work->done_seq = 0;
+ vhost_work_init(&poll->work, fn);
}
/* Start polling a file. We add ourselves to file's wait queue. The caller must
remove_wait_queue(poll->wqh, &poll->wait);
}
-/* Flush any work that has been scheduled. When calling this, don't hold any
- * locks that are also used by the callback. */
-void vhost_poll_flush(struct vhost_poll *poll)
+static void vhost_work_flush(struct vhost_dev *dev, struct vhost_work *work)
{
- struct vhost_work *work = &poll->work;
unsigned seq;
int left;
int flushing;
- spin_lock_irq(&poll->dev->work_lock);
+ spin_lock_irq(&dev->work_lock);
seq = work->queue_seq;
work->flushing++;
- spin_unlock_irq(&poll->dev->work_lock);
+ spin_unlock_irq(&dev->work_lock);
wait_event(work->done, ({
- spin_lock_irq(&poll->dev->work_lock);
+ spin_lock_irq(&dev->work_lock);
left = seq - work->done_seq <= 0;
- spin_unlock_irq(&poll->dev->work_lock);
+ spin_unlock_irq(&dev->work_lock);
left;
}));
- spin_lock_irq(&poll->dev->work_lock);
+ spin_lock_irq(&dev->work_lock);
flushing = --work->flushing;
- spin_unlock_irq(&poll->dev->work_lock);
+ spin_unlock_irq(&dev->work_lock);
BUG_ON(flushing < 0);
}
-void vhost_poll_queue(struct vhost_poll *poll)
+/* Flush any work that has been scheduled. When calling this, don't hold any
+ * locks that are also used by the callback. */
+void vhost_poll_flush(struct vhost_poll *poll)
+{
+ vhost_work_flush(poll->dev, &poll->work);
+}
+
+static inline void vhost_work_queue(struct vhost_dev *dev,
+ struct vhost_work *work)
{
- struct vhost_dev *dev = poll->dev;
- struct vhost_work *work = &poll->work;
unsigned long flags;
spin_lock_irqsave(&dev->work_lock, flags);
spin_unlock_irqrestore(&dev->work_lock, flags);
}
+void vhost_poll_queue(struct vhost_poll *poll)
+{
+ vhost_work_queue(poll->dev, &poll->work);
+}
+
static void vhost_vq_reset(struct vhost_dev *dev,
struct vhost_virtqueue *vq)
{
return dev->mm == current->mm ? 0 : -EPERM;
}
+struct vhost_attach_cgroups_struct {
+ struct vhost_work work;
+ struct task_struct *owner;
+ int ret;
+};
+
+static void vhost_attach_cgroups_work(struct vhost_work *work)
+{
+ struct vhost_attach_cgroups_struct *s;
+ s = container_of(work, struct vhost_attach_cgroups_struct, work);
+ s->ret = cgroup_attach_task_all(s->owner, current);
+}
+
+static int vhost_attach_cgroups(struct vhost_dev *dev)
+{
+ struct vhost_attach_cgroups_struct attach;
+ attach.owner = current;
+ vhost_work_init(&attach.work, vhost_attach_cgroups_work);
+ vhost_work_queue(dev, &attach.work);
+ vhost_work_flush(dev, &attach.work);
+ return attach.ret;
+}
+
/* Caller should have device mutex */
static long vhost_dev_set_owner(struct vhost_dev *dev)
{
}
dev->worker = worker;
- err = cgroup_attach_task_current_cg(worker);
+ wake_up_process(worker); /* avoid contributing to loadavg */
+
+ err = vhost_attach_cgroups(dev);
if (err)
goto err_cgroup;
- wake_up_process(worker); /* avoid contributing to loadavg */
return 0;
err_cgroup:
kthread_stop(worker);
+ dev->worker = NULL;
err_worker:
if (dev->mm)
mmput(dev->mm);
vhost_dev_cleanup(dev);
memory->nregions = 0;
- dev->memory = memory;
+ RCU_INIT_POINTER(dev->memory, memory);
return 0;
}
fput(dev->log_file);
dev->log_file = NULL;
/* No one will access memory at this point */
- kfree(dev->memory);
- dev->memory = NULL;
+ kfree(rcu_dereference_protected(dev->memory,
+ lockdep_is_held(&dev->mutex)));
+ RCU_INIT_POINTER(dev->memory, NULL);
if (dev->mm)
mmput(dev->mm);
dev->mm = NULL;
/* Caller should have device mutex but not vq mutex */
int vhost_log_access_ok(struct vhost_dev *dev)
{
- return memory_access_ok(dev, dev->memory, 1);
+ struct vhost_memory *mp;
+
+ mp = rcu_dereference_protected(dev->memory,
+ lockdep_is_held(&dev->mutex));
+ return memory_access_ok(dev, mp, 1);
}
/* Verify access for write logging. */
/* Caller should have vq mutex and device mutex */
static int vq_log_access_ok(struct vhost_virtqueue *vq, void __user *log_base)
{
- return vq_memory_access_ok(log_base, vq->dev->memory,
+ struct vhost_memory *mp;
+
+ mp = rcu_dereference_protected(vq->dev->memory,
+ lockdep_is_held(&vq->mutex));
+ return vq_memory_access_ok(log_base, mp,
vhost_has_feature(vq->dev, VHOST_F_LOG_ALL)) &&
(!vq->log_used || log_access_ok(log_base, vq->log_addr,
sizeof *vq->used +
kfree(newmem);
return -EFAULT;
}
- oldmem = d->memory;
+ oldmem = rcu_dereference_protected(d->memory,
+ lockdep_is_held(&d->mutex));
rcu_assign_pointer(d->memory, newmem);
synchronize_rcu();
kfree(oldmem);
if (r < 0)
return r;
len -= l;
- if (!len)
+ if (!len) {
+ if (vq->log_ctx)
+ eventfd_signal(vq->log_ctx, 1);
return 0;
+ }
}
- if (vq->log_ctx)
- eventfd_signal(vq->log_ctx, 1);
/* Length written exceeds what we have stored. This is a bug. */
BUG();
return 0;
* vhost_work execution acts instead of rcu_read_lock() and the end of
* vhost_work execution acts instead of rcu_read_lock().
* Writers use virtqueue mutex. */
- void *private_data;
+ void __rcu *private_data;
/* Log write descriptors */
void __user *log_base;
struct vhost_log log[VHOST_NET_MAX_SG];
/* Readers use RCU to access memory table pointer
* log base pointer and features.
* Writers use mutex below.*/
- struct vhost_memory *memory;
+ struct vhost_memory __rcu *memory;
struct mm_struct *mm;
struct mutex mutex;
unsigned acked_features;
static inline int vhost_has_feature(struct vhost_dev *dev, int bit)
{
- unsigned acked_features = rcu_dereference(dev->acked_features);
+ unsigned acked_features;
+
+ acked_features =
+ rcu_dereference_index_check(dev->acked_features,
+ lockdep_is_held(&dev->mutex));
return acked_features & (1 << bit);
}
softback_buf = 0UL;
for (i = 0; i < FB_MAX; i++) {
- int pending;
+ int pending = 0;
mapped = 0;
info = registered_fb[i];
if (info == NULL)
continue;
- pending = cancel_work_sync(&info->queue);
+ if (info->queue.func)
+ pending = cancel_work_sync(&info->queue);
DPRINTK("fbcon: %s pending work\n", (pending ? "canceled" :
"no"));
#include <linux/platform_device.h>
#include <linux/screen_info.h>
#include <linux/dmi.h>
-
+#include <linux/pci.h>
#include <video/vga.h>
static struct fb_var_screeninfo efifb_defined __devinitdata = {
M_I20, /* 20-Inch iMac */
M_I20_SR, /* 20-Inch iMac (Santa Rosa) */
M_I24, /* 24-Inch iMac */
+ M_I24_8_1, /* 24-Inch iMac, 8,1th gen */
+ M_I24_10_1, /* 24-Inch iMac, 10,1th gen */
+ M_I27_11_1, /* 27-Inch iMac, 11,1th gen */
M_MINI, /* Mac Mini */
+ M_MINI_3_1, /* Mac Mini, 3,1th gen */
+ M_MINI_4_1, /* Mac Mini, 4,1th gen */
M_MB, /* MacBook */
M_MB_2, /* MacBook, 2nd rev. */
M_MB_3, /* MacBook, 3rd rev. */
+ M_MB_5_1, /* MacBook, 5th rev. */
+ M_MB_6_1, /* MacBook, 6th rev. */
+ M_MB_7_1, /* MacBook, 7th rev. */
M_MB_SR, /* MacBook, 2nd gen, (Santa Rosa) */
M_MBA, /* MacBook Air */
M_MBP, /* MacBook Pro */
M_MBP_2, /* MacBook Pro 2nd gen */
+ M_MBP_2_2, /* MacBook Pro 2,2nd gen */
M_MBP_SR, /* MacBook Pro (Santa Rosa) */
M_MBP_4, /* MacBook Pro, 4th gen */
M_MBP_5_1, /* MacBook Pro, 5,1th gen */
+ M_MBP_5_2, /* MacBook Pro, 5,2th gen */
+ M_MBP_5_3, /* MacBook Pro, 5,3rd gen */
+ M_MBP_6_1, /* MacBook Pro, 6,1th gen */
+ M_MBP_6_2, /* MacBook Pro, 6,2th gen */
+ M_MBP_7_1, /* MacBook Pro, 7,1th gen */
M_UNKNOWN /* placeholder */
};
[M_I20] = { "i20", 0x80010000, 1728 * 4, 1680, 1050 }, /* guess */
[M_I20_SR] = { "imac7", 0x40010000, 1728 * 4, 1680, 1050 },
[M_I24] = { "i24", 0x80010000, 2048 * 4, 1920, 1200 }, /* guess */
+ [M_I24_8_1] = { "imac8", 0xc0060000, 2048 * 4, 1920, 1200 },
+ [M_I24_10_1] = { "imac10", 0xc0010000, 2048 * 4, 1920, 1080 },
+ [M_I27_11_1] = { "imac11", 0xc0010000, 2560 * 4, 2560, 1440 },
[M_MINI]= { "mini", 0x80000000, 2048 * 4, 1024, 768 },
+ [M_MINI_3_1] = { "mini31", 0x40010000, 1024 * 4, 1024, 768 },
+ [M_MINI_4_1] = { "mini41", 0xc0010000, 2048 * 4, 1920, 1200 },
[M_MB] = { "macbook", 0x80000000, 2048 * 4, 1280, 800 },
+ [M_MB_5_1] = { "macbook51", 0x80010000, 2048 * 4, 1280, 800 },
+ [M_MB_6_1] = { "macbook61", 0x80010000, 2048 * 4, 1280, 800 },
+ [M_MB_7_1] = { "macbook71", 0x80010000, 2048 * 4, 1280, 800 },
[M_MBA] = { "mba", 0x80000000, 2048 * 4, 1280, 800 },
[M_MBP] = { "mbp", 0x80010000, 1472 * 4, 1440, 900 },
[M_MBP_2] = { "mbp2", 0, 0, 0, 0 }, /* placeholder */
+ [M_MBP_2_2] = { "mbp22", 0x80010000, 1472 * 4, 1440, 900 },
[M_MBP_SR] = { "mbp3", 0x80030000, 2048 * 4, 1440, 900 },
[M_MBP_4] = { "mbp4", 0xc0060000, 2048 * 4, 1920, 1200 },
[M_MBP_5_1] = { "mbp51", 0xc0010000, 2048 * 4, 1440, 900 },
+ [M_MBP_5_2] = { "mbp52", 0xc0010000, 2048 * 4, 1920, 1200 },
+ [M_MBP_5_3] = { "mbp53", 0xd0010000, 2048 * 4, 1440, 900 },
+ [M_MBP_6_1] = { "mbp61", 0x90030000, 2048 * 4, 1920, 1200 },
+ [M_MBP_6_2] = { "mbp62", 0x90030000, 2048 * 4, 1680, 1050 },
+ [M_MBP_7_1] = { "mbp71", 0xc0010000, 2048 * 4, 1280, 800 },
[M_UNKNOWN] = { NULL, 0, 0, 0, 0 }
};
EFIFB_DMI_SYSTEM_ID("Apple Computer, Inc.", "iMac6,1", M_I24),
EFIFB_DMI_SYSTEM_ID("Apple Inc.", "iMac6,1", M_I24),
EFIFB_DMI_SYSTEM_ID("Apple Inc.", "iMac7,1", M_I20_SR),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "iMac8,1", M_I24_8_1),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "iMac10,1", M_I24_10_1),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "iMac11,1", M_I27_11_1),
EFIFB_DMI_SYSTEM_ID("Apple Computer, Inc.", "Macmini1,1", M_MINI),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "Macmini3,1", M_MINI_3_1),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "Macmini4,1", M_MINI_4_1),
EFIFB_DMI_SYSTEM_ID("Apple Computer, Inc.", "MacBook1,1", M_MB),
/* At least one of these two will be right; maybe both? */
EFIFB_DMI_SYSTEM_ID("Apple Computer, Inc.", "MacBook2,1", M_MB),
EFIFB_DMI_SYSTEM_ID("Apple Computer, Inc.", "MacBook3,1", M_MB),
EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBook3,1", M_MB),
EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBook4,1", M_MB),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBook5,1", M_MB_5_1),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBook6,1", M_MB_6_1),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBook7,1", M_MB_7_1),
EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBookAir1,1", M_MBA),
EFIFB_DMI_SYSTEM_ID("Apple Computer, Inc.", "MacBookPro1,1", M_MBP),
EFIFB_DMI_SYSTEM_ID("Apple Computer, Inc.", "MacBookPro2,1", M_MBP_2),
+ EFIFB_DMI_SYSTEM_ID("Apple Computer, Inc.", "MacBookPro2,2", M_MBP_2_2),
EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBookPro2,1", M_MBP_2),
EFIFB_DMI_SYSTEM_ID("Apple Computer, Inc.", "MacBookPro3,1", M_MBP_SR),
EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBookPro3,1", M_MBP_SR),
EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBookPro4,1", M_MBP_4),
EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBookPro5,1", M_MBP_5_1),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBookPro5,2", M_MBP_5_2),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBookPro5,3", M_MBP_5_3),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBookPro6,1", M_MBP_6_1),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBookPro6,2", M_MBP_6_2),
+ EFIFB_DMI_SYSTEM_ID("Apple Inc.", "MacBookPro7,1", M_MBP_7_1),
{},
};
{
struct efifb_dmi_info *info = id->driver_data;
if (info->base == 0)
- return -ENODEV;
+ return 0;
printk(KERN_INFO "efifb: dmi detected %s - framebuffer at %p "
"(%dx%d, stride %d)\n", id->ident,
info->stride);
/* Trust the bootloader over the DMI tables */
- if (screen_info.lfb_base == 0)
+ if (screen_info.lfb_base == 0) {
+#if defined(CONFIG_PCI)
+ struct pci_dev *dev = NULL;
+ int found_bar = 0;
+#endif
screen_info.lfb_base = info->base;
- if (screen_info.lfb_linelength == 0)
- screen_info.lfb_linelength = info->stride;
- if (screen_info.lfb_width == 0)
- screen_info.lfb_width = info->width;
- if (screen_info.lfb_height == 0)
- screen_info.lfb_height = info->height;
- if (screen_info.orig_video_isVGA == 0)
- screen_info.orig_video_isVGA = VIDEO_TYPE_EFI;
- return 0;
+#if defined(CONFIG_PCI)
+ /* make sure that the address in the table is actually on a
+ * VGA device's PCI BAR */
+
+ for_each_pci_dev(dev) {
+ int i;
+ if ((dev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
+ continue;
+ for (i = 0; i < DEVICE_COUNT_RESOURCE; i++) {
+ resource_size_t start, end;
+
+ start = pci_resource_start(dev, i);
+ if (start == 0)
+ break;
+ end = pci_resource_end(dev, i);
+ if (screen_info.lfb_base >= start &&
+ screen_info.lfb_base < end) {
+ found_bar = 1;
+ }
+ }
+ }
+ if (!found_bar)
+ screen_info.lfb_base = 0;
+#endif
+ }
+ if (screen_info.lfb_base) {
+ if (screen_info.lfb_linelength == 0)
+ screen_info.lfb_linelength = info->stride;
+ if (screen_info.lfb_width == 0)
+ screen_info.lfb_width = info->width;
+ if (screen_info.lfb_height == 0)
+ screen_info.lfb_height = info->height;
+ if (screen_info.orig_video_isVGA == 0)
+ screen_info.orig_video_isVGA = VIDEO_TYPE_EFI;
+ } else {
+ screen_info.lfb_linelength = 0;
+ screen_info.lfb_width = 0;
+ screen_info.lfb_height = 0;
+ screen_info.orig_video_isVGA = 0;
+ return 0;
+ }
+ return 1;
}
static int efifb_setcolreg(unsigned regno, unsigned red, unsigned green,
* Set bit to enable graphics DMA.
*/
x = readl(fbi->reg_base + LCD_SPU_DMA_CTRL0);
- x |= fbi->active ? 0x00000100 : 0;
- fbi->active = 0;
+ x &= ~CFG_GRA_ENA_MASK;
+ x |= fbi->active ? CFG_GRA_ENA(1) : CFG_GRA_ENA(0);
/*
* If we are in a pseudo-color mode, we need to enable
.fb_imageblit = cfb_imageblit,
};
-static int __init pxa168fb_init_mode(struct fb_info *info,
+static int __devinit pxa168fb_init_mode(struct fb_info *info,
struct pxa168fb_mach_info *mi)
{
struct pxa168fb_info *fbi = info->par;
return ret;
}
-static int __init pxa168fb_probe(struct platform_device *pdev)
+static int __devinit pxa168fb_probe(struct platform_device *pdev)
{
struct pxa168fb_mach_info *mi;
struct fb_info *info = 0;
.probe = pxa168fb_probe,
};
-static int __devinit pxa168fb_init(void)
+static int __init pxa168fb_init(void)
{
return platform_driver_register(&pxa168fb_driver);
}
break;
case FBIOGET_VBLANK:
+
+ memset(&sisvbblank, 0, sizeof(struct fb_vblank));
+
sisvbblank.count = 0;
sisvbblank.flags = sisfb_setupvbblankflags(ivideo, &sisvbblank.vcount, &sisvbblank.hcount);
{
struct viafb_ioctl_info viainfo;
+ memset(&viainfo, 0, sizeof(struct viafb_ioctl_info));
+
viainfo.viafb_id = VIAID;
viainfo.vendor_id = PCI_VIA_VENDOR_ID;
here to enable the OMAP1610/OMAP1710/OMAP2420/OMAP3430/OMAP4430 watchdog timer.
config PNX4008_WATCHDOG
- tristate "PNX4008 Watchdog"
- depends on ARCH_PNX4008
+ tristate "PNX4008 and LPC32XX Watchdog"
+ depends on ARCH_PNX4008 || ARCH_LPC32XX
help
Say Y here if to include support for the watchdog timer
- in the PNX4008 processor.
+ in the PNX4008 or LPC32XX processor.
This driver can be built as a module by choosing M. The module
will be called pnx4008_wdt.
if (ret) {
printk(KERN_ERR "%s: failed to request irq 1 - %d\n",
ident.identity, ret);
- return ret;
+ goto out;
}
ret = misc_register(&sbwdog_miscdev);
printk(KERN_INFO "%s: timeout is %ld.%ld secs\n",
ident.identity,
timeout / 1000000, (timeout / 100000) % 10);
- } else
- free_irq(1, (void *)user_dog);
+ return 0;
+ }
+ free_irq(1, (void *)user_dog);
+out:
+ unregister_reboot_notifier(&sbwdog_notifier);
+
return ret;
}
static void __exit sbwdog_exit(void)
{
misc_deregister(&sbwdog_miscdev);
+ free_irq(1, (void *)user_dog);
+ unregister_reboot_notifier(&sbwdog_notifier);
}
module_init(sbwdog_init);
wdt->pdev = pdev;
mutex_init(&wdt->lock);
+ /* make sure that the watchdog is disabled */
+ ts72xx_wdt_stop(wdt);
+
error = misc_register(&ts72xx_wdt_miscdev);
if (error) {
dev_err(&pdev->dev, "failed to register miscdev\n");
{
int ret = 0;
- blocking_notifier_chain_register(&xenstore_chain, nb);
+ if (xenstored_ready > 0)
+ ret = nb->notifier_call(nb, 0, NULL);
+ else
+ blocking_notifier_chain_register(&xenstore_chain, nb);
return ret;
}
void xenbus_probe(struct work_struct *unused)
{
- BUG_ON((xenstored_ready <= 0));
+ xenstored_ready = 1;
/* Enumerate devices in xenstore and watch for changes. */
xenbus_probe_devices(&xenbus_frontend);
xen_store_evtchn = xen_start_info->store_evtchn;
xen_store_mfn = xen_start_info->store_mfn;
xen_store_interface = mfn_to_virt(xen_store_mfn);
+ xenstored_ready = 1;
}
- xenstored_ready = 1;
}
/* Initialize the interface to xenstore. */
fid = filp->private_data;
P9_DPRINTK(P9_DEBUG_VFS,
- "inode: %p filp: %p fid: %d\n", inode, filp, fid->fid);
+ "v9fs_dir_release: inode: %p filp: %p fid: %d\n",
+ inode, filp, fid ? fid->fid : -1);
filemap_write_and_wait(inode->i_mapping);
- p9_client_clunk(fid);
+ if (fid)
+ p9_client_clunk(fid);
return 0;
}
P9_DPRINTK(P9_DEBUG_VFS, "inode creation failed %d\n", err);
goto error;
}
- dentry->d_op = &v9fs_cached_dentry_operations;
+ if (v9ses->cache)
+ dentry->d_op = &v9fs_cached_dentry_operations;
+ else
+ dentry->d_op = &v9fs_dentry_operations;
d_instantiate(dentry, inode);
err = v9fs_fid_add(dentry, fid);
if (err < 0)
v9fs_stat2inode(st, dentry->d_inode, dentry->d_inode->i_sb);
generic_fillattr(dentry->d_inode, stat);
+ p9stat_free(st);
kfree(st);
return 0;
}
retval = strnlen(buffer, buflen);
done:
+ p9stat_free(st);
kfree(st);
return retval;
}
.unlink = v9fs_vfs_unlink,
.mkdir = v9fs_vfs_mkdir,
.rmdir = v9fs_vfs_rmdir,
- .mknod = v9fs_vfs_mknod_dotl,
+ .mknod = v9fs_vfs_mknod,
.rename = v9fs_vfs_rename,
.getattr = v9fs_vfs_getattr,
.setattr = v9fs_vfs_setattr,
fid = v9fs_session_init(v9ses, dev_name, data);
if (IS_ERR(fid)) {
retval = PTR_ERR(fid);
+ /*
+ * we need to call session_close to tear down some
+ * of the data structure setup by session_init
+ */
goto close_session;
}
retval = -ENOMEM;
goto release_sb;
}
-
sb->s_root = root;
if (v9fs_proto_dotl(v9ses)) {
st = p9_client_getattr_dotl(fid, P9_STATS_BASIC);
if (IS_ERR(st)) {
retval = PTR_ERR(st);
- goto clunk_fid;
+ goto release_sb;
}
v9fs_stat2inode_dotl(st, root->d_inode);
st = p9_client_stat(fid);
if (IS_ERR(st)) {
retval = PTR_ERR(st);
- goto clunk_fid;
+ goto release_sb;
}
root->d_inode->i_ino = v9fs_qid2ino(&st->qid);
v9fs_fid_add(root, fid);
-P9_DPRINTK(P9_DEBUG_VFS, " simple set mount, return 0\n");
+ P9_DPRINTK(P9_DEBUG_VFS, " simple set mount, return 0\n");
simple_set_mnt(mnt, sb);
return 0;
clunk_fid:
p9_client_clunk(fid);
-
close_session:
v9fs_session_close(v9ses);
kfree(v9ses);
return retval;
-
release_sb:
+ /*
+ * we will do the session_close and root dentry release
+ * in the below call. But we need to clunk fid, because we haven't
+ * attached the fid to dentry so it won't get clunked
+ * automatically.
+ */
+ p9_client_clunk(fid);
deactivate_locked_super(sb);
return retval;
}
{
struct affs_inode_info *ei = (struct affs_inode_info *) foo;
- init_MUTEX(&ei->i_link_lock);
- init_MUTEX(&ei->i_ext_lock);
+ sema_init(&ei->i_link_lock, 1);
+ sema_init(&ei->i_ext_lock, 1);
inode_init_once(&ei->vfs_inode);
}
*/
ret = retry(iocb);
- if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED)
+ if (ret != -EIOCBRETRY && ret != -EIOCBQUEUED) {
+ /*
+ * There's no easy way to restart the syscall since other AIO's
+ * may be already running. Just fail this IO with EINTR.
+ */
+ if (unlikely(ret == -ERESTARTSYS || ret == -ERESTARTNOINTR ||
+ ret == -ERESTARTNOHAND || ret == -ERESTART_RESTARTBLOCK))
+ ret = -EINTR;
aio_complete(iocb, ret, 0);
+ }
out:
spin_lock_irq(&ctx->ctx_lock);
if (unlikely(nr < 0))
return -EINVAL;
+ if (unlikely(nr > LONG_MAX/sizeof(*iocbpp)))
+ nr = LONG_MAX/sizeof(*iocbpp);
+
if (unlikely(!access_ok(VERIFY_READ, iocbpp, (nr*sizeof(*iocbpp)))))
return -EFAULT;
if (!dump_write(file, dump_start, dump_size))
goto end_coredump;
}
-/* Finally dump the task struct. Not be used by gdb, but could be useful */
- set_fs(KERNEL_DS);
- if (!dump_write(file, current, sizeof(*current)))
- goto end_coredump;
end_coredump:
set_fs(fs);
return has_dumped;
{
int err = register_filesystem(&bm_fs_type);
if (!err) {
- err = register_binfmt(&misc_format);
+ err = insert_binfmt(&misc_format);
if (err)
unregister_filesystem(&bm_fs_type);
}
/* Allocate kernel buffer for protection data */
len = sectors * blk_integrity_tuple_size(bi);
- buf = kmalloc(len, GFP_NOIO | __GFP_NOFAIL | q->bounce_gfp);
+ buf = kmalloc(len, GFP_NOIO | q->bounce_gfp);
if (unlikely(buf == NULL)) {
printk(KERN_ERR "could not allocate integrity buffer\n");
- return -EIO;
+ return -ENOMEM;
}
end = (((unsigned long) buf) + len + PAGE_SIZE - 1) >> PAGE_SHIFT;
config CEPH_FS
tristate "Ceph distributed file system (EXPERIMENTAL)"
depends on INET && EXPERIMENTAL
+ select CEPH_LIB
select LIBCRC32C
select CRYPTO_AES
+ select CRYPTO
+ default n
help
Choose Y or M here to include support for mounting the
experimental Ceph distributed file system. Ceph is an extremely
If unsure, say N.
-config CEPH_FS_PRETTYDEBUG
- bool "Include file:line in ceph debug output"
- depends on CEPH_FS
- default n
- help
- If you say Y here, debug output will include a filename and
- line to aid debugging. This icnreases kernel size and slows
- execution slightly when debug call sites are enabled (e.g.,
- via CONFIG_DYNAMIC_DEBUG).
-
- If unsure, say N.
-
ceph-objs := super.o inode.o dir.o file.o locks.o addr.o ioctl.o \
export.o caps.o snap.o xattr.o \
- messenger.o msgpool.o buffer.o pagelist.o \
- mds_client.o mdsmap.o \
- mon_client.o \
- osd_client.o osdmap.o crush/crush.o crush/mapper.o crush/hash.o \
- debugfs.o \
- auth.o auth_none.o \
- crypto.o armor.o \
- auth_x.o \
- ceph_fs.o ceph_strings.o ceph_hash.o ceph_frag.o
+ mds_client.o mdsmap.o strings.o ceph_frag.o \
+ debugfs.o
else
#Otherwise we were called directly from the command
+++ /dev/null
-#
-# The following files are shared by (and manually synchronized
-# between) the Ceph userland and kernel client.
-#
-# userland kernel
-src/include/ceph_fs.h fs/ceph/ceph_fs.h
-src/include/ceph_fs.cc fs/ceph/ceph_fs.c
-src/include/msgr.h fs/ceph/msgr.h
-src/include/rados.h fs/ceph/rados.h
-src/include/ceph_strings.cc fs/ceph/ceph_strings.c
-src/include/ceph_frag.h fs/ceph/ceph_frag.h
-src/include/ceph_frag.cc fs/ceph/ceph_frag.c
-src/include/ceph_hash.h fs/ceph/ceph_hash.h
-src/include/ceph_hash.cc fs/ceph/ceph_hash.c
-src/crush/crush.c fs/ceph/crush/crush.c
-src/crush/crush.h fs/ceph/crush/crush.h
-src/crush/mapper.c fs/ceph/crush/mapper.c
-src/crush/mapper.h fs/ceph/crush/mapper.h
-src/crush/hash.h fs/ceph/crush/hash.h
-src/crush/hash.c fs/ceph/crush/hash.c
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/backing-dev.h>
#include <linux/fs.h>
#include <linux/task_io_accounting_ops.h>
#include "super.h"
-#include "osd_client.h"
+#include "mds_client.h"
+#include <linux/ceph/osd_client.h>
/*
* Ceph address space ops.
{
struct inode *inode = filp->f_dentry->d_inode;
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_osd_client *osdc = &ceph_inode_to_client(inode)->osdc;
+ struct ceph_osd_client *osdc =
+ &ceph_inode_to_client(inode)->client->osdc;
int err = 0;
u64 len = PAGE_CACHE_SIZE;
{
struct inode *inode = file->f_dentry->d_inode;
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_osd_client *osdc = &ceph_inode_to_client(inode)->osdc;
+ struct ceph_osd_client *osdc =
+ &ceph_inode_to_client(inode)->client->osdc;
int rc = 0;
struct page **pages;
loff_t offset;
{
struct inode *inode;
struct ceph_inode_info *ci;
- struct ceph_client *client;
+ struct ceph_fs_client *fsc;
struct ceph_osd_client *osdc;
loff_t page_off = page->index << PAGE_CACHE_SHIFT;
int len = PAGE_CACHE_SIZE;
}
inode = page->mapping->host;
ci = ceph_inode(inode);
- client = ceph_inode_to_client(inode);
- osdc = &client->osdc;
+ fsc = ceph_inode_to_client(inode);
+ osdc = &fsc->client->osdc;
/* verify this is a writeable snap context */
snapc = (void *)page->private;
if (i_size < page_off + len)
len = i_size - page_off;
- dout("writepage %p page %p index %lu on %llu~%u\n",
- inode, page, page->index, page_off, len);
+ dout("writepage %p page %p index %lu on %llu~%u snapc %p\n",
+ inode, page, page->index, page_off, len, snapc);
- writeback_stat = atomic_long_inc_return(&client->writeback_count);
+ writeback_stat = atomic_long_inc_return(&fsc->writeback_count);
if (writeback_stat >
- CONGESTION_ON_THRESH(client->mount_args->congestion_kb))
- set_bdi_congested(&client->backing_dev_info, BLK_RW_ASYNC);
+ CONGESTION_ON_THRESH(fsc->mount_options->congestion_kb))
+ set_bdi_congested(&fsc->backing_dev_info, BLK_RW_ASYNC);
set_page_writeback(page);
err = ceph_osdc_writepages(osdc, ceph_vino(inode),
struct address_space *mapping = inode->i_mapping;
__s32 rc = -EIO;
u64 bytes = 0;
- struct ceph_client *client = ceph_inode_to_client(inode);
+ struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
long writeback_stat;
unsigned issued = ceph_caps_issued(ci);
WARN_ON(!PageUptodate(page));
writeback_stat =
- atomic_long_dec_return(&client->writeback_count);
+ atomic_long_dec_return(&fsc->writeback_count);
if (writeback_stat <
- CONGESTION_OFF_THRESH(client->mount_args->congestion_kb))
- clear_bdi_congested(&client->backing_dev_info,
+ CONGESTION_OFF_THRESH(fsc->mount_options->congestion_kb))
+ clear_bdi_congested(&fsc->backing_dev_info,
BLK_RW_ASYNC);
ceph_put_snap_context((void *)page->private);
* mempool. we avoid the mempool if we can because req->r_num_pages
* may be less than the maximum write size.
*/
-static void alloc_page_vec(struct ceph_client *client,
+static void alloc_page_vec(struct ceph_fs_client *fsc,
struct ceph_osd_request *req)
{
req->r_pages = kmalloc(sizeof(struct page *) * req->r_num_pages,
GFP_NOFS);
if (!req->r_pages) {
- req->r_pages = mempool_alloc(client->wb_pagevec_pool, GFP_NOFS);
+ req->r_pages = mempool_alloc(fsc->wb_pagevec_pool, GFP_NOFS);
req->r_pages_from_pool = 1;
WARN_ON(!req->r_pages);
}
struct inode *inode = mapping->host;
struct backing_dev_info *bdi = mapping->backing_dev_info;
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_client *client;
+ struct ceph_fs_client *fsc;
pgoff_t index, start, end;
int range_whole = 0;
int should_loop = 1;
wbc->sync_mode == WB_SYNC_NONE ? "NONE" :
(wbc->sync_mode == WB_SYNC_ALL ? "ALL" : "HOLD"));
- client = ceph_inode_to_client(inode);
- if (client->mount_state == CEPH_MOUNT_SHUTDOWN) {
+ fsc = ceph_inode_to_client(inode);
+ if (fsc->mount_state == CEPH_MOUNT_SHUTDOWN) {
pr_warning("writepage_start %p on forced umount\n", inode);
return -EIO; /* we're in a forced umount, don't write! */
}
- if (client->mount_args->wsize && client->mount_args->wsize < wsize)
- wsize = client->mount_args->wsize;
+ if (fsc->mount_options->wsize && fsc->mount_options->wsize < wsize)
+ wsize = fsc->mount_options->wsize;
if (wsize < PAGE_CACHE_SIZE)
wsize = PAGE_CACHE_SIZE;
max_pages_ever = wsize >> PAGE_CACHE_SHIFT;
/* ok */
if (locked_pages == 0) {
/* prepare async write request */
- offset = page->index << PAGE_CACHE_SHIFT;
+ offset = (unsigned long long)page->index
+ << PAGE_CACHE_SHIFT;
len = wsize;
- req = ceph_osdc_new_request(&client->osdc,
+ req = ceph_osdc_new_request(&fsc->client->osdc,
&ci->i_layout,
ceph_vino(inode),
offset, &len,
&inode->i_mtime, true, 1);
max_pages = req->r_num_pages;
- alloc_page_vec(client, req);
+ alloc_page_vec(fsc, req);
req->r_callback = writepages_finish;
req->r_inode = inode;
}
inode, page, page->index);
writeback_stat =
- atomic_long_inc_return(&client->writeback_count);
+ atomic_long_inc_return(&fsc->writeback_count);
if (writeback_stat > CONGESTION_ON_THRESH(
- client->mount_args->congestion_kb)) {
- set_bdi_congested(&client->backing_dev_info,
+ fsc->mount_options->congestion_kb)) {
+ set_bdi_congested(&fsc->backing_dev_info,
BLK_RW_ASYNC);
}
op->payload_len = cpu_to_le32(len);
req->r_request->hdr.data_len = cpu_to_le32(len);
- ceph_osdc_start_request(&client->osdc, req, true);
+ ceph_osdc_start_request(&fsc->client->osdc, req, true);
req = NULL;
/* continue? */
{
struct inode *inode = file->f_dentry->d_inode;
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_mds_client *mdsc = &ceph_inode_to_client(inode)->mdsc;
+ struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
loff_t page_off = pos & PAGE_CACHE_MASK;
int pos_in_page = pos & ~PAGE_CACHE_MASK;
int end_in_page = pos_in_page + len;
struct page *page, void *fsdata)
{
struct inode *inode = file->f_dentry->d_inode;
- struct ceph_client *client = ceph_inode_to_client(inode);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
unsigned from = pos & (PAGE_CACHE_SIZE - 1);
int check_cap = 0;
{
struct inode *inode = vma->vm_file->f_dentry->d_inode;
struct page *page = vmf->page;
- struct ceph_mds_client *mdsc = &ceph_inode_to_client(inode)->mdsc;
+ struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
loff_t off = page->index << PAGE_CACHE_SHIFT;
loff_t size, len;
int ret;
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/writeback.h>
#include "super.h"
-#include "decode.h"
-#include "messenger.h"
+#include "mds_client.h"
+#include <linux/ceph/decode.h>
+#include <linux/ceph/messenger.h>
/*
* Capability management
spin_unlock(&mdsc->caps_list_lock);
}
-void ceph_reservation_status(struct ceph_client *client,
+void ceph_reservation_status(struct ceph_fs_client *fsc,
int *total, int *avail, int *used, int *reserved,
int *min)
{
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_mds_client *mdsc = fsc->mdsc;
if (total)
*total = mdsc->caps_total_count;
static void __cap_set_timeouts(struct ceph_mds_client *mdsc,
struct ceph_inode_info *ci)
{
- struct ceph_mount_args *ma = mdsc->client->mount_args;
+ struct ceph_mount_options *ma = mdsc->fsc->mount_options;
ci->i_hold_caps_min = round_jiffies(jiffies +
ma->caps_wanted_delay_min * HZ);
unsigned seq, unsigned mseq, u64 realmino, int flags,
struct ceph_cap_reservation *caps_reservation)
{
- struct ceph_mds_client *mdsc = &ceph_inode_to_client(inode)->mdsc;
+ struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_cap *new_cap = NULL;
struct ceph_cap *cap;
used |= CEPH_CAP_PIN;
if (ci->i_rd_ref)
used |= CEPH_CAP_FILE_RD;
- if (ci->i_rdcache_ref || ci->i_rdcache_gen)
+ if (ci->i_rdcache_ref || ci->vfs_inode.i_data.nrpages)
used |= CEPH_CAP_FILE_CACHE;
if (ci->i_wr_ref)
used |= CEPH_CAP_FILE_WR;
struct ceph_mds_session *session = cap->session;
struct ceph_inode_info *ci = cap->ci;
struct ceph_mds_client *mdsc =
- &ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+ ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
int removed = 0;
dout("__ceph_remove_cap %p from %p\n", cap, &ci->vfs_inode);
* asynchronously back to the MDS once sync writes complete and dirty
* data is written out.
*
+ * Unless @again is true, skip cap_snaps that were already sent to
+ * the MDS (i.e., during this session).
+ *
* Called under i_lock. Takes s_mutex as needed.
*/
void __ceph_flush_snaps(struct ceph_inode_info *ci,
- struct ceph_mds_session **psession)
+ struct ceph_mds_session **psession,
+ int again)
__releases(ci->vfs_inode->i_lock)
__acquires(ci->vfs_inode->i_lock)
{
int mds;
struct ceph_cap_snap *capsnap;
u32 mseq;
- struct ceph_mds_client *mdsc = &ceph_inode_to_client(inode)->mdsc;
+ struct ceph_mds_client *mdsc = ceph_inode_to_client(inode)->mdsc;
struct ceph_mds_session *session = NULL; /* if session != NULL, we hold
session->s_mutex */
u64 next_follows = 0; /* keep track of how far we've gotten through the
* pages to be written out.
*/
if (capsnap->dirty_pages || capsnap->writing)
- continue;
+ break;
/*
* if cap writeback already occurred, we should have dropped
dout("no auth cap (migrating?), doing nothing\n");
goto out;
}
+
+ /* only flush each capsnap once */
+ if (!again && !list_empty(&capsnap->flushing_item)) {
+ dout("already flushed %p, skipping\n", capsnap);
+ continue;
+ }
+
mds = ci->i_auth_cap->session->s_mds;
mseq = ci->i_auth_cap->mseq;
&session->s_cap_snaps_flushing);
spin_unlock(&inode->i_lock);
- dout("flush_snaps %p cap_snap %p follows %lld size %llu\n",
- inode, capsnap, next_follows, capsnap->size);
+ dout("flush_snaps %p cap_snap %p follows %lld tid %llu\n",
+ inode, capsnap, capsnap->follows, capsnap->flush_tid);
send_cap_msg(session, ceph_vino(inode).ino, 0,
CEPH_CAP_OP_FLUSHSNAP, capsnap->issued, 0,
capsnap->dirty, 0, capsnap->flush_tid, 0, mseq,
struct inode *inode = &ci->vfs_inode;
spin_lock(&inode->i_lock);
- __ceph_flush_snaps(ci, NULL);
+ __ceph_flush_snaps(ci, NULL, 0);
spin_unlock(&inode->i_lock);
}
void __ceph_mark_dirty_caps(struct ceph_inode_info *ci, int mask)
{
struct ceph_mds_client *mdsc =
- &ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+ ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
struct inode *inode = &ci->vfs_inode;
int was = ci->i_dirty_caps;
int dirty = 0;
static int __mark_caps_flushing(struct inode *inode,
struct ceph_mds_session *session)
{
- struct ceph_mds_client *mdsc = &ceph_sb_to_client(inode->i_sb)->mdsc;
+ struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_inode_info *ci = ceph_inode(inode);
int flushing;
/*
* try to invalidate mapping pages without blocking.
*/
-static int mapping_is_empty(struct address_space *mapping)
-{
- struct page *page = find_get_page(mapping, 0);
-
- if (!page)
- return 1;
-
- put_page(page);
- return 0;
-}
-
static int try_nonblocking_invalidate(struct inode *inode)
{
struct ceph_inode_info *ci = ceph_inode(inode);
invalidate_mapping_pages(&inode->i_data, 0, -1);
spin_lock(&inode->i_lock);
- if (mapping_is_empty(&inode->i_data) &&
+ if (inode->i_data.nrpages == 0 &&
invalidating_gen == ci->i_rdcache_gen) {
/* success. */
dout("try_nonblocking_invalidate %p success\n", inode);
void ceph_check_caps(struct ceph_inode_info *ci, int flags,
struct ceph_mds_session *session)
{
- struct ceph_client *client = ceph_inode_to_client(&ci->vfs_inode);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_inode_to_client(&ci->vfs_inode);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct inode *inode = &ci->vfs_inode;
struct ceph_cap *cap;
int file_wanted, used;
/* flush snaps first time around only */
if (!list_empty(&ci->i_cap_snaps))
- __ceph_flush_snaps(ci, &session);
+ __ceph_flush_snaps(ci, &session, 0);
goto retry_locked;
retry:
spin_lock(&inode->i_lock);
*/
if ((!is_delayed || mdsc->stopping) &&
ci->i_wrbuffer_ref == 0 && /* no dirty pages... */
- ci->i_rdcache_gen && /* may have cached pages */
+ inode->i_data.nrpages && /* have cached pages */
(file_wanted == 0 || /* no open files */
(revoking & (CEPH_CAP_FILE_CACHE|
CEPH_CAP_FILE_LAZYIO))) && /* or revoking cache */
static int try_flush_caps(struct inode *inode, struct ceph_mds_session *session,
unsigned *flush_tid)
{
- struct ceph_mds_client *mdsc = &ceph_sb_to_client(inode->i_sb)->mdsc;
+ struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_inode_info *ci = ceph_inode(inode);
int unlock_session = session ? 0 : 1;
int flushing = 0;
caps_are_flushed(inode, flush_tid));
} else {
struct ceph_mds_client *mdsc =
- &ceph_sb_to_client(inode->i_sb)->mdsc;
+ ceph_sb_to_client(inode->i_sb)->mdsc;
spin_lock(&inode->i_lock);
if (__ceph_caps_dirty(ci))
if (cap && cap->session == session) {
dout("kick_flushing_caps %p cap %p capsnap %p\n", inode,
cap, capsnap);
- __ceph_flush_snaps(ci, &session);
+ __ceph_flush_snaps(ci, &session, 1);
} else {
pr_err("%p auth cap %p not mds%d ???\n", inode,
cap, session->s_mds);
{
struct ceph_inode_info *ci = ceph_inode(inode);
int mds = session->s_mds;
- int seq = le32_to_cpu(grant->seq);
+ unsigned seq = le32_to_cpu(grant->seq);
+ unsigned issue_seq = le32_to_cpu(grant->issue_seq);
int newcaps = le32_to_cpu(grant->caps);
int issued, implemented, used, wanted, dirty;
u64 size = le64_to_cpu(grant->size);
int revoked_rdcache = 0;
int queue_invalidate = 0;
- dout("handle_cap_grant inode %p cap %p mds%d seq %d %s\n",
- inode, cap, mds, seq, ceph_cap_string(newcaps));
+ dout("handle_cap_grant inode %p cap %p mds%d seq %u/%u %s\n",
+ inode, cap, mds, seq, issue_seq, ceph_cap_string(newcaps));
dout(" size %llu max_size %llu, i_size %llu\n", size, max_size,
inode->i_size);
}
cap->seq = seq;
+ cap->issue_seq = issue_seq;
/* file layout may have changed */
ci->i_layout = grant->layout;
__releases(inode->i_lock)
{
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_mds_client *mdsc = &ceph_sb_to_client(inode->i_sb)->mdsc;
+ struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
unsigned seq = le32_to_cpu(m->seq);
int dirty = le32_to_cpu(m->dirty);
int cleaned = 0;
struct ceph_msg *msg)
{
struct ceph_mds_client *mdsc = session->s_mdsc;
- struct super_block *sb = mdsc->client->sb;
+ struct super_block *sb = mdsc->fsc->sb;
struct inode *inode;
struct ceph_cap *cap;
struct ceph_mds_caps *h;
if (op == CEPH_CAP_OP_IMPORT)
__queue_cap_release(session, vino.ino, cap_id,
mseq, seq);
-
- /*
- * send any full release message to try to move things
- * along for the mds (who clearly thinks we still have this
- * cap).
- */
- ceph_add_cap_releases(mdsc, session);
- ceph_send_cap_releases(mdsc, session);
- goto done;
+ goto flush_cap_releases;
}
/* these will work even if we don't have a cap yet */
dout(" no cap on %p ino %llx.%llx from mds%d\n",
inode, ceph_ino(inode), ceph_snap(inode), mds);
spin_unlock(&inode->i_lock);
- goto done;
+ goto flush_cap_releases;
}
/* note that each of these drops i_lock for us */
ceph_cap_op_name(op));
}
+ goto done;
+
+flush_cap_releases:
+ /*
+ * send any full release message to try to move things
+ * along for the mds (who clearly thinks we still have this
+ * cap).
+ */
+ ceph_add_cap_releases(mdsc, session);
+ ceph_send_cap_releases(mdsc, session);
+
done:
mutex_unlock(&session->s_mutex);
done_unlocked:
/*
* Ceph 'frag' type
*/
-#include "types.h"
+#include <linux/module.h>
+#include <linux/ceph/types.h>
int ceph_frag_compare(__u32 a, __u32 b)
{
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/device.h>
#include <linux/slab.h>
#include <linux/debugfs.h>
#include <linux/seq_file.h>
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/auth.h>
+#include <linux/ceph/debugfs.h>
+
#include "super.h"
-#include "mds_client.h"
-#include "mon_client.h"
-#include "auth.h"
#ifdef CONFIG_DEBUG_FS
-/*
- * Implement /sys/kernel/debug/ceph fun
- *
- * /sys/kernel/debug/ceph/client* - an instance of the ceph client
- * .../osdmap - current osdmap
- * .../mdsmap - current mdsmap
- * .../monmap - current monmap
- * .../osdc - active osd requests
- * .../mdsc - active mds requests
- * .../monc - mon client state
- * .../dentry_lru - dump contents of dentry lru
- * .../caps - expose cap (reservation) stats
- * .../bdi - symlink to ../../bdi/something
- */
-
-static struct dentry *ceph_debugfs_dir;
-
-static int monmap_show(struct seq_file *s, void *p)
-{
- int i;
- struct ceph_client *client = s->private;
-
- if (client->monc.monmap == NULL)
- return 0;
-
- seq_printf(s, "epoch %d\n", client->monc.monmap->epoch);
- for (i = 0; i < client->monc.monmap->num_mon; i++) {
- struct ceph_entity_inst *inst =
- &client->monc.monmap->mon_inst[i];
-
- seq_printf(s, "\t%s%lld\t%s\n",
- ENTITY_NAME(inst->name),
- pr_addr(&inst->addr.in_addr));
- }
- return 0;
-}
+#include "mds_client.h"
static int mdsmap_show(struct seq_file *s, void *p)
{
int i;
- struct ceph_client *client = s->private;
+ struct ceph_fs_client *fsc = s->private;
- if (client->mdsc.mdsmap == NULL)
+ if (fsc->mdsc == NULL || fsc->mdsc->mdsmap == NULL)
return 0;
- seq_printf(s, "epoch %d\n", client->mdsc.mdsmap->m_epoch);
- seq_printf(s, "root %d\n", client->mdsc.mdsmap->m_root);
+ seq_printf(s, "epoch %d\n", fsc->mdsc->mdsmap->m_epoch);
+ seq_printf(s, "root %d\n", fsc->mdsc->mdsmap->m_root);
seq_printf(s, "session_timeout %d\n",
- client->mdsc.mdsmap->m_session_timeout);
+ fsc->mdsc->mdsmap->m_session_timeout);
seq_printf(s, "session_autoclose %d\n",
- client->mdsc.mdsmap->m_session_autoclose);
- for (i = 0; i < client->mdsc.mdsmap->m_max_mds; i++) {
+ fsc->mdsc->mdsmap->m_session_autoclose);
+ for (i = 0; i < fsc->mdsc->mdsmap->m_max_mds; i++) {
struct ceph_entity_addr *addr =
- &client->mdsc.mdsmap->m_info[i].addr;
- int state = client->mdsc.mdsmap->m_info[i].state;
+ &fsc->mdsc->mdsmap->m_info[i].addr;
+ int state = fsc->mdsc->mdsmap->m_info[i].state;
- seq_printf(s, "\tmds%d\t%s\t(%s)\n", i, pr_addr(&addr->in_addr),
+ seq_printf(s, "\tmds%d\t%s\t(%s)\n", i,
+ ceph_pr_addr(&addr->in_addr),
ceph_mds_state_name(state));
}
return 0;
}
-static int osdmap_show(struct seq_file *s, void *p)
-{
- int i;
- struct ceph_client *client = s->private;
- struct rb_node *n;
-
- if (client->osdc.osdmap == NULL)
- return 0;
- seq_printf(s, "epoch %d\n", client->osdc.osdmap->epoch);
- seq_printf(s, "flags%s%s\n",
- (client->osdc.osdmap->flags & CEPH_OSDMAP_NEARFULL) ?
- " NEARFULL" : "",
- (client->osdc.osdmap->flags & CEPH_OSDMAP_FULL) ?
- " FULL" : "");
- for (n = rb_first(&client->osdc.osdmap->pg_pools); n; n = rb_next(n)) {
- struct ceph_pg_pool_info *pool =
- rb_entry(n, struct ceph_pg_pool_info, node);
- seq_printf(s, "pg_pool %d pg_num %d / %d, lpg_num %d / %d\n",
- pool->id, pool->v.pg_num, pool->pg_num_mask,
- pool->v.lpg_num, pool->lpg_num_mask);
- }
- for (i = 0; i < client->osdc.osdmap->max_osd; i++) {
- struct ceph_entity_addr *addr =
- &client->osdc.osdmap->osd_addr[i];
- int state = client->osdc.osdmap->osd_state[i];
- char sb[64];
-
- seq_printf(s, "\tosd%d\t%s\t%3d%%\t(%s)\n",
- i, pr_addr(&addr->in_addr),
- ((client->osdc.osdmap->osd_weight[i]*100) >> 16),
- ceph_osdmap_state_str(sb, sizeof(sb), state));
- }
- return 0;
-}
-
-static int monc_show(struct seq_file *s, void *p)
-{
- struct ceph_client *client = s->private;
- struct ceph_mon_generic_request *req;
- struct ceph_mon_client *monc = &client->monc;
- struct rb_node *rp;
-
- mutex_lock(&monc->mutex);
-
- if (monc->have_mdsmap)
- seq_printf(s, "have mdsmap %u\n", (unsigned)monc->have_mdsmap);
- if (monc->have_osdmap)
- seq_printf(s, "have osdmap %u\n", (unsigned)monc->have_osdmap);
- if (monc->want_next_osdmap)
- seq_printf(s, "want next osdmap\n");
-
- for (rp = rb_first(&monc->generic_request_tree); rp; rp = rb_next(rp)) {
- __u16 op;
- req = rb_entry(rp, struct ceph_mon_generic_request, node);
- op = le16_to_cpu(req->request->hdr.type);
- if (op == CEPH_MSG_STATFS)
- seq_printf(s, "%lld statfs\n", req->tid);
- else
- seq_printf(s, "%lld unknown\n", req->tid);
- }
-
- mutex_unlock(&monc->mutex);
- return 0;
-}
-
+/*
+ * mdsc debugfs
+ */
static int mdsc_show(struct seq_file *s, void *p)
{
- struct ceph_client *client = s->private;
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = s->private;
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
struct rb_node *rp;
int pathlen;
return 0;
}
-static int osdc_show(struct seq_file *s, void *pp)
-{
- struct ceph_client *client = s->private;
- struct ceph_osd_client *osdc = &client->osdc;
- struct rb_node *p;
-
- mutex_lock(&osdc->request_mutex);
- for (p = rb_first(&osdc->requests); p; p = rb_next(p)) {
- struct ceph_osd_request *req;
- struct ceph_osd_request_head *head;
- struct ceph_osd_op *op;
- int num_ops;
- int opcode, olen;
- int i;
-
- req = rb_entry(p, struct ceph_osd_request, r_node);
-
- seq_printf(s, "%lld\tosd%d\t%d.%x\t", req->r_tid,
- req->r_osd ? req->r_osd->o_osd : -1,
- le32_to_cpu(req->r_pgid.pool),
- le16_to_cpu(req->r_pgid.ps));
-
- head = req->r_request->front.iov_base;
- op = (void *)(head + 1);
-
- num_ops = le16_to_cpu(head->num_ops);
- olen = le32_to_cpu(head->object_len);
- seq_printf(s, "%.*s", olen,
- (const char *)(head->ops + num_ops));
-
- if (req->r_reassert_version.epoch)
- seq_printf(s, "\t%u'%llu",
- (unsigned)le32_to_cpu(req->r_reassert_version.epoch),
- le64_to_cpu(req->r_reassert_version.version));
- else
- seq_printf(s, "\t");
-
- for (i = 0; i < num_ops; i++) {
- opcode = le16_to_cpu(op->op);
- seq_printf(s, "\t%s", ceph_osd_op_name(opcode));
- op++;
- }
-
- seq_printf(s, "\n");
- }
- mutex_unlock(&osdc->request_mutex);
- return 0;
-}
-
static int caps_show(struct seq_file *s, void *p)
{
- struct ceph_client *client = s->private;
+ struct ceph_fs_client *fsc = s->private;
int total, avail, used, reserved, min;
- ceph_reservation_status(client, &total, &avail, &used, &reserved, &min);
+ ceph_reservation_status(fsc, &total, &avail, &used, &reserved, &min);
seq_printf(s, "total\t\t%d\n"
"avail\t\t%d\n"
"used\t\t%d\n"
static int dentry_lru_show(struct seq_file *s, void *ptr)
{
- struct ceph_client *client = s->private;
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = s->private;
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_dentry_info *di;
spin_lock(&mdsc->dentry_lru_lock);
return 0;
}
-#define DEFINE_SHOW_FUNC(name) \
-static int name##_open(struct inode *inode, struct file *file) \
-{ \
- struct seq_file *sf; \
- int ret; \
- \
- ret = single_open(file, name, NULL); \
- sf = file->private_data; \
- sf->private = inode->i_private; \
- return ret; \
-} \
- \
-static const struct file_operations name##_fops = { \
- .open = name##_open, \
- .read = seq_read, \
- .llseek = seq_lseek, \
- .release = single_release, \
-};
-
-DEFINE_SHOW_FUNC(monmap_show)
-DEFINE_SHOW_FUNC(mdsmap_show)
-DEFINE_SHOW_FUNC(osdmap_show)
-DEFINE_SHOW_FUNC(monc_show)
-DEFINE_SHOW_FUNC(mdsc_show)
-DEFINE_SHOW_FUNC(osdc_show)
-DEFINE_SHOW_FUNC(dentry_lru_show)
-DEFINE_SHOW_FUNC(caps_show)
+CEPH_DEFINE_SHOW_FUNC(mdsmap_show)
+CEPH_DEFINE_SHOW_FUNC(mdsc_show)
+CEPH_DEFINE_SHOW_FUNC(caps_show)
+CEPH_DEFINE_SHOW_FUNC(dentry_lru_show)
+
+/*
+ * debugfs
+ */
static int congestion_kb_set(void *data, u64 val)
{
- struct ceph_client *client = (struct ceph_client *)data;
-
- if (client)
- client->mount_args->congestion_kb = (int)val;
+ struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
+ fsc->mount_options->congestion_kb = (int)val;
return 0;
}
static int congestion_kb_get(void *data, u64 *val)
{
- struct ceph_client *client = (struct ceph_client *)data;
-
- if (client)
- *val = (u64)client->mount_args->congestion_kb;
+ struct ceph_fs_client *fsc = (struct ceph_fs_client *)data;
+ *val = (u64)fsc->mount_options->congestion_kb;
return 0;
}
-
DEFINE_SIMPLE_ATTRIBUTE(congestion_kb_fops, congestion_kb_get,
congestion_kb_set, "%llu\n");
-int __init ceph_debugfs_init(void)
-{
- ceph_debugfs_dir = debugfs_create_dir("ceph", NULL);
- if (!ceph_debugfs_dir)
- return -ENOMEM;
- return 0;
-}
-void ceph_debugfs_cleanup(void)
+void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
{
- debugfs_remove(ceph_debugfs_dir);
+ dout("ceph_fs_debugfs_cleanup\n");
+ debugfs_remove(fsc->debugfs_bdi);
+ debugfs_remove(fsc->debugfs_congestion_kb);
+ debugfs_remove(fsc->debugfs_mdsmap);
+ debugfs_remove(fsc->debugfs_caps);
+ debugfs_remove(fsc->debugfs_mdsc);
+ debugfs_remove(fsc->debugfs_dentry_lru);
}
-int ceph_debugfs_client_init(struct ceph_client *client)
+int ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
{
- int ret = 0;
- char name[80];
-
- snprintf(name, sizeof(name), "%pU.client%lld", &client->fsid,
- client->monc.auth->global_id);
+ char name[100];
+ int err = -ENOMEM;
- client->debugfs_dir = debugfs_create_dir(name, ceph_debugfs_dir);
- if (!client->debugfs_dir)
- goto out;
-
- client->monc.debugfs_file = debugfs_create_file("monc",
- 0600,
- client->debugfs_dir,
- client,
- &monc_show_fops);
- if (!client->monc.debugfs_file)
+ dout("ceph_fs_debugfs_init\n");
+ fsc->debugfs_congestion_kb =
+ debugfs_create_file("writeback_congestion_kb",
+ 0600,
+ fsc->client->debugfs_dir,
+ fsc,
+ &congestion_kb_fops);
+ if (!fsc->debugfs_congestion_kb)
goto out;
- client->mdsc.debugfs_file = debugfs_create_file("mdsc",
- 0600,
- client->debugfs_dir,
- client,
- &mdsc_show_fops);
- if (!client->mdsc.debugfs_file)
- goto out;
+ dout("a\n");
- client->osdc.debugfs_file = debugfs_create_file("osdc",
- 0600,
- client->debugfs_dir,
- client,
- &osdc_show_fops);
- if (!client->osdc.debugfs_file)
+ snprintf(name, sizeof(name), "../../bdi/%s",
+ dev_name(fsc->backing_dev_info.dev));
+ fsc->debugfs_bdi =
+ debugfs_create_symlink("bdi",
+ fsc->client->debugfs_dir,
+ name);
+ if (!fsc->debugfs_bdi)
goto out;
- client->debugfs_monmap = debugfs_create_file("monmap",
+ dout("b\n");
+ fsc->debugfs_mdsmap = debugfs_create_file("mdsmap",
0600,
- client->debugfs_dir,
- client,
- &monmap_show_fops);
- if (!client->debugfs_monmap)
- goto out;
-
- client->debugfs_mdsmap = debugfs_create_file("mdsmap",
- 0600,
- client->debugfs_dir,
- client,
+ fsc->client->debugfs_dir,
+ fsc,
&mdsmap_show_fops);
- if (!client->debugfs_mdsmap)
- goto out;
-
- client->debugfs_osdmap = debugfs_create_file("osdmap",
- 0600,
- client->debugfs_dir,
- client,
- &osdmap_show_fops);
- if (!client->debugfs_osdmap)
+ if (!fsc->debugfs_mdsmap)
goto out;
- client->debugfs_dentry_lru = debugfs_create_file("dentry_lru",
- 0600,
- client->debugfs_dir,
- client,
- &dentry_lru_show_fops);
- if (!client->debugfs_dentry_lru)
+ dout("ca\n");
+ fsc->debugfs_mdsc = debugfs_create_file("mdsc",
+ 0600,
+ fsc->client->debugfs_dir,
+ fsc,
+ &mdsc_show_fops);
+ if (!fsc->debugfs_mdsc)
goto out;
- client->debugfs_caps = debugfs_create_file("caps",
+ dout("da\n");
+ fsc->debugfs_caps = debugfs_create_file("caps",
0400,
- client->debugfs_dir,
- client,
+ fsc->client->debugfs_dir,
+ fsc,
&caps_show_fops);
- if (!client->debugfs_caps)
+ if (!fsc->debugfs_caps)
goto out;
- client->debugfs_congestion_kb =
- debugfs_create_file("writeback_congestion_kb",
- 0600,
- client->debugfs_dir,
- client,
- &congestion_kb_fops);
- if (!client->debugfs_congestion_kb)
+ dout("ea\n");
+ fsc->debugfs_dentry_lru = debugfs_create_file("dentry_lru",
+ 0600,
+ fsc->client->debugfs_dir,
+ fsc,
+ &dentry_lru_show_fops);
+ if (!fsc->debugfs_dentry_lru)
goto out;
- sprintf(name, "../../bdi/%s", dev_name(client->sb->s_bdi->dev));
- client->debugfs_bdi = debugfs_create_symlink("bdi", client->debugfs_dir,
- name);
-
return 0;
out:
- ceph_debugfs_client_cleanup(client);
- return ret;
+ ceph_fs_debugfs_cleanup(fsc);
+ return err;
}
-void ceph_debugfs_client_cleanup(struct ceph_client *client)
-{
- debugfs_remove(client->debugfs_bdi);
- debugfs_remove(client->debugfs_caps);
- debugfs_remove(client->debugfs_dentry_lru);
- debugfs_remove(client->debugfs_osdmap);
- debugfs_remove(client->debugfs_mdsmap);
- debugfs_remove(client->debugfs_monmap);
- debugfs_remove(client->osdc.debugfs_file);
- debugfs_remove(client->mdsc.debugfs_file);
- debugfs_remove(client->monc.debugfs_file);
- debugfs_remove(client->debugfs_congestion_kb);
- debugfs_remove(client->debugfs_dir);
-}
#else /* CONFIG_DEBUG_FS */
-int __init ceph_debugfs_init(void)
-{
- return 0;
-}
-
-void ceph_debugfs_cleanup(void)
-{
-}
-
-int ceph_debugfs_client_init(struct ceph_client *client)
+int ceph_fs_debugfs_init(struct ceph_fs_client *fsc)
{
return 0;
}
-void ceph_debugfs_client_cleanup(struct ceph_client *client)
+void ceph_fs_debugfs_cleanup(struct ceph_fs_client *fsc)
{
}
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/spinlock.h>
#include <linux/fs_struct.h>
#include <linux/sched.h>
#include "super.h"
+#include "mds_client.h"
/*
* Directory operations: readdir, lookup, create, link, unlink,
*/
static int __dcache_readdir(struct file *filp,
void *dirent, filldir_t filldir)
- __releases(inode->i_lock)
- __acquires(inode->i_lock)
{
- struct inode *inode = filp->f_dentry->d_inode;
struct ceph_file_info *fi = filp->private_data;
struct dentry *parent = filp->f_dentry;
struct inode *dir = parent->d_inode;
atomic_inc(&dentry->d_count);
spin_unlock(&dcache_lock);
- spin_unlock(&inode->i_lock);
dout(" %llu (%llu) dentry %p %.*s %p\n", di->offset, filp->f_pos,
dentry, dentry->d_name.len, dentry->d_name.name, dentry->d_inode);
} else {
dput(last);
}
- last = NULL;
}
-
- spin_lock(&inode->i_lock);
- spin_lock(&dcache_lock);
-
last = dentry;
if (err < 0)
- goto out_unlock;
+ goto out;
- p = p->prev;
filp->f_pos++;
/* make sure a dentry wasn't dropped while we didn't have dcache_lock */
- if ((ceph_inode(dir)->i_ceph_flags & CEPH_I_COMPLETE))
- goto more;
- dout(" lost I_COMPLETE on %p; falling back to mds\n", dir);
- err = -EAGAIN;
+ if (!ceph_i_test(dir, CEPH_I_COMPLETE)) {
+ dout(" lost I_COMPLETE on %p; falling back to mds\n", dir);
+ err = -EAGAIN;
+ goto out;
+ }
+
+ spin_lock(&dcache_lock);
+ p = p->prev; /* advance to next dentry */
+ goto more;
out_unlock:
spin_unlock(&dcache_lock);
-
- if (last) {
- spin_unlock(&inode->i_lock);
+out:
+ if (last)
dput(last);
- spin_lock(&inode->i_lock);
- }
-
return err;
}
struct ceph_file_info *fi = filp->private_data;
struct inode *inode = filp->f_dentry->d_inode;
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_client *client = ceph_inode_to_client(inode);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
unsigned frag = fpos_frag(filp->f_pos);
int off = fpos_off(filp->f_pos);
int err;
u32 ftype;
struct ceph_mds_reply_info_parsed *rinfo;
- const int max_entries = client->mount_args->max_readdir;
- const int max_bytes = client->mount_args->max_readdir_bytes;
+ const int max_entries = fsc->mount_options->max_readdir;
+ const int max_bytes = fsc->mount_options->max_readdir_bytes;
dout("readdir %p filp %p frag %u off %u\n", inode, filp, frag, off);
if (fi->at_end)
/* can we use the dcache? */
spin_lock(&inode->i_lock);
if ((filp->f_pos == 2 || fi->dentry) &&
- !ceph_test_opt(client, NOASYNCREADDIR) &&
+ !ceph_test_mount_opt(fsc, NOASYNCREADDIR) &&
ceph_snap(inode) != CEPH_SNAPDIR &&
(ci->i_ceph_flags & CEPH_I_COMPLETE) &&
__ceph_caps_issued_mask(ci, CEPH_CAP_FILE_SHARED, 1)) {
+ spin_unlock(&inode->i_lock);
err = __dcache_readdir(filp, dirent, filldir);
- if (err != -EAGAIN) {
- spin_unlock(&inode->i_lock);
+ if (err != -EAGAIN)
return err;
- }
+ } else {
+ spin_unlock(&inode->i_lock);
}
- spin_unlock(&inode->i_lock);
if (fi->dentry) {
err = note_last_dentry(fi, fi->dentry->d_name.name,
fi->dentry->d_name.len);
struct dentry *ceph_finish_lookup(struct ceph_mds_request *req,
struct dentry *dentry, int err)
{
- struct ceph_client *client = ceph_sb_to_client(dentry->d_sb);
+ struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
struct inode *parent = dentry->d_parent->d_inode;
/* .snap dir? */
if (err == -ENOENT &&
- ceph_vino(parent).ino != CEPH_INO_ROOT && /* no .snap in root dir */
strcmp(dentry->d_name.name,
- client->mount_args->snapdir_name) == 0) {
+ fsc->mount_options->snapdir_name) == 0) {
struct inode *inode = ceph_get_snapdir(parent);
dout("ENOENT on snapdir %p '%.*s', linking to snapdir %p\n",
dentry, dentry->d_name.len, dentry->d_name.name, inode);
static struct dentry *ceph_lookup(struct inode *dir, struct dentry *dentry,
struct nameidata *nd)
{
- struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
int op;
int err;
spin_lock(&dir->i_lock);
dout(" dir %p flags are %d\n", dir, ci->i_ceph_flags);
if (strncmp(dentry->d_name.name,
- client->mount_args->snapdir_name,
+ fsc->mount_options->snapdir_name,
dentry->d_name.len) &&
!is_root_ceph_dentry(dir, dentry) &&
(ci->i_ceph_flags & CEPH_I_COMPLETE) &&
static int ceph_mknod(struct inode *dir, struct dentry *dentry,
int mode, dev_t rdev)
{
- struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
int err;
static int ceph_symlink(struct inode *dir, struct dentry *dentry,
const char *dest)
{
- struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
int err;
static int ceph_mkdir(struct inode *dir, struct dentry *dentry, int mode)
{
- struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
int err = -EROFS;
int op;
static int ceph_link(struct dentry *old_dentry, struct inode *dir,
struct dentry *dentry)
{
- struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
int err;
*/
static int ceph_unlink(struct inode *dir, struct dentry *dentry)
{
- struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct inode *inode = dentry->d_inode;
struct ceph_mds_request *req;
int err = -EROFS;
static int ceph_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry)
{
- struct ceph_client *client = ceph_sb_to_client(old_dir->i_sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(old_dir->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
int err;
static void ceph_dentry_release(struct dentry *dentry)
{
struct ceph_dentry_info *di = ceph_dentry(dentry);
- struct inode *parent_inode = dentry->d_parent->d_inode;
- u64 snapid = ceph_snap(parent_inode);
+ struct inode *parent_inode = NULL;
+ u64 snapid = CEPH_NOSNAP;
+ if (!IS_ROOT(dentry)) {
+ parent_inode = dentry->d_parent->d_inode;
+ if (parent_inode)
+ snapid = ceph_snap(parent_inode);
+ }
dout("dentry_release %p parent %p\n", dentry, parent_inode);
-
if (parent_inode && snapid != CEPH_SNAPDIR) {
struct ceph_inode_info *ci = ceph_inode(parent_inode);
struct ceph_inode_info *ci = ceph_inode(inode);
int left;
- if (!ceph_test_opt(ceph_sb_to_client(inode->i_sb), DIRSTAT))
+ if (!ceph_test_mount_opt(ceph_sb_to_client(inode->i_sb), DIRSTAT))
return -EISDIR;
if (!cf->dir_info) {
dout("dentry_lru_add %p %p '%.*s'\n", di, dn,
dn->d_name.len, dn->d_name.name);
if (di) {
- mdsc = &ceph_sb_to_client(dn->d_sb)->mdsc;
+ mdsc = ceph_sb_to_client(dn->d_sb)->mdsc;
spin_lock(&mdsc->dentry_lru_lock);
list_add_tail(&di->lru, &mdsc->dentry_lru);
mdsc->num_dentry++;
dout("dentry_lru_touch %p %p '%.*s' (offset %lld)\n", di, dn,
dn->d_name.len, dn->d_name.name, di->offset);
if (di) {
- mdsc = &ceph_sb_to_client(dn->d_sb)->mdsc;
+ mdsc = ceph_sb_to_client(dn->d_sb)->mdsc;
spin_lock(&mdsc->dentry_lru_lock);
list_move_tail(&di->lru, &mdsc->dentry_lru);
spin_unlock(&mdsc->dentry_lru_lock);
dout("dentry_lru_del %p %p '%.*s'\n", di, dn,
dn->d_name.len, dn->d_name.name);
if (di) {
- mdsc = &ceph_sb_to_client(dn->d_sb)->mdsc;
+ mdsc = ceph_sb_to_client(dn->d_sb)->mdsc;
spin_lock(&mdsc->dentry_lru_lock);
list_del_init(&di->lru);
mdsc->num_dentry--;
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/exportfs.h>
#include <linux/slab.h>
#include <asm/unaligned.h>
#include "super.h"
+#include "mds_client.h"
/*
* NFS export support
static int ceph_encode_fh(struct dentry *dentry, u32 *rawfh, int *max_len,
int connectable)
{
+ int type;
struct ceph_nfs_fh *fh = (void *)rawfh;
struct ceph_nfs_confh *cfh = (void *)rawfh;
struct dentry *parent = dentry->d_parent;
struct inode *inode = dentry->d_inode;
- int type;
+ int connected_handle_length = sizeof(*cfh)/4;
+ int handle_length = sizeof(*fh)/4;
/* don't re-export snaps */
if (ceph_snap(inode) != CEPH_NOSNAP)
return -EINVAL;
- if (*max_len >= sizeof(*cfh)) {
+ if (*max_len >= connected_handle_length) {
dout("encode_fh %p connectable\n", dentry);
cfh->ino = ceph_ino(dentry->d_inode);
cfh->parent_ino = ceph_ino(parent->d_inode);
cfh->parent_name_hash = parent->d_name.hash;
- *max_len = sizeof(*cfh);
+ *max_len = connected_handle_length;
type = 2;
- } else if (*max_len > sizeof(*fh)) {
- if (connectable)
- return -ENOSPC;
+ } else if (*max_len >= handle_length) {
+ if (connectable) {
+ *max_len = connected_handle_length;
+ return 255;
+ }
dout("encode_fh %p\n", dentry);
fh->ino = ceph_ino(dentry->d_inode);
- *max_len = sizeof(*fh);
+ *max_len = handle_length;
type = 1;
} else {
- return -ENOSPC;
+ *max_len = handle_length;
+ return 255;
}
return type;
}
static struct dentry *__cfh_to_dentry(struct super_block *sb,
struct ceph_nfs_confh *cfh)
{
- struct ceph_mds_client *mdsc = &ceph_sb_to_client(sb)->mdsc;
+ struct ceph_mds_client *mdsc = ceph_sb_to_client(sb)->mdsc;
struct inode *inode;
struct dentry *dentry;
struct ceph_vino vino;
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
+#include <linux/module.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/file.h>
static struct ceph_mds_request *
prepare_open_request(struct super_block *sb, int flags, int create_mode)
{
- struct ceph_client *client = ceph_sb_to_client(sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
int want_auth = USE_ANY_MDS;
int op = (flags & O_CREAT) ? CEPH_MDS_OP_CREATE : CEPH_MDS_OP_OPEN;
int ceph_open(struct inode *inode, struct file *file)
{
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_client *client = ceph_sb_to_client(inode->i_sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(inode->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
struct ceph_file_info *cf = file->private_data;
struct inode *parent_inode = file->f_dentry->d_parent->d_inode;
struct nameidata *nd, int mode,
int locked_dir)
{
- struct ceph_client *client = ceph_sb_to_client(dir->i_sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(dir->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct file *file = nd->intent.open.file;
struct inode *parent_inode = get_dentry_parent_inode(file->f_dentry);
struct ceph_mds_request *req;
}
/*
- * build a vector of user pages
- */
-static struct page **get_direct_page_vector(const char __user *data,
- int num_pages,
- loff_t off, size_t len)
-{
- struct page **pages;
- int rc;
-
- pages = kmalloc(sizeof(*pages) * num_pages, GFP_NOFS);
- if (!pages)
- return ERR_PTR(-ENOMEM);
-
- down_read(¤t->mm->mmap_sem);
- rc = get_user_pages(current, current->mm, (unsigned long)data,
- num_pages, 0, 0, pages, NULL);
- up_read(¤t->mm->mmap_sem);
- if (rc < 0)
- goto fail;
- return pages;
-
-fail:
- kfree(pages);
- return ERR_PTR(rc);
-}
-
-static void put_page_vector(struct page **pages, int num_pages)
-{
- int i;
-
- for (i = 0; i < num_pages; i++)
- put_page(pages[i]);
- kfree(pages);
-}
-
-void ceph_release_page_vector(struct page **pages, int num_pages)
-{
- int i;
-
- for (i = 0; i < num_pages; i++)
- __free_pages(pages[i], 0);
- kfree(pages);
-}
-
-/*
- * allocate a vector new pages
- */
-static struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags)
-{
- struct page **pages;
- int i;
-
- pages = kmalloc(sizeof(*pages) * num_pages, flags);
- if (!pages)
- return ERR_PTR(-ENOMEM);
- for (i = 0; i < num_pages; i++) {
- pages[i] = __page_cache_alloc(flags);
- if (pages[i] == NULL) {
- ceph_release_page_vector(pages, i);
- return ERR_PTR(-ENOMEM);
- }
- }
- return pages;
-}
-
-/*
- * copy user data into a page vector
- */
-static int copy_user_to_page_vector(struct page **pages,
- const char __user *data,
- loff_t off, size_t len)
-{
- int i = 0;
- int po = off & ~PAGE_CACHE_MASK;
- int left = len;
- int l, bad;
-
- while (left > 0) {
- l = min_t(int, PAGE_CACHE_SIZE-po, left);
- bad = copy_from_user(page_address(pages[i]) + po, data, l);
- if (bad == l)
- return -EFAULT;
- data += l - bad;
- left -= l - bad;
- po += l - bad;
- if (po == PAGE_CACHE_SIZE) {
- po = 0;
- i++;
- }
- }
- return len;
-}
-
-/*
- * copy user data from a page vector into a user pointer
- */
-static int copy_page_vector_to_user(struct page **pages, char __user *data,
- loff_t off, size_t len)
-{
- int i = 0;
- int po = off & ~PAGE_CACHE_MASK;
- int left = len;
- int l, bad;
-
- while (left > 0) {
- l = min_t(int, left, PAGE_CACHE_SIZE-po);
- bad = copy_to_user(data, page_address(pages[i]) + po, l);
- if (bad == l)
- return -EFAULT;
- data += l - bad;
- left -= l - bad;
- if (po) {
- po += l - bad;
- if (po == PAGE_CACHE_SIZE)
- po = 0;
- }
- i++;
- }
- return len;
-}
-
-/*
- * Zero an extent within a page vector. Offset is relative to the
- * start of the first page.
- */
-static void zero_page_vector_range(int off, int len, struct page **pages)
-{
- int i = off >> PAGE_CACHE_SHIFT;
-
- off &= ~PAGE_CACHE_MASK;
-
- dout("zero_page_vector_page %u~%u\n", off, len);
-
- /* leading partial page? */
- if (off) {
- int end = min((int)PAGE_CACHE_SIZE, off + len);
- dout("zeroing %d %p head from %d\n", i, pages[i],
- (int)off);
- zero_user_segment(pages[i], off, end);
- len -= (end - off);
- i++;
- }
- while (len >= PAGE_CACHE_SIZE) {
- dout("zeroing %d %p len=%d\n", i, pages[i], len);
- zero_user_segment(pages[i], 0, PAGE_CACHE_SIZE);
- len -= PAGE_CACHE_SIZE;
- i++;
- }
- /* trailing partial page? */
- if (len) {
- dout("zeroing %d %p tail to %d\n", i, pages[i], (int)len);
- zero_user_segment(pages[i], 0, len);
- }
-}
-
-
-/*
* Read a range of bytes striped over one or more objects. Iterate over
* objects we stripe over. (That's not atomic, but good enough for now.)
*
struct page **pages, int num_pages,
int *checkeof)
{
- struct ceph_client *client = ceph_inode_to_client(inode);
+ struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
struct ceph_inode_info *ci = ceph_inode(inode);
u64 pos, this_len;
int page_off = off & ~PAGE_CACHE_MASK; /* first byte's offset in page */
more:
this_len = left;
- ret = ceph_osdc_readpages(&client->osdc, ceph_vino(inode),
+ ret = ceph_osdc_readpages(&fsc->client->osdc, ceph_vino(inode),
&ci->i_layout, pos, &this_len,
ci->i_truncate_seq,
ci->i_truncate_size,
if (read < pos - off) {
dout(" zero gap %llu to %llu\n", off + read, pos);
- zero_page_vector_range(page_off + read,
- pos - off - read, pages);
+ ceph_zero_page_vector_range(page_off + read,
+ pos - off - read, pages);
}
pos += ret;
read = pos - off;
/* was original extent fully inside i_size? */
if (pos + left <= inode->i_size) {
dout("zero tail\n");
- zero_page_vector_range(page_off + read, len - read,
- pages);
+ ceph_zero_page_vector_range(page_off + read, len - read,
+ pages);
read = len;
goto out;
}
(file->f_flags & O_DIRECT) ? "O_DIRECT" : "");
if (file->f_flags & O_DIRECT) {
- pages = get_direct_page_vector(data, num_pages, off, len);
+ pages = ceph_get_direct_page_vector(data, num_pages, off, len);
/*
* flush any page cache pages in this range. this
ret = striped_read(inode, off, len, pages, num_pages, checkeof);
if (ret >= 0 && (file->f_flags & O_DIRECT) == 0)
- ret = copy_page_vector_to_user(pages, data, off, ret);
+ ret = ceph_copy_page_vector_to_user(pages, data, off, ret);
if (ret >= 0)
*poff = off + ret;
done:
if (file->f_flags & O_DIRECT)
- put_page_vector(pages, num_pages);
+ ceph_put_page_vector(pages, num_pages);
else
ceph_release_page_vector(pages, num_pages);
dout("sync_read result %d\n", ret);
{
struct inode *inode = file->f_dentry->d_inode;
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_client *client = ceph_inode_to_client(inode);
+ struct ceph_fs_client *fsc = ceph_inode_to_client(inode);
struct ceph_osd_request *req;
struct page **pages;
int num_pages;
*/
more:
len = left;
- req = ceph_osdc_new_request(&client->osdc, &ci->i_layout,
+ req = ceph_osdc_new_request(&fsc->client->osdc, &ci->i_layout,
ceph_vino(inode), pos, &len,
CEPH_OSD_OP_WRITE, flags,
ci->i_snap_realm->cached_context,
num_pages = calc_pages_for(pos, len);
if (file->f_flags & O_DIRECT) {
- pages = get_direct_page_vector(data, num_pages, pos, len);
+ pages = ceph_get_direct_page_vector(data, num_pages, pos, len);
if (IS_ERR(pages)) {
ret = PTR_ERR(pages);
goto out;
ret = PTR_ERR(pages);
goto out;
}
- ret = copy_user_to_page_vector(pages, data, pos, len);
+ ret = ceph_copy_user_to_page_vector(pages, data, pos, len);
if (ret < 0) {
ceph_release_page_vector(pages, num_pages);
goto out;
req->r_num_pages = num_pages;
req->r_inode = inode;
- ret = ceph_osdc_start_request(&client->osdc, req, false);
+ ret = ceph_osdc_start_request(&fsc->client->osdc, req, false);
if (!ret) {
if (req->r_safe_callback) {
/*
* start_request so that a tid has been assigned.
*/
spin_lock(&ci->i_unsafe_lock);
- list_add(&ci->i_unsafe_writes, &req->r_unsafe_item);
+ list_add(&req->r_unsafe_item, &ci->i_unsafe_writes);
spin_unlock(&ci->i_unsafe_lock);
ceph_get_cap_refs(ci, CEPH_CAP_FILE_WR);
}
- ret = ceph_osdc_wait_request(&client->osdc, req);
+ ret = ceph_osdc_wait_request(&fsc->client->osdc, req);
}
if (file->f_flags & O_DIRECT)
- put_page_vector(pages, num_pages);
+ ceph_put_page_vector(pages, num_pages);
else if (file->f_flags & O_SYNC)
ceph_release_page_vector(pages, num_pages);
struct ceph_file_info *fi = file->private_data;
struct inode *inode = file->f_dentry->d_inode;
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_osd_client *osdc = &ceph_sb_to_client(inode->i_sb)->osdc;
+ struct ceph_osd_client *osdc =
+ &ceph_sb_to_client(inode->i_sb)->client->osdc;
loff_t endoff = pos + iov->iov_len;
int want, got = 0;
int ret, err;
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/module.h>
#include <linux/fs.h>
#include <linux/pagevec.h>
#include "super.h"
-#include "decode.h"
+#include "mds_client.h"
+#include <linux/ceph/decode.h>
/*
* Ceph inode operations
*/
if (ci->i_snap_realm) {
struct ceph_mds_client *mdsc =
- &ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
+ ceph_sb_to_client(ci->vfs_inode.i_sb)->mdsc;
struct ceph_snap_realm *realm = ci->i_snap_realm;
dout(" dropping residual ref to snap realm %p\n", realm);
}
/* it may be better to set st_size in getattr instead? */
- if (ceph_test_opt(ceph_sb_to_client(inode->i_sb), RBYTES))
+ if (ceph_test_mount_opt(ceph_sb_to_client(inode->i_sb), RBYTES))
inode->i_size = ci->i_rbytes;
break;
default:
* the caller) if we fail.
*/
static struct dentry *splice_dentry(struct dentry *dn, struct inode *in,
- bool *prehash)
+ bool *prehash, bool set_offset)
{
struct dentry *realdn;
}
if ((!prehash || *prehash) && d_unhashed(dn))
d_rehash(dn);
- ceph_set_dentry_offset(dn);
+ if (set_offset)
+ ceph_set_dentry_offset(dn);
out:
return dn;
}
struct inode *in = NULL;
struct ceph_mds_reply_inode *ininfo;
struct ceph_vino vino;
- struct ceph_client *client = ceph_sb_to_client(sb);
+ struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
int i = 0;
int err = 0;
*/
if (rinfo->head->is_dentry && !req->r_aborted &&
(rinfo->head->is_target || strncmp(req->r_dentry->d_name.name,
- client->mount_args->snapdir_name,
+ fsc->mount_options->snapdir_name,
req->r_dentry->d_name.len))) {
/*
* lookup link rename : null -> possibly existing inode
d_delete(dn);
goto done;
}
- dn = splice_dentry(dn, in, &have_lease);
+ dn = splice_dentry(dn, in, &have_lease, true);
if (IS_ERR(dn)) {
err = PTR_ERR(dn);
goto done;
goto done;
}
dout(" linking snapped dir %p to dn %p\n", in, dn);
- dn = splice_dentry(dn, in, NULL);
+ dn = splice_dentry(dn, in, NULL, true);
if (IS_ERR(dn)) {
err = PTR_ERR(dn);
goto done;
err = PTR_ERR(in);
goto out;
}
- dn = splice_dentry(dn, in, NULL);
+ dn = splice_dentry(dn, in, NULL, false);
if (IS_ERR(dn))
dn = NULL;
}
struct inode *parent_inode = dentry->d_parent->d_inode;
const unsigned int ia_valid = attr->ia_valid;
struct ceph_mds_request *req;
- struct ceph_mds_client *mdsc = &ceph_sb_to_client(dentry->d_sb)->mdsc;
+ struct ceph_mds_client *mdsc = ceph_sb_to_client(dentry->d_sb)->mdsc;
int issued;
int release = 0, dirtied = 0;
int mask = 0;
*/
int ceph_do_getattr(struct inode *inode, int mask)
{
- struct ceph_client *client = ceph_sb_to_client(inode->i_sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(inode->i_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req;
int err;
#include <linux/in.h>
-#include "ioctl.h"
#include "super.h"
-#include "ceph_debug.h"
+#include "mds_client.h"
+#include <linux/ceph/ceph_debug.h>
+
+#include "ioctl.h"
/*
{
struct inode *inode = file->f_dentry->d_inode;
struct inode *parent_inode = file->f_dentry->d_parent->d_inode;
- struct ceph_mds_client *mdsc = &ceph_sb_to_client(inode->i_sb)->mdsc;
+ struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_mds_request *req;
struct ceph_ioctl_layout l;
int err, i;
}
/*
+ * Set a layout policy on a directory inode. All items in the tree
+ * rooted at this inode will inherit this layout on creation,
+ * (It doesn't apply retroactively )
+ * unless a subdirectory has its own layout policy.
+ */
+static long ceph_ioctl_set_layout_policy (struct file *file, void __user *arg)
+{
+ struct inode *inode = file->f_dentry->d_inode;
+ struct ceph_mds_request *req;
+ struct ceph_ioctl_layout l;
+ int err, i;
+ struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
+
+ /* copy and validate */
+ if (copy_from_user(&l, arg, sizeof(l)))
+ return -EFAULT;
+
+ if ((l.object_size & ~PAGE_MASK) ||
+ (l.stripe_unit & ~PAGE_MASK) ||
+ !l.stripe_unit ||
+ (l.object_size &&
+ (unsigned)l.object_size % (unsigned)l.stripe_unit))
+ return -EINVAL;
+
+ /* make sure it's a valid data pool */
+ if (l.data_pool > 0) {
+ mutex_lock(&mdsc->mutex);
+ err = -EINVAL;
+ for (i = 0; i < mdsc->mdsmap->m_num_data_pg_pools; i++)
+ if (mdsc->mdsmap->m_data_pg_pools[i] == l.data_pool) {
+ err = 0;
+ break;
+ }
+ mutex_unlock(&mdsc->mutex);
+ if (err)
+ return err;
+ }
+
+ req = ceph_mdsc_create_request(mdsc, CEPH_MDS_OP_SETDIRLAYOUT,
+ USE_AUTH_MDS);
+
+ if (IS_ERR(req))
+ return PTR_ERR(req);
+ req->r_inode = igrab(inode);
+
+ req->r_args.setlayout.layout.fl_stripe_unit =
+ cpu_to_le32(l.stripe_unit);
+ req->r_args.setlayout.layout.fl_stripe_count =
+ cpu_to_le32(l.stripe_count);
+ req->r_args.setlayout.layout.fl_object_size =
+ cpu_to_le32(l.object_size);
+ req->r_args.setlayout.layout.fl_pg_pool =
+ cpu_to_le32(l.data_pool);
+ req->r_args.setlayout.layout.fl_pg_preferred =
+ cpu_to_le32(l.preferred_osd);
+
+ err = ceph_mdsc_do_request(mdsc, inode, req);
+ ceph_mdsc_put_request(req);
+ return err;
+}
+
+/*
* Return object name, size/offset information, and location (OSD
* number, network address) for a given file offset.
*/
struct ceph_ioctl_dataloc dl;
struct inode *inode = file->f_dentry->d_inode;
struct ceph_inode_info *ci = ceph_inode(inode);
- struct ceph_osd_client *osdc = &ceph_sb_to_client(inode->i_sb)->osdc;
+ struct ceph_osd_client *osdc =
+ &ceph_sb_to_client(inode->i_sb)->client->osdc;
u64 len = 1, olen;
u64 tmp;
struct ceph_object_layout ol;
case CEPH_IOC_SET_LAYOUT:
return ceph_ioctl_set_layout(file, (void __user *)arg);
+ case CEPH_IOC_SET_LAYOUT_POLICY:
+ return ceph_ioctl_set_layout_policy(file, (void __user *)arg);
+
case CEPH_IOC_GET_DATALOC:
return ceph_ioctl_get_dataloc(file, (void __user *)arg);
case CEPH_IOC_LAZYIO:
return ceph_ioctl_lazyio(file);
}
+
return -ENOTTY;
}
#include <linux/ioctl.h>
#include <linux/types.h>
-#define CEPH_IOCTL_MAGIC 0x97
+#define CEPH_IOCTL_MAGIC 0x98
/* just use u64 to align sanely on all archs */
struct ceph_ioctl_layout {
struct ceph_ioctl_layout)
#define CEPH_IOC_SET_LAYOUT _IOW(CEPH_IOCTL_MAGIC, 2, \
struct ceph_ioctl_layout)
+#define CEPH_IOC_SET_LAYOUT_POLICY _IOW(CEPH_IOCTL_MAGIC, 5, \
+ struct ceph_ioctl_layout)
/*
* Extract identity, address of the OSD and object storing a given
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/file.h>
#include <linux/namei.h>
#include "super.h"
#include "mds_client.h"
-#include "pagelist.h"
+#include <linux/ceph/pagelist.h>
/**
* Implement fcntl and flock locking functions.
{
struct inode *inode = file->f_dentry->d_inode;
struct ceph_mds_client *mdsc =
- &ceph_sb_to_client(inode->i_sb)->mdsc;
+ ceph_sb_to_client(inode->i_sb)->mdsc;
struct ceph_mds_request *req;
int err;
* Encode the flock and fcntl locks for the given inode into the pagelist.
* Format is: #fcntl locks, sequential fcntl locks, #flock locks,
* sequential flock locks.
- * Must be called with BLK already held, and the lock numbers should have
- * been gathered under the same lock holding window.
+ * Must be called with lock_flocks() already held.
+ * If we encounter more of a specific lock type than expected,
+ * we return the value 1.
*/
int ceph_encode_locks(struct inode *inode, struct ceph_pagelist *pagelist,
int num_fcntl_locks, int num_flock_locks)
struct file_lock *lock;
struct ceph_filelock cephlock;
int err = 0;
+ int seen_fcntl = 0;
+ int seen_flock = 0;
dout("encoding %d flock and %d fcntl locks", num_flock_locks,
num_fcntl_locks);
goto fail;
for (lock = inode->i_flock; lock != NULL; lock = lock->fl_next) {
if (lock->fl_flags & FL_POSIX) {
+ ++seen_fcntl;
+ if (seen_fcntl > num_fcntl_locks) {
+ err = -ENOSPC;
+ goto fail;
+ }
err = lock_to_ceph_filelock(lock, &cephlock);
if (err)
goto fail;
goto fail;
for (lock = inode->i_flock; lock != NULL; lock = lock->fl_next) {
if (lock->fl_flags & FL_FLOCK) {
+ ++seen_flock;
+ if (seen_flock > num_flock_locks) {
+ err = -ENOSPC;
+ goto fail;
+ }
err = lock_to_ceph_filelock(lock, &cephlock);
if (err)
goto fail;
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
+#include <linux/fs.h>
#include <linux/wait.h>
#include <linux/slab.h>
#include <linux/sched.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
#include <linux/smp_lock.h>
-#include "mds_client.h"
-#include "mon_client.h"
#include "super.h"
-#include "messenger.h"
-#include "decode.h"
-#include "auth.h"
-#include "pagelist.h"
+#include "mds_client.h"
+
+#include <linux/ceph/messenger.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/pagelist.h>
+#include <linux/ceph/auth.h>
+#include <linux/ceph/debugfs.h>
/*
* A cluster of MDS (metadata server) daemons is responsible for
atomic_read(&s->s_ref), atomic_read(&s->s_ref)-1);
if (atomic_dec_and_test(&s->s_ref)) {
if (s->s_authorizer)
- s->s_mdsc->client->monc.auth->ops->destroy_authorizer(
- s->s_mdsc->client->monc.auth, s->s_authorizer);
+ s->s_mdsc->fsc->client->monc.auth->ops->destroy_authorizer(
+ s->s_mdsc->fsc->client->monc.auth,
+ s->s_authorizer);
kfree(s);
}
}
s->s_seq = 0;
mutex_init(&s->s_mutex);
- ceph_con_init(mdsc->client->msgr, &s->s_con);
+ ceph_con_init(mdsc->fsc->client->msgr, &s->s_con);
s->s_con.private = s;
s->s_con.ops = &mds_con_ops;
s->s_con.peer_name.type = CEPH_ENTITY_TYPE_MDS;
} else if (req->r_dentry) {
struct inode *dir = req->r_dentry->d_parent->d_inode;
- if (dir->i_sb != mdsc->client->sb) {
+ if (dir->i_sb != mdsc->fsc->sb) {
/* not this fs! */
inode = req->r_dentry->d_inode;
} else if (ceph_snap(dir) != CEPH_NOSNAP) {
__ceph_remove_cap(cap);
if (!__ceph_is_any_real_caps(ci)) {
struct ceph_mds_client *mdsc =
- &ceph_sb_to_client(inode->i_sb)->mdsc;
+ ceph_sb_to_client(inode->i_sb)->mdsc;
spin_lock(&mdsc->cap_dirty_lock);
if (!list_empty(&ci->i_dirty_item)) {
struct ceph_msg *msg, *partial = NULL;
struct ceph_mds_cap_release *head;
int err = -ENOMEM;
- int extra = mdsc->client->mount_args->cap_release_safety;
+ int extra = mdsc->fsc->mount_options->cap_release_safety;
int num;
dout("add_cap_releases %p mds%d extra %d\n", session, session->s_mds,
/* insert trace into our cache */
mutex_lock(&req->r_fill_mutex);
- err = ceph_fill_trace(mdsc->client->sb, req, req->r_session);
+ err = ceph_fill_trace(mdsc->fsc->sb, req, req->r_session);
if (err == 0) {
if (result == 0 && rinfo->dir_nr)
ceph_readdir_prepopulate(req, req->r_session);
if (recon_state->flock) {
int num_fcntl_locks, num_flock_locks;
-
- lock_kernel();
- ceph_count_locks(inode, &num_fcntl_locks, &num_flock_locks);
- rec.v2.flock_len = (2*sizeof(u32) +
- (num_fcntl_locks+num_flock_locks) *
- sizeof(struct ceph_filelock));
-
+ struct ceph_pagelist_cursor trunc_point;
+
+ ceph_pagelist_set_cursor(pagelist, &trunc_point);
+ do {
+ lock_flocks();
+ ceph_count_locks(inode, &num_fcntl_locks,
+ &num_flock_locks);
+ rec.v2.flock_len = (2*sizeof(u32) +
+ (num_fcntl_locks+num_flock_locks) *
+ sizeof(struct ceph_filelock));
+ unlock_flocks();
+
+ /* pre-alloc pagelist */
+ ceph_pagelist_truncate(pagelist, &trunc_point);
+ err = ceph_pagelist_append(pagelist, &rec, reclen);
+ if (!err)
+ err = ceph_pagelist_reserve(pagelist,
+ rec.v2.flock_len);
+
+ /* encode locks */
+ if (!err) {
+ lock_flocks();
+ err = ceph_encode_locks(inode,
+ pagelist,
+ num_fcntl_locks,
+ num_flock_locks);
+ unlock_flocks();
+ }
+ } while (err == -ENOSPC);
+ } else {
err = ceph_pagelist_append(pagelist, &rec, reclen);
- if (!err)
- err = ceph_encode_locks(inode, pagelist,
- num_fcntl_locks,
- num_flock_locks);
- unlock_kernel();
}
out_free:
struct ceph_mds_session *session,
struct ceph_msg *msg)
{
- struct super_block *sb = mdsc->client->sb;
+ struct super_block *sb = mdsc->fsc->sb;
struct inode *inode;
struct ceph_inode_info *ci;
struct dentry *parent, *dentry;
schedule_delayed(mdsc);
}
+int ceph_mdsc_init(struct ceph_fs_client *fsc)
-int ceph_mdsc_init(struct ceph_mds_client *mdsc, struct ceph_client *client)
{
- mdsc->client = client;
+ struct ceph_mds_client *mdsc;
+
+ mdsc = kzalloc(sizeof(struct ceph_mds_client), GFP_NOFS);
+ if (!mdsc)
+ return -ENOMEM;
+ mdsc->fsc = fsc;
+ fsc->mdsc = mdsc;
mutex_init(&mdsc->mutex);
mdsc->mdsmap = kzalloc(sizeof(*mdsc->mdsmap), GFP_NOFS);
if (mdsc->mdsmap == NULL)
INIT_LIST_HEAD(&mdsc->dentry_lru);
ceph_caps_init(mdsc);
- ceph_adjust_min_caps(mdsc, client->min_caps);
+ ceph_adjust_min_caps(mdsc, fsc->min_caps);
return 0;
}
static void wait_requests(struct ceph_mds_client *mdsc)
{
struct ceph_mds_request *req;
- struct ceph_client *client = mdsc->client;
+ struct ceph_fs_client *fsc = mdsc->fsc;
mutex_lock(&mdsc->mutex);
if (__get_oldest_req(mdsc)) {
dout("wait_requests waiting for requests\n");
wait_for_completion_timeout(&mdsc->safe_umount_waiters,
- client->mount_args->mount_timeout * HZ);
+ fsc->client->options->mount_timeout * HZ);
/* tear down remaining requests */
mutex_lock(&mdsc->mutex);
{
u64 want_tid, want_flush;
- if (mdsc->client->mount_state == CEPH_MOUNT_SHUTDOWN)
+ if (mdsc->fsc->mount_state == CEPH_MOUNT_SHUTDOWN)
return;
dout("sync\n");
{
int i, n = 0;
- if (mdsc->client->mount_state == CEPH_MOUNT_SHUTDOWN)
+ if (mdsc->fsc->mount_state == CEPH_MOUNT_SHUTDOWN)
return true;
mutex_lock(&mdsc->mutex);
{
struct ceph_mds_session *session;
int i;
- struct ceph_client *client = mdsc->client;
- unsigned long timeout = client->mount_args->mount_timeout * HZ;
+ struct ceph_fs_client *fsc = mdsc->fsc;
+ unsigned long timeout = fsc->client->options->mount_timeout * HZ;
dout("close_sessions\n");
dout("stopped\n");
}
-void ceph_mdsc_stop(struct ceph_mds_client *mdsc)
+static void ceph_mdsc_stop(struct ceph_mds_client *mdsc)
{
dout("stop\n");
cancel_delayed_work_sync(&mdsc->delayed_work); /* cancel timer */
ceph_caps_finalize(mdsc);
}
+void ceph_mdsc_destroy(struct ceph_fs_client *fsc)
+{
+ struct ceph_mds_client *mdsc = fsc->mdsc;
+
+ ceph_mdsc_stop(mdsc);
+ fsc->mdsc = NULL;
+ kfree(mdsc);
+}
+
/*
* handle mds map update.
ceph_decode_need(&p, end, sizeof(fsid)+2*sizeof(u32), bad);
ceph_decode_copy(&p, &fsid, sizeof(fsid));
- if (ceph_check_fsid(mdsc->client, &fsid) < 0)
+ if (ceph_check_fsid(mdsc->fsc->client, &fsid) < 0)
return;
epoch = ceph_decode_32(&p);
maplen = ceph_decode_32(&p);
dout("handle_map epoch %u len %d\n", epoch, (int)maplen);
/* do we need it? */
- ceph_monc_got_mdsmap(&mdsc->client->monc, epoch);
+ ceph_monc_got_mdsmap(&mdsc->fsc->client->monc, epoch);
mutex_lock(&mdsc->mutex);
if (mdsc->mdsmap && epoch <= mdsc->mdsmap->m_epoch) {
dout("handle_map epoch %u <= our %u\n",
} else {
mdsc->mdsmap = newmap; /* first mds map */
}
- mdsc->client->sb->s_maxbytes = mdsc->mdsmap->m_max_file_size;
+ mdsc->fsc->sb->s_maxbytes = mdsc->mdsmap->m_max_file_size;
__wake_requests(mdsc, &mdsc->waiting_for_map);
{
struct ceph_mds_session *s = con->private;
struct ceph_mds_client *mdsc = s->s_mdsc;
- struct ceph_auth_client *ac = mdsc->client->monc.auth;
+ struct ceph_auth_client *ac = mdsc->fsc->client->monc.auth;
int ret = 0;
if (force_new && s->s_authorizer) {
{
struct ceph_mds_session *s = con->private;
struct ceph_mds_client *mdsc = s->s_mdsc;
- struct ceph_auth_client *ac = mdsc->client->monc.auth;
+ struct ceph_auth_client *ac = mdsc->fsc->client->monc.auth;
return ac->ops->verify_authorizer_reply(ac, s->s_authorizer, len);
}
{
struct ceph_mds_session *s = con->private;
struct ceph_mds_client *mdsc = s->s_mdsc;
- struct ceph_auth_client *ac = mdsc->client->monc.auth;
+ struct ceph_auth_client *ac = mdsc->fsc->client->monc.auth;
if (ac->ops->invalidate_authorizer)
ac->ops->invalidate_authorizer(ac, CEPH_ENTITY_TYPE_MDS);
- return ceph_monc_validate_auth(&mdsc->client->monc);
+ return ceph_monc_validate_auth(&mdsc->fsc->client->monc);
}
static const struct ceph_connection_operations mds_con_ops = {
.peer_reset = peer_reset,
};
-
-
-
/* eof */
#include <linux/rbtree.h>
#include <linux/spinlock.h>
-#include "types.h"
-#include "messenger.h"
-#include "mdsmap.h"
+#include <linux/ceph/types.h>
+#include <linux/ceph/messenger.h>
+#include <linux/ceph/mdsmap.h>
/*
* Some lock dependencies:
*
*/
-struct ceph_client;
+struct ceph_fs_client;
struct ceph_cap;
/*
* mds client state
*/
struct ceph_mds_client {
- struct ceph_client *client;
+ struct ceph_fs_client *fsc;
struct mutex mutex; /* all nested structures */
struct ceph_mdsmap *mdsmap;
int caps_avail_count; /* unused, unreserved */
int caps_min_count; /* keep at least this many
(unreserved) */
-
-#ifdef CONFIG_DEBUG_FS
- struct dentry *debugfs_file;
-#endif
-
spinlock_t dentry_lru_lock;
struct list_head dentry_lru;
int num_dentry;
extern int ceph_send_msg_mds(struct ceph_mds_client *mdsc,
struct ceph_msg *msg, int mds);
-extern int ceph_mdsc_init(struct ceph_mds_client *mdsc,
- struct ceph_client *client);
+extern int ceph_mdsc_init(struct ceph_fs_client *fsc);
extern void ceph_mdsc_close_sessions(struct ceph_mds_client *mdsc);
-extern void ceph_mdsc_stop(struct ceph_mds_client *mdsc);
+extern void ceph_mdsc_destroy(struct ceph_fs_client *fsc);
extern void ceph_mdsc_sync(struct ceph_mds_client *mdsc);
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/bug.h>
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/types.h>
-#include "mdsmap.h"
-#include "messenger.h"
-#include "decode.h"
+#include <linux/ceph/mdsmap.h>
+#include <linux/ceph/messenger.h>
+#include <linux/ceph/decode.h>
#include "super.h"
}
dout("mdsmap_decode %d/%d %lld mds%d.%d %s %s\n",
- i+1, n, global_id, mds, inc, pr_addr(&addr.in_addr),
+ i+1, n, global_id, mds, inc,
+ ceph_pr_addr(&addr.in_addr),
ceph_mds_state_name(state));
if (mds >= 0 && mds < m->m_max_mds && state > 0) {
m->m_info[mds].global_id = global_id;
+++ /dev/null
-
-#include <linux/gfp.h>
-#include <linux/pagemap.h>
-#include <linux/highmem.h>
-
-#include "pagelist.h"
-
-int ceph_pagelist_release(struct ceph_pagelist *pl)
-{
- if (pl->mapped_tail)
- kunmap(pl->mapped_tail);
- while (!list_empty(&pl->head)) {
- struct page *page = list_first_entry(&pl->head, struct page,
- lru);
- list_del(&page->lru);
- __free_page(page);
- }
- return 0;
-}
-
-static int ceph_pagelist_addpage(struct ceph_pagelist *pl)
-{
- struct page *page = __page_cache_alloc(GFP_NOFS);
- if (!page)
- return -ENOMEM;
- pl->room += PAGE_SIZE;
- list_add_tail(&page->lru, &pl->head);
- if (pl->mapped_tail)
- kunmap(pl->mapped_tail);
- pl->mapped_tail = kmap(page);
- return 0;
-}
-
-int ceph_pagelist_append(struct ceph_pagelist *pl, void *buf, size_t len)
-{
- while (pl->room < len) {
- size_t bit = pl->room;
- int ret;
-
- memcpy(pl->mapped_tail + (pl->length & ~PAGE_CACHE_MASK),
- buf, bit);
- pl->length += bit;
- pl->room -= bit;
- buf += bit;
- len -= bit;
- ret = ceph_pagelist_addpage(pl);
- if (ret)
- return ret;
- }
-
- memcpy(pl->mapped_tail + (pl->length & ~PAGE_CACHE_MASK), buf, len);
- pl->length += len;
- pl->room -= len;
- return 0;
-}
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/sort.h>
#include <linux/slab.h>
#include "super.h"
-#include "decode.h"
+#include "mds_client.h"
+
+#include <linux/ceph/decode.h>
/*
* Snapshots in ceph are driven in large part by cooperation from the
INIT_LIST_HEAD(&realm->children);
INIT_LIST_HEAD(&realm->child_item);
INIT_LIST_HEAD(&realm->empty_item);
+ INIT_LIST_HEAD(&realm->dirty_item);
INIT_LIST_HEAD(&realm->inodes_with_caps);
spin_lock_init(&realm->inodes_with_caps_lock);
__insert_snap_realm(&mdsc->snap_realms, realm);
INIT_LIST_HEAD(&capsnap->ci_item);
INIT_LIST_HEAD(&capsnap->flushing_item);
- capsnap->follows = snapc->seq - 1;
+ capsnap->follows = snapc->seq;
capsnap->issued = __ceph_caps_issued(ci, NULL);
capsnap->dirty = dirty;
struct ceph_cap_snap *capsnap)
{
struct inode *inode = &ci->vfs_inode;
- struct ceph_mds_client *mdsc = &ceph_sb_to_client(inode->i_sb)->mdsc;
+ struct ceph_mds_client *mdsc = ceph_sb_to_client(inode->i_sb)->mdsc;
BUG_ON(capsnap->writing);
capsnap->size = inode->i_size;
struct ceph_snap_realm *realm;
int invalidate = 0;
int err = -ENOMEM;
+ LIST_HEAD(dirty_realms);
dout("update_snap_trace deletion=%d\n", deletion);
more:
}
}
- if (le64_to_cpu(ri->seq) > realm->seq) {
- dout("update_snap_trace updating %llx %p %lld -> %lld\n",
- realm->ino, realm, realm->seq, le64_to_cpu(ri->seq));
- /*
- * if the realm seq has changed, queue a cap_snap for every
- * inode with open caps. we do this _before_ we update
- * the realm info so that we prepare for writeback under the
- * _previous_ snap context.
- *
- * ...unless it's a snap deletion!
- */
- if (!deletion)
- queue_realm_cap_snaps(realm);
- } else {
- dout("update_snap_trace %llx %p seq %lld unchanged\n",
- realm->ino, realm, realm->seq);
- }
-
/* ensure the parent is correct */
err = adjust_snap_realm_parent(mdsc, realm, le64_to_cpu(ri->parent));
if (err < 0)
invalidate += err;
if (le64_to_cpu(ri->seq) > realm->seq) {
+ dout("update_snap_trace updating %llx %p %lld -> %lld\n",
+ realm->ino, realm, realm->seq, le64_to_cpu(ri->seq));
/* update realm parameters, snap lists */
realm->seq = le64_to_cpu(ri->seq);
realm->created = le64_to_cpu(ri->created);
if (err < 0)
goto fail;
+ /* queue realm for cap_snap creation */
+ list_add(&realm->dirty_item, &dirty_realms);
+
invalidate = 1;
} else if (!realm->cached_context) {
+ dout("update_snap_trace %llx %p seq %lld new\n",
+ realm->ino, realm, realm->seq);
invalidate = 1;
+ } else {
+ dout("update_snap_trace %llx %p seq %lld unchanged\n",
+ realm->ino, realm, realm->seq);
}
dout("done with %llx %p, invalidated=%d, %p %p\n", realm->ino,
if (invalidate)
rebuild_snap_realms(realm);
+ /*
+ * queue cap snaps _after_ we've built the new snap contexts,
+ * so that i_head_snapc can be set appropriately.
+ */
+ list_for_each_entry(realm, &dirty_realms, dirty_item) {
+ queue_realm_cap_snaps(realm);
+ }
+
__cleanup_empty_realms(mdsc);
return 0;
igrab(inode);
spin_unlock(&mdsc->snap_flush_lock);
spin_lock(&inode->i_lock);
- __ceph_flush_snaps(ci, &session);
+ __ceph_flush_snaps(ci, &session, 0);
spin_unlock(&inode->i_lock);
iput(inode);
spin_lock(&mdsc->snap_flush_lock);
struct ceph_mds_session *session,
struct ceph_msg *msg)
{
- struct super_block *sb = mdsc->client->sb;
+ struct super_block *sb = mdsc->fsc->sb;
int mds = session->s_mds;
u64 split;
int op;
};
struct inode *inode = ceph_find_inode(sb, vino);
struct ceph_inode_info *ci;
+ struct ceph_snap_realm *oldrealm;
if (!inode)
continue;
dout(" will move %p to split realm %llx %p\n",
inode, realm->ino, realm);
/*
- * Remove the inode from the realm's inode
- * list, but don't add it to the new realm
- * yet. We don't want the cap_snap to be
- * queued (again) by ceph_update_snap_trace()
- * below. Queue it _now_, under the old context.
+ * Move the inode to the new realm
*/
spin_lock(&realm->inodes_with_caps_lock);
list_del_init(&ci->i_snap_realm_item);
+ list_add(&ci->i_snap_realm_item,
+ &realm->inodes_with_caps);
+ oldrealm = ci->i_snap_realm;
+ ci->i_snap_realm = realm;
spin_unlock(&realm->inodes_with_caps_lock);
spin_unlock(&inode->i_lock);
- ceph_queue_cap_snap(ci);
+ ceph_get_snap_realm(mdsc, realm);
+ ceph_put_snap_realm(mdsc, oldrealm);
iput(inode);
continue;
ceph_update_snap_trace(mdsc, p, e,
op == CEPH_SNAP_OP_DESTROY);
- if (op == CEPH_SNAP_OP_SPLIT) {
- /*
- * ok, _now_ add the inodes into the new realm.
- */
- for (i = 0; i < num_split_inos; i++) {
- struct ceph_vino vino = {
- .ino = le64_to_cpu(split_inos[i]),
- .snap = CEPH_NOSNAP,
- };
- struct inode *inode = ceph_find_inode(sb, vino);
- struct ceph_inode_info *ci;
-
- if (!inode)
- continue;
- ci = ceph_inode(inode);
- spin_lock(&inode->i_lock);
- if (list_empty(&ci->i_snap_realm_item)) {
- struct ceph_snap_realm *oldrealm =
- ci->i_snap_realm;
-
- dout(" moving %p to split realm %llx %p\n",
- inode, realm->ino, realm);
- spin_lock(&realm->inodes_with_caps_lock);
- list_add(&ci->i_snap_realm_item,
- &realm->inodes_with_caps);
- ci->i_snap_realm = realm;
- spin_unlock(&realm->inodes_with_caps_lock);
- ceph_get_snap_realm(mdsc, realm);
- ceph_put_snap_realm(mdsc, oldrealm);
- }
- spin_unlock(&inode->i_lock);
- iput(inode);
- }
-
+ if (op == CEPH_SNAP_OP_SPLIT)
/* we took a reference when we created the realm, above */
ceph_put_snap_realm(mdsc, realm);
- }
__cleanup_empty_realms(mdsc);
/*
- * Ceph string constants
+ * Ceph fs string constants
*/
-#include "types.h"
+#include <linux/module.h>
+#include <linux/ceph/types.h>
-const char *ceph_entity_type_name(int type)
-{
- switch (type) {
- case CEPH_ENTITY_TYPE_MDS: return "mds";
- case CEPH_ENTITY_TYPE_OSD: return "osd";
- case CEPH_ENTITY_TYPE_MON: return "mon";
- case CEPH_ENTITY_TYPE_CLIENT: return "client";
- case CEPH_ENTITY_TYPE_AUTH: return "auth";
- default: return "unknown";
- }
-}
-
-const char *ceph_osd_op_name(int op)
-{
- switch (op) {
- case CEPH_OSD_OP_READ: return "read";
- case CEPH_OSD_OP_STAT: return "stat";
-
- case CEPH_OSD_OP_MASKTRUNC: return "masktrunc";
-
- case CEPH_OSD_OP_WRITE: return "write";
- case CEPH_OSD_OP_DELETE: return "delete";
- case CEPH_OSD_OP_TRUNCATE: return "truncate";
- case CEPH_OSD_OP_ZERO: return "zero";
- case CEPH_OSD_OP_WRITEFULL: return "writefull";
- case CEPH_OSD_OP_ROLLBACK: return "rollback";
-
- case CEPH_OSD_OP_APPEND: return "append";
- case CEPH_OSD_OP_STARTSYNC: return "startsync";
- case CEPH_OSD_OP_SETTRUNC: return "settrunc";
- case CEPH_OSD_OP_TRIMTRUNC: return "trimtrunc";
-
- case CEPH_OSD_OP_TMAPUP: return "tmapup";
- case CEPH_OSD_OP_TMAPGET: return "tmapget";
- case CEPH_OSD_OP_TMAPPUT: return "tmapput";
-
- case CEPH_OSD_OP_GETXATTR: return "getxattr";
- case CEPH_OSD_OP_GETXATTRS: return "getxattrs";
- case CEPH_OSD_OP_SETXATTR: return "setxattr";
- case CEPH_OSD_OP_SETXATTRS: return "setxattrs";
- case CEPH_OSD_OP_RESETXATTRS: return "resetxattrs";
- case CEPH_OSD_OP_RMXATTR: return "rmxattr";
- case CEPH_OSD_OP_CMPXATTR: return "cmpxattr";
-
- case CEPH_OSD_OP_PULL: return "pull";
- case CEPH_OSD_OP_PUSH: return "push";
- case CEPH_OSD_OP_BALANCEREADS: return "balance-reads";
- case CEPH_OSD_OP_UNBALANCEREADS: return "unbalance-reads";
- case CEPH_OSD_OP_SCRUB: return "scrub";
-
- case CEPH_OSD_OP_WRLOCK: return "wrlock";
- case CEPH_OSD_OP_WRUNLOCK: return "wrunlock";
- case CEPH_OSD_OP_RDLOCK: return "rdlock";
- case CEPH_OSD_OP_RDUNLOCK: return "rdunlock";
- case CEPH_OSD_OP_UPLOCK: return "uplock";
- case CEPH_OSD_OP_DNLOCK: return "dnlock";
-
- case CEPH_OSD_OP_CALL: return "call";
-
- case CEPH_OSD_OP_PGLS: return "pgls";
- }
- return "???";
-}
const char *ceph_mds_state_name(int s)
{
}
return "???";
}
-
-const char *ceph_pool_op_name(int op)
-{
- switch (op) {
- case POOL_OP_CREATE: return "create";
- case POOL_OP_DELETE: return "delete";
- case POOL_OP_AUID_CHANGE: return "auid change";
- case POOL_OP_CREATE_SNAP: return "create snap";
- case POOL_OP_DELETE_SNAP: return "delete snap";
- case POOL_OP_CREATE_UNMANAGED_SNAP: return "create unmanaged snap";
- case POOL_OP_DELETE_UNMANAGED_SNAP: return "delete unmanaged snap";
- }
- return "???";
-}
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/backing-dev.h>
#include <linux/ctype.h>
#include <linux/statfs.h>
#include <linux/string.h>
-#include "decode.h"
#include "super.h"
-#include "mon_client.h"
-#include "auth.h"
+#include "mds_client.h"
+
+#include <linux/ceph/decode.h>
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/auth.h>
+#include <linux/ceph/debugfs.h>
/*
* Ceph superblock operations
* Handle the basics of mounting, unmounting.
*/
-
-/*
- * find filename portion of a path (/foo/bar/baz -> baz)
- */
-const char *ceph_file_part(const char *s, int len)
-{
- const char *e = s + len;
-
- while (e != s && *(e-1) != '/')
- e--;
- return e;
-}
-
-
/*
* super ops
*/
static void ceph_put_super(struct super_block *s)
{
- struct ceph_client *client = ceph_sb_to_client(s);
+ struct ceph_fs_client *fsc = ceph_sb_to_client(s);
dout("put_super\n");
- ceph_mdsc_close_sessions(&client->mdsc);
+ ceph_mdsc_close_sessions(fsc->mdsc);
/*
* ensure we release the bdi before put_anon_super releases
* the device name.
*/
- if (s->s_bdi == &client->backing_dev_info) {
- bdi_unregister(&client->backing_dev_info);
+ if (s->s_bdi == &fsc->backing_dev_info) {
+ bdi_unregister(&fsc->backing_dev_info);
s->s_bdi = NULL;
}
static int ceph_statfs(struct dentry *dentry, struct kstatfs *buf)
{
- struct ceph_client *client = ceph_inode_to_client(dentry->d_inode);
- struct ceph_monmap *monmap = client->monc.monmap;
+ struct ceph_fs_client *fsc = ceph_inode_to_client(dentry->d_inode);
+ struct ceph_monmap *monmap = fsc->client->monc.monmap;
struct ceph_statfs st;
u64 fsid;
int err;
dout("statfs\n");
- err = ceph_monc_do_statfs(&client->monc, &st);
+ err = ceph_monc_do_statfs(&fsc->client->monc, &st);
if (err < 0)
return err;
static int ceph_sync_fs(struct super_block *sb, int wait)
{
- struct ceph_client *client = ceph_sb_to_client(sb);
+ struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
if (!wait) {
dout("sync_fs (non-blocking)\n");
- ceph_flush_dirty_caps(&client->mdsc);
+ ceph_flush_dirty_caps(fsc->mdsc);
dout("sync_fs (non-blocking) done\n");
return 0;
}
dout("sync_fs (blocking)\n");
- ceph_osdc_sync(&ceph_sb_to_client(sb)->osdc);
- ceph_mdsc_sync(&ceph_sb_to_client(sb)->mdsc);
+ ceph_osdc_sync(&fsc->client->osdc);
+ ceph_mdsc_sync(fsc->mdsc);
dout("sync_fs (blocking) done\n");
return 0;
}
-static int default_congestion_kb(void)
-{
- int congestion_kb;
-
- /*
- * Copied from NFS
- *
- * congestion size, scale with available memory.
- *
- * 64MB: 8192k
- * 128MB: 11585k
- * 256MB: 16384k
- * 512MB: 23170k
- * 1GB: 32768k
- * 2GB: 46340k
- * 4GB: 65536k
- * 8GB: 92681k
- * 16GB: 131072k
- *
- * This allows larger machines to have larger/more transfers.
- * Limit the default to 256M
- */
- congestion_kb = (16*int_sqrt(totalram_pages)) << (PAGE_SHIFT-10);
- if (congestion_kb > 256*1024)
- congestion_kb = 256*1024;
-
- return congestion_kb;
-}
-
-/**
- * ceph_show_options - Show mount options in /proc/mounts
- * @m: seq_file to write to
- * @mnt: mount descriptor
- */
-static int ceph_show_options(struct seq_file *m, struct vfsmount *mnt)
-{
- struct ceph_client *client = ceph_sb_to_client(mnt->mnt_sb);
- struct ceph_mount_args *args = client->mount_args;
-
- if (args->flags & CEPH_OPT_FSID)
- seq_printf(m, ",fsid=%pU", &args->fsid);
- if (args->flags & CEPH_OPT_NOSHARE)
- seq_puts(m, ",noshare");
- if (args->flags & CEPH_OPT_DIRSTAT)
- seq_puts(m, ",dirstat");
- if ((args->flags & CEPH_OPT_RBYTES) == 0)
- seq_puts(m, ",norbytes");
- if (args->flags & CEPH_OPT_NOCRC)
- seq_puts(m, ",nocrc");
- if (args->flags & CEPH_OPT_NOASYNCREADDIR)
- seq_puts(m, ",noasyncreaddir");
-
- if (args->mount_timeout != CEPH_MOUNT_TIMEOUT_DEFAULT)
- seq_printf(m, ",mount_timeout=%d", args->mount_timeout);
- if (args->osd_idle_ttl != CEPH_OSD_IDLE_TTL_DEFAULT)
- seq_printf(m, ",osd_idle_ttl=%d", args->osd_idle_ttl);
- if (args->osd_timeout != CEPH_OSD_TIMEOUT_DEFAULT)
- seq_printf(m, ",osdtimeout=%d", args->osd_timeout);
- if (args->osd_keepalive_timeout != CEPH_OSD_KEEPALIVE_DEFAULT)
- seq_printf(m, ",osdkeepalivetimeout=%d",
- args->osd_keepalive_timeout);
- if (args->wsize)
- seq_printf(m, ",wsize=%d", args->wsize);
- if (args->rsize != CEPH_MOUNT_RSIZE_DEFAULT)
- seq_printf(m, ",rsize=%d", args->rsize);
- if (args->congestion_kb != default_congestion_kb())
- seq_printf(m, ",write_congestion_kb=%d", args->congestion_kb);
- if (args->caps_wanted_delay_min != CEPH_CAPS_WANTED_DELAY_MIN_DEFAULT)
- seq_printf(m, ",caps_wanted_delay_min=%d",
- args->caps_wanted_delay_min);
- if (args->caps_wanted_delay_max != CEPH_CAPS_WANTED_DELAY_MAX_DEFAULT)
- seq_printf(m, ",caps_wanted_delay_max=%d",
- args->caps_wanted_delay_max);
- if (args->cap_release_safety != CEPH_CAP_RELEASE_SAFETY_DEFAULT)
- seq_printf(m, ",cap_release_safety=%d",
- args->cap_release_safety);
- if (args->max_readdir != CEPH_MAX_READDIR_DEFAULT)
- seq_printf(m, ",readdir_max_entries=%d", args->max_readdir);
- if (args->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT)
- seq_printf(m, ",readdir_max_bytes=%d", args->max_readdir_bytes);
- if (strcmp(args->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT))
- seq_printf(m, ",snapdirname=%s", args->snapdir_name);
- if (args->name)
- seq_printf(m, ",name=%s", args->name);
- if (args->secret)
- seq_puts(m, ",secret=<hidden>");
- return 0;
-}
-
-/*
- * caches
- */
-struct kmem_cache *ceph_inode_cachep;
-struct kmem_cache *ceph_cap_cachep;
-struct kmem_cache *ceph_dentry_cachep;
-struct kmem_cache *ceph_file_cachep;
-
-static void ceph_inode_init_once(void *foo)
-{
- struct ceph_inode_info *ci = foo;
- inode_init_once(&ci->vfs_inode);
-}
-
-static int __init init_caches(void)
-{
- ceph_inode_cachep = kmem_cache_create("ceph_inode_info",
- sizeof(struct ceph_inode_info),
- __alignof__(struct ceph_inode_info),
- (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD),
- ceph_inode_init_once);
- if (ceph_inode_cachep == NULL)
- return -ENOMEM;
-
- ceph_cap_cachep = KMEM_CACHE(ceph_cap,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
- if (ceph_cap_cachep == NULL)
- goto bad_cap;
-
- ceph_dentry_cachep = KMEM_CACHE(ceph_dentry_info,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
- if (ceph_dentry_cachep == NULL)
- goto bad_dentry;
-
- ceph_file_cachep = KMEM_CACHE(ceph_file_info,
- SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
- if (ceph_file_cachep == NULL)
- goto bad_file;
-
- return 0;
-
-bad_file:
- kmem_cache_destroy(ceph_dentry_cachep);
-bad_dentry:
- kmem_cache_destroy(ceph_cap_cachep);
-bad_cap:
- kmem_cache_destroy(ceph_inode_cachep);
- return -ENOMEM;
-}
-
-static void destroy_caches(void)
-{
- kmem_cache_destroy(ceph_inode_cachep);
- kmem_cache_destroy(ceph_cap_cachep);
- kmem_cache_destroy(ceph_dentry_cachep);
- kmem_cache_destroy(ceph_file_cachep);
-}
-
-
-/*
- * ceph_umount_begin - initiate forced umount. Tear down down the
- * mount, skipping steps that may hang while waiting for server(s).
- */
-static void ceph_umount_begin(struct super_block *sb)
-{
- struct ceph_client *client = ceph_sb_to_client(sb);
-
- dout("ceph_umount_begin - starting forced umount\n");
- if (!client)
- return;
- client->mount_state = CEPH_MOUNT_SHUTDOWN;
- return;
-}
-
-static const struct super_operations ceph_super_ops = {
- .alloc_inode = ceph_alloc_inode,
- .destroy_inode = ceph_destroy_inode,
- .write_inode = ceph_write_inode,
- .sync_fs = ceph_sync_fs,
- .put_super = ceph_put_super,
- .show_options = ceph_show_options,
- .statfs = ceph_statfs,
- .umount_begin = ceph_umount_begin,
-};
-
-
-const char *ceph_msg_type_name(int type)
-{
- switch (type) {
- case CEPH_MSG_SHUTDOWN: return "shutdown";
- case CEPH_MSG_PING: return "ping";
- case CEPH_MSG_AUTH: return "auth";
- case CEPH_MSG_AUTH_REPLY: return "auth_reply";
- case CEPH_MSG_MON_MAP: return "mon_map";
- case CEPH_MSG_MON_GET_MAP: return "mon_get_map";
- case CEPH_MSG_MON_SUBSCRIBE: return "mon_subscribe";
- case CEPH_MSG_MON_SUBSCRIBE_ACK: return "mon_subscribe_ack";
- case CEPH_MSG_STATFS: return "statfs";
- case CEPH_MSG_STATFS_REPLY: return "statfs_reply";
- case CEPH_MSG_MDS_MAP: return "mds_map";
- case CEPH_MSG_CLIENT_SESSION: return "client_session";
- case CEPH_MSG_CLIENT_RECONNECT: return "client_reconnect";
- case CEPH_MSG_CLIENT_REQUEST: return "client_request";
- case CEPH_MSG_CLIENT_REQUEST_FORWARD: return "client_request_forward";
- case CEPH_MSG_CLIENT_REPLY: return "client_reply";
- case CEPH_MSG_CLIENT_CAPS: return "client_caps";
- case CEPH_MSG_CLIENT_CAPRELEASE: return "client_cap_release";
- case CEPH_MSG_CLIENT_SNAP: return "client_snap";
- case CEPH_MSG_CLIENT_LEASE: return "client_lease";
- case CEPH_MSG_OSD_MAP: return "osd_map";
- case CEPH_MSG_OSD_OP: return "osd_op";
- case CEPH_MSG_OSD_OPREPLY: return "osd_opreply";
- default: return "unknown";
- }
-}
-
-
/*
* mount options
*/
enum {
Opt_wsize,
Opt_rsize,
- Opt_osdtimeout,
- Opt_osdkeepalivetimeout,
- Opt_mount_timeout,
- Opt_osd_idle_ttl,
Opt_caps_wanted_delay_min,
Opt_caps_wanted_delay_max,
Opt_cap_release_safety,
Opt_congestion_kb,
Opt_last_int,
/* int args above */
- Opt_fsid,
Opt_snapdirname,
- Opt_name,
- Opt_secret,
Opt_last_string,
/* string args above */
- Opt_ip,
- Opt_noshare,
Opt_dirstat,
Opt_nodirstat,
Opt_rbytes,
Opt_norbytes,
- Opt_nocrc,
Opt_noasyncreaddir,
};
-static match_table_t arg_tokens = {
+static match_table_t fsopt_tokens = {
{Opt_wsize, "wsize=%d"},
{Opt_rsize, "rsize=%d"},
- {Opt_osdtimeout, "osdtimeout=%d"},
- {Opt_osdkeepalivetimeout, "osdkeepalive=%d"},
- {Opt_mount_timeout, "mount_timeout=%d"},
- {Opt_osd_idle_ttl, "osd_idle_ttl=%d"},
{Opt_caps_wanted_delay_min, "caps_wanted_delay_min=%d"},
{Opt_caps_wanted_delay_max, "caps_wanted_delay_max=%d"},
{Opt_cap_release_safety, "cap_release_safety=%d"},
{Opt_readdir_max_bytes, "readdir_max_bytes=%d"},
{Opt_congestion_kb, "write_congestion_kb=%d"},
/* int args above */
- {Opt_fsid, "fsid=%s"},
{Opt_snapdirname, "snapdirname=%s"},
- {Opt_name, "name=%s"},
- {Opt_secret, "secret=%s"},
/* string args above */
- {Opt_ip, "ip=%s"},
- {Opt_noshare, "noshare"},
{Opt_dirstat, "dirstat"},
{Opt_nodirstat, "nodirstat"},
{Opt_rbytes, "rbytes"},
{Opt_norbytes, "norbytes"},
- {Opt_nocrc, "nocrc"},
{Opt_noasyncreaddir, "noasyncreaddir"},
{-1, NULL}
};
-static int parse_fsid(const char *str, struct ceph_fsid *fsid)
+static int parse_fsopt_token(char *c, void *private)
{
- int i = 0;
- char tmp[3];
- int err = -EINVAL;
- int d;
-
- dout("parse_fsid '%s'\n", str);
- tmp[2] = 0;
- while (*str && i < 16) {
- if (ispunct(*str)) {
- str++;
- continue;
+ struct ceph_mount_options *fsopt = private;
+ substring_t argstr[MAX_OPT_ARGS];
+ int token, intval, ret;
+
+ token = match_token((char *)c, fsopt_tokens, argstr);
+ if (token < 0)
+ return -EINVAL;
+
+ if (token < Opt_last_int) {
+ ret = match_int(&argstr[0], &intval);
+ if (ret < 0) {
+ pr_err("bad mount option arg (not int) "
+ "at '%s'\n", c);
+ return ret;
}
- if (!isxdigit(str[0]) || !isxdigit(str[1]))
- break;
- tmp[0] = str[0];
- tmp[1] = str[1];
- if (sscanf(tmp, "%x", &d) < 1)
- break;
- fsid->fsid[i] = d & 0xff;
- i++;
- str += 2;
+ dout("got int token %d val %d\n", token, intval);
+ } else if (token > Opt_last_int && token < Opt_last_string) {
+ dout("got string token %d val %s\n", token,
+ argstr[0].from);
+ } else {
+ dout("got token %d\n", token);
}
- if (i == 16)
- err = 0;
- dout("parse_fsid ret %d got fsid %pU", err, fsid);
- return err;
+ switch (token) {
+ case Opt_snapdirname:
+ kfree(fsopt->snapdir_name);
+ fsopt->snapdir_name = kstrndup(argstr[0].from,
+ argstr[0].to-argstr[0].from,
+ GFP_KERNEL);
+ if (!fsopt->snapdir_name)
+ return -ENOMEM;
+ break;
+
+ /* misc */
+ case Opt_wsize:
+ fsopt->wsize = intval;
+ break;
+ case Opt_rsize:
+ fsopt->rsize = intval;
+ break;
+ case Opt_caps_wanted_delay_min:
+ fsopt->caps_wanted_delay_min = intval;
+ break;
+ case Opt_caps_wanted_delay_max:
+ fsopt->caps_wanted_delay_max = intval;
+ break;
+ case Opt_readdir_max_entries:
+ fsopt->max_readdir = intval;
+ break;
+ case Opt_readdir_max_bytes:
+ fsopt->max_readdir_bytes = intval;
+ break;
+ case Opt_congestion_kb:
+ fsopt->congestion_kb = intval;
+ break;
+ case Opt_dirstat:
+ fsopt->flags |= CEPH_MOUNT_OPT_DIRSTAT;
+ break;
+ case Opt_nodirstat:
+ fsopt->flags &= ~CEPH_MOUNT_OPT_DIRSTAT;
+ break;
+ case Opt_rbytes:
+ fsopt->flags |= CEPH_MOUNT_OPT_RBYTES;
+ break;
+ case Opt_norbytes:
+ fsopt->flags &= ~CEPH_MOUNT_OPT_RBYTES;
+ break;
+ case Opt_noasyncreaddir:
+ fsopt->flags |= CEPH_MOUNT_OPT_NOASYNCREADDIR;
+ break;
+ default:
+ BUG_ON(token);
+ }
+ return 0;
}
-static struct ceph_mount_args *parse_mount_args(int flags, char *options,
- const char *dev_name,
- const char **path)
+static void destroy_mount_options(struct ceph_mount_options *args)
{
- struct ceph_mount_args *args;
- const char *c;
- int err = -ENOMEM;
- substring_t argstr[MAX_OPT_ARGS];
+ dout("destroy_mount_options %p\n", args);
+ kfree(args->snapdir_name);
+ kfree(args);
+}
- args = kzalloc(sizeof(*args), GFP_KERNEL);
- if (!args)
- return ERR_PTR(-ENOMEM);
- args->mon_addr = kcalloc(CEPH_MAX_MON, sizeof(*args->mon_addr),
- GFP_KERNEL);
- if (!args->mon_addr)
- goto out;
+static int strcmp_null(const char *s1, const char *s2)
+{
+ if (!s1 && !s2)
+ return 0;
+ if (s1 && !s2)
+ return -1;
+ if (!s1 && s2)
+ return 1;
+ return strcmp(s1, s2);
+}
- dout("parse_mount_args %p, dev_name '%s'\n", args, dev_name);
-
- /* start with defaults */
- args->sb_flags = flags;
- args->flags = CEPH_OPT_DEFAULT;
- args->osd_timeout = CEPH_OSD_TIMEOUT_DEFAULT;
- args->osd_keepalive_timeout = CEPH_OSD_KEEPALIVE_DEFAULT;
- args->mount_timeout = CEPH_MOUNT_TIMEOUT_DEFAULT; /* seconds */
- args->osd_idle_ttl = CEPH_OSD_IDLE_TTL_DEFAULT; /* seconds */
- args->caps_wanted_delay_min = CEPH_CAPS_WANTED_DELAY_MIN_DEFAULT;
- args->caps_wanted_delay_max = CEPH_CAPS_WANTED_DELAY_MAX_DEFAULT;
- args->rsize = CEPH_MOUNT_RSIZE_DEFAULT;
- args->snapdir_name = kstrdup(CEPH_SNAPDIRNAME_DEFAULT, GFP_KERNEL);
- args->cap_release_safety = CEPH_CAP_RELEASE_SAFETY_DEFAULT;
- args->max_readdir = CEPH_MAX_READDIR_DEFAULT;
- args->max_readdir_bytes = CEPH_MAX_READDIR_BYTES_DEFAULT;
- args->congestion_kb = default_congestion_kb();
-
- /* ip1[:port1][,ip2[:port2]...]:/subdir/in/fs */
- err = -EINVAL;
- if (!dev_name)
- goto out;
- *path = strstr(dev_name, ":/");
- if (*path == NULL) {
- pr_err("device name is missing path (no :/ in %s)\n",
- dev_name);
- goto out;
- }
+static int compare_mount_options(struct ceph_mount_options *new_fsopt,
+ struct ceph_options *new_opt,
+ struct ceph_fs_client *fsc)
+{
+ struct ceph_mount_options *fsopt1 = new_fsopt;
+ struct ceph_mount_options *fsopt2 = fsc->mount_options;
+ int ofs = offsetof(struct ceph_mount_options, snapdir_name);
+ int ret;
- /* get mon ip(s) */
- err = ceph_parse_ips(dev_name, *path, args->mon_addr,
- CEPH_MAX_MON, &args->num_mon);
- if (err < 0)
- goto out;
+ ret = memcmp(fsopt1, fsopt2, ofs);
+ if (ret)
+ return ret;
+
+ ret = strcmp_null(fsopt1->snapdir_name, fsopt2->snapdir_name);
+ if (ret)
+ return ret;
+
+ return ceph_compare_options(new_opt, fsc->client);
+}
+
+static int parse_mount_options(struct ceph_mount_options **pfsopt,
+ struct ceph_options **popt,
+ int flags, char *options,
+ const char *dev_name,
+ const char **path)
+{
+ struct ceph_mount_options *fsopt;
+ const char *dev_name_end;
+ int err = -ENOMEM;
+
+ fsopt = kzalloc(sizeof(*fsopt), GFP_KERNEL);
+ if (!fsopt)
+ return -ENOMEM;
+
+ dout("parse_mount_options %p, dev_name '%s'\n", fsopt, dev_name);
+
+ fsopt->sb_flags = flags;
+ fsopt->flags = CEPH_MOUNT_OPT_DEFAULT;
+
+ fsopt->rsize = CEPH_MOUNT_RSIZE_DEFAULT;
+ fsopt->snapdir_name = kstrdup(CEPH_SNAPDIRNAME_DEFAULT, GFP_KERNEL);
+ fsopt->cap_release_safety = CEPH_CAP_RELEASE_SAFETY_DEFAULT;
+ fsopt->max_readdir = CEPH_MAX_READDIR_DEFAULT;
+ fsopt->max_readdir_bytes = CEPH_MAX_READDIR_BYTES_DEFAULT;
+ fsopt->congestion_kb = default_congestion_kb();
+
+ /* ip1[:port1][,ip2[:port2]...]:/subdir/in/fs */
+ err = -EINVAL;
+ if (!dev_name)
+ goto out;
+ *path = strstr(dev_name, ":/");
+ if (*path == NULL) {
+ pr_err("device name is missing path (no :/ in %s)\n",
+ dev_name);
+ goto out;
+ }
+ dev_name_end = *path;
+ dout("device name '%.*s'\n", (int)(dev_name_end - dev_name), dev_name);
/* path on server */
*path += 2;
dout("server path '%s'\n", *path);
- /* parse mount options */
- while ((c = strsep(&options, ",")) != NULL) {
- int token, intval, ret;
- if (!*c)
- continue;
- err = -EINVAL;
- token = match_token((char *)c, arg_tokens, argstr);
- if (token < 0) {
- pr_err("bad mount option at '%s'\n", c);
- goto out;
- }
- if (token < Opt_last_int) {
- ret = match_int(&argstr[0], &intval);
- if (ret < 0) {
- pr_err("bad mount option arg (not int) "
- "at '%s'\n", c);
- continue;
- }
- dout("got int token %d val %d\n", token, intval);
- } else if (token > Opt_last_int && token < Opt_last_string) {
- dout("got string token %d val %s\n", token,
- argstr[0].from);
- } else {
- dout("got token %d\n", token);
- }
- switch (token) {
- case Opt_ip:
- err = ceph_parse_ips(argstr[0].from,
- argstr[0].to,
- &args->my_addr,
- 1, NULL);
- if (err < 0)
- goto out;
- args->flags |= CEPH_OPT_MYIP;
- break;
-
- case Opt_fsid:
- err = parse_fsid(argstr[0].from, &args->fsid);
- if (err == 0)
- args->flags |= CEPH_OPT_FSID;
- break;
- case Opt_snapdirname:
- kfree(args->snapdir_name);
- args->snapdir_name = kstrndup(argstr[0].from,
- argstr[0].to-argstr[0].from,
- GFP_KERNEL);
- break;
- case Opt_name:
- args->name = kstrndup(argstr[0].from,
- argstr[0].to-argstr[0].from,
- GFP_KERNEL);
- break;
- case Opt_secret:
- args->secret = kstrndup(argstr[0].from,
- argstr[0].to-argstr[0].from,
- GFP_KERNEL);
- break;
-
- /* misc */
- case Opt_wsize:
- args->wsize = intval;
- break;
- case Opt_rsize:
- args->rsize = intval;
- break;
- case Opt_osdtimeout:
- args->osd_timeout = intval;
- break;
- case Opt_osdkeepalivetimeout:
- args->osd_keepalive_timeout = intval;
- break;
- case Opt_osd_idle_ttl:
- args->osd_idle_ttl = intval;
- break;
- case Opt_mount_timeout:
- args->mount_timeout = intval;
- break;
- case Opt_caps_wanted_delay_min:
- args->caps_wanted_delay_min = intval;
- break;
- case Opt_caps_wanted_delay_max:
- args->caps_wanted_delay_max = intval;
- break;
- case Opt_readdir_max_entries:
- args->max_readdir = intval;
- break;
- case Opt_readdir_max_bytes:
- args->max_readdir_bytes = intval;
- break;
- case Opt_congestion_kb:
- args->congestion_kb = intval;
- break;
-
- case Opt_noshare:
- args->flags |= CEPH_OPT_NOSHARE;
- break;
-
- case Opt_dirstat:
- args->flags |= CEPH_OPT_DIRSTAT;
- break;
- case Opt_nodirstat:
- args->flags &= ~CEPH_OPT_DIRSTAT;
- break;
- case Opt_rbytes:
- args->flags |= CEPH_OPT_RBYTES;
- break;
- case Opt_norbytes:
- args->flags &= ~CEPH_OPT_RBYTES;
- break;
- case Opt_nocrc:
- args->flags |= CEPH_OPT_NOCRC;
- break;
- case Opt_noasyncreaddir:
- args->flags |= CEPH_OPT_NOASYNCREADDIR;
- break;
-
- default:
- BUG_ON(token);
- }
- }
- return args;
+ err = ceph_parse_options(popt, options, dev_name, dev_name_end,
+ parse_fsopt_token, (void *)fsopt);
+ if (err)
+ goto out;
+
+ /* success */
+ *pfsopt = fsopt;
+ return 0;
out:
- kfree(args->mon_addr);
- kfree(args);
- return ERR_PTR(err);
+ destroy_mount_options(fsopt);
+ return err;
}
-static void destroy_mount_args(struct ceph_mount_args *args)
+/**
+ * ceph_show_options - Show mount options in /proc/mounts
+ * @m: seq_file to write to
+ * @mnt: mount descriptor
+ */
+static int ceph_show_options(struct seq_file *m, struct vfsmount *mnt)
{
- dout("destroy_mount_args %p\n", args);
- kfree(args->snapdir_name);
- args->snapdir_name = NULL;
- kfree(args->name);
- args->name = NULL;
- kfree(args->secret);
- args->secret = NULL;
- kfree(args);
+ struct ceph_fs_client *fsc = ceph_sb_to_client(mnt->mnt_sb);
+ struct ceph_mount_options *fsopt = fsc->mount_options;
+ struct ceph_options *opt = fsc->client->options;
+
+ if (opt->flags & CEPH_OPT_FSID)
+ seq_printf(m, ",fsid=%pU", &opt->fsid);
+ if (opt->flags & CEPH_OPT_NOSHARE)
+ seq_puts(m, ",noshare");
+ if (opt->flags & CEPH_OPT_NOCRC)
+ seq_puts(m, ",nocrc");
+
+ if (opt->name)
+ seq_printf(m, ",name=%s", opt->name);
+ if (opt->secret)
+ seq_puts(m, ",secret=<hidden>");
+
+ if (opt->mount_timeout != CEPH_MOUNT_TIMEOUT_DEFAULT)
+ seq_printf(m, ",mount_timeout=%d", opt->mount_timeout);
+ if (opt->osd_idle_ttl != CEPH_OSD_IDLE_TTL_DEFAULT)
+ seq_printf(m, ",osd_idle_ttl=%d", opt->osd_idle_ttl);
+ if (opt->osd_timeout != CEPH_OSD_TIMEOUT_DEFAULT)
+ seq_printf(m, ",osdtimeout=%d", opt->osd_timeout);
+ if (opt->osd_keepalive_timeout != CEPH_OSD_KEEPALIVE_DEFAULT)
+ seq_printf(m, ",osdkeepalivetimeout=%d",
+ opt->osd_keepalive_timeout);
+
+ if (fsopt->flags & CEPH_MOUNT_OPT_DIRSTAT)
+ seq_puts(m, ",dirstat");
+ if ((fsopt->flags & CEPH_MOUNT_OPT_RBYTES) == 0)
+ seq_puts(m, ",norbytes");
+ if (fsopt->flags & CEPH_MOUNT_OPT_NOASYNCREADDIR)
+ seq_puts(m, ",noasyncreaddir");
+
+ if (fsopt->wsize)
+ seq_printf(m, ",wsize=%d", fsopt->wsize);
+ if (fsopt->rsize != CEPH_MOUNT_RSIZE_DEFAULT)
+ seq_printf(m, ",rsize=%d", fsopt->rsize);
+ if (fsopt->congestion_kb != default_congestion_kb())
+ seq_printf(m, ",write_congestion_kb=%d", fsopt->congestion_kb);
+ if (fsopt->caps_wanted_delay_min != CEPH_CAPS_WANTED_DELAY_MIN_DEFAULT)
+ seq_printf(m, ",caps_wanted_delay_min=%d",
+ fsopt->caps_wanted_delay_min);
+ if (fsopt->caps_wanted_delay_max != CEPH_CAPS_WANTED_DELAY_MAX_DEFAULT)
+ seq_printf(m, ",caps_wanted_delay_max=%d",
+ fsopt->caps_wanted_delay_max);
+ if (fsopt->cap_release_safety != CEPH_CAP_RELEASE_SAFETY_DEFAULT)
+ seq_printf(m, ",cap_release_safety=%d",
+ fsopt->cap_release_safety);
+ if (fsopt->max_readdir != CEPH_MAX_READDIR_DEFAULT)
+ seq_printf(m, ",readdir_max_entries=%d", fsopt->max_readdir);
+ if (fsopt->max_readdir_bytes != CEPH_MAX_READDIR_BYTES_DEFAULT)
+ seq_printf(m, ",readdir_max_bytes=%d", fsopt->max_readdir_bytes);
+ if (strcmp(fsopt->snapdir_name, CEPH_SNAPDIRNAME_DEFAULT))
+ seq_printf(m, ",snapdirname=%s", fsopt->snapdir_name);
+ return 0;
}
/*
- * create a fresh client instance
+ * handle any mon messages the standard library doesn't understand.
+ * return error if we don't either.
*/
-static struct ceph_client *ceph_create_client(struct ceph_mount_args *args)
+static int extra_mon_dispatch(struct ceph_client *client, struct ceph_msg *msg)
{
- struct ceph_client *client;
+ struct ceph_fs_client *fsc = client->private;
+ int type = le16_to_cpu(msg->hdr.type);
+
+ switch (type) {
+ case CEPH_MSG_MDS_MAP:
+ ceph_mdsc_handle_map(fsc->mdsc, msg);
+ return 0;
+
+ default:
+ return -1;
+ }
+}
+
+/*
+ * create a new fs client
+ */
+struct ceph_fs_client *create_fs_client(struct ceph_mount_options *fsopt,
+ struct ceph_options *opt)
+{
+ struct ceph_fs_client *fsc;
int err = -ENOMEM;
- client = kzalloc(sizeof(*client), GFP_KERNEL);
- if (client == NULL)
+ fsc = kzalloc(sizeof(*fsc), GFP_KERNEL);
+ if (!fsc)
return ERR_PTR(-ENOMEM);
- mutex_init(&client->mount_mutex);
-
- init_waitqueue_head(&client->auth_wq);
+ fsc->client = ceph_create_client(opt, fsc);
+ if (IS_ERR(fsc->client)) {
+ err = PTR_ERR(fsc->client);
+ goto fail;
+ }
+ fsc->client->extra_mon_dispatch = extra_mon_dispatch;
+ fsc->client->supported_features |= CEPH_FEATURE_FLOCK;
+ fsc->client->monc.want_mdsmap = 1;
- client->sb = NULL;
- client->mount_state = CEPH_MOUNT_MOUNTING;
- client->mount_args = args;
+ fsc->mount_options = fsopt;
- client->msgr = NULL;
+ fsc->sb = NULL;
+ fsc->mount_state = CEPH_MOUNT_MOUNTING;
- client->auth_err = 0;
- atomic_long_set(&client->writeback_count, 0);
+ atomic_long_set(&fsc->writeback_count, 0);
- err = bdi_init(&client->backing_dev_info);
+ err = bdi_init(&fsc->backing_dev_info);
if (err < 0)
- goto fail;
+ goto fail_client;
err = -ENOMEM;
- client->wb_wq = create_workqueue("ceph-writeback");
- if (client->wb_wq == NULL)
+ fsc->wb_wq = create_workqueue("ceph-writeback");
+ if (fsc->wb_wq == NULL)
goto fail_bdi;
- client->pg_inv_wq = create_singlethread_workqueue("ceph-pg-invalid");
- if (client->pg_inv_wq == NULL)
+ fsc->pg_inv_wq = create_singlethread_workqueue("ceph-pg-invalid");
+ if (fsc->pg_inv_wq == NULL)
goto fail_wb_wq;
- client->trunc_wq = create_singlethread_workqueue("ceph-trunc");
- if (client->trunc_wq == NULL)
+ fsc->trunc_wq = create_singlethread_workqueue("ceph-trunc");
+ if (fsc->trunc_wq == NULL)
goto fail_pg_inv_wq;
/* set up mempools */
err = -ENOMEM;
- client->wb_pagevec_pool = mempool_create_kmalloc_pool(10,
- client->mount_args->wsize >> PAGE_CACHE_SHIFT);
- if (!client->wb_pagevec_pool)
+ fsc->wb_pagevec_pool = mempool_create_kmalloc_pool(10,
+ fsc->mount_options->wsize >> PAGE_CACHE_SHIFT);
+ if (!fsc->wb_pagevec_pool)
goto fail_trunc_wq;
/* caps */
- client->min_caps = args->max_readdir;
+ fsc->min_caps = fsopt->max_readdir;
+
+ return fsc;
- /* subsystems */
- err = ceph_monc_init(&client->monc, client);
- if (err < 0)
- goto fail_mempool;
- err = ceph_osdc_init(&client->osdc, client);
- if (err < 0)
- goto fail_monc;
- err = ceph_mdsc_init(&client->mdsc, client);
- if (err < 0)
- goto fail_osdc;
- return client;
-
-fail_osdc:
- ceph_osdc_stop(&client->osdc);
-fail_monc:
- ceph_monc_stop(&client->monc);
-fail_mempool:
- mempool_destroy(client->wb_pagevec_pool);
fail_trunc_wq:
- destroy_workqueue(client->trunc_wq);
+ destroy_workqueue(fsc->trunc_wq);
fail_pg_inv_wq:
- destroy_workqueue(client->pg_inv_wq);
+ destroy_workqueue(fsc->pg_inv_wq);
fail_wb_wq:
- destroy_workqueue(client->wb_wq);
+ destroy_workqueue(fsc->wb_wq);
fail_bdi:
- bdi_destroy(&client->backing_dev_info);
+ bdi_destroy(&fsc->backing_dev_info);
+fail_client:
+ ceph_destroy_client(fsc->client);
fail:
- kfree(client);
+ kfree(fsc);
return ERR_PTR(err);
}
-static void ceph_destroy_client(struct ceph_client *client)
+void destroy_fs_client(struct ceph_fs_client *fsc)
{
- dout("destroy_client %p\n", client);
+ dout("destroy_fs_client %p\n", fsc);
- /* unmount */
- ceph_mdsc_stop(&client->mdsc);
- ceph_osdc_stop(&client->osdc);
+ destroy_workqueue(fsc->wb_wq);
+ destroy_workqueue(fsc->pg_inv_wq);
+ destroy_workqueue(fsc->trunc_wq);
- /*
- * make sure mds and osd connections close out before destroying
- * the auth module, which is needed to free those connections'
- * ceph_authorizers.
- */
- ceph_msgr_flush();
-
- ceph_monc_stop(&client->monc);
+ bdi_destroy(&fsc->backing_dev_info);
- ceph_debugfs_client_cleanup(client);
- destroy_workqueue(client->wb_wq);
- destroy_workqueue(client->pg_inv_wq);
- destroy_workqueue(client->trunc_wq);
+ mempool_destroy(fsc->wb_pagevec_pool);
- bdi_destroy(&client->backing_dev_info);
+ destroy_mount_options(fsc->mount_options);
- if (client->msgr)
- ceph_messenger_destroy(client->msgr);
- mempool_destroy(client->wb_pagevec_pool);
+ ceph_fs_debugfs_cleanup(fsc);
- destroy_mount_args(client->mount_args);
+ ceph_destroy_client(fsc->client);
- kfree(client);
- dout("destroy_client %p done\n", client);
+ kfree(fsc);
+ dout("destroy_fs_client %p done\n", fsc);
}
/*
- * Initially learn our fsid, or verify an fsid matches.
+ * caches
*/
-int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid)
+struct kmem_cache *ceph_inode_cachep;
+struct kmem_cache *ceph_cap_cachep;
+struct kmem_cache *ceph_dentry_cachep;
+struct kmem_cache *ceph_file_cachep;
+
+static void ceph_inode_init_once(void *foo)
{
- if (client->have_fsid) {
- if (ceph_fsid_compare(&client->fsid, fsid)) {
- pr_err("bad fsid, had %pU got %pU",
- &client->fsid, fsid);
- return -1;
- }
- } else {
- pr_info("client%lld fsid %pU\n", client->monc.auth->global_id,
- fsid);
- memcpy(&client->fsid, fsid, sizeof(*fsid));
- ceph_debugfs_client_init(client);
- client->have_fsid = true;
- }
+ struct ceph_inode_info *ci = foo;
+ inode_init_once(&ci->vfs_inode);
+}
+
+static int __init init_caches(void)
+{
+ ceph_inode_cachep = kmem_cache_create("ceph_inode_info",
+ sizeof(struct ceph_inode_info),
+ __alignof__(struct ceph_inode_info),
+ (SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD),
+ ceph_inode_init_once);
+ if (ceph_inode_cachep == NULL)
+ return -ENOMEM;
+
+ ceph_cap_cachep = KMEM_CACHE(ceph_cap,
+ SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
+ if (ceph_cap_cachep == NULL)
+ goto bad_cap;
+
+ ceph_dentry_cachep = KMEM_CACHE(ceph_dentry_info,
+ SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
+ if (ceph_dentry_cachep == NULL)
+ goto bad_dentry;
+
+ ceph_file_cachep = KMEM_CACHE(ceph_file_info,
+ SLAB_RECLAIM_ACCOUNT|SLAB_MEM_SPREAD);
+ if (ceph_file_cachep == NULL)
+ goto bad_file;
+
return 0;
+
+bad_file:
+ kmem_cache_destroy(ceph_dentry_cachep);
+bad_dentry:
+ kmem_cache_destroy(ceph_cap_cachep);
+bad_cap:
+ kmem_cache_destroy(ceph_inode_cachep);
+ return -ENOMEM;
}
+static void destroy_caches(void)
+{
+ kmem_cache_destroy(ceph_inode_cachep);
+ kmem_cache_destroy(ceph_cap_cachep);
+ kmem_cache_destroy(ceph_dentry_cachep);
+ kmem_cache_destroy(ceph_file_cachep);
+}
+
+
/*
- * true if we have the mon map (and have thus joined the cluster)
+ * ceph_umount_begin - initiate forced umount. Tear down down the
+ * mount, skipping steps that may hang while waiting for server(s).
*/
-static int have_mon_and_osd_map(struct ceph_client *client)
+static void ceph_umount_begin(struct super_block *sb)
{
- return client->monc.monmap && client->monc.monmap->epoch &&
- client->osdc.osdmap && client->osdc.osdmap->epoch;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(sb);
+
+ dout("ceph_umount_begin - starting forced umount\n");
+ if (!fsc)
+ return;
+ fsc->mount_state = CEPH_MOUNT_SHUTDOWN;
+ return;
}
+static const struct super_operations ceph_super_ops = {
+ .alloc_inode = ceph_alloc_inode,
+ .destroy_inode = ceph_destroy_inode,
+ .write_inode = ceph_write_inode,
+ .sync_fs = ceph_sync_fs,
+ .put_super = ceph_put_super,
+ .show_options = ceph_show_options,
+ .statfs = ceph_statfs,
+ .umount_begin = ceph_umount_begin,
+};
+
/*
* Bootstrap mount by opening the root directory. Note the mount
* @started time from caller, and time out if this takes too long.
*/
-static struct dentry *open_root_dentry(struct ceph_client *client,
+static struct dentry *open_root_dentry(struct ceph_fs_client *fsc,
const char *path,
unsigned long started)
{
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct ceph_mds_request *req = NULL;
int err;
struct dentry *root;
req->r_ino1.ino = CEPH_INO_ROOT;
req->r_ino1.snap = CEPH_NOSNAP;
req->r_started = started;
- req->r_timeout = client->mount_args->mount_timeout * HZ;
+ req->r_timeout = fsc->client->options->mount_timeout * HZ;
req->r_args.getattr.mask = cpu_to_le32(CEPH_STAT_CAP_INODE);
req->r_num_caps = 2;
err = ceph_mdsc_do_request(mdsc, NULL, req);
if (err == 0) {
dout("open_root_inode success\n");
if (ceph_ino(req->r_target_inode) == CEPH_INO_ROOT &&
- client->sb->s_root == NULL)
+ fsc->sb->s_root == NULL)
root = d_alloc_root(req->r_target_inode);
else
root = d_obtain_alias(req->r_target_inode);
return root;
}
+
+
+
/*
* mount: join the ceph cluster, and open root directory.
*/
-static int ceph_mount(struct ceph_client *client, struct vfsmount *mnt,
+static int ceph_mount(struct ceph_fs_client *fsc, struct vfsmount *mnt,
const char *path)
{
- struct ceph_entity_addr *myaddr = NULL;
int err;
- unsigned long timeout = client->mount_args->mount_timeout * HZ;
unsigned long started = jiffies; /* note the start time */
struct dentry *root;
+ int first = 0; /* first vfsmount for this super_block */
dout("mount start\n");
- mutex_lock(&client->mount_mutex);
-
- /* initialize the messenger */
- if (client->msgr == NULL) {
- if (ceph_test_opt(client, MYIP))
- myaddr = &client->mount_args->my_addr;
- client->msgr = ceph_messenger_create(myaddr);
- if (IS_ERR(client->msgr)) {
- err = PTR_ERR(client->msgr);
- client->msgr = NULL;
- goto out;
- }
- client->msgr->nocrc = ceph_test_opt(client, NOCRC);
- }
+ mutex_lock(&fsc->client->mount_mutex);
- /* open session, and wait for mon, mds, and osd maps */
- err = ceph_monc_open_session(&client->monc);
+ err = __ceph_open_session(fsc->client, started);
if (err < 0)
goto out;
- while (!have_mon_and_osd_map(client)) {
- err = -EIO;
- if (timeout && time_after_eq(jiffies, started + timeout))
- goto out;
-
- /* wait */
- dout("mount waiting for mon_map\n");
- err = wait_event_interruptible_timeout(client->auth_wq,
- have_mon_and_osd_map(client) || (client->auth_err < 0),
- timeout);
- if (err == -EINTR || err == -ERESTARTSYS)
- goto out;
- if (client->auth_err < 0) {
- err = client->auth_err;
- goto out;
- }
- }
-
dout("mount opening root\n");
- root = open_root_dentry(client, "", started);
+ root = open_root_dentry(fsc, "", started);
if (IS_ERR(root)) {
err = PTR_ERR(root);
goto out;
}
- if (client->sb->s_root)
+ if (fsc->sb->s_root) {
dput(root);
- else
- client->sb->s_root = root;
+ } else {
+ fsc->sb->s_root = root;
+ first = 1;
+
+ err = ceph_fs_debugfs_init(fsc);
+ if (err < 0)
+ goto fail;
+ }
if (path[0] == 0) {
dget(root);
} else {
dout("mount opening base mountpoint\n");
- root = open_root_dentry(client, path, started);
+ root = open_root_dentry(fsc, path, started);
if (IS_ERR(root)) {
err = PTR_ERR(root);
- dput(client->sb->s_root);
- client->sb->s_root = NULL;
- goto out;
+ goto fail;
}
}
mnt->mnt_root = root;
- mnt->mnt_sb = client->sb;
+ mnt->mnt_sb = fsc->sb;
- client->mount_state = CEPH_MOUNT_MOUNTED;
+ fsc->mount_state = CEPH_MOUNT_MOUNTED;
dout("mount success\n");
err = 0;
out:
- mutex_unlock(&client->mount_mutex);
+ mutex_unlock(&fsc->client->mount_mutex);
return err;
+
+fail:
+ if (first) {
+ dput(fsc->sb->s_root);
+ fsc->sb->s_root = NULL;
+ }
+ goto out;
}
static int ceph_set_super(struct super_block *s, void *data)
{
- struct ceph_client *client = data;
+ struct ceph_fs_client *fsc = data;
int ret;
dout("set_super %p data %p\n", s, data);
- s->s_flags = client->mount_args->sb_flags;
+ s->s_flags = fsc->mount_options->sb_flags;
s->s_maxbytes = 1ULL << 40; /* temp value until we get mdsmap */
- s->s_fs_info = client;
- client->sb = s;
+ s->s_fs_info = fsc;
+ fsc->sb = s;
s->s_op = &ceph_super_ops;
s->s_export_op = &ceph_export_ops;
fail:
s->s_fs_info = NULL;
- client->sb = NULL;
+ fsc->sb = NULL;
return ret;
}
*/
static int ceph_compare_super(struct super_block *sb, void *data)
{
- struct ceph_client *new = data;
- struct ceph_mount_args *args = new->mount_args;
- struct ceph_client *other = ceph_sb_to_client(sb);
- int i;
+ struct ceph_fs_client *new = data;
+ struct ceph_mount_options *fsopt = new->mount_options;
+ struct ceph_options *opt = new->client->options;
+ struct ceph_fs_client *other = ceph_sb_to_client(sb);
dout("ceph_compare_super %p\n", sb);
- if (args->flags & CEPH_OPT_FSID) {
- if (ceph_fsid_compare(&args->fsid, &other->fsid)) {
- dout("fsid doesn't match\n");
- return 0;
- }
- } else {
- /* do we share (a) monitor? */
- for (i = 0; i < new->monc.monmap->num_mon; i++)
- if (ceph_monmap_contains(other->monc.monmap,
- &new->monc.monmap->mon_inst[i].addr))
- break;
- if (i == new->monc.monmap->num_mon) {
- dout("mon ip not part of monmap\n");
- return 0;
- }
- dout("mon ip matches existing sb %p\n", sb);
+
+ if (compare_mount_options(fsopt, opt, other)) {
+ dout("monitor(s)/mount options don't match\n");
+ return 0;
}
- if (args->sb_flags != other->mount_args->sb_flags) {
+ if ((opt->flags & CEPH_OPT_FSID) &&
+ ceph_fsid_compare(&opt->fsid, &other->client->fsid)) {
+ dout("fsid doesn't match\n");
+ return 0;
+ }
+ if (fsopt->sb_flags != other->mount_options->sb_flags) {
dout("flags differ\n");
return 0;
}
*/
static atomic_long_t bdi_seq = ATOMIC_LONG_INIT(0);
-static int ceph_register_bdi(struct super_block *sb, struct ceph_client *client)
+static int ceph_register_bdi(struct super_block *sb,
+ struct ceph_fs_client *fsc)
{
int err;
/* set ra_pages based on rsize mount option? */
- if (client->mount_args->rsize >= PAGE_CACHE_SIZE)
- client->backing_dev_info.ra_pages =
- (client->mount_args->rsize + PAGE_CACHE_SIZE - 1)
+ if (fsc->mount_options->rsize >= PAGE_CACHE_SIZE)
+ fsc->backing_dev_info.ra_pages =
+ (fsc->mount_options->rsize + PAGE_CACHE_SIZE - 1)
>> PAGE_SHIFT;
- err = bdi_register(&client->backing_dev_info, NULL, "ceph-%d",
+ err = bdi_register(&fsc->backing_dev_info, NULL, "ceph-%d",
atomic_long_inc_return(&bdi_seq));
if (!err)
- sb->s_bdi = &client->backing_dev_info;
+ sb->s_bdi = &fsc->backing_dev_info;
return err;
}
struct vfsmount *mnt)
{
struct super_block *sb;
- struct ceph_client *client;
+ struct ceph_fs_client *fsc;
int err;
int (*compare_super)(struct super_block *, void *) = ceph_compare_super;
const char *path = NULL;
- struct ceph_mount_args *args;
+ struct ceph_mount_options *fsopt = NULL;
+ struct ceph_options *opt = NULL;
dout("ceph_get_sb\n");
- args = parse_mount_args(flags, data, dev_name, &path);
- if (IS_ERR(args)) {
- err = PTR_ERR(args);
+ err = parse_mount_options(&fsopt, &opt, flags, data, dev_name, &path);
+ if (err < 0)
goto out_final;
- }
/* create client (which we may/may not use) */
- client = ceph_create_client(args);
- if (IS_ERR(client)) {
- err = PTR_ERR(client);
+ fsc = create_fs_client(fsopt, opt);
+ if (IS_ERR(fsc)) {
+ err = PTR_ERR(fsc);
+ kfree(fsopt);
+ kfree(opt);
goto out_final;
}
- if (client->mount_args->flags & CEPH_OPT_NOSHARE)
+ err = ceph_mdsc_init(fsc);
+ if (err < 0)
+ goto out;
+
+ if (ceph_test_opt(fsc->client, NOSHARE))
compare_super = NULL;
- sb = sget(fs_type, compare_super, ceph_set_super, client);
+ sb = sget(fs_type, compare_super, ceph_set_super, fsc);
if (IS_ERR(sb)) {
err = PTR_ERR(sb);
goto out;
}
- if (ceph_sb_to_client(sb) != client) {
- ceph_destroy_client(client);
- client = ceph_sb_to_client(sb);
- dout("get_sb got existing client %p\n", client);
+ if (ceph_sb_to_client(sb) != fsc) {
+ ceph_mdsc_destroy(fsc);
+ destroy_fs_client(fsc);
+ fsc = ceph_sb_to_client(sb);
+ dout("get_sb got existing client %p\n", fsc);
} else {
- dout("get_sb using new client %p\n", client);
- err = ceph_register_bdi(sb, client);
+ dout("get_sb using new client %p\n", fsc);
+ err = ceph_register_bdi(sb, fsc);
if (err < 0)
goto out_splat;
}
- err = ceph_mount(client, mnt, path);
+ err = ceph_mount(fsc, mnt, path);
if (err < 0)
goto out_splat;
dout("root %p inode %p ino %llx.%llx\n", mnt->mnt_root,
return 0;
out_splat:
- ceph_mdsc_close_sessions(&client->mdsc);
+ ceph_mdsc_close_sessions(fsc->mdsc);
deactivate_locked_super(sb);
goto out_final;
out:
- ceph_destroy_client(client);
+ ceph_mdsc_destroy(fsc);
+ destroy_fs_client(fsc);
out_final:
dout("ceph_get_sb fail %d\n", err);
return err;
static void ceph_kill_sb(struct super_block *s)
{
- struct ceph_client *client = ceph_sb_to_client(s);
+ struct ceph_fs_client *fsc = ceph_sb_to_client(s);
dout("kill_sb %p\n", s);
- ceph_mdsc_pre_umount(&client->mdsc);
+ ceph_mdsc_pre_umount(fsc->mdsc);
kill_anon_super(s); /* will call put_super after sb is r/o */
- ceph_destroy_client(client);
+ ceph_mdsc_destroy(fsc);
+ destroy_fs_client(fsc);
}
static struct file_system_type ceph_fs_type = {
static int __init init_ceph(void)
{
- int ret = 0;
-
- ret = ceph_debugfs_init();
- if (ret < 0)
- goto out;
-
- ret = ceph_msgr_init();
- if (ret < 0)
- goto out_debugfs;
-
- ret = init_caches();
+ int ret = init_caches();
if (ret)
- goto out_msgr;
+ goto out;
ret = register_filesystem(&ceph_fs_type);
if (ret)
goto out_icache;
- pr_info("loaded (mon/mds/osd proto %d/%d/%d, osdmap %d/%d %d/%d)\n",
- CEPH_MONC_PROTOCOL, CEPH_MDSC_PROTOCOL, CEPH_OSDC_PROTOCOL,
- CEPH_OSDMAP_VERSION, CEPH_OSDMAP_VERSION_EXT,
- CEPH_OSDMAP_INC_VERSION, CEPH_OSDMAP_INC_VERSION_EXT);
+ pr_info("loaded (mds proto %d)\n", CEPH_MDSC_PROTOCOL);
+
return 0;
out_icache:
destroy_caches();
-out_msgr:
- ceph_msgr_exit();
-out_debugfs:
- ceph_debugfs_cleanup();
out:
return ret;
}
dout("exit_ceph\n");
unregister_filesystem(&ceph_fs_type);
destroy_caches();
- ceph_msgr_exit();
- ceph_debugfs_cleanup();
}
module_init(init_ceph);
#ifndef _FS_CEPH_SUPER_H
#define _FS_CEPH_SUPER_H
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <asm/unaligned.h>
#include <linux/backing-dev.h>
#include <linux/writeback.h>
#include <linux/slab.h>
-#include "types.h"
-#include "messenger.h"
-#include "msgpool.h"
-#include "mon_client.h"
-#include "mds_client.h"
-#include "osd_client.h"
-#include "ceph_fs.h"
+#include <linux/ceph/libceph.h>
/* f_type in struct statfs */
#define CEPH_SUPER_MAGIC 0x00c36400
#define CEPH_BLOCK_SHIFT 20 /* 1 MB */
#define CEPH_BLOCK (1 << CEPH_BLOCK_SHIFT)
-/*
- * Supported features
- */
-#define CEPH_FEATURE_SUPPORTED CEPH_FEATURE_NOSRCADDR | CEPH_FEATURE_FLOCK
-#define CEPH_FEATURE_REQUIRED CEPH_FEATURE_NOSRCADDR
+#define CEPH_MOUNT_OPT_DIRSTAT (1<<4) /* `cat dirname` for stats */
+#define CEPH_MOUNT_OPT_RBYTES (1<<5) /* dir st_bytes = rbytes */
+#define CEPH_MOUNT_OPT_NOASYNCREADDIR (1<<7) /* no dcache readdir */
-/*
- * mount options
- */
-#define CEPH_OPT_FSID (1<<0)
-#define CEPH_OPT_NOSHARE (1<<1) /* don't share client with other sbs */
-#define CEPH_OPT_MYIP (1<<2) /* specified my ip */
-#define CEPH_OPT_DIRSTAT (1<<4) /* funky `cat dirname` for stats */
-#define CEPH_OPT_RBYTES (1<<5) /* dir st_bytes = rbytes */
-#define CEPH_OPT_NOCRC (1<<6) /* no data crc on writes */
-#define CEPH_OPT_NOASYNCREADDIR (1<<7) /* no dcache readdir */
+#define CEPH_MOUNT_OPT_DEFAULT (CEPH_MOUNT_OPT_RBYTES)
-#define CEPH_OPT_DEFAULT (CEPH_OPT_RBYTES)
+#define ceph_set_mount_opt(fsc, opt) \
+ (fsc)->mount_options->flags |= CEPH_MOUNT_OPT_##opt;
+#define ceph_test_mount_opt(fsc, opt) \
+ (!!((fsc)->mount_options->flags & CEPH_MOUNT_OPT_##opt))
-#define ceph_set_opt(client, opt) \
- (client)->mount_args->flags |= CEPH_OPT_##opt;
-#define ceph_test_opt(client, opt) \
- (!!((client)->mount_args->flags & CEPH_OPT_##opt))
+#define CEPH_MAX_READDIR_DEFAULT 1024
+#define CEPH_MAX_READDIR_BYTES_DEFAULT (512*1024)
+#define CEPH_SNAPDIRNAME_DEFAULT ".snap"
-
-struct ceph_mount_args {
- int sb_flags;
+struct ceph_mount_options {
int flags;
- struct ceph_fsid fsid;
- struct ceph_entity_addr my_addr;
- int num_mon;
- struct ceph_entity_addr *mon_addr;
- int mount_timeout;
- int osd_idle_ttl;
- int osd_timeout;
- int osd_keepalive_timeout;
+ int sb_flags;
+
int wsize;
int rsize; /* max readahead */
int congestion_kb; /* max writeback in flight */
int cap_release_safety;
int max_readdir; /* max readdir result (entires) */
int max_readdir_bytes; /* max readdir result (bytes) */
- char *snapdir_name; /* default ".snap" */
- char *name;
- char *secret;
-};
-/*
- * defaults
- */
-#define CEPH_MOUNT_TIMEOUT_DEFAULT 60
-#define CEPH_OSD_TIMEOUT_DEFAULT 60 /* seconds */
-#define CEPH_OSD_KEEPALIVE_DEFAULT 5
-#define CEPH_OSD_IDLE_TTL_DEFAULT 60
-#define CEPH_MOUNT_RSIZE_DEFAULT (512*1024) /* readahead */
-#define CEPH_MAX_READDIR_DEFAULT 1024
-#define CEPH_MAX_READDIR_BYTES_DEFAULT (512*1024)
-
-#define CEPH_MSG_MAX_FRONT_LEN (16*1024*1024)
-#define CEPH_MSG_MAX_DATA_LEN (16*1024*1024)
-
-#define CEPH_SNAPDIRNAME_DEFAULT ".snap"
-#define CEPH_AUTH_NAME_DEFAULT "guest"
-/*
- * Delay telling the MDS we no longer want caps, in case we reopen
- * the file. Delay a minimum amount of time, even if we send a cap
- * message for some other reason. Otherwise, take the oppotunity to
- * update the mds to avoid sending another message later.
- */
-#define CEPH_CAPS_WANTED_DELAY_MIN_DEFAULT 5 /* cap release delay */
-#define CEPH_CAPS_WANTED_DELAY_MAX_DEFAULT 60 /* cap release delay */
-
-#define CEPH_CAP_RELEASE_SAFETY_DEFAULT (CEPH_CAPS_PER_RELEASE * 4)
-
-/* mount state */
-enum {
- CEPH_MOUNT_MOUNTING,
- CEPH_MOUNT_MOUNTED,
- CEPH_MOUNT_UNMOUNTING,
- CEPH_MOUNT_UNMOUNTED,
- CEPH_MOUNT_SHUTDOWN,
-};
-
-/*
- * subtract jiffies
- */
-static inline unsigned long time_sub(unsigned long a, unsigned long b)
-{
- BUG_ON(time_after(b, a));
- return (long)a - (long)b;
-}
-
-/*
- * per-filesystem client state
- *
- * possibly shared by multiple mount points, if they are
- * mounting the same ceph filesystem/cluster.
- */
-struct ceph_client {
- struct ceph_fsid fsid;
- bool have_fsid;
+ /*
+ * everything above this point can be memcmp'd; everything below
+ * is handled in compare_mount_options()
+ */
- struct mutex mount_mutex; /* serialize mount attempts */
- struct ceph_mount_args *mount_args;
+ char *snapdir_name; /* default ".snap" */
+};
+struct ceph_fs_client {
struct super_block *sb;
- unsigned long mount_state;
- wait_queue_head_t auth_wq;
-
- int auth_err;
+ struct ceph_mount_options *mount_options;
+ struct ceph_client *client;
+ unsigned long mount_state;
int min_caps; /* min caps i added */
- struct ceph_messenger *msgr; /* messenger instance */
- struct ceph_mon_client monc;
- struct ceph_mds_client mdsc;
- struct ceph_osd_client osdc;
+ struct ceph_mds_client *mdsc;
/* writeback */
mempool_t *wb_pagevec_pool;
struct backing_dev_info backing_dev_info;
#ifdef CONFIG_DEBUG_FS
- struct dentry *debugfs_monmap;
- struct dentry *debugfs_mdsmap, *debugfs_osdmap;
- struct dentry *debugfs_dir, *debugfs_dentry_lru, *debugfs_caps;
+ struct dentry *debugfs_dentry_lru, *debugfs_caps;
struct dentry *debugfs_congestion_kb;
struct dentry *debugfs_bdi;
+ struct dentry *debugfs_mdsc, *debugfs_mdsmap;
#endif
};
+
/*
* File i/o capability. This tracks shared state with the metadata
* server that allows us to cache or writeback attributes or to read
int should_free_val;
};
+/*
+ * Ceph dentry state
+ */
+struct ceph_dentry_info {
+ struct ceph_mds_session *lease_session;
+ u32 lease_gen, lease_shared_gen;
+ u32 lease_seq;
+ unsigned long lease_renew_after, lease_renew_from;
+ struct list_head lru;
+ struct dentry *dentry;
+ u64 time;
+ u64 offset;
+};
+
struct ceph_inode_xattrs_info {
/*
* (still encoded) xattr blob. we avoid the overhead of parsing
/*
* Ceph inode.
*/
-#define CEPH_I_COMPLETE 1 /* we have complete directory cached */
-#define CEPH_I_NODELAY 4 /* do not delay cap release */
-#define CEPH_I_FLUSH 8 /* do not delay flush of dirty metadata */
-#define CEPH_I_NOFLUSH 16 /* do not flush dirty caps */
-
struct ceph_inode_info {
struct ceph_vino i_vino; /* ceph ino + snap */
return container_of(inode, struct ceph_inode_info, vfs_inode);
}
+static inline struct ceph_vino ceph_vino(struct inode *inode)
+{
+ return ceph_inode(inode)->i_vino;
+}
+
+/*
+ * ino_t is <64 bits on many architectures, blech.
+ *
+ * don't include snap in ino hash, at least for now.
+ */
+static inline ino_t ceph_vino_to_ino(struct ceph_vino vino)
+{
+ ino_t ino = (ino_t)vino.ino; /* ^ (vino.snap << 20); */
+#if BITS_PER_LONG == 32
+ ino ^= vino.ino >> (sizeof(u64)-sizeof(ino_t)) * 8;
+ if (!ino)
+ ino = 1;
+#endif
+ return ino;
+}
+
+/* for printf-style formatting */
+#define ceph_vinop(i) ceph_inode(i)->i_vino.ino, ceph_inode(i)->i_vino.snap
+
+static inline u64 ceph_ino(struct inode *inode)
+{
+ return ceph_inode(inode)->i_vino.ino;
+}
+static inline u64 ceph_snap(struct inode *inode)
+{
+ return ceph_inode(inode)->i_vino.snap;
+}
+
+static inline int ceph_ino_compare(struct inode *inode, void *data)
+{
+ struct ceph_vino *pvino = (struct ceph_vino *)data;
+ struct ceph_inode_info *ci = ceph_inode(inode);
+ return ci->i_vino.ino == pvino->ino &&
+ ci->i_vino.snap == pvino->snap;
+}
+
+static inline struct inode *ceph_find_inode(struct super_block *sb,
+ struct ceph_vino vino)
+{
+ ino_t t = ceph_vino_to_ino(vino);
+ return ilookup5(sb, t, ceph_ino_compare, &vino);
+}
+
+
+/*
+ * Ceph inode.
+ */
+#define CEPH_I_COMPLETE 1 /* we have complete directory cached */
+#define CEPH_I_NODELAY 4 /* do not delay cap release */
+#define CEPH_I_FLUSH 8 /* do not delay flush of dirty metadata */
+#define CEPH_I_NOFLUSH 16 /* do not flush dirty caps */
+
static inline void ceph_i_clear(struct inode *inode, unsigned mask)
{
struct ceph_inode_info *ci = ceph_inode(inode);
struct ceph_inode_info *ci = ceph_inode(inode);
bool r;
- smp_mb();
+ spin_lock(&inode->i_lock);
r = (ci->i_ceph_flags & mask) == mask;
+ spin_unlock(&inode->i_lock);
return r;
}
struct ceph_inode_frag *pfrag,
int *found);
-/*
- * Ceph dentry state
- */
-struct ceph_dentry_info {
- struct ceph_mds_session *lease_session;
- u32 lease_gen, lease_shared_gen;
- u32 lease_seq;
- unsigned long lease_renew_after, lease_renew_from;
- struct list_head lru;
- struct dentry *dentry;
- u64 time;
- u64 offset;
-};
-
static inline struct ceph_dentry_info *ceph_dentry(struct dentry *dentry)
{
return (struct ceph_dentry_info *)dentry->d_fsdata;
return ((loff_t)frag << 32) | (loff_t)off;
}
-/*
- * ino_t is <64 bits on many architectures, blech.
- *
- * don't include snap in ino hash, at least for now.
- */
-static inline ino_t ceph_vino_to_ino(struct ceph_vino vino)
-{
- ino_t ino = (ino_t)vino.ino; /* ^ (vino.snap << 20); */
-#if BITS_PER_LONG == 32
- ino ^= vino.ino >> (sizeof(u64)-sizeof(ino_t)) * 8;
- if (!ino)
- ino = 1;
-#endif
- return ino;
-}
-
static inline int ceph_set_ino_cb(struct inode *inode, void *data)
{
ceph_inode(inode)->i_vino = *(struct ceph_vino *)data;
return 0;
}
-static inline struct ceph_vino ceph_vino(struct inode *inode)
-{
- return ceph_inode(inode)->i_vino;
-}
-
-/* for printf-style formatting */
-#define ceph_vinop(i) ceph_inode(i)->i_vino.ino, ceph_inode(i)->i_vino.snap
-
-static inline u64 ceph_ino(struct inode *inode)
-{
- return ceph_inode(inode)->i_vino.ino;
-}
-static inline u64 ceph_snap(struct inode *inode)
-{
- return ceph_inode(inode)->i_vino.snap;
-}
-
-static inline int ceph_ino_compare(struct inode *inode, void *data)
-{
- struct ceph_vino *pvino = (struct ceph_vino *)data;
- struct ceph_inode_info *ci = ceph_inode(inode);
- return ci->i_vino.ino == pvino->ino &&
- ci->i_vino.snap == pvino->snap;
-}
-
-static inline struct inode *ceph_find_inode(struct super_block *sb,
- struct ceph_vino vino)
-{
- ino_t t = ceph_vino_to_ino(vino);
- return ilookup5(sb, t, ceph_ino_compare, &vino);
-}
-
-
/*
* caps helpers
*/
struct ceph_cap_reservation *ctx, int need);
extern int ceph_unreserve_caps(struct ceph_mds_client *mdsc,
struct ceph_cap_reservation *ctx);
-extern void ceph_reservation_status(struct ceph_client *client,
+extern void ceph_reservation_status(struct ceph_fs_client *client,
int *total, int *avail, int *used,
int *reserved, int *min);
-static inline struct ceph_client *ceph_inode_to_client(struct inode *inode)
+static inline struct ceph_fs_client *ceph_inode_to_client(struct inode *inode)
{
- return (struct ceph_client *)inode->i_sb->s_fs_info;
+ return (struct ceph_fs_client *)inode->i_sb->s_fs_info;
}
-static inline struct ceph_client *ceph_sb_to_client(struct super_block *sb)
+static inline struct ceph_fs_client *ceph_sb_to_client(struct super_block *sb)
{
- return (struct ceph_client *)sb->s_fs_info;
+ return (struct ceph_fs_client *)sb->s_fs_info;
}
/*
- * snapshots
- */
-
-/*
- * A "snap context" is the set of existing snapshots when we
- * write data. It is used by the OSD to guide its COW behavior.
- *
- * The ceph_snap_context is refcounted, and attached to each dirty
- * page, indicating which context the dirty data belonged when it was
- * dirtied.
- */
-struct ceph_snap_context {
- atomic_t nref;
- u64 seq;
- int num_snaps;
- u64 snaps[];
-};
-
-static inline struct ceph_snap_context *
-ceph_get_snap_context(struct ceph_snap_context *sc)
-{
- /*
- printk("get_snap_context %p %d -> %d\n", sc, atomic_read(&sc->nref),
- atomic_read(&sc->nref)+1);
- */
- if (sc)
- atomic_inc(&sc->nref);
- return sc;
-}
-
-static inline void ceph_put_snap_context(struct ceph_snap_context *sc)
-{
- if (!sc)
- return;
- /*
- printk("put_snap_context %p %d -> %d\n", sc, atomic_read(&sc->nref),
- atomic_read(&sc->nref)-1);
- */
- if (atomic_dec_and_test(&sc->nref)) {
- /*printk(" deleting snap_context %p\n", sc);*/
- kfree(sc);
- }
-}
-
-/*
* A "snap realm" describes a subset of the file hierarchy sharing
* the same set of snapshots that apply to it. The realms themselves
* are organized into a hierarchy, such that children inherit (some of)
struct list_head empty_item; /* if i have ref==0 */
+ struct list_head dirty_item; /* if realm needs new context */
+
/* the current set of snaps for this realm */
struct ceph_snap_context *cached_context;
spinlock_t inodes_with_caps_lock;
};
-
-
-/*
- * calculate the number of pages a given length and offset map onto,
- * if we align the data.
- */
-static inline int calc_pages_for(u64 off, u64 len)
+static inline int default_congestion_kb(void)
{
- return ((off+len+PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT) -
- (off >> PAGE_CACHE_SHIFT);
+ int congestion_kb;
+
+ /*
+ * Copied from NFS
+ *
+ * congestion size, scale with available memory.
+ *
+ * 64MB: 8192k
+ * 128MB: 11585k
+ * 256MB: 16384k
+ * 512MB: 23170k
+ * 1GB: 32768k
+ * 2GB: 46340k
+ * 4GB: 65536k
+ * 8GB: 92681k
+ * 16GB: 131072k
+ *
+ * This allows larger machines to have larger/more transfers.
+ * Limit the default to 256M
+ */
+ congestion_kb = (16*int_sqrt(totalram_pages)) << (PAGE_SHIFT-10);
+ if (congestion_kb > 256*1024)
+ congestion_kb = 256*1024;
+
+ return congestion_kb;
}
ci_item)->writing;
}
-
-/* super.c */
-extern struct kmem_cache *ceph_inode_cachep;
-extern struct kmem_cache *ceph_cap_cachep;
-extern struct kmem_cache *ceph_dentry_cachep;
-extern struct kmem_cache *ceph_file_cachep;
-
-extern const char *ceph_msg_type_name(int type);
-extern int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid);
-
/* inode.c */
extern const struct inode_operations ceph_file_iops;
extern void ceph_put_wrbuffer_cap_refs(struct ceph_inode_info *ci, int nr,
struct ceph_snap_context *snapc);
extern void __ceph_flush_snaps(struct ceph_inode_info *ci,
- struct ceph_mds_session **psession);
+ struct ceph_mds_session **psession,
+ int again);
extern void ceph_check_caps(struct ceph_inode_info *ci, int flags,
struct ceph_mds_session *session);
extern void ceph_check_delayed_caps(struct ceph_mds_client *mdsc);
/* file.c */
extern const struct file_operations ceph_file_fops;
extern const struct address_space_operations ceph_aops;
+extern int ceph_copy_to_page_vector(struct page **pages,
+ const char *data,
+ loff_t off, size_t len);
+extern int ceph_copy_from_page_vector(struct page **pages,
+ char *data,
+ loff_t off, size_t len);
+extern struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags);
extern int ceph_open(struct inode *inode, struct file *file);
extern struct dentry *ceph_lookup_open(struct inode *dir, struct dentry *dentry,
struct nameidata *nd, int mode,
int locked_dir);
extern int ceph_release(struct inode *inode, struct file *filp);
-extern void ceph_release_page_vector(struct page **pages, int num_pages);
/* dir.c */
extern const struct file_operations ceph_dir_fops;
/* export.c */
extern const struct export_operations ceph_export_ops;
-/* debugfs.c */
-extern int ceph_debugfs_init(void);
-extern void ceph_debugfs_cleanup(void);
-extern int ceph_debugfs_client_init(struct ceph_client *client);
-extern void ceph_debugfs_client_cleanup(struct ceph_client *client);
-
/* locks.c */
extern int ceph_lock(struct file *file, int cmd, struct file_lock *fl);
extern int ceph_flock(struct file *file, int cmd, struct file_lock *fl);
return NULL;
}
+/* debugfs.c */
+extern int ceph_fs_debugfs_init(struct ceph_fs_client *client);
+extern void ceph_fs_debugfs_cleanup(struct ceph_fs_client *client);
+
#endif /* _FS_CEPH_SUPER_H */
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
+
#include "super.h"
-#include "decode.h"
+#include "mds_client.h"
+
+#include <linux/ceph/decode.h>
#include <linux/xattr.h>
#include <linux/slab.h>
static int ceph_sync_setxattr(struct dentry *dentry, const char *name,
const char *value, size_t size, int flags)
{
- struct ceph_client *client = ceph_sb_to_client(dentry->d_sb);
+ struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
struct inode *inode = dentry->d_inode;
struct ceph_inode_info *ci = ceph_inode(inode);
struct inode *parent_inode = dentry->d_parent->d_inode;
struct ceph_mds_request *req;
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_mds_client *mdsc = fsc->mdsc;
int err;
int i, nr_pages;
struct page **pages = NULL;
/* preallocate memory for xattr name, value, index node */
err = -ENOMEM;
- newname = kmalloc(name_len + 1, GFP_NOFS);
+ newname = kmemdup(name, name_len + 1, GFP_NOFS);
if (!newname)
goto out;
- memcpy(newname, name, name_len + 1);
if (val_len) {
newval = kmalloc(val_len + 1, GFP_NOFS);
static int ceph_send_removexattr(struct dentry *dentry, const char *name)
{
- struct ceph_client *client = ceph_sb_to_client(dentry->d_sb);
- struct ceph_mds_client *mdsc = &client->mdsc;
+ struct ceph_fs_client *fsc = ceph_sb_to_client(dentry->d_sb);
+ struct ceph_mds_client *mdsc = fsc->mdsc;
struct inode *inode = dentry->d_inode;
struct inode *parent_inode = dentry->d_parent->d_inode;
struct ceph_mds_request *req;
#endif
/* permit direct mmap, for read, write or exec */
BDI_CAP_MAP_DIRECT |
- BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP),
+ BDI_CAP_READ_MAP | BDI_CAP_WRITE_MAP | BDI_CAP_EXEC_MAP |
+ /* no writeback happens */
+ BDI_CAP_NO_ACCT_AND_WRITEBACK),
};
static struct kobj_map *cdev_map;
tristate "CIFS support (advanced network filesystem, SMBFS successor)"
depends on INET
select NLS
- select CRYPTO_MD5
- select CRYPTO_ARC4
help
This is the client VFS module for the Common Internet File System
(CIFS) protocol which is the successor to the Server Message Block
if (compare_oid(oid, oidlen, MSKRB5_OID,
MSKRB5_OID_LEN))
server->sec_mskerberos = true;
- if (compare_oid(oid, oidlen, KRB5U2U_OID,
+ else if (compare_oid(oid, oidlen, KRB5U2U_OID,
KRB5U2U_OID_LEN))
server->sec_kerberosu2u = true;
- if (compare_oid(oid, oidlen, KRB5_OID,
+ else if (compare_oid(oid, oidlen, KRB5_OID,
KRB5_OID_LEN))
server->sec_kerberos = true;
- if (compare_oid(oid, oidlen, NTLMSSP_OID,
+ else if (compare_oid(oid, oidlen, NTLMSSP_OID,
NTLMSSP_OID_LEN))
server->sec_ntlmssp = true;
#include "md5.h"
#include "cifs_unicode.h"
#include "cifsproto.h"
-#include "ntlmssp.h"
#include <linux/ctype.h>
#include <linux/random.h>
unsigned char *p24);
static int cifs_calculate_signature(const struct smb_hdr *cifs_pdu,
- struct TCP_Server_Info *server, char *signature)
+ const struct mac_key *key, char *signature)
{
- int rc;
+ struct MD5Context context;
- if (cifs_pdu == NULL || server == NULL || signature == NULL)
+ if ((cifs_pdu == NULL) || (signature == NULL) || (key == NULL))
return -EINVAL;
- if (!server->ntlmssp.sdescmd5) {
- cERROR(1,
- "cifs_calculate_signature: can't generate signature\n");
- return -1;
- }
-
- rc = crypto_shash_init(&server->ntlmssp.sdescmd5->shash);
- if (rc) {
- cERROR(1, "cifs_calculate_signature: oould not init md5\n");
- return rc;
- }
-
- if (server->secType == RawNTLMSSP)
- crypto_shash_update(&server->ntlmssp.sdescmd5->shash,
- server->session_key.data.ntlmv2.key,
- CIFS_NTLMV2_SESSKEY_SIZE);
- else
- crypto_shash_update(&server->ntlmssp.sdescmd5->shash,
- (char *)&server->session_key.data,
- server->session_key.len);
-
- crypto_shash_update(&server->ntlmssp.sdescmd5->shash,
- cifs_pdu->Protocol, cifs_pdu->smb_buf_length);
+ cifs_MD5_init(&context);
+ cifs_MD5_update(&context, (char *)&key->data, key->len);
+ cifs_MD5_update(&context, cifs_pdu->Protocol, cifs_pdu->smb_buf_length);
- rc = crypto_shash_final(&server->ntlmssp.sdescmd5->shash, signature);
-
- return rc;
+ cifs_MD5_final(signature, &context);
+ return 0;
}
-
int cifs_sign_smb(struct smb_hdr *cifs_pdu, struct TCP_Server_Info *server,
__u32 *pexpected_response_sequence_number)
{
server->sequence_number++;
spin_unlock(&GlobalMid_Lock);
- rc = cifs_calculate_signature(cifs_pdu, server, smb_signature);
+ rc = cifs_calculate_signature(cifs_pdu, &server->mac_signing_key,
+ smb_signature);
if (rc)
memset(cifs_pdu->Signature.SecuritySignature, 0, 8);
else
}
static int cifs_calc_signature2(const struct kvec *iov, int n_vec,
- struct TCP_Server_Info *server, char *signature)
+ const struct mac_key *key, char *signature)
{
+ struct MD5Context context;
int i;
- int rc;
- if (iov == NULL || server == NULL || signature == NULL)
+ if ((iov == NULL) || (signature == NULL) || (key == NULL))
return -EINVAL;
- if (!server->ntlmssp.sdescmd5) {
- cERROR(1, "cifs_calc_signature2: can't generate signature\n");
- return -1;
- }
-
- rc = crypto_shash_init(&server->ntlmssp.sdescmd5->shash);
- if (rc) {
- cERROR(1, "cifs_calc_signature2: oould not init md5\n");
- return rc;
- }
-
- if (server->secType == RawNTLMSSP)
- crypto_shash_update(&server->ntlmssp.sdescmd5->shash,
- server->session_key.data.ntlmv2.key,
- CIFS_NTLMV2_SESSKEY_SIZE);
- else
- crypto_shash_update(&server->ntlmssp.sdescmd5->shash,
- (char *)&server->session_key.data,
- server->session_key.len);
-
+ cifs_MD5_init(&context);
+ cifs_MD5_update(&context, (char *)&key->data, key->len);
for (i = 0; i < n_vec; i++) {
if (iov[i].iov_len == 0)
continue;
if (iov[i].iov_base == NULL) {
- cERROR(1, "cifs_calc_signature2: null iovec entry");
+ cERROR(1, "null iovec entry");
return -EIO;
}
/* The first entry includes a length field (which does not get
if (i == 0) {
if (iov[0].iov_len <= 8) /* cmd field at offset 9 */
break; /* nothing to sign or corrupt header */
- crypto_shash_update(&server->ntlmssp.sdescmd5->shash,
- iov[i].iov_base + 4, iov[i].iov_len - 4);
+ cifs_MD5_update(&context, iov[0].iov_base+4,
+ iov[0].iov_len-4);
} else
- crypto_shash_update(&server->ntlmssp.sdescmd5->shash,
- iov[i].iov_base, iov[i].iov_len);
+ cifs_MD5_update(&context, iov[i].iov_base, iov[i].iov_len);
}
- rc = crypto_shash_final(&server->ntlmssp.sdescmd5->shash, signature);
+ cifs_MD5_final(signature, &context);
- return rc;
+ return 0;
}
+
int cifs_sign_smb2(struct kvec *iov, int n_vec, struct TCP_Server_Info *server,
__u32 *pexpected_response_sequence_number)
{
server->sequence_number++;
spin_unlock(&GlobalMid_Lock);
- rc = cifs_calc_signature2(iov, n_vec, server, smb_signature);
+ rc = cifs_calc_signature2(iov, n_vec, &server->mac_signing_key,
+ smb_signature);
if (rc)
memset(cifs_pdu->Signature.SecuritySignature, 0, 8);
else
}
int cifs_verify_signature(struct smb_hdr *cifs_pdu,
- struct TCP_Server_Info *server,
+ const struct mac_key *mac_key,
__u32 expected_sequence_number)
{
- int rc;
+ unsigned int rc;
char server_response_sig[8];
char what_we_think_sig_should_be[20];
- if (cifs_pdu == NULL || server == NULL)
+ if ((cifs_pdu == NULL) || (mac_key == NULL))
return -EINVAL;
if (cifs_pdu->Command == SMB_COM_NEGOTIATE)
cpu_to_le32(expected_sequence_number);
cifs_pdu->Signature.Sequence.Reserved = 0;
- rc = cifs_calculate_signature(cifs_pdu, server,
+ rc = cifs_calculate_signature(cifs_pdu, mac_key,
what_we_think_sig_should_be);
if (rc)
}
/* We fill in key by putting in 40 byte array which was allocated by caller */
-int cifs_calculate_session_key(struct session_key *key, const char *rn,
+int cifs_calculate_mac_key(struct mac_key *key, const char *rn,
const char *password)
{
char temp_key[16];
{
int rc = 0;
int len;
- char nt_hash[CIFS_NTHASH_SIZE];
+ char nt_hash[16];
+ struct HMACMD5Context *pctxt;
wchar_t *user;
wchar_t *domain;
- wchar_t *server;
- if (!ses->server->ntlmssp.sdeschmacmd5) {
- cERROR(1, "calc_ntlmv2_hash: can't generate ntlmv2 hash\n");
- return -1;
- }
+ pctxt = kmalloc(sizeof(struct HMACMD5Context), GFP_KERNEL);
+
+ if (pctxt == NULL)
+ return -ENOMEM;
/* calculate md4 hash of password */
E_md4hash(ses->password, nt_hash);
- crypto_shash_setkey(ses->server->ntlmssp.hmacmd5, nt_hash,
- CIFS_NTHASH_SIZE);
-
- rc = crypto_shash_init(&ses->server->ntlmssp.sdeschmacmd5->shash);
- if (rc) {
- cERROR(1, "calc_ntlmv2_hash: could not init hmacmd5\n");
- return rc;
- }
+ /* convert Domainname to unicode and uppercase */
+ hmac_md5_init_limK_to_64(nt_hash, 16, pctxt);
/* convert ses->userName to unicode and uppercase */
len = strlen(ses->userName);
user = kmalloc(2 + (len * 2), GFP_KERNEL);
- if (user == NULL) {
- cERROR(1, "calc_ntlmv2_hash: user mem alloc failure\n");
- rc = -ENOMEM;
+ if (user == NULL)
goto calc_exit_2;
- }
len = cifs_strtoUCS((__le16 *)user, ses->userName, len, nls_cp);
UniStrupr(user);
-
- crypto_shash_update(&ses->server->ntlmssp.sdeschmacmd5->shash,
- (char *)user, 2 * len);
+ hmac_md5_update((char *)user, 2*len, pctxt);
/* convert ses->domainName to unicode and uppercase */
if (ses->domainName) {
len = strlen(ses->domainName);
domain = kmalloc(2 + (len * 2), GFP_KERNEL);
- if (domain == NULL) {
- cERROR(1, "calc_ntlmv2_hash: domain mem alloc failure");
- rc = -ENOMEM;
+ if (domain == NULL)
goto calc_exit_1;
- }
len = cifs_strtoUCS((__le16 *)domain, ses->domainName, len,
nls_cp);
/* the following line was removed since it didn't work well
Maybe converting the domain name earlier makes sense */
/* UniStrupr(domain); */
- crypto_shash_update(&ses->server->ntlmssp.sdeschmacmd5->shash,
- (char *)domain, 2 * len);
+ hmac_md5_update((char *)domain, 2*len, pctxt);
kfree(domain);
- } else if (ses->serverName) {
- len = strlen(ses->serverName);
-
- server = kmalloc(2 + (len * 2), GFP_KERNEL);
- if (server == NULL) {
- cERROR(1, "calc_ntlmv2_hash: server mem alloc failure");
- rc = -ENOMEM;
- goto calc_exit_1;
- }
- len = cifs_strtoUCS((__le16 *)server, ses->serverName, len,
- nls_cp);
- /* the following line was removed since it didn't work well
- with lower cased domain name that passed as an option.
- Maybe converting the domain name earlier makes sense */
- /* UniStrupr(domain); */
-
- crypto_shash_update(&ses->server->ntlmssp.sdeschmacmd5->shash,
- (char *)server, 2 * len);
-
- kfree(server);
}
-
- rc = crypto_shash_final(&ses->server->ntlmssp.sdeschmacmd5->shash,
- ses->server->ntlmv2_hash);
-
calc_exit_1:
kfree(user);
calc_exit_2:
/* BB FIXME what about bytes 24 through 40 of the signing key?
compare with the NTLM example */
+ hmac_md5_final(ses->server->ntlmv2_hash, pctxt);
+ kfree(pctxt);
return rc;
}
-static int
-find_domain_name(struct cifsSesInfo *ses)
-{
- int rc = 0;
- unsigned int attrsize;
- unsigned int type;
- unsigned char *blobptr;
- struct ntlmssp2_name *attrptr;
-
- if (ses->server->tiblob) {
- blobptr = ses->server->tiblob;
- attrptr = (struct ntlmssp2_name *) blobptr;
-
- while ((type = attrptr->type) != 0) {
- blobptr += 2; /* advance attr type */
- attrsize = attrptr->length;
- blobptr += 2; /* advance attr size */
- if (type == NTLMSSP_AV_NB_DOMAIN_NAME) {
- if (!ses->domainName) {
- ses->domainName =
- kmalloc(attrptr->length + 1,
- GFP_KERNEL);
- if (!ses->domainName)
- return -ENOMEM;
- cifs_from_ucs2(ses->domainName,
- (__le16 *)blobptr,
- attrptr->length,
- attrptr->length,
- load_nls_default(), false);
- }
- }
- blobptr += attrsize; /* advance attr value */
- attrptr = (struct ntlmssp2_name *) blobptr;
- }
- } else {
- ses->server->tilen = 2 * sizeof(struct ntlmssp2_name);
- ses->server->tiblob = kmalloc(ses->server->tilen, GFP_KERNEL);
- if (!ses->server->tiblob) {
- ses->server->tilen = 0;
- cERROR(1, "Challenge target info allocation failure");
- return -ENOMEM;
- }
- memset(ses->server->tiblob, 0x0, ses->server->tilen);
- attrptr = (struct ntlmssp2_name *) ses->server->tiblob;
- attrptr->type = cpu_to_le16(NTLMSSP_DOMAIN_TYPE);
- }
-
- return rc;
-}
-
-static int
-CalcNTLMv2_response(const struct TCP_Server_Info *server,
- char *v2_session_response)
-{
- int rc;
-
- if (!server->ntlmssp.sdeschmacmd5) {
- cERROR(1, "calc_ntlmv2_hash: can't generate ntlmv2 hash\n");
- return -1;
- }
-
- crypto_shash_setkey(server->ntlmssp.hmacmd5, server->ntlmv2_hash,
- CIFS_HMAC_MD5_HASH_SIZE);
-
- rc = crypto_shash_init(&server->ntlmssp.sdeschmacmd5->shash);
- if (rc) {
- cERROR(1, "CalcNTLMv2_response: could not init hmacmd5");
- return rc;
- }
-
- memcpy(v2_session_response + CIFS_SERVER_CHALLENGE_SIZE,
- server->cryptKey, CIFS_SERVER_CHALLENGE_SIZE);
- crypto_shash_update(&server->ntlmssp.sdeschmacmd5->shash,
- v2_session_response + CIFS_SERVER_CHALLENGE_SIZE,
- sizeof(struct ntlmv2_resp) - CIFS_SERVER_CHALLENGE_SIZE);
-
- if (server->tilen)
- crypto_shash_update(&server->ntlmssp.sdeschmacmd5->shash,
- server->tiblob, server->tilen);
-
- rc = crypto_shash_final(&server->ntlmssp.sdeschmacmd5->shash,
- v2_session_response);
-
- return rc;
-}
-
-int
-setup_ntlmv2_rsp(struct cifsSesInfo *ses, char *resp_buf,
+void setup_ntlmv2_rsp(struct cifsSesInfo *ses, char *resp_buf,
const struct nls_table *nls_cp)
{
- int rc = 0;
+ int rc;
struct ntlmv2_resp *buf = (struct ntlmv2_resp *)resp_buf;
+ struct HMACMD5Context context;
buf->blob_signature = cpu_to_le32(0x00000101);
buf->reserved = 0;
buf->time = cpu_to_le64(cifs_UnixTimeToNT(CURRENT_TIME));
get_random_bytes(&buf->client_chal, sizeof(buf->client_chal));
buf->reserved2 = 0;
-
- if (!ses->domainName) {
- rc = find_domain_name(ses);
- if (rc) {
- cERROR(1, "could not get domain/server name rc %d", rc);
- return rc;
- }
- }
+ buf->names[0].type = cpu_to_le16(NTLMSSP_DOMAIN_TYPE);
+ buf->names[0].length = 0;
+ buf->names[1].type = 0;
+ buf->names[1].length = 0;
/* calculate buf->ntlmv2_hash */
rc = calc_ntlmv2_hash(ses, nls_cp);
- if (rc) {
- cERROR(1, "could not get v2 hash rc %d", rc);
- return rc;
- }
- rc = CalcNTLMv2_response(ses->server, resp_buf);
- if (rc) {
+ if (rc)
cERROR(1, "could not get v2 hash rc %d", rc);
- return rc;
- }
-
- if (!ses->server->ntlmssp.sdeschmacmd5) {
- cERROR(1, "calc_ntlmv2_hash: can't generate ntlmv2 hash\n");
- return -1;
- }
-
- crypto_shash_setkey(ses->server->ntlmssp.hmacmd5,
- ses->server->ntlmv2_hash, CIFS_HMAC_MD5_HASH_SIZE);
+ CalcNTLMv2_response(ses, resp_buf);
- rc = crypto_shash_init(&ses->server->ntlmssp.sdeschmacmd5->shash);
- if (rc) {
- cERROR(1, "setup_ntlmv2_rsp: could not init hmacmd5\n");
- return rc;
- }
+ /* now calculate the MAC key for NTLMv2 */
+ hmac_md5_init_limK_to_64(ses->server->ntlmv2_hash, 16, &context);
+ hmac_md5_update(resp_buf, 16, &context);
+ hmac_md5_final(ses->server->mac_signing_key.data.ntlmv2.key, &context);
- crypto_shash_update(&ses->server->ntlmssp.sdeschmacmd5->shash,
- resp_buf, CIFS_HMAC_MD5_HASH_SIZE);
-
- rc = crypto_shash_final(&ses->server->ntlmssp.sdeschmacmd5->shash,
- ses->server->session_key.data.ntlmv2.key);
-
- memcpy(&ses->server->session_key.data.ntlmv2.resp, resp_buf,
- sizeof(struct ntlmv2_resp));
- ses->server->session_key.len = 16 + sizeof(struct ntlmv2_resp);
-
- return rc;
+ memcpy(&ses->server->mac_signing_key.data.ntlmv2.resp, resp_buf,
+ sizeof(struct ntlmv2_resp));
+ ses->server->mac_signing_key.len = 16 + sizeof(struct ntlmv2_resp);
}
-int
-calc_seckey(struct TCP_Server_Info *server)
-{
- int rc;
- unsigned char sec_key[CIFS_NTLMV2_SESSKEY_SIZE];
- struct crypto_blkcipher *tfm_arc4;
- struct scatterlist sgin, sgout;
- struct blkcipher_desc desc;
-
- get_random_bytes(sec_key, CIFS_NTLMV2_SESSKEY_SIZE);
-
- tfm_arc4 = crypto_alloc_blkcipher("ecb(arc4)",
- 0, CRYPTO_ALG_ASYNC);
- if (!tfm_arc4 || IS_ERR(tfm_arc4)) {
- cERROR(1, "could not allocate " "master crypto API arc4\n");
- return 1;
- }
-
- desc.tfm = tfm_arc4;
-
- crypto_blkcipher_setkey(tfm_arc4,
- server->session_key.data.ntlmv2.key, CIFS_CPHTXT_SIZE);
- sg_init_one(&sgin, sec_key, CIFS_CPHTXT_SIZE);
- sg_init_one(&sgout, server->ntlmssp.ciphertext, CIFS_CPHTXT_SIZE);
- rc = crypto_blkcipher_encrypt(&desc, &sgout, &sgin, CIFS_CPHTXT_SIZE);
-
- if (!rc)
- memcpy(server->session_key.data.ntlmv2.key,
- sec_key, CIFS_NTLMV2_SESSKEY_SIZE);
-
- crypto_free_blkcipher(tfm_arc4);
-
- return 0;
-}
-
-void
-cifs_crypto_shash_release(struct TCP_Server_Info *server)
-{
- if (server->ntlmssp.md5)
- crypto_free_shash(server->ntlmssp.md5);
-
- if (server->ntlmssp.hmacmd5)
- crypto_free_shash(server->ntlmssp.hmacmd5);
-
- kfree(server->ntlmssp.sdeschmacmd5);
-
- kfree(server->ntlmssp.sdescmd5);
-}
-
-int
-cifs_crypto_shash_allocate(struct TCP_Server_Info *server)
+void CalcNTLMv2_response(const struct cifsSesInfo *ses,
+ char *v2_session_response)
{
- int rc;
- unsigned int size;
-
- server->ntlmssp.hmacmd5 = crypto_alloc_shash("hmac(md5)", 0, 0);
- if (!server->ntlmssp.hmacmd5 ||
- IS_ERR(server->ntlmssp.hmacmd5)) {
- cERROR(1, "could not allocate crypto hmacmd5\n");
- return 1;
- }
-
- server->ntlmssp.md5 = crypto_alloc_shash("md5", 0, 0);
- if (!server->ntlmssp.md5 || IS_ERR(server->ntlmssp.md5)) {
- cERROR(1, "could not allocate crypto md5\n");
- rc = 1;
- goto cifs_crypto_shash_allocate_ret1;
- }
-
- size = sizeof(struct shash_desc) +
- crypto_shash_descsize(server->ntlmssp.hmacmd5);
- server->ntlmssp.sdeschmacmd5 = kmalloc(size, GFP_KERNEL);
- if (!server->ntlmssp.sdeschmacmd5) {
- cERROR(1, "cifs_crypto_shash_allocate: can't alloc hmacmd5\n");
- rc = -ENOMEM;
- goto cifs_crypto_shash_allocate_ret2;
- }
- server->ntlmssp.sdeschmacmd5->shash.tfm = server->ntlmssp.hmacmd5;
- server->ntlmssp.sdeschmacmd5->shash.flags = 0x0;
+ struct HMACMD5Context context;
+ /* rest of v2 struct already generated */
+ memcpy(v2_session_response + 8, ses->server->cryptKey, 8);
+ hmac_md5_init_limK_to_64(ses->server->ntlmv2_hash, 16, &context);
+ hmac_md5_update(v2_session_response+8,
+ sizeof(struct ntlmv2_resp) - 8, &context);
- size = sizeof(struct shash_desc) +
- crypto_shash_descsize(server->ntlmssp.md5);
- server->ntlmssp.sdescmd5 = kmalloc(size, GFP_KERNEL);
- if (!server->ntlmssp.sdescmd5) {
- cERROR(1, "cifs_crypto_shash_allocate: can't alloc md5\n");
- rc = -ENOMEM;
- goto cifs_crypto_shash_allocate_ret3;
- }
- server->ntlmssp.sdescmd5->shash.tfm = server->ntlmssp.md5;
- server->ntlmssp.sdescmd5->shash.flags = 0x0;
-
- return 0;
-
-cifs_crypto_shash_allocate_ret3:
- kfree(server->ntlmssp.sdeschmacmd5);
-
-cifs_crypto_shash_allocate_ret2:
- crypto_free_shash(server->ntlmssp.md5);
-
-cifs_crypto_shash_allocate_ret1:
- crypto_free_shash(server->ntlmssp.hmacmd5);
-
- return rc;
+ hmac_md5_final(v2_session_response, &context);
+/* cifs_dump_mem("v2_sess_rsp: ", v2_session_response, 32); */
}
#include <linux/workqueue.h>
#include "cifs_fs_sb.h"
#include "cifsacl.h"
-#include <crypto/internal/hash.h>
-#include <linux/scatterlist.h>
-
/*
* The sizes of various internal tables and strings
*/
/* Netbios frames protocol not supported at this time */
};
-struct session_key {
+struct mac_key {
unsigned int len;
union {
char ntlm[CIFS_SESS_KEY_SIZE + 16];
struct cifs_ace *aces;
};
-struct sdesc {
- struct shash_desc shash;
- char ctx[];
-};
-
-struct ntlmssp_auth {
- __u32 client_flags;
- __u32 server_flags;
- unsigned char ciphertext[CIFS_CPHTXT_SIZE];
- struct crypto_shash *hmacmd5;
- struct crypto_shash *md5;
- struct sdesc *sdeschmacmd5;
- struct sdesc *sdescmd5;
-};
-
/*
*****************************************************************
* Except the CIFS PDUs themselves all the
/* 16th byte of RFC1001 workstation name is always null */
char workstation_RFC1001_name[RFC1001_NAME_LEN_WITH_NULL];
__u32 sequence_number; /* needed for CIFS PDU signature */
- struct session_key session_key;
+ struct mac_key mac_signing_key;
char ntlmv2_hash[16];
unsigned long lstrp; /* when we got last response from this server */
u16 dialect; /* dialect index that server chose */
/* extended security flavors that server supports */
- unsigned int tilen; /* length of the target info blob */
- unsigned char *tiblob; /* target info blob in challenge response */
- struct ntlmssp_auth ntlmssp; /* various keys, ciphers, flags */
bool sec_kerberos; /* supports plain Kerberos */
bool sec_mskerberos; /* supports legacy MS Kerberos */
bool sec_kerberosu2u; /* supports U2U Kerberos */
* Size of the session key (crypto key encrypted with the password
*/
#define CIFS_SESS_KEY_SIZE (24)
-#define CIFS_CLIENT_CHALLENGE_SIZE (8)
-#define CIFS_SERVER_CHALLENGE_SIZE (8)
-#define CIFS_HMAC_MD5_HASH_SIZE (16)
-#define CIFS_CPHTXT_SIZE (16)
-#define CIFS_NTLMV2_SESSKEY_SIZE (16)
-#define CIFS_NTHASH_SIZE (16)
/*
* Maximum user name length
__le64 time;
__u64 client_chal; /* random */
__u32 reserved2;
+ struct ntlmssp2_name names[2];
/* array of name entries could follow ending in minimum 4 byte struct */
} __attribute__((packed));
extern int decode_negTokenInit(unsigned char *security_blob, int length,
struct TCP_Server_Info *server);
extern int cifs_convert_address(struct sockaddr *dst, const char *src, int len);
+extern int cifs_set_port(struct sockaddr *addr, const unsigned short int port);
extern int cifs_fill_sockaddr(struct sockaddr *dst, const char *src, int len,
- unsigned short int port);
+ const unsigned short int port);
extern int map_smb_to_linux_error(struct smb_hdr *smb, int logErr);
extern void header_assemble(struct smb_hdr *, char /* command */ ,
const struct cifsTconInfo *, int /* length of
extern int cifs_sign_smb2(struct kvec *iov, int n_vec, struct TCP_Server_Info *,
__u32 *);
extern int cifs_verify_signature(struct smb_hdr *,
- struct TCP_Server_Info *server,
+ const struct mac_key *mac_key,
__u32 expected_sequence_number);
-extern int cifs_calculate_session_key(struct session_key *key, const char *rn,
+extern int cifs_calculate_mac_key(struct mac_key *key, const char *rn,
const char *pass);
-extern int setup_ntlmv2_rsp(struct cifsSesInfo *, char *,
+extern void CalcNTLMv2_response(const struct cifsSesInfo *, char *);
+extern void setup_ntlmv2_rsp(struct cifsSesInfo *, char *,
const struct nls_table *);
-extern int cifs_crypto_shash_allocate(struct TCP_Server_Info *);
-extern void cifs_crypto_shash_release(struct TCP_Server_Info *);
-extern int calc_seckey(struct TCP_Server_Info *);
#ifdef CONFIG_CIFS_WEAK_PW_HASH
extern void calc_lanman_hash(const char *password, const char *cryptkey,
bool encrypt, char *lnm_session_key);
small_smb_init(int smb_command, int wct, struct cifsTconInfo *tcon,
void **request_buf)
{
- int rc = 0;
+ int rc;
rc = cifs_reconnect_tcon(tcon, smb_command);
if (rc)
if (tcon != NULL)
cifs_stats_inc(&tcon->num_smbs_sent);
- return rc;
+ return 0;
}
int
/* If the return code is zero, this function must fill in request_buf pointer */
static int
-smb_init(int smb_command, int wct, struct cifsTconInfo *tcon,
- void **request_buf /* returned */ ,
- void **response_buf /* returned */ )
+__smb_init(int smb_command, int wct, struct cifsTconInfo *tcon,
+ void **request_buf, void **response_buf)
{
- int rc = 0;
-
- rc = cifs_reconnect_tcon(tcon, smb_command);
- if (rc)
- return rc;
-
*request_buf = cifs_buf_get();
if (*request_buf == NULL) {
/* BB should we add a retry in here if not a writepage? */
if (tcon != NULL)
cifs_stats_inc(&tcon->num_smbs_sent);
- return rc;
+ return 0;
+}
+
+/* If the return code is zero, this function must fill in request_buf pointer */
+static int
+smb_init(int smb_command, int wct, struct cifsTconInfo *tcon,
+ void **request_buf, void **response_buf)
+{
+ int rc;
+
+ rc = cifs_reconnect_tcon(tcon, smb_command);
+ if (rc)
+ return rc;
+
+ return __smb_init(smb_command, wct, tcon, request_buf, response_buf);
+}
+
+static int
+smb_init_no_reconnect(int smb_command, int wct, struct cifsTconInfo *tcon,
+ void **request_buf, void **response_buf)
+{
+ if (tcon->ses->need_reconnect || tcon->need_reconnect)
+ return -EHOSTDOWN;
+
+ return __smb_init(smb_command, wct, tcon, request_buf, response_buf);
}
static int validate_t2(struct smb_t2_rsp *pSMB)
else
rc = -EINVAL;
- if (server->secType == Kerberos) {
- if (!server->sec_kerberos &&
- !server->sec_mskerberos)
- rc = -EOPNOTSUPP;
- } else if (server->secType == RawNTLMSSP) {
- if (!server->sec_ntlmssp)
- rc = -EOPNOTSUPP;
- } else
+ if (server->sec_kerberos || server->sec_mskerberos)
+ server->secType = Kerberos;
+ else if (server->sec_ntlmssp)
+ server->secType = RawNTLMSSP;
+ else
rc = -EOPNOTSUPP;
}
} else
cFYI(1, "In QFSUnixInfo");
QFSUnixRetry:
- rc = smb_init(SMB_COM_TRANSACTION2, 15, tcon, (void **) &pSMB,
- (void **) &pSMBr);
+ rc = smb_init_no_reconnect(SMB_COM_TRANSACTION2, 15, tcon,
+ (void **) &pSMB, (void **) &pSMBr);
if (rc)
return rc;
cFYI(1, "In SETFSUnixInfo");
SETFSUnixRetry:
/* BB switch to small buf init to save memory */
- rc = smb_init(SMB_COM_TRANSACTION2, 15, tcon, (void **) &pSMB,
- (void **) &pSMBr);
+ rc = smb_init_no_reconnect(SMB_COM_TRANSACTION2, 15, tcon,
+ (void **) &pSMB, (void **) &pSMBr);
if (rc)
return rc;
cFYI(1, "call to reconnect done");
csocket = server->ssocket;
continue;
- } else if ((length == -ERESTARTSYS) || (length == -EAGAIN)) {
+ } else if (length == -ERESTARTSYS ||
+ length == -EAGAIN ||
+ length == -EINTR) {
msleep(1); /* minimum sleep to prevent looping
allowing socket to clear and app threads to set
tcpStatus CifsNeedReconnect if server hung */
} else
continue;
} else if (length <= 0) {
- if (server->tcpStatus == CifsNew) {
- cFYI(1, "tcp session abend after SMBnegprot");
- /* some servers kill the TCP session rather than
- returning an SMB negprot error, in which
- case reconnecting here is not going to help,
- and so simply return error to mount */
- break;
- }
- if (!try_to_freeze() && (length == -EINTR)) {
- cFYI(1, "cifsd thread killed");
- break;
- }
cFYI(1, "Reconnect after unexpected peek error %d",
length);
cifs_reconnect(server);
an error on SMB negprot response */
cFYI(1, "Negative RFC1002 Session Response Error 0x%x)",
pdu_length);
- if (server->tcpStatus == CifsNew) {
- /* if nack on negprot (rather than
- ret of smb negprot error) reconnecting
- not going to help, ret error to mount */
- break;
- } else {
- /* give server a second to
- clean up before reconnect attempt */
- msleep(1000);
- /* always try 445 first on reconnect
- since we get NACK on some if we ever
- connected to port 139 (the NACK is
- since we do not begin with RFC1001
- session initialize frame) */
- server->addr.sockAddr.sin_port =
- htons(CIFS_PORT);
- cifs_reconnect(server);
- csocket = server->ssocket;
- wake_up(&server->response_q);
- continue;
- }
+ /* give server a second to clean up */
+ msleep(1000);
+ /* always try 445 first on reconnect since we get NACK
+ * on some if we ever connected to port 139 (the NACK
+ * is since we do not begin with RFC1001 session
+ * initialize frame)
+ */
+ cifs_set_port((struct sockaddr *)
+ &server->addr.sockAddr, CIFS_PORT);
+ cifs_reconnect(server);
+ csocket = server->ssocket;
+ wake_up(&server->response_q);
+ continue;
} else if (temp != (char) 0) {
cERROR(1, "Unknown RFC 1002 frame");
cifs_dump_mem(" Received Data: ", (char *)smb_buffer,
total_read += length) {
length = kernel_recvmsg(csocket, &smb_msg, &iov, 1,
pdu_length - total_read, 0);
- if ((server->tcpStatus == CifsExiting) ||
- (length == -EINTR)) {
+ if (server->tcpStatus == CifsExiting) {
/* then will exit */
reconnect = 2;
break;
/* Now we will reread sock */
reconnect = 1;
break;
- } else if ((length == -ERESTARTSYS) ||
- (length == -EAGAIN)) {
+ } else if (length == -ERESTARTSYS ||
+ length == -EAGAIN ||
+ length == -EINTR) {
msleep(1); /* minimum sleep to prevent looping,
allowing socket to clear and app
threads to set tcpStatus
CIFSSMBLogoff(xid, ses);
_FreeXid(xid);
}
- cifs_crypto_shash_release(server);
sesInfoFree(ses);
cifs_put_tcp_session(server);
}
if (ses) {
cFYI(1, "Existing smb sess found (status=%d)", ses->status);
- /* existing SMB ses has a server reference already */
- cifs_put_tcp_session(server);
-
mutex_lock(&ses->session_mutex);
rc = cifs_negotiate_protocol(xid, ses);
if (rc) {
}
}
mutex_unlock(&ses->session_mutex);
+
+ /* existing SMB ses has a server reference already */
+ cifs_put_tcp_session(server);
FreeXid(xid);
return ses;
}
ses->linux_uid = volume_info->linux_uid;
ses->overrideSecFlg = volume_info->secFlg;
- rc = cifs_crypto_shash_allocate(server);
- if (rc) {
- cERROR(1, "could not setup hash structures rc %d", rc);
- goto get_ses_fail;
- }
- server->tilen = 0;
- server->tiblob = NULL;
-
mutex_lock(&ses->session_mutex);
rc = cifs_negotiate_protocol(xid, ses);
if (!rc)
rc = cifs_setup_session(xid, ses, volume_info->local_nls);
mutex_unlock(&ses->session_mutex);
- if (rc) {
- cifs_crypto_shash_release(ses->server);
+ if (rc)
goto get_ses_fail;
- }
/* success, put it on the list */
write_lock(&cifs_tcp_ses_lock);
inode->i_flags |= S_NOATIME | S_NOCMTIME;
if (inode->i_state & I_NEW) {
inode->i_ino = hash;
+ if (S_ISREG(inode->i_mode))
+ inode->i_data.backing_dev_info = sb->s_bdi;
#ifdef CONFIG_CIFS_FSCACHE
/* initialize per-inode cache cookie pointer */
CIFS_I(inode)->fscache = NULL;
{
char *fromName = NULL;
char *toName = NULL;
- struct cifs_sb_info *cifs_sb_source;
- struct cifs_sb_info *cifs_sb_target;
+ struct cifs_sb_info *cifs_sb;
struct cifsTconInfo *tcon;
FILE_UNIX_BASIC_INFO *info_buf_source = NULL;
FILE_UNIX_BASIC_INFO *info_buf_target;
int xid, rc, tmprc;
- cifs_sb_target = CIFS_SB(target_dir->i_sb);
- cifs_sb_source = CIFS_SB(source_dir->i_sb);
- tcon = cifs_sb_source->tcon;
+ cifs_sb = CIFS_SB(source_dir->i_sb);
+ tcon = cifs_sb->tcon;
xid = GetXid();
/*
- * BB: this might be allowed if same server, but different share.
- * Consider adding support for this
- */
- if (tcon != cifs_sb_target->tcon) {
- rc = -EXDEV;
- goto cifs_rename_exit;
- }
-
- /*
* we already have the rename sem so we do not need to
* grab it again here to protect the path integrity
*/
info_buf_target = info_buf_source + 1;
tmprc = CIFSSMBUnixQPathInfo(xid, tcon, fromName,
info_buf_source,
- cifs_sb_source->local_nls,
- cifs_sb_source->mnt_cifs_flags &
+ cifs_sb->local_nls,
+ cifs_sb->mnt_cifs_flags &
CIFS_MOUNT_MAP_SPECIAL_CHR);
if (tmprc != 0)
goto unlink_target;
- tmprc = CIFSSMBUnixQPathInfo(xid, tcon,
- toName, info_buf_target,
- cifs_sb_target->local_nls,
- /* remap based on source sb */
- cifs_sb_source->mnt_cifs_flags &
+ tmprc = CIFSSMBUnixQPathInfo(xid, tcon, toName,
+ info_buf_target,
+ cifs_sb->local_nls,
+ cifs_sb->mnt_cifs_flags &
CIFS_MOUNT_MAP_SPECIAL_CHR);
if (tmprc == 0 && (info_buf_source->UniqueId ==
}
int
-cifs_fill_sockaddr(struct sockaddr *dst, const char *src, int len,
- const unsigned short int port)
+cifs_set_port(struct sockaddr *addr, const unsigned short int port)
{
- if (!cifs_convert_address(dst, src, len))
- return 0;
-
- switch (dst->sa_family) {
+ switch (addr->sa_family) {
case AF_INET:
- ((struct sockaddr_in *)dst)->sin_port = htons(port);
+ ((struct sockaddr_in *)addr)->sin_port = htons(port);
break;
case AF_INET6:
- ((struct sockaddr_in6 *)dst)->sin6_port = htons(port);
+ ((struct sockaddr_in6 *)addr)->sin6_port = htons(port);
break;
default:
return 0;
}
-
return 1;
}
+int
+cifs_fill_sockaddr(struct sockaddr *dst, const char *src, int len,
+ const unsigned short int port)
+{
+ if (!cifs_convert_address(dst, src, len))
+ return 0;
+ return cifs_set_port(dst, port);
+}
+
/*****************************************************************************
convert a NT status code to a dos class/code
*****************************************************************************/
#define NTLMSSP_NEGOTIATE_KEY_XCH 0x40000000
#define NTLMSSP_NEGOTIATE_56 0x80000000
-/* Define AV Pair Field IDs */
-#define NTLMSSP_AV_EOL 0
-#define NTLMSSP_AV_NB_COMPUTER_NAME 1
-#define NTLMSSP_AV_NB_DOMAIN_NAME 2
-#define NTLMSSP_AV_DNS_COMPUTER_NAME 3
-#define NTLMSSP_AV_DNS_DOMAIN_NAME 4
-#define NTLMSSP_AV_DNS_TREE_NAME 5
-#define NTLMSSP_AV_FLAGS 6
-#define NTLMSSP_AV_TIMESTAMP 7
-#define NTLMSSP_AV_RESTRICTION 8
-#define NTLMSSP_AV_TARGET_NAME 9
-#define NTLMSSP_AV_CHANNEL_BINDINGS 10
-
/* Although typedefs are not commonly used for structure definitions */
/* in the Linux kernel, in this particular case they are useful */
/* to more closely match the standards document for NTLMSSP from */
static int decode_ntlmssp_challenge(char *bcc_ptr, int blob_len,
struct cifsSesInfo *ses)
{
- unsigned int tioffset; /* challeng message target info area */
- unsigned int tilen; /* challeng message target info area length */
-
CHALLENGE_MESSAGE *pblob = (CHALLENGE_MESSAGE *)bcc_ptr;
if (blob_len < sizeof(CHALLENGE_MESSAGE)) {
/* BB spec says that if AvId field of MsvAvTimestamp is populated then
we must set the MIC field of the AUTHENTICATE_MESSAGE */
- ses->server->ntlmssp.server_flags = le32_to_cpu(pblob->NegotiateFlags);
-
- tioffset = cpu_to_le16(pblob->TargetInfoArray.BufferOffset);
- tilen = cpu_to_le16(pblob->TargetInfoArray.Length);
- ses->server->tilen = tilen;
- if (tilen) {
- ses->server->tiblob = kmalloc(tilen, GFP_KERNEL);
- if (!ses->server->tiblob) {
- cERROR(1, "Challenge target info allocation failure");
- return -ENOMEM;
- }
- memcpy(ses->server->tiblob, bcc_ptr + tioffset, tilen);
- }
-
return 0;
}
/* BB is NTLMV2 session security format easier to use here? */
flags = NTLMSSP_NEGOTIATE_56 | NTLMSSP_REQUEST_TARGET |
NTLMSSP_NEGOTIATE_128 | NTLMSSP_NEGOTIATE_UNICODE |
- NTLMSSP_NEGOTIATE_NTLM;
+ NTLMSSP_NEGOTIATE_NT_ONLY | NTLMSSP_NEGOTIATE_NTLM;
if (ses->server->secMode &
- (SECMODE_SIGN_REQUIRED | SECMODE_SIGN_ENABLED)) {
- flags |= NTLMSSP_NEGOTIATE_SIGN |
- NTLMSSP_NEGOTIATE_KEY_XCH |
- NTLMSSP_NEGOTIATE_EXTENDED_SEC;
- }
+ (SECMODE_SIGN_REQUIRED | SECMODE_SIGN_ENABLED))
+ flags |= NTLMSSP_NEGOTIATE_SIGN;
+ if (ses->server->secMode & SECMODE_SIGN_REQUIRED)
+ flags |= NTLMSSP_NEGOTIATE_ALWAYS_SIGN;
sec_blob->NegotiateFlags |= cpu_to_le32(flags);
struct cifsSesInfo *ses,
const struct nls_table *nls_cp, bool first)
{
- int rc;
- unsigned int size;
AUTHENTICATE_MESSAGE *sec_blob = (AUTHENTICATE_MESSAGE *)pbuffer;
__u32 flags;
unsigned char *tmp;
- struct ntlmv2_resp ntlmv2_response = {};
+ char ntlm_session_key[CIFS_SESS_KEY_SIZE];
memcpy(sec_blob->Signature, NTLMSSP_SIGNATURE, 8);
sec_blob->MessageType = NtLmAuthenticate;
sec_blob->LmChallengeResponse.Length = 0;
sec_blob->LmChallengeResponse.MaximumLength = 0;
- sec_blob->NtChallengeResponse.BufferOffset = cpu_to_le32(tmp - pbuffer);
- rc = setup_ntlmv2_rsp(ses, (char *)&ntlmv2_response, nls_cp);
- if (rc) {
- cERROR(1, "error rc: %d during ntlmssp ntlmv2 setup", rc);
- goto setup_ntlmv2_ret;
- }
- size = sizeof(struct ntlmv2_resp);
- memcpy(tmp, (char *)&ntlmv2_response, size);
- tmp += size;
- if (ses->server->tilen > 0) {
- memcpy(tmp, ses->server->tiblob, ses->server->tilen);
- tmp += ses->server->tilen;
- } else
- ses->server->tilen = 0;
+ /* calculate session key, BB what about adding similar ntlmv2 path? */
+ SMBNTencrypt(ses->password, ses->server->cryptKey, ntlm_session_key);
+ if (first)
+ cifs_calculate_mac_key(&ses->server->mac_signing_key,
+ ntlm_session_key, ses->password);
- sec_blob->NtChallengeResponse.Length = cpu_to_le16(size +
- ses->server->tilen);
+ memcpy(tmp, ntlm_session_key, CIFS_SESS_KEY_SIZE);
+ sec_blob->NtChallengeResponse.BufferOffset = cpu_to_le32(tmp - pbuffer);
+ sec_blob->NtChallengeResponse.Length = cpu_to_le16(CIFS_SESS_KEY_SIZE);
sec_blob->NtChallengeResponse.MaximumLength =
- cpu_to_le16(size + ses->server->tilen);
+ cpu_to_le16(CIFS_SESS_KEY_SIZE);
+
+ tmp += CIFS_SESS_KEY_SIZE;
if (ses->domainName == NULL) {
sec_blob->DomainName.BufferOffset = cpu_to_le32(tmp - pbuffer);
len = cifs_strtoUCS((__le16 *)tmp, ses->domainName,
MAX_USERNAME_SIZE, nls_cp);
len *= 2; /* unicode is 2 bytes each */
+ len += 2; /* trailing null */
sec_blob->DomainName.BufferOffset = cpu_to_le32(tmp - pbuffer);
sec_blob->DomainName.Length = cpu_to_le16(len);
sec_blob->DomainName.MaximumLength = cpu_to_le16(len);
len = cifs_strtoUCS((__le16 *)tmp, ses->userName,
MAX_USERNAME_SIZE, nls_cp);
len *= 2; /* unicode is 2 bytes each */
+ len += 2; /* trailing null */
sec_blob->UserName.BufferOffset = cpu_to_le32(tmp - pbuffer);
sec_blob->UserName.Length = cpu_to_le16(len);
sec_blob->UserName.MaximumLength = cpu_to_le16(len);
sec_blob->WorkstationName.MaximumLength = 0;
tmp += 2;
- if ((ses->server->ntlmssp.server_flags & NTLMSSP_NEGOTIATE_KEY_XCH) &&
- !calc_seckey(ses->server)) {
- memcpy(tmp, ses->server->ntlmssp.ciphertext, CIFS_CPHTXT_SIZE);
- sec_blob->SessionKey.BufferOffset = cpu_to_le32(tmp - pbuffer);
- sec_blob->SessionKey.Length = cpu_to_le16(CIFS_CPHTXT_SIZE);
- sec_blob->SessionKey.MaximumLength =
- cpu_to_le16(CIFS_CPHTXT_SIZE);
- tmp += CIFS_CPHTXT_SIZE;
- } else {
- sec_blob->SessionKey.BufferOffset = cpu_to_le32(tmp - pbuffer);
- sec_blob->SessionKey.Length = 0;
- sec_blob->SessionKey.MaximumLength = 0;
- }
-
- ses->server->sequence_number = 0;
-
-setup_ntlmv2_ret:
- if (ses->server->tilen > 0)
- kfree(ses->server->tiblob);
-
+ sec_blob->SessionKey.BufferOffset = cpu_to_le32(tmp - pbuffer);
+ sec_blob->SessionKey.Length = 0;
+ sec_blob->SessionKey.MaximumLength = 0;
return tmp - pbuffer;
}
return;
}
-static int setup_ntlmssp_auth_req(char *ntlmsspblob,
+static int setup_ntlmssp_auth_req(SESSION_SETUP_ANDX *pSMB,
struct cifsSesInfo *ses,
const struct nls_table *nls, bool first_time)
{
int bloblen;
- bloblen = build_ntlmssp_auth_blob(ntlmsspblob, ses, nls,
+ bloblen = build_ntlmssp_auth_blob(&pSMB->req.SecurityBlob[0], ses, nls,
first_time);
+ pSMB->req.SecurityBlobLength = cpu_to_le16(bloblen);
return bloblen;
}
if (first_time) /* should this be moved into common code
with similar ntlmv2 path? */
- cifs_calculate_session_key(&ses->server->session_key,
+ cifs_calculate_mac_key(&ses->server->mac_signing_key,
ntlm_session_key, ses->password);
/* copy session key */
cpu_to_le16(sizeof(struct ntlmv2_resp));
/* calculate session key */
- rc = setup_ntlmv2_rsp(ses, v2_sess_key, nls_cp);
- if (rc) {
- kfree(v2_sess_key);
- goto ssetup_exit;
- }
+ setup_ntlmv2_rsp(ses, v2_sess_key, nls_cp);
/* FIXME: calculate MAC key */
memcpy(bcc_ptr, (char *)v2_sess_key,
sizeof(struct ntlmv2_resp));
bcc_ptr += sizeof(struct ntlmv2_resp);
kfree(v2_sess_key);
- if (ses->server->tilen > 0) {
- memcpy(bcc_ptr, ses->server->tiblob,
- ses->server->tilen);
- bcc_ptr += ses->server->tilen;
- }
if (ses->capabilities & CAP_UNICODE) {
if (iov[0].iov_len % 2) {
*bcc_ptr = 0;
}
/* bail out if key is too long */
if (msg->sesskey_len >
- sizeof(ses->server->session_key.data.krb5)) {
+ sizeof(ses->server->mac_signing_key.data.krb5)) {
cERROR(1, "Kerberos signing key too long (%u bytes)",
msg->sesskey_len);
rc = -EOVERFLOW;
goto ssetup_exit;
}
if (first_time) {
- ses->server->session_key.len = msg->sesskey_len;
- memcpy(ses->server->session_key.data.krb5,
+ ses->server->mac_signing_key.len = msg->sesskey_len;
+ memcpy(ses->server->mac_signing_key.data.krb5,
msg->data, msg->sesskey_len);
}
pSMB->req.hdr.Flags2 |= SMBFLG2_EXT_SEC;
if (phase == NtLmNegotiate) {
setup_ntlmssp_neg_req(pSMB, ses);
iov[1].iov_len = sizeof(NEGOTIATE_MESSAGE);
- iov[1].iov_base = &pSMB->req.SecurityBlob[0];
} else if (phase == NtLmAuthenticate) {
int blob_len;
- char *ntlmsspblob;
-
- ntlmsspblob = kmalloc(5 *
- sizeof(struct _AUTHENTICATE_MESSAGE),
- GFP_KERNEL);
- if (!ntlmsspblob) {
- cERROR(1, "Can't allocate NTLMSSP");
- rc = -ENOMEM;
- goto ssetup_exit;
- }
-
- blob_len = setup_ntlmssp_auth_req(ntlmsspblob,
- ses,
- nls_cp,
- first_time);
+ blob_len = setup_ntlmssp_auth_req(pSMB, ses,
+ nls_cp,
+ first_time);
iov[1].iov_len = blob_len;
- iov[1].iov_base = ntlmsspblob;
- pSMB->req.SecurityBlobLength =
- cpu_to_le16(blob_len);
/* Make sure that we tell the server that we
are using the uid that it just gave us back
on the response (challenge) */
rc = -ENOSYS;
goto ssetup_exit;
}
+ iov[1].iov_base = &pSMB->req.SecurityBlob[0];
/* unicode strings must be word aligned */
if ((iov[0].iov_len + iov[1].iov_len) % 2) {
*bcc_ptr = 0;
(ses->server->secMode & (SECMODE_SIGN_REQUIRED |
SECMODE_SIGN_ENABLED))) {
rc = cifs_verify_signature(midQ->resp_buf,
- ses->server,
+ &ses->server->mac_signing_key,
midQ->sequence_number+1);
if (rc) {
cERROR(1, "Unexpected SMB signature");
(ses->server->secMode & (SECMODE_SIGN_REQUIRED |
SECMODE_SIGN_ENABLED))) {
rc = cifs_verify_signature(out_buf,
- ses->server,
+ &ses->server->mac_signing_key,
midQ->sequence_number+1);
if (rc) {
cERROR(1, "Unexpected SMB signature");
(ses->server->secMode & (SECMODE_SIGN_REQUIRED |
SECMODE_SIGN_ENABLED))) {
rc = cifs_verify_signature(out_buf,
- ses->server,
+ &ses->server->mac_signing_key,
midQ->sequence_number+1);
if (rc) {
cERROR(1, "Unexpected SMB signature");
}
/* adjust outsize. is this useful ?? */
- req->uc_outSize = nbytes;
- req->uc_flags |= REQ_WRITE;
+ req->uc_outSize = nbytes;
+ req->uc_flags |= CODA_REQ_WRITE;
count = nbytes;
/* Convert filedescriptor into a file handle */
{
compat_ssize_t tot_len;
struct iovec iovstack[UIO_FASTIOV];
- struct iovec *iov;
+ struct iovec *iov = iovstack;
ssize_t ret;
io_fn_t fn;
iov_fn_t fnv;
int ret = 0;
if (dio->bio) {
- loff_t cur_offset = dio->block_in_file << dio->blkbits;
+ loff_t cur_offset = dio->cur_page_fs_offset;
loff_t bio_next_offset = dio->logical_offset_in_bio +
dio->bio->bi_size;
* Submit now if the underlying fs is about to perform a
* metadata read
*/
- if (dio->boundary)
+ else if (dio->boundary)
dio_bio_submit(dio);
}
argv++;
if (i++ >= max)
return -E2BIG;
+
+ if (fatal_signal_pending(current))
+ return -ERESTARTNOHAND;
cond_resched();
}
}
while (len > 0) {
int offset, bytes_to_copy;
+ if (fatal_signal_pending(current)) {
+ ret = -ERESTARTNOHAND;
+ goto out;
+ }
+ cond_resched();
+
offset = pos % PAGE_SIZE;
if (offset == 0)
offset = PAGE_SIZE;
#else
stack_top = arch_align_stack(stack_top);
stack_top = PAGE_ALIGN(stack_top);
+
+ if (unlikely(stack_top < mmap_min_addr) ||
+ unlikely(vma->vm_end - vma->vm_start >= stack_top - mmap_min_addr))
+ return -ENOMEM;
+
stack_shift = vma->vm_end - stack_top;
bprm->p -= stack_shift;
fail:
return;
}
+
+/*
+ * Core dumping helper functions. These are the only things you should
+ * do on a core-file: use only these functions to write out all the
+ * necessary info.
+ */
+int dump_write(struct file *file, const void *addr, int nr)
+{
+ return access_ok(VERIFY_READ, addr, nr) && file->f_op->write(file, addr, nr, &file->f_pos) == nr;
+}
+EXPORT_SYMBOL(dump_write);
+
+int dump_seek(struct file *file, loff_t off)
+{
+ int ret = 1;
+
+ if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
+ if (file->f_op->llseek(file, off, SEEK_CUR) < 0)
+ return 0;
+ } else {
+ char *buf = (char *)get_zeroed_page(GFP_KERNEL);
+
+ if (!buf)
+ return 0;
+ while (off > 0) {
+ unsigned long n = off;
+
+ if (n > PAGE_SIZE)
+ n = PAGE_SIZE;
+ if (!dump_write(file, buf, n)) {
+ ret = 0;
+ break;
+ }
+ off -= n;
+ }
+ free_page((unsigned long)buf);
+ }
+ return ret;
+}
+EXPORT_SYMBOL(dump_seek);
unsigned nr_pages;
unsigned long length;
loff_t pg_first; /* keep 64bit also in 32-arches */
+ bool read_4_write; /* This means two things: that the read is sync
+ * And the pages should not be unlocked.
+ */
};
static void _pcol_init(struct page_collect *pcol, unsigned expected_pages,
pcol->nr_pages = 0;
pcol->length = 0;
pcol->pg_first = -1;
+ pcol->read_4_write = false;
}
static void _pcol_reset(struct page_collect *pcol)
if (PageError(page))
ClearPageError(page);
- unlock_page(page);
+ if (!pcol->read_4_write)
+ unlock_page(page);
EXOFS_DBGMSG("readpage_strip(0x%lx, 0x%lx) empty page,"
" splitting\n", inode->i_ino, page->index);
/* readpage_strip might call read_exec(,is_sync==false) at several
* places but not if we have a single page.
*/
+ pcol.read_4_write = is_sync;
ret = readpage_strip(&pcol, page);
if (ret) {
EXOFS_ERR("_readpage => %d\n", ret);
static int __init fcntl_init(void)
{
- /* please add new bits here to ensure allocation uniqueness */
- BUILD_BUG_ON(19 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
+ /*
+ * Please add new bits here to ensure allocation uniqueness.
+ * Exceptions: O_NONBLOCK is a two bit define on parisc; O_NDELAY
+ * is defined as O_NONBLOCK on some platforms and not on others.
+ */
+ BUILD_BUG_ON(18 - 1 /* for O_RDONLY being 0 */ != HWEIGHT32(
O_RDONLY | O_WRONLY | O_RDWR |
O_CREAT | O_EXCL | O_NOCTTY |
- O_TRUNC | O_APPEND | O_NONBLOCK |
+ O_TRUNC | O_APPEND | /* O_NONBLOCK | */
__O_SYNC | O_DSYNC | FASYNC |
O_DIRECT | O_LARGEFILE | O_DIRECTORY |
O_NOFOLLOW | O_NOATIME | O_CLOEXEC |
#define CREATE_TRACE_POINTS
#include <trace/events/writeback.h>
-#define inode_to_bdi(inode) ((inode)->i_mapping->backing_dev_info)
-
/*
* We don't actually have pdflush, but this one is exported though /proc...
*/
return test_bit(BDI_writeback_running, &bdi->state);
}
+static inline struct backing_dev_info *inode_to_bdi(struct inode *inode)
+{
+ struct super_block *sb = inode->i_sb;
+
+ if (strcmp(sb->s_type->name, "bdev") == 0)
+ return inode->i_mapping->backing_dev_info;
+
+ return sb->s_bdi;
+}
+
static void bdi_queue_work(struct backing_dev_info *bdi,
struct wb_writeback_work *work)
{
wb->last_active = jiffies;
set_current_state(TASK_INTERRUPTIBLE);
- if (!list_empty(&bdi->work_list)) {
+ if (!list_empty(&bdi->work_list) || kthread_should_stop()) {
__set_current_state(TASK_RUNNING);
continue;
}
* Called with fc->lock, unlocks it
*/
static void request_end(struct fuse_conn *fc, struct fuse_req *req)
-__releases(&fc->lock)
+__releases(fc->lock)
{
void (*end) (struct fuse_conn *, struct fuse_req *) = req->end;
req->end = NULL;
static void wait_answer_interruptible(struct fuse_conn *fc,
struct fuse_req *req)
-__releases(&fc->lock)
-__acquires(&fc->lock)
+__releases(fc->lock)
+__acquires(fc->lock)
{
if (signal_pending(current))
return;
}
static void request_wait_answer(struct fuse_conn *fc, struct fuse_req *req)
-__releases(&fc->lock)
-__acquires(&fc->lock)
+__releases(fc->lock)
+__acquires(fc->lock)
{
if (!fc->no_interrupt) {
/* Any signal may interrupt this */
/* Wait until a request is available on the pending list */
static void request_wait(struct fuse_conn *fc)
-__releases(&fc->lock)
-__acquires(&fc->lock)
+__releases(fc->lock)
+__acquires(fc->lock)
{
DECLARE_WAITQUEUE(wait, current);
*/
static int fuse_read_interrupt(struct fuse_conn *fc, struct fuse_copy_state *cs,
size_t nbytes, struct fuse_req *req)
-__releases(&fc->lock)
+__releases(fc->lock)
{
struct fuse_in_header ih;
struct fuse_interrupt_in arg;
loff_t file_size;
unsigned int num;
unsigned int offset;
- size_t total_len;
+ size_t total_len = 0;
req = fuse_get_req(fc);
if (IS_ERR(req))
* This function releases and reacquires fc->lock
*/
static void end_requests(struct fuse_conn *fc, struct list_head *head)
-__releases(&fc->lock)
-__acquires(&fc->lock)
+__releases(fc->lock)
+__acquires(fc->lock)
{
while (!list_empty(head)) {
struct fuse_req *req;
* locked).
*/
static void end_io_requests(struct fuse_conn *fc)
-__releases(&fc->lock)
-__acquires(&fc->lock)
+__releases(fc->lock)
+__acquires(fc->lock)
{
while (!list_empty(&fc->io)) {
struct fuse_req *req =
}
}
+static void end_queued_requests(struct fuse_conn *fc)
+__releases(fc->lock)
+__acquires(fc->lock)
+{
+ fc->max_background = UINT_MAX;
+ flush_bg_queue(fc);
+ end_requests(fc, &fc->pending);
+ end_requests(fc, &fc->processing);
+}
+
/*
* Abort all requests.
*
fc->connected = 0;
fc->blocked = 0;
end_io_requests(fc);
- end_requests(fc, &fc->pending);
- end_requests(fc, &fc->processing);
+ end_queued_requests(fc);
wake_up_all(&fc->waitq);
wake_up_all(&fc->blocked_waitq);
kill_fasync(&fc->fasync, SIGIO, POLL_IN);
if (fc) {
spin_lock(&fc->lock);
fc->connected = 0;
- end_requests(fc, &fc->pending);
- end_requests(fc, &fc->processing);
+ fc->blocked = 0;
+ end_queued_requests(fc);
+ wake_up_all(&fc->blocked_waitq);
spin_unlock(&fc->lock);
fuse_conn_put(fc);
}
/* Called under fc->lock, may release and reacquire it */
static void fuse_send_writepage(struct fuse_conn *fc, struct fuse_req *req)
-__releases(&fc->lock)
-__acquires(&fc->lock)
+__releases(fc->lock)
+__acquires(fc->lock)
{
struct fuse_inode *fi = get_fuse_inode(req->inode);
loff_t size = i_size_read(req->inode);
* Called with fc->lock
*/
void fuse_flush_writepages(struct inode *inode)
-__releases(&fc->lock)
-__acquires(&fc->lock)
+__releases(fc->lock)
+__acquires(fc->lock)
{
struct fuse_conn *fc = get_fuse_conn(inode);
struct fuse_inode *fi = get_fuse_inode(inode);
config GFS2_FS
tristate "GFS2 file system support"
- depends on EXPERIMENTAL && (64BIT || LBDAF)
+ depends on (64BIT || LBDAF)
select DLM if GFS2_FS_LOCKING_DLM
select CONFIGFS_FS if GFS2_FS_LOCKING_DLM
select SYSFS if GFS2_FS_LOCKING_DLM
#include "glops.h"
-static void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
- unsigned int from, unsigned int to)
+void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+ unsigned int from, unsigned int to)
{
struct buffer_head *head = page_buffers(page);
unsigned int bsize = head->b_size;
unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
int alloc_required;
int error = 0;
- struct gfs2_alloc *al;
+ struct gfs2_alloc *al = NULL;
pgoff_t index = pos >> PAGE_CACHE_SHIFT;
unsigned from = pos & (PAGE_CACHE_SIZE - 1);
unsigned to = from + len;
rblocks += RES_STATFS + RES_QUOTA;
if (&ip->i_inode == sdp->sd_rindex)
rblocks += 2 * RES_STATFS;
+ if (alloc_required)
+ rblocks += gfs2_rg_blocks(al);
error = gfs2_trans_begin(sdp, rblocks,
PAGE_CACHE_SIZE/sdp->sd_sb.sb_bsize);
page_cache_release(page);
- /*
- * XXX(truncate): the call below should probably be replaced with
- * a call to the gfs2-specific truncate blocks helper to actually
- * release disk blocks..
- */
+ gfs2_trans_end(sdp);
if (pos + len > ip->i_inode.i_size)
- truncate_setsize(&ip->i_inode, ip->i_inode.i_size);
+ gfs2_trim_blocks(&ip->i_inode);
+ goto out_trans_fail;
+
out_endtrans:
gfs2_trans_end(sdp);
out_trans_fail:
page_cache_release(page);
if (copied) {
- if (inode->i_size < to) {
+ if (inode->i_size < to)
i_size_write(inode, to);
- ip->i_disksize = inode->i_size;
- }
gfs2_dinode_out(ip, di);
mark_inode_dirty(inode);
}
ret = generic_write_end(file, mapping, pos, len, copied, page, fsdata);
if (ret > 0) {
- if (inode->i_size > ip->i_disksize)
- ip->i_disksize = inode->i_size;
gfs2_dinode_out(ip, dibh->b_data);
mark_inode_dirty(inode);
}
* @ip: the inode
* @dibh: the dinode buffer
* @block: the block number that was allocated
- * @private: any locked page held by the caller process
+ * @page: The (optional) page. This is looked up if @page is NULL
*
* Returns: errno
*/
/**
* gfs2_unstuff_dinode - Unstuff a dinode when the data has grown too big
* @ip: The GFS2 inode to unstuff
- * @unstuffer: the routine that handles unstuffing a non-zero length file
- * @private: private data for the unstuffer
+ * @page: The (optional) page. This is looked up if the @page is NULL
*
* This routine unstuffs a dinode and returns it to a "normal" state such
* that the height can be grown in the traditional way.
if (error)
goto out;
- if (ip->i_disksize) {
+ if (i_size_read(&ip->i_inode)) {
/* Get a free block, fill it with the stuffed data,
and write it out to disk */
di = (struct gfs2_dinode *)dibh->b_data;
gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode));
- if (ip->i_disksize) {
+ if (i_size_read(&ip->i_inode)) {
*(__be64 *)(di + 1) = cpu_to_be64(block);
gfs2_add_inode_blocks(&ip->i_inode, 1);
di->di_blocks = cpu_to_be64(gfs2_get_inode_blocks(&ip->i_inode));
}
/**
- * do_grow - Make a file look bigger than it is
- * @ip: the inode
- * @size: the size to set the file to
- *
- * Called with an exclusive lock on @ip.
- *
- * Returns: errno
- */
-
-static int do_grow(struct gfs2_inode *ip, u64 size)
-{
- struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
- struct gfs2_alloc *al;
- struct buffer_head *dibh;
- int error;
-
- al = gfs2_alloc_get(ip);
- if (!al)
- return -ENOMEM;
-
- error = gfs2_quota_lock_check(ip);
- if (error)
- goto out;
-
- al->al_requested = sdp->sd_max_height + RES_DATA;
-
- error = gfs2_inplace_reserve(ip);
- if (error)
- goto out_gunlock_q;
-
- error = gfs2_trans_begin(sdp,
- sdp->sd_max_height + al->al_rgd->rd_length +
- RES_JDATA + RES_DINODE + RES_STATFS + RES_QUOTA, 0);
- if (error)
- goto out_ipres;
-
- error = gfs2_meta_inode_buffer(ip, &dibh);
- if (error)
- goto out_end_trans;
-
- if (size > sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode)) {
- if (gfs2_is_stuffed(ip)) {
- error = gfs2_unstuff_dinode(ip, NULL);
- if (error)
- goto out_brelse;
- }
- }
-
- ip->i_disksize = size;
- ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
- gfs2_trans_add_bh(ip->i_gl, dibh, 1);
- gfs2_dinode_out(ip, dibh->b_data);
-
-out_brelse:
- brelse(dibh);
-out_end_trans:
- gfs2_trans_end(sdp);
-out_ipres:
- gfs2_inplace_release(ip);
-out_gunlock_q:
- gfs2_quota_unlock(ip);
-out:
- gfs2_alloc_put(ip);
- return error;
-}
-
-
-/**
* gfs2_block_truncate_page - Deal with zeroing out data for truncate
*
* This is partly borrowed from ext3.
*/
-static int gfs2_block_truncate_page(struct address_space *mapping)
+static int gfs2_block_truncate_page(struct address_space *mapping, loff_t from)
{
struct inode *inode = mapping->host;
struct gfs2_inode *ip = GFS2_I(inode);
- loff_t from = inode->i_size;
unsigned long index = from >> PAGE_CACHE_SHIFT;
unsigned offset = from & (PAGE_CACHE_SIZE-1);
unsigned blocksize, iblock, length, pos;
return err;
}
-static int trunc_start(struct gfs2_inode *ip, u64 size)
+static int trunc_start(struct inode *inode, u64 oldsize, u64 newsize)
{
- struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
+ struct gfs2_inode *ip = GFS2_I(inode);
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
+ struct address_space *mapping = inode->i_mapping;
struct buffer_head *dibh;
int journaled = gfs2_is_jdata(ip);
int error;
if (error)
goto out;
+ gfs2_trans_add_bh(ip->i_gl, dibh, 1);
+
if (gfs2_is_stuffed(ip)) {
- u64 dsize = size + sizeof(struct gfs2_dinode);
- ip->i_disksize = size;
- ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
- gfs2_trans_add_bh(ip->i_gl, dibh, 1);
- gfs2_dinode_out(ip, dibh->b_data);
- if (dsize > dibh->b_size)
- dsize = dibh->b_size;
- gfs2_buffer_clear_tail(dibh, dsize);
- error = 1;
+ gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode) + newsize);
} else {
- if (size & (u64)(sdp->sd_sb.sb_bsize - 1))
- error = gfs2_block_truncate_page(ip->i_inode.i_mapping);
-
- if (!error) {
- ip->i_disksize = size;
- ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
- ip->i_diskflags |= GFS2_DIF_TRUNC_IN_PROG;
- gfs2_trans_add_bh(ip->i_gl, dibh, 1);
- gfs2_dinode_out(ip, dibh->b_data);
+ if (newsize & (u64)(sdp->sd_sb.sb_bsize - 1)) {
+ error = gfs2_block_truncate_page(mapping, newsize);
+ if (error)
+ goto out_brelse;
}
+ ip->i_diskflags |= GFS2_DIF_TRUNC_IN_PROG;
}
- brelse(dibh);
+ i_size_write(inode, newsize);
+ ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
+ gfs2_dinode_out(ip, dibh->b_data);
+ truncate_pagecache(inode, oldsize, newsize);
+out_brelse:
+ brelse(dibh);
out:
gfs2_trans_end(sdp);
return error;
if (error)
goto out;
- if (!ip->i_disksize) {
+ if (!i_size_read(&ip->i_inode)) {
ip->i_height = 0;
ip->i_goal = ip->i_no_addr;
gfs2_buffer_clear_tail(dibh, sizeof(struct gfs2_dinode));
/**
* do_shrink - make a file smaller
- * @ip: the inode
- * @size: the size to make the file
- * @truncator: function to truncate the last partial block
+ * @inode: the inode
+ * @oldsize: the current inode size
+ * @newsize: the size to make the file
*
- * Called with an exclusive lock on @ip.
+ * Called with an exclusive lock on @inode. The @size must
+ * be equal to or smaller than the current inode size.
*
* Returns: errno
*/
-static int do_shrink(struct gfs2_inode *ip, u64 size)
+static int do_shrink(struct inode *inode, u64 oldsize, u64 newsize)
{
+ struct gfs2_inode *ip = GFS2_I(inode);
int error;
- error = trunc_start(ip, size);
+ error = trunc_start(inode, oldsize, newsize);
if (error < 0)
return error;
- if (error > 0)
+ if (gfs2_is_stuffed(ip))
return 0;
- error = trunc_dealloc(ip, size);
- if (!error)
+ error = trunc_dealloc(ip, newsize);
+ if (error == 0)
error = trunc_end(ip);
return error;
}
-static int do_touch(struct gfs2_inode *ip, u64 size)
+void gfs2_trim_blocks(struct inode *inode)
{
- struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
+ u64 size = inode->i_size;
+ int ret;
+
+ ret = do_shrink(inode, size, size);
+ WARN_ON(ret != 0);
+}
+
+/**
+ * do_grow - Touch and update inode size
+ * @inode: The inode
+ * @size: The new size
+ *
+ * This function updates the timestamps on the inode and
+ * may also increase the size of the inode. This function
+ * must not be called with @size any smaller than the current
+ * inode size.
+ *
+ * Although it is not strictly required to unstuff files here,
+ * earlier versions of GFS2 have a bug in the stuffed file reading
+ * code which will result in a buffer overrun if the size is larger
+ * than the max stuffed file size. In order to prevent this from
+ * occuring, such files are unstuffed, but in other cases we can
+ * just update the inode size directly.
+ *
+ * Returns: 0 on success, or -ve on error
+ */
+
+static int do_grow(struct inode *inode, u64 size)
+{
+ struct gfs2_inode *ip = GFS2_I(inode);
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
struct buffer_head *dibh;
+ struct gfs2_alloc *al = NULL;
int error;
- error = gfs2_trans_begin(sdp, RES_DINODE, 0);
+ if (gfs2_is_stuffed(ip) &&
+ (size > (sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode)))) {
+ al = gfs2_alloc_get(ip);
+ if (al == NULL)
+ return -ENOMEM;
+
+ error = gfs2_quota_lock_check(ip);
+ if (error)
+ goto do_grow_alloc_put;
+
+ al->al_requested = 1;
+ error = gfs2_inplace_reserve(ip);
+ if (error)
+ goto do_grow_qunlock;
+ }
+
+ error = gfs2_trans_begin(sdp, RES_DINODE + RES_STATFS + RES_RG_BIT, 0);
if (error)
- return error;
+ goto do_grow_release;
- down_write(&ip->i_rw_mutex);
+ if (al) {
+ error = gfs2_unstuff_dinode(ip, NULL);
+ if (error)
+ goto do_end_trans;
+ }
error = gfs2_meta_inode_buffer(ip, &dibh);
if (error)
- goto do_touch_out;
+ goto do_end_trans;
+ i_size_write(inode, size);
ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
gfs2_trans_add_bh(ip->i_gl, dibh, 1);
gfs2_dinode_out(ip, dibh->b_data);
brelse(dibh);
-do_touch_out:
- up_write(&ip->i_rw_mutex);
+do_end_trans:
gfs2_trans_end(sdp);
+do_grow_release:
+ if (al) {
+ gfs2_inplace_release(ip);
+do_grow_qunlock:
+ gfs2_quota_unlock(ip);
+do_grow_alloc_put:
+ gfs2_alloc_put(ip);
+ }
return error;
}
/**
- * gfs2_truncatei - make a file a given size
- * @ip: the inode
- * @size: the size to make the file
- * @truncator: function to truncate the last partial block
+ * gfs2_setattr_size - make a file a given size
+ * @inode: the inode
+ * @newsize: the size to make the file
*
- * The file size can grow, shrink, or stay the same size.
+ * The file size can grow, shrink, or stay the same size. This
+ * is called holding i_mutex and an exclusive glock on the inode
+ * in question.
*
* Returns: errno
*/
-int gfs2_truncatei(struct gfs2_inode *ip, u64 size)
+int gfs2_setattr_size(struct inode *inode, u64 newsize)
{
- int error;
+ int ret;
+ u64 oldsize;
- if (gfs2_assert_warn(GFS2_SB(&ip->i_inode), S_ISREG(ip->i_inode.i_mode)))
- return -EINVAL;
+ BUG_ON(!S_ISREG(inode->i_mode));
- if (size > ip->i_disksize)
- error = do_grow(ip, size);
- else if (size < ip->i_disksize)
- error = do_shrink(ip, size);
- else
- /* update time stamps */
- error = do_touch(ip, size);
+ ret = inode_newsize_ok(inode, newsize);
+ if (ret)
+ return ret;
- return error;
+ oldsize = inode->i_size;
+ if (newsize >= oldsize)
+ return do_grow(inode, newsize);
+
+ return do_shrink(inode, oldsize, newsize);
}
int gfs2_truncatei_resume(struct gfs2_inode *ip)
{
int error;
- error = trunc_dealloc(ip, ip->i_disksize);
+ error = trunc_dealloc(ip, i_size_read(&ip->i_inode));
if (!error)
error = trunc_end(ip);
return error;
shift = sdp->sd_sb.sb_bsize_shift;
BUG_ON(gfs2_is_dir(ip));
- end_of_file = (ip->i_disksize + sdp->sd_sb.sb_bsize - 1) >> shift;
+ end_of_file = (i_size_read(&ip->i_inode) + sdp->sd_sb.sb_bsize - 1) >> shift;
lblock = offset >> shift;
lblock_stop = (offset + len + sdp->sd_sb.sb_bsize - 1) >> shift;
if (lblock_stop > end_of_file)
}
}
-int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page);
-int gfs2_block_map(struct inode *inode, sector_t lblock, struct buffer_head *bh, int create);
-int gfs2_extent_map(struct inode *inode, u64 lblock, int *new, u64 *dblock, unsigned *extlen);
-
-int gfs2_truncatei(struct gfs2_inode *ip, u64 size);
-int gfs2_truncatei_resume(struct gfs2_inode *ip);
-int gfs2_file_dealloc(struct gfs2_inode *ip);
-int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
- unsigned int len);
+extern int gfs2_unstuff_dinode(struct gfs2_inode *ip, struct page *page);
+extern int gfs2_block_map(struct inode *inode, sector_t lblock,
+ struct buffer_head *bh, int create);
+extern int gfs2_extent_map(struct inode *inode, u64 lblock, int *new,
+ u64 *dblock, unsigned *extlen);
+extern int gfs2_setattr_size(struct inode *inode, u64 size);
+extern void gfs2_trim_blocks(struct inode *inode);
+extern int gfs2_truncatei_resume(struct gfs2_inode *ip);
+extern int gfs2_file_dealloc(struct gfs2_inode *ip);
+extern int gfs2_write_alloc_required(struct gfs2_inode *ip, u64 offset,
+ unsigned int len);
#endif /* __BMAP_DOT_H__ */
ip = GFS2_I(inode);
}
- if (sdp->sd_args.ar_localcaching)
+ if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL)
goto valid;
had_lock = (gfs2_glock_is_locked_by_me(dip->i_gl) != NULL);
#define gfs2_disk_hash2offset(h) (((u64)(h)) >> 1)
#define gfs2_dir_offset2hash(p) ((u32)(((u64)(p)) << 1))
+struct qstr gfs2_qdot __read_mostly;
+struct qstr gfs2_qdotdot __read_mostly;
+
typedef int (*leaf_call_t) (struct gfs2_inode *dip, u32 index, u32 len,
u64 leaf_no, void *data);
typedef int (*gfs2_dscan_t)(const struct gfs2_dirent *dent,
gfs2_trans_add_bh(ip->i_gl, dibh, 1);
memcpy(dibh->b_data + offset + sizeof(struct gfs2_dinode), buf, size);
- if (ip->i_disksize < offset + size)
- ip->i_disksize = offset + size;
+ if (ip->i_inode.i_size < offset + size)
+ i_size_write(&ip->i_inode, offset + size);
ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
gfs2_dinode_out(ip, dibh->b_data);
if (error)
return error;
- if (ip->i_disksize < offset + copied)
- ip->i_disksize = offset + copied;
+ if (ip->i_inode.i_size < offset + copied)
+ i_size_write(&ip->i_inode, offset + copied);
ip->i_inode.i_mtime = ip->i_inode.i_ctime = CURRENT_TIME;
gfs2_trans_add_bh(ip->i_gl, dibh, 1);
unsigned int o;
int copied = 0;
int error = 0;
+ u64 disksize = i_size_read(&ip->i_inode);
- if (offset >= ip->i_disksize)
+ if (offset >= disksize)
return 0;
- if (offset + size > ip->i_disksize)
- size = ip->i_disksize - offset;
+ if (offset + size > disksize)
+ size = disksize - offset;
if (!size)
return 0;
unsigned hsize = 1 << ip->i_depth;
unsigned index;
u64 ln;
- if (hsize * sizeof(u64) != ip->i_disksize) {
+ if (hsize * sizeof(u64) != i_size_read(inode)) {
gfs2_consist_inode(ip);
return ERR_PTR(-EIO);
}
for (x = sdp->sd_hash_ptrs; x--; lp++)
*lp = cpu_to_be64(bn);
- dip->i_disksize = sdp->sd_sb.sb_bsize / 2;
+ i_size_write(inode, sdp->sd_sb.sb_bsize / 2);
gfs2_add_inode_blocks(&dip->i_inode, 1);
dip->i_diskflags |= GFS2_DIF_EXHASH;
u64 *buf;
u64 *from, *to;
u64 block;
+ u64 disksize = i_size_read(&dip->i_inode);
int x;
int error = 0;
hsize = 1 << dip->i_depth;
- if (hsize * sizeof(u64) != dip->i_disksize) {
+ if (hsize * sizeof(u64) != disksize) {
gfs2_consist_inode(dip);
return -EIO;
}
if (!buf)
return -ENOMEM;
- for (block = dip->i_disksize >> sdp->sd_hash_bsize_shift; block--;) {
+ for (block = disksize >> sdp->sd_hash_bsize_shift; block--;) {
error = gfs2_dir_read_data(dip, (char *)buf,
block * sdp->sd_hash_bsize,
sdp->sd_hash_bsize, 1);
unsigned depth = 0;
hsize = 1 << dip->i_depth;
- if (hsize * sizeof(u64) != dip->i_disksize) {
+ if (hsize * sizeof(u64) != i_size_read(inode)) {
gfs2_consist_inode(dip);
return -EIO;
}
int error = 0;
hsize = 1 << dip->i_depth;
- if (hsize * sizeof(u64) != dip->i_disksize) {
+ if (hsize * sizeof(u64) != i_size_read(&dip->i_inode)) {
gfs2_consist_inode(dip);
return -EIO;
}
struct gfs2_inode;
struct gfs2_inum;
-struct inode *gfs2_dir_search(struct inode *dir, const struct qstr *filename);
-int gfs2_dir_check(struct inode *dir, const struct qstr *filename,
- const struct gfs2_inode *ip);
-int gfs2_dir_add(struct inode *inode, const struct qstr *filename,
- const struct gfs2_inode *ip, unsigned int type);
-int gfs2_dir_del(struct gfs2_inode *dip, const struct qstr *filename);
-int gfs2_dir_read(struct inode *inode, u64 *offset, void *opaque,
- filldir_t filldir);
-int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename,
- const struct gfs2_inode *nip, unsigned int new_type);
+extern struct inode *gfs2_dir_search(struct inode *dir,
+ const struct qstr *filename);
+extern int gfs2_dir_check(struct inode *dir, const struct qstr *filename,
+ const struct gfs2_inode *ip);
+extern int gfs2_dir_add(struct inode *inode, const struct qstr *filename,
+ const struct gfs2_inode *ip, unsigned int type);
+extern int gfs2_dir_del(struct gfs2_inode *dip, const struct qstr *filename);
+extern int gfs2_dir_read(struct inode *inode, u64 *offset, void *opaque,
+ filldir_t filldir);
+extern int gfs2_dir_mvino(struct gfs2_inode *dip, const struct qstr *filename,
+ const struct gfs2_inode *nip, unsigned int new_type);
-int gfs2_dir_exhash_dealloc(struct gfs2_inode *dip);
+extern int gfs2_dir_exhash_dealloc(struct gfs2_inode *dip);
-int gfs2_diradd_alloc_required(struct inode *dir,
- const struct qstr *filename);
-int gfs2_dir_get_new_buffer(struct gfs2_inode *ip, u64 block,
- struct buffer_head **bhp);
+extern int gfs2_diradd_alloc_required(struct inode *dir,
+ const struct qstr *filename);
+extern int gfs2_dir_get_new_buffer(struct gfs2_inode *ip, u64 block,
+ struct buffer_head **bhp);
static inline u32 gfs2_disk_hash(const char *data, int len)
{
memcpy(dent + 1, name->name, name->len);
}
+extern struct qstr gfs2_qdot;
+extern struct qstr gfs2_qdotdot;
+
#endif /* __DIR_DOT_H__ */
static struct dentry *gfs2_get_parent(struct dentry *child)
{
- struct qstr dotdot;
struct dentry *dentry;
- /*
- * XXX(hch): it would be a good idea to keep this around as a
- * static variable.
- */
- gfs2_str2qstr(&dotdot, "..");
-
- dentry = d_obtain_alias(gfs2_lookupi(child->d_inode, &dotdot, 1));
+ dentry = d_obtain_alias(gfs2_lookupi(child->d_inode, &gfs2_qdotdot, 1));
if (!IS_ERR(dentry))
dentry->d_op = &gfs2_dops;
return dentry;
rblocks = RES_DINODE + ind_blocks;
if (gfs2_is_jdata(ip))
rblocks += data_blocks ? data_blocks : 1;
- if (ind_blocks || data_blocks)
+ if (ind_blocks || data_blocks) {
rblocks += RES_STATFS + RES_QUOTA;
+ rblocks += gfs2_rg_blocks(al);
+ }
ret = gfs2_trans_begin(sdp, rblocks, 0);
if (ret)
goto out_trans_fail;
goto fail;
if (!(file->f_flags & O_LARGEFILE) &&
- ip->i_disksize > MAX_NON_LFS) {
+ i_size_read(inode) > MAX_NON_LFS) {
error = -EOVERFLOW;
goto fail_gunlock;
}
else
gfs2_glock_put_nolock(gl);
}
+ if (held1 && held2 && list_empty(&gl->gl_holders))
+ clear_bit(GLF_QUEUED, &gl->gl_flags);
gl->gl_state = new_state;
gl->gl_tchange = jiffies;
if (unlikely((gh->gh_flags & LM_FLAG_PRIORITY) && !insert_pt))
insert_pt = &gh2->gh_list;
}
+ set_bit(GLF_QUEUED, &gl->gl_flags);
if (likely(insert_pt == NULL)) {
list_add_tail(&gh->gh_list, &gl->gl_holders);
if (unlikely(gh->gh_flags & LM_FLAG_PRIORITY))
gfs2_glock_hold(gl);
holdtime = gl->gl_tchange + gl->gl_ops->go_min_hold_time;
- if (time_before(now, holdtime))
- delay = holdtime - now;
- if (test_bit(GLF_REPLY_PENDING, &gl->gl_flags))
- delay = gl->gl_ops->go_min_hold_time;
+ if (test_bit(GLF_QUEUED, &gl->gl_flags)) {
+ if (time_before(now, holdtime))
+ delay = holdtime - now;
+ if (test_bit(GLF_REPLY_PENDING, &gl->gl_flags))
+ delay = gl->gl_ops->go_min_hold_time;
+ }
spin_lock(&gl->gl_spin);
handle_callback(gl, state, delay);
spin_unlock(&lru_lock);
spin_lock(&gl->gl_spin);
- if (find_first_holder(gl) == NULL && gl->gl_state != LM_ST_UNLOCKED)
+ if (gl->gl_state != LM_ST_UNLOCKED)
handle_callback(gl, LM_ST_UNLOCKED, 0);
spin_unlock(&gl->gl_spin);
gfs2_glock_hold(gl);
*p++ = 'I';
if (test_bit(GLF_FROZEN, gflags))
*p++ = 'F';
+ if (test_bit(GLF_QUEUED, gflags))
+ *p++ = 'q';
*p = 0;
return buf;
}
}
#endif
- glock_workqueue = create_workqueue("glock_workqueue");
+ glock_workqueue = alloc_workqueue("glock_workqueue", WQ_RESCUER |
+ WQ_HIGHPRI | WQ_FREEZEABLE, 0);
if (IS_ERR(glock_workqueue))
return PTR_ERR(glock_workqueue);
- gfs2_delete_workqueue = create_workqueue("delete_workqueue");
+ gfs2_delete_workqueue = alloc_workqueue("delete_workqueue", WQ_RESCUER |
+ WQ_FREEZEABLE, 0);
if (IS_ERR(gfs2_delete_workqueue)) {
destroy_workqueue(glock_workqueue);
return PTR_ERR(gfs2_delete_workqueue);
void gfs2_print_dbg(struct seq_file *seq, const char *fmt, ...);
/**
- * gfs2_glock_nq_init - intialize a holder and enqueue it on a glock
+ * gfs2_glock_nq_init - initialize a holder and enqueue it on a glock
* @gl: the glock
* @state: the state we're requesting
* @flags: the modifier flags
const struct gfs2_inode *ip = gl->gl_object;
if (ip == NULL)
return 0;
- gfs2_print_dbg(seq, " I: n:%llu/%llu t:%u f:0x%02lx d:0x%08x s:%llu/%llu\n",
+ gfs2_print_dbg(seq, " I: n:%llu/%llu t:%u f:0x%02lx d:0x%08x s:%llu\n",
(unsigned long long)ip->i_no_formal_ino,
(unsigned long long)ip->i_no_addr,
IF2DT(ip->i_inode.i_mode), ip->i_flags,
(unsigned int)ip->i_diskflags,
- (unsigned long long)ip->i_inode.i_size,
- (unsigned long long)ip->i_disksize);
+ (unsigned long long)i_size_read(&ip->i_inode));
return 0;
}
[LM_TYPE_META] = &gfs2_meta_glops,
[LM_TYPE_INODE] = &gfs2_inode_glops,
[LM_TYPE_RGRP] = &gfs2_rgrp_glops,
- [LM_TYPE_NONDISK] = &gfs2_trans_glops,
[LM_TYPE_IOPEN] = &gfs2_iopen_glops,
[LM_TYPE_FLOCK] = &gfs2_flock_glops,
[LM_TYPE_NONDISK] = &gfs2_nondisk_glops,
GLF_REPLY_PENDING = 9,
GLF_INITIAL = 10,
GLF_FROZEN = 11,
+ GLF_QUEUED = 12,
};
struct gfs2_glock {
u64 i_no_formal_ino;
u64 i_generation;
u64 i_eattr;
- loff_t i_disksize;
unsigned long i_flags; /* GIF_... */
struct gfs2_glock *i_gl; /* Move into i_gh? */
struct gfs2_holder i_iopen_gh;
char ar_locktable[GFS2_LOCKNAME_LEN]; /* Name of the Lock Table */
char ar_hostdata[GFS2_LOCKNAME_LEN]; /* Host specific data */
unsigned int ar_spectator:1; /* Don't get a journal */
- unsigned int ar_ignore_local_fs:1; /* Ignore optimisations */
unsigned int ar_localflocks:1; /* Let the VFS do flock|fcntl */
- unsigned int ar_localcaching:1; /* Local caching */
unsigned int ar_debug:1; /* Oops on errors */
- unsigned int ar_upgrade:1; /* Upgrade ondisk format */
unsigned int ar_posix_acl:1; /* Enable posix acls */
unsigned int ar_quota:2; /* off/account/on */
unsigned int ar_suiddir:1; /* suiddir support */
*/
struct lm_lockstruct {
- unsigned int ls_jid;
+ int ls_jid;
unsigned int ls_first;
unsigned int ls_first_done;
unsigned int ls_nodir;
struct list_head sd_rindex_mru_list;
struct gfs2_rgrpd *sd_rindex_forward;
unsigned int sd_rgrps;
+ unsigned int sd_max_rg_data;
/* Journal index stuff */
* to do that.
*/
ip->i_inode.i_nlink = be32_to_cpu(str->di_nlink);
- ip->i_disksize = be64_to_cpu(str->di_size);
- i_size_write(&ip->i_inode, ip->i_disksize);
+ i_size_write(&ip->i_inode, be64_to_cpu(str->di_size));
gfs2_set_inode_blocks(&ip->i_inode, be64_to_cpu(str->di_blocks));
atime.tv_sec = be64_to_cpu(str->di_atime);
atime.tv_nsec = be32_to_cpu(str->di_atime_nsec);
str->di_uid = cpu_to_be32(ip->i_inode.i_uid);
str->di_gid = cpu_to_be32(ip->i_inode.i_gid);
str->di_nlink = cpu_to_be32(ip->i_inode.i_nlink);
- str->di_size = cpu_to_be64(ip->i_disksize);
+ str->di_size = cpu_to_be64(i_size_read(&ip->i_inode));
str->di_blocks = cpu_to_be64(gfs2_get_inode_blocks(&ip->i_inode));
str->di_atime = cpu_to_be64(ip->i_inode.i_atime.tv_sec);
str->di_mtime = cpu_to_be64(ip->i_inode.i_mtime.tv_sec);
(unsigned long long)ip->i_no_formal_ino);
printk(KERN_INFO " no_addr = %llu\n",
(unsigned long long)ip->i_no_addr);
- printk(KERN_INFO " i_disksize = %llu\n",
- (unsigned long long)ip->i_disksize);
+ printk(KERN_INFO " i_size = %llu\n",
+ (unsigned long long)i_size_read(&ip->i_inode));
printk(KERN_INFO " blocks = %llu\n",
(unsigned long long)gfs2_get_inode_blocks(&ip->i_inode));
printk(KERN_INFO " i_goal = %llu\n",
extern int gfs2_internal_read(struct gfs2_inode *ip,
struct file_ra_state *ra_state,
char *buf, loff_t *pos, unsigned size);
+extern void gfs2_page_add_databufs(struct gfs2_inode *ip, struct page *page,
+ unsigned int from, unsigned int to);
extern void gfs2_set_aops(struct inode *inode);
static inline int gfs2_is_stuffed(const struct gfs2_inode *ip)
dent->de_inum.no_addr = cpu_to_be64(ip->i_no_addr);
}
+static inline int gfs2_check_internal_file_size(struct inode *inode,
+ u64 minsize, u64 maxsize)
+{
+ u64 size = i_size_read(inode);
+ if (size < minsize || size > maxsize)
+ goto err;
+ if (size & ((1 << inode->i_blkbits) - 1))
+ goto err;
+ return 0;
+err:
+ gfs2_consist_inode(GFS2_I(inode));
+ return -EIO;
+}
extern void gfs2_set_iop(struct inode *inode);
extern struct inode *gfs2_inode_lookup(struct super_block *sb, unsigned type,
ret |= LM_OUT_CANCELED;
goto out;
case -EAGAIN: /* Try lock fails */
+ case -EDEADLK: /* Deadlock detected */
goto out;
- case -EINVAL: /* Invalid */
- case -ENOMEM: /* Out of memory */
+ case -ETIMEDOUT: /* Canceled due to timeout */
ret |= LM_OUT_ERROR;
goto out;
case 0: /* Success */
do {
prepare_to_wait(&sdp->sd_logd_waitq, &wait,
- TASK_UNINTERRUPTIBLE);
+ TASK_INTERRUPTIBLE);
if (!gfs2_ail_flush_reqd(sdp) &&
!gfs2_jrnl_flush_reqd(sdp) &&
!kthread_should_stop())
#include "glock.h"
#include "quota.h"
#include "recovery.h"
+#include "dir.h"
static struct shrinker qd_shrinker = {
.shrink = gfs2_shrink_qd_memory,
{
int error;
+ gfs2_str2qstr(&gfs2_qdot, ".");
+ gfs2_str2qstr(&gfs2_qdotdot, "..");
+
error = gfs2_sys_init();
if (error)
return error;
error = -ENOMEM;
gfs_recovery_wq = alloc_workqueue("gfs_recovery",
- WQ_NON_REENTRANT | WQ_RESCUER, 0);
+ WQ_RESCUER | WQ_FREEZEABLE, 0);
if (!gfs_recovery_wq)
goto fail_wq;
#define DO 0
#define UNDO 1
-static const u32 gfs2_old_fs_formats[] = {
- 0
-};
-
-static const u32 gfs2_old_multihost_formats[] = {
- 0
-};
-
/**
* gfs2_tune_init - Fill a gfs2_tune structure with default values
* @gt: tune
static int gfs2_check_sb(struct gfs2_sbd *sdp, struct gfs2_sb_host *sb, int silent)
{
- unsigned int x;
-
if (sb->sb_magic != GFS2_MAGIC ||
sb->sb_type != GFS2_METATYPE_SB) {
if (!silent)
sb->sb_multihost_format == GFS2_FORMAT_MULTI)
return 0;
- if (sb->sb_fs_format != GFS2_FORMAT_FS) {
- for (x = 0; gfs2_old_fs_formats[x]; x++)
- if (gfs2_old_fs_formats[x] == sb->sb_fs_format)
- break;
+ fs_warn(sdp, "Unknown on-disk format, unable to mount\n");
- if (!gfs2_old_fs_formats[x]) {
- printk(KERN_WARNING
- "GFS2: code version (%u, %u) is incompatible "
- "with ondisk format (%u, %u)\n",
- GFS2_FORMAT_FS, GFS2_FORMAT_MULTI,
- sb->sb_fs_format, sb->sb_multihost_format);
- printk(KERN_WARNING
- "GFS2: I don't know how to upgrade this FS\n");
- return -EINVAL;
- }
- }
-
- if (sb->sb_multihost_format != GFS2_FORMAT_MULTI) {
- for (x = 0; gfs2_old_multihost_formats[x]; x++)
- if (gfs2_old_multihost_formats[x] ==
- sb->sb_multihost_format)
- break;
-
- if (!gfs2_old_multihost_formats[x]) {
- printk(KERN_WARNING
- "GFS2: code version (%u, %u) is incompatible "
- "with ondisk format (%u, %u)\n",
- GFS2_FORMAT_FS, GFS2_FORMAT_MULTI,
- sb->sb_fs_format, sb->sb_multihost_format);
- printk(KERN_WARNING
- "GFS2: I don't know how to upgrade this FS\n");
- return -EINVAL;
- }
- }
-
- if (!sdp->sd_args.ar_upgrade) {
- printk(KERN_WARNING
- "GFS2: code version (%u, %u) is incompatible "
- "with ondisk format (%u, %u)\n",
- GFS2_FORMAT_FS, GFS2_FORMAT_MULTI,
- sb->sb_fs_format, sb->sb_multihost_format);
- printk(KERN_INFO
- "GFS2: Use the \"upgrade\" mount option to upgrade "
- "the FS\n");
- printk(KERN_INFO "GFS2: See the manual for more details\n");
- return -EINVAL;
- }
-
- return 0;
+ return -EINVAL;
}
static void end_bio_io_page(struct bio *bio, int error)
prev_db = 0;
- for (lb = 0; lb < ip->i_disksize >> sdp->sd_sb.sb_bsize_shift; lb++) {
+ for (lb = 0; lb < i_size_read(jd->jd_inode) >> sdp->sd_sb.sb_bsize_shift; lb++) {
bh.b_state = 0;
bh.b_blocknr = 0;
bh.b_size = 1 << ip->i_inode.i_blkbits;
if (!strcmp("lock_nolock", proto)) {
lm = &nolock_ops;
sdp->sd_args.ar_localflocks = 1;
- sdp->sd_args.ar_localcaching = 1;
#ifdef CONFIG_GFS2_FS_LOCKING_DLM
} else if (!strcmp("lock_dlm", proto)) {
lm = &gfs2_dlm_ops;
static int wait_on_journal(struct gfs2_sbd *sdp)
{
- if (sdp->sd_args.ar_spectator)
- return 0;
if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL)
return 0;
if (error)
goto fail_sb;
+ /*
+ * If user space has failed to join the cluster or some similar
+ * failure has occurred, then the journal id will contain a
+ * negative (error) number. This will then be returned to the
+ * caller (of the mount syscall). We do this even for spectator
+ * mounts (which just write a jid of 0 to indicate "ok" even though
+ * the jid is unused in the spectator case)
+ */
+ if (sdp->sd_lockstruct.ls_jid < 0) {
+ error = sdp->sd_lockstruct.ls_jid;
+ sdp->sd_lockstruct.ls_jid = 0;
+ goto fail_sb;
+ }
+
error = init_inodes(sdp, DO);
if (error)
goto fail_sb;
#include <linux/gfs2_ondisk.h>
#include <linux/crc32.h>
#include <linux/fiemap.h>
+#include <linux/swap.h>
+#include <linux/falloc.h>
#include <asm/uaccess.h>
#include "gfs2.h"
goto out_gunlock_q;
error = gfs2_trans_begin(sdp, sdp->sd_max_dirres +
- al->al_rgd->rd_length +
+ gfs2_rg_blocks(al) +
2 * RES_DINODE + RES_STATFS +
RES_QUOTA, 0);
if (error)
ip = ghs[1].gh_gl->gl_object;
- ip->i_disksize = size;
i_size_write(inode, size);
error = gfs2_meta_inode_buffer(ip, &dibh);
ip = ghs[1].gh_gl->gl_object;
ip->i_inode.i_nlink = 2;
- ip->i_disksize = sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode);
+ i_size_write(inode, sdp->sd_sb.sb_bsize - sizeof(struct gfs2_dinode));
ip->i_diskflags |= GFS2_DIF_JDATA;
ip->i_entries = 2;
if (!gfs2_assert_withdraw(sdp, !error)) {
struct gfs2_dinode *di = (struct gfs2_dinode *)dibh->b_data;
struct gfs2_dirent *dent = (struct gfs2_dirent *)(di+1);
- struct qstr str;
- gfs2_str2qstr(&str, ".");
gfs2_trans_add_bh(ip->i_gl, dibh, 1);
- gfs2_qstr2dirent(&str, GFS2_DIRENT_SIZE(str.len), dent);
+ gfs2_qstr2dirent(&gfs2_qdot, GFS2_DIRENT_SIZE(gfs2_qdot.len), dent);
dent->de_inum = di->di_num; /* already GFS2 endian */
dent->de_type = cpu_to_be16(DT_DIR);
di->di_entries = cpu_to_be32(1);
- gfs2_str2qstr(&str, "..");
dent = (struct gfs2_dirent *)((char*)dent + GFS2_DIRENT_SIZE(1));
- gfs2_qstr2dirent(&str, dibh->b_size - GFS2_DIRENT_SIZE(1) - sizeof(struct gfs2_dinode), dent);
+ gfs2_qstr2dirent(&gfs2_qdotdot, dibh->b_size - GFS2_DIRENT_SIZE(1) - sizeof(struct gfs2_dinode), dent);
gfs2_inum_out(dip, dent);
dent->de_type = cpu_to_be16(DT_DIR);
static int gfs2_rmdiri(struct gfs2_inode *dip, const struct qstr *name,
struct gfs2_inode *ip)
{
- struct qstr dotname;
int error;
if (ip->i_entries != 2) {
if (error)
return error;
- gfs2_str2qstr(&dotname, ".");
- error = gfs2_dir_del(ip, &dotname);
+ error = gfs2_dir_del(ip, &gfs2_qdot);
if (error)
return error;
- gfs2_str2qstr(&dotname, "..");
- error = gfs2_dir_del(ip, &dotname);
+ error = gfs2_dir_del(ip, &gfs2_qdotdot);
if (error)
return error;
struct inode *dir = &to->i_inode;
struct super_block *sb = dir->i_sb;
struct inode *tmp;
- struct qstr dotdot;
int error = 0;
- gfs2_str2qstr(&dotdot, "..");
-
igrab(dir);
for (;;) {
break;
}
- tmp = gfs2_lookupi(dir, &dotdot, 1);
+ tmp = gfs2_lookupi(dir, &gfs2_qdotdot, 1);
if (IS_ERR(tmp)) {
error = PTR_ERR(tmp);
break;
struct gfs2_inode *ip = GFS2_I(odentry->d_inode);
struct gfs2_inode *nip = NULL;
struct gfs2_sbd *sdp = GFS2_SB(odir);
- struct gfs2_holder ghs[5], r_gh = { .gh_gl = NULL, };
+ struct gfs2_holder ghs[5], r_gh = { .gh_gl = NULL, }, ri_gh;
struct gfs2_rgrpd *nrgd;
unsigned int num_gh;
int dir_rename = 0;
return 0;
}
+ error = gfs2_rindex_hold(sdp, &ri_gh);
+ if (error)
+ return error;
if (odip != ndip) {
error = gfs2_glock_nq_init(sdp->sd_rename_gl, LM_ST_EXCLUSIVE,
al->al_requested = sdp->sd_max_dirres;
- error = gfs2_inplace_reserve(ndip);
+ error = gfs2_inplace_reserve_ri(ndip);
if (error)
goto out_gunlock_q;
error = gfs2_trans_begin(sdp, sdp->sd_max_dirres +
- al->al_rgd->rd_length +
+ gfs2_rg_blocks(al) +
4 * RES_DINODE + 4 * RES_LEAF +
RES_STATFS + RES_QUOTA + 4, 0);
if (error)
}
if (dir_rename) {
- struct qstr name;
- gfs2_str2qstr(&name, "..");
-
error = gfs2_change_nlink(ndip, +1);
if (error)
goto out_end_trans;
if (error)
goto out_end_trans;
- error = gfs2_dir_mvino(ip, &name, ndip, DT_DIR);
+ error = gfs2_dir_mvino(ip, &gfs2_qdotdot, ndip, DT_DIR);
if (error)
goto out_end_trans;
} else {
if (r_gh.gh_gl)
gfs2_glock_dq_uninit(&r_gh);
out:
+ gfs2_glock_dq_uninit(&ri_gh);
return error;
}
struct gfs2_inode *ip = GFS2_I(dentry->d_inode);
struct gfs2_holder i_gh;
struct buffer_head *dibh;
- unsigned int x;
+ unsigned int x, size;
char *buf;
int error;
return NULL;
}
- if (!ip->i_disksize) {
+ size = (unsigned int)i_size_read(&ip->i_inode);
+ if (size == 0) {
gfs2_consist_inode(ip);
buf = ERR_PTR(-EIO);
goto out;
goto out;
}
- x = ip->i_disksize + 1;
+ x = size + 1;
buf = kmalloc(x, GFP_NOFS);
if (!buf)
buf = ERR_PTR(-ENOMEM);
return error;
}
-/*
- * XXX(truncate): the truncate_setsize calls should be moved to the end.
- */
-static int setattr_size(struct inode *inode, struct iattr *attr)
-{
- struct gfs2_inode *ip = GFS2_I(inode);
- struct gfs2_sbd *sdp = GFS2_SB(inode);
- int error;
-
- if (attr->ia_size != ip->i_disksize) {
- error = gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks);
- if (error)
- return error;
- truncate_setsize(inode, attr->ia_size);
- gfs2_trans_end(sdp);
- }
-
- error = gfs2_truncatei(ip, attr->ia_size);
- if (error && (inode->i_size != ip->i_disksize))
- i_size_write(inode, ip->i_disksize);
-
- return error;
-}
-
static int setattr_chown(struct inode *inode, struct iattr *attr)
{
struct gfs2_inode *ip = GFS2_I(inode);
goto out;
if (attr->ia_valid & ATTR_SIZE)
- error = setattr_size(inode, attr);
+ error = gfs2_setattr_size(inode, attr->ia_size);
else if (attr->ia_valid & (ATTR_UID | ATTR_GID))
error = setattr_chown(inode, attr);
else if ((attr->ia_valid & ATTR_MODE) && IS_POSIXACL(inode))
return ret;
}
+static void empty_write_end(struct page *page, unsigned from,
+ unsigned to)
+{
+ struct gfs2_inode *ip = GFS2_I(page->mapping->host);
+
+ page_zero_new_buffers(page, from, to);
+ flush_dcache_page(page);
+ mark_page_accessed(page);
+
+ if (!gfs2_is_writeback(ip))
+ gfs2_page_add_databufs(ip, page, from, to);
+
+ block_commit_write(page, from, to);
+}
+
+
+static int write_empty_blocks(struct page *page, unsigned from, unsigned to)
+{
+ unsigned start, end, next;
+ struct buffer_head *bh, *head;
+ int error;
+
+ if (!page_has_buffers(page)) {
+ error = block_prepare_write(page, from, to, gfs2_block_map);
+ if (unlikely(error))
+ return error;
+
+ empty_write_end(page, from, to);
+ return 0;
+ }
+
+ bh = head = page_buffers(page);
+ next = end = 0;
+ while (next < from) {
+ next += bh->b_size;
+ bh = bh->b_this_page;
+ }
+ start = next;
+ do {
+ next += bh->b_size;
+ if (buffer_mapped(bh)) {
+ if (end) {
+ error = block_prepare_write(page, start, end,
+ gfs2_block_map);
+ if (unlikely(error))
+ return error;
+ empty_write_end(page, start, end);
+ end = 0;
+ }
+ start = next;
+ }
+ else
+ end = next;
+ bh = bh->b_this_page;
+ } while (next < to);
+
+ if (end) {
+ error = block_prepare_write(page, start, end, gfs2_block_map);
+ if (unlikely(error))
+ return error;
+ empty_write_end(page, start, end);
+ }
+
+ return 0;
+}
+
+static int fallocate_chunk(struct inode *inode, loff_t offset, loff_t len,
+ int mode)
+{
+ struct gfs2_inode *ip = GFS2_I(inode);
+ struct buffer_head *dibh;
+ int error;
+ u64 start = offset >> PAGE_CACHE_SHIFT;
+ unsigned int start_offset = offset & ~PAGE_CACHE_MASK;
+ u64 end = (offset + len - 1) >> PAGE_CACHE_SHIFT;
+ pgoff_t curr;
+ struct page *page;
+ unsigned int end_offset = (offset + len) & ~PAGE_CACHE_MASK;
+ unsigned int from, to;
+
+ if (!end_offset)
+ end_offset = PAGE_CACHE_SIZE;
+
+ error = gfs2_meta_inode_buffer(ip, &dibh);
+ if (unlikely(error))
+ goto out;
+
+ gfs2_trans_add_bh(ip->i_gl, dibh, 1);
+
+ if (gfs2_is_stuffed(ip)) {
+ error = gfs2_unstuff_dinode(ip, NULL);
+ if (unlikely(error))
+ goto out;
+ }
+
+ curr = start;
+ offset = start << PAGE_CACHE_SHIFT;
+ from = start_offset;
+ to = PAGE_CACHE_SIZE;
+ while (curr <= end) {
+ page = grab_cache_page_write_begin(inode->i_mapping, curr,
+ AOP_FLAG_NOFS);
+ if (unlikely(!page)) {
+ error = -ENOMEM;
+ goto out;
+ }
+
+ if (curr == end)
+ to = end_offset;
+ error = write_empty_blocks(page, from, to);
+ if (!error && offset + to > inode->i_size &&
+ !(mode & FALLOC_FL_KEEP_SIZE)) {
+ i_size_write(inode, offset + to);
+ }
+ unlock_page(page);
+ page_cache_release(page);
+ if (error)
+ goto out;
+ curr++;
+ offset += PAGE_CACHE_SIZE;
+ from = 0;
+ }
+
+ gfs2_dinode_out(ip, dibh->b_data);
+ mark_inode_dirty(inode);
+
+ brelse(dibh);
+
+out:
+ return error;
+}
+
+static void calc_max_reserv(struct gfs2_inode *ip, loff_t max, loff_t *len,
+ unsigned int *data_blocks, unsigned int *ind_blocks)
+{
+ const struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
+ unsigned int max_blocks = ip->i_alloc->al_rgd->rd_free_clone;
+ unsigned int tmp, max_data = max_blocks - 3 * (sdp->sd_max_height - 1);
+
+ for (tmp = max_data; tmp > sdp->sd_diptrs;) {
+ tmp = DIV_ROUND_UP(tmp, sdp->sd_inptrs);
+ max_data -= tmp;
+ }
+ /* This calculation isn't the exact reverse of gfs2_write_calc_reserve,
+ so it might end up with fewer data blocks */
+ if (max_data <= *data_blocks)
+ return;
+ *data_blocks = max_data;
+ *ind_blocks = max_blocks - max_data;
+ *len = ((loff_t)max_data - 3) << sdp->sd_sb.sb_bsize_shift;
+ if (*len > max) {
+ *len = max;
+ gfs2_write_calc_reserv(ip, max, data_blocks, ind_blocks);
+ }
+}
+
+static long gfs2_fallocate(struct inode *inode, int mode, loff_t offset,
+ loff_t len)
+{
+ struct gfs2_sbd *sdp = GFS2_SB(inode);
+ struct gfs2_inode *ip = GFS2_I(inode);
+ unsigned int data_blocks = 0, ind_blocks = 0, rblocks;
+ loff_t bytes, max_bytes;
+ struct gfs2_alloc *al;
+ int error;
+ loff_t next = (offset + len - 1) >> sdp->sd_sb.sb_bsize_shift;
+ next = (next + 1) << sdp->sd_sb.sb_bsize_shift;
+
+ offset = (offset >> sdp->sd_sb.sb_bsize_shift) <<
+ sdp->sd_sb.sb_bsize_shift;
+
+ len = next - offset;
+ bytes = sdp->sd_max_rg_data * sdp->sd_sb.sb_bsize / 2;
+ if (!bytes)
+ bytes = UINT_MAX;
+
+ gfs2_holder_init(ip->i_gl, LM_ST_EXCLUSIVE, 0, &ip->i_gh);
+ error = gfs2_glock_nq(&ip->i_gh);
+ if (unlikely(error))
+ goto out_uninit;
+
+ if (!gfs2_write_alloc_required(ip, offset, len))
+ goto out_unlock;
+
+ while (len > 0) {
+ if (len < bytes)
+ bytes = len;
+ al = gfs2_alloc_get(ip);
+ if (!al) {
+ error = -ENOMEM;
+ goto out_unlock;
+ }
+
+ error = gfs2_quota_lock_check(ip);
+ if (error)
+ goto out_alloc_put;
+
+retry:
+ gfs2_write_calc_reserv(ip, bytes, &data_blocks, &ind_blocks);
+
+ al->al_requested = data_blocks + ind_blocks;
+ error = gfs2_inplace_reserve(ip);
+ if (error) {
+ if (error == -ENOSPC && bytes > sdp->sd_sb.sb_bsize) {
+ bytes >>= 1;
+ goto retry;
+ }
+ goto out_qunlock;
+ }
+ max_bytes = bytes;
+ calc_max_reserv(ip, len, &max_bytes, &data_blocks, &ind_blocks);
+ al->al_requested = data_blocks + ind_blocks;
+
+ rblocks = RES_DINODE + ind_blocks + RES_STATFS + RES_QUOTA +
+ RES_RG_HDR + gfs2_rg_blocks(al);
+ if (gfs2_is_jdata(ip))
+ rblocks += data_blocks ? data_blocks : 1;
+
+ error = gfs2_trans_begin(sdp, rblocks,
+ PAGE_CACHE_SIZE/sdp->sd_sb.sb_bsize);
+ if (error)
+ goto out_trans_fail;
+
+ error = fallocate_chunk(inode, offset, max_bytes, mode);
+ gfs2_trans_end(sdp);
+
+ if (error)
+ goto out_trans_fail;
+
+ len -= max_bytes;
+ offset += max_bytes;
+ gfs2_inplace_release(ip);
+ gfs2_quota_unlock(ip);
+ gfs2_alloc_put(ip);
+ }
+ goto out_unlock;
+
+out_trans_fail:
+ gfs2_inplace_release(ip);
+out_qunlock:
+ gfs2_quota_unlock(ip);
+out_alloc_put:
+ gfs2_alloc_put(ip);
+out_unlock:
+ gfs2_glock_dq(&ip->i_gh);
+out_uninit:
+ gfs2_holder_uninit(&ip->i_gh);
+ return error;
+}
+
+
static int gfs2_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,
u64 start, u64 len)
{
.getxattr = gfs2_getxattr,
.listxattr = gfs2_listxattr,
.removexattr = gfs2_removexattr,
+ .fallocate = gfs2_fallocate,
.fiemap = gfs2_fiemap,
};
goto out;
size = loc + sizeof(struct gfs2_quota);
- if (size > inode->i_size) {
- ip->i_disksize = size;
+ if (size > inode->i_size)
i_size_write(inode, size);
- }
inode->i_mtime = inode->i_atime = CURRENT_TIME;
gfs2_trans_add_bh(ip->i_gl, dibh, 1);
gfs2_dinode_out(ip, dibh->b_data);
goto out_alloc;
if (nalloc)
- blocks += al->al_rgd->rd_length + nalloc * ind_blocks + RES_STATFS;
+ blocks += gfs2_rg_blocks(al) + nalloc * ind_blocks + RES_STATFS;
error = gfs2_trans_begin(sdp, blocks, 0);
if (error)
int gfs2_quota_init(struct gfs2_sbd *sdp)
{
struct gfs2_inode *ip = GFS2_I(sdp->sd_qc_inode);
- unsigned int blocks = ip->i_disksize >> sdp->sd_sb.sb_bsize_shift;
+ u64 size = i_size_read(sdp->sd_qc_inode);
+ unsigned int blocks = size >> sdp->sd_sb.sb_bsize_shift;
unsigned int x, slot = 0;
unsigned int found = 0;
u64 dblock;
u32 extlen = 0;
int error;
- if (!ip->i_disksize || ip->i_disksize > (64 << 20) ||
- ip->i_disksize & (sdp->sd_sb.sb_bsize - 1)) {
- gfs2_consist_inode(ip);
+ if (gfs2_check_internal_file_size(sdp->sd_qc_inode, 1, 64 << 20))
return -EIO;
- }
+
sdp->sd_quota_slots = blocks * sdp->sd_qc_per_block;
sdp->sd_quota_chunks = DIV_ROUND_UP(sdp->sd_quota_slots, 8 * PAGE_SIZE);
error = gfs2_inplace_reserve(ip);
if (error)
goto out_alloc;
+ blocks += gfs2_rg_blocks(al);
}
error = gfs2_trans_begin(sdp, blocks + RES_DINODE + 1, 0);
int ro = 0;
unsigned int pass;
int error;
+ int jlocked = 0;
- if (jd->jd_jid != sdp->sd_lockstruct.ls_jid) {
+ if (sdp->sd_args.ar_spectator ||
+ (jd->jd_jid != sdp->sd_lockstruct.ls_jid)) {
fs_info(sdp, "jid=%u: Trying to acquire journal lock...\n",
jd->jd_jid);
-
+ jlocked = 1;
/* Acquire the journal lock so we can do recovery */
error = gfs2_glock_nq_num(sdp, jd->jd_jid, &gfs2_journal_glops,
jd->jd_jid, t);
}
- if (jd->jd_jid != sdp->sd_lockstruct.ls_jid)
- gfs2_glock_dq_uninit(&ji_gh);
-
gfs2_recovery_done(sdp, jd->jd_jid, LM_RD_SUCCESS);
- if (jd->jd_jid != sdp->sd_lockstruct.ls_jid)
+ if (jlocked) {
+ gfs2_glock_dq_uninit(&ji_gh);
gfs2_glock_dq_uninit(&j_gh);
+ }
fs_info(sdp, "jid=%u: Done\n", jd->jd_jid);
goto done;
fail_gunlock_tr:
gfs2_glock_dq_uninit(&t_gh);
fail_gunlock_ji:
- if (jd->jd_jid != sdp->sd_lockstruct.ls_jid) {
+ if (jlocked) {
gfs2_glock_dq_uninit(&ji_gh);
fail_gunlock_j:
gfs2_glock_dq_uninit(&j_gh);
for (rgrps = 0;; rgrps++) {
loff_t pos = rgrps * sizeof(struct gfs2_rindex);
- if (pos + sizeof(struct gfs2_rindex) >= ip->i_disksize)
+ if (pos + sizeof(struct gfs2_rindex) >= i_size_read(inode))
break;
error = gfs2_internal_read(ip, &ra_state, buf, &pos,
sizeof(struct gfs2_rindex));
struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
struct inode *inode = &ip->i_inode;
struct file_ra_state ra_state;
- u64 rgrp_count = ip->i_disksize;
+ u64 rgrp_count = i_size_read(inode);
+ struct gfs2_rgrpd *rgd;
+ unsigned int max_data = 0;
int error;
do_div(rgrp_count, sizeof(struct gfs2_rindex));
}
}
+ list_for_each_entry(rgd, &sdp->sd_rindex_list, rd_list)
+ if (rgd->rd_data > max_data)
+ max_data = rgd->rd_data;
+ sdp->sd_max_rg_data = max_data;
sdp->sd_rindex_uptodate = 1;
return 0;
}
struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
struct inode *inode = &ip->i_inode;
struct file_ra_state ra_state;
+ struct gfs2_rgrpd *rgd;
+ unsigned int max_data = 0;
int error;
file_ra_state_init(&ra_state, inode->i_mapping);
for (sdp->sd_rgrps = 0;; sdp->sd_rgrps++) {
/* Ignore partials */
if ((sdp->sd_rgrps + 1) * sizeof(struct gfs2_rindex) >
- ip->i_disksize)
+ i_size_read(inode))
break;
error = read_rindex_entry(ip, &ra_state);
if (error) {
return error;
}
}
+ list_for_each_entry(rgd, &sdp->sd_rindex_list, rd_list)
+ if (rgd->rd_data > max_data)
+ max_data = rgd->rd_data;
+ sdp->sd_max_rg_data = max_data;
sdp->sd_rindex_uptodate = 1;
return 0;
* Returns: errno
*/
-int gfs2_inplace_reserve_i(struct gfs2_inode *ip, char *file, unsigned int line)
+int gfs2_inplace_reserve_i(struct gfs2_inode *ip, int hold_rindex,
+ char *file, unsigned int line)
{
struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
struct gfs2_alloc *al = ip->i_alloc;
return -EINVAL;
try_again:
- /* We need to hold the rindex unless the inode we're using is
- the rindex itself, in which case it's already held. */
- if (ip != GFS2_I(sdp->sd_rindex))
- error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
- else if (!sdp->sd_rgrps) /* We may not have the rindex read in, so: */
- error = gfs2_ri_update_special(ip);
+ if (hold_rindex) {
+ /* We need to hold the rindex unless the inode we're using is
+ the rindex itself, in which case it's already held. */
+ if (ip != GFS2_I(sdp->sd_rindex))
+ error = gfs2_rindex_hold(sdp, &al->al_ri_gh);
+ else if (!sdp->sd_rgrps) /* We may not have the rindex read
+ in, so: */
+ error = gfs2_ri_update_special(ip);
+ }
if (error)
return error;
try to free it, and try the allocation again. */
error = get_local_rgrp(ip, &unlinked, &last_unlinked);
if (error) {
- if (ip != GFS2_I(sdp->sd_rindex))
+ if (hold_rindex && ip != GFS2_I(sdp->sd_rindex))
gfs2_glock_dq_uninit(&al->al_ri_gh);
if (error != -EAGAIN)
return error;
al->al_rgd = NULL;
if (al->al_rgd_gh.gh_gl)
gfs2_glock_dq_uninit(&al->al_rgd_gh);
- if (ip != GFS2_I(sdp->sd_rindex))
+ if (ip != GFS2_I(sdp->sd_rindex) && al->al_ri_gh.gh_gl)
gfs2_glock_dq_uninit(&al->al_ri_gh);
}
struct gfs2_sbd *sdp = GFS2_SB(&ip->i_inode);
struct buffer_head *dibh;
struct gfs2_alloc *al = ip->i_alloc;
- struct gfs2_rgrpd *rgd = al->al_rgd;
+ struct gfs2_rgrpd *rgd;
u32 goal, blk;
u64 block;
int error;
+ /* Only happens if there is a bug in gfs2, return something distinctive
+ * to ensure that it is noticed.
+ */
+ if (al == NULL)
+ return -ECANCELED;
+
+ rgd = al->al_rgd;
+
if (rgrp_contains_block(rgd, ip->i_goal))
goal = ip->i_goal - rgd->rd_data0;
else
ip->i_alloc = NULL;
}
-extern int gfs2_inplace_reserve_i(struct gfs2_inode *ip, char *file,
- unsigned int line);
+extern int gfs2_inplace_reserve_i(struct gfs2_inode *ip, int hold_rindex,
+ char *file, unsigned int line);
#define gfs2_inplace_reserve(ip) \
-gfs2_inplace_reserve_i((ip), __FILE__, __LINE__)
+ gfs2_inplace_reserve_i((ip), 1, __FILE__, __LINE__)
+#define gfs2_inplace_reserve_ri(ip) \
+ gfs2_inplace_reserve_i((ip), 0, __FILE__, __LINE__)
extern void gfs2_inplace_release(struct gfs2_inode *ip);
{Opt_locktable, "locktable=%s"},
{Opt_hostdata, "hostdata=%s"},
{Opt_spectator, "spectator"},
+ {Opt_spectator, "norecovery"},
{Opt_ignore_local_fs, "ignore_local_fs"},
{Opt_localflocks, "localflocks"},
{Opt_localcaching, "localcaching"},
args->ar_spectator = 1;
break;
case Opt_ignore_local_fs:
- args->ar_ignore_local_fs = 1;
+ /* Retained for backwards compat only */
break;
case Opt_localflocks:
args->ar_localflocks = 1;
break;
case Opt_localcaching:
- args->ar_localcaching = 1;
+ /* Retained for backwards compat only */
break;
case Opt_debug:
if (args->ar_errors == GFS2_ERRORS_PANIC) {
args->ar_debug = 0;
break;
case Opt_upgrade:
- args->ar_upgrade = 1;
+ /* Retained for backwards compat only */
break;
case Opt_acl:
args->ar_posix_acl = 1;
{
struct gfs2_inode *ip = GFS2_I(jd->jd_inode);
struct gfs2_sbd *sdp = GFS2_SB(jd->jd_inode);
+ u64 size = i_size_read(jd->jd_inode);
- if (ip->i_disksize < (8 << 20) || ip->i_disksize > (1 << 30) ||
- (ip->i_disksize & (sdp->sd_sb.sb_bsize - 1))) {
- gfs2_consist_inode(ip);
+ if (gfs2_check_internal_file_size(jd->jd_inode, 8 << 20, 1 << 30))
return -EIO;
- }
- jd->jd_blocks = ip->i_disksize >> sdp->sd_sb.sb_bsize_shift;
- if (gfs2_write_alloc_required(ip, 0, ip->i_disksize)) {
+ jd->jd_blocks = size >> sdp->sd_sb.sb_bsize_shift;
+
+ if (gfs2_write_alloc_required(ip, 0, size)) {
gfs2_consist_inode(ip);
return -EIO;
}
/* Some flags must not be changed */
if (args_neq(&args, &sdp->sd_args, spectator) ||
- args_neq(&args, &sdp->sd_args, ignore_local_fs) ||
args_neq(&args, &sdp->sd_args, localflocks) ||
- args_neq(&args, &sdp->sd_args, localcaching) ||
args_neq(&args, &sdp->sd_args, meta))
return -EINVAL;
seq_printf(s, ",hostdata=%s", args->ar_hostdata);
if (args->ar_spectator)
seq_printf(s, ",spectator");
- if (args->ar_ignore_local_fs)
- seq_printf(s, ",ignore_local_fs");
if (args->ar_localflocks)
seq_printf(s, ",localflocks");
- if (args->ar_localcaching)
- seq_printf(s, ",localcaching");
if (args->ar_debug)
seq_printf(s, ",debug");
- if (args->ar_upgrade)
- seq_printf(s, ",upgrade");
if (args->ar_posix_acl)
seq_printf(s, ",acl");
if (args->ar_quota != GFS2_QUOTA_DEFAULT) {
if (gltype > LM_TYPE_JOURNAL)
return -EINVAL;
- glops = gfs2_glops_list[gltype];
+ if (gltype == LM_TYPE_NONDISK && glnum == GFS2_TRANS_LOCK)
+ glops = &gfs2_trans_glops;
+ else
+ glops = gfs2_glops_list[gltype];
if (glops == NULL)
return -EINVAL;
if (!test_and_set_bit(SDF_DEMOTE, &sdp->sd_flags))
static ssize_t jid_show(struct gfs2_sbd *sdp, char *buf)
{
- return sprintf(buf, "%u\n", sdp->sd_lockstruct.ls_jid);
+ return sprintf(buf, "%d\n", sdp->sd_lockstruct.ls_jid);
}
static ssize_t jid_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
{
- unsigned jid;
+ int jid;
int rv;
- rv = sscanf(buf, "%u", &jid);
+ rv = sscanf(buf, "%d", &jid);
if (rv != 1)
return -EINVAL;
spin_lock(&sdp->sd_jindex_spin);
rv = -EINVAL;
- if (sdp->sd_args.ar_spectator)
- goto out;
if (sdp->sd_lockstruct.ls_ops->lm_mount == NULL)
goto out;
rv = -EBUSY;
- if (test_and_clear_bit(SDF_NOJOURNALID, &sdp->sd_flags) == 0)
+ if (test_bit(SDF_NOJOURNALID, &sdp->sd_flags) == 0)
goto out;
+ rv = 0;
+ if (sdp->sd_args.ar_spectator && jid > 0)
+ rv = jid = -EINVAL;
sdp->sd_lockstruct.ls_jid = jid;
+ clear_bit(SDF_NOJOURNALID, &sdp->sd_flags);
smp_mb__after_clear_bit();
wake_up_bit(&sdp->sd_flags, SDF_NOJOURNALID);
- rv = 0;
out:
spin_unlock(&sdp->sd_jindex_spin);
return rv ? rv : len;
add_uevent_var(env, "LOCKTABLE=%s", sdp->sd_table_name);
add_uevent_var(env, "LOCKPROTO=%s", sdp->sd_proto_name);
if (!test_bit(SDF_NOJOURNALID, &sdp->sd_flags))
- add_uevent_var(env, "JOURNALID=%u", sdp->sd_lockstruct.ls_jid);
+ add_uevent_var(env, "JOURNALID=%d", sdp->sd_lockstruct.ls_jid);
if (gfs2_uuid_valid(uuid))
add_uevent_var(env, "UUID=%pUB", uuid);
return 0;
{(1UL << GLF_INVALIDATE_IN_PROGRESS), "i" }, \
{(1UL << GLF_REPLY_PENDING), "r" }, \
{(1UL << GLF_INITIAL), "I" }, \
- {(1UL << GLF_FROZEN), "F" })
+ {(1UL << GLF_FROZEN), "F" }, \
+ {(1UL << GLF_QUEUED), "q" })
#ifndef NUMPTY
#define NUMPTY
#define RES_JDATA 1
#define RES_DATA 1
#define RES_LEAF 1
+#define RES_RG_HDR 1
#define RES_RG_BIT 2
#define RES_EATTR 1
#define RES_STATFS 1
#define RES_QUOTA 2
+/* reserve either the number of blocks to be allocated plus the rg header
+ * block, or all of the blocks in the rg, whichever is smaller */
+static inline unsigned int gfs2_rg_blocks(const struct gfs2_alloc *al)
+{
+ return (al->al_requested < al->al_rgd->rd_length)?
+ al->al_requested + 1 : al->al_rgd->rd_length;
+}
+
int gfs2_trans_begin(struct gfs2_sbd *sdp, unsigned int blocks,
unsigned int revokes);
goto out_gunlock_q;
error = gfs2_trans_begin(GFS2_SB(&ip->i_inode),
- blks + al->al_rgd->rd_length +
+ blks + gfs2_rg_blocks(al) +
RES_DINODE + RES_STATFS + RES_QUOTA, 0);
if (error)
goto out_ipres;
fd->search_key = ptr;
fd->key = ptr + tree->max_key_len + 2;
dprint(DBG_BNODE_REFS, "find_init: %d (%p)\n", tree->cnid, __builtin_return_address(0));
- down(&tree->tree_lock);
+ mutex_lock(&tree->tree_lock);
return 0;
}
hfs_bnode_put(fd->bnode);
kfree(fd->search_key);
dprint(DBG_BNODE_REFS, "find_exit: %d (%p)\n", fd->tree->cnid, __builtin_return_address(0));
- up(&fd->tree->tree_lock);
+ mutex_unlock(&fd->tree->tree_lock);
fd->tree = NULL;
}
if (!tree)
return NULL;
- init_MUTEX(&tree->tree_lock);
+ mutex_init(&tree->tree_lock);
spin_lock_init(&tree->hash_lock);
/* Set the correct compare function */
tree->sb = sb;
unsigned int depth;
//unsigned int map1_size, map_size;
- struct semaphore tree_lock;
+ struct mutex tree_lock;
unsigned int pages_per_bnode;
spinlock_t hash_lock;
fd->search_key = ptr;
fd->key = ptr + tree->max_key_len + 2;
dprint(DBG_BNODE_REFS, "find_init: %d (%p)\n", tree->cnid, __builtin_return_address(0));
- down(&tree->tree_lock);
+ mutex_lock(&tree->tree_lock);
return 0;
}
hfs_bnode_put(fd->bnode);
kfree(fd->search_key);
dprint(DBG_BNODE_REFS, "find_exit: %d (%p)\n", fd->tree->cnid, __builtin_return_address(0));
- up(&fd->tree->tree_lock);
+ mutex_unlock(&fd->tree->tree_lock);
fd->tree = NULL;
}
rec = (e + b) / 2;
len = hfs_brec_lenoff(bnode, rec, &off);
keylen = hfs_brec_keylen(bnode, rec);
+ if (keylen == 0) {
+ res = -EINVAL;
+ goto fail;
+ }
hfs_bnode_read(bnode, fd->key, off, keylen);
cmpval = bnode->tree->keycmp(fd->key, fd->search_key);
if (!cmpval) {
if (rec != e && e >= 0) {
len = hfs_brec_lenoff(bnode, e, &off);
keylen = hfs_brec_keylen(bnode, e);
+ if (keylen == 0) {
+ res = -EINVAL;
+ goto fail;
+ }
hfs_bnode_read(bnode, fd->key, off, keylen);
}
done:
fd->keylength = keylen;
fd->entryoffset = off + keylen;
fd->entrylength = len - keylen;
+fail:
return res;
}
len = hfs_brec_lenoff(bnode, fd->record, &off);
keylen = hfs_brec_keylen(bnode, fd->record);
+ if (keylen == 0) {
+ res = -EINVAL;
+ goto out;
+ }
fd->keyoffset = off;
fd->keylength = keylen;
fd->entryoffset = off + keylen;
int hfsplus_block_allocate(struct super_block *sb, u32 size, u32 offset, u32 *max)
{
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
struct page *page;
struct address_space *mapping;
__be32 *pptr, *curr, *end;
return size;
dprint(DBG_BITMAP, "block_allocate: %u,%u,%u\n", size, offset, len);
- mutex_lock(&HFSPLUS_SB(sb).alloc_file->i_mutex);
- mapping = HFSPLUS_SB(sb).alloc_file->i_mapping;
+ mutex_lock(&sbi->alloc_mutex);
+ mapping = sbi->alloc_file->i_mapping;
page = read_mapping_page(mapping, offset / PAGE_CACHE_BITS, NULL);
if (IS_ERR(page)) {
start = size;
set_page_dirty(page);
kunmap(page);
*max = offset + (curr - pptr) * 32 + i - start;
- HFSPLUS_SB(sb).free_blocks -= *max;
+ sbi->free_blocks -= *max;
sb->s_dirt = 1;
dprint(DBG_BITMAP, "-> %u,%u\n", start, *max);
out:
- mutex_unlock(&HFSPLUS_SB(sb).alloc_file->i_mutex);
+ mutex_unlock(&sbi->alloc_mutex);
return start;
}
int hfsplus_block_free(struct super_block *sb, u32 offset, u32 count)
{
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
struct page *page;
struct address_space *mapping;
__be32 *pptr, *curr, *end;
dprint(DBG_BITMAP, "block_free: %u,%u\n", offset, count);
/* are all of the bits in range? */
- if ((offset + count) > HFSPLUS_SB(sb).total_blocks)
+ if ((offset + count) > sbi->total_blocks)
return -2;
- mutex_lock(&HFSPLUS_SB(sb).alloc_file->i_mutex);
- mapping = HFSPLUS_SB(sb).alloc_file->i_mapping;
+ mutex_lock(&sbi->alloc_mutex);
+ mapping = sbi->alloc_file->i_mapping;
pnr = offset / PAGE_CACHE_BITS;
page = read_mapping_page(mapping, pnr, NULL);
pptr = kmap(page);
out:
set_page_dirty(page);
kunmap(page);
- HFSPLUS_SB(sb).free_blocks += len;
+ sbi->free_blocks += len;
sb->s_dirt = 1;
- mutex_unlock(&HFSPLUS_SB(sb).alloc_file->i_mutex);
+ mutex_unlock(&sbi->alloc_mutex);
return 0;
}
recoff = hfs_bnode_read_u16(node, node->tree->node_size - (rec + 1) * 2);
if (!recoff)
return 0;
- if (node->tree->attributes & HFS_TREE_BIGKEYS)
- retval = hfs_bnode_read_u16(node, recoff) + 2;
- else
- retval = (hfs_bnode_read_u8(node, recoff) | 1) + 1;
+
+ retval = hfs_bnode_read_u16(node, recoff) + 2;
+ if (retval > node->tree->max_key_len + 2) {
+ printk(KERN_ERR "hfs: keylen %d too large\n",
+ retval);
+ retval = 0;
+ }
}
return retval;
}
static struct hfs_bnode *hfs_bnode_split(struct hfs_find_data *fd)
{
struct hfs_btree *tree;
- struct hfs_bnode *node, *new_node;
+ struct hfs_bnode *node, *new_node, *next_node;
struct hfs_bnode_desc node_desc;
int num_recs, new_rec_off, new_off, old_rec_off;
int data_start, data_end, size;
new_node->type = node->type;
new_node->height = node->height;
+ if (node->next)
+ next_node = hfs_bnode_find(tree, node->next);
+ else
+ next_node = NULL;
+
+ if (IS_ERR(next_node)) {
+ hfs_bnode_put(node);
+ hfs_bnode_put(new_node);
+ return next_node;
+ }
+
size = tree->node_size / 2 - node->num_recs * 2 - 14;
old_rec_off = tree->node_size - 4;
num_recs = 1;
/* panic? */
hfs_bnode_put(node);
hfs_bnode_put(new_node);
+ if (next_node)
+ hfs_bnode_put(next_node);
return ERR_PTR(-ENOSPC);
}
hfs_bnode_write(node, &node_desc, 0, sizeof(node_desc));
/* update next bnode header */
- if (new_node->next) {
- struct hfs_bnode *next_node = hfs_bnode_find(tree, new_node->next);
+ if (next_node) {
next_node->prev = new_node->this;
hfs_bnode_read(next_node, &node_desc, 0, sizeof(node_desc));
node_desc.prev = cpu_to_be32(next_node->prev);
if (!tree)
return NULL;
- init_MUTEX(&tree->tree_lock);
+ mutex_init(&tree->tree_lock);
spin_lock_init(&tree->hash_lock);
tree->sb = sb;
tree->cnid = id;
goto free_tree;
tree->inode = inode;
+ if (!HFSPLUS_I(tree->inode)->first_blocks) {
+ printk(KERN_ERR
+ "hfs: invalid btree extent records (0 size).\n");
+ goto free_inode;
+ }
+
mapping = tree->inode->i_mapping;
page = read_mapping_page(mapping, 0, NULL);
if (IS_ERR(page))
- goto free_tree;
+ goto free_inode;
/* Load the header */
head = (struct hfs_btree_header_rec *)(kmap(page) + sizeof(struct hfs_bnode_desc));
tree->max_key_len = be16_to_cpu(head->max_key_len);
tree->depth = be16_to_cpu(head->depth);
- /* Set the correct compare function */
- if (id == HFSPLUS_EXT_CNID) {
+ /* Verify the tree and set the correct compare function */
+ switch (id) {
+ case HFSPLUS_EXT_CNID:
+ if (tree->max_key_len != HFSPLUS_EXT_KEYLEN - sizeof(u16)) {
+ printk(KERN_ERR "hfs: invalid extent max_key_len %d\n",
+ tree->max_key_len);
+ goto fail_page;
+ }
+ if (tree->attributes & HFS_TREE_VARIDXKEYS) {
+ printk(KERN_ERR "hfs: invalid extent btree flag\n");
+ goto fail_page;
+ }
+
tree->keycmp = hfsplus_ext_cmp_key;
- } else if (id == HFSPLUS_CAT_CNID) {
- if ((HFSPLUS_SB(sb).flags & HFSPLUS_SB_HFSX) &&
+ break;
+ case HFSPLUS_CAT_CNID:
+ if (tree->max_key_len != HFSPLUS_CAT_KEYLEN - sizeof(u16)) {
+ printk(KERN_ERR "hfs: invalid catalog max_key_len %d\n",
+ tree->max_key_len);
+ goto fail_page;
+ }
+ if (!(tree->attributes & HFS_TREE_VARIDXKEYS)) {
+ printk(KERN_ERR "hfs: invalid catalog btree flag\n");
+ goto fail_page;
+ }
+
+ if (test_bit(HFSPLUS_SB_HFSX, &HFSPLUS_SB(sb)->flags) &&
(head->key_type == HFSPLUS_KEY_BINARY))
tree->keycmp = hfsplus_cat_bin_cmp_key;
else {
tree->keycmp = hfsplus_cat_case_cmp_key;
- HFSPLUS_SB(sb).flags |= HFSPLUS_SB_CASEFOLD;
+ set_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags);
}
- } else {
+ break;
+ default:
printk(KERN_ERR "hfs: unknown B*Tree requested\n");
goto fail_page;
}
+ if (!(tree->attributes & HFS_TREE_BIGKEYS)) {
+ printk(KERN_ERR "hfs: invalid btree flag\n");
+ goto fail_page;
+ }
+
size = tree->node_size;
if (!is_power_of_2(size))
goto fail_page;
if (!tree->node_count)
goto fail_page;
+
tree->node_size_shift = ffs(size) - 1;
tree->pages_per_bnode = (tree->node_size + PAGE_CACHE_SIZE - 1) >> PAGE_CACHE_SHIFT;
return tree;
fail_page:
- tree->inode->i_mapping->a_ops = &hfsplus_aops;
page_cache_release(page);
- free_tree:
+ free_inode:
+ tree->inode->i_mapping->a_ops = &hfsplus_aops;
iput(tree->inode);
+ free_tree:
kfree(tree);
return NULL;
}
while (!tree->free_nodes) {
struct inode *inode = tree->inode;
+ struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
u32 count;
int res;
res = hfsplus_file_extend(inode);
if (res)
return ERR_PTR(res);
- HFSPLUS_I(inode).phys_size = inode->i_size =
- (loff_t)HFSPLUS_I(inode).alloc_blocks <<
- HFSPLUS_SB(tree->sb).alloc_blksz_shift;
- HFSPLUS_I(inode).fs_blocks = HFSPLUS_I(inode).alloc_blocks <<
- HFSPLUS_SB(tree->sb).fs_shift;
+ hip->phys_size = inode->i_size =
+ (loff_t)hip->alloc_blocks <<
+ HFSPLUS_SB(tree->sb)->alloc_blksz_shift;
+ hip->fs_blocks =
+ hip->alloc_blocks << HFSPLUS_SB(tree->sb)->fs_shift;
inode_set_bytes(inode, inode->i_size);
count = inode->i_size >> tree->node_size_shift;
tree->free_nodes = count - tree->node_count;
key->key_len = cpu_to_be16(6 + ustrlen);
}
-static void hfsplus_set_perms(struct inode *inode, struct hfsplus_perm *perms)
+void hfsplus_cat_set_perms(struct inode *inode, struct hfsplus_perm *perms)
{
if (inode->i_flags & S_IMMUTABLE)
perms->rootflags |= HFSPLUS_FLG_IMMUTABLE;
perms->rootflags |= HFSPLUS_FLG_APPEND;
else
perms->rootflags &= ~HFSPLUS_FLG_APPEND;
- HFSPLUS_I(inode).rootflags = perms->rootflags;
- HFSPLUS_I(inode).userflags = perms->userflags;
+
+ perms->userflags = HFSPLUS_I(inode)->userflags;
perms->mode = cpu_to_be16(inode->i_mode);
perms->owner = cpu_to_be32(inode->i_uid);
perms->group = cpu_to_be32(inode->i_gid);
+
+ if (S_ISREG(inode->i_mode))
+ perms->dev = cpu_to_be32(inode->i_nlink);
+ else if (S_ISBLK(inode->i_mode) || S_ISCHR(inode->i_mode))
+ perms->dev = cpu_to_be32(inode->i_rdev);
+ else
+ perms->dev = 0;
}
static int hfsplus_cat_build_record(hfsplus_cat_entry *entry, u32 cnid, struct inode *inode)
{
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
+
if (S_ISDIR(inode->i_mode)) {
struct hfsplus_cat_folder *folder;
memset(folder, 0, sizeof(*folder));
folder->type = cpu_to_be16(HFSPLUS_FOLDER);
folder->id = cpu_to_be32(inode->i_ino);
- HFSPLUS_I(inode).create_date =
+ HFSPLUS_I(inode)->create_date =
folder->create_date =
folder->content_mod_date =
folder->attribute_mod_date =
folder->access_date = hfsp_now2mt();
- hfsplus_set_perms(inode, &folder->permissions);
- if (inode == HFSPLUS_SB(inode->i_sb).hidden_dir)
+ hfsplus_cat_set_perms(inode, &folder->permissions);
+ if (inode == sbi->hidden_dir)
/* invisible and namelocked */
folder->user_info.frFlags = cpu_to_be16(0x5000);
return sizeof(*folder);
file->type = cpu_to_be16(HFSPLUS_FILE);
file->flags = cpu_to_be16(HFSPLUS_FILE_THREAD_EXISTS);
file->id = cpu_to_be32(cnid);
- HFSPLUS_I(inode).create_date =
+ HFSPLUS_I(inode)->create_date =
file->create_date =
file->content_mod_date =
file->attribute_mod_date =
file->access_date = hfsp_now2mt();
if (cnid == inode->i_ino) {
- hfsplus_set_perms(inode, &file->permissions);
+ hfsplus_cat_set_perms(inode, &file->permissions);
if (S_ISLNK(inode->i_mode)) {
file->user_info.fdType = cpu_to_be32(HFSP_SYMLINK_TYPE);
file->user_info.fdCreator = cpu_to_be32(HFSP_SYMLINK_CREATOR);
} else {
- file->user_info.fdType = cpu_to_be32(HFSPLUS_SB(inode->i_sb).type);
- file->user_info.fdCreator = cpu_to_be32(HFSPLUS_SB(inode->i_sb).creator);
+ file->user_info.fdType = cpu_to_be32(sbi->type);
+ file->user_info.fdCreator = cpu_to_be32(sbi->creator);
}
if ((file->permissions.rootflags | file->permissions.userflags) & HFSPLUS_FLG_IMMUTABLE)
file->flags |= cpu_to_be16(HFSPLUS_FILE_LOCKED);
file->user_info.fdType = cpu_to_be32(HFSP_HARDLINK_TYPE);
file->user_info.fdCreator = cpu_to_be32(HFSP_HFSPLUS_CREATOR);
file->user_info.fdFlags = cpu_to_be16(0x100);
- file->create_date = HFSPLUS_I(HFSPLUS_SB(inode->i_sb).hidden_dir).create_date;
- file->permissions.dev = cpu_to_be32(HFSPLUS_I(inode).dev);
+ file->create_date = HFSPLUS_I(sbi->hidden_dir)->create_date;
+ file->permissions.dev = cpu_to_be32(HFSPLUS_I(inode)->linkid);
}
return sizeof(*file);
}
int hfsplus_create_cat(u32 cnid, struct inode *dir, struct qstr *str, struct inode *inode)
{
+ struct super_block *sb = dir->i_sb;
struct hfs_find_data fd;
- struct super_block *sb;
hfsplus_cat_entry entry;
int entry_size;
int err;
dprint(DBG_CAT_MOD, "create_cat: %s,%u(%d)\n", str->name, cnid, inode->i_nlink);
- sb = dir->i_sb;
- hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+ hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
hfsplus_cat_build_key(sb, fd.search_key, cnid, NULL);
entry_size = hfsplus_fill_cat_thread(sb, &entry, S_ISDIR(inode->i_mode) ?
int hfsplus_delete_cat(u32 cnid, struct inode *dir, struct qstr *str)
{
- struct super_block *sb;
+ struct super_block *sb = dir->i_sb;
struct hfs_find_data fd;
struct hfsplus_fork_raw fork;
struct list_head *pos;
u16 type;
dprint(DBG_CAT_MOD, "delete_cat: %s,%u\n", str ? str->name : NULL, cnid);
- sb = dir->i_sb;
- hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+ hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
if (!str) {
int len;
hfsplus_free_fork(sb, cnid, &fork, HFSPLUS_TYPE_RSRC);
}
- list_for_each(pos, &HFSPLUS_I(dir).open_dir_list) {
+ list_for_each(pos, &HFSPLUS_I(dir)->open_dir_list) {
struct hfsplus_readdir_data *rd =
list_entry(pos, struct hfsplus_readdir_data, list);
if (fd.tree->keycmp(fd.search_key, (void *)&rd->key) < 0)
struct inode *src_dir, struct qstr *src_name,
struct inode *dst_dir, struct qstr *dst_name)
{
- struct super_block *sb;
+ struct super_block *sb = src_dir->i_sb;
struct hfs_find_data src_fd, dst_fd;
hfsplus_cat_entry entry;
int entry_size, type;
dprint(DBG_CAT_MOD, "rename_cat: %u - %lu,%s - %lu,%s\n", cnid, src_dir->i_ino, src_name->name,
dst_dir->i_ino, dst_name->name);
- sb = src_dir->i_sb;
- hfs_find_init(HFSPLUS_SB(sb).cat_tree, &src_fd);
+ hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &src_fd);
dst_fd = src_fd;
/* find the old dir entry and read the data */
dentry->d_op = &hfsplus_dentry_operations;
dentry->d_fsdata = NULL;
- hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+ hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
hfsplus_cat_build_key(sb, fd.search_key, dir->i_ino, &dentry->d_name);
again:
err = hfs_brec_read(&fd, &entry, sizeof(entry));
cnid = be32_to_cpu(entry.file.id);
if (entry.file.user_info.fdType == cpu_to_be32(HFSP_HARDLINK_TYPE) &&
entry.file.user_info.fdCreator == cpu_to_be32(HFSP_HFSPLUS_CREATOR) &&
- (entry.file.create_date == HFSPLUS_I(HFSPLUS_SB(sb).hidden_dir).create_date ||
- entry.file.create_date == HFSPLUS_I(sb->s_root->d_inode).create_date) &&
- HFSPLUS_SB(sb).hidden_dir) {
+ (entry.file.create_date == HFSPLUS_I(HFSPLUS_SB(sb)->hidden_dir)->create_date ||
+ entry.file.create_date == HFSPLUS_I(sb->s_root->d_inode)->create_date) &&
+ HFSPLUS_SB(sb)->hidden_dir) {
struct qstr str;
char name[32];
linkid = be32_to_cpu(entry.file.permissions.dev);
str.len = sprintf(name, "iNode%d", linkid);
str.name = name;
- hfsplus_cat_build_key(sb, fd.search_key, HFSPLUS_SB(sb).hidden_dir->i_ino, &str);
+ hfsplus_cat_build_key(sb, fd.search_key,
+ HFSPLUS_SB(sb)->hidden_dir->i_ino, &str);
goto again;
}
} else if (!dentry->d_fsdata)
if (IS_ERR(inode))
return ERR_CAST(inode);
if (S_ISREG(inode->i_mode))
- HFSPLUS_I(inode).dev = linkid;
+ HFSPLUS_I(inode)->linkid = linkid;
out:
d_add(dentry, inode);
return NULL;
if (filp->f_pos >= inode->i_size)
return 0;
- hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+ hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
hfsplus_cat_build_key(sb, fd.search_key, inode->i_ino, NULL);
err = hfs_brec_find(&fd);
if (err)
err = -EIO;
goto out;
}
- if (HFSPLUS_SB(sb).hidden_dir &&
- HFSPLUS_SB(sb).hidden_dir->i_ino == be32_to_cpu(entry.folder.id))
+ if (HFSPLUS_SB(sb)->hidden_dir &&
+ HFSPLUS_SB(sb)->hidden_dir->i_ino ==
+ be32_to_cpu(entry.folder.id))
goto next;
if (filldir(dirent, strbuf, len, filp->f_pos,
be32_to_cpu(entry.folder.id), DT_DIR))
}
filp->private_data = rd;
rd->file = filp;
- list_add(&rd->list, &HFSPLUS_I(inode).open_dir_list);
+ list_add(&rd->list, &HFSPLUS_I(inode)->open_dir_list);
}
memcpy(&rd->key, fd.key, sizeof(struct hfsplus_cat_key));
out:
{
struct hfsplus_readdir_data *rd = file->private_data;
if (rd) {
+ mutex_lock(&inode->i_mutex);
list_del(&rd->list);
+ mutex_unlock(&inode->i_mutex);
kfree(rd);
}
return 0;
}
-static int hfsplus_create(struct inode *dir, struct dentry *dentry, int mode,
- struct nameidata *nd)
-{
- struct inode *inode;
- int res;
-
- inode = hfsplus_new_inode(dir->i_sb, mode);
- if (!inode)
- return -ENOSPC;
-
- res = hfsplus_create_cat(inode->i_ino, dir, &dentry->d_name, inode);
- if (res) {
- inode->i_nlink = 0;
- hfsplus_delete_inode(inode);
- iput(inode);
- return res;
- }
- hfsplus_instantiate(dentry, inode, inode->i_ino);
- mark_inode_dirty(inode);
- return 0;
-}
-
static int hfsplus_link(struct dentry *src_dentry, struct inode *dst_dir,
struct dentry *dst_dentry)
{
- struct super_block *sb = dst_dir->i_sb;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(dst_dir->i_sb);
struct inode *inode = src_dentry->d_inode;
struct inode *src_dir = src_dentry->d_parent->d_inode;
struct qstr str;
if (HFSPLUS_IS_RSRC(inode))
return -EPERM;
+ if (!S_ISREG(inode->i_mode))
+ return -EPERM;
+ mutex_lock(&sbi->vh_mutex);
if (inode->i_ino == (u32)(unsigned long)src_dentry->d_fsdata) {
for (;;) {
get_random_bytes(&id, sizeof(cnid));
str.len = sprintf(name, "iNode%d", id);
res = hfsplus_rename_cat(inode->i_ino,
src_dir, &src_dentry->d_name,
- HFSPLUS_SB(sb).hidden_dir, &str);
+ sbi->hidden_dir, &str);
if (!res)
break;
if (res != -EEXIST)
- return res;
+ goto out;
}
- HFSPLUS_I(inode).dev = id;
- cnid = HFSPLUS_SB(sb).next_cnid++;
+ HFSPLUS_I(inode)->linkid = id;
+ cnid = sbi->next_cnid++;
src_dentry->d_fsdata = (void *)(unsigned long)cnid;
res = hfsplus_create_cat(cnid, src_dir, &src_dentry->d_name, inode);
if (res)
/* panic? */
- return res;
- HFSPLUS_SB(sb).file_count++;
+ goto out;
+ sbi->file_count++;
}
- cnid = HFSPLUS_SB(sb).next_cnid++;
+ cnid = sbi->next_cnid++;
res = hfsplus_create_cat(cnid, dst_dir, &dst_dentry->d_name, inode);
if (res)
- return res;
+ goto out;
inc_nlink(inode);
hfsplus_instantiate(dst_dentry, inode, cnid);
atomic_inc(&inode->i_count);
inode->i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(inode);
- HFSPLUS_SB(sb).file_count++;
- sb->s_dirt = 1;
-
- return 0;
+ sbi->file_count++;
+ dst_dir->i_sb->s_dirt = 1;
+out:
+ mutex_unlock(&sbi->vh_mutex);
+ return res;
}
static int hfsplus_unlink(struct inode *dir, struct dentry *dentry)
{
- struct super_block *sb = dir->i_sb;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(dir->i_sb);
struct inode *inode = dentry->d_inode;
struct qstr str;
char name[32];
if (HFSPLUS_IS_RSRC(inode))
return -EPERM;
+ mutex_lock(&sbi->vh_mutex);
cnid = (u32)(unsigned long)dentry->d_fsdata;
if (inode->i_ino == cnid &&
- atomic_read(&HFSPLUS_I(inode).opencnt)) {
+ atomic_read(&HFSPLUS_I(inode)->opencnt)) {
str.name = name;
str.len = sprintf(name, "temp%lu", inode->i_ino);
res = hfsplus_rename_cat(inode->i_ino,
dir, &dentry->d_name,
- HFSPLUS_SB(sb).hidden_dir, &str);
+ sbi->hidden_dir, &str);
if (!res)
inode->i_flags |= S_DEAD;
- return res;
+ goto out;
}
res = hfsplus_delete_cat(cnid, dir, &dentry->d_name);
if (res)
- return res;
+ goto out;
if (inode->i_nlink > 0)
drop_nlink(inode);
clear_nlink(inode);
if (!inode->i_nlink) {
if (inode->i_ino != cnid) {
- HFSPLUS_SB(sb).file_count--;
- if (!atomic_read(&HFSPLUS_I(inode).opencnt)) {
+ sbi->file_count--;
+ if (!atomic_read(&HFSPLUS_I(inode)->opencnt)) {
res = hfsplus_delete_cat(inode->i_ino,
- HFSPLUS_SB(sb).hidden_dir,
+ sbi->hidden_dir,
NULL);
if (!res)
hfsplus_delete_inode(inode);
} else
hfsplus_delete_inode(inode);
} else
- HFSPLUS_SB(sb).file_count--;
+ sbi->file_count--;
inode->i_ctime = CURRENT_TIME_SEC;
mark_inode_dirty(inode);
-
+out:
+ mutex_unlock(&sbi->vh_mutex);
return res;
}
-static int hfsplus_mkdir(struct inode *dir, struct dentry *dentry, int mode)
-{
- struct inode *inode;
- int res;
-
- inode = hfsplus_new_inode(dir->i_sb, S_IFDIR | mode);
- if (!inode)
- return -ENOSPC;
-
- res = hfsplus_create_cat(inode->i_ino, dir, &dentry->d_name, inode);
- if (res) {
- inode->i_nlink = 0;
- hfsplus_delete_inode(inode);
- iput(inode);
- return res;
- }
- hfsplus_instantiate(dentry, inode, inode->i_ino);
- mark_inode_dirty(inode);
- return 0;
-}
-
static int hfsplus_rmdir(struct inode *dir, struct dentry *dentry)
{
- struct inode *inode;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(dir->i_sb);
+ struct inode *inode = dentry->d_inode;
int res;
- inode = dentry->d_inode;
if (inode->i_size != 2)
return -ENOTEMPTY;
+
+ mutex_lock(&sbi->vh_mutex);
res = hfsplus_delete_cat(inode->i_ino, dir, &dentry->d_name);
if (res)
- return res;
+ goto out;
clear_nlink(inode);
inode->i_ctime = CURRENT_TIME_SEC;
hfsplus_delete_inode(inode);
mark_inode_dirty(inode);
- return 0;
+out:
+ mutex_unlock(&sbi->vh_mutex);
+ return res;
}
static int hfsplus_symlink(struct inode *dir, struct dentry *dentry,
const char *symname)
{
- struct super_block *sb;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(dir->i_sb);
struct inode *inode;
- int res;
+ int res = -ENOSPC;
- sb = dir->i_sb;
- inode = hfsplus_new_inode(sb, S_IFLNK | S_IRWXUGO);
+ mutex_lock(&sbi->vh_mutex);
+ inode = hfsplus_new_inode(dir->i_sb, S_IFLNK | S_IRWXUGO);
if (!inode)
- return -ENOSPC;
+ goto out;
res = page_symlink(inode, symname, strlen(symname) + 1);
- if (res) {
- inode->i_nlink = 0;
- hfsplus_delete_inode(inode);
- iput(inode);
- return res;
- }
+ if (res)
+ goto out_err;
- mark_inode_dirty(inode);
res = hfsplus_create_cat(inode->i_ino, dir, &dentry->d_name, inode);
+ if (res)
+ goto out_err;
- if (!res) {
- hfsplus_instantiate(dentry, inode, inode->i_ino);
- mark_inode_dirty(inode);
- }
+ hfsplus_instantiate(dentry, inode, inode->i_ino);
+ mark_inode_dirty(inode);
+ goto out;
+out_err:
+ inode->i_nlink = 0;
+ hfsplus_delete_inode(inode);
+ iput(inode);
+out:
+ mutex_unlock(&sbi->vh_mutex);
return res;
}
static int hfsplus_mknod(struct inode *dir, struct dentry *dentry,
int mode, dev_t rdev)
{
- struct super_block *sb;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(dir->i_sb);
struct inode *inode;
- int res;
+ int res = -ENOSPC;
- sb = dir->i_sb;
- inode = hfsplus_new_inode(sb, mode);
+ mutex_lock(&sbi->vh_mutex);
+ inode = hfsplus_new_inode(dir->i_sb, mode);
if (!inode)
- return -ENOSPC;
+ goto out;
+
+ if (S_ISBLK(mode) || S_ISCHR(mode) || S_ISFIFO(mode) || S_ISSOCK(mode))
+ init_special_inode(inode, mode, rdev);
res = hfsplus_create_cat(inode->i_ino, dir, &dentry->d_name, inode);
if (res) {
inode->i_nlink = 0;
hfsplus_delete_inode(inode);
iput(inode);
- return res;
+ goto out;
}
- init_special_inode(inode, mode, rdev);
+
hfsplus_instantiate(dentry, inode, inode->i_ino);
mark_inode_dirty(inode);
+out:
+ mutex_unlock(&sbi->vh_mutex);
+ return res;
+}
- return 0;
+static int hfsplus_create(struct inode *dir, struct dentry *dentry, int mode,
+ struct nameidata *nd)
+{
+ return hfsplus_mknod(dir, dentry, mode, 0);
+}
+
+static int hfsplus_mkdir(struct inode *dir, struct dentry *dentry, int mode)
+{
+ return hfsplus_mknod(dir, dentry, mode | S_IFDIR, 0);
}
static int hfsplus_rename(struct inode *old_dir, struct dentry *old_dentry,
/* Unlink destination if it already exists */
if (new_dentry->d_inode) {
- res = hfsplus_unlink(new_dir, new_dentry);
+ if (S_ISDIR(new_dentry->d_inode->i_mode))
+ res = hfsplus_rmdir(new_dir, new_dentry);
+ else
+ res = hfsplus_unlink(new_dir, new_dentry);
if (res)
return res;
}
static void __hfsplus_ext_write_extent(struct inode *inode, struct hfs_find_data *fd)
{
+ struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
int res;
- hfsplus_ext_build_key(fd->search_key, inode->i_ino, HFSPLUS_I(inode).cached_start,
- HFSPLUS_IS_RSRC(inode) ? HFSPLUS_TYPE_RSRC : HFSPLUS_TYPE_DATA);
+ WARN_ON(!mutex_is_locked(&hip->extents_lock));
+
+ hfsplus_ext_build_key(fd->search_key, inode->i_ino, hip->cached_start,
+ HFSPLUS_IS_RSRC(inode) ?
+ HFSPLUS_TYPE_RSRC : HFSPLUS_TYPE_DATA);
+
res = hfs_brec_find(fd);
- if (HFSPLUS_I(inode).flags & HFSPLUS_FLG_EXT_NEW) {
+ if (hip->flags & HFSPLUS_FLG_EXT_NEW) {
if (res != -ENOENT)
return;
- hfs_brec_insert(fd, HFSPLUS_I(inode).cached_extents, sizeof(hfsplus_extent_rec));
- HFSPLUS_I(inode).flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
+ hfs_brec_insert(fd, hip->cached_extents,
+ sizeof(hfsplus_extent_rec));
+ hip->flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
} else {
if (res)
return;
- hfs_bnode_write(fd->bnode, HFSPLUS_I(inode).cached_extents, fd->entryoffset, fd->entrylength);
- HFSPLUS_I(inode).flags &= ~HFSPLUS_FLG_EXT_DIRTY;
+ hfs_bnode_write(fd->bnode, hip->cached_extents,
+ fd->entryoffset, fd->entrylength);
+ hip->flags &= ~HFSPLUS_FLG_EXT_DIRTY;
}
}
-void hfsplus_ext_write_extent(struct inode *inode)
+static void hfsplus_ext_write_extent_locked(struct inode *inode)
{
- if (HFSPLUS_I(inode).flags & HFSPLUS_FLG_EXT_DIRTY) {
+ if (HFSPLUS_I(inode)->flags & HFSPLUS_FLG_EXT_DIRTY) {
struct hfs_find_data fd;
- hfs_find_init(HFSPLUS_SB(inode->i_sb).ext_tree, &fd);
+ hfs_find_init(HFSPLUS_SB(inode->i_sb)->ext_tree, &fd);
__hfsplus_ext_write_extent(inode, &fd);
hfs_find_exit(&fd);
}
}
+void hfsplus_ext_write_extent(struct inode *inode)
+{
+ mutex_lock(&HFSPLUS_I(inode)->extents_lock);
+ hfsplus_ext_write_extent_locked(inode);
+ mutex_unlock(&HFSPLUS_I(inode)->extents_lock);
+}
+
static inline int __hfsplus_ext_read_extent(struct hfs_find_data *fd,
struct hfsplus_extent *extent,
u32 cnid, u32 block, u8 type)
static inline int __hfsplus_ext_cache_extent(struct hfs_find_data *fd, struct inode *inode, u32 block)
{
+ struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
int res;
- if (HFSPLUS_I(inode).flags & HFSPLUS_FLG_EXT_DIRTY)
+ WARN_ON(!mutex_is_locked(&hip->extents_lock));
+
+ if (hip->flags & HFSPLUS_FLG_EXT_DIRTY)
__hfsplus_ext_write_extent(inode, fd);
- res = __hfsplus_ext_read_extent(fd, HFSPLUS_I(inode).cached_extents, inode->i_ino,
- block, HFSPLUS_IS_RSRC(inode) ? HFSPLUS_TYPE_RSRC : HFSPLUS_TYPE_DATA);
+ res = __hfsplus_ext_read_extent(fd, hip->cached_extents, inode->i_ino,
+ block, HFSPLUS_IS_RSRC(inode) ?
+ HFSPLUS_TYPE_RSRC :
+ HFSPLUS_TYPE_DATA);
if (!res) {
- HFSPLUS_I(inode).cached_start = be32_to_cpu(fd->key->ext.start_block);
- HFSPLUS_I(inode).cached_blocks = hfsplus_ext_block_count(HFSPLUS_I(inode).cached_extents);
+ hip->cached_start = be32_to_cpu(fd->key->ext.start_block);
+ hip->cached_blocks = hfsplus_ext_block_count(hip->cached_extents);
} else {
- HFSPLUS_I(inode).cached_start = HFSPLUS_I(inode).cached_blocks = 0;
- HFSPLUS_I(inode).flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
+ hip->cached_start = hip->cached_blocks = 0;
+ hip->flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
}
return res;
}
static int hfsplus_ext_read_extent(struct inode *inode, u32 block)
{
+ struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
struct hfs_find_data fd;
int res;
- if (block >= HFSPLUS_I(inode).cached_start &&
- block < HFSPLUS_I(inode).cached_start + HFSPLUS_I(inode).cached_blocks)
+ if (block >= hip->cached_start &&
+ block < hip->cached_start + hip->cached_blocks)
return 0;
- hfs_find_init(HFSPLUS_SB(inode->i_sb).ext_tree, &fd);
+ hfs_find_init(HFSPLUS_SB(inode->i_sb)->ext_tree, &fd);
res = __hfsplus_ext_cache_extent(&fd, inode, block);
hfs_find_exit(&fd);
return res;
int hfsplus_get_block(struct inode *inode, sector_t iblock,
struct buffer_head *bh_result, int create)
{
- struct super_block *sb;
+ struct super_block *sb = inode->i_sb;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
+ struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
int res = -EIO;
u32 ablock, dblock, mask;
int shift;
- sb = inode->i_sb;
-
/* Convert inode block to disk allocation block */
- shift = HFSPLUS_SB(sb).alloc_blksz_shift - sb->s_blocksize_bits;
- ablock = iblock >> HFSPLUS_SB(sb).fs_shift;
+ shift = sbi->alloc_blksz_shift - sb->s_blocksize_bits;
+ ablock = iblock >> sbi->fs_shift;
- if (iblock >= HFSPLUS_I(inode).fs_blocks) {
- if (iblock > HFSPLUS_I(inode).fs_blocks || !create)
+ if (iblock >= hip->fs_blocks) {
+ if (iblock > hip->fs_blocks || !create)
return -EIO;
- if (ablock >= HFSPLUS_I(inode).alloc_blocks) {
+ if (ablock >= hip->alloc_blocks) {
res = hfsplus_file_extend(inode);
if (res)
return res;
} else
create = 0;
- if (ablock < HFSPLUS_I(inode).first_blocks) {
- dblock = hfsplus_ext_find_block(HFSPLUS_I(inode).first_extents, ablock);
+ if (ablock < hip->first_blocks) {
+ dblock = hfsplus_ext_find_block(hip->first_extents, ablock);
goto done;
}
if (inode->i_ino == HFSPLUS_EXT_CNID)
return -EIO;
- mutex_lock(&HFSPLUS_I(inode).extents_lock);
+ mutex_lock(&hip->extents_lock);
res = hfsplus_ext_read_extent(inode, ablock);
if (!res) {
- dblock = hfsplus_ext_find_block(HFSPLUS_I(inode).cached_extents, ablock -
- HFSPLUS_I(inode).cached_start);
+ dblock = hfsplus_ext_find_block(hip->cached_extents,
+ ablock - hip->cached_start);
} else {
- mutex_unlock(&HFSPLUS_I(inode).extents_lock);
+ mutex_unlock(&hip->extents_lock);
return -EIO;
}
- mutex_unlock(&HFSPLUS_I(inode).extents_lock);
+ mutex_unlock(&hip->extents_lock);
done:
dprint(DBG_EXTENT, "get_block(%lu): %llu - %u\n", inode->i_ino, (long long)iblock, dblock);
- mask = (1 << HFSPLUS_SB(sb).fs_shift) - 1;
- map_bh(bh_result, sb, (dblock << HFSPLUS_SB(sb).fs_shift) + HFSPLUS_SB(sb).blockoffset + (iblock & mask));
+ mask = (1 << sbi->fs_shift) - 1;
+ map_bh(bh_result, sb, (dblock << sbi->fs_shift) + sbi->blockoffset + (iblock & mask));
if (create) {
set_buffer_new(bh_result);
- HFSPLUS_I(inode).phys_size += sb->s_blocksize;
- HFSPLUS_I(inode).fs_blocks++;
+ hip->phys_size += sb->s_blocksize;
+ hip->fs_blocks++;
inode_add_bytes(inode, sb->s_blocksize);
mark_inode_dirty(inode);
}
if (total_blocks == blocks)
return 0;
- hfs_find_init(HFSPLUS_SB(sb).ext_tree, &fd);
+ hfs_find_init(HFSPLUS_SB(sb)->ext_tree, &fd);
do {
res = __hfsplus_ext_read_extent(&fd, ext_entry, cnid,
total_blocks, type);
int hfsplus_file_extend(struct inode *inode)
{
struct super_block *sb = inode->i_sb;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
+ struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
u32 start, len, goal;
int res;
- if (HFSPLUS_SB(sb).alloc_file->i_size * 8 < HFSPLUS_SB(sb).total_blocks - HFSPLUS_SB(sb).free_blocks + 8) {
+ if (sbi->alloc_file->i_size * 8 <
+ sbi->total_blocks - sbi->free_blocks + 8) {
// extend alloc file
- printk(KERN_ERR "hfs: extend alloc file! (%Lu,%u,%u)\n", HFSPLUS_SB(sb).alloc_file->i_size * 8,
- HFSPLUS_SB(sb).total_blocks, HFSPLUS_SB(sb).free_blocks);
+ printk(KERN_ERR "hfs: extend alloc file! (%Lu,%u,%u)\n",
+ sbi->alloc_file->i_size * 8,
+ sbi->total_blocks, sbi->free_blocks);
return -ENOSPC;
}
- mutex_lock(&HFSPLUS_I(inode).extents_lock);
- if (HFSPLUS_I(inode).alloc_blocks == HFSPLUS_I(inode).first_blocks)
- goal = hfsplus_ext_lastblock(HFSPLUS_I(inode).first_extents);
+ mutex_lock(&hip->extents_lock);
+ if (hip->alloc_blocks == hip->first_blocks)
+ goal = hfsplus_ext_lastblock(hip->first_extents);
else {
- res = hfsplus_ext_read_extent(inode, HFSPLUS_I(inode).alloc_blocks);
+ res = hfsplus_ext_read_extent(inode, hip->alloc_blocks);
if (res)
goto out;
- goal = hfsplus_ext_lastblock(HFSPLUS_I(inode).cached_extents);
+ goal = hfsplus_ext_lastblock(hip->cached_extents);
}
- len = HFSPLUS_I(inode).clump_blocks;
- start = hfsplus_block_allocate(sb, HFSPLUS_SB(sb).total_blocks, goal, &len);
- if (start >= HFSPLUS_SB(sb).total_blocks) {
+ len = hip->clump_blocks;
+ start = hfsplus_block_allocate(sb, sbi->total_blocks, goal, &len);
+ if (start >= sbi->total_blocks) {
start = hfsplus_block_allocate(sb, goal, 0, &len);
if (start >= goal) {
res = -ENOSPC;
}
dprint(DBG_EXTENT, "extend %lu: %u,%u\n", inode->i_ino, start, len);
- if (HFSPLUS_I(inode).alloc_blocks <= HFSPLUS_I(inode).first_blocks) {
- if (!HFSPLUS_I(inode).first_blocks) {
+
+ if (hip->alloc_blocks <= hip->first_blocks) {
+ if (!hip->first_blocks) {
dprint(DBG_EXTENT, "first extents\n");
/* no extents yet */
- HFSPLUS_I(inode).first_extents[0].start_block = cpu_to_be32(start);
- HFSPLUS_I(inode).first_extents[0].block_count = cpu_to_be32(len);
+ hip->first_extents[0].start_block = cpu_to_be32(start);
+ hip->first_extents[0].block_count = cpu_to_be32(len);
res = 0;
} else {
/* try to append to extents in inode */
- res = hfsplus_add_extent(HFSPLUS_I(inode).first_extents,
- HFSPLUS_I(inode).alloc_blocks,
+ res = hfsplus_add_extent(hip->first_extents,
+ hip->alloc_blocks,
start, len);
if (res == -ENOSPC)
goto insert_extent;
}
if (!res) {
- hfsplus_dump_extent(HFSPLUS_I(inode).first_extents);
- HFSPLUS_I(inode).first_blocks += len;
+ hfsplus_dump_extent(hip->first_extents);
+ hip->first_blocks += len;
}
} else {
- res = hfsplus_add_extent(HFSPLUS_I(inode).cached_extents,
- HFSPLUS_I(inode).alloc_blocks -
- HFSPLUS_I(inode).cached_start,
+ res = hfsplus_add_extent(hip->cached_extents,
+ hip->alloc_blocks - hip->cached_start,
start, len);
if (!res) {
- hfsplus_dump_extent(HFSPLUS_I(inode).cached_extents);
- HFSPLUS_I(inode).flags |= HFSPLUS_FLG_EXT_DIRTY;
- HFSPLUS_I(inode).cached_blocks += len;
+ hfsplus_dump_extent(hip->cached_extents);
+ hip->flags |= HFSPLUS_FLG_EXT_DIRTY;
+ hip->cached_blocks += len;
} else if (res == -ENOSPC)
goto insert_extent;
}
out:
- mutex_unlock(&HFSPLUS_I(inode).extents_lock);
+ mutex_unlock(&hip->extents_lock);
if (!res) {
- HFSPLUS_I(inode).alloc_blocks += len;
+ hip->alloc_blocks += len;
mark_inode_dirty(inode);
}
return res;
insert_extent:
dprint(DBG_EXTENT, "insert new extent\n");
- hfsplus_ext_write_extent(inode);
+ hfsplus_ext_write_extent_locked(inode);
- memset(HFSPLUS_I(inode).cached_extents, 0, sizeof(hfsplus_extent_rec));
- HFSPLUS_I(inode).cached_extents[0].start_block = cpu_to_be32(start);
- HFSPLUS_I(inode).cached_extents[0].block_count = cpu_to_be32(len);
- hfsplus_dump_extent(HFSPLUS_I(inode).cached_extents);
- HFSPLUS_I(inode).flags |= HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW;
- HFSPLUS_I(inode).cached_start = HFSPLUS_I(inode).alloc_blocks;
- HFSPLUS_I(inode).cached_blocks = len;
+ memset(hip->cached_extents, 0, sizeof(hfsplus_extent_rec));
+ hip->cached_extents[0].start_block = cpu_to_be32(start);
+ hip->cached_extents[0].block_count = cpu_to_be32(len);
+ hfsplus_dump_extent(hip->cached_extents);
+ hip->flags |= HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW;
+ hip->cached_start = hip->alloc_blocks;
+ hip->cached_blocks = len;
res = 0;
goto out;
void hfsplus_file_truncate(struct inode *inode)
{
struct super_block *sb = inode->i_sb;
+ struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
struct hfs_find_data fd;
u32 alloc_cnt, blk_cnt, start;
int res;
- dprint(DBG_INODE, "truncate: %lu, %Lu -> %Lu\n", inode->i_ino,
- (long long)HFSPLUS_I(inode).phys_size, inode->i_size);
- if (inode->i_size > HFSPLUS_I(inode).phys_size) {
+ dprint(DBG_INODE, "truncate: %lu, %Lu -> %Lu\n",
+ inode->i_ino, (long long)hip->phys_size, inode->i_size);
+
+ if (inode->i_size > hip->phys_size) {
struct address_space *mapping = inode->i_mapping;
struct page *page;
void *fsdata;
return;
mark_inode_dirty(inode);
return;
- } else if (inode->i_size == HFSPLUS_I(inode).phys_size)
+ } else if (inode->i_size == hip->phys_size)
return;
- blk_cnt = (inode->i_size + HFSPLUS_SB(sb).alloc_blksz - 1) >> HFSPLUS_SB(sb).alloc_blksz_shift;
- alloc_cnt = HFSPLUS_I(inode).alloc_blocks;
+ blk_cnt = (inode->i_size + HFSPLUS_SB(sb)->alloc_blksz - 1) >>
+ HFSPLUS_SB(sb)->alloc_blksz_shift;
+ alloc_cnt = hip->alloc_blocks;
if (blk_cnt == alloc_cnt)
goto out;
- mutex_lock(&HFSPLUS_I(inode).extents_lock);
- hfs_find_init(HFSPLUS_SB(sb).ext_tree, &fd);
+ mutex_lock(&hip->extents_lock);
+ hfs_find_init(HFSPLUS_SB(sb)->ext_tree, &fd);
while (1) {
- if (alloc_cnt == HFSPLUS_I(inode).first_blocks) {
- hfsplus_free_extents(sb, HFSPLUS_I(inode).first_extents,
+ if (alloc_cnt == hip->first_blocks) {
+ hfsplus_free_extents(sb, hip->first_extents,
alloc_cnt, alloc_cnt - blk_cnt);
- hfsplus_dump_extent(HFSPLUS_I(inode).first_extents);
- HFSPLUS_I(inode).first_blocks = blk_cnt;
+ hfsplus_dump_extent(hip->first_extents);
+ hip->first_blocks = blk_cnt;
break;
}
res = __hfsplus_ext_cache_extent(&fd, inode, alloc_cnt);
if (res)
break;
- start = HFSPLUS_I(inode).cached_start;
- hfsplus_free_extents(sb, HFSPLUS_I(inode).cached_extents,
+ start = hip->cached_start;
+ hfsplus_free_extents(sb, hip->cached_extents,
alloc_cnt - start, alloc_cnt - blk_cnt);
- hfsplus_dump_extent(HFSPLUS_I(inode).cached_extents);
+ hfsplus_dump_extent(hip->cached_extents);
if (blk_cnt > start) {
- HFSPLUS_I(inode).flags |= HFSPLUS_FLG_EXT_DIRTY;
+ hip->flags |= HFSPLUS_FLG_EXT_DIRTY;
break;
}
alloc_cnt = start;
- HFSPLUS_I(inode).cached_start = HFSPLUS_I(inode).cached_blocks = 0;
- HFSPLUS_I(inode).flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
+ hip->cached_start = hip->cached_blocks = 0;
+ hip->flags &= ~(HFSPLUS_FLG_EXT_DIRTY | HFSPLUS_FLG_EXT_NEW);
hfs_brec_remove(&fd);
}
hfs_find_exit(&fd);
- mutex_unlock(&HFSPLUS_I(inode).extents_lock);
+ mutex_unlock(&hip->extents_lock);
- HFSPLUS_I(inode).alloc_blocks = blk_cnt;
+ hip->alloc_blocks = blk_cnt;
out:
- HFSPLUS_I(inode).phys_size = inode->i_size;
- HFSPLUS_I(inode).fs_blocks = (inode->i_size + sb->s_blocksize - 1) >> sb->s_blocksize_bits;
- inode_set_bytes(inode, HFSPLUS_I(inode).fs_blocks << sb->s_blocksize_bits);
+ hip->phys_size = inode->i_size;
+ hip->fs_blocks = (inode->i_size + sb->s_blocksize - 1) >> sb->s_blocksize_bits;
+ inode_set_bytes(inode, hip->fs_blocks << sb->s_blocksize_bits);
mark_inode_dirty(inode);
}
unsigned int depth;
//unsigned int map1_size, map_size;
- struct semaphore tree_lock;
+ struct mutex tree_lock;
unsigned int pages_per_bnode;
spinlock_t hash_lock;
u32 sect_count;
int fs_shift;
- /* Stuff in host order from Vol Header */
+ /* immutable data from the volume header */
u32 alloc_blksz;
int alloc_blksz_shift;
u32 total_blocks;
+ u32 data_clump_blocks, rsrc_clump_blocks;
+
+ /* mutable data from the volume header, protected by alloc_mutex */
u32 free_blocks;
- u32 next_alloc;
+ struct mutex alloc_mutex;
+
+ /* mutable data from the volume header, protected by vh_mutex */
u32 next_cnid;
u32 file_count;
u32 folder_count;
- u32 data_clump_blocks, rsrc_clump_blocks;
+ struct mutex vh_mutex;
/* Config options */
u32 creator;
int part, session;
unsigned long flags;
-
- struct hlist_head rsrc_inodes;
};
-#define HFSPLUS_SB_WRITEBACKUP 0x0001
-#define HFSPLUS_SB_NODECOMPOSE 0x0002
-#define HFSPLUS_SB_FORCE 0x0004
-#define HFSPLUS_SB_HFSX 0x0008
-#define HFSPLUS_SB_CASEFOLD 0x0010
+#define HFSPLUS_SB_WRITEBACKUP 0
+#define HFSPLUS_SB_NODECOMPOSE 1
+#define HFSPLUS_SB_FORCE 2
+#define HFSPLUS_SB_HFSX 3
+#define HFSPLUS_SB_CASEFOLD 4
struct hfsplus_inode_info {
- struct mutex extents_lock;
- u32 clump_blocks, alloc_blocks;
- sector_t fs_blocks;
- /* Allocation extents from catalog record or volume header */
- hfsplus_extent_rec first_extents;
- u32 first_blocks;
- hfsplus_extent_rec cached_extents;
- u32 cached_start, cached_blocks;
atomic_t opencnt;
- struct inode *rsrc_inode;
+ /*
+ * Extent allocation information, protected by extents_lock.
+ */
+ u32 first_blocks;
+ u32 clump_blocks;
+ u32 alloc_blocks;
+ u32 cached_start;
+ u32 cached_blocks;
+ hfsplus_extent_rec first_extents;
+ hfsplus_extent_rec cached_extents;
unsigned long flags;
+ struct mutex extents_lock;
+ /*
+ * Immutable data.
+ */
+ struct inode *rsrc_inode;
__be32 create_date;
- /* Device number in hfsplus_permissions in catalog */
- u32 dev;
- /* BSD system and user file flags */
- u8 rootflags;
- u8 userflags;
+ /*
+ * Protected by sbi->vh_mutex.
+ */
+ u32 linkid;
+
+ /*
+ * Protected by i_mutex.
+ */
+ sector_t fs_blocks;
+ u8 userflags; /* BSD user file flags */
struct list_head open_dir_list;
loff_t phys_size;
+
struct inode vfs_inode;
};
#define HFSPLUS_FLG_EXT_DIRTY 0x0002
#define HFSPLUS_FLG_EXT_NEW 0x0004
-#define HFSPLUS_IS_DATA(inode) (!(HFSPLUS_I(inode).flags & HFSPLUS_FLG_RSRC))
-#define HFSPLUS_IS_RSRC(inode) (HFSPLUS_I(inode).flags & HFSPLUS_FLG_RSRC)
+#define HFSPLUS_IS_DATA(inode) (!(HFSPLUS_I(inode)->flags & HFSPLUS_FLG_RSRC))
+#define HFSPLUS_IS_RSRC(inode) (HFSPLUS_I(inode)->flags & HFSPLUS_FLG_RSRC)
struct hfs_find_data {
/* filled by caller */
int hfsplus_delete_cat(u32, struct inode *, struct qstr *);
int hfsplus_rename_cat(u32, struct inode *, struct qstr *,
struct inode *, struct qstr *);
+void hfsplus_cat_set_perms(struct inode *inode, struct hfsplus_perm *perms);
/* dir.c */
extern const struct inode_operations hfsplus_dir_inode_operations;
int hfs_part_find(struct super_block *, sector_t *, sector_t *);
/* access macros */
-/*
static inline struct hfsplus_sb_info *HFSPLUS_SB(struct super_block *sb)
{
return sb->s_fs_info;
}
+
static inline struct hfsplus_inode_info *HFSPLUS_I(struct inode *inode)
{
return list_entry(inode, struct hfsplus_inode_info, vfs_inode);
}
-*/
-#define HFSPLUS_SB(super) (*(struct hfsplus_sb_info *)(super)->s_fs_info)
-#define HFSPLUS_I(inode) (*list_entry(inode, struct hfsplus_inode_info, vfs_inode))
-
-#if 1
-#define hfsplus_kmap(p) ({ struct page *__p = (p); kmap(__p); })
-#define hfsplus_kunmap(p) ({ struct page *__p = (p); kunmap(__p); __p; })
-#else
-#define hfsplus_kmap(p) kmap(p)
-#define hfsplus_kunmap(p) kunmap(p)
-#endif
#define sb_bread512(sb, sec, data) ({ \
struct buffer_head *__bh; \
#define hfsp_ut2mt(t) __hfsp_ut2mt((t).tv_sec)
#define hfsp_now2mt() __hfsp_ut2mt(get_seconds())
-#define kdev_t_to_nr(x) (x)
-
#endif
struct hfsplus_unistr name;
} __packed;
+#define HFSPLUS_CAT_KEYLEN (sizeof(struct hfsplus_cat_key))
/* Structs from hfs.h */
struct hfsp_point {
__be32 start_block;
} __packed;
-#define HFSPLUS_EXT_KEYLEN 12
+#define HFSPLUS_EXT_KEYLEN sizeof(struct hfsplus_ext_key)
/* HFS+ generic BTree key */
typedef union {
*pagep = NULL;
ret = cont_write_begin(file, mapping, pos, len, flags, pagep, fsdata,
hfsplus_get_block,
- &HFSPLUS_I(mapping->host).phys_size);
+ &HFSPLUS_I(mapping->host)->phys_size);
if (unlikely(ret)) {
loff_t isize = mapping->host->i_size;
if (pos + len > isize)
switch (inode->i_ino) {
case HFSPLUS_EXT_CNID:
- tree = HFSPLUS_SB(sb).ext_tree;
+ tree = HFSPLUS_SB(sb)->ext_tree;
break;
case HFSPLUS_CAT_CNID:
- tree = HFSPLUS_SB(sb).cat_tree;
+ tree = HFSPLUS_SB(sb)->cat_tree;
break;
case HFSPLUS_ATTR_CNID:
- tree = HFSPLUS_SB(sb).attr_tree;
+ tree = HFSPLUS_SB(sb)->attr_tree;
break;
default:
BUG();
struct hfs_find_data fd;
struct super_block *sb = dir->i_sb;
struct inode *inode = NULL;
+ struct hfsplus_inode_info *hip;
int err;
if (HFSPLUS_IS_RSRC(dir) || strcmp(dentry->d_name.name, "rsrc"))
goto out;
- inode = HFSPLUS_I(dir).rsrc_inode;
+ inode = HFSPLUS_I(dir)->rsrc_inode;
if (inode)
goto out;
if (!inode)
return ERR_PTR(-ENOMEM);
+ hip = HFSPLUS_I(inode);
inode->i_ino = dir->i_ino;
- INIT_LIST_HEAD(&HFSPLUS_I(inode).open_dir_list);
- mutex_init(&HFSPLUS_I(inode).extents_lock);
- HFSPLUS_I(inode).flags = HFSPLUS_FLG_RSRC;
+ INIT_LIST_HEAD(&hip->open_dir_list);
+ mutex_init(&hip->extents_lock);
+ hip->flags = HFSPLUS_FLG_RSRC;
- hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+ hfs_find_init(HFSPLUS_SB(sb)->cat_tree, &fd);
err = hfsplus_find_cat(sb, dir->i_ino, &fd);
if (!err)
err = hfsplus_cat_read_inode(inode, &fd);
iput(inode);
return ERR_PTR(err);
}
- HFSPLUS_I(inode).rsrc_inode = dir;
- HFSPLUS_I(dir).rsrc_inode = inode;
+ hip->rsrc_inode = dir;
+ HFSPLUS_I(dir)->rsrc_inode = inode;
igrab(dir);
- hlist_add_head(&inode->i_hash, &HFSPLUS_SB(sb).rsrc_inodes);
+
+ /*
+ * __mark_inode_dirty expects inodes to be hashed. Since we don't
+ * want resource fork inodes in the regular inode space, we make them
+ * appear hashed, but do not put on any lists. hlist_del()
+ * will work fine and require no locking.
+ */
+ inode->i_hash.pprev = &inode->i_hash.next;
+
mark_inode_dirty(inode);
out:
d_add(dentry, inode);
static void hfsplus_get_perms(struct inode *inode, struct hfsplus_perm *perms, int dir)
{
- struct super_block *sb = inode->i_sb;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
u16 mode;
mode = be16_to_cpu(perms->mode);
inode->i_uid = be32_to_cpu(perms->owner);
if (!inode->i_uid && !mode)
- inode->i_uid = HFSPLUS_SB(sb).uid;
+ inode->i_uid = sbi->uid;
inode->i_gid = be32_to_cpu(perms->group);
if (!inode->i_gid && !mode)
- inode->i_gid = HFSPLUS_SB(sb).gid;
+ inode->i_gid = sbi->gid;
if (dir) {
- mode = mode ? (mode & S_IALLUGO) :
- (S_IRWXUGO & ~(HFSPLUS_SB(sb).umask));
+ mode = mode ? (mode & S_IALLUGO) : (S_IRWXUGO & ~(sbi->umask));
mode |= S_IFDIR;
} else if (!mode)
- mode = S_IFREG | ((S_IRUGO|S_IWUGO) &
- ~(HFSPLUS_SB(sb).umask));
+ mode = S_IFREG | ((S_IRUGO|S_IWUGO) & ~(sbi->umask));
inode->i_mode = mode;
- HFSPLUS_I(inode).rootflags = perms->rootflags;
- HFSPLUS_I(inode).userflags = perms->userflags;
+ HFSPLUS_I(inode)->userflags = perms->userflags;
if (perms->rootflags & HFSPLUS_FLG_IMMUTABLE)
inode->i_flags |= S_IMMUTABLE;
else
inode->i_flags &= ~S_APPEND;
}
-static void hfsplus_set_perms(struct inode *inode, struct hfsplus_perm *perms)
-{
- if (inode->i_flags & S_IMMUTABLE)
- perms->rootflags |= HFSPLUS_FLG_IMMUTABLE;
- else
- perms->rootflags &= ~HFSPLUS_FLG_IMMUTABLE;
- if (inode->i_flags & S_APPEND)
- perms->rootflags |= HFSPLUS_FLG_APPEND;
- else
- perms->rootflags &= ~HFSPLUS_FLG_APPEND;
- perms->userflags = HFSPLUS_I(inode).userflags;
- perms->mode = cpu_to_be16(inode->i_mode);
- perms->owner = cpu_to_be32(inode->i_uid);
- perms->group = cpu_to_be32(inode->i_gid);
- perms->dev = cpu_to_be32(HFSPLUS_I(inode).dev);
-}
-
static int hfsplus_file_open(struct inode *inode, struct file *file)
{
if (HFSPLUS_IS_RSRC(inode))
- inode = HFSPLUS_I(inode).rsrc_inode;
+ inode = HFSPLUS_I(inode)->rsrc_inode;
if (!(file->f_flags & O_LARGEFILE) && i_size_read(inode) > MAX_NON_LFS)
return -EOVERFLOW;
- atomic_inc(&HFSPLUS_I(inode).opencnt);
+ atomic_inc(&HFSPLUS_I(inode)->opencnt);
return 0;
}
struct super_block *sb = inode->i_sb;
if (HFSPLUS_IS_RSRC(inode))
- inode = HFSPLUS_I(inode).rsrc_inode;
- if (atomic_dec_and_test(&HFSPLUS_I(inode).opencnt)) {
+ inode = HFSPLUS_I(inode)->rsrc_inode;
+ if (atomic_dec_and_test(&HFSPLUS_I(inode)->opencnt)) {
mutex_lock(&inode->i_mutex);
hfsplus_file_truncate(inode);
if (inode->i_flags & S_DEAD) {
- hfsplus_delete_cat(inode->i_ino, HFSPLUS_SB(sb).hidden_dir, NULL);
+ hfsplus_delete_cat(inode->i_ino,
+ HFSPLUS_SB(sb)->hidden_dir, NULL);
hfsplus_delete_inode(inode);
}
mutex_unlock(&inode->i_mutex);
struct inode *hfsplus_new_inode(struct super_block *sb, int mode)
{
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
struct inode *inode = new_inode(sb);
+ struct hfsplus_inode_info *hip;
+
if (!inode)
return NULL;
- inode->i_ino = HFSPLUS_SB(sb).next_cnid++;
+ inode->i_ino = sbi->next_cnid++;
inode->i_mode = mode;
inode->i_uid = current_fsuid();
inode->i_gid = current_fsgid();
inode->i_nlink = 1;
inode->i_mtime = inode->i_atime = inode->i_ctime = CURRENT_TIME_SEC;
- INIT_LIST_HEAD(&HFSPLUS_I(inode).open_dir_list);
- mutex_init(&HFSPLUS_I(inode).extents_lock);
- atomic_set(&HFSPLUS_I(inode).opencnt, 0);
- HFSPLUS_I(inode).flags = 0;
- memset(HFSPLUS_I(inode).first_extents, 0, sizeof(hfsplus_extent_rec));
- memset(HFSPLUS_I(inode).cached_extents, 0, sizeof(hfsplus_extent_rec));
- HFSPLUS_I(inode).alloc_blocks = 0;
- HFSPLUS_I(inode).first_blocks = 0;
- HFSPLUS_I(inode).cached_start = 0;
- HFSPLUS_I(inode).cached_blocks = 0;
- HFSPLUS_I(inode).phys_size = 0;
- HFSPLUS_I(inode).fs_blocks = 0;
- HFSPLUS_I(inode).rsrc_inode = NULL;
+
+ hip = HFSPLUS_I(inode);
+ INIT_LIST_HEAD(&hip->open_dir_list);
+ mutex_init(&hip->extents_lock);
+ atomic_set(&hip->opencnt, 0);
+ hip->flags = 0;
+ memset(hip->first_extents, 0, sizeof(hfsplus_extent_rec));
+ memset(hip->cached_extents, 0, sizeof(hfsplus_extent_rec));
+ hip->alloc_blocks = 0;
+ hip->first_blocks = 0;
+ hip->cached_start = 0;
+ hip->cached_blocks = 0;
+ hip->phys_size = 0;
+ hip->fs_blocks = 0;
+ hip->rsrc_inode = NULL;
if (S_ISDIR(inode->i_mode)) {
inode->i_size = 2;
- HFSPLUS_SB(sb).folder_count++;
+ sbi->folder_count++;
inode->i_op = &hfsplus_dir_inode_operations;
inode->i_fop = &hfsplus_dir_operations;
} else if (S_ISREG(inode->i_mode)) {
- HFSPLUS_SB(sb).file_count++;
+ sbi->file_count++;
inode->i_op = &hfsplus_file_inode_operations;
inode->i_fop = &hfsplus_file_operations;
inode->i_mapping->a_ops = &hfsplus_aops;
- HFSPLUS_I(inode).clump_blocks = HFSPLUS_SB(sb).data_clump_blocks;
+ hip->clump_blocks = sbi->data_clump_blocks;
} else if (S_ISLNK(inode->i_mode)) {
- HFSPLUS_SB(sb).file_count++;
+ sbi->file_count++;
inode->i_op = &page_symlink_inode_operations;
inode->i_mapping->a_ops = &hfsplus_aops;
- HFSPLUS_I(inode).clump_blocks = 1;
+ hip->clump_blocks = 1;
} else
- HFSPLUS_SB(sb).file_count++;
+ sbi->file_count++;
insert_inode_hash(inode);
mark_inode_dirty(inode);
sb->s_dirt = 1;
struct super_block *sb = inode->i_sb;
if (S_ISDIR(inode->i_mode)) {
- HFSPLUS_SB(sb).folder_count--;
+ HFSPLUS_SB(sb)->folder_count--;
sb->s_dirt = 1;
return;
}
- HFSPLUS_SB(sb).file_count--;
+ HFSPLUS_SB(sb)->file_count--;
if (S_ISREG(inode->i_mode)) {
if (!inode->i_nlink) {
inode->i_size = 0;
void hfsplus_inode_read_fork(struct inode *inode, struct hfsplus_fork_raw *fork)
{
struct super_block *sb = inode->i_sb;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
+ struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
u32 count;
int i;
- memcpy(&HFSPLUS_I(inode).first_extents, &fork->extents,
- sizeof(hfsplus_extent_rec));
+ memcpy(&hip->first_extents, &fork->extents, sizeof(hfsplus_extent_rec));
for (count = 0, i = 0; i < 8; i++)
count += be32_to_cpu(fork->extents[i].block_count);
- HFSPLUS_I(inode).first_blocks = count;
- memset(HFSPLUS_I(inode).cached_extents, 0, sizeof(hfsplus_extent_rec));
- HFSPLUS_I(inode).cached_start = 0;
- HFSPLUS_I(inode).cached_blocks = 0;
-
- HFSPLUS_I(inode).alloc_blocks = be32_to_cpu(fork->total_blocks);
- inode->i_size = HFSPLUS_I(inode).phys_size = be64_to_cpu(fork->total_size);
- HFSPLUS_I(inode).fs_blocks = (inode->i_size + sb->s_blocksize - 1) >> sb->s_blocksize_bits;
- inode_set_bytes(inode, HFSPLUS_I(inode).fs_blocks << sb->s_blocksize_bits);
- HFSPLUS_I(inode).clump_blocks = be32_to_cpu(fork->clump_size) >> HFSPLUS_SB(sb).alloc_blksz_shift;
- if (!HFSPLUS_I(inode).clump_blocks)
- HFSPLUS_I(inode).clump_blocks = HFSPLUS_IS_RSRC(inode) ? HFSPLUS_SB(sb).rsrc_clump_blocks :
- HFSPLUS_SB(sb).data_clump_blocks;
+ hip->first_blocks = count;
+ memset(hip->cached_extents, 0, sizeof(hfsplus_extent_rec));
+ hip->cached_start = 0;
+ hip->cached_blocks = 0;
+
+ hip->alloc_blocks = be32_to_cpu(fork->total_blocks);
+ hip->phys_size = inode->i_size = be64_to_cpu(fork->total_size);
+ hip->fs_blocks =
+ (inode->i_size + sb->s_blocksize - 1) >> sb->s_blocksize_bits;
+ inode_set_bytes(inode, hip->fs_blocks << sb->s_blocksize_bits);
+ hip->clump_blocks =
+ be32_to_cpu(fork->clump_size) >> sbi->alloc_blksz_shift;
+ if (!hip->clump_blocks) {
+ hip->clump_blocks = HFSPLUS_IS_RSRC(inode) ?
+ sbi->rsrc_clump_blocks :
+ sbi->data_clump_blocks;
+ }
}
void hfsplus_inode_write_fork(struct inode *inode, struct hfsplus_fork_raw *fork)
{
- memcpy(&fork->extents, &HFSPLUS_I(inode).first_extents,
+ memcpy(&fork->extents, &HFSPLUS_I(inode)->first_extents,
sizeof(hfsplus_extent_rec));
fork->total_size = cpu_to_be64(inode->i_size);
- fork->total_blocks = cpu_to_be32(HFSPLUS_I(inode).alloc_blocks);
+ fork->total_blocks = cpu_to_be32(HFSPLUS_I(inode)->alloc_blocks);
}
int hfsplus_cat_read_inode(struct inode *inode, struct hfs_find_data *fd)
type = hfs_bnode_read_u16(fd->bnode, fd->entryoffset);
- HFSPLUS_I(inode).dev = 0;
+ HFSPLUS_I(inode)->linkid = 0;
if (type == HFSPLUS_FOLDER) {
struct hfsplus_cat_folder *folder = &entry.folder;
inode->i_atime = hfsp_mt2ut(folder->access_date);
inode->i_mtime = hfsp_mt2ut(folder->content_mod_date);
inode->i_ctime = hfsp_mt2ut(folder->attribute_mod_date);
- HFSPLUS_I(inode).create_date = folder->create_date;
- HFSPLUS_I(inode).fs_blocks = 0;
+ HFSPLUS_I(inode)->create_date = folder->create_date;
+ HFSPLUS_I(inode)->fs_blocks = 0;
inode->i_op = &hfsplus_dir_inode_operations;
inode->i_fop = &hfsplus_dir_operations;
} else if (type == HFSPLUS_FILE) {
inode->i_atime = hfsp_mt2ut(file->access_date);
inode->i_mtime = hfsp_mt2ut(file->content_mod_date);
inode->i_ctime = hfsp_mt2ut(file->attribute_mod_date);
- HFSPLUS_I(inode).create_date = file->create_date;
+ HFSPLUS_I(inode)->create_date = file->create_date;
} else {
printk(KERN_ERR "hfs: bad catalog entry used to create inode\n");
res = -EIO;
hfsplus_cat_entry entry;
if (HFSPLUS_IS_RSRC(inode))
- main_inode = HFSPLUS_I(inode).rsrc_inode;
+ main_inode = HFSPLUS_I(inode)->rsrc_inode;
if (!main_inode->i_nlink)
return 0;
- if (hfs_find_init(HFSPLUS_SB(main_inode->i_sb).cat_tree, &fd))
+ if (hfs_find_init(HFSPLUS_SB(main_inode->i_sb)->cat_tree, &fd))
/* panic? */
return -EIO;
hfs_bnode_read(fd.bnode, &entry, fd.entryoffset,
sizeof(struct hfsplus_cat_folder));
/* simple node checks? */
- hfsplus_set_perms(inode, &folder->permissions);
+ hfsplus_cat_set_perms(inode, &folder->permissions);
folder->access_date = hfsp_ut2mt(inode->i_atime);
folder->content_mod_date = hfsp_ut2mt(inode->i_mtime);
folder->attribute_mod_date = hfsp_ut2mt(inode->i_ctime);
hfs_bnode_read(fd.bnode, &entry, fd.entryoffset,
sizeof(struct hfsplus_cat_file));
hfsplus_inode_write_fork(inode, &file->data_fork);
- if (S_ISREG(inode->i_mode))
- HFSPLUS_I(inode).dev = inode->i_nlink;
- if (S_ISCHR(inode->i_mode) || S_ISBLK(inode->i_mode))
- HFSPLUS_I(inode).dev = kdev_t_to_nr(inode->i_rdev);
- hfsplus_set_perms(inode, &file->permissions);
+ hfsplus_cat_set_perms(inode, &file->permissions);
if ((file->permissions.rootflags | file->permissions.userflags) & HFSPLUS_FLG_IMMUTABLE)
file->flags |= cpu_to_be16(HFSPLUS_FILE_LOCKED);
else
#include <linux/mount.h>
#include <linux/sched.h>
#include <linux/xattr.h>
-#include <linux/smp_lock.h>
#include <asm/uaccess.h>
#include "hfsplus_fs.h"
-long hfsplus_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
+static int hfsplus_ioctl_getflags(struct file *file, int __user *user_flags)
{
- struct inode *inode = filp->f_path.dentry->d_inode;
+ struct inode *inode = file->f_path.dentry->d_inode;
+ struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
+ unsigned int flags = 0;
+
+ if (inode->i_flags & S_IMMUTABLE)
+ flags |= FS_IMMUTABLE_FL;
+ if (inode->i_flags |= S_APPEND)
+ flags |= FS_APPEND_FL;
+ if (hip->userflags & HFSPLUS_FLG_NODUMP)
+ flags |= FS_NODUMP_FL;
+
+ return put_user(flags, user_flags);
+}
+
+static int hfsplus_ioctl_setflags(struct file *file, int __user *user_flags)
+{
+ struct inode *inode = file->f_path.dentry->d_inode;
+ struct hfsplus_inode_info *hip = HFSPLUS_I(inode);
unsigned int flags;
+ int err = 0;
- lock_kernel();
- switch (cmd) {
- case HFSPLUS_IOC_EXT2_GETFLAGS:
- flags = 0;
- if (HFSPLUS_I(inode).rootflags & HFSPLUS_FLG_IMMUTABLE)
- flags |= FS_IMMUTABLE_FL; /* EXT2_IMMUTABLE_FL */
- if (HFSPLUS_I(inode).rootflags & HFSPLUS_FLG_APPEND)
- flags |= FS_APPEND_FL; /* EXT2_APPEND_FL */
- if (HFSPLUS_I(inode).userflags & HFSPLUS_FLG_NODUMP)
- flags |= FS_NODUMP_FL; /* EXT2_NODUMP_FL */
- return put_user(flags, (int __user *)arg);
- case HFSPLUS_IOC_EXT2_SETFLAGS: {
- int err = 0;
- err = mnt_want_write(filp->f_path.mnt);
- if (err) {
- unlock_kernel();
- return err;
- }
+ err = mnt_want_write(file->f_path.mnt);
+ if (err)
+ goto out;
- if (!is_owner_or_cap(inode)) {
- err = -EACCES;
- goto setflags_out;
- }
- if (get_user(flags, (int __user *)arg)) {
- err = -EFAULT;
- goto setflags_out;
- }
- if (flags & (FS_IMMUTABLE_FL|FS_APPEND_FL) ||
- HFSPLUS_I(inode).rootflags & (HFSPLUS_FLG_IMMUTABLE|HFSPLUS_FLG_APPEND)) {
- if (!capable(CAP_LINUX_IMMUTABLE)) {
- err = -EPERM;
- goto setflags_out;
- }
- }
+ if (!is_owner_or_cap(inode)) {
+ err = -EACCES;
+ goto out_drop_write;
+ }
- /* don't silently ignore unsupported ext2 flags */
- if (flags & ~(FS_IMMUTABLE_FL|FS_APPEND_FL|FS_NODUMP_FL)) {
- err = -EOPNOTSUPP;
- goto setflags_out;
- }
- if (flags & FS_IMMUTABLE_FL) { /* EXT2_IMMUTABLE_FL */
- inode->i_flags |= S_IMMUTABLE;
- HFSPLUS_I(inode).rootflags |= HFSPLUS_FLG_IMMUTABLE;
- } else {
- inode->i_flags &= ~S_IMMUTABLE;
- HFSPLUS_I(inode).rootflags &= ~HFSPLUS_FLG_IMMUTABLE;
- }
- if (flags & FS_APPEND_FL) { /* EXT2_APPEND_FL */
- inode->i_flags |= S_APPEND;
- HFSPLUS_I(inode).rootflags |= HFSPLUS_FLG_APPEND;
- } else {
- inode->i_flags &= ~S_APPEND;
- HFSPLUS_I(inode).rootflags &= ~HFSPLUS_FLG_APPEND;
+ if (get_user(flags, user_flags)) {
+ err = -EFAULT;
+ goto out_drop_write;
+ }
+
+ mutex_lock(&inode->i_mutex);
+
+ if ((flags & (FS_IMMUTABLE_FL|FS_APPEND_FL)) ||
+ inode->i_flags & (S_IMMUTABLE|S_APPEND)) {
+ if (!capable(CAP_LINUX_IMMUTABLE)) {
+ err = -EPERM;
+ goto out_unlock_inode;
}
- if (flags & FS_NODUMP_FL) /* EXT2_NODUMP_FL */
- HFSPLUS_I(inode).userflags |= HFSPLUS_FLG_NODUMP;
- else
- HFSPLUS_I(inode).userflags &= ~HFSPLUS_FLG_NODUMP;
-
- inode->i_ctime = CURRENT_TIME_SEC;
- mark_inode_dirty(inode);
-setflags_out:
- mnt_drop_write(filp->f_path.mnt);
- unlock_kernel();
- return err;
}
+
+ /* don't silently ignore unsupported ext2 flags */
+ if (flags & ~(FS_IMMUTABLE_FL|FS_APPEND_FL|FS_NODUMP_FL)) {
+ err = -EOPNOTSUPP;
+ goto out_unlock_inode;
+ }
+
+ if (flags & FS_IMMUTABLE_FL)
+ inode->i_flags |= S_IMMUTABLE;
+ else
+ inode->i_flags &= ~S_IMMUTABLE;
+
+ if (flags & FS_APPEND_FL)
+ inode->i_flags |= S_APPEND;
+ else
+ inode->i_flags &= ~S_APPEND;
+
+ if (flags & FS_NODUMP_FL)
+ hip->userflags |= HFSPLUS_FLG_NODUMP;
+ else
+ hip->userflags &= ~HFSPLUS_FLG_NODUMP;
+
+ inode->i_ctime = CURRENT_TIME_SEC;
+ mark_inode_dirty(inode);
+
+out_unlock_inode:
+ mutex_lock(&inode->i_mutex);
+out_drop_write:
+ mnt_drop_write(file->f_path.mnt);
+out:
+ return err;
+}
+
+long hfsplus_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+ void __user *argp = (void __user *)arg;
+
+ switch (cmd) {
+ case HFSPLUS_IOC_EXT2_GETFLAGS:
+ return hfsplus_ioctl_getflags(file, argp);
+ case HFSPLUS_IOC_EXT2_SETFLAGS:
+ return hfsplus_ioctl_setflags(file, argp);
default:
- unlock_kernel();
return -ENOTTY;
}
}
if (!S_ISREG(inode->i_mode) || HFSPLUS_IS_RSRC(inode))
return -EOPNOTSUPP;
- res = hfs_find_init(HFSPLUS_SB(inode->i_sb).cat_tree, &fd);
+ res = hfs_find_init(HFSPLUS_SB(inode->i_sb)->cat_tree, &fd);
if (res)
return res;
res = hfsplus_find_cat(inode->i_sb, inode->i_ino, &fd);
return -EOPNOTSUPP;
if (size) {
- res = hfs_find_init(HFSPLUS_SB(inode->i_sb).cat_tree, &fd);
+ res = hfs_find_init(HFSPLUS_SB(inode->i_sb)->cat_tree, &fd);
if (res)
return res;
res = hfsplus_find_cat(inode->i_sb, inode->i_ino, &fd);
} else
res = size ? -ERANGE : 4;
} else
- res = -ENODATA;
+ res = -EOPNOTSUPP;
out:
if (size)
hfs_find_exit(&fd);
kfree(p);
break;
case opt_decompose:
- sbi->flags &= ~HFSPLUS_SB_NODECOMPOSE;
+ clear_bit(HFSPLUS_SB_NODECOMPOSE, &sbi->flags);
break;
case opt_nodecompose:
- sbi->flags |= HFSPLUS_SB_NODECOMPOSE;
+ set_bit(HFSPLUS_SB_NODECOMPOSE, &sbi->flags);
break;
case opt_force:
- sbi->flags |= HFSPLUS_SB_FORCE;
+ set_bit(HFSPLUS_SB_FORCE, &sbi->flags);
break;
default:
return 0;
int hfsplus_show_options(struct seq_file *seq, struct vfsmount *mnt)
{
- struct hfsplus_sb_info *sbi = &HFSPLUS_SB(mnt->mnt_sb);
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(mnt->mnt_sb);
if (sbi->creator != HFSPLUS_DEF_CR_TYPE)
seq_printf(seq, ",creator=%.4s", (char *)&sbi->creator);
seq_printf(seq, ",session=%u", sbi->session);
if (sbi->nls)
seq_printf(seq, ",nls=%s", sbi->nls->charset);
- if (sbi->flags & HFSPLUS_SB_NODECOMPOSE)
+ if (test_bit(HFSPLUS_SB_NODECOMPOSE, &sbi->flags))
seq_printf(seq, ",nodecompose");
return 0;
}
int hfs_part_find(struct super_block *sb,
sector_t *part_start, sector_t *part_size)
{
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
struct buffer_head *bh;
__be16 *data;
int i, size, res;
for (i = 0; i < size; p++, i++) {
if (p->pdStart && p->pdSize &&
p->pdFSID == cpu_to_be32(0x54465331)/*"TFS1"*/ &&
- (HFSPLUS_SB(sb).part < 0 || HFSPLUS_SB(sb).part == i)) {
+ (sbi->part < 0 || sbi->part == i)) {
*part_start += be32_to_cpu(p->pdStart);
*part_size = be32_to_cpu(p->pdSize);
res = 0;
size = be32_to_cpu(pm->pmMapBlkCnt);
for (i = 0; i < size;) {
if (!memcmp(pm->pmPartType,"Apple_HFS", 9) &&
- (HFSPLUS_SB(sb).part < 0 || HFSPLUS_SB(sb).part == i)) {
+ (sbi->part < 0 || sbi->part == i)) {
*part_start += be32_to_cpu(pm->pmPyPartStart);
*part_size = be32_to_cpu(pm->pmPartBlkCnt);
res = 0;
#include <linux/pagemap.h>
#include <linux/fs.h>
#include <linux/slab.h>
-#include <linux/smp_lock.h>
#include <linux/vfs.h>
#include <linux/nls.h>
#include "hfsplus_fs.h"
-struct inode *hfsplus_iget(struct super_block *sb, unsigned long ino)
+static int hfsplus_system_read_inode(struct inode *inode)
{
- struct hfs_find_data fd;
- struct hfsplus_vh *vhdr;
- struct inode *inode;
- long err = -EIO;
-
- inode = iget_locked(sb, ino);
- if (!inode)
- return ERR_PTR(-ENOMEM);
- if (!(inode->i_state & I_NEW))
- return inode;
+ struct hfsplus_vh *vhdr = HFSPLUS_SB(inode->i_sb)->s_vhdr;
- INIT_LIST_HEAD(&HFSPLUS_I(inode).open_dir_list);
- mutex_init(&HFSPLUS_I(inode).extents_lock);
- HFSPLUS_I(inode).flags = 0;
- HFSPLUS_I(inode).rsrc_inode = NULL;
- atomic_set(&HFSPLUS_I(inode).opencnt, 0);
-
- if (inode->i_ino >= HFSPLUS_FIRSTUSER_CNID) {
- read_inode:
- hfs_find_init(HFSPLUS_SB(inode->i_sb).cat_tree, &fd);
- err = hfsplus_find_cat(inode->i_sb, inode->i_ino, &fd);
- if (!err)
- err = hfsplus_cat_read_inode(inode, &fd);
- hfs_find_exit(&fd);
- if (err)
- goto bad_inode;
- goto done;
- }
- vhdr = HFSPLUS_SB(inode->i_sb).s_vhdr;
- switch(inode->i_ino) {
- case HFSPLUS_ROOT_CNID:
- goto read_inode;
+ switch (inode->i_ino) {
case HFSPLUS_EXT_CNID:
hfsplus_inode_read_fork(inode, &vhdr->ext_file);
inode->i_mapping->a_ops = &hfsplus_btree_aops;
inode->i_mapping->a_ops = &hfsplus_btree_aops;
break;
default:
- goto bad_inode;
+ return -EIO;
+ }
+
+ return 0;
+}
+
+struct inode *hfsplus_iget(struct super_block *sb, unsigned long ino)
+{
+ struct hfs_find_data fd;
+ struct inode *inode;
+ int err;
+
+ inode = iget_locked(sb, ino);
+ if (!inode)
+ return ERR_PTR(-ENOMEM);
+ if (!(inode->i_state & I_NEW))
+ return inode;
+
+ INIT_LIST_HEAD(&HFSPLUS_I(inode)->open_dir_list);
+ mutex_init(&HFSPLUS_I(inode)->extents_lock);
+ HFSPLUS_I(inode)->flags = 0;
+ HFSPLUS_I(inode)->rsrc_inode = NULL;
+ atomic_set(&HFSPLUS_I(inode)->opencnt, 0);
+
+ if (inode->i_ino >= HFSPLUS_FIRSTUSER_CNID ||
+ inode->i_ino == HFSPLUS_ROOT_CNID) {
+ hfs_find_init(HFSPLUS_SB(inode->i_sb)->cat_tree, &fd);
+ err = hfsplus_find_cat(inode->i_sb, inode->i_ino, &fd);
+ if (!err)
+ err = hfsplus_cat_read_inode(inode, &fd);
+ hfs_find_exit(&fd);
+ } else {
+ err = hfsplus_system_read_inode(inode);
+ }
+
+ if (err) {
+ iget_failed(inode);
+ return ERR_PTR(err);
}
-done:
unlock_new_inode(inode);
return inode;
-
-bad_inode:
- iget_failed(inode);
- return ERR_PTR(err);
}
-static int hfsplus_write_inode(struct inode *inode,
- struct writeback_control *wbc)
+static int hfsplus_system_write_inode(struct inode *inode)
{
- struct hfsplus_vh *vhdr;
- int ret = 0;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(inode->i_sb);
+ struct hfsplus_vh *vhdr = sbi->s_vhdr;
+ struct hfsplus_fork_raw *fork;
+ struct hfs_btree *tree = NULL;
- dprint(DBG_INODE, "hfsplus_write_inode: %lu\n", inode->i_ino);
- hfsplus_ext_write_extent(inode);
- if (inode->i_ino >= HFSPLUS_FIRSTUSER_CNID) {
- return hfsplus_cat_write_inode(inode);
- }
- vhdr = HFSPLUS_SB(inode->i_sb).s_vhdr;
switch (inode->i_ino) {
- case HFSPLUS_ROOT_CNID:
- ret = hfsplus_cat_write_inode(inode);
- break;
case HFSPLUS_EXT_CNID:
- if (vhdr->ext_file.total_size != cpu_to_be64(inode->i_size)) {
- HFSPLUS_SB(inode->i_sb).flags |= HFSPLUS_SB_WRITEBACKUP;
- inode->i_sb->s_dirt = 1;
- }
- hfsplus_inode_write_fork(inode, &vhdr->ext_file);
- hfs_btree_write(HFSPLUS_SB(inode->i_sb).ext_tree);
+ fork = &vhdr->ext_file;
+ tree = sbi->ext_tree;
break;
case HFSPLUS_CAT_CNID:
- if (vhdr->cat_file.total_size != cpu_to_be64(inode->i_size)) {
- HFSPLUS_SB(inode->i_sb).flags |= HFSPLUS_SB_WRITEBACKUP;
- inode->i_sb->s_dirt = 1;
- }
- hfsplus_inode_write_fork(inode, &vhdr->cat_file);
- hfs_btree_write(HFSPLUS_SB(inode->i_sb).cat_tree);
+ fork = &vhdr->cat_file;
+ tree = sbi->cat_tree;
break;
case HFSPLUS_ALLOC_CNID:
- if (vhdr->alloc_file.total_size != cpu_to_be64(inode->i_size)) {
- HFSPLUS_SB(inode->i_sb).flags |= HFSPLUS_SB_WRITEBACKUP;
- inode->i_sb->s_dirt = 1;
- }
- hfsplus_inode_write_fork(inode, &vhdr->alloc_file);
+ fork = &vhdr->alloc_file;
break;
case HFSPLUS_START_CNID:
- if (vhdr->start_file.total_size != cpu_to_be64(inode->i_size)) {
- HFSPLUS_SB(inode->i_sb).flags |= HFSPLUS_SB_WRITEBACKUP;
- inode->i_sb->s_dirt = 1;
- }
- hfsplus_inode_write_fork(inode, &vhdr->start_file);
+ fork = &vhdr->start_file;
break;
case HFSPLUS_ATTR_CNID:
- if (vhdr->attr_file.total_size != cpu_to_be64(inode->i_size)) {
- HFSPLUS_SB(inode->i_sb).flags |= HFSPLUS_SB_WRITEBACKUP;
- inode->i_sb->s_dirt = 1;
- }
- hfsplus_inode_write_fork(inode, &vhdr->attr_file);
- hfs_btree_write(HFSPLUS_SB(inode->i_sb).attr_tree);
- break;
+ fork = &vhdr->attr_file;
+ tree = sbi->attr_tree;
+ default:
+ return -EIO;
+ }
+
+ if (fork->total_size != cpu_to_be64(inode->i_size)) {
+ set_bit(HFSPLUS_SB_WRITEBACKUP, &sbi->flags);
+ inode->i_sb->s_dirt = 1;
}
- return ret;
+ hfsplus_inode_write_fork(inode, fork);
+ if (tree)
+ hfs_btree_write(tree);
+ return 0;
+}
+
+static int hfsplus_write_inode(struct inode *inode,
+ struct writeback_control *wbc)
+{
+ dprint(DBG_INODE, "hfsplus_write_inode: %lu\n", inode->i_ino);
+
+ hfsplus_ext_write_extent(inode);
+
+ if (inode->i_ino >= HFSPLUS_FIRSTUSER_CNID ||
+ inode->i_ino == HFSPLUS_ROOT_CNID)
+ return hfsplus_cat_write_inode(inode);
+ else
+ return hfsplus_system_write_inode(inode);
}
static void hfsplus_evict_inode(struct inode *inode)
truncate_inode_pages(&inode->i_data, 0);
end_writeback(inode);
if (HFSPLUS_IS_RSRC(inode)) {
- HFSPLUS_I(HFSPLUS_I(inode).rsrc_inode).rsrc_inode = NULL;
- iput(HFSPLUS_I(inode).rsrc_inode);
+ HFSPLUS_I(HFSPLUS_I(inode)->rsrc_inode)->rsrc_inode = NULL;
+ iput(HFSPLUS_I(inode)->rsrc_inode);
}
}
int hfsplus_sync_fs(struct super_block *sb, int wait)
{
- struct hfsplus_vh *vhdr = HFSPLUS_SB(sb).s_vhdr;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
+ struct hfsplus_vh *vhdr = sbi->s_vhdr;
dprint(DBG_SUPER, "hfsplus_write_super\n");
- lock_super(sb);
+ mutex_lock(&sbi->vh_mutex);
+ mutex_lock(&sbi->alloc_mutex);
sb->s_dirt = 0;
- vhdr->free_blocks = cpu_to_be32(HFSPLUS_SB(sb).free_blocks);
- vhdr->next_alloc = cpu_to_be32(HFSPLUS_SB(sb).next_alloc);
- vhdr->next_cnid = cpu_to_be32(HFSPLUS_SB(sb).next_cnid);
- vhdr->folder_count = cpu_to_be32(HFSPLUS_SB(sb).folder_count);
- vhdr->file_count = cpu_to_be32(HFSPLUS_SB(sb).file_count);
+ vhdr->free_blocks = cpu_to_be32(sbi->free_blocks);
+ vhdr->next_cnid = cpu_to_be32(sbi->next_cnid);
+ vhdr->folder_count = cpu_to_be32(sbi->folder_count);
+ vhdr->file_count = cpu_to_be32(sbi->file_count);
- mark_buffer_dirty(HFSPLUS_SB(sb).s_vhbh);
- if (HFSPLUS_SB(sb).flags & HFSPLUS_SB_WRITEBACKUP) {
- if (HFSPLUS_SB(sb).sect_count) {
+ mark_buffer_dirty(sbi->s_vhbh);
+ if (test_and_clear_bit(HFSPLUS_SB_WRITEBACKUP, &sbi->flags)) {
+ if (sbi->sect_count) {
struct buffer_head *bh;
u32 block, offset;
- block = HFSPLUS_SB(sb).blockoffset;
- block += (HFSPLUS_SB(sb).sect_count - 2) >> (sb->s_blocksize_bits - 9);
- offset = ((HFSPLUS_SB(sb).sect_count - 2) << 9) & (sb->s_blocksize - 1);
- printk(KERN_DEBUG "hfs: backup: %u,%u,%u,%u\n", HFSPLUS_SB(sb).blockoffset,
- HFSPLUS_SB(sb).sect_count, block, offset);
+ block = sbi->blockoffset;
+ block += (sbi->sect_count - 2) >> (sb->s_blocksize_bits - 9);
+ offset = ((sbi->sect_count - 2) << 9) & (sb->s_blocksize - 1);
+ printk(KERN_DEBUG "hfs: backup: %u,%u,%u,%u\n",
+ sbi->blockoffset, sbi->sect_count,
+ block, offset);
bh = sb_bread(sb, block);
if (bh) {
vhdr = (struct hfsplus_vh *)(bh->b_data + offset);
if (be16_to_cpu(vhdr->signature) == HFSPLUS_VOLHEAD_SIG) {
- memcpy(vhdr, HFSPLUS_SB(sb).s_vhdr, sizeof(*vhdr));
+ memcpy(vhdr, sbi->s_vhdr, sizeof(*vhdr));
mark_buffer_dirty(bh);
brelse(bh);
} else
printk(KERN_WARNING "hfs: backup not found!\n");
}
}
- HFSPLUS_SB(sb).flags &= ~HFSPLUS_SB_WRITEBACKUP;
}
- unlock_super(sb);
+ mutex_unlock(&sbi->alloc_mutex);
+ mutex_unlock(&sbi->vh_mutex);
return 0;
}
static void hfsplus_put_super(struct super_block *sb)
{
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
+
dprint(DBG_SUPER, "hfsplus_put_super\n");
+
if (!sb->s_fs_info)
return;
- lock_kernel();
-
if (sb->s_dirt)
hfsplus_write_super(sb);
- if (!(sb->s_flags & MS_RDONLY) && HFSPLUS_SB(sb).s_vhdr) {
- struct hfsplus_vh *vhdr = HFSPLUS_SB(sb).s_vhdr;
+ if (!(sb->s_flags & MS_RDONLY) && sbi->s_vhdr) {
+ struct hfsplus_vh *vhdr = sbi->s_vhdr;
vhdr->modify_date = hfsp_now2mt();
vhdr->attributes |= cpu_to_be32(HFSPLUS_VOL_UNMNT);
vhdr->attributes &= cpu_to_be32(~HFSPLUS_VOL_INCNSTNT);
- mark_buffer_dirty(HFSPLUS_SB(sb).s_vhbh);
- sync_dirty_buffer(HFSPLUS_SB(sb).s_vhbh);
+ mark_buffer_dirty(sbi->s_vhbh);
+ sync_dirty_buffer(sbi->s_vhbh);
}
- hfs_btree_close(HFSPLUS_SB(sb).cat_tree);
- hfs_btree_close(HFSPLUS_SB(sb).ext_tree);
- iput(HFSPLUS_SB(sb).alloc_file);
- iput(HFSPLUS_SB(sb).hidden_dir);
- brelse(HFSPLUS_SB(sb).s_vhbh);
- unload_nls(HFSPLUS_SB(sb).nls);
+ hfs_btree_close(sbi->cat_tree);
+ hfs_btree_close(sbi->ext_tree);
+ iput(sbi->alloc_file);
+ iput(sbi->hidden_dir);
+ brelse(sbi->s_vhbh);
+ unload_nls(sbi->nls);
kfree(sb->s_fs_info);
sb->s_fs_info = NULL;
-
- unlock_kernel();
}
static int hfsplus_statfs(struct dentry *dentry, struct kstatfs *buf)
{
struct super_block *sb = dentry->d_sb;
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
u64 id = huge_encode_dev(sb->s_bdev->bd_dev);
buf->f_type = HFSPLUS_SUPER_MAGIC;
buf->f_bsize = sb->s_blocksize;
- buf->f_blocks = HFSPLUS_SB(sb).total_blocks << HFSPLUS_SB(sb).fs_shift;
- buf->f_bfree = HFSPLUS_SB(sb).free_blocks << HFSPLUS_SB(sb).fs_shift;
+ buf->f_blocks = sbi->total_blocks << sbi->fs_shift;
+ buf->f_bfree = sbi->free_blocks << sbi->fs_shift;
buf->f_bavail = buf->f_bfree;
buf->f_files = 0xFFFFFFFF;
- buf->f_ffree = 0xFFFFFFFF - HFSPLUS_SB(sb).next_cnid;
+ buf->f_ffree = 0xFFFFFFFF - sbi->next_cnid;
buf->f_fsid.val[0] = (u32)id;
buf->f_fsid.val[1] = (u32)(id >> 32);
buf->f_namelen = HFSPLUS_MAX_STRLEN;
if ((*flags & MS_RDONLY) == (sb->s_flags & MS_RDONLY))
return 0;
if (!(*flags & MS_RDONLY)) {
- struct hfsplus_vh *vhdr = HFSPLUS_SB(sb).s_vhdr;
+ struct hfsplus_vh *vhdr = HFSPLUS_SB(sb)->s_vhdr;
struct hfsplus_sb_info sbi;
memset(&sbi, 0, sizeof(struct hfsplus_sb_info));
- sbi.nls = HFSPLUS_SB(sb).nls;
+ sbi.nls = HFSPLUS_SB(sb)->nls;
if (!hfsplus_parse_options(data, &sbi))
return -EINVAL;
"running fsck.hfsplus is recommended. leaving read-only.\n");
sb->s_flags |= MS_RDONLY;
*flags |= MS_RDONLY;
- } else if (sbi.flags & HFSPLUS_SB_FORCE) {
+ } else if (test_bit(HFSPLUS_SB_FORCE, &sbi.flags)) {
/* nothing */
} else if (vhdr->attributes & cpu_to_be32(HFSPLUS_VOL_SOFTLOCK)) {
printk(KERN_WARNING "hfs: filesystem is marked locked, leaving read-only.\n");
return -ENOMEM;
sb->s_fs_info = sbi;
- INIT_HLIST_HEAD(&sbi->rsrc_inodes);
+ mutex_init(&sbi->alloc_mutex);
+ mutex_init(&sbi->vh_mutex);
hfsplus_fill_defaults(sbi);
if (!hfsplus_parse_options(data, sbi)) {
printk(KERN_ERR "hfs: unable to parse mount options\n");
err = -EINVAL;
goto cleanup;
}
- vhdr = HFSPLUS_SB(sb).s_vhdr;
+ vhdr = sbi->s_vhdr;
/* Copy parts of the volume header into the superblock */
sb->s_magic = HFSPLUS_VOLHEAD_SIG;
printk(KERN_ERR "hfs: wrong filesystem version\n");
goto cleanup;
}
- HFSPLUS_SB(sb).total_blocks = be32_to_cpu(vhdr->total_blocks);
- HFSPLUS_SB(sb).free_blocks = be32_to_cpu(vhdr->free_blocks);
- HFSPLUS_SB(sb).next_alloc = be32_to_cpu(vhdr->next_alloc);
- HFSPLUS_SB(sb).next_cnid = be32_to_cpu(vhdr->next_cnid);
- HFSPLUS_SB(sb).file_count = be32_to_cpu(vhdr->file_count);
- HFSPLUS_SB(sb).folder_count = be32_to_cpu(vhdr->folder_count);
- HFSPLUS_SB(sb).data_clump_blocks = be32_to_cpu(vhdr->data_clump_sz) >> HFSPLUS_SB(sb).alloc_blksz_shift;
- if (!HFSPLUS_SB(sb).data_clump_blocks)
- HFSPLUS_SB(sb).data_clump_blocks = 1;
- HFSPLUS_SB(sb).rsrc_clump_blocks = be32_to_cpu(vhdr->rsrc_clump_sz) >> HFSPLUS_SB(sb).alloc_blksz_shift;
- if (!HFSPLUS_SB(sb).rsrc_clump_blocks)
- HFSPLUS_SB(sb).rsrc_clump_blocks = 1;
+ sbi->total_blocks = be32_to_cpu(vhdr->total_blocks);
+ sbi->free_blocks = be32_to_cpu(vhdr->free_blocks);
+ sbi->next_cnid = be32_to_cpu(vhdr->next_cnid);
+ sbi->file_count = be32_to_cpu(vhdr->file_count);
+ sbi->folder_count = be32_to_cpu(vhdr->folder_count);
+ sbi->data_clump_blocks =
+ be32_to_cpu(vhdr->data_clump_sz) >> sbi->alloc_blksz_shift;
+ if (!sbi->data_clump_blocks)
+ sbi->data_clump_blocks = 1;
+ sbi->rsrc_clump_blocks =
+ be32_to_cpu(vhdr->rsrc_clump_sz) >> sbi->alloc_blksz_shift;
+ if (!sbi->rsrc_clump_blocks)
+ sbi->rsrc_clump_blocks = 1;
/* Set up operations so we can load metadata */
sb->s_op = &hfsplus_sops;
printk(KERN_WARNING "hfs: Filesystem was not cleanly unmounted, "
"running fsck.hfsplus is recommended. mounting read-only.\n");
sb->s_flags |= MS_RDONLY;
- } else if (sbi->flags & HFSPLUS_SB_FORCE) {
+ } else if (test_and_clear_bit(HFSPLUS_SB_FORCE, &sbi->flags)) {
/* nothing */
} else if (vhdr->attributes & cpu_to_be32(HFSPLUS_VOL_SOFTLOCK)) {
printk(KERN_WARNING "hfs: Filesystem is marked locked, mounting read-only.\n");
"use the force option at your own risk, mounting read-only.\n");
sb->s_flags |= MS_RDONLY;
}
- sbi->flags &= ~HFSPLUS_SB_FORCE;
/* Load metadata objects (B*Trees) */
- HFSPLUS_SB(sb).ext_tree = hfs_btree_open(sb, HFSPLUS_EXT_CNID);
- if (!HFSPLUS_SB(sb).ext_tree) {
+ sbi->ext_tree = hfs_btree_open(sb, HFSPLUS_EXT_CNID);
+ if (!sbi->ext_tree) {
printk(KERN_ERR "hfs: failed to load extents file\n");
goto cleanup;
}
- HFSPLUS_SB(sb).cat_tree = hfs_btree_open(sb, HFSPLUS_CAT_CNID);
- if (!HFSPLUS_SB(sb).cat_tree) {
+ sbi->cat_tree = hfs_btree_open(sb, HFSPLUS_CAT_CNID);
+ if (!sbi->cat_tree) {
printk(KERN_ERR "hfs: failed to load catalog file\n");
goto cleanup;
}
err = PTR_ERR(inode);
goto cleanup;
}
- HFSPLUS_SB(sb).alloc_file = inode;
+ sbi->alloc_file = inode;
/* Load the root directory */
root = hfsplus_iget(sb, HFSPLUS_ROOT_CNID);
str.len = sizeof(HFSP_HIDDENDIR_NAME) - 1;
str.name = HFSP_HIDDENDIR_NAME;
- hfs_find_init(HFSPLUS_SB(sb).cat_tree, &fd);
+ hfs_find_init(sbi->cat_tree, &fd);
hfsplus_cat_build_key(sb, fd.search_key, HFSPLUS_ROOT_CNID, &str);
if (!hfs_brec_read(&fd, &entry, sizeof(entry))) {
hfs_find_exit(&fd);
err = PTR_ERR(inode);
goto cleanup;
}
- HFSPLUS_SB(sb).hidden_dir = inode;
+ sbi->hidden_dir = inode;
} else
hfs_find_exit(&fd);
be32_add_cpu(&vhdr->write_count, 1);
vhdr->attributes &= cpu_to_be32(~HFSPLUS_VOL_UNMNT);
vhdr->attributes |= cpu_to_be32(HFSPLUS_VOL_INCNSTNT);
- mark_buffer_dirty(HFSPLUS_SB(sb).s_vhbh);
- sync_dirty_buffer(HFSPLUS_SB(sb).s_vhbh);
+ mark_buffer_dirty(sbi->s_vhbh);
+ sync_dirty_buffer(sbi->s_vhbh);
- if (!HFSPLUS_SB(sb).hidden_dir) {
+ if (!sbi->hidden_dir) {
printk(KERN_DEBUG "hfs: create hidden dir...\n");
- HFSPLUS_SB(sb).hidden_dir = hfsplus_new_inode(sb, S_IFDIR);
- hfsplus_create_cat(HFSPLUS_SB(sb).hidden_dir->i_ino, sb->s_root->d_inode,
- &str, HFSPLUS_SB(sb).hidden_dir);
- mark_inode_dirty(HFSPLUS_SB(sb).hidden_dir);
+
+ mutex_lock(&sbi->vh_mutex);
+ sbi->hidden_dir = hfsplus_new_inode(sb, S_IFDIR);
+ hfsplus_create_cat(sbi->hidden_dir->i_ino, sb->s_root->d_inode,
+ &str, sbi->hidden_dir);
+ mutex_unlock(&sbi->vh_mutex);
+
+ mark_inode_dirty(sbi->hidden_dir);
}
out:
unload_nls(sbi->nls);
static void hfsplus_destroy_inode(struct inode *inode)
{
- kmem_cache_free(hfsplus_inode_cachep, &HFSPLUS_I(inode));
+ kmem_cache_free(hfsplus_inode_cachep, HFSPLUS_I(inode));
}
#define HFSPLUS_INODE_SIZE sizeof(struct hfsplus_inode_info)
int hfsplus_uni2asc(struct super_block *sb, const struct hfsplus_unistr *ustr, char *astr, int *len_p)
{
const hfsplus_unichr *ip;
- struct nls_table *nls = HFSPLUS_SB(sb).nls;
+ struct nls_table *nls = HFSPLUS_SB(sb)->nls;
u8 *op;
u16 cc, c0, c1;
u16 *ce1, *ce2;
ustrlen = be16_to_cpu(ustr->length);
len = *len_p;
ce1 = NULL;
- compose = !(HFSPLUS_SB(sb).flags & HFSPLUS_SB_NODECOMPOSE);
+ compose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
while (ustrlen > 0) {
c0 = be16_to_cpu(*ip++);
static inline int asc2unichar(struct super_block *sb, const char *astr, int len,
wchar_t *uc)
{
- int size = HFSPLUS_SB(sb).nls->char2uni(astr, len, uc);
+ int size = HFSPLUS_SB(sb)->nls->char2uni(astr, len, uc);
if (size <= 0) {
*uc = '?';
size = 1;
u16 *dstr, outlen = 0;
wchar_t c;
- decompose = !(HFSPLUS_SB(sb).flags & HFSPLUS_SB_NODECOMPOSE);
+ decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
while (outlen < HFSPLUS_MAX_STRLEN && len > 0) {
size = asc2unichar(sb, astr, len, &c);
wchar_t c;
u16 c2;
- casefold = (HFSPLUS_SB(sb).flags & HFSPLUS_SB_CASEFOLD);
- decompose = !(HFSPLUS_SB(sb).flags & HFSPLUS_SB_NODECOMPOSE);
+ casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags);
+ decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
hash = init_name_hash();
astr = str->name;
len = str->len;
u16 c1, c2;
wchar_t c;
- casefold = (HFSPLUS_SB(sb).flags & HFSPLUS_SB_CASEFOLD);
- decompose = !(HFSPLUS_SB(sb).flags & HFSPLUS_SB_NODECOMPOSE);
+ casefold = test_bit(HFSPLUS_SB_CASEFOLD, &HFSPLUS_SB(sb)->flags);
+ decompose = !test_bit(HFSPLUS_SB_NODECOMPOSE, &HFSPLUS_SB(sb)->flags);
astr1 = s1->name;
len1 = s1->len;
astr2 = s2->name;
*start = 0;
*size = sb->s_bdev->bd_inode->i_size >> 9;
- if (HFSPLUS_SB(sb).session >= 0) {
- te.cdte_track = HFSPLUS_SB(sb).session;
+ if (HFSPLUS_SB(sb)->session >= 0) {
+ te.cdte_track = HFSPLUS_SB(sb)->session;
te.cdte_format = CDROM_LBA;
res = ioctl_by_bdev(sb->s_bdev, CDROMREADTOCENTRY, (unsigned long)&te);
if (!res && (te.cdte_ctrl & CDROM_DATA_TRACK) == 4) {
/* Takes in super block, returns true if good data read */
int hfsplus_read_wrapper(struct super_block *sb)
{
+ struct hfsplus_sb_info *sbi = HFSPLUS_SB(sb);
struct buffer_head *bh;
struct hfsplus_vh *vhdr;
struct hfsplus_wd wd;
if (vhdr->signature == cpu_to_be16(HFSPLUS_VOLHEAD_SIG))
break;
if (vhdr->signature == cpu_to_be16(HFSPLUS_VOLHEAD_SIGX)) {
- HFSPLUS_SB(sb).flags |= HFSPLUS_SB_HFSX;
+ set_bit(HFSPLUS_SB_HFSX, &sbi->flags);
break;
}
brelse(bh);
if (blocksize < HFSPLUS_SECTOR_SIZE ||
((blocksize - 1) & blocksize))
return -EINVAL;
- HFSPLUS_SB(sb).alloc_blksz = blocksize;
- HFSPLUS_SB(sb).alloc_blksz_shift = 0;
+ sbi->alloc_blksz = blocksize;
+ sbi->alloc_blksz_shift = 0;
while ((blocksize >>= 1) != 0)
- HFSPLUS_SB(sb).alloc_blksz_shift++;
- blocksize = min(HFSPLUS_SB(sb).alloc_blksz, (u32)PAGE_SIZE);
+ sbi->alloc_blksz_shift++;
+ blocksize = min(sbi->alloc_blksz, (u32)PAGE_SIZE);
/* align block size to block offset */
while (part_start & ((blocksize >> HFSPLUS_SECTOR_SHIFT) - 1))
return -EINVAL;
}
- HFSPLUS_SB(sb).blockoffset = part_start >>
- (sb->s_blocksize_bits - HFSPLUS_SECTOR_SHIFT);
- HFSPLUS_SB(sb).sect_count = part_size;
- HFSPLUS_SB(sb).fs_shift = HFSPLUS_SB(sb).alloc_blksz_shift -
- sb->s_blocksize_bits;
+ sbi->blockoffset =
+ part_start >> (sb->s_blocksize_bits - HFSPLUS_SECTOR_SHIFT);
+ sbi->sect_count = part_size;
+ sbi->fs_shift = sbi->alloc_blksz_shift - sb->s_blocksize_bits;
bh = sb_bread512(sb, part_start + HFSPLUS_VOLHEAD_SECTOR, vhdr);
if (!bh)
return -EIO;
/* should still be the same... */
- if (vhdr->signature != (HFSPLUS_SB(sb).flags & HFSPLUS_SB_HFSX ?
- cpu_to_be16(HFSPLUS_VOLHEAD_SIGX) :
- cpu_to_be16(HFSPLUS_VOLHEAD_SIG)))
- goto error;
- HFSPLUS_SB(sb).s_vhbh = bh;
- HFSPLUS_SB(sb).s_vhdr = vhdr;
+ if (test_bit(HFSPLUS_SB_HFSX, &sbi->flags)) {
+ if (vhdr->signature != cpu_to_be16(HFSPLUS_VOLHEAD_SIGX))
+ goto error;
+ } else {
+ if (vhdr->signature != cpu_to_be16(HFSPLUS_VOLHEAD_SIG))
+ goto error;
+ }
+
+ sbi->s_vhbh = bh;
+ sbi->s_vhdr = vhdr;
return 0;
error:
inode_inc_link_count(dir);
- inode = minix_new_inode(dir, mode, &err);
+ inode = minix_new_inode(dir, S_IFDIR | mode, &err);
if (!inode)
goto out_dir;
config NFS_V4
bool "NFS client support for NFS version 4"
depends on NFS_FS
+ select SUNRPC_GSS
help
This option enables support for version 4 of the NFS protocol
(RFC 3530) in the kernel's NFS client.
sin1->sin6_scope_id != sin2->sin6_scope_id)
return 0;
- return ipv6_addr_equal(&sin1->sin6_addr, &sin1->sin6_addr);
+ return ipv6_addr_equal(&sin1->sin6_addr, &sin2->sin6_addr);
}
#else /* !defined(CONFIG_IPV6) && !defined(CONFIG_IPV6_MODULE) */
static int nfs_sockaddr_match_ipaddr6(const struct sockaddr *sa1,
default:
BUG();
}
- if (res < 0)
- dprintk(KERN_WARNING "%s: VFS is out of sync with lock manager"
- " - error %d!\n",
- __func__, res);
return res;
}
goto out_err;
error = server->nfs_client->rpc_ops->statfs(server, fh, &res);
+ if (unlikely(error == -ESTALE)) {
+ struct dentry *pd_dentry;
+ pd_dentry = dget_parent(dentry);
+ if (pd_dentry != NULL) {
+ nfs_zap_caches(pd_dentry->d_inode);
+ dput(pd_dentry);
+ }
+ }
nfs_free_fattr(res.fattr);
if (error < 0)
goto out_err;
depends on NFSD && PROC_FS && EXPERIMENTAL
select NFSD_V3
select FS_POSIX_ACL
+ select SUNRPC_GSS
help
This option enables support in your system's NFS server for
version 4 of the NFS protocol (RFC 3530).
static int nfs4_access_to_omode(u32 access)
{
- switch (access) {
+ switch (access & NFS4_SHARE_ACCESS_BOTH) {
case NFS4_SHARE_ACCESS_READ:
return O_RDONLY;
case NFS4_SHARE_ACCESS_WRITE:
static inline void
fh_unlock(struct svc_fh *fhp)
{
- BUG_ON(!fhp->fh_dentry);
-
if (fhp->fh_locked) {
fill_post_wcc(fhp);
mutex_unlock(&fhp->fh_dentry->d_inode->i_mutex);
source "fs/notify/dnotify/Kconfig"
source "fs/notify/inotify/Kconfig"
-source "fs/notify/fanotify/Kconfig"
+#source "fs/notify/fanotify/Kconfig"
}
inode->i_mode = new_mode;
+ inode->i_ctime = CURRENT_TIME;
di->i_mode = cpu_to_le16(inode->i_mode);
+ di->i_ctime = cpu_to_le64(inode->i_ctime.tv_sec);
+ di->i_ctime_nsec = cpu_to_le32(inode->i_ctime.tv_nsec);
ocfs2_journal_dirty(handle, di_bh);
last_page_bytes = PAGE_ALIGN(end);
index = start >> PAGE_CACHE_SHIFT;
do {
- pages[numpages] = grab_cache_page(mapping, index);
+ pages[numpages] = find_or_create_page(mapping, index, GFP_NOFS);
if (!pages[numpages]) {
ret = -ENOMEM;
mlog_errno(ret);
ocfs2_blockcheck_inc_failure(stats);
mlog(ML_ERROR,
- "CRC32 failed: stored: %u, computed %u. Applying ECC.\n",
+ "CRC32 failed: stored: 0x%x, computed 0x%x. Applying ECC.\n",
(unsigned int)check.bc_crc32e, (unsigned int)crc);
/* Ok, try ECC fixups */
goto out;
}
- mlog(ML_ERROR, "Fixed CRC32 failed: stored: %u, computed %u\n",
+ mlog(ML_ERROR, "Fixed CRC32 failed: stored: 0x%x, computed 0x%x\n",
(unsigned int)check.bc_crc32e, (unsigned int)crc);
rc = -EIO;
int o2net_send_message_vec(u32 msg_type, u32 key, struct kvec *caller_vec,
size_t caller_veclen, u8 target_node, int *status)
{
- int ret;
+ int ret = 0;
struct o2net_msg *msg = NULL;
size_t veclen, caller_bytes = 0;
struct kvec *vec = NULL;
goto out_commit;
}
+ cpos = split_hash;
+ ret = ocfs2_dx_dir_new_cluster(dir, &et, cpos, handle,
+ data_ac, meta_ac, new_dx_leaves,
+ num_dx_leaves);
+ if (ret) {
+ mlog_errno(ret);
+ goto out_commit;
+ }
+
for (i = 0; i < num_dx_leaves; i++) {
ret = ocfs2_journal_access_dl(handle, INODE_CACHE(dir),
orig_dx_leaves[i],
mlog_errno(ret);
goto out_commit;
}
- }
- cpos = split_hash;
- ret = ocfs2_dx_dir_new_cluster(dir, &et, cpos, handle,
- data_ac, meta_ac, new_dx_leaves,
- num_dx_leaves);
- if (ret) {
- mlog_errno(ret);
- goto out_commit;
+ ret = ocfs2_journal_access_dl(handle, INODE_CACHE(dir),
+ new_dx_leaves[i],
+ OCFS2_JOURNAL_ACCESS_WRITE);
+ if (ret) {
+ mlog_errno(ret);
+ goto out_commit;
+ }
}
ocfs2_dx_dir_transfer_leaf(dir, split_hash, handle, tmp_dx_leaf,
struct dlm_lock_resource *res);
void dlm_clean_master_list(struct dlm_ctxt *dlm,
u8 dead_node);
+void dlm_force_free_mles(struct dlm_ctxt *dlm);
int dlm_lock_basts_flushed(struct dlm_ctxt *dlm, struct dlm_lock *lock);
int __dlm_lockres_has_locks(struct dlm_lock_resource *res);
int __dlm_lockres_unused(struct dlm_lock_resource *res);
spin_lock(&dlm->track_lock);
if (oldres)
track_list = &oldres->tracking;
- else
+ else {
track_list = &dlm->tracking_list;
+ if (list_empty(track_list)) {
+ dl = NULL;
+ spin_unlock(&dlm->track_lock);
+ goto bail;
+ }
+ }
list_for_each_entry(res, track_list, tracking) {
if (&res->tracking == &dlm->tracking_list)
} else
dl = NULL;
+bail:
/* passed to seq_show */
return dl;
}
dlm_mark_domain_leaving(dlm);
dlm_leave_domain(dlm);
+ dlm_force_free_mles(dlm);
dlm_complete_dlm_shutdown(dlm);
}
dlm_put(dlm);
wake_up(&res->wq);
wake_up(&dlm->migration_wq);
}
+
+void dlm_force_free_mles(struct dlm_ctxt *dlm)
+{
+ int i;
+ struct hlist_head *bucket;
+ struct dlm_master_list_entry *mle;
+ struct hlist_node *tmp, *list;
+
+ /*
+ * We notified all other nodes that we are exiting the domain and
+ * marked the dlm state to DLM_CTXT_LEAVING. If any mles are still
+ * around we force free them and wake any processes that are waiting
+ * on the mles
+ */
+ spin_lock(&dlm->spinlock);
+ spin_lock(&dlm->master_lock);
+
+ BUG_ON(dlm->dlm_state != DLM_CTXT_LEAVING);
+ BUG_ON((find_next_bit(dlm->domain_map, O2NM_MAX_NODES, 0) < O2NM_MAX_NODES));
+
+ for (i = 0; i < DLM_HASH_BUCKETS; i++) {
+ bucket = dlm_master_hash(dlm, i);
+ hlist_for_each_safe(list, tmp, bucket) {
+ mle = hlist_entry(list, struct dlm_master_list_entry,
+ master_hash_node);
+ if (mle->type != DLM_MLE_BLOCK) {
+ mlog(ML_ERROR, "bad mle: %p\n", mle);
+ dlm_print_one_mle(mle);
+ }
+ atomic_set(&mle->woken, 1);
+ wake_up(&mle->wq);
+
+ __dlm_unlink_mle(dlm, mle);
+ __dlm_mle_detach_hb_events(dlm, mle);
+ __dlm_put_mle(mle);
+ }
+ }
+ spin_unlock(&dlm->master_lock);
+ spin_unlock(&dlm->spinlock);
+}
OI_LS_PARENT,
OI_LS_RENAME1,
OI_LS_RENAME2,
+ OI_LS_REFLINK_TARGET,
};
int ocfs2_dlm_init(struct ocfs2_super *osb);
#include <linux/writeback.h>
#include <linux/falloc.h>
#include <linux/quotaops.h>
+#include <linux/blkdev.h>
#define MLOG_MASK_PREFIX ML_INODE
#include <cluster/masklog.h>
if (err)
goto bail;
- if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
+ if (datasync && !(inode->i_state & I_DIRTY_DATASYNC)) {
+ /*
+ * We still have to flush drive's caches to get data to the
+ * platter
+ */
+ if (osb->s_mount_opt & OCFS2_MOUNT_BARRIER)
+ blkdev_issue_flush(inode->i_sb->s_bdev, GFP_KERNEL,
+ NULL, BLKDEV_IFL_WAIT);
goto bail;
+ }
journal = osb->journal->j_journal;
err = jbd2_journal_force_commit(journal);
BUG_ON(abs_to > (((u64)index + 1) << PAGE_CACHE_SHIFT));
BUG_ON(abs_from & (inode->i_blkbits - 1));
- page = grab_cache_page(mapping, index);
+ page = find_or_create_page(mapping, index, GFP_NOFS);
if (!page) {
ret = -ENOMEM;
mlog_errno(ret);
BUG_ON(ret == -EIOCBQUEUED && !(file->f_flags & O_DIRECT));
if (((file->f_flags & O_DSYNC) && !direct_io) || IS_SYNC(inode) ||
- ((file->f_flags & O_DIRECT) && has_refcount)) {
+ ((file->f_flags & O_DIRECT) && !direct_io)) {
ret = filemap_fdatawrite_range(file->f_mapping, pos,
pos + count - 1);
if (ret < 0)
OCFS2_BH_IGNORE_CACHE);
} else {
status = ocfs2_read_blocks_sync(osb, args->fi_blkno, 1, &bh);
- if (!status)
+ /*
+ * If buffer is in jbd, then its checksum may not have been
+ * computed as yet.
+ */
+ if (!status && !buffer_jbd(bh))
status = ocfs2_validate_inode_block(osb->sb, bh);
}
if (status < 0) {
/*
* Another node might have truncated while we were waiting on
* cluster locks.
+ * We don't check size == 0 before the shift. This is borrowed
+ * from do_generic_file_read.
*/
- last_index = size >> PAGE_CACHE_SHIFT;
- if (page->index > last_index) {
+ last_index = (size - 1) >> PAGE_CACHE_SHIFT;
+ if (unlikely(!size || page->index > last_index)) {
ret = -EINVAL;
goto out;
}
* because the "write" would invalidate their data.
*/
if (page->index == last_index)
- len = size & ~PAGE_CACHE_MASK;
+ len = ((size - 1) & ~PAGE_CACHE_MASK) + 1;
ret = ocfs2_write_begin_nolock(mapping, pos, len, 0, &locked_page,
&fsdata, di_bh, page);
return status;
}
-static int ocfs2_mknod_locked(struct ocfs2_super *osb,
- struct inode *dir,
- struct inode *inode,
- dev_t dev,
- struct buffer_head **new_fe_bh,
- struct buffer_head *parent_fe_bh,
- handle_t *handle,
- struct ocfs2_alloc_context *inode_ac)
+static int __ocfs2_mknod_locked(struct inode *dir,
+ struct inode *inode,
+ dev_t dev,
+ struct buffer_head **new_fe_bh,
+ struct buffer_head *parent_fe_bh,
+ handle_t *handle,
+ struct ocfs2_alloc_context *inode_ac,
+ u64 fe_blkno, u64 suballoc_loc, u16 suballoc_bit)
{
int status = 0;
+ struct ocfs2_super *osb = OCFS2_SB(dir->i_sb);
struct ocfs2_dinode *fe = NULL;
struct ocfs2_extent_list *fel;
- u64 suballoc_loc, fe_blkno = 0;
- u16 suballoc_bit;
u16 feat;
*new_fe_bh = NULL;
- status = ocfs2_claim_new_inode(handle, dir, parent_fe_bh,
- inode_ac, &suballoc_loc,
- &suballoc_bit, &fe_blkno);
- if (status < 0) {
- mlog_errno(status);
- goto leave;
- }
-
/* populate as many fields early on as possible - many of
* these are used by the support functions here and in
* callers. */
return status;
}
+static int ocfs2_mknod_locked(struct ocfs2_super *osb,
+ struct inode *dir,
+ struct inode *inode,
+ dev_t dev,
+ struct buffer_head **new_fe_bh,
+ struct buffer_head *parent_fe_bh,
+ handle_t *handle,
+ struct ocfs2_alloc_context *inode_ac)
+{
+ int status = 0;
+ u64 suballoc_loc, fe_blkno = 0;
+ u16 suballoc_bit;
+
+ *new_fe_bh = NULL;
+
+ status = ocfs2_claim_new_inode(handle, dir, parent_fe_bh,
+ inode_ac, &suballoc_loc,
+ &suballoc_bit, &fe_blkno);
+ if (status < 0) {
+ mlog_errno(status);
+ return status;
+ }
+
+ return __ocfs2_mknod_locked(dir, inode, dev, new_fe_bh,
+ parent_fe_bh, handle, inode_ac,
+ fe_blkno, suballoc_loc, suballoc_bit);
+}
+
static int ocfs2_mkdir(struct inode *dir,
struct dentry *dentry,
int mode)
return status;
}
-static int ocfs2_prepare_orphan_dir(struct ocfs2_super *osb,
- struct inode **ret_orphan_dir,
- u64 blkno,
- char *name,
- struct ocfs2_dir_lookup_result *lookup)
+static int ocfs2_lookup_lock_orphan_dir(struct ocfs2_super *osb,
+ struct inode **ret_orphan_dir,
+ struct buffer_head **ret_orphan_dir_bh)
{
struct inode *orphan_dir_inode;
struct buffer_head *orphan_dir_bh = NULL;
- int status = 0;
-
- status = ocfs2_blkno_stringify(blkno, name);
- if (status < 0) {
- mlog_errno(status);
- return status;
- }
+ int ret = 0;
orphan_dir_inode = ocfs2_get_system_file_inode(osb,
ORPHAN_DIR_SYSTEM_INODE,
osb->slot_num);
if (!orphan_dir_inode) {
- status = -ENOENT;
- mlog_errno(status);
- return status;
+ ret = -ENOENT;
+ mlog_errno(ret);
+ return ret;
}
mutex_lock(&orphan_dir_inode->i_mutex);
- status = ocfs2_inode_lock(orphan_dir_inode, &orphan_dir_bh, 1);
- if (status < 0) {
- mlog_errno(status);
- goto leave;
+ ret = ocfs2_inode_lock(orphan_dir_inode, &orphan_dir_bh, 1);
+ if (ret < 0) {
+ mutex_unlock(&orphan_dir_inode->i_mutex);
+ iput(orphan_dir_inode);
+
+ mlog_errno(ret);
+ return ret;
}
- status = ocfs2_prepare_dir_for_insert(osb, orphan_dir_inode,
- orphan_dir_bh, name,
- OCFS2_ORPHAN_NAMELEN, lookup);
- if (status < 0) {
- ocfs2_inode_unlock(orphan_dir_inode, 1);
+ *ret_orphan_dir = orphan_dir_inode;
+ *ret_orphan_dir_bh = orphan_dir_bh;
- mlog_errno(status);
- goto leave;
+ return 0;
+}
+
+static int __ocfs2_prepare_orphan_dir(struct inode *orphan_dir_inode,
+ struct buffer_head *orphan_dir_bh,
+ u64 blkno,
+ char *name,
+ struct ocfs2_dir_lookup_result *lookup)
+{
+ int ret;
+ struct ocfs2_super *osb = OCFS2_SB(orphan_dir_inode->i_sb);
+
+ ret = ocfs2_blkno_stringify(blkno, name);
+ if (ret < 0) {
+ mlog_errno(ret);
+ return ret;
+ }
+
+ ret = ocfs2_prepare_dir_for_insert(osb, orphan_dir_inode,
+ orphan_dir_bh, name,
+ OCFS2_ORPHAN_NAMELEN, lookup);
+ if (ret < 0) {
+ mlog_errno(ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+/**
+ * ocfs2_prepare_orphan_dir() - Prepare an orphan directory for
+ * insertion of an orphan.
+ * @osb: ocfs2 file system
+ * @ret_orphan_dir: Orphan dir inode - returned locked!
+ * @blkno: Actual block number of the inode to be inserted into orphan dir.
+ * @lookup: dir lookup result, to be passed back into functions like
+ * ocfs2_orphan_add
+ *
+ * Returns zero on success and the ret_orphan_dir, name and lookup
+ * fields will be populated.
+ *
+ * Returns non-zero on failure.
+ */
+static int ocfs2_prepare_orphan_dir(struct ocfs2_super *osb,
+ struct inode **ret_orphan_dir,
+ u64 blkno,
+ char *name,
+ struct ocfs2_dir_lookup_result *lookup)
+{
+ struct inode *orphan_dir_inode = NULL;
+ struct buffer_head *orphan_dir_bh = NULL;
+ int ret = 0;
+
+ ret = ocfs2_lookup_lock_orphan_dir(osb, &orphan_dir_inode,
+ &orphan_dir_bh);
+ if (ret < 0) {
+ mlog_errno(ret);
+ return ret;
+ }
+
+ ret = __ocfs2_prepare_orphan_dir(orphan_dir_inode, orphan_dir_bh,
+ blkno, name, lookup);
+ if (ret < 0) {
+ mlog_errno(ret);
+ goto out;
}
*ret_orphan_dir = orphan_dir_inode;
-leave:
- if (status) {
+out:
+ brelse(orphan_dir_bh);
+
+ if (ret) {
+ ocfs2_inode_unlock(orphan_dir_inode, 1);
mutex_unlock(&orphan_dir_inode->i_mutex);
iput(orphan_dir_inode);
}
- brelse(orphan_dir_bh);
-
- mlog_exit(status);
- return status;
+ mlog_exit(ret);
+ return ret;
}
static int ocfs2_orphan_add(struct ocfs2_super *osb,
return status;
}
+/**
+ * ocfs2_prep_new_orphaned_file() - Prepare the orphan dir to recieve a newly
+ * allocated file. This is different from the typical 'add to orphan dir'
+ * operation in that the inode does not yet exist. This is a problem because
+ * the orphan dir stringifies the inode block number to come up with it's
+ * dirent. Obviously if the inode does not yet exist we have a chicken and egg
+ * problem. This function works around it by calling deeper into the orphan
+ * and suballoc code than other callers. Use this only by necessity.
+ * @dir: The directory which this inode will ultimately wind up under - not the
+ * orphan dir!
+ * @dir_bh: buffer_head the @dir inode block
+ * @orphan_name: string of length (CFS2_ORPHAN_NAMELEN + 1). Will be filled
+ * with the string to be used for orphan dirent. Pass back to the orphan dir
+ * code.
+ * @ret_orphan_dir: orphan dir inode returned to be passed back into orphan
+ * dir code.
+ * @ret_di_blkno: block number where the new inode will be allocated.
+ * @orphan_insert: Dir insert context to be passed back into orphan dir code.
+ * @ret_inode_ac: Inode alloc context to be passed back to the allocator.
+ *
+ * Returns zero on success and the ret_orphan_dir, name and lookup
+ * fields will be populated.
+ *
+ * Returns non-zero on failure.
+ */
+static int ocfs2_prep_new_orphaned_file(struct inode *dir,
+ struct buffer_head *dir_bh,
+ char *orphan_name,
+ struct inode **ret_orphan_dir,
+ u64 *ret_di_blkno,
+ struct ocfs2_dir_lookup_result *orphan_insert,
+ struct ocfs2_alloc_context **ret_inode_ac)
+{
+ int ret;
+ u64 di_blkno;
+ struct ocfs2_super *osb = OCFS2_SB(dir->i_sb);
+ struct inode *orphan_dir = NULL;
+ struct buffer_head *orphan_dir_bh = NULL;
+ struct ocfs2_alloc_context *inode_ac = NULL;
+
+ ret = ocfs2_lookup_lock_orphan_dir(osb, &orphan_dir, &orphan_dir_bh);
+ if (ret < 0) {
+ mlog_errno(ret);
+ return ret;
+ }
+
+ /* reserve an inode spot */
+ ret = ocfs2_reserve_new_inode(osb, &inode_ac);
+ if (ret < 0) {
+ if (ret != -ENOSPC)
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ret = ocfs2_find_new_inode_loc(dir, dir_bh, inode_ac,
+ &di_blkno);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ret = __ocfs2_prepare_orphan_dir(orphan_dir, orphan_dir_bh,
+ di_blkno, orphan_name, orphan_insert);
+ if (ret < 0) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+out:
+ if (ret == 0) {
+ *ret_orphan_dir = orphan_dir;
+ *ret_di_blkno = di_blkno;
+ *ret_inode_ac = inode_ac;
+ /*
+ * orphan_name and orphan_insert are already up to
+ * date via prepare_orphan_dir
+ */
+ } else {
+ /* Unroll reserve_new_inode* */
+ if (inode_ac)
+ ocfs2_free_alloc_context(inode_ac);
+
+ /* Unroll orphan dir locking */
+ mutex_unlock(&orphan_dir->i_mutex);
+ ocfs2_inode_unlock(orphan_dir, 1);
+ iput(orphan_dir);
+ }
+
+ brelse(orphan_dir_bh);
+
+ return 0;
+}
+
int ocfs2_create_inode_in_orphan(struct inode *dir,
int mode,
struct inode **new_inode)
struct buffer_head *new_di_bh = NULL;
struct ocfs2_alloc_context *inode_ac = NULL;
struct ocfs2_dir_lookup_result orphan_insert = { NULL, };
+ u64 uninitialized_var(di_blkno), suballoc_loc;
+ u16 suballoc_bit;
status = ocfs2_inode_lock(dir, &parent_di_bh, 1);
if (status < 0) {
return status;
}
- /*
- * We give the orphan dir the root blkno to fake an orphan name,
- * and allocate enough space for our insertion.
- */
- status = ocfs2_prepare_orphan_dir(osb, &orphan_dir,
- osb->root_blkno,
- orphan_name, &orphan_insert);
- if (status < 0) {
- mlog_errno(status);
- goto leave;
- }
-
- /* reserve an inode spot */
- status = ocfs2_reserve_new_inode(osb, &inode_ac);
+ status = ocfs2_prep_new_orphaned_file(dir, parent_di_bh,
+ orphan_name, &orphan_dir,
+ &di_blkno, &orphan_insert, &inode_ac);
if (status < 0) {
if (status != -ENOSPC)
mlog_errno(status);
goto leave;
did_quota_inode = 1;
- inode->i_nlink = 0;
- /* do the real work now. */
- status = ocfs2_mknod_locked(osb, dir, inode,
- 0, &new_di_bh, parent_di_bh, handle,
- inode_ac);
+ status = ocfs2_claim_new_inode_at_loc(handle, dir, inode_ac,
+ &suballoc_loc,
+ &suballoc_bit, di_blkno);
if (status < 0) {
mlog_errno(status);
goto leave;
}
- status = ocfs2_blkno_stringify(OCFS2_I(inode)->ip_blkno, orphan_name);
+ inode->i_nlink = 0;
+ /* do the real work now. */
+ status = __ocfs2_mknod_locked(dir, inode,
+ 0, &new_di_bh, parent_di_bh, handle,
+ inode_ac, di_blkno, suballoc_loc,
+ suballoc_bit);
if (status < 0) {
mlog_errno(status);
goto leave;
#define OCFS2_HAS_REFCOUNT_FL (0x0010)
/* Inode attributes, keep in sync with EXT2 */
-#define OCFS2_SECRM_FL (0x00000001) /* Secure deletion */
-#define OCFS2_UNRM_FL (0x00000002) /* Undelete */
-#define OCFS2_COMPR_FL (0x00000004) /* Compress file */
-#define OCFS2_SYNC_FL (0x00000008) /* Synchronous updates */
-#define OCFS2_IMMUTABLE_FL (0x00000010) /* Immutable file */
-#define OCFS2_APPEND_FL (0x00000020) /* writes to file may only append */
-#define OCFS2_NODUMP_FL (0x00000040) /* do not dump file */
-#define OCFS2_NOATIME_FL (0x00000080) /* do not update atime */
-#define OCFS2_DIRSYNC_FL (0x00010000) /* dirsync behaviour (directories only) */
-
-#define OCFS2_FL_VISIBLE (0x000100FF) /* User visible flags */
-#define OCFS2_FL_MODIFIABLE (0x000100FF) /* User modifiable flags */
+#define OCFS2_SECRM_FL FS_SECRM_FL /* Secure deletion */
+#define OCFS2_UNRM_FL FS_UNRM_FL /* Undelete */
+#define OCFS2_COMPR_FL FS_COMPR_FL /* Compress file */
+#define OCFS2_SYNC_FL FS_SYNC_FL /* Synchronous updates */
+#define OCFS2_IMMUTABLE_FL FS_IMMUTABLE_FL /* Immutable file */
+#define OCFS2_APPEND_FL FS_APPEND_FL /* writes to file may only append */
+#define OCFS2_NODUMP_FL FS_NODUMP_FL /* do not dump file */
+#define OCFS2_NOATIME_FL FS_NOATIME_FL /* do not update atime */
+/* Reserved for compression usage... */
+#define OCFS2_DIRTY_FL FS_DIRTY_FL
+#define OCFS2_COMPRBLK_FL FS_COMPRBLK_FL /* One or more compressed clusters */
+#define OCFS2_NOCOMP_FL FS_NOCOMP_FL /* Don't compress */
+#define OCFS2_ECOMPR_FL FS_ECOMPR_FL /* Compression error */
+/* End compression flags --- maybe not all used */
+#define OCFS2_BTREE_FL FS_BTREE_FL /* btree format dir */
+#define OCFS2_INDEX_FL FS_INDEX_FL /* hash-indexed directory */
+#define OCFS2_IMAGIC_FL FS_IMAGIC_FL /* AFS directory */
+#define OCFS2_JOURNAL_DATA_FL FS_JOURNAL_DATA_FL /* Reserved for ext3 */
+#define OCFS2_NOTAIL_FL FS_NOTAIL_FL /* file tail should not be merged */
+#define OCFS2_DIRSYNC_FL FS_DIRSYNC_FL /* dirsync behaviour (directories only) */
+#define OCFS2_TOPDIR_FL FS_TOPDIR_FL /* Top of directory hierarchies*/
+#define OCFS2_RESERVED_FL FS_RESERVED_FL /* reserved for ext2 lib */
+
+#define OCFS2_FL_VISIBLE FS_FL_USER_VISIBLE /* User visible flags */
+#define OCFS2_FL_MODIFIABLE FS_FL_USER_MODIFIABLE /* User modifiable flags */
/*
* Extent record flags (e_node.leaf.flags)
/*
* ioctl commands
*/
-#define OCFS2_IOC_GETFLAGS _IOR('f', 1, long)
-#define OCFS2_IOC_SETFLAGS _IOW('f', 2, long)
-#define OCFS2_IOC32_GETFLAGS _IOR('f', 1, int)
-#define OCFS2_IOC32_SETFLAGS _IOW('f', 2, int)
+#define OCFS2_IOC_GETFLAGS FS_IOC_GETFLAGS
+#define OCFS2_IOC_SETFLAGS FS_IOC_SETFLAGS
+#define OCFS2_IOC32_GETFLAGS FS_IOC32_GETFLAGS
+#define OCFS2_IOC32_SETFLAGS FS_IOC32_SETFLAGS
/*
* Space reservation / allocation / free ioctls and argument structure
if (map_end & (PAGE_CACHE_SIZE - 1))
to = map_end & (PAGE_CACHE_SIZE - 1);
- page = grab_cache_page(mapping, page_index);
+ page = find_or_create_page(mapping, page_index, GFP_NOFS);
/*
* In case PAGE_CACHE_SIZE <= CLUSTER_SIZE, This page
if (map_end > end)
map_end = end;
- page = grab_cache_page(context->inode->i_mapping, page_index);
+ page = find_or_create_page(context->inode->i_mapping,
+ page_index, GFP_NOFS);
BUG_ON(!page);
wait_on_page_writeback(page);
goto out;
}
- mutex_lock(&new_inode->i_mutex);
- ret = ocfs2_inode_lock(new_inode, &new_bh, 1);
+ mutex_lock_nested(&new_inode->i_mutex, I_MUTEX_CHILD);
+ ret = ocfs2_inode_lock_nested(new_inode, &new_bh, 1,
+ OI_LS_REFLINK_TARGET);
if (ret) {
mlog_errno(ret);
goto out_unlock;
struct ocfs2_alloc_reservation *resv,
int *cstart, int *clen)
{
- unsigned int wanted = *clen;
-
if (resv == NULL || ocfs2_resmap_disabled(resmap))
return -ENOSPC;
spin_lock(&resv_lock);
- /*
- * We don't want to over-allocate for temporary
- * windows. Otherwise, we run the risk of fragmenting the
- * allocation space.
- */
- wanted = ocfs2_resv_window_bits(resmap, resv);
- if ((resv->r_flags & OCFS2_RESV_FLAG_TMP) || wanted < *clen)
- wanted = *clen;
-
if (ocfs2_resv_empty(resv)) {
- mlog(0, "empty reservation, find new window\n");
+ /*
+ * We don't want to over-allocate for temporary
+ * windows. Otherwise, we run the risk of fragmenting the
+ * allocation space.
+ */
+ unsigned int wanted = ocfs2_resv_window_bits(resmap, resv);
+ if ((resv->r_flags & OCFS2_RESV_FLAG_TMP) || wanted < *clen)
+ wanted = *clen;
+
+ mlog(0, "empty reservation, find new window\n");
/*
* Try to get a window here. If it works, we must fall
* through and test the bitmap . This avoids some
u64 sr_bg_blkno; /* The bg we allocated from. Set
to 0 when a block group is
contiguous. */
+ u64 sr_bg_stable_blkno; /*
+ * Doesn't change, always
+ * set to target block
+ * group descriptor
+ * block.
+ */
u64 sr_blkno; /* The first allocated block */
unsigned int sr_bit_offset; /* The bit in the bg */
unsigned int sr_bits; /* How many bits we claimed */
};
+static u64 ocfs2_group_from_res(struct ocfs2_suballoc_result *res)
+{
+ if (res->sr_blkno == 0)
+ return 0;
+
+ if (res->sr_bg_blkno)
+ return res->sr_bg_blkno;
+
+ return ocfs2_which_suballoc_group(res->sr_blkno, res->sr_bit_offset);
+}
+
static inline void ocfs2_debug_bg(struct ocfs2_group_desc *bg);
static inline void ocfs2_debug_suballoc_inode(struct ocfs2_dinode *fe);
static inline u16 ocfs2_find_victim_chain(struct ocfs2_chain_list *cl);
brelse(ac->ac_bh);
ac->ac_bh = NULL;
ac->ac_resv = NULL;
+ if (ac->ac_find_loc_priv) {
+ kfree(ac->ac_find_loc_priv);
+ ac->ac_find_loc_priv = NULL;
+ }
}
void ocfs2_free_alloc_context(struct ocfs2_alloc_context *ac)
static void ocfs2_bg_discontig_add_extent(struct ocfs2_super *osb,
struct ocfs2_group_desc *bg,
struct ocfs2_chain_list *cl,
- u64 p_blkno, u32 clusters)
+ u64 p_blkno, unsigned int clusters)
{
struct ocfs2_extent_list *el = &bg->bg_list;
struct ocfs2_extent_rec *rec;
rec->e_blkno = cpu_to_le64(p_blkno);
rec->e_cpos = cpu_to_le32(le16_to_cpu(bg->bg_bits) /
le16_to_cpu(cl->cl_bpc));
- rec->e_leaf_clusters = cpu_to_le32(clusters);
+ rec->e_leaf_clusters = cpu_to_le16(clusters);
le16_add_cpu(&bg->bg_bits, clusters * le16_to_cpu(cl->cl_bpc));
le16_add_cpu(&bg->bg_free_bits_count,
clusters * le16_to_cpu(cl->cl_bpc));
if (!ret)
ocfs2_bg_discontig_fix_result(ac, gd, res);
+ /*
+ * sr_bg_blkno might have been changed by
+ * ocfs2_bg_discontig_fix_result
+ */
+ res->sr_bg_stable_blkno = group_bh->b_blocknr;
+
+ if (ac->ac_find_loc_only)
+ goto out_loc_only;
+
ret = ocfs2_alloc_dinode_update_counts(alloc_inode, handle, ac->ac_bh,
res->sr_bits,
le16_to_cpu(gd->bg_chain));
if (ret < 0)
mlog_errno(ret);
+out_loc_only:
*bits_left = le16_to_cpu(gd->bg_free_bits_count);
out:
{
int status;
u16 chain;
- u32 tmp_used;
u64 next_group;
struct inode *alloc_inode = ac->ac_inode;
struct buffer_head *group_bh = NULL;
if (!status)
ocfs2_bg_discontig_fix_result(ac, bg, res);
+ /*
+ * sr_bg_blkno might have been changed by
+ * ocfs2_bg_discontig_fix_result
+ */
+ res->sr_bg_stable_blkno = group_bh->b_blocknr;
/*
* Keep track of previous block descriptor read. When
}
}
- /* Ok, claim our bits now: set the info on dinode, chainlist
- * and then the group */
- status = ocfs2_journal_access_di(handle,
- INODE_CACHE(alloc_inode),
- ac->ac_bh,
- OCFS2_JOURNAL_ACCESS_WRITE);
- if (status < 0) {
+ if (ac->ac_find_loc_only)
+ goto out_loc_only;
+
+ status = ocfs2_alloc_dinode_update_counts(alloc_inode, handle,
+ ac->ac_bh, res->sr_bits,
+ chain);
+ if (status) {
mlog_errno(status);
goto bail;
}
- tmp_used = le32_to_cpu(fe->id1.bitmap1.i_used);
- fe->id1.bitmap1.i_used = cpu_to_le32(res->sr_bits + tmp_used);
- le32_add_cpu(&cl->cl_recs[chain].c_free, -res->sr_bits);
- ocfs2_journal_dirty(handle, ac->ac_bh);
-
status = ocfs2_block_group_set_bits(handle,
alloc_inode,
bg,
mlog(0, "Allocated %u bits from suballocator %llu\n", res->sr_bits,
(unsigned long long)le64_to_cpu(fe->i_blkno));
+out_loc_only:
*bits_left = le16_to_cpu(bg->bg_free_bits_count);
bail:
brelse(group_bh);
int status;
u16 victim, i;
u16 bits_left = 0;
+ u64 hint = ac->ac_last_group;
struct ocfs2_chain_list *cl;
struct ocfs2_dinode *fe;
goto bail;
}
- res->sr_bg_blkno = ac->ac_last_group;
+ res->sr_bg_blkno = hint;
if (res->sr_bg_blkno) {
/* Attempt to short-circuit the usual search mechanism
* by jumping straight to the most recently used
status = ocfs2_search_chain(ac, handle, bits_wanted, min_bits,
res, &bits_left);
- if (!status)
+ if (!status) {
+ hint = ocfs2_group_from_res(res);
goto set_hint;
+ }
if (status < 0 && status != -ENOSPC) {
mlog_errno(status);
goto bail;
ac->ac_chain = i;
status = ocfs2_search_chain(ac, handle, bits_wanted, min_bits,
res, &bits_left);
- if (!status)
+ if (!status) {
+ hint = ocfs2_group_from_res(res);
break;
+ }
if (status < 0 && status != -ENOSPC) {
mlog_errno(status);
goto bail;
if (bits_left < min_bits)
ac->ac_last_group = 0;
else
- ac->ac_last_group = res->sr_bg_blkno;
+ ac->ac_last_group = hint;
}
bail:
OCFS2_I(dir)->ip_last_used_slot = ac->ac_alloc_slot;
}
+int ocfs2_find_new_inode_loc(struct inode *dir,
+ struct buffer_head *parent_fe_bh,
+ struct ocfs2_alloc_context *ac,
+ u64 *fe_blkno)
+{
+ int ret;
+ handle_t *handle = NULL;
+ struct ocfs2_suballoc_result *res;
+
+ BUG_ON(!ac);
+ BUG_ON(ac->ac_bits_given != 0);
+ BUG_ON(ac->ac_bits_wanted != 1);
+ BUG_ON(ac->ac_which != OCFS2_AC_USE_INODE);
+
+ res = kzalloc(sizeof(*res), GFP_NOFS);
+ if (res == NULL) {
+ ret = -ENOMEM;
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ocfs2_init_inode_ac_group(dir, parent_fe_bh, ac);
+
+ /*
+ * The handle started here is for chain relink. Alternatively,
+ * we could just disable relink for these calls.
+ */
+ handle = ocfs2_start_trans(OCFS2_SB(dir->i_sb), OCFS2_SUBALLOC_ALLOC);
+ if (IS_ERR(handle)) {
+ ret = PTR_ERR(handle);
+ handle = NULL;
+ mlog_errno(ret);
+ goto out;
+ }
+
+ /*
+ * This will instruct ocfs2_claim_suballoc_bits and
+ * ocfs2_search_one_group to search but save actual allocation
+ * for later.
+ */
+ ac->ac_find_loc_only = 1;
+
+ ret = ocfs2_claim_suballoc_bits(ac, handle, 1, 1, res);
+ if (ret < 0) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ac->ac_find_loc_priv = res;
+ *fe_blkno = res->sr_blkno;
+
+out:
+ if (handle)
+ ocfs2_commit_trans(OCFS2_SB(dir->i_sb), handle);
+
+ if (ret)
+ kfree(res);
+
+ return ret;
+}
+
+int ocfs2_claim_new_inode_at_loc(handle_t *handle,
+ struct inode *dir,
+ struct ocfs2_alloc_context *ac,
+ u64 *suballoc_loc,
+ u16 *suballoc_bit,
+ u64 di_blkno)
+{
+ int ret;
+ u16 chain;
+ struct ocfs2_suballoc_result *res = ac->ac_find_loc_priv;
+ struct buffer_head *bg_bh = NULL;
+ struct ocfs2_group_desc *bg;
+ struct ocfs2_dinode *di = (struct ocfs2_dinode *) ac->ac_bh->b_data;
+
+ /*
+ * Since di_blkno is being passed back in, we check for any
+ * inconsistencies which may have happened between
+ * calls. These are code bugs as di_blkno is not expected to
+ * change once returned from ocfs2_find_new_inode_loc()
+ */
+ BUG_ON(res->sr_blkno != di_blkno);
+
+ ret = ocfs2_read_group_descriptor(ac->ac_inode, di,
+ res->sr_bg_stable_blkno, &bg_bh);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ bg = (struct ocfs2_group_desc *) bg_bh->b_data;
+ chain = le16_to_cpu(bg->bg_chain);
+
+ ret = ocfs2_alloc_dinode_update_counts(ac->ac_inode, handle,
+ ac->ac_bh, res->sr_bits,
+ chain);
+ if (ret) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ ret = ocfs2_block_group_set_bits(handle,
+ ac->ac_inode,
+ bg,
+ bg_bh,
+ res->sr_bit_offset,
+ res->sr_bits);
+ if (ret < 0) {
+ mlog_errno(ret);
+ goto out;
+ }
+
+ mlog(0, "Allocated %u bits from suballocator %llu\n", res->sr_bits,
+ (unsigned long long)di_blkno);
+
+ atomic_inc(&OCFS2_SB(ac->ac_inode->i_sb)->alloc_stats.bg_allocs);
+
+ BUG_ON(res->sr_bits != 1);
+
+ *suballoc_loc = res->sr_bg_blkno;
+ *suballoc_bit = res->sr_bit_offset;
+ ac->ac_bits_given++;
+ ocfs2_save_inode_ac_group(dir, ac);
+
+out:
+ brelse(bg_bh);
+
+ return ret;
+}
+
int ocfs2_claim_new_inode(handle_t *handle,
struct inode *dir,
struct buffer_head *parent_fe_bh,
* suballoc_bit.
*/
static int ocfs2_get_suballoc_slot_bit(struct ocfs2_super *osb, u64 blkno,
- u16 *suballoc_slot, u16 *suballoc_bit)
+ u16 *suballoc_slot, u64 *group_blkno,
+ u16 *suballoc_bit)
{
int status;
struct buffer_head *inode_bh = NULL;
*suballoc_slot = le16_to_cpu(inode_fe->i_suballoc_slot);
if (suballoc_bit)
*suballoc_bit = le16_to_cpu(inode_fe->i_suballoc_bit);
+ if (group_blkno)
+ *group_blkno = le64_to_cpu(inode_fe->i_suballoc_loc);
bail:
brelse(inode_bh);
*/
static int ocfs2_test_suballoc_bit(struct ocfs2_super *osb,
struct inode *suballoc,
- struct buffer_head *alloc_bh, u64 blkno,
+ struct buffer_head *alloc_bh,
+ u64 group_blkno, u64 blkno,
u16 bit, int *res)
{
struct ocfs2_dinode *alloc_di;
goto bail;
}
- if (alloc_di->i_suballoc_loc)
- bg_blkno = le64_to_cpu(alloc_di->i_suballoc_loc);
- else
- bg_blkno = ocfs2_which_suballoc_group(blkno, bit);
+ bg_blkno = group_blkno ? group_blkno :
+ ocfs2_which_suballoc_group(blkno, bit);
status = ocfs2_read_group_descriptor(suballoc, alloc_di, bg_blkno,
&group_bh);
if (status < 0) {
int ocfs2_test_inode_bit(struct ocfs2_super *osb, u64 blkno, int *res)
{
int status;
+ u64 group_blkno = 0;
u16 suballoc_bit = 0, suballoc_slot = 0;
struct inode *inode_alloc_inode;
struct buffer_head *alloc_bh = NULL;
mlog_entry("blkno: %llu", (unsigned long long)blkno);
status = ocfs2_get_suballoc_slot_bit(osb, blkno, &suballoc_slot,
- &suballoc_bit);
+ &group_blkno, &suballoc_bit);
if (status < 0) {
mlog(ML_ERROR, "get alloc slot and bit failed %d\n", status);
goto bail;
}
status = ocfs2_test_suballoc_bit(osb, inode_alloc_inode, alloc_bh,
- blkno, suballoc_bit, res);
+ group_blkno, blkno, suballoc_bit, res);
if (status < 0)
mlog(ML_ERROR, "test suballoc bit failed %d\n", status);
u64 ac_max_block; /* Highest block number to allocate. 0 is
is the same as ~0 - unlimited */
+ int ac_find_loc_only; /* hack for reflink operation ordering */
+ struct ocfs2_suballoc_result *ac_find_loc_priv; /* */
+
struct ocfs2_alloc_reservation *ac_resv;
};
struct ocfs2_alloc_context **meta_ac);
int ocfs2_test_inode_bit(struct ocfs2_super *osb, u64 blkno, int *res);
+
+
+
+/*
+ * The following two interfaces are for ocfs2_create_inode_in_orphan().
+ */
+int ocfs2_find_new_inode_loc(struct inode *dir,
+ struct buffer_head *parent_fe_bh,
+ struct ocfs2_alloc_context *ac,
+ u64 *fe_blkno);
+
+int ocfs2_claim_new_inode_at_loc(handle_t *handle,
+ struct inode *dir,
+ struct ocfs2_alloc_context *ac,
+ u64 *suballoc_loc,
+ u16 *suballoc_bit,
+ u64 di_blkno);
+
#endif /* _CHAINALLOC_H_ */
}
/* Fast symlinks can't be large */
- len = strlen(target);
+ len = strnlen(target, ocfs2_fast_symlink_chars(inode->i_sb));
link = kzalloc(len + 1, GFP_NOFS);
if (!link) {
status = -ENOMEM;
xis.inode_bh = xbs.inode_bh = di_bh;
di = (struct ocfs2_dinode *)di_bh->b_data;
- down_read(&oi->ip_xattr_sem);
ret = ocfs2_xattr_ibody_get(inode, name_index, name, buffer,
buffer_size, &xis);
if (ret == -ENODATA && di->i_xattr_loc)
ret = ocfs2_xattr_block_get(inode, name_index, name, buffer,
buffer_size, &xbs);
- up_read(&oi->ip_xattr_sem);
return ret;
}
mlog_errno(ret);
return ret;
}
+ down_read(&OCFS2_I(inode)->ip_xattr_sem);
ret = ocfs2_xattr_get_nolock(inode, di_bh, name_index,
name, buffer, buffer_size);
+ up_read(&OCFS2_I(inode)->ip_xattr_sem);
ocfs2_inode_unlock(inode, 0);
INF("auxv", S_IRUSR, proc_pid_auxv),
ONE("status", S_IRUGO, proc_pid_status),
ONE("personality", S_IRUSR, proc_pid_personality),
- INF("limits", S_IRUSR, proc_pid_limits),
+ INF("limits", S_IRUGO, proc_pid_limits),
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
#endif
INF("auxv", S_IRUSR, proc_pid_auxv),
ONE("status", S_IRUGO, proc_pid_status),
ONE("personality", S_IRUSR, proc_pid_personality),
- INF("limits", S_IRUSR, proc_pid_limits),
+ INF("limits", S_IRUGO, proc_pid_limits),
#ifdef CONFIG_SCHED_DEBUG
REG("sched", S_IRUGO|S_IWUSR, proc_pid_sched_operations),
#endif
u |= kpf_copy_bit(k, KPF_HWPOISON, PG_hwpoison);
#endif
-#ifdef CONFIG_IA64_UNCACHED_ALLOCATOR
+#ifdef CONFIG_ARCH_USES_PG_UNCACHED
u |= kpf_copy_bit(k, KPF_UNCACHED, PG_uncached);
#endif
/* We don't show the stack guard page in /proc/maps */
start = vma->vm_start;
if (vma->vm_flags & VM_GROWSDOWN)
- start += PAGE_SIZE;
+ if (!vma_stack_continue(vma->vm_prev, vma->vm_start))
+ start += PAGE_SIZE;
seq_printf(m, "%08lx-%08lx %c%c%c%c %08llx %02x:%02x %lu %n",
start,
mss->referenced += PAGE_SIZE;
mapcount = page_mapcount(page);
if (mapcount >= 2) {
- if (pte_dirty(ptent))
+ if (pte_dirty(ptent) || PageDirty(page))
mss->shared_dirty += PAGE_SIZE;
else
mss->shared_clean += PAGE_SIZE;
mss->pss += (PAGE_SIZE << PSS_SHIFT) / mapcount;
} else {
- if (pte_dirty(ptent))
+ if (pte_dirty(ptent) || PageDirty(page))
mss->private_dirty += PAGE_SIZE;
else
mss->private_clean += PAGE_SIZE;
static const struct file_operations proc_vmcore_operations = {
.read = read_vmcore,
- .llseek = generic_file_llseek,
+ .llseek = default_llseek,
};
static struct vmcore* __init get_new_element(void)
int reiserfs_unpack(struct inode *inode, struct file *filp)
{
int retval = 0;
+ int depth;
int index;
struct page *page;
struct address_space *mapping;
/* we need to make sure nobody is changing the file size beneath
** us
*/
- mutex_lock(&inode->i_mutex);
- reiserfs_write_lock(inode->i_sb);
+ reiserfs_mutex_lock_safe(&inode->i_mutex, inode->i_sb);
+ depth = reiserfs_write_lock_once(inode->i_sb);
write_from = inode->i_size & (blocksize - 1);
/* if we are on a block boundary, we are already unpacked. */
out:
mutex_unlock(&inode->i_mutex);
- reiserfs_write_unlock(inode->i_sb);
+ reiserfs_write_unlock_once(inode->i_sb, depth);
return retval;
}
if (!xfs_buf_zone)
goto out;
- xfslogd_workqueue = create_workqueue("xfslogd");
+ xfslogd_workqueue = alloc_workqueue("xfslogd",
+ WQ_RESCUER | WQ_HIGHPRI, 1);
if (!xfslogd_workqueue)
goto out_free_buf_zone;
{
struct fsxattr fa;
+ memset(&fa, 0, sizeof(struct fsxattr));
+
xfs_ilock(ip, XFS_ILOCK_SHARED);
fa.fsx_xflags = xfs_ip2xflags(ip);
fa.fsx_extsize = ip->i_d.di_extsize << ip->i_mount->m_sb.sb_blocklog;
xfs_perag_put(pag);
}
-void
-__xfs_inode_clear_reclaim_tag(
- xfs_mount_t *mp,
+STATIC void
+__xfs_inode_clear_reclaim(
xfs_perag_t *pag,
xfs_inode_t *ip)
{
- radix_tree_tag_clear(&pag->pag_ici_root,
- XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
pag->pag_ici_reclaimable--;
if (!pag->pag_ici_reclaimable) {
/* clear the reclaim tag from the perag radix tree */
}
}
+void
+__xfs_inode_clear_reclaim_tag(
+ xfs_mount_t *mp,
+ xfs_perag_t *pag,
+ xfs_inode_t *ip)
+{
+ radix_tree_tag_clear(&pag->pag_ici_root,
+ XFS_INO_TO_AGINO(mp, ip->i_ino), XFS_ICI_RECLAIM_TAG);
+ __xfs_inode_clear_reclaim(pag, ip);
+}
+
/*
* Inodes in different states need to be treated differently, and the return
* value of xfs_iflush is not sufficient to get this right. The following table
if (!radix_tree_delete(&pag->pag_ici_root,
XFS_INO_TO_AGINO(ip->i_mount, ip->i_ino)))
ASSERT(0);
+ __xfs_inode_clear_reclaim(pag, ip);
write_unlock(&pag->pag_ici_lock);
/*
new_ctx = kmem_zalloc(sizeof(*new_ctx), KM_SLEEP|KM_NOFS);
new_ctx->ticket = xlog_cil_ticket_alloc(log);
- /* lock out transaction commit, but don't block on background push */
+ /*
+ * Lock out transaction commit, but don't block for background pushes
+ * unless we are well over the CIL space limit. See the definition of
+ * XLOG_CIL_HARD_SPACE_LIMIT() for the full explanation of the logic
+ * used here.
+ */
if (!down_write_trylock(&cil->xc_ctx_lock)) {
- if (!push_seq)
+ if (!push_seq &&
+ cil->xc_ctx->space_used < XLOG_CIL_HARD_SPACE_LIMIT(log))
goto out_free_ticket;
down_write(&cil->xc_ctx_lock);
}
goto out_skip;
/* check for a previously pushed seqeunce */
- if (push_seq < cil->xc_ctx->sequence)
+ if (push_seq && push_seq < cil->xc_ctx->sequence)
goto out_skip;
/*
};
/*
- * The amount of log space we should the CIL to aggregate is difficult to size.
- * Whatever we chose we have to make we can get a reservation for the log space
- * effectively, that it is large enough to capture sufficient relogging to
- * reduce log buffer IO significantly, but it is not too large for the log or
- * induces too much latency when writing out through the iclogs. We track both
- * space consumed and the number of vectors in the checkpoint context, so we
- * need to decide which to use for limiting.
+ * The amount of log space we allow the CIL to aggregate is difficult to size.
+ * Whatever we choose, we have to make sure we can get a reservation for the
+ * log space effectively, that it is large enough to capture sufficient
+ * relogging to reduce log buffer IO significantly, but it is not too large for
+ * the log or induces too much latency when writing out through the iclogs. We
+ * track both space consumed and the number of vectors in the checkpoint
+ * context, so we need to decide which to use for limiting.
*
* Every log buffer we write out during a push needs a header reserved, which
* is at least one sector and more for v2 logs. Hence we need a reservation of
* checkpoint transaction ticket is specific to the checkpoint context, rather
* than the CIL itself.
*
- * With dynamic reservations, we can basically make up arbitrary limits for the
- * checkpoint size so long as they don't violate any other size rules. Hence
- * the initial maximum size for the checkpoint transaction will be set to a
- * quarter of the log or 8MB, which ever is smaller. 8MB is an arbitrary limit
- * right now based on the latency of writing out a large amount of data through
- * the circular iclog buffers.
+ * With dynamic reservations, we can effectively make up arbitrary limits for
+ * the checkpoint size so long as they don't violate any other size rules.
+ * Recovery imposes a rule that no transaction exceed half the log, so we are
+ * limited by that. Furthermore, the log transaction reservation subsystem
+ * tries to keep 25% of the log free, so we need to keep below that limit or we
+ * risk running out of free log space to start any new transactions.
+ *
+ * In order to keep background CIL push efficient, we will set a lower
+ * threshold at which background pushing is attempted without blocking current
+ * transaction commits. A separate, higher bound defines when CIL pushes are
+ * enforced to ensure we stay within our maximum checkpoint size bounds.
+ * threshold, yet give us plenty of space for aggregation on large logs.
*/
-
-#define XLOG_CIL_SPACE_LIMIT(log) \
- (min((log->l_logsize >> 2), (8 * 1024 * 1024)))
+#define XLOG_CIL_SPACE_LIMIT(log) (log->l_logsize >> 3)
+#define XLOG_CIL_HARD_SPACE_LIMIT(log) (3 * (log->l_logsize >> 4))
/*
* The reservation head lsn is not made up of a cycle number and block number.
extern u8 acpi_gbl_permanent_mmap;
/*
- * Globals that are publically available, allowing for
+ * Globals that are publicly available, allowing for
* run time configuration
*/
extern u32 acpi_dbg_level;
* While the GPIO programming interface defines valid GPIO numbers
* to be in the range 0..MAX_INT, this library restricts them to the
* smaller range 0..ARCH_NR_GPIOS-1.
+ *
+ * ARCH_NR_GPIOS is somewhat arbitrary; it usually reflects the sum of
+ * builtin/SoC GPIOs plus a number of GPIOs on expanders; the latter is
+ * actually an estimate of a board-specific value.
*/
#ifndef ARCH_NR_GPIOS
#define ARCH_NR_GPIOS 256
#endif
+/*
+ * "valid" GPIO numbers are nonnegative and may be passed to
+ * setup routines like gpio_request(). only some valid numbers
+ * can successfully be requested and used.
+ *
+ * Invalid GPIO numbers are useful for indicating no-such-GPIO in
+ * platform data and other tables.
+ */
+
static inline int gpio_is_valid(int number)
{
- /* only some non-negative numbers are valid */
return ((unsigned)number) < ARCH_NR_GPIOS;
}
#include <linux/cache.h>
#include <linux/threads.h>
-#include <linux/irq.h>
typedef struct {
unsigned int __softirq_pending;
} ____cacheline_aligned irq_cpustat_t;
#include <linux/irq_cpustat.h> /* Standard mappings for irq_cpustat_t above */
+#include <linux/irq.h>
#ifndef ack_bad_irq
static inline void ack_bad_irq(unsigned int irq)
#define move_pte(pte, prot, old_addr, new_addr) (pte)
#endif
+#ifndef flush_tlb_fix_spurious_fault
+#define flush_tlb_fix_spurious_fault(vma, address) flush_tlb_page(vma, address)
+#endif
+
#ifndef pgprot_noncached
#define pgprot_noncached(prot) (prot)
#endif
\
BUG_TABLE \
\
+ JUMP_TABLE \
+ \
/* PCI quirks */ \
.pci_fixup : AT(ADDR(.pci_fixup) - LOAD_OFFSET) { \
VMLINUX_SYMBOL(__start_pci_fixups_early) = .; \
#define BUG_TABLE
#endif
+#define JUMP_TABLE \
+ . = ALIGN(8); \
+ __jump_table : AT(ADDR(__jump_table) - LOAD_OFFSET) { \
+ VMLINUX_SYMBOL(__start___jump_table) = .; \
+ *(__jump_table) \
+ VMLINUX_SYMBOL(__stop___jump_table) = .; \
+ }
+
#ifdef CONFIG_PM_TRACE
#define TRACEDATA \
. = ALIGN(4); \
- LOAD_OFFSET) { \
VMLINUX_SYMBOL(__per_cpu_start) = .; \
*(.data..percpu..first) \
+ . = ALIGN(PAGE_SIZE); \
*(.data..percpu..page_aligned) \
+ *(.data..percpu..readmostly) \
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
VMLINUX_SYMBOL(__per_cpu_end) = .; \
VMLINUX_SYMBOL(__per_cpu_load) = .; \
VMLINUX_SYMBOL(__per_cpu_start) = .; \
*(.data..percpu..first) \
+ . = ALIGN(PAGE_SIZE); \
*(.data..percpu..page_aligned) \
+ *(.data..percpu..readmostly) \
*(.data..percpu) \
*(.data..percpu..shared_aligned) \
VMLINUX_SYMBOL(__per_cpu_end) = .; \
struct kref refcount;
/** Handle count of this object. Each handle also holds a reference */
- struct kref handlecount;
+ atomic_t handle_count; /* number of handles on this object */
/** Related drm device */
struct drm_device *dev;
*/
int (*gem_init_object) (struct drm_gem_object *obj);
void (*gem_free_object) (struct drm_gem_object *obj);
- void (*gem_free_object_unlocked) (struct drm_gem_object *obj);
/* vga arb irq handler */
void (*vgaarb_irq)(struct drm_device *dev, bool state);
extern int drm_mmap(struct file *filp, struct vm_area_struct *vma);
extern int drm_mmap_locked(struct file *filp, struct vm_area_struct *vma);
extern void drm_vm_open_locked(struct vm_area_struct *vma);
+extern void drm_vm_close_locked(struct vm_area_struct *vma);
extern resource_size_t drm_core_get_map_ofs(struct drm_local_map * map);
extern resource_size_t drm_core_get_reg_ofs(struct drm_device *dev);
extern unsigned int drm_poll(struct file *filp, struct poll_table_struct *wait);
void drm_gem_destroy(struct drm_device *dev);
void drm_gem_object_release(struct drm_gem_object *obj);
void drm_gem_object_free(struct kref *kref);
-void drm_gem_object_free_unlocked(struct kref *kref);
struct drm_gem_object *drm_gem_object_alloc(struct drm_device *dev,
size_t size);
int drm_gem_object_init(struct drm_device *dev,
struct drm_gem_object *obj, size_t size);
-void drm_gem_object_handle_free(struct kref *kref);
+void drm_gem_object_handle_free(struct drm_gem_object *obj);
void drm_gem_vm_open(struct vm_area_struct *vma);
void drm_gem_vm_close(struct vm_area_struct *vma);
int drm_gem_mmap(struct file *filp, struct vm_area_struct *vma);
static inline void
drm_gem_object_unreference_unlocked(struct drm_gem_object *obj)
{
- if (obj != NULL)
- kref_put(&obj->refcount, drm_gem_object_free_unlocked);
+ if (obj != NULL) {
+ struct drm_device *dev = obj->dev;
+ mutex_lock(&dev->struct_mutex);
+ kref_put(&obj->refcount, drm_gem_object_free);
+ mutex_unlock(&dev->struct_mutex);
+ }
}
int drm_gem_handle_create(struct drm_file *file_priv,
drm_gem_object_handle_reference(struct drm_gem_object *obj)
{
drm_gem_object_reference(obj);
- kref_get(&obj->handlecount);
+ atomic_inc(&obj->handle_count);
}
static inline void
if (obj == NULL)
return;
+ if (atomic_read(&obj->handle_count) == 0)
+ return;
/*
* Must bump handle count first as this may be the last
* ref, in which case the object would disappear before we
* checked for a name
*/
- kref_put(&obj->handlecount, drm_gem_object_handle_free);
+ if (atomic_dec_and_test(&obj->handle_count))
+ drm_gem_object_handle_free(obj);
drm_gem_object_unreference(obj);
}
if (obj == NULL)
return;
+ if (atomic_read(&obj->handle_count) == 0)
+ return;
+
/*
* Must bump handle count first as this may be the last
* ref, in which case the object would disappear before we
* checked for a name
*/
- kref_put(&obj->handlecount, drm_gem_object_handle_free);
+
+ if (atomic_dec_and_test(&obj->handle_count))
+ drm_gem_object_handle_free(obj);
drm_gem_object_unreference_unlocked(obj);
}
void (*dpms)(struct drm_connector *connector, int mode);
void (*save)(struct drm_connector *connector);
void (*restore)(struct drm_connector *connector);
- enum drm_connector_status (*detect)(struct drm_connector *connector);
+
+ /* Check to see if anything is attached to the connector.
+ * @force is set to false whilst polling, true when checking the
+ * connector due to user request. @force can be used by the driver
+ * to avoid expensive, destructive operations during automated
+ * probing.
+ */
+ enum drm_connector_status (*detect)(struct drm_connector *connector,
+ bool force);
int (*fill_modes)(struct drm_connector *connector, uint32_t max_width, uint32_t max_height);
int (*set_property)(struct drm_connector *connector, struct drm_property *property,
uint64_t val);
{0x1002, 0x5460, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV380|RADEON_IS_MOBILITY}, \
{0x1002, 0x5462, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV380|RADEON_IS_MOBILITY}, \
{0x1002, 0x5464, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV380|RADEON_IS_MOBILITY}, \
- {0x1002, 0x5657, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV380|RADEON_NEW_MEMMAP}, \
{0x1002, 0x5548, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_R423|RADEON_NEW_MEMMAP}, \
{0x1002, 0x5549, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_R423|RADEON_NEW_MEMMAP}, \
{0x1002, 0x554A, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_R423|RADEON_NEW_MEMMAP}, \
{0x1002, 0x564F, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV410|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x5652, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV410|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
{0x1002, 0x5653, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV410|RADEON_IS_MOBILITY|RADEON_NEW_MEMMAP}, \
+ {0x1002, 0x5657, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RV410|RADEON_NEW_MEMMAP}, \
{0x1002, 0x5834, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RS300|RADEON_IS_IGP}, \
{0x1002, 0x5835, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RS300|RADEON_IS_IGP|RADEON_IS_MOBILITY}, \
{0x1002, 0x5954, PCI_ANY_ID, PCI_ANY_ID, 0, 0, CHIP_RS480|RADEON_IS_IGP|RADEON_IS_MOBILITY|RADEON_IS_IGPGART}, \
atomic_t reserved;
-
/**
* Members protected by the bo::lock
+ * In addition, setting sync_obj to anything else
+ * than NULL requires bo::reserved to be held. This allows for
+ * checking NULL while reserved but not holding bo::lock.
*/
void *sync_obj_arg;
header-y += ext2_fs.h
header-y += fadvise.h
header-y += falloc.h
-header-y += fanotify.h
header-y += fb.h
header-y += fcntl.h
header-y += fd.h
return acpi_pm_read_verified() & ACPI_PM_MASK;
}
-extern void pmtimer_wait(unsigned);
-
#else
static inline u32 acpi_pm_read_early(void)
#ifndef _FS_CEPH_AUTH_H
#define _FS_CEPH_AUTH_H
-#include "types.h"
-#include "buffer.h"
+#include <linux/ceph/types.h>
+#include <linux/ceph/buffer.h>
/*
* Abstract interface for communicating with the authenticate module.
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-#ifdef CONFIG_CEPH_FS_PRETTYDEBUG
+#ifdef CONFIG_CEPH_LIB_PRETTYDEBUG
/*
* wrap pr_debug to include a filename:lineno prefix on each line.
# if defined(DEBUG) || defined(CONFIG_DYNAMIC_DEBUG)
extern const char *ceph_file_part(const char *s, int len);
# define dout(fmt, ...) \
- pr_debug(" %12.12s:%-4d : " fmt, \
+ pr_debug("%.*s %12.12s:%-4d : " fmt, \
+ 8 - (int)sizeof(KBUILD_MODNAME), " ", \
ceph_file_part(__FILE__, sizeof(__FILE__)), \
__LINE__, ##__VA_ARGS__)
# else
CEPH_MDS_OP_SETATTR = 0x01108,
CEPH_MDS_OP_SETFILELOCK= 0x01109,
CEPH_MDS_OP_GETFILELOCK= 0x00110,
+ CEPH_MDS_OP_SETDIRLAYOUT=0x0110a,
CEPH_MDS_OP_MKNOD = 0x01201,
CEPH_MDS_OP_LINK = 0x01202,
--- /dev/null
+#ifndef _FS_CEPH_DEBUGFS_H
+#define _FS_CEPH_DEBUGFS_H
+
+#include "ceph_debug.h"
+#include "types.h"
+
+#define CEPH_DEFINE_SHOW_FUNC(name) \
+static int name##_open(struct inode *inode, struct file *file) \
+{ \
+ struct seq_file *sf; \
+ int ret; \
+ \
+ ret = single_open(file, name, NULL); \
+ sf = file->private_data; \
+ sf->private = inode->i_private; \
+ return ret; \
+} \
+ \
+static const struct file_operations name##_fops = { \
+ .open = name##_open, \
+ .read = seq_read, \
+ .llseek = seq_lseek, \
+ .release = single_release, \
+};
+
+/* debugfs.c */
+extern int ceph_debugfs_init(void);
+extern void ceph_debugfs_cleanup(void);
+extern int ceph_debugfs_client_init(struct ceph_client *client);
+extern void ceph_debugfs_client_cleanup(struct ceph_client *client);
+
+#endif
+
ceph_encode_need(p, end, n, bad); \
ceph_encode_copy(p, pv, n); \
} while (0)
+#define ceph_encode_string_safe(p, end, s, n, bad) \
+ do { \
+ ceph_encode_need(p, end, n, bad); \
+ ceph_encode_string(p, end, s, n); \
+ } while (0)
#endif
--- /dev/null
+#ifndef _FS_CEPH_LIBCEPH_H
+#define _FS_CEPH_LIBCEPH_H
+
+#include "ceph_debug.h"
+
+#include <asm/unaligned.h>
+#include <linux/backing-dev.h>
+#include <linux/completion.h>
+#include <linux/exportfs.h>
+#include <linux/fs.h>
+#include <linux/mempool.h>
+#include <linux/pagemap.h>
+#include <linux/wait.h>
+#include <linux/writeback.h>
+#include <linux/slab.h>
+
+#include "types.h"
+#include "messenger.h"
+#include "msgpool.h"
+#include "mon_client.h"
+#include "osd_client.h"
+#include "ceph_fs.h"
+
+/*
+ * Supported features
+ */
+#define CEPH_FEATURE_SUPPORTED_DEFAULT CEPH_FEATURE_NOSRCADDR
+#define CEPH_FEATURE_REQUIRED_DEFAULT CEPH_FEATURE_NOSRCADDR
+
+/*
+ * mount options
+ */
+#define CEPH_OPT_FSID (1<<0)
+#define CEPH_OPT_NOSHARE (1<<1) /* don't share client with other sbs */
+#define CEPH_OPT_MYIP (1<<2) /* specified my ip */
+#define CEPH_OPT_NOCRC (1<<3) /* no data crc on writes */
+
+#define CEPH_OPT_DEFAULT (0);
+
+#define ceph_set_opt(client, opt) \
+ (client)->options->flags |= CEPH_OPT_##opt;
+#define ceph_test_opt(client, opt) \
+ (!!((client)->options->flags & CEPH_OPT_##opt))
+
+struct ceph_options {
+ int flags;
+ struct ceph_fsid fsid;
+ struct ceph_entity_addr my_addr;
+ int mount_timeout;
+ int osd_idle_ttl;
+ int osd_timeout;
+ int osd_keepalive_timeout;
+
+ /*
+ * any type that can't be simply compared or doesn't need need
+ * to be compared should go beyond this point,
+ * ceph_compare_options() should be updated accordingly
+ */
+
+ struct ceph_entity_addr *mon_addr; /* should be the first
+ pointer type of args */
+ int num_mon;
+ char *name;
+ char *secret;
+};
+
+/*
+ * defaults
+ */
+#define CEPH_MOUNT_TIMEOUT_DEFAULT 60
+#define CEPH_OSD_TIMEOUT_DEFAULT 60 /* seconds */
+#define CEPH_OSD_KEEPALIVE_DEFAULT 5
+#define CEPH_OSD_IDLE_TTL_DEFAULT 60
+#define CEPH_MOUNT_RSIZE_DEFAULT (512*1024) /* readahead */
+
+#define CEPH_MSG_MAX_FRONT_LEN (16*1024*1024)
+#define CEPH_MSG_MAX_DATA_LEN (16*1024*1024)
+
+#define CEPH_AUTH_NAME_DEFAULT "guest"
+
+/*
+ * Delay telling the MDS we no longer want caps, in case we reopen
+ * the file. Delay a minimum amount of time, even if we send a cap
+ * message for some other reason. Otherwise, take the oppotunity to
+ * update the mds to avoid sending another message later.
+ */
+#define CEPH_CAPS_WANTED_DELAY_MIN_DEFAULT 5 /* cap release delay */
+#define CEPH_CAPS_WANTED_DELAY_MAX_DEFAULT 60 /* cap release delay */
+
+#define CEPH_CAP_RELEASE_SAFETY_DEFAULT (CEPH_CAPS_PER_RELEASE * 4)
+
+/* mount state */
+enum {
+ CEPH_MOUNT_MOUNTING,
+ CEPH_MOUNT_MOUNTED,
+ CEPH_MOUNT_UNMOUNTING,
+ CEPH_MOUNT_UNMOUNTED,
+ CEPH_MOUNT_SHUTDOWN,
+};
+
+/*
+ * subtract jiffies
+ */
+static inline unsigned long time_sub(unsigned long a, unsigned long b)
+{
+ BUG_ON(time_after(b, a));
+ return (long)a - (long)b;
+}
+
+struct ceph_mds_client;
+
+/*
+ * per client state
+ *
+ * possibly shared by multiple mount points, if they are
+ * mounting the same ceph filesystem/cluster.
+ */
+struct ceph_client {
+ struct ceph_fsid fsid;
+ bool have_fsid;
+
+ void *private;
+
+ struct ceph_options *options;
+
+ struct mutex mount_mutex; /* serialize mount attempts */
+ wait_queue_head_t auth_wq;
+ int auth_err;
+
+ int (*extra_mon_dispatch)(struct ceph_client *, struct ceph_msg *);
+
+ u32 supported_features;
+ u32 required_features;
+
+ struct ceph_messenger *msgr; /* messenger instance */
+ struct ceph_mon_client monc;
+ struct ceph_osd_client osdc;
+
+#ifdef CONFIG_DEBUG_FS
+ struct dentry *debugfs_dir;
+ struct dentry *debugfs_monmap;
+ struct dentry *debugfs_osdmap;
+#endif
+};
+
+
+
+/*
+ * snapshots
+ */
+
+/*
+ * A "snap context" is the set of existing snapshots when we
+ * write data. It is used by the OSD to guide its COW behavior.
+ *
+ * The ceph_snap_context is refcounted, and attached to each dirty
+ * page, indicating which context the dirty data belonged when it was
+ * dirtied.
+ */
+struct ceph_snap_context {
+ atomic_t nref;
+ u64 seq;
+ int num_snaps;
+ u64 snaps[];
+};
+
+static inline struct ceph_snap_context *
+ceph_get_snap_context(struct ceph_snap_context *sc)
+{
+ /*
+ printk("get_snap_context %p %d -> %d\n", sc, atomic_read(&sc->nref),
+ atomic_read(&sc->nref)+1);
+ */
+ if (sc)
+ atomic_inc(&sc->nref);
+ return sc;
+}
+
+static inline void ceph_put_snap_context(struct ceph_snap_context *sc)
+{
+ if (!sc)
+ return;
+ /*
+ printk("put_snap_context %p %d -> %d\n", sc, atomic_read(&sc->nref),
+ atomic_read(&sc->nref)-1);
+ */
+ if (atomic_dec_and_test(&sc->nref)) {
+ /*printk(" deleting snap_context %p\n", sc);*/
+ kfree(sc);
+ }
+}
+
+/*
+ * calculate the number of pages a given length and offset map onto,
+ * if we align the data.
+ */
+static inline int calc_pages_for(u64 off, u64 len)
+{
+ return ((off+len+PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT) -
+ (off >> PAGE_CACHE_SHIFT);
+}
+
+/* ceph_common.c */
+extern const char *ceph_msg_type_name(int type);
+extern int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid);
+extern struct kmem_cache *ceph_inode_cachep;
+extern struct kmem_cache *ceph_cap_cachep;
+extern struct kmem_cache *ceph_dentry_cachep;
+extern struct kmem_cache *ceph_file_cachep;
+
+extern int ceph_parse_options(struct ceph_options **popt, char *options,
+ const char *dev_name, const char *dev_name_end,
+ int (*parse_extra_token)(char *c, void *private),
+ void *private);
+extern void ceph_destroy_options(struct ceph_options *opt);
+extern int ceph_compare_options(struct ceph_options *new_opt,
+ struct ceph_client *client);
+extern struct ceph_client *ceph_create_client(struct ceph_options *opt,
+ void *private);
+extern u64 ceph_client_id(struct ceph_client *client);
+extern void ceph_destroy_client(struct ceph_client *client);
+extern int __ceph_open_session(struct ceph_client *client,
+ unsigned long started);
+extern int ceph_open_session(struct ceph_client *client);
+
+/* pagevec.c */
+extern void ceph_release_page_vector(struct page **pages, int num_pages);
+
+extern struct page **ceph_get_direct_page_vector(const char __user *data,
+ int num_pages,
+ loff_t off, size_t len);
+extern void ceph_put_page_vector(struct page **pages, int num_pages);
+extern void ceph_release_page_vector(struct page **pages, int num_pages);
+extern struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags);
+extern int ceph_copy_user_to_page_vector(struct page **pages,
+ const char __user *data,
+ loff_t off, size_t len);
+extern int ceph_copy_to_page_vector(struct page **pages,
+ const char *data,
+ loff_t off, size_t len);
+extern int ceph_copy_from_page_vector(struct page **pages,
+ char *data,
+ loff_t off, size_t len);
+extern int ceph_copy_page_vector_to_user(struct page **pages, char __user *data,
+ loff_t off, size_t len);
+extern void ceph_zero_page_vector_range(int off, int len, struct page **pages);
+
+
+#endif /* _FS_CEPH_SUPER_H */
*/
u32 global_seq;
spinlock_t global_seq_lock;
+
+ u32 supported_features;
+ u32 required_features;
};
/*
struct ceph_pagelist *pagelist; /* instead of pages */
struct list_head list_head;
struct kref kref;
+ struct bio *bio; /* instead of pages/pagelist */
+ struct bio *bio_iter; /* bio iterator */
+ int bio_seg; /* current bio segment */
+ struct ceph_pagelist *trail; /* the trailing part of the data */
bool front_is_vmalloc;
bool more_to_follow;
bool needs_out_seq;
};
-extern const char *pr_addr(const struct sockaddr_storage *ss);
+extern const char *ceph_pr_addr(const struct sockaddr_storage *ss);
extern int ceph_parse_ips(const char *c, const char *end,
struct ceph_entity_addr *addr,
int max_count, int *count);
extern void ceph_msgr_flush(void);
extern struct ceph_messenger *ceph_messenger_create(
- struct ceph_entity_addr *myaddr);
+ struct ceph_entity_addr *myaddr,
+ u32 features, u32 required);
extern void ceph_messenger_destroy(struct ceph_messenger *);
extern void ceph_con_init(struct ceph_messenger *msgr,
u64 last_tid;
/* mds/osd map */
+ int want_mdsmap;
int want_next_osdmap; /* 1 = want, 2 = want+asked */
u32 have_osdmap, have_mdsmap;
struct ceph_osd_request;
struct ceph_osd_client;
struct ceph_authorizer;
+struct ceph_pagelist;
/*
* completion callback for async writepages
struct list_head r_unsafe_item;
struct inode *r_inode; /* for use by callbacks */
+ void *r_priv; /* ditto */
char r_oid[40]; /* object name */
int r_oid_len;
struct page **r_pages; /* pages for data payload */
int r_pages_from_pool;
int r_own_pages; /* if true, i own page list */
+#ifdef CONFIG_BLOCK
+ struct bio *r_bio; /* instead of pages */
+#endif
+
+ struct ceph_pagelist *r_trail; /* trailing part of the data */
};
struct ceph_osd_client {
struct ceph_msgpool msgpool_op_reply;
};
+struct ceph_osd_req_op {
+ u16 op; /* CEPH_OSD_OP_* */
+ u32 flags; /* CEPH_OSD_FLAG_* */
+ union {
+ struct {
+ u64 offset, length;
+ u64 truncate_size;
+ u32 truncate_seq;
+ } extent;
+ struct {
+ const char *name;
+ u32 name_len;
+ const char *val;
+ u32 value_len;
+ __u8 cmp_op; /* CEPH_OSD_CMPXATTR_OP_* */
+ __u8 cmp_mode; /* CEPH_OSD_CMPXATTR_MODE_* */
+ } xattr;
+ struct {
+ const char *class_name;
+ __u8 class_len;
+ const char *method_name;
+ __u8 method_len;
+ __u8 argc;
+ const char *indata;
+ u32 indata_len;
+ } cls;
+ struct {
+ u64 cookie, count;
+ } pgls;
+ struct {
+ u64 snapid;
+ } snap;
+ };
+ u32 payload_len;
+};
+
extern int ceph_osdc_init(struct ceph_osd_client *osdc,
struct ceph_client *client);
extern void ceph_osdc_stop(struct ceph_osd_client *osdc);
extern void ceph_osdc_handle_map(struct ceph_osd_client *osdc,
struct ceph_msg *msg);
+extern void ceph_calc_raw_layout(struct ceph_osd_client *osdc,
+ struct ceph_file_layout *layout,
+ u64 snapid,
+ u64 off, u64 *plen, u64 *bno,
+ struct ceph_osd_request *req,
+ struct ceph_osd_req_op *op);
+
+extern struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
+ int flags,
+ struct ceph_snap_context *snapc,
+ struct ceph_osd_req_op *ops,
+ bool use_mempool,
+ gfp_t gfp_flags,
+ struct page **pages,
+ struct bio *bio);
+
+extern void ceph_osdc_build_request(struct ceph_osd_request *req,
+ u64 off, u64 *plen,
+ struct ceph_osd_req_op *src_ops,
+ struct ceph_snap_context *snapc,
+ struct timespec *mtime,
+ const char *oid,
+ int oid_len);
+
extern struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *,
struct ceph_file_layout *layout,
struct ceph_vino vino,
#include <linux/rbtree.h>
#include "types.h"
#include "ceph_fs.h"
-#include "crush/crush.h"
+#include <linux/crush/crush.h>
/*
* The osd map describes the current membership of the osd cluster and
extern int ceph_calc_pg_primary(struct ceph_osdmap *osdmap,
struct ceph_pg pgid);
+extern int ceph_pg_poolid_by_name(struct ceph_osdmap *map, const char *name);
+
#endif
void *mapped_tail;
size_t length;
size_t room;
+ struct list_head free_list;
+ size_t num_pages_free;
+};
+
+struct ceph_pagelist_cursor {
+ struct ceph_pagelist *pl; /* pagelist, for error checking */
+ struct list_head *page_lru; /* page in list */
+ size_t room; /* room remaining to reset to */
};
static inline void ceph_pagelist_init(struct ceph_pagelist *pl)
pl->mapped_tail = NULL;
pl->length = 0;
pl->room = 0;
+ INIT_LIST_HEAD(&pl->free_list);
+ pl->num_pages_free = 0;
}
+
extern int ceph_pagelist_release(struct ceph_pagelist *pl);
-extern int ceph_pagelist_append(struct ceph_pagelist *pl, void *d, size_t l);
+extern int ceph_pagelist_append(struct ceph_pagelist *pl, const void *d, size_t l);
+
+extern int ceph_pagelist_reserve(struct ceph_pagelist *pl, size_t space);
+
+extern int ceph_pagelist_free_reserve(struct ceph_pagelist *pl);
+
+extern void ceph_pagelist_set_cursor(struct ceph_pagelist *pl,
+ struct ceph_pagelist_cursor *c);
+
+extern int ceph_pagelist_truncate(struct ceph_pagelist *pl,
+ struct ceph_pagelist_cursor *c);
static inline int ceph_pagelist_encode_64(struct ceph_pagelist *pl, u64 v)
{
unsigned long flags;
/* ID for this css, if possible */
- struct css_id *id;
+ struct css_id __rcu *id;
};
/* bits in struct cgroup_subsys_state flags field */
struct list_head children; /* my children */
struct cgroup *parent; /* my parent */
- struct dentry *dentry; /* cgroup fs entry, RCU protected */
+ struct dentry __rcu *dentry; /* cgroup fs entry, RCU protected */
/* Private pointers for each registered subsystem */
struct cgroup_subsys_state *subsys[CGROUP_SUBSYS_COUNT];
void cgroup_iter_end(struct cgroup *cgrp, struct cgroup_iter *it);
int cgroup_scan_tasks(struct cgroup_scanner *scan);
int cgroup_attach_task(struct cgroup *, struct task_struct *);
-int cgroup_attach_task_current_cg(struct task_struct *);
+int cgroup_attach_task_all(struct task_struct *from, struct task_struct *);
+
+static inline int cgroup_attach_task_current_cg(struct task_struct *tsk)
+{
+ return cgroup_attach_task_all(current, tsk);
+}
/*
* CSS ID is ID for cgroup_subsys_state structs under subsys. This only works
}
/* No cgroups - nothing to do */
+static inline int cgroup_attach_task_all(struct task_struct *from,
+ struct task_struct *t)
+{
+ return 0;
+}
static inline int cgroup_attach_task_current_cg(struct task_struct *t)
{
return 0;
const struct compat_iovec __user *uvector, unsigned long nr_segs,
unsigned long fast_segs, struct iovec *fast_pointer,
struct iovec **ret_pointer);
+
+extern void __user *compat_alloc_user_space(unsigned long len);
+
#endif /* CONFIG_COMPAT */
#endif /* _LINUX_COMPAT_H */
# define __release(x) __context__(x,-1)
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
# define __percpu __attribute__((noderef, address_space(3)))
+#ifdef CONFIG_SPARSE_RCU_POINTER
+# define __rcu __attribute__((noderef, address_space(4)))
+#else
# define __rcu
+#endif
extern void __chk_user_ptr(const volatile void __user *);
extern void __chk_io_ptr(const volatile void __iomem *);
#else
* These are the only things you should do on a core-file: use only these
* functions to write out all the necessary info.
*/
-static inline int dump_write(struct file *file, const void *addr, int nr)
-{
- return file->f_op->write(file, addr, nr, &file->f_pos) == nr;
-}
-
-static inline int dump_seek(struct file *file, loff_t off)
-{
- int ret = 1;
-
- if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
- if (file->f_op->llseek(file, off, SEEK_CUR) < 0)
- return 0;
- } else {
- char *buf = (char *)get_zeroed_page(GFP_KERNEL);
-
- if (!buf)
- return 0;
- while (off > 0) {
- unsigned long n = off;
-
- if (n > PAGE_SIZE)
- n = PAGE_SIZE;
- if (!dump_write(file, buf, n)) {
- ret = 0;
- break;
- }
- off -= n;
- }
- free_page((unsigned long)buf);
- }
- return ret;
-}
+extern int dump_write(struct file *file, const void *addr, int nr);
+extern int dump_seek(struct file *file, loff_t off);
#endif /* _LINUX_COREDUMP_H */
#define CPUIDLE_FLAG_BALANCED (0x40) /* medium latency, moderate savings */
#define CPUIDLE_FLAG_DEEP (0x80) /* high latency, large savings */
#define CPUIDLE_FLAG_IGNORE (0x100) /* ignore during this idle period */
+#define CPUIDLE_FLAG_TLB_FLUSHED (0x200) /* tlb will be flushed */
#define CPUIDLE_DRIVER_FLAGS_MASK (0xFFFF0000)
atomic_t usage;
pid_t tgid; /* thread group process ID */
spinlock_t lock;
- struct key *session_keyring; /* keyring inherited over fork */
+ struct key __rcu *session_keyring; /* keyring inherited over fork */
struct key *process_keyring; /* keyring private to this process */
struct rcu_head rcu; /* RCU deletion hook */
};
#ifdef CONFIG_LOCKDEP
extern void debug_show_all_locks(void);
-extern void __debug_show_held_locks(struct task_struct *task);
extern void debug_show_held_locks(struct task_struct *task);
extern void debug_check_no_locks_freed(const void *from, unsigned long len);
extern void debug_check_no_locks_held(struct task_struct *task);
{
}
-static inline void __debug_show_held_locks(struct task_struct *task)
-{
-}
-
static inline void debug_show_held_locks(struct task_struct *task)
{
}
return DMA_BIT_MASK(32);
}
+#ifdef ARCH_HAS_DMA_SET_COHERENT_MASK
+int dma_set_coherent_mask(struct device *dev, u64 mask);
+#else
static inline int dma_set_coherent_mask(struct device *dev, u64 mask)
{
if (!dma_supported(dev, mask))
dev->coherent_dma_mask = mask;
return 0;
}
+#endif
extern u64 dma_get_required_mask(struct device *dev);
return (dma->max_pq & DMA_HAS_PQ_CONTINUE) == DMA_HAS_PQ_CONTINUE;
}
-static unsigned short dma_dev_to_maxpq(struct dma_device *dma)
+static inline unsigned short dma_dev_to_maxpq(struct dma_device *dma)
{
return dma->max_pq & ~DMA_HAS_PQ_CONTINUE;
}
#ifndef _DYNAMIC_DEBUG_H
#define _DYNAMIC_DEBUG_H
+#include <linux/jump_label.h>
+
/* dynamic_printk_enabled, and dynamic_printk_enabled2 are bitmasks in which
* bit n is set to 1 if any modname hashes into the bucket n, 0 otherwise. They
* use independent hash functions, to reduce the chance of false positives.
const char *function;
const char *filename;
const char *format;
- char primary_hash;
- char secondary_hash;
unsigned int lineno:24;
/*
* The flags field controls the behaviour at the callsite.
#define _DPRINTK_FLAGS_PRINT (1<<0) /* printk() a message using the format */
#define _DPRINTK_FLAGS_DEFAULT 0
unsigned int flags:8;
+ char enabled;
} __attribute__((aligned(8)));
#if defined(CONFIG_DYNAMIC_DEBUG)
extern int ddebug_remove_module(const char *mod_name);
-#define __dynamic_dbg_enabled(dd) ({ \
- int __ret = 0; \
- if (unlikely((dynamic_debug_enabled & (1LL << DEBUG_HASH)) && \
- (dynamic_debug_enabled2 & (1LL << DEBUG_HASH2)))) \
- if (unlikely(dd.flags)) \
- __ret = 1; \
- __ret; })
-
#define dynamic_pr_debug(fmt, ...) do { \
+ __label__ do_printk; \
+ __label__ out; \
static struct _ddebug descriptor \
__used \
__attribute__((section("__verbose"), aligned(8))) = \
- { KBUILD_MODNAME, __func__, __FILE__, fmt, DEBUG_HASH, \
- DEBUG_HASH2, __LINE__, _DPRINTK_FLAGS_DEFAULT }; \
- if (__dynamic_dbg_enabled(descriptor)) \
- printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); \
+ { KBUILD_MODNAME, __func__, __FILE__, fmt, __LINE__, \
+ _DPRINTK_FLAGS_DEFAULT }; \
+ JUMP_LABEL(&descriptor.enabled, do_printk); \
+ goto out; \
+do_printk: \
+ printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); \
+out: ; \
} while (0)
#define dynamic_dev_dbg(dev, fmt, ...) do { \
+ __label__ do_printk; \
+ __label__ out; \
static struct _ddebug descriptor \
__used \
__attribute__((section("__verbose"), aligned(8))) = \
- { KBUILD_MODNAME, __func__, __FILE__, fmt, DEBUG_HASH, \
- DEBUG_HASH2, __LINE__, _DPRINTK_FLAGS_DEFAULT }; \
- if (__dynamic_dbg_enabled(descriptor)) \
- dev_printk(KERN_DEBUG, dev, fmt, ##__VA_ARGS__); \
+ { KBUILD_MODNAME, __func__, __FILE__, fmt, __LINE__, \
+ _DPRINTK_FLAGS_DEFAULT }; \
+ JUMP_LABEL(&descriptor.enabled, do_printk); \
+ goto out; \
+do_printk: \
+ dev_printk(KERN_DEBUG, dev, fmt, ##__VA_ARGS__); \
+out: ; \
} while (0)
#else
#define _LINUX_EDAC_H_
#include <asm/atomic.h>
+#include <linux/sysdev.h>
#define EDAC_OPSTATE_INVAL -1
#define EDAC_OPSTATE_POLL 0
extern int edac_op_state;
extern int edac_err_assert;
extern atomic_t edac_handlers;
+extern struct sysdev_class edac_class;
extern int edac_handler_set(void);
extern void edac_atomic_assert_error(void);
+extern struct sysdev_class *edac_get_sysfs_class(void);
+extern void edac_put_sysfs_class(void);
static inline void opstate_init(void)
{
struct elevator_type *elevator_type;
struct mutex sysfs_lock;
struct hlist_head *hash;
+ unsigned int registered:1;
};
/*
extern int elevator_init(struct request_queue *, char *);
extern void elevator_exit(struct elevator_queue *);
+extern int elevator_change(struct request_queue *, const char *);
extern int elv_rq_merge_ok(struct request *, struct bio *);
/*
struct fdtable {
unsigned int max_fds;
- struct file ** fd; /* current fd array */
+ struct file __rcu **fd; /* current fd array */
fd_set *close_on_exec;
fd_set *open_fds;
struct rcu_head rcu;
* read mostly part
*/
atomic_t count;
- struct fdtable *fdt;
+ struct fdtable __rcu *fdt;
struct fdtable fdtab;
/*
* written part on a separate cache line in SMP
int next_fd;
struct embedded_fd_set close_on_exec_init;
struct embedded_fd_set open_fds_init;
- struct file * fd_array[NR_OPEN_DEFAULT];
+ struct file __rcu * fd_array[NR_OPEN_DEFAULT];
};
#define rcu_dereference_check_fdtable(files, fdtfd) \
#include <linux/fcntl.h>
+/* temporary stubs for BKL removal */
+#define lock_flocks() lock_kernel()
+#define unlock_flocks() unlock_kernel()
+
extern void send_sigio(struct fown_struct *fown, int fd, int band);
#ifdef CONFIG_FILE_LOCKING
* Saved mount options for lazy filesystems using
* generic_show_options()
*/
- char *s_options;
+ char __rcu *s_options;
};
extern struct timespec current_fs_time(struct super_block *sb);
unsigned int flags;
#ifdef CONFIG_PERF_EVENTS
- int perf_refcount;
- struct hlist_head *perf_events;
+ int perf_refcount;
+ struct hlist_head __percpu *perf_events;
#endif
};
extern int perf_trace_init(struct perf_event *event);
extern void perf_trace_destroy(struct perf_event *event);
-extern int perf_trace_enable(struct perf_event *event);
-extern void perf_trace_disable(struct perf_event *event);
+extern int perf_trace_add(struct perf_event *event, int flags);
+extern void perf_trace_del(struct perf_event *event, int flags);
extern int ftrace_profile_set_filter(struct perf_event *event, int event_id,
char *filter_str);
extern void ftrace_profile_free_filter(struct perf_event *event);
struct disk_part_tbl {
struct rcu_head rcu_head;
int len;
- struct hd_struct *last_lookup;
- struct hd_struct *part[];
+ struct hd_struct __rcu *last_lookup;
+ struct hd_struct __rcu *part[];
};
struct gendisk {
* non-critical accesses use RCU. Always access through
* helpers.
*/
- struct disk_part_tbl *part_tbl;
+ struct disk_part_tbl __rcu *part_tbl;
struct hd_struct part0;
const struct block_device_operations *fops;
#include <linux/errno.h>
struct device;
+struct gpio_chip;
/*
* Some platforms don't support the GPIO programming interface.
#define HARDIRQ_OFFSET (1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET (1UL << NMI_SHIFT)
+#define SOFTIRQ_DISABLE_OFFSET (2 * SOFTIRQ_OFFSET)
+
#ifndef PREEMPT_ACTIVE
#define PREEMPT_ACTIVE_BITS 1
#define PREEMPT_ACTIVE_SHIFT (NMI_SHIFT + NMI_BITS)
/*
* Are we doing bottom half or hardware interrupt processing?
* Are we in a softirq context? Interrupt context?
+ * in_softirq - Are we currently processing softirq or have bh disabled?
+ * in_serving_softirq - Are we currently processing softirq?
*/
#define in_irq() (hardirq_count())
#define in_softirq() (softirq_count())
#define in_interrupt() (irq_count())
+#define in_serving_softirq() (softirq_count() & SOFTIRQ_OFFSET)
/*
* Are we in NMI context?
struct task_struct;
-#ifndef CONFIG_VIRT_CPU_ACCOUNTING
+#if !defined(CONFIG_VIRT_CPU_ACCOUNTING) && !defined(CONFIG_IRQ_TIME_ACCOUNTING)
static inline void account_system_vtime(struct task_struct *tsk)
{
}
+#else
+extern void account_system_vtime(struct task_struct *tsk);
#endif
#if defined(CONFIG_NO_HZ)
-#if defined(CONFIG_TINY_RCU)
+#if defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
extern void rcu_enter_nohz(void);
extern void rcu_exit_nohz(void);
* IRQ lines will appear. Similarly to gpio_base, the expander
* will create a block of irqs beginning at this number.
* This value is ignored if irq_summary is < 0.
+ * @reset_during_probe: If set to true, the driver will trigger a full
+ * reset of the chip at the beginning of the probe
+ * in order to place it in a known state.
*/
struct sx150x_platform_data {
unsigned gpio_base;
u16 io_polarity;
int irq_summary;
unsigned irq_base;
+ bool reset_during_probe;
};
#endif /* __LINUX_I2C_SX150X_H */
struct idr_layer {
unsigned long bitmap; /* A zero bit means "space here" */
- struct idr_layer *ary[1<<IDR_BITS];
+ struct idr_layer __rcu *ary[1<<IDR_BITS];
int count; /* When zero, we can release it */
int layer; /* distance from leaf */
struct rcu_head rcu_head;
};
struct idr {
- struct idr_layer *top;
+ struct idr_layer __rcu *top;
struct idr_layer *id_free;
int layers; /* only valid without concurrent changes */
int id_free_cnt;
# define CAP_INIT_BSET CAP_FULL_SET
#ifdef CONFIG_TREE_PREEMPT_RCU
+#define INIT_TASK_RCU_TREE_PREEMPT() \
+ .rcu_blocked_node = NULL,
+#else
+#define INIT_TASK_RCU_TREE_PREEMPT(tsk)
+#endif
+#ifdef CONFIG_PREEMPT_RCU
#define INIT_TASK_RCU_PREEMPT(tsk) \
.rcu_read_lock_nesting = 0, \
.rcu_read_unlock_special = 0, \
- .rcu_blocked_node = NULL, \
- .rcu_node_entry = LIST_HEAD_INIT(tsk.rcu_node_entry),
+ .rcu_node_entry = LIST_HEAD_INIT(tsk.rcu_node_entry), \
+ INIT_TASK_RCU_TREE_PREEMPT()
#else
#define INIT_TASK_RCU_PREEMPT(tsk)
#endif
.children = LIST_HEAD_INIT(tsk.children), \
.sibling = LIST_HEAD_INIT(tsk.sibling), \
.group_leader = &tsk, \
- .real_cred = &init_cred, \
- .cred = &init_cred, \
+ RCU_INIT_POINTER(.real_cred, &init_cred), \
+ RCU_INIT_POINTER(.cred, &init_cred), \
.cred_guard_mutex = \
__MUTEX_INITIALIZER(tsk.cred_guard_mutex), \
.comm = "swapper", \
int (*flush)(struct input_dev *dev, struct file *file);
int (*event)(struct input_dev *dev, unsigned int type, unsigned int code, int value);
- struct input_handle *grab;
+ struct input_handle __rcu *grab;
spinlock_t event_lock;
struct mutex mutex;
#include <asm/atomic.h>
#include <asm/ptrace.h>
#include <asm/system.h>
+#include <trace/events/irq.h>
/*
* These correspond to the IORESOURCE_IRQ_* defines in
asmlinkage void __do_softirq(void);
extern void open_softirq(int nr, void (*action)(struct softirq_action *));
extern void softirq_init(void);
-#define __raise_softirq_irqoff(nr) do { or_softirq_pending(1UL << (nr)); } while (0)
+static inline void __raise_softirq_irqoff(unsigned int nr)
+{
+ trace_softirq_raise((struct softirq_action *)(unsigned long)nr, NULL);
+ or_softirq_pending(1UL << nr);
+}
+
extern void raise_softirq_irqoff(unsigned int nr);
extern void raise_softirq(unsigned int nr);
extern void wakeup_softirqd(void);
}
/* Atomic map/unmap */
-static inline void *
+static inline void __iomem *
io_mapping_map_atomic_wc(struct io_mapping *mapping,
unsigned long offset,
int slot)
}
static inline void
-io_mapping_unmap_atomic(void *vaddr, int slot)
+io_mapping_unmap_atomic(void __iomem *vaddr, int slot)
{
iounmap_atomic(vaddr, slot);
}
-static inline void *
+static inline void __iomem *
io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
{
resource_size_t phys_addr;
}
static inline void
-io_mapping_unmap(void *vaddr)
+io_mapping_unmap(void __iomem *vaddr)
{
iounmap(vaddr);
}
static inline struct io_mapping *
io_mapping_create_wc(resource_size_t base, unsigned long size)
{
- return (struct io_mapping *) ioremap_wc(base, size);
+ return (struct io_mapping __force *) ioremap_wc(base, size);
}
static inline void
io_mapping_free(struct io_mapping *mapping)
{
- iounmap(mapping);
+ iounmap((void __force __iomem *) mapping);
}
/* Atomic map/unmap */
-static inline void *
+static inline void __iomem *
io_mapping_map_atomic_wc(struct io_mapping *mapping,
unsigned long offset,
int slot)
{
- return ((char *) mapping) + offset;
+ return ((char __force __iomem *) mapping) + offset;
}
static inline void
-io_mapping_unmap_atomic(void *vaddr, int slot)
+io_mapping_unmap_atomic(void __iomem *vaddr, int slot)
{
}
/* Non-atomic map/unmap */
-static inline void *
+static inline void __iomem *
io_mapping_map_wc(struct io_mapping *mapping, unsigned long offset)
{
- return ((char *) mapping) + offset;
+ return ((char __force __iomem *) mapping) + offset;
}
static inline void
-io_mapping_unmap(void *vaddr)
+io_mapping_unmap(void __iomem *vaddr)
{
}
struct radix_tree_root radix_root;
struct hlist_head cic_list;
- void *ioc_data;
+ void __rcu *ioc_data;
};
static inline struct io_context *ioc_task_link(struct io_context *ioc)
--- /dev/null
+#ifndef _LINUX_IRQ_WORK_H
+#define _LINUX_IRQ_WORK_H
+
+struct irq_work {
+ struct irq_work *next;
+ void (*func)(struct irq_work *);
+};
+
+static inline
+void init_irq_work(struct irq_work *entry, void (*func)(struct irq_work *))
+{
+ entry->next = NULL;
+ entry->func = func;
+}
+
+bool irq_work_queue(struct irq_work *entry);
+void irq_work_run(void);
+void irq_work_sync(struct irq_work *entry);
+
+#endif /* _LINUX_IRQ_WORK_H */
--- /dev/null
+#ifndef _LINUX_JUMP_LABEL_H
+#define _LINUX_JUMP_LABEL_H
+
+#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_HAVE_ARCH_JUMP_LABEL)
+# include <asm/jump_label.h>
+# define HAVE_JUMP_LABEL
+#endif
+
+enum jump_label_type {
+ JUMP_LABEL_ENABLE,
+ JUMP_LABEL_DISABLE
+};
+
+struct module;
+
+#ifdef HAVE_JUMP_LABEL
+
+extern struct jump_entry __start___jump_table[];
+extern struct jump_entry __stop___jump_table[];
+
+extern void arch_jump_label_transform(struct jump_entry *entry,
+ enum jump_label_type type);
+extern void arch_jump_label_text_poke_early(jump_label_t addr);
+extern void jump_label_update(unsigned long key, enum jump_label_type type);
+extern void jump_label_apply_nops(struct module *mod);
+extern int jump_label_text_reserved(void *start, void *end);
+
+#define jump_label_enable(key) \
+ jump_label_update((unsigned long)key, JUMP_LABEL_ENABLE);
+
+#define jump_label_disable(key) \
+ jump_label_update((unsigned long)key, JUMP_LABEL_DISABLE);
+
+#else
+
+#define JUMP_LABEL(key, label) \
+do { \
+ if (unlikely(*key)) \
+ goto label; \
+} while (0)
+
+#define jump_label_enable(cond_var) \
+do { \
+ *(cond_var) = 1; \
+} while (0)
+
+#define jump_label_disable(cond_var) \
+do { \
+ *(cond_var) = 0; \
+} while (0)
+
+static inline int jump_label_apply_nops(struct module *mod)
+{
+ return 0;
+}
+
+static inline int jump_label_text_reserved(void *start, void *end)
+{
+ return 0;
+}
+
+#endif
+
+#define COND_STMT(key, stmt) \
+do { \
+ __label__ jl_enabled; \
+ JUMP_LABEL(key, jl_enabled); \
+ if (0) { \
+jl_enabled: \
+ stmt; \
+ } \
+} while (0)
+
+#endif
--- /dev/null
+#ifndef _LINUX_JUMP_LABEL_REF_H
+#define _LINUX_JUMP_LABEL_REF_H
+
+#include <linux/jump_label.h>
+#include <asm/atomic.h>
+
+#ifdef HAVE_JUMP_LABEL
+
+static inline void jump_label_inc(atomic_t *key)
+{
+ if (atomic_add_return(1, key) == 1)
+ jump_label_enable(key);
+}
+
+static inline void jump_label_dec(atomic_t *key)
+{
+ if (atomic_dec_and_test(key))
+ jump_label_disable(key);
+}
+
+#else /* !HAVE_JUMP_LABEL */
+
+static inline void jump_label_inc(atomic_t *key)
+{
+ atomic_inc(key);
+}
+
+static inline void jump_label_dec(atomic_t *key)
+{
+ atomic_dec(key);
+}
+
+#undef JUMP_LABEL
+#define JUMP_LABEL(key, label) \
+do { \
+ if (unlikely(__builtin_choose_expr( \
+ __builtin_types_compatible_p(typeof(key), atomic_t *), \
+ atomic_read((atomic_t *)(key)), *(key)))) \
+ goto label; \
+} while (0)
+
+#endif /* HAVE_JUMP_LABEL */
+
+#endif /* _LINUX_JUMP_LABEL_REF_H */
#define FIELD_SIZEOF(t, f) (sizeof(((t*)0)->f))
#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d))
-#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
+#define roundup(x, y) ( \
+{ \
+ typeof(y) __y = y; \
+ (((x) + (__y - 1)) / __y) * __y; \
+} \
+)
+#define rounddown(x, y) ( \
+{ \
+ typeof(x) __x = (x); \
+ __x - (__x % (y)); \
+} \
+)
#define DIV_ROUND_CLOSEST(x, divisor)( \
{ \
typeof(divisor) __divisor = divisor; \
*/
union {
unsigned long value;
+ void __rcu *rcudata;
void *data;
- struct keyring_list *subscriptions;
+ struct keyring_list __rcu *subscriptions;
} payload;
};
*/
#define kfifo_reset(fifo) \
(void)({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
__tmp->kfifo.in = __tmp->kfifo.out = 0; \
})
*/
#define kfifo_reset_out(fifo) \
(void)({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
__tmp->kfifo.out = __tmp->kfifo.in; \
})
*/
#define kfifo_len(fifo) \
({ \
- typeof(fifo + 1) __tmpl = (fifo); \
+ typeof((fifo) + 1) __tmpl = (fifo); \
__tmpl->kfifo.in - __tmpl->kfifo.out; \
})
*/
#define kfifo_is_empty(fifo) \
({ \
- typeof(fifo + 1) __tmpq = (fifo); \
+ typeof((fifo) + 1) __tmpq = (fifo); \
__tmpq->kfifo.in == __tmpq->kfifo.out; \
})
*/
#define kfifo_is_full(fifo) \
({ \
- typeof(fifo + 1) __tmpq = (fifo); \
+ typeof((fifo) + 1) __tmpq = (fifo); \
kfifo_len(__tmpq) > __tmpq->kfifo.mask; \
})
#define kfifo_avail(fifo) \
__kfifo_must_check_helper( \
({ \
- typeof(fifo + 1) __tmpq = (fifo); \
+ typeof((fifo) + 1) __tmpq = (fifo); \
const size_t __recsize = sizeof(*__tmpq->rectype); \
unsigned int __avail = kfifo_size(__tmpq) - kfifo_len(__tmpq); \
(__recsize) ? ((__avail <= __recsize) ? 0 : \
*/
#define kfifo_skip(fifo) \
(void)({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
if (__recsize) \
#define kfifo_peek_len(fifo) \
__kfifo_must_check_helper( \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
(!__recsize) ? kfifo_len(__tmp) * sizeof(*__tmp->type) : \
#define kfifo_alloc(fifo, size, gfp_mask) \
__kfifo_must_check_helper( \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
__is_kfifo_ptr(__tmp) ? \
__kfifo_alloc(__kfifo, size, sizeof(*__tmp->type), gfp_mask) : \
*/
#define kfifo_free(fifo) \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
if (__is_kfifo_ptr(__tmp)) \
__kfifo_free(__kfifo); \
*/
#define kfifo_init(fifo, buffer, size) \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
__is_kfifo_ptr(__tmp) ? \
__kfifo_init(__kfifo, buffer, size, sizeof(*__tmp->type)) : \
*/
#define kfifo_put(fifo, val) \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
- typeof(val + 1) __val = (val); \
+ typeof((fifo) + 1) __tmp = (fifo); \
+ typeof((val) + 1) __val = (val); \
unsigned int __ret; \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
#define kfifo_get(fifo, val) \
__kfifo_must_check_helper( \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
- typeof(val + 1) __val = (val); \
+ typeof((fifo) + 1) __tmp = (fifo); \
+ typeof((val) + 1) __val = (val); \
unsigned int __ret; \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
#define kfifo_peek(fifo, val) \
__kfifo_must_check_helper( \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
- typeof(val + 1) __val = (val); \
+ typeof((fifo) + 1) __tmp = (fifo); \
+ typeof((val) + 1) __val = (val); \
unsigned int __ret; \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
*/
#define kfifo_in(fifo, buf, n) \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
- typeof(buf + 1) __buf = (buf); \
+ typeof((fifo) + 1) __tmp = (fifo); \
+ typeof((buf) + 1) __buf = (buf); \
unsigned long __n = (n); \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
#define kfifo_out(fifo, buf, n) \
__kfifo_must_check_helper( \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
- typeof(buf + 1) __buf = (buf); \
+ typeof((fifo) + 1) __tmp = (fifo); \
+ typeof((buf) + 1) __buf = (buf); \
unsigned long __n = (n); \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
#define kfifo_from_user(fifo, from, len, copied) \
__kfifo_must_check_helper( \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
const void __user *__from = (from); \
unsigned int __len = (len); \
unsigned int *__copied = (copied); \
#define kfifo_to_user(fifo, to, len, copied) \
__kfifo_must_check_helper( \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
void __user *__to = (to); \
unsigned int __len = (len); \
unsigned int *__copied = (copied); \
*/
#define kfifo_dma_in_prepare(fifo, sgl, nents, len) \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
struct scatterlist *__sgl = (sgl); \
int __nents = (nents); \
unsigned int __len = (len); \
*/
#define kfifo_dma_in_finish(fifo, len) \
(void)({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
unsigned int __len = (len); \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
*/
#define kfifo_dma_out_prepare(fifo, sgl, nents, len) \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
struct scatterlist *__sgl = (sgl); \
int __nents = (nents); \
unsigned int __len = (len); \
*/
#define kfifo_dma_out_finish(fifo, len) \
(void)({ \
- typeof(fifo + 1) __tmp = (fifo); \
+ typeof((fifo) + 1) __tmp = (fifo); \
unsigned int __len = (len); \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
#define kfifo_out_peek(fifo, buf, n) \
__kfifo_must_check_helper( \
({ \
- typeof(fifo + 1) __tmp = (fifo); \
- typeof(buf + 1) __buf = (buf); \
+ typeof((fifo) + 1) __tmp = (fifo); \
+ typeof((buf) + 1) __buf = (buf); \
unsigned long __n = (n); \
const size_t __recsize = sizeof(*__tmp->rectype); \
struct __kfifo *__kfifo = &__tmp->kfifo; \
struct stable_node;
struct mem_cgroup;
+struct page *ksm_does_need_to_copy(struct page *page,
+ struct vm_area_struct *vma, unsigned long address);
+
#ifdef CONFIG_KSM
int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
unsigned long end, int advice, unsigned long *vm_flags);
* We'd like to make this conditional on vma->vm_flags & VM_MERGEABLE,
* but what if the vma was unmerged while the page was swapped out?
*/
-struct page *ksm_does_need_to_copy(struct page *page,
- struct vm_area_struct *vma, unsigned long address);
-static inline struct page *ksm_might_need_to_copy(struct page *page,
+static inline int ksm_might_need_to_copy(struct page *page,
struct vm_area_struct *vma, unsigned long address)
{
struct anon_vma *anon_vma = page_anon_vma(page);
- if (!anon_vma ||
- (anon_vma->root == vma->anon_vma->root &&
- page->index == linear_page_index(vma, address)))
- return page;
-
- return ksm_does_need_to_copy(page, vma, address);
+ return anon_vma &&
+ (anon_vma->root != vma->anon_vma->root ||
+ page->index != linear_page_index(vma, address));
}
int page_referenced_ksm(struct page *page,
return 0;
}
-static inline struct page *ksm_might_need_to_copy(struct page *page,
+static inline int ksm_might_need_to_copy(struct page *page,
struct vm_area_struct *vma, unsigned long address)
{
- return page;
+ return 0;
}
static inline int page_referenced_ksm(struct page *page,
struct mutex irq_lock;
#ifdef CONFIG_HAVE_KVM_IRQCHIP
- struct kvm_irq_routing_table *irq_routing;
+ struct kvm_irq_routing_table __rcu *irq_routing;
struct hlist_head mask_notifier_list;
struct hlist_head irq_ack_notifier_list;
#endif
int i; \
preempt_disable(); \
rwlock_acquire(&name##_lock_dep_map, 0, 0, _RET_IP_); \
- for_each_online_cpu(i) { \
+ for_each_possible_cpu(i) { \
arch_spinlock_t *lock; \
lock = &per_cpu(name##_lock, i); \
arch_spin_lock(lock); \
void name##_global_unlock(void) { \
int i; \
rwlock_release(&name##_lock_dep_map, 1, _RET_IP_); \
- for_each_online_cpu(i) { \
+ for_each_possible_cpu(i) { \
arch_spinlock_t *lock; \
lock = &per_cpu(name##_lock, i); \
arch_spin_unlock(lock); \
ATA_EHI_HOTPLUGGED = (1 << 0), /* could have been hotplugged */
ATA_EHI_NO_AUTOPSY = (1 << 2), /* no autopsy */
ATA_EHI_QUIET = (1 << 3), /* be quiet */
+ ATA_EHI_NO_RECOVERY = (1 << 4), /* no recovery */
ATA_EHI_DID_SOFTRESET = (1 << 16), /* already soft-reset this port */
ATA_EHI_DID_HARDRESET = (1 << 17), /* already soft-reset this port */
struct ata_ioports ioaddr; /* ATA cmd/ctl/dma register blocks */
u8 ctl; /* cache of ATA control register */
u8 last_ctl; /* Cache last written value */
+ struct ata_link* sff_pio_task_link; /* link currently used */
struct delayed_work sff_pio_task;
#ifdef CONFIG_ATA_BMDMA
struct ata_bmdma_prd *bmdma_prd; /* BMDMA SG list */
extern void ata_sff_irq_clear(struct ata_port *ap);
extern int ata_sff_hsm_move(struct ata_port *ap, struct ata_queued_cmd *qc,
u8 status, int in_wq);
-extern void ata_sff_queue_pio_task(struct ata_port *ap, unsigned long delay);
+extern void ata_sff_queue_pio_task(struct ata_link *link, unsigned long delay);
extern unsigned int ata_sff_qc_issue(struct ata_queued_cmd *qc);
extern bool ata_sff_qc_fill_rtf(struct ata_queued_cmd *qc);
extern unsigned int ata_sff_port_intr(struct ata_port *ap,
#define MAX_LOCKDEP_SUBCLASSES 8UL
/*
+ * NR_LOCKDEP_CACHING_CLASSES ... Number of classes
+ * cached in the instance of lockdep_map
+ *
+ * Currently main class (subclass == 0) and signle depth subclass
+ * are cached in lockdep_map. This optimization is mainly targeting
+ * on rq->lock. double_rq_lock() acquires this highly competitive with
+ * single depth.
+ */
+#define NR_LOCKDEP_CACHING_CLASSES 2
+
+/*
* Lock-classes are keyed via unique addresses, by embedding the
* lockclass-key into the kernel (or module) .data section. (For
* static locks we use the lock address itself as the key.)
*/
struct lockdep_map {
struct lock_class_key *key;
- struct lock_class *class_cache;
+ struct lock_class *class_cache[NR_LOCKDEP_CACHING_CLASSES];
const char *name;
#ifdef CONFIG_LOCK_STAT
int cpu;
int set_page_dirty_lock(struct page *page);
int clear_page_dirty_for_io(struct page *page);
+/* Is the vma a continuation of the stack vma above it? */
+static inline int vma_stack_continue(struct vm_area_struct *vma, unsigned long addr)
+{
+ return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN);
+}
+
extern unsigned long move_page_tables(struct vm_area_struct *vma,
unsigned long old_addr, struct vm_area_struct *new_vma,
unsigned long new_addr, unsigned long len);
* new_owner->mm == mm
* new_owner->alloc_lock is held
*/
- struct task_struct *owner;
+ struct task_struct __rcu *owner;
#endif
#ifdef CONFIG_PROC_FS
* [8:0] Byte/block count
*/
+#define R4_MEMORY_PRESENT (1 << 27)
+
/*
SDIO status in R5
Type
unsigned long watermark[NR_WMARK];
/*
+ * When free pages are below this point, additional steps are taken
+ * when reading the number of free pages to avoid per-cpu counter
+ * drift allowing watermarks to be breached
+ */
+ unsigned long percpu_drift_mark;
+
+ /*
* We don't know if the memory that we're going to allocate will be freeable
* or/and it will be released eventually, so to avoid totally wasting several
* GB of ram we must reserve some of the lower zone memory (otherwise we risk
return test_bit(ZONE_OOM_LOCKED, &zone->flags);
}
+#ifdef CONFIG_SMP
+unsigned long zone_nr_free_pages(struct zone *zone);
+#else
+#define zone_nr_free_pages(zone) zone_page_state(zone, NR_FREE_PAGES)
+#endif /* CONFIG_SMP */
+
/*
* The "priority" of VM scanning is how much of the queues we will scan in one
* go. A value of 12 for DEF_PRIORITY implies that we will scan 1/4096th of the
struct tracepoint *tracepoints;
unsigned int num_tracepoints;
#endif
-
+#ifdef HAVE_JUMP_LABEL
+ struct jump_entry *jump_entries;
+ unsigned int num_jump_entries;
+#endif
#ifdef CONFIG_TRACING
const char **trace_bprintk_fmt_start;
unsigned int num_trace_bprintk_fmt;
#ifdef CONFIG_GENERIC_BUG
-int module_bug_finalize(const Elf_Ehdr *, const Elf_Shdr *,
+void module_bug_finalize(const Elf_Ehdr *, const Elf_Shdr *,
struct module *);
void module_bug_cleanup(struct module *);
#else /* !CONFIG_GENERIC_BUG */
-static inline int module_bug_finalize(const Elf_Ehdr *hdr,
+static inline void module_bug_finalize(const Elf_Ehdr *hdr,
const Elf_Shdr *sechdrs,
struct module *mod)
{
- return 0;
}
static inline void module_bug_cleanup(struct module *mod) {}
#endif /* CONFIG_GENERIC_BUG */
# include <linux/mutex-debug.h>
#else
# define __DEBUG_MUTEX_INITIALIZER(lockname)
+/**
+ * mutex_init - initialize the mutex
+ * @mutex: the mutex to be initialized
+ *
+ * Initialize the mutex to unlocked state.
+ *
+ * It is not allowed to initialize an already locked mutex.
+ */
# define mutex_init(mutex) \
do { \
static struct lock_class_key __key; \
CTA_TUPLE_MASTER,
CTA_NAT_SEQ_ADJ_ORIG,
CTA_NAT_SEQ_ADJ_REPLY,
- CTA_SECMARK,
+ CTA_SECMARK, /* obsolete */
CTA_ZONE,
+ CTA_SECCTX,
__CTA_MAX
};
#define CTA_MAX (__CTA_MAX - 1)
};
#define CTA_HELP_MAX (__CTA_HELP_MAX - 1)
+enum ctattr_secctx {
+ CTA_SECCTX_UNSPEC,
+ CTA_SECCTX_NAME,
+ __CTA_SECCTX_MAX
+};
+#define CTA_SECCTX_MAX (__CTA_SECCTX_MAX - 1)
+
#endif /* _IPCONNTRACK_NETLINK_H */
* packets are being marked for.
*/
#define SECMARK_MODE_SEL 0x01 /* SELinux */
-#define SECMARK_SELCTX_MAX 256
-
-struct xt_secmark_target_selinux_info {
- __u32 selsid;
- char selctx[SECMARK_SELCTX_MAX];
-};
+#define SECMARK_SECCTX_MAX 256
struct xt_secmark_target_info {
__u8 mode;
- union {
- struct xt_secmark_target_selinux_info sel;
- } u;
+ __u32 secid;
+ char secctx[SECMARK_SECCTX_MAX];
};
#endif /*_XT_SECMARK_H_target */
#define MAX_LINKS 32
-struct net;
-
struct sockaddr_nl {
sa_family_t nl_family; /* AF_NETLINK */
unsigned short nl_pad; /* zero */
#include <linux/capability.h>
#include <linux/skbuff.h>
+struct net;
+
static inline struct nlmsghdr *nlmsg_hdr(const struct sk_buff *skb)
{
return (struct nlmsghdr *)skb->data;
unsigned long flags;
bool ret = false;
- rcu_read_lock_bh();
+ local_irq_save(flags);
npinfo = rcu_dereference_bh(skb->dev->npinfo);
if (!npinfo || (list_empty(&npinfo->rx_np) && !npinfo->rx_flags))
goto out;
- spin_lock_irqsave(&npinfo->rx_lock, flags);
+ spin_lock(&npinfo->rx_lock);
/* check rx_flags again with the lock held */
if (npinfo->rx_flags && __netpoll_rx(skb))
ret = true;
- spin_unlock_irqrestore(&npinfo->rx_lock, flags);
+ spin_unlock(&npinfo->rx_lock);
out:
- rcu_read_unlock_bh();
+ local_irq_restore(flags);
return ret;
}
struct nfs4_cached_acl *nfs4_acl;
/* NFSv4 state */
struct list_head open_states;
- struct nfs_delegation *delegation;
+ struct nfs_delegation __rcu *delegation;
fmode_t delegation_state;
struct rw_semaphore rwsem;
#endif /* CONFIG_NFS_V4*/
struct notifier_block {
int (*notifier_call)(struct notifier_block *, unsigned long, void *);
- struct notifier_block *next;
+ struct notifier_block __rcu *next;
int priority;
};
struct atomic_notifier_head {
spinlock_t lock;
- struct notifier_block *head;
+ struct notifier_block __rcu *head;
};
struct blocking_notifier_head {
struct rw_semaphore rwsem;
- struct notifier_block *head;
+ struct notifier_block __rcu *head;
};
struct raw_notifier_head {
- struct notifier_block *head;
+ struct notifier_block __rcu *head;
};
struct srcu_notifier_head {
struct mutex mutex;
struct srcu_struct srcu;
- struct notifier_block *head;
+ struct notifier_block __rcu *head;
};
#define ATOMIC_INIT_NOTIFIER_HEAD(name) do { \
#include <linux/types.h>
#include <linux/spinlock.h>
+#include <linux/init.h>
#include <asm/atomic.h>
/* Each escaped entry is prefixed by ESCAPE_CODE
int oprofile_add_data64(struct op_entry *entry, u64 val);
int oprofile_write_commit(struct op_entry *entry);
+#ifdef CONFIG_PERF_EVENTS
+int __init oprofile_perf_init(struct oprofile_operations *ops);
+void oprofile_perf_exit(void);
+char *op_name_from_perf_id(void);
+#endif /* CONFIG_PERF_EVENTS */
+
#endif /* OPROFILE_H */
#define PCI_DEVICE_ID_VLSI_82C147 0x0105
#define PCI_DEVICE_ID_VLSI_VAS96011 0x0702
+/* AMD RD890 Chipset */
+#define PCI_DEVICE_ID_RD890_IOMMU 0x5a23
+
#define PCI_VENDOR_ID_ADL 0x1005
#define PCI_DEVICE_ID_ADL_2301 0x2301
#define PCI_DEVICE_ID_AMD_11H_NB_DRAM 0x1302
#define PCI_DEVICE_ID_AMD_11H_NB_MISC 0x1303
#define PCI_DEVICE_ID_AMD_11H_NB_LINK 0x1304
+#define PCI_DEVICE_ID_AMD_15H_NB_MISC 0x1603
#define PCI_DEVICE_ID_AMD_LANCE 0x2000
#define PCI_DEVICE_ID_AMD_LANCE_HOME 0x2001
#define PCI_DEVICE_ID_AMD_SCSI 0x2020
__aligned(PAGE_SIZE)
/*
+ * Declaration/definition used for per-CPU variables that must be read mostly.
+ */
+#define DECLARE_PER_CPU_READ_MOSTLY(type, name) \
+ DECLARE_PER_CPU_SECTION(type, name, "..readmostly")
+
+#define DEFINE_PER_CPU_READ_MOSTLY(type, name) \
+ DEFINE_PER_CPU_SECTION(type, name, "..readmostly")
+
+/*
* Intermodule exports for per-CPU variables. sparse forgets about
* address space across EXPORT_SYMBOL(), change EXPORT_SYMBOL() to
* noop if __CHECKER__.
preempt_enable(); \
} while (0)
+#define get_cpu_ptr(var) ({ \
+ preempt_disable(); \
+ this_cpu_ptr(var); })
+
+#define put_cpu_ptr(var) do { \
+ (void)(var); \
+ preempt_enable(); \
+} while (0)
+
#ifdef CONFIG_SMP
/* minimum unit size, also is the maximum supported allocation size */
#include <linux/workqueue.h>
#include <linux/ftrace.h>
#include <linux/cpu.h>
+#include <linux/irq_work.h>
+#include <linux/jump_label_ref.h>
#include <asm/atomic.h>
#include <asm/local.h>
int last_cpu;
};
struct { /* software */
- s64 remaining;
struct hrtimer hrtimer;
};
#ifdef CONFIG_HAVE_HW_BREAKPOINT
struct { /* breakpoint */
struct arch_hw_breakpoint info;
struct list_head bp_list;
+ /*
+ * Crufty hack to avoid the chicken and egg
+ * problem hw_breakpoint has with context
+ * creation and event initalization.
+ */
+ struct task_struct *bp_target;
};
#endif
};
+ int state;
local64_t prev_count;
u64 sample_period;
u64 last_period;
#endif
};
+/*
+ * hw_perf_event::state flags
+ */
+#define PERF_HES_STOPPED 0x01 /* the counter is stopped */
+#define PERF_HES_UPTODATE 0x02 /* event->count up-to-date */
+#define PERF_HES_ARCH 0x04
+
struct perf_event;
/*
* struct pmu - generic performance monitoring unit
*/
struct pmu {
- int (*enable) (struct perf_event *event);
- void (*disable) (struct perf_event *event);
- int (*start) (struct perf_event *event);
- void (*stop) (struct perf_event *event);
- void (*read) (struct perf_event *event);
- void (*unthrottle) (struct perf_event *event);
+ struct list_head entry;
+
+ int * __percpu pmu_disable_count;
+ struct perf_cpu_context * __percpu pmu_cpu_context;
+ int task_ctx_nr;
+
+ /*
+ * Fully disable/enable this PMU, can be used to protect from the PMI
+ * as well as for lazy/batch writing of the MSRs.
+ */
+ void (*pmu_enable) (struct pmu *pmu); /* optional */
+ void (*pmu_disable) (struct pmu *pmu); /* optional */
/*
- * Group events scheduling is treated as a transaction, add group
- * events as a whole and perform one schedulability test. If the test
- * fails, roll back the whole group
+ * Try and initialize the event for this PMU.
+ * Should return -ENOENT when the @event doesn't match this PMU.
*/
+ int (*event_init) (struct perf_event *event);
+
+#define PERF_EF_START 0x01 /* start the counter when adding */
+#define PERF_EF_RELOAD 0x02 /* reload the counter when starting */
+#define PERF_EF_UPDATE 0x04 /* update the counter when stopping */
/*
- * Start the transaction, after this ->enable() doesn't need
- * to do schedulability tests.
+ * Adds/Removes a counter to/from the PMU, can be done inside
+ * a transaction, see the ->*_txn() methods.
*/
- void (*start_txn) (const struct pmu *pmu);
+ int (*add) (struct perf_event *event, int flags);
+ void (*del) (struct perf_event *event, int flags);
+
/*
- * If ->start_txn() disabled the ->enable() schedulability test
+ * Starts/Stops a counter present on the PMU. The PMI handler
+ * should stop the counter when perf_event_overflow() returns
+ * !0. ->start() will be used to continue.
+ */
+ void (*start) (struct perf_event *event, int flags);
+ void (*stop) (struct perf_event *event, int flags);
+
+ /*
+ * Updates the counter value of the event.
+ */
+ void (*read) (struct perf_event *event);
+
+ /*
+ * Group events scheduling is treated as a transaction, add
+ * group events as a whole and perform one schedulability test.
+ * If the test fails, roll back the whole group
+ *
+ * Start the transaction, after this ->add() doesn't need to
+ * do schedulability tests.
+ */
+ void (*start_txn) (struct pmu *pmu); /* optional */
+ /*
+ * If ->start_txn() disabled the ->add() schedulability test
* then ->commit_txn() is required to perform one. On success
* the transaction is closed. On error the transaction is kept
* open until ->cancel_txn() is called.
*/
- int (*commit_txn) (const struct pmu *pmu);
+ int (*commit_txn) (struct pmu *pmu); /* optional */
/*
- * Will cancel the transaction, assumes ->disable() is called for
- * each successfull ->enable() during the transaction.
+ * Will cancel the transaction, assumes ->del() is called
+ * for each successfull ->add() during the transaction.
*/
- void (*cancel_txn) (const struct pmu *pmu);
+ void (*cancel_txn) (struct pmu *pmu); /* optional */
};
/**
void *data_pages[0];
};
-struct perf_pending_entry {
- struct perf_pending_entry *next;
- void (*func)(struct perf_pending_entry *);
-};
-
struct perf_sample_data;
typedef void (*perf_overflow_handler_t)(struct perf_event *, int,
#define PERF_ATTACH_CONTEXT 0x01
#define PERF_ATTACH_GROUP 0x02
+#define PERF_ATTACH_TASK 0x04
/**
* struct perf_event - performance event kernel representation:
int nr_siblings;
int group_flags;
struct perf_event *group_leader;
- const struct pmu *pmu;
+ struct pmu *pmu;
enum perf_event_active_state state;
unsigned int attach_state;
int pending_wakeup;
int pending_kill;
int pending_disable;
- struct perf_pending_entry pending;
+ struct irq_work pending;
atomic_t event_limit;
#endif /* CONFIG_PERF_EVENTS */
};
+enum perf_event_context_type {
+ task_context,
+ cpu_context,
+};
+
/**
* struct perf_event_context - event context structure
*
* Used as a container for task events and CPU events as well:
*/
struct perf_event_context {
+ enum perf_event_context_type type;
+ struct pmu *pmu;
/*
* Protect the states of the events in the list,
* nr_active, and the list:
struct rcu_head rcu_head;
};
+/*
+ * Number of contexts where an event can trigger:
+ * task, softirq, hardirq, nmi.
+ */
+#define PERF_NR_CONTEXTS 4
+
/**
* struct perf_event_cpu_context - per cpu event context structure
*/
struct perf_event_context ctx;
struct perf_event_context *task_ctx;
int active_oncpu;
- int max_pertask;
int exclusive;
- struct swevent_hlist *swevent_hlist;
- struct mutex hlist_mutex;
- int hlist_refcount;
-
- /*
- * Recursion avoidance:
- *
- * task, softirq, irq, nmi context
- */
- int recursion[4];
+ struct list_head rotation_list;
+ int jiffies_interval;
};
struct perf_output_handle {
#ifdef CONFIG_PERF_EVENTS
-/*
- * Set by architecture code:
- */
-extern int perf_max_events;
+extern int perf_pmu_register(struct pmu *pmu);
+extern void perf_pmu_unregister(struct pmu *pmu);
+
+extern int perf_num_counters(void);
+extern const char *perf_pmu_name(void);
+extern void __perf_event_task_sched_in(struct task_struct *task);
+extern void __perf_event_task_sched_out(struct task_struct *task, struct task_struct *next);
-extern const struct pmu *hw_perf_event_init(struct perf_event *event);
+extern atomic_t perf_task_events;
+
+static inline void perf_event_task_sched_in(struct task_struct *task)
+{
+ COND_STMT(&perf_task_events, __perf_event_task_sched_in(task));
+}
+
+static inline
+void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next)
+{
+ COND_STMT(&perf_task_events, __perf_event_task_sched_out(task, next));
+}
-extern void perf_event_task_sched_in(struct task_struct *task);
-extern void perf_event_task_sched_out(struct task_struct *task, struct task_struct *next);
-extern void perf_event_task_tick(struct task_struct *task);
extern int perf_event_init_task(struct task_struct *child);
extern void perf_event_exit_task(struct task_struct *child);
extern void perf_event_free_task(struct task_struct *task);
-extern void set_perf_event_pending(void);
-extern void perf_event_do_pending(void);
+extern void perf_event_delayed_put(struct task_struct *task);
extern void perf_event_print_debug(void);
-extern void __perf_disable(void);
-extern bool __perf_enable(void);
-extern void perf_disable(void);
-extern void perf_enable(void);
+extern void perf_pmu_disable(struct pmu *pmu);
+extern void perf_pmu_enable(struct pmu *pmu);
extern int perf_event_task_disable(void);
extern int perf_event_task_enable(void);
extern void perf_event_update_userpage(struct perf_event *event);
extern struct perf_event *
perf_event_create_kernel_counter(struct perf_event_attr *attr,
int cpu,
- pid_t pid,
+ struct task_struct *task,
perf_overflow_handler_t callback);
extern u64 perf_event_read_value(struct perf_event *event,
u64 *enabled, u64 *running);
*/
static inline int is_software_event(struct perf_event *event)
{
- switch (event->attr.type) {
- case PERF_TYPE_SOFTWARE:
- case PERF_TYPE_TRACEPOINT:
- /* for now the breakpoint stuff also works as software event */
- case PERF_TYPE_BREAKPOINT:
- return 1;
- }
- return 0;
+ return event->pmu->task_ctx_nr == perf_sw_context;
}
extern atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
perf_arch_fetch_caller_regs(regs, CALLER_ADDR0);
}
-static inline void
+static __always_inline void
perf_sw_event(u32 event_id, u64 nr, int nmi, struct pt_regs *regs, u64 addr)
{
- if (atomic_read(&perf_swevent_enabled[event_id])) {
- struct pt_regs hot_regs;
-
- if (!regs) {
- perf_fetch_caller_regs(&hot_regs);
- regs = &hot_regs;
- }
- __perf_sw_event(event_id, nr, nmi, regs, addr);
+ struct pt_regs hot_regs;
+
+ JUMP_LABEL(&perf_swevent_enabled[event_id], have_event);
+ return;
+
+have_event:
+ if (!regs) {
+ perf_fetch_caller_regs(&hot_regs);
+ regs = &hot_regs;
}
+ __perf_sw_event(event_id, nr, nmi, regs, addr);
}
extern void perf_event_mmap(struct vm_area_struct *vma);
extern void perf_event_comm(struct task_struct *tsk);
extern void perf_event_fork(struct task_struct *tsk);
-extern struct perf_callchain_entry *perf_callchain(struct pt_regs *regs);
+/* Callchains */
+DECLARE_PER_CPU(struct perf_callchain_entry, perf_callchain_entry);
+
+extern void perf_callchain_user(struct perf_callchain_entry *entry,
+ struct pt_regs *regs);
+extern void perf_callchain_kernel(struct perf_callchain_entry *entry,
+ struct pt_regs *regs);
+
+
+static inline void
+perf_callchain_store(struct perf_callchain_entry *entry, u64 ip)
+{
+ if (entry->nr < PERF_MAX_STACK_DEPTH)
+ entry->ip[entry->nr++] = ip;
+}
extern int sysctl_perf_event_paranoid;
extern int sysctl_perf_event_mlock;
extern void perf_swevent_put_recursion_context(int rctx);
extern void perf_event_enable(struct perf_event *event);
extern void perf_event_disable(struct perf_event *event);
+extern void perf_event_task_tick(void);
#else
static inline void
perf_event_task_sched_in(struct task_struct *task) { }
static inline void
perf_event_task_sched_out(struct task_struct *task,
struct task_struct *next) { }
-static inline void
-perf_event_task_tick(struct task_struct *task) { }
static inline int perf_event_init_task(struct task_struct *child) { return 0; }
static inline void perf_event_exit_task(struct task_struct *child) { }
static inline void perf_event_free_task(struct task_struct *task) { }
-static inline void perf_event_do_pending(void) { }
+static inline void perf_event_delayed_put(struct task_struct *task) { }
static inline void perf_event_print_debug(void) { }
-static inline void perf_disable(void) { }
-static inline void perf_enable(void) { }
static inline int perf_event_task_disable(void) { return -EINVAL; }
static inline int perf_event_task_enable(void) { return -EINVAL; }
static inline void perf_swevent_put_recursion_context(int rctx) { }
static inline void perf_event_enable(struct perf_event *event) { }
static inline void perf_event_disable(struct perf_event *event) { }
+static inline void perf_event_task_tick(void) { }
#endif
#define perf_output_put(handle, x) \
int ret;
ret = dquot_alloc_space_nodirty(inode, nr);
- if (!ret)
- mark_inode_dirty_sync(inode);
+ if (!ret) {
+ /*
+ * Mark inode fully dirty. Since we are allocating blocks, inode
+ * would become fully dirty soon anyway and it reportedly
+ * reduces inode_lock contention.
+ */
+ mark_inode_dirty(inode);
+ }
return ret;
}
{
return (void *)((unsigned long)ptr & ~RADIX_TREE_INDIRECT_PTR);
}
+#define radix_tree_indirect_to_ptr(ptr) \
+ radix_tree_indirect_to_ptr((void __force *)(ptr))
static inline int radix_tree_is_indirect_ptr(void *ptr)
{
struct radix_tree_root {
unsigned int height;
gfp_t gfp_mask;
- struct radix_tree_node *rnode;
+ struct radix_tree_node __rcu *rnode;
};
#define RADIX_TREE_INIT(mask) { \
#include <linux/rcupdate.h>
/*
+ * Why is there no list_empty_rcu()? Because list_empty() serves this
+ * purpose. The list_empty() function fetches the RCU-protected pointer
+ * and compares it to the address of the list head, but neither dereferences
+ * this pointer itself nor provides this pointer to the caller. Therefore,
+ * it is not necessary to use rcu_dereference(), so that list_empty() can
+ * be used anywhere you would want to use a list_empty_rcu().
+ */
+
+/*
+ * return the ->next pointer of a list_head in an rcu safe
+ * way, we must not access it directly
+ */
+#define list_next_rcu(list) (*((struct list_head __rcu **)(&(list)->next)))
+
+/*
* Insert a new entry between two known consecutive entries.
*
* This is only for internal list manipulation where we know
{
new->next = next;
new->prev = prev;
- rcu_assign_pointer(prev->next, new);
+ rcu_assign_pointer(list_next_rcu(prev), new);
next->prev = new;
}
{
new->next = old->next;
new->prev = old->prev;
- rcu_assign_pointer(new->prev->next, new);
+ rcu_assign_pointer(list_next_rcu(new->prev), new);
new->next->prev = new;
old->prev = LIST_POISON2;
}
*/
last->next = at;
- rcu_assign_pointer(head->next, first);
+ rcu_assign_pointer(list_next_rcu(head), first);
first->prev = head;
at->prev = last;
}
* primitives such as list_add_rcu() as long as it's guarded by rcu_read_lock().
*/
#define list_entry_rcu(ptr, type, member) \
- container_of(rcu_dereference_raw(ptr), type, member)
+ ({typeof (*ptr) __rcu *__ptr = (typeof (*ptr) __rcu __force *)ptr; \
+ container_of((typeof(ptr))rcu_dereference_raw(__ptr), type, member); \
+ })
/**
* list_first_entry_rcu - get the first element from a list
list_entry_rcu((ptr)->next, type, member)
#define __list_for_each_rcu(pos, head) \
- for (pos = rcu_dereference_raw((head)->next); \
+ for (pos = rcu_dereference_raw(list_next_rcu(head)); \
pos != (head); \
- pos = rcu_dereference_raw(pos->next))
+ pos = rcu_dereference_raw(list_next_rcu((pos)))
/**
* list_for_each_entry_rcu - iterate over rcu list of given type
* as long as the traversal is guarded by rcu_read_lock().
*/
#define list_for_each_continue_rcu(pos, head) \
- for ((pos) = rcu_dereference_raw((pos)->next); \
+ for ((pos) = rcu_dereference_raw(list_next_rcu(pos)); \
prefetch((pos)->next), (pos) != (head); \
- (pos) = rcu_dereference_raw((pos)->next))
+ (pos) = rcu_dereference_raw(list_next_rcu(pos)))
/**
* list_for_each_entry_continue_rcu - continue iteration over list of given type
new->next = next;
new->pprev = old->pprev;
- rcu_assign_pointer(*new->pprev, new);
+ rcu_assign_pointer(*(struct hlist_node __rcu **)new->pprev, new);
if (next)
new->next->pprev = &new->next;
old->pprev = LIST_POISON2;
}
+/*
+ * return the first or the next element in an RCU protected hlist
+ */
+#define hlist_first_rcu(head) (*((struct hlist_node __rcu **)(&(head)->first)))
+#define hlist_next_rcu(node) (*((struct hlist_node __rcu **)(&(node)->next)))
+#define hlist_pprev_rcu(node) (*((struct hlist_node __rcu **)((node)->pprev)))
+
/**
* hlist_add_head_rcu
* @n: the element to add to the hash list.
n->next = first;
n->pprev = &h->first;
- rcu_assign_pointer(h->first, n);
+ rcu_assign_pointer(hlist_first_rcu(h), n);
if (first)
first->pprev = &n->next;
}
{
n->pprev = next->pprev;
n->next = next;
- rcu_assign_pointer(*(n->pprev), n);
+ rcu_assign_pointer(hlist_pprev_rcu(n), n);
next->pprev = &n->next;
}
{
n->next = prev->next;
n->pprev = &prev->next;
- rcu_assign_pointer(prev->next, n);
+ rcu_assign_pointer(hlist_next_rcu(prev), n);
if (n->next)
n->next->pprev = &n->next;
}
-#define __hlist_for_each_rcu(pos, head) \
- for (pos = rcu_dereference((head)->first); \
- pos && ({ prefetch(pos->next); 1; }); \
- pos = rcu_dereference(pos->next))
+#define __hlist_for_each_rcu(pos, head) \
+ for (pos = rcu_dereference(hlist_first_rcu(head)); \
+ pos && ({ prefetch(pos->next); 1; }); \
+ pos = rcu_dereference(hlist_next_rcu(pos)))
/**
* hlist_for_each_entry_rcu - iterate over rcu list of given type
* the _rcu list-mutation primitives such as hlist_add_head_rcu()
* as long as the traversal is guarded by rcu_read_lock().
*/
-#define hlist_for_each_entry_rcu(tpos, pos, head, member) \
- for (pos = rcu_dereference_raw((head)->first); \
+#define hlist_for_each_entry_rcu(tpos, pos, head, member) \
+ for (pos = rcu_dereference_raw(hlist_first_rcu(head)); \
pos && ({ prefetch(pos->next); 1; }) && \
({ tpos = hlist_entry(pos, typeof(*tpos), member); 1; }); \
- pos = rcu_dereference_raw(pos->next))
+ pos = rcu_dereference_raw(hlist_next_rcu(pos)))
/**
* hlist_for_each_entry_rcu_bh - iterate over rcu list of given type
}
}
+#define hlist_nulls_first_rcu(head) \
+ (*((struct hlist_nulls_node __rcu __force **)&(head)->first))
+
+#define hlist_nulls_next_rcu(node) \
+ (*((struct hlist_nulls_node __rcu __force **)&(node)->next))
+
/**
* hlist_nulls_del_rcu - deletes entry from hash list without re-initialization
* @n: the element to delete from the hash list.
n->next = first;
n->pprev = &h->first;
- rcu_assign_pointer(h->first, n);
+ rcu_assign_pointer(hlist_nulls_first_rcu(h), n);
if (!is_a_nulls(first))
first->pprev = &n->next;
}
* @member: the name of the hlist_nulls_node within the struct.
*
*/
-#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
- for (pos = rcu_dereference_raw((head)->first); \
- (!is_a_nulls(pos)) && \
+#define hlist_nulls_for_each_entry_rcu(tpos, pos, head, member) \
+ for (pos = rcu_dereference_raw(hlist_nulls_first_rcu(head)); \
+ (!is_a_nulls(pos)) && \
({ tpos = hlist_nulls_entry(pos, typeof(*tpos), member); 1; }); \
- pos = rcu_dereference_raw(pos->next))
+ pos = rcu_dereference_raw(hlist_nulls_next_rcu(pos)))
#endif
#endif
#include <linux/lockdep.h>
#include <linux/completion.h>
#include <linux/debugobjects.h>
+#include <linux/compiler.h>
#ifdef CONFIG_RCU_TORTURE_TEST
extern int rcutorture_runnable; /* for sysctl */
#endif /* #ifdef CONFIG_RCU_TORTURE_TEST */
+#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
+#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))
+
/**
* struct rcu_head - callback structure for use with RCU
* @next: next update requests in a list
};
/* Exported common interfaces */
-extern void rcu_barrier(void);
+extern void call_rcu_sched(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu));
+extern void synchronize_sched(void);
extern void rcu_barrier_bh(void);
extern void rcu_barrier_sched(void);
extern void synchronize_sched_expedited(void);
extern int sched_expedited_torture_stats(char *page);
+static inline void __rcu_read_lock_bh(void)
+{
+ local_bh_disable();
+}
+
+static inline void __rcu_read_unlock_bh(void)
+{
+ local_bh_enable();
+}
+
+#ifdef CONFIG_PREEMPT_RCU
+
+extern void __rcu_read_lock(void);
+extern void __rcu_read_unlock(void);
+void synchronize_rcu(void);
+
+/*
+ * Defined as a macro as it is a very low level header included from
+ * areas that don't even know about current. This gives the rcu_read_lock()
+ * nesting depth, but makes sense only if CONFIG_PREEMPT_RCU -- in other
+ * types of kernel builds, the rcu_read_lock() nesting depth is unknowable.
+ */
+#define rcu_preempt_depth() (current->rcu_read_lock_nesting)
+
+#else /* #ifdef CONFIG_PREEMPT_RCU */
+
+static inline void __rcu_read_lock(void)
+{
+ preempt_disable();
+}
+
+static inline void __rcu_read_unlock(void)
+{
+ preempt_enable();
+}
+
+static inline void synchronize_rcu(void)
+{
+ synchronize_sched();
+}
+
+static inline int rcu_preempt_depth(void)
+{
+ return 0;
+}
+
+#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
+
/* Internal to kernel */
extern void rcu_init(void);
+extern void rcu_sched_qs(int cpu);
+extern void rcu_bh_qs(int cpu);
+extern void rcu_check_callbacks(int cpu, int user);
+struct notifier_block;
+
+#ifdef CONFIG_NO_HZ
+
+extern void rcu_enter_nohz(void);
+extern void rcu_exit_nohz(void);
+
+#else /* #ifdef CONFIG_NO_HZ */
+
+static inline void rcu_enter_nohz(void)
+{
+}
+
+static inline void rcu_exit_nohz(void)
+{
+}
+
+#endif /* #else #ifdef CONFIG_NO_HZ */
#if defined(CONFIG_TREE_RCU) || defined(CONFIG_TREE_PREEMPT_RCU)
#include <linux/rcutree.h>
-#elif defined(CONFIG_TINY_RCU)
+#elif defined(CONFIG_TINY_RCU) || defined(CONFIG_TINY_PREEMPT_RCU)
#include <linux/rcutiny.h>
#else
#error "Unknown RCU implementation specified to kernel configuration"
#endif
-#define RCU_HEAD_INIT { .next = NULL, .func = NULL }
-#define RCU_HEAD(head) struct rcu_head head = RCU_HEAD_INIT
-#define INIT_RCU_HEAD(ptr) do { \
- (ptr)->next = NULL; (ptr)->func = NULL; \
-} while (0)
-
/*
* init_rcu_head_on_stack()/destroy_rcu_head_on_stack() are needed for dynamic
* initialization and destruction of rcu_head on the stack. rcu_head structures
extern int debug_lockdep_rcu_enabled(void);
/**
- * rcu_read_lock_held - might we be in RCU read-side critical section?
+ * rcu_read_lock_held() - might we be in RCU read-side critical section?
*
* If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an RCU
* read-side critical section. In absence of CONFIG_DEBUG_LOCK_ALLOC,
* this assumes we are in an RCU read-side critical section unless it can
- * prove otherwise.
+ * prove otherwise. This is useful for debug checks in functions that
+ * require that they be called within an RCU read-side critical section.
*
- * Check debug_lockdep_rcu_enabled() to prevent false positives during boot
+ * Checks debug_lockdep_rcu_enabled() to prevent false positives during boot
* and while lockdep is disabled.
*/
static inline int rcu_read_lock_held(void)
extern int rcu_read_lock_bh_held(void);
/**
- * rcu_read_lock_sched_held - might we be in RCU-sched read-side critical section?
+ * rcu_read_lock_sched_held() - might we be in RCU-sched read-side critical section?
*
* If CONFIG_DEBUG_LOCK_ALLOC is selected, returns nonzero iff in an
* RCU-sched read-side critical section. In absence of
* CONFIG_DEBUG_LOCK_ALLOC, this assumes we are in an RCU-sched read-side
* critical section unless it can prove otherwise. Note that disabling
* of preemption (including disabling irqs) counts as an RCU-sched
- * read-side critical section.
+ * read-side critical section. This is useful for debug checks in functions
+ * that required that they be called within an RCU-sched read-side
+ * critical section.
*
* Check debug_lockdep_rcu_enabled() to prevent false positives during boot
* and while lockdep is disabled.
extern int rcu_my_thread_group_empty(void);
-#define __do_rcu_dereference_check(c) \
+/**
+ * rcu_lockdep_assert - emit lockdep splat if specified condition not met
+ * @c: condition to check
+ */
+#define rcu_lockdep_assert(c) \
do { \
static bool __warned; \
if (debug_lockdep_rcu_enabled() && !__warned && !(c)) { \
} \
} while (0)
+#else /* #ifdef CONFIG_PROVE_RCU */
+
+#define rcu_lockdep_assert(c) do { } while (0)
+
+#endif /* #else #ifdef CONFIG_PROVE_RCU */
+
+/*
+ * Helper functions for rcu_dereference_check(), rcu_dereference_protected()
+ * and rcu_assign_pointer(). Some of these could be folded into their
+ * callers, but they are left separate in order to ease introduction of
+ * multiple flavors of pointers to match the multiple flavors of RCU
+ * (e.g., __rcu_bh, * __rcu_sched, and __srcu), should this make sense in
+ * the future.
+ */
+
+#ifdef __CHECKER__
+#define rcu_dereference_sparse(p, space) \
+ ((void)(((typeof(*p) space *)p) == p))
+#else /* #ifdef __CHECKER__ */
+#define rcu_dereference_sparse(p, space)
+#endif /* #else #ifdef __CHECKER__ */
+
+#define __rcu_access_pointer(p, space) \
+ ({ \
+ typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
+ rcu_dereference_sparse(p, space); \
+ ((typeof(*p) __force __kernel *)(_________p1)); \
+ })
+#define __rcu_dereference_check(p, c, space) \
+ ({ \
+ typeof(*p) *_________p1 = (typeof(*p)*__force )ACCESS_ONCE(p); \
+ rcu_lockdep_assert(c); \
+ rcu_dereference_sparse(p, space); \
+ smp_read_barrier_depends(); \
+ ((typeof(*p) __force __kernel *)(_________p1)); \
+ })
+#define __rcu_dereference_protected(p, c, space) \
+ ({ \
+ rcu_lockdep_assert(c); \
+ rcu_dereference_sparse(p, space); \
+ ((typeof(*p) __force __kernel *)(p)); \
+ })
+
+#define __rcu_dereference_index_check(p, c) \
+ ({ \
+ typeof(p) _________p1 = ACCESS_ONCE(p); \
+ rcu_lockdep_assert(c); \
+ smp_read_barrier_depends(); \
+ (_________p1); \
+ })
+#define __rcu_assign_pointer(p, v, space) \
+ ({ \
+ if (!__builtin_constant_p(v) || \
+ ((v) != NULL)) \
+ smp_wmb(); \
+ (p) = (typeof(*v) __force space *)(v); \
+ })
+
+
+/**
+ * rcu_access_pointer() - fetch RCU pointer with no dereferencing
+ * @p: The pointer to read
+ *
+ * Return the value of the specified RCU-protected pointer, but omit the
+ * smp_read_barrier_depends() and keep the ACCESS_ONCE(). This is useful
+ * when the value of this pointer is accessed, but the pointer is not
+ * dereferenced, for example, when testing an RCU-protected pointer against
+ * NULL. Although rcu_access_pointer() may also be used in cases where
+ * update-side locks prevent the value of the pointer from changing, you
+ * should instead use rcu_dereference_protected() for this use case.
+ */
+#define rcu_access_pointer(p) __rcu_access_pointer((p), __rcu)
+
/**
- * rcu_dereference_check - rcu_dereference with debug checking
+ * rcu_dereference_check() - rcu_dereference with debug checking
* @p: The pointer to read, prior to dereferencing
* @c: The conditions under which the dereference will take place
*
* Do an rcu_dereference(), but check that the conditions under which the
- * dereference will take place are correct. Typically the conditions indicate
- * the various locking conditions that should be held at that point. The check
- * should return true if the conditions are satisfied.
+ * dereference will take place are correct. Typically the conditions
+ * indicate the various locking conditions that should be held at that
+ * point. The check should return true if the conditions are satisfied.
+ * An implicit check for being in an RCU read-side critical section
+ * (rcu_read_lock()) is included.
*
* For example:
*
- * bar = rcu_dereference_check(foo->bar, rcu_read_lock_held() ||
- * lockdep_is_held(&foo->lock));
+ * bar = rcu_dereference_check(foo->bar, lockdep_is_held(&foo->lock));
*
* could be used to indicate to lockdep that foo->bar may only be dereferenced
- * if either the RCU read lock is held, or that the lock required to replace
+ * if either rcu_read_lock() is held, or that the lock required to replace
* the bar struct at foo->bar is held.
*
* Note that the list of conditions may also include indications of when a lock
* need not be held, for example during initialisation or destruction of the
* target struct:
*
- * bar = rcu_dereference_check(foo->bar, rcu_read_lock_held() ||
- * lockdep_is_held(&foo->lock) ||
+ * bar = rcu_dereference_check(foo->bar, lockdep_is_held(&foo->lock) ||
* atomic_read(&foo->usage) == 0);
+ *
+ * Inserts memory barriers on architectures that require them
+ * (currently only the Alpha), prevents the compiler from refetching
+ * (and from merging fetches), and, more importantly, documents exactly
+ * which pointers are protected by RCU and checks that the pointer is
+ * annotated as __rcu.
*/
#define rcu_dereference_check(p, c) \
- ({ \
- __do_rcu_dereference_check(c); \
- rcu_dereference_raw(p); \
- })
+ __rcu_dereference_check((p), rcu_read_lock_held() || (c), __rcu)
+
+/**
+ * rcu_dereference_bh_check() - rcu_dereference_bh with debug checking
+ * @p: The pointer to read, prior to dereferencing
+ * @c: The conditions under which the dereference will take place
+ *
+ * This is the RCU-bh counterpart to rcu_dereference_check().
+ */
+#define rcu_dereference_bh_check(p, c) \
+ __rcu_dereference_check((p), rcu_read_lock_bh_held() || (c), __rcu)
/**
- * rcu_dereference_protected - fetch RCU pointer when updates prevented
+ * rcu_dereference_sched_check() - rcu_dereference_sched with debug checking
+ * @p: The pointer to read, prior to dereferencing
+ * @c: The conditions under which the dereference will take place
+ *
+ * This is the RCU-sched counterpart to rcu_dereference_check().
+ */
+#define rcu_dereference_sched_check(p, c) \
+ __rcu_dereference_check((p), rcu_read_lock_sched_held() || (c), \
+ __rcu)
+
+#define rcu_dereference_raw(p) rcu_dereference_check(p, 1) /*@@@ needed? @@@*/
+
+/**
+ * rcu_dereference_index_check() - rcu_dereference for indices with debug checking
+ * @p: The pointer to read, prior to dereferencing
+ * @c: The conditions under which the dereference will take place
+ *
+ * Similar to rcu_dereference_check(), but omits the sparse checking.
+ * This allows rcu_dereference_index_check() to be used on integers,
+ * which can then be used as array indices. Attempting to use
+ * rcu_dereference_check() on an integer will give compiler warnings
+ * because the sparse address-space mechanism relies on dereferencing
+ * the RCU-protected pointer. Dereferencing integers is not something
+ * that even gcc will put up with.
+ *
+ * Note that this function does not implicitly check for RCU read-side
+ * critical sections. If this function gains lots of uses, it might
+ * make sense to provide versions for each flavor of RCU, but it does
+ * not make sense as of early 2010.
+ */
+#define rcu_dereference_index_check(p, c) \
+ __rcu_dereference_index_check((p), (c))
+
+/**
+ * rcu_dereference_protected() - fetch RCU pointer when updates prevented
+ * @p: The pointer to read, prior to dereferencing
+ * @c: The conditions under which the dereference will take place
*
* Return the value of the specified RCU-protected pointer, but omit
* both the smp_read_barrier_depends() and the ACCESS_ONCE(). This
* prevent the compiler from repeating this reference or combining it
* with other references, so it should not be used without protection
* of appropriate locks.
+ *
+ * This function is only for update-side use. Using this function
+ * when protected only by rcu_read_lock() will result in infrequent
+ * but very ugly failures.
*/
#define rcu_dereference_protected(p, c) \
- ({ \
- __do_rcu_dereference_check(c); \
- (p); \
- })
+ __rcu_dereference_protected((p), (c), __rcu)
-#else /* #ifdef CONFIG_PROVE_RCU */
+/**
+ * rcu_dereference_bh_protected() - fetch RCU-bh pointer when updates prevented
+ * @p: The pointer to read, prior to dereferencing
+ * @c: The conditions under which the dereference will take place
+ *
+ * This is the RCU-bh counterpart to rcu_dereference_protected().
+ */
+#define rcu_dereference_bh_protected(p, c) \
+ __rcu_dereference_protected((p), (c), __rcu)
-#define rcu_dereference_check(p, c) rcu_dereference_raw(p)
-#define rcu_dereference_protected(p, c) (p)
+/**
+ * rcu_dereference_sched_protected() - fetch RCU-sched pointer when updates prevented
+ * @p: The pointer to read, prior to dereferencing
+ * @c: The conditions under which the dereference will take place
+ *
+ * This is the RCU-sched counterpart to rcu_dereference_protected().
+ */
+#define rcu_dereference_sched_protected(p, c) \
+ __rcu_dereference_protected((p), (c), __rcu)
-#endif /* #else #ifdef CONFIG_PROVE_RCU */
/**
- * rcu_access_pointer - fetch RCU pointer with no dereferencing
+ * rcu_dereference() - fetch RCU-protected pointer for dereferencing
+ * @p: The pointer to read, prior to dereferencing
*
- * Return the value of the specified RCU-protected pointer, but omit the
- * smp_read_barrier_depends() and keep the ACCESS_ONCE(). This is useful
- * when the value of this pointer is accessed, but the pointer is not
- * dereferenced, for example, when testing an RCU-protected pointer against
- * NULL. This may also be used in cases where update-side locks prevent
- * the value of the pointer from changing, but rcu_dereference_protected()
- * is a lighter-weight primitive for this use case.
+ * This is a simple wrapper around rcu_dereference_check().
+ */
+#define rcu_dereference(p) rcu_dereference_check(p, 0)
+
+/**
+ * rcu_dereference_bh() - fetch an RCU-bh-protected pointer for dereferencing
+ * @p: The pointer to read, prior to dereferencing
+ *
+ * Makes rcu_dereference_check() do the dirty work.
+ */
+#define rcu_dereference_bh(p) rcu_dereference_bh_check(p, 0)
+
+/**
+ * rcu_dereference_sched() - fetch RCU-sched-protected pointer for dereferencing
+ * @p: The pointer to read, prior to dereferencing
+ *
+ * Makes rcu_dereference_check() do the dirty work.
*/
-#define rcu_access_pointer(p) ACCESS_ONCE(p)
+#define rcu_dereference_sched(p) rcu_dereference_sched_check(p, 0)
/**
- * rcu_read_lock - mark the beginning of an RCU read-side critical section.
+ * rcu_read_lock() - mark the beginning of an RCU read-side critical section
*
* When synchronize_rcu() is invoked on one CPU while other CPUs
* are within RCU read-side critical sections, then the
* until after the all the other CPUs exit their critical sections.
*
* Note, however, that RCU callbacks are permitted to run concurrently
- * with RCU read-side critical sections. One way that this can happen
+ * with new RCU read-side critical sections. One way that this can happen
* is via the following sequence of events: (1) CPU 0 enters an RCU
* read-side critical section, (2) CPU 1 invokes call_rcu() to register
* an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
* will be deferred until the outermost RCU read-side critical section
* completes.
*
- * It is illegal to block while in an RCU read-side critical section.
+ * You can avoid reading and understanding the next paragraph by
+ * following this rule: don't put anything in an rcu_read_lock() RCU
+ * read-side critical section that would block in a !PREEMPT kernel.
+ * But if you want the full story, read on!
+ *
+ * In non-preemptible RCU implementations (TREE_RCU and TINY_RCU), it
+ * is illegal to block while in an RCU read-side critical section. In
+ * preemptible RCU implementations (TREE_PREEMPT_RCU and TINY_PREEMPT_RCU)
+ * in CONFIG_PREEMPT kernel builds, RCU read-side critical sections may
+ * be preempted, but explicit blocking is illegal. Finally, in preemptible
+ * RCU implementations in real-time (CONFIG_PREEMPT_RT) kernel builds,
+ * RCU read-side critical sections may be preempted and they may also
+ * block, but only when acquiring spinlocks that are subject to priority
+ * inheritance.
*/
static inline void rcu_read_lock(void)
{
*/
/**
- * rcu_read_unlock - marks the end of an RCU read-side critical section.
+ * rcu_read_unlock() - marks the end of an RCU read-side critical section.
*
* See rcu_read_lock() for more information.
*/
}
/**
- * rcu_read_lock_bh - mark the beginning of a softirq-only RCU critical section
+ * rcu_read_lock_bh() - mark the beginning of an RCU-bh critical section
*
* This is equivalent of rcu_read_lock(), but to be used when updates
- * are being done using call_rcu_bh(). Since call_rcu_bh() callbacks
- * consider completion of a softirq handler to be a quiescent state,
- * a process in RCU read-side critical section must be protected by
- * disabling softirqs. Read-side critical sections in interrupt context
- * can use just rcu_read_lock().
- *
+ * are being done using call_rcu_bh() or synchronize_rcu_bh(). Since
+ * both call_rcu_bh() and synchronize_rcu_bh() consider completion of a
+ * softirq handler to be a quiescent state, a process in RCU read-side
+ * critical section must be protected by disabling softirqs. Read-side
+ * critical sections in interrupt context can use just rcu_read_lock(),
+ * though this should at least be commented to avoid confusing people
+ * reading the code.
*/
static inline void rcu_read_lock_bh(void)
{
}
/**
- * rcu_read_lock_sched - mark the beginning of a RCU-classic critical section
+ * rcu_read_lock_sched() - mark the beginning of a RCU-sched critical section
*
- * Should be used with either
- * - synchronize_sched()
- * or
- * - call_rcu_sched() and rcu_barrier_sched()
- * on the write-side to insure proper synchronization.
+ * This is equivalent of rcu_read_lock(), but to be used when updates
+ * are being done using call_rcu_sched() or synchronize_rcu_sched().
+ * Read-side critical sections can also be introduced by anything that
+ * disables preemption, including local_irq_disable() and friends.
*/
static inline void rcu_read_lock_sched(void)
{
preempt_enable_notrace();
}
-
/**
- * rcu_dereference_raw - fetch an RCU-protected pointer
+ * rcu_assign_pointer() - assign to RCU-protected pointer
+ * @p: pointer to assign to
+ * @v: value to assign (publish)
*
- * The caller must be within some flavor of RCU read-side critical
- * section, or must be otherwise preventing the pointer from changing,
- * for example, by holding an appropriate lock. This pointer may later
- * be safely dereferenced. It is the caller's responsibility to have
- * done the right thing, as this primitive does no checking of any kind.
- *
- * Inserts memory barriers on architectures that require them
- * (currently only the Alpha), and, more importantly, documents
- * exactly which pointers are protected by RCU.
- */
-#define rcu_dereference_raw(p) ({ \
- typeof(p) _________p1 = ACCESS_ONCE(p); \
- smp_read_barrier_depends(); \
- (_________p1); \
- })
-
-/**
- * rcu_dereference - fetch an RCU-protected pointer, checking for RCU
- *
- * Makes rcu_dereference_check() do the dirty work.
- */
-#define rcu_dereference(p) \
- rcu_dereference_check(p, rcu_read_lock_held())
-
-/**
- * rcu_dereference_bh - fetch an RCU-protected pointer, checking for RCU-bh
- *
- * Makes rcu_dereference_check() do the dirty work.
- */
-#define rcu_dereference_bh(p) \
- rcu_dereference_check(p, rcu_read_lock_bh_held())
-
-/**
- * rcu_dereference_sched - fetch RCU-protected pointer, checking for RCU-sched
- *
- * Makes rcu_dereference_check() do the dirty work.
- */
-#define rcu_dereference_sched(p) \
- rcu_dereference_check(p, rcu_read_lock_sched_held())
-
-/**
- * rcu_assign_pointer - assign (publicize) a pointer to a newly
- * initialized structure that will be dereferenced by RCU read-side
- * critical sections. Returns the value assigned.
+ * Assigns the specified value to the specified RCU-protected
+ * pointer, ensuring that any concurrent RCU readers will see
+ * any prior initialization. Returns the value assigned.
*
* Inserts memory barriers on architectures that require them
* (pretty much all of them other than x86), and also prevents
* call documents which pointers will be dereferenced by RCU read-side
* code.
*/
-
#define rcu_assign_pointer(p, v) \
- ({ \
- if (!__builtin_constant_p(v) || \
- ((v) != NULL)) \
- smp_wmb(); \
- (p) = (v); \
- })
+ __rcu_assign_pointer((p), (v), __rcu)
+
+/**
+ * RCU_INIT_POINTER() - initialize an RCU protected pointer
+ *
+ * Initialize an RCU-protected pointer in such a way to avoid RCU-lockdep
+ * splats.
+ */
+#define RCU_INIT_POINTER(p, v) \
+ p = (typeof(*v) __force __rcu *)(v)
/* Infrastructure to implement the synchronize_() primitives. */
extern void wakeme_after_rcu(struct rcu_head *head);
+#ifdef CONFIG_PREEMPT_RCU
+
/**
- * call_rcu - Queue an RCU callback for invocation after a grace period.
+ * call_rcu() - Queue an RCU callback for invocation after a grace period.
* @head: structure to be used for queueing the RCU updates.
- * @func: actual update function to be invoked after the grace period
+ * @func: actual callback function to be invoked after the grace period
*
- * The update function will be invoked some time after a full grace
- * period elapses, in other words after all currently executing RCU
- * read-side critical sections have completed. RCU read-side critical
+ * The callback function will be invoked some time after a full grace
+ * period elapses, in other words after all pre-existing RCU read-side
+ * critical sections have completed. However, the callback function
+ * might well execute concurrently with RCU read-side critical sections
+ * that started after call_rcu() was invoked. RCU read-side critical
* sections are delimited by rcu_read_lock() and rcu_read_unlock(),
* and may be nested.
*/
extern void call_rcu(struct rcu_head *head,
void (*func)(struct rcu_head *head));
+#else /* #ifdef CONFIG_PREEMPT_RCU */
+
+/* In classic RCU, call_rcu() is just call_rcu_sched(). */
+#define call_rcu call_rcu_sched
+
+#endif /* #else #ifdef CONFIG_PREEMPT_RCU */
+
/**
- * call_rcu_bh - Queue an RCU for invocation after a quicker grace period.
+ * call_rcu_bh() - Queue an RCU for invocation after a quicker grace period.
* @head: structure to be used for queueing the RCU updates.
- * @func: actual update function to be invoked after the grace period
+ * @func: actual callback function to be invoked after the grace period
*
- * The update function will be invoked some time after a full grace
+ * The callback function will be invoked some time after a full grace
* period elapses, in other words after all currently executing RCU
* read-side critical sections have completed. call_rcu_bh() assumes
* that the read-side critical sections end on completion of a softirq
}
#endif /* #else !CONFIG_DEBUG_OBJECTS_RCU_HEAD */
-#ifndef CONFIG_PROVE_RCU
-#define __do_rcu_dereference_check(c) do { } while (0)
-#endif /* #ifdef CONFIG_PROVE_RCU */
-
-#define __rcu_dereference_index_check(p, c) \
- ({ \
- typeof(p) _________p1 = ACCESS_ONCE(p); \
- __do_rcu_dereference_check(c); \
- smp_read_barrier_depends(); \
- (_________p1); \
- })
-
-/**
- * rcu_dereference_index_check() - rcu_dereference for indices with debug checking
- * @p: The pointer to read, prior to dereferencing
- * @c: The conditions under which the dereference will take place
- *
- * Similar to rcu_dereference_check(), but omits the sparse checking.
- * This allows rcu_dereference_index_check() to be used on integers,
- * which can then be used as array indices. Attempting to use
- * rcu_dereference_check() on an integer will give compiler warnings
- * because the sparse address-space mechanism relies on dereferencing
- * the RCU-protected pointer. Dereferencing integers is not something
- * that even gcc will put up with.
- *
- * Note that this function does not implicitly check for RCU read-side
- * critical sections. If this function gains lots of uses, it might
- * make sense to provide versions for each flavor of RCU, but it does
- * not make sense as of early 2010.
- */
-#define rcu_dereference_index_check(p, c) \
- __rcu_dereference_index_check((p), (c))
-
#endif /* __LINUX_RCUPDATE_H */
#include <linux/cache.h>
-void rcu_sched_qs(int cpu);
-void rcu_bh_qs(int cpu);
-static inline void rcu_note_context_switch(int cpu)
-{
- rcu_sched_qs(cpu);
-}
+#define rcu_init_sched() do { } while (0)
-#define __rcu_read_lock() preempt_disable()
-#define __rcu_read_unlock() preempt_enable()
-#define __rcu_read_lock_bh() local_bh_disable()
-#define __rcu_read_unlock_bh() local_bh_enable()
-#define call_rcu_sched call_rcu
+#ifdef CONFIG_TINY_RCU
-#define rcu_init_sched() do { } while (0)
-extern void rcu_check_callbacks(int cpu, int user);
+static inline void synchronize_rcu_expedited(void)
+{
+ synchronize_sched(); /* Only one CPU, so pretty fast anyway!!! */
+}
-static inline int rcu_needs_cpu(int cpu)
+static inline void rcu_barrier(void)
{
- return 0;
+ rcu_barrier_sched(); /* Only one CPU, so only one list of callbacks! */
}
-/*
- * Return the number of grace periods.
- */
-static inline long rcu_batches_completed(void)
+#else /* #ifdef CONFIG_TINY_RCU */
+
+void rcu_barrier(void);
+void synchronize_rcu_expedited(void);
+
+#endif /* #else #ifdef CONFIG_TINY_RCU */
+
+static inline void synchronize_rcu_bh(void)
{
- return 0;
+ synchronize_sched();
}
-/*
- * Return the number of bottom-half grace periods.
- */
-static inline long rcu_batches_completed_bh(void)
+static inline void synchronize_rcu_bh_expedited(void)
{
- return 0;
+ synchronize_sched();
}
-static inline void rcu_force_quiescent_state(void)
+#ifdef CONFIG_TINY_RCU
+
+static inline void rcu_preempt_note_context_switch(void)
{
}
-static inline void rcu_bh_force_quiescent_state(void)
+static inline void exit_rcu(void)
{
}
-static inline void rcu_sched_force_quiescent_state(void)
+static inline int rcu_needs_cpu(int cpu)
{
+ return 0;
}
-extern void synchronize_sched(void);
+#else /* #ifdef CONFIG_TINY_RCU */
+
+void rcu_preempt_note_context_switch(void);
+extern void exit_rcu(void);
+int rcu_preempt_needs_cpu(void);
-static inline void synchronize_rcu(void)
+static inline int rcu_needs_cpu(int cpu)
{
- synchronize_sched();
+ return rcu_preempt_needs_cpu();
}
-static inline void synchronize_rcu_bh(void)
+#endif /* #else #ifdef CONFIG_TINY_RCU */
+
+static inline void rcu_note_context_switch(int cpu)
{
- synchronize_sched();
+ rcu_sched_qs(cpu);
+ rcu_preempt_note_context_switch();
}
-static inline void synchronize_rcu_expedited(void)
+/*
+ * Return the number of grace periods.
+ */
+static inline long rcu_batches_completed(void)
{
- synchronize_sched();
+ return 0;
}
-static inline void synchronize_rcu_bh_expedited(void)
+/*
+ * Return the number of bottom-half grace periods.
+ */
+static inline long rcu_batches_completed_bh(void)
{
- synchronize_sched();
+ return 0;
}
-struct notifier_block;
-
-#ifdef CONFIG_NO_HZ
-
-extern void rcu_enter_nohz(void);
-extern void rcu_exit_nohz(void);
-
-#else /* #ifdef CONFIG_NO_HZ */
-
-static inline void rcu_enter_nohz(void)
+static inline void rcu_force_quiescent_state(void)
{
}
-static inline void rcu_exit_nohz(void)
+static inline void rcu_bh_force_quiescent_state(void)
{
}
-#endif /* #else #ifdef CONFIG_NO_HZ */
-
-static inline void exit_rcu(void)
+static inline void rcu_sched_force_quiescent_state(void)
{
}
-static inline int rcu_preempt_depth(void)
+static inline void rcu_cpu_stall_reset(void)
{
- return 0;
}
#ifdef CONFIG_DEBUG_LOCK_ALLOC
#ifndef __LINUX_RCUTREE_H
#define __LINUX_RCUTREE_H
-struct notifier_block;
-
-extern void rcu_sched_qs(int cpu);
-extern void rcu_bh_qs(int cpu);
extern void rcu_note_context_switch(int cpu);
extern int rcu_needs_cpu(int cpu);
+extern void rcu_cpu_stall_reset(void);
#ifdef CONFIG_TREE_PREEMPT_RCU
-extern void __rcu_read_lock(void);
-extern void __rcu_read_unlock(void);
-extern void synchronize_rcu(void);
extern void exit_rcu(void);
-/*
- * Defined as macro as it is a very low level header
- * included from areas that don't even know about current
- */
-#define rcu_preempt_depth() (current->rcu_read_lock_nesting)
-
#else /* #ifdef CONFIG_TREE_PREEMPT_RCU */
-static inline void __rcu_read_lock(void)
-{
- preempt_disable();
-}
-
-static inline void __rcu_read_unlock(void)
-{
- preempt_enable();
-}
-
-#define synchronize_rcu synchronize_sched
-
static inline void exit_rcu(void)
{
}
-static inline int rcu_preempt_depth(void)
-{
- return 0;
-}
-
#endif /* #else #ifdef CONFIG_TREE_PREEMPT_RCU */
-static inline void __rcu_read_lock_bh(void)
-{
- local_bh_disable();
-}
-static inline void __rcu_read_unlock_bh(void)
-{
- local_bh_enable();
-}
-
-extern void call_rcu_sched(struct rcu_head *head,
- void (*func)(struct rcu_head *rcu));
extern void synchronize_rcu_bh(void);
-extern void synchronize_sched(void);
extern void synchronize_rcu_expedited(void);
static inline void synchronize_rcu_bh_expedited(void)
synchronize_sched_expedited();
}
-extern void rcu_check_callbacks(int cpu, int user);
+extern void rcu_barrier(void);
extern long rcu_batches_completed(void);
extern long rcu_batches_completed_bh(void);
extern void rcu_bh_force_quiescent_state(void);
extern void rcu_sched_force_quiescent_state(void);
-#ifdef CONFIG_NO_HZ
-void rcu_enter_nohz(void);
-void rcu_exit_nohz(void);
-#else /* CONFIG_NO_HZ */
-static inline void rcu_enter_nohz(void)
-{
-}
-static inline void rcu_exit_nohz(void)
-{
-}
-#endif /* CONFIG_NO_HZ */
-
/* A context switch is a grace period for RCU-sched and RCU-bh. */
static inline int rcu_blocking_is_gp(void)
{
SD_LV_NONE = 0,
SD_LV_SIBLING,
SD_LV_MC,
+ SD_LV_BOOK,
SD_LV_CPU,
SD_LV_NODE,
SD_LV_ALLNODES,
struct rcu_node;
+enum perf_event_task_context {
+ perf_invalid_context = -1,
+ perf_hw_context = 0,
+ perf_sw_context,
+ perf_nr_task_contexts,
+};
+
struct task_struct {
volatile long state; /* -1 unrunnable, 0 runnable, >0 stopped */
void *stack;
unsigned int policy;
cpumask_t cpus_allowed;
-#ifdef CONFIG_TREE_PREEMPT_RCU
+#ifdef CONFIG_PREEMPT_RCU
int rcu_read_lock_nesting;
char rcu_read_unlock_special;
- struct rcu_node *rcu_blocked_node;
struct list_head rcu_node_entry;
+#endif /* #ifdef CONFIG_PREEMPT_RCU */
+#ifdef CONFIG_TREE_PREEMPT_RCU
+ struct rcu_node *rcu_blocked_node;
#endif /* #ifdef CONFIG_TREE_PREEMPT_RCU */
#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
struct list_head cpu_timers[3];
/* process credentials */
- const struct cred *real_cred; /* objective and real subjective task
+ const struct cred __rcu *real_cred; /* objective and real subjective task
* credentials (COW) */
- const struct cred *cred; /* effective (overridable) subjective task
+ const struct cred __rcu *cred; /* effective (overridable) subjective task
* credentials (COW) */
struct mutex cred_guard_mutex; /* guard against foreign influences on
* credential calculations
#endif
#ifdef CONFIG_CGROUPS
/* Control Group info protected by css_set_lock */
- struct css_set *cgroups;
+ struct css_set __rcu *cgroups;
/* cg_list protected by css_set_lock and tsk->alloc_lock */
struct list_head cg_list;
#endif
struct futex_pi_state *pi_state_cache;
#endif
#ifdef CONFIG_PERF_EVENTS
- struct perf_event_context *perf_event_ctxp;
+ struct perf_event_context *perf_event_ctxp[perf_nr_task_contexts];
struct mutex perf_event_mutex;
struct list_head perf_event_list;
#endif
/*
* Per process flags
*/
-#define PF_ALIGNWARN 0x00000001 /* Print alignment warning msgs */
- /* Not implemented yet, only for 486*/
+#define PF_KSOFTIRQD 0x00000001 /* I am ksoftirqd */
#define PF_STARTING 0x00000002 /* being created */
#define PF_EXITING 0x00000004 /* getting shut down */
#define PF_EXITPIDONE 0x00000008 /* pi exit done on shut down */
#define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
#define used_math() tsk_used_math(current)
-#ifdef CONFIG_TREE_PREEMPT_RCU
+#ifdef CONFIG_PREEMPT_RCU
#define RCU_READ_UNLOCK_BLOCKED (1 << 0) /* blocked while in RCU read-side. */
#define RCU_READ_UNLOCK_NEED_QS (1 << 1) /* RCU core needs CPU response. */
{
p->rcu_read_lock_nesting = 0;
p->rcu_read_unlock_special = 0;
+#ifdef CONFIG_TREE_PREEMPT_RCU
p->rcu_blocked_node = NULL;
+#endif
INIT_LIST_HEAD(&p->rcu_node_entry);
}
extern void sched_clock_idle_wakeup_event(u64 delta_ns);
#endif
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+/*
+ * An i/f to runtime opt-in for irq time accounting based off of sched_clock.
+ * The reason for this explicit opt-in is not to have perf penalty with
+ * slow sched_clocks.
+ */
+extern void enable_sched_clock_irqtime(void);
+extern void disable_sched_clock_irqtime(void);
+#else
+static inline void enable_sched_clock_irqtime(void) {}
+static inline void disable_sched_clock_irqtime(void) {}
+#endif
+
extern unsigned long long
task_sched_runtime(struct task_struct *task);
extern unsigned long long thread_group_sched_runtime(struct task_struct *task);
extern int __cond_resched_softirq(void);
-#define cond_resched_softirq() ({ \
- __might_sleep(__FILE__, __LINE__, SOFTIRQ_OFFSET); \
- __cond_resched_softirq(); \
+#define cond_resched_softirq() ({ \
+ __might_sleep(__FILE__, __LINE__, SOFTIRQ_DISABLE_OFFSET); \
+ __cond_resched_softirq(); \
})
/*
extern int cap_task_fix_setuid(struct cred *new, const struct cred *old, int flags);
extern int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3,
unsigned long arg4, unsigned long arg5);
-extern int cap_task_setscheduler(struct task_struct *p, int policy, struct sched_param *lp);
+extern int cap_task_setscheduler(struct task_struct *p);
extern int cap_task_setioprio(struct task_struct *p, int ioprio);
extern int cap_task_setnice(struct task_struct *p, int nice);
extern int cap_syslog(int type, bool from_file);
* Sets the new child socket's sid to the openreq sid.
* @inet_conn_established:
* Sets the connection's peersid to the secmark on skb.
+ * @secmark_relabel_packet:
+ * check if the process should be allowed to relabel packets to the given secid
+ * @security_secmark_refcount_inc
+ * tells the LSM to increment the number of secmark labeling rules loaded
+ * @security_secmark_refcount_dec
+ * tells the LSM to decrement the number of secmark labeling rules loaded
* @req_classify_flow:
* Sets the flow's sid to the openreq sid.
* @tun_dev_create:
* Return 0 if permission is granted.
*
* @secid_to_secctx:
- * Convert secid to security context.
+ * Convert secid to security context. If secdata is NULL the length of
+ * the result will be returned in seclen, but no secdata will be returned.
+ * This does mean that the length could change between calls to check the
+ * length and the next call which actually allocates and returns the secdata.
* @secid contains the security ID.
* @secdata contains the pointer that stores the converted security context.
+ * @seclen pointer which contains the length of the data
* @secctx_to_secid:
* Convert security context to secid.
* @secid contains the pointer to the generated security ID.
int (*task_getioprio) (struct task_struct *p);
int (*task_setrlimit) (struct task_struct *p, unsigned int resource,
struct rlimit *new_rlim);
- int (*task_setscheduler) (struct task_struct *p, int policy,
- struct sched_param *lp);
+ int (*task_setscheduler) (struct task_struct *p);
int (*task_getscheduler) (struct task_struct *p);
int (*task_movememory) (struct task_struct *p);
int (*task_kill) (struct task_struct *p,
struct request_sock *req);
void (*inet_csk_clone) (struct sock *newsk, const struct request_sock *req);
void (*inet_conn_established) (struct sock *sk, struct sk_buff *skb);
+ int (*secmark_relabel_packet) (u32 secid);
+ void (*secmark_refcount_inc) (void);
+ void (*secmark_refcount_dec) (void);
void (*req_classify_flow) (const struct request_sock *req, struct flowi *fl);
int (*tun_dev_create)(void);
void (*tun_dev_post_create)(struct sock *sk);
int security_task_getioprio(struct task_struct *p);
int security_task_setrlimit(struct task_struct *p, unsigned int resource,
struct rlimit *new_rlim);
-int security_task_setscheduler(struct task_struct *p,
- int policy, struct sched_param *lp);
+int security_task_setscheduler(struct task_struct *p);
int security_task_getscheduler(struct task_struct *p);
int security_task_movememory(struct task_struct *p);
int security_task_kill(struct task_struct *p, struct siginfo *info,
return 0;
}
-static inline int security_task_setscheduler(struct task_struct *p,
- int policy,
- struct sched_param *lp)
+static inline int security_task_setscheduler(struct task_struct *p)
{
- return cap_task_setscheduler(p, policy, lp);
+ return cap_task_setscheduler(p);
}
static inline int security_task_getscheduler(struct task_struct *p)
const struct request_sock *req);
void security_inet_conn_established(struct sock *sk,
struct sk_buff *skb);
+int security_secmark_relabel_packet(u32 secid);
+void security_secmark_refcount_inc(void);
+void security_secmark_refcount_dec(void);
int security_tun_dev_create(void);
void security_tun_dev_post_create(struct sock *sk);
int security_tun_dev_attach(struct sock *sk);
{
}
+static inline int security_secmark_relabel_packet(u32 secid)
+{
+ return 0;
+}
+
+static inline void security_secmark_refcount_inc(void)
+{
+}
+
+static inline void security_secmark_refcount_dec(void)
+{
+}
+
static inline int security_tun_dev_create(void)
{
return 0;
#ifdef CONFIG_SECURITY_SELINUX
/**
- * selinux_string_to_sid - map a security context string to a security ID
- * @str: the security context string to be mapped
- * @sid: ID value returned via this.
- *
- * Returns 0 if successful, with the SID stored in sid. A value
- * of zero for sid indicates no SID could be determined (but no error
- * occurred).
- */
-int selinux_string_to_sid(char *str, u32 *sid);
-
-/**
- * selinux_secmark_relabel_packet_permission - secmark permission check
- * @sid: SECMARK ID value to be applied to network packet
- *
- * Returns 0 if the current task is allowed to set the SECMARK label of
- * packets with the supplied security ID. Note that it is implicit that
- * the packet is always being relabeled from the default unlabeled value,
- * and that the access control decision is made in the AVC.
- */
-int selinux_secmark_relabel_packet_permission(u32 sid);
-
-/**
- * selinux_secmark_refcount_inc - increments the secmark use counter
- *
- * SELinux keeps track of the current SECMARK targets in use so it knows
- * when to apply SECMARK label access checks to network packets. This
- * function incements this reference count to indicate that a new SECMARK
- * target has been configured.
- */
-void selinux_secmark_refcount_inc(void);
-
-/**
- * selinux_secmark_refcount_dec - decrements the secmark use counter
- *
- * SELinux keeps track of the current SECMARK targets in use so it knows
- * when to apply SECMARK label access checks to network packets. This
- * function decements this reference count to indicate that one of the
- * existing SECMARK targets has been removed/flushed.
- */
-void selinux_secmark_refcount_dec(void);
-
-/**
* selinux_is_enabled - is SELinux enabled?
*/
bool selinux_is_enabled(void);
#else
-static inline int selinux_string_to_sid(const char *str, u32 *sid)
-{
- *sid = 0;
- return 0;
-}
-
-static inline int selinux_secmark_relabel_packet_permission(u32 sid)
-{
- return 0;
-}
-
-static inline void selinux_secmark_refcount_inc(void)
-{
- return;
-}
-
-static inline void selinux_secmark_refcount_dec(void)
-{
- return;
-}
-
static inline bool selinux_is_enabled(void)
{
return false;
.wait_list = LIST_HEAD_INIT((name).wait_list), \
}
+#define DEFINE_SEMAPHORE(name) \
+ struct semaphore name = __SEMAPHORE_INITIALIZER(name, 1)
+
#define DECLARE_MUTEX(name) \
struct semaphore name = __SEMAPHORE_INITIALIZER(name, 1)
int offset,
unsigned int len, __wsum *csump);
-extern int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
+extern long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode);
extern int memcpy_toiovec(struct iovec *v, unsigned char *kdata, int len);
extern int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata,
int offset, int len);
#define SPI_MODE_OFFSET 6
#define SPI_SCPH_OFFSET 6
#define SPI_SCOL_OFFSET 7
+
#define SPI_TMOD_OFFSET 8
+#define SPI_TMOD_MASK (0x3 << SPI_TMOD_OFFSET)
#define SPI_TMOD_TR 0x0 /* xmit & recv */
#define SPI_TMOD_TO 0x1 /* xmit only */
#define SPI_TMOD_RO 0x2 /* recv only */
#endif /* #else #ifdef CONFIG_DEBUG_LOCK_ALLOC */
/**
- * srcu_dereference - fetch SRCU-protected pointer with checking
+ * srcu_dereference_check - fetch SRCU-protected pointer for later dereferencing
+ * @p: the pointer to fetch and protect for later dereferencing
+ * @sp: pointer to the srcu_struct, which is used to check that we
+ * really are in an SRCU read-side critical section.
+ * @c: condition to check for update-side use
*
- * Makes rcu_dereference_check() do the dirty work.
+ * If PROVE_RCU is enabled, invoking this outside of an RCU read-side
+ * critical section will result in an RCU-lockdep splat, unless @c evaluates
+ * to 1. The @c argument will normally be a logical expression containing
+ * lockdep_is_held() calls.
*/
-#define srcu_dereference(p, sp) \
- rcu_dereference_check(p, srcu_read_lock_held(sp))
+#define srcu_dereference_check(p, sp, c) \
+ __rcu_dereference_check((p), srcu_read_lock_held(sp) || (c), __rcu)
+
+/**
+ * srcu_dereference - fetch SRCU-protected pointer for later dereferencing
+ * @p: the pointer to fetch and protect for later dereferencing
+ * @sp: pointer to the srcu_struct, which is used to check that we
+ * really are in an SRCU read-side critical section.
+ *
+ * Makes rcu_dereference_check() do the dirty work. If PROVE_RCU
+ * is enabled, invoking this outside of an RCU read-side critical
+ * section will result in an RCU-lockdep splat.
+ */
+#define srcu_dereference(p, sp) srcu_dereference_check((p), (sp), 0)
/**
* srcu_read_lock - register a new reader for an SRCU-protected structure.
* @sp: srcu_struct in which to register the new reader.
*
* Enter an SRCU read-side critical section. Note that SRCU read-side
- * critical sections may be nested.
+ * critical sections may be nested. However, it is illegal to
+ * call anything that waits on an SRCU grace period for the same
+ * srcu_struct, whether directly or indirectly. Please note that
+ * one way to indirectly wait on an SRCU grace period is to acquire
+ * a mutex that is held elsewhere while calling synchronize_srcu() or
+ * synchronize_srcu_expedited().
*/
static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp)
{
#else /* CONFIG_STOP_MACHINE && CONFIG_SMP */
-static inline int stop_machine(int (*fn)(void *), void *data,
- const struct cpumask *cpus)
+static inline int __stop_machine(int (*fn)(void *), void *data,
+ const struct cpumask *cpus)
{
int ret;
local_irq_disable();
return ret;
}
+static inline int stop_machine(int (*fn)(void *), void *data,
+ const struct cpumask *cpus)
+{
+ return __stop_machine(fn, data, cpus);
+}
+
#endif /* CONFIG_STOP_MACHINE && CONFIG_SMP */
#endif /* _LINUX_STOP_MACHINE */
enum rpc_gss_proc gc_proc;
u32 gc_seq;
spinlock_t gc_seq_lock;
- struct gss_ctx *gc_gss_ctx;
+ struct gss_ctx __rcu *gc_gss_ctx;
struct xdr_netobj gc_wire_ctx;
u32 gc_win;
unsigned long gc_expiry;
struct gss_cred {
struct rpc_cred gc_base;
enum rpc_gss_svc gc_service;
- struct gss_cl_ctx *gc_ctx;
+ struct gss_cl_ctx __rcu *gc_ctx;
struct gss_upcall_msg *gc_upcall;
unsigned long gc_upcall_timestamp;
unsigned char gc_machine_cred : 1;
* The high-level client handle
*/
struct rpc_clnt {
- struct kref cl_kref; /* Number of references */
+ atomic_t cl_count; /* Number of references */
struct list_head cl_clients; /* Global list of clients */
struct list_head cl_tasks; /* List of tasks */
spinlock_t cl_lock; /* spinlock */
#define SWAP_FLAG_PREFER 0x8000 /* set if swap priority specified */
#define SWAP_FLAG_PRIO_MASK 0x7fff
#define SWAP_FLAG_PRIO_SHIFT 0
+#define SWAP_FLAG_DISCARD 0x10000 /* discard swap cluster after use */
static inline int current_is_kswapd(void)
{
enum {
SWP_USED = (1 << 0), /* is slot in swap_info[] used? */
SWP_WRITEOK = (1 << 1), /* ok to write to this swap? */
- SWP_DISCARDABLE = (1 << 2), /* blkdev supports discard */
+ SWP_DISCARDABLE = (1 << 2), /* swapon+blkdev support discard */
SWP_DISCARDING = (1 << 3), /* now discarding a free cluster */
SWP_SOLIDSTATE = (1 << 4), /* blkdev seeks are cheap */
SWP_CONTINUED = (1 << 5), /* swap_map has count continuation */
extern long total_swap_pages;
extern void si_swapinfo(struct sysinfo *);
extern swp_entry_t get_swap_page(void);
+extern swp_entry_t get_swap_page_of_type(int);
extern int valid_swaphandles(swp_entry_t, unsigned long *);
extern int add_swap_count_continuation(swp_entry_t, gfp_t);
extern void swap_shmem_alloc(swp_entry_t);
extern int try_to_free_swap(struct page *);
struct backing_dev_info;
-#ifdef CONFIG_HIBERNATION
-void hibernation_freeze_swap(void);
-void hibernation_thaw_swap(void);
-swp_entry_t get_swap_for_hibernation(int type);
-void swap_free_for_hibernation(swp_entry_t val);
-#endif
-
/* linux/mm/thrash.c */
extern struct mm_struct *swap_token_mm;
extern void grab_swap_token(struct mm_struct *);
};
/* For futex_wait and futex_wait_requeue_pi */
struct {
- u32 *uaddr;
+ u32 __user *uaddr;
u32 val;
u32 flags;
u32 bitset;
u64 time;
- u32 *uaddr2;
+ u32 __user *uaddr2;
} futex;
/* For nanosleep */
struct {
.balance_interval = 64, \
}
+#ifdef CONFIG_SCHED_BOOK
+#ifndef SD_BOOK_INIT
+#error Please define an appropriate SD_BOOK_INIT in include/asm/topology.h!!!
+#endif
+#endif /* CONFIG_SCHED_BOOK */
+
#ifdef CONFIG_NUMA
#ifndef SD_NODE_INIT
#error Please define an appropriate SD_NODE_INIT in include/asm/topology.h!!!
#include <linux/errno.h>
#include <linux/types.h>
#include <linux/rcupdate.h>
+#include <linux/jump_label.h>
struct module;
struct tracepoint;
extern struct tracepoint __tracepoint_##name; \
static inline void trace_##name(proto) \
{ \
- if (unlikely(__tracepoint_##name.state)) \
+ JUMP_LABEL(&__tracepoint_##name.state, do_trace); \
+ return; \
+do_trace: \
__DO_TRACE(&__tracepoint_##name, \
TP_PROTO(data_proto), \
TP_ARGS(data_args)); \
typedef __s64 int64_t;
#endif
-/* this is a special 64bit data type that is 8-byte aligned */
+/*
+ * aligned_u64 should be used in defining kernel<->userspace ABIs to avoid
+ * common 32/64-bit compat problems.
+ * 64-bit values align to 4-byte boundaries on x86_32 (and possibly other
+ * architectures) and to 8-byte boundaries on 64-bit architetures. The new
+ * aligned_64 type enforces 8-byte alignment so that structs containing
+ * aligned_64 values have the same alignment on 32-bit and 64-bit architectures.
+ * No conversions are necessary between 32-bit user-space and a 64-bit kernel.
+ */
#define aligned_u64 __u64 __attribute__((aligned(8)))
#define aligned_be64 __be64 __attribute__((aligned(8)))
#define aligned_le64 __le64 __attribute__((aligned(8)))
typedef __u16 __bitwise __sum16;
typedef __u32 __bitwise __wsum;
+/* this is a special 64bit data type that is 8-byte aligned */
+#define __aligned_u64 __u64 __attribute__((aligned(8)))
+#define __aligned_be64 __be64 __attribute__((aligned(8)))
+#define __aligned_le64 __le64 __attribute__((aligned(8)))
+
#ifdef __KERNEL__
typedef unsigned __bitwise__ gfp_t;
typedef unsigned __bitwise__ fmode_t;
return x;
}
+/*
+ * More accurate version that also considers the currently pending
+ * deltas. For that we need to loop over all cpus to find the current
+ * deltas. There is no synchronization so the result cannot be
+ * exactly accurate either.
+ */
+static inline unsigned long zone_page_state_snapshot(struct zone *zone,
+ enum zone_stat_item item)
+{
+ long x = atomic_long_read(&zone->vm_stat[item]);
+
+#ifdef CONFIG_SMP
+ int cpu;
+ for_each_online_cpu(cpu)
+ x += per_cpu_ptr(zone->pageset, cpu)->vm_stat_diff[item];
+
+ if (x < 0)
+ x = 0;
+#endif
+ return x;
+}
+
extern unsigned long global_reclaimable_pages(void);
extern unsigned long zone_reclaimable_pages(struct zone *zone);
(wait)->private = current; \
(wait)->func = autoremove_wake_function; \
INIT_LIST_HEAD(&(wait)->task_list); \
+ (wait)->flags = 0; \
} while (0)
/**
#define work_clear_pending(work) \
clear_bit(WORK_STRUCT_PENDING_BIT, work_data_bits(work))
+/*
+ * Workqueue flags and constants. For details, please refer to
+ * Documentation/workqueue.txt.
+ */
enum {
WQ_NON_REENTRANT = 1 << 0, /* guarantee non-reentrance */
WQ_UNBOUND = 1 << 1, /* not bound to any cpu */
/* for userland buffer */
int offset;
+ size_t size;
struct page **pages;
/* for kernel buffers */
* IPv6 Address Label subsystem (addrlabel.c)
*/
extern int ipv6_addr_label_init(void);
+extern void ipv6_addr_label_cleanup(void);
extern void ipv6_addr_label_rtnl_register(void);
extern u32 ipv6_addr_label(struct net *net,
const struct in6_addr *addr,
{
struct sk_buff *skb;
+ release_sock(sk);
if ((skb = sock_alloc_send_skb(sk, len + BT_SKB_RESERVE, nb, err))) {
skb_reserve(skb, BT_SKB_RESERVE);
bt_cb(skb)->incoming = 0;
}
+ lock_sock(sk);
+
+ if (!skb && *err)
+ return NULL;
+
+ *err = sock_error(sk);
+ if (*err)
+ goto out;
+
+ if (sk->sk_shutdown) {
+ *err = -ECONNRESET;
+ goto out;
+ }
return skb;
+
+out:
+ kfree_skb(skb);
+ return NULL;
}
int bt_err(__u16 code);
#ifdef CONFIG_NET_CLS_CGROUP
static inline u32 task_cls_classid(struct task_struct *p)
{
+ int classid;
+
if (in_interrupt())
return 0;
- return container_of(task_subsys_state(p, net_cls_subsys_id),
- struct cgroup_cls_state, css)->classid;
+ rcu_read_lock();
+ classid = container_of(task_subsys_state(p, net_cls_subsys_id),
+ struct cgroup_cls_state, css)->classid;
+ rcu_read_unlock();
+
+ return classid;
}
#else
extern int net_cls_subsys_id;
return 0;
rcu_read_lock();
- id = rcu_dereference(net_cls_subsys_id);
+ id = rcu_dereference_index_check(net_cls_subsys_id,
+ rcu_read_lock_held());
if (id >= 0)
classid = container_of(task_subsys_state(p, id),
struct cgroup_cls_state, css)->classid;
dev->stats.rx_packets++;
dev->stats.rx_bytes += skb->len;
skb->rxhash = 0;
+ skb_set_queue_mapping(skb, 0);
skb_dst_drop(skb);
nf_reset(skb);
}
return csum_partial(diff, sizeof(diff), oldsum);
}
+extern void ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp,
+ int outin);
+
#endif /* __KERNEL__ */
#endif /* _NET_IP_VS_H */
/* nf_conn feature for connections that have a helper */
struct nf_conn_help {
/* Helper. if any */
- struct nf_conntrack_helper *helper;
+ struct nf_conntrack_helper __rcu *helper;
union nf_conntrack_help help;
fl.fl_ip_sport = sport;
fl.fl_ip_dport = dport;
fl.proto = protocol;
+ if (inet_sk(sk)->transparent)
+ fl.flags |= FLOWI_FLAG_ANYSRC;
ip_rt_put(*rp);
*rp = NULL;
security_sk_classify_flow(sk, &fl);
/* Keeping track of sk's, looking them up, and port selection methods. */
void (*hash)(struct sock *sk);
void (*unhash)(struct sock *sk);
+ void (*rehash)(struct sock *sk);
int (*get_port)(struct sock *sk, unsigned short snum);
/* Keeping track of sockets in use */
/* Bound MSS / TSO packet size with the half of the window */
static inline int tcp_bound_to_half_wnd(struct tcp_sock *tp, int pktsize)
{
- if (tp->max_window && pktsize > (tp->max_window >> 1))
- return max(tp->max_window >> 1, 68U - tp->tcp_header_len);
+ int cutoff;
+
+ /* When peer uses tiny windows, there is no use in packetizing
+ * to sub-MSS pieces for the sake of SWS or making sure there
+ * are enough packets in the pipe for fast recovery.
+ *
+ * On the other hand, for extremely large MSS devices, handling
+ * smaller than MSS windows in this way does make sense.
+ */
+ if (tp->max_window >= 512)
+ cutoff = (tp->max_window >> 1);
+ else
+ cutoff = tp->max_window;
+
+ if (cutoff && pktsize > cutoff)
+ return max_t(int, cutoff, 68U - tp->tcp_header_len);
else
return pktsize;
}
}
extern void udp_lib_unhash(struct sock *sk);
+extern void udp_lib_rehash(struct sock *sk, u16 new_hash);
static inline void udp_lib_close(struct sock *sk, long timeout)
{
const struct xfrm_type *type_map[IPPROTO_MAX];
struct xfrm_mode *mode_map[XFRM_MODE_MAX];
int (*init_flags)(struct xfrm_state *x);
- void (*init_tempsel)(struct xfrm_state *x, struct flowi *fl,
- struct xfrm_tmpl *tmpl,
+ void (*init_tempsel)(struct xfrm_selector *sel, struct flowi *fl);
+ void (*init_temprop)(struct xfrm_state *x, struct xfrm_tmpl *tmpl,
xfrm_address_t *daddr, xfrm_address_t *saddr);
int (*tmpl_sort)(struct xfrm_tmpl **dst, struct xfrm_tmpl **src, int n);
int (*state_sort)(struct xfrm_state **dst, struct xfrm_state **src, int n);
#define _TRACE_IRQ_H
#include <linux/tracepoint.h>
-#include <linux/interrupt.h>
+
+struct irqaction;
+struct softirq_action;
#define softirq_name(sirq) { sirq##_SOFTIRQ, #sirq }
#define show_softirq_name(val) \
),
TP_fast_assign(
- __entry->vec = (int)(h - vec);
+ if (vec)
+ __entry->vec = (int)(h - vec);
+ else
+ __entry->vec = (int)(long)h;
),
TP_printk("vec=%d [action=%s]", __entry->vec,
TP_ARGS(h, vec)
);
+/**
+ * softirq_raise - called immediately when a softirq is raised
+ * @h: pointer to struct softirq_action
+ * @vec: pointer to first struct softirq_action in softirq_vec array
+ *
+ * The @h parameter contains a pointer to the softirq vector number which is
+ * raised. @vec is NULL and it means @h includes vector number not
+ * softirq_action. When used in combination with the softirq_entry tracepoint
+ * we can determine the softirq raise latency.
+ */
+DEFINE_EVENT(softirq, softirq_raise,
+
+ TP_PROTO(struct softirq_action *h, struct softirq_action *vec),
+
+ TP_ARGS(h, vec)
+);
+
#endif /* _TRACE_IRQ_H */
/* This part must be outside protection */
#include <linux/netdevice.h>
#include <linux/tracepoint.h>
+#include <linux/ftrace.h>
+
+#define NO_DEV "(no_device)"
+
+TRACE_EVENT(napi_poll,
-DECLARE_TRACE(napi_poll,
TP_PROTO(struct napi_struct *napi),
- TP_ARGS(napi));
+
+ TP_ARGS(napi),
+
+ TP_STRUCT__entry(
+ __field( struct napi_struct *, napi)
+ __string( dev_name, napi->dev ? napi->dev->name : NO_DEV)
+ ),
+
+ TP_fast_assign(
+ __entry->napi = napi;
+ __assign_str(dev_name, napi->dev ? napi->dev->name : NO_DEV);
+ ),
+
+ TP_printk("napi poll on napi struct %p for device %s",
+ __entry->napi, __get_str(dev_name))
+);
+
+#undef NO_DEV
#endif /* _TRACE_NAPI_H_ */
--- /dev/null
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM net
+
+#if !defined(_TRACE_NET_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NET_H
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/ip.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(net_dev_xmit,
+
+ TP_PROTO(struct sk_buff *skb,
+ int rc),
+
+ TP_ARGS(skb, rc),
+
+ TP_STRUCT__entry(
+ __field( void *, skbaddr )
+ __field( unsigned int, len )
+ __field( int, rc )
+ __string( name, skb->dev->name )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ __entry->len = skb->len;
+ __entry->rc = rc;
+ __assign_str(name, skb->dev->name);
+ ),
+
+ TP_printk("dev=%s skbaddr=%p len=%u rc=%d",
+ __get_str(name), __entry->skbaddr, __entry->len, __entry->rc)
+);
+
+DECLARE_EVENT_CLASS(net_dev_template,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb),
+
+ TP_STRUCT__entry(
+ __field( void *, skbaddr )
+ __field( unsigned int, len )
+ __string( name, skb->dev->name )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ __entry->len = skb->len;
+ __assign_str(name, skb->dev->name);
+ ),
+
+ TP_printk("dev=%s skbaddr=%p len=%u",
+ __get_str(name), __entry->skbaddr, __entry->len)
+)
+
+DEFINE_EVENT(net_dev_template, net_dev_queue,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template, netif_receive_skb,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb)
+);
+
+DEFINE_EVENT(net_dev_template, netif_rx,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb)
+);
+#endif /* _TRACE_NET_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>
#ifndef _TRACE_POWER_ENUM_
#define _TRACE_POWER_ENUM_
enum {
- POWER_NONE = 0,
- POWER_CSTATE = 1,
- POWER_PSTATE = 2,
+ POWER_NONE = 0,
+ POWER_CSTATE = 1, /* C-State */
+ POWER_PSTATE = 2, /* Fequency change or DVFS */
+ POWER_SSTATE = 3, /* Suspend */
};
#endif
+/*
+ * The power events are used for cpuidle & suspend (power_start, power_end)
+ * and for cpufreq (power_frequency)
+ */
DECLARE_EVENT_CLASS(power,
TP_PROTO(unsigned int type, unsigned int state, unsigned int cpu_id),
);
+/*
+ * The clock events are used for clock enable/disable and for
+ * clock rate change
+ */
+DECLARE_EVENT_CLASS(clock,
+
+ TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id),
+
+ TP_ARGS(name, state, cpu_id),
+
+ TP_STRUCT__entry(
+ __string( name, name )
+ __field( u64, state )
+ __field( u64, cpu_id )
+ ),
+
+ TP_fast_assign(
+ __assign_str(name, name);
+ __entry->state = state;
+ __entry->cpu_id = cpu_id;
+ ),
+
+ TP_printk("%s state=%lu cpu_id=%lu", __get_str(name),
+ (unsigned long)__entry->state, (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(clock, clock_enable,
+
+ TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id),
+
+ TP_ARGS(name, state, cpu_id)
+);
+
+DEFINE_EVENT(clock, clock_disable,
+
+ TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id),
+
+ TP_ARGS(name, state, cpu_id)
+);
+
+DEFINE_EVENT(clock, clock_set_rate,
+
+ TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id),
+
+ TP_ARGS(name, state, cpu_id)
+);
+
+/*
+ * The power domain events are used for power domains transitions
+ */
+DECLARE_EVENT_CLASS(power_domain,
+
+ TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id),
+
+ TP_ARGS(name, state, cpu_id),
+
+ TP_STRUCT__entry(
+ __string( name, name )
+ __field( u64, state )
+ __field( u64, cpu_id )
+ ),
+
+ TP_fast_assign(
+ __assign_str(name, name);
+ __entry->state = state;
+ __entry->cpu_id = cpu_id;
+),
+
+ TP_printk("%s state=%lu cpu_id=%lu", __get_str(name),
+ (unsigned long)__entry->state, (unsigned long)__entry->cpu_id)
+);
+
+DEFINE_EVENT(power_domain, power_domain_target,
+
+ TP_PROTO(const char *name, unsigned int state, unsigned int cpu_id),
+
+ TP_ARGS(name, state, cpu_id)
+);
+
#endif /* _TRACE_POWER_H */
/* This part must be outside protection */
(unsigned long long)__entry->vruntime)
);
+/*
+ * Tracepoint for showing priority inheritance modifying a tasks
+ * priority.
+ */
+TRACE_EVENT(sched_pi_setprio,
+
+ TP_PROTO(struct task_struct *tsk, int newprio),
+
+ TP_ARGS(tsk, newprio),
+
+ TP_STRUCT__entry(
+ __array( char, comm, TASK_COMM_LEN )
+ __field( pid_t, pid )
+ __field( int, oldprio )
+ __field( int, newprio )
+ ),
+
+ TP_fast_assign(
+ memcpy(__entry->comm, tsk->comm, TASK_COMM_LEN);
+ __entry->pid = tsk->pid;
+ __entry->oldprio = tsk->prio;
+ __entry->newprio = newprio;
+ ),
+
+ TP_printk("comm=%s pid=%d oldprio=%d newprio=%d",
+ __entry->comm, __entry->pid,
+ __entry->oldprio, __entry->newprio)
+);
+
#endif /* _TRACE_SCHED_H */
/* This part must be outside protection */
__entry->skbaddr, __entry->protocol, __entry->location)
);
+TRACE_EVENT(consume_skb,
+
+ TP_PROTO(struct sk_buff *skb),
+
+ TP_ARGS(skb),
+
+ TP_STRUCT__entry(
+ __field( void *, skbaddr )
+ ),
+
+ TP_fast_assign(
+ __entry->skbaddr = skb;
+ ),
+
+ TP_printk("skbaddr=%p", __entry->skbaddr)
+);
+
TRACE_EVENT(skb_copy_datagram_iovec,
TP_PROTO(const struct sk_buff *skb, int len),
depends on !UML
default y
+config HAVE_IRQ_WORK
+ bool
+
+config IRQ_WORK
+ bool
+ depends on HAVE_IRQ_WORK
+
menu "General setup"
config EXPERIMENTAL
config TREE_RCU
bool "Tree-based hierarchical RCU"
+ depends on !PREEMPT && SMP
help
This option selects the RCU implementation that is
designed for very large SMP system with hundreds or
smaller systems.
config TREE_PREEMPT_RCU
- bool "Preemptable tree-based hierarchical RCU"
+ bool "Preemptible tree-based hierarchical RCU"
depends on PREEMPT
help
This option selects the RCU implementation that is
is not required. This option greatly reduces the
memory footprint of RCU.
+config TINY_PREEMPT_RCU
+ bool "Preemptible UP-only small-memory-footprint RCU"
+ depends on !SMP && PREEMPT
+ help
+ This option selects the RCU implementation that is designed
+ for real-time UP systems. This option greatly reduces the
+ memory footprint of RCU.
+
endchoice
+config PREEMPT_RCU
+ def_bool ( TREE_PREEMPT_RCU || TINY_PREEMPT_RCU )
+ help
+ This option enables preemptible-RCU code that is common between
+ the TREE_PREEMPT_RCU and TINY_PREEMPT_RCU implementations.
+
config RCU_TRACE
bool "Enable tracing for RCU"
depends on TREE_RCU || TREE_PREEMPT_RCU
help
This option controls the fanout of hierarchical implementations
of RCU, allowing RCU to work efficiently on machines with
- large numbers of CPUs. This value must be at least the cube
- root of NR_CPUS, which allows NR_CPUS up to 32,768 for 32-bit
- systems and up to 262,144 for 64-bit systems.
+ large numbers of CPUs. This value must be at least the fourth
+ root of NR_CPUS, which allows NR_CPUS to be insanely large.
+ The default value of RCU_FANOUT should be used for production
+ systems, but if you are stress-testing the RCU implementation
+ itself, small RCU_FANOUT values allow you to test large-system
+ code paths on small(er) systems.
Select a specific number if testing RCU itself.
Take the default if unsure.
default y if (PROFILING || PERF_COUNTERS)
depends on HAVE_PERF_EVENTS
select ANON_INODES
+ select IRQ_WORK
help
Enable kernel support for various performance events provided
by software and hardware.
{
struct semid_ds out;
+ memset(&out, 0, sizeof(out));
+
ipc64_perm_to_ipc_perm(&in->sem_perm, &out.sem_perm);
out.sem_otime = in->sem_otime;
kthread.o wait.o kfifo.o sys_ni.o posix-cpu-timers.o mutex.o \
hrtimer.o rwsem.o nsproxy.o srcu.o semaphore.o \
notifier.o ksysfs.o pm_qos_params.o sched_clock.o cred.o \
- async.o range.o
+ async.o range.o jump_label.o
obj-$(CONFIG_HAVE_EARLY_RES) += early_res.o
obj-y += groups.o
CFLAGS_REMOVE_cgroup-debug.o = -pg
CFLAGS_REMOVE_sched_clock.o = -pg
CFLAGS_REMOVE_perf_event.o = -pg
+CFLAGS_REMOVE_irq_work.o = -pg
endif
obj-$(CONFIG_FREEZER) += freezer.o
obj-$(CONFIG_TREE_PREEMPT_RCU) += rcutree.o
obj-$(CONFIG_TREE_RCU_TRACE) += rcutree_trace.o
obj-$(CONFIG_TINY_RCU) += rcutiny.o
+obj-$(CONFIG_TINY_PREEMPT_RCU) += rcutiny.o
obj-$(CONFIG_RELAY) += relay.o
obj-$(CONFIG_SYSCTL) += utsname_sysctl.o
obj-$(CONFIG_TASK_DELAY_ACCT) += delayacct.o
obj-$(CONFIG_X86_DS) += trace/
obj-$(CONFIG_RING_BUFFER) += trace/
obj-$(CONFIG_SMP) += sched_cpupri.o
+obj-$(CONFIG_IRQ_WORK) += irq_work.o
obj-$(CONFIG_PERF_EVENTS) += perf_event.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
obj-$(CONFIG_USER_RETURN_NOTIFIER) += user-return-notifier.o
* is called after synchronize_rcu(). But for safe use, css_is_removed()
* css_tryget() should be used for avoiding race.
*/
- struct cgroup_subsys_state *css;
+ struct cgroup_subsys_state __rcu *css;
/*
* ID of this css.
*/
}
/**
- * cgroup_attach_task_current_cg - attach task 'tsk' to current task's cgroup
+ * cgroup_attach_task_all - attach task 'tsk' to all cgroups of task 'from'
+ * @from: attach to all cgroups of a given task
* @tsk: the task to be attached
*/
-int cgroup_attach_task_current_cg(struct task_struct *tsk)
+int cgroup_attach_task_all(struct task_struct *from, struct task_struct *tsk)
{
struct cgroupfs_root *root;
- struct cgroup *cur_cg;
int retval = 0;
cgroup_lock();
for_each_active_root(root) {
- cur_cg = task_cgroup_from_root(current, root);
- retval = cgroup_attach_task(cur_cg, tsk);
+ struct cgroup *from_cg = task_cgroup_from_root(from, root);
+
+ retval = cgroup_attach_task(from_cg, tsk);
if (retval)
break;
}
return retval;
}
-EXPORT_SYMBOL_GPL(cgroup_attach_task_current_cg);
+EXPORT_SYMBOL_GPL(cgroup_attach_task_all);
/*
* Attach task with pid 'pid' to cgroup 'cgrp'. Call with cgroup_mutex
return 0;
}
+
+/*
+ * Allocate user-space memory for the duration of a single system call,
+ * in order to marshall parameters inside a compat thunk.
+ */
+void __user *compat_alloc_user_space(unsigned long len)
+{
+ void __user *ptr;
+
+ /* If len would occupy more than half of the entire compat space... */
+ if (unlikely(len > (((compat_uptr_t)~0) >> 1)))
+ return NULL;
+
+ ptr = arch_compat_alloc_user_space(len);
+
+ if (unlikely(!access_ok(VERIFY_WRITE, ptr, len)))
+ return NULL;
+
+ return ptr;
+}
+EXPORT_SYMBOL_GPL(compat_alloc_user_space);
if (tsk->flags & PF_THREAD_BOUND)
return -EINVAL;
- ret = security_task_setscheduler(tsk, 0, NULL);
+ ret = security_task_setscheduler(tsk);
if (ret)
return ret;
if (threadgroup) {
rcu_read_lock();
list_for_each_entry_rcu(c, &tsk->thread_group, thread_group) {
- ret = security_task_setscheduler(c, 0, NULL);
+ ret = security_task_setscheduler(c);
if (ret) {
rcu_read_unlock();
return ret;
int i, bpno;
kdb_bp_t *bp, *bp_check;
int diag;
- int free;
char *symname = NULL;
long offset = 0ul;
int nextarg;
/*
* Find an empty bp structure to allocate
*/
- free = KDB_MAXBPT;
for (bpno = 0, bp = kdb_breakpoints; bpno < KDB_MAXBPT; bpno++, bp++) {
if (bp->bp_free)
break;
{
struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
-#ifdef CONFIG_PERF_EVENTS
- WARN_ON_ONCE(tsk->perf_event_ctxp);
-#endif
+ perf_event_delayed_put(tsk);
trace_sched_process_free(tsk);
put_task_struct(tsk);
}
if (IS_ERR(pol))
goto fail_nomem_policy;
vma_set_policy(tmp, pol);
+ tmp->vm_mm = mm;
if (anon_vma_fork(tmp, mpnt))
goto fail_nomem_anon_vma_fork;
tmp->vm_flags &= ~VM_LOCKED;
- tmp->vm_mm = mm;
tmp->vm_next = tmp->vm_prev = NULL;
file = tmp->vm_file;
if (file) {
/**
* struct futex_q - The hashed futex queue entry, one per waiting task
+ * @list: priority-sorted list of tasks waiting on this futex
* @task: the task waiting on the futex
* @lock_ptr: the hash bucket lock
* @key: the key the futex is hashed on
*
* A futex_q has a woken state, just like tasks have TASK_RUNNING.
* It is considered woken when plist_node_empty(&q->list) || q->lock_ptr == 0.
- * The order of wakup is always to make the first condition true, then
+ * The order of wakeup is always to make the first condition true, then
* the second.
*
* PI futexes are typically woken before they are removed from the hash list via
* Slow path to fixup the fault we just took in the atomic write
* access to @uaddr.
*
- * We have no generic implementation of a non destructive write to the
+ * We have no generic implementation of a non-destructive write to the
* user address. We know that we faulted in the atomic pagefault
* disabled section so we can as well avoid the #PF overhead by
* calling get_user_pages() right away.
*/
pi_state = this->pi_state;
/*
- * Userspace might have messed up non PI and PI futexes
+ * Userspace might have messed up non-PI and PI futexes
*/
if (unlikely(!pi_state))
return -EINVAL;
/*
* We set q->lock_ptr = NULL _before_ we wake up the task. If
- * a non futex wake up happens on another CPU then the task
- * might exit and p would dereference a non existing task
+ * a non-futex wake up happens on another CPU then the task
+ * might exit and p would dereference a non-existing task
* struct. Prevent this by holding a reference on p across the
* wake up.
*/
/**
* futex_requeue() - Requeue waiters from uaddr1 to uaddr2
- * uaddr1: source futex user address
- * uaddr2: target futex user address
- * nr_wake: number of waiters to wake (must be 1 for requeue_pi)
- * nr_requeue: number of waiters to requeue (0-INT_MAX)
- * requeue_pi: if we are attempting to requeue from a non-pi futex to a
+ * @uaddr1: source futex user address
+ * @fshared: 0 for a PROCESS_PRIVATE futex, 1 for PROCESS_SHARED
+ * @uaddr2: target futex user address
+ * @nr_wake: number of waiters to wake (must be 1 for requeue_pi)
+ * @nr_requeue: number of waiters to requeue (0-INT_MAX)
+ * @cmpval: @uaddr1 expected value (or %NULL)
+ * @requeue_pi: if we are attempting to requeue from a non-pi futex to a
* pi futex (pi to pi requeue is not supported)
*
* Requeue waiters on uaddr1 to uaddr2. In the requeue_pi case, try to acquire
/* The key must be already stored in q->key. */
static inline struct futex_hash_bucket *queue_lock(struct futex_q *q)
+ __acquires(&hb->lock)
{
struct futex_hash_bucket *hb;
- get_futex_key_refs(&q->key);
hb = hash_futex(&q->key);
q->lock_ptr = &hb->lock;
static inline void
queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb)
+ __releases(&hb->lock)
{
spin_unlock(&hb->lock);
- drop_futex_key_refs(&q->key);
}
/**
* an example).
*/
static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb)
+ __releases(&hb->lock)
{
int prio;
* and dropped here.
*/
static void unqueue_me_pi(struct futex_q *q)
+ __releases(q->lock_ptr)
{
WARN_ON(plist_node_empty(&q->list));
plist_del(&q->list, &q->list.plist);
q->pi_state = NULL;
spin_unlock(q->lock_ptr);
-
- drop_futex_key_refs(&q->key);
}
/*
}
retry:
- /* Prepare to wait on uaddr. */
+ /*
+ * Prepare to wait on uaddr. On success, holds hb lock and increments
+ * q.key refs.
+ */
ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
if (ret)
goto out;
/* If we were woken (and unqueued), we succeeded, whatever. */
ret = 0;
+ /* unqueue_me() drops q.key ref */
if (!unqueue_me(&q))
- goto out_put_key;
+ goto out;
ret = -ETIMEDOUT;
if (to && !to->task)
- goto out_put_key;
+ goto out;
/*
* We expect signal_pending(current), but we might be the
* victim of a spurious wakeup as well.
*/
- if (!signal_pending(current)) {
- put_futex_key(fshared, &q.key);
+ if (!signal_pending(current))
goto retry;
- }
ret = -ERESTARTSYS;
if (!abs_time)
- goto out_put_key;
+ goto out;
restart = ¤t_thread_info()->restart_block;
restart->fn = futex_wait_restart;
- restart->futex.uaddr = (u32 *)uaddr;
+ restart->futex.uaddr = uaddr;
restart->futex.val = val;
restart->futex.time = abs_time->tv64;
restart->futex.bitset = bitset;
ret = -ERESTART_RESTARTBLOCK;
-out_put_key:
- put_futex_key(fshared, &q.key);
out:
if (to) {
hrtimer_cancel(&to->timer);
static long futex_wait_restart(struct restart_block *restart)
{
- u32 __user *uaddr = (u32 __user *)restart->futex.uaddr;
+ u32 __user *uaddr = restart->futex.uaddr;
int fshared = 0;
ktime_t t, *tp = NULL;
q.rt_waiter = &rt_waiter;
q.requeue_pi_key = &key2;
- /* Prepare to wait on uaddr. */
+ /*
+ * Prepare to wait on uaddr. On success, increments q.key (key1) ref
+ * count.
+ */
ret = futex_wait_setup(uaddr, val, fshared, &q, &hb);
if (ret)
goto out_key2;
* In order for us to be here, we know our q.key == key2, and since
* we took the hb->lock above, we also know that futex_requeue() has
* completed and we no longer have to concern ourselves with a wakeup
- * race with the atomic proxy lock acquition by the requeue code.
+ * race with the atomic proxy lock acquisition by the requeue code. The
+ * futex_requeue dropped our key1 reference and incremented our key2
+ * reference count.
*/
/* Check if the requeue code acquired the second futex for us. */
*/
static inline int fetch_robust_entry(struct robust_list __user **entry,
struct robust_list __user * __user *head,
- int *pi)
+ unsigned int *pi)
{
unsigned long uentry;
* of the complex code paths. Also we want to prevent
* registration of robust lists in that case. NULL is
* guaranteed to fault and we get -EFAULT on functional
- * implementation, the non functional ones will return
+ * implementation, the non-functional ones will return
* -ENOSYS.
*/
curval = cmpxchg_futex_value_locked(NULL, 0, 0);
*/
static inline int
fetch_robust_entry(compat_uptr_t *uentry, struct robust_list __user **entry,
- compat_uptr_t __user *head, int *pi)
+ compat_uptr_t __user *head, unsigned int *pi)
{
if (get_user(*uentry, head))
return -EFAULT;
* @children: child nodes
* @all: list head for list of all nodes
* @parent: parent node
- * @info: associated profiling data structure if not a directory
- * @ghost: when an object file containing profiling data is unloaded we keep a
- * copy of the profiling data here to allow collecting coverage data
- * for cleanup code. Such a node is called a "ghost".
+ * @loaded_info: array of pointers to profiling data sets for loaded object
+ * files.
+ * @num_loaded: number of profiling data sets for loaded object files.
+ * @unloaded_info: accumulated copy of profiling data sets for unloaded
+ * object files. Used only when gcov_persist=1.
* @dentry: main debugfs entry, either a directory or data file
* @links: associated symbolic links
* @name: data file basename
struct list_head children;
struct list_head all;
struct gcov_node *parent;
- struct gcov_info *info;
- struct gcov_info *ghost;
+ struct gcov_info **loaded_info;
+ struct gcov_info *unloaded_info;
struct dentry *dentry;
struct dentry **links;
+ int num_loaded;
char name[0];
};
};
/*
- * Return the profiling data set for a given node. This can either be the
- * original profiling data structure or a duplicate (also called "ghost")
- * in case the associated object file has been unloaded.
+ * Return a profiling data set associated with the given node. This is
+ * either a data set for a loaded object file or a data set copy in case
+ * all associated object files have been unloaded.
*/
static struct gcov_info *get_node_info(struct gcov_node *node)
{
- if (node->info)
- return node->info;
+ if (node->num_loaded > 0)
+ return node->loaded_info[0];
- return node->ghost;
+ return node->unloaded_info;
+}
+
+/*
+ * Return a newly allocated profiling data set which contains the sum of
+ * all profiling data associated with the given node.
+ */
+static struct gcov_info *get_accumulated_info(struct gcov_node *node)
+{
+ struct gcov_info *info;
+ int i = 0;
+
+ if (node->unloaded_info)
+ info = gcov_info_dup(node->unloaded_info);
+ else
+ info = gcov_info_dup(node->loaded_info[i++]);
+ if (!info)
+ return NULL;
+ for (; i < node->num_loaded; i++)
+ gcov_info_add(info, node->loaded_info[i]);
+
+ return info;
}
/*
mutex_lock(&node_lock);
/*
* Read from a profiling data copy to minimize reference tracking
- * complexity and concurrent access.
+ * complexity and concurrent access and to keep accumulating multiple
+ * profiling data sets associated with one node simple.
*/
- info = gcov_info_dup(get_node_info(node));
+ info = get_accumulated_info(node);
if (!info)
goto out_unlock;
iter = gcov_iter_new(info);
return NULL;
}
+/*
+ * Reset all profiling data associated with the specified node.
+ */
+static void reset_node(struct gcov_node *node)
+{
+ int i;
+
+ if (node->unloaded_info)
+ gcov_info_reset(node->unloaded_info);
+ for (i = 0; i < node->num_loaded; i++)
+ gcov_info_reset(node->loaded_info[i]);
+}
+
static void remove_node(struct gcov_node *node);
/*
* write() implementation for gcov data files. Reset profiling data for the
- * associated file. If the object file has been unloaded (i.e. this is
- * a "ghost" node), remove the debug fs node as well.
+ * corresponding file. If all associated object files have been unloaded,
+ * remove the debug fs node as well.
*/
static ssize_t gcov_seq_write(struct file *file, const char __user *addr,
size_t len, loff_t *pos)
node = get_node_by_name(info->filename);
if (node) {
/* Reset counts or remove node for unloaded modules. */
- if (node->ghost)
+ if (node->num_loaded == 0)
remove_node(node);
else
- gcov_info_reset(node->info);
+ reset_node(node);
}
/* Reset counts for open file. */
gcov_info_reset(info);
INIT_LIST_HEAD(&node->list);
INIT_LIST_HEAD(&node->children);
INIT_LIST_HEAD(&node->all);
- node->info = info;
+ if (node->loaded_info) {
+ node->loaded_info[0] = info;
+ node->num_loaded = 1;
+ }
node->parent = parent;
if (name)
strcpy(node->name, name);
struct gcov_node *node;
node = kzalloc(sizeof(struct gcov_node) + strlen(name) + 1, GFP_KERNEL);
- if (!node) {
- pr_warning("out of memory\n");
- return NULL;
+ if (!node)
+ goto err_nomem;
+ if (info) {
+ node->loaded_info = kcalloc(1, sizeof(struct gcov_info *),
+ GFP_KERNEL);
+ if (!node->loaded_info)
+ goto err_nomem;
}
init_node(node, info, name, parent);
/* Differentiate between gcov data file nodes and directory nodes. */
list_add(&node->all, &all_head);
return node;
+
+err_nomem:
+ kfree(node);
+ pr_warning("out of memory\n");
+ return NULL;
}
/* Remove symbolic links associated with node. */
list_del(&node->all);
debugfs_remove(node->dentry);
remove_links(node);
- if (node->ghost)
- gcov_info_free(node->ghost);
+ kfree(node->loaded_info);
+ if (node->unloaded_info)
+ gcov_info_free(node->unloaded_info);
kfree(node);
}
/*
* write() implementation for reset file. Reset all profiling data to zero
- * and remove ghost nodes.
+ * and remove nodes for which all associated object files are unloaded.
*/
static ssize_t reset_write(struct file *file, const char __user *addr,
size_t len, loff_t *pos)
mutex_lock(&node_lock);
restart:
list_for_each_entry(node, &all_head, all) {
- if (node->info)
- gcov_info_reset(node->info);
+ if (node->num_loaded > 0)
+ reset_node(node);
else if (list_empty(&node->children)) {
remove_node(node);
/* Several nodes may have gone - restart loop. */
}
/*
- * The profiling data set associated with this node is being unloaded. Store a
- * copy of the profiling data and turn this node into a "ghost".
+ * Associate a profiling data set with an existing node. Needs to be called
+ * with node_lock held.
*/
-static int ghost_node(struct gcov_node *node)
+static void add_info(struct gcov_node *node, struct gcov_info *info)
{
- node->ghost = gcov_info_dup(node->info);
- if (!node->ghost) {
- pr_warning("could not save data for '%s' (out of memory)\n",
- node->info->filename);
- return -ENOMEM;
+ struct gcov_info **loaded_info;
+ int num = node->num_loaded;
+
+ /*
+ * Prepare new array. This is done first to simplify cleanup in
+ * case the new data set is incompatible, the node only contains
+ * unloaded data sets and there's not enough memory for the array.
+ */
+ loaded_info = kcalloc(num + 1, sizeof(struct gcov_info *), GFP_KERNEL);
+ if (!loaded_info) {
+ pr_warning("could not add '%s' (out of memory)\n",
+ info->filename);
+ return;
+ }
+ memcpy(loaded_info, node->loaded_info,
+ num * sizeof(struct gcov_info *));
+ loaded_info[num] = info;
+ /* Check if the new data set is compatible. */
+ if (num == 0) {
+ /*
+ * A module was unloaded, modified and reloaded. The new
+ * data set replaces the copy of the last one.
+ */
+ if (!gcov_info_is_compatible(node->unloaded_info, info)) {
+ pr_warning("discarding saved data for %s "
+ "(incompatible version)\n", info->filename);
+ gcov_info_free(node->unloaded_info);
+ node->unloaded_info = NULL;
+ }
+ } else {
+ /*
+ * Two different versions of the same object file are loaded.
+ * The initial one takes precedence.
+ */
+ if (!gcov_info_is_compatible(node->loaded_info[0], info)) {
+ pr_warning("could not add '%s' (incompatible "
+ "version)\n", info->filename);
+ kfree(loaded_info);
+ return;
+ }
}
- node->info = NULL;
+ /* Overwrite previous array. */
+ kfree(node->loaded_info);
+ node->loaded_info = loaded_info;
+ node->num_loaded = num + 1;
+}
- return 0;
+/*
+ * Return the index of a profiling data set associated with a node.
+ */
+static int get_info_index(struct gcov_node *node, struct gcov_info *info)
+{
+ int i;
+
+ for (i = 0; i < node->num_loaded; i++) {
+ if (node->loaded_info[i] == info)
+ return i;
+ }
+ return -ENOENT;
}
/*
- * Profiling data for this node has been loaded again. Add profiling data
- * from previous instantiation and turn this node into a regular node.
+ * Save the data of a profiling data set which is being unloaded.
*/
-static void revive_node(struct gcov_node *node, struct gcov_info *info)
+static void save_info(struct gcov_node *node, struct gcov_info *info)
{
- if (gcov_info_is_compatible(node->ghost, info))
- gcov_info_add(info, node->ghost);
+ if (node->unloaded_info)
+ gcov_info_add(node->unloaded_info, info);
else {
- pr_warning("discarding saved data for '%s' (version changed)\n",
+ node->unloaded_info = gcov_info_dup(info);
+ if (!node->unloaded_info) {
+ pr_warning("could not save data for '%s' "
+ "(out of memory)\n", info->filename);
+ }
+ }
+}
+
+/*
+ * Disassociate a profiling data set from a node. Needs to be called with
+ * node_lock held.
+ */
+static void remove_info(struct gcov_node *node, struct gcov_info *info)
+{
+ int i;
+
+ i = get_info_index(node, info);
+ if (i < 0) {
+ pr_warning("could not remove '%s' (not found)\n",
info->filename);
+ return;
}
- gcov_info_free(node->ghost);
- node->ghost = NULL;
- node->info = info;
+ if (gcov_persist)
+ save_info(node, info);
+ /* Shrink array. */
+ node->loaded_info[i] = node->loaded_info[node->num_loaded - 1];
+ node->num_loaded--;
+ if (node->num_loaded > 0)
+ return;
+ /* Last loaded data set was removed. */
+ kfree(node->loaded_info);
+ node->loaded_info = NULL;
+ node->num_loaded = 0;
+ if (!node->unloaded_info)
+ remove_node(node);
}
/*
node = get_node_by_name(info->filename);
switch (action) {
case GCOV_ADD:
- /* Add new node or revive ghost. */
- if (!node) {
+ if (node)
+ add_info(node, info);
+ else
add_node(info);
- break;
- }
- if (gcov_persist)
- revive_node(node, info);
- else {
- pr_warning("could not add '%s' (already exists)\n",
- info->filename);
- }
break;
case GCOV_REMOVE:
- /* Remove node or turn into ghost. */
- if (!node) {
+ if (node)
+ remove_info(node, info);
+ else {
pr_warning("could not remove '%s' (not found)\n",
info->filename);
- break;
}
- if (gcov_persist) {
- if (!ghost_node(node))
- break;
- }
- remove_node(node);
break;
}
mutex_unlock(&node_lock);
right = group_info->ngroups;
while (left < right) {
unsigned int mid = (left+right)/2;
- int cmp = grp - GROUP_AT(group_info, mid);
- if (cmp > 0)
+ if (grp > GROUP_AT(group_info, mid))
left = mid + 1;
- else if (cmp < 0)
+ else if (grp < GROUP_AT(group_info, mid))
right = mid;
else
return 1;
remove_hrtimer(struct hrtimer *timer, struct hrtimer_clock_base *base)
{
if (hrtimer_is_queued(timer)) {
+ unsigned long state;
int reprogram;
/*
debug_deactivate(timer);
timer_stats_hrtimer_clear_start_info(timer);
reprogram = base->cpu_base == &__get_cpu_var(hrtimer_bases);
- __remove_hrtimer(timer, base, HRTIMER_STATE_INACTIVE,
- reprogram);
+ /*
+ * We must preserve the CALLBACK state flag here,
+ * otherwise we could move the timer base in
+ * switch_hrtimer_base.
+ */
+ state = timer->state & HRTIMER_STATE_CALLBACK;
+ __remove_hrtimer(timer, base, state, reprogram);
return 1;
}
return 0;
*/
ktime_t hrtimer_get_remaining(const struct hrtimer *timer)
{
- struct hrtimer_clock_base *base;
unsigned long flags;
ktime_t rem;
- base = lock_hrtimer_base(timer, &flags);
+ lock_hrtimer_base(timer, &flags);
rem = hrtimer_expires_remaining(timer);
unlock_hrtimer_base(timer, &flags);
BUG_ON(timer->state != HRTIMER_STATE_CALLBACK);
enqueue_hrtimer(timer, base);
}
+
+ WARN_ON_ONCE(!(timer->state & HRTIMER_STATE_CALLBACK));
+
timer->state &= ~HRTIMER_STATE_CALLBACK;
}
printk(KERN_ERR "\"echo 0 > /proc/sys/kernel/hung_task_timeout_secs\""
" disables this message.\n");
sched_show_task(t);
- __debug_show_held_locks(t);
+ debug_show_held_locks(t);
touch_nmi_watchdog();
* periodically exit the critical section and enter a new one.
*
* For preemptible RCU it is sufficient to call rcu_read_unlock in order
- * exit the grace period. For classic RCU, a reschedule is required.
+ * to exit the grace period. For classic RCU, a reschedule is required.
*/
static void rcu_lock_break(struct task_struct *g, struct task_struct *t)
{
*/
static int task_bp_pinned(struct perf_event *bp, enum bp_type_idx type)
{
- struct perf_event_context *ctx = bp->ctx;
+ struct task_struct *tsk = bp->hw.bp_target;
struct perf_event *iter;
int count = 0;
list_for_each_entry(iter, &bp_task_head, hw.bp_list) {
- if (iter->ctx == ctx && find_slot_idx(iter) == type)
+ if (iter->hw.bp_target == tsk && find_slot_idx(iter) == type)
count += hw_breakpoint_weight(iter);
}
enum bp_type_idx type)
{
int cpu = bp->cpu;
- struct task_struct *tsk = bp->ctx->task;
+ struct task_struct *tsk = bp->hw.bp_target;
if (cpu >= 0) {
slots->pinned = per_cpu(nr_cpu_bp_pinned[type], cpu);
int weight)
{
int cpu = bp->cpu;
- struct task_struct *tsk = bp->ctx->task;
+ struct task_struct *tsk = bp->hw.bp_target;
/* Pinned counter cpu profiling */
if (!tsk) {
perf_overflow_handler_t triggered,
struct task_struct *tsk)
{
- return perf_event_create_kernel_counter(attr, -1, tsk->pid, triggered);
+ return perf_event_create_kernel_counter(attr, -1, tsk, triggered);
}
EXPORT_SYMBOL_GPL(register_user_hw_breakpoint);
get_online_cpus();
for_each_online_cpu(cpu) {
pevent = per_cpu_ptr(cpu_events, cpu);
- bp = perf_event_create_kernel_counter(attr, cpu, -1, triggered);
+ bp = perf_event_create_kernel_counter(attr, cpu, NULL, triggered);
*pevent = bp;
.priority = 0x7fffffff
};
+static void bp_perf_event_destroy(struct perf_event *event)
+{
+ release_bp_slot(event);
+}
+
+static int hw_breakpoint_event_init(struct perf_event *bp)
+{
+ int err;
+
+ if (bp->attr.type != PERF_TYPE_BREAKPOINT)
+ return -ENOENT;
+
+ err = register_perf_hw_breakpoint(bp);
+ if (err)
+ return err;
+
+ bp->destroy = bp_perf_event_destroy;
+
+ return 0;
+}
+
+static int hw_breakpoint_add(struct perf_event *bp, int flags)
+{
+ if (!(flags & PERF_EF_START))
+ bp->hw.state = PERF_HES_STOPPED;
+
+ return arch_install_hw_breakpoint(bp);
+}
+
+static void hw_breakpoint_del(struct perf_event *bp, int flags)
+{
+ arch_uninstall_hw_breakpoint(bp);
+}
+
+static void hw_breakpoint_start(struct perf_event *bp, int flags)
+{
+ bp->hw.state = 0;
+}
+
+static void hw_breakpoint_stop(struct perf_event *bp, int flags)
+{
+ bp->hw.state = PERF_HES_STOPPED;
+}
+
+static struct pmu perf_breakpoint = {
+ .task_ctx_nr = perf_sw_context, /* could eventually get its own */
+
+ .event_init = hw_breakpoint_event_init,
+ .add = hw_breakpoint_add,
+ .del = hw_breakpoint_del,
+ .start = hw_breakpoint_start,
+ .stop = hw_breakpoint_stop,
+ .read = hw_breakpoint_pmu_read,
+};
+
static int __init init_hw_breakpoint(void)
{
unsigned int **task_bp_pinned;
constraints_initialized = 1;
+ perf_pmu_register(&perf_breakpoint);
+
return register_die_notifier(&hw_breakpoint_exceptions_nb);
err_alloc:
core_initcall(init_hw_breakpoint);
-struct pmu perf_ops_bp = {
- .enable = arch_install_hw_breakpoint,
- .disable = arch_uninstall_hw_breakpoint,
- .read = hw_breakpoint_pmu_read,
-};
--- /dev/null
+/*
+ * Copyright (C) 2010 Red Hat, Inc., Peter Zijlstra <pzijlstr@redhat.com>
+ *
+ * Provides a framework for enqueueing and running callbacks from hardirq
+ * context. The enqueueing is NMI-safe.
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/irq_work.h>
+#include <linux/hardirq.h>
+
+/*
+ * An entry can be in one of four states:
+ *
+ * free NULL, 0 -> {claimed} : free to be used
+ * claimed NULL, 3 -> {pending} : claimed to be enqueued
+ * pending next, 3 -> {busy} : queued, pending callback
+ * busy NULL, 2 -> {free, claimed} : callback in progress, can be claimed
+ *
+ * We use the lower two bits of the next pointer to keep PENDING and BUSY
+ * flags.
+ */
+
+#define IRQ_WORK_PENDING 1UL
+#define IRQ_WORK_BUSY 2UL
+#define IRQ_WORK_FLAGS 3UL
+
+static inline bool irq_work_is_set(struct irq_work *entry, int flags)
+{
+ return (unsigned long)entry->next & flags;
+}
+
+static inline struct irq_work *irq_work_next(struct irq_work *entry)
+{
+ unsigned long next = (unsigned long)entry->next;
+ next &= ~IRQ_WORK_FLAGS;
+ return (struct irq_work *)next;
+}
+
+static inline struct irq_work *next_flags(struct irq_work *entry, int flags)
+{
+ unsigned long next = (unsigned long)entry;
+ next |= flags;
+ return (struct irq_work *)next;
+}
+
+static DEFINE_PER_CPU(struct irq_work *, irq_work_list);
+
+/*
+ * Claim the entry so that no one else will poke at it.
+ */
+static bool irq_work_claim(struct irq_work *entry)
+{
+ struct irq_work *next, *nflags;
+
+ do {
+ next = entry->next;
+ if ((unsigned long)next & IRQ_WORK_PENDING)
+ return false;
+ nflags = next_flags(next, IRQ_WORK_FLAGS);
+ } while (cmpxchg(&entry->next, next, nflags) != next);
+
+ return true;
+}
+
+
+void __weak arch_irq_work_raise(void)
+{
+ /*
+ * Lame architectures will get the timer tick callback
+ */
+}
+
+/*
+ * Queue the entry and raise the IPI if needed.
+ */
+static void __irq_work_queue(struct irq_work *entry)
+{
+ struct irq_work **head, *next;
+
+ head = &get_cpu_var(irq_work_list);
+
+ do {
+ next = *head;
+ /* Can assign non-atomic because we keep the flags set. */
+ entry->next = next_flags(next, IRQ_WORK_FLAGS);
+ } while (cmpxchg(head, next, entry) != next);
+
+ /* The list was empty, raise self-interrupt to start processing. */
+ if (!irq_work_next(entry))
+ arch_irq_work_raise();
+
+ put_cpu_var(irq_work_list);
+}
+
+/*
+ * Enqueue the irq_work @entry, returns true on success, failure when the
+ * @entry was already enqueued by someone else.
+ *
+ * Can be re-enqueued while the callback is still in progress.
+ */
+bool irq_work_queue(struct irq_work *entry)
+{
+ if (!irq_work_claim(entry)) {
+ /*
+ * Already enqueued, can't do!
+ */
+ return false;
+ }
+
+ __irq_work_queue(entry);
+ return true;
+}
+EXPORT_SYMBOL_GPL(irq_work_queue);
+
+/*
+ * Run the irq_work entries on this cpu. Requires to be ran from hardirq
+ * context with local IRQs disabled.
+ */
+void irq_work_run(void)
+{
+ struct irq_work *list, **head;
+
+ head = &__get_cpu_var(irq_work_list);
+ if (*head == NULL)
+ return;
+
+ BUG_ON(!in_irq());
+ BUG_ON(!irqs_disabled());
+
+ list = xchg(head, NULL);
+ while (list != NULL) {
+ struct irq_work *entry = list;
+
+ list = irq_work_next(list);
+
+ /*
+ * Clear the PENDING bit, after this point the @entry
+ * can be re-used.
+ */
+ entry->next = next_flags(NULL, IRQ_WORK_BUSY);
+ entry->func(entry);
+ /*
+ * Clear the BUSY bit and return to the free state if
+ * no-one else claimed it meanwhile.
+ */
+ cmpxchg(&entry->next, next_flags(NULL, IRQ_WORK_BUSY), NULL);
+ }
+}
+EXPORT_SYMBOL_GPL(irq_work_run);
+
+/*
+ * Synchronize against the irq_work @entry, ensures the entry is not
+ * currently in use.
+ */
+void irq_work_sync(struct irq_work *entry)
+{
+ WARN_ON_ONCE(irqs_disabled());
+
+ while (irq_work_is_set(entry, IRQ_WORK_BUSY))
+ cpu_relax();
+}
+EXPORT_SYMBOL_GPL(irq_work_sync);
--- /dev/null
+/*
+ * jump label support
+ *
+ * Copyright (C) 2009 Jason Baron <jbaron@redhat.com>
+ *
+ */
+#include <linux/jump_label.h>
+#include <linux/memory.h>
+#include <linux/uaccess.h>
+#include <linux/module.h>
+#include <linux/list.h>
+#include <linux/jhash.h>
+#include <linux/slab.h>
+#include <linux/sort.h>
+#include <linux/err.h>
+
+#ifdef HAVE_JUMP_LABEL
+
+#define JUMP_LABEL_HASH_BITS 6
+#define JUMP_LABEL_TABLE_SIZE (1 << JUMP_LABEL_HASH_BITS)
+static struct hlist_head jump_label_table[JUMP_LABEL_TABLE_SIZE];
+
+/* mutex to protect coming/going of the the jump_label table */
+static DEFINE_MUTEX(jump_label_mutex);
+
+struct jump_label_entry {
+ struct hlist_node hlist;
+ struct jump_entry *table;
+ int nr_entries;
+ /* hang modules off here */
+ struct hlist_head modules;
+ unsigned long key;
+};
+
+struct jump_label_module_entry {
+ struct hlist_node hlist;
+ struct jump_entry *table;
+ int nr_entries;
+ struct module *mod;
+};
+
+static int jump_label_cmp(const void *a, const void *b)
+{
+ const struct jump_entry *jea = a;
+ const struct jump_entry *jeb = b;
+
+ if (jea->key < jeb->key)
+ return -1;
+
+ if (jea->key > jeb->key)
+ return 1;
+
+ return 0;
+}
+
+static void
+sort_jump_label_entries(struct jump_entry *start, struct jump_entry *stop)
+{
+ unsigned long size;
+
+ size = (((unsigned long)stop - (unsigned long)start)
+ / sizeof(struct jump_entry));
+ sort(start, size, sizeof(struct jump_entry), jump_label_cmp, NULL);
+}
+
+static struct jump_label_entry *get_jump_label_entry(jump_label_t key)
+{
+ struct hlist_head *head;
+ struct hlist_node *node;
+ struct jump_label_entry *e;
+ u32 hash = jhash((void *)&key, sizeof(jump_label_t), 0);
+
+ head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
+ hlist_for_each_entry(e, node, head, hlist) {
+ if (key == e->key)
+ return e;
+ }
+ return NULL;
+}
+
+static struct jump_label_entry *
+add_jump_label_entry(jump_label_t key, int nr_entries, struct jump_entry *table)
+{
+ struct hlist_head *head;
+ struct jump_label_entry *e;
+ u32 hash;
+
+ e = get_jump_label_entry(key);
+ if (e)
+ return ERR_PTR(-EEXIST);
+
+ e = kmalloc(sizeof(struct jump_label_entry), GFP_KERNEL);
+ if (!e)
+ return ERR_PTR(-ENOMEM);
+
+ hash = jhash((void *)&key, sizeof(jump_label_t), 0);
+ head = &jump_label_table[hash & (JUMP_LABEL_TABLE_SIZE - 1)];
+ e->key = key;
+ e->table = table;
+ e->nr_entries = nr_entries;
+ INIT_HLIST_HEAD(&(e->modules));
+ hlist_add_head(&e->hlist, head);
+ return e;
+}
+
+static int
+build_jump_label_hashtable(struct jump_entry *start, struct jump_entry *stop)
+{
+ struct jump_entry *iter, *iter_begin;
+ struct jump_label_entry *entry;
+ int count;
+
+ sort_jump_label_entries(start, stop);
+ iter = start;
+ while (iter < stop) {
+ entry = get_jump_label_entry(iter->key);
+ if (!entry) {
+ iter_begin = iter;
+ count = 0;
+ while ((iter < stop) &&
+ (iter->key == iter_begin->key)) {
+ iter++;
+ count++;
+ }
+ entry = add_jump_label_entry(iter_begin->key,
+ count, iter_begin);
+ if (IS_ERR(entry))
+ return PTR_ERR(entry);
+ } else {
+ WARN_ONCE(1, KERN_ERR "build_jump_hashtable: unexpected entry!\n");
+ return -1;
+ }
+ }
+ return 0;
+}
+
+/***
+ * jump_label_update - update jump label text
+ * @key - key value associated with a a jump label
+ * @type - enum set to JUMP_LABEL_ENABLE or JUMP_LABEL_DISABLE
+ *
+ * Will enable/disable the jump for jump label @key, depending on the
+ * value of @type.
+ *
+ */
+
+void jump_label_update(unsigned long key, enum jump_label_type type)
+{
+ struct jump_entry *iter;
+ struct jump_label_entry *entry;
+ struct hlist_node *module_node;
+ struct jump_label_module_entry *e_module;
+ int count;
+
+ mutex_lock(&jump_label_mutex);
+ entry = get_jump_label_entry((jump_label_t)key);
+ if (entry) {
+ count = entry->nr_entries;
+ iter = entry->table;
+ while (count--) {
+ if (kernel_text_address(iter->code))
+ arch_jump_label_transform(iter, type);
+ iter++;
+ }
+ /* eanble/disable jump labels in modules */
+ hlist_for_each_entry(e_module, module_node, &(entry->modules),
+ hlist) {
+ count = e_module->nr_entries;
+ iter = e_module->table;
+ while (count--) {
+ if (kernel_text_address(iter->code))
+ arch_jump_label_transform(iter, type);
+ iter++;
+ }
+ }
+ }
+ mutex_unlock(&jump_label_mutex);
+}
+
+static int addr_conflict(struct jump_entry *entry, void *start, void *end)
+{
+ if (entry->code <= (unsigned long)end &&
+ entry->code + JUMP_LABEL_NOP_SIZE > (unsigned long)start)
+ return 1;
+
+ return 0;
+}
+
+#ifdef CONFIG_MODULES
+
+static int module_conflict(void *start, void *end)
+{
+ struct hlist_head *head;
+ struct hlist_node *node, *node_next, *module_node, *module_node_next;
+ struct jump_label_entry *e;
+ struct jump_label_module_entry *e_module;
+ struct jump_entry *iter;
+ int i, count;
+ int conflict = 0;
+
+ for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
+ head = &jump_label_table[i];
+ hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
+ hlist_for_each_entry_safe(e_module, module_node,
+ module_node_next,
+ &(e->modules), hlist) {
+ count = e_module->nr_entries;
+ iter = e_module->table;
+ while (count--) {
+ if (addr_conflict(iter, start, end)) {
+ conflict = 1;
+ goto out;
+ }
+ iter++;
+ }
+ }
+ }
+ }
+out:
+ return conflict;
+}
+
+#endif
+
+/***
+ * jump_label_text_reserved - check if addr range is reserved
+ * @start: start text addr
+ * @end: end text addr
+ *
+ * checks if the text addr located between @start and @end
+ * overlaps with any of the jump label patch addresses. Code
+ * that wants to modify kernel text should first verify that
+ * it does not overlap with any of the jump label addresses.
+ *
+ * returns 1 if there is an overlap, 0 otherwise
+ */
+int jump_label_text_reserved(void *start, void *end)
+{
+ struct jump_entry *iter;
+ struct jump_entry *iter_start = __start___jump_table;
+ struct jump_entry *iter_stop = __start___jump_table;
+ int conflict = 0;
+
+ mutex_lock(&jump_label_mutex);
+ iter = iter_start;
+ while (iter < iter_stop) {
+ if (addr_conflict(iter, start, end)) {
+ conflict = 1;
+ goto out;
+ }
+ iter++;
+ }
+
+ /* now check modules */
+#ifdef CONFIG_MODULES
+ conflict = module_conflict(start, end);
+#endif
+out:
+ mutex_unlock(&jump_label_mutex);
+ return conflict;
+}
+
+static __init int init_jump_label(void)
+{
+ int ret;
+ struct jump_entry *iter_start = __start___jump_table;
+ struct jump_entry *iter_stop = __stop___jump_table;
+ struct jump_entry *iter;
+
+ mutex_lock(&jump_label_mutex);
+ ret = build_jump_label_hashtable(__start___jump_table,
+ __stop___jump_table);
+ iter = iter_start;
+ while (iter < iter_stop) {
+ arch_jump_label_text_poke_early(iter->code);
+ iter++;
+ }
+ mutex_unlock(&jump_label_mutex);
+ return ret;
+}
+early_initcall(init_jump_label);
+
+#ifdef CONFIG_MODULES
+
+static struct jump_label_module_entry *
+add_jump_label_module_entry(struct jump_label_entry *entry,
+ struct jump_entry *iter_begin,
+ int count, struct module *mod)
+{
+ struct jump_label_module_entry *e;
+
+ e = kmalloc(sizeof(struct jump_label_module_entry), GFP_KERNEL);
+ if (!e)
+ return ERR_PTR(-ENOMEM);
+ e->mod = mod;
+ e->nr_entries = count;
+ e->table = iter_begin;
+ hlist_add_head(&e->hlist, &entry->modules);
+ return e;
+}
+
+static int add_jump_label_module(struct module *mod)
+{
+ struct jump_entry *iter, *iter_begin;
+ struct jump_label_entry *entry;
+ struct jump_label_module_entry *module_entry;
+ int count;
+
+ /* if the module doesn't have jump label entries, just return */
+ if (!mod->num_jump_entries)
+ return 0;
+
+ sort_jump_label_entries(mod->jump_entries,
+ mod->jump_entries + mod->num_jump_entries);
+ iter = mod->jump_entries;
+ while (iter < mod->jump_entries + mod->num_jump_entries) {
+ entry = get_jump_label_entry(iter->key);
+ iter_begin = iter;
+ count = 0;
+ while ((iter < mod->jump_entries + mod->num_jump_entries) &&
+ (iter->key == iter_begin->key)) {
+ iter++;
+ count++;
+ }
+ if (!entry) {
+ entry = add_jump_label_entry(iter_begin->key, 0, NULL);
+ if (IS_ERR(entry))
+ return PTR_ERR(entry);
+ }
+ module_entry = add_jump_label_module_entry(entry, iter_begin,
+ count, mod);
+ if (IS_ERR(module_entry))
+ return PTR_ERR(module_entry);
+ }
+ return 0;
+}
+
+static void remove_jump_label_module(struct module *mod)
+{
+ struct hlist_head *head;
+ struct hlist_node *node, *node_next, *module_node, *module_node_next;
+ struct jump_label_entry *e;
+ struct jump_label_module_entry *e_module;
+ int i;
+
+ /* if the module doesn't have jump label entries, just return */
+ if (!mod->num_jump_entries)
+ return;
+
+ for (i = 0; i < JUMP_LABEL_TABLE_SIZE; i++) {
+ head = &jump_label_table[i];
+ hlist_for_each_entry_safe(e, node, node_next, head, hlist) {
+ hlist_for_each_entry_safe(e_module, module_node,
+ module_node_next,
+ &(e->modules), hlist) {
+ if (e_module->mod == mod) {
+ hlist_del(&e_module->hlist);
+ kfree(e_module);
+ }
+ }
+ if (hlist_empty(&e->modules) && (e->nr_entries == 0)) {
+ hlist_del(&e->hlist);
+ kfree(e);
+ }
+ }
+ }
+}
+
+static int
+jump_label_module_notify(struct notifier_block *self, unsigned long val,
+ void *data)
+{
+ struct module *mod = data;
+ int ret = 0;
+
+ switch (val) {
+ case MODULE_STATE_COMING:
+ mutex_lock(&jump_label_mutex);
+ ret = add_jump_label_module(mod);
+ if (ret)
+ remove_jump_label_module(mod);
+ mutex_unlock(&jump_label_mutex);
+ break;
+ case MODULE_STATE_GOING:
+ mutex_lock(&jump_label_mutex);
+ remove_jump_label_module(mod);
+ mutex_unlock(&jump_label_mutex);
+ break;
+ }
+ return ret;
+}
+
+/***
+ * apply_jump_label_nops - patch module jump labels with arch_get_jump_label_nop()
+ * @mod: module to patch
+ *
+ * Allow for run-time selection of the optimal nops. Before the module
+ * loads patch these with arch_get_jump_label_nop(), which is specified by
+ * the arch specific jump label code.
+ */
+void jump_label_apply_nops(struct module *mod)
+{
+ struct jump_entry *iter;
+
+ /* if the module doesn't have jump label entries, just return */
+ if (!mod->num_jump_entries)
+ return;
+
+ iter = mod->jump_entries;
+ while (iter < mod->jump_entries + mod->num_jump_entries) {
+ arch_jump_label_text_poke_early(iter->code);
+ iter++;
+ }
+}
+
+struct notifier_block jump_label_module_nb = {
+ .notifier_call = jump_label_module_notify,
+ .priority = 0,
+};
+
+static __init int init_jump_label_module(void)
+{
+ return register_module_notifier(&jump_label_module_nb);
+}
+early_initcall(init_jump_label_module);
+
+#endif /* CONFIG_MODULES */
+
+#endif
n = setup_sgl_buf(sgl, fifo->data + off, nents, l);
n += setup_sgl_buf(sgl + n, fifo->data, nents - n, len - l);
- if (n)
- sg_mark_end(sgl + n - 1);
return n;
}
#include <linux/memory.h>
#include <linux/ftrace.h>
#include <linux/cpu.h>
+#include <linux/jump_label.h>
#include <asm-generic/sections.h>
#include <asm/cacheflush.h>
* Return an optimized kprobe whose optimizing code replaces
* instructions including addr (exclude breakpoint).
*/
-struct kprobe *__kprobes get_optimized_kprobe(unsigned long addr)
+static struct kprobe *__kprobes get_optimized_kprobe(unsigned long addr)
{
int i;
struct kprobe *p = NULL;
void __kprobes kretprobe_hash_lock(struct task_struct *tsk,
struct hlist_head **head, unsigned long *flags)
+__acquires(hlist_lock)
{
unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
spinlock_t *hlist_lock;
static void __kprobes kretprobe_table_lock(unsigned long hash,
unsigned long *flags)
+__acquires(hlist_lock)
{
spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
spin_lock_irqsave(hlist_lock, *flags);
void __kprobes kretprobe_hash_unlock(struct task_struct *tsk,
unsigned long *flags)
+__releases(hlist_lock)
{
unsigned long hash = hash_ptr(tsk, KPROBE_HASH_BITS);
spinlock_t *hlist_lock;
spin_unlock_irqrestore(hlist_lock, *flags);
}
-void __kprobes kretprobe_table_unlock(unsigned long hash, unsigned long *flags)
+static void __kprobes kretprobe_table_unlock(unsigned long hash,
+ unsigned long *flags)
+__releases(hlist_lock)
{
spinlock_t *hlist_lock = kretprobe_table_lock_ptr(hash);
spin_unlock_irqrestore(hlist_lock, *flags);
preempt_disable();
if (!kernel_text_address((unsigned long) p->addr) ||
in_kprobes_functions((unsigned long) p->addr) ||
- ftrace_text_reserved(p->addr, p->addr)) {
+ ftrace_text_reserved(p->addr, p->addr) ||
+ jump_label_text_reserved(p->addr, p->addr)) {
preempt_enable();
return -EINVAL;
}
if (num <= 0)
return -EINVAL;
for (i = 0; i < num; i++) {
- unsigned long addr;
+ unsigned long addr, offset;
jp = jps[i];
addr = arch_deref_entry_point(jp->entry);
- if (!kernel_text_address(addr))
- ret = -EINVAL;
- else {
- /* Todo: Verify probepoint is a function entry point */
+ /* Verify probepoint is a function entry point */
+ if (kallsyms_lookup_size_offset(addr, NULL, &offset) &&
+ offset == 0) {
jp->kp.pre_handler = setjmp_pre_handler;
jp->kp.break_handler = longjmp_break_handler;
ret = register_kprobe(&jp->kp);
- }
+ } else
+ ret = -EINVAL;
+
if (ret < 0) {
if (i > 0)
unregister_jprobes(jps, i);
}
#endif
+ if (unlikely(subclass >= MAX_LOCKDEP_SUBCLASSES)) {
+ debug_locks_off();
+ printk(KERN_ERR
+ "BUG: looking up invalid subclass: %u\n", subclass);
+ printk(KERN_ERR
+ "turning off the locking correctness validator.\n");
+ dump_stack();
+ return NULL;
+ }
+
/*
* Static locks do not have their class-keys yet - for them the key
* is the lock object itself:
raw_local_irq_restore(flags);
if (!subclass || force)
- lock->class_cache = class;
+ lock->class_cache[0] = class;
+ else if (subclass < NR_LOCKDEP_CACHING_CLASSES)
+ lock->class_cache[subclass] = class;
if (DEBUG_LOCKS_WARN_ON(class->subclass != subclass))
return NULL;
void lockdep_init_map(struct lockdep_map *lock, const char *name,
struct lock_class_key *key, int subclass)
{
- lock->class_cache = NULL;
+ int i;
+
+ for (i = 0; i < NR_LOCKDEP_CACHING_CLASSES; i++)
+ lock->class_cache[i] = NULL;
+
#ifdef CONFIG_LOCK_STAT
lock->cpu = raw_smp_processor_id();
#endif
if (DEBUG_LOCKS_WARN_ON(!irqs_disabled()))
return 0;
- if (unlikely(subclass >= MAX_LOCKDEP_SUBCLASSES)) {
- debug_locks_off();
- printk("BUG: MAX_LOCKDEP_SUBCLASSES too low!\n");
- printk("turning off the locking correctness validator.\n");
- dump_stack();
- return 0;
- }
-
if (lock->key == &__lockdep_no_validate__)
check = 1;
- if (!subclass)
- class = lock->class_cache;
+ if (subclass < NR_LOCKDEP_CACHING_CLASSES)
+ class = lock->class_cache[subclass];
/*
- * Not cached yet or subclass?
+ * Not cached?
*/
if (unlikely(!class)) {
class = register_lock_class(lock, subclass, 0);
return 1;
if (hlock->references) {
- struct lock_class *class = lock->class_cache;
+ struct lock_class *class = lock->class_cache[0];
if (!class)
class = look_up_lock_class(lock, 0);
if (list_empty(head))
continue;
list_for_each_entry_safe(class, next, head, hash_entry) {
- if (unlikely(class == lock->class_cache)) {
+ int match = 0;
+
+ for (j = 0; j < NR_LOCKDEP_CACHING_CLASSES; j++)
+ match |= class == lock->class_cache[j];
+
+ if (unlikely(match)) {
if (debug_locks_off_graph_unlock())
WARN_ON(1);
goto out_restore;
* Careful: only use this function if you are sure that
* the task cannot run in parallel!
*/
-void __debug_show_held_locks(struct task_struct *task)
+void debug_show_held_locks(struct task_struct *task)
{
if (unlikely(!debug_locks)) {
printk("INFO: lockdep is turned off.\n");
}
lockdep_print_held_locks(task);
}
-EXPORT_SYMBOL_GPL(__debug_show_held_locks);
-
-void debug_show_held_locks(struct task_struct *task)
-{
- __debug_show_held_locks(task);
-}
EXPORT_SYMBOL_GPL(debug_show_held_locks);
void lockdep_sys_exit(void)
#include <linux/async.h>
#include <linux/percpu.h>
#include <linux/kmemleak.h>
+#include <linux/jump_label.h>
#define CREATE_TRACE_POINTS
#include <trace/events/module.h>
{
struct module *mod = _mod;
list_del(&mod->list);
+ module_bug_cleanup(mod);
return 0;
}
sizeof(*mod->tracepoints),
&mod->num_tracepoints);
#endif
+#ifdef HAVE_JUMP_LABEL
+ mod->jump_entries = section_objs(info, "__jump_table",
+ sizeof(*mod->jump_entries),
+ &mod->num_jump_entries);
+#endif
#ifdef CONFIG_EVENT_TRACING
mod->trace_events = section_objs(info, "_ftrace_events",
sizeof(*mod->trace_events),
if (err < 0)
goto ddebug;
+ module_bug_finalize(info.hdr, info.sechdrs, mod);
list_add_rcu(&mod->list, &modules);
mutex_unlock(&module_mutex);
mutex_lock(&module_mutex);
/* Unlink carefully: kallsyms could be walking list. */
list_del_rcu(&mod->list);
+ module_bug_cleanup(mod);
+
ddebug:
if (!mod->taints)
dynamic_debug_remove(info.debug);
# include <asm/mutex.h>
#endif
-/***
- * mutex_init - initialize the mutex
- * @lock: the mutex to be initialized
- * @key: the lock_class_key for the class; used by mutex lock debugging
- *
- * Initialize the mutex to unlocked state.
- *
- * It is not allowed to initialize an already locked mutex.
- */
void
__mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
{
static __used noinline void __sched
__mutex_lock_slowpath(atomic_t *lock_count);
-/***
+/**
* mutex_lock - acquire the mutex
* @lock: the mutex to be acquired
*
static __used noinline void __sched __mutex_unlock_slowpath(atomic_t *lock_count);
-/***
+/**
* mutex_unlock - release the mutex
* @lock: the mutex to be released
*
static noinline int __sched
__mutex_lock_interruptible_slowpath(atomic_t *lock_count);
-/***
- * mutex_lock_interruptible - acquire the mutex, interruptable
+/**
+ * mutex_lock_interruptible - acquire the mutex, interruptible
* @lock: the mutex to be acquired
*
* Lock the mutex like mutex_lock(), and return 0 if the mutex has
return prev == 1;
}
-/***
- * mutex_trylock - try acquire the mutex, without waiting
+/**
+ * mutex_trylock - try to acquire the mutex, without waiting
* @lock: the mutex to be acquired
*
* Try to acquire the mutex atomically. Returns 1 if the mutex
* has been acquired successfully, and 0 on contention.
*
* NOTE: this function follows the spin_trylock() convention, so
- * it is negated to the down_trylock() return values! Be careful
+ * it is negated from the down_trylock() return values! Be careful
* about this when converting semaphore users to mutexes.
*
* This function must not be used in interrupt context. The
#include <linux/kernel_stat.h>
#include <linux/perf_event.h>
#include <linux/ftrace_event.h>
-#include <linux/hw_breakpoint.h>
#include <asm/irq_regs.h>
-/*
- * Each CPU has a list of per CPU events:
- */
-static DEFINE_PER_CPU(struct perf_cpu_context, perf_cpu_context);
-
-int perf_max_events __read_mostly = 1;
-static int perf_reserved_percpu __read_mostly;
-static int perf_overcommit __read_mostly = 1;
-
-static atomic_t nr_events __read_mostly;
+atomic_t perf_task_events __read_mostly;
static atomic_t nr_mmap_events __read_mostly;
static atomic_t nr_comm_events __read_mostly;
static atomic_t nr_task_events __read_mostly;
+static LIST_HEAD(pmus);
+static DEFINE_MUTEX(pmus_lock);
+static struct srcu_struct pmus_srcu;
+
/*
* perf event paranoia level:
* -1 - not paranoid at all
static atomic64_t perf_event_id;
-/*
- * Lock for (sysadmin-configurable) event reservations:
- */
-static DEFINE_SPINLOCK(perf_resource_lock);
+void __weak perf_event_print_debug(void) { }
-/*
- * Architecture provided APIs - weak aliases:
- */
-extern __weak const struct pmu *hw_perf_event_init(struct perf_event *event)
+extern __weak const char *perf_pmu_name(void)
{
- return NULL;
+ return "pmu";
}
-void __weak hw_perf_disable(void) { barrier(); }
-void __weak hw_perf_enable(void) { barrier(); }
-
-void __weak perf_event_print_debug(void) { }
-
-static DEFINE_PER_CPU(int, perf_disable_count);
+void perf_pmu_disable(struct pmu *pmu)
+{
+ int *count = this_cpu_ptr(pmu->pmu_disable_count);
+ if (!(*count)++)
+ pmu->pmu_disable(pmu);
+}
-void perf_disable(void)
+void perf_pmu_enable(struct pmu *pmu)
{
- if (!__get_cpu_var(perf_disable_count)++)
- hw_perf_disable();
+ int *count = this_cpu_ptr(pmu->pmu_disable_count);
+ if (!--(*count))
+ pmu->pmu_enable(pmu);
}
-void perf_enable(void)
+static DEFINE_PER_CPU(struct list_head, rotation_list);
+
+/*
+ * perf_pmu_rotate_start() and perf_rotate_context() are fully serialized
+ * because they're strictly cpu affine and rotate_start is called with IRQs
+ * disabled, while rotate_context is called from IRQ context.
+ */
+static void perf_pmu_rotate_start(struct pmu *pmu)
{
- if (!--__get_cpu_var(perf_disable_count))
- hw_perf_enable();
+ struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+ struct list_head *head = &__get_cpu_var(rotation_list);
+
+ WARN_ON(!irqs_disabled());
+
+ if (list_empty(&cpuctx->rotation_list))
+ list_add(&cpuctx->rotation_list, head);
}
static void get_ctx(struct perf_event_context *ctx)
* the context could get moved to another task.
*/
static struct perf_event_context *
-perf_lock_task_context(struct task_struct *task, unsigned long *flags)
+perf_lock_task_context(struct task_struct *task, int ctxn, unsigned long *flags)
{
struct perf_event_context *ctx;
rcu_read_lock();
- retry:
- ctx = rcu_dereference(task->perf_event_ctxp);
+retry:
+ ctx = rcu_dereference(task->perf_event_ctxp[ctxn]);
if (ctx) {
/*
* If this context is a clone of another, it might
* can't get swapped on us any more.
*/
raw_spin_lock_irqsave(&ctx->lock, *flags);
- if (ctx != rcu_dereference(task->perf_event_ctxp)) {
+ if (ctx != rcu_dereference(task->perf_event_ctxp[ctxn])) {
raw_spin_unlock_irqrestore(&ctx->lock, *flags);
goto retry;
}
* can't get swapped to another task. This also increments its
* reference count so that the context can't get freed.
*/
-static struct perf_event_context *perf_pin_task_context(struct task_struct *task)
+static struct perf_event_context *
+perf_pin_task_context(struct task_struct *task, int ctxn)
{
struct perf_event_context *ctx;
unsigned long flags;
- ctx = perf_lock_task_context(task, &flags);
+ ctx = perf_lock_task_context(task, ctxn, &flags);
if (ctx) {
++ctx->pin_count;
raw_spin_unlock_irqrestore(&ctx->lock, flags);
}
list_add_rcu(&event->event_entry, &ctx->event_list);
+ if (!ctx->nr_events)
+ perf_pmu_rotate_start(ctx->pmu);
ctx->nr_events++;
if (event->attr.inherit_stat)
ctx->nr_stat++;
{
struct perf_event *group_leader = event->group_leader;
- WARN_ON_ONCE(event->attach_state & PERF_ATTACH_GROUP);
+ /*
+ * We can have double attach due to group movement in perf_event_open.
+ */
+ if (event->attach_state & PERF_ATTACH_GROUP)
+ return;
+
event->attach_state |= PERF_ATTACH_GROUP;
if (group_leader == event)
}
}
-static void
-event_sched_out(struct perf_event *event,
+static inline int
+event_filter_match(struct perf_event *event)
+{
+ return event->cpu == -1 || event->cpu == smp_processor_id();
+}
+
+static int
+__event_sched_out(struct perf_event *event,
struct perf_cpu_context *cpuctx,
struct perf_event_context *ctx)
{
+ u64 delta;
+ /*
+ * An event which could not be activated because of
+ * filter mismatch still needs to have its timings
+ * maintained, otherwise bogus information is return
+ * via read() for time_enabled, time_running:
+ */
+ if (event->state == PERF_EVENT_STATE_INACTIVE
+ && !event_filter_match(event)) {
+ delta = ctx->time - event->tstamp_stopped;
+ event->tstamp_running += delta;
+ event->tstamp_stopped = ctx->time;
+ }
+
if (event->state != PERF_EVENT_STATE_ACTIVE)
- return;
+ return 0;
event->state = PERF_EVENT_STATE_INACTIVE;
if (event->pending_disable) {
event->pending_disable = 0;
event->state = PERF_EVENT_STATE_OFF;
}
- event->tstamp_stopped = ctx->time;
- event->pmu->disable(event);
+ event->pmu->del(event, 0);
event->oncpu = -1;
if (!is_software_event(event))
ctx->nr_active--;
if (event->attr.exclusive || !cpuctx->active_oncpu)
cpuctx->exclusive = 0;
+ return 1;
+}
+
+static void
+event_sched_out(struct perf_event *event,
+ struct perf_cpu_context *cpuctx,
+ struct perf_event_context *ctx)
+{
+ int ret;
+
+ ret = __event_sched_out(event, cpuctx, ctx);
+ if (ret)
+ event->tstamp_stopped = ctx->time;
}
static void
struct perf_event_context *ctx)
{
struct perf_event *event;
-
- if (group_event->state != PERF_EVENT_STATE_ACTIVE)
- return;
+ int state = group_event->state;
event_sched_out(group_event, cpuctx, ctx);
list_for_each_entry(event, &group_event->sibling_list, group_entry)
event_sched_out(event, cpuctx, ctx);
- if (group_event->attr.exclusive)
+ if (state == PERF_EVENT_STATE_ACTIVE && group_event->attr.exclusive)
cpuctx->exclusive = 0;
}
+static inline struct perf_cpu_context *
+__get_cpu_context(struct perf_event_context *ctx)
+{
+ return this_cpu_ptr(ctx->pmu->pmu_cpu_context);
+}
+
/*
* Cross CPU call to remove a performance event
*
*/
static void __perf_event_remove_from_context(void *info)
{
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
struct perf_event *event = info;
struct perf_event_context *ctx = event->ctx;
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
/*
* If this is a task context, we need to check whether it is
return;
raw_spin_lock(&ctx->lock);
- /*
- * Protect the list operation against NMI by disabling the
- * events on a global level.
- */
- perf_disable();
event_sched_out(event, cpuctx, ctx);
list_del_event(event, ctx);
- if (!ctx->task) {
- /*
- * Allow more per task events with respect to the
- * reservation:
- */
- cpuctx->max_pertask =
- min(perf_max_events - ctx->nr_events,
- perf_max_events - perf_reserved_percpu);
- }
-
- perf_enable();
raw_spin_unlock(&ctx->lock);
}
static void __perf_event_disable(void *info)
{
struct perf_event *event = info;
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
struct perf_event_context *ctx = event->ctx;
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
/*
* If this is a per-task event, need to check whether this
return;
}
- retry:
+retry:
task_oncpu_function_call(task, __perf_event_disable, event);
raw_spin_lock_irq(&ctx->lock);
}
static int
-event_sched_in(struct perf_event *event,
+__event_sched_in(struct perf_event *event,
struct perf_cpu_context *cpuctx,
struct perf_event_context *ctx)
{
*/
smp_wmb();
- if (event->pmu->enable(event)) {
+ if (event->pmu->add(event, PERF_EF_START)) {
event->state = PERF_EVENT_STATE_INACTIVE;
event->oncpu = -1;
return -EAGAIN;
}
- event->tstamp_running += ctx->time - event->tstamp_stopped;
-
if (!is_software_event(event))
cpuctx->active_oncpu++;
ctx->nr_active++;
return 0;
}
+static inline int
+event_sched_in(struct perf_event *event,
+ struct perf_cpu_context *cpuctx,
+ struct perf_event_context *ctx)
+{
+ int ret = __event_sched_in(event, cpuctx, ctx);
+ if (ret)
+ return ret;
+ event->tstamp_running += ctx->time - event->tstamp_stopped;
+ return 0;
+}
+
+static void
+group_commit_event_sched_in(struct perf_event *group_event,
+ struct perf_cpu_context *cpuctx,
+ struct perf_event_context *ctx)
+{
+ struct perf_event *event;
+ u64 now = ctx->time;
+
+ group_event->tstamp_running += now - group_event->tstamp_stopped;
+ /*
+ * Schedule in siblings as one group (if any):
+ */
+ list_for_each_entry(event, &group_event->sibling_list, group_entry) {
+ event->tstamp_running += now - event->tstamp_stopped;
+ }
+}
+
static int
group_sched_in(struct perf_event *group_event,
struct perf_cpu_context *cpuctx,
struct perf_event_context *ctx)
{
struct perf_event *event, *partial_group = NULL;
- const struct pmu *pmu = group_event->pmu;
- bool txn = false;
+ struct pmu *pmu = group_event->pmu;
if (group_event->state == PERF_EVENT_STATE_OFF)
return 0;
- /* Check if group transaction availabe */
- if (pmu->start_txn)
- txn = true;
-
- if (txn)
- pmu->start_txn(pmu);
+ pmu->start_txn(pmu);
- if (event_sched_in(group_event, cpuctx, ctx)) {
- if (txn)
- pmu->cancel_txn(pmu);
+ /*
+ * use __event_sched_in() to delay updating tstamp_running
+ * until the transaction is committed. In case of failure
+ * we will keep an unmodified tstamp_running which is a
+ * requirement to get correct timing information
+ */
+ if (__event_sched_in(group_event, cpuctx, ctx)) {
+ pmu->cancel_txn(pmu);
return -EAGAIN;
}
* Schedule in siblings as one group (if any):
*/
list_for_each_entry(event, &group_event->sibling_list, group_entry) {
- if (event_sched_in(event, cpuctx, ctx)) {
+ if (__event_sched_in(event, cpuctx, ctx)) {
partial_group = event;
goto group_error;
}
}
- if (!txn || !pmu->commit_txn(pmu))
+ if (!pmu->commit_txn(pmu)) {
+ /* commit tstamp_running */
+ group_commit_event_sched_in(group_event, cpuctx, ctx);
return 0;
-
+ }
group_error:
/*
* Groups can be scheduled in as one unit only, so undo any
* partial group before returning:
+ *
+ * use __event_sched_out() to avoid updating tstamp_stopped
+ * because the event never actually ran
*/
list_for_each_entry(event, &group_event->sibling_list, group_entry) {
if (event == partial_group)
break;
- event_sched_out(event, cpuctx, ctx);
+ __event_sched_out(event, cpuctx, ctx);
}
- event_sched_out(group_event, cpuctx, ctx);
+ __event_sched_out(group_event, cpuctx, ctx);
- if (txn)
- pmu->cancel_txn(pmu);
+ pmu->cancel_txn(pmu);
return -EAGAIN;
}
*/
static void __perf_install_in_context(void *info)
{
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
struct perf_event *event = info;
struct perf_event_context *ctx = event->ctx;
struct perf_event *leader = event->group_leader;
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
int err;
/*
ctx->is_active = 1;
update_context_time(ctx);
- /*
- * Protect the list operation against NMI by disabling the
- * events on a global level. NOP for non NMI based events.
- */
- perf_disable();
-
add_event_to_ctx(event, ctx);
if (event->cpu != -1 && event->cpu != smp_processor_id())
}
}
- if (!err && !ctx->task && cpuctx->max_pertask)
- cpuctx->max_pertask--;
-
- unlock:
- perf_enable();
-
+unlock:
raw_spin_unlock(&ctx->lock);
}
{
struct task_struct *task = ctx->task;
+ event->ctx = ctx;
+
if (!task) {
/*
* Per cpu events are installed via an smp call and
event->state = PERF_EVENT_STATE_INACTIVE;
event->tstamp_enabled = ctx->time - event->total_time_enabled;
- list_for_each_entry(sub, &event->sibling_list, group_entry)
- if (sub->state >= PERF_EVENT_STATE_INACTIVE)
+ list_for_each_entry(sub, &event->sibling_list, group_entry) {
+ if (sub->state >= PERF_EVENT_STATE_INACTIVE) {
sub->tstamp_enabled =
ctx->time - sub->total_time_enabled;
+ }
+ }
}
/*
static void __perf_event_enable(void *info)
{
struct perf_event *event = info;
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
struct perf_event_context *ctx = event->ctx;
struct perf_event *leader = event->group_leader;
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
int err;
/*
if (!group_can_go_on(event, cpuctx, 1)) {
err = -EEXIST;
} else {
- perf_disable();
if (event == leader)
err = group_sched_in(event, cpuctx, ctx);
else
err = event_sched_in(event, cpuctx, ctx);
- perf_enable();
}
if (err) {
}
}
- unlock:
+unlock:
raw_spin_unlock(&ctx->lock);
}
if (event->state == PERF_EVENT_STATE_ERROR)
event->state = PERF_EVENT_STATE_OFF;
- retry:
+retry:
raw_spin_unlock_irq(&ctx->lock);
task_oncpu_function_call(task, __perf_event_enable, event);
if (event->state == PERF_EVENT_STATE_OFF)
__perf_event_mark_enabled(event, ctx);
- out:
+out:
raw_spin_unlock_irq(&ctx->lock);
}
struct perf_event *event;
raw_spin_lock(&ctx->lock);
+ perf_pmu_disable(ctx->pmu);
ctx->is_active = 0;
if (likely(!ctx->nr_events))
goto out;
update_context_time(ctx);
- perf_disable();
if (!ctx->nr_active)
- goto out_enable;
+ goto out;
- if (event_type & EVENT_PINNED)
+ if (event_type & EVENT_PINNED) {
list_for_each_entry(event, &ctx->pinned_groups, group_entry)
group_sched_out(event, cpuctx, ctx);
+ }
- if (event_type & EVENT_FLEXIBLE)
+ if (event_type & EVENT_FLEXIBLE) {
list_for_each_entry(event, &ctx->flexible_groups, group_entry)
group_sched_out(event, cpuctx, ctx);
-
- out_enable:
- perf_enable();
- out:
+ }
+out:
+ perf_pmu_enable(ctx->pmu);
raw_spin_unlock(&ctx->lock);
}
}
}
-/*
- * Called from scheduler to remove the events of the current task,
- * with interrupts disabled.
- *
- * We stop each event and update the event value in event->count.
- *
- * This does not protect us against NMI, but disable()
- * sets the disabled bit in the control field of event _before_
- * accessing the event control register. If a NMI hits, then it will
- * not restart the event.
- */
-void perf_event_task_sched_out(struct task_struct *task,
- struct task_struct *next)
+void perf_event_context_sched_out(struct task_struct *task, int ctxn,
+ struct task_struct *next)
{
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
- struct perf_event_context *ctx = task->perf_event_ctxp;
+ struct perf_event_context *ctx = task->perf_event_ctxp[ctxn];
struct perf_event_context *next_ctx;
struct perf_event_context *parent;
+ struct perf_cpu_context *cpuctx;
int do_switch = 1;
- perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);
+ if (likely(!ctx))
+ return;
- if (likely(!ctx || !cpuctx->task_ctx))
+ cpuctx = __get_cpu_context(ctx);
+ if (!cpuctx->task_ctx)
return;
rcu_read_lock();
parent = rcu_dereference(ctx->parent_ctx);
- next_ctx = next->perf_event_ctxp;
+ next_ctx = next->perf_event_ctxp[ctxn];
if (parent && next_ctx &&
rcu_dereference(next_ctx->parent_ctx) == parent) {
/*
* XXX do we need a memory barrier of sorts
* wrt to rcu_dereference() of perf_event_ctxp
*/
- task->perf_event_ctxp = next_ctx;
- next->perf_event_ctxp = ctx;
+ task->perf_event_ctxp[ctxn] = next_ctx;
+ next->perf_event_ctxp[ctxn] = ctx;
ctx->task = next;
next_ctx->task = task;
do_switch = 0;
}
}
+#define for_each_task_context_nr(ctxn) \
+ for ((ctxn) = 0; (ctxn) < perf_nr_task_contexts; (ctxn)++)
+
+/*
+ * Called from scheduler to remove the events of the current task,
+ * with interrupts disabled.
+ *
+ * We stop each event and update the event value in event->count.
+ *
+ * This does not protect us against NMI, but disable()
+ * sets the disabled bit in the control field of event _before_
+ * accessing the event control register. If a NMI hits, then it will
+ * not restart the event.
+ */
+void __perf_event_task_sched_out(struct task_struct *task,
+ struct task_struct *next)
+{
+ int ctxn;
+
+ perf_sw_event(PERF_COUNT_SW_CONTEXT_SWITCHES, 1, 1, NULL, 0);
+
+ for_each_task_context_nr(ctxn)
+ perf_event_context_sched_out(task, ctxn, next);
+}
+
static void task_ctx_sched_out(struct perf_event_context *ctx,
enum event_type_t event_type)
{
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
if (!cpuctx->task_ctx)
return;
/*
* Called with IRQs disabled
*/
-static void __perf_event_task_sched_out(struct perf_event_context *ctx)
-{
- task_ctx_sched_out(ctx, EVENT_ALL);
-}
-
-/*
- * Called with IRQs disabled
- */
static void cpu_ctx_sched_out(struct perf_cpu_context *cpuctx,
enum event_type_t event_type)
{
if (event->cpu != -1 && event->cpu != smp_processor_id())
continue;
- if (group_can_go_on(event, cpuctx, can_add_hw))
+ if (group_can_go_on(event, cpuctx, can_add_hw)) {
if (group_sched_in(event, cpuctx, ctx))
can_add_hw = 0;
+ }
}
}
ctx->timestamp = perf_clock();
- perf_disable();
-
/*
* First go through the list and put on any pinned groups
* in order to give them the best chance of going on.
if (event_type & EVENT_FLEXIBLE)
ctx_flexible_sched_in(ctx, cpuctx);
- perf_enable();
- out:
+out:
raw_spin_unlock(&ctx->lock);
}
ctx_sched_in(ctx, cpuctx, event_type);
}
-static void task_ctx_sched_in(struct task_struct *task,
+static void task_ctx_sched_in(struct perf_event_context *ctx,
enum event_type_t event_type)
{
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
- struct perf_event_context *ctx = task->perf_event_ctxp;
+ struct perf_cpu_context *cpuctx;
- if (likely(!ctx))
- return;
+ cpuctx = __get_cpu_context(ctx);
if (cpuctx->task_ctx == ctx)
return;
+
ctx_sched_in(ctx, cpuctx, event_type);
cpuctx->task_ctx = ctx;
}
-/*
- * Called from scheduler to add the events of the current task
- * with interrupts disabled.
- *
- * We restore the event value and then enable it.
- *
- * This does not protect us against NMI, but enable()
- * sets the enabled bit in the control field of event _before_
- * accessing the event control register. If a NMI hits, then it will
- * keep the event running.
- */
-void perf_event_task_sched_in(struct task_struct *task)
-{
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
- struct perf_event_context *ctx = task->perf_event_ctxp;
- if (likely(!ctx))
- return;
+void perf_event_context_sched_in(struct perf_event_context *ctx)
+{
+ struct perf_cpu_context *cpuctx;
+ cpuctx = __get_cpu_context(ctx);
if (cpuctx->task_ctx == ctx)
return;
- perf_disable();
-
+ perf_pmu_disable(ctx->pmu);
/*
* We want to keep the following priority order:
* cpu pinned (that don't need to move), task pinned,
cpuctx->task_ctx = ctx;
- perf_enable();
+ /*
+ * Since these rotations are per-cpu, we need to ensure the
+ * cpu-context we got scheduled on is actually rotating.
+ */
+ perf_pmu_rotate_start(ctx->pmu);
+ perf_pmu_enable(ctx->pmu);
+}
+
+/*
+ * Called from scheduler to add the events of the current task
+ * with interrupts disabled.
+ *
+ * We restore the event value and then enable it.
+ *
+ * This does not protect us against NMI, but enable()
+ * sets the enabled bit in the control field of event _before_
+ * accessing the event control register. If a NMI hits, then it will
+ * keep the event running.
+ */
+void __perf_event_task_sched_in(struct task_struct *task)
+{
+ struct perf_event_context *ctx;
+ int ctxn;
+
+ for_each_task_context_nr(ctxn) {
+ ctx = task->perf_event_ctxp[ctxn];
+ if (likely(!ctx))
+ continue;
+
+ perf_event_context_sched_in(ctx);
+ }
}
#define MAX_INTERRUPTS (~0ULL)
return div64_u64(dividend, divisor);
}
-static void perf_event_stop(struct perf_event *event)
-{
- if (!event->pmu->stop)
- return event->pmu->disable(event);
-
- return event->pmu->stop(event);
-}
-
-static int perf_event_start(struct perf_event *event)
-{
- if (!event->pmu->start)
- return event->pmu->enable(event);
-
- return event->pmu->start(event);
-}
-
static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count)
{
struct hw_perf_event *hwc = &event->hw;
hwc->sample_period = sample_period;
if (local64_read(&hwc->period_left) > 8*sample_period) {
- perf_disable();
- perf_event_stop(event);
+ event->pmu->stop(event, PERF_EF_UPDATE);
local64_set(&hwc->period_left, 0);
- perf_event_start(event);
- perf_enable();
+ event->pmu->start(event, PERF_EF_RELOAD);
}
}
-static void perf_ctx_adjust_freq(struct perf_event_context *ctx)
+static void perf_ctx_adjust_freq(struct perf_event_context *ctx, u64 period)
{
struct perf_event *event;
struct hw_perf_event *hwc;
*/
if (interrupts == MAX_INTERRUPTS) {
perf_log_throttle(event, 1);
- perf_disable();
- event->pmu->unthrottle(event);
- perf_enable();
+ event->pmu->start(event, 0);
}
if (!event->attr.freq || !event->attr.sample_freq)
continue;
- perf_disable();
event->pmu->read(event);
now = local64_read(&event->count);
delta = now - hwc->freq_count_stamp;
hwc->freq_count_stamp = now;
if (delta > 0)
- perf_adjust_period(event, TICK_NSEC, delta);
- perf_enable();
+ perf_adjust_period(event, period, delta);
}
raw_spin_unlock(&ctx->lock);
}
raw_spin_unlock(&ctx->lock);
}
-void perf_event_task_tick(struct task_struct *curr)
+/*
+ * perf_pmu_rotate_start() and perf_rotate_context() are fully serialized
+ * because they're strictly cpu affine and rotate_start is called with IRQs
+ * disabled, while rotate_context is called from IRQ context.
+ */
+static void perf_rotate_context(struct perf_cpu_context *cpuctx)
{
- struct perf_cpu_context *cpuctx;
- struct perf_event_context *ctx;
- int rotate = 0;
-
- if (!atomic_read(&nr_events))
- return;
+ u64 interval = (u64)cpuctx->jiffies_interval * TICK_NSEC;
+ struct perf_event_context *ctx = NULL;
+ int rotate = 0, remove = 1;
- cpuctx = &__get_cpu_var(perf_cpu_context);
- if (cpuctx->ctx.nr_events &&
- cpuctx->ctx.nr_events != cpuctx->ctx.nr_active)
- rotate = 1;
+ if (cpuctx->ctx.nr_events) {
+ remove = 0;
+ if (cpuctx->ctx.nr_events != cpuctx->ctx.nr_active)
+ rotate = 1;
+ }
- ctx = curr->perf_event_ctxp;
- if (ctx && ctx->nr_events && ctx->nr_events != ctx->nr_active)
- rotate = 1;
+ ctx = cpuctx->task_ctx;
+ if (ctx && ctx->nr_events) {
+ remove = 0;
+ if (ctx->nr_events != ctx->nr_active)
+ rotate = 1;
+ }
- perf_ctx_adjust_freq(&cpuctx->ctx);
+ perf_pmu_disable(cpuctx->ctx.pmu);
+ perf_ctx_adjust_freq(&cpuctx->ctx, interval);
if (ctx)
- perf_ctx_adjust_freq(ctx);
+ perf_ctx_adjust_freq(ctx, interval);
if (!rotate)
- return;
+ goto done;
- perf_disable();
cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
if (ctx)
task_ctx_sched_out(ctx, EVENT_FLEXIBLE);
cpu_ctx_sched_in(cpuctx, EVENT_FLEXIBLE);
if (ctx)
- task_ctx_sched_in(curr, EVENT_FLEXIBLE);
- perf_enable();
+ task_ctx_sched_in(ctx, EVENT_FLEXIBLE);
+
+done:
+ if (remove)
+ list_del_init(&cpuctx->rotation_list);
+
+ perf_pmu_enable(cpuctx->ctx.pmu);
+}
+
+void perf_event_task_tick(void)
+{
+ struct list_head *head = &__get_cpu_var(rotation_list);
+ struct perf_cpu_context *cpuctx, *tmp;
+
+ WARN_ON(!irqs_disabled());
+
+ list_for_each_entry_safe(cpuctx, tmp, head, rotation_list) {
+ if (cpuctx->jiffies_interval == 1 ||
+ !(jiffies % cpuctx->jiffies_interval))
+ perf_rotate_context(cpuctx);
+ }
}
static int event_enable_on_exec(struct perf_event *event,
* Enable all of a task's events that have been marked enable-on-exec.
* This expects task == current.
*/
-static void perf_event_enable_on_exec(struct task_struct *task)
+static void perf_event_enable_on_exec(struct perf_event_context *ctx)
{
- struct perf_event_context *ctx;
struct perf_event *event;
unsigned long flags;
int enabled = 0;
int ret;
local_irq_save(flags);
- ctx = task->perf_event_ctxp;
if (!ctx || !ctx->nr_events)
goto out;
- __perf_event_task_sched_out(ctx);
+ task_ctx_sched_out(ctx, EVENT_ALL);
raw_spin_lock(&ctx->lock);
raw_spin_unlock(&ctx->lock);
- perf_event_task_sched_in(task);
- out:
+ perf_event_context_sched_in(ctx);
+out:
local_irq_restore(flags);
}
*/
static void __perf_event_read(void *info)
{
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
struct perf_event *event = info;
struct perf_event_context *ctx = event->ctx;
+ struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
/*
* If this is a task context, we need to check whether it is
unsigned long flags;
raw_spin_lock_irqsave(&ctx->lock, flags);
- update_context_time(ctx);
+ /*
+ * may read while context is not active
+ * (e.g., thread is blocked), in that case
+ * we cannot update context time
+ */
+ if (ctx->is_active)
+ update_context_time(ctx);
update_event_times(event);
raw_spin_unlock_irqrestore(&ctx->lock, flags);
}
}
/*
- * Initialize the perf_event context in a task_struct:
+ * Callchain support
*/
-static void
-__perf_event_init_context(struct perf_event_context *ctx,
- struct task_struct *task)
+
+struct callchain_cpus_entries {
+ struct rcu_head rcu_head;
+ struct perf_callchain_entry *cpu_entries[0];
+};
+
+static DEFINE_PER_CPU(int, callchain_recursion[PERF_NR_CONTEXTS]);
+static atomic_t nr_callchain_events;
+static DEFINE_MUTEX(callchain_mutex);
+struct callchain_cpus_entries *callchain_cpus_entries;
+
+
+__weak void perf_callchain_kernel(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
{
- raw_spin_lock_init(&ctx->lock);
- mutex_init(&ctx->mutex);
- INIT_LIST_HEAD(&ctx->pinned_groups);
- INIT_LIST_HEAD(&ctx->flexible_groups);
- INIT_LIST_HEAD(&ctx->event_list);
- atomic_set(&ctx->refcount, 1);
- ctx->task = task;
}
-static struct perf_event_context *find_get_context(pid_t pid, int cpu)
+__weak void perf_callchain_user(struct perf_callchain_entry *entry,
+ struct pt_regs *regs)
{
- struct perf_event_context *ctx;
- struct perf_cpu_context *cpuctx;
- struct task_struct *task;
- unsigned long flags;
- int err;
+}
- if (pid == -1 && cpu != -1) {
- /* Must be root to operate on a CPU event: */
- if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
- return ERR_PTR(-EACCES);
+static void release_callchain_buffers_rcu(struct rcu_head *head)
+{
+ struct callchain_cpus_entries *entries;
+ int cpu;
- if (cpu < 0 || cpu >= nr_cpumask_bits)
- return ERR_PTR(-EINVAL);
+ entries = container_of(head, struct callchain_cpus_entries, rcu_head);
- /*
- * We could be clever and allow to attach a event to an
- * offline CPU and activate it when the CPU comes up, but
- * that's for later.
- */
- if (!cpu_online(cpu))
- return ERR_PTR(-ENODEV);
+ for_each_possible_cpu(cpu)
+ kfree(entries->cpu_entries[cpu]);
- cpuctx = &per_cpu(perf_cpu_context, cpu);
- ctx = &cpuctx->ctx;
- get_ctx(ctx);
+ kfree(entries);
+}
- return ctx;
+static void release_callchain_buffers(void)
+{
+ struct callchain_cpus_entries *entries;
+
+ entries = callchain_cpus_entries;
+ rcu_assign_pointer(callchain_cpus_entries, NULL);
+ call_rcu(&entries->rcu_head, release_callchain_buffers_rcu);
+}
+
+static int alloc_callchain_buffers(void)
+{
+ int cpu;
+ int size;
+ struct callchain_cpus_entries *entries;
+
+ /*
+ * We can't use the percpu allocation API for data that can be
+ * accessed from NMI. Use a temporary manual per cpu allocation
+ * until that gets sorted out.
+ */
+ size = sizeof(*entries) + sizeof(struct perf_callchain_entry *) *
+ num_possible_cpus();
+
+ entries = kzalloc(size, GFP_KERNEL);
+ if (!entries)
+ return -ENOMEM;
+
+ size = sizeof(struct perf_callchain_entry) * PERF_NR_CONTEXTS;
+
+ for_each_possible_cpu(cpu) {
+ entries->cpu_entries[cpu] = kmalloc_node(size, GFP_KERNEL,
+ cpu_to_node(cpu));
+ if (!entries->cpu_entries[cpu])
+ goto fail;
+ }
+
+ rcu_assign_pointer(callchain_cpus_entries, entries);
+
+ return 0;
+
+fail:
+ for_each_possible_cpu(cpu)
+ kfree(entries->cpu_entries[cpu]);
+ kfree(entries);
+
+ return -ENOMEM;
+}
+
+static int get_callchain_buffers(void)
+{
+ int err = 0;
+ int count;
+
+ mutex_lock(&callchain_mutex);
+
+ count = atomic_inc_return(&nr_callchain_events);
+ if (WARN_ON_ONCE(count < 1)) {
+ err = -EINVAL;
+ goto exit;
+ }
+
+ if (count > 1) {
+ /* If the allocation failed, give up */
+ if (!callchain_cpus_entries)
+ err = -ENOMEM;
+ goto exit;
+ }
+
+ err = alloc_callchain_buffers();
+ if (err)
+ release_callchain_buffers();
+exit:
+ mutex_unlock(&callchain_mutex);
+
+ return err;
+}
+
+static void put_callchain_buffers(void)
+{
+ if (atomic_dec_and_mutex_lock(&nr_callchain_events, &callchain_mutex)) {
+ release_callchain_buffers();
+ mutex_unlock(&callchain_mutex);
+ }
+}
+
+static int get_recursion_context(int *recursion)
+{
+ int rctx;
+
+ if (in_nmi())
+ rctx = 3;
+ else if (in_irq())
+ rctx = 2;
+ else if (in_softirq())
+ rctx = 1;
+ else
+ rctx = 0;
+
+ if (recursion[rctx])
+ return -1;
+
+ recursion[rctx]++;
+ barrier();
+
+ return rctx;
+}
+
+static inline void put_recursion_context(int *recursion, int rctx)
+{
+ barrier();
+ recursion[rctx]--;
+}
+
+static struct perf_callchain_entry *get_callchain_entry(int *rctx)
+{
+ int cpu;
+ struct callchain_cpus_entries *entries;
+
+ *rctx = get_recursion_context(__get_cpu_var(callchain_recursion));
+ if (*rctx == -1)
+ return NULL;
+
+ entries = rcu_dereference(callchain_cpus_entries);
+ if (!entries)
+ return NULL;
+
+ cpu = smp_processor_id();
+
+ return &entries->cpu_entries[cpu][*rctx];
+}
+
+static void
+put_callchain_entry(int rctx)
+{
+ put_recursion_context(__get_cpu_var(callchain_recursion), rctx);
+}
+
+static struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
+{
+ int rctx;
+ struct perf_callchain_entry *entry;
+
+
+ entry = get_callchain_entry(&rctx);
+ if (rctx == -1)
+ return NULL;
+
+ if (!entry)
+ goto exit_put;
+
+ entry->nr = 0;
+
+ if (!user_mode(regs)) {
+ perf_callchain_store(entry, PERF_CONTEXT_KERNEL);
+ perf_callchain_kernel(entry, regs);
+ if (current->mm)
+ regs = task_pt_regs(current);
+ else
+ regs = NULL;
+ }
+
+ if (regs) {
+ perf_callchain_store(entry, PERF_CONTEXT_USER);
+ perf_callchain_user(entry, regs);
+ }
+
+exit_put:
+ put_callchain_entry(rctx);
+
+ return entry;
+}
+
+/*
+ * Initialize the perf_event context in a task_struct:
+ */
+static void __perf_event_init_context(struct perf_event_context *ctx)
+{
+ raw_spin_lock_init(&ctx->lock);
+ mutex_init(&ctx->mutex);
+ INIT_LIST_HEAD(&ctx->pinned_groups);
+ INIT_LIST_HEAD(&ctx->flexible_groups);
+ INIT_LIST_HEAD(&ctx->event_list);
+ atomic_set(&ctx->refcount, 1);
+}
+
+static struct perf_event_context *
+alloc_perf_context(struct pmu *pmu, struct task_struct *task)
+{
+ struct perf_event_context *ctx;
+
+ ctx = kzalloc(sizeof(struct perf_event_context), GFP_KERNEL);
+ if (!ctx)
+ return NULL;
+
+ __perf_event_init_context(ctx);
+ if (task) {
+ ctx->task = task;
+ get_task_struct(task);
}
+ ctx->pmu = pmu;
+
+ return ctx;
+}
+
+static struct task_struct *
+find_lively_task_by_vpid(pid_t vpid)
+{
+ struct task_struct *task;
+ int err;
rcu_read_lock();
- if (!pid)
+ if (!vpid)
task = current;
else
- task = find_task_by_vpid(pid);
+ task = find_task_by_vpid(vpid);
if (task)
get_task_struct(task);
rcu_read_unlock();
if (!ptrace_may_access(task, PTRACE_MODE_READ))
goto errout;
- retry:
- ctx = perf_lock_task_context(task, &flags);
+ return task;
+errout:
+ put_task_struct(task);
+ return ERR_PTR(err);
+
+}
+
+static struct perf_event_context *
+find_get_context(struct pmu *pmu, struct task_struct *task, int cpu)
+{
+ struct perf_event_context *ctx;
+ struct perf_cpu_context *cpuctx;
+ unsigned long flags;
+ int ctxn, err;
+
+ if (!task && cpu != -1) {
+ /* Must be root to operate on a CPU event: */
+ if (perf_paranoid_cpu() && !capable(CAP_SYS_ADMIN))
+ return ERR_PTR(-EACCES);
+
+ if (cpu < 0 || cpu >= nr_cpumask_bits)
+ return ERR_PTR(-EINVAL);
+
+ /*
+ * We could be clever and allow to attach a event to an
+ * offline CPU and activate it when the CPU comes up, but
+ * that's for later.
+ */
+ if (!cpu_online(cpu))
+ return ERR_PTR(-ENODEV);
+
+ cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+ ctx = &cpuctx->ctx;
+ get_ctx(ctx);
+
+ return ctx;
+ }
+
+ err = -EINVAL;
+ ctxn = pmu->task_ctx_nr;
+ if (ctxn < 0)
+ goto errout;
+
+retry:
+ ctx = perf_lock_task_context(task, ctxn, &flags);
if (ctx) {
unclone_ctx(ctx);
raw_spin_unlock_irqrestore(&ctx->lock, flags);
}
if (!ctx) {
- ctx = kzalloc(sizeof(struct perf_event_context), GFP_KERNEL);
+ ctx = alloc_perf_context(pmu, task);
err = -ENOMEM;
if (!ctx)
goto errout;
- __perf_event_init_context(ctx, task);
+
get_ctx(ctx);
- if (cmpxchg(&task->perf_event_ctxp, NULL, ctx)) {
+
+ if (cmpxchg(&task->perf_event_ctxp[ctxn], NULL, ctx)) {
/*
* We raced with some other task; use
* the context they set.
*/
+ put_task_struct(task);
kfree(ctx);
goto retry;
}
- get_task_struct(task);
}
- put_task_struct(task);
return ctx;
- errout:
- put_task_struct(task);
+errout:
return ERR_PTR(err);
}
kfree(event);
}
-static void perf_pending_sync(struct perf_event *event);
static void perf_buffer_put(struct perf_buffer *buffer);
static void free_event(struct perf_event *event)
{
- perf_pending_sync(event);
+ irq_work_sync(&event->pending);
if (!event->parent) {
- atomic_dec(&nr_events);
+ if (event->attach_state & PERF_ATTACH_TASK)
+ jump_label_dec(&perf_task_events);
if (event->attr.mmap || event->attr.mmap_data)
atomic_dec(&nr_mmap_events);
if (event->attr.comm)
atomic_dec(&nr_comm_events);
if (event->attr.task)
atomic_dec(&nr_task_events);
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN)
+ put_callchain_buffers();
}
if (event->buffer) {
if (event->destroy)
event->destroy(event);
- put_ctx(event->ctx);
+ if (event->ctx)
+ put_ctx(event->ctx);
+
call_rcu(&event->rcu_head, free_event_rcu);
}
static int perf_event_period(struct perf_event *event, u64 __user *arg)
{
struct perf_event_context *ctx = event->ctx;
- unsigned long size;
int ret = 0;
u64 value;
if (!event->attr.sample_period)
return -EINVAL;
- size = copy_from_user(&value, arg, sizeof(value));
- if (size != sizeof(value))
+ if (copy_from_user(&value, arg, sizeof(value)))
return -EFAULT;
if (!value)
static int perf_event_index(struct perf_event *event)
{
+ if (event->hw.state & PERF_HES_STOPPED)
+ return 0;
+
if (event->state != PERF_EVENT_STATE_ACTIVE)
return 0;
}
}
-/*
- * Pending wakeups
- *
- * Handle the case where we need to wakeup up from NMI (or rq->lock) context.
- *
- * The NMI bit means we cannot possibly take locks. Therefore, maintain a
- * single linked list and use cmpxchg() to add entries lockless.
- */
-
-static void perf_pending_event(struct perf_pending_entry *entry)
+static void perf_pending_event(struct irq_work *entry)
{
struct perf_event *event = container_of(entry,
struct perf_event, pending);
}
}
-#define PENDING_TAIL ((struct perf_pending_entry *)-1UL)
-
-static DEFINE_PER_CPU(struct perf_pending_entry *, perf_pending_head) = {
- PENDING_TAIL,
-};
-
-static void perf_pending_queue(struct perf_pending_entry *entry,
- void (*func)(struct perf_pending_entry *))
-{
- struct perf_pending_entry **head;
-
- if (cmpxchg(&entry->next, NULL, PENDING_TAIL) != NULL)
- return;
-
- entry->func = func;
-
- head = &get_cpu_var(perf_pending_head);
-
- do {
- entry->next = *head;
- } while (cmpxchg(head, entry->next, entry) != entry->next);
-
- set_perf_event_pending();
-
- put_cpu_var(perf_pending_head);
-}
-
-static int __perf_pending_run(void)
-{
- struct perf_pending_entry *list;
- int nr = 0;
-
- list = xchg(&__get_cpu_var(perf_pending_head), PENDING_TAIL);
- while (list != PENDING_TAIL) {
- void (*func)(struct perf_pending_entry *);
- struct perf_pending_entry *entry = list;
-
- list = list->next;
-
- func = entry->func;
- entry->next = NULL;
- /*
- * Ensure we observe the unqueue before we issue the wakeup,
- * so that we won't be waiting forever.
- * -- see perf_not_pending().
- */
- smp_wmb();
-
- func(entry);
- nr++;
- }
-
- return nr;
-}
-
-static inline int perf_not_pending(struct perf_event *event)
-{
- /*
- * If we flush on whatever cpu we run, there is a chance we don't
- * need to wait.
- */
- get_cpu();
- __perf_pending_run();
- put_cpu();
-
- /*
- * Ensure we see the proper queue state before going to sleep
- * so that we do not miss the wakeup. -- see perf_pending_handle()
- */
- smp_rmb();
- return event->pending.next == NULL;
-}
-
-static void perf_pending_sync(struct perf_event *event)
-{
- wait_event(event->waitq, perf_not_pending(event));
-}
-
-void perf_event_do_pending(void)
-{
- __perf_pending_run();
-}
-
-/*
- * Callchain support -- arch specific
- */
-
-__weak struct perf_callchain_entry *perf_callchain(struct pt_regs *regs)
-{
- return NULL;
-}
-
-
/*
* We assume there is only KVM supporting the callbacks.
* Later on, we might change it to a list if there is
if (handle->nmi) {
handle->event->pending_wakeup = 1;
- perf_pending_queue(&handle->event->pending,
- perf_pending_event);
+ irq_work_queue(&handle->event->pending);
} else
perf_event_wakeup(handle->event);
}
if (handle->wakeup != local_read(&buffer->wakeup))
perf_output_wakeup(handle);
- out:
+out:
preempt_enable();
}
struct perf_output_handle handle;
struct perf_event_header header;
+ /* protect the callchain buffers */
+ rcu_read_lock();
+
perf_prepare_sample(&header, data, event, regs);
if (perf_output_begin(&handle, event, header.size, nmi, 1))
- return;
+ goto exit;
perf_output_sample(&handle, &header, data, event);
perf_output_end(&handle);
+
+exit:
+ rcu_read_unlock();
}
/*
static void perf_event_task_event(struct perf_task_event *task_event)
{
struct perf_cpu_context *cpuctx;
- struct perf_event_context *ctx = task_event->task_ctx;
+ struct perf_event_context *ctx;
+ struct pmu *pmu;
+ int ctxn;
rcu_read_lock();
- cpuctx = &get_cpu_var(perf_cpu_context);
- perf_event_task_ctx(&cpuctx->ctx, task_event);
- if (!ctx)
- ctx = rcu_dereference(current->perf_event_ctxp);
- if (ctx)
- perf_event_task_ctx(ctx, task_event);
- put_cpu_var(perf_cpu_context);
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+ cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
+ perf_event_task_ctx(&cpuctx->ctx, task_event);
+
+ ctx = task_event->task_ctx;
+ if (!ctx) {
+ ctxn = pmu->task_ctx_nr;
+ if (ctxn < 0)
+ goto next;
+ ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
+ }
+ if (ctx)
+ perf_event_task_ctx(ctx, task_event);
+next:
+ put_cpu_ptr(pmu->pmu_cpu_context);
+ }
rcu_read_unlock();
}
{
struct perf_cpu_context *cpuctx;
struct perf_event_context *ctx;
- unsigned int size;
char comm[TASK_COMM_LEN];
+ unsigned int size;
+ struct pmu *pmu;
+ int ctxn;
memset(comm, 0, sizeof(comm));
strlcpy(comm, comm_event->task->comm, sizeof(comm));
comm_event->event_id.header.size = sizeof(comm_event->event_id) + size;
rcu_read_lock();
- cpuctx = &get_cpu_var(perf_cpu_context);
- perf_event_comm_ctx(&cpuctx->ctx, comm_event);
- ctx = rcu_dereference(current->perf_event_ctxp);
- if (ctx)
- perf_event_comm_ctx(ctx, comm_event);
- put_cpu_var(perf_cpu_context);
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+ cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
+ perf_event_comm_ctx(&cpuctx->ctx, comm_event);
+
+ ctxn = pmu->task_ctx_nr;
+ if (ctxn < 0)
+ goto next;
+
+ ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
+ if (ctx)
+ perf_event_comm_ctx(ctx, comm_event);
+next:
+ put_cpu_ptr(pmu->pmu_cpu_context);
+ }
rcu_read_unlock();
}
void perf_event_comm(struct task_struct *task)
{
struct perf_comm_event comm_event;
+ struct perf_event_context *ctx;
+ int ctxn;
+
+ for_each_task_context_nr(ctxn) {
+ ctx = task->perf_event_ctxp[ctxn];
+ if (!ctx)
+ continue;
- if (task->perf_event_ctxp)
- perf_event_enable_on_exec(task);
+ perf_event_enable_on_exec(ctx);
+ }
if (!atomic_read(&nr_comm_events))
return;
char tmp[16];
char *buf = NULL;
const char *name;
+ struct pmu *pmu;
+ int ctxn;
memset(tmp, 0, sizeof(tmp));
mmap_event->event_id.header.size = sizeof(mmap_event->event_id) + size;
rcu_read_lock();
- cpuctx = &get_cpu_var(perf_cpu_context);
- perf_event_mmap_ctx(&cpuctx->ctx, mmap_event, vma->vm_flags & VM_EXEC);
- ctx = rcu_dereference(current->perf_event_ctxp);
- if (ctx)
- perf_event_mmap_ctx(ctx, mmap_event, vma->vm_flags & VM_EXEC);
- put_cpu_var(perf_cpu_context);
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+ cpuctx = get_cpu_ptr(pmu->pmu_cpu_context);
+ perf_event_mmap_ctx(&cpuctx->ctx, mmap_event,
+ vma->vm_flags & VM_EXEC);
+
+ ctxn = pmu->task_ctx_nr;
+ if (ctxn < 0)
+ goto next;
+
+ ctx = rcu_dereference(current->perf_event_ctxp[ctxn]);
+ if (ctx) {
+ perf_event_mmap_ctx(ctx, mmap_event,
+ vma->vm_flags & VM_EXEC);
+ }
+next:
+ put_cpu_ptr(pmu->pmu_cpu_context);
+ }
rcu_read_unlock();
kfree(buf);
struct hw_perf_event *hwc = &event->hw;
int ret = 0;
- throttle = (throttle && event->pmu->unthrottle != NULL);
-
if (!throttle) {
hwc->interrupts++;
} else {
event->pending_kill = POLL_HUP;
if (nmi) {
event->pending_disable = 1;
- perf_pending_queue(&event->pending,
- perf_pending_event);
+ irq_work_queue(&event->pending);
} else
perf_event_disable(event);
}
* Generic software event infrastructure
*/
+struct swevent_htable {
+ struct swevent_hlist *swevent_hlist;
+ struct mutex hlist_mutex;
+ int hlist_refcount;
+
+ /* Recursion avoidance in each contexts */
+ int recursion[PERF_NR_CONTEXTS];
+};
+
+static DEFINE_PER_CPU(struct swevent_htable, swevent_htable);
+
/*
* We directly increment event->count and keep a second value in
* event->hw.period_left to count intervals. This period event
}
}
-static void perf_swevent_add(struct perf_event *event, u64 nr,
+static void perf_swevent_event(struct perf_event *event, u64 nr,
int nmi, struct perf_sample_data *data,
struct pt_regs *regs)
{
static int perf_exclude_event(struct perf_event *event,
struct pt_regs *regs)
{
+ if (event->hw.state & PERF_HES_STOPPED)
+ return 0;
+
if (regs) {
if (event->attr.exclude_user && user_mode(regs))
return 1;
/* For the read side: events when they trigger */
static inline struct hlist_head *
-find_swevent_head_rcu(struct perf_cpu_context *ctx, u64 type, u32 event_id)
+find_swevent_head_rcu(struct swevent_htable *swhash, u64 type, u32 event_id)
{
struct swevent_hlist *hlist;
- hlist = rcu_dereference(ctx->swevent_hlist);
+ hlist = rcu_dereference(swhash->swevent_hlist);
if (!hlist)
return NULL;
/* For the event head insertion and removal in the hlist */
static inline struct hlist_head *
-find_swevent_head(struct perf_cpu_context *ctx, struct perf_event *event)
+find_swevent_head(struct swevent_htable *swhash, struct perf_event *event)
{
struct swevent_hlist *hlist;
u32 event_id = event->attr.config;
* and release. Which makes the protected version suitable here.
* The context lock guarantees that.
*/
- hlist = rcu_dereference_protected(ctx->swevent_hlist,
+ hlist = rcu_dereference_protected(swhash->swevent_hlist,
lockdep_is_held(&event->ctx->lock));
if (!hlist)
return NULL;
struct perf_sample_data *data,
struct pt_regs *regs)
{
- struct perf_cpu_context *cpuctx;
+ struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
struct perf_event *event;
struct hlist_node *node;
struct hlist_head *head;
- cpuctx = &__get_cpu_var(perf_cpu_context);
-
rcu_read_lock();
-
- head = find_swevent_head_rcu(cpuctx, type, event_id);
-
+ head = find_swevent_head_rcu(swhash, type, event_id);
if (!head)
goto end;
hlist_for_each_entry_rcu(event, node, head, hlist_entry) {
if (perf_swevent_match(event, type, event_id, data, regs))
- perf_swevent_add(event, nr, nmi, data, regs);
+ perf_swevent_event(event, nr, nmi, data, regs);
}
end:
rcu_read_unlock();
int perf_swevent_get_recursion_context(void)
{
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
- int rctx;
-
- if (in_nmi())
- rctx = 3;
- else if (in_irq())
- rctx = 2;
- else if (in_softirq())
- rctx = 1;
- else
- rctx = 0;
-
- if (cpuctx->recursion[rctx])
- return -1;
-
- cpuctx->recursion[rctx]++;
- barrier();
+ struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
- return rctx;
+ return get_recursion_context(swhash->recursion);
}
EXPORT_SYMBOL_GPL(perf_swevent_get_recursion_context);
void inline perf_swevent_put_recursion_context(int rctx)
{
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
- barrier();
- cpuctx->recursion[rctx]--;
+ struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
+
+ put_recursion_context(swhash->recursion, rctx);
}
void __perf_sw_event(u32 event_id, u64 nr, int nmi,
{
}
-static int perf_swevent_enable(struct perf_event *event)
+static int perf_swevent_add(struct perf_event *event, int flags)
{
+ struct swevent_htable *swhash = &__get_cpu_var(swevent_htable);
struct hw_perf_event *hwc = &event->hw;
- struct perf_cpu_context *cpuctx;
struct hlist_head *head;
- cpuctx = &__get_cpu_var(perf_cpu_context);
-
if (hwc->sample_period) {
hwc->last_period = hwc->sample_period;
perf_swevent_set_period(event);
}
- head = find_swevent_head(cpuctx, event);
+ hwc->state = !(flags & PERF_EF_START);
+
+ head = find_swevent_head(swhash, event);
if (WARN_ON_ONCE(!head))
return -EINVAL;
return 0;
}
-static void perf_swevent_disable(struct perf_event *event)
+static void perf_swevent_del(struct perf_event *event, int flags)
{
hlist_del_rcu(&event->hlist_entry);
}
-static void perf_swevent_void(struct perf_event *event)
-{
-}
-
-static int perf_swevent_int(struct perf_event *event)
-{
- return 0;
-}
-
-static const struct pmu perf_ops_generic = {
- .enable = perf_swevent_enable,
- .disable = perf_swevent_disable,
- .start = perf_swevent_int,
- .stop = perf_swevent_void,
- .read = perf_swevent_read,
- .unthrottle = perf_swevent_void, /* hwc->interrupts already reset */
-};
-
-/*
- * hrtimer based swevent callback
- */
-
-static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
-{
- enum hrtimer_restart ret = HRTIMER_RESTART;
- struct perf_sample_data data;
- struct pt_regs *regs;
- struct perf_event *event;
- u64 period;
-
- event = container_of(hrtimer, struct perf_event, hw.hrtimer);
- event->pmu->read(event);
-
- perf_sample_data_init(&data, 0);
- data.period = event->hw.last_period;
- regs = get_irq_regs();
-
- if (regs && !perf_exclude_event(event, regs)) {
- if (!(event->attr.exclude_idle && current->pid == 0))
- if (perf_event_overflow(event, 0, &data, regs))
- ret = HRTIMER_NORESTART;
- }
-
- period = max_t(u64, 10000, event->hw.sample_period);
- hrtimer_forward_now(hrtimer, ns_to_ktime(period));
-
- return ret;
-}
-
-static void perf_swevent_start_hrtimer(struct perf_event *event)
-{
- struct hw_perf_event *hwc = &event->hw;
-
- hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
- hwc->hrtimer.function = perf_swevent_hrtimer;
- if (hwc->sample_period) {
- u64 period;
-
- if (hwc->remaining) {
- if (hwc->remaining < 0)
- period = 10000;
- else
- period = hwc->remaining;
- hwc->remaining = 0;
- } else {
- period = max_t(u64, 10000, hwc->sample_period);
- }
- __hrtimer_start_range_ns(&hwc->hrtimer,
- ns_to_ktime(period), 0,
- HRTIMER_MODE_REL, 0);
- }
-}
-
-static void perf_swevent_cancel_hrtimer(struct perf_event *event)
-{
- struct hw_perf_event *hwc = &event->hw;
-
- if (hwc->sample_period) {
- ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
- hwc->remaining = ktime_to_ns(remaining);
-
- hrtimer_cancel(&hwc->hrtimer);
- }
-}
-
-/*
- * Software event: cpu wall time clock
- */
-
-static void cpu_clock_perf_event_update(struct perf_event *event)
-{
- int cpu = raw_smp_processor_id();
- s64 prev;
- u64 now;
-
- now = cpu_clock(cpu);
- prev = local64_xchg(&event->hw.prev_count, now);
- local64_add(now - prev, &event->count);
-}
-
-static int cpu_clock_perf_event_enable(struct perf_event *event)
-{
- struct hw_perf_event *hwc = &event->hw;
- int cpu = raw_smp_processor_id();
-
- local64_set(&hwc->prev_count, cpu_clock(cpu));
- perf_swevent_start_hrtimer(event);
-
- return 0;
-}
-
-static void cpu_clock_perf_event_disable(struct perf_event *event)
-{
- perf_swevent_cancel_hrtimer(event);
- cpu_clock_perf_event_update(event);
-}
-
-static void cpu_clock_perf_event_read(struct perf_event *event)
-{
- cpu_clock_perf_event_update(event);
-}
-
-static const struct pmu perf_ops_cpu_clock = {
- .enable = cpu_clock_perf_event_enable,
- .disable = cpu_clock_perf_event_disable,
- .read = cpu_clock_perf_event_read,
-};
-
-/*
- * Software event: task time clock
- */
-
-static void task_clock_perf_event_update(struct perf_event *event, u64 now)
-{
- u64 prev;
- s64 delta;
-
- prev = local64_xchg(&event->hw.prev_count, now);
- delta = now - prev;
- local64_add(delta, &event->count);
-}
-
-static int task_clock_perf_event_enable(struct perf_event *event)
-{
- struct hw_perf_event *hwc = &event->hw;
- u64 now;
-
- now = event->ctx->time;
-
- local64_set(&hwc->prev_count, now);
-
- perf_swevent_start_hrtimer(event);
-
- return 0;
-}
-
-static void task_clock_perf_event_disable(struct perf_event *event)
+static void perf_swevent_start(struct perf_event *event, int flags)
{
- perf_swevent_cancel_hrtimer(event);
- task_clock_perf_event_update(event, event->ctx->time);
-
+ event->hw.state = 0;
}
-static void task_clock_perf_event_read(struct perf_event *event)
+static void perf_swevent_stop(struct perf_event *event, int flags)
{
- u64 time;
-
- if (!in_nmi()) {
- update_context_time(event->ctx);
- time = event->ctx->time;
- } else {
- u64 now = perf_clock();
- u64 delta = now - event->ctx->timestamp;
- time = event->ctx->time + delta;
- }
-
- task_clock_perf_event_update(event, time);
+ event->hw.state = PERF_HES_STOPPED;
}
-static const struct pmu perf_ops_task_clock = {
- .enable = task_clock_perf_event_enable,
- .disable = task_clock_perf_event_disable,
- .read = task_clock_perf_event_read,
-};
-
/* Deref the hlist from the update side */
static inline struct swevent_hlist *
-swevent_hlist_deref(struct perf_cpu_context *cpuctx)
+swevent_hlist_deref(struct swevent_htable *swhash)
{
- return rcu_dereference_protected(cpuctx->swevent_hlist,
- lockdep_is_held(&cpuctx->hlist_mutex));
+ return rcu_dereference_protected(swhash->swevent_hlist,
+ lockdep_is_held(&swhash->hlist_mutex));
}
static void swevent_hlist_release_rcu(struct rcu_head *rcu_head)
kfree(hlist);
}
-static void swevent_hlist_release(struct perf_cpu_context *cpuctx)
+static void swevent_hlist_release(struct swevent_htable *swhash)
{
- struct swevent_hlist *hlist = swevent_hlist_deref(cpuctx);
+ struct swevent_hlist *hlist = swevent_hlist_deref(swhash);
if (!hlist)
return;
- rcu_assign_pointer(cpuctx->swevent_hlist, NULL);
+ rcu_assign_pointer(swhash->swevent_hlist, NULL);
call_rcu(&hlist->rcu_head, swevent_hlist_release_rcu);
}
static void swevent_hlist_put_cpu(struct perf_event *event, int cpu)
{
- struct perf_cpu_context *cpuctx = &per_cpu(perf_cpu_context, cpu);
+ struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
- mutex_lock(&cpuctx->hlist_mutex);
+ mutex_lock(&swhash->hlist_mutex);
- if (!--cpuctx->hlist_refcount)
- swevent_hlist_release(cpuctx);
+ if (!--swhash->hlist_refcount)
+ swevent_hlist_release(swhash);
- mutex_unlock(&cpuctx->hlist_mutex);
+ mutex_unlock(&swhash->hlist_mutex);
}
static void swevent_hlist_put(struct perf_event *event)
static int swevent_hlist_get_cpu(struct perf_event *event, int cpu)
{
- struct perf_cpu_context *cpuctx = &per_cpu(perf_cpu_context, cpu);
+ struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
int err = 0;
- mutex_lock(&cpuctx->hlist_mutex);
+ mutex_lock(&swhash->hlist_mutex);
- if (!swevent_hlist_deref(cpuctx) && cpu_online(cpu)) {
+ if (!swevent_hlist_deref(swhash) && cpu_online(cpu)) {
struct swevent_hlist *hlist;
hlist = kzalloc(sizeof(*hlist), GFP_KERNEL);
err = -ENOMEM;
goto exit;
}
- rcu_assign_pointer(cpuctx->swevent_hlist, hlist);
+ rcu_assign_pointer(swhash->swevent_hlist, hlist);
}
- cpuctx->hlist_refcount++;
- exit:
- mutex_unlock(&cpuctx->hlist_mutex);
+ swhash->hlist_refcount++;
+exit:
+ mutex_unlock(&swhash->hlist_mutex);
return err;
}
put_online_cpus();
return 0;
- fail:
+fail:
for_each_possible_cpu(cpu) {
if (cpu == failed_cpu)
break;
return err;
}
-#ifdef CONFIG_EVENT_TRACING
+atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
+
+static void sw_perf_event_destroy(struct perf_event *event)
+{
+ u64 event_id = event->attr.config;
+
+ WARN_ON(event->parent);
+
+ jump_label_dec(&perf_swevent_enabled[event_id]);
+ swevent_hlist_put(event);
+}
-static const struct pmu perf_ops_tracepoint = {
- .enable = perf_trace_enable,
- .disable = perf_trace_disable,
- .start = perf_swevent_int,
- .stop = perf_swevent_void,
+static int perf_swevent_init(struct perf_event *event)
+{
+ int event_id = event->attr.config;
+
+ if (event->attr.type != PERF_TYPE_SOFTWARE)
+ return -ENOENT;
+
+ switch (event_id) {
+ case PERF_COUNT_SW_CPU_CLOCK:
+ case PERF_COUNT_SW_TASK_CLOCK:
+ return -ENOENT;
+
+ default:
+ break;
+ }
+
+ if (event_id > PERF_COUNT_SW_MAX)
+ return -ENOENT;
+
+ if (!event->parent) {
+ int err;
+
+ err = swevent_hlist_get(event);
+ if (err)
+ return err;
+
+ jump_label_inc(&perf_swevent_enabled[event_id]);
+ event->destroy = sw_perf_event_destroy;
+ }
+
+ return 0;
+}
+
+static struct pmu perf_swevent = {
+ .task_ctx_nr = perf_sw_context,
+
+ .event_init = perf_swevent_init,
+ .add = perf_swevent_add,
+ .del = perf_swevent_del,
+ .start = perf_swevent_start,
+ .stop = perf_swevent_stop,
.read = perf_swevent_read,
- .unthrottle = perf_swevent_void,
};
+#ifdef CONFIG_EVENT_TRACING
+
static int perf_tp_filter_match(struct perf_event *event,
struct perf_sample_data *data)
{
hlist_for_each_entry_rcu(event, node, head, hlist_entry) {
if (perf_tp_event_match(event, &data, regs))
- perf_swevent_add(event, count, 1, &data, regs);
+ perf_swevent_event(event, count, 1, &data, regs);
}
perf_swevent_put_recursion_context(rctx);
perf_trace_destroy(event);
}
-static const struct pmu *tp_perf_event_init(struct perf_event *event)
+static int perf_tp_event_init(struct perf_event *event)
{
int err;
+ if (event->attr.type != PERF_TYPE_TRACEPOINT)
+ return -ENOENT;
+
/*
* Raw tracepoint data is a severe data leak, only allow root to
* have these.
if ((event->attr.sample_type & PERF_SAMPLE_RAW) &&
perf_paranoid_tracepoint_raw() &&
!capable(CAP_SYS_ADMIN))
- return ERR_PTR(-EPERM);
+ return -EPERM;
err = perf_trace_init(event);
if (err)
- return NULL;
+ return err;
event->destroy = tp_perf_event_destroy;
- return &perf_ops_tracepoint;
+ return 0;
+}
+
+static struct pmu perf_tracepoint = {
+ .task_ctx_nr = perf_sw_context,
+
+ .event_init = perf_tp_event_init,
+ .add = perf_trace_add,
+ .del = perf_trace_del,
+ .start = perf_swevent_start,
+ .stop = perf_swevent_stop,
+ .read = perf_swevent_read,
+};
+
+static inline void perf_tp_register(void)
+{
+ perf_pmu_register(&perf_tracepoint);
}
static int perf_event_set_filter(struct perf_event *event, void __user *arg)
#else
-static const struct pmu *tp_perf_event_init(struct perf_event *event)
+static inline void perf_tp_register(void)
{
- return NULL;
}
static int perf_event_set_filter(struct perf_event *event, void __user *arg)
#endif /* CONFIG_EVENT_TRACING */
#ifdef CONFIG_HAVE_HW_BREAKPOINT
-static void bp_perf_event_destroy(struct perf_event *event)
-{
- release_bp_slot(event);
-}
-
-static const struct pmu *bp_perf_event_init(struct perf_event *bp)
-{
- int err;
-
- err = register_perf_hw_breakpoint(bp);
- if (err)
- return ERR_PTR(err);
-
- bp->destroy = bp_perf_event_destroy;
-
- return &perf_ops_bp;
-}
-
void perf_bp_event(struct perf_event *bp, void *data)
{
struct perf_sample_data sample;
perf_sample_data_init(&sample, bp->attr.bp_addr);
- if (!perf_exclude_event(bp, regs))
- perf_swevent_add(bp, 1, 1, &sample, regs);
-}
-#else
-static const struct pmu *bp_perf_event_init(struct perf_event *bp)
-{
- return NULL;
-}
-
-void perf_bp_event(struct perf_event *bp, void *regs)
-{
+ if (!bp->hw.state && !perf_exclude_event(bp, regs))
+ perf_swevent_event(bp, 1, 1, &sample, regs);
}
#endif
-atomic_t perf_swevent_enabled[PERF_COUNT_SW_MAX];
+/*
+ * hrtimer based swevent callback
+ */
-static void sw_perf_event_destroy(struct perf_event *event)
+static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
{
- u64 event_id = event->attr.config;
-
- WARN_ON(event->parent);
+ enum hrtimer_restart ret = HRTIMER_RESTART;
+ struct perf_sample_data data;
+ struct pt_regs *regs;
+ struct perf_event *event;
+ u64 period;
- atomic_dec(&perf_swevent_enabled[event_id]);
- swevent_hlist_put(event);
+ event = container_of(hrtimer, struct perf_event, hw.hrtimer);
+ event->pmu->read(event);
+
+ perf_sample_data_init(&data, 0);
+ data.period = event->hw.last_period;
+ regs = get_irq_regs();
+
+ if (regs && !perf_exclude_event(event, regs)) {
+ if (!(event->attr.exclude_idle && current->pid == 0))
+ if (perf_event_overflow(event, 0, &data, regs))
+ ret = HRTIMER_NORESTART;
+ }
+
+ period = max_t(u64, 10000, event->hw.sample_period);
+ hrtimer_forward_now(hrtimer, ns_to_ktime(period));
+
+ return ret;
}
-static const struct pmu *sw_perf_event_init(struct perf_event *event)
+static void perf_swevent_start_hrtimer(struct perf_event *event)
{
- const struct pmu *pmu = NULL;
- u64 event_id = event->attr.config;
+ struct hw_perf_event *hwc = &event->hw;
+
+ hrtimer_init(&hwc->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ hwc->hrtimer.function = perf_swevent_hrtimer;
+ if (hwc->sample_period) {
+ s64 period = local64_read(&hwc->period_left);
+
+ if (period) {
+ if (period < 0)
+ period = 10000;
+ local64_set(&hwc->period_left, 0);
+ } else {
+ period = max_t(u64, 10000, hwc->sample_period);
+ }
+ __hrtimer_start_range_ns(&hwc->hrtimer,
+ ns_to_ktime(period), 0,
+ HRTIMER_MODE_REL_PINNED, 0);
+ }
+}
+
+static void perf_swevent_cancel_hrtimer(struct perf_event *event)
+{
+ struct hw_perf_event *hwc = &event->hw;
+
+ if (hwc->sample_period) {
+ ktime_t remaining = hrtimer_get_remaining(&hwc->hrtimer);
+ local64_set(&hwc->period_left, ktime_to_ns(remaining));
+
+ hrtimer_cancel(&hwc->hrtimer);
+ }
+}
+
+/*
+ * Software event: cpu wall time clock
+ */
+
+static void cpu_clock_event_update(struct perf_event *event)
+{
+ s64 prev;
+ u64 now;
+
+ now = local_clock();
+ prev = local64_xchg(&event->hw.prev_count, now);
+ local64_add(now - prev, &event->count);
+}
+
+static void cpu_clock_event_start(struct perf_event *event, int flags)
+{
+ local64_set(&event->hw.prev_count, local_clock());
+ perf_swevent_start_hrtimer(event);
+}
+
+static void cpu_clock_event_stop(struct perf_event *event, int flags)
+{
+ perf_swevent_cancel_hrtimer(event);
+ cpu_clock_event_update(event);
+}
+
+static int cpu_clock_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ cpu_clock_event_start(event, flags);
+
+ return 0;
+}
+
+static void cpu_clock_event_del(struct perf_event *event, int flags)
+{
+ cpu_clock_event_stop(event, flags);
+}
+
+static void cpu_clock_event_read(struct perf_event *event)
+{
+ cpu_clock_event_update(event);
+}
+
+static int cpu_clock_event_init(struct perf_event *event)
+{
+ if (event->attr.type != PERF_TYPE_SOFTWARE)
+ return -ENOENT;
+
+ if (event->attr.config != PERF_COUNT_SW_CPU_CLOCK)
+ return -ENOENT;
+
+ return 0;
+}
+
+static struct pmu perf_cpu_clock = {
+ .task_ctx_nr = perf_sw_context,
+
+ .event_init = cpu_clock_event_init,
+ .add = cpu_clock_event_add,
+ .del = cpu_clock_event_del,
+ .start = cpu_clock_event_start,
+ .stop = cpu_clock_event_stop,
+ .read = cpu_clock_event_read,
+};
+
+/*
+ * Software event: task time clock
+ */
+
+static void task_clock_event_update(struct perf_event *event, u64 now)
+{
+ u64 prev;
+ s64 delta;
+
+ prev = local64_xchg(&event->hw.prev_count, now);
+ delta = now - prev;
+ local64_add(delta, &event->count);
+}
+
+static void task_clock_event_start(struct perf_event *event, int flags)
+{
+ local64_set(&event->hw.prev_count, event->ctx->time);
+ perf_swevent_start_hrtimer(event);
+}
+
+static void task_clock_event_stop(struct perf_event *event, int flags)
+{
+ perf_swevent_cancel_hrtimer(event);
+ task_clock_event_update(event, event->ctx->time);
+}
+
+static int task_clock_event_add(struct perf_event *event, int flags)
+{
+ if (flags & PERF_EF_START)
+ task_clock_event_start(event, flags);
+
+ return 0;
+}
+
+static void task_clock_event_del(struct perf_event *event, int flags)
+{
+ task_clock_event_stop(event, PERF_EF_UPDATE);
+}
+
+static void task_clock_event_read(struct perf_event *event)
+{
+ u64 time;
+
+ if (!in_nmi()) {
+ update_context_time(event->ctx);
+ time = event->ctx->time;
+ } else {
+ u64 now = perf_clock();
+ u64 delta = now - event->ctx->timestamp;
+ time = event->ctx->time + delta;
+ }
+
+ task_clock_event_update(event, time);
+}
+
+static int task_clock_event_init(struct perf_event *event)
+{
+ if (event->attr.type != PERF_TYPE_SOFTWARE)
+ return -ENOENT;
+
+ if (event->attr.config != PERF_COUNT_SW_TASK_CLOCK)
+ return -ENOENT;
+
+ return 0;
+}
+
+static struct pmu perf_task_clock = {
+ .task_ctx_nr = perf_sw_context,
+
+ .event_init = task_clock_event_init,
+ .add = task_clock_event_add,
+ .del = task_clock_event_del,
+ .start = task_clock_event_start,
+ .stop = task_clock_event_stop,
+ .read = task_clock_event_read,
+};
+
+static void perf_pmu_nop_void(struct pmu *pmu)
+{
+}
+
+static int perf_pmu_nop_int(struct pmu *pmu)
+{
+ return 0;
+}
+
+static void perf_pmu_start_txn(struct pmu *pmu)
+{
+ perf_pmu_disable(pmu);
+}
+
+static int perf_pmu_commit_txn(struct pmu *pmu)
+{
+ perf_pmu_enable(pmu);
+ return 0;
+}
+
+static void perf_pmu_cancel_txn(struct pmu *pmu)
+{
+ perf_pmu_enable(pmu);
+}
+
+/*
+ * Ensures all contexts with the same task_ctx_nr have the same
+ * pmu_cpu_context too.
+ */
+static void *find_pmu_context(int ctxn)
+{
+ struct pmu *pmu;
+
+ if (ctxn < 0)
+ return NULL;
+
+ list_for_each_entry(pmu, &pmus, entry) {
+ if (pmu->task_ctx_nr == ctxn)
+ return pmu->pmu_cpu_context;
+ }
+
+ return NULL;
+}
+
+static void free_pmu_context(void * __percpu cpu_context)
+{
+ struct pmu *pmu;
+
+ mutex_lock(&pmus_lock);
/*
- * Software events (currently) can't in general distinguish
- * between user, kernel and hypervisor events.
- * However, context switches and cpu migrations are considered
- * to be kernel events, and page faults are never hypervisor
- * events.
+ * Like a real lame refcount.
*/
- switch (event_id) {
- case PERF_COUNT_SW_CPU_CLOCK:
- pmu = &perf_ops_cpu_clock;
+ list_for_each_entry(pmu, &pmus, entry) {
+ if (pmu->pmu_cpu_context == cpu_context)
+ goto out;
+ }
- break;
- case PERF_COUNT_SW_TASK_CLOCK:
- /*
- * If the user instantiates this as a per-cpu event,
- * use the cpu_clock event instead.
- */
- if (event->ctx->task)
- pmu = &perf_ops_task_clock;
- else
- pmu = &perf_ops_cpu_clock;
+ free_percpu(cpu_context);
+out:
+ mutex_unlock(&pmus_lock);
+}
- break;
- case PERF_COUNT_SW_PAGE_FAULTS:
- case PERF_COUNT_SW_PAGE_FAULTS_MIN:
- case PERF_COUNT_SW_PAGE_FAULTS_MAJ:
- case PERF_COUNT_SW_CONTEXT_SWITCHES:
- case PERF_COUNT_SW_CPU_MIGRATIONS:
- case PERF_COUNT_SW_ALIGNMENT_FAULTS:
- case PERF_COUNT_SW_EMULATION_FAULTS:
- if (!event->parent) {
- int err;
-
- err = swevent_hlist_get(event);
- if (err)
- return ERR_PTR(err);
+int perf_pmu_register(struct pmu *pmu)
+{
+ int cpu, ret;
+
+ mutex_lock(&pmus_lock);
+ ret = -ENOMEM;
+ pmu->pmu_disable_count = alloc_percpu(int);
+ if (!pmu->pmu_disable_count)
+ goto unlock;
- atomic_inc(&perf_swevent_enabled[event_id]);
- event->destroy = sw_perf_event_destroy;
+ pmu->pmu_cpu_context = find_pmu_context(pmu->task_ctx_nr);
+ if (pmu->pmu_cpu_context)
+ goto got_cpu_context;
+
+ pmu->pmu_cpu_context = alloc_percpu(struct perf_cpu_context);
+ if (!pmu->pmu_cpu_context)
+ goto free_pdc;
+
+ for_each_possible_cpu(cpu) {
+ struct perf_cpu_context *cpuctx;
+
+ cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
+ __perf_event_init_context(&cpuctx->ctx);
+ cpuctx->ctx.type = cpu_context;
+ cpuctx->ctx.pmu = pmu;
+ cpuctx->jiffies_interval = 1;
+ INIT_LIST_HEAD(&cpuctx->rotation_list);
+ }
+
+got_cpu_context:
+ if (!pmu->start_txn) {
+ if (pmu->pmu_enable) {
+ /*
+ * If we have pmu_enable/pmu_disable calls, install
+ * transaction stubs that use that to try and batch
+ * hardware accesses.
+ */
+ pmu->start_txn = perf_pmu_start_txn;
+ pmu->commit_txn = perf_pmu_commit_txn;
+ pmu->cancel_txn = perf_pmu_cancel_txn;
+ } else {
+ pmu->start_txn = perf_pmu_nop_void;
+ pmu->commit_txn = perf_pmu_nop_int;
+ pmu->cancel_txn = perf_pmu_nop_void;
}
- pmu = &perf_ops_generic;
- break;
}
+ if (!pmu->pmu_enable) {
+ pmu->pmu_enable = perf_pmu_nop_void;
+ pmu->pmu_disable = perf_pmu_nop_void;
+ }
+
+ list_add_rcu(&pmu->entry, &pmus);
+ ret = 0;
+unlock:
+ mutex_unlock(&pmus_lock);
+
+ return ret;
+
+free_pdc:
+ free_percpu(pmu->pmu_disable_count);
+ goto unlock;
+}
+
+void perf_pmu_unregister(struct pmu *pmu)
+{
+ mutex_lock(&pmus_lock);
+ list_del_rcu(&pmu->entry);
+ mutex_unlock(&pmus_lock);
+
+ /*
+ * We dereference the pmu list under both SRCU and regular RCU, so
+ * synchronize against both of those.
+ */
+ synchronize_srcu(&pmus_srcu);
+ synchronize_rcu();
+
+ free_percpu(pmu->pmu_disable_count);
+ free_pmu_context(pmu->pmu_cpu_context);
+}
+
+struct pmu *perf_init_event(struct perf_event *event)
+{
+ struct pmu *pmu = NULL;
+ int idx;
+
+ idx = srcu_read_lock(&pmus_srcu);
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+ int ret = pmu->event_init(event);
+ if (!ret)
+ goto unlock;
+
+ if (ret != -ENOENT) {
+ pmu = ERR_PTR(ret);
+ goto unlock;
+ }
+ }
+ pmu = ERR_PTR(-ENOENT);
+unlock:
+ srcu_read_unlock(&pmus_srcu, idx);
+
return pmu;
}
* Allocate and initialize a event structure
*/
static struct perf_event *
-perf_event_alloc(struct perf_event_attr *attr,
- int cpu,
- struct perf_event_context *ctx,
- struct perf_event *group_leader,
- struct perf_event *parent_event,
- perf_overflow_handler_t overflow_handler,
- gfp_t gfpflags)
-{
- const struct pmu *pmu;
+perf_event_alloc(struct perf_event_attr *attr, int cpu,
+ struct task_struct *task,
+ struct perf_event *group_leader,
+ struct perf_event *parent_event,
+ perf_overflow_handler_t overflow_handler)
+{
+ struct pmu *pmu;
struct perf_event *event;
struct hw_perf_event *hwc;
long err;
- event = kzalloc(sizeof(*event), gfpflags);
+ event = kzalloc(sizeof(*event), GFP_KERNEL);
if (!event)
return ERR_PTR(-ENOMEM);
INIT_LIST_HEAD(&event->event_entry);
INIT_LIST_HEAD(&event->sibling_list);
init_waitqueue_head(&event->waitq);
+ init_irq_work(&event->pending, perf_pending_event);
mutex_init(&event->mmap_mutex);
event->attr = *attr;
event->group_leader = group_leader;
event->pmu = NULL;
- event->ctx = ctx;
event->oncpu = -1;
event->parent = parent_event;
event->state = PERF_EVENT_STATE_INACTIVE;
+ if (task) {
+ event->attach_state = PERF_ATTACH_TASK;
+#ifdef CONFIG_HAVE_HW_BREAKPOINT
+ /*
+ * hw_breakpoint is a bit difficult here..
+ */
+ if (attr->type == PERF_TYPE_BREAKPOINT)
+ event->hw.bp_target = task;
+#endif
+ }
+
if (!overflow_handler && parent_event)
overflow_handler = parent_event->overflow_handler;
if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
goto done;
- switch (attr->type) {
- case PERF_TYPE_RAW:
- case PERF_TYPE_HARDWARE:
- case PERF_TYPE_HW_CACHE:
- pmu = hw_perf_event_init(event);
- break;
-
- case PERF_TYPE_SOFTWARE:
- pmu = sw_perf_event_init(event);
- break;
-
- case PERF_TYPE_TRACEPOINT:
- pmu = tp_perf_event_init(event);
- break;
-
- case PERF_TYPE_BREAKPOINT:
- pmu = bp_perf_event_init(event);
- break;
-
+ pmu = perf_init_event(event);
- default:
- break;
- }
done:
err = 0;
if (!pmu)
event->pmu = pmu;
if (!event->parent) {
- atomic_inc(&nr_events);
+ if (event->attach_state & PERF_ATTACH_TASK)
+ jump_label_inc(&perf_task_events);
if (event->attr.mmap || event->attr.mmap_data)
atomic_inc(&nr_mmap_events);
if (event->attr.comm)
atomic_inc(&nr_comm_events);
if (event->attr.task)
atomic_inc(&nr_task_events);
+ if (event->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
+ err = get_callchain_buffers();
+ if (err) {
+ free_event(event);
+ return ERR_PTR(err);
+ }
+ }
}
return event;
struct perf_event_attr __user *, attr_uptr,
pid_t, pid, int, cpu, int, group_fd, unsigned long, flags)
{
- struct perf_event *event, *group_leader = NULL, *output_event = NULL;
+ struct perf_event *group_leader = NULL, *output_event = NULL;
+ struct perf_event *event, *sibling;
struct perf_event_attr attr;
struct perf_event_context *ctx;
struct file *event_file = NULL;
struct file *group_file = NULL;
+ struct task_struct *task = NULL;
+ struct pmu *pmu;
int event_fd;
+ int move_group = 0;
int fput_needed = 0;
int err;
if (event_fd < 0)
return event_fd;
- /*
- * Get the target context (task or percpu):
- */
- ctx = find_get_context(pid, cpu);
- if (IS_ERR(ctx)) {
- err = PTR_ERR(ctx);
- goto err_fd;
- }
-
if (group_fd != -1) {
group_leader = perf_fget_light(group_fd, &fput_needed);
if (IS_ERR(group_leader)) {
err = PTR_ERR(group_leader);
- goto err_put_context;
+ goto err_fd;
}
group_file = group_leader->filp;
if (flags & PERF_FLAG_FD_OUTPUT)
group_leader = NULL;
}
+ if (pid != -1) {
+ task = find_lively_task_by_vpid(pid);
+ if (IS_ERR(task)) {
+ err = PTR_ERR(task);
+ goto err_group_fd;
+ }
+ }
+
+ event = perf_event_alloc(&attr, cpu, task, group_leader, NULL, NULL);
+ if (IS_ERR(event)) {
+ err = PTR_ERR(event);
+ goto err_task;
+ }
+
+ /*
+ * Special case software events and allow them to be part of
+ * any hardware group.
+ */
+ pmu = event->pmu;
+
+ if (group_leader &&
+ (is_software_event(event) != is_software_event(group_leader))) {
+ if (is_software_event(event)) {
+ /*
+ * If event and group_leader are not both a software
+ * event, and event is, then group leader is not.
+ *
+ * Allow the addition of software events to !software
+ * groups, this is safe because software events never
+ * fail to schedule.
+ */
+ pmu = group_leader->pmu;
+ } else if (is_software_event(group_leader) &&
+ (group_leader->group_flags & PERF_GROUP_SOFTWARE)) {
+ /*
+ * In case the group is a pure software group, and we
+ * try to add a hardware event, move the whole group to
+ * the hardware context.
+ */
+ move_group = 1;
+ }
+ }
+
+ /*
+ * Get the target context (task or percpu):
+ */
+ ctx = find_get_context(pmu, task, cpu);
+ if (IS_ERR(ctx)) {
+ err = PTR_ERR(ctx);
+ goto err_alloc;
+ }
+
/*
* Look up the group leader (we will attach this event to it):
*/
* becoming part of another group-sibling):
*/
if (group_leader->group_leader != group_leader)
- goto err_put_context;
+ goto err_context;
/*
* Do not allow to attach to a group in a different
* task or CPU context:
*/
- if (group_leader->ctx != ctx)
- goto err_put_context;
+ if (move_group) {
+ if (group_leader->ctx->type != ctx->type)
+ goto err_context;
+ } else {
+ if (group_leader->ctx != ctx)
+ goto err_context;
+ }
+
/*
* Only a group leader can be exclusive or pinned
*/
if (attr.exclusive || attr.pinned)
- goto err_put_context;
- }
-
- event = perf_event_alloc(&attr, cpu, ctx, group_leader,
- NULL, NULL, GFP_KERNEL);
- if (IS_ERR(event)) {
- err = PTR_ERR(event);
- goto err_put_context;
+ goto err_context;
}
if (output_event) {
err = perf_event_set_output(event, output_event);
if (err)
- goto err_free_put_context;
+ goto err_context;
}
event_file = anon_inode_getfile("[perf_event]", &perf_fops, event, O_RDWR);
if (IS_ERR(event_file)) {
err = PTR_ERR(event_file);
- goto err_free_put_context;
+ goto err_context;
+ }
+
+ if (move_group) {
+ struct perf_event_context *gctx = group_leader->ctx;
+
+ mutex_lock(&gctx->mutex);
+ perf_event_remove_from_context(group_leader);
+ list_for_each_entry(sibling, &group_leader->sibling_list,
+ group_entry) {
+ perf_event_remove_from_context(sibling);
+ put_ctx(gctx);
+ }
+ mutex_unlock(&gctx->mutex);
+ put_ctx(gctx);
}
event->filp = event_file;
WARN_ON_ONCE(ctx->parent_ctx);
mutex_lock(&ctx->mutex);
+
+ if (move_group) {
+ perf_install_in_context(ctx, group_leader, cpu);
+ get_ctx(ctx);
+ list_for_each_entry(sibling, &group_leader->sibling_list,
+ group_entry) {
+ perf_install_in_context(ctx, sibling, cpu);
+ get_ctx(ctx);
+ }
+ }
+
perf_install_in_context(ctx, event, cpu);
++ctx->generation;
mutex_unlock(&ctx->mutex);
fd_install(event_fd, event_file);
return event_fd;
-err_free_put_context:
+err_context:
+ put_ctx(ctx);
+err_alloc:
free_event(event);
-err_put_context:
+err_task:
+ if (task)
+ put_task_struct(task);
+err_group_fd:
fput_light(group_file, fput_needed);
- put_ctx(ctx);
err_fd:
put_unused_fd(event_fd);
return err;
*
* @attr: attributes of the counter to create
* @cpu: cpu in which the counter is bound
- * @pid: task to profile
+ * @task: task to profile (NULL for percpu)
*/
struct perf_event *
perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
- pid_t pid,
+ struct task_struct *task,
perf_overflow_handler_t overflow_handler)
{
- struct perf_event *event;
struct perf_event_context *ctx;
- int err;
-
- /*
- * Get the target context (task or percpu):
- */
-
- ctx = find_get_context(pid, cpu);
- if (IS_ERR(ctx)) {
- err = PTR_ERR(ctx);
- goto err_exit;
- }
-
- event = perf_event_alloc(attr, cpu, ctx, NULL,
- NULL, overflow_handler, GFP_KERNEL);
- if (IS_ERR(event)) {
- err = PTR_ERR(event);
- goto err_put_context;
- }
-
- event->filp = NULL;
- WARN_ON_ONCE(ctx->parent_ctx);
- mutex_lock(&ctx->mutex);
- perf_install_in_context(ctx, event, cpu);
- ++ctx->generation;
- mutex_unlock(&ctx->mutex);
-
- event->owner = current;
- get_task_struct(current);
- mutex_lock(¤t->perf_event_mutex);
- list_add_tail(&event->owner_entry, ¤t->perf_event_list);
- mutex_unlock(¤t->perf_event_mutex);
-
- return event;
-
- err_put_context:
- put_ctx(ctx);
- err_exit:
- return ERR_PTR(err);
-}
-EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
-
-/*
- * inherit a event from parent task to child task:
- */
-static struct perf_event *
-inherit_event(struct perf_event *parent_event,
- struct task_struct *parent,
- struct perf_event_context *parent_ctx,
- struct task_struct *child,
- struct perf_event *group_leader,
- struct perf_event_context *child_ctx)
-{
- struct perf_event *child_event;
-
- /*
- * Instead of creating recursive hierarchies of events,
- * we link inherited events back to the original parent,
- * which has a filp for sure, which we use as the reference
- * count:
- */
- if (parent_event->parent)
- parent_event = parent_event->parent;
-
- child_event = perf_event_alloc(&parent_event->attr,
- parent_event->cpu, child_ctx,
- group_leader, parent_event,
- NULL, GFP_KERNEL);
- if (IS_ERR(child_event))
- return child_event;
- get_ctx(child_ctx);
-
- /*
- * Make the child state follow the state of the parent event,
- * not its attr.disabled bit. We hold the parent's mutex,
- * so we won't race with perf_event_{en, dis}able_family.
- */
- if (parent_event->state >= PERF_EVENT_STATE_INACTIVE)
- child_event->state = PERF_EVENT_STATE_INACTIVE;
- else
- child_event->state = PERF_EVENT_STATE_OFF;
-
- if (parent_event->attr.freq) {
- u64 sample_period = parent_event->hw.sample_period;
- struct hw_perf_event *hwc = &child_event->hw;
-
- hwc->sample_period = sample_period;
- hwc->last_period = sample_period;
-
- local64_set(&hwc->period_left, sample_period);
- }
-
- child_event->overflow_handler = parent_event->overflow_handler;
-
- /*
- * Link it up in the child's context:
- */
- add_event_to_ctx(child_event, child_ctx);
-
- /*
- * Get a reference to the parent filp - we will fput it
- * when the child event exits. This is safe to do because
- * we are in the parent and we know that the filp still
- * exists and has a nonzero count:
- */
- atomic_long_inc(&parent_event->filp->f_count);
-
- /*
- * Link this into the parent event's child list
- */
- WARN_ON_ONCE(parent_event->ctx->parent_ctx);
- mutex_lock(&parent_event->child_mutex);
- list_add_tail(&child_event->child_list, &parent_event->child_list);
- mutex_unlock(&parent_event->child_mutex);
+ struct perf_event *event;
+ int err;
- return child_event;
-}
+ /*
+ * Get the target context (task or percpu):
+ */
-static int inherit_group(struct perf_event *parent_event,
- struct task_struct *parent,
- struct perf_event_context *parent_ctx,
- struct task_struct *child,
- struct perf_event_context *child_ctx)
-{
- struct perf_event *leader;
- struct perf_event *sub;
- struct perf_event *child_ctr;
+ event = perf_event_alloc(attr, cpu, task, NULL, NULL, overflow_handler);
+ if (IS_ERR(event)) {
+ err = PTR_ERR(event);
+ goto err;
+ }
- leader = inherit_event(parent_event, parent, parent_ctx,
- child, NULL, child_ctx);
- if (IS_ERR(leader))
- return PTR_ERR(leader);
- list_for_each_entry(sub, &parent_event->sibling_list, group_entry) {
- child_ctr = inherit_event(sub, parent, parent_ctx,
- child, leader, child_ctx);
- if (IS_ERR(child_ctr))
- return PTR_ERR(child_ctr);
+ ctx = find_get_context(event->pmu, task, cpu);
+ if (IS_ERR(ctx)) {
+ err = PTR_ERR(ctx);
+ goto err_free;
}
- return 0;
+
+ event->filp = NULL;
+ WARN_ON_ONCE(ctx->parent_ctx);
+ mutex_lock(&ctx->mutex);
+ perf_install_in_context(ctx, event, cpu);
+ ++ctx->generation;
+ mutex_unlock(&ctx->mutex);
+
+ event->owner = current;
+ get_task_struct(current);
+ mutex_lock(¤t->perf_event_mutex);
+ list_add_tail(&event->owner_entry, ¤t->perf_event_list);
+ mutex_unlock(¤t->perf_event_mutex);
+
+ return event;
+
+err_free:
+ free_event(event);
+err:
+ return ERR_PTR(err);
}
+EXPORT_SYMBOL_GPL(perf_event_create_kernel_counter);
static void sync_child_event(struct perf_event *child_event,
struct task_struct *child)
}
}
-/*
- * When a child task exits, feed back event values to parent events.
- */
-void perf_event_exit_task(struct task_struct *child)
+static void perf_event_exit_task_context(struct task_struct *child, int ctxn)
{
struct perf_event *child_event, *tmp;
struct perf_event_context *child_ctx;
unsigned long flags;
- if (likely(!child->perf_event_ctxp)) {
+ if (likely(!child->perf_event_ctxp[ctxn])) {
perf_event_task(child, NULL, 0);
return;
}
* scheduled, so we are now safe from rescheduling changing
* our context.
*/
- child_ctx = child->perf_event_ctxp;
- __perf_event_task_sched_out(child_ctx);
+ child_ctx = child->perf_event_ctxp[ctxn];
+ task_ctx_sched_out(child_ctx, EVENT_ALL);
/*
* Take the context lock here so that if find_get_context is
* incremented the context's refcount before we do put_ctx below.
*/
raw_spin_lock(&child_ctx->lock);
- child->perf_event_ctxp = NULL;
+ child->perf_event_ctxp[ctxn] = NULL;
/*
* If this context is a clone; unclone it so it can't get
* swapped to another process while we're removing all
put_ctx(child_ctx);
}
+/*
+ * When a child task exits, feed back event values to parent events.
+ */
+void perf_event_exit_task(struct task_struct *child)
+{
+ int ctxn;
+
+ for_each_task_context_nr(ctxn)
+ perf_event_exit_task_context(child, ctxn);
+}
+
static void perf_free_event(struct perf_event *event,
struct perf_event_context *ctx)
{
/*
* free an unexposed, unused context as created by inheritance by
- * init_task below, used by fork() in case of fail.
+ * perf_event_init_task below, used by fork() in case of fail.
*/
void perf_event_free_task(struct task_struct *task)
{
- struct perf_event_context *ctx = task->perf_event_ctxp;
+ struct perf_event_context *ctx;
struct perf_event *event, *tmp;
+ int ctxn;
- if (!ctx)
- return;
+ for_each_task_context_nr(ctxn) {
+ ctx = task->perf_event_ctxp[ctxn];
+ if (!ctx)
+ continue;
- mutex_lock(&ctx->mutex);
+ mutex_lock(&ctx->mutex);
again:
- list_for_each_entry_safe(event, tmp, &ctx->pinned_groups, group_entry)
- perf_free_event(event, ctx);
+ list_for_each_entry_safe(event, tmp, &ctx->pinned_groups,
+ group_entry)
+ perf_free_event(event, ctx);
- list_for_each_entry_safe(event, tmp, &ctx->flexible_groups,
- group_entry)
- perf_free_event(event, ctx);
+ list_for_each_entry_safe(event, tmp, &ctx->flexible_groups,
+ group_entry)
+ perf_free_event(event, ctx);
- if (!list_empty(&ctx->pinned_groups) ||
- !list_empty(&ctx->flexible_groups))
- goto again;
+ if (!list_empty(&ctx->pinned_groups) ||
+ !list_empty(&ctx->flexible_groups))
+ goto again;
- mutex_unlock(&ctx->mutex);
+ mutex_unlock(&ctx->mutex);
- put_ctx(ctx);
+ put_ctx(ctx);
+ }
+}
+
+void perf_event_delayed_put(struct task_struct *task)
+{
+ int ctxn;
+
+ for_each_task_context_nr(ctxn)
+ WARN_ON_ONCE(task->perf_event_ctxp[ctxn]);
+}
+
+/*
+ * inherit a event from parent task to child task:
+ */
+static struct perf_event *
+inherit_event(struct perf_event *parent_event,
+ struct task_struct *parent,
+ struct perf_event_context *parent_ctx,
+ struct task_struct *child,
+ struct perf_event *group_leader,
+ struct perf_event_context *child_ctx)
+{
+ struct perf_event *child_event;
+ unsigned long flags;
+
+ /*
+ * Instead of creating recursive hierarchies of events,
+ * we link inherited events back to the original parent,
+ * which has a filp for sure, which we use as the reference
+ * count:
+ */
+ if (parent_event->parent)
+ parent_event = parent_event->parent;
+
+ child_event = perf_event_alloc(&parent_event->attr,
+ parent_event->cpu,
+ child,
+ group_leader, parent_event,
+ NULL);
+ if (IS_ERR(child_event))
+ return child_event;
+ get_ctx(child_ctx);
+
+ /*
+ * Make the child state follow the state of the parent event,
+ * not its attr.disabled bit. We hold the parent's mutex,
+ * so we won't race with perf_event_{en, dis}able_family.
+ */
+ if (parent_event->state >= PERF_EVENT_STATE_INACTIVE)
+ child_event->state = PERF_EVENT_STATE_INACTIVE;
+ else
+ child_event->state = PERF_EVENT_STATE_OFF;
+
+ if (parent_event->attr.freq) {
+ u64 sample_period = parent_event->hw.sample_period;
+ struct hw_perf_event *hwc = &child_event->hw;
+
+ hwc->sample_period = sample_period;
+ hwc->last_period = sample_period;
+
+ local64_set(&hwc->period_left, sample_period);
+ }
+
+ child_event->ctx = child_ctx;
+ child_event->overflow_handler = parent_event->overflow_handler;
+
+ /*
+ * Link it up in the child's context:
+ */
+ raw_spin_lock_irqsave(&child_ctx->lock, flags);
+ add_event_to_ctx(child_event, child_ctx);
+ raw_spin_unlock_irqrestore(&child_ctx->lock, flags);
+
+ /*
+ * Get a reference to the parent filp - we will fput it
+ * when the child event exits. This is safe to do because
+ * we are in the parent and we know that the filp still
+ * exists and has a nonzero count:
+ */
+ atomic_long_inc(&parent_event->filp->f_count);
+
+ /*
+ * Link this into the parent event's child list
+ */
+ WARN_ON_ONCE(parent_event->ctx->parent_ctx);
+ mutex_lock(&parent_event->child_mutex);
+ list_add_tail(&child_event->child_list, &parent_event->child_list);
+ mutex_unlock(&parent_event->child_mutex);
+
+ return child_event;
+}
+
+static int inherit_group(struct perf_event *parent_event,
+ struct task_struct *parent,
+ struct perf_event_context *parent_ctx,
+ struct task_struct *child,
+ struct perf_event_context *child_ctx)
+{
+ struct perf_event *leader;
+ struct perf_event *sub;
+ struct perf_event *child_ctr;
+
+ leader = inherit_event(parent_event, parent, parent_ctx,
+ child, NULL, child_ctx);
+ if (IS_ERR(leader))
+ return PTR_ERR(leader);
+ list_for_each_entry(sub, &parent_event->sibling_list, group_entry) {
+ child_ctr = inherit_event(sub, parent, parent_ctx,
+ child, leader, child_ctx);
+ if (IS_ERR(child_ctr))
+ return PTR_ERR(child_ctr);
+ }
+ return 0;
}
static int
inherit_task_group(struct perf_event *event, struct task_struct *parent,
struct perf_event_context *parent_ctx,
- struct task_struct *child,
+ struct task_struct *child, int ctxn,
int *inherited_all)
{
int ret;
- struct perf_event_context *child_ctx = child->perf_event_ctxp;
+ struct perf_event_context *child_ctx;
if (!event->attr.inherit) {
*inherited_all = 0;
return 0;
}
+ child_ctx = child->perf_event_ctxp[ctxn];
if (!child_ctx) {
/*
* This is executed from the parent task context, so
* child.
*/
- child_ctx = kzalloc(sizeof(struct perf_event_context),
- GFP_KERNEL);
+ child_ctx = alloc_perf_context(event->pmu, child);
if (!child_ctx)
return -ENOMEM;
- __perf_event_init_context(child_ctx, child);
- child->perf_event_ctxp = child_ctx;
- get_task_struct(child);
+ child->perf_event_ctxp[ctxn] = child_ctx;
}
ret = inherit_group(event, parent, parent_ctx,
return ret;
}
-
/*
* Initialize the perf_event context in task_struct
*/
-int perf_event_init_task(struct task_struct *child)
+int perf_event_init_context(struct task_struct *child, int ctxn)
{
struct perf_event_context *child_ctx, *parent_ctx;
struct perf_event_context *cloned_ctx;
int inherited_all = 1;
int ret = 0;
- child->perf_event_ctxp = NULL;
+ child->perf_event_ctxp[ctxn] = NULL;
mutex_init(&child->perf_event_mutex);
INIT_LIST_HEAD(&child->perf_event_list);
- if (likely(!parent->perf_event_ctxp))
+ if (likely(!parent->perf_event_ctxp[ctxn]))
return 0;
/*
* If the parent's context is a clone, pin it so it won't get
* swapped under us.
*/
- parent_ctx = perf_pin_task_context(parent);
+ parent_ctx = perf_pin_task_context(parent, ctxn);
/*
* No need to check if parent_ctx != NULL here; since we saw
* the list, not manipulating it:
*/
list_for_each_entry(event, &parent_ctx->pinned_groups, group_entry) {
- ret = inherit_task_group(event, parent, parent_ctx, child,
- &inherited_all);
+ ret = inherit_task_group(event, parent, parent_ctx,
+ child, ctxn, &inherited_all);
if (ret)
break;
}
list_for_each_entry(event, &parent_ctx->flexible_groups, group_entry) {
- ret = inherit_task_group(event, parent, parent_ctx, child,
- &inherited_all);
+ ret = inherit_task_group(event, parent, parent_ctx,
+ child, ctxn, &inherited_all);
if (ret)
break;
}
- child_ctx = child->perf_event_ctxp;
+ child_ctx = child->perf_event_ctxp[ctxn];
if (child_ctx && inherited_all) {
/*
return ret;
}
+/*
+ * Initialize the perf_event context in task_struct
+ */
+int perf_event_init_task(struct task_struct *child)
+{
+ int ctxn, ret;
+
+ for_each_task_context_nr(ctxn) {
+ ret = perf_event_init_context(child, ctxn);
+ if (ret)
+ return ret;
+ }
+
+ return 0;
+}
+
static void __init perf_event_init_all_cpus(void)
{
+ struct swevent_htable *swhash;
int cpu;
- struct perf_cpu_context *cpuctx;
for_each_possible_cpu(cpu) {
- cpuctx = &per_cpu(perf_cpu_context, cpu);
- mutex_init(&cpuctx->hlist_mutex);
- __perf_event_init_context(&cpuctx->ctx, NULL);
+ swhash = &per_cpu(swevent_htable, cpu);
+ mutex_init(&swhash->hlist_mutex);
+ INIT_LIST_HEAD(&per_cpu(rotation_list, cpu));
}
}
static void __cpuinit perf_event_init_cpu(int cpu)
{
- struct perf_cpu_context *cpuctx;
-
- cpuctx = &per_cpu(perf_cpu_context, cpu);
+ struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
- spin_lock(&perf_resource_lock);
- cpuctx->max_pertask = perf_max_events - perf_reserved_percpu;
- spin_unlock(&perf_resource_lock);
-
- mutex_lock(&cpuctx->hlist_mutex);
- if (cpuctx->hlist_refcount > 0) {
+ mutex_lock(&swhash->hlist_mutex);
+ if (swhash->hlist_refcount > 0) {
struct swevent_hlist *hlist;
- hlist = kzalloc(sizeof(*hlist), GFP_KERNEL);
- WARN_ON_ONCE(!hlist);
- rcu_assign_pointer(cpuctx->swevent_hlist, hlist);
+ hlist = kzalloc_node(sizeof(*hlist), GFP_KERNEL, cpu_to_node(cpu));
+ WARN_ON(!hlist);
+ rcu_assign_pointer(swhash->swevent_hlist, hlist);
}
- mutex_unlock(&cpuctx->hlist_mutex);
+ mutex_unlock(&swhash->hlist_mutex);
}
#ifdef CONFIG_HOTPLUG_CPU
-static void __perf_event_exit_cpu(void *info)
+static void perf_pmu_rotate_stop(struct pmu *pmu)
{
- struct perf_cpu_context *cpuctx = &__get_cpu_var(perf_cpu_context);
- struct perf_event_context *ctx = &cpuctx->ctx;
+ struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+
+ WARN_ON(!irqs_disabled());
+
+ list_del_init(&cpuctx->rotation_list);
+}
+
+static void __perf_event_exit_context(void *__info)
+{
+ struct perf_event_context *ctx = __info;
struct perf_event *event, *tmp;
+ perf_pmu_rotate_stop(ctx->pmu);
+
list_for_each_entry_safe(event, tmp, &ctx->pinned_groups, group_entry)
__perf_event_remove_from_context(event);
list_for_each_entry_safe(event, tmp, &ctx->flexible_groups, group_entry)
__perf_event_remove_from_context(event);
}
+
+static void perf_event_exit_cpu_context(int cpu)
+{
+ struct perf_event_context *ctx;
+ struct pmu *pmu;
+ int idx;
+
+ idx = srcu_read_lock(&pmus_srcu);
+ list_for_each_entry_rcu(pmu, &pmus, entry) {
+ ctx = &per_cpu_ptr(pmu->pmu_cpu_context, cpu)->ctx;
+
+ mutex_lock(&ctx->mutex);
+ smp_call_function_single(cpu, __perf_event_exit_context, ctx, 1);
+ mutex_unlock(&ctx->mutex);
+ }
+ srcu_read_unlock(&pmus_srcu, idx);
+}
+
static void perf_event_exit_cpu(int cpu)
{
- struct perf_cpu_context *cpuctx = &per_cpu(perf_cpu_context, cpu);
- struct perf_event_context *ctx = &cpuctx->ctx;
+ struct swevent_htable *swhash = &per_cpu(swevent_htable, cpu);
- mutex_lock(&cpuctx->hlist_mutex);
- swevent_hlist_release(cpuctx);
- mutex_unlock(&cpuctx->hlist_mutex);
+ mutex_lock(&swhash->hlist_mutex);
+ swevent_hlist_release(swhash);
+ mutex_unlock(&swhash->hlist_mutex);
- mutex_lock(&ctx->mutex);
- smp_call_function_single(cpu, __perf_event_exit_cpu, NULL, 1);
- mutex_unlock(&ctx->mutex);
+ perf_event_exit_cpu_context(cpu);
}
#else
static inline void perf_event_exit_cpu(int cpu) { }
{
unsigned int cpu = (long)hcpu;
- switch (action) {
+ switch (action & ~CPU_TASKS_FROZEN) {
case CPU_UP_PREPARE:
- case CPU_UP_PREPARE_FROZEN:
+ case CPU_DOWN_FAILED:
perf_event_init_cpu(cpu);
break;
+ case CPU_UP_CANCELED:
case CPU_DOWN_PREPARE:
- case CPU_DOWN_PREPARE_FROZEN:
perf_event_exit_cpu(cpu);
break;
return NOTIFY_OK;
}
-/*
- * This has to have a higher priority than migration_notifier in sched.c.
- */
-static struct notifier_block __cpuinitdata perf_cpu_nb = {
- .notifier_call = perf_cpu_notify,
- .priority = 20,
-};
-
void __init perf_event_init(void)
{
perf_event_init_all_cpus();
- perf_cpu_notify(&perf_cpu_nb, (unsigned long)CPU_UP_PREPARE,
- (void *)(long)smp_processor_id());
- perf_cpu_notify(&perf_cpu_nb, (unsigned long)CPU_ONLINE,
- (void *)(long)smp_processor_id());
- register_cpu_notifier(&perf_cpu_nb);
-}
-
-static ssize_t perf_show_reserve_percpu(struct sysdev_class *class,
- struct sysdev_class_attribute *attr,
- char *buf)
-{
- return sprintf(buf, "%d\n", perf_reserved_percpu);
-}
-
-static ssize_t
-perf_set_reserve_percpu(struct sysdev_class *class,
- struct sysdev_class_attribute *attr,
- const char *buf,
- size_t count)
-{
- struct perf_cpu_context *cpuctx;
- unsigned long val;
- int err, cpu, mpt;
-
- err = strict_strtoul(buf, 10, &val);
- if (err)
- return err;
- if (val > perf_max_events)
- return -EINVAL;
-
- spin_lock(&perf_resource_lock);
- perf_reserved_percpu = val;
- for_each_online_cpu(cpu) {
- cpuctx = &per_cpu(perf_cpu_context, cpu);
- raw_spin_lock_irq(&cpuctx->ctx.lock);
- mpt = min(perf_max_events - cpuctx->ctx.nr_events,
- perf_max_events - perf_reserved_percpu);
- cpuctx->max_pertask = mpt;
- raw_spin_unlock_irq(&cpuctx->ctx.lock);
- }
- spin_unlock(&perf_resource_lock);
-
- return count;
-}
-
-static ssize_t perf_show_overcommit(struct sysdev_class *class,
- struct sysdev_class_attribute *attr,
- char *buf)
-{
- return sprintf(buf, "%d\n", perf_overcommit);
-}
-
-static ssize_t
-perf_set_overcommit(struct sysdev_class *class,
- struct sysdev_class_attribute *attr,
- const char *buf, size_t count)
-{
- unsigned long val;
- int err;
-
- err = strict_strtoul(buf, 10, &val);
- if (err)
- return err;
- if (val > 1)
- return -EINVAL;
-
- spin_lock(&perf_resource_lock);
- perf_overcommit = val;
- spin_unlock(&perf_resource_lock);
-
- return count;
-}
-
-static SYSDEV_CLASS_ATTR(
- reserve_percpu,
- 0644,
- perf_show_reserve_percpu,
- perf_set_reserve_percpu
- );
-
-static SYSDEV_CLASS_ATTR(
- overcommit,
- 0644,
- perf_show_overcommit,
- perf_set_overcommit
- );
-
-static struct attribute *perfclass_attrs[] = {
- &attr_reserve_percpu.attr,
- &attr_overcommit.attr,
- NULL
-};
-
-static struct attribute_group perfclass_attr_group = {
- .attrs = perfclass_attrs,
- .name = "perf_events",
-};
-
-static int __init perf_event_sysfs_init(void)
-{
- return sysfs_create_group(&cpu_sysdev_class.kset.kobj,
- &perfclass_attr_group);
+ init_srcu_struct(&pmus_srcu);
+ perf_pmu_register(&perf_swevent);
+ perf_pmu_register(&perf_cpu_clock);
+ perf_pmu_register(&perf_task_clock);
+ perf_tp_register();
+ perf_cpu_notifier(perf_cpu_notify);
}
-device_initcall(perf_event_sysfs_init);
struct task_struct *result = NULL;
if (pid) {
struct hlist_node *first;
- first = rcu_dereference_check(pid->tasks[type].first,
+ first = rcu_dereference_check(hlist_first_rcu(&pid->tasks[type]),
rcu_read_lock_held() ||
lockdep_tasklist_lock_is_held());
if (first)
*/
struct task_struct *find_task_by_pid_ns(pid_t nr, struct pid_namespace *ns)
{
+ rcu_lockdep_assert(rcu_read_lock_held());
return pid_task(find_pid_ns(nr, ns), PIDTYPE_PID);
}
} else if (count == 11) { /* len('0x12345678/0') */
if (copy_from_user(ascii_value, buf, 11))
return -EFAULT;
+ if (strlen(ascii_value) != 10)
+ return -EINVAL;
x = sscanf(ascii_value, "%x", &value);
if (x != 1)
return -EINVAL;
- pr_debug(KERN_ERR "%s, %d, 0x%x\n", ascii_value, x, value);
+ pr_debug("%s, %d, 0x%x\n", ascii_value, x, value);
} else
return -EINVAL;
goto Close;
suspend_console();
- hibernation_freeze_swap();
saved_mask = clear_gfp_allowed_mask(GFP_IOFS);
error = dpm_suspend_start(PMSG_FREEZE);
if (error)
buffer = NULL;
alloc_normal = 0;
alloc_highmem = 0;
- hibernation_thaw_swap();
}
/* Helper functions used for the shrinking of memory. */
return nr_alloc;
}
-static unsigned long preallocate_image_memory(unsigned long nr_pages)
+static unsigned long preallocate_image_memory(unsigned long nr_pages,
+ unsigned long avail_normal)
{
- return preallocate_image_pages(nr_pages, GFP_IMAGE);
+ unsigned long alloc;
+
+ if (avail_normal <= alloc_normal)
+ return 0;
+
+ alloc = avail_normal - alloc_normal;
+ if (nr_pages < alloc)
+ alloc = nr_pages;
+
+ return preallocate_image_pages(alloc, GFP_IMAGE);
}
#ifdef CONFIG_HIGHMEM
*/
static void free_unnecessary_pages(void)
{
- unsigned long save_highmem, to_free_normal, to_free_highmem;
+ unsigned long save, to_free_normal, to_free_highmem;
- to_free_normal = alloc_normal - count_data_pages();
- save_highmem = count_highmem_pages();
- if (alloc_highmem > save_highmem) {
- to_free_highmem = alloc_highmem - save_highmem;
+ save = count_data_pages();
+ if (alloc_normal >= save) {
+ to_free_normal = alloc_normal - save;
+ save = 0;
+ } else {
+ to_free_normal = 0;
+ save -= alloc_normal;
+ }
+ save += count_highmem_pages();
+ if (alloc_highmem >= save) {
+ to_free_highmem = alloc_highmem - save;
} else {
to_free_highmem = 0;
- to_free_normal -= save_highmem - alloc_highmem;
+ to_free_normal -= save - alloc_highmem;
}
memory_bm_position_reset(©_bm);
{
struct zone *zone;
unsigned long saveable, size, max_size, count, highmem, pages = 0;
- unsigned long alloc, save_highmem, pages_highmem;
+ unsigned long alloc, save_highmem, pages_highmem, avail_normal;
struct timeval start, stop;
int error;
else
count += zone_page_state(zone, NR_FREE_PAGES);
}
+ avail_normal = count;
count += highmem;
count -= totalreserve_pages;
*/
if (size >= saveable) {
pages = preallocate_image_highmem(save_highmem);
- pages += preallocate_image_memory(saveable - pages);
+ pages += preallocate_image_memory(saveable - pages, avail_normal);
goto out;
}
/* Estimate the minimum size of the image. */
pages = minimum_image_size(saveable);
+ /*
+ * To avoid excessive pressure on the normal zone, leave room in it to
+ * accommodate an image of the minimum size (unless it's already too
+ * small, in which case don't preallocate pages from it at all).
+ */
+ if (avail_normal > pages)
+ avail_normal -= pages;
+ else
+ avail_normal = 0;
if (size < pages)
size = min_t(unsigned long, pages, max_size);
*/
pages_highmem = preallocate_image_highmem(highmem / 2);
alloc = (count - max_size) - pages_highmem;
- pages = preallocate_image_memory(alloc);
- if (pages < alloc)
- goto err_out;
- size = max_size - size;
- alloc = size;
- size = preallocate_highmem_fraction(size, highmem, count);
- pages_highmem += size;
- alloc -= size;
- pages += preallocate_image_memory(alloc);
- pages += pages_highmem;
+ pages = preallocate_image_memory(alloc, avail_normal);
+ if (pages < alloc) {
+ /* We have exhausted non-highmem pages, try highmem. */
+ alloc -= pages;
+ pages += pages_highmem;
+ pages_highmem = preallocate_image_highmem(alloc);
+ if (pages_highmem < alloc)
+ goto err_out;
+ pages += pages_highmem;
+ /*
+ * size is the desired number of saveable pages to leave in
+ * memory, so try to preallocate (all memory - size) pages.
+ */
+ alloc = (count - pages) - size;
+ pages += preallocate_image_highmem(alloc);
+ } else {
+ /*
+ * There are approximately max_size saveable pages at this point
+ * and we want to reduce this number down to size.
+ */
+ alloc = max_size - size;
+ size = preallocate_highmem_fraction(alloc, highmem, count);
+ pages_highmem += size;
+ alloc -= size;
+ size = preallocate_image_memory(alloc, avail_normal);
+ pages_highmem += preallocate_image_highmem(alloc - size);
+ pages += pages_highmem + size;
+ }
/*
* We only need as many page frames for the image as there are saveable
{
unsigned long offset;
- offset = swp_offset(get_swap_for_hibernation(swap));
+ offset = swp_offset(get_swap_page_of_type(swap));
if (offset) {
if (swsusp_extents_insert(offset))
- swap_free_for_hibernation(swp_entry(swap, offset));
+ swap_free(swp_entry(swap, offset));
else
return swapdev_block(swap, offset);
}
ext = container_of(node, struct swsusp_extent, node);
rb_erase(node, &swsusp_extents);
for (offset = ext->start; offset <= ext->end; offset++)
- swap_free_for_hibernation(swp_entry(swap, offset));
+ swap_free(swp_entry(swap, offset));
kfree(ext);
}
* provides serialisation for access to the entire console
* driver system.
*/
-static DECLARE_MUTEX(console_sem);
+static DEFINE_SEMAPHORE(console_sem);
struct console *console_drivers;
EXPORT_SYMBOL_GPL(console_drivers);
/* If a crash is occurring, make sure we can't deadlock */
spin_lock_init(&logbuf_lock);
/* And make sure that we print immediately */
- init_MUTEX(&console_sem);
+ sema_init(&console_sem, 1);
}
#if defined(CONFIG_PRINTK_TIME)
EXPORT_SYMBOL_GPL(debug_lockdep_rcu_enabled);
/**
- * rcu_read_lock_bh_held - might we be in RCU-bh read-side critical section?
+ * rcu_read_lock_bh_held() - might we be in RCU-bh read-side critical section?
*
* Check for bottom half being disabled, which covers both the
* CONFIG_PROVE_RCU and not cases. Note that if someone uses
* rcu_read_lock_bh(), but then later enables BH, lockdep (if enabled)
- * will show the situation.
+ * will show the situation. This is useful for debug checks in functions
+ * that require that they be called within an RCU read-side critical
+ * section.
*
* Check debug_lockdep_rcu_enabled() to prevent false positives during boot.
*/
{
if (!debug_lockdep_rcu_enabled())
return 1;
- return in_softirq();
+ return in_softirq() || irqs_disabled();
}
EXPORT_SYMBOL_GPL(rcu_read_lock_bh_held);
EXPORT_SYMBOL_GPL(rcu_scheduler_active);
#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
+/* Forward declarations for rcutiny_plugin.h. */
+static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp);
+static void __call_rcu(struct rcu_head *head,
+ void (*func)(struct rcu_head *rcu),
+ struct rcu_ctrlblk *rcp);
+
+#include "rcutiny_plugin.h"
+
#ifdef CONFIG_NO_HZ
static long rcu_dynticks_nesting = 1;
rcu_sched_qs(cpu);
else if (!in_softirq())
rcu_bh_qs(cpu);
+ rcu_preempt_check_callbacks();
}
/*
*rcp->donetail = NULL;
if (rcp->curtail == rcp->donetail)
rcp->curtail = &rcp->rcucblist;
+ rcu_preempt_remove_callbacks(rcp);
rcp->donetail = &rcp->rcucblist;
local_irq_restore(flags);
{
__rcu_process_callbacks(&rcu_sched_ctrlblk);
__rcu_process_callbacks(&rcu_bh_ctrlblk);
+ rcu_preempt_process_callbacks();
}
/*
}
/*
- * Post an RCU callback to be invoked after the end of an RCU grace
+ * Post an RCU callback to be invoked after the end of an RCU-sched grace
* period. But since we have but one CPU, that would be after any
* quiescent state.
*/
-void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+void call_rcu_sched(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
{
__call_rcu(head, func, &rcu_sched_ctrlblk);
}
-EXPORT_SYMBOL_GPL(call_rcu);
+EXPORT_SYMBOL_GPL(call_rcu_sched);
/*
* Post an RCU bottom-half callback to be invoked after any subsequent
}
EXPORT_SYMBOL_GPL(call_rcu_bh);
-void rcu_barrier(void)
-{
- struct rcu_synchronize rcu;
-
- init_rcu_head_on_stack(&rcu.head);
- init_completion(&rcu.completion);
- /* Will wake me after RCU finished. */
- call_rcu(&rcu.head, wakeme_after_rcu);
- /* Wait for it. */
- wait_for_completion(&rcu.completion);
- destroy_rcu_head_on_stack(&rcu.head);
-}
-EXPORT_SYMBOL_GPL(rcu_barrier);
-
void rcu_barrier_bh(void)
{
struct rcu_synchronize rcu;
{
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
}
-
-#include "rcutiny_plugin.h"
/*
- * Read-Copy Update mechanism for mutual exclusion (tree-based version)
+ * Read-Copy Update mechanism for mutual exclusion, the Bloatwatch edition
* Internal non-public definitions that provide either classic
- * or preemptable semantics.
+ * or preemptible semantics.
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*
- * Copyright IBM Corporation, 2009
+ * Copyright (c) 2010 Linaro
*
* Author: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
*/
+#ifdef CONFIG_TINY_PREEMPT_RCU
+
+#include <linux/delay.h>
+
+/* Global control variables for preemptible RCU. */
+struct rcu_preempt_ctrlblk {
+ struct rcu_ctrlblk rcb; /* curtail: ->next ptr of last CB for GP. */
+ struct rcu_head **nexttail;
+ /* Tasks blocked in a preemptible RCU */
+ /* read-side critical section while an */
+ /* preemptible-RCU grace period is in */
+ /* progress must wait for a later grace */
+ /* period. This pointer points to the */
+ /* ->next pointer of the last task that */
+ /* must wait for a later grace period, or */
+ /* to &->rcb.rcucblist if there is no */
+ /* such task. */
+ struct list_head blkd_tasks;
+ /* Tasks blocked in RCU read-side critical */
+ /* section. Tasks are placed at the head */
+ /* of this list and age towards the tail. */
+ struct list_head *gp_tasks;
+ /* Pointer to the first task blocking the */
+ /* current grace period, or NULL if there */
+ /* is not such task. */
+ struct list_head *exp_tasks;
+ /* Pointer to first task blocking the */
+ /* current expedited grace period, or NULL */
+ /* if there is no such task. If there */
+ /* is no current expedited grace period, */
+ /* then there cannot be any such task. */
+ u8 gpnum; /* Current grace period. */
+ u8 gpcpu; /* Last grace period blocked by the CPU. */
+ u8 completed; /* Last grace period completed. */
+ /* If all three are equal, RCU is idle. */
+};
+
+static struct rcu_preempt_ctrlblk rcu_preempt_ctrlblk = {
+ .rcb.donetail = &rcu_preempt_ctrlblk.rcb.rcucblist,
+ .rcb.curtail = &rcu_preempt_ctrlblk.rcb.rcucblist,
+ .nexttail = &rcu_preempt_ctrlblk.rcb.rcucblist,
+ .blkd_tasks = LIST_HEAD_INIT(rcu_preempt_ctrlblk.blkd_tasks),
+};
+
+static int rcu_preempted_readers_exp(void);
+static void rcu_report_exp_done(void);
+
+/*
+ * Return true if the CPU has not yet responded to the current grace period.
+ */
+static int rcu_cpu_blocking_cur_gp(void)
+{
+ return rcu_preempt_ctrlblk.gpcpu != rcu_preempt_ctrlblk.gpnum;
+}
+
+/*
+ * Check for a running RCU reader. Because there is only one CPU,
+ * there can be but one running RCU reader at a time. ;-)
+ */
+static int rcu_preempt_running_reader(void)
+{
+ return current->rcu_read_lock_nesting;
+}
+
+/*
+ * Check for preempted RCU readers blocking any grace period.
+ * If the caller needs a reliable answer, it must disable hard irqs.
+ */
+static int rcu_preempt_blocked_readers_any(void)
+{
+ return !list_empty(&rcu_preempt_ctrlblk.blkd_tasks);
+}
+
+/*
+ * Check for preempted RCU readers blocking the current grace period.
+ * If the caller needs a reliable answer, it must disable hard irqs.
+ */
+static int rcu_preempt_blocked_readers_cgp(void)
+{
+ return rcu_preempt_ctrlblk.gp_tasks != NULL;
+}
+
+/*
+ * Return true if another preemptible-RCU grace period is needed.
+ */
+static int rcu_preempt_needs_another_gp(void)
+{
+ return *rcu_preempt_ctrlblk.rcb.curtail != NULL;
+}
+
+/*
+ * Return true if a preemptible-RCU grace period is in progress.
+ * The caller must disable hardirqs.
+ */
+static int rcu_preempt_gp_in_progress(void)
+{
+ return rcu_preempt_ctrlblk.completed != rcu_preempt_ctrlblk.gpnum;
+}
+
+/*
+ * Record a preemptible-RCU quiescent state for the specified CPU. Note
+ * that this just means that the task currently running on the CPU is
+ * in a quiescent state. There might be any number of tasks blocked
+ * while in an RCU read-side critical section.
+ *
+ * Unlike the other rcu_*_qs() functions, callers to this function
+ * must disable irqs in order to protect the assignment to
+ * ->rcu_read_unlock_special.
+ *
+ * Because this is a single-CPU implementation, the only way a grace
+ * period can end is if the CPU is in a quiescent state. The reason is
+ * that a blocked preemptible-RCU reader can exit its critical section
+ * only if the CPU is running it at the time. Therefore, when the
+ * last task blocking the current grace period exits its RCU read-side
+ * critical section, neither the CPU nor blocked tasks will be stopping
+ * the current grace period. (In contrast, SMP implementations
+ * might have CPUs running in RCU read-side critical sections that
+ * block later grace periods -- but this is not possible given only
+ * one CPU.)
+ */
+static void rcu_preempt_cpu_qs(void)
+{
+ /* Record both CPU and task as having responded to current GP. */
+ rcu_preempt_ctrlblk.gpcpu = rcu_preempt_ctrlblk.gpnum;
+ current->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_NEED_QS;
+
+ /*
+ * If there is no GP, or if blocked readers are still blocking GP,
+ * then there is nothing more to do.
+ */
+ if (!rcu_preempt_gp_in_progress() || rcu_preempt_blocked_readers_cgp())
+ return;
+
+ /* Advance callbacks. */
+ rcu_preempt_ctrlblk.completed = rcu_preempt_ctrlblk.gpnum;
+ rcu_preempt_ctrlblk.rcb.donetail = rcu_preempt_ctrlblk.rcb.curtail;
+ rcu_preempt_ctrlblk.rcb.curtail = rcu_preempt_ctrlblk.nexttail;
+
+ /* If there are no blocked readers, next GP is done instantly. */
+ if (!rcu_preempt_blocked_readers_any())
+ rcu_preempt_ctrlblk.rcb.donetail = rcu_preempt_ctrlblk.nexttail;
+
+ /* If there are done callbacks, make RCU_SOFTIRQ process them. */
+ if (*rcu_preempt_ctrlblk.rcb.donetail != NULL)
+ raise_softirq(RCU_SOFTIRQ);
+}
+
+/*
+ * Start a new RCU grace period if warranted. Hard irqs must be disabled.
+ */
+static void rcu_preempt_start_gp(void)
+{
+ if (!rcu_preempt_gp_in_progress() && rcu_preempt_needs_another_gp()) {
+
+ /* Official start of GP. */
+ rcu_preempt_ctrlblk.gpnum++;
+
+ /* Any blocked RCU readers block new GP. */
+ if (rcu_preempt_blocked_readers_any())
+ rcu_preempt_ctrlblk.gp_tasks =
+ rcu_preempt_ctrlblk.blkd_tasks.next;
+
+ /* If there is no running reader, CPU is done with GP. */
+ if (!rcu_preempt_running_reader())
+ rcu_preempt_cpu_qs();
+ }
+}
+
+/*
+ * We have entered the scheduler, and the current task might soon be
+ * context-switched away from. If this task is in an RCU read-side
+ * critical section, we will no longer be able to rely on the CPU to
+ * record that fact, so we enqueue the task on the blkd_tasks list.
+ * If the task started after the current grace period began, as recorded
+ * by ->gpcpu, we enqueue at the beginning of the list. Otherwise
+ * before the element referenced by ->gp_tasks (or at the tail if
+ * ->gp_tasks is NULL) and point ->gp_tasks at the newly added element.
+ * The task will dequeue itself when it exits the outermost enclosing
+ * RCU read-side critical section. Therefore, the current grace period
+ * cannot be permitted to complete until the ->gp_tasks pointer becomes
+ * NULL.
+ *
+ * Caller must disable preemption.
+ */
+void rcu_preempt_note_context_switch(void)
+{
+ struct task_struct *t = current;
+ unsigned long flags;
+
+ local_irq_save(flags); /* must exclude scheduler_tick(). */
+ if (rcu_preempt_running_reader() &&
+ (t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
+
+ /* Possibly blocking in an RCU read-side critical section. */
+ t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED;
+
+ /*
+ * If this CPU has already checked in, then this task
+ * will hold up the next grace period rather than the
+ * current grace period. Queue the task accordingly.
+ * If the task is queued for the current grace period
+ * (i.e., this CPU has not yet passed through a quiescent
+ * state for the current grace period), then as long
+ * as that task remains queued, the current grace period
+ * cannot end.
+ */
+ list_add(&t->rcu_node_entry, &rcu_preempt_ctrlblk.blkd_tasks);
+ if (rcu_cpu_blocking_cur_gp())
+ rcu_preempt_ctrlblk.gp_tasks = &t->rcu_node_entry;
+ }
+
+ /*
+ * Either we were not in an RCU read-side critical section to
+ * begin with, or we have now recorded that critical section
+ * globally. Either way, we can now note a quiescent state
+ * for this CPU. Again, if we were in an RCU read-side critical
+ * section, and if that critical section was blocking the current
+ * grace period, then the fact that the task has been enqueued
+ * means that current grace period continues to be blocked.
+ */
+ rcu_preempt_cpu_qs();
+ local_irq_restore(flags);
+}
+
+/*
+ * Tiny-preemptible RCU implementation for rcu_read_lock().
+ * Just increment ->rcu_read_lock_nesting, shared state will be updated
+ * if we block.
+ */
+void __rcu_read_lock(void)
+{
+ current->rcu_read_lock_nesting++;
+ barrier(); /* needed if we ever invoke rcu_read_lock in rcutiny.c */
+}
+EXPORT_SYMBOL_GPL(__rcu_read_lock);
+
+/*
+ * Handle special cases during rcu_read_unlock(), such as needing to
+ * notify RCU core processing or task having blocked during the RCU
+ * read-side critical section.
+ */
+static void rcu_read_unlock_special(struct task_struct *t)
+{
+ int empty;
+ int empty_exp;
+ unsigned long flags;
+ struct list_head *np;
+ int special;
+
+ /*
+ * NMI handlers cannot block and cannot safely manipulate state.
+ * They therefore cannot possibly be special, so just leave.
+ */
+ if (in_nmi())
+ return;
+
+ local_irq_save(flags);
+
+ /*
+ * If RCU core is waiting for this CPU to exit critical section,
+ * let it know that we have done so.
+ */
+ special = t->rcu_read_unlock_special;
+ if (special & RCU_READ_UNLOCK_NEED_QS)
+ rcu_preempt_cpu_qs();
+
+ /* Hardware IRQ handlers cannot block. */
+ if (in_irq()) {
+ local_irq_restore(flags);
+ return;
+ }
+
+ /* Clean up if blocked during RCU read-side critical section. */
+ if (special & RCU_READ_UNLOCK_BLOCKED) {
+ t->rcu_read_unlock_special &= ~RCU_READ_UNLOCK_BLOCKED;
+
+ /*
+ * Remove this task from the ->blkd_tasks list and adjust
+ * any pointers that might have been referencing it.
+ */
+ empty = !rcu_preempt_blocked_readers_cgp();
+ empty_exp = rcu_preempt_ctrlblk.exp_tasks == NULL;
+ np = t->rcu_node_entry.next;
+ if (np == &rcu_preempt_ctrlblk.blkd_tasks)
+ np = NULL;
+ list_del(&t->rcu_node_entry);
+ if (&t->rcu_node_entry == rcu_preempt_ctrlblk.gp_tasks)
+ rcu_preempt_ctrlblk.gp_tasks = np;
+ if (&t->rcu_node_entry == rcu_preempt_ctrlblk.exp_tasks)
+ rcu_preempt_ctrlblk.exp_tasks = np;
+ INIT_LIST_HEAD(&t->rcu_node_entry);
+
+ /*
+ * If this was the last task on the current list, and if
+ * we aren't waiting on the CPU, report the quiescent state
+ * and start a new grace period if needed.
+ */
+ if (!empty && !rcu_preempt_blocked_readers_cgp()) {
+ rcu_preempt_cpu_qs();
+ rcu_preempt_start_gp();
+ }
+
+ /*
+ * If this was the last task on the expedited lists,
+ * then we need wake up the waiting task.
+ */
+ if (!empty_exp && rcu_preempt_ctrlblk.exp_tasks == NULL)
+ rcu_report_exp_done();
+ }
+ local_irq_restore(flags);
+}
+
+/*
+ * Tiny-preemptible RCU implementation for rcu_read_unlock().
+ * Decrement ->rcu_read_lock_nesting. If the result is zero (outermost
+ * rcu_read_unlock()) and ->rcu_read_unlock_special is non-zero, then
+ * invoke rcu_read_unlock_special() to clean up after a context switch
+ * in an RCU read-side critical section and other special cases.
+ */
+void __rcu_read_unlock(void)
+{
+ struct task_struct *t = current;
+
+ barrier(); /* needed if we ever invoke rcu_read_unlock in rcutiny.c */
+ --t->rcu_read_lock_nesting;
+ barrier(); /* decrement before load of ->rcu_read_unlock_special */
+ if (t->rcu_read_lock_nesting == 0 &&
+ unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
+ rcu_read_unlock_special(t);
+#ifdef CONFIG_PROVE_LOCKING
+ WARN_ON_ONCE(t->rcu_read_lock_nesting < 0);
+#endif /* #ifdef CONFIG_PROVE_LOCKING */
+}
+EXPORT_SYMBOL_GPL(__rcu_read_unlock);
+
+/*
+ * Check for a quiescent state from the current CPU. When a task blocks,
+ * the task is recorded in the rcu_preempt_ctrlblk structure, which is
+ * checked elsewhere. This is called from the scheduling-clock interrupt.
+ *
+ * Caller must disable hard irqs.
+ */
+static void rcu_preempt_check_callbacks(void)
+{
+ struct task_struct *t = current;
+
+ if (rcu_preempt_gp_in_progress() &&
+ (!rcu_preempt_running_reader() ||
+ !rcu_cpu_blocking_cur_gp()))
+ rcu_preempt_cpu_qs();
+ if (&rcu_preempt_ctrlblk.rcb.rcucblist !=
+ rcu_preempt_ctrlblk.rcb.donetail)
+ raise_softirq(RCU_SOFTIRQ);
+ if (rcu_preempt_gp_in_progress() &&
+ rcu_cpu_blocking_cur_gp() &&
+ rcu_preempt_running_reader())
+ t->rcu_read_unlock_special |= RCU_READ_UNLOCK_NEED_QS;
+}
+
+/*
+ * TINY_PREEMPT_RCU has an extra callback-list tail pointer to
+ * update, so this is invoked from __rcu_process_callbacks() to
+ * handle that case. Of course, it is invoked for all flavors of
+ * RCU, but RCU callbacks can appear only on one of the lists, and
+ * neither ->nexttail nor ->donetail can possibly be NULL, so there
+ * is no need for an explicit check.
+ */
+static void rcu_preempt_remove_callbacks(struct rcu_ctrlblk *rcp)
+{
+ if (rcu_preempt_ctrlblk.nexttail == rcp->donetail)
+ rcu_preempt_ctrlblk.nexttail = &rcp->rcucblist;
+}
+
+/*
+ * Process callbacks for preemptible RCU.
+ */
+static void rcu_preempt_process_callbacks(void)
+{
+ __rcu_process_callbacks(&rcu_preempt_ctrlblk.rcb);
+}
+
+/*
+ * Queue a preemptible -RCU callback for invocation after a grace period.
+ */
+void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
+{
+ unsigned long flags;
+
+ debug_rcu_head_queue(head);
+ head->func = func;
+ head->next = NULL;
+
+ local_irq_save(flags);
+ *rcu_preempt_ctrlblk.nexttail = head;
+ rcu_preempt_ctrlblk.nexttail = &head->next;
+ rcu_preempt_start_gp(); /* checks to see if GP needed. */
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(call_rcu);
+
+void rcu_barrier(void)
+{
+ struct rcu_synchronize rcu;
+
+ init_rcu_head_on_stack(&rcu.head);
+ init_completion(&rcu.completion);
+ /* Will wake me after RCU finished. */
+ call_rcu(&rcu.head, wakeme_after_rcu);
+ /* Wait for it. */
+ wait_for_completion(&rcu.completion);
+ destroy_rcu_head_on_stack(&rcu.head);
+}
+EXPORT_SYMBOL_GPL(rcu_barrier);
+
+/*
+ * synchronize_rcu - wait until a grace period has elapsed.
+ *
+ * Control will return to the caller some time after a full grace
+ * period has elapsed, in other words after all currently executing RCU
+ * read-side critical sections have completed. RCU read-side critical
+ * sections are delimited by rcu_read_lock() and rcu_read_unlock(),
+ * and may be nested.
+ */
+void synchronize_rcu(void)
+{
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+ if (!rcu_scheduler_active)
+ return;
+#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
+
+ WARN_ON_ONCE(rcu_preempt_running_reader());
+ if (!rcu_preempt_blocked_readers_any())
+ return;
+
+ /* Once we get past the fastpath checks, same code as rcu_barrier(). */
+ rcu_barrier();
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu);
+
+static DECLARE_WAIT_QUEUE_HEAD(sync_rcu_preempt_exp_wq);
+static unsigned long sync_rcu_preempt_exp_count;
+static DEFINE_MUTEX(sync_rcu_preempt_exp_mutex);
+
+/*
+ * Return non-zero if there are any tasks in RCU read-side critical
+ * sections blocking the current preemptible-RCU expedited grace period.
+ * If there is no preemptible-RCU expedited grace period currently in
+ * progress, returns zero unconditionally.
+ */
+static int rcu_preempted_readers_exp(void)
+{
+ return rcu_preempt_ctrlblk.exp_tasks != NULL;
+}
+
+/*
+ * Report the exit from RCU read-side critical section for the last task
+ * that queued itself during or before the current expedited preemptible-RCU
+ * grace period.
+ */
+static void rcu_report_exp_done(void)
+{
+ wake_up(&sync_rcu_preempt_exp_wq);
+}
+
+/*
+ * Wait for an rcu-preempt grace period, but expedite it. The basic idea
+ * is to rely in the fact that there is but one CPU, and that it is
+ * illegal for a task to invoke synchronize_rcu_expedited() while in a
+ * preemptible-RCU read-side critical section. Therefore, any such
+ * critical sections must correspond to blocked tasks, which must therefore
+ * be on the ->blkd_tasks list. So just record the current head of the
+ * list in the ->exp_tasks pointer, and wait for all tasks including and
+ * after the task pointed to by ->exp_tasks to drain.
+ */
+void synchronize_rcu_expedited(void)
+{
+ unsigned long flags;
+ struct rcu_preempt_ctrlblk *rpcp = &rcu_preempt_ctrlblk;
+ unsigned long snap;
+
+ barrier(); /* ensure prior action seen before grace period. */
+
+ WARN_ON_ONCE(rcu_preempt_running_reader());
+
+ /*
+ * Acquire lock so that there is only one preemptible RCU grace
+ * period in flight. Of course, if someone does the expedited
+ * grace period for us while we are acquiring the lock, just leave.
+ */
+ snap = sync_rcu_preempt_exp_count + 1;
+ mutex_lock(&sync_rcu_preempt_exp_mutex);
+ if (ULONG_CMP_LT(snap, sync_rcu_preempt_exp_count))
+ goto unlock_mb_ret; /* Others did our work for us. */
+
+ local_irq_save(flags);
+
+ /*
+ * All RCU readers have to already be on blkd_tasks because
+ * we cannot legally be executing in an RCU read-side critical
+ * section.
+ */
+
+ /* Snapshot current head of ->blkd_tasks list. */
+ rpcp->exp_tasks = rpcp->blkd_tasks.next;
+ if (rpcp->exp_tasks == &rpcp->blkd_tasks)
+ rpcp->exp_tasks = NULL;
+ local_irq_restore(flags);
+
+ /* Wait for tail of ->blkd_tasks list to drain. */
+ if (rcu_preempted_readers_exp())
+ wait_event(sync_rcu_preempt_exp_wq,
+ !rcu_preempted_readers_exp());
+
+ /* Clean up and exit. */
+ barrier(); /* ensure expedited GP seen before counter increment. */
+ sync_rcu_preempt_exp_count++;
+unlock_mb_ret:
+ mutex_unlock(&sync_rcu_preempt_exp_mutex);
+ barrier(); /* ensure subsequent action seen after grace period. */
+}
+EXPORT_SYMBOL_GPL(synchronize_rcu_expedited);
+
+/*
+ * Does preemptible RCU need the CPU to stay out of dynticks mode?
+ */
+int rcu_preempt_needs_cpu(void)
+{
+ if (!rcu_preempt_running_reader())
+ rcu_preempt_cpu_qs();
+ return rcu_preempt_ctrlblk.rcb.rcucblist != NULL;
+}
+
+/*
+ * Check for a task exiting while in a preemptible -RCU read-side
+ * critical section, clean up if so. No need to issue warnings,
+ * as debug_check_no_locks_held() already does this if lockdep
+ * is enabled.
+ */
+void exit_rcu(void)
+{
+ struct task_struct *t = current;
+
+ if (t->rcu_read_lock_nesting == 0)
+ return;
+ t->rcu_read_lock_nesting = 1;
+ rcu_read_unlock();
+}
+
+#else /* #ifdef CONFIG_TINY_PREEMPT_RCU */
+
+/*
+ * Because preemptible RCU does not exist, it never has any callbacks
+ * to check.
+ */
+static void rcu_preempt_check_callbacks(void)
+{
+}
+
+/*
+ * Because preemptible RCU does not exist, it never has any callbacks
+ * to remove.
+ */
+static void rcu_preempt_remove_callbacks(struct rcu_ctrlblk *rcp)
+{
+}
+
+/*
+ * Because preemptible RCU does not exist, it never has any callbacks
+ * to process.
+ */
+static void rcu_preempt_process_callbacks(void)
+{
+}
+
+#endif /* #else #ifdef CONFIG_TINY_PREEMPT_RCU */
+
#ifdef CONFIG_DEBUG_LOCK_ALLOC
#include <linux/kernel_stat.h>
};
static LIST_HEAD(rcu_torture_freelist);
-static struct rcu_torture *rcu_torture_current;
+static struct rcu_torture __rcu *rcu_torture_current;
static long rcu_torture_current_version;
static struct rcu_torture rcu_tortures[10 * RCU_TORTURE_PIPE_LEN];
static DEFINE_SPINLOCK(rcu_torture_lock);
#define FULLSTOP_SHUTDOWN 1 /* System shutdown with rcutorture running. */
#define FULLSTOP_RMMOD 2 /* Normal rmmod of rcutorture. */
static int fullstop = FULLSTOP_RMMOD;
-DEFINE_MUTEX(fullstop_mutex); /* Protect fullstop transitions and spawning */
- /* of kthreads. */
+/*
+ * Protect fullstop transitions and spawning of kthreads.
+ */
+static DEFINE_MUTEX(fullstop_mutex);
/*
* Detect and respond to a system shutdown.
mdelay(longdelay_ms);
if (!(rcu_random(rrsp) % (nrealreaders * 2 * shortdelay_us)))
udelay(shortdelay_us);
+#ifdef CONFIG_PREEMPT
+ if (!preempt_count() && !(rcu_random(rrsp) % (nrealreaders * 20000)))
+ preempt_schedule(); /* No QS if preempt_disable() in effect */
+#endif
}
static void rcu_torture_read_unlock(int idx) __releases(RCU)
delay = rcu_random(rrsp) % (nrealreaders * 2 * longdelay * uspertick);
if (!delay)
schedule_timeout_interruptible(longdelay);
+ else
+ rcu_read_delay(rrsp);
}
static void srcu_torture_read_unlock(int idx) __releases(&srcu_ctl)
continue;
rp->rtort_pipe_count = 0;
udelay(rcu_random(&rand) & 0x3ff);
- old_rp = rcu_torture_current;
+ old_rp = rcu_dereference_check(rcu_torture_current,
+ current == writer_task);
rp->rtort_mbtest = 1;
rcu_assign_pointer(rcu_torture_current, rp);
smp_wmb(); /* Mods to old_rp must follow rcu_assign_pointer() */
module_param(qhimark, int, 0);
module_param(qlowmark, int, 0);
+#ifdef CONFIG_RCU_CPU_STALL_DETECTOR
+int rcu_cpu_stall_suppress __read_mostly = RCU_CPU_STALL_SUPPRESS_INIT;
+module_param(rcu_cpu_stall_suppress, int, 0644);
+#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
+
static void force_quiescent_state(struct rcu_state *rsp, int relaxed);
static int rcu_pending(int cpu);
#ifdef CONFIG_RCU_CPU_STALL_DETECTOR
-int rcu_cpu_stall_panicking __read_mostly;
+int rcu_cpu_stall_suppress __read_mostly;
static void record_gp_stall_check_time(struct rcu_state *rsp)
{
rcu_print_task_stall(rnp);
raw_spin_unlock_irqrestore(&rnp->lock, flags);
- /* OK, time to rat on our buddy... */
-
+ /*
+ * OK, time to rat on our buddy...
+ * See Documentation/RCU/stallwarn.txt for info on how to debug
+ * RCU CPU stall warnings.
+ */
printk(KERN_ERR "INFO: %s detected stalls on CPUs/tasks: {",
rsp->name);
rcu_for_each_leaf_node(rsp, rnp) {
unsigned long flags;
struct rcu_node *rnp = rcu_get_root(rsp);
+ /*
+ * OK, time to rat on ourselves...
+ * See Documentation/RCU/stallwarn.txt for info on how to debug
+ * RCU CPU stall warnings.
+ */
printk(KERN_ERR "INFO: %s detected stall on CPU %d (t=%lu jiffies)\n",
rsp->name, smp_processor_id(), jiffies - rsp->gp_start);
trigger_all_cpu_backtrace();
long delta;
struct rcu_node *rnp;
- if (rcu_cpu_stall_panicking)
+ if (rcu_cpu_stall_suppress)
return;
- delta = jiffies - rsp->jiffies_stall;
+ delta = jiffies - ACCESS_ONCE(rsp->jiffies_stall);
rnp = rdp->mynode;
- if ((rnp->qsmask & rdp->grpmask) && delta >= 0) {
+ if ((ACCESS_ONCE(rnp->qsmask) & rdp->grpmask) && delta >= 0) {
/* We haven't checked in, so go dump stack. */
print_cpu_stall(rsp);
static int rcu_panic(struct notifier_block *this, unsigned long ev, void *ptr)
{
- rcu_cpu_stall_panicking = 1;
+ rcu_cpu_stall_suppress = 1;
return NOTIFY_DONE;
}
+/**
+ * rcu_cpu_stall_reset - prevent further stall warnings in current grace period
+ *
+ * Set the stall-warning timeout way off into the future, thus preventing
+ * any RCU CPU stall-warning messages from appearing in the current set of
+ * RCU grace periods.
+ *
+ * The caller must disable hard irqs.
+ */
+void rcu_cpu_stall_reset(void)
+{
+ rcu_sched_state.jiffies_stall = jiffies + ULONG_MAX / 2;
+ rcu_bh_state.jiffies_stall = jiffies + ULONG_MAX / 2;
+ rcu_preempt_stall_reset();
+}
+
static struct notifier_block rcu_panic_block = {
.notifier_call = rcu_panic,
};
{
}
+void rcu_cpu_stall_reset(void)
+{
+}
+
static void __init check_cpu_stall_init(void)
{
}
rcu_start_gp(struct rcu_state *rsp, unsigned long flags)
__releases(rcu_get_root(rsp)->lock)
{
- struct rcu_data *rdp = rsp->rda[smp_processor_id()];
+ struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
struct rcu_node *rnp = rcu_get_root(rsp);
if (!cpu_needs_another_gp(rsp, rdp) || rsp->fqs_active) {
static void rcu_send_cbs_to_orphanage(struct rcu_state *rsp)
{
int i;
- struct rcu_data *rdp = rsp->rda[smp_processor_id()];
+ struct rcu_data *rdp = this_cpu_ptr(rsp->rda);
if (rdp->nxtlist == NULL)
return; /* irqs disabled, so comparison is stable. */
for (i = 0; i < RCU_NEXT_SIZE; i++)
rdp->nxttail[i] = &rdp->nxtlist;
rsp->orphan_qlen += rdp->qlen;
+ rdp->n_cbs_orphaned += rdp->qlen;
rdp->qlen = 0;
raw_spin_unlock(&rsp->onofflock); /* irqs remain disabled. */
}
struct rcu_data *rdp;
raw_spin_lock_irqsave(&rsp->onofflock, flags);
- rdp = rsp->rda[smp_processor_id()];
+ rdp = this_cpu_ptr(rsp->rda);
if (rsp->orphan_cbs_list == NULL) {
raw_spin_unlock_irqrestore(&rsp->onofflock, flags);
return;
*rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_cbs_list;
rdp->nxttail[RCU_NEXT_TAIL] = rsp->orphan_cbs_tail;
rdp->qlen += rsp->orphan_qlen;
+ rdp->n_cbs_adopted += rsp->orphan_qlen;
rsp->orphan_cbs_list = NULL;
rsp->orphan_cbs_tail = &rsp->orphan_cbs_list;
rsp->orphan_qlen = 0;
unsigned long flags;
unsigned long mask;
int need_report = 0;
- struct rcu_data *rdp = rsp->rda[cpu];
+ struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
struct rcu_node *rnp;
/* Exclude any attempts to start a new grace period. */
/* Update count, and requeue any remaining callbacks. */
rdp->qlen -= count;
+ rdp->n_cbs_invoked += count;
if (list != NULL) {
*tail = rdp->nxtlist;
rdp->nxtlist = list;
cpu = rnp->grplo;
bit = 1;
for (; cpu <= rnp->grphi; cpu++, bit <<= 1) {
- if ((rnp->qsmask & bit) != 0 && f(rsp->rda[cpu]))
+ if ((rnp->qsmask & bit) != 0 &&
+ f(per_cpu_ptr(rsp->rda, cpu)))
mask |= bit;
}
if (mask != 0) {
* a quiescent state betweentimes.
*/
local_irq_save(flags);
- rdp = rsp->rda[smp_processor_id()];
+ rdp = this_cpu_ptr(rsp->rda);
rcu_process_gp_end(rsp, rdp);
check_for_new_grace_period(rsp, rdp);
{
unsigned long flags;
int i;
- struct rcu_data *rdp = rsp->rda[cpu];
+ struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
struct rcu_node *rnp = rcu_get_root(rsp);
/* Set up local state, ensuring consistent view of global state. */
{
unsigned long flags;
unsigned long mask;
- struct rcu_data *rdp = rsp->rda[cpu];
+ struct rcu_data *rdp = per_cpu_ptr(rsp->rda, cpu);
struct rcu_node *rnp = rcu_get_root(rsp);
/* Set up local state, ensuring consistent view of global state. */
/*
* Helper function for rcu_init() that initializes one rcu_state structure.
*/
-static void __init rcu_init_one(struct rcu_state *rsp)
+static void __init rcu_init_one(struct rcu_state *rsp,
+ struct rcu_data __percpu *rda)
{
static char *buf[] = { "rcu_node_level_0",
"rcu_node_level_1",
}
}
+ rsp->rda = rda;
rnp = rsp->level[NUM_RCU_LVLS - 1];
for_each_possible_cpu(i) {
while (i > rnp->grphi)
rnp++;
- rsp->rda[i]->mynode = rnp;
+ per_cpu_ptr(rsp->rda, i)->mynode = rnp;
rcu_boot_init_percpu_data(i, rsp);
}
}
-/*
- * Helper macro for __rcu_init() and __rcu_init_preempt(). To be used
- * nowhere else! Assigns leaf node pointers into each CPU's rcu_data
- * structure.
- */
-#define RCU_INIT_FLAVOR(rsp, rcu_data) \
-do { \
- int i; \
- \
- for_each_possible_cpu(i) { \
- (rsp)->rda[i] = &per_cpu(rcu_data, i); \
- } \
- rcu_init_one(rsp); \
-} while (0)
-
void __init rcu_init(void)
{
int cpu;
rcu_bootup_announce();
- RCU_INIT_FLAVOR(&rcu_sched_state, rcu_sched_data);
- RCU_INIT_FLAVOR(&rcu_bh_state, rcu_bh_data);
+ rcu_init_one(&rcu_sched_state, &rcu_sched_data);
+ rcu_init_one(&rcu_bh_state, &rcu_bh_data);
__rcu_init_preempt();
open_softirq(RCU_SOFTIRQ, rcu_process_callbacks);
long qlen; /* # of queued callbacks */
long qlen_last_fqs_check;
/* qlen at last check for QS forcing */
+ unsigned long n_cbs_invoked; /* count of RCU cbs invoked. */
+ unsigned long n_cbs_orphaned; /* RCU cbs sent to orphanage. */
+ unsigned long n_cbs_adopted; /* RCU cbs adopted from orphanage. */
unsigned long n_force_qs_snap;
/* did other CPU force QS recently? */
long blimit; /* Upper limit on a processed batch */
#define RCU_STALL_DELAY_DELTA 0
#endif
-#define RCU_SECONDS_TILL_STALL_CHECK (10 * HZ + RCU_STALL_DELAY_DELTA)
+#define RCU_SECONDS_TILL_STALL_CHECK (CONFIG_RCU_CPU_STALL_TIMEOUT * HZ + \
+ RCU_STALL_DELAY_DELTA)
/* for rsp->jiffies_stall */
-#define RCU_SECONDS_TILL_STALL_RECHECK (30 * HZ + RCU_STALL_DELAY_DELTA)
+#define RCU_SECONDS_TILL_STALL_RECHECK (3 * RCU_SECONDS_TILL_STALL_CHECK + 30)
/* for rsp->jiffies_stall */
#define RCU_STALL_RAT_DELAY 2 /* Allow other CPUs time */
/* to take at least one */
/* scheduling clock irq */
/* before ratting on them. */
-#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
+#ifdef CONFIG_RCU_CPU_STALL_DETECTOR_RUNNABLE
+#define RCU_CPU_STALL_SUPPRESS_INIT 0
+#else
+#define RCU_CPU_STALL_SUPPRESS_INIT 1
+#endif
-#define ULONG_CMP_GE(a, b) (ULONG_MAX / 2 >= (a) - (b))
-#define ULONG_CMP_LT(a, b) (ULONG_MAX / 2 < (a) - (b))
+#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
/*
* RCU global state, including node hierarchy. This hierarchy is
struct rcu_node *level[NUM_RCU_LVLS]; /* Hierarchy levels. */
u32 levelcnt[MAX_RCU_LVLS + 1]; /* # nodes in each level. */
u8 levelspread[NUM_RCU_LVLS]; /* kids/node in each level. */
- struct rcu_data *rda[NR_CPUS]; /* array of rdp pointers. */
+ struct rcu_data __percpu *rda; /* pointer of percu rcu_data. */
/* The following fields are guarded by the root rcu_node's lock. */
#ifdef CONFIG_RCU_CPU_STALL_DETECTOR
static void rcu_print_detail_task_stall(struct rcu_state *rsp);
static void rcu_print_task_stall(struct rcu_node *rnp);
+static void rcu_preempt_stall_reset(void);
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
static void rcu_preempt_check_blocked_tasks(struct rcu_node *rnp);
#ifdef CONFIG_HOTPLUG_CPU
printk(KERN_INFO
"\tRCU-based detection of stalled CPUs is disabled.\n");
#endif
-#ifndef CONFIG_RCU_CPU_STALL_VERBOSE
+#if defined(CONFIG_TREE_PREEMPT_RCU) && !defined(CONFIG_RCU_CPU_STALL_VERBOSE)
printk(KERN_INFO "\tVerbose stalled-CPUs detection is disabled.\n");
#endif
#if NUM_RCU_LVL_4 != 0
(t->rcu_read_unlock_special & RCU_READ_UNLOCK_BLOCKED) == 0) {
/* Possibly blocking in an RCU read-side critical section. */
- rdp = rcu_preempt_state.rda[cpu];
+ rdp = per_cpu_ptr(rcu_preempt_state.rda, cpu);
rnp = rdp->mynode;
raw_spin_lock_irqsave(&rnp->lock, flags);
t->rcu_read_unlock_special |= RCU_READ_UNLOCK_BLOCKED;
*/
void __rcu_read_lock(void)
{
- ACCESS_ONCE(current->rcu_read_lock_nesting)++;
+ current->rcu_read_lock_nesting++;
barrier(); /* needed if we ever invoke rcu_read_lock in rcutree.c */
}
EXPORT_SYMBOL_GPL(__rcu_read_lock);
struct task_struct *t = current;
barrier(); /* needed if we ever invoke rcu_read_unlock in rcutree.c */
- if (--ACCESS_ONCE(t->rcu_read_lock_nesting) == 0 &&
+ --t->rcu_read_lock_nesting;
+ barrier(); /* decrement before load of ->rcu_read_unlock_special */
+ if (t->rcu_read_lock_nesting == 0 &&
unlikely(ACCESS_ONCE(t->rcu_read_unlock_special)))
rcu_read_unlock_special(t);
#ifdef CONFIG_PROVE_LOCKING
}
}
+/*
+ * Suppress preemptible RCU's CPU stall warnings by pushing the
+ * time of the next stall-warning message comfortably far into the
+ * future.
+ */
+static void rcu_preempt_stall_reset(void)
+{
+ rcu_preempt_state.jiffies_stall = jiffies + ULONG_MAX / 2;
+}
+
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
/*
*
* Control will return to the caller some time after a full grace
* period has elapsed, in other words after all currently executing RCU
- * read-side critical sections have completed. RCU read-side critical
- * sections are delimited by rcu_read_lock() and rcu_read_unlock(),
- * and may be nested.
+ * read-side critical sections have completed. Note, however, that
+ * upon return from synchronize_rcu(), the caller might well be executing
+ * concurrently with new RCU read-side critical sections that began while
+ * synchronize_rcu() was waiting. RCU read-side critical sections are
+ * delimited by rcu_read_lock() and rcu_read_unlock(), and may be nested.
*/
void synchronize_rcu(void)
{
*/
static void __init __rcu_init_preempt(void)
{
- RCU_INIT_FLAVOR(&rcu_preempt_state, rcu_preempt_data);
+ rcu_init_one(&rcu_preempt_state, &rcu_preempt_data);
}
/*
{
}
+/*
+ * Because preemptible RCU does not exist, there is no need to suppress
+ * its CPU stall warnings.
+ */
+static void rcu_preempt_stall_reset(void)
+{
+}
+
#endif /* #ifdef CONFIG_RCU_CPU_STALL_DETECTOR */
/*
}
/*
- * In classic RCU, call_rcu() is just call_rcu_sched().
- */
-void call_rcu(struct rcu_head *head, void (*func)(struct rcu_head *rcu))
-{
- call_rcu_sched(head, func);
-}
-EXPORT_SYMBOL_GPL(call_rcu);
-
-/*
* Wait for an rcu-preempt grace period, but make it happen quickly.
* But because preemptable RCU does not exist, map to rcu-sched.
*/
rdp->dynticks_fqs);
#endif /* #ifdef CONFIG_NO_HZ */
seq_printf(m, " of=%lu ri=%lu", rdp->offline_fqs, rdp->resched_ipi);
- seq_printf(m, " ql=%ld b=%ld\n", rdp->qlen, rdp->blimit);
+ seq_printf(m, " ql=%ld b=%ld", rdp->qlen, rdp->blimit);
+ seq_printf(m, " ci=%lu co=%lu ca=%lu\n",
+ rdp->n_cbs_invoked, rdp->n_cbs_orphaned, rdp->n_cbs_adopted);
}
#define PRINT_RCU_DATA(name, func, m) \
rdp->dynticks_fqs);
#endif /* #ifdef CONFIG_NO_HZ */
seq_printf(m, ",%lu,%lu", rdp->offline_fqs, rdp->resched_ipi);
- seq_printf(m, ",%ld,%ld\n", rdp->qlen, rdp->blimit);
+ seq_printf(m, ",%ld,%ld", rdp->qlen, rdp->blimit);
+ seq_printf(m, ",%lu,%lu,%lu\n",
+ rdp->n_cbs_invoked, rdp->n_cbs_orphaned, rdp->n_cbs_adopted);
}
static int show_rcudata_csv(struct seq_file *m, void *unused)
#ifdef CONFIG_NO_HZ
seq_puts(m, "\"dt\",\"dt nesting\",\"dn\",\"df\",");
#endif /* #ifdef CONFIG_NO_HZ */
- seq_puts(m, "\"of\",\"ri\",\"ql\",\"b\"\n");
+ seq_puts(m, "\"of\",\"ri\",\"ql\",\"b\",\"ci\",\"co\",\"ca\"\n");
#ifdef CONFIG_TREE_PREEMPT_RCU
seq_puts(m, "\"rcu_preempt:\"\n");
PRINT_RCU_DATA(rcu_preempt_data, print_one_rcu_data_csv, m);
struct rcu_data *rdp;
for_each_possible_cpu(cpu) {
- rdp = rsp->rda[cpu];
+ rdp = per_cpu_ptr(rsp->rda, cpu);
if (rdp->beenonline)
print_one_rcu_pending(m, rdp);
}
*/
cpumask_var_t rto_mask;
atomic_t rto_count;
-#ifdef CONFIG_SMP
struct cpupri cpupri;
-#endif
};
/*
*/
static struct root_domain def_root_domain;
-#endif
+#endif /* CONFIG_SMP */
/*
* This is the main, per-CPU runqueue data structure.
*/
unsigned long nr_uninterruptible;
- struct task_struct *curr, *idle;
+ struct task_struct *curr, *idle, *stop;
unsigned long next_balance;
struct mm_struct *prev_mm;
u64 clock;
+ u64 clock_task;
atomic_t nr_iowait;
u64 avg_idle;
#endif
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+ u64 prev_irq_time;
+#endif
+
/* calc_load related fields */
unsigned long calc_load_update;
long calc_load_active;
#endif /* CONFIG_CGROUP_SCHED */
+static u64 irq_time_cpu(int cpu);
+static void sched_irq_time_avg_update(struct rq *rq, u64 irq_time);
+
inline void update_rq_clock(struct rq *rq)
{
- if (!rq->skip_clock_update)
- rq->clock = sched_clock_cpu(cpu_of(rq));
+ if (!rq->skip_clock_update) {
+ int cpu = cpu_of(rq);
+ u64 irq_time;
+
+ rq->clock = sched_clock_cpu(cpu);
+ irq_time = irq_time_cpu(cpu);
+ if (rq->clock - irq_time > rq->clock_task)
+ rq->clock_task = rq->clock - irq_time;
+
+ sched_irq_time_avg_update(rq, irq_time);
+ }
}
/*
size_t cnt, loff_t *ppos)
{
char buf[64];
- char *cmp = buf;
+ char *cmp;
int neg = 0;
int i;
return -EFAULT;
buf[cnt] = 0;
+ cmp = strstrip(buf);
if (strncmp(buf, "NO_", 3) == 0) {
neg = 1;
}
for (i = 0; sched_feat_names[i]; i++) {
- int len = strlen(sched_feat_names[i]);
-
- if (strncmp(cmp, sched_feat_names[i], len) == 0) {
+ if (strcmp(cmp, sched_feat_names[i]) == 0) {
if (neg)
sysctl_sched_features &= ~(1UL << i);
else
static void sched_rt_avg_update(struct rq *rq, u64 rt_delta)
{
}
+
+static void sched_avg_update(struct rq *rq)
+{
+}
#endif /* CONFIG_SMP */
#if BITS_PER_LONG == 32
static const struct sched_class rt_sched_class;
-#define sched_class_highest (&rt_sched_class)
+#define sched_class_highest (&stop_sched_class)
#define for_each_class(class) \
for (class = sched_class_highest; class; class = class->next)
static void set_load_weight(struct task_struct *p)
{
- if (task_has_rt_policy(p)) {
- p->se.load.weight = 0;
- p->se.load.inv_weight = WMULT_CONST;
- return;
- }
-
/*
* SCHED_IDLE tasks get minimal weight:
*/
dec_nr_running(rq);
}
+#ifdef CONFIG_IRQ_TIME_ACCOUNTING
+
+/*
+ * There are no locks covering percpu hardirq/softirq time.
+ * They are only modified in account_system_vtime, on corresponding CPU
+ * with interrupts disabled. So, writes are safe.
+ * They are read and saved off onto struct rq in update_rq_clock().
+ * This may result in other CPU reading this CPU's irq time and can
+ * race with irq/account_system_vtime on this CPU. We would either get old
+ * or new value (or semi updated value on 32 bit) with a side effect of
+ * accounting a slice of irq time to wrong task when irq is in progress
+ * while we read rq->clock. That is a worthy compromise in place of having
+ * locks on each irq in account_system_time.
+ */
+static DEFINE_PER_CPU(u64, cpu_hardirq_time);
+static DEFINE_PER_CPU(u64, cpu_softirq_time);
+
+static DEFINE_PER_CPU(u64, irq_start_time);
+static int sched_clock_irqtime;
+
+void enable_sched_clock_irqtime(void)
+{
+ sched_clock_irqtime = 1;
+}
+
+void disable_sched_clock_irqtime(void)
+{
+ sched_clock_irqtime = 0;
+}
+
+static u64 irq_time_cpu(int cpu)
+{
+ if (!sched_clock_irqtime)
+ return 0;
+
+ return per_cpu(cpu_softirq_time, cpu) + per_cpu(cpu_hardirq_time, cpu);
+}
+
+void account_system_vtime(struct task_struct *curr)
+{
+ unsigned long flags;
+ int cpu;
+ u64 now, delta;
+
+ if (!sched_clock_irqtime)
+ return;
+
+ local_irq_save(flags);
+
+ cpu = smp_processor_id();
+ now = sched_clock_cpu(cpu);
+ delta = now - per_cpu(irq_start_time, cpu);
+ per_cpu(irq_start_time, cpu) = now;
+ /*
+ * We do not account for softirq time from ksoftirqd here.
+ * We want to continue accounting softirq time to ksoftirqd thread
+ * in that case, so as not to confuse scheduler with a special task
+ * that do not consume any time, but still wants to run.
+ */
+ if (hardirq_count())
+ per_cpu(cpu_hardirq_time, cpu) += delta;
+ else if (in_serving_softirq() && !(curr->flags & PF_KSOFTIRQD))
+ per_cpu(cpu_softirq_time, cpu) += delta;
+
+ local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(account_system_vtime);
+
+static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time)
+{
+ if (sched_clock_irqtime && sched_feat(NONIRQ_POWER)) {
+ u64 delta_irq = curr_irq_time - rq->prev_irq_time;
+ rq->prev_irq_time = curr_irq_time;
+ sched_rt_avg_update(rq, delta_irq);
+ }
+}
+
+#else
+
+static u64 irq_time_cpu(int cpu)
+{
+ return 0;
+}
+
+static void sched_irq_time_avg_update(struct rq *rq, u64 curr_irq_time) { }
+
+#endif
+
#include "sched_idletask.c"
#include "sched_fair.c"
#include "sched_rt.c"
+#include "sched_stoptask.c"
#ifdef CONFIG_SCHED_DEBUG
# include "sched_debug.c"
#endif
+void sched_set_stop_task(int cpu, struct task_struct *stop)
+{
+ struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
+ struct task_struct *old_stop = cpu_rq(cpu)->stop;
+
+ if (stop) {
+ /*
+ * Make it appear like a SCHED_FIFO task, its something
+ * userspace knows about and won't get confused about.
+ *
+ * Also, it will make PI more or less work without too
+ * much confusion -- but then, stop work should not
+ * rely on PI working anyway.
+ */
+ sched_setscheduler_nocheck(stop, SCHED_FIFO, ¶m);
+
+ stop->sched_class = &stop_sched_class;
+ }
+
+ cpu_rq(cpu)->stop = stop;
+
+ if (old_stop) {
+ /*
+ * Reset it back to a normal scheduling class so that
+ * it can die in pieces.
+ */
+ old_stop->sched_class = &rt_sched_class;
+ }
+}
+
/*
* __normal_prio - return the priority that is based on the static prio
*/
if (p->sched_class != &fair_sched_class)
return 0;
+ if (unlikely(p->policy == SCHED_IDLE))
+ return 0;
+
/*
* Buddy candidates are cache hot:
*/
*/
arch_start_context_switch(prev);
- if (likely(!mm)) {
+ if (!mm) {
next->active_mm = oldmm;
atomic_inc(&oldmm->mm_count);
enter_lazy_tlb(oldmm, next);
} else
switch_mm(oldmm, mm, next);
- if (likely(!prev->mm)) {
+ if (!prev->mm) {
prev->active_mm = NULL;
rq->prev_mm = oldmm;
}
this_rq->cpu_load[i] = (old_load * (scale - 1) + new_load) >> i;
}
+
+ sched_avg_update(this_rq);
}
static void update_cpu_load_active(struct rq *this_rq)
if (task_current(rq, p)) {
update_rq_clock(rq);
- ns = rq->clock - p->se.exec_start;
+ ns = rq->clock_task - p->se.exec_start;
if ((s64)ns < 0)
ns = 0;
}
tmp = cputime_to_cputime64(cputime);
if (hardirq_count() - hardirq_offset)
cpustat->irq = cputime64_add(cpustat->irq, tmp);
- else if (softirq_count())
+ else if (in_serving_softirq())
cpustat->softirq = cputime64_add(cpustat->softirq, tmp);
else
cpustat->system = cputime64_add(cpustat->system, tmp);
rtime = nsecs_to_cputime(p->se.sum_exec_runtime);
if (total) {
- u64 temp;
+ u64 temp = rtime;
- temp = (u64)(rtime * utime);
+ temp *= utime;
do_div(temp, total);
utime = (cputime_t)temp;
} else
rtime = nsecs_to_cputime(cputime.sum_exec_runtime);
if (total) {
- u64 temp;
+ u64 temp = rtime;
- temp = (u64)(rtime * cputime.utime);
+ temp *= cputime.utime;
do_div(temp, total);
utime = (cputime_t)temp;
} else
curr->sched_class->task_tick(rq, curr, 0);
raw_spin_unlock(&rq->lock);
- perf_event_task_tick(curr);
+ perf_event_task_tick();
#ifdef CONFIG_SMP
rq->idle_at_tick = idle_cpu(cpu);
return p;
}
- class = sched_class_highest;
- for ( ; ; ) {
+ for_each_class(class) {
p = class->pick_next_task(rq);
if (p)
return p;
- /*
- * Will never be NULL as the idle class always
- * returns a non-NULL p:
- */
- class = class->next;
}
+
+ BUG(); /* the idle class will always have a runnable task */
}
/*
rq = task_rq_lock(p, &flags);
+ trace_sched_pi_setprio(p, prio);
oldprio = p->prio;
prev_class = p->sched_class;
on_rq = p->se.on_rq;
}
if (user) {
- retval = security_task_setscheduler(p, policy, param);
+ retval = security_task_setscheduler(p);
if (retval)
return retval;
}
*/
rq = __task_rq_lock(p);
+ /*
+ * Changing the policy of the stop threads its a very bad idea
+ */
+ if (p == rq->stop) {
+ __task_rq_unlock(rq);
+ raw_spin_unlock_irqrestore(&p->pi_lock, flags);
+ return -EINVAL;
+ }
+
#ifdef CONFIG_RT_GROUP_SCHED
if (user) {
/*
if (!check_same_owner(p) && !capable(CAP_SYS_NICE))
goto out_unlock;
- retval = security_task_setscheduler(p, 0, NULL);
+ retval = security_task_setscheduler(p);
if (retval)
goto out_unlock;
cpuset_cpus_allowed(p, cpus_allowed);
cpumask_and(new_mask, in_mask, cpus_allowed);
- again:
+again:
retval = set_cpus_allowed_ptr(p, new_mask);
if (!retval) {
idle->se.exec_start = sched_clock();
cpumask_copy(&idle->cpus_allowed, cpumask_of(cpu));
+ /*
+ * We're having a chicken and egg problem, even though we are
+ * holding rq->lock, the cpu isn't yet set to this cpu so the
+ * lockdep check in task_group() will fail.
+ *
+ * Similar case to sched_fork(). / Alternatively we could
+ * use task_rq_lock() here and obtain the other rq->lock.
+ *
+ * Silence PROVE_RCU
+ */
+ rcu_read_lock();
__set_task_cpu(idle, cpu);
+ rcu_read_unlock();
rq->curr = rq->idle = idle;
#if defined(CONFIG_SMP) && defined(__ARCH_WANT_UNLOCKED_CTXSW)
cpumask_var_t nodemask;
cpumask_var_t this_sibling_map;
cpumask_var_t this_core_map;
+ cpumask_var_t this_book_map;
cpumask_var_t send_covered;
cpumask_var_t tmpmask;
struct sched_group **sched_group_nodes;
sa_rootdomain,
sa_tmpmask,
sa_send_covered,
+ sa_this_book_map,
sa_this_core_map,
sa_this_sibling_map,
sa_nodemask,
#ifdef CONFIG_SCHED_MC
static DEFINE_PER_CPU(struct static_sched_domain, core_domains);
static DEFINE_PER_CPU(struct static_sched_group, sched_group_core);
-#endif /* CONFIG_SCHED_MC */
-#if defined(CONFIG_SCHED_MC) && defined(CONFIG_SCHED_SMT)
static int
cpu_to_core_group(int cpu, const struct cpumask *cpu_map,
struct sched_group **sg, struct cpumask *mask)
{
int group;
-
+#ifdef CONFIG_SCHED_SMT
cpumask_and(mask, topology_thread_cpumask(cpu), cpu_map);
group = cpumask_first(mask);
+#else
+ group = cpu;
+#endif
if (sg)
*sg = &per_cpu(sched_group_core, group).sg;
return group;
}
-#elif defined(CONFIG_SCHED_MC)
+#endif /* CONFIG_SCHED_MC */
+
+/*
+ * book sched-domains:
+ */
+#ifdef CONFIG_SCHED_BOOK
+static DEFINE_PER_CPU(struct static_sched_domain, book_domains);
+static DEFINE_PER_CPU(struct static_sched_group, sched_group_book);
+
static int
-cpu_to_core_group(int cpu, const struct cpumask *cpu_map,
- struct sched_group **sg, struct cpumask *unused)
+cpu_to_book_group(int cpu, const struct cpumask *cpu_map,
+ struct sched_group **sg, struct cpumask *mask)
{
+ int group = cpu;
+#ifdef CONFIG_SCHED_MC
+ cpumask_and(mask, cpu_coregroup_mask(cpu), cpu_map);
+ group = cpumask_first(mask);
+#elif defined(CONFIG_SCHED_SMT)
+ cpumask_and(mask, topology_thread_cpumask(cpu), cpu_map);
+ group = cpumask_first(mask);
+#endif
if (sg)
- *sg = &per_cpu(sched_group_core, cpu).sg;
- return cpu;
+ *sg = &per_cpu(sched_group_book, group).sg;
+ return group;
}
-#endif
+#endif /* CONFIG_SCHED_BOOK */
static DEFINE_PER_CPU(struct static_sched_domain, phys_domains);
static DEFINE_PER_CPU(struct static_sched_group, sched_group_phys);
struct sched_group **sg, struct cpumask *mask)
{
int group;
-#ifdef CONFIG_SCHED_MC
+#ifdef CONFIG_SCHED_BOOK
+ cpumask_and(mask, cpu_book_mask(cpu), cpu_map);
+ group = cpumask_first(mask);
+#elif defined(CONFIG_SCHED_MC)
cpumask_and(mask, cpu_coregroup_mask(cpu), cpu_map);
group = cpumask_first(mask);
#elif defined(CONFIG_SCHED_SMT)
#ifdef CONFIG_SCHED_MC
SD_INIT_FUNC(MC)
#endif
+#ifdef CONFIG_SCHED_BOOK
+ SD_INIT_FUNC(BOOK)
+#endif
static int default_relax_domain_level = -1;
free_cpumask_var(d->tmpmask); /* fall through */
case sa_send_covered:
free_cpumask_var(d->send_covered); /* fall through */
+ case sa_this_book_map:
+ free_cpumask_var(d->this_book_map); /* fall through */
case sa_this_core_map:
free_cpumask_var(d->this_core_map); /* fall through */
case sa_this_sibling_map:
return sa_nodemask;
if (!alloc_cpumask_var(&d->this_core_map, GFP_KERNEL))
return sa_this_sibling_map;
- if (!alloc_cpumask_var(&d->send_covered, GFP_KERNEL))
+ if (!alloc_cpumask_var(&d->this_book_map, GFP_KERNEL))
return sa_this_core_map;
+ if (!alloc_cpumask_var(&d->send_covered, GFP_KERNEL))
+ return sa_this_book_map;
if (!alloc_cpumask_var(&d->tmpmask, GFP_KERNEL))
return sa_send_covered;
d->rd = alloc_rootdomain();
return sd;
}
+static struct sched_domain *__build_book_sched_domain(struct s_data *d,
+ const struct cpumask *cpu_map, struct sched_domain_attr *attr,
+ struct sched_domain *parent, int i)
+{
+ struct sched_domain *sd = parent;
+#ifdef CONFIG_SCHED_BOOK
+ sd = &per_cpu(book_domains, i).sd;
+ SD_INIT(sd, BOOK);
+ set_domain_attribute(sd, attr);
+ cpumask_and(sched_domain_span(sd), cpu_map, cpu_book_mask(i));
+ sd->parent = parent;
+ parent->child = sd;
+ cpu_to_book_group(i, cpu_map, &sd->groups, d->tmpmask);
+#endif
+ return sd;
+}
+
static struct sched_domain *__build_mc_sched_domain(struct s_data *d,
const struct cpumask *cpu_map, struct sched_domain_attr *attr,
struct sched_domain *parent, int i)
d->send_covered, d->tmpmask);
break;
#endif
+#ifdef CONFIG_SCHED_BOOK
+ case SD_LV_BOOK: /* set up book groups */
+ cpumask_and(d->this_book_map, cpu_map, cpu_book_mask(cpu));
+ if (cpu == cpumask_first(d->this_book_map))
+ init_sched_build_groups(d->this_book_map, cpu_map,
+ &cpu_to_book_group,
+ d->send_covered, d->tmpmask);
+ break;
+#endif
case SD_LV_CPU: /* set up physical groups */
cpumask_and(d->nodemask, cpumask_of_node(cpu), cpu_map);
if (!cpumask_empty(d->nodemask))
sd = __build_numa_sched_domains(&d, cpu_map, attr, i);
sd = __build_cpu_sched_domain(&d, cpu_map, attr, sd, i);
+ sd = __build_book_sched_domain(&d, cpu_map, attr, sd, i);
sd = __build_mc_sched_domain(&d, cpu_map, attr, sd, i);
sd = __build_smt_sched_domain(&d, cpu_map, attr, sd, i);
}
for_each_cpu(i, cpu_map) {
build_sched_groups(&d, SD_LV_SIBLING, cpu_map, i);
+ build_sched_groups(&d, SD_LV_BOOK, cpu_map, i);
build_sched_groups(&d, SD_LV_MC, cpu_map, i);
}
init_sched_groups_power(i, sd);
}
#endif
+#ifdef CONFIG_SCHED_BOOK
+ for_each_cpu(i, cpu_map) {
+ sd = &per_cpu(book_domains, i).sd;
+ init_sched_groups_power(i, sd);
+ }
+#endif
for_each_cpu(i, cpu_map) {
sd = &per_cpu(phys_domains, i).sd;
sd = &per_cpu(cpu_domains, i).sd;
#elif defined(CONFIG_SCHED_MC)
sd = &per_cpu(core_domains, i).sd;
+#elif defined(CONFIG_SCHED_BOOK)
+ sd = &per_cpu(book_domains, i).sd;
#else
sd = &per_cpu(phys_domains, i).sd;
#endif
return 1;
- err_free_rq:
+err_free_rq:
kfree(cfs_rq);
- err:
+err:
return 0;
}
return 1;
- err_free_rq:
+err_free_rq:
kfree(rt_rq);
- err:
+err:
return 0;
}
raw_spin_unlock(&rt_rq->rt_runtime_lock);
}
raw_spin_unlock_irq(&tg->rt_bandwidth.rt_runtime_lock);
- unlock:
+unlock:
read_unlock(&tasklist_lock);
mutex_unlock(&rt_constraints_mutex);
/*
* Targeted preemption latency for CPU-bound tasks:
- * (default: 5ms * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 6ms * (1 + ilog(ncpus)), units: nanoseconds)
*
* NOTE: this latency value is not the same as the concept of
* 'timeslice length' - timeslices in CFS are of variable length
/*
* Minimal preemption granularity for CPU-bound tasks:
- * (default: 2 msec * (1 + ilog(ncpus)), units: nanoseconds)
+ * (default: 0.75 msec * (1 + ilog(ncpus)), units: nanoseconds)
*/
-unsigned int sysctl_sched_min_granularity = 2000000ULL;
-unsigned int normalized_sysctl_sched_min_granularity = 2000000ULL;
+unsigned int sysctl_sched_min_granularity = 750000ULL;
+unsigned int normalized_sysctl_sched_min_granularity = 750000ULL;
/*
* is kept at sysctl_sched_latency / sysctl_sched_min_granularity
*/
-static unsigned int sched_nr_latency = 3;
+static unsigned int sched_nr_latency = 8;
/*
* After fork, child runs first. If set to 0 (default) then
static void update_curr(struct cfs_rq *cfs_rq)
{
struct sched_entity *curr = cfs_rq->curr;
- u64 now = rq_of(cfs_rq)->clock;
+ u64 now = rq_of(cfs_rq)->clock_task;
unsigned long delta_exec;
if (unlikely(!curr))
/*
* We are starting a new run period:
*/
- se->exec_start = rq_of(cfs_rq)->clock;
+ se->exec_start = rq_of(cfs_rq)->clock_task;
}
/**************************************************
find_idlest_group(struct sched_domain *sd, struct task_struct *p,
int this_cpu, int load_idx)
{
- struct sched_group *idlest = NULL, *this = NULL, *group = sd->groups;
+ struct sched_group *idlest = NULL, *group = sd->groups;
unsigned long min_load = ULONG_MAX, this_load = 0;
int imbalance = 100 + (sd->imbalance_pct-100)/2;
if (local_group) {
this_load = avg_load;
- this = group;
} else if (avg_load < min_load) {
min_load = avg_load;
idlest = group;
set_task_cpu(p, this_cpu);
activate_task(this_rq, p, 0);
check_preempt_curr(this_rq, p, 0);
+
+ /* re-arm NEWIDLE balancing when moving tasks */
+ src_rq->avg_idle = this_rq->avg_idle = 2*sysctl_sched_migration_cost;
+ this_rq->idle_stamp = 0;
}
/*
* 2) too many balance attempts have failed.
*/
- tsk_cache_hot = task_hot(p, rq->clock, sd);
+ tsk_cache_hot = task_hot(p, rq->clock_task, sd);
if (!tsk_cache_hot ||
sd->nr_balance_failed > sd->cache_nice_tries) {
#ifdef CONFIG_SCHEDSTATS
unsigned long this_load;
unsigned long this_load_per_task;
unsigned long this_nr_running;
+ unsigned long this_has_capacity;
/* Statistics of the busiest group */
unsigned long max_load;
unsigned long busiest_load_per_task;
unsigned long busiest_nr_running;
unsigned long busiest_group_capacity;
+ unsigned long busiest_has_capacity;
int group_imb; /* Is there imbalance in this sd */
#if defined(CONFIG_SCHED_MC) || defined(CONFIG_SCHED_SMT)
unsigned long sum_weighted_load; /* Weighted load of group's tasks */
unsigned long group_capacity;
int group_imb; /* Is there an imbalance in the group ? */
+ int group_has_capacity; /* Is there extra capacity in the group? */
};
/**
struct rq *rq = cpu_rq(cpu);
u64 total, available;
- sched_avg_update(rq);
-
total = sched_avg_period() + (rq->clock - rq->age_stamp);
- available = total - rq->rt_avg;
+
+ if (unlikely(total < rq->rt_avg)) {
+ /* Ensures that power won't end up being negative */
+ available = 0;
+ } else {
+ available = total - rq->rt_avg;
+ }
if (unlikely((s64)total < SCHED_LOAD_SCALE))
total = SCHED_LOAD_SCALE;
int local_group, const struct cpumask *cpus,
int *balance, struct sg_lb_stats *sgs)
{
- unsigned long load, max_cpu_load, min_cpu_load;
+ unsigned long load, max_cpu_load, min_cpu_load, max_nr_running;
int i;
unsigned int balance_cpu = -1, first_idle_cpu = 0;
unsigned long avg_load_per_task = 0;
/* Tally up the load of all CPUs in the group */
max_cpu_load = 0;
min_cpu_load = ~0UL;
+ max_nr_running = 0;
for_each_cpu_and(i, sched_group_cpus(group), cpus) {
struct rq *rq = cpu_rq(i);
load = target_load(i, load_idx);
} else {
load = source_load(i, load_idx);
- if (load > max_cpu_load)
+ if (load > max_cpu_load) {
max_cpu_load = load;
+ max_nr_running = rq->nr_running;
+ }
if (min_cpu_load > load)
min_cpu_load = load;
}
if (sgs->sum_nr_running)
avg_load_per_task = sgs->sum_weighted_load / sgs->sum_nr_running;
- if ((max_cpu_load - min_cpu_load) > 2*avg_load_per_task)
+ if ((max_cpu_load - min_cpu_load) > 2*avg_load_per_task && max_nr_running > 1)
sgs->group_imb = 1;
- sgs->group_capacity =
- DIV_ROUND_CLOSEST(group->cpu_power, SCHED_LOAD_SCALE);
+ sgs->group_capacity = DIV_ROUND_CLOSEST(group->cpu_power, SCHED_LOAD_SCALE);
if (!sgs->group_capacity)
sgs->group_capacity = fix_small_capacity(sd, group);
+
+ if (sgs->group_capacity > sgs->sum_nr_running)
+ sgs->group_has_capacity = 1;
}
/**
/*
* In case the child domain prefers tasks go to siblings
* first, lower the sg capacity to one so that we'll try
- * and move all the excess tasks away.
+ * and move all the excess tasks away. We lower the capacity
+ * of a group only if the local group has the capacity to fit
+ * these excess tasks, i.e. nr_running < group_capacity. The
+ * extra check prevents the case where you always pull from the
+ * heaviest group when it is already under-utilized (possible
+ * with a large weight task outweighs the tasks on the system).
*/
- if (prefer_sibling)
+ if (prefer_sibling && !local_group && sds->this_has_capacity)
sgs.group_capacity = min(sgs.group_capacity, 1UL);
if (local_group) {
sds->this = sg;
sds->this_nr_running = sgs.sum_nr_running;
sds->this_load_per_task = sgs.sum_weighted_load;
+ sds->this_has_capacity = sgs.group_has_capacity;
} else if (update_sd_pick_busiest(sd, sds, sg, &sgs, this_cpu)) {
sds->max_load = sgs.avg_load;
sds->busiest = sg;
sds->busiest_nr_running = sgs.sum_nr_running;
sds->busiest_group_capacity = sgs.group_capacity;
sds->busiest_load_per_task = sgs.sum_weighted_load;
+ sds->busiest_has_capacity = sgs.group_has_capacity;
sds->group_imb = sgs.group_imb;
}
return fix_small_imbalance(sds, this_cpu, imbalance);
}
+
/******* find_busiest_group() helpers end here *********************/
/**
* 4) This group is more busy than the avg busieness at this
* sched_domain.
* 5) The imbalance is within the specified limit.
+ *
+ * Note: when doing newidle balance, if the local group has excess
+ * capacity (i.e. nr_running < group_capacity) and the busiest group
+ * does not have any capacity, we force a load balance to pull tasks
+ * to the local group. In this case, we skip past checks 3, 4 and 5.
*/
if (!(*balance))
goto ret;
if (!sds.busiest || sds.busiest_nr_running == 0)
goto out_balanced;
+ /* SD_BALANCE_NEWIDLE trumps SMP nice when underutilized */
+ if (idle == CPU_NEWLY_IDLE && sds.this_has_capacity &&
+ !sds.busiest_has_capacity)
+ goto force_balance;
+
if (sds.this_load >= sds.max_load)
goto out_balanced;
if (100 * sds.max_load <= sd->imbalance_pct * sds.this_load)
goto out_balanced;
+force_balance:
/* Looks like there is an imbalance. Compute it */
calculate_imbalance(&sds, this_cpu, imbalance);
return sds.busiest;
if (!ld_moved) {
schedstat_inc(sd, lb_failed[idle]);
- sd->nr_balance_failed++;
+ /*
+ * Increment the failure counter only on periodic balance.
+ * We do not want newidle balance, which can be very
+ * frequent, pollute the failure counter causing
+ * excessive cache_hot migrations and active balances.
+ */
+ if (idle != CPU_NEWLY_IDLE)
+ sd->nr_balance_failed++;
if (need_active_balance(sd, sd_idle, idle, cpu_of(busiest),
this_cpu)) {
interval = msecs_to_jiffies(sd->balance_interval);
if (time_after(next_balance, sd->last_balance + interval))
next_balance = sd->last_balance + interval;
- if (pulled_task) {
- this_rq->idle_stamp = 0;
+ if (pulled_task)
break;
- }
}
raw_spin_lock(&this_rq->lock);
if (time_before(now, nohz.next_balance))
return 0;
- if (!rq->nr_running)
+ if (rq->idle_at_tick)
return 0;
first_pick_cpu = atomic_read(&nohz.first_pick_cpu);
update_rq_clock(rq);
- if (unlikely(task_cpu(p) != this_cpu))
+ if (unlikely(task_cpu(p) != this_cpu)) {
+ rcu_read_lock();
__set_task_cpu(p, this_cpu);
+ rcu_read_unlock();
+ }
update_curr(cfs_rq);
* release the lock. Decreases scheduling overhead.
*/
SCHED_FEAT(OWNER_SPIN, 1)
+
+/*
+ * Decrement CPU power based on irq activity
+ */
+SCHED_FEAT(NONIRQ_POWER, 1)
if (!task_has_rt_policy(curr))
return;
- delta_exec = rq->clock - curr->se.exec_start;
+ delta_exec = rq->clock_task - curr->se.exec_start;
if (unlikely((s64)delta_exec < 0))
delta_exec = 0;
curr->se.sum_exec_runtime += delta_exec;
account_group_exec_runtime(curr, delta_exec);
- curr->se.exec_start = rq->clock;
+ curr->se.exec_start = rq->clock_task;
cpuacct_charge(curr, delta_exec);
sched_rt_avg_update(rq, delta_exec);
* runqueue. Otherwise simply start this RT task
* on its current runqueue.
*
- * We want to avoid overloading runqueues. Even if
- * the RT task is of higher priority than the current RT task.
- * RT tasks behave differently than other tasks. If
- * one gets preempted, we try to push it off to another queue.
- * So trying to keep a preempting RT task on the same
- * cache hot CPU will force the running RT task to
- * a cold CPU. So we waste all the cache for the lower
- * RT task in hopes of saving some of a RT task
- * that is just being woken and probably will have
- * cold cache anyway.
+ * We want to avoid overloading runqueues. If the woken
+ * task is a higher priority, then it will stay on this CPU
+ * and the lower prio task should be moved to another CPU.
+ * Even though this will probably make the lower prio task
+ * lose its cache, we do not want to bounce a higher task
+ * around just because it gave up its CPU, perhaps for a
+ * lock?
+ *
+ * For equal prio tasks, we just let the scheduler sort it out.
*/
if (unlikely(rt_task(rq->curr)) &&
+ (rq->curr->rt.nr_cpus_allowed < 2 ||
+ rq->curr->prio < p->prio) &&
(p->rt.nr_cpus_allowed > 1)) {
int cpu = find_lowest_rq(p);
} while (rt_rq);
p = rt_task_of(rt_se);
- p->se.exec_start = rq->clock;
+ p->se.exec_start = rq->clock_task;
return p;
}
for_each_leaf_rt_rq(rt_rq, rq) {
array = &rt_rq->active;
idx = sched_find_first_bit(array->bitmap);
- next_idx:
+next_idx:
if (idx >= MAX_RT_PRIO)
continue;
if (next && next->prio < idx)
if (!next_task)
return 0;
- retry:
+retry:
if (unlikely(next_task == rq->curr)) {
WARN_ON(1);
return 0;
* but possible)
*/
}
- skip:
+skip:
double_unlock_balance(this_rq, src_rq);
}
if (!task_running(rq, p) &&
!test_tsk_need_resched(rq->curr) &&
has_pushable_tasks(rq) &&
- p->rt.nr_cpus_allowed > 1)
+ p->rt.nr_cpus_allowed > 1 &&
+ rt_task(rq->curr) &&
+ (rq->curr->rt.nr_cpus_allowed < 2 ||
+ rq->curr->prio < p->prio))
push_rt_tasks(rq);
}
{
struct task_struct *p = rq->curr;
- p->se.exec_start = rq->clock;
+ p->se.exec_start = rq->clock_task;
/* The running task is never eligible for pushing */
dequeue_pushable_task(rq, p);
--- /dev/null
+/*
+ * stop-task scheduling class.
+ *
+ * The stop task is the highest priority task in the system, it preempts
+ * everything and will be preempted by nothing.
+ *
+ * See kernel/stop_machine.c
+ */
+
+#ifdef CONFIG_SMP
+static int
+select_task_rq_stop(struct rq *rq, struct task_struct *p,
+ int sd_flag, int flags)
+{
+ return task_cpu(p); /* stop tasks as never migrate */
+}
+#endif /* CONFIG_SMP */
+
+static void
+check_preempt_curr_stop(struct rq *rq, struct task_struct *p, int flags)
+{
+ resched_task(rq->curr); /* we preempt everything */
+}
+
+static struct task_struct *pick_next_task_stop(struct rq *rq)
+{
+ struct task_struct *stop = rq->stop;
+
+ if (stop && stop->state == TASK_RUNNING)
+ return stop;
+
+ return NULL;
+}
+
+static void
+enqueue_task_stop(struct rq *rq, struct task_struct *p, int flags)
+{
+}
+
+static void
+dequeue_task_stop(struct rq *rq, struct task_struct *p, int flags)
+{
+}
+
+static void yield_task_stop(struct rq *rq)
+{
+ BUG(); /* the stop task should never yield, its pointless. */
+}
+
+static void put_prev_task_stop(struct rq *rq, struct task_struct *prev)
+{
+}
+
+static void task_tick_stop(struct rq *rq, struct task_struct *curr, int queued)
+{
+}
+
+static void set_curr_task_stop(struct rq *rq)
+{
+}
+
+static void switched_to_stop(struct rq *rq, struct task_struct *p,
+ int running)
+{
+ BUG(); /* its impossible to change to this class */
+}
+
+static void prio_changed_stop(struct rq *rq, struct task_struct *p,
+ int oldprio, int running)
+{
+ BUG(); /* how!?, what priority? */
+}
+
+static unsigned int
+get_rr_interval_stop(struct rq *rq, struct task_struct *task)
+{
+ return 0;
+}
+
+/*
+ * Simple, special scheduling class for the per-CPU stop tasks:
+ */
+static const struct sched_class stop_sched_class = {
+ .next = &rt_sched_class,
+
+ .enqueue_task = enqueue_task_stop,
+ .dequeue_task = dequeue_task_stop,
+ .yield_task = yield_task_stop,
+
+ .check_preempt_curr = check_preempt_curr_stop,
+
+ .pick_next_task = pick_next_task_stop,
+ .put_prev_task = put_prev_task_stop,
+
+#ifdef CONFIG_SMP
+ .select_task_rq = select_task_rq_stop,
+#endif
+
+ .set_curr_task = set_curr_task_stop,
+ .task_tick = task_tick_stop,
+
+ .get_rr_interval = get_rr_interval_stop,
+
+ .prio_changed = prio_changed_stop,
+ .switched_to = switched_to_stop,
+
+ /* no .task_new for stop tasks */
+};
#ifdef __ARCH_SI_TRAPNO
err |= __put_user(from->si_trapno, &to->si_trapno);
#endif
+#ifdef BUS_MCEERR_AO
+ /*
+ * Other callers might not initialize the si_lsb field,
+ * so check explicitely for the right codes here.
+ */
+ if (from->si_code == BUS_MCEERR_AR || from->si_code == BUS_MCEERR_AO)
+ err |= __put_user(from->si_addr_lsb, &to->si_addr_lsb);
+#endif
break;
case __SI_CHLD:
err |= __put_user(from->si_pid, &to->si_pid);
EXPORT_SYMBOL_GPL(smp_call_function_any);
/**
- * __smp_call_function_single(): Run a function on another CPU
+ * __smp_call_function_single(): Run a function on a specific CPU
* @cpu: The CPU to run on.
* @data: Pre-allocated and setup data structure
+ * @wait: If true, wait until function has completed on specified CPU.
*
* Like smp_call_function_single(), but allow caller to pass in a
* pre-allocated data structure. Useful for embedding @data inside
void __smp_call_function_single(int cpu, struct call_single_data *data,
int wait)
{
- csd_lock(data);
+ unsigned int this_cpu;
+ unsigned long flags;
+ this_cpu = get_cpu();
/*
* Can deadlock when called with interrupts disabled.
* We allow cpu's that are not yet online though, as no one else can
WARN_ON_ONCE(cpu_online(smp_processor_id()) && wait && irqs_disabled()
&& !oops_in_progress);
- generic_exec_single(cpu, data, wait);
+ if (cpu == this_cpu) {
+ local_irq_save(flags);
+ data->func(data->info);
+ local_irq_restore(flags);
+ } else {
+ csd_lock(data);
+ generic_exec_single(cpu, data, wait);
+ }
+ put_cpu();
}
/**
}
/*
+ * preempt_count and SOFTIRQ_OFFSET usage:
+ * - preempt_count is changed by SOFTIRQ_OFFSET on entering or leaving
+ * softirq processing.
+ * - preempt_count is changed by SOFTIRQ_DISABLE_OFFSET (= 2 * SOFTIRQ_OFFSET)
+ * on local_bh_disable or local_bh_enable.
+ * This lets us distinguish between whether we are currently processing
+ * softirq and whether we just have bh disabled.
+ */
+
+/*
* This one is for softirq.c-internal use,
* where hardirqs are disabled legitimately:
*/
#ifdef CONFIG_TRACE_IRQFLAGS
-static void __local_bh_disable(unsigned long ip)
+static void __local_bh_disable(unsigned long ip, unsigned int cnt)
{
unsigned long flags;
* We must manually increment preempt_count here and manually
* call the trace_preempt_off later.
*/
- preempt_count() += SOFTIRQ_OFFSET;
+ preempt_count() += cnt;
/*
* Were softirqs turned off above:
*/
- if (softirq_count() == SOFTIRQ_OFFSET)
+ if (softirq_count() == cnt)
trace_softirqs_off(ip);
raw_local_irq_restore(flags);
- if (preempt_count() == SOFTIRQ_OFFSET)
+ if (preempt_count() == cnt)
trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));
}
#else /* !CONFIG_TRACE_IRQFLAGS */
-static inline void __local_bh_disable(unsigned long ip)
+static inline void __local_bh_disable(unsigned long ip, unsigned int cnt)
{
- add_preempt_count(SOFTIRQ_OFFSET);
+ add_preempt_count(cnt);
barrier();
}
#endif /* CONFIG_TRACE_IRQFLAGS */
void local_bh_disable(void)
{
- __local_bh_disable((unsigned long)__builtin_return_address(0));
+ __local_bh_disable((unsigned long)__builtin_return_address(0),
+ SOFTIRQ_DISABLE_OFFSET);
}
EXPORT_SYMBOL(local_bh_disable);
+static void __local_bh_enable(unsigned int cnt)
+{
+ WARN_ON_ONCE(in_irq());
+ WARN_ON_ONCE(!irqs_disabled());
+
+ if (softirq_count() == cnt)
+ trace_softirqs_on((unsigned long)__builtin_return_address(0));
+ sub_preempt_count(cnt);
+}
+
/*
* Special-case - softirqs can safely be enabled in
* cond_resched_softirq(), or by __do_softirq(),
*/
void _local_bh_enable(void)
{
- WARN_ON_ONCE(in_irq());
- WARN_ON_ONCE(!irqs_disabled());
-
- if (softirq_count() == SOFTIRQ_OFFSET)
- trace_softirqs_on((unsigned long)__builtin_return_address(0));
- sub_preempt_count(SOFTIRQ_OFFSET);
+ __local_bh_enable(SOFTIRQ_DISABLE_OFFSET);
}
EXPORT_SYMBOL(_local_bh_enable);
/*
* Are softirqs going to be turned on now:
*/
- if (softirq_count() == SOFTIRQ_OFFSET)
+ if (softirq_count() == SOFTIRQ_DISABLE_OFFSET)
trace_softirqs_on(ip);
/*
* Keep preemption disabled until we are done with
* softirq processing:
*/
- sub_preempt_count(SOFTIRQ_OFFSET - 1);
+ sub_preempt_count(SOFTIRQ_DISABLE_OFFSET - 1);
if (unlikely(!in_interrupt() && local_softirq_pending()))
do_softirq();
pending = local_softirq_pending();
account_system_vtime(current);
- __local_bh_disable((unsigned long)__builtin_return_address(0));
+ __local_bh_disable((unsigned long)__builtin_return_address(0),
+ SOFTIRQ_OFFSET);
lockdep_softirq_enter();
cpu = smp_processor_id();
lockdep_softirq_exit();
account_system_vtime(current);
- _local_bh_enable();
+ __local_bh_enable(SOFTIRQ_OFFSET);
}
#ifndef __ARCH_HAS_DO_SOFTIRQ
rcu_irq_enter();
if (idle_cpu(cpu) && !in_interrupt()) {
- __irq_enter();
+ /*
+ * Prevent raise_softirq from needlessly waking up ksoftirqd
+ * here, as softirq will be serviced on return from interrupt.
+ */
+ local_bh_disable();
tick_check_idle(cpu);
- } else
- __irq_enter();
+ _local_bh_enable();
+ }
+
+ __irq_enter();
}
#ifdef __ARCH_IRQ_EXIT_IRQS_DISABLED
{
set_current_state(TASK_INTERRUPTIBLE);
+ current->flags |= PF_KSOFTIRQD;
while (!kthread_should_stop()) {
preempt_disable();
if (!local_softirq_pending()) {
int __init_srcu_struct(struct srcu_struct *sp, const char *name,
struct lock_class_key *key)
{
-#ifdef CONFIG_DEBUG_LOCK_ALLOC
/* Don't re-initialize a lock while it is held. */
debug_check_no_locks_freed((void *)sp, sizeof(*sp));
lockdep_init_map(&sp->dep_map, name, key, 0);
-#endif /* #ifdef CONFIG_DEBUG_LOCK_ALLOC */
return init_srcu_struct_fields(sp);
}
EXPORT_SYMBOL_GPL(__init_srcu_struct);
goto repeat;
}
+extern void sched_set_stop_task(int cpu, struct task_struct *stop);
+
/* manage stopper for a cpu, mostly lifted from sched migration thread mgmt */
static int __cpuinit cpu_stop_cpu_callback(struct notifier_block *nfb,
unsigned long action, void *hcpu)
{
- struct sched_param param = { .sched_priority = MAX_RT_PRIO - 1 };
unsigned int cpu = (unsigned long)hcpu;
struct cpu_stopper *stopper = &per_cpu(cpu_stopper, cpu);
struct task_struct *p;
cpu);
if (IS_ERR(p))
return NOTIFY_BAD;
- sched_setscheduler_nocheck(p, SCHED_FIFO, ¶m);
get_task_struct(p);
+ kthread_bind(p, cpu);
+ sched_set_stop_task(cpu, p);
stopper->thread = p;
break;
case CPU_ONLINE:
- kthread_bind(stopper->thread, cpu);
/* strictly unnecessary, as first user will wake it */
wake_up_process(stopper->thread);
/* mark enabled */
{
struct cpu_stop_work *work;
+ sched_set_stop_task(cpu, NULL);
/* kill the stopper */
kthread_stop(stopper->thread);
/* drain remaining works */
pgid = pid;
if (pgid < 0)
return -EINVAL;
+ rcu_read_lock();
/* From this point forward we keep holding onto the tasklist lock
* so that our parent does not change from under us. -DaveM
out:
/* All paths lead to here, thus we are safe. -DaveM */
write_unlock_irq(&tasklist_lock);
+ rcu_read_unlock();
return err;
}
{
sysctl_set_parent(NULL, root_table);
#ifdef CONFIG_SYSCTL_SYSCALL_CHECK
- {
- int err;
- err = sysctl_check_table(current->nsproxy, root_table);
- }
+ sysctl_check_table(current->nsproxy, root_table);
#endif
return 0;
}
kbuf[left] = 0;
}
- for (; left && vleft--; i++, min++, max++, first=0) {
+ for (; left && vleft--; i++, first = 0) {
unsigned long val;
if (write) {
if (!table->maxlen)
set_fail(&fail, table, "No maxlen");
}
- if ((table->proc_handler == proc_doulongvec_minmax) ||
- (table->proc_handler == proc_doulongvec_ms_jiffies_minmax)) {
- if (table->maxlen > sizeof (unsigned long)) {
- if (!table->extra1)
- set_fail(&fail, table, "No min");
- if (!table->extra2)
- set_fail(&fail, table, "No max");
- }
- }
#ifdef CONFIG_PROC_SYSCTL
if (table->procname && !table->proc_handler)
set_fail(&fail, table, "No proc_handler");
int ret;
struct kprobe *kps[2] = {&kp, &kp2};
- kp.addr = 0; /* addr should be cleard for reusing kprobe. */
+ /* addr and flags should be cleard for reusing kprobe. */
+ kp.addr = NULL;
+ kp.flags = 0;
ret = register_kprobes(kps, 2);
if (ret < 0) {
printk(KERN_ERR "Kprobe smoke test failed: "
int ret;
struct jprobe *jps[2] = {&jp, &jp2};
- jp.kp.addr = 0; /* addr should be cleard for reusing kprobe. */
+ /* addr and flags should be cleard for reusing kprobe. */
+ jp.kp.addr = NULL;
+ jp.kp.flags = 0;
ret = register_jprobes(jps, 2);
if (ret < 0) {
printk(KERN_ERR "Kprobe smoke test failed: "
int ret;
struct kretprobe *rps[2] = {&rp, &rp2};
- rp.kp.addr = 0; /* addr should be cleard for reusing kprobe. */
+ /* addr and flags should be cleard for reusing kprobe. */
+ rp.kp.addr = NULL;
+ rp.kp.flags = 0;
ret = register_kretprobes(rps, 2);
if (ret < 0) {
printk(KERN_ERR "Kprobe smoke test failed: "
#include <linux/delay.h>
#include <linux/tick.h>
#include <linux/kallsyms.h>
-#include <linux/perf_event.h>
+#include <linux/irq_work.h>
#include <linux/sched.h>
#include <linux/slab.h>
run_local_timers();
rcu_check_callbacks(cpu, user_tick);
printk_tick();
- perf_event_do_pending();
+#ifdef CONFIG_IRQ_WORK
+ if (in_irq())
+ irq_work_run();
+#endif
scheduler_tick();
run_posix_cpu_timers(p);
}
help
See Documentation/trace/ftrace-design.txt
+config HAVE_C_RECORDMCOUNT
+ bool
+ help
+ C version of recordmcount available?
+
config TRACER_MAX_TRACE
bool
{
struct ftrace_profile *rec = v;
char str[KSYM_SYMBOL_LEN];
+ int ret = 0;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER
- static DEFINE_MUTEX(mutex);
static struct trace_seq s;
unsigned long long avg;
unsigned long long stddev;
#endif
+ mutex_lock(&ftrace_profile_lock);
+
+ /* we raced with function_profile_reset() */
+ if (unlikely(rec->counter == 0)) {
+ ret = -EBUSY;
+ goto out;
+ }
kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
seq_printf(m, " %-30.30s %10lu", str, rec->counter);
do_div(stddev, (rec->counter - 1) * 1000);
}
- mutex_lock(&mutex);
trace_seq_init(&s);
trace_print_graph_duration(rec->time, &s);
trace_seq_puts(&s, " ");
trace_seq_puts(&s, " ");
trace_print_graph_duration(stddev, &s);
trace_print_seq(m, &s);
- mutex_unlock(&mutex);
#endif
seq_putc(m, '\n');
+out:
+ mutex_unlock(&ftrace_profile_lock);
- return 0;
+ return ret;
}
static void ftrace_profile_reset(struct ftrace_profile_stat *stat)
FTRACE_ENABLE_CALLS = (1 << 0),
FTRACE_DISABLE_CALLS = (1 << 1),
FTRACE_UPDATE_TRACE_FUNC = (1 << 2),
- FTRACE_ENABLE_MCOUNT = (1 << 3),
- FTRACE_DISABLE_MCOUNT = (1 << 4),
- FTRACE_START_FUNC_RET = (1 << 5),
- FTRACE_STOP_FUNC_RET = (1 << 6),
+ FTRACE_START_FUNC_RET = (1 << 3),
+ FTRACE_STOP_FUNC_RET = (1 << 4),
};
static int ftrace_filtered;
static void ftrace_startup_sysctl(void)
{
- int command = FTRACE_ENABLE_MCOUNT;
-
if (unlikely(ftrace_disabled))
return;
saved_ftrace_func = NULL;
/* ftrace_start_up is true if we want ftrace running */
if (ftrace_start_up)
- command |= FTRACE_ENABLE_CALLS;
-
- ftrace_run_update_code(command);
+ ftrace_run_update_code(FTRACE_ENABLE_CALLS);
}
static void ftrace_shutdown_sysctl(void)
{
- int command = FTRACE_DISABLE_MCOUNT;
-
if (unlikely(ftrace_disabled))
return;
/* ftrace_start_up is true if ftrace is running */
if (ftrace_start_up)
- command |= FTRACE_DISABLE_CALLS;
-
- ftrace_run_update_code(command);
+ ftrace_run_update_code(FTRACE_DISABLE_CALLS);
}
static cycle_t ftrace_update_time;
#define FTRACE_BUFF_MAX (KSYM_SYMBOL_LEN+4) /* room for wildcards */
struct ftrace_iterator {
- struct ftrace_page *pg;
- int hidx;
- int idx;
- unsigned flags;
- struct trace_parser parser;
+ loff_t pos;
+ loff_t func_pos;
+ struct ftrace_page *pg;
+ struct dyn_ftrace *func;
+ struct ftrace_func_probe *probe;
+ struct trace_parser parser;
+ int hidx;
+ int idx;
+ unsigned flags;
};
static void *
-t_hash_next(struct seq_file *m, void *v, loff_t *pos)
+t_hash_next(struct seq_file *m, loff_t *pos)
{
struct ftrace_iterator *iter = m->private;
- struct hlist_node *hnd = v;
+ struct hlist_node *hnd = NULL;
struct hlist_head *hhd;
- WARN_ON(!(iter->flags & FTRACE_ITER_HASH));
-
(*pos)++;
+ iter->pos = *pos;
+ if (iter->probe)
+ hnd = &iter->probe->node;
retry:
if (iter->hidx >= FTRACE_FUNC_HASHSIZE)
return NULL;
}
}
- return hnd;
+ if (WARN_ON_ONCE(!hnd))
+ return NULL;
+
+ iter->probe = hlist_entry(hnd, struct ftrace_func_probe, node);
+
+ return iter;
}
static void *t_hash_start(struct seq_file *m, loff_t *pos)
void *p = NULL;
loff_t l;
- if (!(iter->flags & FTRACE_ITER_HASH))
- *pos = 0;
-
- iter->flags |= FTRACE_ITER_HASH;
+ if (iter->func_pos > *pos)
+ return NULL;
iter->hidx = 0;
- for (l = 0; l <= *pos; ) {
- p = t_hash_next(m, p, &l);
+ for (l = 0; l <= (*pos - iter->func_pos); ) {
+ p = t_hash_next(m, &l);
if (!p)
break;
}
- return p;
+ if (!p)
+ return NULL;
+
+ /* Only set this if we have an item */
+ iter->flags |= FTRACE_ITER_HASH;
+
+ return iter;
}
-static int t_hash_show(struct seq_file *m, void *v)
+static int
+t_hash_show(struct seq_file *m, struct ftrace_iterator *iter)
{
struct ftrace_func_probe *rec;
- struct hlist_node *hnd = v;
- rec = hlist_entry(hnd, struct ftrace_func_probe, node);
+ rec = iter->probe;
+ if (WARN_ON_ONCE(!rec))
+ return -EIO;
if (rec->ops->print)
return rec->ops->print(m, rec->ip, rec->ops, rec->data);
struct dyn_ftrace *rec = NULL;
if (iter->flags & FTRACE_ITER_HASH)
- return t_hash_next(m, v, pos);
+ return t_hash_next(m, pos);
(*pos)++;
+ iter->pos = *pos;
if (iter->flags & FTRACE_ITER_PRINTALL)
- return NULL;
+ return t_hash_start(m, pos);
retry:
if (iter->idx >= iter->pg->index) {
}
}
- return rec;
+ if (!rec)
+ return t_hash_start(m, pos);
+
+ iter->func_pos = *pos;
+ iter->func = rec;
+
+ return iter;
+}
+
+static void reset_iter_read(struct ftrace_iterator *iter)
+{
+ iter->pos = 0;
+ iter->func_pos = 0;
+ iter->flags &= ~(FTRACE_ITER_PRINTALL & FTRACE_ITER_HASH);
}
static void *t_start(struct seq_file *m, loff_t *pos)
mutex_lock(&ftrace_lock);
/*
+ * If an lseek was done, then reset and start from beginning.
+ */
+ if (*pos < iter->pos)
+ reset_iter_read(iter);
+
+ /*
* For set_ftrace_filter reading, if we have the filter
* off, we can short cut and just print out that all
* functions are enabled.
if (*pos > 0)
return t_hash_start(m, pos);
iter->flags |= FTRACE_ITER_PRINTALL;
+ /* reset in case of seek/pread */
+ iter->flags &= ~FTRACE_ITER_HASH;
return iter;
}
if (iter->flags & FTRACE_ITER_HASH)
return t_hash_start(m, pos);
+ /*
+ * Unfortunately, we need to restart at ftrace_pages_start
+ * every time we let go of the ftrace_mutex. This is because
+ * those pointers can change without the lock.
+ */
iter->pg = ftrace_pages_start;
iter->idx = 0;
for (l = 0; l <= *pos; ) {
break;
}
- if (!p && iter->flags & FTRACE_ITER_FILTER)
- return t_hash_start(m, pos);
+ if (!p) {
+ if (iter->flags & FTRACE_ITER_FILTER)
+ return t_hash_start(m, pos);
- return p;
+ return NULL;
+ }
+
+ return iter;
}
static void t_stop(struct seq_file *m, void *p)
static int t_show(struct seq_file *m, void *v)
{
struct ftrace_iterator *iter = m->private;
- struct dyn_ftrace *rec = v;
+ struct dyn_ftrace *rec;
if (iter->flags & FTRACE_ITER_HASH)
- return t_hash_show(m, v);
+ return t_hash_show(m, iter);
if (iter->flags & FTRACE_ITER_PRINTALL) {
seq_printf(m, "#### all functions enabled ####\n");
return 0;
}
+ rec = iter->func;
+
if (!rec)
return 0;
ret = ftrace_avail_open(inode, file);
if (!ret) {
- m = (struct seq_file *)file->private_data;
- iter = (struct ftrace_iterator *)m->private;
+ m = file->private_data;
+ iter = m->private;
iter->flags = FTRACE_ITER_FAILURES;
}
#define BUF_MAX_DATA_SIZE (BUF_PAGE_SIZE - (sizeof(u32) * 2))
/* Max number of timestamps that can fit on a page */
-#define RB_TIMESTAMPS_PER_PAGE (BUF_PAGE_SIZE / RB_LEN_TIME_STAMP)
+#define RB_TIMESTAMPS_PER_PAGE (BUF_PAGE_SIZE / RB_LEN_TIME_EXTEND)
int ring_buffer_print_page_header(struct trace_seq *s)
{
}
EXPORT_SYMBOL_GPL(ring_buffer_record_enable_cpu);
+/*
+ * The total entries in the ring buffer is the running counter
+ * of entries entered into the ring buffer, minus the sum of
+ * the entries read from the ring buffer and the number of
+ * entries that were overwritten.
+ */
+static inline unsigned long
+rb_num_of_entries(struct ring_buffer_per_cpu *cpu_buffer)
+{
+ return local_read(&cpu_buffer->entries) -
+ (local_read(&cpu_buffer->overrun) + cpu_buffer->read);
+}
+
/**
* ring_buffer_entries_cpu - get the number of entries in a cpu buffer
* @buffer: The ring buffer
unsigned long ring_buffer_entries_cpu(struct ring_buffer *buffer, int cpu)
{
struct ring_buffer_per_cpu *cpu_buffer;
- unsigned long ret;
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return 0;
cpu_buffer = buffer->buffers[cpu];
- ret = (local_read(&cpu_buffer->entries) - local_read(&cpu_buffer->overrun))
- - cpu_buffer->read;
- return ret;
+ return rb_num_of_entries(cpu_buffer);
}
EXPORT_SYMBOL_GPL(ring_buffer_entries_cpu);
/* if you care about this being correct, lock the buffer */
for_each_buffer_cpu(buffer, cpu) {
cpu_buffer = buffer->buffers[cpu];
- entries += (local_read(&cpu_buffer->entries) -
- local_read(&cpu_buffer->overrun)) - cpu_buffer->read;
+ entries += rb_num_of_entries(cpu_buffer);
}
return entries;
static void rb_advance_iter(struct ring_buffer_iter *iter)
{
- struct ring_buffer *buffer;
struct ring_buffer_per_cpu *cpu_buffer;
struct ring_buffer_event *event;
unsigned length;
cpu_buffer = iter->cpu_buffer;
- buffer = cpu_buffer->buffer;
/*
* Check if we are at the end of the buffer.
static int tracing_release(struct inode *inode, struct file *file)
{
- struct seq_file *m = (struct seq_file *)file->private_data;
+ struct seq_file *m = file->private_data;
struct trace_iterator *iter;
int cpu;
unsigned long ip,
unsigned long parent_ip,
unsigned long flags, int pc);
+void trace_graph_function(struct trace_array *tr,
+ unsigned long ip,
+ unsigned long parent_ip,
+ unsigned long flags, int pc);
void trace_default_header(struct seq_file *m);
void print_trace_header(struct seq_file *m, struct trace_iterator *iter);
int trace_empty(struct trace_iterator *iter);
#include <linux/kprobes.h>
#include "trace.h"
-static char *perf_trace_buf[4];
+static char __percpu *perf_trace_buf[PERF_NR_CONTEXTS];
/*
* Force it to be aligned to unsigned long to avoid misaligned accesses
static int perf_trace_event_init(struct ftrace_event_call *tp_event,
struct perf_event *p_event)
{
- struct hlist_head *list;
+ struct hlist_head __percpu *list;
int ret = -ENOMEM;
int cpu;
tp_event->perf_events = list;
if (!total_ref_count) {
- char *buf;
+ char __percpu *buf;
int i;
- for (i = 0; i < 4; i++) {
- buf = (char *)alloc_percpu(perf_trace_t);
+ for (i = 0; i < PERF_NR_CONTEXTS; i++) {
+ buf = (char __percpu *)alloc_percpu(perf_trace_t);
if (!buf)
goto fail;
if (!total_ref_count) {
int i;
- for (i = 0; i < 4; i++) {
+ for (i = 0; i < PERF_NR_CONTEXTS; i++) {
free_percpu(perf_trace_buf[i]);
perf_trace_buf[i] = NULL;
}
tp_event->class && tp_event->class->reg &&
try_module_get(tp_event->mod)) {
ret = perf_trace_event_init(tp_event, p_event);
+ if (ret)
+ module_put(tp_event->mod);
break;
}
}
return ret;
}
-int perf_trace_enable(struct perf_event *p_event)
+int perf_trace_add(struct perf_event *p_event, int flags)
{
struct ftrace_event_call *tp_event = p_event->tp_event;
+ struct hlist_head __percpu *pcpu_list;
struct hlist_head *list;
- list = tp_event->perf_events;
- if (WARN_ON_ONCE(!list))
+ pcpu_list = tp_event->perf_events;
+ if (WARN_ON_ONCE(!pcpu_list))
return -EINVAL;
- list = this_cpu_ptr(list);
+ if (!(flags & PERF_EF_START))
+ p_event->hw.state = PERF_HES_STOPPED;
+
+ list = this_cpu_ptr(pcpu_list);
hlist_add_head_rcu(&p_event->hlist_entry, list);
return 0;
}
-void perf_trace_disable(struct perf_event *p_event)
+void perf_trace_del(struct perf_event *p_event, int flags)
{
hlist_del_rcu(&p_event->hlist_entry);
}
tp_event->perf_events = NULL;
if (!--total_ref_count) {
- for (i = 0; i < 4; i++) {
+ for (i = 0; i < PERF_NR_CONTEXTS; i++) {
free_percpu(perf_trace_buf[i]);
perf_trace_buf[i] = NULL;
}
}
out:
+ module_put(tp_event->mod);
mutex_unlock(&event_mutex);
}
enum {
FORMAT_HEADER = 1,
- FORMAT_PRINTFMT = 2,
+ FORMAT_FIELD_SEPERATOR = 2,
+ FORMAT_PRINTFMT = 3,
};
static void *f_next(struct seq_file *m, void *v, loff_t *pos)
{
struct ftrace_event_call *call = m->private;
struct ftrace_event_field *field;
- struct list_head *head;
+ struct list_head *common_head = &ftrace_common_fields;
+ struct list_head *head = trace_get_fields(call);
(*pos)++;
switch ((unsigned long)v) {
case FORMAT_HEADER:
- head = &ftrace_common_fields;
+ if (unlikely(list_empty(common_head)))
+ return NULL;
+
+ field = list_entry(common_head->prev,
+ struct ftrace_event_field, link);
+ return field;
+ case FORMAT_FIELD_SEPERATOR:
if (unlikely(list_empty(head)))
return NULL;
return NULL;
}
- head = trace_get_fields(call);
-
- /*
- * To separate common fields from event fields, the
- * LSB is set on the first event field. Clear it in case.
- */
- v = (void *)((unsigned long)v & ~1L);
-
field = v;
- /*
- * If this is a common field, and at the end of the list, then
- * continue with main list.
- */
- if (field->link.prev == &ftrace_common_fields) {
- if (unlikely(list_empty(head)))
- return NULL;
- field = list_entry(head->prev, struct ftrace_event_field, link);
- /* Set the LSB to notify f_show to print an extra newline */
- field = (struct ftrace_event_field *)
- ((unsigned long)field | 1);
- return field;
- }
-
- /* If we are done tell f_show to print the format */
- if (field->link.prev == head)
+ if (field->link.prev == common_head)
+ return (void *)FORMAT_FIELD_SEPERATOR;
+ else if (field->link.prev == head)
return (void *)FORMAT_PRINTFMT;
field = list_entry(field->link.prev, struct ftrace_event_field, link);
seq_printf(m, "format:\n");
return 0;
+ case FORMAT_FIELD_SEPERATOR:
+ seq_putc(m, '\n');
+ return 0;
+
case FORMAT_PRINTFMT:
seq_printf(m, "\nprint fmt: %s\n",
call->print_fmt);
return 0;
}
- /*
- * To separate common fields from event fields, the
- * LSB is set on the first event field. Clear it and
- * print a newline if it is set.
- */
- if ((unsigned long)v & 1) {
- seq_putc(m, '\n');
- v = (void *)((unsigned long)v & ~1L);
- }
-
field = v;
/*
#include "trace.h"
#include "trace_output.h"
+/* When set, irq functions will be ignored */
+static int ftrace_graph_skip_irqs;
+
struct fgraph_cpu_data {
pid_t last_pid;
int depth;
+ int depth_irq;
int ignore;
unsigned long enter_funcs[FTRACE_RETFUNC_DEPTH];
};
struct fgraph_data {
- struct fgraph_cpu_data *cpu_data;
+ struct fgraph_cpu_data __percpu *cpu_data;
/* Place to preserve last processed entry. */
struct ftrace_graph_ent_entry ent;
#define TRACE_GRAPH_PRINT_PROC 0x8
#define TRACE_GRAPH_PRINT_DURATION 0x10
#define TRACE_GRAPH_PRINT_ABS_TIME 0x20
+#define TRACE_GRAPH_PRINT_IRQS 0x40
static struct tracer_opt trace_opts[] = {
/* Display overruns? (for self-debug purpose) */
{ TRACER_OPT(funcgraph-duration, TRACE_GRAPH_PRINT_DURATION) },
/* Display absolute time of an entry */
{ TRACER_OPT(funcgraph-abstime, TRACE_GRAPH_PRINT_ABS_TIME) },
+ /* Display interrupts */
+ { TRACER_OPT(funcgraph-irqs, TRACE_GRAPH_PRINT_IRQS) },
{ } /* Empty entry */
};
static struct tracer_flags tracer_flags = {
/* Don't display overruns and proc by default */
.val = TRACE_GRAPH_PRINT_CPU | TRACE_GRAPH_PRINT_OVERHEAD |
- TRACE_GRAPH_PRINT_DURATION,
+ TRACE_GRAPH_PRINT_DURATION | TRACE_GRAPH_PRINT_IRQS,
.opts = trace_opts
};
return 1;
}
+static inline int ftrace_graph_ignore_irqs(void)
+{
+ if (!ftrace_graph_skip_irqs)
+ return 0;
+
+ return in_irq();
+}
+
int trace_graph_entry(struct ftrace_graph_ent *trace)
{
struct trace_array *tr = graph_array;
return 0;
/* trace it when it is-nested-in or is a function enabled. */
- if (!(trace->depth || ftrace_graph_addr(trace->func)))
+ if (!(trace->depth || ftrace_graph_addr(trace->func)) ||
+ ftrace_graph_ignore_irqs())
return 0;
local_irq_save(flags);
return trace_graph_entry(trace);
}
+static void
+__trace_graph_function(struct trace_array *tr,
+ unsigned long ip, unsigned long flags, int pc)
+{
+ u64 time = trace_clock_local();
+ struct ftrace_graph_ent ent = {
+ .func = ip,
+ .depth = 0,
+ };
+ struct ftrace_graph_ret ret = {
+ .func = ip,
+ .depth = 0,
+ .calltime = time,
+ .rettime = time,
+ };
+
+ __trace_graph_entry(tr, &ent, flags, pc);
+ __trace_graph_return(tr, &ret, flags, pc);
+}
+
+void
+trace_graph_function(struct trace_array *tr,
+ unsigned long ip, unsigned long parent_ip,
+ unsigned long flags, int pc)
+{
+ __trace_graph_function(tr, ip, flags, pc);
+}
+
void __trace_graph_return(struct trace_array *tr,
struct ftrace_graph_ret *trace,
unsigned long flags,
/* Print nsecs (we don't want to exceed 7 numbers) */
if (len < 7) {
- snprintf(nsecs_str, min(sizeof(nsecs_str), 8UL - len), "%03lu",
- nsecs_rem);
+ size_t slen = min_t(size_t, sizeof(nsecs_str), 8UL - len);
+
+ snprintf(nsecs_str, slen, "%03lu", nsecs_rem);
ret = trace_seq_printf(s, ".%s", nsecs_str);
if (!ret)
return TRACE_TYPE_PARTIAL_LINE;
return 0;
}
+/*
+ * Entry check for irq code
+ *
+ * returns 1 if
+ * - we are inside irq code
+ * - we just extered irq code
+ *
+ * retunns 0 if
+ * - funcgraph-interrupts option is set
+ * - we are not inside irq code
+ */
+static int
+check_irq_entry(struct trace_iterator *iter, u32 flags,
+ unsigned long addr, int depth)
+{
+ int cpu = iter->cpu;
+ int *depth_irq;
+ struct fgraph_data *data = iter->private;
+
+ /*
+ * If we are either displaying irqs, or we got called as
+ * a graph event and private data does not exist,
+ * then we bypass the irq check.
+ */
+ if ((flags & TRACE_GRAPH_PRINT_IRQS) ||
+ (!data))
+ return 0;
+
+ depth_irq = &(per_cpu_ptr(data->cpu_data, cpu)->depth_irq);
+
+ /*
+ * We are inside the irq code
+ */
+ if (*depth_irq >= 0)
+ return 1;
+
+ if ((addr < (unsigned long)__irqentry_text_start) ||
+ (addr >= (unsigned long)__irqentry_text_end))
+ return 0;
+
+ /*
+ * We are entering irq code.
+ */
+ *depth_irq = depth;
+ return 1;
+}
+
+/*
+ * Return check for irq code
+ *
+ * returns 1 if
+ * - we are inside irq code
+ * - we just left irq code
+ *
+ * returns 0 if
+ * - funcgraph-interrupts option is set
+ * - we are not inside irq code
+ */
+static int
+check_irq_return(struct trace_iterator *iter, u32 flags, int depth)
+{
+ int cpu = iter->cpu;
+ int *depth_irq;
+ struct fgraph_data *data = iter->private;
+
+ /*
+ * If we are either displaying irqs, or we got called as
+ * a graph event and private data does not exist,
+ * then we bypass the irq check.
+ */
+ if ((flags & TRACE_GRAPH_PRINT_IRQS) ||
+ (!data))
+ return 0;
+
+ depth_irq = &(per_cpu_ptr(data->cpu_data, cpu)->depth_irq);
+
+ /*
+ * We are not inside the irq code.
+ */
+ if (*depth_irq == -1)
+ return 0;
+
+ /*
+ * We are inside the irq code, and this is returning entry.
+ * Let's not trace it and clear the entry depth, since
+ * we are out of irq code.
+ *
+ * This condition ensures that we 'leave the irq code' once
+ * we are out of the entry depth. Thus protecting us from
+ * the RETURN entry loss.
+ */
+ if (*depth_irq >= depth) {
+ *depth_irq = -1;
+ return 1;
+ }
+
+ /*
+ * We are inside the irq code, and this is not the entry.
+ */
+ return 1;
+}
+
static enum print_line_t
print_graph_entry(struct ftrace_graph_ent_entry *field, struct trace_seq *s,
struct trace_iterator *iter, u32 flags)
static enum print_line_t ret;
int cpu = iter->cpu;
+ if (check_irq_entry(iter, flags, call->func, call->depth))
+ return TRACE_TYPE_HANDLED;
+
if (print_graph_prologue(iter, s, TRACE_GRAPH_ENT, call->func, flags))
return TRACE_TYPE_PARTIAL_LINE;
int ret;
int i;
+ if (check_irq_return(iter, flags, trace->depth))
+ return TRACE_TYPE_HANDLED;
+
if (data) {
struct fgraph_cpu_data *cpu_data;
int cpu = iter->cpu;
enum print_line_t
-print_graph_function_flags(struct trace_iterator *iter, u32 flags)
+__print_graph_function_flags(struct trace_iterator *iter, u32 flags)
{
struct ftrace_graph_ent_entry *field;
struct fgraph_data *data = iter->private;
static enum print_line_t
print_graph_function(struct trace_iterator *iter)
{
- return print_graph_function_flags(iter, tracer_flags.val);
+ return __print_graph_function_flags(iter, tracer_flags.val);
+}
+
+enum print_line_t print_graph_function_flags(struct trace_iterator *iter,
+ u32 flags)
+{
+ if (trace_flags & TRACE_ITER_LATENCY_FMT)
+ flags |= TRACE_GRAPH_PRINT_DURATION;
+ else
+ flags |= TRACE_GRAPH_PRINT_ABS_TIME;
+
+ return __print_graph_function_flags(iter, flags);
}
static enum print_line_t
seq_printf(s, "#%.*s|||| / \n", size, spaces);
}
-void print_graph_headers_flags(struct seq_file *s, u32 flags)
+static void __print_graph_headers_flags(struct seq_file *s, u32 flags)
{
int lat = trace_flags & TRACE_ITER_LATENCY_FMT;
print_graph_headers_flags(s, tracer_flags.val);
}
+void print_graph_headers_flags(struct seq_file *s, u32 flags)
+{
+ struct trace_iterator *iter = s->private;
+
+ if (trace_flags & TRACE_ITER_LATENCY_FMT) {
+ /* print nothing if the buffers are empty */
+ if (trace_empty(iter))
+ return;
+
+ print_trace_header(s, iter);
+ flags |= TRACE_GRAPH_PRINT_DURATION;
+ } else
+ flags |= TRACE_GRAPH_PRINT_ABS_TIME;
+
+ __print_graph_headers_flags(s, flags);
+}
+
void graph_trace_open(struct trace_iterator *iter)
{
/* pid and depth on the last trace processed */
pid_t *pid = &(per_cpu_ptr(data->cpu_data, cpu)->last_pid);
int *depth = &(per_cpu_ptr(data->cpu_data, cpu)->depth);
int *ignore = &(per_cpu_ptr(data->cpu_data, cpu)->ignore);
+ int *depth_irq = &(per_cpu_ptr(data->cpu_data, cpu)->depth_irq);
+
*pid = -1;
*depth = 0;
*ignore = 0;
+ *depth_irq = -1;
}
iter->private = data;
}
}
+static int func_graph_set_flag(u32 old_flags, u32 bit, int set)
+{
+ if (bit == TRACE_GRAPH_PRINT_IRQS)
+ ftrace_graph_skip_irqs = !set;
+
+ return 0;
+}
+
static struct trace_event_functions graph_functions = {
.trace = print_graph_function_event,
};
.print_line = print_graph_function,
.print_header = print_graph_headers,
.flags = &tracer_flags,
+ .set_flag = func_graph_set_flag,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_function_graph,
#endif
#ifdef CONFIG_FUNCTION_TRACER
/*
- * irqsoff uses its own tracer function to keep the overhead down:
+ * Prologue for the preempt and irqs off function tracers.
+ *
+ * Returns 1 if it is OK to continue, and data->disabled is
+ * incremented.
+ * 0 if the trace is to be ignored, and data->disabled
+ * is kept the same.
+ *
+ * Note, this function is also used outside this ifdef but
+ * inside the #ifdef of the function graph tracer below.
+ * This is OK, since the function graph tracer is
+ * dependent on the function tracer.
*/
-static void
-irqsoff_tracer_call(unsigned long ip, unsigned long parent_ip)
+static int func_prolog_dec(struct trace_array *tr,
+ struct trace_array_cpu **data,
+ unsigned long *flags)
{
- struct trace_array *tr = irqsoff_trace;
- struct trace_array_cpu *data;
- unsigned long flags;
long disabled;
int cpu;
*/
cpu = raw_smp_processor_id();
if (likely(!per_cpu(tracing_cpu, cpu)))
- return;
+ return 0;
- local_save_flags(flags);
+ local_save_flags(*flags);
/* slight chance to get a false positive on tracing_cpu */
- if (!irqs_disabled_flags(flags))
- return;
+ if (!irqs_disabled_flags(*flags))
+ return 0;
- data = tr->data[cpu];
- disabled = atomic_inc_return(&data->disabled);
+ *data = tr->data[cpu];
+ disabled = atomic_inc_return(&(*data)->disabled);
if (likely(disabled == 1))
- trace_function(tr, ip, parent_ip, flags, preempt_count());
+ return 1;
+
+ atomic_dec(&(*data)->disabled);
+
+ return 0;
+}
+
+/*
+ * irqsoff uses its own tracer function to keep the overhead down:
+ */
+static void
+irqsoff_tracer_call(unsigned long ip, unsigned long parent_ip)
+{
+ struct trace_array *tr = irqsoff_trace;
+ struct trace_array_cpu *data;
+ unsigned long flags;
+
+ if (!func_prolog_dec(tr, &data, &flags))
+ return;
+
+ trace_function(tr, ip, parent_ip, flags, preempt_count());
atomic_dec(&data->disabled);
}
struct trace_array *tr = irqsoff_trace;
struct trace_array_cpu *data;
unsigned long flags;
- long disabled;
int ret;
- int cpu;
int pc;
- cpu = raw_smp_processor_id();
- if (likely(!per_cpu(tracing_cpu, cpu)))
+ if (!func_prolog_dec(tr, &data, &flags))
return 0;
- local_save_flags(flags);
- /* slight chance to get a false positive on tracing_cpu */
- if (!irqs_disabled_flags(flags))
- return 0;
-
- data = tr->data[cpu];
- disabled = atomic_inc_return(&data->disabled);
-
- if (likely(disabled == 1)) {
- pc = preempt_count();
- ret = __trace_graph_entry(tr, trace, flags, pc);
- } else
- ret = 0;
-
+ pc = preempt_count();
+ ret = __trace_graph_entry(tr, trace, flags, pc);
atomic_dec(&data->disabled);
+
return ret;
}
struct trace_array *tr = irqsoff_trace;
struct trace_array_cpu *data;
unsigned long flags;
- long disabled;
- int cpu;
int pc;
- cpu = raw_smp_processor_id();
- if (likely(!per_cpu(tracing_cpu, cpu)))
+ if (!func_prolog_dec(tr, &data, &flags))
return;
- local_save_flags(flags);
- /* slight chance to get a false positive on tracing_cpu */
- if (!irqs_disabled_flags(flags))
- return;
-
- data = tr->data[cpu];
- disabled = atomic_inc_return(&data->disabled);
-
- if (likely(disabled == 1)) {
- pc = preempt_count();
- __trace_graph_return(tr, trace, flags, pc);
- }
-
+ pc = preempt_count();
+ __trace_graph_return(tr, trace, flags, pc);
atomic_dec(&data->disabled);
}
static enum print_line_t irqsoff_print_line(struct trace_iterator *iter)
{
- u32 flags = GRAPH_TRACER_FLAGS;
-
- if (trace_flags & TRACE_ITER_LATENCY_FMT)
- flags |= TRACE_GRAPH_PRINT_DURATION;
- else
- flags |= TRACE_GRAPH_PRINT_ABS_TIME;
-
/*
* In graph mode call the graph tracer output function,
* otherwise go with the TRACE_FN event handler
*/
if (is_graph())
- return print_graph_function_flags(iter, flags);
+ return print_graph_function_flags(iter, GRAPH_TRACER_FLAGS);
return TRACE_TYPE_UNHANDLED;
}
static void irqsoff_print_header(struct seq_file *s)
{
- if (is_graph()) {
- struct trace_iterator *iter = s->private;
- u32 flags = GRAPH_TRACER_FLAGS;
-
- if (trace_flags & TRACE_ITER_LATENCY_FMT) {
- /* print nothing if the buffers are empty */
- if (trace_empty(iter))
- return;
-
- print_trace_header(s, iter);
- flags |= TRACE_GRAPH_PRINT_DURATION;
- } else
- flags |= TRACE_GRAPH_PRINT_ABS_TIME;
-
- print_graph_headers_flags(s, flags);
- } else
+ if (is_graph())
+ print_graph_headers_flags(s, GRAPH_TRACER_FLAGS);
+ else
trace_default_header(s);
}
static void
-trace_graph_function(struct trace_array *tr,
- unsigned long ip, unsigned long flags, int pc)
-{
- u64 time = trace_clock_local();
- struct ftrace_graph_ent ent = {
- .func = ip,
- .depth = 0,
- };
- struct ftrace_graph_ret ret = {
- .func = ip,
- .depth = 0,
- .calltime = time,
- .rettime = time,
- };
-
- __trace_graph_entry(tr, &ent, flags, pc);
- __trace_graph_return(tr, &ret, flags, pc);
-}
-
-static void
__trace_function(struct trace_array *tr,
unsigned long ip, unsigned long parent_ip,
unsigned long flags, int pc)
{
- if (!is_graph())
+ if (is_graph())
+ trace_graph_function(tr, ip, parent_ip, flags, pc);
+ else
trace_function(tr, ip, parent_ip, flags, pc);
- else {
- trace_graph_function(tr, parent_ip, flags, pc);
- trace_graph_function(tr, ip, flags, pc);
- }
}
#else
static int kretprobe_dispatcher(struct kretprobe_instance *ri,
struct pt_regs *regs);
-/* Check the name is good for event/group */
-static int check_event_name(const char *name)
+/* Check the name is good for event/group/fields */
+static int is_good_name(const char *name)
{
if (!isalpha(*name) && *name != '_')
return 0;
else
tp->rp.kp.pre_handler = kprobe_dispatcher;
- if (!event || !check_event_name(event)) {
+ if (!event || !is_good_name(event)) {
ret = -EINVAL;
goto error;
}
if (!tp->call.name)
goto error;
- if (!group || !check_event_name(group)) {
+ if (!group || !is_good_name(group)) {
ret = -EINVAL;
goto error;
}
int i, ret = 0;
int is_return = 0, is_delete = 0;
char *symbol = NULL, *event = NULL, *group = NULL;
- char *arg, *tmp;
+ char *arg;
unsigned long offset = 0;
void *addr = NULL;
char buf[MAX_EVENT_NAME_LEN];
/* parse arguments */
ret = 0;
for (i = 0; i < argc && i < MAX_TRACE_ARGS; i++) {
+ /* Increment count for freeing args in error case */
+ tp->nr_args++;
+
/* Parse argument name */
arg = strchr(argv[i], '=');
- if (arg)
+ if (arg) {
*arg++ = '\0';
- else
+ tp->args[i].name = kstrdup(argv[i], GFP_KERNEL);
+ } else {
arg = argv[i];
+ /* If argument name is omitted, set "argN" */
+ snprintf(buf, MAX_EVENT_NAME_LEN, "arg%d", i + 1);
+ tp->args[i].name = kstrdup(buf, GFP_KERNEL);
+ }
- tp->args[i].name = kstrdup(argv[i], GFP_KERNEL);
if (!tp->args[i].name) {
- pr_info("Failed to allocate argument%d name '%s'.\n",
- i, argv[i]);
+ pr_info("Failed to allocate argument[%d] name.\n", i);
ret = -ENOMEM;
goto error;
}
- tmp = strchr(tp->args[i].name, ':');
- if (tmp)
- *tmp = '_'; /* convert : to _ */
+
+ if (!is_good_name(tp->args[i].name)) {
+ pr_info("Invalid argument[%d] name: %s\n",
+ i, tp->args[i].name);
+ ret = -EINVAL;
+ goto error;
+ }
if (conflict_field_name(tp->args[i].name, tp->args, i)) {
- pr_info("Argument%d name '%s' conflicts with "
+ pr_info("Argument[%d] name '%s' conflicts with "
"another field.\n", i, argv[i]);
ret = -EINVAL;
goto error;
/* Parse fetch argument */
ret = parse_probe_arg(arg, tp, &tp->args[i], is_return);
if (ret) {
- pr_info("Parse error at argument%d. (%d)\n", i, ret);
- kfree(tp->args[i].name);
+ pr_info("Parse error at argument[%d]. (%d)\n", i, ret);
goto error;
}
-
- tp->nr_args++;
}
ret = register_trace_probe(tp);
static arch_spinlock_t wakeup_lock =
(arch_spinlock_t)__ARCH_SPIN_LOCK_UNLOCKED;
+static void wakeup_reset(struct trace_array *tr);
static void __wakeup_reset(struct trace_array *tr);
+static int wakeup_graph_entry(struct ftrace_graph_ent *trace);
+static void wakeup_graph_return(struct ftrace_graph_ret *trace);
static int save_lat_flag;
+#define TRACE_DISPLAY_GRAPH 1
+
+static struct tracer_opt trace_opts[] = {
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+ /* display latency trace as call graph */
+ { TRACER_OPT(display-graph, TRACE_DISPLAY_GRAPH) },
+#endif
+ { } /* Empty entry */
+};
+
+static struct tracer_flags tracer_flags = {
+ .val = 0,
+ .opts = trace_opts,
+};
+
+#define is_graph() (tracer_flags.val & TRACE_DISPLAY_GRAPH)
+
#ifdef CONFIG_FUNCTION_TRACER
+
/*
- * irqsoff uses its own tracer function to keep the overhead down:
+ * Prologue for the wakeup function tracers.
+ *
+ * Returns 1 if it is OK to continue, and preemption
+ * is disabled and data->disabled is incremented.
+ * 0 if the trace is to be ignored, and preemption
+ * is not disabled and data->disabled is
+ * kept the same.
+ *
+ * Note, this function is also used outside this ifdef but
+ * inside the #ifdef of the function graph tracer below.
+ * This is OK, since the function graph tracer is
+ * dependent on the function tracer.
*/
-static void
-wakeup_tracer_call(unsigned long ip, unsigned long parent_ip)
+static int
+func_prolog_preempt_disable(struct trace_array *tr,
+ struct trace_array_cpu **data,
+ int *pc)
{
- struct trace_array *tr = wakeup_trace;
- struct trace_array_cpu *data;
- unsigned long flags;
long disabled;
int cpu;
- int pc;
if (likely(!wakeup_task))
- return;
+ return 0;
- pc = preempt_count();
+ *pc = preempt_count();
preempt_disable_notrace();
cpu = raw_smp_processor_id();
if (cpu != wakeup_current_cpu)
goto out_enable;
- data = tr->data[cpu];
- disabled = atomic_inc_return(&data->disabled);
+ *data = tr->data[cpu];
+ disabled = atomic_inc_return(&(*data)->disabled);
if (unlikely(disabled != 1))
goto out;
- local_irq_save(flags);
+ return 1;
- trace_function(tr, ip, parent_ip, flags, pc);
+out:
+ atomic_dec(&(*data)->disabled);
+
+out_enable:
+ preempt_enable_notrace();
+ return 0;
+}
+/*
+ * wakeup uses its own tracer function to keep the overhead down:
+ */
+static void
+wakeup_tracer_call(unsigned long ip, unsigned long parent_ip)
+{
+ struct trace_array *tr = wakeup_trace;
+ struct trace_array_cpu *data;
+ unsigned long flags;
+ int pc;
+
+ if (!func_prolog_preempt_disable(tr, &data, &pc))
+ return;
+
+ local_irq_save(flags);
+ trace_function(tr, ip, parent_ip, flags, pc);
local_irq_restore(flags);
- out:
atomic_dec(&data->disabled);
- out_enable:
preempt_enable_notrace();
}
};
#endif /* CONFIG_FUNCTION_TRACER */
+static int start_func_tracer(int graph)
+{
+ int ret;
+
+ if (!graph)
+ ret = register_ftrace_function(&trace_ops);
+ else
+ ret = register_ftrace_graph(&wakeup_graph_return,
+ &wakeup_graph_entry);
+
+ if (!ret && tracing_is_enabled())
+ tracer_enabled = 1;
+ else
+ tracer_enabled = 0;
+
+ return ret;
+}
+
+static void stop_func_tracer(int graph)
+{
+ tracer_enabled = 0;
+
+ if (!graph)
+ unregister_ftrace_function(&trace_ops);
+ else
+ unregister_ftrace_graph();
+}
+
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+static int wakeup_set_flag(u32 old_flags, u32 bit, int set)
+{
+
+ if (!(bit & TRACE_DISPLAY_GRAPH))
+ return -EINVAL;
+
+ if (!(is_graph() ^ set))
+ return 0;
+
+ stop_func_tracer(!set);
+
+ wakeup_reset(wakeup_trace);
+ tracing_max_latency = 0;
+
+ return start_func_tracer(set);
+}
+
+static int wakeup_graph_entry(struct ftrace_graph_ent *trace)
+{
+ struct trace_array *tr = wakeup_trace;
+ struct trace_array_cpu *data;
+ unsigned long flags;
+ int pc, ret = 0;
+
+ if (!func_prolog_preempt_disable(tr, &data, &pc))
+ return 0;
+
+ local_save_flags(flags);
+ ret = __trace_graph_entry(tr, trace, flags, pc);
+ atomic_dec(&data->disabled);
+ preempt_enable_notrace();
+
+ return ret;
+}
+
+static void wakeup_graph_return(struct ftrace_graph_ret *trace)
+{
+ struct trace_array *tr = wakeup_trace;
+ struct trace_array_cpu *data;
+ unsigned long flags;
+ int pc;
+
+ if (!func_prolog_preempt_disable(tr, &data, &pc))
+ return;
+
+ local_save_flags(flags);
+ __trace_graph_return(tr, trace, flags, pc);
+ atomic_dec(&data->disabled);
+
+ preempt_enable_notrace();
+ return;
+}
+
+static void wakeup_trace_open(struct trace_iterator *iter)
+{
+ if (is_graph())
+ graph_trace_open(iter);
+}
+
+static void wakeup_trace_close(struct trace_iterator *iter)
+{
+ if (iter->private)
+ graph_trace_close(iter);
+}
+
+#define GRAPH_TRACER_FLAGS (TRACE_GRAPH_PRINT_PROC)
+
+static enum print_line_t wakeup_print_line(struct trace_iterator *iter)
+{
+ /*
+ * In graph mode call the graph tracer output function,
+ * otherwise go with the TRACE_FN event handler
+ */
+ if (is_graph())
+ return print_graph_function_flags(iter, GRAPH_TRACER_FLAGS);
+
+ return TRACE_TYPE_UNHANDLED;
+}
+
+static void wakeup_print_header(struct seq_file *s)
+{
+ if (is_graph())
+ print_graph_headers_flags(s, GRAPH_TRACER_FLAGS);
+ else
+ trace_default_header(s);
+}
+
+static void
+__trace_function(struct trace_array *tr,
+ unsigned long ip, unsigned long parent_ip,
+ unsigned long flags, int pc)
+{
+ if (is_graph())
+ trace_graph_function(tr, ip, parent_ip, flags, pc);
+ else
+ trace_function(tr, ip, parent_ip, flags, pc);
+}
+#else
+#define __trace_function trace_function
+
+static int wakeup_set_flag(u32 old_flags, u32 bit, int set)
+{
+ return -EINVAL;
+}
+
+static int wakeup_graph_entry(struct ftrace_graph_ent *trace)
+{
+ return -1;
+}
+
+static enum print_line_t wakeup_print_line(struct trace_iterator *iter)
+{
+ return TRACE_TYPE_UNHANDLED;
+}
+
+static void wakeup_graph_return(struct ftrace_graph_ret *trace) { }
+static void wakeup_print_header(struct seq_file *s) { }
+static void wakeup_trace_open(struct trace_iterator *iter) { }
+static void wakeup_trace_close(struct trace_iterator *iter) { }
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
/*
* Should this new latency be reported/recorded?
*/
/* The task we are waiting for is waking up */
data = wakeup_trace->data[wakeup_cpu];
- trace_function(wakeup_trace, CALLER_ADDR0, CALLER_ADDR1, flags, pc);
+ __trace_function(wakeup_trace, CALLER_ADDR0, CALLER_ADDR1, flags, pc);
tracing_sched_switch_trace(wakeup_trace, prev, next, flags, pc);
T0 = data->preempt_timestamp;
* is not called by an assembly function (where as schedule is)
* it should be safe to use it here.
*/
- trace_function(wakeup_trace, CALLER_ADDR1, CALLER_ADDR2, flags, pc);
+ __trace_function(wakeup_trace, CALLER_ADDR1, CALLER_ADDR2, flags, pc);
out_locked:
arch_spin_unlock(&wakeup_lock);
*/
smp_wmb();
- register_ftrace_function(&trace_ops);
-
- if (tracing_is_enabled())
- tracer_enabled = 1;
- else
- tracer_enabled = 0;
+ if (start_func_tracer(is_graph()))
+ printk(KERN_ERR "failed to start wakeup tracer\n");
return;
fail_deprobe_wake_new:
static void stop_wakeup_tracer(struct trace_array *tr)
{
tracer_enabled = 0;
- unregister_ftrace_function(&trace_ops);
+ stop_func_tracer(is_graph());
unregister_trace_sched_switch(probe_wakeup_sched_switch, NULL);
unregister_trace_sched_wakeup_new(probe_wakeup, NULL);
unregister_trace_sched_wakeup(probe_wakeup, NULL);
.start = wakeup_tracer_start,
.stop = wakeup_tracer_stop,
.print_max = 1,
+ .print_header = wakeup_print_header,
+ .print_line = wakeup_print_line,
+ .flags = &tracer_flags,
+ .set_flag = wakeup_set_flag,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_wakeup,
#endif
+ .open = wakeup_trace_open,
+ .close = wakeup_trace_close,
.use_max_tr = 1,
};
.stop = wakeup_tracer_stop,
.wait_pipe = poll_wait_pipe,
.print_max = 1,
+ .print_header = wakeup_print_header,
+ .print_line = wakeup_print_line,
+ .flags = &tracer_flags,
+ .set_flag = wakeup_set_flag,
#ifdef CONFIG_FTRACE_SELFTEST
.selftest = trace_selftest_startup_wakeup,
#endif
+ .open = wakeup_trace_open,
+ .close = wakeup_trace_close,
.use_max_tr = 1,
};
{
int ret, cpu;
+ for_each_possible_cpu(cpu) {
+ spin_lock_init(&workqueue_cpu_stat(cpu)->lock);
+ INIT_LIST_HEAD(&workqueue_cpu_stat(cpu)->list);
+ }
+
ret = register_trace_workqueue_insertion(probe_workqueue_insertion, NULL);
if (ret)
goto out;
if (ret)
goto no_creation;
- for_each_possible_cpu(cpu) {
- spin_lock_init(&workqueue_cpu_stat(cpu)->lock);
- INIT_LIST_HEAD(&workqueue_cpu_stat(cpu)->list);
- }
-
return 0;
no_creation:
#include <linux/err.h>
#include <linux/slab.h>
#include <linux/sched.h>
+#include <linux/jump_label.h>
extern struct tracepoint __start___tracepoints[];
extern struct tracepoint __stop___tracepoints[];
* is used.
*/
rcu_assign_pointer(elem->funcs, (*entry)->funcs);
- elem->state = active;
+ if (!elem->state && active) {
+ jump_label_enable(&elem->state);
+ elem->state = active;
+ } else if (elem->state && !active) {
+ jump_label_disable(&elem->state);
+ elem->state = active;
+ }
}
/*
if (elem->unregfunc && elem->state)
elem->unregfunc();
- elem->state = 0;
+ if (elem->state) {
+ jump_label_disable(&elem->state);
+ elem->state = 0;
+ }
rcu_assign_pointer(elem->funcs, NULL);
}
static DEFINE_PER_CPU(struct perf_event *, watchdog_ev);
#endif
-static int __read_mostly did_panic;
static int __initdata no_watchdog;
void touch_softlockup_watchdog(void)
{
- __get_cpu_var(watchdog_touch_ts) = 0;
+ __raw_get_cpu_var(watchdog_touch_ts) = 0;
}
EXPORT_SYMBOL(touch_softlockup_watchdog);
#ifdef CONFIG_HARDLOCKUP_DETECTOR
void touch_nmi_watchdog(void)
{
- __get_cpu_var(watchdog_nmi_touch) = true;
+ if (watchdog_enabled) {
+ unsigned cpu;
+
+ for_each_present_cpu(cpu) {
+ if (per_cpu(watchdog_nmi_touch, cpu) != true)
+ per_cpu(watchdog_nmi_touch, cpu) = true;
+ }
+ }
touch_softlockup_watchdog();
}
EXPORT_SYMBOL(touch_nmi_watchdog);
return 0;
}
-static int
-watchdog_panic(struct notifier_block *this, unsigned long event, void *ptr)
-{
- did_panic = 1;
-
- return NOTIFY_DONE;
-}
-
-static struct notifier_block panic_block = {
- .notifier_call = watchdog_panic,
-};
-
#ifdef CONFIG_HARDLOCKUP_DETECTOR
static struct perf_event_attr wd_hw_attr = {
.type = PERF_TYPE_HARDWARE,
};
/* Callback function for perf event subsystem */
-void watchdog_overflow_callback(struct perf_event *event, int nmi,
+static void watchdog_overflow_callback(struct perf_event *event, int nmi,
struct perf_sample_data *data,
struct pt_regs *regs)
{
/* Try to register using hardware perf events */
wd_attr = &wd_hw_attr;
wd_attr->sample_period = hw_nmi_get_sample_period();
- event = perf_event_create_kernel_counter(wd_attr, cpu, -1, watchdog_overflow_callback);
+ event = perf_event_create_kernel_counter(wd_attr, cpu, NULL, watchdog_overflow_callback);
if (!IS_ERR(event)) {
printk(KERN_INFO "NMI watchdog enabled, takes one hw-pmu counter.\n");
goto out_save;
}
printk(KERN_ERR "NMI watchdog failed to create perf event on cpu%i: %p\n", cpu, event);
- return -1;
+ return PTR_ERR(event);
/* success path */
out_save:
static int watchdog_enable(int cpu)
{
struct task_struct *p = per_cpu(softlockup_watchdog, cpu);
+ int err;
/* enable the perf event */
- if (watchdog_nmi_enable(cpu) != 0)
- return -1;
+ err = watchdog_nmi_enable(cpu);
+ if (err)
+ return err;
/* create the watchdog thread */
if (!p) {
p = kthread_create(watchdog, (void *)(unsigned long)cpu, "watchdog/%d", cpu);
if (IS_ERR(p)) {
printk(KERN_ERR "softlockup watchdog for %i failed\n", cpu);
- return -1;
+ return PTR_ERR(p);
}
kthread_bind(p, cpu);
per_cpu(watchdog_touch_ts, cpu) = 0;
wake_up_process(p);
}
+ /* if any cpu succeeds, watchdog is considered enabled for the system */
+ watchdog_enabled = 1;
+
return 0;
}
per_cpu(softlockup_watchdog, cpu) = NULL;
kthread_stop(p);
}
-
- /* if any cpu succeeds, watchdog is considered enabled for the system */
- watchdog_enabled = 1;
}
static void watchdog_enable_all_cpus(void)
{
int cpu;
+ if (no_watchdog)
+ return;
+
for_each_online_cpu(cpu)
watchdog_disable(cpu);
cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
{
int hotcpu = (unsigned long)hcpu;
+ int err = 0;
switch (action) {
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
- if (watchdog_prepare_cpu(hotcpu))
- return NOTIFY_BAD;
+ err = watchdog_prepare_cpu(hotcpu);
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
- if (watchdog_enable(hotcpu))
- return NOTIFY_BAD;
+ err = watchdog_enable(hotcpu);
break;
#ifdef CONFIG_HOTPLUG_CPU
case CPU_UP_CANCELED:
break;
#endif /* CONFIG_HOTPLUG_CPU */
}
- return NOTIFY_OK;
+ return notifier_from_errno(err);
}
static struct notifier_block __cpuinitdata cpu_nfb = {
return 0;
err = cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
- WARN_ON(err == NOTIFY_BAD);
+ WARN_ON(notifier_to_errno(err));
cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
register_cpu_notifier(&cpu_nfb);
- atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
-
return 0;
}
early_initcall(spawn_watchdog_task);
/*
- * linux/kernel/workqueue.c
+ * kernel/workqueue.c - generic async execution with shared worker pool
*
- * Generic mechanism for defining kernel helper threads for running
- * arbitrary tasks in process context.
+ * Copyright (C) 2002 Ingo Molnar
*
- * Started by Ingo Molnar, Copyright (C) 2002
+ * Derived from the taskqueue/keventd code by:
+ * David Woodhouse <dwmw2@infradead.org>
+ * Andrew Morton
+ * Kai Petzke <wpp@marie.physik.tu-berlin.de>
+ * Theodore Ts'o <tytso@mit.edu>
*
- * Derived from the taskqueue/keventd code by:
+ * Made to use alloc_percpu by Christoph Lameter.
*
- * David Woodhouse <dwmw2@infradead.org>
- * Andrew Morton
- * Kai Petzke <wpp@marie.physik.tu-berlin.de>
- * Theodore Ts'o <tytso@mit.edu>
+ * Copyright (C) 2010 SUSE Linux Products GmbH
+ * Copyright (C) 2010 Tejun Heo <tj@kernel.org>
*
- * Made to use alloc_percpu by Christoph Lameter.
+ * This is the generic async execution mechanism. Work items as are
+ * executed in process context. The worker pool is shared and
+ * automatically managed. There is one worker pool for each CPU and
+ * one extra for works which are better served by workers which are
+ * not bound to any specific CPU.
+ *
+ * Please read Documentation/workqueue.txt for details.
*/
#include <linux/module.h>
select DEBUG_SPINLOCK
select DEBUG_MUTEXES
select DEBUG_LOCK_ALLOC
+ select TRACE_IRQFLAGS
default n
help
This feature enables the kernel to prove that all locking
disabling, allowing multiple RCU-lockdep warnings to be printed
on a single reboot.
+ Say Y to allow multiple RCU-lockdep warnings per boot.
+
+ Say N if you are unsure.
+
+config SPARSE_RCU_POINTER
+ bool "RCU debugging: sparse-based checks for pointer usage"
+ default n
+ help
+ This feature enables the __rcu sparse annotation for
+ RCU-protected pointers. This annotation will cause sparse
+ to flag any non-RCU used of annotated pointers. This can be
+ helpful when debugging RCU usage. Please note that this feature
+ is not intended to enforce code cleanliness; it is instead merely
+ a debugging aid.
+
+ Say Y to make sparse flag questionable use of RCU-protected pointers
+
Say N if you are unsure.
config LOCKDEP
of more runtime overhead.
config TRACE_IRQFLAGS
- depends on DEBUG_KERNEL
bool
- default y
- depends on TRACE_IRQFLAGS_SUPPORT
- depends on PROVE_LOCKING
+ help
+ Enables hooks to interrupt enabling and disabling for
+ either tracing or lock debugging.
config DEBUG_SPINLOCK_SLEEP
bool "Spinlock debugging: sleep-inside-spinlock checking"
Say Y if you are unsure.
+config RCU_CPU_STALL_TIMEOUT
+ int "RCU CPU stall timeout in seconds"
+ depends on RCU_CPU_STALL_DETECTOR
+ range 3 300
+ default 60
+ help
+ If a given RCU grace period extends more than the specified
+ number of seconds, a CPU stall warning is printed. If the
+ RCU grace period persists, additional CPU stall warnings are
+ printed at more widely spaced intervals.
+
+config RCU_CPU_STALL_DETECTOR_RUNNABLE
+ bool "RCU CPU stall checking starts automatically at boot"
+ depends on RCU_CPU_STALL_DETECTOR
+ default y
+ help
+ If set, start checking for RCU CPU stalls immediately on
+ boot. Otherwise, RCU CPU stall checking must be manually
+ enabled.
+
+ Say Y if you are unsure.
+
+ Say N if you wish to suppress RCU CPU stall checking during boot.
+
config RCU_CPU_STALL_VERBOSE
bool "Print additional per-task information for RCU_CPU_STALL_DETECTOR"
depends on RCU_CPU_STALL_DETECTOR && TREE_PREEMPT_RCU
return NULL;
}
-int module_bug_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
- struct module *mod)
+void module_bug_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
+ struct module *mod)
{
char *secstrings;
unsigned int i;
* could potentially lead to deadlock and thus be counter-productive.
*/
list_add(&mod->bug_list, &module_bug_list);
-
- return 0;
}
void module_bug_cleanup(struct module *mod)
#include <linux/dynamic_debug.h>
#include <linux/debugfs.h>
#include <linux/slab.h>
+#include <linux/jump_label.h>
extern struct _ddebug __start___verbose[];
extern struct _ddebug __stop___verbose[];
-/* dynamic_debug_enabled, and dynamic_debug_enabled2 are bitmasks in which
- * bit n is set to 1 if any modname hashes into the bucket n, 0 otherwise. They
- * use independent hash functions, to reduce the chance of false positives.
- */
-long long dynamic_debug_enabled;
-EXPORT_SYMBOL_GPL(dynamic_debug_enabled);
-long long dynamic_debug_enabled2;
-EXPORT_SYMBOL_GPL(dynamic_debug_enabled2);
-
struct ddebug_table {
struct list_head link;
char *mod_name;
}
/*
- * must be called with ddebug_lock held
- */
-
-static int disabled_hash(char hash, bool first_table)
-{
- struct ddebug_table *dt;
- char table_hash_value;
-
- list_for_each_entry(dt, &ddebug_tables, link) {
- if (first_table)
- table_hash_value = dt->ddebugs->primary_hash;
- else
- table_hash_value = dt->ddebugs->secondary_hash;
- if (dt->num_enabled && (hash == table_hash_value))
- return 0;
- }
- return 1;
-}
-
-/*
* Search the tables for _ddebug's which match the given
* `query' and apply the `flags' and `mask' to them. Tells
* the user which ddebug's were changed, or whether none
dt->num_enabled++;
dp->flags = newflags;
if (newflags) {
- dynamic_debug_enabled |=
- (1LL << dp->primary_hash);
- dynamic_debug_enabled2 |=
- (1LL << dp->secondary_hash);
+ jump_label_enable(&dp->enabled);
} else {
- if (disabled_hash(dp->primary_hash, true))
- dynamic_debug_enabled &=
- ~(1LL << dp->primary_hash);
- if (disabled_hash(dp->secondary_hash, false))
- dynamic_debug_enabled2 &=
- ~(1LL << dp->secondary_hash);
+ jump_label_disable(&dp->enabled);
}
if (verbose)
printk(KERN_INFO
* element comparison is needed, so the client's cmp()
* routine can invoke cond_resched() periodically.
*/
- (*cmp)(priv, tail, tail);
+ (*cmp)(priv, tail->next, tail->next);
tail->next->prev = tail;
tail = tail->next;
unsigned int height; /* Height from the bottom */
unsigned int count;
struct rcu_head rcu_head;
- void *slots[RADIX_TREE_MAP_SIZE];
+ void __rcu *slots[RADIX_TREE_MAP_SIZE];
unsigned long tags[RADIX_TREE_MAX_TAGS][RADIX_TREE_TAG_LONGS];
};
left -= sg_size;
sg = alloc_fn(alloc_size, gfp_mask);
- if (unlikely(!sg))
- return -ENOMEM;
+ if (unlikely(!sg)) {
+ /*
+ * Adjust entry count to reflect that the last
+ * entry of the previous table won't be used for
+ * linkage. Without this, sg_kfree() may get
+ * confused.
+ */
+ if (prv)
+ table->nents = ++table->orig_nents;
+
+ return -ENOMEM;
+ }
sg_init_table(sg, alloc_size);
table->nents = table->orig_nents += sg_size;
*/
static unsigned long io_tlb_overflow = 32*1024;
-void *io_tlb_overflow_buffer;
+static void *io_tlb_overflow_buffer;
/*
* This is a free list describing the number of free entries available from
* to find contiguous free memory regions of size up to IO_TLB_SEGSIZE
* between io_tlb_start and io_tlb_end.
*/
- io_tlb_list = alloc_bootmem(io_tlb_nslabs * sizeof(int));
+ io_tlb_list = alloc_bootmem_pages(PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
for (i = 0; i < io_tlb_nslabs; i++)
io_tlb_list[i] = IO_TLB_SEGSIZE - OFFSET(i, IO_TLB_SEGSIZE);
io_tlb_index = 0;
- io_tlb_orig_addr = alloc_bootmem(io_tlb_nslabs * sizeof(phys_addr_t));
+ io_tlb_orig_addr = alloc_bootmem_pages(PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
/*
* Get the overflow emergency buffer
*/
- io_tlb_overflow_buffer = alloc_bootmem_low(io_tlb_overflow);
+ io_tlb_overflow_buffer = alloc_bootmem_low_pages(PAGE_ALIGN(io_tlb_overflow));
if (!io_tlb_overflow_buffer)
panic("Cannot allocate SWIOTLB overflow buffer!\n");
if (verbose)
/*
* Get IO TLB memory from the low pages
*/
- io_tlb_start = alloc_bootmem_low_pages(bytes);
+ io_tlb_start = alloc_bootmem_low_pages(PAGE_ALIGN(bytes));
if (!io_tlb_start)
panic("Cannot allocate SWIOTLB buffer");
get_order(io_tlb_nslabs << IO_TLB_SHIFT));
} else {
free_bootmem_late(__pa(io_tlb_overflow_buffer),
- io_tlb_overflow);
+ PAGE_ALIGN(io_tlb_overflow));
free_bootmem_late(__pa(io_tlb_orig_addr),
- io_tlb_nslabs * sizeof(phys_addr_t));
+ PAGE_ALIGN(io_tlb_nslabs * sizeof(phys_addr_t)));
free_bootmem_late(__pa(io_tlb_list),
- io_tlb_nslabs * sizeof(int));
+ PAGE_ALIGN(io_tlb_nslabs * sizeof(int)));
free_bootmem_late(__pa(io_tlb_start),
- io_tlb_nslabs << IO_TLB_SHIFT);
+ PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
}
}
config MIGRATION
bool "Page migration"
def_bool y
- depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE
+ depends on NUMA || ARCH_ENABLE_MEMORY_HOTREMOVE || COMPACTION
help
Allows the migration of the physical location of pages of processes
while the virtual addresses are not changed. This is useful in
struct backing_dev_info noop_backing_dev_info = {
.name = "noop",
+ .capabilities = BDI_CAP_NO_ACCT_AND_WRITEBACK,
};
EXPORT_SYMBOL_GPL(noop_backing_dev_info);
err = bdi_init(&default_backing_dev_info);
if (!err)
bdi_register(&default_backing_dev_info, NULL, "default");
+ err = bdi_init(&noop_backing_dev_info);
return err;
}
switch (action) {
case FORK_THREAD:
__set_current_state(TASK_RUNNING);
- task = kthread_run(bdi_writeback_thread, &bdi->wb, "flush-%s",
- dev_name(bdi->dev));
+ task = kthread_create(bdi_writeback_thread, &bdi->wb,
+ "flush-%s", dev_name(bdi->dev));
if (IS_ERR(task)) {
/*
* If thread creation fails, force writeout of
/*
* The spinlock makes sure we do not lose
* wake-ups when racing with 'bdi_queue_work()'.
+ * And as soon as the bdi thread is visible, we
+ * can start it.
*/
spin_lock_bh(&bdi->wb_lock);
bdi->wb.task = task;
spin_unlock_bh(&bdi->wb_lock);
+ wake_up_process(task);
}
break;
*/
vfrom = page_address(fromvec->bv_page) + tovec->bv_offset;
- flush_dcache_page(tovec->bv_page);
bounce_copy_vec(tovec, vfrom);
+ flush_dcache_page(tovec->bv_page);
}
}
/* Similar to reclaim, but different enough that they don't share logic */
static bool too_many_isolated(struct zone *zone)
{
-
- unsigned long inactive, isolated;
+ unsigned long active, inactive, isolated;
inactive = zone_page_state(zone, NR_INACTIVE_FILE) +
zone_page_state(zone, NR_INACTIVE_ANON);
+ active = zone_page_state(zone, NR_ACTIVE_FILE) +
+ zone_page_state(zone, NR_ACTIVE_ANON);
isolated = zone_page_state(zone, NR_ISOLATED_FILE) +
zone_page_state(zone, NR_ISOLATED_ANON);
- return isolated > inactive;
+ return isolated > (inactive + active) / 2;
}
/*
{
struct mm_struct *mm = current->mm;
struct address_space *mapping;
- unsigned long end = start + size;
struct vm_area_struct *vma;
int err = -EINVAL;
int has_write_lock = 0;
if (start + size <= start)
return err;
+ /* Does pgoff wrap? */
+ if (pgoff + (size >> PAGE_SHIFT) < pgoff)
+ return err;
+
/* Can we represent this offset inside this architecture's pte's? */
#if PTE_FILE_MAX_BITS < BITS_PER_LONG
if (pgoff + (size >> PAGE_SHIFT) >= (1UL << PTE_FILE_MAX_BITS))
if (!(vma->vm_flags & VM_CAN_NONLINEAR))
goto out;
- if (end <= start || start < vma->vm_start || end > vma->vm_end)
+ if (start < vma->vm_start || start + size > vma->vm_end)
goto out;
/* Must set VM_NONLINEAR before any pages are populated. */
* and just make the page writable */
avoidcopy = (page_mapcount(old_page) == 1);
if (avoidcopy) {
- if (!trylock_page(old_page)) {
- if (PageAnon(old_page))
- page_move_anon_rmap(old_page, vma, address);
- } else
- unlock_page(old_page);
+ if (PageAnon(old_page))
+ page_move_anon_rmap(old_page, vma, address);
set_huge_ptep_writable(vma, address, ptep);
return 0;
}
set_huge_pte_at(mm, address, ptep,
make_huge_pte(vma, new_page, 1));
page_remove_rmap(old_page);
- hugepage_add_anon_rmap(new_page, vma, address);
+ hugepage_add_new_anon_rmap(new_page, vma, address);
/* Make the old page be freed below */
new_page = old_page;
mmu_notifier_invalidate_range_end(mm,
vma, address);
}
- if (!pagecache_page) {
- page = pte_page(entry);
+ /*
+ * hugetlb_cow() requires page locks of pte_page(entry) and
+ * pagecache_page, so here we need take the former one
+ * when page != pagecache_page or !pagecache_page.
+ * Note that locking order is always pagecache_page -> page,
+ * so no worry about deadlock.
+ */
+ page = pte_page(entry);
+ if (page != pagecache_page)
lock_page(page);
- }
spin_lock(&mm->page_table_lock);
/* Check for a racing update before calling hugetlb_cow */
if (pagecache_page) {
unlock_page(pagecache_page);
put_page(pagecache_page);
- } else {
- unlock_page(page);
}
+ unlock_page(page);
out_mutex:
mutex_unlock(&hugetlb_instantiation_mutex);
if (!ptep)
goto out;
- if (pte_write(*ptep)) {
+ if (pte_write(*ptep) || pte_dirty(*ptep)) {
pte_t entry;
swapped = PageSwapCache(page);
set_pte_at(mm, addr, ptep, entry);
goto out_unlock;
}
- entry = pte_wrprotect(entry);
+ if (pte_dirty(entry))
+ set_page_dirty(page);
+ entry = pte_mkclean(pte_wrprotect(entry));
set_pte_at_notify(mm, addr, ptep, entry);
}
*orig_pte = *ptep;
{
struct page *new_page;
- unlock_page(page); /* any racers will COW it, not modify it */
-
new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
if (new_page) {
copy_user_highpage(new_page, page, address, vma);
add_page_to_unevictable_list(new_page);
}
- page_cache_release(page);
return new_page;
}
static void mem_cgroup_threshold(struct mem_cgroup *memcg)
{
- __mem_cgroup_threshold(memcg, false);
- if (do_swap_account)
- __mem_cgroup_threshold(memcg, true);
+ while (memcg) {
+ __mem_cgroup_threshold(memcg, false);
+ if (do_swap_account)
+ __mem_cgroup_threshold(memcg, true);
+
+ memcg = parent_mem_cgroup(memcg);
+ }
}
static int compare_thresholds(const void *a, const void *b)
* signal.
*/
static int kill_proc_ao(struct task_struct *t, unsigned long addr, int trapno,
- unsigned long pfn)
+ unsigned long pfn, struct page *page)
{
struct siginfo si;
int ret;
#ifdef __ARCH_SI_TRAPNO
si.si_trapno = trapno;
#endif
- si.si_addr_lsb = PAGE_SHIFT;
+ si.si_addr_lsb = compound_order(compound_head(page)) + PAGE_SHIFT;
/*
* Don't use force here, it's convenient if the signal
* can be temporarily blocked.
int nr;
do {
nr = shrink_slab(1000, GFP_KERNEL, 1000);
- if (page_count(p) == 0)
+ if (page_count(p) == 1)
break;
} while (nr > 10);
}
* wrong earlier.
*/
static void kill_procs_ao(struct list_head *to_kill, int doit, int trapno,
- int fail, unsigned long pfn)
+ int fail, struct page *page, unsigned long pfn)
{
struct to_kill *tk, *next;
* process anyways.
*/
else if (kill_proc_ao(tk->tsk, tk->addr, trapno,
- pfn) < 0)
+ pfn, page) < 0)
printk(KERN_ERR
"MCE %#lx: Cannot send advisory machine check signal to %s:%d\n",
pfn, tk->tsk->comm, tk->tsk->pid);
* any accesses to the poisoned memory.
*/
kill_procs_ao(&tokill, !!PageDirty(hpage), trapno,
- ret != SWAP_SUCCESS, pfn);
+ ret != SWAP_SUCCESS, p, pfn);
return ret;
}
unsigned int flags, pte_t orig_pte)
{
spinlock_t *ptl;
- struct page *page;
+ struct page *page, *swapcache = NULL;
swp_entry_t entry;
pte_t pte;
struct mem_cgroup *ptr = NULL;
lock_page(page);
delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
- page = ksm_might_need_to_copy(page, vma, address);
- if (!page) {
- ret = VM_FAULT_OOM;
- goto out;
+ /*
+ * Make sure try_to_free_swap or reuse_swap_page or swapoff did not
+ * release the swapcache from under us. The page pin, and pte_same
+ * test below, are not enough to exclude that. Even if it is still
+ * swapcache, we need to check that the page's swap has not changed.
+ */
+ if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
+ goto out_page;
+
+ if (ksm_might_need_to_copy(page, vma, address)) {
+ swapcache = page;
+ page = ksm_does_need_to_copy(page, vma, address);
+
+ if (unlikely(!page)) {
+ ret = VM_FAULT_OOM;
+ page = swapcache;
+ swapcache = NULL;
+ goto out_page;
+ }
}
if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &ptr)) {
if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
try_to_free_swap(page);
unlock_page(page);
+ if (swapcache) {
+ /*
+ * Hold the lock to avoid the swap entry to be reused
+ * until we take the PT lock for the pte_same() check
+ * (to avoid false positives from pte_same). For
+ * further safety release the lock after the swap_free
+ * so that the swap count won't change under a
+ * parallel locked swapcache.
+ */
+ unlock_page(swapcache);
+ page_cache_release(swapcache);
+ }
if (flags & FAULT_FLAG_WRITE) {
ret |= do_wp_page(mm, vma, address, page_table, pmd, ptl, pte);
unlock_page(page);
out_release:
page_cache_release(page);
+ if (swapcache) {
+ unlock_page(swapcache);
+ page_cache_release(swapcache);
+ }
return ret;
}
* with threads.
*/
if (flags & FAULT_FLAG_WRITE)
- flush_tlb_page(vma, address);
+ flush_tlb_fix_spurious_fault(vma, address);
}
unlock:
pte_unmap_unlock(pte, ptl);
/* Return the start of the next active pageblock after a given page */
static struct page *next_active_pageblock(struct page *page)
{
- int pageblocks_stride;
-
/* Ensure the starting page is pageblock-aligned */
BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1));
- /* Move forward by at least 1 * pageblock_nr_pages */
- pageblocks_stride = 1;
-
/* If the entire pageblock is free, move to the end of free page */
- if (pageblock_free(page))
- pageblocks_stride += page_order(page) - pageblock_order;
+ if (pageblock_free(page)) {
+ int order;
+ /* be careful. we don't have locks, page_order can be changed.*/
+ order = page_order(page);
+ if ((order < MAX_ORDER) && (order >= pageblock_order))
+ return page + (1 << order);
+ }
- return page + (pageblocks_stride * pageblock_nr_pages);
+ return page + pageblock_nr_pages;
}
/* Checks if this range of memory is likely to be hot-removable. */
}
}
-/* Is the vma a continuation of the stack vma above it? */
-static inline int vma_stack_continue(struct vm_area_struct *vma, unsigned long addr)
-{
- return vma && (vma->vm_end == addr) && (vma->vm_flags & VM_GROWSDOWN);
-}
-
static inline int stack_guard_page(struct vm_area_struct *vma, unsigned long addr)
{
return (vma->vm_flags & VM_GROWSDOWN) &&
removed_exe_file_vma(mm);
fput(new->vm_file);
}
+ unlink_anon_vmas(new);
out_free_mpol:
mpol_put(pol);
out_free_vma:
return 1;
}
#endif /* CONFIG_ARCH_HAS_HOLES_MEMORYMODEL */
+
+#ifdef CONFIG_SMP
+/* Called when a more accurate view of NR_FREE_PAGES is needed */
+unsigned long zone_nr_free_pages(struct zone *zone)
+{
+ unsigned long nr_free_pages = zone_page_state(zone, NR_FREE_PAGES);
+
+ /*
+ * While kswapd is awake, it is considered the zone is under some
+ * memory pressure. Under pressure, there is a risk that
+ * per-cpu-counter-drift will allow the min watermark to be breached
+ * potentially causing a live-lock. While kswapd is awake and
+ * free pages are low, get a better estimate for free pages
+ */
+ if (nr_free_pages < zone->percpu_drift_mark &&
+ !waitqueue_active(&zone->zone_pgdat->kswapd_wait))
+ return zone_page_state_snapshot(zone, NR_FREE_PAGES);
+
+ return nr_free_pages;
+}
+#endif /* CONFIG_SMP */
}
/* return true if the task is not adequate as candidate victim task. */
-static bool oom_unkillable_task(struct task_struct *p, struct mem_cgroup *mem,
- const nodemask_t *nodemask)
+static bool oom_unkillable_task(struct task_struct *p,
+ const struct mem_cgroup *mem, const nodemask_t *nodemask)
{
if (is_global_init(p))
return true;
*/
points += p->signal->oom_score_adj;
- if (points < 0)
- return 0;
+ /*
+ * Never return 0 for an eligible task that may be killed since it's
+ * possible that no single user task uses more than 0.1% of memory and
+ * no single admin tasks uses more than 3.0%.
+ */
+ if (points <= 0)
+ return 1;
return (points < 1000) ? points : 1000;
}
/**
* dump_tasks - dump current memory state of all system tasks
* @mem: current's memory controller, if constrained
+ * @nodemask: nodemask passed to page allocator for mempolicy ooms
*
- * Dumps the current memory state of all system tasks, excluding kernel threads.
+ * Dumps the current memory state of all eligible tasks. Tasks not in the same
+ * memcg, not in the same cpuset, or bound to a disjoint set of mempolicy nodes
+ * are not shown.
* State information includes task's pid, uid, tgid, vm size, rss, cpu, oom_adj
* value, oom_score_adj value, and name.
*
- * If the actual is non-NULL, only tasks that are a member of the mem_cgroup are
- * shown.
- *
* Call with tasklist_lock read-locked.
*/
-static void dump_tasks(const struct mem_cgroup *mem)
+static void dump_tasks(const struct mem_cgroup *mem, const nodemask_t *nodemask)
{
struct task_struct *p;
struct task_struct *task;
pr_info("[ pid ] uid tgid total_vm rss cpu oom_adj oom_score_adj name\n");
for_each_process(p) {
- if (p->flags & PF_KTHREAD)
- continue;
- if (mem && !task_in_mem_cgroup(p, mem))
+ if (oom_unkillable_task(p, mem, nodemask))
continue;
task = find_lock_task_mm(p);
}
static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
- struct mem_cgroup *mem)
+ struct mem_cgroup *mem, const nodemask_t *nodemask)
{
task_lock(current);
pr_warning("%s invoked oom-killer: gfp_mask=0x%x, order=%d, "
mem_cgroup_print_oom_info(mem, p);
show_mem();
if (sysctl_oom_dump_tasks)
- dump_tasks(mem);
+ dump_tasks(mem, nodemask);
}
#define K(x) ((x) << (PAGE_SHIFT-10))
unsigned int victim_points = 0;
if (printk_ratelimit())
- dump_header(p, gfp_mask, order, mem);
+ dump_header(p, gfp_mask, order, mem, nodemask);
/*
* If the task is already exiting, don't alarm the sysadmin or kill
* Determines whether the kernel must panic because of the panic_on_oom sysctl.
*/
static void check_panic_on_oom(enum oom_constraint constraint, gfp_t gfp_mask,
- int order)
+ int order, const nodemask_t *nodemask)
{
if (likely(!sysctl_panic_on_oom))
return;
return;
}
read_lock(&tasklist_lock);
- dump_header(NULL, gfp_mask, order, NULL);
+ dump_header(NULL, gfp_mask, order, NULL, nodemask);
read_unlock(&tasklist_lock);
panic("Out of memory: %s panic_on_oom is enabled\n",
sysctl_panic_on_oom == 2 ? "compulsory" : "system-wide");
unsigned int points = 0;
struct task_struct *p;
- check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, 0);
+ check_panic_on_oom(CONSTRAINT_MEMCG, gfp_mask, 0, NULL);
limit = mem_cgroup_get_limit(mem) >> PAGE_SHIFT;
read_lock(&tasklist_lock);
retry:
void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
int order, nodemask_t *nodemask)
{
+ const nodemask_t *mpol_mask;
struct task_struct *p;
unsigned long totalpages;
unsigned long freed = 0;
*/
constraint = constrained_alloc(zonelist, gfp_mask, nodemask,
&totalpages);
- check_panic_on_oom(constraint, gfp_mask, order);
+ mpol_mask = (constraint == CONSTRAINT_MEMORY_POLICY) ? nodemask : NULL;
+ check_panic_on_oom(constraint, gfp_mask, order, mpol_mask);
read_lock(&tasklist_lock);
if (sysctl_oom_kill_allocating_task &&
}
retry:
- p = select_bad_process(&points, totalpages, NULL,
- constraint == CONSTRAINT_MEMORY_POLICY ? nodemask :
- NULL);
+ p = select_bad_process(&points, totalpages, NULL, mpol_mask);
if (PTR_ERR(p) == -1UL)
goto out;
/* Found nothing?!?! Either we hang forever, or we panic. */
if (!p) {
- dump_header(NULL, gfp_mask, order, NULL);
+ dump_header(NULL, gfp_mask, order, NULL, mpol_mask);
read_unlock(&tasklist_lock);
panic("Out of memory and no killable processes...\n");
}
{
int migratetype = 0;
int batch_free = 0;
+ int to_free = count;
spin_lock(&zone->lock);
zone->all_unreclaimable = 0;
zone->pages_scanned = 0;
- __mod_zone_page_state(zone, NR_FREE_PAGES, count);
- while (count) {
+ while (to_free) {
struct page *page;
struct list_head *list;
/* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
__free_one_page(page, zone, 0, page_private(page));
trace_mm_page_pcpu_drain(page, 0, page_private(page));
- } while (--count && --batch_free && !list_empty(list));
+ } while (--to_free && --batch_free && !list_empty(list));
}
+ __mod_zone_page_state(zone, NR_FREE_PAGES, count);
spin_unlock(&zone->lock);
}
zone->all_unreclaimable = 0;
zone->pages_scanned = 0;
- __mod_zone_page_state(zone, NR_FREE_PAGES, 1 << order);
__free_one_page(page, zone, order, migratetype);
+ __mod_zone_page_state(zone, NR_FREE_PAGES, 1 << order);
spin_unlock(&zone->lock);
}
{
/* free_pages my go negative - that's OK */
long min = mark;
- long free_pages = zone_page_state(z, NR_FREE_PAGES) - (1 << order) + 1;
+ long free_pages = zone_nr_free_pages(z) - (1 << order) + 1;
int o;
if (alloc_flags & ALLOC_HIGH)
struct page *page = NULL;
struct reclaim_state reclaim_state;
struct task_struct *p = current;
+ bool drained = false;
cond_resched();
cond_resched();
- if (order != 0)
- drain_all_pages();
+ if (unlikely(!(*did_some_progress)))
+ return NULL;
- if (likely(*did_some_progress))
- page = get_page_from_freelist(gfp_mask, nodemask, order,
+retry:
+ page = get_page_from_freelist(gfp_mask, nodemask, order,
zonelist, high_zoneidx,
alloc_flags, preferred_zone,
migratetype);
+
+ /*
+ * If an allocation failed after direct reclaim, it could be because
+ * pages are pinned on the per-cpu lists. Drain them and try again
+ */
+ if (!page && !drained) {
+ drain_all_pages();
+ drained = true;
+ goto retry;
+ }
+
return page;
}
" all_unreclaimable? %s"
"\n",
zone->name,
- K(zone_page_state(zone, NR_FREE_PAGES)),
+ K(zone_nr_free_pages(zone)),
K(min_wmark_pages(zone)),
K(low_wmark_pages(zone)),
K(high_wmark_pages(zone)),
if (!table)
panic("Failed to allocate %s hash table\n", tablename);
- printk(KERN_INFO "%s hash table entries: %d (order: %d, %lu bytes)\n",
+ printk(KERN_INFO "%s hash table entries: %ld (order: %d, %lu bytes)\n",
tablename,
- (1U << log2qty),
+ (1UL << log2qty),
ilog2(size) - PAGE_SHIFT,
size);
if (pcpu_first_unit_cpu == NR_CPUS)
pcpu_first_unit_cpu = cpu;
+ pcpu_last_unit_cpu = cpu;
}
}
- pcpu_last_unit_cpu = cpu;
pcpu_nr_units = unit;
for_each_possible_cpu(cpu)
unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
{
if (PageAnon(page)) {
- if (vma->anon_vma->root != page_anon_vma(page)->root)
+ struct anon_vma *page__anon_vma = page_anon_vma(page);
+ /*
+ * Note: swapoff's unuse_vma() is more efficient with this
+ * check, and needs it to match anon_vma when KSM is active.
+ */
+ if (!vma->anon_vma || !page__anon_vma ||
+ vma->anon_vma->root != page__anon_vma->root)
return -EFAULT;
} else if (page->mapping && !(vma->vm_flags & VM_NONLINEAR)) {
if (!vma->vm_file ||
struct vm_area_struct *vma, unsigned long address, int exclusive)
{
struct anon_vma *anon_vma = vma->anon_vma;
+
BUG_ON(!anon_vma);
- if (!exclusive) {
- struct anon_vma_chain *avc;
- avc = list_entry(vma->anon_vma_chain.prev,
- struct anon_vma_chain, same_vma);
- anon_vma = avc->anon_vma;
- }
+
+ if (PageAnon(page))
+ return;
+ if (!exclusive)
+ anon_vma = anon_vma->root;
+
anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
page->mapping = (struct address_space *) anon_vma;
page->index = linear_page_index(vma, address);
{
struct anon_vma *anon_vma = vma->anon_vma;
int first;
+
+ BUG_ON(!PageLocked(page));
BUG_ON(!anon_vma);
BUG_ON(address < vma->vm_start || address >= vma->vm_end);
first = atomic_inc_and_test(&page->_mapcount);
long total_swap_pages;
static int least_priority;
-static bool swap_for_hibernation;
-
static const char Bad_file[] = "Bad swap file entry ";
static const char Unused_file[] = "Unused swap file entry ";
static const char Bad_offset[] = "Bad swap offset entry ";
nr_blocks = ((sector_t)se->nr_pages - 1) << (PAGE_SHIFT - 9);
if (nr_blocks) {
err = blkdev_issue_discard(si->bdev, start_block,
- nr_blocks, GFP_KERNEL,
- BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
+ nr_blocks, GFP_KERNEL, BLKDEV_IFL_WAIT);
if (err)
return err;
cond_resched();
nr_blocks = (sector_t)se->nr_pages << (PAGE_SHIFT - 9);
err = blkdev_issue_discard(si->bdev, start_block,
- nr_blocks, GFP_KERNEL,
- BLKDEV_IFL_WAIT | BLKDEV_IFL_BARRIER);
+ nr_blocks, GFP_KERNEL, BLKDEV_IFL_WAIT);
if (err)
break;
start_block <<= PAGE_SHIFT - 9;
nr_blocks <<= PAGE_SHIFT - 9;
if (blkdev_issue_discard(si->bdev, start_block,
- nr_blocks, GFP_NOIO, BLKDEV_IFL_WAIT |
- BLKDEV_IFL_BARRIER))
+ nr_blocks, GFP_NOIO, BLKDEV_IFL_WAIT))
break;
}
if (offset > si->highest_bit)
scan_base = offset = si->lowest_bit;
- /* reuse swap entry of cache-only swap if not hibernation. */
- if (vm_swap_full()
- && usage == SWAP_HAS_CACHE
- && si->swap_map[offset] == SWAP_HAS_CACHE) {
+ /* reuse swap entry of cache-only swap if not busy. */
+ if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
int swap_was_freed;
spin_unlock(&swap_lock);
swap_was_freed = __try_to_reclaim_swap(si, offset);
spin_lock(&swap_lock);
if (nr_swap_pages <= 0)
goto noswap;
- if (swap_for_hibernation)
- goto noswap;
nr_swap_pages--;
for (type = swap_list.next; type >= 0 && wrapped < 2; type = next) {
return (swp_entry_t) {0};
}
+/* The only caller of this function is now susupend routine */
+swp_entry_t get_swap_page_of_type(int type)
+{
+ struct swap_info_struct *si;
+ pgoff_t offset;
+
+ spin_lock(&swap_lock);
+ si = swap_info[type];
+ if (si && (si->flags & SWP_WRITEOK)) {
+ nr_swap_pages--;
+ /* This is called for allocating swap entry, not cache */
+ offset = scan_swap_map(si, 1);
+ if (offset) {
+ spin_unlock(&swap_lock);
+ return swp_entry(type, offset);
+ }
+ nr_swap_pages++;
+ }
+ spin_unlock(&swap_lock);
+ return (swp_entry_t) {0};
+}
+
static struct swap_info_struct *swap_info_get(swp_entry_t entry)
{
struct swap_info_struct *p;
if (page_swapcount(page))
return 0;
+ /*
+ * Once hibernation has begun to create its image of memory,
+ * there's a danger that one of the calls to try_to_free_swap()
+ * - most probably a call from __try_to_reclaim_swap() while
+ * hibernation is allocating its own swap pages for the image,
+ * but conceivably even a call from memory reclaim - will free
+ * the swap from a page which has already been recorded in the
+ * image as a clean swapcache page, and then reuse its swap for
+ * another page of the image. On waking from hibernation, the
+ * original page might be freed under memory pressure, then
+ * later read back in from swap, now with the wrong data.
+ *
+ * Hibernation clears bits from gfp_allowed_mask to prevent
+ * memory reclaim from writing to disk, so check that here.
+ */
+ if (!(gfp_allowed_mask & __GFP_IO))
+ return 0;
+
delete_from_swap_cache(page);
SetPageDirty(page);
return 1;
#endif
#ifdef CONFIG_HIBERNATION
-
-static pgoff_t hibernation_offset[MAX_SWAPFILES];
-/*
- * Once hibernation starts to use swap, we freeze swap_map[]. Otherwise,
- * saved swap_map[] image to the disk will be an incomplete because it's
- * changing without synchronization with hibernation snap shot.
- * At resume, we just make swap_for_hibernation=false. We can forget
- * used maps easily.
- */
-void hibernation_freeze_swap(void)
-{
- int i;
-
- spin_lock(&swap_lock);
-
- printk(KERN_INFO "PM: Freeze Swap\n");
- swap_for_hibernation = true;
- for (i = 0; i < MAX_SWAPFILES; i++)
- hibernation_offset[i] = 1;
- spin_unlock(&swap_lock);
-}
-
-void hibernation_thaw_swap(void)
-{
- spin_lock(&swap_lock);
- if (swap_for_hibernation) {
- printk(KERN_INFO "PM: Thaw Swap\n");
- swap_for_hibernation = false;
- }
- spin_unlock(&swap_lock);
-}
-
-/*
- * Because updateing swap_map[] can make not-saved-status-change,
- * we use our own easy allocator.
- * Please see kernel/power/swap.c, Used swaps are recorded into
- * RB-tree.
- */
-swp_entry_t get_swap_for_hibernation(int type)
-{
- pgoff_t off;
- swp_entry_t val = {0};
- struct swap_info_struct *si;
-
- spin_lock(&swap_lock);
-
- si = swap_info[type];
- if (!si || !(si->flags & SWP_WRITEOK))
- goto done;
-
- for (off = hibernation_offset[type]; off < si->max; ++off) {
- if (!si->swap_map[off])
- break;
- }
- if (off < si->max) {
- val = swp_entry(type, off);
- hibernation_offset[type] = off + 1;
- }
-done:
- spin_unlock(&swap_lock);
- return val;
-}
-
-void swap_free_for_hibernation(swp_entry_t ent)
-{
- /* Nothing to do */
-}
-
/*
* Find the swap type that corresponds to given device (if any).
*
p->flags |= SWP_SOLIDSTATE;
p->cluster_next = 1 + (random32() % p->highest_bit);
}
- if (discard_swap(p) == 0)
+ if (discard_swap(p) == 0 && (swap_flags & SWAP_FLAG_DISCARD))
p->flags |= SWP_DISCARDABLE;
}
static void purge_fragmented_blocks_allcpus(void);
/*
+ * called before a call to iounmap() if the caller wants vm_area_struct's
+ * immediately freed.
+ */
+void set_iounmap_nonlazy(void)
+{
+ atomic_set(&vmap_lazy_nr, lazy_max_pages()+1);
+}
+
+/*
* Purges all lazily-freed vmap areas.
*
* If sync is 0 then don't purge if there is already a purge in progress.
* If a zone is deemed to be full of pinned pages then just give it a light
* scan then give up on it.
*/
-static bool shrink_zones(int priority, struct zonelist *zonelist,
+static void shrink_zones(int priority, struct zonelist *zonelist,
struct scan_control *sc)
{
struct zoneref *z;
struct zone *zone;
- bool all_unreclaimable = true;
for_each_zone_zonelist_nodemask(zone, z, zonelist,
gfp_zone(sc->gfp_mask), sc->nodemask) {
}
shrink_zone(priority, zone, sc);
- all_unreclaimable = false;
}
+}
+
+static bool zone_reclaimable(struct zone *zone)
+{
+ return zone->pages_scanned < zone_reclaimable_pages(zone) * 6;
+}
+
+/*
+ * As hibernation is going on, kswapd is freezed so that it can't mark
+ * the zone into all_unreclaimable. It can't handle OOM during hibernation.
+ * So let's check zone's unreclaimable in direct reclaim as well as kswapd.
+ */
+static bool all_unreclaimable(struct zonelist *zonelist,
+ struct scan_control *sc)
+{
+ struct zoneref *z;
+ struct zone *zone;
+ bool all_unreclaimable = true;
+
+ for_each_zone_zonelist_nodemask(zone, z, zonelist,
+ gfp_zone(sc->gfp_mask), sc->nodemask) {
+ if (!populated_zone(zone))
+ continue;
+ if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL))
+ continue;
+ if (zone_reclaimable(zone)) {
+ all_unreclaimable = false;
+ break;
+ }
+ }
+
return all_unreclaimable;
}
struct scan_control *sc)
{
int priority;
- bool all_unreclaimable;
unsigned long total_scanned = 0;
struct reclaim_state *reclaim_state = current->reclaim_state;
struct zoneref *z;
sc->nr_scanned = 0;
if (!priority)
disable_swap_token();
- all_unreclaimable = shrink_zones(priority, zonelist, sc);
+ shrink_zones(priority, zonelist, sc);
/*
* Don't shrink slabs when reclaiming memory from
* over limit cgroups
return sc->nr_reclaimed;
/* top priority shrink_zones still had more to do? don't OOM, then */
- if (scanning_global_lru(sc) && !all_unreclaimable)
+ if (scanning_global_lru(sc) && !all_unreclaimable(zonelist, sc))
return 1;
return 0;
total_scanned += sc.nr_scanned;
if (zone->all_unreclaimable)
continue;
- if (nr_slab == 0 &&
- zone->pages_scanned >= (zone_reclaimable_pages(zone) * 6))
+ if (nr_slab == 0 && !zone_reclaimable(zone))
zone->all_unreclaimable = 1;
/*
* If we've done a decent amount of scanning and
int threshold;
for_each_populated_zone(zone) {
+ unsigned long max_drift, tolerate_drift;
+
threshold = calculate_threshold(zone);
for_each_online_cpu(cpu)
per_cpu_ptr(zone->pageset, cpu)->stat_threshold
= threshold;
+
+ /*
+ * Only set percpu_drift_mark if there is a danger that
+ * NR_FREE_PAGES reports the low watermark is ok when in fact
+ * the min watermark could be breached by an allocation
+ */
+ tolerate_drift = low_wmark_pages(zone) - min_wmark_pages(zone);
+ max_drift = num_online_cpus() * threshold;
+ if (max_drift > tolerate_drift)
+ zone->percpu_drift_mark = high_wmark_pages(zone) +
+ max_drift;
}
}
"\n scanned %lu"
"\n spanned %lu"
"\n present %lu",
- zone_page_state(zone, NR_FREE_PAGES),
+ zone_nr_free_pages(zone),
min_wmark_pages(zone),
low_wmark_pages(zone),
high_wmark_pages(zone),
switch (action) {
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
+ refresh_zone_stat_thresholds();
start_cpu_timer(cpu);
node_set_state(cpu_to_node(cpu), N_CPU);
break;
if (vlan_dev)
skb->dev = vlan_dev;
- else if (vlan_id)
- goto drop;
+ else if (vlan_id) {
+ if (!(skb->dev->flags & IFF_PROMISC))
+ goto drop;
+ skb->pkt_type = PACKET_OTHERHOST;
+ }
return (polling ? netif_receive_skb(skb) : netif_rx(skb));
if (vlan_dev)
skb->dev = vlan_dev;
- else if (vlan_id)
- goto drop;
+ else if (vlan_id) {
+ if (!(skb->dev->flags & IFF_PROMISC))
+ goto drop;
+ skb->pkt_type = PACKET_OTHERHOST;
+ }
for (p = napi->gro_list; p; p = p->next) {
NAPI_GRO_CB(p)->same_flow =
}
}
- if (c->tagpool)
+ if (c->tagpool) {
+ p9_idpool_put(0, c->tagpool); /* free reserved tag 0 */
p9_idpool_destroy(c->tagpool);
+ }
/* free requests associated with tags */
for (row = 0; row < (c->max_tag/P9_ROW_MAXTAG); row++) {
int16_t nwqids, count;
err = 0;
+ wqids = NULL;
clnt = oldfid->clnt;
if (clone) {
fid = p9_fid_create(clnt);
else
fid->qid = oldfid->qid;
+ kfree(wqids);
return fid;
clunk_fid:
+ kfree(wqids);
p9_client_clunk(fid);
fid = NULL;
/* Allocate an fcall for the reply */
rpl_context = kmalloc(sizeof *rpl_context, GFP_KERNEL);
- if (!rpl_context)
+ if (!rpl_context) {
+ err = -ENOMEM;
goto err_close;
+ }
/*
* If the request has a buffer, steal it, otherwise
}
rpl_context->rc = req->rc;
if (!rpl_context->rc) {
- kfree(rpl_context);
- goto err_close;
+ err = -ENOMEM;
+ goto err_free2;
}
/*
*/
if (atomic_inc_return(&rdma->rq_count) <= rdma->rq_depth) {
err = post_recv(client, rpl_context);
- if (err) {
- kfree(rpl_context->rc);
- kfree(rpl_context);
- goto err_close;
- }
+ if (err)
+ goto err_free1;
} else
atomic_dec(&rdma->rq_count);
/* Post the request */
c = kmalloc(sizeof *c, GFP_KERNEL);
- if (!c)
- goto err_close;
+ if (!c) {
+ err = -ENOMEM;
+ goto err_free1;
+ }
c->req = req;
c->busa = ib_dma_map_single(rdma->cm_id->device,
return ib_post_send(rdma->qp, &wr, &bad_wr);
error:
+ kfree(c);
+ kfree(rpl_context->rc);
+ kfree(rpl_context);
P9_DPRINTK(P9_DEBUG_ERROR, "EIO\n");
return -EIO;
-
+ err_free1:
+ kfree(rpl_context->rc);
+ err_free2:
+ kfree(rpl_context);
err_close:
spin_lock_irqsave(&rdma->req_lock, flags);
if (rdma->state < P9_RDMA_CLOSING) {
mutex_lock(&virtio_9p_lock);
list_for_each_entry(chan, &virtio_chan_list, chan_list) {
- if (!strncmp(devname, chan->tag, chan->tag_len)) {
+ if (!strncmp(devname, chan->tag, chan->tag_len) &&
+ strlen(devname) == chan->tag_len) {
if (!chan->inuse) {
chan->inuse = true;
found = 1;
config RPS
boolean
- depends on SMP && SYSFS
+ depends on SMP && SYSFS && USE_GENERIC_SMP_HELPERS
default y
menu "Network testing"
source "net/rfkill/Kconfig"
source "net/9p/Kconfig"
source "net/caif/Kconfig"
+source "net/ceph/Kconfig"
endif # if NET
endif
obj-$(CONFIG_WIMAX) += wimax/
obj-$(CONFIG_DNS_RESOLVER) += dns_resolver/
+obj-$(CONFIG_CEPH_LIB) += ceph/
unregister_netdev(net_dev);
free_netdev(net_dev);
}
- read_lock_irq(&devs_lock);
- if (list_empty(&br2684_devs)) {
- /* last br2684 device */
- unregister_atmdevice_notifier(&atm_dev_notifier);
- }
- read_unlock_irq(&devs_lock);
return;
}
if (list_empty(&br2684_devs)) {
/* 1st br2684 device */
- register_atmdevice_notifier(&atm_dev_notifier);
brdev->number = 1;
} else
brdev->number = BRPRIV(list_entry_brdev(br2684_devs.prev))->number + 1;
return -ENOMEM;
#endif
register_atm_ioctl(&br2684_ioctl_ops);
+ register_atmdevice_notifier(&atm_dev_notifier);
return 0;
}
#endif
- /* if not already empty */
- if (!list_empty(&br2684_devs))
- unregister_atmdevice_notifier(&atm_dev_notifier);
+ unregister_atmdevice_notifier(&atm_dev_notifier);
while (!list_empty(&br2684_devs)) {
net_dev = list_entry_brdev(br2684_devs.next);
eg->packets_rcvd++;
mpc->eg_ops->put(eg);
- memset(ATM_SKB(skb), 0, sizeof(struct atm_skb_data));
+ memset(ATM_SKB(new_skb), 0, sizeof(struct atm_skb_data));
netif_rx(new_skb);
}
static void l2cap_streaming_send(struct sock *sk)
{
- struct sk_buff *skb, *tx_skb;
+ struct sk_buff *skb;
struct l2cap_pinfo *pi = l2cap_pi(sk);
u16 control, fcs;
- while ((skb = sk->sk_send_head)) {
- tx_skb = skb_clone(skb, GFP_ATOMIC);
-
- control = get_unaligned_le16(tx_skb->data + L2CAP_HDR_SIZE);
+ while ((skb = skb_dequeue(TX_QUEUE(sk)))) {
+ control = get_unaligned_le16(skb->data + L2CAP_HDR_SIZE);
control |= pi->next_tx_seq << L2CAP_CTRL_TXSEQ_SHIFT;
- put_unaligned_le16(control, tx_skb->data + L2CAP_HDR_SIZE);
+ put_unaligned_le16(control, skb->data + L2CAP_HDR_SIZE);
if (pi->fcs == L2CAP_FCS_CRC16) {
- fcs = crc16(0, (u8 *)tx_skb->data, tx_skb->len - 2);
- put_unaligned_le16(fcs, tx_skb->data + tx_skb->len - 2);
+ fcs = crc16(0, (u8 *)skb->data, skb->len - 2);
+ put_unaligned_le16(fcs, skb->data + skb->len - 2);
}
- l2cap_do_send(sk, tx_skb);
+ l2cap_do_send(sk, skb);
pi->next_tx_seq = (pi->next_tx_seq + 1) % 64;
-
- if (skb_queue_is_last(TX_QUEUE(sk), skb))
- sk->sk_send_head = NULL;
- else
- sk->sk_send_head = skb_queue_next(TX_QUEUE(sk), skb);
-
- skb = skb_dequeue(TX_QUEUE(sk));
- kfree_skb(skb);
}
}
switch (optname) {
case L2CAP_OPTIONS:
+ if (sk->sk_state == BT_CONNECTED) {
+ err = -EINVAL;
+ break;
+ }
+
opts.imtu = l2cap_pi(sk)->imtu;
opts.omtu = l2cap_pi(sk)->omtu;
opts.flush_to = l2cap_pi(sk)->flush_to;
case L2CAP_CONF_MTU:
if (val < L2CAP_DEFAULT_MIN_MTU) {
*result = L2CAP_CONF_UNACCEPT;
- pi->omtu = L2CAP_DEFAULT_MIN_MTU;
+ pi->imtu = L2CAP_DEFAULT_MIN_MTU;
} else
- pi->omtu = val;
- l2cap_add_conf_opt(&ptr, L2CAP_CONF_MTU, 2, pi->omtu);
+ pi->imtu = val;
+ l2cap_add_conf_opt(&ptr, L2CAP_CONF_MTU, 2, pi->imtu);
break;
case L2CAP_CONF_FLUSH_TO:
return 0;
}
+static inline void set_default_fcs(struct l2cap_pinfo *pi)
+{
+ /* FCS is enabled only in ERTM or streaming mode, if one or both
+ * sides request it.
+ */
+ if (pi->mode != L2CAP_MODE_ERTM && pi->mode != L2CAP_MODE_STREAMING)
+ pi->fcs = L2CAP_FCS_NONE;
+ else if (!(pi->conf_state & L2CAP_CONF_NO_FCS_RECV))
+ pi->fcs = L2CAP_FCS_CRC16;
+}
+
static inline int l2cap_config_req(struct l2cap_conn *conn, struct l2cap_cmd_hdr *cmd, u16 cmd_len, u8 *data)
{
struct l2cap_conf_req *req = (struct l2cap_conf_req *) data;
if (!sk)
return -ENOENT;
- if (sk->sk_state != BT_CONFIG) {
- struct l2cap_cmd_rej rej;
-
- rej.reason = cpu_to_le16(0x0002);
- l2cap_send_cmd(conn, cmd->ident, L2CAP_COMMAND_REJ,
- sizeof(rej), &rej);
+ if (sk->sk_state == BT_DISCONN)
goto unlock;
- }
/* Reject if config buffer is too small. */
len = cmd_len - sizeof(*req);
goto unlock;
if (l2cap_pi(sk)->conf_state & L2CAP_CONF_INPUT_DONE) {
- if (!(l2cap_pi(sk)->conf_state & L2CAP_CONF_NO_FCS_RECV) ||
- l2cap_pi(sk)->fcs != L2CAP_FCS_NONE)
- l2cap_pi(sk)->fcs = L2CAP_FCS_CRC16;
+ set_default_fcs(l2cap_pi(sk));
sk->sk_state = BT_CONNECTED;
l2cap_pi(sk)->conf_state |= L2CAP_CONF_INPUT_DONE;
if (l2cap_pi(sk)->conf_state & L2CAP_CONF_OUTPUT_DONE) {
- if (!(l2cap_pi(sk)->conf_state & L2CAP_CONF_NO_FCS_RECV) ||
- l2cap_pi(sk)->fcs != L2CAP_FCS_NONE)
- l2cap_pi(sk)->fcs = L2CAP_FCS_CRC16;
+ set_default_fcs(l2cap_pi(sk));
sk->sk_state = BT_CONNECTED;
l2cap_pi(sk)->next_tx_seq = 0;
static void rfcomm_sk_state_change(struct rfcomm_dlc *d, int err)
{
struct sock *sk = d->owner, *parent;
+ unsigned long flags;
+
if (!sk)
return;
BT_DBG("dlc %p state %ld err %d", d, d->state, err);
+ local_irq_save(flags);
bh_lock_sock(sk);
if (err)
}
bh_unlock_sock(sk);
+ local_irq_restore(flags);
if (parent && sock_flag(sk, SOCK_ZAPPED)) {
/* We have to drop DLC lock here, otherwise
long timeo;
int err;
int ifindex, headroom, tailroom;
+ unsigned int mtu;
struct net_device *dev;
lock_sock(sk);
cf_sk->sk.sk_state = CAIF_DISCONNECTED;
goto out;
}
- dev = dev_get_by_index(sock_net(sk), ifindex);
+
+ err = -ENODEV;
+ rcu_read_lock();
+ dev = dev_get_by_index_rcu(sock_net(sk), ifindex);
+ if (!dev) {
+ rcu_read_unlock();
+ goto out;
+ }
cf_sk->headroom = LL_RESERVED_SPACE_EXTRA(dev, headroom);
+ mtu = dev->mtu;
+ rcu_read_unlock();
+
cf_sk->tailroom = tailroom;
- cf_sk->maxframe = dev->mtu - (headroom + tailroom);
- dev_put(dev);
+ cf_sk->maxframe = mtu - (headroom + tailroom);
if (cf_sk->maxframe < 1) {
- pr_warning("CAIF: %s(): CAIF Interface MTU too small (%d)\n",
- __func__, dev->mtu);
- err = -ENODEV;
+ pr_warning("CAIF: %s(): CAIF Interface MTU too small (%u)\n",
+ __func__, mtu);
goto out;
}
--- /dev/null
+config CEPH_LIB
+ tristate "Ceph core library (EXPERIMENTAL)"
+ depends on INET && EXPERIMENTAL
+ select LIBCRC32C
+ select CRYPTO_AES
+ select CRYPTO
+ default n
+ help
+ Choose Y or M here to include cephlib, which provides the
+ common functionality to both the Ceph filesystem and
+ to the rados block device (rbd).
+
+ More information at http://ceph.newdream.net/.
+
+ If unsure, say N.
+
+config CEPH_LIB_PRETTYDEBUG
+ bool "Include file:line in ceph debug output"
+ depends on CEPH_LIB
+ default n
+ help
+ If you say Y here, debug output will include a filename and
+ line to aid debugging. This increases kernel size and slows
+ execution slightly when debug call sites are enabled (e.g.,
+ via CONFIG_DYNAMIC_DEBUG).
+
+ If unsure, say N.
+
--- /dev/null
+#
+# Makefile for CEPH filesystem.
+#
+
+ifneq ($(KERNELRELEASE),)
+
+obj-$(CONFIG_CEPH_LIB) += libceph.o
+
+libceph-objs := ceph_common.o messenger.o msgpool.o buffer.o pagelist.o \
+ mon_client.o \
+ osd_client.o osdmap.o crush/crush.o crush/mapper.o crush/hash.o \
+ debugfs.o \
+ auth.o auth_none.o \
+ crypto.o armor.o \
+ auth_x.o \
+ ceph_fs.o ceph_strings.o ceph_hash.o \
+ pagevec.o
+
+else
+#Otherwise we were called directly from the command
+# line; invoke the kernel build system.
+
+KERNELDIR ?= /lib/modules/$(shell uname -r)/build
+PWD := $(shell pwd)
+
+default: all
+
+all:
+ $(MAKE) -C $(KERNELDIR) M=$(PWD) CONFIG_CEPH_LIB=m modules
+
+modules_install:
+ $(MAKE) -C $(KERNELDIR) M=$(PWD) CONFIG_CEPH_LIB=m modules_install
+
+clean:
+ $(MAKE) -C $(KERNELDIR) M=$(PWD) clean
+
+endif
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/module.h>
#include <linux/err.h>
#include <linux/slab.h>
-#include "types.h"
+#include <linux/ceph/types.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/messenger.h>
#include "auth_none.h"
#include "auth_x.h"
-#include "decode.h"
-#include "super.h"
-#include "messenger.h"
/*
* get protocol handler
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/random.h>
#include <linux/slab.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/auth.h>
+
#include "auth_none.h"
-#include "auth.h"
-#include "decode.h"
static void reset(struct ceph_auth_client *ac)
{
#define _FS_CEPH_AUTH_NONE_H
#include <linux/slab.h>
-
-#include "auth.h"
+#include <linux/ceph/auth.h>
/*
* null security mode.
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/random.h>
#include <linux/slab.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/auth.h>
+
+#include "crypto.h"
#include "auth_x.h"
#include "auth_x_protocol.h"
-#include "crypto.h"
-#include "auth.h"
-#include "decode.h"
#define TEMP_TICKET_BUF_LEN 256
#include <linux/rbtree.h>
+#include <linux/ceph/auth.h>
+
#include "crypto.h"
-#include "auth.h"
#include "auth_x_protocol.h"
/*
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
+#include <linux/module.h>
#include <linux/slab.h>
-#include "buffer.h"
-#include "decode.h"
+#include <linux/ceph/buffer.h>
+#include <linux/ceph/decode.h>
struct ceph_buffer *ceph_buffer_new(size_t len, gfp_t gfp)
{
dout("buffer_new %p\n", b);
return b;
}
+EXPORT_SYMBOL(ceph_buffer_new);
void ceph_buffer_release(struct kref *kref)
{
}
kfree(b);
}
+EXPORT_SYMBOL(ceph_buffer_release);
int ceph_decode_buffer(struct ceph_buffer **b, void **p, void *end)
{
--- /dev/null
+
+#include <linux/ceph/ceph_debug.h>
+#include <linux/backing-dev.h>
+#include <linux/ctype.h>
+#include <linux/fs.h>
+#include <linux/inet.h>
+#include <linux/in6.h>
+#include <linux/module.h>
+#include <linux/mount.h>
+#include <linux/parser.h>
+#include <linux/sched.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/statfs.h>
+#include <linux/string.h>
+
+
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/debugfs.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/auth.h>
+
+
+
+/*
+ * find filename portion of a path (/foo/bar/baz -> baz)
+ */
+const char *ceph_file_part(const char *s, int len)
+{
+ const char *e = s + len;
+
+ while (e != s && *(e-1) != '/')
+ e--;
+ return e;
+}
+EXPORT_SYMBOL(ceph_file_part);
+
+const char *ceph_msg_type_name(int type)
+{
+ switch (type) {
+ case CEPH_MSG_SHUTDOWN: return "shutdown";
+ case CEPH_MSG_PING: return "ping";
+ case CEPH_MSG_AUTH: return "auth";
+ case CEPH_MSG_AUTH_REPLY: return "auth_reply";
+ case CEPH_MSG_MON_MAP: return "mon_map";
+ case CEPH_MSG_MON_GET_MAP: return "mon_get_map";
+ case CEPH_MSG_MON_SUBSCRIBE: return "mon_subscribe";
+ case CEPH_MSG_MON_SUBSCRIBE_ACK: return "mon_subscribe_ack";
+ case CEPH_MSG_STATFS: return "statfs";
+ case CEPH_MSG_STATFS_REPLY: return "statfs_reply";
+ case CEPH_MSG_MDS_MAP: return "mds_map";
+ case CEPH_MSG_CLIENT_SESSION: return "client_session";
+ case CEPH_MSG_CLIENT_RECONNECT: return "client_reconnect";
+ case CEPH_MSG_CLIENT_REQUEST: return "client_request";
+ case CEPH_MSG_CLIENT_REQUEST_FORWARD: return "client_request_forward";
+ case CEPH_MSG_CLIENT_REPLY: return "client_reply";
+ case CEPH_MSG_CLIENT_CAPS: return "client_caps";
+ case CEPH_MSG_CLIENT_CAPRELEASE: return "client_cap_release";
+ case CEPH_MSG_CLIENT_SNAP: return "client_snap";
+ case CEPH_MSG_CLIENT_LEASE: return "client_lease";
+ case CEPH_MSG_OSD_MAP: return "osd_map";
+ case CEPH_MSG_OSD_OP: return "osd_op";
+ case CEPH_MSG_OSD_OPREPLY: return "osd_opreply";
+ default: return "unknown";
+ }
+}
+EXPORT_SYMBOL(ceph_msg_type_name);
+
+/*
+ * Initially learn our fsid, or verify an fsid matches.
+ */
+int ceph_check_fsid(struct ceph_client *client, struct ceph_fsid *fsid)
+{
+ if (client->have_fsid) {
+ if (ceph_fsid_compare(&client->fsid, fsid)) {
+ pr_err("bad fsid, had %pU got %pU",
+ &client->fsid, fsid);
+ return -1;
+ }
+ } else {
+ pr_info("client%lld fsid %pU\n", ceph_client_id(client), fsid);
+ memcpy(&client->fsid, fsid, sizeof(*fsid));
+ ceph_debugfs_client_init(client);
+ client->have_fsid = true;
+ }
+ return 0;
+}
+EXPORT_SYMBOL(ceph_check_fsid);
+
+static int strcmp_null(const char *s1, const char *s2)
+{
+ if (!s1 && !s2)
+ return 0;
+ if (s1 && !s2)
+ return -1;
+ if (!s1 && s2)
+ return 1;
+ return strcmp(s1, s2);
+}
+
+int ceph_compare_options(struct ceph_options *new_opt,
+ struct ceph_client *client)
+{
+ struct ceph_options *opt1 = new_opt;
+ struct ceph_options *opt2 = client->options;
+ int ofs = offsetof(struct ceph_options, mon_addr);
+ int i;
+ int ret;
+
+ ret = memcmp(opt1, opt2, ofs);
+ if (ret)
+ return ret;
+
+ ret = strcmp_null(opt1->name, opt2->name);
+ if (ret)
+ return ret;
+
+ ret = strcmp_null(opt1->secret, opt2->secret);
+ if (ret)
+ return ret;
+
+ /* any matching mon ip implies a match */
+ for (i = 0; i < opt1->num_mon; i++) {
+ if (ceph_monmap_contains(client->monc.monmap,
+ &opt1->mon_addr[i]))
+ return 0;
+ }
+ return -1;
+}
+EXPORT_SYMBOL(ceph_compare_options);
+
+
+static int parse_fsid(const char *str, struct ceph_fsid *fsid)
+{
+ int i = 0;
+ char tmp[3];
+ int err = -EINVAL;
+ int d;
+
+ dout("parse_fsid '%s'\n", str);
+ tmp[2] = 0;
+ while (*str && i < 16) {
+ if (ispunct(*str)) {
+ str++;
+ continue;
+ }
+ if (!isxdigit(str[0]) || !isxdigit(str[1]))
+ break;
+ tmp[0] = str[0];
+ tmp[1] = str[1];
+ if (sscanf(tmp, "%x", &d) < 1)
+ break;
+ fsid->fsid[i] = d & 0xff;
+ i++;
+ str += 2;
+ }
+
+ if (i == 16)
+ err = 0;
+ dout("parse_fsid ret %d got fsid %pU", err, fsid);
+ return err;
+}
+
+/*
+ * ceph options
+ */
+enum {
+ Opt_osdtimeout,
+ Opt_osdkeepalivetimeout,
+ Opt_mount_timeout,
+ Opt_osd_idle_ttl,
+ Opt_last_int,
+ /* int args above */
+ Opt_fsid,
+ Opt_name,
+ Opt_secret,
+ Opt_ip,
+ Opt_last_string,
+ /* string args above */
+ Opt_noshare,
+ Opt_nocrc,
+};
+
+static match_table_t opt_tokens = {
+ {Opt_osdtimeout, "osdtimeout=%d"},
+ {Opt_osdkeepalivetimeout, "osdkeepalive=%d"},
+ {Opt_mount_timeout, "mount_timeout=%d"},
+ {Opt_osd_idle_ttl, "osd_idle_ttl=%d"},
+ /* int args above */
+ {Opt_fsid, "fsid=%s"},
+ {Opt_name, "name=%s"},
+ {Opt_secret, "secret=%s"},
+ {Opt_ip, "ip=%s"},
+ /* string args above */
+ {Opt_noshare, "noshare"},
+ {Opt_nocrc, "nocrc"},
+ {-1, NULL}
+};
+
+void ceph_destroy_options(struct ceph_options *opt)
+{
+ dout("destroy_options %p\n", opt);
+ kfree(opt->name);
+ kfree(opt->secret);
+ kfree(opt);
+}
+EXPORT_SYMBOL(ceph_destroy_options);
+
+int ceph_parse_options(struct ceph_options **popt, char *options,
+ const char *dev_name, const char *dev_name_end,
+ int (*parse_extra_token)(char *c, void *private),
+ void *private)
+{
+ struct ceph_options *opt;
+ const char *c;
+ int err = -ENOMEM;
+ substring_t argstr[MAX_OPT_ARGS];
+
+ opt = kzalloc(sizeof(*opt), GFP_KERNEL);
+ if (!opt)
+ return err;
+ opt->mon_addr = kcalloc(CEPH_MAX_MON, sizeof(*opt->mon_addr),
+ GFP_KERNEL);
+ if (!opt->mon_addr)
+ goto out;
+
+ dout("parse_options %p options '%s' dev_name '%s'\n", opt, options,
+ dev_name);
+
+ /* start with defaults */
+ opt->flags = CEPH_OPT_DEFAULT;
+ opt->osd_timeout = CEPH_OSD_TIMEOUT_DEFAULT;
+ opt->osd_keepalive_timeout = CEPH_OSD_KEEPALIVE_DEFAULT;
+ opt->mount_timeout = CEPH_MOUNT_TIMEOUT_DEFAULT; /* seconds */
+ opt->osd_idle_ttl = CEPH_OSD_IDLE_TTL_DEFAULT; /* seconds */
+
+ /* get mon ip(s) */
+ /* ip1[:port1][,ip2[:port2]...] */
+ err = ceph_parse_ips(dev_name, dev_name_end, opt->mon_addr,
+ CEPH_MAX_MON, &opt->num_mon);
+ if (err < 0)
+ goto out;
+
+ /* parse mount options */
+ while ((c = strsep(&options, ",")) != NULL) {
+ int token, intval, ret;
+ if (!*c)
+ continue;
+ err = -EINVAL;
+ token = match_token((char *)c, opt_tokens, argstr);
+ if (token < 0 && parse_extra_token) {
+ /* extra? */
+ err = parse_extra_token((char *)c, private);
+ if (err < 0) {
+ pr_err("bad option at '%s'\n", c);
+ goto out;
+ }
+ continue;
+ }
+ if (token < Opt_last_int) {
+ ret = match_int(&argstr[0], &intval);
+ if (ret < 0) {
+ pr_err("bad mount option arg (not int) "
+ "at '%s'\n", c);
+ continue;
+ }
+ dout("got int token %d val %d\n", token, intval);
+ } else if (token > Opt_last_int && token < Opt_last_string) {
+ dout("got string token %d val %s\n", token,
+ argstr[0].from);
+ } else {
+ dout("got token %d\n", token);
+ }
+ switch (token) {
+ case Opt_ip:
+ err = ceph_parse_ips(argstr[0].from,
+ argstr[0].to,
+ &opt->my_addr,
+ 1, NULL);
+ if (err < 0)
+ goto out;
+ opt->flags |= CEPH_OPT_MYIP;
+ break;
+
+ case Opt_fsid:
+ err = parse_fsid(argstr[0].from, &opt->fsid);
+ if (err == 0)
+ opt->flags |= CEPH_OPT_FSID;
+ break;
+ case Opt_name:
+ opt->name = kstrndup(argstr[0].from,
+ argstr[0].to-argstr[0].from,
+ GFP_KERNEL);
+ break;
+ case Opt_secret:
+ opt->secret = kstrndup(argstr[0].from,
+ argstr[0].to-argstr[0].from,
+ GFP_KERNEL);
+ break;
+
+ /* misc */
+ case Opt_osdtimeout:
+ opt->osd_timeout = intval;
+ break;
+ case Opt_osdkeepalivetimeout:
+ opt->osd_keepalive_timeout = intval;
+ break;
+ case Opt_osd_idle_ttl:
+ opt->osd_idle_ttl = intval;
+ break;
+ case Opt_mount_timeout:
+ opt->mount_timeout = intval;
+ break;
+
+ case Opt_noshare:
+ opt->flags |= CEPH_OPT_NOSHARE;
+ break;
+
+ case Opt_nocrc:
+ opt->flags |= CEPH_OPT_NOCRC;
+ break;
+
+ default:
+ BUG_ON(token);
+ }
+ }
+
+ /* success */
+ *popt = opt;
+ return 0;
+
+out:
+ ceph_destroy_options(opt);
+ return err;
+}
+EXPORT_SYMBOL(ceph_parse_options);
+
+u64 ceph_client_id(struct ceph_client *client)
+{
+ return client->monc.auth->global_id;
+}
+EXPORT_SYMBOL(ceph_client_id);
+
+/*
+ * create a fresh client instance
+ */
+struct ceph_client *ceph_create_client(struct ceph_options *opt, void *private)
+{
+ struct ceph_client *client;
+ int err = -ENOMEM;
+
+ client = kzalloc(sizeof(*client), GFP_KERNEL);
+ if (client == NULL)
+ return ERR_PTR(-ENOMEM);
+
+ client->private = private;
+ client->options = opt;
+
+ mutex_init(&client->mount_mutex);
+ init_waitqueue_head(&client->auth_wq);
+ client->auth_err = 0;
+
+ client->extra_mon_dispatch = NULL;
+ client->supported_features = CEPH_FEATURE_SUPPORTED_DEFAULT;
+ client->required_features = CEPH_FEATURE_REQUIRED_DEFAULT;
+
+ client->msgr = NULL;
+
+ /* subsystems */
+ err = ceph_monc_init(&client->monc, client);
+ if (err < 0)
+ goto fail;
+ err = ceph_osdc_init(&client->osdc, client);
+ if (err < 0)
+ goto fail_monc;
+
+ return client;
+
+fail_monc:
+ ceph_monc_stop(&client->monc);
+fail:
+ kfree(client);
+ return ERR_PTR(err);
+}
+EXPORT_SYMBOL(ceph_create_client);
+
+void ceph_destroy_client(struct ceph_client *client)
+{
+ dout("destroy_client %p\n", client);
+
+ /* unmount */
+ ceph_osdc_stop(&client->osdc);
+
+ /*
+ * make sure mds and osd connections close out before destroying
+ * the auth module, which is needed to free those connections'
+ * ceph_authorizers.
+ */
+ ceph_msgr_flush();
+
+ ceph_monc_stop(&client->monc);
+
+ ceph_debugfs_client_cleanup(client);
+
+ if (client->msgr)
+ ceph_messenger_destroy(client->msgr);
+
+ ceph_destroy_options(client->options);
+
+ kfree(client);
+ dout("destroy_client %p done\n", client);
+}
+EXPORT_SYMBOL(ceph_destroy_client);
+
+/*
+ * true if we have the mon map (and have thus joined the cluster)
+ */
+static int have_mon_and_osd_map(struct ceph_client *client)
+{
+ return client->monc.monmap && client->monc.monmap->epoch &&
+ client->osdc.osdmap && client->osdc.osdmap->epoch;
+}
+
+/*
+ * mount: join the ceph cluster, and open root directory.
+ */
+int __ceph_open_session(struct ceph_client *client, unsigned long started)
+{
+ struct ceph_entity_addr *myaddr = NULL;
+ int err;
+ unsigned long timeout = client->options->mount_timeout * HZ;
+
+ /* initialize the messenger */
+ if (client->msgr == NULL) {
+ if (ceph_test_opt(client, MYIP))
+ myaddr = &client->options->my_addr;
+ client->msgr = ceph_messenger_create(myaddr,
+ client->supported_features,
+ client->required_features);
+ if (IS_ERR(client->msgr)) {
+ client->msgr = NULL;
+ return PTR_ERR(client->msgr);
+ }
+ client->msgr->nocrc = ceph_test_opt(client, NOCRC);
+ }
+
+ /* open session, and wait for mon and osd maps */
+ err = ceph_monc_open_session(&client->monc);
+ if (err < 0)
+ return err;
+
+ while (!have_mon_and_osd_map(client)) {
+ err = -EIO;
+ if (timeout && time_after_eq(jiffies, started + timeout))
+ return err;
+
+ /* wait */
+ dout("mount waiting for mon_map\n");
+ err = wait_event_interruptible_timeout(client->auth_wq,
+ have_mon_and_osd_map(client) || (client->auth_err < 0),
+ timeout);
+ if (err == -EINTR || err == -ERESTARTSYS)
+ return err;
+ if (client->auth_err < 0)
+ return client->auth_err;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(__ceph_open_session);
+
+
+int ceph_open_session(struct ceph_client *client)
+{
+ int ret;
+ unsigned long started = jiffies; /* note the start time */
+
+ dout("open_session start\n");
+ mutex_lock(&client->mount_mutex);
+
+ ret = __ceph_open_session(client, started);
+
+ mutex_unlock(&client->mount_mutex);
+ return ret;
+}
+EXPORT_SYMBOL(ceph_open_session);
+
+
+static int __init init_ceph_lib(void)
+{
+ int ret = 0;
+
+ ret = ceph_debugfs_init();
+ if (ret < 0)
+ goto out;
+
+ ret = ceph_msgr_init();
+ if (ret < 0)
+ goto out_debugfs;
+
+ pr_info("loaded (mon/osd proto %d/%d, osdmap %d/%d %d/%d)\n",
+ CEPH_MONC_PROTOCOL, CEPH_OSDC_PROTOCOL,
+ CEPH_OSDMAP_VERSION, CEPH_OSDMAP_VERSION_EXT,
+ CEPH_OSDMAP_INC_VERSION, CEPH_OSDMAP_INC_VERSION_EXT);
+
+ return 0;
+
+out_debugfs:
+ ceph_debugfs_cleanup();
+out:
+ return ret;
+}
+
+static void __exit exit_ceph_lib(void)
+{
+ dout("exit_ceph_lib\n");
+ ceph_msgr_exit();
+ ceph_debugfs_cleanup();
+}
+
+module_init(init_ceph_lib);
+module_exit(exit_ceph_lib);
+
+MODULE_AUTHOR("Sage Weil <sage@newdream.net>");
+MODULE_AUTHOR("Yehuda Sadeh <yehuda@hq.newdream.net>");
+MODULE_AUTHOR("Patience Warnick <patience@newdream.net>");
+MODULE_DESCRIPTION("Ceph filesystem for Linux");
+MODULE_LICENSE("GPL");
/*
* Some non-inline ceph helpers
*/
-#include "types.h"
+#include <linux/module.h>
+#include <linux/ceph/types.h>
/*
* return true if @layout appears to be valid
return mode;
}
+EXPORT_SYMBOL(ceph_flags_to_mode);
int ceph_caps_for_mode(int mode)
{
return caps;
}
+EXPORT_SYMBOL(ceph_caps_for_mode);
-#include "types.h"
+#include <linux/ceph/types.h>
/*
* Robert Jenkin's hash function.
--- /dev/null
+/*
+ * Ceph string constants
+ */
+#include <linux/module.h>
+#include <linux/ceph/types.h>
+
+const char *ceph_entity_type_name(int type)
+{
+ switch (type) {
+ case CEPH_ENTITY_TYPE_MDS: return "mds";
+ case CEPH_ENTITY_TYPE_OSD: return "osd";
+ case CEPH_ENTITY_TYPE_MON: return "mon";
+ case CEPH_ENTITY_TYPE_CLIENT: return "client";
+ case CEPH_ENTITY_TYPE_AUTH: return "auth";
+ default: return "unknown";
+ }
+}
+
+const char *ceph_osd_op_name(int op)
+{
+ switch (op) {
+ case CEPH_OSD_OP_READ: return "read";
+ case CEPH_OSD_OP_STAT: return "stat";
+
+ case CEPH_OSD_OP_MASKTRUNC: return "masktrunc";
+
+ case CEPH_OSD_OP_WRITE: return "write";
+ case CEPH_OSD_OP_DELETE: return "delete";
+ case CEPH_OSD_OP_TRUNCATE: return "truncate";
+ case CEPH_OSD_OP_ZERO: return "zero";
+ case CEPH_OSD_OP_WRITEFULL: return "writefull";
+ case CEPH_OSD_OP_ROLLBACK: return "rollback";
+
+ case CEPH_OSD_OP_APPEND: return "append";
+ case CEPH_OSD_OP_STARTSYNC: return "startsync";
+ case CEPH_OSD_OP_SETTRUNC: return "settrunc";
+ case CEPH_OSD_OP_TRIMTRUNC: return "trimtrunc";
+
+ case CEPH_OSD_OP_TMAPUP: return "tmapup";
+ case CEPH_OSD_OP_TMAPGET: return "tmapget";
+ case CEPH_OSD_OP_TMAPPUT: return "tmapput";
+
+ case CEPH_OSD_OP_GETXATTR: return "getxattr";
+ case CEPH_OSD_OP_GETXATTRS: return "getxattrs";
+ case CEPH_OSD_OP_SETXATTR: return "setxattr";
+ case CEPH_OSD_OP_SETXATTRS: return "setxattrs";
+ case CEPH_OSD_OP_RESETXATTRS: return "resetxattrs";
+ case CEPH_OSD_OP_RMXATTR: return "rmxattr";
+ case CEPH_OSD_OP_CMPXATTR: return "cmpxattr";
+
+ case CEPH_OSD_OP_PULL: return "pull";
+ case CEPH_OSD_OP_PUSH: return "push";
+ case CEPH_OSD_OP_BALANCEREADS: return "balance-reads";
+ case CEPH_OSD_OP_UNBALANCEREADS: return "unbalance-reads";
+ case CEPH_OSD_OP_SCRUB: return "scrub";
+
+ case CEPH_OSD_OP_WRLOCK: return "wrlock";
+ case CEPH_OSD_OP_WRUNLOCK: return "wrunlock";
+ case CEPH_OSD_OP_RDLOCK: return "rdlock";
+ case CEPH_OSD_OP_RDUNLOCK: return "rdunlock";
+ case CEPH_OSD_OP_UPLOCK: return "uplock";
+ case CEPH_OSD_OP_DNLOCK: return "dnlock";
+
+ case CEPH_OSD_OP_CALL: return "call";
+
+ case CEPH_OSD_OP_PGLS: return "pgls";
+ }
+ return "???";
+}
+
+
+const char *ceph_pool_op_name(int op)
+{
+ switch (op) {
+ case POOL_OP_CREATE: return "create";
+ case POOL_OP_DELETE: return "delete";
+ case POOL_OP_AUID_CHANGE: return "auid change";
+ case POOL_OP_CREATE_SNAP: return "create snap";
+ case POOL_OP_DELETE_SNAP: return "delete snap";
+ case POOL_OP_CREATE_UNMANAGED_SNAP: return "create unmanaged snap";
+ case POOL_OP_DELETE_UNMANAGED_SNAP: return "delete unmanaged snap";
+ }
+ return "???";
+}
# define BUG_ON(x) assert(!(x))
#endif
-#include "crush.h"
+#include <linux/crush/crush.h>
const char *crush_bucket_alg_name(int alg)
{
#include <linux/types.h>
-#include "hash.h"
+#include <linux/crush/hash.h>
/*
* Robert Jenkins' function for mixing 32-bit values
# define kfree(x) free(x)
#endif
-#include "crush.h"
-#include "hash.h"
+#include <linux/crush/crush.h>
+#include <linux/crush/hash.h>
/*
* Implement the core CRUSH mapping algorithm.
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>
#include <crypto/hash.h>
+#include <linux/ceph/decode.h>
#include "crypto.h"
-#include "decode.h"
int ceph_crypto_key_encode(struct ceph_crypto_key *key, void **p, void *end)
{
#ifndef _FS_CEPH_CRYPTO_H
#define _FS_CEPH_CRYPTO_H
-#include "types.h"
-#include "buffer.h"
+#include <linux/ceph/types.h>
+#include <linux/ceph/buffer.h>
/*
* cryptographic secret
--- /dev/null
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/device.h>
+#include <linux/slab.h>
+#include <linux/module.h>
+#include <linux/ctype.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/auth.h>
+#include <linux/ceph/debugfs.h>
+
+#ifdef CONFIG_DEBUG_FS
+
+/*
+ * Implement /sys/kernel/debug/ceph fun
+ *
+ * /sys/kernel/debug/ceph/client* - an instance of the ceph client
+ * .../osdmap - current osdmap
+ * .../monmap - current monmap
+ * .../osdc - active osd requests
+ * .../monc - mon client state
+ * .../dentry_lru - dump contents of dentry lru
+ * .../caps - expose cap (reservation) stats
+ * .../bdi - symlink to ../../bdi/something
+ */
+
+static struct dentry *ceph_debugfs_dir;
+
+static int monmap_show(struct seq_file *s, void *p)
+{
+ int i;
+ struct ceph_client *client = s->private;
+
+ if (client->monc.monmap == NULL)
+ return 0;
+
+ seq_printf(s, "epoch %d\n", client->monc.monmap->epoch);
+ for (i = 0; i < client->monc.monmap->num_mon; i++) {
+ struct ceph_entity_inst *inst =
+ &client->monc.monmap->mon_inst[i];
+
+ seq_printf(s, "\t%s%lld\t%s\n",
+ ENTITY_NAME(inst->name),
+ ceph_pr_addr(&inst->addr.in_addr));
+ }
+ return 0;
+}
+
+static int osdmap_show(struct seq_file *s, void *p)
+{
+ int i;
+ struct ceph_client *client = s->private;
+ struct rb_node *n;
+
+ if (client->osdc.osdmap == NULL)
+ return 0;
+ seq_printf(s, "epoch %d\n", client->osdc.osdmap->epoch);
+ seq_printf(s, "flags%s%s\n",
+ (client->osdc.osdmap->flags & CEPH_OSDMAP_NEARFULL) ?
+ " NEARFULL" : "",
+ (client->osdc.osdmap->flags & CEPH_OSDMAP_FULL) ?
+ " FULL" : "");
+ for (n = rb_first(&client->osdc.osdmap->pg_pools); n; n = rb_next(n)) {
+ struct ceph_pg_pool_info *pool =
+ rb_entry(n, struct ceph_pg_pool_info, node);
+ seq_printf(s, "pg_pool %d pg_num %d / %d, lpg_num %d / %d\n",
+ pool->id, pool->v.pg_num, pool->pg_num_mask,
+ pool->v.lpg_num, pool->lpg_num_mask);
+ }
+ for (i = 0; i < client->osdc.osdmap->max_osd; i++) {
+ struct ceph_entity_addr *addr =
+ &client->osdc.osdmap->osd_addr[i];
+ int state = client->osdc.osdmap->osd_state[i];
+ char sb[64];
+
+ seq_printf(s, "\tosd%d\t%s\t%3d%%\t(%s)\n",
+ i, ceph_pr_addr(&addr->in_addr),
+ ((client->osdc.osdmap->osd_weight[i]*100) >> 16),
+ ceph_osdmap_state_str(sb, sizeof(sb), state));
+ }
+ return 0;
+}
+
+static int monc_show(struct seq_file *s, void *p)
+{
+ struct ceph_client *client = s->private;
+ struct ceph_mon_generic_request *req;
+ struct ceph_mon_client *monc = &client->monc;
+ struct rb_node *rp;
+
+ mutex_lock(&monc->mutex);
+
+ if (monc->have_mdsmap)
+ seq_printf(s, "have mdsmap %u\n", (unsigned)monc->have_mdsmap);
+ if (monc->have_osdmap)
+ seq_printf(s, "have osdmap %u\n", (unsigned)monc->have_osdmap);
+ if (monc->want_next_osdmap)
+ seq_printf(s, "want next osdmap\n");
+
+ for (rp = rb_first(&monc->generic_request_tree); rp; rp = rb_next(rp)) {
+ __u16 op;
+ req = rb_entry(rp, struct ceph_mon_generic_request, node);
+ op = le16_to_cpu(req->request->hdr.type);
+ if (op == CEPH_MSG_STATFS)
+ seq_printf(s, "%lld statfs\n", req->tid);
+ else
+ seq_printf(s, "%lld unknown\n", req->tid);
+ }
+
+ mutex_unlock(&monc->mutex);
+ return 0;
+}
+
+static int osdc_show(struct seq_file *s, void *pp)
+{
+ struct ceph_client *client = s->private;
+ struct ceph_osd_client *osdc = &client->osdc;
+ struct rb_node *p;
+
+ mutex_lock(&osdc->request_mutex);
+ for (p = rb_first(&osdc->requests); p; p = rb_next(p)) {
+ struct ceph_osd_request *req;
+ struct ceph_osd_request_head *head;
+ struct ceph_osd_op *op;
+ int num_ops;
+ int opcode, olen;
+ int i;
+
+ req = rb_entry(p, struct ceph_osd_request, r_node);
+
+ seq_printf(s, "%lld\tosd%d\t%d.%x\t", req->r_tid,
+ req->r_osd ? req->r_osd->o_osd : -1,
+ le32_to_cpu(req->r_pgid.pool),
+ le16_to_cpu(req->r_pgid.ps));
+
+ head = req->r_request->front.iov_base;
+ op = (void *)(head + 1);
+
+ num_ops = le16_to_cpu(head->num_ops);
+ olen = le32_to_cpu(head->object_len);
+ seq_printf(s, "%.*s", olen,
+ (const char *)(head->ops + num_ops));
+
+ if (req->r_reassert_version.epoch)
+ seq_printf(s, "\t%u'%llu",
+ (unsigned)le32_to_cpu(req->r_reassert_version.epoch),
+ le64_to_cpu(req->r_reassert_version.version));
+ else
+ seq_printf(s, "\t");
+
+ for (i = 0; i < num_ops; i++) {
+ opcode = le16_to_cpu(op->op);
+ seq_printf(s, "\t%s", ceph_osd_op_name(opcode));
+ op++;
+ }
+
+ seq_printf(s, "\n");
+ }
+ mutex_unlock(&osdc->request_mutex);
+ return 0;
+}
+
+CEPH_DEFINE_SHOW_FUNC(monmap_show)
+CEPH_DEFINE_SHOW_FUNC(osdmap_show)
+CEPH_DEFINE_SHOW_FUNC(monc_show)
+CEPH_DEFINE_SHOW_FUNC(osdc_show)
+
+int ceph_debugfs_init(void)
+{
+ ceph_debugfs_dir = debugfs_create_dir("ceph", NULL);
+ if (!ceph_debugfs_dir)
+ return -ENOMEM;
+ return 0;
+}
+
+void ceph_debugfs_cleanup(void)
+{
+ debugfs_remove(ceph_debugfs_dir);
+}
+
+int ceph_debugfs_client_init(struct ceph_client *client)
+{
+ int ret = -ENOMEM;
+ char name[80];
+
+ snprintf(name, sizeof(name), "%pU.client%lld", &client->fsid,
+ client->monc.auth->global_id);
+
+ client->debugfs_dir = debugfs_create_dir(name, ceph_debugfs_dir);
+ if (!client->debugfs_dir)
+ goto out;
+
+ client->monc.debugfs_file = debugfs_create_file("monc",
+ 0600,
+ client->debugfs_dir,
+ client,
+ &monc_show_fops);
+ if (!client->monc.debugfs_file)
+ goto out;
+
+ client->osdc.debugfs_file = debugfs_create_file("osdc",
+ 0600,
+ client->debugfs_dir,
+ client,
+ &osdc_show_fops);
+ if (!client->osdc.debugfs_file)
+ goto out;
+
+ client->debugfs_monmap = debugfs_create_file("monmap",
+ 0600,
+ client->debugfs_dir,
+ client,
+ &monmap_show_fops);
+ if (!client->debugfs_monmap)
+ goto out;
+
+ client->debugfs_osdmap = debugfs_create_file("osdmap",
+ 0600,
+ client->debugfs_dir,
+ client,
+ &osdmap_show_fops);
+ if (!client->debugfs_osdmap)
+ goto out;
+
+ return 0;
+
+out:
+ ceph_debugfs_client_cleanup(client);
+ return ret;
+}
+
+void ceph_debugfs_client_cleanup(struct ceph_client *client)
+{
+ debugfs_remove(client->debugfs_osdmap);
+ debugfs_remove(client->debugfs_monmap);
+ debugfs_remove(client->osdc.debugfs_file);
+ debugfs_remove(client->monc.debugfs_file);
+ debugfs_remove(client->debugfs_dir);
+}
+
+#else /* CONFIG_DEBUG_FS */
+
+int ceph_debugfs_init(void)
+{
+ return 0;
+}
+
+void ceph_debugfs_cleanup(void)
+{
+}
+
+int ceph_debugfs_client_init(struct ceph_client *client)
+{
+ return 0;
+}
+
+void ceph_debugfs_client_cleanup(struct ceph_client *client)
+{
+}
+
+#endif /* CONFIG_DEBUG_FS */
+
+EXPORT_SYMBOL(ceph_debugfs_init);
+EXPORT_SYMBOL(ceph_debugfs_cleanup);
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/crc32c.h>
#include <linux/ctype.h>
#include <linux/slab.h>
#include <linux/socket.h>
#include <linux/string.h>
+#include <linux/bio.h>
+#include <linux/blkdev.h>
#include <net/tcp.h>
-#include "super.h"
-#include "messenger.h"
-#include "decode.h"
-#include "pagelist.h"
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/messenger.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/pagelist.h>
/*
* Ceph uses the messenger to exchange ceph_msg messages with other
static DEFINE_SPINLOCK(addr_str_lock);
static int last_addr_str;
-const char *pr_addr(const struct sockaddr_storage *ss)
+const char *ceph_pr_addr(const struct sockaddr_storage *ss)
{
int i;
char *s;
return s;
}
+EXPORT_SYMBOL(ceph_pr_addr);
static void encode_my_addr(struct ceph_messenger *msgr)
{
*/
struct workqueue_struct *ceph_msgr_wq;
-int __init ceph_msgr_init(void)
+int ceph_msgr_init(void)
{
ceph_msgr_wq = create_workqueue("ceph-msgr");
if (IS_ERR(ceph_msgr_wq)) {
}
return 0;
}
+EXPORT_SYMBOL(ceph_msgr_init);
void ceph_msgr_exit(void)
{
destroy_workqueue(ceph_msgr_wq);
}
+EXPORT_SYMBOL(ceph_msgr_exit);
void ceph_msgr_flush(void)
{
flush_workqueue(ceph_msgr_wq);
}
+EXPORT_SYMBOL(ceph_msgr_flush);
/*
set_sock_callbacks(sock, con);
- dout("connect %s\n", pr_addr(&con->peer_addr.in_addr));
+ dout("connect %s\n", ceph_pr_addr(&con->peer_addr.in_addr));
ret = sock->ops->connect(sock, (struct sockaddr *)paddr, sizeof(*paddr),
O_NONBLOCK);
if (ret == -EINPROGRESS) {
dout("connect %s EINPROGRESS sk_state = %u\n",
- pr_addr(&con->peer_addr.in_addr),
+ ceph_pr_addr(&con->peer_addr.in_addr),
sock->sk->sk_state);
ret = 0;
}
if (ret < 0) {
pr_err("connect %s error %d\n",
- pr_addr(&con->peer_addr.in_addr), ret);
+ ceph_pr_addr(&con->peer_addr.in_addr), ret);
sock_release(sock);
con->sock = NULL;
con->error_msg = "connect error";
*/
void ceph_con_close(struct ceph_connection *con)
{
- dout("con_close %p peer %s\n", con, pr_addr(&con->peer_addr.in_addr));
+ dout("con_close %p peer %s\n", con,
+ ceph_pr_addr(&con->peer_addr.in_addr));
set_bit(CLOSED, &con->state); /* in case there's queued work */
clear_bit(STANDBY, &con->state); /* avoid connect_seq bump */
clear_bit(LOSSYTX, &con->state); /* so we retry next connect */
mutex_unlock(&con->mutex);
queue_con(con);
}
+EXPORT_SYMBOL(ceph_con_close);
/*
* Reopen a closed connection, with a new peer address.
*/
void ceph_con_open(struct ceph_connection *con, struct ceph_entity_addr *addr)
{
- dout("con_open %p %s\n", con, pr_addr(&addr->in_addr));
+ dout("con_open %p %s\n", con, ceph_pr_addr(&addr->in_addr));
set_bit(OPENING, &con->state);
clear_bit(CLOSED, &con->state);
memcpy(&con->peer_addr, addr, sizeof(*addr));
con->delay = 0; /* reset backoff memory */
queue_con(con);
}
+EXPORT_SYMBOL(ceph_con_open);
/*
* return true if this connection ever successfully opened
INIT_LIST_HEAD(&con->out_sent);
INIT_DELAYED_WORK(&con->work, con_work);
}
+EXPORT_SYMBOL(ceph_con_init);
/*
if (le32_to_cpu(m->hdr.data_len) > 0) {
/* initialize page iterator */
con->out_msg_pos.page = 0;
- con->out_msg_pos.page_pos =
- le16_to_cpu(m->hdr.data_off) & ~PAGE_MASK;
+ if (m->pages)
+ con->out_msg_pos.page_pos =
+ le16_to_cpu(m->hdr.data_off) & ~PAGE_MASK;
+ else
+ con->out_msg_pos.page_pos = 0;
con->out_msg_pos.data_pos = 0;
con->out_msg_pos.did_page_crc = 0;
con->out_more = 1; /* data + footer will follow */
dout("prepare_write_connect %p cseq=%d gseq=%d proto=%d\n", con,
con->connect_seq, global_seq, proto);
- con->out_connect.features = cpu_to_le64(CEPH_FEATURE_SUPPORTED);
+ con->out_connect.features = cpu_to_le64(msgr->supported_features);
con->out_connect.host_type = cpu_to_le32(CEPH_ENTITY_TYPE_CLIENT);
con->out_connect.connect_seq = cpu_to_le32(con->connect_seq);
con->out_connect.global_seq = cpu_to_le32(global_seq);
return ret; /* done! */
}
+#ifdef CONFIG_BLOCK
+static void init_bio_iter(struct bio *bio, struct bio **iter, int *seg)
+{
+ if (!bio) {
+ *iter = NULL;
+ *seg = 0;
+ return;
+ }
+ *iter = bio;
+ *seg = bio->bi_idx;
+}
+
+static void iter_bio_next(struct bio **bio_iter, int *seg)
+{
+ if (*bio_iter == NULL)
+ return;
+
+ BUG_ON(*seg >= (*bio_iter)->bi_vcnt);
+
+ (*seg)++;
+ if (*seg == (*bio_iter)->bi_vcnt)
+ init_bio_iter((*bio_iter)->bi_next, bio_iter, seg);
+}
+#endif
+
/*
* Write as much message data payload as we can. If we finish, queue
* up the footer.
size_t len;
int crc = con->msgr->nocrc;
int ret;
+ int total_max_write;
+ int in_trail = 0;
+ size_t trail_len = (msg->trail ? msg->trail->length : 0);
dout("write_partial_msg_pages %p msg %p page %d/%d offset %d\n",
con, con->out_msg, con->out_msg_pos.page, con->out_msg->nr_pages,
con->out_msg_pos.page_pos);
- while (con->out_msg_pos.page < con->out_msg->nr_pages) {
+#ifdef CONFIG_BLOCK
+ if (msg->bio && !msg->bio_iter)
+ init_bio_iter(msg->bio, &msg->bio_iter, &msg->bio_seg);
+#endif
+
+ while (data_len > con->out_msg_pos.data_pos) {
struct page *page = NULL;
void *kaddr = NULL;
+ int max_write = PAGE_SIZE;
+ int page_shift = 0;
+
+ total_max_write = data_len - trail_len -
+ con->out_msg_pos.data_pos;
/*
* if we are calculating the data crc (the default), we need
* to map the page. if our pages[] has been revoked, use the
* zero page.
*/
- if (msg->pages) {
+
+ /* have we reached the trail part of the data? */
+ if (con->out_msg_pos.data_pos >= data_len - trail_len) {
+ in_trail = 1;
+
+ total_max_write = data_len - con->out_msg_pos.data_pos;
+
+ page = list_first_entry(&msg->trail->head,
+ struct page, lru);
+ if (crc)
+ kaddr = kmap(page);
+ max_write = PAGE_SIZE;
+ } else if (msg->pages) {
page = msg->pages[con->out_msg_pos.page];
if (crc)
kaddr = kmap(page);
struct page, lru);
if (crc)
kaddr = kmap(page);
+#ifdef CONFIG_BLOCK
+ } else if (msg->bio) {
+ struct bio_vec *bv;
+
+ bv = bio_iovec_idx(msg->bio_iter, msg->bio_seg);
+ page = bv->bv_page;
+ page_shift = bv->bv_offset;
+ if (crc)
+ kaddr = kmap(page) + page_shift;
+ max_write = bv->bv_len;
+#endif
} else {
page = con->msgr->zero_page;
if (crc)
kaddr = page_address(con->msgr->zero_page);
}
- len = min((int)(PAGE_SIZE - con->out_msg_pos.page_pos),
- (int)(data_len - con->out_msg_pos.data_pos));
+ len = min_t(int, max_write - con->out_msg_pos.page_pos,
+ total_max_write);
+
if (crc && !con->out_msg_pos.did_page_crc) {
void *base = kaddr + con->out_msg_pos.page_pos;
u32 tmpcrc = le32_to_cpu(con->out_msg->footer.data_crc);
cpu_to_le32(crc32c(tmpcrc, base, len));
con->out_msg_pos.did_page_crc = 1;
}
-
ret = kernel_sendpage(con->sock, page,
- con->out_msg_pos.page_pos, len,
+ con->out_msg_pos.page_pos + page_shift,
+ len,
MSG_DONTWAIT | MSG_NOSIGNAL |
MSG_MORE);
- if (crc && (msg->pages || msg->pagelist))
+ if (crc &&
+ (msg->pages || msg->pagelist || msg->bio || in_trail))
kunmap(page);
if (ret <= 0)
con->out_msg_pos.page_pos = 0;
con->out_msg_pos.page++;
con->out_msg_pos.did_page_crc = 0;
- if (msg->pagelist)
+ if (in_trail)
+ list_move_tail(&page->lru,
+ &msg->trail->head);
+ else if (msg->pagelist)
list_move_tail(&page->lru,
&msg->pagelist->head);
+#ifdef CONFIG_BLOCK
+ else if (msg->bio)
+ iter_bio_next(&msg->bio_iter, &msg->bio_seg);
+#endif
}
}
{
if (memcmp(con->in_banner, CEPH_BANNER, strlen(CEPH_BANNER))) {
pr_err("connect to %s got bad banner\n",
- pr_addr(&con->peer_addr.in_addr));
+ ceph_pr_addr(&con->peer_addr.in_addr));
con->error_msg = "protocol error, bad banner";
return -1;
}
addr_set_port(ss, port);
- dout("parse_ips got %s\n", pr_addr(ss));
+ dout("parse_ips got %s\n", ceph_pr_addr(ss));
if (p == end)
break;
pr_err("parse_ips bad ip '%.*s'\n", (int)(end - c), c);
return -EINVAL;
}
+EXPORT_SYMBOL(ceph_parse_ips);
static int process_banner(struct ceph_connection *con)
{
!(addr_is_blank(&con->actual_peer_addr.in_addr) &&
con->actual_peer_addr.nonce == con->peer_addr.nonce)) {
pr_warning("wrong peer, want %s/%d, got %s/%d\n",
- pr_addr(&con->peer_addr.in_addr),
+ ceph_pr_addr(&con->peer_addr.in_addr),
(int)le32_to_cpu(con->peer_addr.nonce),
- pr_addr(&con->actual_peer_addr.in_addr),
+ ceph_pr_addr(&con->actual_peer_addr.in_addr),
(int)le32_to_cpu(con->actual_peer_addr.nonce));
con->error_msg = "wrong peer at address";
return -1;
addr_set_port(&con->msgr->inst.addr.in_addr, port);
encode_my_addr(con->msgr);
dout("process_banner learned my addr is %s\n",
- pr_addr(&con->msgr->inst.addr.in_addr));
+ ceph_pr_addr(&con->msgr->inst.addr.in_addr));
}
set_bit(NEGOTIATING, &con->state);
static int process_connect(struct ceph_connection *con)
{
- u64 sup_feat = CEPH_FEATURE_SUPPORTED;
- u64 req_feat = CEPH_FEATURE_REQUIRED;
+ u64 sup_feat = con->msgr->supported_features;
+ u64 req_feat = con->msgr->required_features;
u64 server_feat = le64_to_cpu(con->in_reply.features);
dout("process_connect on %p tag %d\n", con, (int)con->in_tag);
pr_err("%s%lld %s feature set mismatch,"
" my %llx < server's %llx, missing %llx\n",
ENTITY_NAME(con->peer_name),
- pr_addr(&con->peer_addr.in_addr),
+ ceph_pr_addr(&con->peer_addr.in_addr),
sup_feat, server_feat, server_feat & ~sup_feat);
con->error_msg = "missing required protocol features";
fail_protocol(con);
pr_err("%s%lld %s protocol version mismatch,"
" my %d != server's %d\n",
ENTITY_NAME(con->peer_name),
- pr_addr(&con->peer_addr.in_addr),
+ ceph_pr_addr(&con->peer_addr.in_addr),
le32_to_cpu(con->out_connect.protocol_version),
le32_to_cpu(con->in_reply.protocol_version));
con->error_msg = "protocol version mismatch";
le32_to_cpu(con->in_connect.connect_seq));
pr_err("%s%lld %s connection reset\n",
ENTITY_NAME(con->peer_name),
- pr_addr(&con->peer_addr.in_addr));
+ ceph_pr_addr(&con->peer_addr.in_addr));
reset_connection(con);
prepare_write_connect(con->msgr, con, 0);
prepare_read_connect(con);
pr_err("%s%lld %s protocol feature mismatch,"
" my required %llx > server's %llx, need %llx\n",
ENTITY_NAME(con->peer_name),
- pr_addr(&con->peer_addr.in_addr),
+ ceph_pr_addr(&con->peer_addr.in_addr),
req_feat, server_feat, req_feat & ~server_feat);
con->error_msg = "missing required protocol features";
fail_protocol(con);
struct kvec *section,
unsigned int sec_len, u32 *crc)
{
- int left;
- int ret;
+ int ret, left;
BUG_ON(!section);
static struct ceph_msg *ceph_alloc_msg(struct ceph_connection *con,
struct ceph_msg_header *hdr,
int *skip);
+
+
+static int read_partial_message_pages(struct ceph_connection *con,
+ struct page **pages,
+ unsigned data_len, int datacrc)
+{
+ void *p;
+ int ret;
+ int left;
+
+ left = min((int)(data_len - con->in_msg_pos.data_pos),
+ (int)(PAGE_SIZE - con->in_msg_pos.page_pos));
+ /* (page) data */
+ BUG_ON(pages == NULL);
+ p = kmap(pages[con->in_msg_pos.page]);
+ ret = ceph_tcp_recvmsg(con->sock, p + con->in_msg_pos.page_pos,
+ left);
+ if (ret > 0 && datacrc)
+ con->in_data_crc =
+ crc32c(con->in_data_crc,
+ p + con->in_msg_pos.page_pos, ret);
+ kunmap(pages[con->in_msg_pos.page]);
+ if (ret <= 0)
+ return ret;
+ con->in_msg_pos.data_pos += ret;
+ con->in_msg_pos.page_pos += ret;
+ if (con->in_msg_pos.page_pos == PAGE_SIZE) {
+ con->in_msg_pos.page_pos = 0;
+ con->in_msg_pos.page++;
+ }
+
+ return ret;
+}
+
+#ifdef CONFIG_BLOCK
+static int read_partial_message_bio(struct ceph_connection *con,
+ struct bio **bio_iter, int *bio_seg,
+ unsigned data_len, int datacrc)
+{
+ struct bio_vec *bv = bio_iovec_idx(*bio_iter, *bio_seg);
+ void *p;
+ int ret, left;
+
+ if (IS_ERR(bv))
+ return PTR_ERR(bv);
+
+ left = min((int)(data_len - con->in_msg_pos.data_pos),
+ (int)(bv->bv_len - con->in_msg_pos.page_pos));
+
+ p = kmap(bv->bv_page) + bv->bv_offset;
+
+ ret = ceph_tcp_recvmsg(con->sock, p + con->in_msg_pos.page_pos,
+ left);
+ if (ret > 0 && datacrc)
+ con->in_data_crc =
+ crc32c(con->in_data_crc,
+ p + con->in_msg_pos.page_pos, ret);
+ kunmap(bv->bv_page);
+ if (ret <= 0)
+ return ret;
+ con->in_msg_pos.data_pos += ret;
+ con->in_msg_pos.page_pos += ret;
+ if (con->in_msg_pos.page_pos == bv->bv_len) {
+ con->in_msg_pos.page_pos = 0;
+ iter_bio_next(bio_iter, bio_seg);
+ }
+
+ return ret;
+}
+#endif
+
/*
* read (part of) a message.
*/
static int read_partial_message(struct ceph_connection *con)
{
struct ceph_msg *m = con->in_msg;
- void *p;
int ret;
int to, left;
unsigned front_len, middle_len, data_len, data_off;
if ((s64)seq - (s64)con->in_seq < 1) {
pr_info("skipping %s%lld %s seq %lld, expected %lld\n",
ENTITY_NAME(con->peer_name),
- pr_addr(&con->peer_addr.in_addr),
+ ceph_pr_addr(&con->peer_addr.in_addr),
seq, con->in_seq + 1);
con->in_base_pos = -front_len - middle_len - data_len -
sizeof(m->footer);
m->middle->vec.iov_len = 0;
con->in_msg_pos.page = 0;
- con->in_msg_pos.page_pos = data_off & ~PAGE_MASK;
+ if (m->pages)
+ con->in_msg_pos.page_pos = data_off & ~PAGE_MASK;
+ else
+ con->in_msg_pos.page_pos = 0;
con->in_msg_pos.data_pos = 0;
}
if (ret <= 0)
return ret;
}
+#ifdef CONFIG_BLOCK
+ if (m->bio && !m->bio_iter)
+ init_bio_iter(m->bio, &m->bio_iter, &m->bio_seg);
+#endif
/* (page) data */
while (con->in_msg_pos.data_pos < data_len) {
- left = min((int)(data_len - con->in_msg_pos.data_pos),
- (int)(PAGE_SIZE - con->in_msg_pos.page_pos));
- BUG_ON(m->pages == NULL);
- p = kmap(m->pages[con->in_msg_pos.page]);
- ret = ceph_tcp_recvmsg(con->sock, p + con->in_msg_pos.page_pos,
- left);
- if (ret > 0 && datacrc)
- con->in_data_crc =
- crc32c(con->in_data_crc,
- p + con->in_msg_pos.page_pos, ret);
- kunmap(m->pages[con->in_msg_pos.page]);
- if (ret <= 0)
- return ret;
- con->in_msg_pos.data_pos += ret;
- con->in_msg_pos.page_pos += ret;
- if (con->in_msg_pos.page_pos == PAGE_SIZE) {
- con->in_msg_pos.page_pos = 0;
- con->in_msg_pos.page++;
+ if (m->pages) {
+ ret = read_partial_message_pages(con, m->pages,
+ data_len, datacrc);
+ if (ret <= 0)
+ return ret;
+#ifdef CONFIG_BLOCK
+ } else if (m->bio) {
+
+ ret = read_partial_message_bio(con,
+ &m->bio_iter, &m->bio_seg,
+ data_len, datacrc);
+ if (ret <= 0)
+ return ret;
+#endif
+ } else {
+ BUG_ON(1);
}
}
static void ceph_fault(struct ceph_connection *con)
{
pr_err("%s%lld %s %s\n", ENTITY_NAME(con->peer_name),
- pr_addr(&con->peer_addr.in_addr), con->error_msg);
+ ceph_pr_addr(&con->peer_addr.in_addr), con->error_msg);
dout("fault %p state %lu to peer %s\n",
- con, con->state, pr_addr(&con->peer_addr.in_addr));
+ con, con->state, ceph_pr_addr(&con->peer_addr.in_addr));
if (test_bit(LOSSYTX, &con->state)) {
dout("fault on LOSSYTX channel\n");
/*
* create a new messenger instance
*/
-struct ceph_messenger *ceph_messenger_create(struct ceph_entity_addr *myaddr)
+struct ceph_messenger *ceph_messenger_create(struct ceph_entity_addr *myaddr,
+ u32 supported_features,
+ u32 required_features)
{
struct ceph_messenger *msgr;
if (msgr == NULL)
return ERR_PTR(-ENOMEM);
+ msgr->supported_features = supported_features;
+ msgr->required_features = required_features;
+
spin_lock_init(&msgr->global_seq_lock);
/* the zero page is needed if a request is "canceled" while the message
dout("messenger_create %p\n", msgr);
return msgr;
}
+EXPORT_SYMBOL(ceph_messenger_create);
void ceph_messenger_destroy(struct ceph_messenger *msgr)
{
kfree(msgr);
dout("destroyed messenger %p\n", msgr);
}
+EXPORT_SYMBOL(ceph_messenger_destroy);
/*
* Queue up an outgoing message on the given connection.
if (test_and_set_bit(WRITE_PENDING, &con->state) == 0)
queue_con(con);
}
+EXPORT_SYMBOL(ceph_con_send);
/*
* Revoke a message that was previously queued for send
test_and_set_bit(WRITE_PENDING, &con->state) == 0)
queue_con(con);
}
+EXPORT_SYMBOL(ceph_con_keepalive);
/*
m->nr_pages = 0;
m->pages = NULL;
m->pagelist = NULL;
+ m->bio = NULL;
+ m->bio_iter = NULL;
+ m->bio_seg = 0;
+ m->trail = NULL;
dout("ceph_msg_new %p front %d\n", m, front_len);
return m;
pr_err("msg_new can't create type %d front %d\n", type, front_len);
return NULL;
}
+EXPORT_SYMBOL(ceph_msg_new);
/*
* Allocate "middle" portion of a message, if it is needed and wasn't
m->pagelist = NULL;
}
+ m->trail = NULL;
+
if (m->pool)
ceph_msgpool_put(m->pool, m);
else
ceph_msg_kfree(m);
}
+EXPORT_SYMBOL(ceph_msg_last_put);
void ceph_msg_dump(struct ceph_msg *msg)
{
DUMP_PREFIX_OFFSET, 16, 1,
&msg->footer, sizeof(msg->footer), true);
}
+EXPORT_SYMBOL(ceph_msg_dump);
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
+#include <linux/module.h>
#include <linux/types.h>
#include <linux/slab.h>
#include <linux/random.h>
#include <linux/sched.h>
-#include "mon_client.h"
-#include "super.h"
-#include "auth.h"
-#include "decode.h"
+#include <linux/ceph/mon_client.h>
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/decode.h>
+
+#include <linux/ceph/auth.h>
/*
* Interact with Ceph monitor cluster. Handle requests for new map
m->num_mon);
for (i = 0; i < m->num_mon; i++)
dout("monmap_decode mon%d is %s\n", i,
- pr_addr(&m->mon_inst[i].addr.in_addr));
+ ceph_pr_addr(&m->mon_inst[i].addr.in_addr));
return m;
bad:
struct ceph_msg *msg = monc->m_subscribe;
struct ceph_mon_subscribe_item *i;
void *p, *end;
+ int num;
p = msg->front.iov_base;
end = p + msg->front_max;
- dout("__send_subscribe to 'mdsmap' %u+\n",
- (unsigned)monc->have_mdsmap);
+ num = 1 + !!monc->want_next_osdmap + !!monc->want_mdsmap;
+ ceph_encode_32(&p, num);
+
if (monc->want_next_osdmap) {
dout("__send_subscribe to 'osdmap' %u\n",
(unsigned)monc->have_osdmap);
- ceph_encode_32(&p, 3);
ceph_encode_string(&p, end, "osdmap", 6);
i = p;
i->have = cpu_to_le64(monc->have_osdmap);
i->onetime = 1;
p += sizeof(*i);
monc->want_next_osdmap = 2; /* requested */
- } else {
- ceph_encode_32(&p, 2);
}
- ceph_encode_string(&p, end, "mdsmap", 6);
- i = p;
- i->have = cpu_to_le64(monc->have_mdsmap);
- i->onetime = 0;
- p += sizeof(*i);
+ if (monc->want_mdsmap) {
+ dout("__send_subscribe to 'mdsmap' %u+\n",
+ (unsigned)monc->have_mdsmap);
+ ceph_encode_string(&p, end, "mdsmap", 6);
+ i = p;
+ i->have = cpu_to_le64(monc->have_mdsmap);
+ i->onetime = 0;
+ p += sizeof(*i);
+ }
ceph_encode_string(&p, end, "monmap", 6);
i = p;
i->have = 0;
mutex_lock(&monc->mutex);
if (monc->hunting) {
pr_info("mon%d %s session established\n",
- monc->cur_mon, pr_addr(&monc->con->peer_addr.in_addr));
+ monc->cur_mon,
+ ceph_pr_addr(&monc->con->peer_addr.in_addr));
monc->hunting = false;
}
dout("handle_subscribe_ack after %d seconds\n", seconds);
mutex_unlock(&monc->mutex);
return 0;
}
+EXPORT_SYMBOL(ceph_monc_got_mdsmap);
int ceph_monc_got_osdmap(struct ceph_mon_client *monc, u32 got)
{
mutex_unlock(&monc->mutex);
return 0;
}
+EXPORT_SYMBOL(ceph_monc_open_session);
/*
* The monitor responds with mount ack indicate mount success. The
kref_put(&req->kref, release_generic_request);
return err;
}
+EXPORT_SYMBOL(ceph_monc_do_statfs);
/*
* pool ops
pool, 0, (char *)snapid, sizeof(*snapid));
}
+EXPORT_SYMBOL(ceph_monc_create_snapid);
int ceph_monc_delete_snapid(struct ceph_mon_client *monc,
u32 pool, u64 snapid)
*/
static int build_initial_monmap(struct ceph_mon_client *monc)
{
- struct ceph_mount_args *args = monc->client->mount_args;
- struct ceph_entity_addr *mon_addr = args->mon_addr;
- int num_mon = args->num_mon;
+ struct ceph_options *opt = monc->client->options;
+ struct ceph_entity_addr *mon_addr = opt->mon_addr;
+ int num_mon = opt->num_mon;
int i;
/* build initial monmap */
}
monc->monmap->num_mon = num_mon;
monc->have_fsid = false;
-
- /* release addr memory */
- kfree(args->mon_addr);
- args->mon_addr = NULL;
- args->num_mon = 0;
return 0;
}
monc->con = NULL;
/* authentication */
- monc->auth = ceph_auth_init(cl->mount_args->name,
- cl->mount_args->secret);
+ monc->auth = ceph_auth_init(cl->options->name,
+ cl->options->secret);
if (IS_ERR(monc->auth))
return PTR_ERR(monc->auth);
monc->auth->want_keys =
out:
return err;
}
+EXPORT_SYMBOL(ceph_monc_init);
void ceph_monc_stop(struct ceph_mon_client *monc)
{
kfree(monc->monmap);
}
+EXPORT_SYMBOL(ceph_monc_stop);
static void handle_auth_reply(struct ceph_mon_client *monc,
struct ceph_msg *msg)
mutex_unlock(&monc->mutex);
return ret;
}
+EXPORT_SYMBOL(ceph_monc_validate_auth);
/*
* handle incoming message
ceph_monc_handle_map(monc, msg);
break;
- case CEPH_MSG_MDS_MAP:
- ceph_mdsc_handle_map(&monc->client->mdsc, msg);
- break;
-
case CEPH_MSG_OSD_MAP:
ceph_osdc_handle_map(&monc->client->osdc, msg);
break;
default:
+ /* can the chained handler handle it? */
+ if (monc->client->extra_mon_dispatch &&
+ monc->client->extra_mon_dispatch(monc->client, msg) == 0)
+ break;
+
pr_err("received unknown message type %d %s\n", type,
ceph_msg_type_name(type));
}
if (monc->con && !monc->hunting)
pr_info("mon%d %s session lost, "
"hunting for new mon\n", monc->cur_mon,
- pr_addr(&monc->con->peer_addr.in_addr));
+ ceph_pr_addr(&monc->con->peer_addr.in_addr));
__close_session(monc);
if (!monc->hunting) {
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
#include <linux/err.h>
#include <linux/sched.h>
#include <linux/types.h>
#include <linux/vmalloc.h>
-#include "msgpool.h"
+#include <linux/ceph/msgpool.h>
static void *alloc_fn(gfp_t gfp_mask, void *arg)
{
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
+#include <linux/module.h>
#include <linux/err.h>
#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
+#ifdef CONFIG_BLOCK
+#include <linux/bio.h>
+#endif
-#include "super.h"
-#include "osd_client.h"
-#include "messenger.h"
-#include "decode.h"
-#include "auth.h"
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/osd_client.h>
+#include <linux/ceph/messenger.h>
+#include <linux/ceph/decode.h>
+#include <linux/ceph/auth.h>
+#include <linux/ceph/pagelist.h>
#define OSD_OP_FRONT_LEN 4096
#define OSD_OPREPLY_FRONT_LEN 512
static void kick_requests(struct ceph_osd_client *osdc, struct ceph_osd *osd);
+static int op_needs_trail(int op)
+{
+ switch (op) {
+ case CEPH_OSD_OP_GETXATTR:
+ case CEPH_OSD_OP_SETXATTR:
+ case CEPH_OSD_OP_CMPXATTR:
+ case CEPH_OSD_OP_CALL:
+ return 1;
+ default:
+ return 0;
+ }
+}
+
+static int op_has_extent(int op)
+{
+ return (op == CEPH_OSD_OP_READ ||
+ op == CEPH_OSD_OP_WRITE);
+}
+
+void ceph_calc_raw_layout(struct ceph_osd_client *osdc,
+ struct ceph_file_layout *layout,
+ u64 snapid,
+ u64 off, u64 *plen, u64 *bno,
+ struct ceph_osd_request *req,
+ struct ceph_osd_req_op *op)
+{
+ struct ceph_osd_request_head *reqhead = req->r_request->front.iov_base;
+ u64 orig_len = *plen;
+ u64 objoff, objlen; /* extent in object */
+
+ reqhead->snapid = cpu_to_le64(snapid);
+
+ /* object extent? */
+ ceph_calc_file_object_mapping(layout, off, plen, bno,
+ &objoff, &objlen);
+ if (*plen < orig_len)
+ dout(" skipping last %llu, final file extent %llu~%llu\n",
+ orig_len - *plen, off, *plen);
+
+ if (op_has_extent(op->op)) {
+ op->extent.offset = objoff;
+ op->extent.length = objlen;
+ }
+ req->r_num_pages = calc_pages_for(off, *plen);
+ if (op->op == CEPH_OSD_OP_WRITE)
+ op->payload_len = *plen;
+
+ dout("calc_layout bno=%llx %llu~%llu (%d pages)\n",
+ *bno, objoff, objlen, req->r_num_pages);
+
+}
+EXPORT_SYMBOL(ceph_calc_raw_layout);
+
/*
* Implement client access to distributed object storage cluster.
*
* fill osd op in request message.
*/
static void calc_layout(struct ceph_osd_client *osdc,
- struct ceph_vino vino, struct ceph_file_layout *layout,
+ struct ceph_vino vino,
+ struct ceph_file_layout *layout,
u64 off, u64 *plen,
- struct ceph_osd_request *req)
+ struct ceph_osd_request *req,
+ struct ceph_osd_req_op *op)
{
- struct ceph_osd_request_head *reqhead = req->r_request->front.iov_base;
- struct ceph_osd_op *op = (void *)(reqhead + 1);
- u64 orig_len = *plen;
- u64 objoff, objlen; /* extent in object */
u64 bno;
- reqhead->snapid = cpu_to_le64(vino.snap);
-
- /* object extent? */
- ceph_calc_file_object_mapping(layout, off, plen, &bno,
- &objoff, &objlen);
- if (*plen < orig_len)
- dout(" skipping last %llu, final file extent %llu~%llu\n",
- orig_len - *plen, off, *plen);
+ ceph_calc_raw_layout(osdc, layout, vino.snap, off,
+ plen, &bno, req, op);
sprintf(req->r_oid, "%llx.%08llx", vino.ino, bno);
req->r_oid_len = strlen(req->r_oid);
-
- op->extent.offset = cpu_to_le64(objoff);
- op->extent.length = cpu_to_le64(objlen);
- req->r_num_pages = calc_pages_for(off, *plen);
-
- dout("calc_layout %s (%d) %llu~%llu (%d pages)\n",
- req->r_oid, req->r_oid_len, objoff, objlen, req->r_num_pages);
}
/*
if (req->r_own_pages)
ceph_release_page_vector(req->r_pages,
req->r_num_pages);
+#ifdef CONFIG_BLOCK
+ if (req->r_bio)
+ bio_put(req->r_bio);
+#endif
ceph_put_snap_context(req->r_snapc);
+ if (req->r_trail) {
+ ceph_pagelist_release(req->r_trail);
+ kfree(req->r_trail);
+ }
if (req->r_mempool)
mempool_free(req, req->r_osdc->req_mempool);
else
kfree(req);
}
+EXPORT_SYMBOL(ceph_osdc_release_request);
-/*
- * build new request AND message, calculate layout, and adjust file
- * extent as needed.
- *
- * if the file was recently truncated, we include information about its
- * old and new size so that the object can be updated appropriately. (we
- * avoid synchronously deleting truncated objects because it's slow.)
- *
- * if @do_sync, include a 'startsync' command so that the osd will flush
- * data quickly.
- */
-struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc,
- struct ceph_file_layout *layout,
- struct ceph_vino vino,
- u64 off, u64 *plen,
- int opcode, int flags,
+static int get_num_ops(struct ceph_osd_req_op *ops, int *needs_trail)
+{
+ int i = 0;
+
+ if (needs_trail)
+ *needs_trail = 0;
+ while (ops[i].op) {
+ if (needs_trail && op_needs_trail(ops[i].op))
+ *needs_trail = 1;
+ i++;
+ }
+
+ return i;
+}
+
+struct ceph_osd_request *ceph_osdc_alloc_request(struct ceph_osd_client *osdc,
+ int flags,
struct ceph_snap_context *snapc,
- int do_sync,
- u32 truncate_seq,
- u64 truncate_size,
- struct timespec *mtime,
- bool use_mempool, int num_reply)
+ struct ceph_osd_req_op *ops,
+ bool use_mempool,
+ gfp_t gfp_flags,
+ struct page **pages,
+ struct bio *bio)
{
struct ceph_osd_request *req;
struct ceph_msg *msg;
- struct ceph_osd_request_head *head;
- struct ceph_osd_op *op;
- void *p;
- int num_op = 1 + do_sync;
- size_t msg_size = sizeof(*head) + num_op*sizeof(*op);
- int i;
+ int needs_trail;
+ int num_op = get_num_ops(ops, &needs_trail);
+ size_t msg_size = sizeof(struct ceph_osd_request_head);
+
+ msg_size += num_op*sizeof(struct ceph_osd_op);
if (use_mempool) {
- req = mempool_alloc(osdc->req_mempool, GFP_NOFS);
+ req = mempool_alloc(osdc->req_mempool, gfp_flags);
memset(req, 0, sizeof(*req));
} else {
- req = kzalloc(sizeof(*req), GFP_NOFS);
+ req = kzalloc(sizeof(*req), gfp_flags);
}
if (req == NULL)
return NULL;
req->r_osdc = osdc;
req->r_mempool = use_mempool;
+
kref_init(&req->r_kref);
init_completion(&req->r_completion);
init_completion(&req->r_safe_completion);
msg = ceph_msgpool_get(&osdc->msgpool_op_reply, 0);
else
msg = ceph_msg_new(CEPH_MSG_OSD_OPREPLY,
- OSD_OPREPLY_FRONT_LEN, GFP_NOFS);
+ OSD_OPREPLY_FRONT_LEN, gfp_flags);
if (!msg) {
ceph_osdc_put_request(req);
return NULL;
}
req->r_reply = msg;
+ /* allocate space for the trailing data */
+ if (needs_trail) {
+ req->r_trail = kmalloc(sizeof(struct ceph_pagelist), gfp_flags);
+ if (!req->r_trail) {
+ ceph_osdc_put_request(req);
+ return NULL;
+ }
+ ceph_pagelist_init(req->r_trail);
+ }
/* create request message; allow space for oid */
msg_size += 40;
if (snapc)
if (use_mempool)
msg = ceph_msgpool_get(&osdc->msgpool_op, 0);
else
- msg = ceph_msg_new(CEPH_MSG_OSD_OP, msg_size, GFP_NOFS);
+ msg = ceph_msg_new(CEPH_MSG_OSD_OP, msg_size, gfp_flags);
if (!msg) {
ceph_osdc_put_request(req);
return NULL;
}
+
msg->hdr.type = cpu_to_le16(CEPH_MSG_OSD_OP);
memset(msg->front.iov_base, 0, msg->front.iov_len);
+
+ req->r_request = msg;
+ req->r_pages = pages;
+#ifdef CONFIG_BLOCK
+ if (bio) {
+ req->r_bio = bio;
+ bio_get(req->r_bio);
+ }
+#endif
+
+ return req;
+}
+EXPORT_SYMBOL(ceph_osdc_alloc_request);
+
+static void osd_req_encode_op(struct ceph_osd_request *req,
+ struct ceph_osd_op *dst,
+ struct ceph_osd_req_op *src)
+{
+ dst->op = cpu_to_le16(src->op);
+
+ switch (dst->op) {
+ case CEPH_OSD_OP_READ:
+ case CEPH_OSD_OP_WRITE:
+ dst->extent.offset =
+ cpu_to_le64(src->extent.offset);
+ dst->extent.length =
+ cpu_to_le64(src->extent.length);
+ dst->extent.truncate_size =
+ cpu_to_le64(src->extent.truncate_size);
+ dst->extent.truncate_seq =
+ cpu_to_le32(src->extent.truncate_seq);
+ break;
+
+ case CEPH_OSD_OP_GETXATTR:
+ case CEPH_OSD_OP_SETXATTR:
+ case CEPH_OSD_OP_CMPXATTR:
+ BUG_ON(!req->r_trail);
+
+ dst->xattr.name_len = cpu_to_le32(src->xattr.name_len);
+ dst->xattr.value_len = cpu_to_le32(src->xattr.value_len);
+ dst->xattr.cmp_op = src->xattr.cmp_op;
+ dst->xattr.cmp_mode = src->xattr.cmp_mode;
+ ceph_pagelist_append(req->r_trail, src->xattr.name,
+ src->xattr.name_len);
+ ceph_pagelist_append(req->r_trail, src->xattr.val,
+ src->xattr.value_len);
+ break;
+ case CEPH_OSD_OP_CALL:
+ BUG_ON(!req->r_trail);
+
+ dst->cls.class_len = src->cls.class_len;
+ dst->cls.method_len = src->cls.method_len;
+ dst->cls.indata_len = cpu_to_le32(src->cls.indata_len);
+
+ ceph_pagelist_append(req->r_trail, src->cls.class_name,
+ src->cls.class_len);
+ ceph_pagelist_append(req->r_trail, src->cls.method_name,
+ src->cls.method_len);
+ ceph_pagelist_append(req->r_trail, src->cls.indata,
+ src->cls.indata_len);
+ break;
+ case CEPH_OSD_OP_ROLLBACK:
+ dst->snap.snapid = cpu_to_le64(src->snap.snapid);
+ break;
+ case CEPH_OSD_OP_STARTSYNC:
+ break;
+ default:
+ pr_err("unrecognized osd opcode %d\n", dst->op);
+ WARN_ON(1);
+ break;
+ }
+ dst->payload_len = cpu_to_le32(src->payload_len);
+}
+
+/*
+ * build new request AND message
+ *
+ */
+void ceph_osdc_build_request(struct ceph_osd_request *req,
+ u64 off, u64 *plen,
+ struct ceph_osd_req_op *src_ops,
+ struct ceph_snap_context *snapc,
+ struct timespec *mtime,
+ const char *oid,
+ int oid_len)
+{
+ struct ceph_msg *msg = req->r_request;
+ struct ceph_osd_request_head *head;
+ struct ceph_osd_req_op *src_op;
+ struct ceph_osd_op *op;
+ void *p;
+ int num_op = get_num_ops(src_ops, NULL);
+ size_t msg_size = sizeof(*head) + num_op*sizeof(*op);
+ int flags = req->r_flags;
+ u64 data_len = 0;
+ int i;
+
head = msg->front.iov_base;
op = (void *)(head + 1);
p = (void *)(op + num_op);
- req->r_request = msg;
req->r_snapc = ceph_get_snap_context(snapc);
head->client_inc = cpu_to_le32(1); /* always, for now. */
if (flags & CEPH_OSD_FLAG_WRITE)
ceph_encode_timespec(&head->mtime, mtime);
head->num_ops = cpu_to_le16(num_op);
- op->op = cpu_to_le16(opcode);
- /* calculate max write size */
- calc_layout(osdc, vino, layout, off, plen, req);
- req->r_file_layout = *layout; /* keep a copy */
-
- if (flags & CEPH_OSD_FLAG_WRITE) {
- req->r_request->hdr.data_off = cpu_to_le16(off);
- req->r_request->hdr.data_len = cpu_to_le32(*plen);
- op->payload_len = cpu_to_le32(*plen);
- }
- op->extent.truncate_size = cpu_to_le64(truncate_size);
- op->extent.truncate_seq = cpu_to_le32(truncate_seq);
/* fill in oid */
- head->object_len = cpu_to_le32(req->r_oid_len);
- memcpy(p, req->r_oid, req->r_oid_len);
- p += req->r_oid_len;
-
- if (do_sync) {
+ head->object_len = cpu_to_le32(oid_len);
+ memcpy(p, oid, oid_len);
+ p += oid_len;
+
+ src_op = src_ops;
+ while (src_op->op) {
+ osd_req_encode_op(req, op, src_op);
+ src_op++;
op++;
- op->op = cpu_to_le16(CEPH_OSD_OP_STARTSYNC);
}
+
+ if (req->r_trail)
+ data_len += req->r_trail->length;
+
if (snapc) {
head->snap_seq = cpu_to_le64(snapc->seq);
head->num_snaps = cpu_to_le32(snapc->num_snaps);
}
}
+ if (flags & CEPH_OSD_FLAG_WRITE) {
+ req->r_request->hdr.data_off = cpu_to_le16(off);
+ req->r_request->hdr.data_len = cpu_to_le32(*plen + data_len);
+ } else if (data_len) {
+ req->r_request->hdr.data_off = 0;
+ req->r_request->hdr.data_len = cpu_to_le32(data_len);
+ }
+
BUG_ON(p > msg->front.iov_base + msg->front.iov_len);
msg_size = p - msg->front.iov_base;
msg->front.iov_len = msg_size;
msg->hdr.front_len = cpu_to_le32(msg_size);
+ return;
+}
+EXPORT_SYMBOL(ceph_osdc_build_request);
+
+/*
+ * build new request AND message, calculate layout, and adjust file
+ * extent as needed.
+ *
+ * if the file was recently truncated, we include information about its
+ * old and new size so that the object can be updated appropriately. (we
+ * avoid synchronously deleting truncated objects because it's slow.)
+ *
+ * if @do_sync, include a 'startsync' command so that the osd will flush
+ * data quickly.
+ */
+struct ceph_osd_request *ceph_osdc_new_request(struct ceph_osd_client *osdc,
+ struct ceph_file_layout *layout,
+ struct ceph_vino vino,
+ u64 off, u64 *plen,
+ int opcode, int flags,
+ struct ceph_snap_context *snapc,
+ int do_sync,
+ u32 truncate_seq,
+ u64 truncate_size,
+ struct timespec *mtime,
+ bool use_mempool, int num_reply)
+{
+ struct ceph_osd_req_op ops[3];
+ struct ceph_osd_request *req;
+
+ ops[0].op = opcode;
+ ops[0].extent.truncate_seq = truncate_seq;
+ ops[0].extent.truncate_size = truncate_size;
+ ops[0].payload_len = 0;
+
+ if (do_sync) {
+ ops[1].op = CEPH_OSD_OP_STARTSYNC;
+ ops[1].payload_len = 0;
+ ops[2].op = 0;
+ } else
+ ops[1].op = 0;
+
+ req = ceph_osdc_alloc_request(osdc, flags,
+ snapc, ops,
+ use_mempool,
+ GFP_NOFS, NULL, NULL);
+ if (IS_ERR(req))
+ return req;
+
+ /* calculate max write size */
+ calc_layout(osdc, vino, layout, off, plen, req, ops);
+ req->r_file_layout = *layout; /* keep a copy */
+
+ ceph_osdc_build_request(req, off, plen, ops,
+ snapc,
+ mtime,
+ req->r_oid, req->r_oid_len);
+
return req;
}
+EXPORT_SYMBOL(ceph_osdc_new_request);
/*
* We keep osd requests in an rbtree, sorted by ->r_tid.
dout("__move_osd_to_lru %p\n", osd);
BUG_ON(!list_empty(&osd->o_osd_lru));
list_add_tail(&osd->o_osd_lru, &osdc->osd_lru);
- osd->lru_ttl = jiffies + osdc->client->mount_args->osd_idle_ttl * HZ;
+ osd->lru_ttl = jiffies + osdc->client->options->osd_idle_ttl * HZ;
}
static void __remove_osd_from_lru(struct ceph_osd *osd)
static void __schedule_osd_timeout(struct ceph_osd_client *osdc)
{
schedule_delayed_work(&osdc->timeout_work,
- osdc->client->mount_args->osd_keepalive_timeout * HZ);
+ osdc->client->options->osd_keepalive_timeout * HZ);
}
static void __cancel_osd_timeout(struct ceph_osd_client *osdc)
*/
static void __cancel_request(struct ceph_osd_request *req)
{
- if (req->r_sent) {
+ if (req->r_sent && req->r_osd) {
ceph_con_revoke(&req->r_osd->o_con, req->r_request);
req->r_sent = 0;
}
container_of(work, struct ceph_osd_client, timeout_work.work);
struct ceph_osd_request *req, *last_req = NULL;
struct ceph_osd *osd;
- unsigned long timeout = osdc->client->mount_args->osd_timeout * HZ;
+ unsigned long timeout = osdc->client->options->osd_timeout * HZ;
unsigned long keepalive =
- osdc->client->mount_args->osd_keepalive_timeout * HZ;
+ osdc->client->options->osd_keepalive_timeout * HZ;
unsigned long last_stamp = 0;
struct rb_node *p;
struct list_head slow_osds;
container_of(work, struct ceph_osd_client,
osds_timeout_work.work);
unsigned long delay =
- osdc->client->mount_args->osd_idle_ttl * HZ >> 2;
+ osdc->client->options->osd_idle_ttl * HZ >> 2;
dout("osds timeout\n");
down_read(&osdc->map_sem);
req->r_request->pages = req->r_pages;
req->r_request->nr_pages = req->r_num_pages;
+#ifdef CONFIG_BLOCK
+ req->r_request->bio = req->r_bio;
+#endif
+ req->r_request->trail = req->r_trail;
register_request(osdc, req);
up_read(&osdc->map_sem);
return rc;
}
+EXPORT_SYMBOL(ceph_osdc_start_request);
/*
* wait for a request to complete
dout("wait_request tid %llu result %d\n", req->r_tid, req->r_result);
return req->r_result;
}
+EXPORT_SYMBOL(ceph_osdc_wait_request);
/*
* sync - wait for all in-flight requests to flush. avoid starvation.
mutex_unlock(&osdc->request_mutex);
dout("sync done (thru tid %llu)\n", last_tid);
}
+EXPORT_SYMBOL(ceph_osdc_sync);
/*
* init, shutdown
INIT_DELAYED_WORK(&osdc->osds_timeout_work, handle_osds_timeout);
schedule_delayed_work(&osdc->osds_timeout_work,
- round_jiffies_relative(osdc->client->mount_args->osd_idle_ttl * HZ));
+ round_jiffies_relative(osdc->client->options->osd_idle_ttl * HZ));
err = -ENOMEM;
osdc->req_mempool = mempool_create_kmalloc_pool(10,
out:
return err;
}
+EXPORT_SYMBOL(ceph_osdc_init);
void ceph_osdc_stop(struct ceph_osd_client *osdc)
{
ceph_msgpool_destroy(&osdc->msgpool_op);
ceph_msgpool_destroy(&osdc->msgpool_op_reply);
}
+EXPORT_SYMBOL(ceph_osdc_stop);
/*
* Read some contiguous pages. If we cross a stripe boundary, shorten
dout("readpages result %d\n", rc);
return rc;
}
+EXPORT_SYMBOL(ceph_osdc_readpages);
/*
* do a synchronous write on N pages
dout("writepages result %d\n", rc);
return rc;
}
+EXPORT_SYMBOL(ceph_osdc_writepages);
/*
* handle incoming message
}
m->pages = req->r_pages;
m->nr_pages = req->r_num_pages;
+#ifdef CONFIG_BLOCK
+ m->bio = req->r_bio;
+#endif
}
*skip = 0;
req->r_con_filling_msg = ceph_con_get(con);
-#include "ceph_debug.h"
+#include <linux/ceph/ceph_debug.h>
+#include <linux/module.h>
#include <linux/slab.h>
#include <asm/div64.h>
-#include "super.h"
-#include "osdmap.h"
-#include "crush/hash.h"
-#include "crush/mapper.h"
-#include "decode.h"
+#include <linux/ceph/libceph.h>
+#include <linux/ceph/osdmap.h>
+#include <linux/ceph/decode.h>
+#include <linux/crush/hash.h>
+#include <linux/crush/mapper.h>
char *ceph_osdmap_state_str(char *str, int len, int state)
{
return NULL;
}
+int ceph_pg_poolid_by_name(struct ceph_osdmap *map, const char *name)
+{
+ struct rb_node *rbp;
+
+ for (rbp = rb_first(&map->pg_pools); rbp; rbp = rb_next(rbp)) {
+ struct ceph_pg_pool_info *pi =
+ rb_entry(rbp, struct ceph_pg_pool_info, node);
+ if (pi->name && strcmp(pi->name, name) == 0)
+ return pi->id;
+ }
+ return -ENOENT;
+}
+EXPORT_SYMBOL(ceph_pg_poolid_by_name);
+
static void __remove_pg_pool(struct rb_root *root, struct ceph_pg_pool_info *pi)
{
rb_erase(&pi->node, root);
dout(" obj extent %llu~%llu\n", *oxoff, *oxlen);
}
+EXPORT_SYMBOL(ceph_calc_file_object_mapping);
/*
* calculate an object layout (i.e. pgid) from an oid,
ol->ol_stripe_unit = fl->fl_object_stripe_unit;
return 0;
}
+EXPORT_SYMBOL(ceph_calc_object_layout);
/*
* Calculate raw osd vector for the given pgid. Return pointer to osd
return osds[i];
return -1;
}
+EXPORT_SYMBOL(ceph_calc_pg_primary);
--- /dev/null
+
+#include <linux/module.h>
+#include <linux/gfp.h>
+#include <linux/pagemap.h>
+#include <linux/highmem.h>
+#include <linux/ceph/pagelist.h>
+
+static void ceph_pagelist_unmap_tail(struct ceph_pagelist *pl)
+{
+ if (pl->mapped_tail) {
+ struct page *page = list_entry(pl->head.prev, struct page, lru);
+ kunmap(page);
+ pl->mapped_tail = NULL;
+ }
+}
+
+int ceph_pagelist_release(struct ceph_pagelist *pl)
+{
+ ceph_pagelist_unmap_tail(pl);
+ while (!list_empty(&pl->head)) {
+ struct page *page = list_first_entry(&pl->head, struct page,
+ lru);
+ list_del(&page->lru);
+ __free_page(page);
+ }
+ ceph_pagelist_free_reserve(pl);
+ return 0;
+}
+EXPORT_SYMBOL(ceph_pagelist_release);
+
+static int ceph_pagelist_addpage(struct ceph_pagelist *pl)
+{
+ struct page *page;
+
+ if (!pl->num_pages_free) {
+ page = __page_cache_alloc(GFP_NOFS);
+ } else {
+ page = list_first_entry(&pl->free_list, struct page, lru);
+ list_del(&page->lru);
+ --pl->num_pages_free;
+ }
+ if (!page)
+ return -ENOMEM;
+ pl->room += PAGE_SIZE;
+ ceph_pagelist_unmap_tail(pl);
+ list_add_tail(&page->lru, &pl->head);
+ pl->mapped_tail = kmap(page);
+ return 0;
+}
+
+int ceph_pagelist_append(struct ceph_pagelist *pl, const void *buf, size_t len)
+{
+ while (pl->room < len) {
+ size_t bit = pl->room;
+ int ret;
+
+ memcpy(pl->mapped_tail + (pl->length & ~PAGE_CACHE_MASK),
+ buf, bit);
+ pl->length += bit;
+ pl->room -= bit;
+ buf += bit;
+ len -= bit;
+ ret = ceph_pagelist_addpage(pl);
+ if (ret)
+ return ret;
+ }
+
+ memcpy(pl->mapped_tail + (pl->length & ~PAGE_CACHE_MASK), buf, len);
+ pl->length += len;
+ pl->room -= len;
+ return 0;
+}
+EXPORT_SYMBOL(ceph_pagelist_append);
+
+/**
+ * Allocate enough pages for a pagelist to append the given amount
+ * of data without without allocating.
+ * Returns: 0 on success, -ENOMEM on error.
+ */
+int ceph_pagelist_reserve(struct ceph_pagelist *pl, size_t space)
+{
+ if (space <= pl->room)
+ return 0;
+ space -= pl->room;
+ space = (space + PAGE_SIZE - 1) >> PAGE_SHIFT; /* conv to num pages */
+
+ while (space > pl->num_pages_free) {
+ struct page *page = __page_cache_alloc(GFP_NOFS);
+ if (!page)
+ return -ENOMEM;
+ list_add_tail(&page->lru, &pl->free_list);
+ ++pl->num_pages_free;
+ }
+ return 0;
+}
+EXPORT_SYMBOL(ceph_pagelist_reserve);
+
+/**
+ * Free any pages that have been preallocated.
+ */
+int ceph_pagelist_free_reserve(struct ceph_pagelist *pl)
+{
+ while (!list_empty(&pl->free_list)) {
+ struct page *page = list_first_entry(&pl->free_list,
+ struct page, lru);
+ list_del(&page->lru);
+ __free_page(page);
+ --pl->num_pages_free;
+ }
+ BUG_ON(pl->num_pages_free);
+ return 0;
+}
+EXPORT_SYMBOL(ceph_pagelist_free_reserve);
+
+/**
+ * Create a truncation point.
+ */
+void ceph_pagelist_set_cursor(struct ceph_pagelist *pl,
+ struct ceph_pagelist_cursor *c)
+{
+ c->pl = pl;
+ c->page_lru = pl->head.prev;
+ c->room = pl->room;
+}
+EXPORT_SYMBOL(ceph_pagelist_set_cursor);
+
+/**
+ * Truncate a pagelist to the given point. Move extra pages to reserve.
+ * This won't sleep.
+ * Returns: 0 on success,
+ * -EINVAL if the pagelist doesn't match the trunc point pagelist
+ */
+int ceph_pagelist_truncate(struct ceph_pagelist *pl,
+ struct ceph_pagelist_cursor *c)
+{
+ struct page *page;
+
+ if (pl != c->pl)
+ return -EINVAL;
+ ceph_pagelist_unmap_tail(pl);
+ while (pl->head.prev != c->page_lru) {
+ page = list_entry(pl->head.prev, struct page, lru);
+ list_del(&page->lru); /* remove from pagelist */
+ list_add_tail(&page->lru, &pl->free_list); /* add to reserve */
+ ++pl->num_pages_free;
+ }
+ pl->room = c->room;
+ if (!list_empty(&pl->head)) {
+ page = list_entry(pl->head.prev, struct page, lru);
+ pl->mapped_tail = kmap(page);
+ }
+ return 0;
+}
+EXPORT_SYMBOL(ceph_pagelist_truncate);
--- /dev/null
+#include <linux/ceph/ceph_debug.h>
+
+#include <linux/module.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/file.h>
+#include <linux/namei.h>
+#include <linux/writeback.h>
+
+#include <linux/ceph/libceph.h>
+
+/*
+ * build a vector of user pages
+ */
+struct page **ceph_get_direct_page_vector(const char __user *data,
+ int num_pages,
+ loff_t off, size_t len)
+{
+ struct page **pages;
+ int rc;
+
+ pages = kmalloc(sizeof(*pages) * num_pages, GFP_NOFS);
+ if (!pages)
+ return ERR_PTR(-ENOMEM);
+
+ down_read(¤t->mm->mmap_sem);
+ rc = get_user_pages(current, current->mm, (unsigned long)data,
+ num_pages, 0, 0, pages, NULL);
+ up_read(¤t->mm->mmap_sem);
+ if (rc < 0)
+ goto fail;
+ return pages;
+
+fail:
+ kfree(pages);
+ return ERR_PTR(rc);
+}
+EXPORT_SYMBOL(ceph_get_direct_page_vector);
+
+void ceph_put_page_vector(struct page **pages, int num_pages)
+{
+ int i;
+
+ for (i = 0; i < num_pages; i++)
+ put_page(pages[i]);
+ kfree(pages);
+}
+EXPORT_SYMBOL(ceph_put_page_vector);
+
+void ceph_release_page_vector(struct page **pages, int num_pages)
+{
+ int i;
+
+ for (i = 0; i < num_pages; i++)
+ __free_pages(pages[i], 0);
+ kfree(pages);
+}
+EXPORT_SYMBOL(ceph_release_page_vector);
+
+/*
+ * allocate a vector new pages
+ */
+struct page **ceph_alloc_page_vector(int num_pages, gfp_t flags)
+{
+ struct page **pages;
+ int i;
+
+ pages = kmalloc(sizeof(*pages) * num_pages, flags);
+ if (!pages)
+ return ERR_PTR(-ENOMEM);
+ for (i = 0; i < num_pages; i++) {
+ pages[i] = __page_cache_alloc(flags);
+ if (pages[i] == NULL) {
+ ceph_release_page_vector(pages, i);
+ return ERR_PTR(-ENOMEM);
+ }
+ }
+ return pages;
+}
+EXPORT_SYMBOL(ceph_alloc_page_vector);
+
+/*
+ * copy user data into a page vector
+ */
+int ceph_copy_user_to_page_vector(struct page **pages,
+ const char __user *data,
+ loff_t off, size_t len)
+{
+ int i = 0;
+ int po = off & ~PAGE_CACHE_MASK;
+ int left = len;
+ int l, bad;
+
+ while (left > 0) {
+ l = min_t(int, PAGE_CACHE_SIZE-po, left);
+ bad = copy_from_user(page_address(pages[i]) + po, data, l);
+ if (bad == l)
+ return -EFAULT;
+ data += l - bad;
+ left -= l - bad;
+ po += l - bad;
+ if (po == PAGE_CACHE_SIZE) {
+ po = 0;
+ i++;
+ }
+ }
+ return len;
+}
+EXPORT_SYMBOL(ceph_copy_user_to_page_vector);
+
+int ceph_copy_to_page_vector(struct page **pages,
+ const char *data,
+ loff_t off, size_t len)
+{
+ int i = 0;
+ size_t po = off & ~PAGE_CACHE_MASK;
+ size_t left = len;
+ size_t l;
+
+ while (left > 0) {
+ l = min_t(size_t, PAGE_CACHE_SIZE-po, left);
+ memcpy(page_address(pages[i]) + po, data, l);
+ data += l;
+ left -= l;
+ po += l;
+ if (po == PAGE_CACHE_SIZE) {
+ po = 0;
+ i++;
+ }
+ }
+ return len;
+}
+EXPORT_SYMBOL(ceph_copy_to_page_vector);
+
+int ceph_copy_from_page_vector(struct page **pages,
+ char *data,
+ loff_t off, size_t len)
+{
+ int i = 0;
+ size_t po = off & ~PAGE_CACHE_MASK;
+ size_t left = len;
+ size_t l;
+
+ while (left > 0) {
+ l = min_t(size_t, PAGE_CACHE_SIZE-po, left);
+ memcpy(data, page_address(pages[i]) + po, l);
+ data += l;
+ left -= l;
+ po += l;
+ if (po == PAGE_CACHE_SIZE) {
+ po = 0;
+ i++;
+ }
+ }
+ return len;
+}
+EXPORT_SYMBOL(ceph_copy_from_page_vector);
+
+/*
+ * copy user data from a page vector into a user pointer
+ */
+int ceph_copy_page_vector_to_user(struct page **pages,
+ char __user *data,
+ loff_t off, size_t len)
+{
+ int i = 0;
+ int po = off & ~PAGE_CACHE_MASK;
+ int left = len;
+ int l, bad;
+
+ while (left > 0) {
+ l = min_t(int, left, PAGE_CACHE_SIZE-po);
+ bad = copy_to_user(data, page_address(pages[i]) + po, l);
+ if (bad == l)
+ return -EFAULT;
+ data += l - bad;
+ left -= l - bad;
+ if (po) {
+ po += l - bad;
+ if (po == PAGE_CACHE_SIZE)
+ po = 0;
+ }
+ i++;
+ }
+ return len;
+}
+EXPORT_SYMBOL(ceph_copy_page_vector_to_user);
+
+/*
+ * Zero an extent within a page vector. Offset is relative to the
+ * start of the first page.
+ */
+void ceph_zero_page_vector_range(int off, int len, struct page **pages)
+{
+ int i = off >> PAGE_CACHE_SHIFT;
+
+ off &= ~PAGE_CACHE_MASK;
+
+ dout("zero_page_vector_page %u~%u\n", off, len);
+
+ /* leading partial page? */
+ if (off) {
+ int end = min((int)PAGE_CACHE_SIZE, off + len);
+ dout("zeroing %d %p head from %d\n", i, pages[i],
+ (int)off);
+ zero_user_segment(pages[i], off, end);
+ len -= (end - off);
+ i++;
+ }
+ while (len >= PAGE_CACHE_SIZE) {
+ dout("zeroing %d %p len=%d\n", i, pages[i], len);
+ zero_user_segment(pages[i], 0, PAGE_CACHE_SIZE);
+ len -= PAGE_CACHE_SIZE;
+ i++;
+ }
+ /* trailing partial page? */
+ if (len) {
+ dout("zeroing %d %p tail to %d\n", i, pages[i], (int)len);
+ zero_user_segment(pages[i], 0, len);
+ }
+}
+EXPORT_SYMBOL(ceph_zero_page_vector_range);
+
unlock_sock_fast(sk, slow);
/* skb is now orphaned, can be freed outside of locked section */
+ trace_kfree_skb(skb, skb_free_datagram_locked);
__kfree_skb(skb);
}
EXPORT_SYMBOL(skb_free_datagram_locked);
#include <linux/jhash.h>
#include <linux/random.h>
#include <trace/events/napi.h>
+#include <trace/events/net.h>
+#include <trace/events/skb.h>
#include <linux/pci.h>
#include "net-sysfs.h"
}
rc = ops->ndo_start_xmit(skb, dev);
+ trace_net_dev_xmit(skb, rc);
if (rc == NETDEV_TX_OK)
txq_trans_update(txq);
return rc;
skb_dst_drop(nskb);
rc = ops->ndo_start_xmit(nskb, dev);
+ trace_net_dev_xmit(nskb, rc);
if (unlikely(rc != NETDEV_TX_OK)) {
if (rc & ~NETDEV_TX_MASK)
goto out_kfree_gso_skb;
struct sk_buff *skb)
{
int queue_index;
- struct sock *sk = skb->sk;
+ const struct net_device_ops *ops = dev->netdev_ops;
- queue_index = sk_tx_queue_get(sk);
- if (queue_index < 0) {
- const struct net_device_ops *ops = dev->netdev_ops;
+ if (ops->ndo_select_queue) {
+ queue_index = ops->ndo_select_queue(dev, skb);
+ queue_index = dev_cap_txqueue(dev, queue_index);
+ } else {
+ struct sock *sk = skb->sk;
+ queue_index = sk_tx_queue_get(sk);
+ if (queue_index < 0) {
- if (ops->ndo_select_queue) {
- queue_index = ops->ndo_select_queue(dev, skb);
- queue_index = dev_cap_txqueue(dev, queue_index);
- } else {
queue_index = 0;
if (dev->real_num_tx_queues > 1)
queue_index = skb_tx_hash(dev, skb);
#ifdef CONFIG_NET_CLS_ACT
skb->tc_verd = SET_TC_AT(skb->tc_verd, AT_EGRESS);
#endif
+ trace_net_dev_queue(skb);
if (q->enqueue) {
rc = __dev_xmit_skb(skb, q, dev, txq);
goto out;
if (netdev_tstamp_prequeue)
net_timestamp_check(skb);
+ trace_netif_rx(skb);
#ifdef CONFIG_RPS
{
struct rps_dev_flow voidflow, *rflow = &voidflow;
clist = clist->next;
WARN_ON(atomic_read(&skb->users));
+ trace_kfree_skb(skb, net_tx_action);
__kfree_skb(skb);
}
}
if (!netdev_tstamp_prequeue)
net_timestamp_check(skb);
+ trace_netif_receive_skb(skb);
if (vlan_tx_tag_present(skb) && vlan_hwaccel_do_receive(skb))
return NET_RX_SUCCESS;
dev = list_first_entry(head, struct net_device, unreg_list);
call_netdevice_notifiers(NETDEV_UNREGISTER_BATCH, dev);
- synchronize_net();
+ rcu_barrier();
list_for_each_entry(dev, head, unreg_list)
dev_put(dev);
if (info.cmd == ETHTOOL_GRXCLSRLALL) {
if (info.rule_cnt > 0) {
if (info.rule_cnt <= KMALLOC_MAX_SIZE / sizeof(u32))
- rule_buf = kmalloc(info.rule_cnt * sizeof(u32),
+ rule_buf = kzalloc(info.rule_cnt * sizeof(u32),
GFP_USER);
if (!rule_buf)
return -ENOMEM;
(KMALLOC_MAX_SIZE - sizeof(*indir)) / sizeof(*indir->ring_index))
return -ENOMEM;
full_size = sizeof(*indir) + sizeof(*indir->ring_index) * table_size;
- indir = kmalloc(full_size, GFP_USER);
+ indir = kzalloc(full_size, GFP_USER);
if (!indir)
return -ENOMEM;
gstrings.len = ret;
- data = kmalloc(gstrings.len * ETH_GSTRING_LEN, GFP_USER);
+ data = kzalloc(gstrings.len * ETH_GSTRING_LEN, GFP_USER);
if (!data)
return -ENOMEM;
if (regs.len > reglen)
regs.len = reglen;
- regbuf = kmalloc(reglen, GFP_USER);
+ regbuf = kzalloc(reglen, GFP_USER);
if (!regbuf)
return -ENOMEM;
* in any case.
*/
-int verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
+long verify_iovec(struct msghdr *m, struct iovec *iov, struct sockaddr *address, int mode)
{
- int size, err, ct;
+ int size, ct;
+ long err;
if (m->msg_namelen) {
if (mode == VERIFY_READ) {
#define CREATE_TRACE_POINTS
#include <trace/events/skb.h>
+#include <trace/events/net.h>
#include <trace/events/napi.h>
EXPORT_TRACEPOINT_SYMBOL_GPL(kfree_skb);
smp_rmb();
else if (likely(!atomic_dec_and_test(&skb->users)))
return;
+ trace_consume_skb(skb);
__kfree_skb(skb);
}
EXPORT_SYMBOL(consume_skb);
} else if (skb_gro_len(p) != pinfo->gso_size)
return -E2BIG;
- headroom = NET_SKB_PAD + NET_IP_ALIGN;
+ headroom = skb_headroom(p);
nskb = alloc_skb(headroom + skb_gro_offset(p), GFP_ATOMIC);
if (unlikely(!nskb))
return -ENOMEM;
#ifdef CONFIG_CGROUPS
void sock_update_classid(struct sock *sk)
{
- u32 classid = task_cls_classid(current);
+ u32 classid;
+ rcu_read_lock(); /* doing current task, which cannot vanish. */
+ classid = task_cls_classid(current);
+ rcu_read_unlock();
if (classid && classid != sk->sk_classid)
sk->sk_classid = classid;
}
{
int uid;
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
uid = sk->sk_socket ? SOCK_INODE(sk->sk_socket)->i_uid : 0;
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
return uid;
}
EXPORT_SYMBOL(sock_i_uid);
{
unsigned long ino;
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
ino = sk->sk_socket ? SOCK_INODE(sk->sk_socket)->i_ino : 0;
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
return ino;
}
EXPORT_SYMBOL(sock_i_ino);
set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
sk->sk_write_pending++;
- sk_wait_event(sk, ¤t_timeo, !sk->sk_err &&
- !(sk->sk_shutdown & SEND_SHUTDOWN) &&
- sk_stream_memory_free(sk) &&
- vm_wait);
+ sk_wait_event(sk, ¤t_timeo, sk->sk_err ||
+ (sk->sk_shutdown & SEND_SHUTDOWN) ||
+ (sk_stream_memory_free(sk) &&
+ !vm_wait));
sk->sk_write_pending--;
if (vm_wait) {
config NET_IPGRE
tristate "IP: GRE tunnels over IP"
+ depends on IPV6 || IPV6=n
help
Tunneling means encapsulating data of one protocol type within
another protocol and sending it over a channel that understands the
If unsure, say Y.
config INET_LRO
- bool "Large Receive Offload (ipv4/tcp)"
+ tristate "Large Receive Offload (ipv4/tcp)"
default y
---help---
Support for Large Receive Offload (ipv4/tcp).
}
if (!inet->inet_saddr)
inet->inet_saddr = rt->rt_src; /* Update source address */
- if (!inet->inet_rcv_saddr)
+ if (!inet->inet_rcv_saddr) {
inet->inet_rcv_saddr = rt->rt_src;
+ if (sk->sk_prot->rehash)
+ sk->sk_prot->rehash(sk);
+ }
inet->inet_daddr = rt->rt_dst;
inet->inet_dport = usin->sin_port;
sk->sk_state = TCP_ESTABLISHED;
struct fib_result res;
int no_addr, rpf, accept_local;
+ bool dev_match;
int ret;
struct net *net;
}
*spec_dst = FIB_RES_PREFSRC(res);
fib_combine_itag(itag, &res);
+ dev_match = false;
+
#ifdef CONFIG_IP_ROUTE_MULTIPATH
- if (FIB_RES_DEV(res) == dev || res.fi->fib_nhs > 1)
+ for (ret = 0; ret < res.fi->fib_nhs; ret++) {
+ struct fib_nh *nh = &res.fi->fib_nh[ret];
+
+ if (nh->nh_dev == dev) {
+ dev_match = true;
+ break;
+ }
+ }
#else
if (FIB_RES_DEV(res) == dev)
+ dev_match = true;
#endif
- {
+ if (dev_match) {
ret = FIB_RES_NH(res).nh_scope >= RT_SCOPE_HOST;
fib_res_put(&res);
return ret;
{
struct tnode *ret = node_parent(node);
- return rcu_dereference(ret);
+ return rcu_dereference_check(ret,
+ rcu_read_lock_held() ||
+ lockdep_rtnl_is_held());
}
/* Same as rcu_assign_pointer
static struct leaf *trie_firstleaf(struct trie *t)
{
- struct tnode *n = (struct tnode *) rcu_dereference(t->trie);
+ struct tnode *n = (struct tnode *) rcu_dereference_check(t->trie,
+ rcu_read_lock_held() ||
+ lockdep_rtnl_is_held());
if (!n)
return NULL;
igmpv3_clear_delrec(in_dev);
} else if (len < 12) {
return; /* ignore bogus packet; freed by caller */
+ } else if (IGMP_V1_SEEN(in_dev)) {
+ /* This is a v3 query with v1 queriers present */
+ max_delay = IGMP_Query_Response_Interval;
+ group = 0;
+ } else if (IGMP_V2_SEEN(in_dev)) {
+ /* this is a v3 query with v2 queriers present;
+ * Interpretation of the max_delay code is problematic here.
+ * A real v2 host would use ih_code directly, while v3 has a
+ * different encoding. We use the v3 encoding as more likely
+ * to be intended in a v3 query.
+ */
+ max_delay = IGMPV3_MRC(ih3->code)*(HZ/IGMP_TIMER_SCALE);
} else { /* v3 */
if (!pskb_may_pull(skb, sizeof(struct igmpv3_query)))
return;
#include <net/netns/generic.h>
#include <net/rtnetlink.h>
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
#include <net/ipv6.h>
#include <net/ip6_fib.h>
#include <net/ip6_route.h>
if ((dst = rt->rt_gateway) == 0)
goto tx_error_icmp;
}
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
else if (skb->protocol == htons(ETH_P_IPV6)) {
struct in6_addr *addr6;
int addr_type;
goto tx_error;
}
}
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
else if (skb->protocol == htons(ETH_P_IPV6)) {
struct rt6_info *rt6 = (struct rt6_info *)skb_dst(skb);
if ((iph->ttl = tiph->ttl) == 0) {
if (skb->protocol == htons(ETH_P_IP))
iph->ttl = old_iph->ttl;
-#ifdef CONFIG_IPV6
+#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
else if (skb->protocol == htons(ETH_P_IPV6))
iph->ttl = ((struct ipv6hdr *)old_iph)->hop_limit;
#endif
* we can switch to copy when see the first bad fragment.
*/
if (skb_has_frags(skb)) {
- struct sk_buff *frag;
+ struct sk_buff *frag, *frag2;
int first_len = skb_pagelen(skb);
- int truesizes = 0;
if (first_len - hlen > mtu ||
((first_len - hlen) & 7) ||
if (frag->len > mtu ||
((frag->len & 7) && frag->next) ||
skb_headroom(frag) < hlen)
- goto slow_path;
+ goto slow_path_clean;
/* Partially cloned skb? */
if (skb_shared(frag))
- goto slow_path;
+ goto slow_path_clean;
BUG_ON(frag->sk);
if (skb->sk) {
frag->sk = skb->sk;
frag->destructor = sock_wfree;
}
- truesizes += frag->truesize;
+ skb->truesize -= frag->truesize;
}
/* Everything is OK. Generate! */
frag = skb_shinfo(skb)->frag_list;
skb_frag_list_init(skb);
skb->data_len = first_len - skb_headlen(skb);
- skb->truesize -= truesizes;
skb->len = first_len;
iph->tot_len = htons(first_len);
iph->frag_off = htons(IP_MF);
}
IP_INC_STATS(dev_net(dev), IPSTATS_MIB_FRAGFAILS);
return err;
+
+slow_path_clean:
+ skb_walk_frags(skb, frag2) {
+ if (frag2 == frag)
+ break;
+ frag2->sk = NULL;
+ frag2->destructor = NULL;
+ skb->truesize += frag2->truesize;
+ }
}
slow_path:
case IP_HDRINCL:
val = inet->hdrincl;
break;
+ case IP_NODEFRAG:
+ val = inet->nodefrag;
+ break;
case IP_MTU_DISCOVER:
val = inet->pmtudisc;
break;
/* ip_route_me_harder expects skb->dst to be set */
skb_dst_set_noref(nskb, skb_dst(oldskb));
+ nskb->protocol = htons(ETH_P_IP);
if (ip_route_me_harder(nskb, addr_type))
goto free_nskb;
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/percpu.h>
+#include <linux/security.h>
#include <net/net_namespace.h>
#include <linux/netfilter.h>
rcu_read_unlock();
}
+#ifdef CONFIG_NF_CONNTRACK_SECMARK
+static int ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
+{
+ int ret;
+ u32 len;
+ char *secctx;
+
+ ret = security_secid_to_secctx(ct->secmark, &secctx, &len);
+ if (ret)
+ return ret;
+
+ ret = seq_printf(s, "secctx=%s ", secctx);
+
+ security_release_secctx(secctx, len);
+ return ret;
+}
+#else
+static inline int ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
+{
+ return 0;
+}
+#endif
+
static int ct_seq_show(struct seq_file *s, void *v)
{
struct nf_conntrack_tuple_hash *hash = v;
goto release;
#endif
-#ifdef CONFIG_NF_CONNTRACK_SECMARK
- if (seq_printf(s, "secmark=%u ", ct->secmark))
+ if (ct_show_secctx(s, ct))
goto release;
-#endif
if (seq_printf(s, "use=%u\n", atomic_read(&ct->ct_general.use)))
goto release;
const struct net_device *out,
int (*okfn)(struct sk_buff *))
{
+ struct sock *sk = skb->sk;
struct inet_sock *inet = inet_sk(skb->sk);
- if (inet && inet->nodefrag)
+ if (sk && (sk->sk_family == PF_INET) &&
+ inet->nodefrag)
return NF_ACCEPT;
#if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
static struct nf_conntrack_l3proto *l3proto __read_mostly;
#define MAX_IP_NAT_PROTO 256
-static const struct nf_nat_protocol *nf_nat_protos[MAX_IP_NAT_PROTO]
+static const struct nf_nat_protocol __rcu *nf_nat_protos[MAX_IP_NAT_PROTO]
__read_mostly;
static inline const struct nf_nat_protocol *
unsigned char s[4];
if (offset & 1) {
- s[0] = s[2] = 0;
+ s[0] = ~0;
s[1] = ~*optr;
+ s[2] = 0;
s[3] = *nptr;
} else {
- s[1] = s[3] = 0;
s[0] = ~*optr;
+ s[1] = ~0;
s[2] = *nptr;
+ s[3] = 0;
}
*csum = csum_fold(csum_partial(s, 4, ~csum_unfold(*csum)));
}
if (net_ratelimit())
- printk(KERN_WARNING "Neighbour table overflow.\n");
+ printk(KERN_WARNING "ipv4: Neighbour table overflow.\n");
rt_drop(rt);
return -ENOBUFS;
}
}
EXPORT_SYMBOL_GPL(__ip_route_output_key);
+static struct dst_entry *ipv4_blackhole_dst_check(struct dst_entry *dst, u32 cookie)
+{
+ return NULL;
+}
+
static void ipv4_rt_blackhole_update_pmtu(struct dst_entry *dst, u32 mtu)
{
}
.family = AF_INET,
.protocol = cpu_to_be16(ETH_P_IP),
.destroy = ipv4_dst_destroy,
- .check = ipv4_dst_check,
+ .check = ipv4_blackhole_dst_check,
.update_pmtu = ipv4_rt_blackhole_update_pmtu,
.entries = ATOMIC_INIT(0),
};
*/
mask = 0;
- if (sk->sk_err)
- mask = POLLERR;
/*
* POLLHUP is certainly not done right. But poll() doesn't
if (tp->urg_data & TCP_URG_VALID)
mask |= POLLPRI;
}
+ /* This barrier is coupled with smp_wmb() in tcp_reset() */
+ smp_rmb();
+ if (sk->sk_err)
+ mask |= POLLERR;
+
return mask;
}
EXPORT_SYMBOL(tcp_poll);
sg = sk->sk_route_caps & NETIF_F_SG;
while (--iovlen >= 0) {
- int seglen = iov->iov_len;
+ size_t seglen = iov->iov_len;
unsigned char __user *from = iov->iov_base;
iov++;
cnt += tcp_skb_pcount(skb);
if (cnt > packets) {
- if (tcp_is_sack(tp) || (oldcnt >= packets))
+ if ((tcp_is_sack(tp) && !tcp_is_fack(tp)) ||
+ (oldcnt >= packets))
break;
mss = skb_shinfo(skb)->gso_size;
default:
sk->sk_err = ECONNRESET;
}
+ /* This barrier is coupled with smp_rmb() in tcp_poll() */
+ smp_wmb();
if (!sock_flag(sk, SOCK_DEAD))
sk->sk_error_report(sk);
/* This function calculates a "timeout" which is equivalent to the timeout of a
* TCP connection after "boundary" unsuccessful, exponentially backed-off
- * retransmissions with an initial RTO of TCP_RTO_MIN.
+ * retransmissions with an initial RTO of TCP_RTO_MIN or TCP_TIMEOUT_INIT if
+ * syn_set flag is set.
*/
static bool retransmits_timed_out(struct sock *sk,
- unsigned int boundary)
+ unsigned int boundary,
+ bool syn_set)
{
unsigned int timeout, linear_backoff_thresh;
unsigned int start_ts;
+ unsigned int rto_base = syn_set ? TCP_TIMEOUT_INIT : TCP_RTO_MIN;
if (!inet_csk(sk)->icsk_retransmits)
return false;
else
start_ts = tcp_sk(sk)->retrans_stamp;
- linear_backoff_thresh = ilog2(TCP_RTO_MAX/TCP_RTO_MIN);
+ linear_backoff_thresh = ilog2(TCP_RTO_MAX/rto_base);
if (boundary <= linear_backoff_thresh)
- timeout = ((2 << boundary) - 1) * TCP_RTO_MIN;
+ timeout = ((2 << boundary) - 1) * rto_base;
else
- timeout = ((2 << linear_backoff_thresh) - 1) * TCP_RTO_MIN +
+ timeout = ((2 << linear_backoff_thresh) - 1) * rto_base +
(boundary - linear_backoff_thresh) * TCP_RTO_MAX;
return (tcp_time_stamp - start_ts) >= timeout;
{
struct inet_connection_sock *icsk = inet_csk(sk);
int retry_until;
- bool do_reset;
+ bool do_reset, syn_set = 0;
if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
if (icsk->icsk_retransmits)
dst_negative_advice(sk);
retry_until = icsk->icsk_syn_retries ? : sysctl_tcp_syn_retries;
+ syn_set = 1;
} else {
- if (retransmits_timed_out(sk, sysctl_tcp_retries1)) {
+ if (retransmits_timed_out(sk, sysctl_tcp_retries1, 0)) {
/* Black hole detection */
tcp_mtu_probing(icsk, sk);
retry_until = tcp_orphan_retries(sk, alive);
do_reset = alive ||
- !retransmits_timed_out(sk, retry_until);
+ !retransmits_timed_out(sk, retry_until, 0);
if (tcp_out_of_resources(sk, do_reset))
return 1;
}
}
- if (retransmits_timed_out(sk, retry_until)) {
+ if (retransmits_timed_out(sk, retry_until, syn_set)) {
/* Has it gone just too far? */
tcp_write_err(sk);
return 1;
icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
}
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
- if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1))
+ if (retransmits_timed_out(sk, sysctl_tcp_retries1 + 1, 0))
__sk_dst_reset(sk);
out:;
}
EXPORT_SYMBOL(udp_lib_unhash);
+/*
+ * inet_rcv_saddr was changed, we must rehash secondary hash
+ */
+void udp_lib_rehash(struct sock *sk, u16 newhash)
+{
+ if (sk_hashed(sk)) {
+ struct udp_table *udptable = sk->sk_prot->h.udp_table;
+ struct udp_hslot *hslot, *hslot2, *nhslot2;
+
+ hslot2 = udp_hashslot2(udptable, udp_sk(sk)->udp_portaddr_hash);
+ nhslot2 = udp_hashslot2(udptable, newhash);
+ udp_sk(sk)->udp_portaddr_hash = newhash;
+ if (hslot2 != nhslot2) {
+ hslot = udp_hashslot(udptable, sock_net(sk),
+ udp_sk(sk)->udp_port_hash);
+ /* we must lock primary chain too */
+ spin_lock_bh(&hslot->lock);
+
+ spin_lock(&hslot2->lock);
+ hlist_nulls_del_init_rcu(&udp_sk(sk)->udp_portaddr_node);
+ hslot2->count--;
+ spin_unlock(&hslot2->lock);
+
+ spin_lock(&nhslot2->lock);
+ hlist_nulls_add_head_rcu(&udp_sk(sk)->udp_portaddr_node,
+ &nhslot2->head);
+ nhslot2->count++;
+ spin_unlock(&nhslot2->lock);
+
+ spin_unlock_bh(&hslot->lock);
+ }
+ }
+}
+EXPORT_SYMBOL(udp_lib_rehash);
+
+static void udp_v4_rehash(struct sock *sk)
+{
+ u16 new_hash = udp4_portaddr_hash(sock_net(sk),
+ inet_sk(sk)->inet_rcv_saddr,
+ inet_sk(sk)->inet_num);
+ udp_lib_rehash(sk, new_hash);
+}
+
static int __udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
{
int rc;
.backlog_rcv = __udp_queue_rcv_skb,
.hash = udp_lib_hash,
.unhash = udp_lib_unhash,
+ .rehash = udp_v4_rehash,
.get_port = udp_v4_get_port,
.memory_allocated = &udp_memory_allocated,
.sysctl_mem = sysctl_udp_mem,
static int xfrm4_get_tos(struct flowi *fl)
{
- return fl->fl4_tos;
+ return IPTOS_RT_MASK & fl->fl4_tos; /* Strip ECN bits */
}
static int xfrm4_init_path(struct xfrm_dst *path, struct dst_entry *dst,
}
static void
-__xfrm4_init_tempsel(struct xfrm_state *x, struct flowi *fl,
- struct xfrm_tmpl *tmpl,
- xfrm_address_t *daddr, xfrm_address_t *saddr)
+__xfrm4_init_tempsel(struct xfrm_selector *sel, struct flowi *fl)
+{
+ sel->daddr.a4 = fl->fl4_dst;
+ sel->saddr.a4 = fl->fl4_src;
+ sel->dport = xfrm_flowi_dport(fl);
+ sel->dport_mask = htons(0xffff);
+ sel->sport = xfrm_flowi_sport(fl);
+ sel->sport_mask = htons(0xffff);
+ sel->family = AF_INET;
+ sel->prefixlen_d = 32;
+ sel->prefixlen_s = 32;
+ sel->proto = fl->proto;
+ sel->ifindex = fl->oif;
+}
+
+static void
+xfrm4_init_temprop(struct xfrm_state *x, struct xfrm_tmpl *tmpl,
+ xfrm_address_t *daddr, xfrm_address_t *saddr)
{
- x->sel.daddr.a4 = fl->fl4_dst;
- x->sel.saddr.a4 = fl->fl4_src;
- x->sel.dport = xfrm_flowi_dport(fl);
- x->sel.dport_mask = htons(0xffff);
- x->sel.sport = xfrm_flowi_sport(fl);
- x->sel.sport_mask = htons(0xffff);
- x->sel.family = AF_INET;
- x->sel.prefixlen_d = 32;
- x->sel.prefixlen_s = 32;
- x->sel.proto = fl->proto;
- x->sel.ifindex = fl->oif;
x->id = tmpl->id;
if (x->id.daddr.a4 == 0)
x->id.daddr.a4 = daddr->a4;
.owner = THIS_MODULE,
.init_flags = xfrm4_init_flags,
.init_tempsel = __xfrm4_init_tempsel,
+ .init_temprop = xfrm4_init_temprop,
.output = xfrm4_output,
.extract_input = xfrm4_extract_input,
.extract_output = xfrm4_extract_output,
if (err < 0) {
printk(KERN_CRIT "IPv6 Addrconf:"
" cannot initialize default policy table: %d.\n", err);
- return err;
+ goto out;
}
- register_pernet_subsys(&addrconf_ops);
+ err = register_pernet_subsys(&addrconf_ops);
+ if (err < 0)
+ goto out_addrlabel;
/* The addrconf netdev notifier requires that loopback_dev
* has it's ipv6 private information allocated and setup
unregister_netdevice_notifier(&ipv6_dev_notf);
errlo:
unregister_pernet_subsys(&addrconf_ops);
-
+out_addrlabel:
+ ipv6_addr_label_cleanup();
+out:
return err;
}
unregister_netdevice_notifier(&ipv6_dev_notf);
unregister_pernet_subsys(&addrconf_ops);
+ ipv6_addr_label_cleanup();
rtnl_lock();
return register_pernet_subsys(&ipv6_addr_label_ops);
}
+void ipv6_addr_label_cleanup(void)
+{
+ unregister_pernet_subsys(&ipv6_addr_label_ops);
+}
+
static const struct nla_policy ifal_policy[IFAL_MAX+1] = {
[IFAL_ADDRESS] = { .len = sizeof(struct in6_addr), },
[IFAL_LABEL] = { .len = sizeof(u32), },
if (ipv6_addr_any(&np->saddr))
ipv6_addr_set_v4mapped(inet->inet_saddr, &np->saddr);
- if (ipv6_addr_any(&np->rcv_saddr))
+ if (ipv6_addr_any(&np->rcv_saddr)) {
ipv6_addr_set_v4mapped(inet->inet_rcv_saddr,
&np->rcv_saddr);
+ if (sk->sk_prot->rehash)
+ sk->sk_prot->rehash(sk);
+ }
goto out;
}
if (ipv6_addr_any(&np->rcv_saddr)) {
ipv6_addr_copy(&np->rcv_saddr, &fl.fl6_src);
inet->inet_rcv_saddr = LOOPBACK4_IPV6;
+ if (sk->sk_prot->rehash)
+ sk->sk_prot->rehash(sk);
}
ip6_dst_store(sk, dst,
if (skb_has_frags(skb)) {
int first_len = skb_pagelen(skb);
- int truesizes = 0;
+ struct sk_buff *frag2;
if (first_len - hlen > mtu ||
((first_len - hlen) & 7) ||
if (frag->len > mtu ||
((frag->len & 7) && frag->next) ||
skb_headroom(frag) < hlen)
- goto slow_path;
+ goto slow_path_clean;
/* Partially cloned skb? */
if (skb_shared(frag))
- goto slow_path;
+ goto slow_path_clean;
BUG_ON(frag->sk);
if (skb->sk) {
frag->sk = skb->sk;
frag->destructor = sock_wfree;
- truesizes += frag->truesize;
}
+ skb->truesize -= frag->truesize;
}
err = 0;
first_len = skb_pagelen(skb);
skb->data_len = first_len - skb_headlen(skb);
- skb->truesize -= truesizes;
skb->len = first_len;
ipv6_hdr(skb)->payload_len = htons(first_len -
sizeof(struct ipv6hdr));
IPSTATS_MIB_FRAGFAILS);
dst_release(&rt->dst);
return err;
+
+slow_path_clean:
+ skb_walk_frags(skb, frag2) {
+ if (frag2 == frag)
+ break;
+ frag2->sk = NULL;
+ frag2->destructor = NULL;
+ skb->truesize += frag2->truesize;
+ }
}
slow_path:
kfree_skb(NFCT_FRAG6_CB(skb)->orig);
}
-/* Memory Tracking Functions. */
-static void frag_kfree_skb(struct sk_buff *skb)
-{
- atomic_sub(skb->truesize, &nf_init_frags.mem);
- nf_skb_free(skb);
- kfree_skb(skb);
-}
-
/* Destruction primitives. */
static __inline__ void fq_put(struct nf_ct_frag6_queue *fq)
}
found:
- /* We found where to put this one. Check for overlap with
- * preceding fragment, and, if needed, align things so that
- * any overlaps are eliminated.
- */
- if (prev) {
- int i = (NFCT_FRAG6_CB(prev)->offset + prev->len) - offset;
-
- if (i > 0) {
- offset += i;
- if (end <= offset) {
- pr_debug("overlap\n");
- goto err;
- }
- if (!pskb_pull(skb, i)) {
- pr_debug("Can't pull\n");
- goto err;
- }
- if (skb->ip_summed != CHECKSUM_UNNECESSARY)
- skb->ip_summed = CHECKSUM_NONE;
- }
- }
-
- /* Look for overlap with succeeding segments.
- * If we can merge fragments, do it.
+ /* RFC5722, Section 4:
+ * When reassembling an IPv6 datagram, if
+ * one or more its constituent fragments is determined to be an
+ * overlapping fragment, the entire datagram (and any constituent
+ * fragments, including those not yet received) MUST be silently
+ * discarded.
*/
- while (next && NFCT_FRAG6_CB(next)->offset < end) {
- /* overlap is 'i' bytes */
- int i = end - NFCT_FRAG6_CB(next)->offset;
-
- if (i < next->len) {
- /* Eat head of the next overlapped fragment
- * and leave the loop. The next ones cannot overlap.
- */
- pr_debug("Eat head of the overlapped parts.: %d", i);
- if (!pskb_pull(next, i))
- goto err;
- /* next fragment */
- NFCT_FRAG6_CB(next)->offset += i;
- fq->q.meat -= i;
- if (next->ip_summed != CHECKSUM_UNNECESSARY)
- next->ip_summed = CHECKSUM_NONE;
- break;
- } else {
- struct sk_buff *free_it = next;
-
- /* Old fragmnet is completely overridden with
- * new one drop it.
- */
- next = next->next;
+ /* Check for overlap with preceding fragment. */
+ if (prev &&
+ (NFCT_FRAG6_CB(prev)->offset + prev->len) - offset > 0)
+ goto discard_fq;
- if (prev)
- prev->next = next;
- else
- fq->q.fragments = next;
-
- fq->q.meat -= free_it->len;
- frag_kfree_skb(free_it);
- }
- }
+ /* Look for overlap with succeeding segment. */
+ if (next && NFCT_FRAG6_CB(next)->offset < end)
+ goto discard_fq;
NFCT_FRAG6_CB(skb)->offset = offset;
write_unlock(&nf_frags.lock);
return 0;
+discard_fq:
+ fq_kill(fq);
err:
return -1;
}
}
EXPORT_SYMBOL(ip6_frag_match);
-/* Memory Tracking Functions. */
-static void frag_kfree_skb(struct netns_frags *nf, struct sk_buff *skb)
-{
- atomic_sub(skb->truesize, &nf->mem);
- kfree_skb(skb);
-}
-
void ip6_frag_init(struct inet_frag_queue *q, void *a)
{
struct frag_queue *fq = container_of(q, struct frag_queue, q);
}
found:
- /* We found where to put this one. Check for overlap with
- * preceding fragment, and, if needed, align things so that
- * any overlaps are eliminated.
+ /* RFC5722, Section 4:
+ * When reassembling an IPv6 datagram, if
+ * one or more its constituent fragments is determined to be an
+ * overlapping fragment, the entire datagram (and any constituent
+ * fragments, including those not yet received) MUST be silently
+ * discarded.
*/
- if (prev) {
- int i = (FRAG6_CB(prev)->offset + prev->len) - offset;
- if (i > 0) {
- offset += i;
- if (end <= offset)
- goto err;
- if (!pskb_pull(skb, i))
- goto err;
- if (skb->ip_summed != CHECKSUM_UNNECESSARY)
- skb->ip_summed = CHECKSUM_NONE;
- }
- }
+ /* Check for overlap with preceding fragment. */
+ if (prev &&
+ (FRAG6_CB(prev)->offset + prev->len) - offset > 0)
+ goto discard_fq;
- /* Look for overlap with succeeding segments.
- * If we can merge fragments, do it.
- */
- while (next && FRAG6_CB(next)->offset < end) {
- int i = end - FRAG6_CB(next)->offset; /* overlap is 'i' bytes */
-
- if (i < next->len) {
- /* Eat head of the next overlapped fragment
- * and leave the loop. The next ones cannot overlap.
- */
- if (!pskb_pull(next, i))
- goto err;
- FRAG6_CB(next)->offset += i; /* next fragment */
- fq->q.meat -= i;
- if (next->ip_summed != CHECKSUM_UNNECESSARY)
- next->ip_summed = CHECKSUM_NONE;
- break;
- } else {
- struct sk_buff *free_it = next;
-
- /* Old fragment is completely overridden with
- * new one drop it.
- */
- next = next->next;
-
- if (prev)
- prev->next = next;
- else
- fq->q.fragments = next;
-
- fq->q.meat -= free_it->len;
- frag_kfree_skb(fq->q.net, free_it);
- }
- }
+ /* Look for overlap with succeeding segment. */
+ if (next && FRAG6_CB(next)->offset < end)
+ goto discard_fq;
FRAG6_CB(skb)->offset = offset;
write_unlock(&ip6_frags.lock);
return -1;
+discard_fq:
+ fq_kill(fq);
err:
IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
IPSTATS_MIB_REASMFAILS);
if (net_ratelimit())
printk(KERN_WARNING
- "Neighbour table overflow.\n");
+ "ipv6: Neighbour table overflow.\n");
dst_free(&rt->dst);
return NULL;
}
* i.e. Path MTU discovery
*/
-void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
- struct net_device *dev, u32 pmtu)
+static void rt6_do_pmtu_disc(struct in6_addr *daddr, struct in6_addr *saddr,
+ struct net *net, u32 pmtu, int ifindex)
{
struct rt6_info *rt, *nrt;
- struct net *net = dev_net(dev);
int allfrag = 0;
- rt = rt6_lookup(net, daddr, saddr, dev->ifindex, 0);
+ rt = rt6_lookup(net, daddr, saddr, ifindex, 0);
if (rt == NULL)
return;
dst_release(&rt->dst);
}
+void rt6_pmtu_discovery(struct in6_addr *daddr, struct in6_addr *saddr,
+ struct net_device *dev, u32 pmtu)
+{
+ struct net *net = dev_net(dev);
+
+ /*
+ * RFC 1981 states that a node "MUST reduce the size of the packets it
+ * is sending along the path" that caused the Packet Too Big message.
+ * Since it's not possible in the general case to determine which
+ * interface was used to send the original packet, we update the MTU
+ * on the interface that will be used to send future packets. We also
+ * update the MTU on the interface that received the Packet Too Big in
+ * case the original packet was forced out that interface with
+ * SO_BINDTODEVICE or similar. This is the next best thing to the
+ * correct behaviour, which would be to update the MTU on all
+ * interfaces.
+ */
+ rt6_do_pmtu_disc(daddr, saddr, net, pmtu, 0);
+ rt6_do_pmtu_disc(daddr, saddr, net, pmtu, dev->ifindex);
+}
+
/*
* Misc support functions
*/
return udp_lib_get_port(sk, snum, ipv6_rcv_saddr_equal, hash2_nulladdr);
}
+static void udp_v6_rehash(struct sock *sk)
+{
+ u16 new_hash = udp6_portaddr_hash(sock_net(sk),
+ &inet6_sk(sk)->rcv_saddr,
+ inet_sk(sk)->inet_num);
+
+ udp_lib_rehash(sk, new_hash);
+}
+
static inline int compute_score(struct sock *sk, struct net *net,
unsigned short hnum,
struct in6_addr *saddr, __be16 sport,
.backlog_rcv = udpv6_queue_rcv_skb,
.hash = udp_lib_hash,
.unhash = udp_lib_unhash,
+ .rehash = udp_v6_rehash,
.get_port = udp_v6_get_port,
.memory_allocated = &udp_memory_allocated,
.sysctl_mem = sysctl_udp_mem,
#include <net/addrconf.h>
static void
-__xfrm6_init_tempsel(struct xfrm_state *x, struct flowi *fl,
- struct xfrm_tmpl *tmpl,
- xfrm_address_t *daddr, xfrm_address_t *saddr)
+__xfrm6_init_tempsel(struct xfrm_selector *sel, struct flowi *fl)
{
/* Initialize temporary selector matching only
* to current session. */
- ipv6_addr_copy((struct in6_addr *)&x->sel.daddr, &fl->fl6_dst);
- ipv6_addr_copy((struct in6_addr *)&x->sel.saddr, &fl->fl6_src);
- x->sel.dport = xfrm_flowi_dport(fl);
- x->sel.dport_mask = htons(0xffff);
- x->sel.sport = xfrm_flowi_sport(fl);
- x->sel.sport_mask = htons(0xffff);
- x->sel.family = AF_INET6;
- x->sel.prefixlen_d = 128;
- x->sel.prefixlen_s = 128;
- x->sel.proto = fl->proto;
- x->sel.ifindex = fl->oif;
+ ipv6_addr_copy((struct in6_addr *)&sel->daddr, &fl->fl6_dst);
+ ipv6_addr_copy((struct in6_addr *)&sel->saddr, &fl->fl6_src);
+ sel->dport = xfrm_flowi_dport(fl);
+ sel->dport_mask = htons(0xffff);
+ sel->sport = xfrm_flowi_sport(fl);
+ sel->sport_mask = htons(0xffff);
+ sel->family = AF_INET6;
+ sel->prefixlen_d = 128;
+ sel->prefixlen_s = 128;
+ sel->proto = fl->proto;
+ sel->ifindex = fl->oif;
+}
+
+static void
+xfrm6_init_temprop(struct xfrm_state *x, struct xfrm_tmpl *tmpl,
+ xfrm_address_t *daddr, xfrm_address_t *saddr)
+{
x->id = tmpl->id;
if (ipv6_addr_any((struct in6_addr*)&x->id.daddr))
memcpy(&x->id.daddr, daddr, sizeof(x->sel.daddr));
.eth_proto = htons(ETH_P_IPV6),
.owner = THIS_MODULE,
.init_tempsel = __xfrm6_init_tempsel,
+ .init_temprop = xfrm6_init_temprop,
.tmpl_sort = __xfrm6_tmpl_sort,
.state_sort = __xfrm6_state_sort,
.output = xfrm6_output,
memcpy(&val_len, buf+n, 2); /* To avoid alignment problems */
le16_to_cpus(&val_len); n+=2;
- if (val_len > 1016) {
+ if (val_len >= 1016) {
IRDA_DEBUG(2, "%s(), parameter length to long\n", __func__ );
return -RSP_INVALID_COMMAND_FORMAT;
}
{
struct sock *sk = sock->sk;
struct llc_sock *llc = llc_sk(sk);
- int rc = -EINVAL, opt;
+ unsigned int opt;
+ int rc = -EINVAL;
lock_sock(sk);
if (unlikely(level != SOL_LLC || optlen != sizeof(int)))
int __init llc_station_init(void)
{
- u16 rc = -ENOBUFS;
+ int rc = -ENOBUFS;
struct sk_buff *skb;
struct llc_station_state_ev *ev;
set_bit(HT_AGG_STATE_STOPPING, &tid_tx->state);
+ del_timer_sync(&tid_tx->addba_resp_timer);
+
/*
* After this packets are no longer handed right through
* to the driver but are put onto tid_tx->pending instead,
struct net_device *prev_dev = NULL;
struct ieee80211_rx_status *status = IEEE80211_SKB_RXCB(skb);
- if (status->flag & RX_FLAG_INTERNAL_CMTR)
- goto out_free_skb;
-
if (skb_headroom(skb) < sizeof(*rthdr) &&
pskb_expand_head(skb, sizeof(*rthdr), 0, GFP_ATOMIC))
goto out_free_skb;
} else
goto out_free_skb;
- status->flag |= RX_FLAG_INTERNAL_CMTR;
return;
out_free_skb:
skb2 = skb_clone(skb, GFP_ATOMIC);
if (skb2) {
skb2->dev = prev_dev;
- netif_receive_skb(skb2);
+ netif_rx(skb2);
}
}
}
if (prev_dev) {
skb->dev = prev_dev;
- netif_receive_skb(skb);
+ netif_rx(skb);
skb = NULL;
}
rcu_read_unlock();
static DEFINE_MUTEX(afinfo_mutex);
-const struct nf_afinfo *nf_afinfo[NFPROTO_NUMPROTO] __read_mostly;
+const struct nf_afinfo __rcu *nf_afinfo[NFPROTO_NUMPROTO] __read_mostly;
EXPORT_SYMBOL(nf_afinfo);
int nf_register_afinfo(const struct nf_afinfo *afinfo)
ip_vs_out_stats(cp, skb);
ip_vs_set_state(cp, IP_VS_DIR_OUTPUT, skb, pp);
+ ip_vs_update_conntrack(skb, cp, 0);
ip_vs_conn_put(cp);
skb->ipvs_property = 1;
union nf_inet_addr to;
__be16 port;
struct ip_vs_conn *n_cp;
- struct nf_conn *ct;
#ifdef CONFIG_IP_VS_IPV6
/* This application helper doesn't work with IPv6 yet,
ip_vs_control_add(n_cp, cp);
}
- ct = (struct nf_conn *)skb->nfct;
- if (ct && ct != &nf_conntrack_untracked)
- ip_vs_expect_related(skb, ct, n_cp,
- IPPROTO_TCP, &n_cp->dport, 1);
-
/*
* Move tunnel to listen state
*/
}
#endif
-static void
-ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp)
+void
+ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp, int outin)
{
struct nf_conn *ct = (struct nf_conn *)skb->nfct;
struct nf_conntrack_tuple new_tuple;
* real-server we will see RIP->DIP.
*/
new_tuple = ct->tuplehash[IP_CT_DIR_REPLY].tuple;
- new_tuple.src.u3 = cp->daddr;
+ if (outin)
+ new_tuple.src.u3 = cp->daddr;
+ else
+ new_tuple.dst.u3 = cp->vaddr;
/*
* This will also take care of UDP and other protocols.
*/
- new_tuple.src.u.tcp.port = cp->dport;
+ if (outin)
+ new_tuple.src.u.tcp.port = cp->dport;
+ else
+ new_tuple.dst.u.tcp.port = cp->vport;
nf_conntrack_alter_reply(ct, &new_tuple);
}
IP_VS_DBG_PKT(10, pp, skb, 0, "After DNAT");
- ip_vs_update_conntrack(skb, cp);
+ ip_vs_update_conntrack(skb, cp, 1);
/* FIXME: when application helper enlarges the packet and the length
is larger than the MTU of outgoing device, there will be still
IP_VS_DBG_PKT(10, pp, skb, 0, "After DNAT");
- ip_vs_update_conntrack(skb, cp);
+ ip_vs_update_conntrack(skb, cp, 1);
/* FIXME: when application helper enlarges the packet and the length
is larger than the MTU of outgoing device, there will be still
static DEFINE_MUTEX(nf_ct_ecache_mutex);
-struct nf_ct_event_notifier *nf_conntrack_event_cb __read_mostly;
+struct nf_ct_event_notifier __rcu *nf_conntrack_event_cb __read_mostly;
EXPORT_SYMBOL_GPL(nf_conntrack_event_cb);
-struct nf_exp_event_notifier *nf_expect_event_cb __read_mostly;
+struct nf_exp_event_notifier __rcu *nf_expect_event_cb __read_mostly;
EXPORT_SYMBOL_GPL(nf_expect_event_cb);
/* deliver cached events and clear cache entry - must be called with locally
#include <linux/skbuff.h>
#include <net/netfilter/nf_conntrack_extend.h>
-static struct nf_ct_ext_type *nf_ct_ext_types[NF_CT_EXT_NUM];
+static struct nf_ct_ext_type __rcu *nf_ct_ext_types[NF_CT_EXT_NUM];
static DEFINE_MUTEX(nf_ct_ext_type_mutex);
void __nf_ct_ext_destroy(struct nf_conn *ct)
{
unsigned int off, len;
struct nf_ct_ext_type *t;
+ size_t alloc_size;
rcu_read_lock();
t = rcu_dereference(nf_ct_ext_types[id]);
BUG_ON(t == NULL);
off = ALIGN(sizeof(struct nf_ct_ext), t->align);
len = off + t->len;
+ alloc_size = t->alloc_size;
rcu_read_unlock();
- *ext = kzalloc(t->alloc_size, gfp);
+ *ext = kzalloc(alloc_size, gfp);
if (!*ext)
return NULL;
#include <linux/rculist_nulls.h>
#include <linux/types.h>
#include <linux/timer.h>
+#include <linux/security.h>
#include <linux/skbuff.h>
#include <linux/errno.h>
#include <linux/netlink.h>
#ifdef CONFIG_NF_CONNTRACK_SECMARK
static inline int
-ctnetlink_dump_secmark(struct sk_buff *skb, const struct nf_conn *ct)
+ctnetlink_dump_secctx(struct sk_buff *skb, const struct nf_conn *ct)
{
- NLA_PUT_BE32(skb, CTA_SECMARK, htonl(ct->secmark));
- return 0;
+ struct nlattr *nest_secctx;
+ int len, ret;
+ char *secctx;
+
+ ret = security_secid_to_secctx(ct->secmark, &secctx, &len);
+ if (ret)
+ return ret;
+
+ ret = -1;
+ nest_secctx = nla_nest_start(skb, CTA_SECCTX | NLA_F_NESTED);
+ if (!nest_secctx)
+ goto nla_put_failure;
+
+ NLA_PUT_STRING(skb, CTA_SECCTX_NAME, secctx);
+ nla_nest_end(skb, nest_secctx);
+ ret = 0;
nla_put_failure:
- return -1;
+ security_release_secctx(secctx, len);
+ return ret;
}
#else
-#define ctnetlink_dump_secmark(a, b) (0)
+#define ctnetlink_dump_secctx(a, b) (0)
#endif
#define master_tuple(ct) &(ct->master->tuplehash[IP_CT_DIR_ORIGINAL].tuple)
ctnetlink_dump_protoinfo(skb, ct) < 0 ||
ctnetlink_dump_helpinfo(skb, ct) < 0 ||
ctnetlink_dump_mark(skb, ct) < 0 ||
- ctnetlink_dump_secmark(skb, ct) < 0 ||
+ ctnetlink_dump_secctx(skb, ct) < 0 ||
ctnetlink_dump_id(skb, ct) < 0 ||
ctnetlink_dump_use(skb, ct) < 0 ||
ctnetlink_dump_master(skb, ct) < 0 ||
;
}
+#ifdef CONFIG_NF_CONNTRACK_SECMARK
+static int ctnetlink_nlmsg_secctx_size(const struct nf_conn *ct)
+{
+ int len;
+
+ security_secid_to_secctx(ct->secmark, NULL, &len);
+
+ return sizeof(char) * len;
+}
+#endif
+
static inline size_t
ctnetlink_nlmsg_size(const struct nf_conn *ct)
{
+ nla_total_size(0) /* CTA_HELP */
+ nla_total_size(NF_CT_HELPER_NAME_LEN) /* CTA_HELP_NAME */
#ifdef CONFIG_NF_CONNTRACK_SECMARK
- + nla_total_size(sizeof(u_int32_t)) /* CTA_SECMARK */
+ + nla_total_size(0) /* CTA_SECCTX */
+ + nla_total_size(ctnetlink_nlmsg_secctx_size(ct)) /* CTA_SECCTX_NAME */
#endif
#ifdef CONFIG_NF_NAT_NEEDED
+ 2 * nla_total_size(0) /* CTA_NAT_SEQ_ADJ_ORIG|REPL */
#ifdef CONFIG_NF_CONNTRACK_SECMARK
if ((events & (1 << IPCT_SECMARK) || ct->secmark)
- && ctnetlink_dump_secmark(skb, ct) < 0)
+ && ctnetlink_dump_secctx(skb, ct) < 0)
goto nla_put_failure;
#endif
#include <net/netfilter/nf_conntrack_l4proto.h>
#include <net/netfilter/nf_conntrack_core.h>
-static struct nf_conntrack_l4proto **nf_ct_protos[PF_MAX] __read_mostly;
-struct nf_conntrack_l3proto *nf_ct_l3protos[AF_MAX] __read_mostly;
+static struct nf_conntrack_l4proto __rcu **nf_ct_protos[PF_MAX] __read_mostly;
+struct nf_conntrack_l3proto __rcu *nf_ct_l3protos[AF_MAX] __read_mostly;
EXPORT_SYMBOL_GPL(nf_ct_l3protos);
static DEFINE_MUTEX(nf_ct_proto_mutex);
unsigned int msglen, origlen;
const char *dptr, *end;
s16 diff, tdiff = 0;
- int ret;
+ int ret = NF_ACCEPT;
typeof(nf_nat_sip_seq_adjust_hook) nf_nat_sip_seq_adjust;
if (ctinfo != IP_CT_ESTABLISHED &&
#include <linux/seq_file.h>
#include <linux/percpu.h>
#include <linux/netdevice.h>
+#include <linux/security.h>
#include <net/net_namespace.h>
#ifdef CONFIG_SYSCTL
#include <linux/sysctl.h>
rcu_read_unlock();
}
+#ifdef CONFIG_NF_CONNTRACK_SECMARK
+static int ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
+{
+ int ret;
+ u32 len;
+ char *secctx;
+
+ ret = security_secid_to_secctx(ct->secmark, &secctx, &len);
+ if (ret)
+ return ret;
+
+ ret = seq_printf(s, "secctx=%s ", secctx);
+
+ security_release_secctx(secctx, len);
+ return ret;
+}
+#else
+static inline int ct_show_secctx(struct seq_file *s, const struct nf_conn *ct)
+{
+ return 0;
+}
+#endif
+
/* return 0 on success, 1 in case of error */
static int ct_seq_show(struct seq_file *s, void *v)
{
goto release;
#endif
-#ifdef CONFIG_NF_CONNTRACK_SECMARK
- if (seq_printf(s, "secmark=%u ", ct->secmark))
+ if (ct_show_secctx(s, ct))
goto release;
-#endif
#ifdef CONFIG_NF_CONNTRACK_ZONES
if (seq_printf(s, "zone=%u ", nf_ct_zone(ct)))
#define NF_LOG_PREFIXLEN 128
#define NFLOGGER_NAME_LEN 64
-static const struct nf_logger *nf_loggers[NFPROTO_NUMPROTO] __read_mostly;
+static const struct nf_logger __rcu *nf_loggers[NFPROTO_NUMPROTO] __read_mostly;
static struct list_head nf_loggers_l[NFPROTO_NUMPROTO] __read_mostly;
static DEFINE_MUTEX(nf_log_mutex);
* long term mutex. The handler must provide an an outfn() to accept packets
* for queueing and must reinject all packets it receives, no matter what.
*/
-static const struct nf_queue_handler *queue_handler[NFPROTO_NUMPROTO] __read_mostly;
+static const struct nf_queue_handler __rcu *queue_handler[NFPROTO_NUMPROTO] __read_mostly;
static DEFINE_MUTEX(queue_handler_mutex);
int
nf_tproxy_assign_sock(struct sk_buff *skb, struct sock *sk)
{
- if (inet_sk(sk)->transparent) {
+ bool transparent = (sk->sk_state == TCP_TIME_WAIT) ?
+ inet_twsk(sk)->tw_transparent :
+ inet_sk(sk)->transparent;
+
+ if (transparent) {
skb_orphan(skb);
skb->sk = sk;
skb->destructor = nf_tproxy_destructor;
#include <linux/module.h>
#include <linux/gfp.h>
#include <linux/skbuff.h>
-#include <linux/selinux.h>
#include <linux/netfilter_ipv4/ip_tables.h>
#include <linux/netfilter_ipv6/ip6_tables.h>
#include <linux/netfilter/x_tables.h>
*/
#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
#include <linux/module.h>
+#include <linux/security.h>
#include <linux/skbuff.h>
-#include <linux/selinux.h>
#include <linux/netfilter/x_tables.h>
#include <linux/netfilter/xt_SECMARK.h>
switch (mode) {
case SECMARK_MODE_SEL:
- secmark = info->u.sel.selsid;
+ secmark = info->secid;
break;
-
default:
BUG();
}
return XT_CONTINUE;
}
-static int checkentry_selinux(struct xt_secmark_target_info *info)
+static int checkentry_lsm(struct xt_secmark_target_info *info)
{
int err;
- struct xt_secmark_target_selinux_info *sel = &info->u.sel;
- sel->selctx[SECMARK_SELCTX_MAX - 1] = '\0';
+ info->secctx[SECMARK_SECCTX_MAX - 1] = '\0';
+ info->secid = 0;
- err = selinux_string_to_sid(sel->selctx, &sel->selsid);
+ err = security_secctx_to_secid(info->secctx, strlen(info->secctx),
+ &info->secid);
if (err) {
if (err == -EINVAL)
- pr_info("invalid SELinux context \'%s\'\n",
- sel->selctx);
+ pr_info("invalid security context \'%s\'\n", info->secctx);
return err;
}
- if (!sel->selsid) {
- pr_info("unable to map SELinux context \'%s\'\n", sel->selctx);
+ if (!info->secid) {
+ pr_info("unable to map security context \'%s\'\n", info->secctx);
return -ENOENT;
}
- err = selinux_secmark_relabel_packet_permission(sel->selsid);
+ err = security_secmark_relabel_packet(info->secid);
if (err) {
pr_info("unable to obtain relabeling permission\n");
return err;
}
- selinux_secmark_refcount_inc();
+ security_secmark_refcount_inc();
return 0;
}
switch (info->mode) {
case SECMARK_MODE_SEL:
- err = checkentry_selinux(info);
- if (err <= 0)
- return err;
break;
-
default:
pr_info("invalid mode: %hu\n", info->mode);
return -EINVAL;
}
+ err = checkentry_lsm(info);
+ if (err)
+ return err;
+
if (!mode)
mode = info->mode;
return 0;
{
switch (mode) {
case SECMARK_MODE_SEL:
- selinux_secmark_refcount_dec();
+ security_secmark_refcount_dec();
}
}
static int pipe_rcv_status(struct sock *sk, struct sk_buff *skb)
{
struct pep_sock *pn = pep_sk(sk);
- struct pnpipehdr *hdr = pnp_hdr(skb);
+ struct pnpipehdr *hdr;
int wake = 0;
if (!pskb_may_pull(skb, sizeof(*hdr) + 4))
return -EINVAL;
+ hdr = pnp_hdr(skb);
if (hdr->data[0] != PN_PEP_TYPE_COMMON) {
LIMIT_NETDEBUG(KERN_DEBUG"Phonet unknown PEP type: %u\n",
(unsigned)hdr->data[0]);
unsigned long ret;
void *addr;
- if (to_user)
+ addr = kmap(page);
+ if (to_user) {
rds_stats_add(s_copy_to_user, bytes);
- else
+ ret = copy_to_user(ptr, addr + offset, bytes);
+ } else {
rds_stats_add(s_copy_from_user, bytes);
-
- addr = kmap_atomic(page, KM_USER0);
- if (to_user)
- ret = __copy_to_user_inatomic(ptr, addr + offset, bytes);
- else
- ret = __copy_from_user_inatomic(addr + offset, ptr, bytes);
- kunmap_atomic(addr, KM_USER0);
-
- if (ret) {
- addr = kmap(page);
- if (to_user)
- ret = copy_to_user(ptr, addr + offset, bytes);
- else
- ret = copy_from_user(addr + offset, ptr, bytes);
- kunmap(page);
- if (ret)
- return -EFAULT;
+ ret = copy_from_user(addr + offset, ptr, bytes);
}
+ kunmap(page);
- return 0;
+ return ret ? -EFAULT : 0;
}
EXPORT_SYMBOL_GPL(rds_page_copy_user);
struct rds_connection *conn;
struct rds_tcp_connection *tc;
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
conn = sk->sk_user_data;
if (conn == NULL) {
state_change = sk->sk_state_change;
break;
}
out:
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
state_change(sk);
}
rdsdebug("listen data ready sk %p\n", sk);
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
ready = sk->sk_user_data;
if (ready == NULL) { /* check for teardown race */
ready = sk->sk_data_ready;
queue_work(rds_wq, &rds_tcp_listen_work);
out:
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
ready(sk, bytes);
}
rdsdebug("data ready sk %p bytes %d\n", sk, bytes);
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
conn = sk->sk_user_data;
if (conn == NULL) { /* check for teardown race */
ready = sk->sk_data_ready;
if (rds_tcp_read_sock(conn, GFP_ATOMIC, KM_SOFTIRQ0) == -ENOMEM)
queue_delayed_work(rds_wq, &conn->c_recv_w, 0);
out:
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
ready(sk, bytes);
}
struct rds_connection *conn;
struct rds_tcp_connection *tc;
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
conn = sk->sk_user_data;
if (conn == NULL) {
write_space = sk->sk_write_space;
queue_delayed_work(rds_wq, &conn->c_send_w, 0);
out:
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
/*
* write_space is only called when data leaves tcp's send queue if
if (addr_len == sizeof(struct sockaddr_rose) && addr->srose_ndigis > 1)
return -EINVAL;
- if (addr->srose_ndigis > ROSE_MAX_DIGIS)
+ if ((unsigned int) addr->srose_ndigis > ROSE_MAX_DIGIS)
return -EINVAL;
if ((dev = rose_dev_get(&addr->srose_addr)) == NULL) {
if (addr_len == sizeof(struct sockaddr_rose) && addr->srose_ndigis > 1)
return -EINVAL;
- if (addr->srose_ndigis > ROSE_MAX_DIGIS)
+ if ((unsigned int) addr->srose_ndigis > ROSE_MAX_DIGIS)
return -EINVAL;
/* Source + Destination digis should not exceed ROSE_MAX_DIGIS */
* calls by looking at the number of nested bh disable calls because
* softirqs always disables bh.
*/
- if (softirq_count() != SOFTIRQ_OFFSET) {
+ if (in_serving_softirq()) {
/* If there is an sk_classid we'll use that. */
if (!skb->sk)
return -1;
int toff = off + key->off + (off2 & key->offmask);
__be32 *data, _data;
- if (skb_headroom(skb) + toff < 0)
+ if (skb_headroom(skb) + toff > INT_MAX)
goto out;
data = skb_header_pointer(skb, toff, 4, &_data);
error = -EINVAL;
goto err_out;
}
- if (!list_empty(&flow->list)) {
- error = -EEXIST;
- goto err_out;
- }
} else {
int i;
unsigned long cl;
id = ntohs(hmacs->hmac_ids[i]);
/* Check the id is in the supported range */
- if (id > SCTP_AUTH_HMAC_ID_MAX)
+ if (id > SCTP_AUTH_HMAC_ID_MAX) {
+ id = 0;
continue;
+ }
/* See is we support the id. Supported IDs have name and
* length fields set, so that we can allocated and use
* them. We can safely just check for name, for without the
* name, we can't allocate the TFM.
*/
- if (!sctp_hmac_list[id].hmac_name)
+ if (!sctp_hmac_list[id].hmac_name) {
+ id = 0;
continue;
+ }
break;
}
SCTP_DEBUG_PRINTK("%s: packet:%p vtag:0x%x\n", __func__,
packet, vtag);
- sctp_packet_reset(packet);
packet->vtag = vtag;
if (ecn_capable && sctp_packet_empty(packet)) {
return 0;
}
+static bool list_has_sctp_addr(const struct list_head *list,
+ union sctp_addr *ipaddr)
+{
+ struct sctp_transport *addr;
+
+ list_for_each_entry(addr, list, transports) {
+ if (sctp_cmp_addr_exact(ipaddr, &addr->ipaddr))
+ return true;
+ }
+
+ return false;
+}
/* A restart is occurring, check to make sure no new addresses
* are being added as we may be under a takeover attack.
*/
struct sctp_chunk *init,
sctp_cmd_seq_t *commands)
{
- struct sctp_transport *new_addr, *addr;
- int found;
+ struct sctp_transport *new_addr;
+ int ret = 1;
- /* Implementor's Guide - Sectin 5.2.2
+ /* Implementor's Guide - Section 5.2.2
* ...
* Before responding the endpoint MUST check to see if the
* unexpected INIT adds new addresses to the association. If new
/* Search through all current addresses and make sure
* we aren't adding any new ones.
*/
- new_addr = NULL;
- found = 0;
-
list_for_each_entry(new_addr, &new_asoc->peer.transport_addr_list,
- transports) {
- found = 0;
- list_for_each_entry(addr, &asoc->peer.transport_addr_list,
- transports) {
- if (sctp_cmp_addr_exact(&new_addr->ipaddr,
- &addr->ipaddr)) {
- found = 1;
- break;
- }
- }
- if (!found)
+ transports) {
+ if (!list_has_sctp_addr(&asoc->peer.transport_addr_list,
+ &new_addr->ipaddr)) {
+ sctp_sf_send_restart_abort(&new_addr->ipaddr, init,
+ commands);
+ ret = 0;
break;
- }
-
- /* If a new address was added, ABORT the sender. */
- if (!found && new_addr) {
- sctp_sf_send_restart_abort(&new_addr->ipaddr, init, commands);
+ }
}
/* Return success if all addresses were found. */
- return found;
+ return ret;
}
/* Populate the verification/tie tags based on overlapping INIT
/* Walk through the addrs buffer and count the number of addresses. */
addr_buf = kaddrs;
while (walk_size < addrs_size) {
+ if (walk_size + sizeof(sa_family_t) > addrs_size) {
+ kfree(kaddrs);
+ return -EINVAL;
+ }
+
sa_addr = (struct sockaddr *)addr_buf;
af = sctp_get_af_specific(sa_addr->sa_family);
/* Walk through the addrs buffer and count the number of addresses. */
addr_buf = kaddrs;
while (walk_size < addrs_size) {
+ if (walk_size + sizeof(sa_family_t) > addrs_size) {
+ err = -EINVAL;
+ goto out_free;
+ }
+
sa_addr = (union sctp_addr *)addr_buf;
af = sctp_get_af_specific(sa_addr->sa.sa_family);
- port = ntohs(sa_addr->v4.sin_port);
/* If the address family is not supported or if this address
* causes the address buffer to overflow return EINVAL.
goto out_free;
}
+ port = ntohs(sa_addr->v4.sin_port);
+
/* Save current address so we can work with it */
memcpy(&to, sa_addr, af->sockaddr_len);
static LIST_HEAD(cred_unused);
static unsigned long number_cred_unused;
-#define MAX_HASHTABLE_BITS (10)
+#define MAX_HASHTABLE_BITS (14)
static int param_set_hashtbl_sz(const char *val, const struct kernel_param *kp)
{
unsigned long num;
struct rpc_inode *rpci = RPC_I(inode);
struct gss_upcall_msg *gss_msg;
+restart:
spin_lock(&inode->i_lock);
- while (!list_empty(&rpci->in_downcall)) {
+ list_for_each_entry(gss_msg, &rpci->in_downcall, list) {
- gss_msg = list_entry(rpci->in_downcall.next,
- struct gss_upcall_msg, list);
+ if (!list_empty(&gss_msg->msg.list))
+ continue;
gss_msg->msg.errno = -EPIPE;
atomic_inc(&gss_msg->count);
__gss_unhash_msg(gss_msg);
spin_unlock(&inode->i_lock);
gss_release_msg(gss_msg);
- spin_lock(&inode->i_lock);
+ goto restart;
}
spin_unlock(&inode->i_lock);
if (!supported_gss_krb5_enctype(alg)) {
printk(KERN_WARNING "gss_kerberos_mech: unsupported "
"encryption key algorithm %d\n", alg);
+ p = ERR_PTR(-EINVAL);
goto out_err;
}
p = simple_get_netobj(p, end, &key);
ctx->enctype = ENCTYPE_DES_CBC_RAW;
ctx->gk5e = get_gss_krb5_enctype(ctx->enctype);
- if (ctx->gk5e == NULL)
+ if (ctx->gk5e == NULL) {
+ p = ERR_PTR(-EINVAL);
goto out_err;
+ }
/* The downcall format was designed before we completely understood
* the uses of the context fields; so it includes some stuff we
* just give some minimal sanity-checking, and some we ignore
* completely (like the next twenty bytes): */
- if (unlikely(p + 20 > end || p + 20 < p))
+ if (unlikely(p + 20 > end || p + 20 < p)) {
+ p = ERR_PTR(-EFAULT);
goto out_err;
+ }
p += 20;
p = simple_get_bytes(p, end, &tmp, sizeof(tmp));
if (IS_ERR(p))
if (ctx->seq_send64 != ctx->seq_send) {
dprintk("%s: seq_send64 %lx, seq_send %x overflow?\n", __func__,
(long unsigned)ctx->seq_send64, ctx->seq_send);
+ p = ERR_PTR(-EINVAL);
goto out_err;
}
p = simple_get_bytes(p, end, &ctx->enctype, sizeof(ctx->enctype));
if (version != 1) {
dprintk("RPC: unknown spkm3 token format: "
"obsolete nfs-utils?\n");
+ p = ERR_PTR(-EINVAL);
goto out_err_free_ctx;
}
if (IS_ERR(p))
goto out_err_free_intg_alg;
- if (p != end)
+ if (p != end) {
+ p = ERR_PTR(-EFAULT);
goto out_err_free_intg_key;
+ }
ctx_id->internal_ctx_id = ctx;
goto out_no_principal;
}
- kref_init(&clnt->cl_kref);
+ atomic_set(&clnt->cl_count, 1);
err = rpc_setup_pipedir(clnt, program->pipe_dir_name);
if (err < 0)
if (new->cl_principal == NULL)
goto out_no_principal;
}
- kref_init(&new->cl_kref);
+ atomic_set(&new->cl_count, 1);
err = rpc_setup_pipedir(new, clnt->cl_program->pipe_dir_name);
if (err != 0)
goto out_no_path;
if (new->cl_auth)
atomic_inc(&new->cl_auth->au_count);
xprt_get(clnt->cl_xprt);
- kref_get(&clnt->cl_kref);
+ atomic_inc(&clnt->cl_count);
rpc_register_client(new);
rpciod_up();
return new;
* Free an RPC client
*/
static void
-rpc_free_client(struct kref *kref)
+rpc_free_client(struct rpc_clnt *clnt)
{
- struct rpc_clnt *clnt = container_of(kref, struct rpc_clnt, cl_kref);
-
dprintk("RPC: destroying %s client for %s\n",
clnt->cl_protname, clnt->cl_server);
if (!IS_ERR(clnt->cl_path.dentry)) {
* Free an RPC client
*/
static void
-rpc_free_auth(struct kref *kref)
+rpc_free_auth(struct rpc_clnt *clnt)
{
- struct rpc_clnt *clnt = container_of(kref, struct rpc_clnt, cl_kref);
-
if (clnt->cl_auth == NULL) {
- rpc_free_client(kref);
+ rpc_free_client(clnt);
return;
}
* release remaining GSS contexts. This mechanism ensures
* that it can do so safely.
*/
- kref_init(kref);
+ atomic_inc(&clnt->cl_count);
rpcauth_release(clnt->cl_auth);
clnt->cl_auth = NULL;
- kref_put(kref, rpc_free_client);
+ if (atomic_dec_and_test(&clnt->cl_count))
+ rpc_free_client(clnt);
}
/*
if (list_empty(&clnt->cl_tasks))
wake_up(&destroy_wait);
- kref_put(&clnt->cl_kref, rpc_free_auth);
+ if (atomic_dec_and_test(&clnt->cl_count))
+ rpc_free_auth(clnt);
}
/**
if (clnt != NULL) {
rpc_task_release_client(task);
task->tk_client = clnt;
- kref_get(&clnt->cl_kref);
+ atomic_inc(&clnt->cl_count);
if (clnt->cl_softrtry)
task->tk_flags |= RPC_TASK_SOFT;
/* Add to the client's list of all tasks */
task->tk_status = 0;
if (status >= 0) {
if (task->tk_rqstp) {
- task->tk_action = call_allocate;
+ task->tk_action = call_refresh;
return;
}
}
/*
- * 2. Allocate the buffer. For details, see sched.c:rpc_malloc.
+ * 2. Bind and/or refresh the credentials
+ */
+static void
+call_refresh(struct rpc_task *task)
+{
+ dprint_status(task);
+
+ task->tk_action = call_refreshresult;
+ task->tk_status = 0;
+ task->tk_client->cl_stats->rpcauthrefresh++;
+ rpcauth_refreshcred(task);
+}
+
+/*
+ * 2a. Process the results of a credential refresh
+ */
+static void
+call_refreshresult(struct rpc_task *task)
+{
+ int status = task->tk_status;
+
+ dprint_status(task);
+
+ task->tk_status = 0;
+ task->tk_action = call_allocate;
+ if (status >= 0 && rpcauth_uptodatecred(task))
+ return;
+ switch (status) {
+ case -EACCES:
+ rpc_exit(task, -EACCES);
+ return;
+ case -ENOMEM:
+ rpc_exit(task, -ENOMEM);
+ return;
+ case -ETIMEDOUT:
+ rpc_delay(task, 3*HZ);
+ }
+ task->tk_action = call_refresh;
+}
+
+/*
+ * 2b. Allocate the buffer. For details, see sched.c:rpc_malloc.
* (Note: buffer memory is freed in xprt_release).
*/
static void
call_allocate(struct rpc_task *task)
{
- unsigned int slack = task->tk_client->cl_auth->au_cslack;
+ unsigned int slack = task->tk_rqstp->rq_cred->cr_auth->au_cslack;
struct rpc_rqst *req = task->tk_rqstp;
struct rpc_xprt *xprt = task->tk_xprt;
struct rpc_procinfo *proc = task->tk_msg.rpc_proc;
dprint_status(task);
task->tk_status = 0;
- task->tk_action = call_refresh;
+ task->tk_action = call_bind;
if (req->rq_buffer)
return;
rpc_exit(task, -ERESTARTSYS);
}
-/*
- * 2a. Bind and/or refresh the credentials
- */
-static void
-call_refresh(struct rpc_task *task)
-{
- dprint_status(task);
-
- task->tk_action = call_refreshresult;
- task->tk_status = 0;
- task->tk_client->cl_stats->rpcauthrefresh++;
- rpcauth_refreshcred(task);
-}
-
-/*
- * 2b. Process the results of a credential refresh
- */
-static void
-call_refreshresult(struct rpc_task *task)
-{
- int status = task->tk_status;
-
- dprint_status(task);
-
- task->tk_status = 0;
- task->tk_action = call_bind;
- if (status >= 0 && rpcauth_uptodatecred(task))
- return;
- switch (status) {
- case -EACCES:
- rpc_exit(task, -EACCES);
- return;
- case -ENOMEM:
- rpc_exit(task, -ENOMEM);
- return;
- case -ETIMEDOUT:
- rpc_delay(task, 3*HZ);
- }
- task->tk_action = call_refresh;
-}
-
static inline int
rpc_task_need_encode(struct rpc_task *task)
{
return;
do {
msg = list_entry(head->next, struct rpc_pipe_msg, list);
- list_del(&msg->list);
+ list_del_init(&msg->list);
msg->errno = err;
destroy_msg(msg);
} while (!list_empty(head));
if (msg != NULL) {
spin_lock(&inode->i_lock);
msg->errno = -EAGAIN;
- list_del(&msg->list);
+ list_del_init(&msg->list);
spin_unlock(&inode->i_lock);
rpci->ops->destroy_msg(msg);
}
if (res < 0 || msg->len == msg->copied) {
filp->private_data = NULL;
spin_lock(&inode->i_lock);
- list_del(&msg->list);
+ list_del_init(&msg->list);
spin_unlock(&inode->i_lock);
rpci->ops->destroy_msg(msg);
}
static int
rpc_info_open(struct inode *inode, struct file *file)
{
- struct rpc_clnt *clnt;
+ struct rpc_clnt *clnt = NULL;
int ret = single_open(file, rpc_show_info, NULL);
if (!ret) {
struct seq_file *m = file->private_data;
- mutex_lock(&inode->i_mutex);
- clnt = RPC_I(inode)->private;
- if (clnt) {
- kref_get(&clnt->cl_kref);
+
+ spin_lock(&file->f_path.dentry->d_lock);
+ if (!d_unhashed(file->f_path.dentry))
+ clnt = RPC_I(inode)->private;
+ if (clnt != NULL && atomic_inc_not_zero(&clnt->cl_count)) {
+ spin_unlock(&file->f_path.dentry->d_lock);
m->private = clnt;
} else {
+ spin_unlock(&file->f_path.dentry->d_lock);
single_release(inode, file);
ret = -EINVAL;
}
- mutex_unlock(&inode->i_mutex);
}
return ret;
}
u32 _xid;
__be32 *xp;
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
dprintk("RPC: xs_udp_data_ready...\n");
if (!(xprt = xprt_from_sock(sk)))
goto out;
dropit:
skb_free_datagram(sk, skb);
out:
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
}
static inline void xs_tcp_read_fraghdr(struct rpc_xprt *xprt, struct xdr_skb_reader *desc)
dprintk("RPC: xs_tcp_data_ready...\n");
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
if (!(xprt = xprt_from_sock(sk)))
goto out;
if (xprt->shutdown)
read = tcp_read_sock(sk, &rd_desc, xs_tcp_data_recv);
} while (read > 0);
out:
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
}
/*
{
struct rpc_xprt *xprt;
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
if (!(xprt = xprt_from_sock(sk)))
goto out;
dprintk("RPC: xs_tcp_state_change client %p...\n", xprt);
switch (sk->sk_state) {
case TCP_ESTABLISHED:
- spin_lock_bh(&xprt->transport_lock);
+ spin_lock(&xprt->transport_lock);
if (!xprt_test_and_set_connected(xprt)) {
struct sock_xprt *transport = container_of(xprt,
struct sock_xprt, xprt);
xprt_wake_pending_tasks(xprt, -EAGAIN);
}
- spin_unlock_bh(&xprt->transport_lock);
+ spin_unlock(&xprt->transport_lock);
break;
case TCP_FIN_WAIT1:
/* The client initiated a shutdown of the socket */
xs_sock_mark_closed(xprt);
}
out:
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
}
/**
{
struct rpc_xprt *xprt;
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
if (!(xprt = xprt_from_sock(sk)))
goto out;
dprintk("RPC: %s client %p...\n"
__func__, xprt, sk->sk_err);
xprt_wake_pending_tasks(xprt, -EAGAIN);
out:
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
}
static void xs_write_space(struct sock *sk)
*/
static void xs_udp_write_space(struct sock *sk)
{
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
/* from net/core/sock.c:sock_def_write_space */
if (sock_writeable(sk))
xs_write_space(sk);
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
}
/**
*/
static void xs_tcp_write_space(struct sock *sk)
{
- read_lock(&sk->sk_callback_lock);
+ read_lock_bh(&sk->sk_callback_lock);
/* from net/core/stream.c:sk_stream_write_space */
if (sk_stream_wspace(sk) >= sk_stream_min_wspace(sk))
xs_write_space(sk);
- read_unlock(&sk->sk_callback_lock);
+ read_unlock_bh(&sk->sk_callback_lock);
}
static void xs_udp_do_set_buffer_size(struct rpc_xprt *xprt)
static u32 ordernum = 1;
struct unix_address *addr;
int err;
+ unsigned int retries = 0;
mutex_lock(&u->readlock);
if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
addr->hash)) {
spin_unlock(&unix_table_lock);
- /* Sanity yield. It is unusual case, but yet... */
- if (!(ordernum&0xFF))
- yield();
+ /*
+ * __unix_find_socket_byname() may take long time if many names
+ * are already in use.
+ */
+ cond_resched();
+ /* Give up if all names seems to be in use. */
+ if (retries++ == 0xFFFFF) {
+ err = -ENOSPC;
+ kfree(addr);
+ goto out;
+ }
goto retry;
}
addr->hash ^= sk->sk_type;
} else if (!iwp->pointer)
return -EFAULT;
- extra = kmalloc(extra_size, GFP_KERNEL);
+ extra = kzalloc(extra_size, GFP_KERNEL);
if (!extra)
return -ENOMEM;
err = -EHOSTUNREACH;
goto error_nolock;
}
- skb_dst_set_noref(skb, dst);
+ skb_dst_set(skb, dst_clone(dst));
x = dst->xfrm;
} while (x && !(x->outer_mode->flags & XFRM_MODE_FLAG_TUNNEL));
tmpl->mode == XFRM_MODE_BEET) {
remote = &tmpl->id.daddr;
local = &tmpl->saddr;
- family = tmpl->encap_family;
- if (xfrm_addr_any(local, family)) {
- error = xfrm_get_saddr(net, &tmp, remote, family);
+ if (xfrm_addr_any(local, tmpl->encap_family)) {
+ error = xfrm_get_saddr(net, &tmp, remote, tmpl->encap_family);
if (error)
goto fail;
local = &tmp;
EXPORT_SYMBOL(xfrm_sad_getinfo);
static int
-xfrm_init_tempsel(struct xfrm_state *x, struct flowi *fl,
- struct xfrm_tmpl *tmpl,
- xfrm_address_t *daddr, xfrm_address_t *saddr,
- unsigned short family)
+xfrm_init_tempstate(struct xfrm_state *x, struct flowi *fl,
+ struct xfrm_tmpl *tmpl,
+ xfrm_address_t *daddr, xfrm_address_t *saddr,
+ unsigned short family)
{
struct xfrm_state_afinfo *afinfo = xfrm_state_get_afinfo(family);
if (!afinfo)
return -1;
- afinfo->init_tempsel(x, fl, tmpl, daddr, saddr);
+ afinfo->init_tempsel(&x->sel, fl);
+
+ if (family != tmpl->encap_family) {
+ xfrm_state_put_afinfo(afinfo);
+ afinfo = xfrm_state_get_afinfo(tmpl->encap_family);
+ if (!afinfo)
+ return -1;
+ }
+ afinfo->init_temprop(x, tmpl, daddr, saddr);
xfrm_state_put_afinfo(afinfo);
return 0;
}
int error = 0;
struct xfrm_state *best = NULL;
u32 mark = pol->mark.v & pol->mark.m;
+ unsigned short encap_family = tmpl->encap_family;
to_put = NULL;
spin_lock_bh(&xfrm_state_lock);
- h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, family);
+ h = xfrm_dst_hash(net, daddr, saddr, tmpl->reqid, encap_family);
hlist_for_each_entry(x, entry, net->xfrm.state_bydst+h, bydst) {
- if (x->props.family == family &&
+ if (x->props.family == encap_family &&
x->props.reqid == tmpl->reqid &&
(mark & x->mark.m) == x->mark.v &&
!(x->props.flags & XFRM_STATE_WILDRECV) &&
- xfrm_state_addr_check(x, daddr, saddr, family) &&
+ xfrm_state_addr_check(x, daddr, saddr, encap_family) &&
tmpl->mode == x->props.mode &&
tmpl->id.proto == x->id.proto &&
(tmpl->id.spi == x->id.spi || !tmpl->id.spi))
- xfrm_state_look_at(pol, x, fl, family, daddr, saddr,
+ xfrm_state_look_at(pol, x, fl, encap_family, daddr, saddr,
&best, &acquire_in_progress, &error);
}
if (best)
goto found;
- h_wildcard = xfrm_dst_hash(net, daddr, &saddr_wildcard, tmpl->reqid, family);
+ h_wildcard = xfrm_dst_hash(net, daddr, &saddr_wildcard, tmpl->reqid, encap_family);
hlist_for_each_entry(x, entry, net->xfrm.state_bydst+h_wildcard, bydst) {
- if (x->props.family == family &&
+ if (x->props.family == encap_family &&
x->props.reqid == tmpl->reqid &&
(mark & x->mark.m) == x->mark.v &&
!(x->props.flags & XFRM_STATE_WILDRECV) &&
- xfrm_state_addr_check(x, daddr, saddr, family) &&
+ xfrm_state_addr_check(x, daddr, saddr, encap_family) &&
tmpl->mode == x->props.mode &&
tmpl->id.proto == x->id.proto &&
(tmpl->id.spi == x->id.spi || !tmpl->id.spi))
- xfrm_state_look_at(pol, x, fl, family, daddr, saddr,
+ xfrm_state_look_at(pol, x, fl, encap_family, daddr, saddr,
&best, &acquire_in_progress, &error);
}
if (!x && !error && !acquire_in_progress) {
if (tmpl->id.spi &&
(x0 = __xfrm_state_lookup(net, mark, daddr, tmpl->id.spi,
- tmpl->id.proto, family)) != NULL) {
+ tmpl->id.proto, encap_family)) != NULL) {
to_put = x0;
error = -EEXIST;
goto out;
error = -ENOMEM;
goto out;
}
- /* Initialize temporary selector matching only
+ /* Initialize temporary state matching only
* to current session. */
- xfrm_init_tempsel(x, fl, tmpl, daddr, saddr, family);
+ xfrm_init_tempstate(x, fl, tmpl, daddr, saddr, family);
memcpy(&x->mark, &pol->mark, sizeof(x->mark));
error = security_xfrm_state_alloc_acquire(x, pol->security, fl->secid);
x->km.state = XFRM_STATE_ACQ;
list_add(&x->km.all, &net->xfrm.state_all);
hlist_add_head(&x->bydst, net->xfrm.state_bydst+h);
- h = xfrm_src_hash(net, daddr, saddr, family);
+ h = xfrm_src_hash(net, daddr, saddr, encap_family);
hlist_add_head(&x->bysrc, net->xfrm.state_bysrc+h);
if (x->id.spi) {
- h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, family);
+ h = xfrm_spi_hash(net, &x->id.daddr, x->id.spi, x->id.proto, encap_family);
hlist_add_head(&x->byspi, net->xfrm.state_byspi+h);
}
x->lft.hard_add_expires_seconds = net->xfrm.sysctl_acq_expires;
{
int i;
unsigned int ret;
+ unsigned int nents;
struct scatterlist sg[10];
printk(KERN_INFO "DMA fifo test start\n");
* byte at the beginning, after the kfifo_skip().
*/
sg_init_table(sg, ARRAY_SIZE(sg));
- ret = kfifo_dma_in_prepare(&fifo, sg, ARRAY_SIZE(sg), FIFO_SIZE);
- printk(KERN_INFO "DMA sgl entries: %d\n", ret);
- if (!ret) {
+ nents = kfifo_dma_in_prepare(&fifo, sg, ARRAY_SIZE(sg), FIFO_SIZE);
+ printk(KERN_INFO "DMA sgl entries: %d\n", nents);
+ if (!nents) {
/* fifo is full and no sgl was created */
printk(KERN_WARNING "error kfifo_dma_in_prepare\n");
return -EIO;
/* receive data */
printk(KERN_INFO "scatterlist for receive:\n");
- for (i = 0; i < ARRAY_SIZE(sg); i++) {
+ for (i = 0; i < nents; i++) {
printk(KERN_INFO
"sg[%d] -> "
"page_link 0x%.8lx offset 0x%.8x length 0x%.8x\n",
kfifo_dma_in_finish(&fifo, ret);
/* Prepare to transmit data, example: 8 bytes */
- ret = kfifo_dma_out_prepare(&fifo, sg, ARRAY_SIZE(sg), 8);
- printk(KERN_INFO "DMA sgl entries: %d\n", ret);
- if (!ret) {
+ nents = kfifo_dma_out_prepare(&fifo, sg, ARRAY_SIZE(sg), 8);
+ printk(KERN_INFO "DMA sgl entries: %d\n", nents);
+ if (!nents) {
/* no data was available and no sgl was created */
printk(KERN_WARNING "error kfifo_dma_out_prepare\n");
return -EIO;
}
printk(KERN_INFO "scatterlist for transmit:\n");
- for (i = 0; i < ARRAY_SIZE(sg); i++) {
+ for (i = 0; i < nents; i++) {
printk(KERN_INFO
"sg[%d] -> "
"page_link 0x%.8lx offset 0x%.8x length 0x%.8x\n",
hostprogs-$(CONFIG_LOGO) += pnmtologo
hostprogs-$(CONFIG_VT) += conmakehash
hostprogs-$(CONFIG_IKCONFIG) += bin2c
+hostprogs-$(BUILD_C_RECORDMCOUNT) += recordmcount
always := $(hostprogs-y) $(hostprogs-m)
endif
ifdef CONFIG_FTRACE_MCOUNT_RECORD
+ifdef BUILD_C_RECORDMCOUNT
+# Due to recursion, we must skip empty.o.
+# The empty.o file is created in the make process in order to determine
+# the target endianness and word size. It is made before all other C
+# files, including recordmcount.
+cmd_record_mcount = if [ $(@) != "scripts/mod/empty.o" ]; then \
+ $(objtree)/scripts/recordmcount "$(@)"; \
+ fi;
+else
cmd_record_mcount = set -e ; perl $(srctree)/scripts/recordmcount.pl "$(ARCH)" \
"$(if $(CONFIG_CPU_BIG_ENDIAN),big,little)" \
"$(if $(CONFIG_64BIT),64,32)" \
"$(OBJDUMP)" "$(OBJCOPY)" "$(CC)" "$(LD)" "$(NM)" "$(RM)" "$(MV)" \
"$(if $(part-of-module),1,0)" "$(@)";
endif
+endif
define rule_cc_o_c
$(call echo-cmd,checksrc) $(cmd_checksrc) \
modname_flags = $(if $(filter 1,$(words $(modname))),\
-D"KBUILD_MODNAME=KBUILD_STR($(call name-fix,$(modname)))")
-#hash values
-ifdef CONFIG_DYNAMIC_DEBUG
-debug_flags = -D"DEBUG_HASH=$(shell ./scripts/basic/hash djb2 $(@D)$(modname))"\
- -D"DEBUG_HASH2=$(shell ./scripts/basic/hash r5 $(@D)$(modname))"
-else
-debug_flags =
-endif
-
orig_c_flags = $(KBUILD_CPPFLAGS) $(KBUILD_CFLAGS) $(KBUILD_SUBDIR_CCFLAGS) \
$(ccflags-y) $(CFLAGS_$(basetarget).o)
_c_flags = $(filter-out $(CFLAGS_REMOVE_$(basetarget).o), $(orig_c_flags))
c_flags = -Wp,-MD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
$(__c_flags) $(modkern_cflags) \
- -D"KBUILD_STR(s)=\#s" $(basename_flags) $(modname_flags) \
- $(debug_flags)
+ -D"KBUILD_STR(s)=\#s" $(basename_flags) $(modname_flags)
a_flags = -Wp,-MD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
$(__a_flags) $(modkern_aflags)
# fixdep: Used to generate dependency information during build process
# docproc: Used in Documentation/DocBook
-hostprogs-y := fixdep docproc hash
+hostprogs-y := fixdep docproc
always := $(hostprogs-y)
# fixdep is needed to compile other host programs
*
*/
+#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <unistd.h>
#include <limits.h>
+#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
FILEONLY *internalfunctions;
FILEONLY *externalfunctions;
FILEONLY *symbolsonly;
+FILEONLY *findall;
typedef void FILELINE(char * file, char * line);
FILELINE * singlefunctions;
#define KERNELDOCPATH "scripts/"
#define KERNELDOC "kernel-doc"
#define DOCBOOK "-docbook"
+#define LIST "-list"
#define FUNCTION "-function"
#define NOFUNCTION "-nofunction"
#define NODOCSECTIONS "-no-doc-sections"
static char *srctree, *kernsrctree;
+static char **all_list = NULL;
+static int all_list_len = 0;
+
+static void consume_symbol(const char *sym)
+{
+ int i;
+
+ for (i = 0; i < all_list_len; i++) {
+ if (!all_list[i])
+ continue;
+ if (strcmp(sym, all_list[i]))
+ continue;
+ all_list[i] = NULL;
+ break;
+ }
+}
+
static void usage (void)
{
fprintf(stderr, "Usage: docproc {doc|depend} file\n");
struct symfile * sym = &symfilelist[i];
for (j=0; j < sym->symbolcnt; j++) {
vec[idx++] = type;
+ consume_symbol(sym->symbollist[j].name);
vec[idx++] = sym->symbollist[j].name;
}
}
vec[idx++] = &line[i];
}
}
+ for (i = 0; i < idx; i++) {
+ if (strcmp(vec[i], FUNCTION))
+ continue;
+ consume_symbol(vec[i + 1]);
+ }
vec[idx++] = filename;
vec[idx] = NULL;
exec_kernel_doc(vec);
if (*s == '\n')
*s = '\0';
+ asprintf(&s, "DOC: %s", line);
+ consume_symbol(s);
+ free(s);
+
vec[0] = KERNELDOC;
vec[1] = DOCBOOK;
vec[2] = FUNCTION;
exec_kernel_doc(vec);
}
+static void find_all_symbols(char *filename)
+{
+ char *vec[4]; /* kerneldoc -list file NULL */
+ pid_t pid;
+ int ret, i, count, start;
+ char real_filename[PATH_MAX + 1];
+ int pipefd[2];
+ char *data, *str;
+ size_t data_len = 0;
+
+ vec[0] = KERNELDOC;
+ vec[1] = LIST;
+ vec[2] = filename;
+ vec[3] = NULL;
+
+ if (pipe(pipefd)) {
+ perror("pipe");
+ exit(1);
+ }
+
+ switch (pid=fork()) {
+ case -1:
+ perror("fork");
+ exit(1);
+ case 0:
+ close(pipefd[0]);
+ dup2(pipefd[1], 1);
+ memset(real_filename, 0, sizeof(real_filename));
+ strncat(real_filename, kernsrctree, PATH_MAX);
+ strncat(real_filename, "/" KERNELDOCPATH KERNELDOC,
+ PATH_MAX - strlen(real_filename));
+ execvp(real_filename, vec);
+ fprintf(stderr, "exec ");
+ perror(real_filename);
+ exit(1);
+ default:
+ close(pipefd[1]);
+ data = malloc(4096);
+ do {
+ while ((ret = read(pipefd[0],
+ data + data_len,
+ 4096)) > 0) {
+ data_len += ret;
+ data = realloc(data, data_len + 4096);
+ }
+ } while (ret == -EAGAIN);
+ if (ret != 0) {
+ perror("read");
+ exit(1);
+ }
+ waitpid(pid, &ret ,0);
+ }
+ if (WIFEXITED(ret))
+ exitstatus |= WEXITSTATUS(ret);
+ else
+ exitstatus = 0xff;
+
+ count = 0;
+ /* poor man's strtok, but with counting */
+ for (i = 0; i < data_len; i++) {
+ if (data[i] == '\n') {
+ count++;
+ data[i] = '\0';
+ }
+ }
+ start = all_list_len;
+ all_list_len += count;
+ all_list = realloc(all_list, sizeof(char *) * all_list_len);
+ str = data;
+ for (i = 0; i < data_len && start != all_list_len; i++) {
+ if (data[i] == '\0') {
+ all_list[start] = str;
+ str = data + i + 1;
+ start++;
+ }
+ }
+}
+
/*
* Parse file, calling action specific functions for:
* 1) Lines containing !E
* 3) Lines containing !D
* 4) Lines containing !F
* 5) Lines containing !P
- * 6) Default lines - lines not matching the above
+ * 6) Lines containing !C
+ * 7) Default lines - lines not matching the above
*/
static void parse_file(FILE *infile)
{
s++;
docsection(line + 2, s);
break;
+ case 'C':
+ while (*s && !isspace(*s)) s++;
+ *s = '\0';
+ if (findall)
+ findall(line+2);
+ break;
default:
defaultline(line);
}
int main(int argc, char *argv[])
{
FILE * infile;
+ int i;
srctree = getenv("SRCTREE");
if (!srctree)
symbolsonly = find_export_symbols;
singlefunctions = noaction2;
docsection = noaction2;
+ findall = find_all_symbols;
parse_file(infile);
/* Rewind to start from beginning of file again */
symbolsonly = printline;
singlefunctions = singfunc;
docsection = docsect;
+ findall = NULL;
parse_file(infile);
+
+ for (i = 0; i < all_list_len; i++) {
+ if (!all_list[i])
+ continue;
+ fprintf(stderr, "Warning: didn't use docs for %s\n",
+ all_list[i]);
+ }
}
else if (strcmp("depend", argv[1]) == 0)
{
symbolsonly = adddep;
singlefunctions = adddep2;
docsection = adddep2;
+ findall = adddep;
parse_file(infile);
printf("\n");
}
+++ /dev/null
-/*
- * Copyright (C) 2008 Red Hat, Inc., Jason Baron <jbaron@redhat.com>
- *
- */
-
-#include <stdio.h>
-#include <stdlib.h>
-#include <string.h>
-
-#define DYNAMIC_DEBUG_HASH_BITS 6
-
-static const char *program;
-
-static void usage(void)
-{
- printf("Usage: %s <djb2|r5> <modname>\n", program);
- exit(1);
-}
-
-/* djb2 hashing algorithm by Dan Bernstein. From:
- * http://www.cse.yorku.ca/~oz/hash.html
- */
-
-static unsigned int djb2_hash(char *str)
-{
- unsigned long hash = 5381;
- int c;
-
- c = *str;
- while (c) {
- hash = ((hash << 5) + hash) + c;
- c = *++str;
- }
- return (unsigned int)(hash & ((1 << DYNAMIC_DEBUG_HASH_BITS) - 1));
-}
-
-static unsigned int r5_hash(char *str)
-{
- unsigned long hash = 0;
- int c;
-
- c = *str;
- while (c) {
- hash = (hash + (c << 4) + (c >> 4)) * 11;
- c = *++str;
- }
- return (unsigned int)(hash & ((1 << DYNAMIC_DEBUG_HASH_BITS) - 1));
-}
-
-int main(int argc, char *argv[])
-{
- program = argv[0];
-
- if (argc != 3)
- usage();
- if (!strcmp(argv[1], "djb2"))
- printf("%d\n", djb2_hash(argv[2]));
- else if (!strcmp(argv[1], "r5"))
- printf("%d\n", r5_hash(argv[2]));
- else
- usage();
- exit(0);
-}
-
--- /dev/null
+#!/bin/sh
+# Test for gcc 'asm goto' suport
+# Copyright (C) 2010, Jason Baron <jbaron@redhat.com>
+
+echo "int main(void) { entry: asm goto (\"\"::::entry); return 0; }" | $@ -x c - -c -o /dev/null >/dev/null 2>&1 && echo "y"
if (sym->name && !sym_is_choice_value(sym)) {
printf("CONFIG_%s\n", sym->name);
}
- } else {
+ } else if (input_mode != oldnoconfig) {
if (!conf_cnt++)
printf(_("*\n* Restart config...\n*\n"));
rootEntry = menu_get_parent_menu(menu);
struct symbol *sym;
struct property *prompt;
struct expr *dep;
- struct expr *dir_dep;
unsigned int flags;
char *help;
struct file *file;
void menu_add_dep(struct expr *dep)
{
current_entry->dep = expr_alloc_and(current_entry->dep, menu_check_dep(dep));
- current_entry->dir_dep = current_entry->dep;
}
void menu_set_type(int type)
for (menu = parent->list; menu; menu = menu->next)
menu_finalize(menu);
} else if (sym) {
- /* ignore inherited dependencies for dir_dep */
- sym->dir_dep.expr = expr_transform(expr_copy(parent->dir_dep));
- sym->dir_dep.expr = expr_eliminate_dups(sym->dir_dep.expr);
-
basedep = parent->prompt ? parent->prompt->visible.expr : NULL;
basedep = expr_trans_compare(basedep, E_UNEQUAL, &symbol_no);
basedep = expr_eliminate_dups(expr_transform(basedep));
parent->next = last_menu->next;
last_menu->next = NULL;
}
+
+ sym->dir_dep.expr = parent->dep;
}
for (menu = parent->list; menu; menu = menu->next) {
if (sym && sym_is_choice(sym) &&
}
}
calc_newval:
+#if 0
if (sym->dir_dep.tri == no && sym->rev_dep.tri != no) {
fprintf(stderr, "warning: (");
expr_fprint(sym->rev_dep.expr, stderr);
expr_fprint(sym->dir_dep.expr, stderr);
fprintf(stderr, ")\n");
}
+#endif
newval.tri = EXPR_OR(newval.tri, sym->rev_dep.tri);
}
if (newval.tri == mod && sym_get_type(sym) == S_BOOLEAN)
# Note: This only supports 'c'.
# usage:
-# kernel-doc [ -docbook | -html | -text | -man ] [ -no-doc-sections ]
+# kernel-doc [ -docbook | -html | -text | -man | -list ] [ -no-doc-sections ]
# [ -function funcname [ -function funcname ...] ] c file(s)s > outputfile
# or
# [ -nofunction funcname [ -function funcname ...] ] c file(s)s > outputfile
#
# Set output format using one of -docbook -html -text or -man. Default is man.
+# The -list format is for internal use by docproc.
#
# -no-doc-sections
# Do not output DOC: sections
$type_param, "\$1" );
my $blankline_text = "";
+# list mode
+my %highlights_list = ( $type_constant, "\$1",
+ $type_func, "\$1",
+ $type_struct, "\$1",
+ $type_param, "\$1" );
+my $blankline_list = "";
sub usage {
- print "Usage: $0 [ -v ] [ -docbook | -html | -text | -man ] [ -no-doc-sections ]\n";
+ print "Usage: $0 [ -v ] [ -docbook | -html | -text | -man | -list ]\n";
+ print " [ -no-doc-sections ]\n";
print " [ -function funcname [ -function funcname ...] ]\n";
print " [ -nofunction funcname [ -nofunction funcname ...] ]\n";
print " c source file(s) > outputfile\n";
$output_mode = "xml";
%highlights = %highlights_xml;
$blankline = $blankline_xml;
+ } elsif ($cmd eq "-list") {
+ $output_mode = "list";
+ %highlights = %highlights_list;
+ $blankline = $blankline_list;
} elsif ($cmd eq "-gnome") {
$output_mode = "gnome";
%highlights = %highlights_gnome;
}
}
+## list mode output functions
+
+sub output_function_list(%) {
+ my %args = %{$_[0]};
+
+ print $args{'function'} . "\n";
+}
+
+# output enum in list
+sub output_enum_list(%) {
+ my %args = %{$_[0]};
+ print $args{'enum'} . "\n";
+}
+
+# output typedef in list
+sub output_typedef_list(%) {
+ my %args = %{$_[0]};
+ print $args{'typedef'} . "\n";
+}
+
+# output struct as list
+sub output_struct_list(%) {
+ my %args = %{$_[0]};
+
+ print $args{'struct'} . "\n";
+}
+
+sub output_blockhead_list(%) {
+ my %args = %{$_[0]};
+ my ($parameter, $section);
+
+ foreach $section (@{$args{'sectionlist'}}) {
+ print "DOC: $section\n";
+ }
+}
+
##
# generic output function for all types (function, struct/union, typedef, enum);
# calls the generated, variable output_ function name based on
foreach $px (0 .. $#prms) {
$prm_clean = $prms[$px];
$prm_clean =~ s/\[.*\]//;
- $prm_clean =~ s/__attribute__\s*\(\([a-z,_\*\s\(\)]*\)\)//;
+ $prm_clean =~ s/__attribute__\s*\(\([a-z,_\*\s\(\)]*\)\)//i;
# ignore array size in a parameter string;
# however, the original param string may contain
# spaces, e.g.: addr[6 + 2]
--- /dev/null
+/*
+ * recordmcount.c: construct a table of the locations of calls to 'mcount'
+ * so that ftrace can find them quickly.
+ * Copyright 2009 John F. Reiser <jreiser@BitWagon.com>. All rights reserved.
+ * Licensed under the GNU General Public License, version 2 (GPLv2).
+ *
+ * Restructured to fit Linux format, as well as other updates:
+ * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
+ */
+
+/*
+ * Strategy: alter the .o file in-place.
+ *
+ * Append a new STRTAB that has the new section names, followed by a new array
+ * ElfXX_Shdr[] that has the new section headers, followed by the section
+ * contents for __mcount_loc and its relocations. The old shstrtab strings,
+ * and the old ElfXX_Shdr[] array, remain as "garbage" (commonly, a couple
+ * kilobytes.) Subsequent processing by /bin/ld (or the kernel module loader)
+ * will ignore the garbage regions, because they are not designated by the
+ * new .e_shoff nor the new ElfXX_Shdr[]. [In order to remove the garbage,
+ * then use "ld -r" to create a new file that omits the garbage.]
+ */
+
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <sys/stat.h>
+#include <elf.h>
+#include <fcntl.h>
+#include <setjmp.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+static int fd_map; /* File descriptor for file being modified. */
+static int mmap_failed; /* Boolean flag. */
+static void *ehdr_curr; /* current ElfXX_Ehdr * for resource cleanup */
+static char gpfx; /* prefix for global symbol name (sometimes '_') */
+static struct stat sb; /* Remember .st_size, etc. */
+static jmp_buf jmpenv; /* setjmp/longjmp per-file error escape */
+
+/* setjmp() return values */
+enum {
+ SJ_SETJMP = 0, /* hardwired first return */
+ SJ_FAIL,
+ SJ_SUCCEED
+};
+
+/* Per-file resource cleanup when multiple files. */
+static void
+cleanup(void)
+{
+ if (!mmap_failed)
+ munmap(ehdr_curr, sb.st_size);
+ else
+ free(ehdr_curr);
+ close(fd_map);
+}
+
+static void __attribute__((noreturn))
+fail_file(void)
+{
+ cleanup();
+ longjmp(jmpenv, SJ_FAIL);
+}
+
+static void __attribute__((noreturn))
+succeed_file(void)
+{
+ cleanup();
+ longjmp(jmpenv, SJ_SUCCEED);
+}
+
+/* ulseek, uread, ...: Check return value for errors. */
+
+static off_t
+ulseek(int const fd, off_t const offset, int const whence)
+{
+ off_t const w = lseek(fd, offset, whence);
+ if ((off_t)-1 == w) {
+ perror("lseek");
+ fail_file();
+ }
+ return w;
+}
+
+static size_t
+uread(int const fd, void *const buf, size_t const count)
+{
+ size_t const n = read(fd, buf, count);
+ if (n != count) {
+ perror("read");
+ fail_file();
+ }
+ return n;
+}
+
+static size_t
+uwrite(int const fd, void const *const buf, size_t const count)
+{
+ size_t const n = write(fd, buf, count);
+ if (n != count) {
+ perror("write");
+ fail_file();
+ }
+ return n;
+}
+
+static void *
+umalloc(size_t size)
+{
+ void *const addr = malloc(size);
+ if (0 == addr) {
+ fprintf(stderr, "malloc failed: %zu bytes\n", size);
+ fail_file();
+ }
+ return addr;
+}
+
+/*
+ * Get the whole file as a programming convenience in order to avoid
+ * malloc+lseek+read+free of many pieces. If successful, then mmap
+ * avoids copying unused pieces; else just read the whole file.
+ * Open for both read and write; new info will be appended to the file.
+ * Use MAP_PRIVATE so that a few changes to the in-memory ElfXX_Ehdr
+ * do not propagate to the file until an explicit overwrite at the last.
+ * This preserves most aspects of consistency (all except .st_size)
+ * for simultaneous readers of the file while we are appending to it.
+ * However, multiple writers still are bad. We choose not to use
+ * locking because it is expensive and the use case of kernel build
+ * makes multiple writers unlikely.
+ */
+static void *mmap_file(char const *fname)
+{
+ void *addr;
+
+ fd_map = open(fname, O_RDWR);
+ if (0 > fd_map || 0 > fstat(fd_map, &sb)) {
+ perror(fname);
+ fail_file();
+ }
+ if (!S_ISREG(sb.st_mode)) {
+ fprintf(stderr, "not a regular file: %s\n", fname);
+ fail_file();
+ }
+ addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE,
+ fd_map, 0);
+ mmap_failed = 0;
+ if (MAP_FAILED == addr) {
+ mmap_failed = 1;
+ addr = umalloc(sb.st_size);
+ uread(fd_map, addr, sb.st_size);
+ }
+ return addr;
+}
+
+/* w8rev, w8nat, ...: Handle endianness. */
+
+static uint64_t w8rev(uint64_t const x)
+{
+ return ((0xff & (x >> (0 * 8))) << (7 * 8))
+ | ((0xff & (x >> (1 * 8))) << (6 * 8))
+ | ((0xff & (x >> (2 * 8))) << (5 * 8))
+ | ((0xff & (x >> (3 * 8))) << (4 * 8))
+ | ((0xff & (x >> (4 * 8))) << (3 * 8))
+ | ((0xff & (x >> (5 * 8))) << (2 * 8))
+ | ((0xff & (x >> (6 * 8))) << (1 * 8))
+ | ((0xff & (x >> (7 * 8))) << (0 * 8));
+}
+
+static uint32_t w4rev(uint32_t const x)
+{
+ return ((0xff & (x >> (0 * 8))) << (3 * 8))
+ | ((0xff & (x >> (1 * 8))) << (2 * 8))
+ | ((0xff & (x >> (2 * 8))) << (1 * 8))
+ | ((0xff & (x >> (3 * 8))) << (0 * 8));
+}
+
+static uint32_t w2rev(uint16_t const x)
+{
+ return ((0xff & (x >> (0 * 8))) << (1 * 8))
+ | ((0xff & (x >> (1 * 8))) << (0 * 8));
+}
+
+static uint64_t w8nat(uint64_t const x)
+{
+ return x;
+}
+
+static uint32_t w4nat(uint32_t const x)
+{
+ return x;
+}
+
+static uint32_t w2nat(uint16_t const x)
+{
+ return x;
+}
+
+static uint64_t (*w8)(uint64_t);
+static uint32_t (*w)(uint32_t);
+static uint32_t (*w2)(uint16_t);
+
+/* Names of the sections that could contain calls to mcount. */
+static int
+is_mcounted_section_name(char const *const txtname)
+{
+ return 0 == strcmp(".text", txtname) ||
+ 0 == strcmp(".sched.text", txtname) ||
+ 0 == strcmp(".spinlock.text", txtname) ||
+ 0 == strcmp(".irqentry.text", txtname) ||
+ 0 == strcmp(".text.unlikely", txtname);
+}
+
+/* 32 bit and 64 bit are very similar */
+#include "recordmcount.h"
+#define RECORD_MCOUNT_64
+#include "recordmcount.h"
+
+static void
+do_file(char const *const fname)
+{
+ Elf32_Ehdr *const ehdr = mmap_file(fname);
+ unsigned int reltype = 0;
+
+ ehdr_curr = ehdr;
+ w = w4nat;
+ w2 = w2nat;
+ w8 = w8nat;
+ switch (ehdr->e_ident[EI_DATA]) {
+ static unsigned int const endian = 1;
+ default: {
+ fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
+ ehdr->e_ident[EI_DATA], fname);
+ fail_file();
+ } break;
+ case ELFDATA2LSB: {
+ if (1 != *(unsigned char const *)&endian) {
+ /* main() is big endian, file.o is little endian. */
+ w = w4rev;
+ w2 = w2rev;
+ w8 = w8rev;
+ }
+ } break;
+ case ELFDATA2MSB: {
+ if (0 != *(unsigned char const *)&endian) {
+ /* main() is little endian, file.o is big endian. */
+ w = w4rev;
+ w2 = w2rev;
+ w8 = w8rev;
+ }
+ } break;
+ } /* end switch */
+ if (0 != memcmp(ELFMAG, ehdr->e_ident, SELFMAG)
+ || ET_REL != w2(ehdr->e_type)
+ || EV_CURRENT != ehdr->e_ident[EI_VERSION]) {
+ fprintf(stderr, "unrecognized ET_REL file %s\n", fname);
+ fail_file();
+ }
+
+ gpfx = 0;
+ switch (w2(ehdr->e_machine)) {
+ default: {
+ fprintf(stderr, "unrecognized e_machine %d %s\n",
+ w2(ehdr->e_machine), fname);
+ fail_file();
+ } break;
+ case EM_386: reltype = R_386_32; break;
+ case EM_ARM: reltype = R_ARM_ABS32; break;
+ case EM_IA_64: reltype = R_IA64_IMM64; gpfx = '_'; break;
+ case EM_PPC: reltype = R_PPC_ADDR32; gpfx = '_'; break;
+ case EM_PPC64: reltype = R_PPC64_ADDR64; gpfx = '_'; break;
+ case EM_S390: /* reltype: e_class */ gpfx = '_'; break;
+ case EM_SH: reltype = R_SH_DIR32; break;
+ case EM_SPARCV9: reltype = R_SPARC_64; gpfx = '_'; break;
+ case EM_X86_64: reltype = R_X86_64_64; break;
+ } /* end switch */
+
+ switch (ehdr->e_ident[EI_CLASS]) {
+ default: {
+ fprintf(stderr, "unrecognized ELF class %d %s\n",
+ ehdr->e_ident[EI_CLASS], fname);
+ fail_file();
+ } break;
+ case ELFCLASS32: {
+ if (sizeof(Elf32_Ehdr) != w2(ehdr->e_ehsize)
+ || sizeof(Elf32_Shdr) != w2(ehdr->e_shentsize)) {
+ fprintf(stderr,
+ "unrecognized ET_REL file: %s\n", fname);
+ fail_file();
+ }
+ if (EM_S390 == w2(ehdr->e_machine))
+ reltype = R_390_32;
+ do32(ehdr, fname, reltype);
+ } break;
+ case ELFCLASS64: {
+ Elf64_Ehdr *const ghdr = (Elf64_Ehdr *)ehdr;
+ if (sizeof(Elf64_Ehdr) != w2(ghdr->e_ehsize)
+ || sizeof(Elf64_Shdr) != w2(ghdr->e_shentsize)) {
+ fprintf(stderr,
+ "unrecognized ET_REL file: %s\n", fname);
+ fail_file();
+ }
+ if (EM_S390 == w2(ghdr->e_machine))
+ reltype = R_390_64;
+ do64(ghdr, fname, reltype);
+ } break;
+ } /* end switch */
+
+ cleanup();
+}
+
+int
+main(int argc, char const *argv[])
+{
+ const char ftrace[] = "kernel/trace/ftrace.o";
+ int ftrace_size = sizeof(ftrace) - 1;
+ int n_error = 0; /* gcc-4.3.0 false positive complaint */
+
+ if (argc <= 1) {
+ fprintf(stderr, "usage: recordmcount file.o...\n");
+ return 0;
+ }
+
+ /* Process each file in turn, allowing deep failure. */
+ for (--argc, ++argv; 0 < argc; --argc, ++argv) {
+ int const sjval = setjmp(jmpenv);
+ int len;
+
+ /*
+ * The file kernel/trace/ftrace.o references the mcount
+ * function but does not call it. Since ftrace.o should
+ * not be traced anyway, we just skip it.
+ */
+ len = strlen(argv[0]);
+ if (len >= ftrace_size &&
+ strcmp(argv[0] + (len - ftrace_size), ftrace) == 0)
+ continue;
+
+ switch (sjval) {
+ default: {
+ fprintf(stderr, "internal error: %s\n", argv[0]);
+ exit(1);
+ } break;
+ case SJ_SETJMP: { /* normal sequence */
+ /* Avoid problems if early cleanup() */
+ fd_map = -1;
+ ehdr_curr = NULL;
+ mmap_failed = 1;
+ do_file(argv[0]);
+ } break;
+ case SJ_FAIL: { /* error in do_file or below */
+ ++n_error;
+ } break;
+ case SJ_SUCCEED: { /* premature success */
+ /* do nothing */
+ } break;
+ } /* end switch */
+ }
+ return !!n_error;
+}
+
+
--- /dev/null
+/*
+ * recordmcount.h
+ *
+ * This code was taken out of recordmcount.c written by
+ * Copyright 2009 John F. Reiser <jreiser@BitWagon.com>. All rights reserved.
+ *
+ * The original code had the same algorithms for both 32bit
+ * and 64bit ELF files, but the code was duplicated to support
+ * the difference in structures that were used. This
+ * file creates a macro of everything that is different between
+ * the 64 and 32 bit code, such that by including this header
+ * twice we can create both sets of functions by including this
+ * header once with RECORD_MCOUNT_64 undefined, and again with
+ * it defined.
+ *
+ * This conversion to macros was done by:
+ * Copyright 2010 Steven Rostedt <srostedt@redhat.com>, Red Hat Inc.
+ *
+ * Licensed under the GNU General Public License, version 2 (GPLv2).
+ */
+#undef append_func
+#undef sift_rel_mcount
+#undef find_secsym_ndx
+#undef __has_rel_mcount
+#undef has_rel_mcount
+#undef tot_relsize
+#undef do_func
+#undef Elf_Ehdr
+#undef Elf_Shdr
+#undef Elf_Rel
+#undef Elf_Rela
+#undef Elf_Sym
+#undef ELF_R_SYM
+#undef ELF_R_INFO
+#undef ELF_ST_BIND
+#undef uint_t
+#undef _w
+#undef _align
+#undef _size
+
+#ifdef RECORD_MCOUNT_64
+# define append_func append64
+# define sift_rel_mcount sift64_rel_mcount
+# define find_secsym_ndx find64_secsym_ndx
+# define __has_rel_mcount __has64_rel_mcount
+# define has_rel_mcount has64_rel_mcount
+# define tot_relsize tot64_relsize
+# define do_func do64
+# define Elf_Ehdr Elf64_Ehdr
+# define Elf_Shdr Elf64_Shdr
+# define Elf_Rel Elf64_Rel
+# define Elf_Rela Elf64_Rela
+# define Elf_Sym Elf64_Sym
+# define ELF_R_SYM ELF64_R_SYM
+# define ELF_R_INFO ELF64_R_INFO
+# define ELF_ST_BIND ELF64_ST_BIND
+# define uint_t uint64_t
+# define _w w8
+# define _align 7u
+# define _size 8
+#else
+# define append_func append32
+# define sift_rel_mcount sift32_rel_mcount
+# define find_secsym_ndx find32_secsym_ndx
+# define __has_rel_mcount __has32_rel_mcount
+# define has_rel_mcount has32_rel_mcount
+# define tot_relsize tot32_relsize
+# define do_func do32
+# define Elf_Ehdr Elf32_Ehdr
+# define Elf_Shdr Elf32_Shdr
+# define Elf_Rel Elf32_Rel
+# define Elf_Rela Elf32_Rela
+# define Elf_Sym Elf32_Sym
+# define ELF_R_SYM ELF32_R_SYM
+# define ELF_R_INFO ELF32_R_INFO
+# define ELF_ST_BIND ELF32_ST_BIND
+# define uint_t uint32_t
+# define _w w
+# define _align 3u
+# define _size 4
+#endif
+
+/* Append the new shstrtab, Elf_Shdr[], __mcount_loc and its relocations. */
+static void append_func(Elf_Ehdr *const ehdr,
+ Elf_Shdr *const shstr,
+ uint_t const *const mloc0,
+ uint_t const *const mlocp,
+ Elf_Rel const *const mrel0,
+ Elf_Rel const *const mrelp,
+ unsigned int const rel_entsize,
+ unsigned int const symsec_sh_link)
+{
+ /* Begin constructing output file */
+ Elf_Shdr mcsec;
+ char const *mc_name = (sizeof(Elf_Rela) == rel_entsize)
+ ? ".rela__mcount_loc"
+ : ".rel__mcount_loc";
+ unsigned const old_shnum = w2(ehdr->e_shnum);
+ uint_t const old_shoff = _w(ehdr->e_shoff);
+ uint_t const old_shstr_sh_size = _w(shstr->sh_size);
+ uint_t const old_shstr_sh_offset = _w(shstr->sh_offset);
+ uint_t t = 1 + strlen(mc_name) + _w(shstr->sh_size);
+ uint_t new_e_shoff;
+
+ shstr->sh_size = _w(t);
+ shstr->sh_offset = _w(sb.st_size);
+ t += sb.st_size;
+ t += (_align & -t); /* word-byte align */
+ new_e_shoff = t;
+
+ /* body for new shstrtab */
+ ulseek(fd_map, sb.st_size, SEEK_SET);
+ uwrite(fd_map, old_shstr_sh_offset + (void *)ehdr, old_shstr_sh_size);
+ uwrite(fd_map, mc_name, 1 + strlen(mc_name));
+
+ /* old(modified) Elf_Shdr table, word-byte aligned */
+ ulseek(fd_map, t, SEEK_SET);
+ t += sizeof(Elf_Shdr) * old_shnum;
+ uwrite(fd_map, old_shoff + (void *)ehdr,
+ sizeof(Elf_Shdr) * old_shnum);
+
+ /* new sections __mcount_loc and .rel__mcount_loc */
+ t += 2*sizeof(mcsec);
+ mcsec.sh_name = w((sizeof(Elf_Rela) == rel_entsize) + strlen(".rel")
+ + old_shstr_sh_size);
+ mcsec.sh_type = w(SHT_PROGBITS);
+ mcsec.sh_flags = _w(SHF_ALLOC);
+ mcsec.sh_addr = 0;
+ mcsec.sh_offset = _w(t);
+ mcsec.sh_size = _w((void *)mlocp - (void *)mloc0);
+ mcsec.sh_link = 0;
+ mcsec.sh_info = 0;
+ mcsec.sh_addralign = _w(_size);
+ mcsec.sh_entsize = _w(_size);
+ uwrite(fd_map, &mcsec, sizeof(mcsec));
+
+ mcsec.sh_name = w(old_shstr_sh_size);
+ mcsec.sh_type = (sizeof(Elf_Rela) == rel_entsize)
+ ? w(SHT_RELA)
+ : w(SHT_REL);
+ mcsec.sh_flags = 0;
+ mcsec.sh_addr = 0;
+ mcsec.sh_offset = _w((void *)mlocp - (void *)mloc0 + t);
+ mcsec.sh_size = _w((void *)mrelp - (void *)mrel0);
+ mcsec.sh_link = w(symsec_sh_link);
+ mcsec.sh_info = w(old_shnum);
+ mcsec.sh_addralign = _w(_size);
+ mcsec.sh_entsize = _w(rel_entsize);
+ uwrite(fd_map, &mcsec, sizeof(mcsec));
+
+ uwrite(fd_map, mloc0, (void *)mlocp - (void *)mloc0);
+ uwrite(fd_map, mrel0, (void *)mrelp - (void *)mrel0);
+
+ ehdr->e_shoff = _w(new_e_shoff);
+ ehdr->e_shnum = w2(2 + w2(ehdr->e_shnum)); /* {.rel,}__mcount_loc */
+ ulseek(fd_map, 0, SEEK_SET);
+ uwrite(fd_map, ehdr, sizeof(*ehdr));
+}
+
+
+/*
+ * Look at the relocations in order to find the calls to mcount.
+ * Accumulate the section offsets that are found, and their relocation info,
+ * onto the end of the existing arrays.
+ */
+static uint_t *sift_rel_mcount(uint_t *mlocp,
+ unsigned const offbase,
+ Elf_Rel **const mrelpp,
+ Elf_Shdr const *const relhdr,
+ Elf_Ehdr const *const ehdr,
+ unsigned const recsym,
+ uint_t const recval,
+ unsigned const reltype)
+{
+ uint_t *const mloc0 = mlocp;
+ Elf_Rel *mrelp = *mrelpp;
+ Elf_Shdr *const shdr0 = (Elf_Shdr *)(_w(ehdr->e_shoff)
+ + (void *)ehdr);
+ unsigned const symsec_sh_link = w(relhdr->sh_link);
+ Elf_Shdr const *const symsec = &shdr0[symsec_sh_link];
+ Elf_Sym const *const sym0 = (Elf_Sym const *)(_w(symsec->sh_offset)
+ + (void *)ehdr);
+
+ Elf_Shdr const *const strsec = &shdr0[w(symsec->sh_link)];
+ char const *const str0 = (char const *)(_w(strsec->sh_offset)
+ + (void *)ehdr);
+
+ Elf_Rel const *const rel0 = (Elf_Rel const *)(_w(relhdr->sh_offset)
+ + (void *)ehdr);
+ unsigned rel_entsize = _w(relhdr->sh_entsize);
+ unsigned const nrel = _w(relhdr->sh_size) / rel_entsize;
+ Elf_Rel const *relp = rel0;
+
+ unsigned mcountsym = 0;
+ unsigned t;
+
+ for (t = nrel; t; --t) {
+ if (!mcountsym) {
+ Elf_Sym const *const symp =
+ &sym0[ELF_R_SYM(_w(relp->r_info))];
+ char const *symname = &str0[w(symp->st_name)];
+
+ if ('.' == symname[0])
+ ++symname; /* ppc64 hack */
+ if (0 == strcmp((('_' == gpfx) ? "_mcount" : "mcount"),
+ symname))
+ mcountsym = ELF_R_SYM(_w(relp->r_info));
+ }
+
+ if (mcountsym == ELF_R_SYM(_w(relp->r_info))) {
+ uint_t const addend = _w(_w(relp->r_offset) - recval);
+
+ mrelp->r_offset = _w(offbase
+ + ((void *)mlocp - (void *)mloc0));
+ mrelp->r_info = _w(ELF_R_INFO(recsym, reltype));
+ if (sizeof(Elf_Rela) == rel_entsize) {
+ ((Elf_Rela *)mrelp)->r_addend = addend;
+ *mlocp++ = 0;
+ } else
+ *mlocp++ = addend;
+
+ mrelp = (Elf_Rel *)(rel_entsize + (void *)mrelp);
+ }
+ relp = (Elf_Rel const *)(rel_entsize + (void *)relp);
+ }
+ *mrelpp = mrelp;
+ return mlocp;
+}
+
+
+/*
+ * Find a symbol in the given section, to be used as the base for relocating
+ * the table of offsets of calls to mcount. A local or global symbol suffices,
+ * but avoid a Weak symbol because it may be overridden; the change in value
+ * would invalidate the relocations of the offsets of the calls to mcount.
+ * Often the found symbol will be the unnamed local symbol generated by
+ * GNU 'as' for the start of each section. For example:
+ * Num: Value Size Type Bind Vis Ndx Name
+ * 2: 00000000 0 SECTION LOCAL DEFAULT 1
+ */
+static unsigned find_secsym_ndx(unsigned const txtndx,
+ char const *const txtname,
+ uint_t *const recvalp,
+ Elf_Shdr const *const symhdr,
+ Elf_Ehdr const *const ehdr)
+{
+ Elf_Sym const *const sym0 = (Elf_Sym const *)(_w(symhdr->sh_offset)
+ + (void *)ehdr);
+ unsigned const nsym = _w(symhdr->sh_size) / _w(symhdr->sh_entsize);
+ Elf_Sym const *symp;
+ unsigned t;
+
+ for (symp = sym0, t = nsym; t; --t, ++symp) {
+ unsigned int const st_bind = ELF_ST_BIND(symp->st_info);
+
+ if (txtndx == w2(symp->st_shndx)
+ /* avoid STB_WEAK */
+ && (STB_LOCAL == st_bind || STB_GLOBAL == st_bind)) {
+ *recvalp = _w(symp->st_value);
+ return symp - sym0;
+ }
+ }
+ fprintf(stderr, "Cannot find symbol for section %d: %s.\n",
+ txtndx, txtname);
+ fail_file();
+}
+
+
+/* Evade ISO C restriction: no declaration after statement in has_rel_mcount. */
+static char const *
+__has_rel_mcount(Elf_Shdr const *const relhdr, /* is SHT_REL or SHT_RELA */
+ Elf_Shdr const *const shdr0,
+ char const *const shstrtab,
+ char const *const fname)
+{
+ /* .sh_info depends on .sh_type == SHT_REL[,A] */
+ Elf_Shdr const *const txthdr = &shdr0[w(relhdr->sh_info)];
+ char const *const txtname = &shstrtab[w(txthdr->sh_name)];
+
+ if (0 == strcmp("__mcount_loc", txtname)) {
+ fprintf(stderr, "warning: __mcount_loc already exists: %s\n",
+ fname);
+ succeed_file();
+ }
+ if (SHT_PROGBITS != w(txthdr->sh_type) ||
+ !is_mcounted_section_name(txtname))
+ return NULL;
+ return txtname;
+}
+
+static char const *has_rel_mcount(Elf_Shdr const *const relhdr,
+ Elf_Shdr const *const shdr0,
+ char const *const shstrtab,
+ char const *const fname)
+{
+ if (SHT_REL != w(relhdr->sh_type) && SHT_RELA != w(relhdr->sh_type))
+ return NULL;
+ return __has_rel_mcount(relhdr, shdr0, shstrtab, fname);
+}
+
+
+static unsigned tot_relsize(Elf_Shdr const *const shdr0,
+ unsigned nhdr,
+ const char *const shstrtab,
+ const char *const fname)
+{
+ unsigned totrelsz = 0;
+ Elf_Shdr const *shdrp = shdr0;
+
+ for (; nhdr; --nhdr, ++shdrp) {
+ if (has_rel_mcount(shdrp, shdr0, shstrtab, fname))
+ totrelsz += _w(shdrp->sh_size);
+ }
+ return totrelsz;
+}
+
+
+/* Overall supervision for Elf32 ET_REL file. */
+static void
+do_func(Elf_Ehdr *const ehdr, char const *const fname, unsigned const reltype)
+{
+ Elf_Shdr *const shdr0 = (Elf_Shdr *)(_w(ehdr->e_shoff)
+ + (void *)ehdr);
+ unsigned const nhdr = w2(ehdr->e_shnum);
+ Elf_Shdr *const shstr = &shdr0[w2(ehdr->e_shstrndx)];
+ char const *const shstrtab = (char const *)(_w(shstr->sh_offset)
+ + (void *)ehdr);
+
+ Elf_Shdr const *relhdr;
+ unsigned k;
+
+ /* Upper bound on space: assume all relevant relocs are for mcount. */
+ unsigned const totrelsz = tot_relsize(shdr0, nhdr, shstrtab, fname);
+ Elf_Rel *const mrel0 = umalloc(totrelsz);
+ Elf_Rel * mrelp = mrel0;
+
+ /* 2*sizeof(address) <= sizeof(Elf_Rel) */
+ uint_t *const mloc0 = umalloc(totrelsz>>1);
+ uint_t * mlocp = mloc0;
+
+ unsigned rel_entsize = 0;
+ unsigned symsec_sh_link = 0;
+
+ for (relhdr = shdr0, k = nhdr; k; --k, ++relhdr) {
+ char const *const txtname = has_rel_mcount(relhdr, shdr0,
+ shstrtab, fname);
+ if (txtname) {
+ uint_t recval = 0;
+ unsigned const recsym = find_secsym_ndx(
+ w(relhdr->sh_info), txtname, &recval,
+ &shdr0[symsec_sh_link = w(relhdr->sh_link)],
+ ehdr);
+
+ rel_entsize = _w(relhdr->sh_entsize);
+ mlocp = sift_rel_mcount(mlocp,
+ (void *)mlocp - (void *)mloc0, &mrelp,
+ relhdr, ehdr, recsym, recval, reltype);
+ }
+ }
+ if (mloc0 != mlocp) {
+ append_func(ehdr, shstr, mloc0, mlocp, mrel0, mrelp,
+ rel_entsize, symsec_sh_link);
+ }
+ free(mrel0);
+ free(mloc0);
+}
#
af_names.h
capability_names.h
+rlim_names.h
* aa_simple_write_to_buffer - common routine for getting policy from user
* @op: operation doing the user buffer copy
* @userbuf: user buffer to copy data from (NOT NULL)
- * @alloc_size: size of user buffer
+ * @alloc_size: size of user buffer (REQUIRES: @alloc_size >= @copy_size)
* @copy_size: size of data to copy from user buffer
* @pos: position write is at in the file (NOT NULL)
*
{
char *data;
+ BUG_ON(copy_size > alloc_size);
+
if (*pos != 0)
/* only writes from pos 0, that is complete writes */
return ERR_PTR(-ESPIPE);
};
int aa_map_resource(int resource);
-int aa_task_setrlimit(struct aa_profile *profile, unsigned int resource,
- struct rlimit *new_rlim);
+int aa_task_setrlimit(struct aa_profile *profile, struct task_struct *,
+ unsigned int resource, struct rlimit *new_rlim);
void __aa_transition_rlimits(struct aa_profile *old, struct aa_profile *new);
*ns_name = NULL;
if (name[0] == ':') {
char *split = strchr(&name[1], ':');
+ *ns_name = skip_spaces(&name[1]);
if (split) {
/* overwrite ':' with \0 */
*split = 0;
} else
/* a ns name without a following profile is allowed */
name = NULL;
- *ns_name = &name[1];
}
if (name && *name == 0)
name = NULL;
int error = 0;
if (!unconfined(profile))
- error = aa_task_setrlimit(profile, resource, new_rlim);
+ error = aa_task_setrlimit(profile, task, resource, new_rlim);
return error;
}
{
struct path root, tmp;
char *res;
- int deleted, connected;
- int error = 0;
+ int connected, error = 0;
/* Get the root we want to resolve too, released below */
if (flags & PATH_CHROOT_REL) {
}
spin_lock(&dcache_lock);
- /* There is a race window between path lookup here and the
- * need to strip the " (deleted) string that __d_path applies
- * Detect the race and relookup the path
- *
- * The stripping of (deleted) is a hack that could be removed
- * with an updated __d_path
- */
- do {
- tmp = root;
- deleted = d_unlinked(path->dentry);
- res = __d_path(path, &tmp, buf, buflen);
-
- } while (deleted != d_unlinked(path->dentry));
+ tmp = root;
+ res = __d_path(path, &tmp, buf, buflen);
spin_unlock(&dcache_lock);
*name = res;
*name = buf;
goto out;
}
- if (deleted) {
- /* On some filesystems, newly allocated dentries appear to the
- * security_path hooks as a deleted dentry except without an
- * inode allocated.
- *
- * Remove the appended deleted text and return as string for
- * normal mediation, or auditing. The (deleted) string is
- * guaranteed to be added in this case, so just strip it.
- */
- buf[buflen - 11] = 0; /* - (len(" (deleted)") +\0) */
- if (path->dentry->d_inode && !(flags & PATH_MEDIATE_DELETED)) {
+ /* Handle two cases:
+ * 1. A deleted dentry && profile is not allowing mediation of deleted
+ * 2. On some filesystems, newly allocated dentries appear to the
+ * security_path hooks as a deleted dentry except without an inode
+ * allocated.
+ */
+ if (d_unlinked(path->dentry) && path->dentry->d_inode &&
+ !(flags & PATH_MEDIATE_DELETED)) {
error = -ENOENT;
goto out;
- }
}
/* Determine if the path is connected to the expected root */
/* released below */
ns = aa_get_namespace(root);
- write_lock(&ns->lock);
if (!name) {
/* remove namespace - can only happen if fqname[0] == ':' */
+ write_lock(&ns->parent->lock);
__remove_namespace(ns);
+ write_unlock(&ns->parent->lock);
} else {
/* remove profile */
+ write_lock(&ns->lock);
profile = aa_get_profile(__lookup_profile(&ns->base, name));
if (!profile) {
error = -ENOENT;
}
name = profile->base.hname;
__remove_profile(profile);
+ write_unlock(&ns->lock);
}
- write_unlock(&ns->lock);
/* don't fail removal if audit fails */
(void) audit_policy(OP_PROF_RM, GFP_KERNEL, name, info, error);
/**
* aa_task_setrlimit - test permission to set an rlimit
* @profile - profile confining the task (NOT NULL)
+ * @task - task the resource is being set on
* @resource - the resource being set
* @new_rlim - the new resource limit (NOT NULL)
*
*
* Returns: 0 or error code if setting resource failed
*/
-int aa_task_setrlimit(struct aa_profile *profile, unsigned int resource,
- struct rlimit *new_rlim)
+int aa_task_setrlimit(struct aa_profile *profile, struct task_struct *task,
+ unsigned int resource, struct rlimit *new_rlim)
{
int error = 0;
- if (profile->rlimits.mask & (1 << resource) &&
- new_rlim->rlim_max > profile->rlimits.limits[resource].rlim_max)
-
- error = audit_resource(profile, resource, new_rlim->rlim_max,
- -EACCES);
+ /* TODO: extend resource control to handle other (non current)
+ * processes. AppArmor rules currently have the implicit assumption
+ * that the task is setting the resource of the current process
+ */
+ if ((task != current->group_leader) ||
+ (profile->rlimits.mask & (1 << resource) &&
+ new_rlim->rlim_max > profile->rlimits.limits[resource].rlim_max))
+ error = -EACCES;
- return error;
+ return audit_resource(profile, resource, new_rlim->rlim_max, error);
}
/**
{
}
+static int cap_secmark_relabel_packet(u32 secid)
+{
+ return 0;
+}
+static void cap_secmark_refcount_inc(void)
+{
+}
+
+static void cap_secmark_refcount_dec(void)
+{
+}
static void cap_req_classify_flow(const struct request_sock *req,
struct flowi *fl)
static int cap_secctx_to_secid(const char *secdata, u32 seclen, u32 *secid)
{
- return -EOPNOTSUPP;
+ *secid = 0;
+ return 0;
}
static void cap_release_secctx(char *secdata, u32 seclen)
set_to_cap_if_null(ops, inet_conn_request);
set_to_cap_if_null(ops, inet_csk_clone);
set_to_cap_if_null(ops, inet_conn_established);
+ set_to_cap_if_null(ops, secmark_relabel_packet);
+ set_to_cap_if_null(ops, secmark_refcount_inc);
+ set_to_cap_if_null(ops, secmark_refcount_dec);
set_to_cap_if_null(ops, req_classify_flow);
set_to_cap_if_null(ops, tun_dev_create);
set_to_cap_if_null(ops, tun_dev_post_create);
/**
* cap_task_setscheduler - Detemine if scheduler policy change is permitted
* @p: The task to affect
- * @policy: The policy to effect
- * @lp: The parameters to the scheduling policy
*
* Detemine if the requested scheduler policy change is permitted for the
* specified task, returning 0 if permission is granted, -ve if denied.
*/
-int cap_task_setscheduler(struct task_struct *p, int policy,
- struct sched_param *lp)
+int cap_task_setscheduler(struct task_struct *p)
{
return cap_safe_nice(p);
}
#define IMA_MEASURE_HTABLE_SIZE (1 << IMA_HASH_BITS)
/* set during initialization */
+extern int iint_initialized;
extern int ima_initialized;
extern int ima_used_chip;
extern char *ima_hash;
RADIX_TREE(ima_iint_store, GFP_ATOMIC);
DEFINE_SPINLOCK(ima_iint_lock);
-
static struct kmem_cache *iint_cache __read_mostly;
+int iint_initialized = 0;
+
/* ima_iint_find_get - return the iint associated with an inode
*
* ima_iint_find_get gets a reference to the iint. Caller must
iint_cache =
kmem_cache_create("iint_cache", sizeof(struct ima_iint_cache), 0,
SLAB_PANIC, init_once);
+ iint_initialized = 1;
return 0;
}
security_initcall(ima_iintcache_init);
struct ima_iint_cache *iint;
int rc;
- if (!ima_initialized || !S_ISREG(inode->i_mode))
+ if (!iint_initialized || !S_ISREG(inode->i_mode))
return;
iint = ima_iint_find_get(inode);
if (!iint)
return;
mutex_lock(&iint->mutex);
+ if (!ima_initialized)
+ goto out;
rc = ima_must_measure(iint, inode, MAY_READ, FILE_CHECK);
if (rc < 0)
goto out;
struct inode *inode = file->f_dentry->d_inode;
struct ima_iint_cache *iint;
- if (!ima_initialized || !S_ISREG(inode->i_mode))
+ if (!iint_initialized || !S_ISREG(inode->i_mode))
return;
iint = ima_iint_find_get(inode);
if (!iint)
{
struct inode *inode = file->f_dentry->d_inode;
struct ima_iint_cache *iint;
- int rc;
+ int rc = 0;
if (!ima_initialized || !S_ISREG(inode->i_mode))
return 0;
keyring_r = NULL;
me = current;
+ rcu_read_lock();
write_lock_irq(&tasklist_lock);
parent = me->real_parent;
goto not_permitted;
/* the keyrings must have the same UID */
- if (pcred->tgcred->session_keyring->uid != mycred->euid ||
+ if ((pcred->tgcred->session_keyring &&
+ pcred->tgcred->session_keyring->uid != mycred->euid) ||
mycred->tgcred->session_keyring->uid != mycred->euid)
goto not_permitted;
set_ti_thread_flag(task_thread_info(parent), TIF_NOTIFY_RESUME);
write_unlock_irq(&tasklist_lock);
+ rcu_read_unlock();
if (oldcred)
put_cred(oldcred);
return 0;
ret = 0;
not_permitted:
write_unlock_irq(&tasklist_lock);
+ rcu_read_unlock();
put_cred(cred);
return ret;
* Return true if:
* -The passed LSM is the one chosen by user at boot time,
* -or the passed LSM is configured as the default and the user did not
- * choose an alternate LSM at boot time,
- * -or there is no default LSM set and the user didn't specify a
- * specific LSM and we're the first to ask for registration permission,
- * -or the passed LSM is currently loaded.
+ * choose an alternate LSM at boot time.
* Otherwise, return false.
*/
int __init security_module_enable(struct security_operations *ops)
{
- if (!*chosen_lsm)
- strncpy(chosen_lsm, ops->name, SECURITY_NAME_MAX);
- else if (strncmp(ops->name, chosen_lsm, SECURITY_NAME_MAX))
- return 0;
-
- return 1;
+ return !strcmp(ops->name, chosen_lsm);
}
/**
return security_ops->task_setrlimit(p, resource, new_rlim);
}
-int security_task_setscheduler(struct task_struct *p,
- int policy, struct sched_param *lp)
+int security_task_setscheduler(struct task_struct *p)
{
- return security_ops->task_setscheduler(p, policy, lp);
+ return security_ops->task_setscheduler(p);
}
int security_task_getscheduler(struct task_struct *p)
security_ops->inet_conn_established(sk, skb);
}
+int security_secmark_relabel_packet(u32 secid)
+{
+ return security_ops->secmark_relabel_packet(secid);
+}
+EXPORT_SYMBOL(security_secmark_relabel_packet);
+
+void security_secmark_refcount_inc(void)
+{
+ security_ops->secmark_refcount_inc();
+}
+EXPORT_SYMBOL(security_secmark_refcount_inc);
+
+void security_secmark_refcount_dec(void)
+{
+ security_ops->secmark_refcount_dec();
+}
+EXPORT_SYMBOL(security_secmark_refcount_dec);
+
int security_tun_dev_create(void)
{
return security_ops->tun_dev_create();
# Makefile for building the SELinux module as part of the kernel tree.
#
-obj-$(CONFIG_SECURITY_SELINUX) := selinux.o ss/
-
-selinux-y := avc.o \
- hooks.o \
- selinuxfs.o \
- netlink.o \
- nlmsgtab.o \
- netif.o \
- netnode.o \
- netport.o \
- exports.o
+obj-$(CONFIG_SECURITY_SELINUX) := selinux.o
+
+selinux-y := avc.o hooks.o selinuxfs.o netlink.o nlmsgtab.o netif.o \
+ netnode.o netport.o exports.o \
+ ss/ebitmap.o ss/hashtab.o ss/symtab.o ss/sidtab.o ss/avtab.o \
+ ss/policydb.o ss/services.o ss/conditional.o ss/mls.o ss/status.o
selinux-$(CONFIG_SECURITY_NETWORK_XFRM) += xfrm.o
selinux-$(CONFIG_NETLABEL) += netlabel.o
-EXTRA_CFLAGS += -Isecurity/selinux -Isecurity/selinux/include
+ccflags-y := -Isecurity/selinux -Isecurity/selinux/include
-$(obj)/avc.o: $(obj)/flask.h
+$(addprefix $(obj)/,$(selinux-y)): $(obj)/flask.h
quiet_cmd_flask = GEN $(obj)/flask.h $(obj)/av_permissions.h
cmd_flask = scripts/selinux/genheaders/genheaders $(obj)/flask.h $(obj)/av_permissions.h
* it under the terms of the GNU General Public License version 2,
* as published by the Free Software Foundation.
*/
-#include <linux/types.h>
-#include <linux/kernel.h>
#include <linux/module.h>
-#include <linux/selinux.h>
-#include <linux/fs.h>
-#include <linux/ipc.h>
-#include <asm/atomic.h>
#include "security.h"
-#include "objsec.h"
-
-/* SECMARK reference count */
-extern atomic_t selinux_secmark_refcount;
-
-int selinux_string_to_sid(char *str, u32 *sid)
-{
- if (selinux_enabled)
- return security_context_to_sid(str, strlen(str), sid);
- else {
- *sid = 0;
- return 0;
- }
-}
-EXPORT_SYMBOL_GPL(selinux_string_to_sid);
-
-int selinux_secmark_relabel_packet_permission(u32 sid)
-{
- if (selinux_enabled) {
- const struct task_security_struct *__tsec;
- u32 tsid;
-
- __tsec = current_security();
- tsid = __tsec->sid;
-
- return avc_has_perm(tsid, sid, SECCLASS_PACKET,
- PACKET__RELABELTO, NULL);
- }
- return 0;
-}
-EXPORT_SYMBOL_GPL(selinux_secmark_relabel_packet_permission);
-
-void selinux_secmark_refcount_inc(void)
-{
- atomic_inc(&selinux_secmark_refcount);
-}
-EXPORT_SYMBOL_GPL(selinux_secmark_refcount_inc);
-
-void selinux_secmark_refcount_dec(void)
-{
- atomic_dec(&selinux_secmark_refcount);
-}
-EXPORT_SYMBOL_GPL(selinux_secmark_refcount_dec);
bool selinux_is_enabled(void)
{
return 0;
}
-static int selinux_task_setscheduler(struct task_struct *p, int policy, struct sched_param *lp)
+static int selinux_task_setscheduler(struct task_struct *p)
{
int rc;
- rc = cap_task_setscheduler(p, policy, lp);
+ rc = cap_task_setscheduler(p);
if (rc)
return rc;
selinux_skb_peerlbl_sid(skb, family, &sksec->peer_sid);
}
+static int selinux_secmark_relabel_packet(u32 sid)
+{
+ const struct task_security_struct *__tsec;
+ u32 tsid;
+
+ __tsec = current_security();
+ tsid = __tsec->sid;
+
+ return avc_has_perm(tsid, sid, SECCLASS_PACKET, PACKET__RELABELTO, NULL);
+}
+
+static void selinux_secmark_refcount_inc(void)
+{
+ atomic_inc(&selinux_secmark_refcount);
+}
+
+static void selinux_secmark_refcount_dec(void)
+{
+ atomic_dec(&selinux_secmark_refcount);
+}
+
static void selinux_req_classify_flow(const struct request_sock *req,
struct flowi *fl)
{
.inet_conn_request = selinux_inet_conn_request,
.inet_csk_clone = selinux_inet_csk_clone,
.inet_conn_established = selinux_inet_conn_established,
+ .secmark_relabel_packet = selinux_secmark_relabel_packet,
+ .secmark_refcount_inc = selinux_secmark_refcount_inc,
+ .secmark_refcount_dec = selinux_secmark_refcount_dec,
.req_classify_flow = selinux_req_classify_flow,
.tun_dev_create = selinux_tun_dev_create,
.tun_dev_post_create = selinux_tun_dev_post_create,
{ "compute_av", "compute_create", "compute_member",
"check_context", "load_policy", "compute_relabel",
"compute_user", "setenforce", "setbool", "setsecparam",
- "setcheckreqprot", NULL } },
+ "setcheckreqprot", "read_policy", NULL } },
{ "process",
{ "fork", "transition", "sigchld", "sigkill",
"sigstop", "signull", "signal", "ptrace", "getsched", "setsched",
#define _SELINUX_SECURITY_H_
#include <linux/magic.h>
+#include <linux/types.h>
#include "flask.h"
#define SECSID_NULL 0x00000000 /* unspecified SID */
int security_mls_enabled(void);
int security_load_policy(void *data, size_t len);
+int security_read_policy(void **data, ssize_t *len);
+size_t security_policydb_len(void);
int security_policycap_supported(unsigned int req_cap);
const char *security_get_initial_sid_context(u32 sid);
+/*
+ * status notifier using mmap interface
+ */
+extern struct page *selinux_kernel_status_page(void);
+
+#define SELINUX_KERNEL_STATUS_VERSION 1
+struct selinux_kernel_status {
+ u32 version; /* version number of thie structure */
+ u32 sequence; /* sequence number of seqlock logic */
+ u32 enforcing; /* current setting of enforcing mode */
+ u32 policyload; /* times of policy reloaded */
+ u32 deny_unknown; /* current setting of deny_unknown */
+ /*
+ * The version > 0 supports above members.
+ */
+} __attribute__((packed));
+
+extern void selinux_status_update_setenforce(int enforcing);
+extern void selinux_status_update_policyload(int seqno);
+
#endif /* _SELINUX_SECURITY_H_ */
static struct dentry *class_dir;
static unsigned long last_class_ino;
+static char policy_opened;
+
/* global data for policy capabilities */
static struct dentry *policycap_dir;
SEL_COMPAT_NET, /* whether to use old compat network packet controls */
SEL_REJECT_UNKNOWN, /* export unknown reject handling to userspace */
SEL_DENY_UNKNOWN, /* export unknown deny handling to userspace */
+ SEL_STATUS, /* export current status using mmap() */
+ SEL_POLICY, /* allow userspace to read the in kernel policy */
SEL_INO_NEXT, /* The next inode number to use */
};
if (selinux_enforcing)
avc_ss_reset(0);
selnl_notify_setenforce(selinux_enforcing);
+ selinux_status_update_setenforce(selinux_enforcing);
}
length = count;
out:
.llseek = generic_file_llseek,
};
+static int sel_open_handle_status(struct inode *inode, struct file *filp)
+{
+ struct page *status = selinux_kernel_status_page();
+
+ if (!status)
+ return -ENOMEM;
+
+ filp->private_data = status;
+
+ return 0;
+}
+
+static ssize_t sel_read_handle_status(struct file *filp, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct page *status = filp->private_data;
+
+ BUG_ON(!status);
+
+ return simple_read_from_buffer(buf, count, ppos,
+ page_address(status),
+ sizeof(struct selinux_kernel_status));
+}
+
+static int sel_mmap_handle_status(struct file *filp,
+ struct vm_area_struct *vma)
+{
+ struct page *status = filp->private_data;
+ unsigned long size = vma->vm_end - vma->vm_start;
+
+ BUG_ON(!status);
+
+ /* only allows one page from the head */
+ if (vma->vm_pgoff > 0 || size != PAGE_SIZE)
+ return -EIO;
+ /* disallow writable mapping */
+ if (vma->vm_flags & VM_WRITE)
+ return -EPERM;
+ /* disallow mprotect() turns it into writable */
+ vma->vm_flags &= ~VM_MAYWRITE;
+
+ return remap_pfn_range(vma, vma->vm_start,
+ page_to_pfn(status),
+ size, vma->vm_page_prot);
+}
+
+static const struct file_operations sel_handle_status_ops = {
+ .open = sel_open_handle_status,
+ .read = sel_read_handle_status,
+ .mmap = sel_mmap_handle_status,
+ .llseek = generic_file_llseek,
+};
+
#ifdef CONFIG_SECURITY_SELINUX_DISABLE
static ssize_t sel_write_disable(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
.llseek = generic_file_llseek,
};
+struct policy_load_memory {
+ size_t len;
+ void *data;
+};
+
+static int sel_open_policy(struct inode *inode, struct file *filp)
+{
+ struct policy_load_memory *plm = NULL;
+ int rc;
+
+ BUG_ON(filp->private_data);
+
+ mutex_lock(&sel_mutex);
+
+ rc = task_has_security(current, SECURITY__READ_POLICY);
+ if (rc)
+ goto err;
+
+ rc = -EBUSY;
+ if (policy_opened)
+ goto err;
+
+ rc = -ENOMEM;
+ plm = kzalloc(sizeof(*plm), GFP_KERNEL);
+ if (!plm)
+ goto err;
+
+ if (i_size_read(inode) != security_policydb_len()) {
+ mutex_lock(&inode->i_mutex);
+ i_size_write(inode, security_policydb_len());
+ mutex_unlock(&inode->i_mutex);
+ }
+
+ rc = security_read_policy(&plm->data, &plm->len);
+ if (rc)
+ goto err;
+
+ policy_opened = 1;
+
+ filp->private_data = plm;
+
+ mutex_unlock(&sel_mutex);
+
+ return 0;
+err:
+ mutex_unlock(&sel_mutex);
+
+ if (plm)
+ vfree(plm->data);
+ kfree(plm);
+ return rc;
+}
+
+static int sel_release_policy(struct inode *inode, struct file *filp)
+{
+ struct policy_load_memory *plm = filp->private_data;
+
+ BUG_ON(!plm);
+
+ policy_opened = 0;
+
+ vfree(plm->data);
+ kfree(plm);
+
+ return 0;
+}
+
+static ssize_t sel_read_policy(struct file *filp, char __user *buf,
+ size_t count, loff_t *ppos)
+{
+ struct policy_load_memory *plm = filp->private_data;
+ int ret;
+
+ mutex_lock(&sel_mutex);
+
+ ret = task_has_security(current, SECURITY__READ_POLICY);
+ if (ret)
+ goto out;
+
+ ret = simple_read_from_buffer(buf, count, ppos, plm->data, plm->len);
+out:
+ mutex_unlock(&sel_mutex);
+ return ret;
+}
+
+static int sel_mmap_policy_fault(struct vm_area_struct *vma,
+ struct vm_fault *vmf)
+{
+ struct policy_load_memory *plm = vma->vm_file->private_data;
+ unsigned long offset;
+ struct page *page;
+
+ if (vmf->flags & (FAULT_FLAG_MKWRITE | FAULT_FLAG_WRITE))
+ return VM_FAULT_SIGBUS;
+
+ offset = vmf->pgoff << PAGE_SHIFT;
+ if (offset >= roundup(plm->len, PAGE_SIZE))
+ return VM_FAULT_SIGBUS;
+
+ page = vmalloc_to_page(plm->data + offset);
+ get_page(page);
+
+ vmf->page = page;
+
+ return 0;
+}
+
+static struct vm_operations_struct sel_mmap_policy_ops = {
+ .fault = sel_mmap_policy_fault,
+ .page_mkwrite = sel_mmap_policy_fault,
+};
+
+int sel_mmap_policy(struct file *filp, struct vm_area_struct *vma)
+{
+ if (vma->vm_flags & VM_SHARED) {
+ /* do not allow mprotect to make mapping writable */
+ vma->vm_flags &= ~VM_MAYWRITE;
+
+ if (vma->vm_flags & VM_WRITE)
+ return -EACCES;
+ }
+
+ vma->vm_flags |= VM_RESERVED;
+ vma->vm_ops = &sel_mmap_policy_ops;
+
+ return 0;
+}
+
+static const struct file_operations sel_policy_ops = {
+ .open = sel_open_policy,
+ .read = sel_read_policy,
+ .mmap = sel_mmap_policy,
+ .release = sel_release_policy,
+};
+
static ssize_t sel_write_load(struct file *file, const char __user *buf,
size_t count, loff_t *ppos)
[SEL_CHECKREQPROT] = {"checkreqprot", &sel_checkreqprot_ops, S_IRUGO|S_IWUSR},
[SEL_REJECT_UNKNOWN] = {"reject_unknown", &sel_handle_unknown_ops, S_IRUGO},
[SEL_DENY_UNKNOWN] = {"deny_unknown", &sel_handle_unknown_ops, S_IRUGO},
+ [SEL_STATUS] = {"status", &sel_handle_status_ops, S_IRUGO},
+ [SEL_POLICY] = {"policy", &sel_policy_ops, S_IRUSR},
/* last one */ {""}
};
ret = simple_fill_super(sb, SELINUX_MAGIC, selinux_files);
+++ /dev/null
-#
-# Makefile for building the SELinux security server as part of the kernel tree.
-#
-
-EXTRA_CFLAGS += -Isecurity/selinux -Isecurity/selinux/include
-obj-y := ss.o
-
-ss-y := ebitmap.o hashtab.o symtab.o sidtab.o avtab.o policydb.o services.o conditional.o mls.o
-
if (shift > 2)
shift = shift - 2;
nslot = 1 << shift;
- if (nslot > MAX_AVTAB_SIZE)
- nslot = MAX_AVTAB_SIZE;
+ if (nslot > MAX_AVTAB_HASH_BUCKETS)
+ nslot = MAX_AVTAB_HASH_BUCKETS;
mask = nslot - 1;
h->htable = kcalloc(nslot, sizeof(*(h->htable)), GFP_KERNEL);
goto out;
}
+int avtab_write_item(struct policydb *p, struct avtab_node *cur, void *fp)
+{
+ __le16 buf16[4];
+ __le32 buf32[1];
+ int rc;
+
+ buf16[0] = cpu_to_le16(cur->key.source_type);
+ buf16[1] = cpu_to_le16(cur->key.target_type);
+ buf16[2] = cpu_to_le16(cur->key.target_class);
+ buf16[3] = cpu_to_le16(cur->key.specified);
+ rc = put_entry(buf16, sizeof(u16), 4, fp);
+ if (rc)
+ return rc;
+ buf32[0] = cpu_to_le32(cur->datum.data);
+ rc = put_entry(buf32, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ return 0;
+}
+
+int avtab_write(struct policydb *p, struct avtab *a, void *fp)
+{
+ unsigned int i;
+ int rc = 0;
+ struct avtab_node *cur;
+ __le32 buf[1];
+
+ buf[0] = cpu_to_le32(a->nel);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+
+ for (i = 0; i < a->nslot; i++) {
+ for (cur = a->htable[i]; cur; cur = cur->next) {
+ rc = avtab_write_item(p, cur, fp);
+ if (rc)
+ return rc;
+ }
+ }
+
+ return rc;
+}
void avtab_cache_init(void)
{
avtab_node_cachep = kmem_cache_create("avtab_node",
void *p);
int avtab_read(struct avtab *a, void *fp, struct policydb *pol);
+int avtab_write_item(struct policydb *p, struct avtab_node *cur, void *fp);
+int avtab_write(struct policydb *p, struct avtab *a, void *fp);
struct avtab_node *avtab_insert_nonunique(struct avtab *h, struct avtab_key *key,
struct avtab_datum *datum);
#define MAX_AVTAB_HASH_BITS 11
#define MAX_AVTAB_HASH_BUCKETS (1 << MAX_AVTAB_HASH_BITS)
#define MAX_AVTAB_HASH_MASK (MAX_AVTAB_HASH_BUCKETS-1)
-#define MAX_AVTAB_SIZE MAX_AVTAB_HASH_BUCKETS
#endif /* _SS_AVTAB_H_ */
return rc;
}
+int cond_write_bool(void *vkey, void *datum, void *ptr)
+{
+ char *key = vkey;
+ struct cond_bool_datum *booldatum = datum;
+ struct policy_data *pd = ptr;
+ void *fp = pd->fp;
+ __le32 buf[3];
+ u32 len;
+ int rc;
+
+ len = strlen(key);
+ buf[0] = cpu_to_le32(booldatum->value);
+ buf[1] = cpu_to_le32(booldatum->state);
+ buf[2] = cpu_to_le32(len);
+ rc = put_entry(buf, sizeof(u32), 3, fp);
+ if (rc)
+ return rc;
+ rc = put_entry(key, 1, len, fp);
+ if (rc)
+ return rc;
+ return 0;
+}
+
+/*
+ * cond_write_cond_av_list doesn't write out the av_list nodes.
+ * Instead it writes out the key/value pairs from the avtab. This
+ * is necessary because there is no way to uniquely identifying rules
+ * in the avtab so it is not possible to associate individual rules
+ * in the avtab with a conditional without saving them as part of
+ * the conditional. This means that the avtab with the conditional
+ * rules will not be saved but will be rebuilt on policy load.
+ */
+static int cond_write_av_list(struct policydb *p,
+ struct cond_av_list *list, struct policy_file *fp)
+{
+ __le32 buf[1];
+ struct cond_av_list *cur_list;
+ u32 len;
+ int rc;
+
+ len = 0;
+ for (cur_list = list; cur_list != NULL; cur_list = cur_list->next)
+ len++;
+
+ buf[0] = cpu_to_le32(len);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+
+ if (len == 0)
+ return 0;
+
+ for (cur_list = list; cur_list != NULL; cur_list = cur_list->next) {
+ rc = avtab_write_item(p, cur_list->node, fp);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
+int cond_write_node(struct policydb *p, struct cond_node *node,
+ struct policy_file *fp)
+{
+ struct cond_expr *cur_expr;
+ __le32 buf[2];
+ int rc;
+ u32 len = 0;
+
+ buf[0] = cpu_to_le32(node->cur_state);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+
+ for (cur_expr = node->expr; cur_expr != NULL; cur_expr = cur_expr->next)
+ len++;
+
+ buf[0] = cpu_to_le32(len);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+
+ for (cur_expr = node->expr; cur_expr != NULL; cur_expr = cur_expr->next) {
+ buf[0] = cpu_to_le32(cur_expr->expr_type);
+ buf[1] = cpu_to_le32(cur_expr->bool);
+ rc = put_entry(buf, sizeof(u32), 2, fp);
+ if (rc)
+ return rc;
+ }
+
+ rc = cond_write_av_list(p, node->true_list, fp);
+ if (rc)
+ return rc;
+ rc = cond_write_av_list(p, node->false_list, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+int cond_write_list(struct policydb *p, struct cond_node *list, void *fp)
+{
+ struct cond_node *cur;
+ u32 len;
+ __le32 buf[1];
+ int rc;
+
+ len = 0;
+ for (cur = list; cur != NULL; cur = cur->next)
+ len++;
+ buf[0] = cpu_to_le32(len);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+
+ for (cur = list; cur != NULL; cur = cur->next) {
+ rc = cond_write_node(p, cur, fp);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
/* Determine whether additional permissions are granted by the conditional
* av table, and if so, add them to the result
*/
int cond_read_bool(struct policydb *p, struct hashtab *h, void *fp);
int cond_read_list(struct policydb *p, void *fp);
+int cond_write_bool(void *key, void *datum, void *ptr);
+int cond_write_list(struct policydb *p, struct cond_node *list, void *fp);
void cond_compute_av(struct avtab *ctab, struct avtab_key *key, struct av_decision *avd);
#include "ebitmap.h"
#include "policydb.h"
+#define BITS_PER_U64 (sizeof(u64) * 8)
+
int ebitmap_cmp(struct ebitmap *e1, struct ebitmap *e2)
{
struct ebitmap_node *n1, *n2;
e->highbit = le32_to_cpu(buf[1]);
count = le32_to_cpu(buf[2]);
- if (mapunit != sizeof(u64) * 8) {
+ if (mapunit != BITS_PER_U64) {
printk(KERN_ERR "SELinux: ebitmap: map size %u does not "
"match my size %Zd (high bit was %d)\n",
- mapunit, sizeof(u64) * 8, e->highbit);
+ mapunit, BITS_PER_U64, e->highbit);
goto bad;
}
ebitmap_destroy(e);
goto out;
}
+
+int ebitmap_write(struct ebitmap *e, void *fp)
+{
+ struct ebitmap_node *n;
+ u32 count;
+ __le32 buf[3];
+ u64 map;
+ int bit, last_bit, last_startbit, rc;
+
+ buf[0] = cpu_to_le32(BITS_PER_U64);
+
+ count = 0;
+ last_bit = 0;
+ last_startbit = -1;
+ ebitmap_for_each_positive_bit(e, n, bit) {
+ if (rounddown(bit, (int)BITS_PER_U64) > last_startbit) {
+ count++;
+ last_startbit = rounddown(bit, BITS_PER_U64);
+ }
+ last_bit = roundup(bit + 1, BITS_PER_U64);
+ }
+ buf[1] = cpu_to_le32(last_bit);
+ buf[2] = cpu_to_le32(count);
+
+ rc = put_entry(buf, sizeof(u32), 3, fp);
+ if (rc)
+ return rc;
+
+ map = 0;
+ last_startbit = INT_MIN;
+ ebitmap_for_each_positive_bit(e, n, bit) {
+ if (rounddown(bit, (int)BITS_PER_U64) > last_startbit) {
+ __le64 buf64[1];
+
+ /* this is the very first bit */
+ if (!map) {
+ last_startbit = rounddown(bit, BITS_PER_U64);
+ map = (u64)1 << (bit - last_startbit);
+ continue;
+ }
+
+ /* write the last node */
+ buf[0] = cpu_to_le32(last_startbit);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+
+ buf64[0] = cpu_to_le64(map);
+ rc = put_entry(buf64, sizeof(u64), 1, fp);
+ if (rc)
+ return rc;
+
+ /* set up for the next node */
+ map = 0;
+ last_startbit = rounddown(bit, BITS_PER_U64);
+ }
+ map |= (u64)1 << (bit - last_startbit);
+ }
+ /* write the last node */
+ if (map) {
+ __le64 buf64[1];
+
+ /* write the last node */
+ buf[0] = cpu_to_le32(last_startbit);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+
+ buf64[0] = cpu_to_le64(map);
+ rc = put_entry(buf64, sizeof(u64), 1, fp);
+ if (rc)
+ return rc;
+ }
+ return 0;
+}
int ebitmap_set_bit(struct ebitmap *e, unsigned long bit, int value);
void ebitmap_destroy(struct ebitmap *e);
int ebitmap_read(struct ebitmap *e, void *fp);
+int ebitmap_write(struct ebitmap *e, void *fp);
#ifdef CONFIG_NETLABEL
int ebitmap_netlbl_export(struct ebitmap *ebmap,
#include "policydb.h"
#include "conditional.h"
#include "mls.h"
+#include "services.h"
#define _DEBUG_HASHES
static int rangetr_cmp(struct hashtab *h, const void *k1, const void *k2)
{
const struct range_trans *key1 = k1, *key2 = k2;
- return (key1->source_type != key2->source_type ||
- key1->target_type != key2->target_type ||
- key1->target_class != key2->target_class);
+ int v;
+
+ v = key1->source_type - key2->source_type;
+ if (v)
+ return v;
+
+ v = key1->target_type - key2->target_type;
+ if (v)
+ return v;
+
+ v = key1->target_class - key2->target_class;
+
+ return v;
}
/*
static int type_bounds_sanity_check(void *key, void *datum, void *datap)
{
- struct type_datum *upper, *type;
+ struct type_datum *upper;
struct policydb *p = datap;
int depth = 0;
- upper = type = datum;
+ upper = datum;
while (upper->bounds) {
if (++depth == POLICYDB_BOUNDS_MAXDEPTH) {
printk(KERN_ERR "SELinux: type %s: "
policydb_destroy(p);
goto out;
}
+
+/*
+ * Write a MLS level structure to a policydb binary
+ * representation file.
+ */
+static int mls_write_level(struct mls_level *l, void *fp)
+{
+ __le32 buf[1];
+ int rc;
+
+ buf[0] = cpu_to_le32(l->sens);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+
+ rc = ebitmap_write(&l->cat, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+/*
+ * Write a MLS range structure to a policydb binary
+ * representation file.
+ */
+static int mls_write_range_helper(struct mls_range *r, void *fp)
+{
+ __le32 buf[3];
+ size_t items;
+ int rc, eq;
+
+ eq = mls_level_eq(&r->level[1], &r->level[0]);
+
+ if (eq)
+ items = 2;
+ else
+ items = 3;
+ buf[0] = cpu_to_le32(items-1);
+ buf[1] = cpu_to_le32(r->level[0].sens);
+ if (!eq)
+ buf[2] = cpu_to_le32(r->level[1].sens);
+
+ BUG_ON(items > (sizeof(buf)/sizeof(buf[0])));
+
+ rc = put_entry(buf, sizeof(u32), items, fp);
+ if (rc)
+ return rc;
+
+ rc = ebitmap_write(&r->level[0].cat, fp);
+ if (rc)
+ return rc;
+ if (!eq) {
+ rc = ebitmap_write(&r->level[1].cat, fp);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
+static int sens_write(void *vkey, void *datum, void *ptr)
+{
+ char *key = vkey;
+ struct level_datum *levdatum = datum;
+ struct policy_data *pd = ptr;
+ void *fp = pd->fp;
+ __le32 buf[2];
+ size_t len;
+ int rc;
+
+ len = strlen(key);
+ buf[0] = cpu_to_le32(len);
+ buf[1] = cpu_to_le32(levdatum->isalias);
+ rc = put_entry(buf, sizeof(u32), 2, fp);
+ if (rc)
+ return rc;
+
+ rc = put_entry(key, 1, len, fp);
+ if (rc)
+ return rc;
+
+ rc = mls_write_level(levdatum->level, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int cat_write(void *vkey, void *datum, void *ptr)
+{
+ char *key = vkey;
+ struct cat_datum *catdatum = datum;
+ struct policy_data *pd = ptr;
+ void *fp = pd->fp;
+ __le32 buf[3];
+ size_t len;
+ int rc;
+
+ len = strlen(key);
+ buf[0] = cpu_to_le32(len);
+ buf[1] = cpu_to_le32(catdatum->value);
+ buf[2] = cpu_to_le32(catdatum->isalias);
+ rc = put_entry(buf, sizeof(u32), 3, fp);
+ if (rc)
+ return rc;
+
+ rc = put_entry(key, 1, len, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int role_trans_write(struct role_trans *r, void *fp)
+{
+ struct role_trans *tr;
+ u32 buf[3];
+ size_t nel;
+ int rc;
+
+ nel = 0;
+ for (tr = r; tr; tr = tr->next)
+ nel++;
+ buf[0] = cpu_to_le32(nel);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ for (tr = r; tr; tr = tr->next) {
+ buf[0] = cpu_to_le32(tr->role);
+ buf[1] = cpu_to_le32(tr->type);
+ buf[2] = cpu_to_le32(tr->new_role);
+ rc = put_entry(buf, sizeof(u32), 3, fp);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
+static int role_allow_write(struct role_allow *r, void *fp)
+{
+ struct role_allow *ra;
+ u32 buf[2];
+ size_t nel;
+ int rc;
+
+ nel = 0;
+ for (ra = r; ra; ra = ra->next)
+ nel++;
+ buf[0] = cpu_to_le32(nel);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ for (ra = r; ra; ra = ra->next) {
+ buf[0] = cpu_to_le32(ra->role);
+ buf[1] = cpu_to_le32(ra->new_role);
+ rc = put_entry(buf, sizeof(u32), 2, fp);
+ if (rc)
+ return rc;
+ }
+ return 0;
+}
+
+/*
+ * Write a security context structure
+ * to a policydb binary representation file.
+ */
+static int context_write(struct policydb *p, struct context *c,
+ void *fp)
+{
+ int rc;
+ __le32 buf[3];
+
+ buf[0] = cpu_to_le32(c->user);
+ buf[1] = cpu_to_le32(c->role);
+ buf[2] = cpu_to_le32(c->type);
+
+ rc = put_entry(buf, sizeof(u32), 3, fp);
+ if (rc)
+ return rc;
+
+ rc = mls_write_range_helper(&c->range, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+/*
+ * The following *_write functions are used to
+ * write the symbol data to a policy database
+ * binary representation file.
+ */
+
+static int perm_write(void *vkey, void *datum, void *fp)
+{
+ char *key = vkey;
+ struct perm_datum *perdatum = datum;
+ __le32 buf[2];
+ size_t len;
+ int rc;
+
+ len = strlen(key);
+ buf[0] = cpu_to_le32(len);
+ buf[1] = cpu_to_le32(perdatum->value);
+ rc = put_entry(buf, sizeof(u32), 2, fp);
+ if (rc)
+ return rc;
+
+ rc = put_entry(key, 1, len, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int common_write(void *vkey, void *datum, void *ptr)
+{
+ char *key = vkey;
+ struct common_datum *comdatum = datum;
+ struct policy_data *pd = ptr;
+ void *fp = pd->fp;
+ __le32 buf[4];
+ size_t len;
+ int rc;
+
+ len = strlen(key);
+ buf[0] = cpu_to_le32(len);
+ buf[1] = cpu_to_le32(comdatum->value);
+ buf[2] = cpu_to_le32(comdatum->permissions.nprim);
+ buf[3] = cpu_to_le32(comdatum->permissions.table->nel);
+ rc = put_entry(buf, sizeof(u32), 4, fp);
+ if (rc)
+ return rc;
+
+ rc = put_entry(key, 1, len, fp);
+ if (rc)
+ return rc;
+
+ rc = hashtab_map(comdatum->permissions.table, perm_write, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int write_cons_helper(struct policydb *p, struct constraint_node *node,
+ void *fp)
+{
+ struct constraint_node *c;
+ struct constraint_expr *e;
+ __le32 buf[3];
+ u32 nel;
+ int rc;
+
+ for (c = node; c; c = c->next) {
+ nel = 0;
+ for (e = c->expr; e; e = e->next)
+ nel++;
+ buf[0] = cpu_to_le32(c->permissions);
+ buf[1] = cpu_to_le32(nel);
+ rc = put_entry(buf, sizeof(u32), 2, fp);
+ if (rc)
+ return rc;
+ for (e = c->expr; e; e = e->next) {
+ buf[0] = cpu_to_le32(e->expr_type);
+ buf[1] = cpu_to_le32(e->attr);
+ buf[2] = cpu_to_le32(e->op);
+ rc = put_entry(buf, sizeof(u32), 3, fp);
+ if (rc)
+ return rc;
+
+ switch (e->expr_type) {
+ case CEXPR_NAMES:
+ rc = ebitmap_write(&e->names, fp);
+ if (rc)
+ return rc;
+ break;
+ default:
+ break;
+ }
+ }
+ }
+
+ return 0;
+}
+
+static int class_write(void *vkey, void *datum, void *ptr)
+{
+ char *key = vkey;
+ struct class_datum *cladatum = datum;
+ struct policy_data *pd = ptr;
+ void *fp = pd->fp;
+ struct policydb *p = pd->p;
+ struct constraint_node *c;
+ __le32 buf[6];
+ u32 ncons;
+ size_t len, len2;
+ int rc;
+
+ len = strlen(key);
+ if (cladatum->comkey)
+ len2 = strlen(cladatum->comkey);
+ else
+ len2 = 0;
+
+ ncons = 0;
+ for (c = cladatum->constraints; c; c = c->next)
+ ncons++;
+
+ buf[0] = cpu_to_le32(len);
+ buf[1] = cpu_to_le32(len2);
+ buf[2] = cpu_to_le32(cladatum->value);
+ buf[3] = cpu_to_le32(cladatum->permissions.nprim);
+ if (cladatum->permissions.table)
+ buf[4] = cpu_to_le32(cladatum->permissions.table->nel);
+ else
+ buf[4] = 0;
+ buf[5] = cpu_to_le32(ncons);
+ rc = put_entry(buf, sizeof(u32), 6, fp);
+ if (rc)
+ return rc;
+
+ rc = put_entry(key, 1, len, fp);
+ if (rc)
+ return rc;
+
+ if (cladatum->comkey) {
+ rc = put_entry(cladatum->comkey, 1, len2, fp);
+ if (rc)
+ return rc;
+ }
+
+ rc = hashtab_map(cladatum->permissions.table, perm_write, fp);
+ if (rc)
+ return rc;
+
+ rc = write_cons_helper(p, cladatum->constraints, fp);
+ if (rc)
+ return rc;
+
+ /* write out the validatetrans rule */
+ ncons = 0;
+ for (c = cladatum->validatetrans; c; c = c->next)
+ ncons++;
+
+ buf[0] = cpu_to_le32(ncons);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+
+ rc = write_cons_helper(p, cladatum->validatetrans, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int role_write(void *vkey, void *datum, void *ptr)
+{
+ char *key = vkey;
+ struct role_datum *role = datum;
+ struct policy_data *pd = ptr;
+ void *fp = pd->fp;
+ struct policydb *p = pd->p;
+ __le32 buf[3];
+ size_t items, len;
+ int rc;
+
+ len = strlen(key);
+ items = 0;
+ buf[items++] = cpu_to_le32(len);
+ buf[items++] = cpu_to_le32(role->value);
+ if (p->policyvers >= POLICYDB_VERSION_BOUNDARY)
+ buf[items++] = cpu_to_le32(role->bounds);
+
+ BUG_ON(items > (sizeof(buf)/sizeof(buf[0])));
+
+ rc = put_entry(buf, sizeof(u32), items, fp);
+ if (rc)
+ return rc;
+
+ rc = put_entry(key, 1, len, fp);
+ if (rc)
+ return rc;
+
+ rc = ebitmap_write(&role->dominates, fp);
+ if (rc)
+ return rc;
+
+ rc = ebitmap_write(&role->types, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int type_write(void *vkey, void *datum, void *ptr)
+{
+ char *key = vkey;
+ struct type_datum *typdatum = datum;
+ struct policy_data *pd = ptr;
+ struct policydb *p = pd->p;
+ void *fp = pd->fp;
+ __le32 buf[4];
+ int rc;
+ size_t items, len;
+
+ len = strlen(key);
+ items = 0;
+ buf[items++] = cpu_to_le32(len);
+ buf[items++] = cpu_to_le32(typdatum->value);
+ if (p->policyvers >= POLICYDB_VERSION_BOUNDARY) {
+ u32 properties = 0;
+
+ if (typdatum->primary)
+ properties |= TYPEDATUM_PROPERTY_PRIMARY;
+
+ if (typdatum->attribute)
+ properties |= TYPEDATUM_PROPERTY_ATTRIBUTE;
+
+ buf[items++] = cpu_to_le32(properties);
+ buf[items++] = cpu_to_le32(typdatum->bounds);
+ } else {
+ buf[items++] = cpu_to_le32(typdatum->primary);
+ }
+ BUG_ON(items > (sizeof(buf) / sizeof(buf[0])));
+ rc = put_entry(buf, sizeof(u32), items, fp);
+ if (rc)
+ return rc;
+
+ rc = put_entry(key, 1, len, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int user_write(void *vkey, void *datum, void *ptr)
+{
+ char *key = vkey;
+ struct user_datum *usrdatum = datum;
+ struct policy_data *pd = ptr;
+ struct policydb *p = pd->p;
+ void *fp = pd->fp;
+ __le32 buf[3];
+ size_t items, len;
+ int rc;
+
+ len = strlen(key);
+ items = 0;
+ buf[items++] = cpu_to_le32(len);
+ buf[items++] = cpu_to_le32(usrdatum->value);
+ if (p->policyvers >= POLICYDB_VERSION_BOUNDARY)
+ buf[items++] = cpu_to_le32(usrdatum->bounds);
+ BUG_ON(items > (sizeof(buf) / sizeof(buf[0])));
+ rc = put_entry(buf, sizeof(u32), items, fp);
+ if (rc)
+ return rc;
+
+ rc = put_entry(key, 1, len, fp);
+ if (rc)
+ return rc;
+
+ rc = ebitmap_write(&usrdatum->roles, fp);
+ if (rc)
+ return rc;
+
+ rc = mls_write_range_helper(&usrdatum->range, fp);
+ if (rc)
+ return rc;
+
+ rc = mls_write_level(&usrdatum->dfltlevel, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int (*write_f[SYM_NUM]) (void *key, void *datum,
+ void *datap) =
+{
+ common_write,
+ class_write,
+ role_write,
+ type_write,
+ user_write,
+ cond_write_bool,
+ sens_write,
+ cat_write,
+};
+
+static int ocontext_write(struct policydb *p, struct policydb_compat_info *info,
+ void *fp)
+{
+ unsigned int i, j, rc;
+ size_t nel, len;
+ __le32 buf[3];
+ u32 nodebuf[8];
+ struct ocontext *c;
+ for (i = 0; i < info->ocon_num; i++) {
+ nel = 0;
+ for (c = p->ocontexts[i]; c; c = c->next)
+ nel++;
+ buf[0] = cpu_to_le32(nel);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ for (c = p->ocontexts[i]; c; c = c->next) {
+ switch (i) {
+ case OCON_ISID:
+ buf[0] = cpu_to_le32(c->sid[0]);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ rc = context_write(p, &c->context[0], fp);
+ if (rc)
+ return rc;
+ break;
+ case OCON_FS:
+ case OCON_NETIF:
+ len = strlen(c->u.name);
+ buf[0] = cpu_to_le32(len);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ rc = put_entry(c->u.name, 1, len, fp);
+ if (rc)
+ return rc;
+ rc = context_write(p, &c->context[0], fp);
+ if (rc)
+ return rc;
+ rc = context_write(p, &c->context[1], fp);
+ if (rc)
+ return rc;
+ break;
+ case OCON_PORT:
+ buf[0] = cpu_to_le32(c->u.port.protocol);
+ buf[1] = cpu_to_le32(c->u.port.low_port);
+ buf[2] = cpu_to_le32(c->u.port.high_port);
+ rc = put_entry(buf, sizeof(u32), 3, fp);
+ if (rc)
+ return rc;
+ rc = context_write(p, &c->context[0], fp);
+ if (rc)
+ return rc;
+ break;
+ case OCON_NODE:
+ nodebuf[0] = c->u.node.addr; /* network order */
+ nodebuf[1] = c->u.node.mask; /* network order */
+ rc = put_entry(nodebuf, sizeof(u32), 2, fp);
+ if (rc)
+ return rc;
+ rc = context_write(p, &c->context[0], fp);
+ if (rc)
+ return rc;
+ break;
+ case OCON_FSUSE:
+ buf[0] = cpu_to_le32(c->v.behavior);
+ len = strlen(c->u.name);
+ buf[1] = cpu_to_le32(len);
+ rc = put_entry(buf, sizeof(u32), 2, fp);
+ if (rc)
+ return rc;
+ rc = put_entry(c->u.name, 1, len, fp);
+ if (rc)
+ return rc;
+ rc = context_write(p, &c->context[0], fp);
+ if (rc)
+ return rc;
+ break;
+ case OCON_NODE6:
+ for (j = 0; j < 4; j++)
+ nodebuf[j] = c->u.node6.addr[j]; /* network order */
+ for (j = 0; j < 4; j++)
+ nodebuf[j + 4] = c->u.node6.mask[j]; /* network order */
+ rc = put_entry(nodebuf, sizeof(u32), 8, fp);
+ if (rc)
+ return rc;
+ rc = context_write(p, &c->context[0], fp);
+ if (rc)
+ return rc;
+ break;
+ }
+ }
+ }
+ return 0;
+}
+
+static int genfs_write(struct policydb *p, void *fp)
+{
+ struct genfs *genfs;
+ struct ocontext *c;
+ size_t len;
+ __le32 buf[1];
+ int rc;
+
+ len = 0;
+ for (genfs = p->genfs; genfs; genfs = genfs->next)
+ len++;
+ buf[0] = cpu_to_le32(len);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ for (genfs = p->genfs; genfs; genfs = genfs->next) {
+ len = strlen(genfs->fstype);
+ buf[0] = cpu_to_le32(len);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ rc = put_entry(genfs->fstype, 1, len, fp);
+ if (rc)
+ return rc;
+ len = 0;
+ for (c = genfs->head; c; c = c->next)
+ len++;
+ buf[0] = cpu_to_le32(len);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ for (c = genfs->head; c; c = c->next) {
+ len = strlen(c->u.name);
+ buf[0] = cpu_to_le32(len);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ rc = put_entry(c->u.name, 1, len, fp);
+ if (rc)
+ return rc;
+ buf[0] = cpu_to_le32(c->v.sclass);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ rc = context_write(p, &c->context[0], fp);
+ if (rc)
+ return rc;
+ }
+ }
+ return 0;
+}
+
+static int range_count(void *key, void *data, void *ptr)
+{
+ int *cnt = ptr;
+ *cnt = *cnt + 1;
+
+ return 0;
+}
+
+static int range_write_helper(void *key, void *data, void *ptr)
+{
+ __le32 buf[2];
+ struct range_trans *rt = key;
+ struct mls_range *r = data;
+ struct policy_data *pd = ptr;
+ void *fp = pd->fp;
+ struct policydb *p = pd->p;
+ int rc;
+
+ buf[0] = cpu_to_le32(rt->source_type);
+ buf[1] = cpu_to_le32(rt->target_type);
+ rc = put_entry(buf, sizeof(u32), 2, fp);
+ if (rc)
+ return rc;
+ if (p->policyvers >= POLICYDB_VERSION_RANGETRANS) {
+ buf[0] = cpu_to_le32(rt->target_class);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+ }
+ rc = mls_write_range_helper(r, fp);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+static int range_write(struct policydb *p, void *fp)
+{
+ size_t nel;
+ __le32 buf[1];
+ int rc;
+ struct policy_data pd;
+
+ pd.p = p;
+ pd.fp = fp;
+
+ /* count the number of entries in the hashtab */
+ nel = 0;
+ rc = hashtab_map(p->range_tr, range_count, &nel);
+ if (rc)
+ return rc;
+
+ buf[0] = cpu_to_le32(nel);
+ rc = put_entry(buf, sizeof(u32), 1, fp);
+ if (rc)
+ return rc;
+
+ /* actually write all of the entries */
+ rc = hashtab_map(p->range_tr, range_write_helper, &pd);
+ if (rc)
+ return rc;
+
+ return 0;
+}
+
+/*
+ * Write the configuration data in a policy database
+ * structure to a policy database binary representation
+ * file.
+ */
+int policydb_write(struct policydb *p, void *fp)
+{
+ unsigned int i, num_syms;
+ int rc;
+ __le32 buf[4];
+ u32 config;
+ size_t len;
+ struct policydb_compat_info *info;
+
+ /*
+ * refuse to write policy older than compressed avtab
+ * to simplify the writer. There are other tests dropped
+ * since we assume this throughout the writer code. Be
+ * careful if you ever try to remove this restriction
+ */
+ if (p->policyvers < POLICYDB_VERSION_AVTAB) {
+ printk(KERN_ERR "SELinux: refusing to write policy version %d."
+ " Because it is less than version %d\n", p->policyvers,
+ POLICYDB_VERSION_AVTAB);
+ return -EINVAL;
+ }
+
+ config = 0;
+ if (p->mls_enabled)
+ config |= POLICYDB_CONFIG_MLS;
+
+ if (p->reject_unknown)
+ config |= REJECT_UNKNOWN;
+ if (p->allow_unknown)
+ config |= ALLOW_UNKNOWN;
+
+ /* Write the magic number and string identifiers. */
+ buf[0] = cpu_to_le32(POLICYDB_MAGIC);
+ len = strlen(POLICYDB_STRING);
+ buf[1] = cpu_to_le32(len);
+ rc = put_entry(buf, sizeof(u32), 2, fp);
+ if (rc)
+ return rc;
+ rc = put_entry(POLICYDB_STRING, 1, len, fp);
+ if (rc)
+ return rc;
+
+ /* Write the version, config, and table sizes. */
+ info = policydb_lookup_compat(p->policyvers);
+ if (!info) {
+ printk(KERN_ERR "SELinux: compatibility lookup failed for policy "
+ "version %d", p->policyvers);
+ return rc;
+ }
+
+ buf[0] = cpu_to_le32(p->policyvers);
+ buf[1] = cpu_to_le32(config);
+ buf[2] = cpu_to_le32(info->sym_num);
+ buf[3] = cpu_to_le32(info->ocon_num);
+
+ rc = put_entry(buf, sizeof(u32), 4, fp);
+ if (rc)
+ return rc;
+
+ if (p->policyvers >= POLICYDB_VERSION_POLCAP) {
+ rc = ebitmap_write(&p->policycaps, fp);
+ if (rc)
+ return rc;
+ }
+
+ if (p->policyvers >= POLICYDB_VERSION_PERMISSIVE) {
+ rc = ebitmap_write(&p->permissive_map, fp);
+ if (rc)
+ return rc;
+ }
+
+ num_syms = info->sym_num;
+ for (i = 0; i < num_syms; i++) {
+ struct policy_data pd;
+
+ pd.fp = fp;
+ pd.p = p;
+
+ buf[0] = cpu_to_le32(p->symtab[i].nprim);
+ buf[1] = cpu_to_le32(p->symtab[i].table->nel);
+
+ rc = put_entry(buf, sizeof(u32), 2, fp);
+ if (rc)
+ return rc;
+ rc = hashtab_map(p->symtab[i].table, write_f[i], &pd);
+ if (rc)
+ return rc;
+ }
+
+ rc = avtab_write(p, &p->te_avtab, fp);
+ if (rc)
+ return rc;
+
+ rc = cond_write_list(p, p->cond_list, fp);
+ if (rc)
+ return rc;
+
+ rc = role_trans_write(p->role_tr, fp);
+ if (rc)
+ return rc;
+
+ rc = role_allow_write(p->role_allow, fp);
+ if (rc)
+ return rc;
+
+ rc = ocontext_write(p, info, fp);
+ if (rc)
+ return rc;
+
+ rc = genfs_write(p, fp);
+ if (rc)
+ return rc;
+
+ rc = range_write(p, fp);
+ if (rc)
+ return rc;
+
+ for (i = 0; i < p->p_types.nprim; i++) {
+ struct ebitmap *e = flex_array_get(p->type_attr_map_array, i);
+
+ BUG_ON(!e);
+ rc = ebitmap_write(e, fp);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
struct ebitmap permissive_map;
+ /* length of this policy when it was loaded */
+ size_t len;
+
unsigned int policyvers;
unsigned int reject_unknown : 1;
extern int policydb_type_isvalid(struct policydb *p, unsigned int type);
extern int policydb_role_isvalid(struct policydb *p, unsigned int role);
extern int policydb_read(struct policydb *p, void *fp);
+extern int policydb_write(struct policydb *p, void *fp);
#define PERM_SYMTAB_SIZE 32
size_t len;
};
+struct policy_data {
+ struct policydb *p;
+ void *fp;
+};
+
static inline int next_entry(void *buf, struct policy_file *fp, size_t bytes)
{
if (bytes > fp->len)
return 0;
}
+static inline int put_entry(void *buf, size_t bytes, int num, struct policy_file *fp)
+{
+ size_t len = bytes * num;
+
+ memcpy(fp->data, buf, len);
+ fp->data += len;
+ fp->len -= len;
+
+ return 0;
+}
+
extern u16 string_to_security_class(struct policydb *p, const char *name);
extern u32 string_to_av_perm(struct policydb *p, u16 tclass, const char *name);
#include <linux/mutex.h>
#include <linux/selinux.h>
#include <linux/flex_array.h>
+#include <linux/vmalloc.h>
#include <net/netlabel.h>
#include "flask.h"
{
char *scontextp;
- *scontext = NULL;
+ if (scontext)
+ *scontext = NULL;
*scontext_len = 0;
if (context->len) {
*scontext_len += strlen(policydb.p_type_val_to_name[context->type - 1]) + 1;
*scontext_len += mls_compute_context_len(context);
+ if (!scontext)
+ return 0;
+
/* Allocate space for the context; caller must free this space. */
scontextp = kmalloc(*scontext_len, GFP_ATOMIC);
if (!scontextp)
struct context *context;
int rc = 0;
- *scontext = NULL;
+ if (scontext)
+ *scontext = NULL;
*scontext_len = 0;
if (!ss_initialized) {
char *scontextp;
*scontext_len = strlen(initial_sid_to_string[sid]) + 1;
+ if (!scontext)
+ goto out;
scontextp = kmalloc(*scontext_len, GFP_ATOMIC);
if (!scontextp) {
rc = -ENOMEM;
return rc;
}
+ policydb.len = len;
rc = selinux_set_mapping(&policydb, secclass_map,
¤t_mapping,
¤t_mapping_size);
selinux_complete_init();
avc_ss_reset(seqno);
selnl_notify_policyload(seqno);
+ selinux_status_update_policyload(seqno);
selinux_netlbl_cache_invalidate();
selinux_xfrm_notify_policyload();
return 0;
if (rc)
return rc;
+ newpolicydb.len = len;
/* If switching between different policy types, log MLS status */
if (policydb.mls_enabled && !newpolicydb.mls_enabled)
printk(KERN_INFO "SELinux: Disabling MLS support...\n");
avc_ss_reset(seqno);
selnl_notify_policyload(seqno);
+ selinux_status_update_policyload(seqno);
selinux_netlbl_cache_invalidate();
selinux_xfrm_notify_policyload();
}
+size_t security_policydb_len(void)
+{
+ size_t len;
+
+ read_lock(&policy_rwlock);
+ len = policydb.len;
+ read_unlock(&policy_rwlock);
+
+ return len;
+}
+
/**
* security_port_sid - Obtain the SID for a port.
* @protocol: protocol number
if (!rc) {
avc_ss_reset(seqno);
selnl_notify_policyload(seqno);
+ selinux_status_update_policyload(seqno);
selinux_xfrm_notify_policyload();
}
return rc;
return rc;
}
#endif /* CONFIG_NETLABEL */
+
+/**
+ * security_read_policy - read the policy.
+ * @data: binary policy data
+ * @len: length of data in bytes
+ *
+ */
+int security_read_policy(void **data, ssize_t *len)
+{
+ int rc;
+ struct policy_file fp;
+
+ if (!ss_initialized)
+ return -EINVAL;
+
+ *len = security_policydb_len();
+
+ *data = vmalloc_user(*len);
+ if (!*data)
+ return -ENOMEM;
+
+ fp.data = *data;
+ fp.len = *len;
+
+ read_lock(&policy_rwlock);
+ rc = policydb_write(&policydb, &fp);
+ read_unlock(&policy_rwlock);
+
+ if (rc)
+ return rc;
+
+ *len = (unsigned long)fp.data - (unsigned long)*data;
+ return 0;
+
+}
--- /dev/null
+/*
+ * mmap based event notifications for SELinux
+ *
+ * Author: KaiGai Kohei <kaigai@ak.jp.nec.com>
+ *
+ * Copyright (C) 2010 NEC corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2,
+ * as published by the Free Software Foundation.
+ */
+#include <linux/kernel.h>
+#include <linux/gfp.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include "avc.h"
+#include "services.h"
+
+/*
+ * The selinux_status_page shall be exposed to userspace applications
+ * using mmap interface on /selinux/status.
+ * It enables to notify applications a few events that will cause reset
+ * of userspace access vector without context switching.
+ *
+ * The selinux_kernel_status structure on the head of status page is
+ * protected from concurrent accesses using seqlock logic, so userspace
+ * application should reference the status page according to the seqlock
+ * logic.
+ *
+ * Typically, application checks status->sequence at the head of access
+ * control routine. If it is odd-number, kernel is updating the status,
+ * so please wait for a moment. If it is changed from the last sequence
+ * number, it means something happen, so application will reset userspace
+ * avc, if needed.
+ * In most cases, application shall confirm the kernel status is not
+ * changed without any system call invocations.
+ */
+static struct page *selinux_status_page;
+static DEFINE_MUTEX(selinux_status_lock);
+
+/*
+ * selinux_kernel_status_page
+ *
+ * It returns a reference to selinux_status_page. If the status page is
+ * not allocated yet, it also tries to allocate it at the first time.
+ */
+struct page *selinux_kernel_status_page(void)
+{
+ struct selinux_kernel_status *status;
+ struct page *result = NULL;
+
+ mutex_lock(&selinux_status_lock);
+ if (!selinux_status_page) {
+ selinux_status_page = alloc_page(GFP_KERNEL|__GFP_ZERO);
+
+ if (selinux_status_page) {
+ status = page_address(selinux_status_page);
+
+ status->version = SELINUX_KERNEL_STATUS_VERSION;
+ status->sequence = 0;
+ status->enforcing = selinux_enforcing;
+ /*
+ * NOTE: the next policyload event shall set
+ * a positive value on the status->policyload,
+ * although it may not be 1, but never zero.
+ * So, application can know it was updated.
+ */
+ status->policyload = 0;
+ status->deny_unknown = !security_get_allow_unknown();
+ }
+ }
+ result = selinux_status_page;
+ mutex_unlock(&selinux_status_lock);
+
+ return result;
+}
+
+/*
+ * selinux_status_update_setenforce
+ *
+ * It updates status of the current enforcing/permissive mode.
+ */
+void selinux_status_update_setenforce(int enforcing)
+{
+ struct selinux_kernel_status *status;
+
+ mutex_lock(&selinux_status_lock);
+ if (selinux_status_page) {
+ status = page_address(selinux_status_page);
+
+ status->sequence++;
+ smp_wmb();
+
+ status->enforcing = enforcing;
+
+ smp_wmb();
+ status->sequence++;
+ }
+ mutex_unlock(&selinux_status_lock);
+}
+
+/*
+ * selinux_status_update_policyload
+ *
+ * It updates status of the times of policy reloaded, and current
+ * setting of deny_unknown.
+ */
+void selinux_status_update_policyload(int seqno)
+{
+ struct selinux_kernel_status *status;
+
+ mutex_lock(&selinux_status_lock);
+ if (selinux_status_page) {
+ status = page_address(selinux_status_page);
+
+ status->sequence++;
+ smp_wmb();
+
+ status->policyload = seqno;
+ status->deny_unknown = !security_get_allow_unknown();
+
+ smp_wmb();
+ status->sequence++;
+ }
+ mutex_unlock(&selinux_status_lock);
+}
*
* Return 0 if read access is permitted
*/
-static int smack_task_setscheduler(struct task_struct *p, int policy,
- struct sched_param *lp)
+static int smack_task_setscheduler(struct task_struct *p)
{
int rc;
- rc = cap_task_setscheduler(p, policy, lp);
+ rc = cap_task_setscheduler(p);
if (rc == 0)
rc = smk_curacc_on_task(p, MAY_WRITE);
return rc;
{
char *sp = smack_from_secid(secid);
- *secdata = sp;
+ if (secdata)
+ *secdata = sp;
*seclen = strlen(sp);
return 0;
}
return true; /* Do nothing if open(O_WRONLY). */
memset(&head->r, 0, sizeof(head->r));
head->r.print_this_domain_only = true;
- head->r.eof = !domain;
- head->r.domain = &domain->list;
+ if (domain)
+ head->r.domain = &domain->list;
+ else
+ head->r.eof = 1;
tomoyo_io_printf(head, "# select %s\n", data);
if (domain && domain->is_deleted)
tomoyo_io_printf(head, "# This is a deleted domain.\n");
const pid_t gpid = task_pid_nr(current);
static const int tomoyo_buffer_len = 4096;
char *buffer = kmalloc(tomoyo_buffer_len, GFP_NOFS);
+ pid_t ppid;
if (!buffer)
return NULL;
do_gettimeofday(&tv);
+ rcu_read_lock();
+ ppid = task_tgid_vnr(current->real_parent);
+ rcu_read_unlock();
snprintf(buffer, tomoyo_buffer_len - 1,
"#timestamp=%lu profile=%u mode=%s (global-pid=%u)"
" task={ pid=%u ppid=%u uid=%u gid=%u euid=%u"
" egid=%u suid=%u sgid=%u fsuid=%u fsgid=%u }",
tv.tv_sec, r->profile, tomoyo_mode[r->mode], gpid,
- (pid_t) sys_getpid(), (pid_t) sys_getppid(),
+ task_tgid_vnr(current), ppid,
current_uid(), current_gid(), current_euid(),
current_egid(), current_suid(), current_sgid(),
current_fsuid(), current_fsgid());
const u8 profile = domain->profile;
if (tomoyo_profile_ptr[profile])
continue;
+ printk(KERN_ERR "You need to define profile %u before using it.\n",
+ profile);
+ printk(KERN_ERR "Please see http://tomoyo.sourceforge.jp/2.3/ "
+ "for more information.\n");
panic("Profile %u (used by '%s') not defined.\n",
profile, domain->domainname->name);
}
tomoyo_read_unlock(idx);
- if (tomoyo_profile_version != 20090903)
+ if (tomoyo_profile_version != 20090903) {
+ printk(KERN_ERR "You need to install userland programs for "
+ "TOMOYO 2.3 and initialize policy configuration.\n");
+ printk(KERN_ERR "Please see http://tomoyo.sourceforge.jp/2.3/ "
+ "for more information.\n");
panic("Profile version %u is not supported.\n",
tomoyo_profile_version);
+ }
printk(KERN_INFO "TOMOYO: 2.3.0\n");
printk(KERN_INFO "Mandatory Access Control activated.\n");
}
/********** Function prototypes. **********/
-extern asmlinkage long sys_getpid(void);
-extern asmlinkage long sys_getppid(void);
-
/* Check whether the given string starts with the given keyword. */
bool tomoyo_str_starts(char **src, const char *find);
/* Get tomoyo_realpath() of current process. */
/* max number of user-defined controls */
#define MAX_USER_CONTROLS 32
+#define MAX_CONTROL_COUNT 1028
struct snd_kctl_ioctl {
struct list_head list; /* list of all ioctls */
if (snd_BUG_ON(!control || !control->count))
return NULL;
+
+ if (control->count > MAX_CONTROL_COUNT)
+ return NULL;
+
kctl = kzalloc(sizeof(*kctl) + sizeof(struct snd_kcontrol_volatile) * control->count, GFP_KERNEL);
if (kctl == NULL) {
snd_printk(KERN_ERR "Cannot allocate control instance\n");
struct snd_info_buffer *buffer)
{
struct snd_pcm_substream *substream = entry->private_data;
- struct snd_pcm_runtime *runtime = substream->runtime;
+ struct snd_pcm_runtime *runtime;
+
+ mutex_lock(&substream->pcm->open_mutex);
+ runtime = substream->runtime;
if (!runtime) {
snd_iprintf(buffer, "closed\n");
- return;
+ goto unlock;
}
if (runtime->status->state == SNDRV_PCM_STATE_OPEN) {
snd_iprintf(buffer, "no setup\n");
- return;
+ goto unlock;
}
snd_iprintf(buffer, "access: %s\n", snd_pcm_access_name(runtime->access));
snd_iprintf(buffer, "format: %s\n", snd_pcm_format_name(runtime->format));
snd_iprintf(buffer, "OSS period frames: %lu\n", (unsigned long)runtime->oss.period_frames);
}
#endif
+ unlock:
+ mutex_unlock(&substream->pcm->open_mutex);
}
static void snd_pcm_substream_proc_sw_params_read(struct snd_info_entry *entry,
struct snd_info_buffer *buffer)
{
struct snd_pcm_substream *substream = entry->private_data;
- struct snd_pcm_runtime *runtime = substream->runtime;
+ struct snd_pcm_runtime *runtime;
+
+ mutex_lock(&substream->pcm->open_mutex);
+ runtime = substream->runtime;
if (!runtime) {
snd_iprintf(buffer, "closed\n");
- return;
+ goto unlock;
}
if (runtime->status->state == SNDRV_PCM_STATE_OPEN) {
snd_iprintf(buffer, "no setup\n");
- return;
+ goto unlock;
}
snd_iprintf(buffer, "tstamp_mode: %s\n", snd_pcm_tstamp_mode_name(runtime->tstamp_mode));
snd_iprintf(buffer, "period_step: %u\n", runtime->period_step);
snd_iprintf(buffer, "silence_threshold: %lu\n", runtime->silence_threshold);
snd_iprintf(buffer, "silence_size: %lu\n", runtime->silence_size);
snd_iprintf(buffer, "boundary: %lu\n", runtime->boundary);
+ unlock:
+ mutex_unlock(&substream->pcm->open_mutex);
}
static void snd_pcm_substream_proc_status_read(struct snd_info_entry *entry,
struct snd_info_buffer *buffer)
{
struct snd_pcm_substream *substream = entry->private_data;
- struct snd_pcm_runtime *runtime = substream->runtime;
+ struct snd_pcm_runtime *runtime;
struct snd_pcm_status status;
int err;
+
+ mutex_lock(&substream->pcm->open_mutex);
+ runtime = substream->runtime;
if (!runtime) {
snd_iprintf(buffer, "closed\n");
- return;
+ goto unlock;
}
memset(&status, 0, sizeof(status));
err = snd_pcm_status(substream, &status);
if (err < 0) {
snd_iprintf(buffer, "error %d\n", err);
- return;
+ goto unlock;
}
snd_iprintf(buffer, "state: %s\n", snd_pcm_state_name(status.state));
snd_iprintf(buffer, "owner_pid : %d\n", pid_vnr(substream->pid));
snd_iprintf(buffer, "-----\n");
snd_iprintf(buffer, "hw_ptr : %ld\n", runtime->status->hw_ptr);
snd_iprintf(buffer, "appl_ptr : %ld\n", runtime->control->appl_ptr);
+ unlock:
+ mutex_unlock(&substream->pcm->open_mutex);
}
#ifdef CONFIG_SND_PCM_XRUN_DEBUG
substream->ops->close(substream);
substream->hw_opened = 0;
}
+ if (pm_qos_request_active(&substream->latency_pm_qos_req))
+ pm_qos_remove_request(&substream->latency_pm_qos_req);
if (substream->pcm_release) {
substream->pcm_release(substream);
substream->pcm_release = NULL;
{
struct snd_rawmidi_file *rfile;
struct snd_rawmidi *rmidi;
+ struct module *module;
rfile = file->private_data;
rmidi = rfile->rmidi;
rawmidi_release_priv(rfile);
kfree(rfile);
+ module = rmidi->card->module;
snd_card_file_remove(rmidi->card, file);
- module_put(rmidi->card->module);
+ module_put(module);
return 0;
}
if (get_user(device, (int __user *)argp))
return -EFAULT;
+ if (device >= SNDRV_RAWMIDI_DEVICES) /* next device is -1 */
+ device = SNDRV_RAWMIDI_DEVICES - 1;
mutex_lock(®ister_mutex);
device = device < 0 ? 0 : device + 1;
while (device < SNDRV_RAWMIDI_DEVICES) {
return 0;
_error:
- snd_seq_oss_writeq_delete(dp->writeq);
- snd_seq_oss_readq_delete(dp->readq);
snd_seq_oss_synth_cleanup(dp);
snd_seq_oss_midi_cleanup(dp);
- delete_port(dp);
delete_seq_queue(dp->queue);
- kfree(dp);
+ delete_port(dp);
return rc;
}
static int
delete_port(struct seq_oss_devinfo *dp)
{
- if (dp->port < 0)
+ if (dp->port < 0) {
+ kfree(dp);
return 0;
+ }
debug_printk(("delete_port %i\n", dp->port));
return snd_seq_event_port_detach(dp->cseq, dp->port);
return 0;
}
#else /* !CONFIG_PROC_FS */
-static int proc_init(struct snd_akm4xxx *ak) {}
+static int proc_init(struct snd_akm4xxx *ak) { return 0; }
#endif
int snd_akm4xxx_build_controls(struct snd_akm4xxx *ak)
static int irq[SNDRV_CARDS] = SNDRV_DEFAULT_IRQ;
static long mem[SNDRV_CARDS] = SNDRV_DEFAULT_PORT;
+#ifndef MSND_CLASSIC
static long cfg[SNDRV_CARDS] = SNDRV_DEFAULT_PORT;
-#ifndef MSND_CLASSIC
/* Extra Peripheral Configuration (Default: Disable) */
static long ide_io0[SNDRV_CARDS] = SNDRV_DEFAULT_PORT;
static long ide_io1[SNDRV_CARDS] = SNDRV_DEFAULT_PORT;
struct snd_card *card;
struct snd_msnd *chip;
- if (has_isapnp(idx) || cfg[idx] == SNDRV_AUTO_PORT) {
+ if (has_isapnp(idx)
+#ifndef MSND_CLASSIC
+ || cfg[idx] == SNDRV_AUTO_PORT
+#endif
+ ) {
printk(KERN_INFO LOGNAME ": Assuming PnP mode\n");
return -ENODEV;
}
case SND_DEV_DSP:
case SND_DEV_DSP16:
case SND_DEV_AUDIO:
- return audio_ioctl(dev, file, cmd, p);
+ ret = audio_ioctl(dev, file, cmd, p);
break;
case SND_DEV_MIDIN:
- return MIDIbuf_ioctl(dev, file, cmd, p);
+ ret = MIDIbuf_ioctl(dev, file, cmd, p);
break;
}
cfg->hp_outs--;
memmove(cfg->hp_pins + i, cfg->hp_pins + i + 1,
sizeof(cfg->hp_pins[0]) * (cfg->hp_outs - i));
- memmove(sequences_hp + i - 1, sequences_hp + i,
+ memmove(sequences_hp + i, sequences_hp + i + 1,
sizeof(sequences_hp[0]) * (cfg->hp_outs - i));
}
}
"{Intel, ICH10},"
"{Intel, PCH},"
"{Intel, CPT},"
+ "{Intel, PBG},"
"{Intel, SCH},"
"{ATI, SB450},"
"{ATI, SB600},"
{ PCI_DEVICE(0x8086, 0x3b57), .driver_data = AZX_DRIVER_ICH },
/* CPT */
{ PCI_DEVICE(0x8086, 0x1c20), .driver_data = AZX_DRIVER_PCH },
+ /* PBG */
+ { PCI_DEVICE(0x8086, 0x1d20), .driver_data = AZX_DRIVER_PCH },
/* SCH */
{ PCI_DEVICE(0x8086, 0x811b), .driver_data = AZX_DRIVER_SCH },
/* ATI SB 450/600 */
/* Lenovo Thinkpad T61/X61 */
SND_PCI_QUIRK_VENDOR(0x17aa, "Lenovo Thinkpad", AD1984_THINKPAD),
SND_PCI_QUIRK(0x1028, 0x0214, "Dell T3400", AD1984_DELL_DESKTOP),
+ SND_PCI_QUIRK(0x1028, 0x0233, "Dell Latitude E6400", AD1984_DELL_DESKTOP),
{}
};
{} /* terminator */
};
+/* Errata: CS4207 rev C0/C1/C2 Silicon
+ *
+ * http://www.cirrus.com/en/pubs/errata/ER880C3.pdf
+ *
+ * 6. At high temperature (TA > +85°C), the digital supply current (IVD)
+ * may be excessive (up to an additional 200 μA), which is most easily
+ * observed while the part is being held in reset (RESET# active low).
+ *
+ * Root Cause: At initial powerup of the device, the logic that drives
+ * the clock and write enable to the S/PDIF SRC RAMs is not properly
+ * initialized.
+ * Certain random patterns will cause a steady leakage current in those
+ * RAM cells. The issue will resolve once the SRCs are used (turned on).
+ *
+ * Workaround: The following verb sequence briefly turns on the S/PDIF SRC
+ * blocks, which will alleviate the issue.
+ */
+
+static struct hda_verb cs_errata_init_verbs[] = {
+ {0x01, AC_VERB_SET_POWER_STATE, 0x00}, /* AFG: D0 */
+ {0x11, AC_VERB_SET_PROC_STATE, 0x01}, /* VPW: processing on */
+
+ {0x11, AC_VERB_SET_COEF_INDEX, 0x0008},
+ {0x11, AC_VERB_SET_PROC_COEF, 0x9999},
+ {0x11, AC_VERB_SET_COEF_INDEX, 0x0017},
+ {0x11, AC_VERB_SET_PROC_COEF, 0xa412},
+ {0x11, AC_VERB_SET_COEF_INDEX, 0x0001},
+ {0x11, AC_VERB_SET_PROC_COEF, 0x0009},
+
+ {0x07, AC_VERB_SET_POWER_STATE, 0x00}, /* S/PDIF Rx: D0 */
+ {0x08, AC_VERB_SET_POWER_STATE, 0x00}, /* S/PDIF Tx: D0 */
+
+ {0x11, AC_VERB_SET_COEF_INDEX, 0x0017},
+ {0x11, AC_VERB_SET_PROC_COEF, 0x2412},
+ {0x11, AC_VERB_SET_COEF_INDEX, 0x0008},
+ {0x11, AC_VERB_SET_PROC_COEF, 0x0000},
+ {0x11, AC_VERB_SET_COEF_INDEX, 0x0001},
+ {0x11, AC_VERB_SET_PROC_COEF, 0x0008},
+ {0x11, AC_VERB_SET_PROC_STATE, 0x00},
+
+ {0x07, AC_VERB_SET_POWER_STATE, 0x03}, /* S/PDIF Rx: D3 */
+ {0x08, AC_VERB_SET_POWER_STATE, 0x03}, /* S/PDIF Tx: D3 */
+ /*{0x01, AC_VERB_SET_POWER_STATE, 0x03},*/ /* AFG: D3 This is already handled */
+
+ {} /* terminator */
+};
+
/* SPDIF setup */
static void init_digital(struct hda_codec *codec)
{
{
struct cs_spec *spec = codec->spec;
+ /* init_verb sequence for C0/C1/C2 errata*/
+ snd_hda_sequence_write(codec, cs_errata_init_verbs);
+
snd_hda_sequence_write(codec, cs_coef_init_verbs);
if (spec->gpio_mask) {
unsigned int dell_vostro:1;
unsigned int ideapad:1;
unsigned int thinkpad:1;
+ unsigned int hp_laptop:1;
unsigned int ext_mic_present;
unsigned int recording;
}
}
+/* toggle input of built-in digital mic and mic jack appropriately */
+static void cxt5066_hp_laptop_automic(struct hda_codec *codec)
+{
+ unsigned int present;
+
+ present = snd_hda_jack_detect(codec, 0x1b);
+ snd_printdd("CXT5066: external microphone present=%d\n", present);
+ snd_hda_codec_write(codec, 0x17, 0, AC_VERB_SET_CONNECT_SEL,
+ present ? 1 : 3);
+}
+
+
/* toggle input of built-in digital mic and mic jack appropriately
order is: external mic -> dock mic -> interal mic */
static void cxt5066_thinkpad_automic(struct hda_codec *codec)
}
/* unsolicited event for jack sensing */
+static void cxt5066_hp_laptop_event(struct hda_codec *codec, unsigned int res)
+{
+ snd_printdd("CXT5066_hp_laptop: unsol event %x (%x)\n", res, res >> 26);
+ switch (res >> 26) {
+ case CONEXANT_HP_EVENT:
+ cxt5066_hp_automute(codec);
+ break;
+ case CONEXANT_MIC_EVENT:
+ cxt5066_hp_laptop_automic(codec);
+ break;
+ }
+}
+
+/* unsolicited event for jack sensing */
static void cxt5066_thinkpad_event(struct hda_codec *codec, unsigned int res)
{
snd_printdd("CXT5066_thinkpad: unsol event %x (%x)\n", res, res >> 26);
{ } /* end */
};
+
+static struct hda_verb cxt5066_init_verbs_hp_laptop[] = {
+ {0x14, AC_VERB_SET_CONNECT_SEL, 0x0},
+ {0x19, AC_VERB_SET_UNSOLICITED_ENABLE, AC_USRSP_EN | CONEXANT_HP_EVENT},
+ {0x1b, AC_VERB_SET_UNSOLICITED_ENABLE, AC_USRSP_EN | CONEXANT_MIC_EVENT},
+ { } /* end */
+};
+
/* initialize jack-sensing, too */
static int cxt5066_init(struct hda_codec *codec)
{
cxt5066_ideapad_automic(codec);
else if (spec->thinkpad)
cxt5066_thinkpad_automic(codec);
+ else if (spec->hp_laptop)
+ cxt5066_hp_laptop_automic(codec);
}
cxt5066_set_mic_boost(codec);
return 0;
CXT5066_DELL_VOSTO, /* Dell Vostro 1015i */
CXT5066_IDEAPAD, /* Lenovo IdeaPad U150 */
CXT5066_THINKPAD, /* Lenovo ThinkPad T410s, others? */
+ CXT5066_HP_LAPTOP, /* HP Laptop */
CXT5066_MODELS
};
[CXT5066_DELL_VOSTO] = "dell-vostro",
[CXT5066_IDEAPAD] = "ideapad",
[CXT5066_THINKPAD] = "thinkpad",
+ [CXT5066_HP_LAPTOP] = "hp-laptop",
};
static struct snd_pci_quirk cxt5066_cfg_tbl[] = {
SND_PCI_QUIRK(0x1028, 0x02d8, "Dell Vostro", CXT5066_DELL_VOSTO),
SND_PCI_QUIRK(0x1028, 0x0402, "Dell Vostro", CXT5066_DELL_VOSTO),
SND_PCI_QUIRK(0x1028, 0x0408, "Dell Inspiron One 19T", CXT5066_IDEAPAD),
+ SND_PCI_QUIRK(0x103c, 0x360b, "HP G60", CXT5066_HP_LAPTOP),
+ SND_PCI_QUIRK(0x1179, 0xff1e, "Toshiba Satellite C650D", CXT5066_IDEAPAD),
SND_PCI_QUIRK(0x1179, 0xff50, "Toshiba Satellite P500-PSPGSC-01800T", CXT5066_OLPC_XO_1_5),
SND_PCI_QUIRK(0x1179, 0xffe0, "Toshiba Satellite Pro T130-15F", CXT5066_OLPC_XO_1_5),
+ SND_PCI_QUIRK(0x17aa, 0x20f2, "Lenovo T400s", CXT5066_THINKPAD),
SND_PCI_QUIRK(0x17aa, 0x21b2, "Thinkpad X100e", CXT5066_IDEAPAD),
SND_PCI_QUIRK(0x17aa, 0x21b3, "Thinkpad Edge 13 (197)", CXT5066_IDEAPAD),
SND_PCI_QUIRK(0x17aa, 0x21b4, "Thinkpad Edge", CXT5066_IDEAPAD),
spec->num_init_verbs++;
spec->dell_automute = 1;
break;
+ case CXT5066_HP_LAPTOP:
+ codec->patch_ops.init = cxt5066_init;
+ codec->patch_ops.unsol_event = cxt5066_hp_laptop_event;
+ spec->init_verbs[spec->num_init_verbs] =
+ cxt5066_init_verbs_hp_laptop;
+ spec->num_init_verbs++;
+ spec->hp_laptop = 1;
+ spec->mixers[spec->num_mixers++] = cxt5066_mixer_master;
+ spec->mixers[spec->num_mixers++] = cxt5066_mixers;
+ /* no S/PDIF out */
+ spec->multiout.dig_out_nid = 0;
+ /* input source automatically selected */
+ spec->input_mux = NULL;
+ spec->port_d_mode = 0;
+ spec->mic_boost = 3; /* default 30dB gain */
+ break;
+
case CXT5066_OLPC_XO_1_5:
codec->patch_ops.init = cxt5066_olpc_init;
codec->patch_ops.unsol_event = cxt5066_olpc_unsol_event;
#else
/* support all rates and formats */
#define SUPPORTED_RATES \
- (SNDRV_PCM_RATE_22050 | SNDRV_PCM_RATE_44100 | SNDRV_PCM_RATE_48000 |\
+ (SNDRV_PCM_RATE_32000 | SNDRV_PCM_RATE_44100 | SNDRV_PCM_RATE_48000 |\
SNDRV_PCM_RATE_88200 | SNDRV_PCM_RATE_96000 | SNDRV_PCM_RATE_176400 |\
SNDRV_PCM_RATE_192000)
#define SUPPORTED_MAXBPS 24
}
if (spec->autocfg.dig_in_pin) {
- hda_nid_t dig_nid;
- err = snd_hda_get_connections(codec,
- spec->autocfg.dig_in_pin,
- &dig_nid, 1);
- if (err > 0)
- spec->dig_in_nid = dig_nid;
+ dig_nid = codec->start_nid;
+ for (i = 0; i < codec->num_nodes; i++, dig_nid++) {
+ unsigned int wcaps = get_wcaps(codec, dig_nid);
+ if (get_wcaps_type(wcaps) != AC_WID_AUD_IN)
+ continue;
+ if (!(wcaps & AC_WCAP_DIGITAL))
+ continue;
+ if (!(wcaps & AC_WCAP_CONN_LIST))
+ continue;
+ err = get_connection_index(codec, dig_nid,
+ spec->autocfg.dig_in_pin);
+ if (err >= 0) {
+ spec->dig_in_nid = dig_nid;
+ break;
+ }
+ }
}
}
static struct snd_pci_quirk beep_white_list[] = {
SND_PCI_QUIRK(0x1043, 0x829f, "ASUS", 1),
+ SND_PCI_QUIRK(0x1043, 0x83ce, "EeePC", 1),
SND_PCI_QUIRK(0x8086, 0xd613, "Intel", 1),
{}
};
enum {
ALC269_FIXUP_SONY_VAIO,
+ ALC269_FIXUP_DELL_M101Z,
};
static const struct hda_verb alc269_sony_vaio_fixup_verbs[] = {
[ALC269_FIXUP_SONY_VAIO] = {
.verbs = alc269_sony_vaio_fixup_verbs
},
+ [ALC269_FIXUP_DELL_M101Z] = {
+ .verbs = (const struct hda_verb[]) {
+ /* Enables internal speaker */
+ {0x20, AC_VERB_SET_COEF_INDEX, 13},
+ {0x20, AC_VERB_SET_PROC_COEF, 0x4040},
+ {}
+ }
+ },
};
static struct snd_pci_quirk alc269_fixup_tbl[] = {
SND_PCI_QUIRK(0x104d, 0x9071, "Sony VAIO", ALC269_FIXUP_SONY_VAIO),
SND_PCI_QUIRK(0x104d, 0x9077, "Sony VAIO", ALC269_FIXUP_SONY_VAIO),
+ SND_PCI_QUIRK(0x1028, 0x0470, "Dell M101z", ALC269_FIXUP_DELL_M101Z),
{}
};
"HP dv6", STAC_HP_DV5),
SND_PCI_QUIRK(PCI_VENDOR_ID_HP, 0x3061,
"HP dv6", STAC_HP_DV5), /* HP dv6-1110ax */
+ SND_PCI_QUIRK(PCI_VENDOR_ID_HP, 0x363e,
+ "HP DV6", STAC_HP_DV5),
SND_PCI_QUIRK_MASK(PCI_VENDOR_ID_HP, 0xfff0, 0x7010,
"HP", STAC_HP_DV5),
SND_PCI_QUIRK(PCI_VENDOR_ID_DELL, 0x0233,
chip->model.suspend = claro_suspend;
chip->model.resume = claro_resume;
chip->model.set_adc_params = set_ak5385_params;
+ chip->model.device_config = PLAYBACK_0_TO_I2S |
+ PLAYBACK_1_TO_SPDIF |
+ CAPTURE_0_FROM_I2S_2 |
+ CAPTURE_1_FROM_SPDIF;
break;
}
if (id->driver_data == MODEL_MERIDIAN ||
int oxygen_pci_suspend(struct pci_dev *pci, pm_message_t state);
int oxygen_pci_resume(struct pci_dev *pci);
#endif
+void oxygen_pci_shutdown(struct pci_dev *pci);
/* oxygen_mixer.c */
}
}
-static void oxygen_card_free(struct snd_card *card)
+static void oxygen_shutdown(struct oxygen *chip)
{
- struct oxygen *chip = card->private_data;
-
spin_lock_irq(&chip->reg_lock);
chip->interrupt_mask = 0;
chip->pcm_running = 0;
oxygen_write16(chip, OXYGEN_DMA_STATUS, 0);
oxygen_write16(chip, OXYGEN_INTERRUPT_MASK, 0);
spin_unlock_irq(&chip->reg_lock);
+}
+
+static void oxygen_card_free(struct snd_card *card)
+{
+ struct oxygen *chip = card->private_data;
+
+ oxygen_shutdown(chip);
if (chip->irq >= 0)
free_irq(chip->irq, chip);
flush_scheduled_work();
}
EXPORT_SYMBOL(oxygen_pci_resume);
#endif /* CONFIG_PM */
+
+void oxygen_pci_shutdown(struct pci_dev *pci)
+{
+ struct snd_card *card = pci_get_drvdata(pci);
+ struct oxygen *chip = card->private_data;
+
+ oxygen_shutdown(chip);
+ chip->model.cleanup(chip);
+}
+EXPORT_SYMBOL(oxygen_pci_shutdown);
.suspend = oxygen_pci_suspend,
.resume = oxygen_pci_resume,
#endif
+ .shutdown = oxygen_pci_shutdown,
};
static int __init alsa_card_xonar_init(void)
struct xonar_generic generic;
u16 wm8776_regs[0x17];
u16 wm8766_regs[0x10];
+ struct snd_kcontrol *line_adcmux_control;
+ struct snd_kcontrol *mic_adcmux_control;
struct snd_kcontrol *lc_controls[13];
};
static void xonar_ds_cleanup(struct oxygen *chip)
{
xonar_disable_output(chip);
+ wm8776_write(chip, WM8776_RESET, 0);
}
static void xonar_ds_suspend(struct oxygen *chip)
{
struct oxygen *chip = ctl->private_data;
struct xonar_wm87x6 *data = chip->model_data;
+ struct snd_kcontrol *other_ctl;
unsigned int mux_bit = ctl->private_value;
u16 reg;
int changed;
mutex_lock(&chip->mutex);
reg = data->wm8776_regs[WM8776_ADCMUX];
if (value->value.integer.value[0]) {
- reg &= ~0x003;
reg |= mux_bit;
+ /* line-in and mic-in are exclusive */
+ mux_bit ^= 3;
+ if (reg & mux_bit) {
+ reg &= ~mux_bit;
+ if (mux_bit == 1)
+ other_ctl = data->line_adcmux_control;
+ else
+ other_ctl = data->mic_adcmux_control;
+ snd_ctl_notify(chip->card, SNDRV_CTL_EVENT_MASK_VALUE,
+ &other_ctl->id);
+ }
} else
reg &= ~mux_bit;
changed = reg != data->wm8776_regs[WM8776_ADCMUX];
err = snd_ctl_add(chip->card, ctl);
if (err < 0)
return err;
+ if (!strcmp(ctl->id.name, "Line Capture Switch"))
+ data->line_adcmux_control = ctl;
+ else if (!strcmp(ctl->id.name, "Mic Capture Switch"))
+ data->mic_adcmux_control = ctl;
}
+ if (!data->line_adcmux_control || !data->mic_adcmux_control)
+ return -ENXIO;
BUILD_BUG_ON(ARRAY_SIZE(lc_controls) != ARRAY_SIZE(data->lc_controls));
for (i = 0; i < ARRAY_SIZE(lc_controls); ++i) {
ctl = snd_ctl_new1(&lc_controls[i], chip);
if (err < 0)
return err;
+ memset(&info, 0, sizeof(info));
spin_lock_irqsave(&hdsp->lock, flags);
info.pref_sync_ref = (unsigned char)hdsp_pref_sync_ref(hdsp);
info.wordclock_sync_check = (unsigned char)hdsp_wc_sync_check(hdsp);
case SNDRV_HDSPM_IOCTL_GET_CONFIG_INFO:
+ memset(&info, 0, sizeof(info));
spin_lock_irq(&hdspm->lock);
info.pref_sync_ref = hdspm_pref_sync_ref(hdspm);
info.wordclock_sync_check = hdspm_wc_sync_check(hdspm);
rate * delay_ms / 1000)
* substream->runtime->channels;
- pr_debug(KERN_ERR "%s: time=%d rate=%d bytes=%ld, frames=%d, ret=%d\n",
+ pr_debug("%s: time=%d rate=%d bytes=%ld, frames=%d, ret=%d\n",
__func__,
delay_ms,
rate,
if ((pos + len) > prtd->dma_end) {
len = prtd->dma_end - pos;
- pr_debug(KERN_DEBUG "%s: corrected dma len %ld\n",
- __func__, len);
+ pr_debug("%s: corrected dma len %ld\n", __func__, len);
}
ret = s3c2410_dma_enqueue(prtd->params->channel,
#include <linux/firmware.h>
#include <linux/module.h>
+#include <asm/clkdev.h>
#include <asm/clock.h>
#include <cpu/sh7722.h>
};
static struct clk siumckb_clk = {
- .name = "siumckb_clk",
- .id = -1,
.ops = &siumckb_clk_ops,
.rate = 0, /* initialised at run-time */
};
+static struct clk_lookup *siumckb_lookup;
+
static int migor_hw_params(struct snd_pcm_substream *substream,
struct snd_pcm_hw_params *params)
{
if (ret < 0)
return ret;
+ siumckb_lookup = clkdev_alloc(&siumckb_clk, "siumckb_clk", NULL);
+ if (!siumckb_lookup) {
+ ret = -ENOMEM;
+ goto eclkdevalloc;
+ }
+ clkdev_add(siumckb_lookup);
+
/* Port number used on this machine: port B */
migor_snd_device = platform_device_alloc("soc-audio", 1);
if (!migor_snd_device) {
epdevadd:
platform_device_put(migor_snd_device);
epdevalloc:
+ clkdev_drop(siumckb_lookup);
+eclkdevalloc:
clk_unregister(&siumckb_clk);
return ret;
}
static void __exit migor_exit(void)
{
+ clkdev_drop(siumckb_lookup);
clk_unregister(&siumckb_clk);
platform_device_unregister(migor_snd_device);
}
data[1] = (value >> 8) & 0xff;
data[2] = value & 0xff;
- if (!snd_soc_codec_volatile_register(codec, reg))
- reg_cache[reg] = value;
+ if (!snd_soc_codec_volatile_register(codec, reg)
+ && reg < codec->reg_cache_size)
+ reg_cache[reg] = value;
if (codec->cache_only) {
codec->cache_sync = 1;
for (idx = 0; idx < 2; idx++) {
subs = &as->substream[idx];
if (!subs->num_formats)
- return;
+ continue;
snd_usb_release_substream_urbs(subs, 1);
subs->interface = -1;
}
}
switch (protocol) {
+ default:
+ snd_printdd(KERN_WARNING "unknown interface protocol %#02x, assuming v1\n",
+ protocol);
+ /* fall through */
+
case UAC_VERSION_1: {
struct uac1_ac_header_descriptor *h1 = control_header;
break;
}
-
- default:
- snd_printk(KERN_ERR "unknown protocol version 0x%02x\n", protocol);
- return -EINVAL;
}
return 0;
goto __error;
}
- chip->ctrl_intf = alts;
+ /*
+ * For devices with more than one control interface, we assume the
+ * first contains the audio controls. We might need a more specific
+ * check here in the future.
+ */
+ if (!chip->ctrl_intf)
+ chip->ctrl_intf = alts;
if (err > 0) {
/* create normal USB audio interfaces */
switch (altsd->bInterfaceProtocol) {
case UAC_VERSION_1:
+ default:
return set_sample_rate_v1(chip, iface, alts, fmt, rate);
case UAC_VERSION_2:
return set_sample_rate_v2(chip, iface, alts, fmt, rate);
}
-
- return -EINVAL;
}
/* get audio formats */
switch (protocol) {
+ default:
+ snd_printdd(KERN_WARNING "%d:%u:%d: unknown interface protocol %#02x, assuming v1\n",
+ dev->devnum, iface_no, altno, protocol);
+ protocol = UAC_VERSION_1;
+ /* fall through */
+
case UAC_VERSION_1: {
struct uac1_as_header_descriptor *as =
snd_usb_find_csint_desc(alts->extra, alts->extralen, NULL, UAC_AS_GENERAL);
dev->devnum, iface_no, altno, as->bTerminalLink);
continue;
}
-
- default:
- snd_printk(KERN_ERR "%d:%u:%d : unknown interface protocol %04x\n",
- dev->devnum, iface_no, altno, protocol);
- continue;
}
/* get format type */
u64 pcm_formats;
switch (protocol) {
- case UAC_VERSION_1: {
+ case UAC_VERSION_1:
+ default: {
struct uac_format_type_i_discrete_descriptor *fmt = _fmt;
sample_width = fmt->bBitResolution;
sample_bytes = fmt->bSubframeSize;
format <<= 1;
break;
}
-
- default:
- return -EINVAL;
}
pcm_formats = 0;
* audio class v2 uses class specific EP0 range requests for that.
*/
switch (protocol) {
+ default:
+ snd_printdd(KERN_WARNING "%d:%u:%d : invalid protocol version %d, assuming v1\n",
+ chip->dev->devnum, fp->iface, fp->altsetting, protocol);
+ /* fall through */
case UAC_VERSION_1:
fp->channels = fmt->bNrChannels;
ret = parse_audio_format_rates_v1(chip, fp, (unsigned char *) fmt, 7);
/* fp->channels is already set in this case */
ret = parse_audio_format_rates_v2(chip, fp);
break;
- default:
- snd_printk(KERN_ERR "%d:%u:%d : invalid protocol version %d\n",
- chip->dev->devnum, fp->iface, fp->altsetting, protocol);
- return -EINVAL;
}
if (fp->channels < 1) {
fp->channels = 1;
switch (protocol) {
+ default:
+ snd_printdd(KERN_WARNING "%d:%u:%d : invalid protocol version %d, assuming v1\n",
+ chip->dev->devnum, fp->iface, fp->altsetting, protocol);
+ /* fall through */
case UAC_VERSION_1: {
struct uac_format_type_ii_discrete_descriptor *fmt = _fmt;
brate = le16_to_cpu(fmt->wMaxBitRate);
ret = parse_audio_format_rates_v2(chip, fp);
break;
}
- default:
- snd_printk(KERN_ERR "%d:%u:%d : invalid protocol version %d\n",
- chip->dev->devnum, fp->iface, fp->altsetting, protocol);
- return -EINVAL;
}
return ret;
}
host_iface = &usb_ifnum_to_if(chip->dev, ctrlif)->altsetting[0];
- mixer->protocol = get_iface_desc(host_iface)->bInterfaceProtocol;
+ switch (get_iface_desc(host_iface)->bInterfaceProtocol) {
+ case UAC_VERSION_1:
+ default:
+ mixer->protocol = UAC_VERSION_1;
+ break;
+ case UAC_VERSION_2:
+ mixer->protocol = UAC_VERSION_2;
+ break;
+ }
if ((err = snd_usb_mixer_controls(mixer)) < 0 ||
(err = snd_usb_mixer_status_create(mixer)) < 0)
switch (altsd->bInterfaceProtocol) {
case UAC_VERSION_1:
+ default:
return init_pitch_v1(chip, iface, alts, fmt);
case UAC_VERSION_2:
return init_pitch_v2(chip, iface, alts, fmt);
}
-
- return -EINVAL;
}
/*
SYNOPSIS
--------
[verse]
-'perf annotate' [-i <file> | --input=file] symbol_name
+'perf annotate' [-i <file> | --input=file] [symbol_name]
DESCRIPTION
-----------
--input=::
Input file name. (default: perf.data)
+--stdio:: Use the stdio interface.
+
+--tui:: Use the TUI interface Use of --tui requires a tty, if one is not
+ present, as when piping to other commands, the stdio interface is
+ used. This interfaces starts by centering on the line with more
+ samples, TAB/UNTAB cycles thru the lines with more samples.
+
SEE ALSO
--------
-linkperf:perf-record[1]
+linkperf:perf-record[1], linkperf:perf-report[1]
the tree is considered as a new profiled object. +
Default: fractal,0.5.
+--stdio:: Use the stdio interface.
+
+--tui:: Use the TUI interface, that is integrated with annotate and allows
+ zooming into DSOs or threads, among other features. Use of --tui
+ requires a tty, if one is not present, as when piping to other
+ commands, the stdio interface is used.
+
SEE ALSO
--------
linkperf:perf-stat[1]
SCRIPT_SH += perf-archive.sh
+grep-libs = $(filter -l%,$(1))
+strip-libs = $(filter-out -l%,$(1))
+
#
# No Perl scripts right now:
#
ifdef NO_LIBPERL
BASIC_CFLAGS += -DNO_LIBPERL
else
- PERL_EMBED_LDOPTS = `perl -MExtUtils::Embed -e ldopts 2>/dev/null`
+ PERL_EMBED_LDOPTS = $(shell perl -MExtUtils::Embed -e ldopts 2>/dev/null)
+ PERL_EMBED_LDFLAGS = $(call strip-libs,$(PERL_EMBED_LDOPTS))
+ PERL_EMBED_LIBADD = $(call grep-libs,$(PERL_EMBED_LDOPTS))
PERL_EMBED_CCOPTS = `perl -MExtUtils::Embed -e ccopts 2>/dev/null`
FLAGS_PERL_EMBED=$(PERL_EMBED_CCOPTS) $(PERL_EMBED_LDOPTS)
ifneq ($(call try-cc,$(SOURCE_PERL_EMBED),$(FLAGS_PERL_EMBED)),y)
BASIC_CFLAGS += -DNO_LIBPERL
else
- ALL_LDFLAGS += $(PERL_EMBED_LDOPTS)
+ ALL_LDFLAGS += $(PERL_EMBED_LDFLAGS)
+ EXTLIBS += $(PERL_EMBED_LIBADD)
LIB_OBJS += $(OUTPUT)util/scripting-engines/trace-event-perl.o
LIB_OBJS += $(OUTPUT)scripts/perl/Perf-Trace-Util/Context.o
endif
ifdef NO_LIBPYTHON
BASIC_CFLAGS += -DNO_LIBPYTHON
else
- PYTHON_EMBED_LDOPTS = `python-config --ldflags 2>/dev/null`
+ PYTHON_EMBED_LDOPTS = $(shell python-config --ldflags 2>/dev/null)
+ PYTHON_EMBED_LDFLAGS = $(call strip-libs,$(PYTHON_EMBED_LDOPTS))
+ PYTHON_EMBED_LIBADD = $(call grep-libs,$(PYTHON_EMBED_LDOPTS))
PYTHON_EMBED_CCOPTS = `python-config --cflags 2>/dev/null`
FLAGS_PYTHON_EMBED=$(PYTHON_EMBED_CCOPTS) $(PYTHON_EMBED_LDOPTS)
ifneq ($(call try-cc,$(SOURCE_PYTHON_EMBED),$(FLAGS_PYTHON_EMBED)),y)
BASIC_CFLAGS += -DNO_LIBPYTHON
else
- ALL_LDFLAGS += $(PYTHON_EMBED_LDOPTS)
+ ALL_LDFLAGS += $(PYTHON_EMBED_LDFLAGS)
+ EXTLIBS += $(PYTHON_EMBED_LIBADD)
LIB_OBJS += $(OUTPUT)util/scripting-engines/trace-event-python.o
LIB_OBJS += $(OUTPUT)scripts/python/Perf-Trace-Util/Context.o
endif
endif
endif
+
+ifdef NO_STRLCPY
+ BASIC_CFLAGS += -DNO_STRLCPY
+else
+ ifneq ($(call try-cc,$(SOURCE_STRLCPY),),y)
+ BASIC_CFLAGS += -DNO_STRLCPY
+ endif
+endif
+
ifndef CC_LD_DYNPATH
ifdef NO_R_TO_GCC_LINKER
# Some gcc does not accept and pass -R to the linker to specify
$(ALL_CFLAGS) -c $(filter %.c,$^) -o $@
$(OUTPUT)perf$X: $(OUTPUT)perf.o $(BUILTIN_OBJS) $(PERFLIBS)
- $(QUIET_LINK)$(CC) $(ALL_CFLAGS) -o $@ $(OUTPUT)perf.o \
- $(BUILTIN_OBJS) $(ALL_LDFLAGS) $(LIBS)
+ $(QUIET_LINK)$(CC) $(ALL_CFLAGS) $(ALL_LDFLAGS) $(OUTPUT)perf.o \
+ $(BUILTIN_OBJS) $(LIBS) -o $@
$(OUTPUT)builtin-help.o: builtin-help.c $(OUTPUT)common-cmds.h $(OUTPUT)PERF-CFLAGS
$(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) \
# we compile into subdirectories. if the target directory is not the source directory, they might not exists. So
# we depend the various files onto their directories.
DIRECTORY_DEPS = $(LIB_OBJS) $(BUILTIN_OBJS) $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h
-$(DIRECTORY_DEPS): $(sort $(dir $(DIRECTORY_DEPS)))
+$(DIRECTORY_DEPS): | $(sort $(dir $(DIRECTORY_DEPS)))
# In the second step, we make a rule to actually create these directories
$(sort $(dir $(DIRECTORY_DEPS))):
$(QUIET_MKDIR)$(MKDIR) -p $@ 2>/dev/null
static char const *input_name = "perf.data";
-static bool force;
+static bool force, use_tui, use_stdio;
static bool full_paths;
static void hists__find_annotations(struct hists *self)
{
- struct rb_node *first = rb_first(&self->entries), *nd = first;
+ struct rb_node *nd = rb_first(&self->entries), *next;
int key = KEY_RIGHT;
while (nd) {
if (use_browser > 0) {
key = hist_entry__tui_annotate(he);
- if (is_exit_key(key))
- break;
switch (key) {
case KEY_RIGHT:
- case '\t':
- nd = rb_next(nd);
+ next = rb_next(nd);
break;
case KEY_LEFT:
- if (nd == first)
- continue;
- nd = rb_prev(nd);
- default:
+ next = rb_prev(nd);
break;
+ default:
+ return;
}
+
+ if (next != NULL)
+ nd = next;
} else {
hist_entry__tty_annotate(he);
nd = rb_next(nd);
"be more verbose (show symbol address, etc)"),
OPT_BOOLEAN('D', "dump-raw-trace", &dump_trace,
"dump raw trace in ASCII"),
+ OPT_BOOLEAN(0, "tui", &use_tui, "Use the TUI interface"),
+ OPT_BOOLEAN(0, "stdio", &use_stdio, "Use the stdio interface"),
OPT_STRING('k', "vmlinux", &symbol_conf.vmlinux_name,
"file", "vmlinux pathname"),
OPT_BOOLEAN('m', "modules", &symbol_conf.use_modules,
{
argc = parse_options(argc, argv, options, annotate_usage, 0);
+ if (use_stdio)
+ use_browser = 0;
+ else if (use_tui)
+ use_browser = 1;
+
setup_browser();
symbol_conf.priv_size = sizeof(struct sym_priv);
static char const *input_name = "perf.data";
-static bool force;
+static bool force, use_tui, use_stdio;
static bool hide_unresolved;
static bool dont_use_callchains;
goto out_free_syms;
err = 0;
if (symbol_conf.use_callchain) {
- err = append_chain(he->callchain, data->callchain, syms, data->period);
+ err = callchain_append(he->callchain, data->callchain, syms,
+ data->period);
if (err)
goto out_free_syms;
}
"Show per-thread event counters"),
OPT_STRING(0, "pretty", &pretty_printing_style, "key",
"pretty printing style key: normal raw"),
+ OPT_BOOLEAN(0, "tui", &use_tui, "Use the TUI interface"),
+ OPT_BOOLEAN(0, "stdio", &use_stdio, "Use the stdio interface"),
OPT_STRING('s', "sort", &sort_order, "key[,key2...]",
"sort by key(s): pid, comm, dso, symbol, parent"),
OPT_BOOLEAN(0, "showcpuutilization", &symbol_conf.show_cpu_utilization,
{
argc = parse_options(argc, argv, options, report_usage, 0);
+ if (use_stdio)
+ use_browser = 0;
+ else if (use_tui)
+ use_browser = 1;
+
if (strcmp(input_name, "-") != 0)
setup_browser();
+ else
+ use_browser = 0;
/*
* Only in the newt browser we are doing integrated annotation,
* so don't allocate extra space that won't be used in the stdio
}
endef
+define SOURCE_STRLCPY
+#include <stdlib.h>
+extern size_t strlcpy(char *dest, const char *src, size_t size);
+
+int main(void)
+{
+ strlcpy(NULL, NULL, 0);
+ return 0;
+}
+endef
+
# try-cc
# Usage: option = $(call try-cc, source-to-build, cc-options)
try-cc = $(shell sh -c \
#define cpu_relax() asm volatile("":::"memory")
#endif
+#ifdef __mips__
+#include "../../arch/mips/include/asm/unistd.h"
+#define rmb() asm volatile( \
+ ".set mips2\n\t" \
+ "sync\n\t" \
+ ".set mips0" \
+ : /* no output */ \
+ : /* no input */ \
+ : "memory")
+#define cpu_relax() asm volatile("" ::: "memory")
+#endif
+
#include <time.h>
#include <unistd.h>
#include <sys/types.h>
--- /dev/null
+#!/bin/bash
+perf record -a -e net:net_dev_xmit -e net:net_dev_queue \
+ -e net:netif_receive_skb -e net:netif_rx \
+ -e skb:consume_skb -e skb:kfree_skb \
+ -e skb:skb_copy_datagram_iovec -e napi:napi_poll \
+ -e irq:irq_handler_entry -e irq:irq_handler_exit \
+ -e irq:softirq_entry -e irq:softirq_exit \
+ -e irq:softirq_raise $@
--- /dev/null
+#!/bin/bash
+# description: display a process of packet and processing time
+# args: [tx] [rx] [dev=] [debug]
+
+perf trace -s ~/libexec/perf-core/scripts/python/netdev-times.py $@
--- /dev/null
+# Display a process of packets and processed time.
+# It helps us to investigate networking or network device.
+#
+# options
+# tx: show only tx chart
+# rx: show only rx chart
+# dev=: show only thing related to specified device
+# debug: work with debug mode. It shows buffer status.
+
+import os
+import sys
+
+sys.path.append(os.environ['PERF_EXEC_PATH'] + \
+ '/scripts/python/Perf-Trace-Util/lib/Perf/Trace')
+
+from perf_trace_context import *
+from Core import *
+from Util import *
+
+all_event_list = []; # insert all tracepoint event related with this script
+irq_dic = {}; # key is cpu and value is a list which stacks irqs
+ # which raise NET_RX softirq
+net_rx_dic = {}; # key is cpu and value include time of NET_RX softirq-entry
+ # and a list which stacks receive
+receive_hunk_list = []; # a list which include a sequence of receive events
+rx_skb_list = []; # received packet list for matching
+ # skb_copy_datagram_iovec
+
+buffer_budget = 65536; # the budget of rx_skb_list, tx_queue_list and
+ # tx_xmit_list
+of_count_rx_skb_list = 0; # overflow count
+
+tx_queue_list = []; # list of packets which pass through dev_queue_xmit
+of_count_tx_queue_list = 0; # overflow count
+
+tx_xmit_list = []; # list of packets which pass through dev_hard_start_xmit
+of_count_tx_xmit_list = 0; # overflow count
+
+tx_free_list = []; # list of packets which is freed
+
+# options
+show_tx = 0;
+show_rx = 0;
+dev = 0; # store a name of device specified by option "dev="
+debug = 0;
+
+# indices of event_info tuple
+EINFO_IDX_NAME= 0
+EINFO_IDX_CONTEXT=1
+EINFO_IDX_CPU= 2
+EINFO_IDX_TIME= 3
+EINFO_IDX_PID= 4
+EINFO_IDX_COMM= 5
+
+# Calculate a time interval(msec) from src(nsec) to dst(nsec)
+def diff_msec(src, dst):
+ return (dst - src) / 1000000.0
+
+# Display a process of transmitting a packet
+def print_transmit(hunk):
+ if dev != 0 and hunk['dev'].find(dev) < 0:
+ return
+ print "%7s %5d %6d.%06dsec %12.3fmsec %12.3fmsec" % \
+ (hunk['dev'], hunk['len'],
+ nsecs_secs(hunk['queue_t']),
+ nsecs_nsecs(hunk['queue_t'])/1000,
+ diff_msec(hunk['queue_t'], hunk['xmit_t']),
+ diff_msec(hunk['xmit_t'], hunk['free_t']))
+
+# Format for displaying rx packet processing
+PF_IRQ_ENTRY= " irq_entry(+%.3fmsec irq=%d:%s)"
+PF_SOFT_ENTRY=" softirq_entry(+%.3fmsec)"
+PF_NAPI_POLL= " napi_poll_exit(+%.3fmsec %s)"
+PF_JOINT= " |"
+PF_WJOINT= " | |"
+PF_NET_RECV= " |---netif_receive_skb(+%.3fmsec skb=%x len=%d)"
+PF_NET_RX= " |---netif_rx(+%.3fmsec skb=%x)"
+PF_CPY_DGRAM= " | skb_copy_datagram_iovec(+%.3fmsec %d:%s)"
+PF_KFREE_SKB= " | kfree_skb(+%.3fmsec location=%x)"
+PF_CONS_SKB= " | consume_skb(+%.3fmsec)"
+
+# Display a process of received packets and interrputs associated with
+# a NET_RX softirq
+def print_receive(hunk):
+ show_hunk = 0
+ irq_list = hunk['irq_list']
+ cpu = irq_list[0]['cpu']
+ base_t = irq_list[0]['irq_ent_t']
+ # check if this hunk should be showed
+ if dev != 0:
+ for i in range(len(irq_list)):
+ if irq_list[i]['name'].find(dev) >= 0:
+ show_hunk = 1
+ break
+ else:
+ show_hunk = 1
+ if show_hunk == 0:
+ return
+
+ print "%d.%06dsec cpu=%d" % \
+ (nsecs_secs(base_t), nsecs_nsecs(base_t)/1000, cpu)
+ for i in range(len(irq_list)):
+ print PF_IRQ_ENTRY % \
+ (diff_msec(base_t, irq_list[i]['irq_ent_t']),
+ irq_list[i]['irq'], irq_list[i]['name'])
+ print PF_JOINT
+ irq_event_list = irq_list[i]['event_list']
+ for j in range(len(irq_event_list)):
+ irq_event = irq_event_list[j]
+ if irq_event['event'] == 'netif_rx':
+ print PF_NET_RX % \
+ (diff_msec(base_t, irq_event['time']),
+ irq_event['skbaddr'])
+ print PF_JOINT
+ print PF_SOFT_ENTRY % \
+ diff_msec(base_t, hunk['sirq_ent_t'])
+ print PF_JOINT
+ event_list = hunk['event_list']
+ for i in range(len(event_list)):
+ event = event_list[i]
+ if event['event_name'] == 'napi_poll':
+ print PF_NAPI_POLL % \
+ (diff_msec(base_t, event['event_t']), event['dev'])
+ if i == len(event_list) - 1:
+ print ""
+ else:
+ print PF_JOINT
+ else:
+ print PF_NET_RECV % \
+ (diff_msec(base_t, event['event_t']), event['skbaddr'],
+ event['len'])
+ if 'comm' in event.keys():
+ print PF_WJOINT
+ print PF_CPY_DGRAM % \
+ (diff_msec(base_t, event['comm_t']),
+ event['pid'], event['comm'])
+ elif 'handle' in event.keys():
+ print PF_WJOINT
+ if event['handle'] == "kfree_skb":
+ print PF_KFREE_SKB % \
+ (diff_msec(base_t,
+ event['comm_t']),
+ event['location'])
+ elif event['handle'] == "consume_skb":
+ print PF_CONS_SKB % \
+ diff_msec(base_t,
+ event['comm_t'])
+ print PF_JOINT
+
+def trace_begin():
+ global show_tx
+ global show_rx
+ global dev
+ global debug
+
+ for i in range(len(sys.argv)):
+ if i == 0:
+ continue
+ arg = sys.argv[i]
+ if arg == 'tx':
+ show_tx = 1
+ elif arg =='rx':
+ show_rx = 1
+ elif arg.find('dev=',0, 4) >= 0:
+ dev = arg[4:]
+ elif arg == 'debug':
+ debug = 1
+ if show_tx == 0 and show_rx == 0:
+ show_tx = 1
+ show_rx = 1
+
+def trace_end():
+ # order all events in time
+ all_event_list.sort(lambda a,b :cmp(a[EINFO_IDX_TIME],
+ b[EINFO_IDX_TIME]))
+ # process all events
+ for i in range(len(all_event_list)):
+ event_info = all_event_list[i]
+ name = event_info[EINFO_IDX_NAME]
+ if name == 'irq__softirq_exit':
+ handle_irq_softirq_exit(event_info)
+ elif name == 'irq__softirq_entry':
+ handle_irq_softirq_entry(event_info)
+ elif name == 'irq__softirq_raise':
+ handle_irq_softirq_raise(event_info)
+ elif name == 'irq__irq_handler_entry':
+ handle_irq_handler_entry(event_info)
+ elif name == 'irq__irq_handler_exit':
+ handle_irq_handler_exit(event_info)
+ elif name == 'napi__napi_poll':
+ handle_napi_poll(event_info)
+ elif name == 'net__netif_receive_skb':
+ handle_netif_receive_skb(event_info)
+ elif name == 'net__netif_rx':
+ handle_netif_rx(event_info)
+ elif name == 'skb__skb_copy_datagram_iovec':
+ handle_skb_copy_datagram_iovec(event_info)
+ elif name == 'net__net_dev_queue':
+ handle_net_dev_queue(event_info)
+ elif name == 'net__net_dev_xmit':
+ handle_net_dev_xmit(event_info)
+ elif name == 'skb__kfree_skb':
+ handle_kfree_skb(event_info)
+ elif name == 'skb__consume_skb':
+ handle_consume_skb(event_info)
+ # display receive hunks
+ if show_rx:
+ for i in range(len(receive_hunk_list)):
+ print_receive(receive_hunk_list[i])
+ # display transmit hunks
+ if show_tx:
+ print " dev len Qdisc " \
+ " netdevice free"
+ for i in range(len(tx_free_list)):
+ print_transmit(tx_free_list[i])
+ if debug:
+ print "debug buffer status"
+ print "----------------------------"
+ print "xmit Qdisc:remain:%d overflow:%d" % \
+ (len(tx_queue_list), of_count_tx_queue_list)
+ print "xmit netdevice:remain:%d overflow:%d" % \
+ (len(tx_xmit_list), of_count_tx_xmit_list)
+ print "receive:remain:%d overflow:%d" % \
+ (len(rx_skb_list), of_count_rx_skb_list)
+
+# called from perf, when it finds a correspoinding event
+def irq__softirq_entry(name, context, cpu, sec, nsec, pid, comm, vec):
+ if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+ return
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+ all_event_list.append(event_info)
+
+def irq__softirq_exit(name, context, cpu, sec, nsec, pid, comm, vec):
+ if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+ return
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+ all_event_list.append(event_info)
+
+def irq__softirq_raise(name, context, cpu, sec, nsec, pid, comm, vec):
+ if symbol_str("irq__softirq_entry", "vec", vec) != "NET_RX":
+ return
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, vec)
+ all_event_list.append(event_info)
+
+def irq__irq_handler_entry(name, context, cpu, sec, nsec, pid, comm,
+ irq, irq_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ irq, irq_name)
+ all_event_list.append(event_info)
+
+def irq__irq_handler_exit(name, context, cpu, sec, nsec, pid, comm, irq, ret):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm, irq, ret)
+ all_event_list.append(event_info)
+
+def napi__napi_poll(name, context, cpu, sec, nsec, pid, comm, napi, dev_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ napi, dev_name)
+ all_event_list.append(event_info)
+
+def net__netif_receive_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+ skblen, dev_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, skblen, dev_name)
+ all_event_list.append(event_info)
+
+def net__netif_rx(name, context, cpu, sec, nsec, pid, comm, skbaddr,
+ skblen, dev_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, skblen, dev_name)
+ all_event_list.append(event_info)
+
+def net__net_dev_queue(name, context, cpu, sec, nsec, pid, comm,
+ skbaddr, skblen, dev_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, skblen, dev_name)
+ all_event_list.append(event_info)
+
+def net__net_dev_xmit(name, context, cpu, sec, nsec, pid, comm,
+ skbaddr, skblen, rc, dev_name):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, skblen, rc ,dev_name)
+ all_event_list.append(event_info)
+
+def skb__kfree_skb(name, context, cpu, sec, nsec, pid, comm,
+ skbaddr, protocol, location):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, protocol, location)
+ all_event_list.append(event_info)
+
+def skb__consume_skb(name, context, cpu, sec, nsec, pid, comm, skbaddr):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr)
+ all_event_list.append(event_info)
+
+def skb__skb_copy_datagram_iovec(name, context, cpu, sec, nsec, pid, comm,
+ skbaddr, skblen):
+ event_info = (name, context, cpu, nsecs(sec, nsec), pid, comm,
+ skbaddr, skblen)
+ all_event_list.append(event_info)
+
+def handle_irq_handler_entry(event_info):
+ (name, context, cpu, time, pid, comm, irq, irq_name) = event_info
+ if cpu not in irq_dic.keys():
+ irq_dic[cpu] = []
+ irq_record = {'irq':irq, 'name':irq_name, 'cpu':cpu, 'irq_ent_t':time}
+ irq_dic[cpu].append(irq_record)
+
+def handle_irq_handler_exit(event_info):
+ (name, context, cpu, time, pid, comm, irq, ret) = event_info
+ if cpu not in irq_dic.keys():
+ return
+ irq_record = irq_dic[cpu].pop()
+ if irq != irq_record['irq']:
+ return
+ irq_record.update({'irq_ext_t':time})
+ # if an irq doesn't include NET_RX softirq, drop.
+ if 'event_list' in irq_record.keys():
+ irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_raise(event_info):
+ (name, context, cpu, time, pid, comm, vec) = event_info
+ if cpu not in irq_dic.keys() \
+ or len(irq_dic[cpu]) == 0:
+ return
+ irq_record = irq_dic[cpu].pop()
+ if 'event_list' in irq_record.keys():
+ irq_event_list = irq_record['event_list']
+ else:
+ irq_event_list = []
+ irq_event_list.append({'time':time, 'event':'sirq_raise'})
+ irq_record.update({'event_list':irq_event_list})
+ irq_dic[cpu].append(irq_record)
+
+def handle_irq_softirq_entry(event_info):
+ (name, context, cpu, time, pid, comm, vec) = event_info
+ net_rx_dic[cpu] = {'sirq_ent_t':time, 'event_list':[]}
+
+def handle_irq_softirq_exit(event_info):
+ (name, context, cpu, time, pid, comm, vec) = event_info
+ irq_list = []
+ event_list = 0
+ if cpu in irq_dic.keys():
+ irq_list = irq_dic[cpu]
+ del irq_dic[cpu]
+ if cpu in net_rx_dic.keys():
+ sirq_ent_t = net_rx_dic[cpu]['sirq_ent_t']
+ event_list = net_rx_dic[cpu]['event_list']
+ del net_rx_dic[cpu]
+ if irq_list == [] or event_list == 0:
+ return
+ rec_data = {'sirq_ent_t':sirq_ent_t, 'sirq_ext_t':time,
+ 'irq_list':irq_list, 'event_list':event_list}
+ # merge information realted to a NET_RX softirq
+ receive_hunk_list.append(rec_data)
+
+def handle_napi_poll(event_info):
+ (name, context, cpu, time, pid, comm, napi, dev_name) = event_info
+ if cpu in net_rx_dic.keys():
+ event_list = net_rx_dic[cpu]['event_list']
+ rec_data = {'event_name':'napi_poll',
+ 'dev':dev_name, 'event_t':time}
+ event_list.append(rec_data)
+
+def handle_netif_rx(event_info):
+ (name, context, cpu, time, pid, comm,
+ skbaddr, skblen, dev_name) = event_info
+ if cpu not in irq_dic.keys() \
+ or len(irq_dic[cpu]) == 0:
+ return
+ irq_record = irq_dic[cpu].pop()
+ if 'event_list' in irq_record.keys():
+ irq_event_list = irq_record['event_list']
+ else:
+ irq_event_list = []
+ irq_event_list.append({'time':time, 'event':'netif_rx',
+ 'skbaddr':skbaddr, 'skblen':skblen, 'dev_name':dev_name})
+ irq_record.update({'event_list':irq_event_list})
+ irq_dic[cpu].append(irq_record)
+
+def handle_netif_receive_skb(event_info):
+ global of_count_rx_skb_list
+
+ (name, context, cpu, time, pid, comm,
+ skbaddr, skblen, dev_name) = event_info
+ if cpu in net_rx_dic.keys():
+ rec_data = {'event_name':'netif_receive_skb',
+ 'event_t':time, 'skbaddr':skbaddr, 'len':skblen}
+ event_list = net_rx_dic[cpu]['event_list']
+ event_list.append(rec_data)
+ rx_skb_list.insert(0, rec_data)
+ if len(rx_skb_list) > buffer_budget:
+ rx_skb_list.pop()
+ of_count_rx_skb_list += 1
+
+def handle_net_dev_queue(event_info):
+ global of_count_tx_queue_list
+
+ (name, context, cpu, time, pid, comm,
+ skbaddr, skblen, dev_name) = event_info
+ skb = {'dev':dev_name, 'skbaddr':skbaddr, 'len':skblen, 'queue_t':time}
+ tx_queue_list.insert(0, skb)
+ if len(tx_queue_list) > buffer_budget:
+ tx_queue_list.pop()
+ of_count_tx_queue_list += 1
+
+def handle_net_dev_xmit(event_info):
+ global of_count_tx_xmit_list
+
+ (name, context, cpu, time, pid, comm,
+ skbaddr, skblen, rc, dev_name) = event_info
+ if rc == 0: # NETDEV_TX_OK
+ for i in range(len(tx_queue_list)):
+ skb = tx_queue_list[i]
+ if skb['skbaddr'] == skbaddr:
+ skb['xmit_t'] = time
+ tx_xmit_list.insert(0, skb)
+ del tx_queue_list[i]
+ if len(tx_xmit_list) > buffer_budget:
+ tx_xmit_list.pop()
+ of_count_tx_xmit_list += 1
+ return
+
+def handle_kfree_skb(event_info):
+ (name, context, cpu, time, pid, comm,
+ skbaddr, protocol, location) = event_info
+ for i in range(len(tx_queue_list)):
+ skb = tx_queue_list[i]
+ if skb['skbaddr'] == skbaddr:
+ del tx_queue_list[i]
+ return
+ for i in range(len(tx_xmit_list)):
+ skb = tx_xmit_list[i]
+ if skb['skbaddr'] == skbaddr:
+ skb['free_t'] = time
+ tx_free_list.append(skb)
+ del tx_xmit_list[i]
+ return
+ for i in range(len(rx_skb_list)):
+ rec_data = rx_skb_list[i]
+ if rec_data['skbaddr'] == skbaddr:
+ rec_data.update({'handle':"kfree_skb",
+ 'comm':comm, 'pid':pid, 'comm_t':time})
+ del rx_skb_list[i]
+ return
+
+def handle_consume_skb(event_info):
+ (name, context, cpu, time, pid, comm, skbaddr) = event_info
+ for i in range(len(tx_xmit_list)):
+ skb = tx_xmit_list[i]
+ if skb['skbaddr'] == skbaddr:
+ skb['free_t'] = time
+ tx_free_list.append(skb)
+ del tx_xmit_list[i]
+ return
+
+def handle_skb_copy_datagram_iovec(event_info):
+ (name, context, cpu, time, pid, comm, skbaddr, skblen) = event_info
+ for i in range(len(rx_skb_list)):
+ rec_data = rx_skb_list[i]
+ if skbaddr == rec_data['skbaddr']:
+ rec_data.update({'handle':"skb_copy_datagram_iovec",
+ 'comm':comm, 'pid':pid, 'comm_t':time})
+ del rx_skb_list[i]
+ return
extern char *perf_pathdup(const char *fmt, ...)
__attribute__((format (printf, 1, 2)));
+#ifdef NO_STRLCPY
extern size_t strlcpy(char *dest, const char *src, size_t size);
+#endif
#endif /* __PERF_CACHE_H */
#define chain_for_each_child(child, parent) \
list_for_each_entry(child, &parent->children, brothers)
+#define chain_for_each_child_safe(child, next, parent) \
+ list_for_each_entry_safe(child, next, &parent->children, brothers)
+
static void
rb_insert_callchain(struct rb_root *root, struct callchain_node *chain,
enum chain_mode mode)
* sort them by hit
*/
static void
-sort_chain_flat(struct rb_root *rb_root, struct callchain_node *node,
+sort_chain_flat(struct rb_root *rb_root, struct callchain_root *root,
u64 min_hit, struct callchain_param *param __used)
{
- __sort_chain_flat(rb_root, node, min_hit);
+ __sort_chain_flat(rb_root, &root->node, min_hit);
}
static void __sort_chain_graph_abs(struct callchain_node *node,
}
static void
-sort_chain_graph_abs(struct rb_root *rb_root, struct callchain_node *chain_root,
+sort_chain_graph_abs(struct rb_root *rb_root, struct callchain_root *chain_root,
u64 min_hit, struct callchain_param *param __used)
{
- __sort_chain_graph_abs(chain_root, min_hit);
- rb_root->rb_node = chain_root->rb_root.rb_node;
+ __sort_chain_graph_abs(&chain_root->node, min_hit);
+ rb_root->rb_node = chain_root->node.rb_root.rb_node;
}
static void __sort_chain_graph_rel(struct callchain_node *node,
}
static void
-sort_chain_graph_rel(struct rb_root *rb_root, struct callchain_node *chain_root,
+sort_chain_graph_rel(struct rb_root *rb_root, struct callchain_root *chain_root,
u64 min_hit __used, struct callchain_param *param)
{
- __sort_chain_graph_rel(chain_root, param->min_percent / 100.0);
- rb_root->rb_node = chain_root->rb_root.rb_node;
+ __sort_chain_graph_rel(&chain_root->node, param->min_percent / 100.0);
+ rb_root->rb_node = chain_root->node.rb_root.rb_node;
}
int register_callchain_param(struct callchain_param *param)
}
static int
-__append_chain(struct callchain_node *root, struct resolved_chain *chain,
- unsigned int start, u64 period);
+append_chain(struct callchain_node *root, struct resolved_chain *chain,
+ unsigned int start, u64 period);
static void
-__append_chain_children(struct callchain_node *root,
- struct resolved_chain *chain,
- unsigned int start, u64 period)
+append_chain_children(struct callchain_node *root, struct resolved_chain *chain,
+ unsigned int start, u64 period)
{
struct callchain_node *rnode;
/* lookup in childrens */
chain_for_each_child(rnode, root) {
- unsigned int ret = __append_chain(rnode, chain, start, period);
+ unsigned int ret = append_chain(rnode, chain, start, period);
if (!ret)
goto inc_children_hit;
}
static int
-__append_chain(struct callchain_node *root, struct resolved_chain *chain,
- unsigned int start, u64 period)
+append_chain(struct callchain_node *root, struct resolved_chain *chain,
+ unsigned int start, u64 period)
{
struct callchain_list *cnode;
unsigned int i = start;
}
/* We match the node and still have a part remaining */
- __append_chain_children(root, chain, i, period);
+ append_chain_children(root, chain, i, period);
return 0;
}
}
-int append_chain(struct callchain_node *root, struct ip_callchain *chain,
- struct map_symbol *syms, u64 period)
+int callchain_append(struct callchain_root *root, struct ip_callchain *chain,
+ struct map_symbol *syms, u64 period)
{
struct resolved_chain *filtered;
if (!filtered->nr)
goto end;
- __append_chain_children(root, filtered, 0, period);
+ append_chain_children(&root->node, filtered, 0, period);
+
+ if (filtered->nr > root->max_depth)
+ root->max_depth = filtered->nr;
end:
free(filtered);
return 0;
}
+
+static int
+merge_chain_branch(struct callchain_node *dst, struct callchain_node *src,
+ struct resolved_chain *chain)
+{
+ struct callchain_node *child, *next_child;
+ struct callchain_list *list, *next_list;
+ int old_pos = chain->nr;
+ int err = 0;
+
+ list_for_each_entry_safe(list, next_list, &src->val, list) {
+ chain->ips[chain->nr].ip = list->ip;
+ chain->ips[chain->nr].ms = list->ms;
+ chain->nr++;
+ list_del(&list->list);
+ free(list);
+ }
+
+ if (src->hit)
+ append_chain_children(dst, chain, 0, src->hit);
+
+ chain_for_each_child_safe(child, next_child, src) {
+ err = merge_chain_branch(dst, child, chain);
+ if (err)
+ break;
+
+ list_del(&child->brothers);
+ free(child);
+ }
+
+ chain->nr = old_pos;
+
+ return err;
+}
+
+int callchain_merge(struct callchain_root *dst, struct callchain_root *src)
+{
+ struct resolved_chain *chain;
+ int err;
+
+ chain = malloc(sizeof(*chain) +
+ src->max_depth * sizeof(struct resolved_ip));
+ if (!chain)
+ return -ENOMEM;
+
+ chain->nr = 0;
+
+ err = merge_chain_branch(&dst->node, &src->node, chain);
+
+ free(chain);
+
+ return err;
+}
u64 children_hit;
};
+struct callchain_root {
+ u64 max_depth;
+ struct callchain_node node;
+};
+
struct callchain_param;
-typedef void (*sort_chain_func_t)(struct rb_root *, struct callchain_node *,
+typedef void (*sort_chain_func_t)(struct rb_root *, struct callchain_root *,
u64, struct callchain_param *);
struct callchain_param {
struct list_head list;
};
-static inline void callchain_init(struct callchain_node *node)
+static inline void callchain_init(struct callchain_root *root)
{
- INIT_LIST_HEAD(&node->brothers);
- INIT_LIST_HEAD(&node->children);
- INIT_LIST_HEAD(&node->val);
+ INIT_LIST_HEAD(&root->node.brothers);
+ INIT_LIST_HEAD(&root->node.children);
+ INIT_LIST_HEAD(&root->node.val);
- node->parent = NULL;
- node->hit = 0;
+ root->node.parent = NULL;
+ root->node.hit = 0;
+ root->node.children_hit = 0;
+ root->max_depth = 0;
}
static inline u64 cumul_hits(struct callchain_node *node)
}
int register_callchain_param(struct callchain_param *param);
-int append_chain(struct callchain_node *root, struct ip_callchain *chain,
- struct map_symbol *syms, u64 period);
+int callchain_append(struct callchain_root *root, struct ip_callchain *chain,
+ struct map_symbol *syms, u64 period);
+int callchain_merge(struct callchain_root *dst, struct callchain_root *src);
bool ip_callchain__valid(struct ip_callchain *chain, const event_t *event);
#endif /* __PERF_CALLCHAIN_H */
static struct hist_entry *hist_entry__new(struct hist_entry *template)
{
- size_t callchain_size = symbol_conf.use_callchain ? sizeof(struct callchain_node) : 0;
+ size_t callchain_size = symbol_conf.use_callchain ? sizeof(struct callchain_root) : 0;
struct hist_entry *self = malloc(sizeof(*self) + callchain_size);
if (self != NULL) {
if (!cmp) {
iter->period += he->period;
+ if (symbol_conf.use_callchain)
+ callchain_merge(iter->callchain, he->callchain);
hist_entry__free(he);
return false;
}
return ".";
}
+#ifdef NO_STRLCPY
size_t strlcpy(char *dest, const char *src, size_t size)
{
size_t ret = strlen(src);
}
return ret;
}
-
+#endif
static char *get_pathname(void)
{
goto error;
}
tev->point.offset = pev->point.offset;
+ tev->point.retprobe = pev->point.retprobe;
tev->nargs = pev->nargs;
if (tev->nargs) {
tev->args = zalloc(sizeof(struct probe_trace_arg)
char buf[32], *ptr;
int ret, nscopes;
+ if (!is_c_varname(pf->pvar->var)) {
+ /* Copy raw parameters */
+ pf->tvar->value = strdup(pf->pvar->var);
+ if (pf->tvar->value == NULL)
+ return -ENOMEM;
+ if (pf->pvar->type) {
+ pf->tvar->type = strdup(pf->pvar->type);
+ if (pf->tvar->type == NULL)
+ return -ENOMEM;
+ }
+ if (pf->pvar->name) {
+ pf->tvar->name = strdup(pf->pvar->name);
+ if (pf->tvar->name == NULL)
+ return -ENOMEM;
+ } else
+ pf->tvar->name = NULL;
+ return 0;
+ }
+
if (pf->pvar->name)
pf->tvar->name = strdup(pf->pvar->name);
else {
if (pf->tvar->name == NULL)
return -ENOMEM;
- if (!is_c_varname(pf->pvar->var)) {
- /* Copy raw parameters */
- pf->tvar->value = strdup(pf->pvar->var);
- if (pf->tvar->value == NULL)
- return -ENOMEM;
- if (pf->pvar->type) {
- pf->tvar->type = strdup(pf->pvar->type);
- if (pf->tvar->type == NULL)
- return -ENOMEM;
- }
- return 0;
- }
-
pr_debug("Searching '%s' variable in context.\n",
pf->pvar->var);
/* Search child die for local variables and parameters. */
/* This function has no name. */
tev->point.offset = (unsigned long)pf->addr;
+ /* Return probe must be on the head of a subprogram */
+ if (pf->pev->point.retprobe) {
+ if (tev->point.offset != 0) {
+ pr_warning("Return probe must be on the head of"
+ " a real function\n");
+ return -EINVAL;
+ }
+ tev->point.retprobe = true;
+ }
+
pr_debug("Probe point found: %s+%lu\n", tev->point.symbol,
tev->point.offset);
struct hist_entry *pair;
struct rb_root sorted_chain;
};
- struct callchain_node callchain[0];
+ struct callchain_root callchain[0];
};
enum sort_type {
return fprintf(fp, "%s", sbuild_id);
}
+size_t dso__fprintf_symbols_by_name(struct dso *self, enum map_type type, FILE *fp)
+{
+ size_t ret = 0;
+ struct rb_node *nd;
+ struct symbol_name_rb_node *pos;
+
+ for (nd = rb_first(&self->symbol_names[type]); nd; nd = rb_next(nd)) {
+ pos = rb_entry(nd, struct symbol_name_rb_node, rb_node);
+ fprintf(fp, "%s\n", pos->sym.name);
+ }
+
+ return ret;
+}
+
size_t dso__fprintf(struct dso *self, enum map_type type, FILE *fp)
{
struct rb_node *nd;
int symbol__init(void)
{
+ if (symbol_conf.initialized)
+ return 0;
+
elf_version(EV_CURRENT);
if (symbol_conf.sort_by_name)
symbol_conf.priv_size += (sizeof(struct symbol_name_rb_node) -
symbol_conf.sym_list_str, "symbol") < 0)
goto out_free_comm_list;
+ symbol_conf.initialized = true;
return 0;
out_free_dso_list:
void symbol__exit(void)
{
+ if (!symbol_conf.initialized)
+ return;
strlist__delete(symbol_conf.sym_list);
strlist__delete(symbol_conf.dso_list);
strlist__delete(symbol_conf.comm_list);
vmlinux_path__exit();
symbol_conf.sym_list = symbol_conf.dso_list = symbol_conf.comm_list = NULL;
+ symbol_conf.initialized = false;
}
int machines__create_kernel_maps(struct rb_root *self, pid_t pid)
show_nr_samples,
use_callchain,
exclude_other,
- show_cpu_utilization;
+ show_cpu_utilization,
+ initialized;
const char *vmlinux_name,
*source_prefix,
*field_sep;
size_t machines__fprintf_dsos_buildid(struct rb_root *self, FILE *fp, bool with_hits);
size_t dso__fprintf_buildid(struct dso *self, FILE *fp);
+size_t dso__fprintf_symbols_by_name(struct dso *self, enum map_type type, FILE *fp);
size_t dso__fprintf(struct dso *self, enum map_type type, FILE *fp);
enum dso_origin {
register_python_scripting(&python_scripting_unsupported_ops);
}
#else
-struct scripting_ops python_scripting_ops;
+extern struct scripting_ops python_scripting_ops;
void setup_python_scripting(void)
{
register_perl_scripting(&perl_scripting_unsupported_ops);
}
#else
-struct scripting_ops perl_scripting_ops;
+extern struct scripting_ops perl_scripting_ops;
void setup_perl_scripting(void)
{
-#define _GNU_SOURCE
-#include <stdio.h>
-#undef _GNU_SOURCE
-/*
- * slang versions <= 2.0.6 have a "#if HAVE_LONG_LONG" that breaks
- * the build if it isn't defined. Use the equivalent one that glibc
- * has on features.h.
- */
-#include <features.h>
-#ifndef HAVE_LONG_LONG
-#define HAVE_LONG_LONG __GLIBC_HAVE_LONG_LONG
-#endif
#include <slang.h>
+#include "libslang.h"
+#include <linux/compiler.h>
#include <linux/list.h>
#include <linux/rbtree.h>
#include <stdlib.h>
#include "helpline.h"
#include "../color.h"
#include "../util.h"
+#include <stdio.h>
-#if SLANG_VERSION < 20104
-#define sltt_set_color(obj, name, fg, bg) \
- SLtt_set_color(obj,(char *)name, (char *)fg, (char *)bg)
-#else
-#define sltt_set_color SLtt_set_color
-#endif
-
-newtComponent newt_form__new(void);
-
-int ui_browser__percent_color(double percent, bool current)
+static int ui_browser__percent_color(double percent, bool current)
{
if (current)
return HE_COLORSET_SELECTED;
return HE_COLORSET_NORMAL;
}
+void ui_browser__set_color(struct ui_browser *self __used, int color)
+{
+ SLsmg_set_color(color);
+}
+
+void ui_browser__set_percent_color(struct ui_browser *self,
+ double percent, bool current)
+{
+ int color = ui_browser__percent_color(percent, current);
+ ui_browser__set_color(self, color);
+}
+
+void ui_browser__gotorc(struct ui_browser *self, int y, int x)
+{
+ SLsmg_gotorc(self->y + y, self->x + x);
+}
+
void ui_browser__list_head_seek(struct ui_browser *self, off_t offset, int whence)
{
struct list_head *head = self->entries;
nd = self->top;
while (nd != NULL) {
- SLsmg_gotorc(self->y + row, self->x);
+ ui_browser__gotorc(self, row, 0);
self->write(self, nd, row);
if (++row == self->height)
break;
int cols, rows;
newtGetScreenSize(&cols, &rows);
- if (self->width > cols - 4)
- self->width = cols - 4;
- self->height = rows - 5;
- if (self->height > self->nr_entries)
- self->height = self->nr_entries;
- self->y = (rows - self->height) / 2;
- self->x = (cols - self->width) / 2;
+ self->width = cols - 1;
+ self->height = rows - 2;
+ self->y = 1;
+ self->x = 0;
}
void ui_browser__reset_index(struct ui_browser *self)
self->seek(self, 0, SEEK_SET);
}
+void ui_browser__add_exit_key(struct ui_browser *self, int key)
+{
+ newtFormAddHotKey(self->form, key);
+}
+
+void ui_browser__add_exit_keys(struct ui_browser *self, int keys[])
+{
+ int i = 0;
+
+ while (keys[i] && i < 64) {
+ ui_browser__add_exit_key(self, keys[i]);
+ ++i;
+ }
+}
+
int ui_browser__show(struct ui_browser *self, const char *title,
const char *helpline, ...)
{
va_list ap;
+ int keys[] = { NEWT_KEY_UP, NEWT_KEY_DOWN, NEWT_KEY_PGUP,
+ NEWT_KEY_PGDN, NEWT_KEY_HOME, NEWT_KEY_END, ' ',
+ NEWT_KEY_LEFT, NEWT_KEY_ESCAPE, 'q', CTRL('c'), 0 };
- if (self->form != NULL) {
+ if (self->form != NULL)
newtFormDestroy(self->form);
- newtPopWindow();
- }
+
ui_browser__refresh_dimensions(self);
- newtCenteredWindow(self->width, self->height, title);
- self->form = newt_form__new();
+ self->form = newtForm(NULL, NULL, 0);
if (self->form == NULL)
return -1;
- self->sb = newtVerticalScrollbar(self->width, 0, self->height,
+ self->sb = newtVerticalScrollbar(self->width, 1, self->height,
HE_COLORSET_NORMAL,
HE_COLORSET_SELECTED);
if (self->sb == NULL)
return -1;
- newtFormAddHotKey(self->form, NEWT_KEY_UP);
- newtFormAddHotKey(self->form, NEWT_KEY_DOWN);
- newtFormAddHotKey(self->form, NEWT_KEY_PGUP);
- newtFormAddHotKey(self->form, NEWT_KEY_PGDN);
- newtFormAddHotKey(self->form, NEWT_KEY_HOME);
- newtFormAddHotKey(self->form, NEWT_KEY_END);
- newtFormAddHotKey(self->form, ' ');
+ SLsmg_gotorc(0, 0);
+ ui_browser__set_color(self, NEWT_COLORSET_ROOT);
+ slsmg_write_nstring(title, self->width);
+
+ ui_browser__add_exit_keys(self, keys);
newtFormAddComponent(self->form, self->sb);
va_start(ap, helpline);
void ui_browser__hide(struct ui_browser *self)
{
newtFormDestroy(self->form);
- newtPopWindow();
self->form = NULL;
ui_helpline__pop();
}
newtScrollbarSet(self->sb, self->index, self->nr_entries - 1);
row = self->refresh(self);
- SLsmg_set_color(HE_COLORSET_NORMAL);
+ ui_browser__set_color(self, HE_COLORSET_NORMAL);
SLsmg_fill_region(self->y + row, self->x,
self->height - row, self->width, ' ');
return 0;
}
-int ui_browser__run(struct ui_browser *self, struct newtExitStruct *es)
+int ui_browser__run(struct ui_browser *self)
{
+ struct newtExitStruct es;
+
if (ui_browser__refresh(self) < 0)
return -1;
while (1) {
off_t offset;
- newtFormRun(self->form, es);
+ newtFormRun(self->form, &es);
- if (es->reason != NEWT_EXIT_HOTKEY)
+ if (es.reason != NEWT_EXIT_HOTKEY)
break;
- if (is_exit_key(es->u.key))
- return es->u.key;
- switch (es->u.key) {
+ switch (es.u.key) {
case NEWT_KEY_DOWN:
if (self->index == self->nr_entries - 1)
break;
self->seek(self, -offset, SEEK_END);
break;
default:
- return es->u.key;
+ return es.u.key;
}
if (ui_browser__refresh(self) < 0)
return -1;
}
- return 0;
+ return -1;
}
unsigned int ui_browser__list_head_refresh(struct ui_browser *self)
pos = self->top;
list_for_each_from(pos, head) {
- SLsmg_gotorc(self->y + row, self->x);
+ ui_browser__gotorc(self, row, 0);
self->write(self, pos, row);
if (++row == self->height)
break;
};
-int ui_browser__percent_color(double percent, bool current);
+void ui_browser__set_color(struct ui_browser *self, int color);
+void ui_browser__set_percent_color(struct ui_browser *self,
+ double percent, bool current);
bool ui_browser__is_current_entry(struct ui_browser *self, unsigned row);
void ui_browser__refresh_dimensions(struct ui_browser *self);
void ui_browser__reset_index(struct ui_browser *self);
+void ui_browser__gotorc(struct ui_browser *self, int y, int x);
+void ui_browser__add_exit_key(struct ui_browser *self, int key);
+void ui_browser__add_exit_keys(struct ui_browser *self, int keys[]);
int ui_browser__show(struct ui_browser *self, const char *title,
const char *helpline, ...);
void ui_browser__hide(struct ui_browser *self);
int ui_browser__refresh(struct ui_browser *self);
-int ui_browser__run(struct ui_browser *self, struct newtExitStruct *es);
+int ui_browser__run(struct ui_browser *self);
void ui_browser__rb_tree_seek(struct ui_browser *self, off_t offset, int whence);
unsigned int ui_browser__rb_tree_refresh(struct ui_browser *self);
if (ol->offset != -1) {
struct objdump_line_rb_node *olrb = objdump_line__rb(ol);
- int color = ui_browser__percent_color(olrb->percent, current_entry);
- SLsmg_set_color(color);
+ ui_browser__set_percent_color(self, olrb->percent, current_entry);
slsmg_printf(" %7.2f ", olrb->percent);
if (!current_entry)
- SLsmg_set_color(HE_COLORSET_CODE);
+ ui_browser__set_color(self, HE_COLORSET_CODE);
} else {
- int color = ui_browser__percent_color(0, current_entry);
- SLsmg_set_color(color);
+ ui_browser__set_percent_color(self, 0, current_entry);
slsmg_write_nstring(" ", 9);
}
self->curr_hot = nd;
}
-static int annotate_browser__run(struct annotate_browser *self,
- struct newtExitStruct *es)
+static int annotate_browser__run(struct annotate_browser *self)
{
struct rb_node *nd;
struct hist_entry *he = self->b.priv;
+ int key;
if (ui_browser__show(&self->b, he->ms.sym->name,
- "<- or ESC: exit, TAB/shift+TAB: cycle thru samples") < 0)
+ "<-, -> or ESC: exit, TAB/shift+TAB: cycle thru samples") < 0)
return -1;
-
- newtFormAddHotKey(self->b.form, NEWT_KEY_LEFT);
- newtFormAddHotKey(self->b.form, NEWT_KEY_RIGHT);
+ /*
+ * To allow builtin-annotate to cycle thru multiple symbols by
+ * examining the exit key for this function.
+ */
+ ui_browser__add_exit_key(&self->b, NEWT_KEY_RIGHT);
nd = self->curr_hot;
if (nd) {
- newtFormAddHotKey(self->b.form, NEWT_KEY_TAB);
- newtFormAddHotKey(self->b.form, NEWT_KEY_UNTAB);
+ int tabs[] = { NEWT_KEY_TAB, NEWT_KEY_UNTAB, 0 };
+ ui_browser__add_exit_keys(&self->b, tabs);
}
while (1) {
- ui_browser__run(&self->b, es);
-
- if (es->reason != NEWT_EXIT_HOTKEY)
- break;
+ key = ui_browser__run(&self->b);
- switch (es->u.key) {
+ switch (key) {
case NEWT_KEY_TAB:
nd = rb_prev(nd);
if (nd == NULL)
}
out:
ui_browser__hide(&self->b);
- return es->u.key;
+ return key;
}
int hist_entry__tui_annotate(struct hist_entry *self)
{
- struct newtExitStruct es;
struct objdump_line *pos, *n;
struct objdump_line_rb_node *rbpos;
LIST_HEAD(head);
annotate_browser__set_top(&browser, browser.curr_hot);
browser.b.width += 18; /* Percentage */
- ret = annotate_browser__run(&browser, &es);
+ ret = annotate_browser__run(&browser);
list_for_each_entry_safe(pos, n, &head, node) {
list_del(&pos->node);
objdump_line__free(pos);
return map_symbol__folded(&self->ms);
}
+static void map_symbol__set_folding(struct map_symbol *self, bool unfold)
+{
+ self->unfolded = unfold ? self->has_children : false;
+}
+
static int callchain_node__count_rows_rb_tree(struct callchain_node *self)
{
int n = 0;
for (nd = rb_first(&self->rb_root); nd; nd = rb_next(nd)) {
struct callchain_node *child = rb_entry(nd, struct callchain_node, rb_node);
struct callchain_list *chain;
- int first = true;
+ bool first = true;
list_for_each_entry(chain, &child->val, list) {
if (first) {
first = false;
chain->ms.has_children = chain->list.next != &child->val ||
- rb_first(&child->rb_root) != NULL;
+ !RB_EMPTY_ROOT(&child->rb_root);
} else
chain->ms.has_children = chain->list.next == &child->val &&
- rb_first(&child->rb_root) != NULL;
+ !RB_EMPTY_ROOT(&child->rb_root);
}
callchain_node__init_have_children_rb_tree(child);
struct callchain_list *chain;
list_for_each_entry(chain, &self->val, list)
- chain->ms.has_children = rb_first(&self->rb_root) != NULL;
+ chain->ms.has_children = !RB_EMPTY_ROOT(&self->rb_root);
callchain_node__init_have_children_rb_tree(self);
}
static void hist_entry__init_have_children(struct hist_entry *self)
{
if (!self->init_have_children) {
+ self->ms.has_children = !RB_EMPTY_ROOT(&self->sorted_chain);
callchain__init_have_children(&self->sorted_chain);
self->init_have_children = true;
}
return false;
}
-static int hist_browser__run(struct hist_browser *self, const char *title,
- struct newtExitStruct *es)
+static int callchain_node__set_folding_rb_tree(struct callchain_node *self, bool unfold)
+{
+ int n = 0;
+ struct rb_node *nd;
+
+ for (nd = rb_first(&self->rb_root); nd; nd = rb_next(nd)) {
+ struct callchain_node *child = rb_entry(nd, struct callchain_node, rb_node);
+ struct callchain_list *chain;
+ bool has_children = false;
+
+ list_for_each_entry(chain, &child->val, list) {
+ ++n;
+ map_symbol__set_folding(&chain->ms, unfold);
+ has_children = chain->ms.has_children;
+ }
+
+ if (has_children)
+ n += callchain_node__set_folding_rb_tree(child, unfold);
+ }
+
+ return n;
+}
+
+static int callchain_node__set_folding(struct callchain_node *node, bool unfold)
+{
+ struct callchain_list *chain;
+ bool has_children = false;
+ int n = 0;
+
+ list_for_each_entry(chain, &node->val, list) {
+ ++n;
+ map_symbol__set_folding(&chain->ms, unfold);
+ has_children = chain->ms.has_children;
+ }
+
+ if (has_children)
+ n += callchain_node__set_folding_rb_tree(node, unfold);
+
+ return n;
+}
+
+static int callchain__set_folding(struct rb_root *chain, bool unfold)
+{
+ struct rb_node *nd;
+ int n = 0;
+
+ for (nd = rb_first(chain); nd; nd = rb_next(nd)) {
+ struct callchain_node *node = rb_entry(nd, struct callchain_node, rb_node);
+ n += callchain_node__set_folding(node, unfold);
+ }
+
+ return n;
+}
+
+static void hist_entry__set_folding(struct hist_entry *self, bool unfold)
+{
+ hist_entry__init_have_children(self);
+ map_symbol__set_folding(&self->ms, unfold);
+
+ if (self->ms.has_children) {
+ int n = callchain__set_folding(&self->sorted_chain, unfold);
+ self->nr_rows = unfold ? n : 0;
+ } else
+ self->nr_rows = 0;
+}
+
+static void hists__set_folding(struct hists *self, bool unfold)
+{
+ struct rb_node *nd;
+
+ self->nr_entries = 0;
+
+ for (nd = rb_first(&self->entries); nd; nd = rb_next(nd)) {
+ struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node);
+ hist_entry__set_folding(he, unfold);
+ self->nr_entries += 1 + he->nr_rows;
+ }
+}
+
+static void hist_browser__set_folding(struct hist_browser *self, bool unfold)
+{
+ hists__set_folding(self->hists, unfold);
+ self->b.nr_entries = self->hists->nr_entries;
+ /* Go to the start, we may be way after valid entries after a collapse */
+ ui_browser__reset_index(&self->b);
+}
+
+static int hist_browser__run(struct hist_browser *self, const char *title)
{
- char str[256], unit;
- unsigned long nr_events = self->hists->stats.nr_events[PERF_RECORD_SAMPLE];
+ int key;
+ int exit_keys[] = { 'a', '?', 'h', 'C', 'd', 'D', 'E', 't',
+ NEWT_KEY_ENTER, NEWT_KEY_RIGHT, NEWT_KEY_LEFT, 0, };
self->b.entries = &self->hists->entries;
self->b.nr_entries = self->hists->nr_entries;
hist_browser__refresh_dimensions(self);
- nr_events = convert_unit(nr_events, &unit);
- snprintf(str, sizeof(str), "Events: %lu%c ",
- nr_events, unit);
- newtDrawRootText(0, 0, str);
-
if (ui_browser__show(&self->b, title,
"Press '?' for help on key bindings") < 0)
return -1;
- newtFormAddHotKey(self->b.form, 'a');
- newtFormAddHotKey(self->b.form, '?');
- newtFormAddHotKey(self->b.form, 'h');
- newtFormAddHotKey(self->b.form, 'd');
- newtFormAddHotKey(self->b.form, 'D');
- newtFormAddHotKey(self->b.form, 't');
-
- newtFormAddHotKey(self->b.form, NEWT_KEY_LEFT);
- newtFormAddHotKey(self->b.form, NEWT_KEY_RIGHT);
- newtFormAddHotKey(self->b.form, NEWT_KEY_ENTER);
+ ui_browser__add_exit_keys(&self->b, exit_keys);
while (1) {
- ui_browser__run(&self->b, es);
+ key = ui_browser__run(&self->b);
- if (es->reason != NEWT_EXIT_HOTKEY)
- break;
- switch (es->u.key) {
+ switch (key) {
case 'D': { /* Debug */
static int seq;
struct hist_entry *h = rb_entry(self->b.top,
self->b.top_idx,
h->row_offset, h->nr_rows);
}
- continue;
+ break;
+ case 'C':
+ /* Collapse the whole world. */
+ hist_browser__set_folding(self, false);
+ break;
+ case 'E':
+ /* Expand the whole world. */
+ hist_browser__set_folding(self, true);
+ break;
case NEWT_KEY_ENTER:
if (hist_browser__toggle_fold(self))
break;
/* fall thru */
default:
- return 0;
+ goto out;
}
}
-
+out:
ui_browser__hide(&self->b);
- return 0;
+ return key;
}
static char *callchain_list__sym_name(struct callchain_list *self,
int color;
bool was_first = first;
- if (first) {
+ if (first)
first = false;
- chain->ms.has_children = chain->list.next != &child->val ||
- rb_first(&child->rb_root) != NULL;
- } else {
+ else
extra_offset = LEVEL_OFFSET_STEP;
- chain->ms.has_children = chain->list.next == &child->val &&
- rb_first(&child->rb_root) != NULL;
- }
folded_sign = callchain_list__folded(chain);
if (*row_offset != 0) {
*is_current_entry = true;
}
- SLsmg_set_color(color);
- SLsmg_gotorc(self->b.y + row, self->b.x);
+ ui_browser__set_color(&self->b, color);
+ ui_browser__gotorc(&self->b, row, 0);
slsmg_write_nstring(" ", offset + extra_offset);
slsmg_printf("%c ", folded_sign);
slsmg_write_nstring(str, width);
list_for_each_entry(chain, &node->val, list) {
char ipstr[BITS_PER_LONG / 4 + 1], *s;
int color;
- /*
- * FIXME: This should be moved to somewhere else,
- * probably when the callchain is created, so as not to
- * traverse it all over again
- */
- chain->ms.has_children = rb_first(&node->rb_root) != NULL;
+
folded_sign = callchain_list__folded(chain);
if (*row_offset != 0) {
}
s = callchain_list__sym_name(chain, ipstr, sizeof(ipstr));
- SLsmg_gotorc(self->b.y + row, self->b.x);
- SLsmg_set_color(color);
+ ui_browser__gotorc(&self->b, row, 0);
+ ui_browser__set_color(&self->b, color);
slsmg_write_nstring(" ", offset);
slsmg_printf("%c ", folded_sign);
slsmg_write_nstring(s, width - 2);
}
if (symbol_conf.use_callchain) {
- entry->ms.has_children = !RB_EMPTY_ROOT(&entry->sorted_chain);
+ hist_entry__init_have_children(entry);
folded_sign = hist_entry__folded(entry);
}
color = HE_COLORSET_NORMAL;
}
- SLsmg_set_color(color);
- SLsmg_gotorc(self->b.y + row, self->b.x);
+ ui_browser__set_color(&self->b, color);
+ ui_browser__gotorc(&self->b, row, 0);
if (symbol_conf.use_callchain) {
slsmg_printf("%c ", folded_sign);
width -= 2;
static void hist_browser__delete(struct hist_browser *self)
{
- newtFormDestroy(self->b.form);
- newtPopWindow();
free(self);
}
return self->he_selection->thread;
}
-static int hist_browser__title(char *bf, size_t size, const char *ev_name,
- const struct dso *dso, const struct thread *thread)
+static int hists__browser_title(struct hists *self, char *bf, size_t size,
+ const char *ev_name, const struct dso *dso,
+ const struct thread *thread)
{
- int printed = 0;
+ char unit;
+ int printed;
+ unsigned long nr_events = self->stats.nr_events[PERF_RECORD_SAMPLE];
+
+ nr_events = convert_unit(nr_events, &unit);
+ printed = snprintf(bf, size, "Events: %lu%c %s", nr_events, unit, ev_name);
if (thread)
printed += snprintf(bf + printed, size - printed,
- "Thread: %s(%d)",
- (thread->comm_set ? thread->comm : ""),
+ ", Thread: %s(%d)",
+ (thread->comm_set ? thread->comm : ""),
thread->pid);
if (dso)
printed += snprintf(bf + printed, size - printed,
- "%sDSO: %s", thread ? " " : "",
- dso->short_name);
- return printed ?: snprintf(bf, size, "Event: %s", ev_name);
+ ", DSO: %s", dso->short_name);
+ return printed;
}
int hists__browse(struct hists *self, const char *helpline, const char *ev_name)
struct pstack *fstack;
const struct thread *thread_filter = NULL;
const struct dso *dso_filter = NULL;
- struct newtExitStruct es;
char msg[160];
int key = -1;
ui_helpline__push(helpline);
- hist_browser__title(msg, sizeof(msg), ev_name,
- dso_filter, thread_filter);
-
+ hists__browser_title(self, msg, sizeof(msg), ev_name,
+ dso_filter, thread_filter);
while (1) {
const struct thread *thread;
const struct dso *dso;
annotate = -2, zoom_dso = -2, zoom_thread = -2,
browse_map = -2;
- if (hist_browser__run(browser, msg, &es))
- break;
+ key = hist_browser__run(browser, msg);
thread = hist_browser__selected_thread(browser);
dso = browser->selection->map ? browser->selection->map->dso : NULL;
- if (es.reason == NEWT_EXIT_HOTKEY) {
- key = es.u.key;
-
- switch (key) {
- case NEWT_KEY_F1:
- goto do_help;
- case NEWT_KEY_TAB:
- case NEWT_KEY_UNTAB:
- /*
- * Exit the browser, let hists__browser_tree
- * go to the next or previous
- */
- goto out_free_stack;
- default:;
- }
-
- switch (key) {
- case 'a':
- if (browser->selection->map == NULL &&
- browser->selection->map->dso->annotate_warned)
- continue;
- goto do_annotate;
- case 'd':
- goto zoom_dso;
- case 't':
- goto zoom_thread;
- case 'h':
- case '?':
-do_help:
- ui__help_window("-> Zoom into DSO/Threads & Annotate current symbol\n"
- "<- Zoom out\n"
- "a Annotate current symbol\n"
- "h/?/F1 Show this window\n"
- "d Zoom into current DSO\n"
- "t Zoom into current Thread\n"
- "q/CTRL+C Exit browser");
+ switch (key) {
+ case NEWT_KEY_TAB:
+ case NEWT_KEY_UNTAB:
+ /*
+ * Exit the browser, let hists__browser_tree
+ * go to the next or previous
+ */
+ goto out_free_stack;
+ case 'a':
+ if (browser->selection->map == NULL &&
+ browser->selection->map->dso->annotate_warned)
continue;
- default:;
- }
- if (is_exit_key(key)) {
- if (key == NEWT_KEY_ESCAPE &&
- !ui__dialog_yesno("Do you really want to exit?"))
- continue;
- break;
- }
-
- if (es.u.key == NEWT_KEY_LEFT) {
- const void *top;
+ goto do_annotate;
+ case 'd':
+ goto zoom_dso;
+ case 't':
+ goto zoom_thread;
+ case NEWT_KEY_F1:
+ case 'h':
+ case '?':
+ ui__help_window("-> Zoom into DSO/Threads & Annotate current symbol\n"
+ "<- Zoom out\n"
+ "a Annotate current symbol\n"
+ "h/?/F1 Show this window\n"
+ "C Collapse all callchains\n"
+ "E Expand all callchains\n"
+ "d Zoom into current DSO\n"
+ "t Zoom into current Thread\n"
+ "q/CTRL+C Exit browser");
+ continue;
+ case NEWT_KEY_ENTER:
+ case NEWT_KEY_RIGHT:
+ /* menu */
+ break;
+ case NEWT_KEY_LEFT: {
+ const void *top;
- if (pstack__empty(fstack))
- continue;
- top = pstack__pop(fstack);
- if (top == &dso_filter)
- goto zoom_out_dso;
- if (top == &thread_filter)
- goto zoom_out_thread;
+ if (pstack__empty(fstack))
continue;
- }
+ top = pstack__pop(fstack);
+ if (top == &dso_filter)
+ goto zoom_out_dso;
+ if (top == &thread_filter)
+ goto zoom_out_thread;
+ continue;
+ }
+ case NEWT_KEY_ESCAPE:
+ if (!ui__dialog_yesno("Do you really want to exit?"))
+ continue;
+ /* Fall thru */
+ default:
+ goto out_free_stack;
}
if (browser->selection->sym != NULL &&
pstack__push(fstack, &dso_filter);
}
hists__filter_by_dso(self, dso_filter);
- hist_browser__title(msg, sizeof(msg), ev_name,
- dso_filter, thread_filter);
+ hists__browser_title(self, msg, sizeof(msg), ev_name,
+ dso_filter, thread_filter);
hist_browser__reset(browser);
} else if (choice == zoom_thread) {
zoom_thread:
pstack__push(fstack, &thread_filter);
}
hists__filter_by_thread(self, thread_filter);
- hist_browser__title(msg, sizeof(msg), ev_name,
- dso_filter, thread_filter);
+ hists__browser_title(self, msg, sizeof(msg), ev_name,
+ dso_filter, thread_filter);
hist_browser__reset(browser);
}
}
const char *ev_name = __event_name(hists->type, hists->config);
key = hists__browse(hists, help, ev_name);
-
- if (is_exit_key(key))
- break;
-
switch (key) {
case NEWT_KEY_TAB:
next = rb_next(nd);
continue;
nd = rb_prev(nd);
default:
- break;
+ return key;
}
}
#include "../libslang.h"
#include <elf.h>
-#include <newt.h>
#include <sys/ttydefaults.h>
#include <ctype.h>
#include <string.h>
struct map_browser {
struct ui_browser b;
struct map *map;
- u16 namelen;
u8 addrlen;
};
struct symbol *sym = rb_entry(nd, struct symbol, rb_node);
struct map_browser *mb = container_of(self, struct map_browser, b);
bool current_entry = ui_browser__is_current_entry(self, row);
- int color = ui_browser__percent_color(0, current_entry);
+ int width;
- SLsmg_set_color(color);
+ ui_browser__set_percent_color(self, 0, current_entry);
slsmg_printf("%*llx %*llx %c ",
mb->addrlen, sym->start, mb->addrlen, sym->end,
sym->binding == STB_GLOBAL ? 'g' :
sym->binding == STB_LOCAL ? 'l' : 'w');
- slsmg_write_nstring(sym->name, mb->namelen);
+ width = self->width - ((mb->addrlen * 2) + 4);
+ if (width > 0)
+ slsmg_write_nstring(sym->name, width);
}
/* FIXME uber-kludgy, see comment on cmd_report... */
return 0;
}
-static int map_browser__run(struct map_browser *self, struct newtExitStruct *es)
+static int map_browser__run(struct map_browser *self)
{
+ int key;
+
if (ui_browser__show(&self->b, self->map->dso->long_name,
"Press <- or ESC to exit, %s / to search",
verbose ? "" : "restart with -v to use") < 0)
return -1;
- newtFormAddHotKey(self->b.form, NEWT_KEY_LEFT);
- newtFormAddHotKey(self->b.form, NEWT_KEY_ENTER);
if (verbose)
- newtFormAddHotKey(self->b.form, '/');
+ ui_browser__add_exit_key(&self->b, '/');
while (1) {
- ui_browser__run(&self->b, es);
+ key = ui_browser__run(&self->b);
- if (es->reason != NEWT_EXIT_HOTKEY)
- break;
- if (verbose && es->u.key == '/')
+ if (verbose && key == '/')
map_browser__search(self);
else
break;
}
ui_browser__hide(&self->b);
- return 0;
+ return key;
}
int map__browse(struct map *self)
},
.map = self,
};
- struct newtExitStruct es;
struct rb_node *nd;
char tmp[BITS_PER_LONG / 4];
u64 maxaddr = 0;
for (nd = rb_first(mb.b.entries); nd; nd = rb_next(nd)) {
struct symbol *pos = rb_entry(nd, struct symbol, rb_node);
- if (mb.namelen < pos->namelen)
- mb.namelen = pos->namelen;
if (maxaddr < pos->end)
maxaddr = pos->end;
if (verbose) {
}
mb.addrlen = snprintf(tmp, sizeof(tmp), "%llx", maxaddr);
- mb.b.width += mb.addrlen * 2 + 4 + mb.namelen;
- return map_browser__run(&mb, &es);
+ return map_browser__run(&mb);
}
#include "helpline.h"
#include "util.h"
-newtComponent newt_form__new(void);
-
static void newt_form__set_exit_keys(newtComponent self)
{
newtFormAddHotKey(self, NEWT_KEY_LEFT);
newtFormAddHotKey(self, CTRL('c'));
}
-newtComponent newt_form__new(void)
+static newtComponent newt_form__new(void)
{
newtComponent self = newtForm(NULL, NULL, 0);
if (self)
bool strlazymatch(const char *str, const char *pat);
unsigned long convert_unit(unsigned long value, char *unit);
-#ifndef ESC
-#define ESC 27
-#endif
-
-static inline bool is_exit_key(int key)
-{
- char up;
- if (key == CTRL('c') || key == ESC)
- return true;
- up = toupper(key);
- return up == 'Q';
-}
-
#define _STR(x) #x
#define STR(x) _STR(x)
events = file->f_op->poll(file, &irqfd->pt);
list_add_tail(&irqfd->list, &kvm->irqfds.items);
- spin_unlock_irq(&kvm->irqfds.lock);
/*
* Check if there was an event already pending on the eventfd
if (events & POLLIN)
schedule_work(&irqfd->inject);
+ spin_unlock_irq(&kvm->irqfds.lock);
+
/*
* do not drop the file until the irqfd is fully initialized, otherwise
* we might race against the POLLHUP
cpu);
hardware_disable(NULL);
break;
- case CPU_ONLINE:
+ case CPU_STARTING:
printk(KERN_INFO "kvm: enabling virtualization on CPU%d\n",
cpu);
- smp_call_function_single(cpu, hardware_enable, NULL, 1);
+ hardware_enable(NULL);
break;
}
return NOTIFY_OK;
asmlinkage void kvm_handle_fault_on_reboot(void)
{
- if (kvm_rebooting)
+ if (kvm_rebooting) {
/* spin while reset goes on */
+ local_irq_enable();
while (true)
;
+ }
/* Fault while not rebooting. We want the trace. */
BUG();
}
static struct notifier_block kvm_cpu_notifier = {
.notifier_call = kvm_cpu_hotplug,
- .priority = 20, /* must be > scheduler priority */
};
static int vm_stat_get(void *_offset, u64 *val)