Yinghai Lu [Sat, 17 Nov 2012 03:39:13 +0000 (19:39 -0800)]
x86, mm: use PFN_DOWN in split_mem_range()
to replace own inline version for shifting.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-37-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:12 +0000 (19:39 -0800)]
x86, mm: use round_up/down in split_mem_range()
to replace own inline version for those roundup and rounddown.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-36-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:11 +0000 (19:39 -0800)]
x86, mm: Add check before clear pte above max_low_pfn on 32bit
During test patch that adjust page_size_mask to map small range ram with
big page size, found page table is setup wrongly for 32bit. And
native_pagetable_init wrong clear pte for pmd with large page support.
1. add more comments about why we are expecting pte.
2. add BUG checking, so next time we could find problem earlier
when we mess up page table setup again.
3. max_low_pfn is not included boundary for low memory mapping.
We should check from max_low_pfn instead of +1.
4. add print out when some pte really get cleared, or we should use
WARN() to find out why above max_low_pfn get mapped? so we could
fix it.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-35-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:10 +0000 (19:39 -0800)]
x86, mm: Move function declaration into mm_internal.h
They are only for mm/init*.c.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-34-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:09 +0000 (19:39 -0800)]
x86, mm: change low/hignmem_pfn_init to static on 32bit
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-33-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:08 +0000 (19:39 -0800)]
x86, mm: Move init_gbpages() out of setup.c
Put it in mm/init.c, and call it from probe_page_mask().
init_mem_mapping is calling probe_page_mask at first.
So calling sequence is not changed.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-32-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:07 +0000 (19:39 -0800)]
x86, mm: Move back pgt_buf_* to mm/init.c
Also change them to static.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-31-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:06 +0000 (19:39 -0800)]
x86, mm: only call early_ioremap_page_table_range_init() once
On 32bit, before patcheset that only set page table for ram, we only
call that one time.
Now, we are calling that during every init_memory_mapping if we have holes
under max_low_pfn.
We should only call it one time after all ranges under max_low_page get
mapped just like we did before.
Also that could avoid the risk to run out of pgt_buf in BRK.
Need to update page_table_range_init() to count the pages for kmap page table
at first, and use new added alloc_low_pages() to get pages in sequence.
That will conform to the requirement that pages need to be in low to high order.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-30-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Stefano Stabellini [Sat, 17 Nov 2012 03:39:05 +0000 (19:39 -0800)]
x86, mm: Add pointer about Xen mmu requirement for alloc_low_pages
Add link for more information
279b706 x86,xen: introduce x86_init.mapping.pagetable_reserve
-v2: updated to commets from hpa to include commit name.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-29-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:04 +0000 (19:39 -0800)]
x86, mm: Add alloc_low_pages(num)
32bit kmap mapping needs pages to be used for low to high.
At this point those pages are still from pgt_buf_* from BRK, so it is
ok now.
But we want to move early_ioremap_page_table_range_init() out of
init_memory_mapping() and only call it one time later, that will
make page_table_range_init/page_table_kmap_check/alloc_low_page to
use memblock to get page.
memblock allocation for pages are from high to low.
So will get panic from page_table_kmap_check() that has BUG_ON to do
ordering checking.
This patch add alloc_low_pages to make it possible to allocate serveral
pages at first, and hand out pages one by one from low to high.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-28-git-send-email-yinghai@kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:03 +0000 (19:39 -0800)]
x86, mm, Xen: Remove mapping_pagetable_reserve()
Page table area are pre-mapped now after
x86, mm: setup page table in top-down
x86, mm: Remove early_memremap workaround for page table accessing on 64bit
mapping_pagetable_reserve is not used anymore, so remove it.
Also remove operation in mask_rw_pte(), as modified allow_low_page
always return pages that are already mapped, moreover
xen_alloc_pte_init, xen_alloc_pmd_init, etc, will mark the page RO
before hooking it into the pagetable automatically.
-v2: add changelog about mask_rw_pte() from Stefano.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-27-git-send-email-yinghai@kernel.org
Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:02 +0000 (19:39 -0800)]
x86, mm: Move min_pfn_mapped back to mm/init.c
Also change it to static.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-26-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:01 +0000 (19:39 -0800)]
x86, mm: Merge alloc_low_page between 64bit and 32bit
They are almost same except 64 bit need to handle after_bootmem case.
Add mm_internal.h to make that alloc_low_page() only to be accessible
from arch/x86/mm/init*.c
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-25-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:39:00 +0000 (19:39 -0800)]
x86, mm: Remove parameter in alloc_low_page for 64bit
Now all page table buf are pre-mapped, and could use virtual address directly.
So don't need to remember physical address anymore.
Remove that phys pointer in alloc_low_page(), and that will allow us to merge
alloc_low_page between 64bit and 32bit.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-24-git-send-email-yinghai@kernel.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:59 +0000 (19:38 -0800)]
x86, mm: Remove early_memremap workaround for page table accessing on 64bit
We try to put page table high to make room for kdump, and at that time
those ranges are not mapped yet, and have to use ioremap to access it.
Now after patch that pre-map page table top down.
x86, mm: setup page table in top-down
We do not need that workaround anymore.
Just use __va to return directly mapping address.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-23-git-send-email-yinghai@kernel.org
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:58 +0000 (19:38 -0800)]
x86, mm: setup page table in top-down
Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first.
Then use mapped pages to map more ranges below, and keep looping until
all pages get mapped.
alloc_low_page will use page from BRK at first, after that buffer is used
up, will use memblock to find and reserve pages for page table usage.
Introduce min_pfn_mapped to make sure find new pages from mapped ranges,
that will be updated when lower pages get mapped.
Also add step_size to make sure that don't try to map too big range with
limited mapped pages initially, and increase the step_size when we have
more mapped pages on hand.
We don't need to call pagetable_reserve anymore, reserve work is done
in alloc_low_page() directly.
At last we can get rid of calculation and find early pgt related code.
-v2: update to after fix_xen change,
also use MACRO for initial pgt_buf size and add comments with it.
-v3: skip big reserved range in memblock.reserved near end.
-v4: don't need fix_xen change now.
-v5: add changelog about moving about reserving pagetable to alloc_low_page.
Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-22-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:57 +0000 (19:38 -0800)]
x86, mm: Break down init_all_memory_mapping
Will replace that with top-down page table initialization.
New API need to take range: init_range_memory_mapping()
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-21-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:56 +0000 (19:38 -0800)]
x86, mm: Don't clear page table if range is ram
After we add code use buffer in BRK to pre-map buf for page table in
following patch:
x86, mm: setup page table in top-down
it should be safe to remove early_memmap for page table accessing.
Instead we get panic with that.
It turns out that we clear the initial page table wrongly for next range
that is separated by holes.
And it only happens when we are trying to map ram range one by one.
We need to check if the range is ram before clearing page table.
We change the loop structure to remove the extra little loop and use
one loop only, and in that loop will caculate next at first, and check if
[addr,next) is covered by E820_RAM.
-v2: E820_RESERVED_KERN is treated as E820_RAM. EFI one change some E820_RAM
to that, so next kernel by kexec will know that range is used already.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-20-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:55 +0000 (19:38 -0800)]
x86, mm: Use big page size for small memory range
We could map small range in the middle of big range at first, so should use
big page size at first to avoid using small page size to break down page table.
Only can set big page bit when that range has ram area around it.
-v2: fix 32bit boundary checking. We can not count ram above max_low_pfn
for 32 bit.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-19-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:54 +0000 (19:38 -0800)]
x86, mm: Align start address to correct big page size
We are going to use buffer in BRK to map small range just under memory top,
and use those new mapped ram to map ram range under it.
The ram range that will be mapped at first could be only page aligned,
but ranges around it are ram too, we could use bigger page to map it to
avoid small page size.
We will adjust page_size_mask in following patch:
x86, mm: Use big page size for small memory range
to use big page size for small ram range.
Before that patch, this patch will make sure start address to be
aligned down according to bigger page size, otherwise entry in page
page will not have correct value.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-18-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:53 +0000 (19:38 -0800)]
x86, mm: relocate initrd under all mem for 64bit
instead of under 4g.
For 64bit, we can use any mapped mem instead of low mem.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-17-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Jacob Shin [Sat, 17 Nov 2012 03:38:52 +0000 (19:38 -0800)]
x86, mm: Only direct map addresses that are marked as E820_RAM
Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
backed by actual DRAM. This is fine for holes under 4GB which are covered
by fixed and variable range MTRRs to be UC. However, we run into trouble
on higher memory addresses which cannot be covered by MTRRs.
Our system with 1TB of RAM has an e820 that looks like this:
BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
and so direct mappings are created for huge memory hole between
0x000000e038000000 to 0x0000010000000000. Even though the kernel never
generates memory accesses in that region, since the page tables mark
them incorrectly as being WB, our (AMD) processor ends up causing a MCE
while doing some memory bookkeeping/optimizations around that area.
This patch iterates through e820 and only direct maps ranges that are
marked as E820_RAM, and keeps track of those pfn ranges. Depending on
the alignment of E820 ranges, this may possibly result in using smaller
size (i.e. 4K instead of 2M or 1G) page tables.
-v2: move changes from setup.c to mm/init.c, also use for_each_mem_pfn_range
instead. - Yinghai Lu
-v3: add calculate_all_table_space_size() to get correct needed page table
size. - Yinghai Lu
-v4: fix add_pfn_range_mapped() to get correct max_low_pfn_mapped when
mem map does have hole under 4g that is found by Konard on xen
domU with 8g ram. - Yinghai
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1353123563-3103-16-git-send-email-yinghai@kernel.org
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:51 +0000 (19:38 -0800)]
x86, mm: use pfn_range_is_mapped() with reserve_initrd
We are going to map ram only, so under max_low_pfn_mapped,
between 4g and max_pfn_mapped does not mean mapped at all.
Use pfn_range_is_mapped() to find out if range is mapped for initrd.
That could happen bootloader put initrd in range but user could
use memmap to carve some of range out.
Also during copying need to use early_memmap to map original initrd
for accessing.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-15-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:50 +0000 (19:38 -0800)]
x86, mm: use pfn_range_is_mapped() with gart
We are going to map ram only, so under max_low_pfn_mapped,
between 4g and max_pfn_mapped does not mean mapped at all.
Use pfn_range_is_mapped() directly.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-14-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:49 +0000 (19:38 -0800)]
x86, mm: use pfn_range_is_mapped() with CPA
We are going to map ram only, so under max_low_pfn_mapped,
between 4g and max_pfn_mapped does not mean mapped at all.
Use pfn_range_is_mapped() directly.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-13-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Jacob Shin [Sat, 17 Nov 2012 03:38:48 +0000 (19:38 -0800)]
x86, mm: Fixup code testing if a pfn is direct mapped
Update code that previously assumed pfns [ 0 - max_low_pfn_mapped ) and
[ 4GB - max_pfn_mapped ) were always direct mapped, to now look up
pfn_mapped ranges instead.
-v2: change applying sequence to keep git bisecting working.
so add dummy pfn_range_is_mapped(). - Yinghai Lu
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1353123563-3103-12-git-send-email-yinghai@kernel.org
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Jacob Shin [Sat, 17 Nov 2012 03:38:47 +0000 (19:38 -0800)]
x86, mm: if kernel .text .data .bss are not marked as E820_RAM, complain and fix
There could be cases where user supplied memmap=exactmap memory
mappings do not mark the region where the kernel .text .data and
.bss reside as E820_RAM, as reported here:
https://lkml.org/lkml/2012/8/14/86
Handle it by complaining, and adding the range back into the e820.
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Link: http://lkml.kernel.org/r/1353123563-3103-11-git-send-email-yinghai@kernel.org
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:46 +0000 (19:38 -0800)]
x86, mm: Set memblock initial limit to 1M
memblock_x86_fill() could double memory array.
If we set memblock.current_limit to 512M, so memory array could be around 512M.
So kdump will not get big range (like 512M) under 1024M.
Try to put it down under 1M, it would use about 4k or so, and that is limited.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-10-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:45 +0000 (19:38 -0800)]
x86, mm: Separate out calculate_table_space_size()
It should take physical address range that will need to be mapped.
find_early_table_space should take range that pgt buff should be in.
Separating page table size calculating and finding early page table to
reduce confusing.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-9-git-send-email-yinghai@kernel.org
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:44 +0000 (19:38 -0800)]
x86, mm: Find early page table buffer together
We should not do that in every calling of init_memory_mapping.
At the same time need to move down early_memtest, and could remove after_bootmem
checking.
-v2: fix one early_memtest with 32bit by passing max_pfn_mapped instead.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-8-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:43 +0000 (19:38 -0800)]
x86, mm: Change find_early_table_space() paramters
call split_mem_range inside the function.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-7-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:42 +0000 (19:38 -0800)]
x86, mm: Revert back good_end setting for 64bit
After
| commit
8548c84da2f47e71bbbe300f55edb768492575f7
| Author: Takashi Iwai <tiwai@suse.de>
| Date: Sun Oct 23 23:19:12 2011 +0200
|
| x86: Fix S4 regression
|
| Commit
4b239f458 ("x86-64, mm: Put early page table high") causes a S4
| regression since 2.6.39, namely the machine reboots occasionally at S4
| resume. It doesn't happen always, overall rate is about 1/20. But,
| like other bugs, once when this happens, it continues to happen.
|
| This patch fixes the problem by essentially reverting the memory
| assignment in the older way.
Have some page table around 512M again, that will prevent kdump to find 512M
under 768M.
We need revert that reverting, so we could put page table high again for 64bit.
Takashi agreed that S4 regression could be something else.
https://lkml.org/lkml/2012/6/15/182
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-6-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:41 +0000 (19:38 -0800)]
x86, mm: Move init_memory_mapping calling out of setup.c
Now init_memory_mapping is called two times, later will be called for every
ram ranges.
Could put all related init_mem calling together and out of setup.c.
Actually, it reverts commit 1bbbbe7
x86: Exclude E820_RESERVED regions and memory holes above 4 GB from direct mapping.
will address that later with complete solution include handling hole under 4g.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-5-git-send-email-yinghai@kernel.org
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:40 +0000 (19:38 -0800)]
x86, mm: Move down find_early_table_space()
It will need to call split_mem_range().
Move it down after that to avoid extra declaration.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-4-git-send-email-yinghai@kernel.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:39 +0000 (19:38 -0800)]
x86, mm: Split out split_mem_range from init_memory_mapping
So make init_memory_mapping smaller and readable.
-v2: use 0 instead of nr_range as input parameter found by Yasuaki Ishimatsu.
Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-3-git-send-email-yinghai@kernel.org
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Yinghai Lu [Sat, 17 Nov 2012 03:38:38 +0000 (19:38 -0800)]
x86, mm: Add global page_size_mask and probe one time only
Now we pass around use_gbpages and use_pse for calculating page table size,
Later we will need to call init_memory_mapping for every ram range one by one,
that mean those calculation will be done several times.
Those information are the same for all ram range and could be stored in
page_size_mask and could be probed it one time only.
Move that probing code out of init_memory_mapping into separated function
probe_page_size_mask(), and call it before all init_memory_mapping.
Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/r/1353123563-3103-2-git-send-email-yinghai@kernel.org
Reviewed-by: Pekka Enberg <penberg@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Linus Torvalds [Sat, 17 Nov 2012 01:42:40 +0000 (17:42 -0800)]
Linux 3.7-rc6
Linus Torvalds [Sat, 17 Nov 2012 00:49:10 +0000 (16:49 -0800)]
Merge git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fix from Marcelo Tosatti:
"A correction for oops on module init with older Intel hosts."
* git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update()
Linus Torvalds [Fri, 16 Nov 2012 23:26:38 +0000 (15:26 -0800)]
Merge branch 'akpm' (Fixes from Andrew)
Merge misc fixes from Andrew Morton.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (12 patches)
revert "mm: fix-up zone present pages"
tmpfs: change final i_blocks BUG to WARNING
tmpfs: fix shmem_getpage_gfp() VM_BUG_ON
mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address
mm: revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"
rapidio: fix kernel-doc warnings
swapfile: fix name leak in swapoff
memcg: fix hotplugged memory zone oops
mips, arc: fix build failure
memcg: oom: fix totalpages calculation for memory.swappiness==0
mm: fix build warning for uninitialized value
mm: add anon_vma_lock to validate_mm()
Andrew Morton [Fri, 16 Nov 2012 22:15:06 +0000 (14:15 -0800)]
revert "mm: fix-up zone present pages"
Revert commit
7f1290f2f2a4 ("mm: fix-up zone present pages")
That patch tried to fix a issue when calculating zone->present_pages,
but it caused a regression on 32bit systems with HIGHMEM. With that
change, reset_zone_present_pages() resets all zone->present_pages to
zero, and fixup_zone_present_pages() is called to recalculate
zone->present_pages when the boot allocator frees core memory pages into
buddy allocator. Because highmem pages are not freed by bootmem
allocator, all highmem zones' present_pages becomes zero.
Various options for improving the situation are being discussed but for
now, let's return to the 3.6 code.
Cc: Jianguo Wu <wujianguo@huawei.com>
Cc: Jiang Liu <jiang.liu@huawei.com>
Cc: Petr Tesarik <ptesarik@suse.cz>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: David Rientjes <rientjes@google.com>
Tested-by: Chris Clayton <chris2553@googlemail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 16 Nov 2012 22:15:04 +0000 (14:15 -0800)]
tmpfs: change final i_blocks BUG to WARNING
Under a particular load on one machine, I have hit shmem_evict_inode()'s
BUG_ON(inode->i_blocks), enough times to narrow it down to a particular
race between swapout and eviction.
It comes from the "if (freed > 0)" asymmetry in shmem_recalc_inode(),
and the lack of coherent locking between mapping's nrpages and shmem's
swapped count. There's a window in shmem_writepage(), between lowering
nrpages in shmem_delete_from_page_cache() and then raising swapped
count, when the freed count appears to be +1 when it should be 0, and
then the asymmetry stops it from being corrected with -1 before hitting
the BUG.
One answer is coherent locking: using tree_lock throughout, without
info->lock; reasonable, but the raw_spin_lock in percpu_counter_add() on
used_blocks makes that messier than expected. Another answer may be a
further effort to eliminate the weird shmem_recalc_inode() altogether,
but previous attempts at that failed.
So far undecided, but for now change the BUG_ON to WARN_ON: in usual
circumstances it remains a useful consistency check.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 16 Nov 2012 22:15:03 +0000 (14:15 -0800)]
tmpfs: fix shmem_getpage_gfp() VM_BUG_ON
Fuzzing with trinity hit the "impossible" VM_BUG_ON(error) (which Fedora
has converted to WARNING) in shmem_getpage_gfp():
WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
Call Trace:
warn_slowpath_common+0x7f/0xc0
warn_slowpath_null+0x1a/0x20
shmem_getpage_gfp+0xa5c/0xa70
shmem_fault+0x4f/0xa0
__do_fault+0x71/0x5c0
handle_pte_fault+0x97/0xae0
handle_mm_fault+0x289/0x350
__do_page_fault+0x18e/0x530
do_page_fault+0x2b/0x50
page_fault+0x28/0x30
tracesys+0xe1/0xe6
Thanks to Johannes for pointing to truncation: free_swap_and_cache()
only does a trylock on the page, so the page lock we've held since
before confirming swap is not enough to protect against truncation.
What cleanup is needed in this case? Just delete_from_swap_cache(),
which takes care of the memcg uncharge.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reported-by: Dave Jones <davej@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Will Deacon [Fri, 16 Nov 2012 22:15:00 +0000 (14:15 -0800)]
mm: highmem: don't treat PKMAP_ADDR(LAST_PKMAP) as a highmem address
kmap_to_page returns the corresponding struct page for a virtual address
of an arbitrary mapping. This works by checking whether the address
falls in the pkmap region and using the pkmap page tables instead of the
linear mapping if appropriate.
Unfortunately, the bounds checking means that PKMAP_ADDR(LAST_PKMAP) is
incorrectly treated as a highmem address and we can end up walking off
the end of pkmap_page_table and subsequently passing junk to pte_page.
This patch fixes the bound check to stay within the pkmap tables.
Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Fri, 16 Nov 2012 22:14:59 +0000 (14:14 -0800)]
mm: revert "mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures"
Jiri Slaby reported the following:
(It's an effective revert of "mm: vmscan: scale number of pages
reclaimed by reclaim/compaction based on failures".) Given kswapd
had hours of runtime in ps/top output yesterday in the morning
and after the revert it's now 2 minutes in sum for the last 24h,
I would say, it's gone.
The intention of the patch in question was to compensate for the loss of
lumpy reclaim. Part of the reason lumpy reclaim worked is because it
aggressively reclaimed pages and this patch was meant to be a sane
compromise.
When compaction fails, it gets deferred and both compaction and
reclaim/compaction is deferred avoid excessive reclaim. However, since
commit
c654345924f7 ("mm: remove __GFP_NO_KSWAPD"), kswapd is woken up
each time and continues reclaiming which was not taken into account when
the patch was developed.
Attempts to address the problem ended up just changing the shape of the
problem instead of fixing it. The release window gets closer and while
a THP allocation failing is not a major problem, kswapd chewing up a lot
of CPU is.
This patch reverts commit
83fde0f22872 ("mm: vmscan: scale number of
pages reclaimed by reclaim/compaction based on failures") and will be
revisited in the future.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Zdenek Kabelac <zkabelac@redhat.com>
Tested-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Fri, 16 Nov 2012 22:14:56 +0000 (14:14 -0800)]
rapidio: fix kernel-doc warnings
Fix rapidio kernel-doc warnings:
Warning(drivers/rapidio/rio.c:415): No description found for parameter 'local'
Warning(drivers/rapidio/rio.c:415): Excess function parameter 'lstart' description in 'rio_map_inb_region'
Warning(include/linux/rio.h:290): No description found for parameter 'switches'
Warning(include/linux/rio.h:290): No description found for parameter 'destid_table'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Matt Porter <mporter@kernel.crashing.org>
Acked-by: Alexandre Bounine <alexandre.bounine@idt.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Xiaotian Feng [Fri, 16 Nov 2012 22:14:55 +0000 (14:14 -0800)]
swapfile: fix name leak in swapoff
There's a name leak introduced by commit
91a27b2a7567 ("vfs: define
struct filename and have getname() return it"). Add the missing
putname.
[akpm@linux-foundation.org: cleanup]
Signed-off-by: Xiaotian Feng <dannyfeng@tencent.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Fri, 16 Nov 2012 22:14:54 +0000 (14:14 -0800)]
memcg: fix hotplugged memory zone oops
When MEMCG is configured on (even when it's disabled by boot option),
when adding or removing a page to/from its lru list, the zone pointer
used for stats updates is nowadays taken from the struct lruvec. (On
many configurations, calculating zone from page is slower.)
But we have no code to update all the lruvecs (per zone, per memcg) when
a memory node is hotadded. Here's an extract from the oops which
results when running numactl to bind a program to a newly onlined node:
BUG: unable to handle kernel NULL pointer dereference at
0000000000000f60
IP: __mod_zone_page_state+0x9/0x60
Pid: 1219, comm: numactl Not tainted 3.6.0-rc5+ #180 Bochs Bochs
Process numactl (pid: 1219, threadinfo
ffff880039abc000, task
ffff8800383c4ce0)
Call Trace:
__pagevec_lru_add_fn+0xdf/0x140
pagevec_lru_move_fn+0xb1/0x100
__pagevec_lru_add+0x1c/0x30
lru_add_drain_cpu+0xa3/0x130
lru_add_drain+0x2f/0x40
...
The natural solution might be to use a memcg callback whenever memory is
hotadded; but that solution has not been scoped out, and it happens that
we do have an easy location at which to update lruvec->zone. The lruvec
pointer is discovered either by mem_cgroup_zone_lruvec() or by
mem_cgroup_page_lruvec(), and both of those do know the right zone.
So check and set lruvec->zone in those; and remove the inadequate
attempt to set lruvec->zone from lruvec_init(), which is called before
NODE_DATA(node) has been allocated in such cases.
Ah, there was one exceptionr. For no particularly good reason,
mem_cgroup_force_empty_list() has its own code for deciding lruvec.
Change it to use the standard mem_cgroup_zone_lruvec() and
mem_cgroup_get_lru_size() too. In fact it was already safe against such
an oops (the lru lists in danger could only be empty), but we're better
proofed against future changes this way.
I've marked this for stable (3.6) since we introduced the problem in 3.5
(now closed to stable); but I have no idea if this is the only fix
needed to get memory hotadd working with memcg in 3.6, and received no
answer when I enquired twice before.
Reported-by: Tang Chen <tangchen@cn.fujitsu.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Fri, 16 Nov 2012 22:14:52 +0000 (14:14 -0800)]
mips, arc: fix build failure
Using a cross-compiler to fix another issue, the following build error
occurred for mips defconfig:
arch/mips/fw/arc/misc.c: In function 'ArcHalt':
arch/mips/fw/arc/misc.c:25:2: error: implicit declaration of function 'local_irq_disable'
Fix it up by including irqflags.h.
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Fri, 16 Nov 2012 22:14:49 +0000 (14:14 -0800)]
memcg: oom: fix totalpages calculation for memory.swappiness==0
oom_badness() takes a totalpages argument which says how many pages are
available and it uses it as a base for the score calculation. The value
is calculated by mem_cgroup_get_limit which considers both limit and
total_swap_pages (resp. memsw portion of it).
This is usually correct but since
fe35004fbf9e ("mm: avoid swapping out
with swappiness==0") we do not swap when swappiness is 0 which means
that we cannot really use up all the totalpages pages. This in turn
confuses oom score calculation if the memcg limit is much smaller than
the available swap because the used memory (capped by the limit) is
negligible comparing to totalpages so the resulting score is too small
if adj!=0 (typically task with CAP_SYS_ADMIN or non zero oom_score_adj).
A wrong process might be selected as result.
The problem can be worked around by checking mem_cgroup_swappiness==0
and not considering swap at all in such a case.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Fri, 16 Nov 2012 22:14:48 +0000 (14:14 -0800)]
mm: fix build warning for uninitialized value
do_wp_page() sets mmun_called if mmun_start and mmun_end were
initialized and, if so, may call mmu_notifier_invalidate_range_end()
with these values. This doesn't prevent gcc from emitting a build
warning though:
mm/memory.c: In function `do_wp_page':
mm/memory.c:2530: warning: `mmun_start' may be used uninitialized in this function
mm/memory.c:2531: warning: `mmun_end' may be used uninitialized in this function
It's much easier to initialize the variables to impossible values and do
a simple comparison to determine if they were initialized to remove the
bool entirely.
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michel Lespinasse [Fri, 16 Nov 2012 22:14:47 +0000 (14:14 -0800)]
mm: add anon_vma_lock to validate_mm()
Iterating over the vma->anon_vma_chain without anon_vma_lock may cause
NULL ptr deref in anon_vma_interval_tree_verify(), because the node in the
chain might have been removed.
BUG: unable to handle kernel paging request at
fffffffffffffff0
IP: [<
ffffffff8122c29c>] anon_vma_interval_tree_verify+0xc/0xa0
PGD 4e28067 PUD 4e29067 PMD 0
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
CPU 0
Pid: 9050, comm: trinity-child64 Tainted: G W 3.7.0-rc2-next-
20121025-sasha-00001-g673f98e-dirty #77
RIP: 0010: anon_vma_interval_tree_verify+0xc/0xa0
Process trinity-child64 (pid: 9050, threadinfo
ffff880045f80000, task
ffff880048eb0000)
Call Trace:
validate_mm+0x58/0x1e0
vma_adjust+0x635/0x6b0
__split_vma.isra.22+0x161/0x220
split_vma+0x24/0x30
sys_madvise+0x5da/0x7b0
tracesys+0xe1/0xe6
RIP anon_vma_interval_tree_verify+0xc/0xa0
CR2:
fffffffffffffff0
Figured out by Bob Liu.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Bob Liu <lliubbo@gmail.com>
Signed-off-by: Michel Lespinasse <walken@google.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Takashi Iwai [Fri, 9 Nov 2012 14:20:17 +0000 (15:20 +0100)]
KVM: x86: Fix invalid secondary exec controls in vmx_cpuid_update()
The commit [
ad756a16: KVM: VMX: Implement PCID/INVPCID for guests with
EPT] introduced the unconditional access to SECONDARY_VM_EXEC_CONTROL,
and this triggers kernel warnings like below on old CPUs:
vmwrite error: reg 401e value
a0568000 (err 12)
Pid: 13649, comm: qemu-kvm Not tainted 3.7.0-rc4-test2+ #154
Call Trace:
[<
ffffffffa0558d86>] vmwrite_error+0x27/0x29 [kvm_intel]
[<
ffffffffa054e8cb>] vmcs_writel+0x1b/0x20 [kvm_intel]
[<
ffffffffa054f114>] vmx_cpuid_update+0x74/0x170 [kvm_intel]
[<
ffffffffa03629b6>] kvm_vcpu_ioctl_set_cpuid2+0x76/0x90 [kvm]
[<
ffffffffa0341c67>] kvm_arch_vcpu_ioctl+0xc37/0xed0 [kvm]
[<
ffffffff81143f7c>] ? __vunmap+0x9c/0x110
[<
ffffffffa0551489>] ? vmx_vcpu_load+0x39/0x1a0 [kvm_intel]
[<
ffffffffa0340ee2>] ? kvm_arch_vcpu_load+0x52/0x1a0 [kvm]
[<
ffffffffa032dcd4>] ? vcpu_load+0x74/0xd0 [kvm]
[<
ffffffffa032deb0>] kvm_vcpu_ioctl+0x110/0x5e0 [kvm]
[<
ffffffffa032e93d>] ? kvm_dev_ioctl+0x4d/0x4a0 [kvm]
[<
ffffffff8117dc6f>] do_vfs_ioctl+0x8f/0x530
[<
ffffffff81139d76>] ? remove_vma+0x56/0x60
[<
ffffffff8113b708>] ? do_munmap+0x328/0x400
[<
ffffffff81187c8c>] ? fget_light+0x4c/0x100
[<
ffffffff8117e1a1>] sys_ioctl+0x91/0xb0
[<
ffffffff815a942d>] system_call_fastpath+0x1a/0x1f
This patch adds a check for the availability of secondary exec
control to avoid these warnings.
Cc: <stable@vger.kernel.org> [v3.6+]
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Linus Torvalds [Fri, 16 Nov 2012 22:10:15 +0000 (14:10 -0800)]
Merge git://git./linux/kernel/git/davem/net
Pull networking updates from David Miller:
1) tx_filtered/ps_tx_buf queues need to be accessed with the SKB queue
lock, from Arik Nemtsov.
2) Don't call 802.11 driver's filter configure method until it's
actually open, from Felix Fietkau.
3) Use ieee80211_free_txskb otherwise we leak control information.
From Johannes Berg.
4) Fix memory leak in bluetooth UUID removal,f rom Johan Hedberg.
5) The shift mask trick doesn't work properly when 'optname' is out of
range in do_ip_setsockopt(). Use a straightforward switch statement
instead, the compiler emits essentially the same code but without
the missing range check. From Xi Wang.
6) Fix when we call tcp_replace_ts_recent() otherwise we can
erroneously accept a too-high tsval. From Eric Dumazet.
7) VXLAN bug fixes, mostly to do with VLAN header length handling, from
Alexander Duyck.
8) Missing return value initialization for IPV6_MINHOPCOUNT socket
option handling. From Hannes Frederic.
9) Fix regression in tasklet handling in jme/ksz884x/xilinx drivers,
from Xiaotian Feng.
10) At smsc911x driver init time, we don't know if the chip is in word
swap mode or not. However we do need to wait for the control
register's ready bit to be set before we program any other part of
the chip. Adjust the wait loop to account for this. From Kamlakant
Patel.
11) Revert erroneous MDIO bus unregister change to mdio-bitbang.c
12) Fix memory leak in /proc/net/sctp/, from Tommi Rantala.
13) tilegx driver registers IRQ with NULL name, oops, from Simon Marchi.
14) TCP metrics hash table kzalloc() based allocation can fail, back
down to using vmalloc() if it does. From Eric Dumazet.
15) Fix packet steering out-of-order delivery regression, from Tom
Herbert.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (40 commits)
net-rps: Fix brokeness causing OOO packets
tcp: handle tcp_net_metrics_init() order-5 memory allocation failures
batman-adv: process broadcast packets in BLA earlier
batman-adv: don't add TEMP clients belonging to other backbone nodes
batman-adv: correctly pass the client flag on tt_response
batman-adv: fix tt_global_entries flags update
tilegx: request_irq with a non-null device name
net: correct check in dev_addr_del()
tcp: fix retransmission in repair mode
sctp: fix /proc/net/sctp/ memory leak
Revert "drivers/net/phy/mdio-bitbang.c: Call mdiobus_unregister before mdiobus_free"
net/smsc911x: Fix ready check in cases where WORD_SWAP is needed
drivers/net: fix tasklet misuse issue
ipv4/ip_vti.c: VTI fix post-decryption forwarding
brcmfmac: fix typo in CONFIG_BRCMISCAN
vxlan: Update hard_header_len based on lowerdev when instantiating VXLAN
vxlan: fix a typo.
ipv6: setsockopt(IPIPPROTO_IPV6, IPV6_MINHOPCOUNT) forgot to set return value
doc/net: Fix typo in netdev-features.txt
vxlan: Fix error that was resulting in VXLAN MTU size being 10 bytes too large
...
David S. Miller [Fri, 16 Nov 2012 19:37:18 +0000 (14:37 -0500)]
Merge branch 'for-davem' of git://git./linux/kernel/git/linville/wireless
John W. Linville says:
====================
This batch of fixes is intended for the 3.7 stream...
This includes a pull of the Bluetooth tree. Gustavo says:
"A few important fixes to go into 3.7. There is a new hw support by Marcos
Chaparro. Johan added a memory leak fix and hci device index list fix.
Also Marcel fixed a race condition in the device set up that was prevent the
bt monitor to work properly. Last, Paulo Sérgio added a fix to the error
status when pairing for LE fails. This was prevent userspace to work to handle
the failure properly."
Regarding the mac80211 pull, Johannes says:
"I have a locking fix for some SKB queues, a variable initialization to
avoid crashes in a certain failure case, another free_txskb fix from
Felix and another fix from him to avoid calling a stopped driver, a fix
for a (very unlikely) memory leak and a fix to not send null data
packets when resuming while not associated."
Regarding the iwlwifi pull, Johannes says:
"Two more fixes for iwlwifi ... one to use ieee80211_free_txskb(), and
one to check DMA mapping errors, please pull."
On top of that, Johannes also included a wireless regulatory fix
to allow 40 MHz on channels 12 and 13 in world roaming mode. Also,
Hauke Mehrtens fixes a #ifdef typo in brcmfmac.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Fri, 16 Nov 2012 09:04:15 +0000 (09:04 +0000)]
net-rps: Fix brokeness causing OOO packets
In commit
c445477d74ab3779 which adds aRFS to the kernel, the CPU
selected for RFS is not set correctly when CPU is changing.
This is causing OOO packets and probably other issues.
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Nov 2012 18:38:12 +0000 (13:38 -0500)]
Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge
Included fixes are:
- update the client entry status flags when using the "early client
detection". This makes the Distributed AP isolation correctly work;
- transfer the client entry status flags when recovering the translation
table from another node. This makes the Distributed AP isolation correctly
work;
- prevent the "early client detection mechanism" to add clients belonging to
other backbone nodes in the same LAN. This breaks connectivity when using this
mechanism together with the Bridge Loop Avoidance
- process broadcast packets with the Bridge Loop Avoidance before any other
component. BLA can possibly drop the packets based on the source address. This
makes the "early client detection mechanism" correctly work when used with
BLA.
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 16 Nov 2012 05:31:53 +0000 (05:31 +0000)]
tcp: handle tcp_net_metrics_init() order-5 memory allocation failures
order-5 allocations can fail with current kernels, we should
try vmalloc() as well.
Reported-by: Julien Tinnes <jln@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Zhang Rui [Thu, 15 Nov 2012 00:58:27 +0000 (08:58 +0800)]
Thermal: Add Linux/Thermal subsystem info in MAINTAINER file
All the changes made to the generic thermal layer, or platform thermal
drivers that make use of the thermal layer, should be sent to
linux-pm@vger.kernel.org for discussion.
And as the maintainer, I will only apply the patches that have been sent
to linux-pm@vger.kernel.org.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Tue, 13 Nov 2012 01:53:04 +0000 (17:53 -0800)]
mm, oom: reintroduce /proc/pid/oom_adj
This is mostly a revert of
01dc52ebdf47 ("oom: remove deprecated oom_adj")
from Davidlohr Bueso.
It reintroduces /proc/pid/oom_adj for backwards compatibility with earlier
kernels. It simply scales the value linearly when /proc/pid/oom_score_adj
is written.
The major difference is that its scheduled removal is no longer included
in Documentation/feature-removal-schedule.txt. We do warn users with a
single printk, though, to suggest the more powerful and supported
/proc/pid/oom_score_adj interface.
Reported-by: Artem S. Tashkinov <t.artem@lycos.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 16 Nov 2012 18:08:45 +0000 (10:08 -0800)]
Merge tag 'fixes-for-linus' of git://git./linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"We've been sitting on this longer than we meant to due to travel and
other activities, but the number of patches is luckily not that high.
Biggest changes are from a batch of OMAP bugfixes, but there are a few
for the broader set of SoCs too (bcm2835, pxa, highbank, tegra, at91
and i.MX).
The OMAP patches contain some fixes for MUSB/PHY on omap4 which ends
up being a bit on the large side but needed for legacy (non-DT)
platforms. Beyond that there are a handful of hwmod/pm changes.
So, fairly noncontroversial stuff all in all, and as usual around this
time the fixes are well targeted at specific problems."
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: imx: ehci: fix host power mask bit
ARM i.MX: fix error-valued pointer dereference in clk_register_gate2()
ARM: at91/usbh: fix overcurrent gpio setup
ARM: at91/AT91SAM9G45: fix crypto peripherals irq issue due to sparse irq support
ARM: boot: Fix usage of kecho
ARM: OMAP: ocp2scp: create omap device for ocp2scp
ARM: OMAP4: add _dev_attr_ to ocp2scp for representing usb_phy
drivers: bus: ocp2scp: add pdata support
irqchip: irq-bcm2835: Add terminating entry for of_device_id table
ARM: highbank: retry wfi on reset request
ARM: OMAP4: PM: fix regulator name for VDD_MPU
ARM: OMAP4: hwmod data: do not enable or reset the McPDM during kernel init
ARM: OMAP2+: hwmod: add flag to prevent hwmod code from touching IP block during init
ARM: dt: tegra: fix length of pad control and mux registers
ARM: OMAP: hwmod: wait for sysreset complete after enabling hwmod
ARM: OMAP2+: clockdomain: Fix OMAP4 ISS clk domain to support only SWSUP
ARM: pxa/spitz_pm: Fix hang when resuming from STR
ARM: pxa: hx4700: Fix backlight PWM device number
ARM: OMAP2+: PM: add missing newline to VC warning message
John W. Linville [Fri, 16 Nov 2012 17:59:13 +0000 (12:59 -0500)]
Merge branch 'master' of git://git./linux/kernel/git/linville/wireless into for-davem
Linus Torvalds [Fri, 16 Nov 2012 16:32:07 +0000 (08:32 -0800)]
Merge tag 'arm64-fixes' of git://git./linux/kernel/git/cmarinas/linux-aarch64
Pull arm64 bugfix from Catalin Marinas:
"Arm64 page permission bug fix.
Without this fix, the CPU speculatively accesses the interrupt
controller memory causing random IRQ acknowledge."
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
arm64: Distinguish between user and kernel XN bits
Linus Torvalds [Fri, 16 Nov 2012 15:58:20 +0000 (07:58 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/jikos/hid
Pull HID fix from Jiri Kosina:
"This has a build fix for architectures where memcmp() is macro, from
Jiri Slaby"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: microsoft: do not use compound literal - fix build
Catalin Marinas [Thu, 15 Nov 2012 17:21:16 +0000 (17:21 +0000)]
arm64: Distinguish between user and kernel XN bits
On AArch64, the meaning of the XN bit has changed to UXN (user). The PXN
(privileged) bit must be set to prevent kernel execution. Without the
PXN bit set, the CPU may speculatively access device memory. This patch
ensures that all the mappings that the kernel must not execute from
(including user mappings) have the PXN bit set.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Fri, 16 Nov 2012 15:47:18 +0000 (07:47 -0800)]
Merge tag 'usb-3.7-rc5' of git://git./linux/kernel/git/gregkh/usb
Pull USB fixes from Greg Kroah-Hartman:
"Here are some USB fixes for the 3.7 tree.
Nothing huge here, just a number of tiny bugfixes resolving issues
that have been found, and two reverts of patches that were found to
have caused problems.
All of these have been in linux-next already.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'usb-3.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
Revert "USB/host: Cleanup unneccessary irq disable code"
USB: option: add Alcatel X220/X500D USB IDs
USB: option: add Novatel E362 and Dell Wireless 5800 USB IDs
USB: keyspan: fix typo causing GPF on open
USB: fix build with XEN and EARLY_PRINTK_DBGP enabled but USB_SUPPORT disabled
USB: usb_wwan: fix bulk-urb allocation
usb: otg: Fix build errors if USB_MUSB_OMAP2PLUS is selected as module
usb: musb: ux500: fix 'musbid' undeclared error in ux500_remove()
Revert "usb: musb: use DMA mode 1 whenever possible"
Linus Torvalds [Fri, 16 Nov 2012 15:46:38 +0000 (07:46 -0800)]
Merge tag 'tty-3.7-rc5' of git://git./linux/kernel/git/gregkh/tty
Pull TTY fixes from Greg Kroah-Hartman:
"Here are two TTY driver fixes for 3.7-rc5.
They resolve a bug in the hvc driver that has been reported, and fix a
problem with the list of device ids in the max310x serial driver.
Both have been in linux-next for a while.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'tty-3.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: serial: max310x: Add terminating entry for spi_device_id table
TTY: hvc_console, fix port reference count going to zero prematurely
Linus Torvalds [Fri, 16 Nov 2012 15:46:04 +0000 (07:46 -0800)]
Merge tag 'staging-3.7-rc5' of git://git./linux/kernel/git/gregkh/staging
Pull staging tree fix from Greg Kroah-Hartman:
"Here is a single patch, a revert of an android driver patch, that
resolves a bug that has been reported in the Android alarm driver.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
* tag 'staging-3.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
Revert "Staging: Android alarm: IOCTL command encoding fix"
Arnd Bergmann [Fri, 16 Nov 2012 15:43:58 +0000 (16:43 +0100)]
Merge tag 'at91-fixes' of git://github.com/at91linux/linux-at91 into fixes
From Nicolas Ferre <nicolas.ferre@atmel.com>:
Two little fixes, one related to the move to sparse irq and
another one fixing the check of a GPIO for USB host overcurrent.
* tag 'at91-fixes' of git://github.com/at91linux/linux-at91:
ARM: at91/usbh: fix overcurrent gpio setup
ARM: at91/AT91SAM9G45: fix crypto peripherals irq issue due to sparse irq support
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Arnd Bergmann [Fri, 16 Nov 2012 15:42:59 +0000 (16:42 +0100)]
Merge tag 'imx-fixes-rc' of git://git.pengutronix.de/git/imx/linux-2.6 into fixes
From Sascha Hauer <s.hauer@pengutronix.de>:
ARM i.MX fixes for 3.7-rc
* tag 'imx-fixes-rc' of git://git.pengutronix.de/git/imx/linux-2.6:
ARM: imx: ehci: fix host power mask bit
ARM i.MX: fix error-valued pointer dereference in clk_register_gate2()
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Linus Torvalds [Fri, 16 Nov 2012 15:39:30 +0000 (07:39 -0800)]
Merge branch 'for-linus' of git://git./linux/kernel/git/s390/linux
Pull s390 patches from Martin Schwidefsky:
"Some more bug fixes and a config change.
The signal bug is nasty, if the clock_gettime vdso function is
interrupted by a signal while in access-register-mode we end up with
an endless signal loop until the signal stack is full. The config
change is for aligned struct pages, gives us 8% improvement with
hackbench."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/3215: fix tty close handling
s390/mm: have 16 byte aligned struct pages
s390/gup: fix access_ok() usage in __get_user_pages_fast()
s390/gup: add missing TASK_SIZE check to get_user_pages_fast()
s390/topology: fix core id vs physical package id mix-up
s390/signal: set correct address space control
Linus Torvalds [Fri, 16 Nov 2012 15:32:32 +0000 (07:32 -0800)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"All pretty normal: one TTM oops fix, one radeon, a few intel and a
vmwgfx fix."
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
drm/ttm: remove unneeded preempt_disable/enable
ttm: Clear the ttm page allocated from high memory zone correctly
vmwgfx: return an -EFAULT if copy_to_user() fails
drm/radeon: fix logic error in atombios_encoders.c
drm/i915: do not ignore eDP bpc settings from vbt
drm/i915/sdvo: clean up connectors on intel_sdvo_init() failures
drm/i915/crt: fix DPMS standby and suspend mode handling
Linus Torvalds [Fri, 16 Nov 2012 15:19:45 +0000 (07:19 -0800)]
Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux
Pull another clk layer fix from Michael Turquette:
"GCC 4.7 users get compilation errors from unnecessary use of inline in
clk-provider.h. This pull request fixes the regression by removing
inline usage from those function declarations."
* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mturquette/linux:
clk: remove inline usage from clk-provider.h
Christoph Fritz [Fri, 16 Nov 2012 14:39:24 +0000 (15:39 +0100)]
ARM: imx: ehci: fix host power mask bit
This patch sets HPM (Host power mask bit) to bit 16 according to i.MX
Reference Manual. Falsely it was set to bit 8, but this controls pull-up
Impedance.
Reported-by: Michael Burkey <mdburkey@gmail.com>
Cc: Stable <stable@vger.kernel.org>
Signed-off-by: Christoph Fritz <chf.fritz@googlemail.com>
Acked-by: Eric Bénard <eric@eukrea.com>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Wei Yongjun [Thu, 25 Oct 2012 15:02:18 +0000 (23:02 +0800)]
ARM i.MX: fix error-valued pointer dereference in clk_register_gate2()
The error-valued pointer clk is used for the arg of kfree, it should be
kfree(gate) if clk_register() return ERR_PTR().
dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Sascha Hauer <s.hauer@pengutronix.de>
Johan Hovold [Wed, 14 Nov 2012 11:18:17 +0000 (12:18 +0100)]
ARM: at91/usbh: fix overcurrent gpio setup
Use gpio_is_valid also for overcurrent pins (which are currently
negative in many board files).
Signed-off-by: Johan Hovold <jhovold@gmail.com>
Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Nicolas Royer [Tue, 6 Nov 2012 16:31:03 +0000 (17:31 +0100)]
ARM: at91/AT91SAM9G45: fix crypto peripherals irq issue due to sparse irq support
Spare irq support introduced by commit 8fe82a5 (ARM: at91: sparse irq support)
involves to add the NR_IRQS_LEGACY offset to irq number.
Signed-off-by: Nicolas Royer <nicolas@eukrea.com>
Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com>
Acked-by: Eric Bénard <eric@eukrea.com>
Tested-by: Eric Bénard <eric@eukrea.com>
Cc: stable@vger.kernel.org # 3.6
Antonio Quartulli [Thu, 8 Nov 2012 20:55:30 +0000 (21:55 +0100)]
batman-adv: process broadcast packets in BLA earlier
The logic in the BLA mechanism may decide to drop broadcast packets
because the node may still be in the setup phase. For this reason,
further broadcast processing like the early client detection mechanism
must be done only after the BLA check.
This patches moves the invocation to BLA before any other broadcast
processing.
This was introduced
30cfd02b60e1cb16f5effb0a01f826c5bb7e4c59
("batman-adv: detect not yet announced clients")
Reported-by: Glen Page <glen.page@thet.net>
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Antonio Quartulli [Thu, 8 Nov 2012 20:55:29 +0000 (21:55 +0100)]
batman-adv: don't add TEMP clients belonging to other backbone nodes
The "early client detection" mechanism must not add clients belonging
to other backbone nodes. Such clients must be reached by directly
using the LAN instead of the mesh.
This was introduced by
30cfd02b60e1cb16f5effb0a01f826c5bb7e4c59
("batman-adv: detect not yet announced clients")
Reported-by: Glen Page <glen.page@thet.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Antonio Quartulli [Thu, 8 Nov 2012 13:21:11 +0000 (14:21 +0100)]
batman-adv: correctly pass the client flag on tt_response
When a TT response with the full table is sent, the client flags
should be sent as well. This patch fix the flags assignment when
populating the tt_response to send back
This was introduced by
30cfd02b60e1cb16f5effb0a01f826c5bb7e4c59
("batman-adv: detect not yet announced clients")
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Antonio Quartulli [Wed, 7 Nov 2012 14:05:33 +0000 (15:05 +0100)]
batman-adv: fix tt_global_entries flags update
Flags carried by a change_entry have to be always copied into the
client entry as they may contain important attributes (e.g.
TT_CLIENT_WIFI).
For instance, a client added by means of the "early detection
mechanism" has no flag set at the beginning, so they must be updated once the
proper ADD event is received.
This was introduced by
30cfd02b60e1cb16f5effb0a01f826c5bb7e4c59
("batman-adv: detect not yet announced clients")
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Simon Marchi [Thu, 15 Nov 2012 18:13:19 +0000 (18:13 +0000)]
tilegx: request_irq with a non-null device name
This patch simply makes the tilegx net driver call request_irq with a
non-null name. It makes the output in /proc/interrupts more obvious, but
also helps tools that don't expect to find null there.
Signed-off-by: Simon Marchi <simon.marchi@polymtl.ca>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Airlie [Fri, 16 Nov 2012 00:00:43 +0000 (10:00 +1000)]
Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel into drm-fixes
Daniel writes:
Just a few small things to fix regressions, somehow all patches from Jani:
- Fix dpms confusion about which platforms support intermediate modes on
vga.
- Revert the "ignore vbt for eDP bpc" patch, it breaks machines. This will
annoy mbp retina owners again, but windows machines seem to _really_
depend upon this. We can try to quirk the mbp retinas again in 3.8 and
backport the patch.
- Fix connector leaks when the sdvo setup failed, resulted in an OOPS
later on when trying to probe that connector (with it's encoder kfree'd
already).
* 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel:
drm/i915: do not ignore eDP bpc settings from vbt
drm/i915/sdvo: clean up connectors on intel_sdvo_init() failures
drm/i915/crt: fix DPMS standby and suspend mode handling
Dave Airlie [Fri, 16 Nov 2012 00:00:24 +0000 (10:00 +1000)]
Merge branch 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
Just a single radeon fix from Alex.
* 'drm-fixes-3.7' of git://people.freedesktop.org/~agd5f/linux:
drm/radeon: fix logic error in atombios_encoders.c
Akinobu Mita [Fri, 9 Nov 2012 12:10:43 +0000 (12:10 +0000)]
drm/ttm: remove unneeded preempt_disable/enable
It is unnecessary to disable preemption explicitly while calling
copy_highpage(). Because copy_highpage() will do it again through
kmap_atomic/kunmap_atomic.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Zhao Yakui [Tue, 13 Nov 2012 18:31:55 +0000 (18:31 +0000)]
ttm: Clear the ttm page allocated from high memory zone correctly
The TTM page can be allocated from high memory. In such case it is
wrong to use the page_address(page) as the virtual address for the high memory
page.
bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=50241
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Dan Carpenter [Mon, 12 Nov 2012 11:07:24 +0000 (11:07 +0000)]
vmwgfx: return an -EFAULT if copy_to_user() fails
copy_to_user() returns the number of bytes remaining to be copied, but
we want to return a negative error code here. I fixed a couple of these
last year, but I missed this one.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Jiri Pirko [Wed, 14 Nov 2012 02:51:04 +0000 (02:51 +0000)]
net: correct check in dev_addr_del()
Check (ha->addr == dev->dev_addr) is always true because dev_addr_init()
sets this. Correct the check to behave properly on addr removal.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Vagin [Thu, 15 Nov 2012 04:03:17 +0000 (04:03 +0000)]
tcp: fix retransmission in repair mode
Currently if a socket was repaired with a few packet in a write queue,
a kernel bug may be triggered:
kernel BUG at net/ipv4/tcp_output.c:2330!
RIP: 0010:[<
ffffffff8155784f>] tcp_retransmit_skb+0x5ff/0x610
According to the initial realization v3.4-rc2-963-gc0e88ff,
all skb-s should look like already posted. This patch fixes code
according with this sentence.
Here are three points, which were not done in the initial patch:
1. A tcp send head should not be changed
2. Initialize TSO state of a skb
3. Reset the retransmission time
This patch moves logic from tcp_sendmsg to tcp_write_xmit. A packet
passes the ussual way, but isn't sent to network. This patch solves
all described problems and handles tcp_sendpages.
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Mazanov [Thu, 15 Nov 2012 17:07:00 +0000 (21:07 +0400)]
clk: remove inline usage from clk-provider.h
Users of GCC 4.7 have reported compiler errors due to having inline
applied to function declarations in clk-provider.h. The definitions
exist in drivers/clk/clk.c. An example error:
In file included from arch/arm/mach-omap2/clockdomain.c:25:0:
arch/arm/mach-omap2/clockdomain.c: In function ‘clkdm_clk_disable’:
include/linux/clk-provider.h:338:12: error: inlining failed in call to always_inline ‘__clk_get_enable_count’: function body not available
arch/arm/mach-omap2/clockdomain.c:1001:28: error: called from here
make[1]: *** [arch/arm/mach-omap2/clockdomain.o] Error 1
make: *** [arch/arm/mach-omap2] Error 2
This patch removes the use of inline from include/linux/clk-provider.h
but keeps the function definitions in drivers/clk/clk.c as inlined since
they are one-liners.
Signed-off-by: Igor Mazanov <i.mazanov@gmail.com>
Acked-by: Paul Walmsley <paul@pwsan.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
[mturquette@linaro.org: improved subject, added changelog]
Linus Torvalds [Thu, 15 Nov 2012 19:34:45 +0000 (11:34 -0800)]
Merge tag 'for-linus' of git://github.com/gxt/linux
Pull unicore32 update from Guan Xuetao.
* tag 'for-linus' of git://github.com/gxt/linux:
arch/unicore32: remove CONFIG_EXPERIMENTAL
unicore32: switch to generic sys_execve()
unicore32: switch to generic kernel_thread()/kernel_execve()
unicore32: Use Kbuild infrastructure for kvm_para.h
UAPI: (Scripted) Disintegrate arch/unicore32/include/asm
UniCore32-bugfix: Remove definitions in asm/bug.h to solve difference between native and cross compiler
UniCore32-bugfix: fix mismatch return value of __xchg_bad_pointer
UniCore32 bugfix: add missed CONFIG_ZONE_DMA
unicore32/mm/fault.c: Port OOM changes to do_pf
Linus Torvalds [Thu, 15 Nov 2012 19:28:43 +0000 (11:28 -0800)]
Merge tag 'upstream-3.7-rc6' of git://git.infradead.org/linux-ubifs
Pull UBIFS fixes from Artem Bityutskiy:
"Two patches which fix a problem reported by several people in the
past, but only fixed now because no one gave enough material for
debugging.
Anyway, these fix the problem that sometimes after a power cut the
file-system is not mountable with the following symptom:
grab_empty_leb: could not find an empty LEB
The fixes make the file-system mountable again."
* tag 'upstream-3.7-rc6' of git://git.infradead.org/linux-ubifs:
UBIFS: fix mounting problems after power cuts
UBIFS: introduce categorized lprops counter
Linus Torvalds [Thu, 15 Nov 2012 19:27:53 +0000 (11:27 -0800)]
Merge tag 'for-v3.7-fixes' of git://git.infradead.org/users/cbou/linux-pstore
Pull pstore fix from Anton Vorontsov:
"A small fixup for the persistent storage subsystem. The bug can
prevent kernel booting on a APEI-enabled machines w/ PSTORE_CONSOLE=y
(this is N by default, though)."
* tag 'for-v3.7-fixes' of git://git.infradead.org/users/cbou/linux-pstore:
pstore: Fix NULL pointer dereference in console writes
Linus Torvalds [Thu, 15 Nov 2012 19:25:39 +0000 (11:25 -0800)]
Merge branch 'i2c-for-linus' of git://git./linux/kernel/git/jdelvare/staging
Pill i2c fixes from Jean Delvare.
Well, "fixes".. The biggest patch here is actually Jan marking Wolfram
Sang as the main i2c subsystem maintainer, with Jan staying on as the PC
controller maintainer.
* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
i2c-mux-pinctrl: Fix probe error path
MAINTAINERS: i2c: 7 years, this is it
Linus Torvalds [Thu, 15 Nov 2012 19:22:03 +0000 (11:22 -0800)]
Merge tag 'regulator-3.7' of git://git./linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A few fixes for teardown issues that will be rarely seen, plus a fix
for a silly bug in regulator_is_supported_voltage() which shows how
often the answer to the question should be false.
The supported voltage commit is very new as I just edited to add a Cc
to stable, the code itself has been in -next."
* tag 'regulator-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fix voltage check in regulator_is_supported_voltage()
regulator: core: Avoid deadlock when regulator_register fails
Regulator: core: Unregister when gpio request fails.
Linus Torvalds [Thu, 15 Nov 2012 19:21:28 +0000 (11:21 -0800)]
Merge tag 'sound-3.7' of git://git./linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"The only large LOC is seen in WM5102 driver, just writing a bunch of
register updates, but the actual code change is small. Other than
that, all small fixes suitable for rc6."
* tag 'sound-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: usb-audio: Fix mutex deadlock at disconnection
ALSA: fm801: precedence bug in snd_fm801_tea575x_get_pins()
ALSA: es1968: precedence bug in snd_es1968_tea575x_get_pins()
ALSA: hda - Add a missing quirk entry for iMac 9,1
ASoC: core: Double control update err for snd_soc_put_volsw_sx
ASoC: dapm: Use card_list during DAPM shutdown
ASoC: cs42l52: fix the return value of cs42l52_set_fmt()
ASoC: bells: Correct type in sub speaker DAI name for WM5102
ASoC: wm8978: pll incorrectly configured when codec is master
ASoC: mxs-saif: Fix channel swap for 24-bit format
ASoC: bells: Select WM1250-EV1 Springbank audio I/O module
ASoC: bells: Add missing select of WM0010
ASoC: mxs-saif: Add MODULE_ALIAS
ASoC: wm5102: Write register value corrections after SYSCLK is enabled
Tommi Rantala [Thu, 15 Nov 2012 03:49:05 +0000 (03:49 +0000)]
sctp: fix /proc/net/sctp/ memory leak
Commit 13d782f ("sctp: Make the proc files per network namespace.")
changed the /proc/net/sctp/ struct file_operations opener functions to
use single_open_net() and seq_open_net().
Avoid leaking memory by using single_release_net() and seq_release_net()
as the release functions.
Discovered with Trinity (the syscall fuzzer).
Signed-off-by: Tommi Rantala <tt.rantala@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiko Carstens [Thu, 15 Nov 2012 08:22:40 +0000 (09:22 +0100)]
s390/3215: fix tty close handling
The 3215 console always has the RAW3215_FIXED flag set, which causes
raw3215_shutdown() not to wait for outstanding I/O requests if an attached
tty gets closed.
The flag however can be simply removed, so we can guarantee that all requests
belonging to the tty have been processed when the tty is closed.
However the tasklet that belongs to the 3215 device may be scheduled even if
there is no tty attached anymore, since we have a race between console and tty
processing.
Thefore unconditional tty_wakekup() in raw3215_wakeup() can cause the following
NULL pointer dereference:
3.465368 Unable to handle kernel pointer dereference at virtual kernel address (null)
3.465448 Oops: 0004 #1 SMP
3.465454 Modules linked in:
3.465459 CPU: 1 Not tainted 3.6.0 #1
3.465462 Process swapper/1 (pid: 0, task:
000000003ffa4428, ksp:
000000003ffb7ce0)
3.465466 Krnl PSW :
0404100180000000 0000000000162f86 (__wake_up+0x46/0xb8)
3.465480 R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
Krnl GPRS:
fffffffffffffffe 0000000000000000 0000000000000160 0000000000000001
3.465492
0000000000000001 0000000000000004 0000000000000004 000000000096b490
3.465499
0000000000000001 0000000000000100 0000000000000001 0000000000000001
3.465506
070000003fc87d60 0000000000000160 000000003fc87d68 000000003fc87d00
3.465526 Krnl Code:
0000000000162f76:
e3c0f0a80004 lg %r12,168(%r15)
0000000000162f7c:
58000370 l %r0,880
#
0000000000162f80:
c007ffffffff00 xilf %r0,
4294967295
>
0000000000162f86:
ba102000 cs %r1,%r0,0(%r2)
0000000000162f8a: 1211 ltr %r1,%r1
0000000000162f8c:
a774002f brc 7,162fea
0000000000162f90:
b904002d lgr %r2,%r13
0000000000162f94:
b904003a lgr %r3,%r10
3.465597 Call Trace:
3.465599 (<
0400000000000000> 0x400000000000000)
3.465602 <
000000000048c77e> raw3215_wakeup+0x2e/0x40
3.465607 <
0000000000134d66> tasklet_action+0x96/0x168
3.465612 <
000000000013423c> __do_softirq+0xd8/0x21c
3.465615 <
0000000000134678> irq_exit+0xa8/0xac
3.465617 <
000000000046c232> do_IRQ+0x182/0x248
3.465621 <
00000000005c8296> io_return+0x0/0x8
3.465625 <
00000000005c7cac> vtime_stop_cpu+0x4c/0xb8
3.465629 (<
0000000000194e06> tick_nohz_idle_enter+0x4e/0x74)
3.465633 <
0000000000104760> cpu_idle+0x170/0x184
3.465636 <
00000000005b5182> smp_start_secondary+0xd6/0xe0
3.465641 <
00000000005c86be> restart_int_handler+0x56/0x6c
3.465643 <
0000000000000000> 0x0
3.465645 Last Breaking-Event-Address:
3.465647 <
0000000000403136> tty_wakeup+0x46/0x98
3.465652
3.465654 Kernel panic - not syncing: Fatal exception in interrupt
01: HCPGIR450W CP entered; disabled wait PSW
00020001 80000000 00000000 0010F63C
The easiest solution is simply to check if tty is NULL in the tasklet.
If it is NULL nothing is to do (no tty attached), otherwise tty_wakeup()
can be called, since we hold a reference to the tty.
This is not nice... but it is a small patch and it works.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Alex Deucher [Wed, 14 Nov 2012 14:10:39 +0000 (09:10 -0500)]
drm/radeon: fix logic error in atombios_encoders.c
Fixes:
https://bugzilla.kernel.org/show_bug.cgi?id=50431
Reported-by: David Binderman <dcb314@hotmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
David S. Miller [Thu, 15 Nov 2012 03:32:15 +0000 (22:32 -0500)]
Revert "drivers/net/phy/mdio-bitbang.c: Call mdiobus_unregister before mdiobus_free"
This reverts commit
aa731872f7d33dcb8b54dad0cfb82d4e4d195d7e.
As pointed out by Ben Hutchings, this change is not correct.
mdiobus_unregister() can't be called if the bus isn't registered yet,
however this change can result in situations which cause that to
happen.
Part of the confusion here revolves around the fact that the
callers of this module control registration/unregistration,
rather than the module itself.
Signed-off-by: David S. Miller <davem@davemloft.net>
Kamlakant Patel [Wed, 14 Nov 2012 01:41:38 +0000 (01:41 +0000)]
net/smsc911x: Fix ready check in cases where WORD_SWAP is needed
The chip ready check added by the commit
3ac3546e [Always wait for
the chip to be ready] does not work when the register read/write
is word swapped. This check has been added before the WORD_SWAP
register is programmed, so we need to check for swapped register
value as well.
Bit 16 is marked as RESERVED in SMSC datasheet, Steve Glendinning
<steve@shawell.net> checked with SMSC and wrote:
The chip architects have concluded we should be reading PMT_CTRL
until we see any of bits 0, 8, 16 or 24 set. Then we should read
BYTE_TEST to check the byte order is correct (as we already do).
The rationale behind this is that some of the chip variants have
word order swapping features too, so the READY bit could actually
be in any of the 4 possible locations. The architects have confirmed
that if any of these 4 positions is set the chip is ready. The other
3 locations will either never be set or can only go high after READY
does (so also indicate the device is ready).
This change will check for the READY bit at the 16th position. We do
not check the other two cases (bit 8 and 24) since the driver does not
support byte-swapped register read/write.
Signed-off-by: Kamlakant Patel <kamlakant.patel@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>