Vxsort (#37159)
author Peter Sollich <petersol@microsoft.com>
Wed, 15 Jul 2020 20:57:01 +0000 (22:57 +0200)
committer GitHub <noreply@github.com>
Wed, 15 Jul 2020 20:57:01 +0000 (13:57 -0700)
commit 69b0d160953f5c920e52f021366743623693c158
tree 076bd124c7beeadd7f8e3547601d21a5091bd262
parent 64c9e7c6d12db0e03d5a2589239b53b0a4046761

* Initial snapshot of vxsort implementation incorporating source code from Dan Shechter.

* Bug fix from Dan.

* Use bigger mark list for experiments.

* Give up if the mark list size is bigger than a reasonable fraction of the ephemeral space.

* Latest version from Dan, disable for ARM64.

* Fixes for Linux compile, ended up disabling vxsort for Linux for now.

* Experimenting with 32-bit sort - the variation that gathers the mark list entries pertaining to the local heap by reading the mark lists from all the heaps appears to be too slow and scales very badly with an increasing number of heaps.

* 32-bit sort - preserve failing case.

* Do the pointer compression/decompression in place to improve performance; optionally write mark lists and associated information to binary files for further analysis.

* Introduce runtime check whether CPU supports AVX2 instruction set.

* Implement mark list growth.

* Integrate new version including AVX512 code path.

* Implement runtime test for AVX512 support.
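
A runtime check of this kind could look roughly like the sketch below. It is not the commit's isa_detection.cpp: the enum and function names are hypothetical, and it assumes the GCC/Clang builtin `__builtin_cpu_supports`, which exists only on x86 targets. On any other compiler or architecture the sketch reports no vector support.

```cpp
// Hypothetical names for illustration; not the identifiers used in the commit.
enum class SortIsa { None, Avx2, Avx512F };

// Returns the widest vector instruction set the current CPU reports.
// Assumption: GCC or Clang on x86/x64, where __builtin_cpu_supports exists.
SortIsa detect_sort_isa()
{
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                       // initialize the CPU feature data
    if (__builtin_cpu_supports("avx512f"))
        return SortIsa::Avx512F;                // the AVX512F subset specifically
    if (__builtin_cpu_supports("avx2"))
        return SortIsa::Avx2;
#endif
    return SortIsa::None;                       // fall back to the scalar sort
}

// Human-readable name, e.g. for diagnostic output.
const char* isa_name(SortIsa isa)
{
    switch (isa)
    {
        case SortIsa::Avx512F: return "AVX512F";
        case SortIsa::Avx2:    return "AVX2";
        default:               return "none";
    }
}
```

A caller would typically run `detect_sort_isa()` once at startup and cache the result rather than query the CPU on every sort.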

* Move the files for the vectorized sort to their own directory, add stubs to call AVX2 or AVX512 flavor of the sort.

* Get rid of unneeded #include statement in two files.

* Address codereview feedback to specifically say AVX512F instead of just AVX512 as there are multiple subsets.

* Fix CMakeLists.txt files for non-x64 non-Windows targets, introduce separate max mark list sizes for WKS, remove dead code from grow_mark_list, add #ifdef to AVX512 detection to make the other architectures build.

* Instead of modifying the tool-generated header file corinfoinstructionset.h, modify InstructionSetDesc.txt, from which it is generated, and run the tool that generates all the files from it.

* Move AVX2/AVX512 instruction set detection to GC side.

* Use vectorized packer, switch packed range from uint32_t to int32_t, because that makes the sorting a bit more efficient.
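
The idea of packing 64-bit addresses into a signed 32-bit range can be sketched in scalar form as below. This is only an illustration under stated assumptions: the function names are hypothetical, the commit's packer is vectorized and works in place whereas this sketch uses a separate buffer for clarity, and `std::sort` stands in for the vectorized sort. Subtracting 2^31 from the shifted offset keeps the signed keys in the same order as the original addresses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: maps each address in [base, base + (2^32 << shift))
// to a signed 32-bit key that sorts in the same order as the address.
std::vector<int32_t> pack(const std::vector<uint64_t>& addrs, uint64_t base, unsigned shift)
{
    std::vector<int32_t> keys;
    keys.reserve(addrs.size());
    for (uint64_t a : addrs)
    {
        assert(a >= base);
        assert(((a - base) & ((UINT64_C(1) << shift) - 1)) == 0); // aligned, so no bits are lost
        uint64_t offset = (a - base) >> shift;                    // drop the alignment bits
        assert(offset <= UINT32_MAX);                             // range must fit in 32 bits
        // Bias by 2^31: [0, 2^32) maps monotonically onto [INT32_MIN, INT32_MAX].
        keys.push_back(static_cast<int32_t>(static_cast<int64_t>(offset) - INT64_C(0x80000000)));
    }
    return keys;
}

// Inverse of the mapping used in pack().
uint64_t unpack_one(int32_t key, uint64_t base, unsigned shift)
{
    uint64_t offset = static_cast<uint64_t>(static_cast<int64_t>(key) + INT64_C(0x80000000));
    return base + (offset << shift);
}

// Sorts addresses by sorting their packed keys and expanding them again.
void sort_via_packed_keys(std::vector<uint64_t>& addrs, uint64_t base, unsigned shift)
{
    std::vector<int32_t> keys = pack(addrs, base, shift);
    std::sort(keys.begin(), keys.end());        // stand-in for the vectorized sort
    for (size_t i = 0; i < keys.size(); i++)
        addrs[i] = unpack_one(keys[i], base, shift);
}
```

With 8-byte alignment (`shift == 3`), a 32-bit key covers a 32 GB address range starting at `base`.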

* Add GCConfig setting to turn vectorized sorting off, streamline ISA detection (but require initialization), rename to IsSupportedInstructionSet.

* Several small improvements:
 - Don't waste time sorting the mark list if background GC is running as we are not going to use it.
 - Use smaller max mark list size if we cannot use AVX2/AVX512 instruction sets
 - Fix mark list overflow detection for server GC.

* Address codereview feedback - add constants for the thresholds above which we use AVX2/AVX512F instruction sets.

Add space before parameter lists as per GC codebase coding conventions.

Improve some comments.

* Add license headers and entry in THIRD-PARTY_NOTICES.TXT for Dan's vectorized sorting code.

* Update license headers

* Address code review feedback:
 - fix typo in comment in InitSupportedInstructionSet
 - move test for full GC to beginning of sort_mark_list
 - in WKS GC, we can use the tighter range shigh - slow for the surviving objects instead of the full ephemeral range.
 - make the description for the new config setting GCEnabledInstructionSets more explicit by enumerating the legal values and their meanings.

* Snapshot for Linux changes

* Add more definitions to immintrin.h

* Fix cmake warnings about mismatched endif clauses.

* Disable Linux support for now due to multiple compile & link errors.

* Address code review feedback:
 - add instructions to bitonic_gen.py
 - centralize range and instruction set checks in do_vxsort
 - add parentheses around expressions.
 - remove some printfs, convert others to dprintf
 - strengthen assert
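
The size thresholds and the centralized instruction set check mentioned in this message could be combined roughly as in the sketch below. All names and the threshold values are hypothetical placeholders, not taken from the commit, and `std::sort` stands in for the AVX2 and AVX512F code paths.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>

enum class SortPath { Scalar, Avx2, Avx512F };

// Placeholder element counts for illustration only; the actual constants
// live in the GC sources and their values are not given in this message.
constexpr size_t kAvx2Threshold    = 8 * 1024;
constexpr size_t kAvx512FThreshold = 128 * 1024;

// Hypothetical capability flags, assumed to be filled in once at startup.
struct CpuCaps
{
    bool avx2;
    bool avx512f;
};

// Picks the widest supported path whose size threshold the input exceeds.
SortPath choose_sort_path(size_t count, CpuCaps caps)
{
    if (caps.avx512f && count > kAvx512FThreshold)
        return SortPath::Avx512F;
    if (caps.avx2 && count > kAvx2Threshold)
        return SortPath::Avx2;
    return SortPath::Scalar;
}

// Single entry point, so callers never repeat the checks themselves.
SortPath sort_mark_entries(uint64_t* first, size_t count, CpuCaps caps)
{
    SortPath path = choose_sort_path(count, caps);
    // In a real build each case would call a different sort routine;
    // here every path uses std::sort as a stand-in.
    std::sort(first, first + count);
    return path;
}
```

Keeping the decision in one function means a new instruction set or a retuned threshold touches a single place.
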
39 files changed:
THIRD-PARTY-NOTICES.TXT
src/coreclr/src/gc/CMakeLists.txt
src/coreclr/src/gc/gc.cpp
src/coreclr/src/gc/gcconfig.h
src/coreclr/src/gc/gcpriv.h
src/coreclr/src/gc/gcsvr.cpp
src/coreclr/src/gc/gcwks.cpp
src/coreclr/src/gc/sample/CMakeLists.txt
src/coreclr/src/gc/vxsort/alignment.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/defs.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/do_vxsort.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/do_vxsort_avx2.cpp [new file with mode: 0644]
src/coreclr/src/gc/vxsort/do_vxsort_avx512.cpp [new file with mode: 0644]
src/coreclr/src/gc/vxsort/isa_detection.cpp [new file with mode: 0644]
src/coreclr/src/gc/vxsort/isa_detection_dummy.cpp [new file with mode: 0644]
src/coreclr/src/gc/vxsort/machine_traits.avx2.cpp [new file with mode: 0644]
src/coreclr/src/gc/vxsort/machine_traits.avx2.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/machine_traits.avx512.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/machine_traits.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/packer.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/bitonic_sort.AVX2.int32_t.generated.cpp [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/bitonic_sort.AVX2.int32_t.generated.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/bitonic_sort.AVX2.int64_t.generated.cpp [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/bitonic_sort.AVX2.int64_t.generated.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/bitonic_sort.AVX512.int32_t.generated.cpp [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/bitonic_sort.AVX512.int32_t.generated.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/bitonic_sort.AVX512.int64_t.generated.cpp [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/bitonic_sort.AVX512.int64_t.generated.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/bitonic_sort.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/codegen/avx2.py [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/codegen/avx512.py [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/codegen/bitonic_gen.py [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/codegen/bitonic_isa.py [new file with mode: 0644]
src/coreclr/src/gc/vxsort/smallsort/codegen/utils.py [new file with mode: 0644]
src/coreclr/src/gc/vxsort/vxsort.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/vxsort_targets_disable.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/vxsort_targets_enable_avx2.h [new file with mode: 0644]
src/coreclr/src/gc/vxsort/vxsort_targets_enable_avx512.h [new file with mode: 0644]
src/coreclr/src/vm/CMakeLists.txt