review.tizen.org Git - platform/upstream/llvm.git/log

[X86] merge "={eax}" and "~{eax}" into "=&eax" for MSInlineASM

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D94466

[clangd] Treat "null" optional fields as missing

Clangd currently throws away any protocol messages whenever an optional
field has an unexpected type. This patch changes the behaviour to treat
`null` fields as missing.

This enables clangd to be more tolerant against small violations to the
LSP spec.

Fixes https://github.com/clangd/vscode-clangd/issues/134

Differential Revision: https://reviews.llvm.org/D95229

[OpenMP][Libomptarget] Fix check-libomptarget

The check-libomptarget fails when building with LLVM_ENABLE_PROJECTS. This is because test configuration misses the path to libomp.so and libLLVMSupport.so when time profiling is enabled (both libraries have the same path when building). This patch add the path to the configuration.

Reviewed By: vzakhari

Differential Revision: https://reviews.llvm.org/D95376

[OpenMP] Fix building using LLVM_ENABLE_RUNTIMES

Fix when time profiling is enabled.

Related to: D94855

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95398

[clangd] Work around GCC bug 66735

Try to fix cl-options.c on bots were the default triple is non-x86 non-arm

llvmArchToWindowsSDKArch() returns "" for non-intel non-arm archs.
We're checking for "/fake/lib/" which is followed by the result
of that function -- but if that returns an empty string, then that
trailing slash isn't there. As fix, just explicitly pass a triple
that's intel or arm (I randomly chose aarch64). Since the test runs
with -###, that arch doesn't have to be in LLVM_TARGETS_TO_BUILD.

clang-cl: Prefer /vctoolsdir, /winsdkdir over LIB for link invocations

/vctoolsdir and /winsdkdir take precedence over the INCLUDE env var,
so they should also take precedence over LIB. It's not quite as neat
since LIB is still read by the linker and the linker just prefers
the -libpath: paths the driver now passes, but as long as all libraries
are present at /vctoolsdir and /winsdkdir, there's no harm in the linker
also looking at LIB later.

This fixes cl-options.c after a5d85cbe on Windows when LIB is set.
Another way to fix the test would be to prefix the clang-cl
line with `env --unset=LIB`, but I think it's better to fix the
flag to work as expected instead of making the test work around
the surprising behavior that LIB being set causes clang-cl to
not pass -libpath: flags to the linker when /vctoolsdir and
/winsdkdir are used.

[clang][cli] Generate HeaderSearch options separately

This patch moves parsing of header search options from `generateCC1Options` to separate `GenerateHeaderSearchArgs`.

The round-trip algorithm in D94472 requires this separation to be able to run parsing and generating **only** for the options that need to be tested via round-tripping.

This also moves the `GENERATE_OPTION_WITH_MARSHALLING` to the top of the file, because other kinds of options will be generated in separate functions that will be spread throughout `CompilerInvocation.cpp` to be close to their parsing counterparts.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D94803

[clang][cli] Parse HeaderSearch options separately

This patch moves parsing of header search options from `parseSimpleArgs` back to `ParseHeaderSearchArgs` where they originally were.

The round-trip algorithm in D94472 requires this separation to be able to run parsing and generating **only** for the options that need to be tested via round-tripping.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D94802

[clang][cli] Port OpenMP-related LangOpts to marshalling system

Port some OpenMP-related language options to the marshalling system for automatic command line parsing and generation.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D95348

[LoopUnswitch] Add test cases not partially unswitched due to cost.

This pre-commits tests for D95468.

[mlir:async] Fix deadlock in async runtime await-and-execute functions

`emplace???` functions running concurrently can set the ready flag and then pending awaiter will never be executed

Differential Revision: https://reviews.llvm.org/D95517

[lldb] Add move_iterator to supported template list

Identical to previous commits that just add a standard library template to the
supported template list and test it. Adding this rather obscure class to the
template list is mostly caused by the std::deque test unexpectedly referencing
this type when testing against newer libc++ versions on macOS.

Fixes TestQueueFromStdModule and TestQueueFromStdModule on macOS.
Fixes rdar://73213589

[DWARF] Create subprogram's DIE in DISubprogram's unit

This is a fix for PR48790. Over in D70350, subprogram DIEs were permitted
to be shared between CUs. However, the creation of a subprogram DIE can be
triggered early, from other CUs. The subprogram definition is then created
in one CU, and when the function is actually emitted children are attached
to the subprogram that expect to be in another CU. This breaks internal CU
references in the children.

Fix this by redirecting the creation of subprogram DIEs in
getOrCreateContextDIE to the CU specified by it's DISubprogram definition.
This ensures that the subprogram DIE is always created in the correct CU.

Differential Revision: https://reviews.llvm.org/D94976

[OpenCL][Docs] Moved info from UsersManual into OpenCLSupport.

Moved information detailing the implementation from UsersManual
into OpenCLSupport page as it is not relevant to the user's of
clang but primarily needed for the compiler developers.

Tags: #clang

Differential Revision: https://reviews.llvm.org/D95061

[analyzer] NFC: Introduce reusable bug category for "C++ move semantics".

Currently only used by MoveChecker but ideally all checkers
should have reusable categories.

clang-cl: Add /winsdkdir and /winsdkversion flags

These do for the Windows SDK path what D85998 did for
%VCToolsInstallDir% with /vctoolsdir: Offer a way to set them with an
explicit commandline switch.

With this (and /vctoolsdir), it's possible to compile and link
against hermetic vctools and winsdk directories with:

    out/gn/bin/clang-cl win.c -fuse-ld=lld \
        /vctoolsdir path/to/VC/Tools/MSVC/14.26.28801 \
        /winsdkdir path/to/win_sdk

compared to a long list of -imsvc and /link /libpath: flags.

While here:
- Change the case of the "Include" folder inside the windows sdk
  from "include" to "Include" to match on-disk case. Since the
  Windows file system is case-insensitive this isn't a behavior
  change, it's just a bit cleaner.
- Add libpath tests to the /vctoolsdir
- Add a FIXME about reading env vars for win sdk and ucrt sdk
  if these flags aren't present, to match the VCToolsInstallDir
  logic

We should also cache all these computed paths in the driver instead
of computing them every time they're queried, but that's for a future
patch.

It'd also be nice to invent a /winsysroot: flag that sets both
/vctoolsdir: and /winsdkdir: to some well-known subdirectory.
That's for a future patch as well.

Differential Revision: https://reviews.llvm.org/D95472

[SCEV] Fix incorrect loop exit count analysis.

In computeLoadConstantCompareExitLimit, the addrec used to compute the
exit count should be from the loop which the exiting block belongs to.

Reviewed by: mkazantsev

Differential Revision: https://reviews.llvm.org/D92367

[clang][AST] Encapsulate DeclarationNameLoc, NFCI

This change makes `DeclarationNameLoc` a proper class and refactors its
users to use getter methods instead of accessing the members directly.
The change also makes `DeclarationNameLoc` immutable (i.e., it cannot
be modified once constructed).

Reviewed By: aprantl

Differential Revision: https://reviews.llvm.org/D94596

[MachineLICM][MachineSink] Move SinkIntoLoop to MachineSink.

This moves SinkIntoLoop from MachineLICM to MachineSink. The motivation for
this work is that hoisting is a canonicalisation transformation, but we do not
really have a good story to sink instructions back if that is better, e.g. to
reduce live-ranges, register pressure and spilling. This has been discussed a
few times on the list, the latest thread is:

https://lists.llvm.org/pipermail/llvm-dev/2020-December/147184.html

There it was pointed out that we have the LoopSink IR pass, but that works on
IR, lacks register pressure informatiom, and is focused on profile guided
optimisations, and then we have MachineLICM and MachineSink that both perform
sinking. MachineLICM is more about hoisting and CSE'ing of hoisted
instructions. It also contained a very incomplete and disabled-by-default
SinkIntoLoop feature, which we now move to MachineSink.

Getting loop-sinking to do something useful is going to be at least a 3-step
approach:

1) This is just moving the code and is almost a NFC, but contains a bug fix.
This uses helper function `isLoopInvariant` that was factored out in D94082 and
added to MachineLoop.
2) A first functional change to make loop-sink a little bit less restrictive,
which it really is at the moment, is the change in D94308. This lets it do
more (alias) analysis using functions in MachineSink, making it a bit more
powerful. Nothing changes much: still off by default. But it shows that
MachineSink is a better home for this, and it starts using its functionality
like `hasStoreBetween`, and in the next step we can use `isProfitableToSinkTo`.
3) This is the going to be he interesting step: decision making when and how
many instructions to sink. This will be driven by the register pressure, and
deciding if reducing live-ranges and loop sinking will help in better
performance.
4) Once we are happy with 3), this should be enabled by default, that should be
the end goal of this exercise.

Differential Revision: https://reviews.llvm.org/D93694

[AArch64] Add vector saturating add intrinsic costs

This adds sadd.sat, uadd.sat, ssub.sat and usub.sat costs for AArch64,
similar to how they were recently added for ARM.

Differential Revision: https://reviews.llvm.org/D95292

[RISCV] Fix a codegen crash in getSetCCResultType

This patch fixes some crashes coming from
`RISCVISelLowering::getSetCCResultType`, which would occasionally return
an EVT constructed from an invalid MVT, which has a null Type pointer.

The attached test shows this happening currently for some fixed-length
vectors, which hit this issue when the V extension was enabled, even
though they're not legal types under the V extension. The fix was also
pre-emptively extended to scalable vectors which can't be represented as
an MVT, even though a test case couldn't be found for them.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D95434

[flang][driver] Report prescanning diags during syntax-only parsing

Ensure diagnostics from the prescanner are reported when running `flang-new -fsyntax-only` (i.e. only syntax parsing).
This keeps the diagnostics output of flang-new consistent with `f18 -fparse-only` when running the syntax parsing action, ParseSyntaxOnlyAction.

Summary of changes:
- Modify ParseSyntaxOnlyAction::ExecuteAction to report diagnostics

Differential Revision: https://reviews.llvm.org/D95220

Fix "not all control paths return a value" warning. NFCI.

[AMDGPU] Write "GFX6-GFX9" instead of "GFX6-9" in docs

... and similarly for some other cases. This is for consistency and to
make it easier to search for mentions of a particular architecture.

Differential Revision: https://reviews.llvm.org/D95453

[ARM] Add neon FP16 scalar_to_vector patterns.

This adds some simple fp16 scalar_to_vector patterns, preventing a
selection failure if this came up.

Differential Revision: https://reviews.llvm.org/D95427

[Test][AArch64] Use named vregs in overflow legalization tests. NFC

[AArch64][GlobalISel] Make G_SADDE and G_SSUBE legal

This makes G_SADDE and G_SSUBE legal in preparation for further work
legalizing overflowing operations. It's fine that they don't have an
instruction selector implementation yet, because G_UADDE and G_USUBE are
already legal on AArch64 without an instruction selector implementation. This
completes the set of G_[SU]{ADD,SUB}[EO] operations on AArch64.

Reviewed By: paquette

Differential Revision: https://reviews.llvm.org/D95325

[clang] Fix signedness in vector bitcast evaluation

The included test case triggered a sign assertion on the result in
`Success()`.  This was caused by the APSInt created for a bitcast
having its signedness bit inverted.  The second APSInt constructor
argument is `isUnsigned`, so invert the result of
`isSignedIntegerType`.

Relanding this patch after reverting.  The test case had to be updated
to be insensitive to 32/64-bit extractelement indices.

Differential Revision: https://reviews.llvm.org/D95135

[libc][NFC] Use a end of list marker for cpu feature detection.

Without this, the array can end up being an empty array leading to
compiler failures.

[OpenMP] libomp: fix build by clang-cl with vs2019

Problem reported by Joseph Shen <joseph.smeng@gmail.com>.
The patch changes *(&<atomic-var>) to (&<atomic-var>)->load().

Differential Revision: https://reviews.llvm.org/D95485

[clang][cli] Port LangOpts to marshalling system, pt.2

Port some miscellaneous language options to the marshalling system for oautomatic command line parsing and generation.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D95347

[mlir] Extend semantic of OffsetSizeAndStrideOpInterface.

OffsetSizeAndStrideOpInterface now have the ability to specify only a leading subset of
offset, sizes, strides operands/attributes.
The size of that leading subset must be limited by the corresponding entry in `getArrayAttrMaxRanks` to avoid overflows.
Missing trailing dimensions are assumed to span the whole range (i.e. [0 .. dim)).
This brings more natural semantics to slice-like op on top of subview and is a simplifies to removing all uses of SliceOp in dependent projects.

Differential revision: https://reviews.llvm.org/D95441

Fix an error about implicit fallthrough during self build - new tag for ittapi.

A fix has been implemented in the ittap repo to fix an error about implicit fallthrough in a switch that was occurring during self build.
A new tag has been created for that fix. This is to update the tag.

Reviewed By: bader

Differential Revision: https://reviews.llvm.org/D95462

Patch by Zahira Ammarguellat.

[clang-format] Avoid considering include directive as a template closer.

This fixes a bug [[ http://llvm.org/PR48891 | PR48891 ]] introduced in D93839 where:
```
#include <stdint.h>
namespace rep {}
```
got formatted as
```
#include <stdint.h>
namespace rep {
}
```

Reviewed By: MyDeveloperDay, leonardchan

Differential Revision: https://reviews.llvm.org/D95479

[clang][cli] Port LangOpts to marshalling system, pt.1

Port some miscellaneous language options to the marshalling system for oautomatic command line parsing and generation.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D95346

[mlir][Linalg] Add canonicalization for init_tensor -> subtensor op.

Differential Revision: https://reviews.llvm.org/D95305

[llvm-objdump] Use append_range (NFC)

[MemorySSA] Use ListSeparator (NFC)

[AMDGPU] Forward-declare TargetRegisterClass (NFC)

AMDGPUInstructionSelector.h needs TargetRegisterClass but relies on a
forward declaration of TargetRegisterClass in InstructionSelector.h.
This patch adds a forward declaration right in
AMDGPUInstructionSelector.h.

While we are at it, this patch removes the one in
InstructionSelector.h, where it is unnecessary.

[TableGen] Add isContradictoryImpl implementation to CheckCondCodeMatcher and CheckChild2CondCodeMatcher.

This enables better pattern factoring in the RISCV ISel table.

Bump the trunk major version to 13

and clear the release notes.

Frontend: Use early returns in CompilerInstance::clearOutputFiles, NFC

Use early returns in `CompilerInstance::clearOutputFiles` to clarify the
logic, and rename `ec` to `EC` as a drive-by.

No functionality change.

Rename clang/test/Frontend/output-{failures,paths}.c, NFC

A follow up patch will add a few success cases here; rename it to
`output-paths.c` instead of `output-failures.c`.

[gn build] Port bb9eb1982980

[OpenMP][NVPTX] Drop dependence on CUDA to build NVPTX `deviceRTLs`

With D94745, we no longer use CUDA SDK to compile `deviceRTLs`. Therefore,
many CMake code in the project is useless. This patch cleans up unnecessary code
and also drops the requirement to build NVPTX `deviceRTLs`. CUDA detection is
still being used however to determine whether we need to involve the tests. Auto
detection of compute capability is enabled by default and can be disabled by
setting CMake variable `LIBOMPTARGET_NVPTX_AUTODETECT_COMPUTE_CAPABILITY=OFF`.
If auto detection is enabled, and CUDA is also valid, it will only build the
bitcode library for the detected version; otherwise, all variants supported will
be generated. One drawback of this patch is, we now generate 96 variants of
bitcode library, and totally 1485 files to be built with a clean build on a
non-CUDA system. `LIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=""` can be used to
disable building NVPTX `deviceRTLs`.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D95466

[RISCV] Add rv64 run lines to rv32 MC layer tests for B extension

Remove common instructions from rv64 tests since they are now
covered by the rv64 run lines in the rv32 tests.

Add rv32-only* tests for a few cases that aren't common between
r32 and rv64.

Addresses review feedback from D95150.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D95272

Support for instrumenting only selected files or functions

This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.

Differential Revision: https://reviews.llvm.org/D94820

[libc++] Give `MoveOnly` all six comparison operators, not just == and <.

Split out of D93512.

[OpenMP] Modify OMP_ALLOCATOR environment variable

This patch sets the def-allocator-var ICV based on the environment variables
provided in OMP_ALLOCATOR. Previously, only allowed value for OMP_ALLOCATOR
was a predefined memory allocator. OpenMP 5.1 specification allows predefined
memory allocator, predefined mem space, or predefined mem space with traits in
OMP_ALLOCATOR. If an allocator can not be created using the provided environment
variables, the def-allocator-var is set to omp_default_mem_alloc.

Differential Revision: https://reviews.llvm.org/D94985

[libomptarget][cuda] Handle missing _v2 symbols gracefully

[libomptarget][cuda] Handle missing _v2 symbols gracefully

Follow on from D95367. Dlsym the _v2 symbols if present, otherwise use the
unsuffixed version. Builds a hashtable for the check, can revise for zero
heap allocations later if necessary.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95415

[gn build] fix get.py change

[gn build] restore build command removed in 9595a7ff55b6 for platforms without prebuilts

Disable rosegment for old Android versions.

The unwinder used by the crash handler on versions of Android prior to
API 29 did not correctly handle binaries built with rosegment, which is
enabled by default for LLD. Android only supports LLD, so it's not an
issue that this flag is not accepted by other linkers.

Reviewed By: srhines

Differential Revision: https://reviews.llvm.org/D95166

llvm-lib: Pull error printing code out of two functions

Slightly changes the output in error code, but no behavior change in
normal use. This is for preparation for using these two functions
elsewhere.

[libomptarget][NFC] Avoid gcc 5/6 issue with lambda captures.

Differential Revision: https://reviews.llvm.org/D95486

Frontend: Fix layering between create{,Default}OutputFile, NFC

Fix layering between `CompilerInstance::createDefaultOutputFile` and the
two versions of `createOutputFile`.

- Add missing configuration flags to `createDefaultOutputFile` so that
  GeneratePCHAction and GenerateModuleFromModuleMapAction can use it.
  They previously promised that temporary files were turned on; now
  `createDefaultOutputFile` handles that logic.
- Lift the logic handling `InFile` and `Extension` to
  `createDefaultOutputFile`, since it's only the callers of that
  function that are using it.
- Rename the deeper of the two `createOutputFile`s to
  `createOutputFileImpl` and make it private to `CompilerInstance` (to
  prove that no one else is using it).
- Sink the logic for adding to `CompilerInstance::OutputFiles` down to
  `createOutputFileImpl`, allowing two "optional" (but always used)
  `std::string*` out parameters to be removed.
- Instead of passing a `std::error_code` out parameter into
  `createOutputFileImpl`, have it return `Expected<>`.
- As a drive-by, inline `CompilerInstance::addOutputFile` into its only
  caller, `createOutputFileImpl`.

Clean layering makes it easier for a future commit to extract
`createOutputFileImpl` out of `CompilerInstance`.

Differential Revision: https://reviews.llvm.org/D93248

[llc] Add reportError helper and canonicalize error messages

Frontend: Simplify handling of non-seeking streams in CompilerInstance, NFC

Add a new `raw_pwrite_ostream` variant, `buffer_unique_ostream`, which
is like `buffer_ostream` but with unique ownership of the stream it's
wrapping. Use this in CompilerInstance to simplify the ownership of
non-seeking output streams, avoiding logic sprawled around to deal with
them specially.

This also simplifies future work to encapsulate output files in a
different class.

Differential Revision: https://reviews.llvm.org/D93260

[GlobalISel] Implement computeKnownBits for G_SEXT_INREG

Just use the existing `Known.sextInReg` implementation.

- Update KnownBitsTest.cpp.
- Update combine-redundant-and.mir for a more concrete example.

Differential Revision: https://reviews.llvm.org/D95484

Salvage debug info for function arguments in coro-split funclets.

This patch improves the availability for variables stored in the
coroutine frame by emitting an alloca to hold the pointer to the frame
object and rewriting dbg.declare intrinsics to point inside the frame
object using salvaged DIExpressions. Finally, a new alloca is created
in the funclet to hold the FramePtr pointer to ensure that it is
available throughout the entire function at -O0.

This path also effectively reverts D90772. The testcase updates
highlight nicely how every removed CHECK for a dbg.value is preceded
by a new CHECK for a dbg.declare.

Thanks to JunMa, Yifeng, and Bruno for their thoughtful reviews!

Differential Revision: https://reviews.llvm.org/D93497

rdar://71866936

Frontend: Fix memory leak in CompilerInstance::setVerboseOutputStream

Found this memory leak in `CompilerInstance::setVerboseOutputStream` by
inspection; it looks like this wasn't previously exercised, since it was
never called twice.

Differential Revision: https://reviews.llvm.org/D93249

[ARM] Fix STRT/STRHT/STRBT input/output operands.

STRT, STRHT, and STRBT are store instructions and their source register
$Rt should be treated as an input operand instead of an output operand.
This should fix things (e.g., liveness tracking in LivePhysRegs) if
these instructions were used in CodeGen.

Differential Revision: https://reviews.llvm.org/D95074

[NewPM] Add ExtraVectorizerPasses support

As it looks like NewPM generally is using SimpleLoopUnswitch
instead of LoopUnswitch, this patch also use SimpleLoopUnswitch
in the ExtraVectorizerPasses sequence (compared with LegacyPM
which use the LoopUnswitch pass).

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D95457

[libomptarget][NFC] Use portable printf format specifiers.

Differential Revision: https://reviews.llvm.org/D95476

[InstCombine] Preserve FMF for powi simplifications.

Differential Revision: https://reviews.llvm.org/D95455

[NFC] Show instcombine powi simplifications drop FMF

Differential Revision: https://reviews.llvm.org/D95454

[X86] In shrinkAndImmediate, place the new constant into the topological sort.

Revert the change to use APInt::isSignedIntN from
5ff5cf8e057782e3e648ecf5ccf1d9990b53ee90.

Its clear that the games we were playing to avoid the topological
sort aren't working. So just fix it once and for all.

Fixes PR48888.

[NFC][lit] Cleanup code using string interpolation

LLVM now requires Python 3.6, so we can use string interpolation to make
code more readable.

[GlobalISel][IRTranslator] Ignore the llvm.experimental.noalias.scope.decl intrinsic.

These don't generate any code.

[OpenMP][Libomptarget] Fix cmake error on remote plugin

Requiring 3.15 causes a build breakage, I'm sure none of the contents actually require
3.15 or above.

Differential Revision: https://reviews.llvm.org/D95474

[gn build] Port 1e634f3952aa

[llvm-elfabi] Fix test after D95140

[libomptarget][cuda] Gracefully handle missing cuda library

[libomptarget][cuda] Gracefully handle missing cuda library

If using dynamic cuda, and it failed to load, it is not safe to call
cuGetErrorString.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95412

[libomptarget][cuda] Only run tests when sure there is cuda available

[libomptarget][cuda] Only run tests when sure there is cuda available

Prior to D95155, building the cuda plugin implied cuda was installed locally.
With that change, every machine can build a cuda plugin, but they won't all have
cuda and/or an nvptx card installed locally.

This change enables the nvptx tests when either:
- libcuda is present
- the user has forced use of the dlopen stub

The default case when there is no cuda detected will no longer attempt to
run the tests on nvptx hardware, as was the case before D95155.

Reviewed By: jdoerfert, ronlieb

Differential Revision: https://reviews.llvm.org/D95467

[OpenMP][Libomptarget] Introduce Remote Offloading Plugin

This introduces a remote offloading plugin for libomptarget. This
implementation relies on gRPC and protobuf, so this library will only
build if both libraries are available on the system. The corresponding
server is compiled to `openmp-offloading-server`.

This is a large change, but the only way to split this up is into RTL/server
but I fear that could introduce an inconsistency amongst them.

Ideally, tests for this should be added to the current ones that but that is
problematic for at least one reason. Given that libomptarget registers plugin
on a first-come-first-serve basis, if we wanted to offload onto a local x86
through a different process, then we'd have to either re-order the plugin list
in `rtl.cpp` (which is what I did locally for testing) or find a better
solution for runtime plugin registration in libomptarget.

Differential Revision: https://reviews.llvm.org/D95314

[llvm-elfabi] Support ELF file that lacks .gnu.hash section

Before this change, when reading ELF file, elfabi determines number of
entries in .dynsym by reading the .gnu.hash section. This change makes
elfabi read section headers directly first. This change allows elfabi
works on ELF files which do not have .gnu.hash sections.

Differential Revision: https://reviews.llvm.org/D93362

[libc++] Fix oss-fuzz build

Add -fbinutils-version= to gate ELF features on the specified binutils version

There are two use cases.

Assembler
We have accrued some code gated on MCAsmInfo::useIntegratedAssembler(). Some
features are supported by latest GNU as, but we have to use
MCAsmInfo::useIntegratedAs() because the newer versions have not been widely
adopted (e.g. SHF_LINK_ORDER 'o' and 'unique' linkage in 2.35, --compress-debug-sections= in 2.26).

Linker
We want to use features supported only by LLD or very new GNU ld, or don't want
to work around older GNU ld. We currently can't represent that "we don't care
about old GNU ld". You can find such workarounds in a few other places, e.g.
Mips/MipsAsmprinter.cpp PowerPC/PPCTOCRegDeps.cpp X86/X86MCInstrLower.cpp
AArch64 TLS workaround for R_AARCH64_TLSLD_MOVW_DTPREL_* (PR ld/18276),
R_AARCH64_TLSLE_LDST8_TPREL_LO12 (https://bugs.llvm.org/show_bug.cgi?id=36727 https://sourceware.org/bugzilla/show_bug.cgi?id=22969)

Mixed SHF_LINK_ORDER and non-SHF_LINK_ORDER components (supported by LLD in D84001;
GNU ld feature request https://sourceware.org/bugzilla/show_bug.cgi?id=16833 may take a while before available).
This feature allows to garbage collect some unused sections (e.g. fragmented .gcc_except_table).

This patch adds `-fbinutils-version=` to clang and `-binutils-version` to llc.
It changes one codegen place in SHF_MERGE to demonstrate its usage.
`-fbinutils-version=2.35` means the produced object file does not care about GNU
ld<2.35 compatibility. When `-fno-integrated-as` is specified, the produced
assembly can be consumed by GNU as>=2.35, but older versions may not work.

`-fbinutils-version=none` means that we can use all ELF features, regardless of
GNU as/ld support.

Both clang and llc need `parseBinutilsVersion`. Such command line parsing is
usually implemented in `llvm/lib/CodeGen/CommandFlags.cpp` (LLVMCodeGen),
however, ClangCodeGen does not depend on LLVMCodeGen. So I add
`parseBinutilsVersion` to `llvm/lib/Target/TargetMachine.cpp` (LLVMTarget).

Differential Revision: https://reviews.llvm.org/D85474

Revert "Support for instrumenting only selected files or functions"

This reverts commit 4edf35f11a9e20bd5df3cb47283715f0ff38b751 because
the test fails on Windows bots.

Make SBDebugger::CreateTargetWithFileAndArch work with lldb::LLDB_DEFAULT_ARCH

Second try, handling both a bogus arch string and the "null file & arch" used
to create an empty but valid target.
Also check in that case before logging (previously the logging would have
crashed.)

[flang][openacc][NFC] Organize clause validity tests by directive

Split the tests from acc-clause-validity.f90 in dedicated files by directives.
The file acc-clause-validity.f90 was getting too big to be correctly maintained.
Tests are identical.

Reviewed By: SouraVX

Differential Revision: https://reviews.llvm.org/D95328

CGDebugInfo CreatedLimitedType: Drop file/line for RecordType with invalid location

For Clang synthesized `__va_list_tag` (`CreateX86_64ABIBuiltinVaListDecl`),
its DW_AT_decl_file/DW_AT_decl_line are arbitrarily set from `CurLoc`.

In a stage 2 `-DCMAKE_BUILD_TYPE=Debug` clang build, I observe that
in driver.cpp, DW_AT_decl_file/DW_AT_decl_line may be set to an `#include` line
(the transitively included file uses va_arg (`__builtin_va_arg`)).
This seems arbitrary. Drop that.

Reviewed By: #debug-info, dblaikie

Differential Revision: https://reviews.llvm.org/D94735

CGDebugInfo: Drop Loc.isInvalid() special case from getLineNumber

`getLineNumber()` picks CurLoc if the parameter is invalid. This appears to
mainly work around missing SourceLocation information for some constructs, but
sometimes adds unintended locations.

* For `CodeGenObjC/debug-info-blocks.m`, `CurLoc` has been advanced to the closing brace. The debug line of `ImplicitVarParameter` is set to the line of `}` because this implicit parameter has an invalid `SourceLocation`. The debug line is a bit arbitrary - perhaps the location of `^{` is better.
* The file/line of Clang synthesized `__va_list_tag` is arbitrarily attached a `#include` line. D94735

Drop the special case to make getLineNumber less magic and add CurLoc fallback in its callers instead.

Tested with stage 2 -DCMAKE_BUILD_TYPE=Debug clang, byte identical.

Reviewed By: #debug-info, aprantl

Differential Revision: https://reviews.llvm.org/D94391

[AMDGPU] Update subtarget features for new target ID support

Support for XNACK and SRAMECC is not static on some GPUs. We must be able
to differentiate between different scenarios for these dynamic subtarget
features.

The possible settings are:

- Unsupported: The GPU has no support for XNACK/SRAMECC.
- Any: Preference is unspecified. Use conservative settings that can run anywhere.
- Off: Request support for XNACK/SRAMECC Off
- On: Request support for XNACK/SRAMECC On

GCNSubtarget will track the four options based on the following criteria. If
the subtarget does not support XNACK/SRAMECC we say the setting is
"Unsupported". If no subtarget features for XNACK/SRAMECC are requested we
must support "Any" mode. If the subtarget features XNACK/SRAMECC exist in the
feature string when initializing the subtarget, the settings are "On/Off".

The defaults are updated to be conservatively correct, meaning if no setting
for XNACK or SRAMECC is explicitly requested, defaults will be used which
generate code that can be run anywhere. This corresponds to the "Any" setting.

Differential Revision: https://reviews.llvm.org/D85882

[OpenMP][Libomptarget] Introduce changes to support remote plugin

In order to support remote execution, we need to be able to send the
target binary description to the remote host for registration (and
consequent deregistration). To support this, I added these two
optional new functions to the plugin API:
- `__tgt_rtl_register_lib`
- `__tgt_rtl_unregister_lib`

These functions will be called to properly manage the instance of
libomptarget running on the remote host.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D93293

[gn build] Port 4edf35f11a9e

Support for instrumenting only selected files or functions

This change implements support for applying profile instrumentation
only to selected files or functions. The implementation uses the
sanitizer special case list format to select which files and functions
to instrument, and relies on the new noprofile IR attribute to exclude
functions from instrumentation.

Differential Revision: https://reviews.llvm.org/D94820

[libomptarget][devicertl][amdgpu] Fix build, variable renaming error

[clangd] FindTarget resolves base specifier

FindTarget on the virtual keyword or access specifier of a base specifier will now resolve to type of the base specifier.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D95338

[clangd] Selection handles CXXBaseSpecifier

Selection now includes the virtual and access modifier as part of their range for cxx base specifiers.

Reviewed By: sammccall

Differential Revision: https://reviews.llvm.org/D95231

[ARM] [ELF] Fix ARMMaterializeGV for Indirect calls

Recent shouldAssumeDSOLocal changes (introduced by 961f31d8ad14c66)
do not take in consideration the relocation model anymore.  The ARM
fast-isel pass uses the function return to set whether a global symbol
is loaded indirectly or not, and without the expected information
llvm now generates an extra load for following code:

```
$ cat test.ll
@__asan_option_detect_stack_use_after_return = external global i32
define dso_local i32 @main(i32 %argc, i8** %argv) #0 {
entry:
  %0 = load i32, i32* @__asan_option_detect_stack_use_after_return,
align 4
  %1 = icmp ne i32 %0, 0
  br i1 %1, label %2, label %3

2:
  ret i32 0

3:
  ret i32 1
}

attributes #0 = { noinline optnone }

$ lcc test.ll -o -
[...]
main:
        .fnstart
[...]
        movw    r0, :lower16:__asan_option_detect_stack_use_after_return
        movt    r0, :upper16:__asan_option_detect_stack_use_after_return
        ldr     r0, [r0]
        ldr     r0, [r0]
        cmp     r0, #0
[...]
```

And without 'optnone' it produces:
```
[...]
main:
        .fnstart
[...]
        movw    r0, :lower16:__asan_option_detect_stack_use_after_return
        movt    r0, :upper16:__asan_option_detect_stack_use_after_return
        ldr     r0, [r0]
        clz     r0, r0
        lsr     r0, r0, #5
        bx      lr

[...]
```

This triggered a lot of invalid memory access in sanitizers for
arm-linux-gnueabihf.  I checked this patch both a stage1 built with
gcc and a stage2 bootstrap and it fixes all the Linux sanitizers
issues.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D95379

[RISCV] Have customLegalizeToWOp truncate to the original type instead of i32 now that we use it for i8/i16 as well.

239cfbccb0509da1a08d9e746706013b732e646b add support for legalizing
i8/i16 UDIV/UREM/SDIV to use *W instructions. So we need to truncate
to i8/i16 if we're legalizing one of those.

[mlir] sret and byval now require a type argument when constructed.

Fixes the LLVM code gen bugs and adds the missing tests.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D95378

Reland "[lit] Use os.cpu_count() to cleanup TODO"

The initial problem with the remaining bot config was resolved.

We can now use Python3. Let's use `os.cpu_count()` to cleanup this
helper.

Differential Revision: https://reviews.llvm.org/D94734

[lldb][NFC] Another attempt to fix GCC 5.x compilation

37510f69b4cb8d76064f108d57bebe95984a23ae tried to fix GCC 5.x compilation
by making the enum which is used as a unordered_map key unscoped. However it
seems that in GCC 5.x, enum keys are not supported *at all* in unordered_maps
(at least that's what some trial&error on godbolt tells me). This updates the
workaround to just use an int until GCC 5.x support is dropped.

[mlir] Set CUDA/ROCm context before creating resources.

The current context is thread-local state, and in preparation of GPU async execution (on multiple threads) we need to set the context before calling API that create resources.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D94495

AMDGPU: Fix redundant FP spilling/assert in some functions

If a function has stack objects, and a call, we require an FP. If we
did not initially have any stack objects, and only introduced them
during PrologEpilogInserter for CSR VGPR spills, SILowerSGPRSpills
would end up spilling the FP register as if it were a normal
register. This would result in an assert in a debug build, or
redundant handling of the FP register in a release build.

Try to predict that we will have an FP later, although this is ugly.

AMDGPU: Add assertion to determineCalleeSaves

Make sure this isn't getting called multiple times. I was surprised we
were modifying the function here, which I think is a bit questionable.

[OpenMP][deviceRTLs] Build the deviceRTLs with OpenMP instead of target dependent language

From this patch (plus some landed patches), `deviceRTLs` is taken as a regular OpenMP program with just `declare target` regions. In this way, ideally, `deviceRTLs` can be written in OpenMP directly. No CUDA, no HIP anymore. (Well, AMD is still working on getting it work. For now AMDGCN still uses original way to compile) However, some target specific functions are still required, but they're no longer written in target specific language. For example, CUDA parts have all refined by replacing CUDA intrinsic and builtins with LLVM/Clang/NVVM intrinsics.
Here're a list of changes in this patch.
1. For NVPTX, `DEVICE` is defined empty in order to make the common parts still work with AMDGCN. Later once AMDGCN is also available, we will completely remove `DEVICE` or probably some other macros.
2. Shared variable is implemented with OpenMP allocator, which is defined in `allocator.h`. Again, this feature is not available on AMDGCN, so two macros are redefined properly.
3. CUDA header `cuda.h` is dropped in the source code. In order to deal with code difference in various CUDA versions, we build one bitcode library for each supported CUDA version. For each CUDA version, the highest PTX version it supports will be used, just as what we currently use for CUDA compilation.
4. Correspondingly, compiler driver is also updated to support CUDA version encoded in the name of bitcode library. Now the bitcode library for NVPTX is named as `libomptarget-nvptx-cuda_[cuda_version]-sm_[sm_number].bc`, such as `libomptarget-nvptx-cuda_80-sm_20.bc`.

With this change, there are also multiple features to be expected in the near future:
1. CUDA will be completely dropped when compiling OpenMP. By the time, we also build bitcode libraries for all supported SM, multiplied by all supported CUDA version.
2. Atomic operations used in `deviceRTLs` can be replaced by `omp atomic` if OpenMP 5.1 feature is fully supported. For now, the IR generated is totally wrong.
3. Target specific parts will be wrapped into `declare variant` with `isa` selector if it can work properly. No target specific macro is needed anymore.
4. (Maybe more...)

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D94745