review.tizen.org Git - platform/upstream/llvm.git/log

[flang][driver] Fix reading from stdin when using `-test-io`

This patch adds logic in the InputOutputTestAction frontend action for
reading input from stdin. Without this patch the following fails:
```
flang-new -fc1 -test-io -
```

The implementation of `InputOutputTestAction` is cleaned-up and a test
for reading from stdin is added.

Note that there's a difference between `-test-io` and e.g. `-E` in terms
of file I/O. The frontend action for the former handles all file I/O on
it's own. Conversely, the action corresponding to -E relies on the
prescanner API to handle this.

Currently we can't test reading from stdin for `flang-new -`. In this
case `libclangDriver` assumes `-x -c`. This in turn leads to `flang-new
-cc1`, which is not supported.

[libc++] Remove the ability to use braced-init for filesystem paths

According to my reading of http://eel.is/c++draft/filesystems#fs.class.path,
the Standard doesn't actually mention that this should work. Since other
implementations don't allow it, allowing it in libc++ is just setting a
portability trap.

Supersedes https://reviews.llvm.org/D89865.

Differential Revision: https://reviews.llvm.org/D95975

[CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation

For CS profile generation, the process of call stack unwinding is time-consuming since for each LBR entry we need linear time to generate the context( hash, compression, string concatenation). This change speeds up this by grouping all the call frame within one LBR sample into a trie and aggregating the result(sample counter) on it, deferring the context compression and string generation to the end of unwinding.

Specifically, it uses `StackLeaf` as the top frame on the stack and manipulates(pop or push a trie node) it dynamically during virtual unwinding so that the raw sample can just be recoded on the leaf node, the path(root to leaf) will represent its calling context. In the end, it traverses the trie and generates the context on the fly.

Results:
Our internal branch shows about 5X speed-up on some large workloads in SPEC06 benchmark.

Differential Revision: https://reviews.llvm.org/D94110

[libc++] Make feature-test macros consistent with availability macros

Before this patch, feature-test macros didn't take special availability
markup into account, which means that feature-test macros can sometimes
appear to "lie". For example, if you compile in C++20 mode and target
macOS 10.13, the __cpp_lib_filesystem feature-test macro will be provided
even though the <filesystem> declarations are marked as unavailable.
This patch fixes that.

rdar://68142369

Differential Revision: https://reviews.llvm.org/D94983

[libc++] Fix libcxx build on 32bit architectures with 64bit time_t defaults e.g. riscv32

Patch by Khem Raj.

Differential Revision: https://reviews.llvm.org/D85095

[clang][Arm] Fix handling of -Wa,-march=

This fixes Bugzilla #48894 for Arm, where it
was reported that -Wa,-march was not being handled
by the integrated assembler.

This was previously fixed for -Wa,-mthumb by
parsing the argument in ToolChain::ComputeLLVMTriple
instead of CollectArgsForIntegratedAssembler.
It has to be done in the former because the Triple
is read only by the time we get to the latter.

Previously only mcpu would work via -Wa but only because
"-target-cpu" is it's own option to cc1, which we were
able to modify. Target architecture is part of "-target-triple".

This change applies the same workaround to -march and cleans up
handling of -Wa,-mcpu at the same time. There were some
places where we were not using the last instance of an argument.

The existing -Wa,-mthumb code was doing this correctly,
so I've just added tests to confirm that.

Now the same rules will apply to -Wa,-march/-mcpu as would
if you just passed them to the compiler:
* -Wa/-Xassembler options only apply to assembly files.
* Architecture derived from mcpu beats any march options.
* When there are multiple mcpu or multiple march, the last
  one wins.
* If there is a compiler option and an assembler option of
  the same type, we prefer the one that fits the input type.
* If there is an applicable mcpu option but it is overruled
  by an march, the cpu value is still used for the "-target-cpu"
  cc1 option.

Reviewed By: nickdesaulniers

Differential Revision: https://reviews.llvm.org/D95872

[flang][driver] Add support for `-J/-module-dir`

Add support for option -J/-module-dir in the new Flang driver. This
will allow for including module files in other directories, as the
default search path is currently the working folder. This also provides
an option of storing the output module in the specified folder.

Differential Revision: https://reviews.llvm.org/D95448

[Hexagon] Add -mv68 option to driver

[libc++] Adds a make_string test helper function.

These function makes it easier to write generic unit tests for the
format header. It solves the issue where it's not possible to use
`templated_prefix"foo"`
where `templated_prefix` resolves to: nothing, `L`, `u8`, `u`,
or `U`. The templated_prefix would be more faster during execution.

Reviewed By: ldionne, #libc, curdeius

Differential Revision: https://reviews.llvm.org/D93414

[Hexagon] Add clang builtin definitions for Hexagon V68

[InstCombine] add tests for demanded/known bits of shifted constant; NFC

These are variations of a missed analysis noted in:
https://llvm.org/PR48984

[AVR] Fix up a few accidentally-regressed Generic CodeGen tests recently broken

In 85e8e6246e0fcc62ba727e8fb5990f1a632125d0, these tests were modified
to work with AVR, but the regex matchers were finicky and required a
fix forward patch, being this.

[libc++] Rename include/support to include/__support

We do ship those headers, so the directory name should not be something
that can potentially conflict with user-defined directories.

Differential Revision: https://reviews.llvm.org/D95956

[X86] Use VT::changeVectorElementType helper where possible. NFCI.

[AVR] Add 'XFAIL' to the remaining failing Generic CodeGen tests for AVR

This patch adds 'XFAIL: avr' to 2 Generic CodeGen tests, bringing the
Generic CodeGen tests for AVR to a pass, with only two XFAILures.

After this patch, the Generic CodeGen tests pass on AVR.

[AVR] Fix 14 Generic CodeGen tests by making address space explicit or optional

This fixes the vast majority of remaining failing AVR Generic CodeGen
tests.

[Dexter] Avoid infinite loop in dbgeng driver

This method of the dbgeng debugger driver used to just "pass", as setting
dbgeng free running still leads to numerous errors. This wasn't a problem
in the past because, as it turns out, nothing called the go method.
However, a recent refactor uses it.

Rather than launch dbgeng free running, instead have it single step one
step forwards. This is slow, but it makes progress, where previously we
weren't.

Differential Revision: https://reviews.llvm.org/D91737

[flang][driver] Add PrescanAction frontend action (nfc)

This new action encapsulates all actions that require the prescanner to
be run before proceeding with other processing. By adding this new
action, we are better equipped to control which actions _do_ run the
prescanner and which _do not_.

The following actions that require the prescanner are refactored to
inherit from `PrescanAction`:
  * `PrintPreprocessedAction`
  * `ParseSyntaxOnlyAction` .

New virtual method is introduced to facilitate all this:
  * `BeginSourceFileAction`
Like in Clang, this method is run inside `BeginSourceFile`. In other
words, it is invoked before `ExecuteAction` for the corresponding
frontend action is run. This method allows us to:
  * carry out any processing that is always required by the action (e.g.
    run the prescanner)
  * fine tune the settings/options on a file-by-file basis (e.g. to
    decide between fixed-form and free-form based on file extension)

This patch implements non-functional-changes.

Reviewed By: FarisRehman

Differential Revision: https://reviews.llvm.org/D95464

NFC: Migrate LoopUnrollPass to work on InstructionCost

This patch migrates cost values and arithmetic to work on InstructionCost.
When the interfaces to TargetTransformInfo are changed, any InstructionCost
state will propagate naturally.

See this patch for the introduction of the type: https://reviews.llvm.org/D91174
See this thread for context: http://lists.llvm.org/pipermail/llvm-dev/2020-November/146408.html

Reviewed By: david-arm, fhahn

Differential Revision: https://reviews.llvm.org/D95817

[OpenCL][Docs] Link page explaining tooling for offline compilation.

Tags: #clang

Differential Revision: https://reviews.llvm.org/D95886

[ConstraintElimination] Support conditions from loop preheaders

This patch extends the condition collection logic to allow adding
conditions from pre-headers to loop headers, by allowing cases where the
target block dominates some of its predecessors.

[OpenCL] Fix default address space in template argument deduction.

When deducing a reference type for forwarding references prevent
adding default address space of a template argument if it is given.

This got reported in PR48896 because in OpenCL all parameters are
in private address space and therefore when we initialize a
forwarding reference with a parameter we should just inherit the
address space from it i.e. keep __private instead of __generic.

Tags: #clang

Differential Revision: https://reviews.llvm.org/D95624

[mlir] Return scf.parallel ops resulted from tiling.

Differential Revision: https://reviews.llvm.org/D96024

AMDGPU: Add support for amdgpu-unsafe-fp-atomics attribute

If amdgpu-unsafe-fp-atomics is specified, allow {flat|global}_atomic_add_f32 even if atomic modes don't match.

Differential Revision: https://reviews.llvm.org/D95391

[AVR] Remove an assertion that causes generic CodeGen tests to fail

It was discussed a few years ago and agreed that it makes sense to
remove this assertion as other targets do not perform similar register
size checking in inline assembly constraint logic, so the check just
adds a needless barrier on AVR.

This patch removes the assertion and removes 'XFAIL' from two Generic
CodeGen tests for AVR as a result.

[clang] Add AddClang.cmake to the list of the CMake modules that are installed

This makes sure that AddClang.cmake is installed alongside other Clang
CMake modules. This mirrors LLVM and MLIR in this respect and is
required when building the new Flang driver out of tree (as it depends
on Clang and includes AddClang.cmake).

Reviewed By: bogner

Differential Revision: https://reviews.llvm.org/D94533

[flang][driver] Add forced form flags and -ffixed-line-length

Add support for the following layout options:
* -ffree-form
* -ffixed-form
- -ffixed-line-length=n (alias -ffixed-line-length-n)
Additionally remove options `-fno-free-form` and `-fno-fixed-form` as they were initially added to forward to gfortran but gfortran does not support these flags.

This patch adds the flag FlangOnlyOption to the existing options `-ffixed-form`, `-ffree-form` and `-ffree-line-length-` in Options.td. As of commit 6a75496836ea14bcfd2f4b59d35a1cad4ac58cee, these flags are not currently forwarded to gfortran anyway.

The default fixed line length in FrontendOptions is 72, based off the current default in Fortran::parser::Options. The line length cannot be set to a negative integer, or a positive integer less than 7 excluding 0, consistent with the behaviour of gfortran.

This patch does not add `-ffree-line-length-n` as Fortran::parser::Options does not have a variable for free form columns.
Whilst the `fixedFormColumns` variable is used in f18 for `-ffree-line-length-n`, f18 only allows `-ffree-line-length-none`/`-ffree-line-length-0` and not a user-specified value. `fixedFormcolumns` cannot be used in the new driver as it is ignored in the frontend when dealing with free form files.

Summary of changes:
- Remove -fno-fixed-form and -fno-free-form from Options.td
- Make -ffixed-form, -ffree-form and -ffree-line-length-n FlangOnlyOption in Options.td
- Create AddFortranDialectOptions method in Flang.cpp
- Create FortranForm enum in FrontendOptions.h
- Add fortranForm_ and fixedFormColumns_ to Fortran::frontend::FrontendOptions
- Update fixed-form-test.f so that it guarantees that it fails when forced as a free form file to better facilitate testing.

Differential Revision: https://reviews.llvm.org/D95460

[X86] Remove stale TODO comment. NFC.

We now handle implicit zero-extension shuffle mask cases.

Revert "[hip][cuda] Enable extended lambda support on Windows."

This reverts commit a2fdf9d4d734732a6fa9288f1ffdf12bf8618123.
Slightly speculative, seeing several cuda tests fail on this
Windows bot: http://45.33.8.238/win/32620/step_7.txt

[gn build] (manually) port 0609f257dc2e2c3

[ElementCount] NFC: Set 'const' qualifier for getWithIncrement/Decrement.

These class methods simply return a new UnivariateLinearPolyBase
(e.g. ElementCount), and do not modify the object in any way or form,
so qualify for being 'const'.

[mlir][Linalg] Drop SliceOp

This op is subsumed by rank-reducing SubViewOp and has become useless.

Differential revision: https://reviews.llvm.org/D95317

Re-land D94976 after revert in e29552c5aff6

This modified patch avoids redirecting the unit in which a subprogram is
created if type units are enabled -- DIEs were getting children allocated
from different units memory pools. Original commit message:

[DWARF] Create subprogram's DIE in DISubprogram's unit

This is a fix for PR48790. Over in D70350, subprogram DIEs were permitted
to be shared between CUs. However, the creation of a subprogram DIE can be
triggered early, from other CUs. The subprogram definition is then created
in one CU, and when the function is actually emitted children are attached
to the subprogram that expect to be in another CU. This breaks internal CU
references in the children.

Fix this by redirecting the creation of subprogram DIEs in
getOrCreateContextDIE to the CU specified by it's DISubprogram definition.
This ensures that the subprogram DIE is always created in the correct CU.

Differential Revision: https://reviews.llvm.org/D94976

[ARM] Handle f16 in GeneratePerfectShuffle

This new f16 shuffle under Neon would hit an assert in
GeneratePerfectShuffle as it would try to treat a f16 vector as an i8.
Add f16 handling, treating them like an i16.

Differential Revision: https://reviews.llvm.org/D95446

[mlir] make vector to llvm conversion truly partial

Historically, the Vector to LLVM dialect conversion subsumed the Standard to
LLVM dialect conversion patterns. This was necessary because the conversion
infrastructure did not have sufficient support for reconciling type
conversions. This support is now available. Only keep the patterns related to
the Vector dialect in the Vector to LLVM conversion and require type casts
operations to be inserted if necessary. These casts will be removed by
following conversions if possible. Update integration tests to also run the
Standard to LLVM conversion.

There is a significant amount of test churn, which is due to (a) unnecessarily
strict tests in VectorToLLVM and (b) many patterns actually targeting Standard
dialect ops instead of LLVM dialect ops leading to tests actually exercising a
Vector->Standard->LLVM conversion. This churn is a good illustration of the
reason to make the conversion partial: now the tests only check the code in the
Vector to LLVM conversion and will not be randomly broken by changes in
Standard to LLVM conversion.

Arguably, it may be possible to extract Vector to Standard patterns into a
separate pass, but given the ongoing splitting of the Standard dialect, such
pass will be short-lived and will require further refactoring.

Depends On D95626

Reviewed By: nicolasvasilache, aartbik

Differential Revision: https://reviews.llvm.org/D95685

[lldb] Make TestLocalVariables.py compatible with the new pass manager

The new PM is more aggressive at inlining, which breaks assumptions in
the test => slap some __attribute__((noinlines)) to prevent that.

[mlir] Apply source materialization in case of transitive conversion

In dialect conversion infrastructure, source materialization applies as part of
the finalization procedure to results of the newly produced operations that
replace previously existing values with values having a different type.
However, such operations may be created to replace operations created in other
patterns. At this point, it is possible that the results of the _original_
operation are still in use and have mismatching types, but the results of the
_intermediate_ operation that performed the type change are not in use leading
to the absence of source materialization. For example,

  %0 = dialect.produce : !dialect.A
  dialect.use %0 : !dialect.A

can be replaced with

  %0 = dialect.other : !dialect.A
  %1 = dialect.produce : !dialect.A  // replaced, scheduled for removal
  dialect.use %1 : !dialect.A

and then with

  %0 = dialect.final : !dialect.B
  %1 = dialect.other : !dialect.A    // replaced, scheduled for removal
  %2 = dialect.produce : !dialect.A  // replaced, scheduled for removal
  dialect.use %2 : !dialect.A

in the same rewriting, but only the %1->%0 replacement is currently considered.

Change the logic in dialect conversion to look up all values that were replaced
by the given value and performing source materialization if any of those values
is still in use with mismatching types. This is performed by computing the
inverse value replacement mapping. This arguably expensive manipulation is
performed only if there were some type-changing replacements. An alternative
could be to consider all replaced operations and not only those that resulted
in type changes, but it would harm pattern-level composability: the pattern
that performed the non-type-changing replacement would have to be made aware of
the type converter in order to call the materialization hook.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D95626

[clang-cl] Remove the /fallback option

As discussed in
https://lists.llvm.org/pipermail/cfe-dev/2021-January/067524.html

It doesn't appear to be used, isn't really maintained, and adds some
complexity to the code. Let's remove it.

Differential revision: https://reviews.llvm.org/D95876

[clang][cli] Command line round-trip for HeaderSearch options

This patch implements generation of remaining header search arguments.
It's done manually in C++ as opposed to TableGen, because we need the flexibility and don't anticipate reuse.

This patch also tests the generation of header search options via a round-trip. This way, the code gets exercised whenever Clang is built and tested in asserts mode. All `check-clang` tests pass.

Reviewed By: dexonsmith

Differential Revision: https://reviews.llvm.org/D94472

[Support] Indent multi-line descr of enum cli options.

As noted in https://reviews.llvm.org/D93459, the formatting of
multi-line descriptions of clEnumValN and the likes is unfavorable.
Thus this patch adds support for correctly indenting these.

Reviewed By: serge-sans-paille

Differential Revision: https://reviews.llvm.org/D93494

[AMDGPU] Save all lanes for reserved VGPRs

When SGPRs are spilled to VGPRs, they can overwrite any lane. We need
to preserve the value of inactive lanes in function calls, so we save
the register even if it is marked as caller saved.

Also, teach buildPrologSpill to work when no registers are free like in
CodeGen/AMDGPU/pei-scavenge-vgpr-spill.mir and update the comment on
findScratchNonCalleeSaveRegister as it is not used anymore to realign
the stack pointer since D95865.

Differential Revision: https://reviews.llvm.org/D95946

[clangd] Detect rename conflicits within enclosing scope

This patch allows detecting conflicts with variables defined in the current
CompoundStmt or If/While/For variable init statements.

Reviewed By: hokein

Differential Revision: https://reviews.llvm.org/D95925

[Syntax] Support condition for IfStmt.

Differential Revision: https://reviews.llvm.org/D95782

[mlir][Linalg] Generalize the definition of a Linalg contraction.

This revision defines a Linalg contraction in general terms:

  1. Has 2 input and 1 output shapes.
  2. Has at least one reduction dimension.
  3. Has only projected permutation indexing maps.
  4. its body computes `u5(u1(c) + u2(u3(a) * u4(b)))` on some field
    (AddOpType, MulOpType), where u1, u2, u3, u4 and u5 represent scalar unary
    operations that may change the type (e.g. for mixed-precision).

As a consequence, when vectorization of such an op occurs, the only special
behavior is that the (unique) MulOpType is vectorized into a
`vector.contract`. All other ops are handled in a generic fashion.

In the future, we may wish to allow more input arguments and elementwise and
constant operations that do not involve the reduction dimension(s).

A test is added to demonstrate the proper vectorization of matmul_i8_i8_i32.

Differential revision: https://reviews.llvm.org/D95939

Give this test a target triple.

Fix miscompile when performing template instantiation of non-dependent
doubly-nested implicit CXXConstructExprs.

Ensure that we transform the parameter initializer using
TransformInitializer rather than TransformExpr so that we properly strip
down and rebuild the initialization, including any necessary
CXXBindTemporaryExprs. Otherwise we can end up forgetting to destroy
temporary objects used to construct a constructor parameter.

[mlir][Linalg] NFC - Extract a standalone LinalgInterfaces

This separation improves the layering and paves the way for more interfaces coming up in the future.

Differential revision: https://reviews.llvm.org/D95941

[hip][cuda] Enable extended lambda support on Windows.

- On Windows, extended lambda has extra issues due to the numbering
  schemes are different between the host compilation (Microsoft C++ ABI)
  and the device compilation (Itanium C++ ABI. Additional device side
  lambda number is required per lambda for the host compilation to
  correctly mangle the device-side lambda name.
- A hybrid numbering context `MSHIPNumberingContext` is introduced to
  number a lambda for both host- and device-compilations.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D69322

[CSSPGO][llvm-profgen] Compress recursive cycles in calling context

This change compresses the context string by removing cycles due to recursive function for CS profile generation. Removing recursion cycles is a way to normalize the calling context which will be better for the sample aggregation and also make the context promoting deterministic.
Specifically for implementation, we recognize adjacent repeated frames as cycles and deduplicated them through multiple round of iteration.
For example:
Considering a input context string stack:
[“a”, “a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For first iteration,, it removed all adjacent repeated frames of size 1:
[“a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For second iteration, it removed all adjacent repeated frames of size 2:
[“a”, “b”, “c”, “a”, “b”, “c”, “d”]
So in the end, we get compressed output:
[“a”, “b”, “c”, “d”]

Compression will be called in two place: one for sample's context key right after unwinding, one is for the eventual context string id in the ProfileGenerator.
Added a switch `compress-recursion` to control the size of duplicated frames, default -1 means no size limit.
Added unit tests and regression test for this.

Differential Revision: https://reviews.llvm.org/D93556

Revert "[CSSPGO][llvm-profgen] Compress recursive cycles in calling context"

This reverts commit 0609f257dc2e2c3e4c7cd30fe2ffd520117e706b.

Revert "[CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation"

This reverts commit 1714ad2336293f351b15dd4b518f9e8618ec38f2.

[ASTReader] Always rebuild a cached module that has errors

A module in the cache with an error should just be a cache miss. If
allowing errors (with -fallow-pcm-with-compiler-errors), a rebuild is
needed so that the appropriate diagnostics are output and in case search
paths have changed. If not allowing errors, the module was built
*allowing* errors and thus should be rebuilt regardless.

Reviewed By: akyrtzi

Differential Revision: https://reviews.llvm.org/D95989

[NFC] Fix the noprofile attribute comment

[lldb] Convert more assertTrue to assertEqual (NFC)

Follow up to D95813, this converts multiline assertTrue to assertEqual.

Differential Revision: https://reviews.llvm.org/D95899

[NFC][Coroutine] Remove redundant comment

The functionallity in the TODO was added before:
https://reviews.llvm.org/rGb3a722e66b75328ab5e2eb5c8572022cb083855b

[Transforms/IPO] Use range-based for loops (NFC)

[TableGen] Use ListSeparator (NFC)

[Support] Drop unnecessary const from return types (NFC)

Identified with const-return-type.

Fix the guaranteed alignment of memory returned by malloc/new on Darwin

The guaranteed alignment is 16 bytes on Darwin.

rdar://73431623

Differential Revision: https://reviews.llvm.org/D95910

[test] Pin spir-codegen.ll to legacy PM

-polly-enable-delicm is not supported under the new PM but is tested here:
Assertion `!EnableDeLICM && "This option is not implemented"' failed.

[CSSPGO][llvm-profgen] Aggregate samples on call frame trie to speed up profile generation

For CS profile generation, the process of call stack unwinding is time-consuming since for each LBR entry we need linear time to generate the context( hash, compression, string concatenation). This change speeds up this by grouping all the call frame within one LBR sample into a trie and aggregating the result(sample counter) on it, deferring the context compression and string generation to the end of unwinding.

Specifically, it uses `StackLeaf` as the top frame on the stack and manipulates(pop or push a trie node) it dynamically during virtual unwinding so that the raw sample can just be recoded on the leaf node, the path(root to leaf) will represent its calling context. In the end, it traverses the trie and generates the context on the fly.

Results:
Our internal branch shows about 5X speed-up on some large workloads in SPEC06 benchmark.

Differential Revision: https://reviews.llvm.org/D94110

[CSSPGO][llvm-profgen] Compress recursive cycles in calling context

This change compresses the context string by removing cycles due to recursive function for CS profile generation. Removing recursion cycles is a way to normalize the calling context which will be better for the sample aggregation and also make the context promoting deterministic.
Specifically for implementation, we recognize adjacent repeated frames as cycles and deduplicated them through multiple round of iteration.
For example:
Considering a input context string stack:
[“a”, “a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For first iteration,, it removed all adjacent repeated frames of size 1:
[“a”, “b”, “c”, “a”, “b”, “c”, “b”, “c”, “d”]
For second iteration, it removed all adjacent repeated frames of size 2:
[“a”, “b”, “c”, “a”, “b”, “c”, “d”]
So in the end, we get compressed output:
[“a”, “b”, “c”, “d”]

Compression will be called in two place: one for sample's context key right after unwinding, one is for the eventual context string id in the ProfileGenerator.
Added a switch `compress-recursion` to control the size of duplicated frames, default -1 means no size limit.
Added unit tests and regression test for this.

Differential Revision: https://reviews.llvm.org/D93556

[MLIR] Fix building unittests in in-tree build

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D95978

Make the folder more robust against op fold() methods that generate a type mismatch

We could extend this with an interface to allow dialect to perform a type
conversion, but that would make the folder creating operation which isn't
the case at the moment, and isn't necessarily always desirable.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D95991

[OpenMP][NVPTX] Take functions in `deviceRTLs` as `convergent`

OpenMP device compiler (similar to other SPMD compilers) assumes that
functions are convergent by default to avoid invalid transformations, such as
the bug (https://bugs.llvm.org/show_bug.cgi?id=49021).

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D95971

[OpenMPIRBuilder] Implement collapseLoops.

The collapseLoops method implements a transformations facilitating the implementation of the collapse-clause. It takes a list of loops from a loop nest and reduces it to a single loop that can be used by other methods that are implemented on just a single loop, such as createStaticWorkshareLoop.

This patch shares some changes with D92974 (such as adding some getters to CanonicalLoopNest), used by both patches.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D93268

[lldb] Rollback to using i386 for the watch simulator

I switched the watch simulator test from i386 to using x86_64, but
apparently that's not supported on the bots. Rollback to using i386 and
solve the original issue by passing the target, similar to what I did
in TestSimulatorPlatform.py.

[CSSPGO][llvm-profgen] Pseudo probe based CS profile generation

This change implements profile generation infra for pseudo probe in llvm-profgen. During virtual unwinding, the raw profile is extracted into range counter and branch counter and aggregated to sample counter map indexed by the call stack context. This change introduces the last step and produces the eventual profile. Specifically, the body of function sample is recorded by going through each probe among the range and callsite target sample is recorded by extracting the callsite probe from branch's source.

Please refer https://groups.google.com/g/llvm-dev/c/1p1rdYbL93s and https://reviews.llvm.org/D89707 for more context about CSSPGO and llvm-profgen.

**Implementation**

- Extended `PseudoProbeProfileGenerator` for pseudo probe based profile generation.
- `populateBodySamplesWithProbes` reading range counter is responsible for recording function body samples and inferring caller's body samples.
- `populateBoundarySamplesWithProbes` reading branch counter is responsible for recording call site target samples.
- Each sample is recorded with its calling context(named `ContextId`). Remind that the probe based context key doesn't include the leaf frame probe info, so the `ContextId` string is created from two part: one from the probe stack strings' concatenation and other one from the leaf frame probe.
- Added regression test

Test Plan:

ninja & ninja check-llvm

Differential Revision: https://reviews.llvm.org/D92998

[AArch64][GlobalISel] Change store value type from p0 -> s64 to import patterns

Similar to the G_PTR_ADD + G_LOAD twiddling we do in `preISelLower`.

The imported patterns expect scalars only, so they can't handle things like

```
G_STORE %ptr1, %ptr2
```

To get around this, use s64 instead.

(This probably makes a good portion of the manual selection code for G_STORE
dead.)

This is a 0.2% geomean code size improvement on CTMark at -Os.

(Best is consumer-typeset @ -0.7%)

Differential Revision: https://reviews.llvm.org/D95908

Revert "[InstrProfiling] Use !associated metadata for counters, data and values"

This reverts commit 97ba5cde52664200819446c1a18de28faf2ed1c6.
Still breaks tests: https://reviews.llvm.org/D76802#2540647

[AArch64][GlobalISel] Emit G_ASSERT_ZEXT in assignValueToAddress for ZExt params

When we have a zeroext parameter coming in on the stack, build

```
%x = G_LOAD ...
%x_assert_zext = G_ASSERT_ZEXT %x, narrow_size
%trunc = G_TRUNC %x_assert_zext
```

Rather than just loading into the truncated type.

This allows us to optimize cases like this: https://godbolt.org/z/vfjhW8

Differential Revision: https://reviews.llvm.org/D95805

[libc++] [P0879] constexpr std::sort

This completes libc++'s implementation of
P0879 "Constexpr for swap and swap related functions."
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0879r0.html

For the feature-macro adjustment, see
https://cplusplus.github.io/LWG/issue3256

Differential Revision: https://reviews.llvm.org/D93661

[Docs] Add some documentation for constructor homing, a debug info optimization (-fuse-ctor-homing)

Adding this, since there's currently no documentation about this.

Differential Revision: https://reviews.llvm.org/D95911

[clang-tidy] Use new mapping matchers

Use mapAnyOf() and matchers based on it.

Use of binaryOperation() means that modernize-loop-convert and
readability-container-size-empty can now be used with rewritten binary
operators.

Differential Revision: https://reviews.llvm.org/D94131

PR44325 (and duplicates): don't issue -Wzero-as-null-pointer-constant
when rewriting 'a < b' as '(a <=> b) < 0'.

It's pretty common for comparison category types to use a pointer or
pointer-to-member type as their '0' parameter.

Revert "[LTO] Use lto::backend for code generation."

This reverts commit 6a59f0560648b43324b5aed51b9ef996404a25e0, because
it is causing failures on green dragon.

Revert "[LTO] Add option enable NewPM with LTOCodeGenerator."

This reverts commit 7a6a2cc81aaf064e6f5bc9a9a16973f552d2bdc2 because
it is causing failures on green dragon.

Revert "[LTOCodeGenerator] Use lto::Config for options (NFC)."

This reverts commit 0d487cf87aa1b609b7db061def3e5ad068576ecf because
it is causing failures on green dragon.

Turn on the new pass manager by default

This turns on the new pass manager by default for the optimization pipeline in
Clang and ThinLTO in various LLD backends. This also makes uses of `opt
-instcombine` use the new pass manager (unless specifically opted out).

This does not affect the backend target-dependent codegen pipeline.

If this causes regressions, you can opt out of the new pass manager
either via the -DENABLE_EXPERIMENTAL_NEW_PASS_MANAGER=OFF CMake flag
while building LLVM, or via various compiler flags, e.g.
-flegacy-pass-manager for Clang or -Wl,--lto-legacy-pass-manager for
ELF LLD. Please file bugs for any regressions.

Major differences:
* The inliner works slightly differently
* -O1 does some amount of inlining
* LCSSA and LoopSimplify are run before all loop passes
* Loop unswitching is implemented slightly differently
* A new SpeculateAroundPHIs pass is added to the pipeline

https://lists.llvm.org/pipermail/llvm-dev/2021-January/148098.html

Reviewed By: asbirlea, ychen, MaskRay, echristo

Differential Revision: https://reviews.llvm.org/D95380

PR49020: Diagnose brace elision in designated initializers in C++.

This is a corner of the differences between C99 designators and C++20
designators that we'd previously overlooked. As with other such cases,
this continues to be permitted as an extension and allowed by default,
behind the -Wc99-designators warning flag, except in cases where it
leads to a conformance difference (such as in overload resolution and in
a SFINAE context).

[GlobalISel] Add sext(constant) -> constant artifact combine.

This is the G_SEXT counterpart to the existing G_ZEXT/G_ANYEXT combines.

Differential Revision: https://reviews.llvm.org/D95729

[lldb] Check for both Lua 5.3 and 5.4 error messages in the tests.

[lldb] Honor the CPU type & subtype when launching on macOS

Honor the CPU type (and subtype) when launching the inferior on macOS.

Part of this functionality was thought to be no longer needed and
removed in 85bd4369610fe60397455c8e0914a09288285e84, however it's still
needed, for example to launch binaries under Rosetta 2 on Apple Silicon.

This patch will use posix_spawnattr_setarchpref_np if available and
fallback to posix_spawnattr_setbinpref_np if not.

Differential revision: https://reviews.llvm.org/D95922

[Coverage] Propogate counter to condition of conditional operator

Clang usually propagates counter mapping region for conditions of `if`, `while`,
`for`, etc from parent counter. We should do the same for condition of conditional operator.

Differential Revision: https://reviews.llvm.org/D95918

[libc++] Rationalize our treatment of contiguous iterators and __unwrap_iter().

- Implement C++20's changes to `reverse_iterator`, so that it won't be
    accidentally counted as a contiguous iterator in C++20 mode.
- Implement C++20's changes to `move_iterator` as well.
- `move_iterator` should not be contiguous. This fixes a bug where
    we optimized `std::copy`-of-move-iterators in an observable way.
    Add a regression test for that bugfix.
- Add libcxx tests for `__is_cpp17_contiguous_iterator` of all relevant
    standard iterator types. Particularly check that vector::iterator
    is still considered contiguous in all C++ modes, even C++03.

After this patch, there continues to be no supported way to write your
own iterator type in C++17-and-earlier such that libc++ will consider it
"contiguous"; however, we now fully support the C++20 approach (in C++20
mode only). If you want user-defined contiguous iterators in C++17-and-earlier,
libc++'s position is "please upgrade to C++20."

Differential Revision: https://reviews.llvm.org/D94807

Add API for adding arguments to blocks

This just exposes a missing API

Differential Revision: https://reviews.llvm.org/D95968

[lld-macho] Try to fix Windows build

[NewPM][HelloWorld] Move HelloWorld to Utils

To prevent creating a new component, which creates a new library.

Reviewed By: ychen

Differential Revision: https://reviews.llvm.org/D95907

[asan] Fix pthread_create interceptor

AsanThread::Destroy implementation expected to be called on
child thread.

I missed authors concern regarding this reviewing D95184.

Reviewed By: delcypher

Differential Revision: https://reviews.llvm.org/D95731

Fix overflowing signed left shift, found by ubsan buildbot.

[mlir] Add gpu async integration test.

Reviewed By: herhut

Differential Revision: https://reviews.llvm.org/D94421

[Hexagon] Add LLVM instruction definitions for Hexagon V68

[clang-tblgen] AnnotateAttr::printPretty has spurious comma when no variadic argument is specified

rdar://73742471
Differential Revision: https://reviews.llvm.org/D95695

[SampleFDO][NFC] Detach SampleProfileLoader from SampleCoverageTracker

This patch detaches SampleProfileLoader from class
SampleCoverageTracker. We plan to move SampleProfileLoader
to a template class. This would remain SampleCoverageTracker
as a class.
Also make callsiteIsHot() as a file static function.

Differential Revision: https://reviews.llvm.org/D95823

[NFC][CUDA] Refactor registering device variable

Extract registering device variable to CUDA runtime codegen function since it
will be called in multiple places.

Reviewed by: Artem Belevich

Differential Revision: https://reviews.llvm.org/D95558

Delete CUDA context after linking device code.

Differential Revision: https://reviews.llvm.org/D95857

[GlobalISel] Combine narrowScalar of G_ADD and G_SUB. NFC

These two cases have identical implementations other than an
unreachable part of `G_ADD` that checks if the scalar we're narrowing
is a vector. Combining them to avoid unnecessary divergence.

Set GPU context before {cu,hip}MemHostRegister.

Differential Revision: https://reviews.llvm.org/D95856

RegisterCoalescer: Fix not setting undef on coalesced subregister uses

This was only adding undef to the use if the copy itself had a
subregister index. It did not consider the subrange liveness if the
use had a subreg index to begin with.

[flang] Fix calls to LBOUND() intrinsic for arrays with lower bounds not 1

Constant folding for calls to LBOUND() was not working when the lower bound of
a constant array was not 1.

I fixed this and re-enabled the test in Evaluate/folding16.f90 that previously
was silently failing. I slightly changed the test to parenthesize the first
argument to exercise all of the new code.

Differential Revision: https://reviews.llvm.org/D95894