review.tizen.org Git - platform/upstream/llvm.git/log

[modules] Fix 37878; Autoload subdirectory modulemaps with specific LangOpts

Summary:
Reproducer and errors:
https://bugs.llvm.org/show_bug.cgi?id=37878

lookupModule was falling back to loadSubdirectoryModuleMaps when it couldn't
find ModuleName in (proper) search paths. This was causing iteration over all
files in the search path subdirectories for example "/usr/include/foobar" in
bugzilla case.

Users don't expect Clang to load modulemaps in subdirectories implicitly, and
also the disk access is not cheap.

if (AllowExtraModuleMapSearch) true with ObjC with @import ModuleName.

Reviewers: rsmith, aprantl, bruno

Subscribers: cfe-commits, teemperor, v.g.vassilev

Differential Revision: https://reviews.llvm.org/D48367

llvm-svn: 336660

[LowerSwitch] Fixed faulty PHI nodes

Summary:
Fixed two cases of where PHI nodes need to be updated by lowerswitch.

When lowerswitch find out that the switch default branch is not
reachable it remove the old default and replace it with the most
popular block from the cases, but it forget to update the PHI
nodes in the default block.

The PHI nodes also need to be updated when the switch is replaced
with a single branch.

Reviewers: hans, reames, arsenm

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D47203

llvm-svn: 336659

Fixing builtin __atomic_fetch_min declaration

Differential Revision: http://reviews.llvm.org/D49068

llvm-svn: 336658

[Support] Harded JSON against invalid UTF-8.

Parsing invalid UTF-8 input is now a parse error.
Creating JSON values from invalid UTF-8 now triggers an assertion, and
(in no-assert builds) substitutes the unicode replacement character.
Strings retrieved from json::Value are always valid UTF-8.

llvm-svn: 336657

[DAGCombiner] Split SDIV/UDIV optimization expansions from the rest of the combines. NFCI.

As suggested by @efriedma on D48975, this patch separates the BuildDiv/Pow2 style optimizations from the rest of the visitSDIV/visitUDIV to make it easier to reuse the combines and will allow us to avoid some rather nasty node recursive combining in visitREM.

llvm-svn: 336656

[MinGW] Skip adding default win32 api libraries if -lwindowsapp is specified

In this setup, skip adding all the default windows import libraries,
if linking to windowsapp (which replaces them, when targeting the
windows store/UWP api subset).

With GCC, the same is achieved by using a custom spec file, but
since clang doesn't use spec files, we have to allow other means of
overriding what default libraries to use (without going all the
way to using -nostdlib, which would exclude everything). The same
approach, in detecting certain user specified libraries and omitting
others from the defaults, was already used in SVN r314138.

Differential Revision: https://reviews.llvm.org/D49059

llvm-svn: 336655

[MinGW] Treat any -lucrt* as replacing -lmsvcrt

Since SVN r314138, we check if the user has specified any particular
alternative msvcrt/ucrt version, and skip the default -lmsvcrt
in those cases.

In addition to the existing names checked, we should also treat
a plain -lucrt in the same way, mingw-w64 has now added a separate
import library named libucrt.a, in addition to libucrtbase.a.

Differential Revision: https://reviews.llvm.org/D49054

llvm-svn: 336654

[VPlan] Add VPlanTestBase.h with helper class to build VPlan for tests.

Reviewers: dcaballe, hsaito, rengolin

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D49032

llvm-svn: 336653

[COFF] Store import symbol pointers as pointers to the base class

Future symbol insertions can potentially change the type of these
symbols - keep pointers to the base class to reflect this, and
use dynamic casts to inspect them before using as the subclass
type.

This fixes crashes that were possible before, by touching these
symbols that now are populated as e.g. a DefinedRegular, via
the old pointers with DefinedImportThunk type.

Differential Revision: https://reviews.llvm.org/D48953

llvm-svn: 336652

[ELF] - Improve call graph pasing error reporting.

This adds a file name to the error message,
adds a missing test case and refactors code a bit.

llvm-svn: 336651

[ELF] - Report call graph profile file names in error messages.

We did not report file names for some reason.

llvm-svn: 336650

Fix MSVC "signed/unsigned mismatch" warning. NFCI.

llvm-svn: 336649

[XRay][compiler-rt] Fixup build breakage

Changes:

- Remove static assertion on size of a structure, fails on systems where
pointers aren't 8 bytes.

- Use size_t instead of deducing type of arguments to
`nearest_boundary`.

Follow-up to D48653.

llvm-svn: 336648

[PM/Unswitch] Fix unused variable in r336646.

llvm-svn: 336647

[PM/Unswitch] Fix a collection of closely related issues with trivial
switch unswitching.

The core problem was that the way we handled unswitching trivial exit
edges through the default successor of a switch. For some reason
I thought the right way to do this was to add a block containing
unreachable and point the default successor at this block. In
retrospect, this has an amazing number of problems.

The first issue is the one that this pass has always worked around -- we
have to *detect* such edges and avoid unswitching them again. This
seemed pretty easy really. You juts look for an edge to a block
containing unreachable. However, this pattern is woefully unsound. So
many things can break it. The amazing thing is that I found a test case
where *simple-loop-unswitch itself* breaks this! When we do
a *non-trivial* unswitch of a switch we will end up splitting this exit
edge. The result will be a default successor that is an exit and
terminates in ... a perfectly normal branch. So the first test case that
I started trying to fix is added to the nontrivial test cases. This is
a ridiculous example that did just amazing things previously. With just
unswitch, it would create 10+ copies of this stuff stamped out. But if
you combine it *just right* with a bunch of other passes (like
simplify-cfg, loop rotate, and some LICM) you can get it to do this
infinitely. Or at least, I never got it to finish. =[

This, in turn, uncovered another related issue. When we are manipulating
these switches after doing a trivial unswitch we never correctly updated
PHI nodes to reflect our edits. As soon as I started changing how these
edges were managed, it became obvious there were more issues that
I couldn't realistically leave unaddressed, so I wrote more test cases
around PHI updates here and ensured all of that works now.

And this, in turn, required some adjustment to how we collect and manage
the exit successor when it is the default successor. That showed a clear
bug where we failed to include it in our search for the outer-most loop
reached by an unswitched exit edge. This was actually already tested and
the test case didn't work. I (wrongly) thought that was due to SCEV
failing to analyze the switch. In fact, it was just a simple bug in the
code that skipped the default successor. While changing this, I handled
it correctly and have updated the test to reflect that we now get
precise SCEV analysis of trip counts for the outer loop in one of these
cases.

llvm-svn: 336646

[X86] Fast-isel tests for lowered truncation intrinsics

This patch adds fast-isel tests for the IR patterns produced for truncation
intrinsics in rC336643.

Differential Revision: https://reviews.llvm.org/D48822

llvm-svn: 336645

[XRay][compiler-rt] xray::Array Freelist and Iterator Updates

Summary:
We found a bug while working on a benchmark for the profiling mode which
manifests as a segmentation fault in the profiling handler's
implementation. This change adds unit tests which replicate the
issues in isolation.

We've tracked this down as a bug in the implementation of the Freelist
in the `xray::Array` type. This happens when we trim the array by a
number of elements, where we've been incorrectly assigning pointers for
the links in the freelist of chunk nodes. We've taken the chance to add
more debug-only assertions to the code path and allow us to verify these
assumptions in debug builds.

In the process, we also took the opportunity to use iterators to
implement both `front()` and `back()` which exposes a bug in the
iterator decrement operation. In particular, when we decrement past a
chunk size boundary, we end up moving too far back and reaching the
`SentinelChunk` prematurely.

This change unblocks us to allow for contributing the non-crashing
version of the benchmarks in the test-suite as well.

Reviewers: kpw

Subscribers: mgorny, llvm-commits

Differential Revision: https://reviews.llvm.org/D48653

llvm-svn: 336644

[X86] Lowering integer truncation intrinsics to native IR

This patch lowers the _mm[256|512]_cvtepi{64|32|16}_epi{32|16|8} intrinsics to
native IR in cases where the result's length is less than 128 bits.

The resulting IR for 256-bit inputs is folded into VPMOV instructions, while for
128-bit inputs the vpshufb (or, in the 64-to-32-bit case, vinsertps)
instructions are generated instead

Differential Revision: https://reviews.llvm.org/D48712

llvm-svn: 336643

[X86][SSE] Prefer BLEND(SHL(v,c1),SHL(v,c2)) over MUL(v, c3)

Now that rL336250 has landed, we should prefer 2 immediate shifts + a shuffle blend over performing a multiply. Despite the increase in instructions, this is quicker (especially for slow v4i32 multiplies), avoid loads and constant pool usage. It does mean however that we increase register pressure. The code size will go up a little but by less than what we save on the constant pool data.

This patch also adds support for v16i16 to the BLEND(SHIFT(v,c1),SHIFT(v,c2)) combine, and also prevents blending on pre-SSE41 shifts if it would introduce extra blend masks/constant pool usage.

Differential Revision: https://reviews.llvm.org/D48936

llvm-svn: 336642

[X86] Regenerate vector-shuffle-512-v8.ll so the script will merge the 32 and 64 bit checks together. NFC

llvm-svn: 336641

Test commit

Add redundant doc.

llvm-svn: 336640

[X86] Use IsProfitableToFold to block vinsertf128rm in favor of insert_subreg instead of artifically increasing pattern complexity to give priority.

This is a much more direct way to solve the issue than just giving extra priority.

llvm-svn: 336639

[X86] Remove some seemingly unnecessary patterns.

We're missing the EVEX equivalents of these patterns and seem to get along fine.

I think we end up with X86vzload for the obvious IR cases that would produce this DAG.

llvm-svn: 336638

[X86] Use masked the masked scalar fma builtins to implement the default rounding version of the fma intrinsics.

The rounding mode is checked in CGBuiltin.cpp to generate the correct intrinsic call.

Making this switch switchs the masking to use the i8 bitcast to <8 x i1> and extract i1 version of the IR for the mask. Previously we ended up with a scalar 'and' plus an icmp.

llvm-svn: 336637

Add new string benchmarks

llvm-svn: 336636

Update google-benchark to trunk

llvm-svn: 336635

[Sema] Fix a structured binding typo correction bug

BindingDecls have null type until their initializer is processed, so we can't
assume that a correction candidate has non-null type.

rdar://41559582

llvm-svn: 336634

Add lowercase OS name feature

Summary:
Some tests already make use of OS feature names, e.g. 'linux' and 'freebsd',
but they are not actually currently set by lit.

Reviewers: pcc, eugenis

Reviewed By: eugenis

Subscribers: emaste, krytarowski, delcypher, llvm-commits, #sanitizers

Differential Revision: https://reviews.llvm.org/D49115

llvm-svn: 336633

[ODRHash] Merge the two function hashes into one.

Functions that are a sub-Decl of a record were hashed differently than other
functions.  This change keeps the AddFunctionDecl function and the hash of
records now calls this function.  In addition, AddFunctionDecl has an option
to perform a hash as if the body was absent, which is required for some
checks after loading modules.  Additional logic prevents multiple error
message from being printed.

llvm-svn: 336632

Report an error for an extremely large .gdb_index section.

I believe the only way to test this functionality is to create extremely
large object files and attempt to create a .gdb_index that is greater
than 4 GiB. But I think that's too much for most environments and buildbots,
so I'm commiting this without a test that actually triggers the new
error condition.

llvm-svn: 336631

Update crash diagnostics test to avoid attempting to write into various
directories if possible and to not require %t to have "Output" in the name.

llvm-svn: 336630

Fix parsing of privacy annotations in os_log format strings.

Privacy annotations shouldn't have to appear in the first
comma-delimited string in order to be recognized. Also, they should be
ignored if they are preceded or followed by non-whitespace characters.

rdar://problem/40706280

llvm-svn: 336629

[X86] Remove custom handling for __builtin_ia32_divss_round_mask and __builtin_ia32_divsd_round_mask.

llvm-svn: 336628

[X86] Add back GCCBuiltin on mask_div_ss/sd_round.

We no longer need custom handling in clang.

llvm-svn: 336627

[X86] Correct vfixupimm load patterns to look for an integer load, not a floating point load bitcasted to integer.

DAG combine wouldn't let a floating point load bitcasted to integer exist. It would just be an integer load.

llvm-svn: 336626

[X86] Add test cases that show failure to fold load into vfixupimm instructions due to bad isel pattern.

llvm-svn: 336625

[X86] Remove FloatVT from X86VectorVTInfo in X86InstrAVX512.td

The only places it was used where places where VT was the same as FloatVT. So switch those uses to VT and drop it.

llvm-svn: 336624

Revert "AMDGPU: Force inlining if LDS global address is used"

This reverts commit r336587, it was causing test failures on the
sanitizer bots.

llvm-svn: 336623

[X86] Add __builtin_ia32_selectss_128 and __builtin_ia32_selectsd_128 that is suitable for use in scalar mask intrinsics.

This will convert the i8 mask argument to <8 x i1> and extract an i1 and then emit a select instruction. This replaces the '(__U & 1)" and ternary operator used in some of intrinsics. The old sequence was lowered to a scalar and and compare. The new sequence uses an i1 vector that will interoperate better with other mask intrinsics.

This removes the need to handle div_ss/sd specially in CGBuiltin.cpp. A follow up patch will add the GCCBuiltin name back in llvm and remove the custom handling.

I made some adjustments to legacy move_ss/sd intrinsics which we reused here to do a simpler extract and insert instead of 2 extracts and two inserts or a shuffle.

llvm-svn: 336622

[DWARF][NFC] Refactor range list emission to use a static helper

This is prep for DWARF v5 range list emission. Emission of a single range list is moved
to a static helper function.

Reviewer: jdevlieghere

Differential Revision: https://reviews.llvm.org/D49098

llvm-svn: 336621

Fix a bug for packed relocations.

Previously, we didn't create multiple consecutive bitmaps.
Added a test to catch this bug too.

Differential Revision: https://reviews.llvm.org/D49107

llvm-svn: 336620

[libFuzzer] Make -fsanitize=memory,fuzzer work.

This patch allows libFuzzer to fuzz applications instrumented with MSan
without recompiling libFuzzer with MSan instrumentation.

Fixes https://github.com/google/sanitizers/issues/958.

Differential Revision: https://reviews.llvm.org/D48891

llvm-svn: 336619

[test] two small cleanups:

* Remove unused type from is_assignable.pass.cpp

* Don't specialize `common_type<::X<float>>` in common_type.pass.cpp, which violates the requirements of [meta.trans.other]/5

llvm-svn: 336618

[InstCombine] allow more shuffle folds using safe constants

getSafeVectorConstantForBinop() was calling getBinOpIdentity() assuming
that the constant we wanted was operand 1 (RHS). That's wrong, but I
don't think we could expose a bug or even a suboptimal fold from that
because the callers have other guards for any binop that would have
been affected.

llvm-svn: 336617

Revert "[libFuzzer] Mutation tracking and logging implemented"

This reverts r336597 due to bot breakage.

llvm-svn: 336616

[WebAssembly] Support for binary atomic RMW instructions

Summary:
This adds support for binary atomic read-modify-write instructions:
add, sub, and, or, xor, and xchg.

This does not yet support translations of some of LLVM IR atomicrmw
instructions (nand, max, min, umax, and umin) that do not have a direct
counterpart in wasm instructions.

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49088

llvm-svn: 336615

Simplify RelrSection<ELFT>::updateAllocSize.

This patch also speeds it up by making some constants compile-time
constants. Other than that, NFC.

Differential Revision: https://reviews.llvm.org/D49101

llvm-svn: 336614

llvm: Add support for "-fno-delete-null-pointer-checks"

Summary:
Support for this option is needed for building Linux kernel.
This is a very frequently requested feature by kernel developers.

More details : https://lkml.org/lkml/2018/4/4/601

GCC option description for -fdelete-null-pointer-checks:
This Assume that programs cannot safely dereference null pointers,
and that no code or data element resides at address zero.

-fno-delete-null-pointer-checks is the inverse of this implying that
null pointer dereferencing is not undefined.

This feature is implemented in LLVM IR in this CL as the function attribute
"null-pointer-is-valid"="true" in IR (Under review at D47894).
The CL updates several passes that assumed null pointer dereferencing is
undefined to not optimize when the "null-pointer-is-valid"="true"
attribute is present.

Reviewers: t.p.northover, efriedma, jyknight, chandlerc, rnk, srhines, void, george.burgess.iv

Reviewed By: efriedma, george.burgess.iv

Subscribers: eraman, haicheng, george.burgess.iv, drinkcat, theraven, reames, sanjoy, xbolva00, llvm-commits

Differential Revision: https://reviews.llvm.org/D47895

llvm-svn: 336613

Use StringRef instead of `const char *`.

I don't think there's a need to use `const char *`. In most (probably all?)
cases, we need a length of a name later, so discarding a length will
lead to a wasted effort.

Differential Revision: https://reviews.llvm.org/D49046

llvm-svn: 336612

Make llvm.objectsize more conservative with null

In non-zero address spaces, we were reporting that an object at `null`
always occupies zero bytes. This is incorrect in many cases, so just
return `unknown` in those cases for now.

Differential Revision: https://reviews.llvm.org/D48860

llvm-svn: 336611

Rename function calls missed in r336605

NextIsLatest -> isFirst

llvm-svn: 336610

Fix direct calls to __wrap_sym when it is relocated.

Patch by Matthew Koontz!

Before, direct calls to __wrap_sym would not map to valid PLT entries,
so they would crash at runtime. This change maps such calls to the same
PLT entry as calls to sym that are then wrapped.

Differential Revision: https://reviews.llvm.org/D48502

llvm-svn: 336609

Rollback [test-suite] Add a decorator for the lack of libstdcxx on the system.

Pavel suggested an alternative approach that I'll try to implement.

llvm-svn: 336608

[ObjCRuntime] Add support for obfuscation in tagged pointers.

This is the default in MacOS Mojave. No testcases, as basically
we have a lot of coverage (and the testsuite fails quite a bit
without this change in Beta 3).

Thanks to Fred Riss for helping me with this patch (fixing
bugs/nondeterminism).

<rdar://problem/38305553>

llvm-svn: 336607

[Index] Add index::IndexingOptions::IndexImplicitInstantiation

Summary:
With IndexImplicitInstantiation=true, the following case records an occurrence of B::bar in A::foo, which will benefit cross reference tools.

template <class T> struct B { void bar() {}};
template <class T> struct A { void foo(B<T> *x) { x->bar(); }};
int main() { A<int> a; a.foo(0); }

Reviewers: akyrtzi, arphaman, rsmith

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D49002

llvm-svn: 336606

[AST] Rename some Redeclarable functions to reduce confusion

Reviewers: rsmith, akyrtzi

Subscribers: cfe-commits

Differential Revision: https://reviews.llvm.org/D48894

llvm-svn: 336605

Added -fcrash-diagnostics-dir flag

Summary:
New flag causes crash reports to be written in the specified directory
rather than the temp directory.

Patch by Chijioke Kamanu.

Reviewers: hans, inglorion, rnk

Reviewed By: hans

Subscribers: zturner, hiraditya, llvm-commits, cfe-commits

Differential Revision: https://reviews.llvm.org/D48601

llvm-svn: 336604

[ORC] Rename MaterializationResponsibility::delegate to replace and add a new
delegate method (and unit test).

The name 'replace' better captures what the old delegate method did: it
returned materialization responsibility for a set of symbols to the VSO.

The new delegate method delegates responsibility for a set of symbols to a new
MaterializationResponsibility instance. This can be used to split responsibility
between multiple threads, or multiple materialization methods.

llvm-svn: 336603

Fix line endings. NFCI.

llvm-svn: 336602

[Power9] Add __float128 builtins for Rounding Operations

Added __float128 support for a number of rounding operations:

trunc
rint
nearbyint
round
floor
ceil

Differential Revision: https://reviews.llvm.org/D48415

llvm-svn: 336601

[Docs] Fix generation of manpages.

Fix the following error when Sphinx generates the Polly manpage:

Warning, treated as error:
docs/Performance.rst:: WARNING: "table cell spanning" not supported

llvm-svn: 336600

Factor out code to parse -pack-dyn-relocs. NFC.

llvm-svn: 336599

[WebAssembly] Improve readability of load/stores and tests. NFC.

Summary:
- Changed variable/function names to be more consistent
- Improved comments in test files
- Added more tests
- Fixed a few typos
- Misc. cosmetic changes

Reviewers: dschuff

Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits

Differential Revision: https://reviews.llvm.org/D49087

llvm-svn: 336598

[libFuzzer] Mutation tracking and logging implemented

Code now exists to track number of mutations that are used in fuzzing in
total and ones that produce new coverage. The stats are currently being
dumped to the command line.

Patch By: Kode Williams

Differntial Revision: https://reviews.llvm.org/D48054

llvm-svn: 336597

[Power9] [CLANG] Add __float128 support for trunc to double round to odd

Add support for this builtin:
double builtin_truncf128_round_to_odd(float128)

Differential Revision: https://reviews.llvm.org/D48482

llvm-svn: 336596

[Power9] [LLVM] Add __float128 support for trunc to double round to odd

Add support for this builtin:
double builtin_truncf128_round_to_odd(float128)

Differential Revision: https://reviews.llvm.org/D48483

llvm-svn: 336595

lld: add experimental support for SHT_RELR sections.

Patch by Rahul Chaudhry!

This change adds experimental support for SHT_RELR sections, proposed
here: https://groups.google.com/forum/#!topic/generic-abi/bX460iggiKg

Pass '--pack-dyn-relocs=relr' to enable generation of SHT_RELR section
and DT_RELR, DT_RELRSZ, and DT_RELRENT dynamic tags.

Definitions for the new ELF section type and dynamic array tags, as well
as the encoding used in the new section are all under discussion and are
subject to change. Use with caution!

Pass '--use-android-relr-tags' with '--pack-dyn-relocs=relr' to use
SHT_ANDROID_RELR section type instead of SHT_RELR, as well as
DT_ANDROID_RELR* dynamic tags instead of DT_RELR*. The generated
section contents are identical.

'--pack-dyn-relocs=android+relr --use-android-relr-tags' enables both
'--pack-dyn-relocs=android' and '--pack-dyn-relocs=relr': lld will
encode the relative relocations in a SHT_ANDROID_RELR section, and pack
the rest of the dynamic relocations in a SHT_ANDROID_REL(A) section.

Differential Revision: https://reviews.llvm.org/D48247

llvm-svn: 336594

RenameIndependentSubregs: Fix handling of undef tied operands

Ensure that, if updating a tied operand pair, to only update
that pair.

Differential Revision: https://reviews.llvm.org/D49052

llvm-svn: 336593

[OPENMP] Do not mark local variables as declare target.

When the parsing of the functions happens inside of the declare target
region, we may erroneously mark local variables as declare target
thought they are not. This attribute can be applied only to global
variables.

llvm-svn: 336592

[libclang] NFC, simplify clang_Cursor_Evaluate

Take advantage of early returns as suggested by Duncan in
https://reviews.llvm.org/D49051

llvm-svn: 336591

[libclang] evalute compound statement cursors before trying to evaluate
the cursor like a declaration

This change fixes a bug in libclang in which it tries to evaluate a statement
cursor as a declaration cursor, because that statement still has a pointer to
the declaration parent.

rdar://38888477

Differential Revision: https://reviews.llvm.org/D49051

llvm-svn: 336590

[globalisel][irtranslator] Add support for atomicrmw and (strong) cmpxchg

Summary:
This patch adds support for the atomicrmw instructions and the strong
cmpxchg instruction to the IRTranslator.

I've left out weak cmpxchg because LangRef.rst isn't entirely clear on what
difference it makes to the backend. As far as I can tell from the code, it
only matters to AtomicExpandPass which is run at the LLVM-IR level.

Reviewers: ab, t.p.northover, qcolombet, rovka, aditya_nandakumar, volkan, javed.absar

Reviewed By: qcolombet

Subscribers: kristof.beyls, javed.absar, igorb, llvm-commits

Differential Revision: https://reviews.llvm.org/D40092

llvm-svn: 336589

[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error

Build error on Android; reported by and fix provided by (thanks) by Mauro Rossi <issor.oruam@gmail.com>

Fixes the following building error:

external/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp:1903:61:
error: comparison of integers of different signs:
'typename iterator_traits<__wrap_iter<MachineBasicBlock **> >::difference_type'
(aka 'int') and 'unsigned int' [-Werror,-Wsign-compare]
BlockWaitcntProcessedSet.end(), &MBB) < Count)) {
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^ ~~~~~
1 error generated.

Differential Revision: https://reviews.llvm.org/D49089

llvm-svn: 336588

AMDGPU: Force inlining if LDS global address is used

These won't work for the forseeable future. These aren't allowed
from OpenCL, but IPO optimizations can make them appear.

Also directly set the attributes on functions, regardless
of the linkage rather than cloning functions like before.

llvm-svn: 336587

Fix const cast problem introduced in r336563

336563 eliminated CCAST() macros caused build failures

llvm-svn: 336586

[X86][TLI] DAGCombine: Unfold variable bit-clearing mask to two shifts.

Summary:
This adds a reverse transform for the instcombine canonicalizations
that were added in D47980, D47981.

As discussed later, that was worse at least for the code size,
and potentially for the performance, too.

https://rise4fun.com/Alive/Zmpl

Reviewers: craig.topper, RKSimon, spatel

Reviewed By: spatel

Subscribers: reames, llvm-commits

Differential Revision: https://reviews.llvm.org/D48768

llvm-svn: 336585

[Index] Ignore noop #undef's when handling macro occurrences.

llvm-svn: 336584

[Builtins][Attributes][X86] Tag all X86 builtins with their required vector width. Add a min_vector_width function attribute and tag all x86 instrinsics with it

This is part of an ongoing attempt at making 512 bit vectors illegal in the X86 backend type legalizer due to CPU frequency penalties associated with wide vectors on Skylake Server CPUs. We want the loop vectorizer to be able to emit IR containing wide vectors as intermediate operations in vectorized code and allow these wide vectors to be legalized to 256 bits by the X86 backend even though we are targetting a CPU that supports 512 bit vectors. This is similar to what happens with an AVX2 CPU, the vectorizer can emit wide vectors and the backend will split them. We want this splitting behavior, but still be able to use new Skylake instructions that work on 256-bit vectors and support things like masking and gather/scatter.

Of course if the user uses explicit vector code in their source code we need to not split those operations. Especially if they have used any of the 512-bit vector intrinsics from immintrin.h. And we need to make it so that merely using the intrinsics produces the expected code in order to be backwards compatible.

To support this goal, this patch adds a new IR function attribute "min-legal-vector-width" that can indicate the need for a minimum vector width to be legal in the backend. We need to ensure this attribute is set to the largest vector width needed by any intrinsics from immintrin.h that the function uses. The inliner will be reponsible for merging this attribute when a function is inlined. We may also need a way to limit inlining in the future as well, but we can discuss that in the future.

To make things more complicated, there are two different ways intrinsics are implemented in immintrin.h. Either as an always_inline function containing calls to builtins(can be target specific or target independent) or vector extension code. Or as a macro wrapper around a taget specific builtin. I believe I've removed all cases where the macro was around a target independent builtin.

To support the always_inline function case this patch adds attribute((min_vector_width(128))) that can be used to tag these functions with their vector width. All x86 intrinsic functions that operate on vectors have been tagged with this attribute.

To support the macro case, all x86 specific builtins have also been tagged with the vector width that they require. Use of any builtin with this property will implicitly increase the min_vector_width of the function that calls it. I've done this as a new property in the attribute string for the builtin rather than basing it on the type string so that we can opt into it on a per builtin basis and avoid any impact to target independent builtins.

There will be future work to support vectors passed as function arguments and supporting inline assembly. And whatever else we can find that isn't covered by this patch.

Special thanks to Chandler who suggested this direction and reviewed a preview version of this patch. And thanks to Eric Christopher who has had many conversations with me about this issue.

Differential Revision: https://reviews.llvm.org/D48617

llvm-svn: 336583

Don't take the address of an xvalue when printing an expr result

Summary:
If we have an xvalue here, we will always hit the `err_typecheck_invalid_lvalue_addrof` error
in 'Sema::CheckAddressOfOperand' when trying to take the address of the result. This patch
uses the fallback code path where we store the result in a local variable instead when we hit
this case.

Fixes rdar://problem/40613277

Reviewers: jingham, vsk

Reviewed By: vsk

Subscribers: vsk, friss, lldb-commits

Differential Revision: https://reviews.llvm.org/D48303

llvm-svn: 336582

[clangd] Make sure macro information exists before increasing usage count.

llvm-svn: 336581

[Utils] Fix gdb pretty printers to work with Python 3.

Reiterate D23202 for container printers added after the change landed.

Differential Revision: https://reviews.llvm.org/D46578

llvm-svn: 336580

[Power9] Add __float128 builtins for Round To Odd

Add a number of builtins for __float128 Round To Odd.
This is the Clang portion of the builtins work.

Differential Revision: https://reviews.llvm.org/D47548

llvm-svn: 336579

[Power9] Add __float128 builtins for Round To Odd

GCC has builtins for these round to odd instructions:

__float128 __builtin_sqrtf128_round_to_odd (__float128)
__float128 __builtin_{add,sub,mul,div}f128_round_to_odd (__float128, __float128)
__float128 __builtin_fmaf128_round_to_odd (__float128, __float128, __float128)

Differential Revision: https://reviews.llvm.org/D47550

llvm-svn: 336578

[DebugInfo] Change default value of FDEPointerEncoding

Summary:
If the encoding is not specified in CIE augmentation string, then it
should be DW_EH_PE_absptr instead of DW_EH_PE_omit.

Reviewers: ruiu, MaskRay, plotfi, rafauler

Reviewed By: MaskRay

Subscribers: rafauler, JDevlieghere, llvm-commits

Differential Revision: https://reviews.llvm.org/D49000

llvm-svn: 336577

[SelectionDAG] Add VT consistency checks to the creation of ISD::FMA.

This is similar to what is done for binops. I don't know if this would have helped us catch the bug fixed in r336566 earlier or not, but I figured it couldn't hurt.

llvm-svn: 336576

[OpenMP] Fix a few formatting issues

llvm-svn: 336575

Add bitcode compatibility test for 6.0

Summary:
Add bitcode compatibility test for 6.0. On top of the normal disassemble
test, also runs the verifier to make sure simple 6.0 bitcode can pass
the current IR verifier.

Reviewers: vsk

Reviewed By: vsk

Subscribers: dexonsmith, llvm-commits

Differential Revision: https://reviews.llvm.org/D49086

llvm-svn: 336574

[ASan] Minor ASan error reporting cleanup

Summary:
- use proper Error() decorator for error messages
- refactor ASan thread id and name reporting

Reviewers: eugenis

Subscribers: kubamracek, delcypher, #sanitizers, llvm-commits

Differential Revision: https://reviews.llvm.org/D49044

llvm-svn: 336573

[LoopInfo] Port loop exit interfaces from Loop to LoopBase

This patch ports hasDedicatedExits, getUniqueExitBlocks and
getUniqueExitBlock in Loop to LoopBase so that they can be used
from other LoopBase sub-classes.

Reviewers: chandlerc, sanjoy, hfinkel, fhahn

Reviewed By: chandlerc

Differential Revision: https://reviews.llvm.org/D48817

llvm-svn: 336572

[OpenMP] Introduce hierarchical scheduling

This patch introduces the logic implementing hierarchical scheduling.
First and foremost, hierarchical scheduling is off by default
To enable, use -DLIBOMP_USE_HIER_SCHED=On during CMake's configure stage.
This work is based off if the IWOMP paper:
"Workstealing and Nested Parallelism in SMP Systems"

Hierarchical scheduling is the layering of OpenMP schedules for different layers
of the memory hierarchy. One can have multiple layers between the threads and
the global iterations space. The threads will go up the hierarchy to grab
iterations, using possibly a different schedule & chunk for each layer.

[ Global iteration space (0-999) ]

(use static)
[ L1 | L1 | L1 | L1 ]

(use dynamic,1)
[ T0 T1 | T2 T3 | T4 T5 | T6 T7 ]

In the example shown above, there are 8 threads and 4 L1 caches begin targeted.
If the topology indicates that there are two threads per core, then two
consecutive threads will share the data of one L1 cache unit. This example
would have the iteration space (0-999) split statically across the four L1
caches (so the first L1 would get (0-249), the second would get (250-499), etc).
Then the threads will use a dynamic,1 schedule to grab iterations from the L1
cache units. There are currently four supported layers: L1, L2, L3, NUMA

OMP_SCHEDULE can now read a hierarchical schedule with this syntax:
OMP_SCHEDULE='EXPERIMENTAL LAYER,SCHED[,CHUNK][:LAYER,SCHED[,CHUNK]...]:SCHED,CHUNK
And OMP_SCHEDULE can still read the normal SCHED,CHUNK syntax from before

I've kept most of the hierarchical scheduling logic inside kmp_dispatch_hier.h
to try to keep it separate from the rest of the code.

Differential Revision: https://reviews.llvm.org/D47962

llvm-svn: 336571

[InstCombine] correct test comments; NFC

llvm-svn: 336570

[OPENMP, NVPTX] Support several images in the executable.

Summary:
Currently Cuda plugin supports loading of the single image, though we
may have the executable with the several images, if it has target
regions inside of the dynamically loaded library. Patch allows to load
multiple images.

Reviewers: grokos

Subscribers: guansong, openmp-commits, kkwli0

Differential Revision: https://reviews.llvm.org/D49036

llvm-svn: 336569

[OpenMP] Restructure loop code for hierarchical scheduling

This patch reorganizes the loop scheduling code in order to allow hierarchical
scheduling to use it more effectively. In particular, the goal of this patch
is to separate the algorithmic parts of the scheduling from the thread
logistics code.

Moves declarations & structures to kmp_dispatch.h for easier access in
other files. Extracts the algorithmic part of __kmp_dispatch_init() and
__kmp_dispatch_next() into __kmp_dispatch_init_algorithm() and
__kmp_dispatch_next_algorithm(). The thread bookkeeping logic is still kept in
__kmp_dispatch_init() and __kmp_dispatch_next(). This is done because the
hierarchical scheduler needs to access the scheduling logic without the
bookkeeping logic. To prepare for new pointer in dispatch_private_info_t, a
new flags variable is created which stores the ordered and nomerge flags instead
of them being in two separate variables. This will keep the
dispatch_private_info_t structure the same size.

Differential Revision: https://reviews.llvm.org/D47961

llvm-svn: 336568

[OPENMP, NVPTX] Do not globalize local variables in parallel regions.

In generic data-sharing mode we are allowed to not globalize local
variables that escape their declaration context iff they are declared
inside of the parallel region. We can do this because L2 parallel
regions are executed sequentially and, thus, we do not need to put
shared local variables in the global memory.

llvm-svn: 336567

[X86] In combineFMA, make sure we bitcast the result of isFNEG back the expected type before creating the new FMA node.

Previously, we were creating malformed SDNodes, but nothing noticed because the type constraints prevented isel from noticing.

llvm-svn: 336566

[X86][AVX] Regenerate AVX1 fast-isel tests.

Let the update script merge 32/64 tests where possible

llvm-svn: 336565

Retrieve a function PDB symbol correctly from nested blocks

Summary:
This patch fixes a problem with retrieving a function symbol by an address in a nested block. In the current implementation of ResolveSymbolContext function it retrieves a symbol with PDB_SymType::None and then checks if found symbol's tag equals to PDB_SymType::Function. So, if nested block's symbol was found, ResolveSymbolContext does not resolve a function.

It is very simple to reproduce this. For example, in the next program

```
int main() {
  auto r = 0;
  for (auto i = 1; i <= 10; i++) {
    r += i & 1 + (i - 1) & 1 - 1;
  }

  return r;
}
```

if we will stop inside the cycle and will do a backtrace, the top element will be broken. But how we can test this? I thought to add an option to lldb-test to allow search a function by address, but the address may change when the compiler will be changed.

Patch by: Aleksandr Urakov

Reviewers: asmith, labath, zturner

Reviewed By: asmith, labath

Subscribers: stella.stamenova, llvm-commits

Differential Revision: https://reviews.llvm.org/D47939

llvm-svn: 336564

[OpenMP] Use C++11 Atomics - barrier, tasking, and lock code

These are preliminary changes that attempt to use C++11 Atomics in the runtime.
We are expecting better portability with this change across architectures/OSes.
Here is the summary of the changes.

Most variables that need synchronization operation were converted to generic
atomic variables (std::atomic<T>). Variables that are updated with combined CAS
are packed into a single atomic variable, and partial read/write is done
through unpacking/packing

Patch by Hansang Bae

Differential Revision: https://reviews.llvm.org/D47903

llvm-svn: 336563

[InstCombine] avoid extra poison when moving shift above shuffle

As discussed in D49047 / D48987, shift-by-undef produces poison,
so we can't use undef vector elements in that case..

Note that we need to extend this for poison-generating flags,
and there's a proposal to create poison from FMF in D47963,

llvm-svn: 336562

[dsymutil] Add support for outputting assembly

When implementing the DWARF accelerator tables in dsymutil I ran into an
assertion in the assembler. Debugging these kind of issues is a lot
easier when looking at the assembly instead of debugging the assembler
itself. Since it's only a matter of creating an AsmStreamer instead of a
MCObjectStreamer it made sense to turn this into a (hidden) dsymutil
feature.

Differential revision: https://reviews.llvm.org/D49079

llvm-svn: 336561