platform/upstream/llvm.git
2 years ago[InstCombine] Add reduced sub/negate test from PR51584.
Florian Hahn [Mon, 23 Aug 2021 14:45:53 +0000 (15:45 +0100)]
[InstCombine] Add reduced sub/negate test from PR51584.

2 years agoRevert "[InstCombine] generalize subtract with 'not' operands"
Florian Hahn [Mon, 23 Aug 2021 14:31:48 +0000 (15:31 +0100)]
Revert "[InstCombine] generalize subtract with 'not' operands"

This reverts commit 3aa009cc87e3789ac44bbb98b04846736373e08f.

The reverted commit causes an infinite loop in instcombine. See PR51584.

2 years ago[InstrProfiling] Add AIX triple to platform test
Jinsong Ji [Mon, 23 Aug 2021 13:16:48 +0000 (13:16 +0000)]
[InstrProfiling] Add AIX triple to platform test

We found that AIX was not covered by most of the InstrProfiling tests,
so we are enabling the tests gradually.

This patch adds the AIX triple to the platform tests to make sure the
registrations are correct.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108490

2 years ago[tsan] Do not include <stdatomic.h> from sanitize-thread-disable.c
Alexander Potapenko [Mon, 23 Aug 2021 13:46:28 +0000 (15:46 +0200)]
[tsan] Do not include <stdatomic.h> from sanitize-thread-disable.c

It looks like non-x86 bots are unhappy with the inclusion of <stdatomic.h>,
e.g.:

clang-armv7-vfpv3-2stage - https://lab.llvm.org/buildbot/#/builders/182/builds/626
clang-ppc64le-linux - https://lab.llvm.org/buildbot/#/builders/76/builds/3619
llvm-clang-win-x-armv7l - https://lab.llvm.org/buildbot/#/builders/60/builds/4514

The include seems to be unnecessary, so just remove it and replace the
atomic_load() calls with dereferences of _Atomic* pointers.

Differential Revision: https://reviews.llvm.org/D108555

2 years ago[clang-format] break after the closing paren of a TypeScript decoration
Krasimir Georgiev [Mon, 23 Aug 2021 13:51:39 +0000 (15:51 +0200)]
[clang-format] break after the closing paren of a TypeScript decoration

This fixes up a regression we found from
https://reviews.llvm.org/D107267: in specific contexts, clang-format
stopped breaking after the `)` in TypeScript decorations. There were no test cases covering this, so I added one.

Reviewed By: MyDeveloperDay

Differential Revision: https://reviews.llvm.org/D108538

2 years ago[OpenMP][test] fix omp_get_wtime.c test to be more accommodating
Peyton, Jonathan L [Fri, 20 Aug 2021 21:06:13 +0000 (16:06 -0500)]
[OpenMP][test] fix omp_get_wtime.c test to be more accommodating

The omp_get_wtime.c test fails intermittently if the recorded times are
off by too much, which can happen when many tests are run in parallel.

Instead of failing if one timing is a little off, take the average of 100
timings, excluding the 10 worst.

Differential Revision: https://reviews.llvm.org/D108488
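The averaging scheme described above can be sketched in C++ as follows. This is an illustration under assumptions, not the test's actual code: the name `averageExcludingWorst`, and the choice to rank samples by their absolute deviation from an expected value, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: average the absolute timing errors after discarding
// the `Drop` largest ones (the "worst" samples, e.g. ones delayed because
// many tests were running in parallel).
double averageExcludingWorst(const std::vector<double> &Measured,
                             double Expected, std::size_t Drop) {
  assert(Drop < Measured.size() && "must keep at least one sample");
  std::vector<double> Errors;
  Errors.reserve(Measured.size());
  for (double M : Measured)
    Errors.push_back(std::fabs(M - Expected));
  // Sort ascending so the worst samples end up at the back.
  std::sort(Errors.begin(), Errors.end());
  std::size_t Keep = Errors.size() - Drop;
  double Sum = 0.0;
  for (std::size_t I = 0; I < Keep; ++I)
    Sum += Errors[I];
  return Sum / static_cast<double>(Keep);
}
```

With 100 samples and `Drop = 10`, a handful of outliers no longer decides whether the test passes; the pass/fail threshold would then be applied to the returned average.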

2 years ago[X86] Add unaligned partial load test
Simon Pilgrim [Mon, 23 Aug 2021 12:59:54 +0000 (13:59 +0100)]
[X86] Add unaligned partial load test

Shows the failure of LoadedSlice::canMergeExpensiveCrossRegisterBankCopy to merge unaligned dereferenceable loads.

Another candidate for PR45116

2 years ago[clang][CodeGen] GetDefaultAlignTempAlloca uses preferred alignment
Andy Wingo [Wed, 4 Aug 2021 13:49:13 +0000 (15:49 +0200)]
[clang][CodeGen] GetDefaultAlignTempAlloca uses preferred alignment

This function was defaulting to the ABI alignment for the LLVM
type.  Here we change it to use the preferred alignment.  This will allow
unification with GetTempAlloca, which uses the preferred alignment if no
alignment is specified.

Differential Revision: https://reviews.llvm.org/D108450

2 years ago[clang][NFC] Tighten up code for GetGlobalVarAddressSpace
Andy Wingo [Wed, 4 Aug 2021 09:32:51 +0000 (11:32 +0200)]
[clang][NFC] Tighten up code for GetGlobalVarAddressSpace

The LangAS local is only used in the OpenCL case; move its decl
inwards.

Differential Revision: https://reviews.llvm.org/D108449

2 years ago[clang][NFC] GetOrCreateLLVMGlobal takes LangAS
Andy Wingo [Wed, 4 Aug 2021 09:27:02 +0000 (11:27 +0200)]
[clang][NFC] GetOrCreateLLVMGlobal takes LangAS

Pass a LangAS instead of a target address space to
GetOrCreateLLVMGlobal, to remove a place where the frontend assumes that
target address space 0 is special.

Differential Revision: https://reviews.llvm.org/D108445

2 years ago[mlir][SCF] Do not peel loops inside partial iterations
Matthias Springer [Mon, 23 Aug 2021 12:33:56 +0000 (21:33 +0900)]
[mlir][SCF] Do not peel loops inside partial iterations

Do not apply loop peeling to loops that are contained in the partial iteration of an already peeled loop. This is to avoid code explosion when dealing with large loop nests. Can be controlled with a new pass option `skip-partial`.

Differential Revision: https://reviews.llvm.org/D108542

2 years ago[FuncSpec] Don't specialize functions that are easy to inline
Chuanqi Xu [Mon, 23 Aug 2021 11:20:07 +0000 (19:20 +0800)]
[FuncSpec] Don't specialize functions that are easy to inline

Specializing a function that will eventually be inlined wastes compile time.
This patch does two things:

- Don't specialize functions that are always-inline.
- Don't specialize functions whose number of lines of code is below a
threshold (100 by default).

On SPEC2017 int, this patch reduces the number of specialized
functions by 33%, and compile time did not increase for any
benchmark.

Reviewed By: SjoerdMeijer, xbolva00, snehasish

Differential Revision: https://reviews.llvm.org/D107897
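The two filtering rules above can be sketched as a simple predicate. This is a hypothetical model, not the FuncSpec pass's code: `FunctionInfo`, `CodeSize` and `isCandidateForSpecialization` are illustrative names, and `CodeSize` stands in for whatever size metric the pass uses (the commit describes it as lines of code).

```cpp
#include <cassert>

// Hypothetical summary of the facts the heuristic needs about a function.
struct FunctionInfo {
  bool AlwaysInline; // carries the always_inline attribute
  unsigned CodeSize; // size metric compared against the threshold
};

// Returns true if `F` is worth considering for specialization. Functions
// that will be inlined anyway are skipped, since specializing them would
// only waste compile time.
bool isCandidateForSpecialization(const FunctionInfo &F,
                                  unsigned SizeThreshold = 100) {
  assert(SizeThreshold > 0 && "threshold must be positive");
  if (F.AlwaysInline)
    return false;
  // Functions below the threshold are likely to be inlined later.
  return F.CodeSize >= SizeThreshold;
}
```

Checking the cheap attribute first keeps the common case fast; the size check only runs for functions that are not forced inline.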

2 years ago[tsan] Add support for disable_sanitizer_instrumentation attribute
Alexander Potapenko [Tue, 17 Aug 2021 11:19:15 +0000 (13:19 +0200)]
[tsan] Add support for disable_sanitizer_instrumentation attribute

Unlike __attribute__((no_sanitize("thread"))), this one will cause TSan
to skip the entire function during instrumentation.

Depends on https://reviews.llvm.org/D108029

Differential Revision: https://reviews.llvm.org/D108202
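A minimal usage sketch contrasting the two attributes, assuming a Clang new enough to support `disable_sanitizer_instrumentation`. The feature-test macros let the snippet build with other compilers too, where the attribute simply expands to nothing; the macro and function names are hypothetical.

```cpp
#include <cassert>

// Expand to the attribute only if the compiler knows it.
#if defined(__has_attribute)
#if __has_attribute(disable_sanitizer_instrumentation)
#define DISABLE_SANITIZER_INSTRUMENTATION \
  __attribute__((disable_sanitizer_instrumentation))
#endif
#if __has_attribute(no_sanitize)
#define NO_SANITIZE_THREAD __attribute__((no_sanitize("thread")))
#endif
#endif
#ifndef DISABLE_SANITIZER_INSTRUMENTATION
#define DISABLE_SANITIZER_INSTRUMENTATION
#endif
#ifndef NO_SANITIZE_THREAD
#define NO_SANITIZE_THREAD
#endif

int Counter = 0;

// no_sanitize("thread"): TSan may still insert some instrumentation here.
NO_SANITIZE_THREAD int bumpWithNoSanitize() { return ++Counter; }

// disable_sanitizer_instrumentation: TSan skips the entire function.
DISABLE_SANITIZER_INSTRUMENTATION int bumpUninstrumented() {
  return ++Counter;
}
```

Both functions behave identically at run time; the difference is only in what the instrumentation pass is allowed to add to them.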

2 years ago[X86][AVX] Add PR13310 test coverage
Simon Pilgrim [Mon, 23 Aug 2021 10:34:00 +0000 (11:34 +0100)]
[X86][AVX] Add PR13310 test coverage

Shows the failure to fold a scaled index into gather/scatter scale operands.

2 years agoRecommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
Florian Hahn [Mon, 23 Aug 2021 10:17:37 +0000 (11:17 +0100)]
Recommit "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"

This reverts the revert ab9296f13be45cd190608f54a69bdd5c7c561b16.

The issue causing the revert should be fixed in 9baed023b4b5.

2 years ago[AMDGPU] Try to fix a GCC 11 warning
Jay Foad [Mon, 23 Aug 2021 09:49:01 +0000 (10:49 +0100)]
[AMDGPU] Try to fix a GCC 11 warning

Apparently GCC 11 was warning:
AMDGPURegisterBankInfo.cpp:2543:33: warning: enumerated and non-enumerated type in conditional expression [-Wextra]
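For context, this GCC warning fires (in C++, under -Wextra) when one arm of `?:` is an enumerator and the other is a plain integer. The sketch below uses hypothetical names; it is not the code at AMDGPURegisterBankInfo.cpp:2543 and not necessarily how the commit fixed it.

```cpp
#include <cassert>

// Hypothetical enum standing in for a register bank ID enumeration.
enum RegBankID { SGPRBankID = 1, VGPRBankID = 2 };

// Mixing an enumerator and a non-enumerator triggers the warning:
//   unsigned Bank = IsDivergent ? VGPRBankID : 0u; // warns with -Wextra
// Giving both arms the same type avoids it.
unsigned pickBank(bool IsDivergent) {
  return IsDivergent ? static_cast<unsigned>(VGPRBankID) : 0u;
}
```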

2 years ago[AArch64][SME] Support NEON scalar FP instructions in streaming mode
Cullen Rhodes [Fri, 13 Aug 2021 07:40:39 +0000 (07:40 +0000)]
[AArch64][SME] Support NEON scalar FP instructions in streaming mode

The following scalar FP instructions are legal in streaming mode:

  0101 1110 xx1x xxxx 11x1 11xx xxxx xxxx # FMULX/FRECPS/FRSQRTS (scalar)
  0101 1110 x10x xxxx 00x1 11xx xxxx xxxx # FMULX/FRECPS/FRSQRTS (scalar, FP16)
  01x1 1110 1x10 0001 11x1 10xx xxxx xxxx # FRECPE/FRSQRTE/FRECPX (scalar)
  01x1 1110 1111 1001 11x1 10xx xxxx xxxx # FRECPE/FRSQRTE/FRECPX (scalar, FP16)

Predicate them on `HasNEONorStreamingSVE`. Full list of affected
instructions:

  FMULX16, FMULX32, FMULX64, FRECPS16, FRECPS32, FRECPS64, FRSQRTS16,
  FRSQRTS32, FRSQRTS64, FRECPEv1f16, FRECPEv1i32, FRECPEv1i64, FRECPXv1f16,
  FRECPXv1i32, FRECPXv1i64, FRSQRTEv1f16, FRSQRTEv1i32, FRSQRTEv1i64

Depends on D107902.

The reference can be found here:
https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions

Execution of NEON instructions that are illegal in streaming mode will
cause a trap or exception. Using FMULX [1] as an example, this check is
at the top of the pseudocode:

  if elements == 1 then
      CheckFPEnabled64();
  else
      CheckFPAdvSIMDEnabled64();

For the legal scalar variants it calls `CheckFPEnabled64`, whereas for the
illegal vector variants it calls `CheckFPAdvSIMDEnabled64` which traps.

This is useful for observing which instructions are/aren't legal
in streaming mode.

[1] https://developer.arm.com/documentation/ddi0602/2021-06/SIMD-FP-Instructions/FMULX--Floating-point-Multiply-extended-

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D108039
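To make the bit patterns listed above concrete, each one can be read as a mask/value pair in which `x` marks a don't-care bit. The following C++ sketch uses hypothetical names and is not how LLVM's TableGen-generated disassembler works; the test words are simply constructed to fit (or not fit) the first pattern.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bits set in Mask are fixed and must equal the corresponding bits of
// Value; the remaining bits are "don't care" ('x').
struct EncodingPattern {
  uint32_t Mask = 0;
  uint32_t Value = 0;
};

// Parses a pattern written MSB first, e.g.
// "0101 1110 xx1x xxxx 11x1 11xx xxxx xxxx". Spaces are ignored.
EncodingPattern parsePattern(const std::string &Pattern) {
  EncodingPattern P;
  unsigned Bits = 0;
  for (char C : Pattern) {
    if (C == ' ')
      continue;
    assert((C == '0' || C == '1' || C == 'x') && "unexpected pattern char");
    P.Mask <<= 1;
    P.Value <<= 1;
    if (C != 'x') {
      P.Mask |= 1u;
      P.Value |= (C == '1') ? 1u : 0u;
    }
    ++Bits;
  }
  assert(Bits == 32 && "pattern must describe exactly 32 bits");
  (void)Bits; // only used by the assertion
  return P;
}

// True if the 32-bit instruction word matches every fixed bit.
bool matchesPattern(const EncodingPattern &P, uint32_t Insn) {
  return (Insn & P.Mask) == P.Value;
}
```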

2 years ago[AArch64][SME] Add predicate for NEON support in streaming mode
Cullen Rhodes [Wed, 18 Aug 2021 12:46:22 +0000 (12:46 +0000)]
[AArch64][SME] Add predicate for NEON support in streaming mode

Split out from D107903 to remove dependency for D108039 and D108279.

Reviewed By: paulwalker-arm

Differential Revision: https://reviews.llvm.org/D108293

2 years ago[Polly] Never consider non-SCoP blocks as error blocks.
Michael Kruse [Mon, 23 Aug 2021 00:23:51 +0000 (19:23 -0500)]
[Polly] Never consider non-SCoP blocks as error blocks.

Code outside the SCoP will be executed regardless of the code versioning
runtime check introduced by CodeGeneration. Assumptions based on such
code never being executed in Polly-optimized code therefore do not hold.

This fixes the miscompilation of MultiSource/Applications/lambda-0.1.3.

2 years ago[libc] Add a multi-waiter mutex test.
Siva Chandra Reddy [Sat, 21 Aug 2021 04:46:32 +0000 (04:46 +0000)]
[libc] Add a multi-waiter mutex test.

A corresponding adjustment to mtx_lock has also been made.

2 years ago[M68k][NFC] Tidy up the just-migrated MC tests
Min-Yih Hsu [Mon, 23 Aug 2021 05:43:02 +0000 (22:43 -0700)]
[M68k][NFC] Tidy up the just-migrated MC tests

Clean up the formatting of the MC tests that were just migrated. NFC

2 years ago[M68k][test] Migrate some MOVE instruction MC tests
Min-Yih Hsu [Mon, 23 Aug 2021 05:24:31 +0000 (22:24 -0700)]
[M68k][test] Migrate some MOVE instruction MC tests

Migrate some MOVE instruction MC tests from test/CodeGen/M68k.
Unfortunately, the tests touched in this commit failed due to the
lack of the `abs.W` operand, which forces any memory address parsed
from assembly to be represented in 32 bits.
We're temporarily allowing this unwanted widening in the tests until
support for `abs.W` is there.

2 years ago[libc] Add range reduction functions based on Payne and Hanek algorithm.
Siva Chandra Reddy [Fri, 4 Jun 2021 06:12:50 +0000 (06:12 +0000)]
[libc] Add range reduction functions based on Payne and Hanek algorithm.

These functions will be used in a future patch to implement
trigonometric functions. Unit tests have been added, but to the
libc-long-running-tests suite. The unit tests are long-running because
they compare against MPFR computations performed at 1280 bits of precision.

Some cleanups or elimination of repeated patterns can be done as
follow-up changes.

Differential Revision: https://reviews.llvm.org/D104817
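As background, here is a textbook summary of trigonometric range reduction, not taken from the patch (its exact formulation, precision and table layout may differ). The argument is written as a multiple of $\pi/2$ plus a small remainder, the quadrant $k \bmod 4$ selects the identity to apply, and the Payne-Hanek method exploits the fact that, for a large $x = m \cdot 2^e$, bits of $2/\pi$ whose contribution is a multiple of 4 cannot affect either $r$ or $k \bmod 4$, so they can be dropped:

```latex
x = k\,\frac{\pi}{2} + r, \qquad
k = \operatorname{round}\!\left(\frac{2x}{\pi}\right), \qquad
|r| \le \frac{\pi}{4}

\sin x =
\begin{cases}
  \sin r,  & k \equiv 0 \pmod 4 \\
  \cos r,  & k \equiv 1 \pmod 4 \\
  -\sin r, & k \equiv 2 \pmod 4 \\
  -\cos r, & k \equiv 3 \pmod 4
\end{cases}

x = m \cdot 2^{e},\; m \in \mathbb{Z}, \qquad
\frac{2}{\pi} = \sum_{i \ge 1} b_i\, 2^{-i}
\;\Longrightarrow\;
x \cdot \frac{2}{\pi} \equiv \sum_{i \ge \max(1,\, e-1)} m\, b_i\, 2^{e-i} \pmod 4
```

In practice only as many of the remaining bits are kept as the target precision requires, which is why the reduction stays accurate even for very large arguments.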

2 years ago[NFC] clang-format -i clang/lib/CodeGen/CGStmtOpenMP.cpp
Shilei Tian [Mon, 23 Aug 2021 02:57:05 +0000 (22:57 -0400)]
[NFC] clang-format -i clang/lib/CodeGen/CGStmtOpenMP.cpp

2 years ago[PowerPC] Use int64_t to represent stack object offset and frame size
Kai Luo [Mon, 23 Aug 2021 02:02:36 +0000 (02:02 +0000)]
[PowerPC] Use int64_t to represent stack object offset and frame size

This is the first step toward supporting huge frame sizes (>2G) on PPC64. It also fixes an assertion for the frame size: for `int x;`, `!isInt<32>(x)` always evaluates to false, so the guard code for the frame size could never be hit.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D107435
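Why the old guard was dead code can be seen with a small stand-in for `llvm::isInt<N>` (from llvm/Support/MathExtras.h). The helper and function names below are hypothetical: any `int` value trivially fits in 32 signed bits, so only a 64-bit frame size can ever fail the check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::isInt<N>: does the signed value X fit
// in an N-bit two's-complement integer?
template <unsigned N> constexpr bool fitsInSignedBits(int64_t X) {
  static_assert(N > 0 && N < 64, "N must be in [1, 63]");
  return X >= -(int64_t(1) << (N - 1)) && X < (int64_t(1) << (N - 1));
}

// With a 32-bit `int` argument this can never return true.
bool frameSizeTooLarge(int FrameSize) {
  return !fitsInSignedBits<32>(FrameSize);
}

// With int64_t, a frame larger than 2G is actually detectable.
bool frameSizeTooLarge64(int64_t FrameSize) {
  return !fitsInSignedBits<32>(FrameSize);
}
```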

2 years ago[Polly] Add support for -polly-dump-before/after with NPM.
Michael Kruse [Mon, 23 Aug 2021 01:35:14 +0000 (20:35 -0500)]
[Polly] Add support for -polly-dump-before/after with NPM.

The new pass manager does not allow adding module passes at the
-polly-position=before-vectorizer extension point. Introduce a
DumpFunctionPass that dumps only the current function. In contrast to the
legacy pass manager's -polly-dump-before, each function will be dumped
into its own file. -polly-dump-before-file is still not supported.

The DumpFunctionPass uses llvm::CloneModule to copy the current function
into a new module and then writes it into a file.

2 years ago[mlir] Add op for NCHW conv2d.
Stella Laurenzo [Sun, 22 Aug 2021 22:11:42 +0000 (15:11 -0700)]
[mlir] Add op for NCHW conv2d.

* This is the native data layout for PyTorch and npcomp was using the prior version before cleanup.

Differential Revision: https://reviews.llvm.org/D108527
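For readers unfamiliar with the layout names, the difference between NCHW and NHWC ("channels last") is just which dimension varies fastest in memory. The struct and function names below are illustrative, and the sketch assumes dense, unpadded tensors.

```cpp
#include <cassert>
#include <cstddef>

// Dimensions of a dense 4-D tensor: batch, channels, height, width.
struct Dims4 {
  std::size_t N, C, H, W;
};

// NCHW: W varies fastest, then H, then C, then N.
std::size_t offsetNCHW(const Dims4 &D, std::size_t n, std::size_t c,
                       std::size_t h, std::size_t w) {
  assert(n < D.N && c < D.C && h < D.H && w < D.W);
  return ((n * D.C + c) * D.H + h) * D.W + w;
}

// NHWC: C varies fastest.
std::size_t offsetNHWC(const Dims4 &D, std::size_t n, std::size_t c,
                       std::size_t h, std::size_t w) {
  assert(n < D.N && c < D.C && h < D.H && w < D.W);
  return ((n * D.H + h) * D.W + w) * D.C + c;
}
```

A convolution op written for one layout indexes its operands in that order, which is why a separate named op is needed rather than reusing the other layout's op.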

2 years ago[mlir][linalg] Add script to update the LinalgNamedStructuredOps.yaml. nfc
Stella Laurenzo [Sun, 22 Aug 2021 23:54:10 +0000 (16:54 -0700)]
[mlir][linalg] Add script to update the LinalgNamedStructuredOps.yaml. nfc

Also adds banners to the files with update instructions.

Differential Revision: https://reviews.llvm.org/D108529

2 years ago[mlir][python] Makes C++ extension code relocatable by way of a macro.
Stella Laurenzo [Sun, 22 Aug 2021 20:43:55 +0000 (13:43 -0700)]
[mlir][python] Makes C++ extension code relocatable by way of a macro.

* Resolves a TODO by making this configurable by downstreams.
* This seems to be the last piece needed to allow full use of the Python bindings as a library within another project (i.e. by embedding them).

Differential Revision: https://reviews.llvm.org/D108523

2 years ago[GVN] Fix test for loop load PRE on alloca (NFC)
Nikita Popov [Sun, 22 Aug 2021 20:28:53 +0000 (22:28 +0200)]
[GVN] Fix test for loop load PRE on alloca (NFC)

This test was not modifying the pointer in the loop, so the loads
just ended up as undef, without relation to loop load PRE.

Pass the alloca to the called function, so the memory is
potentially modified.

2 years ago[GVN] Don't short-circuit load PRE
Nikita Popov [Sun, 22 Aug 2021 19:08:58 +0000 (21:08 +0200)]
[GVN] Don't short-circuit load PRE

4ad41902e8c7481ccf3cdf6e618dfcd1e1fc10fc changed this code to
propagate Changed if scalar GEP PRE is performed. However, as
implemented this would skip the load PRE entirely if GEP indices
were PREd. Make sure load PRE runs even if Changed is already
true.

This likely has no functional effect as load PRE would then
occur on a later GVN iteration.
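The pitfall described above is the general short-circuit problem with accumulating a "changed" flag. This sketch uses hypothetical stand-in functions and only illustrates the pattern; the exact shape of the GVN code may differ.

```cpp
#include <cassert>

// Hypothetical transformation steps; each returns true if it changed the
// IR. The counters record which steps actually ran.
static int GEPPRECalls = 0, LoadPRECalls = 0;
bool performGEPPRE() { ++GEPPRECalls; return true; }
bool performLoadPRE() { ++LoadPRECalls; return true; }

// Buggy shape: once Changed is true, `||` short-circuits and the
// second step is never run.
bool runShortCircuiting() {
  bool Changed = performGEPPRE();
  Changed = Changed || performLoadPRE();
  return Changed;
}

// Fixed shape: `|=` always evaluates the call, then accumulates.
bool runBoth() {
  bool Changed = performGEPPRE();
  Changed |= performLoadPRE();
  return Changed;
}
```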

2 years ago[scudo][standalone] Link tests against libatomic if libatomic exists
Amy Kwan [Sun, 22 Aug 2021 18:46:52 +0000 (13:46 -0500)]
[scudo][standalone] Link tests against libatomic if libatomic exists

It is possible that libatomic does not exist on some systems. This patch updates
the scudo standalone tests to link against libatomic if the library exists.

This is an update to the original patch: https://reviews.llvm.org/D64134 and
aims to resolve https://bugs.llvm.org/show_bug.cgi?id=51431.

Differential Revision: https://reviews.llvm.org/D108503

2 years ago[runtimeunroll] Use early return to reduce nesting [nfc]
Philip Reames [Sun, 22 Aug 2021 18:34:50 +0000 (11:34 -0700)]
[runtimeunroll] Use early return to reduce nesting [nfc]

2 years agoSpecial case common branch patterns in breakLoopBackedge
Philip Reames [Sun, 22 Aug 2021 17:40:05 +0000 (10:40 -0700)]
Special case common branch patterns in breakLoopBackedge

This special cases an unconditional latch and a conditional branch latch exit to improve codegen and test readability.  I am hoping to reuse this function in the runtime unroll code, but without this change, the test diffs are far too complex to assess.

2 years ago[X86] combineMul - move MUL_IMM comment inside function. NFC.
Simon Pilgrim [Sun, 22 Aug 2021 17:27:03 +0000 (18:27 +0100)]
[X86] combineMul - move MUL_IMM comment inside function. NFC.

combineMul is now used for other things besides the mul-with-constant expansion; move the comment to where it's actually relevant.

2 years ago[DWARF][Verifier] Do not add child DieRangeInfo with empty address range to the parent.
Alexey Lapshin [Wed, 4 Aug 2021 16:17:33 +0000 (19:17 +0300)]
[DWARF][Verifier] Do not add child DieRangeInfo with empty address range to the parent.

The verifyDieRanges function checks for intersecting address ranges.
It adds each child DieRangeInfo into the parent DieRangeInfo to check
whether the children have overlapping address ranges. It is safe not to add
a DieRangeInfo with an empty address range to the parent's children list.
This decreases the number of children that must be traversed and, as a result,
decreases execution time (parents with many children with empty ranges
spent a lot of time traversing them). For the command "llvm-dwarfdump --verify clang-repl",
execution time decreased from 220 sec to 75 sec.

Differential Revision: https://reviews.llvm.org/D107554
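The optimization can be sketched with a simplified, hypothetical model in which each DIE covers a single half-open range `[Low, High)`. The real DieRangeInfo tracks lists of ranges and also checks containment in the parent; this sketch only shows why skipping empty children cannot change the overlap result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical half-open address range [Low, High).
struct AddrRange {
  uint64_t Low, High;
  bool empty() const { return Low >= High; }
};

struct RangeInfo {
  std::vector<AddrRange> Children;

  // An empty range cannot overlap anything, so it is not recorded.
  void addChild(const AddrRange &R) {
    if (!R.empty())
      Children.push_back(R);
  }

  // After sorting by start address, any overlap shows up between
  // neighbours, so checking adjacent pairs is sufficient.
  bool childrenOverlap() const {
    std::vector<AddrRange> Sorted = Children;
    std::sort(Sorted.begin(), Sorted.end(),
              [](const AddrRange &A, const AddrRange &B) {
                return A.Low < B.Low;
              });
    for (std::size_t I = 1; I < Sorted.size(); ++I)
      if (Sorted[I].Low < Sorted[I - 1].High)
        return true;
    return false;
  }
};
```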

2 years ago[Transforms] Remove unused declaration emitStrNLen (NFC)
Kazu Hirata [Sun, 22 Aug 2021 16:08:21 +0000 (09:08 -0700)]
[Transforms] Remove unused declaration emitStrNLen (NFC)

The corresponding definition has been missing for at least 5 years.

2 years ago[libc++] Eliminate needless `add_lvalue_reference` from <algorithm> helpers. NFCI.
Arthur O'Dwyer [Thu, 19 Aug 2021 19:06:21 +0000 (15:06 -0400)]
[libc++] Eliminate needless `add_lvalue_reference` from <algorithm> helpers. NFCI.

When `_Compare` is a function parameter already (so it's not `void`
and it's not an abominable function type), `add_lvalue_reference_t<_Compare>`
is simply a synonym for `_Compare&`. We don't need to pull in `<type_traits>`
and instantiate a template trait to figure that out.

Differential Revision: https://reviews.llvm.org/D108400
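The claim above can be checked with `static_assert`: for any referenceable type, `std::add_lvalue_reference_t<T>` is just `T&`, and it only differs for `void` and abominable function types, which a function parameter type can never be. The comparator and helper below are hypothetical examples, not libc++'s internals.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

struct Less {
  bool operator()(int A, int B) const { return A < B; }
};

// For an ordinary object type, the trait is simply `T&`...
static_assert(std::is_same<std::add_lvalue_reference_t<Less>, Less &>::value,
              "");
// ...and it only differs for types that cannot be referenced at all.
static_assert(std::is_same<std::add_lvalue_reference_t<void>, void>::value,
              "");
static_assert(std::is_same<std::add_lvalue_reference_t<void() const>,
                           void() const>::value,
              "");

// A helper can therefore spell its comparator parameter as `Compare &`.
template <class Iter, class Compare>
void insertionSortImpl(Iter First, Iter Last, Compare &Comp) {
  for (Iter I = First; I != Last; ++I)
    for (Iter J = I; J != First && Comp(*J, *(J - 1)); --J)
      std::swap(*J, *(J - 1));
}
```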

2 years ago[InstCombine] Perform "eq of parts" fold with logical ops
Nikita Popov [Sun, 22 Aug 2021 14:55:53 +0000 (16:55 +0200)]
[InstCombine] Perform "eq of parts" fold with logical ops

The pattern matched here is too complex for the general logical
and/or to bitwise and/or conversion to trigger. However, the
fold is poison-safe, so match it with a select root as well:

https://alive2.llvm.org/ce/z/vNzzSg
https://alive2.llvm.org/ce/z/Beyumt
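An "eq of parts" comparison checks two values piece by piece and is equivalent to a single comparison of the combined value. The C++ sketch below uses the short-circuiting `&&`, which corresponds to the logical (select-based) and discussed above; it is an illustration, not the InstCombine matcher, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Compare the low bytes and the high bytes of two 16-bit values
// separately, combined with a logical (short-circuiting) and.
bool eqOfParts(uint16_t X, uint16_t Y) {
  return static_cast<uint8_t>(X) == static_cast<uint8_t>(Y) &&
         static_cast<uint8_t>(X >> 8) == static_cast<uint8_t>(Y >> 8);
}

// The folded form: one comparison of the whole value.
bool eqWhole(uint16_t X, uint16_t Y) { return X == Y; }
```

The fold is safe with a logical and because skipping the second comparison when the first is false never changes the result.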

2 years ago[InstCombine] Add tests for "eq of parts" with logical op (NFC)
Nikita Popov [Sun, 22 Aug 2021 14:43:27 +0000 (16:43 +0200)]
[InstCombine] Add tests for "eq of parts" with logical op (NFC)

We currently only handle this with a bitwise and/or instruction,
but not with a logical one.

2 years ago[X86][AVX] matchShuffleAsBlend - use isElementEquivalent to help match broadcast...
Simon Pilgrim [Sun, 22 Aug 2021 14:26:17 +0000 (15:26 +0100)]
[X86][AVX] matchShuffleAsBlend - use isElementEquivalent to help match broadcast/repeated elements

Extend matchShuffleAsBlend to not only match against known in-place elements for BLEND shuffles, but use isElementEquivalent to determine if the shuffle mask's referenced element is the same as the in-place element.

This allows us to replace a number of insertps instructions with more general blendps instructions (better opportunities for commutation, concatenation etc.).

2 years agoFix signed/unsigned comparison warning. NFCI.
Simon Pilgrim [Sun, 22 Aug 2021 14:02:19 +0000 (15:02 +0100)]
Fix signed/unsigned comparison warning. NFCI.

2 years ago[X86] Expose memory codegen in element insert load tests to improve accuracy of checks
Simon Pilgrim [Sun, 22 Aug 2021 13:54:36 +0000 (14:54 +0100)]
[X86] Expose memory codegen in element insert load tests to improve accuracy of checks

Also replace X32 with X86 check prefixes for i686 tests (we tend to try to use X32 for gnux32 targets)

2 years ago[X86][SSE] lowerVECTOR_SHUFFLE - canonicalize with horizontal ops.
Simon Pilgrim [Sun, 22 Aug 2021 13:17:39 +0000 (14:17 +0100)]
[X86][SSE] lowerVECTOR_SHUFFLE - canonicalize with horizontal ops.

Before lowering shuffles, see if we can merge horizontal ops or canonicalize the shuffle mask to point to the same LHS/RHS of the HOps when an HOp's args are repeated.

2 years ago[InstSimplify] fold rotate of -1 to -1
Sanjay Patel [Sun, 22 Aug 2021 13:13:59 +0000 (09:13 -0400)]
[InstSimplify] fold rotate of -1 to -1

This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575

https://alive2.llvm.org/ce/z/GpkFCt

2 years ago[InstSimplify] fold rotate of zero to zero
Sanjay Patel [Sun, 22 Aug 2021 13:10:52 +0000 (09:10 -0400)]
[InstSimplify] fold rotate of zero to zero

This is part of solving more general rotate patterns seen in
bugs related to:
https://llvm.org/PR51575

https://alive2.llvm.org/ce/z/fjKwqv
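Both folds rest on the same observation: a rotate only permutes bits, so a value whose bits are all equal (0 or -1) is unchanged by any rotate amount, even an unknown one. A minimal C++ rotate (hypothetical helper, taking the amount modulo the bit width as LLVM's funnel-shift-based rotates do) makes this easy to check.

```cpp
#include <cassert>
#include <cstdint>

// Rotate-left of a 32-bit value; the amount is taken modulo 32.
uint32_t rotl32(uint32_t X, uint32_t S) {
  S &= 31;
  if (S == 0)
    return X; // avoid the undefined shift by 32
  return (X << S) | (X >> (32 - S));
}
```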

2 years ago[InstSimplify] add tests for rotates of 0/-1; NFC
Sanjay Patel [Sun, 22 Aug 2021 13:09:49 +0000 (09:09 -0400)]
[InstSimplify] add tests for rotates of 0/-1; NFC

2 years ago[X86] Try to sync HSW + BDW model class defs to simplify comparisons. NFC.
Simon Pilgrim [Sun, 22 Aug 2021 12:02:51 +0000 (13:02 +0100)]
[X86] Try to sync HSW + BDW model class defs to simplify comparisons. NFC.

Broadwell is mainly a die shrink of Haswell, but the model had many of the scheduling classes in different orders, making side-by-side comparisons very difficult.

The InstRW overrides are still quite different, but at least that part of the side-by-side diff is now in the same position.

This was noticed while I was trying to investigate diffs between llvm-mca and other perf analyzers in https://uica.uops.info/ - we used to be able to do diffs between most of the models very easily, but we seem to have lost that simplicity as classes have been altered, models have been refined and other models have rotted.

2 years ago[InstCombine] generalize subtract with 'not' operands
Sanjay Patel [Sat, 21 Aug 2021 17:05:35 +0000 (13:05 -0400)]
[InstCombine] generalize subtract with 'not' operands

The motivation was to get min/max intrinsics to parity
with cmp+select idioms, but this unlocks a few more
folds because isFreeToInvert recognizes add/sub with
constants too.

In the min/max example, we have too many extra uses
for smaller folds to improve things, but this fold
is able to eliminate uses even though we can't reduce
the number of instructions.
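The algebra behind a subtract with 'not' operands is the two's-complement identity `~A == -A - 1`, which gives `~X - ~Y == Y - X` in wrapping arithmetic. The sketch below (hypothetical function names, unsigned types so wraparound is well defined) only demonstrates the identity, not the InstCombine implementation.

```cpp
#include <cassert>
#include <cstdint>

// ~X - ~Y == (-X - 1) - (-Y - 1) == Y - X   (mod 2^32)
uint32_t subOfNots(uint32_t X, uint32_t Y) { return ~X - ~Y; }
uint32_t swappedSub(uint32_t X, uint32_t Y) { return Y - X; }
```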

2 years agoCGBuiltin.cpp - pass SVETypeFlags by const reference. NFC.
Simon Pilgrim [Sat, 21 Aug 2021 19:46:12 +0000 (20:46 +0100)]
CGBuiltin.cpp - pass SVETypeFlags by const reference. NFC.

Don't pass the struct by value.

2 years ago[LV] Adjust reduction recipes before recurrence handling.
Florian Hahn [Sun, 22 Aug 2021 09:45:20 +0000 (10:45 +0100)]
[LV] Adjust reduction recipes before recurrence handling.

Adjusting the reduction recipes still relies on references to the
original IR, which can become outdated by the first-order recurrence
handling. Until reduction recipe construction no longer requires IR
references, move it before first-order recurrence handling to prevent a
crash exposed by D106653.

2 years ago[DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)
Ben Shi [Thu, 19 Aug 2021 13:51:09 +0000 (21:51 +0800)]
[DAGCombiner] Add target hook function to decide folding (mul (add x, c1), c2)

Reviewed by: lebedev.ri, spatel, craig.topper, luismarques, jrtc27

Differential Revision: https://reviews.llvm.org/D107711
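The fold named in the title rewrites `(mul (add x, c1), c2)` as `(add (mul x, c2), c1*c2)`, which is always valid in wrapping arithmetic; whether it is profitable depends on the target. The profitability check below is entirely hypothetical (it assumes a target with a signed 12-bit add immediate) and is not the hook this patch added.

```cpp
#include <cassert>
#include <cstdint>

// Original and folded forms; equal modulo 2^32 because multiplication
// distributes over addition.
uint32_t mulOfAdd(uint32_t X, uint32_t C1, uint32_t C2) {
  return (X + C1) * C2;
}
uint32_t foldedForm(uint32_t X, uint32_t C1, uint32_t C2) {
  const uint32_t C1C2 = C1 * C2; // new constant the target must materialize
  return X * C2 + C1C2;
}

// Hypothetical profitability check: fold only if the combined constant
// fits in a signed 12-bit immediate.
bool shouldFold(uint32_t C1, uint32_t C2) {
  int64_t C = static_cast<int32_t>(C1 * C2);
  return C >= -2048 && C <= 2047;
}
```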

2 years ago[JITLink] Add support of R_X86_64_32S relocation
luxufan [Sun, 22 Aug 2021 08:43:02 +0000 (16:43 +0800)]
[JITLink] Add support of R_X86_64_32S relocation

This patch adds support for the R_X86_64_32S relocation and adds the Pointer32Signed generic edge kind.

Reviewed By: lhames

Differential Revision: https://reviews.llvm.org/D108446
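R_X86_64_32S computes `S + A` and stores it as a 32-bit field that the CPU sign-extends, so the full 64-bit result must fit in a signed 32-bit integer. The fixup below is a hypothetical sketch of that rule; `applyPointer32Signed` is not JITLink's API.

```cpp
#include <cassert>
#include <cstdint>

// Writes TargetAddr + Addend as a little-endian 32-bit field. Returns
// false if the value would not survive truncation plus sign extension.
bool applyPointer32Signed(uint8_t *FixupAddr, uint64_t TargetAddr,
                          int64_t Addend) {
  // Add in unsigned arithmetic to avoid signed overflow, then reinterpret.
  uint64_t Sum = TargetAddr + static_cast<uint64_t>(Addend);
  int64_t Value = static_cast<int64_t>(Sum);
  if (Value < INT32_MIN || Value > INT32_MAX)
    return false;
  // x86-64 is little-endian: least significant byte first.
  uint32_t V = static_cast<uint32_t>(Value);
  for (int I = 0; I < 4; ++I)
    FixupAddr[I] = static_cast<uint8_t>(V >> (8 * I));
  return true;
}
```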

2 years ago[ORC] Add std::tuple support to SimplePackedSerialization.
Lang Hames [Sun, 22 Aug 2021 00:43:06 +0000 (10:43 +1000)]
[ORC] Add std::tuple support to SimplePackedSerialization.

2 years ago[ORC] Rename blobSerializationRoundTrip, drop explicit arg types on calls.
Lang Hames [Sun, 22 Aug 2021 00:58:58 +0000 (10:58 +1000)]
[ORC] Rename blobSerializationRoundTrip, drop explicit arg types on calls.

Renames the blobSerializationRoundTrip test helper function to
spsSerializationRoundTrip ('blob' was the placeholder name for the serialization
scheme during prototyping, this function was missed when renaming everything
for the mainline). Also drops explicit template arguments at call sites where
they can be inferred (and are obvious) from the call argument type.

2 years ago[X86] AVX512FP16 instructions enabling 4/6
Wang, Pengfei [Sun, 22 Aug 2021 00:24:20 +0000 (08:24 +0800)]
[X86] AVX512FP16 instructions enabling 4/6

Enable FP16 unary operator instructions.

Ref.: https://software.intel.com/content/www/us/en/develop/download/intel-avx512-fp16-architecture-specification.html

Reviewed By: LuoYuanke

Differential Revision: https://reviews.llvm.org/D105267

2 years ago[ORC] Add missing header.
Lang Hames [Sun, 22 Aug 2021 00:34:38 +0000 (10:34 +1000)]
[ORC] Add missing header.

Should fix bot failure at
https://green.lab.llvm.org/green/job/clang-stage2-Rthinlto/4367

2 years ago[TargetCallingConv] Change OutputArg ctor to match its members
Fangrui Song [Sat, 21 Aug 2021 23:41:48 +0000 (16:41 -0700)]
[TargetCallingConv] Change OutputArg ctor to match its members

This avoids unneeded MVT->EVT conversion.

2 years ago[AArch64] Replace unneeded CCAssignToRegWithShadow with CCAssignToReg
Fangrui Song [Sat, 21 Aug 2021 23:33:29 +0000 (16:33 -0700)]
[AArch64] Replace unneeded CCAssignToRegWithShadow with CCAssignToReg

CCState::AllocateReg handles aliased registers.

2 years ago[TargetMachine] Drop special case for *-win32-macho
Fangrui Song [Sat, 21 Aug 2021 20:59:17 +0000 (13:59 -0700)]
[TargetMachine] Drop special case for *-win32-macho

Clang's shouldAssumeDSOLocal (in CodeGenModule) already sets dso_local.

2 years ago[TargetMachine] Simplify shouldAssumeDSOLocal. NFC
Fangrui Song [Sat, 21 Aug 2021 19:37:29 +0000 (12:37 -0700)]
[TargetMachine] Simplify shouldAssumeDSOLocal. NFC

2 years ago[clang] Fix typos in documentation (NFC)
Kazu Hirata [Sat, 21 Aug 2021 19:17:58 +0000 (12:17 -0700)]
[clang] Fix typos in documentation (NFC)

2 years ago[InstCombine] combine constants by reassociating add/sub/add
Sanjay Patel [Fri, 20 Aug 2021 22:34:09 +0000 (18:34 -0400)]
[InstCombine] combine constants by reassociating add/sub/add

This may overlap partially with the reassociate pass,
but it seems simple enough that we should try it here
in InstCombine to enable other folds.

This shows up as an opportunity and potential regression
if we improve a subtract fold with 'not' ops to be more
general.

2 years ago[InstCombine] add tests for add/sub/add combines; NFC
Sanjay Patel [Fri, 20 Aug 2021 21:50:18 +0000 (17:50 -0400)]
[InstCombine] add tests for add/sub/add combines; NFC

2 years ago[InstCombine] add tests for min/max with nots and sub; NFC
Sanjay Patel [Fri, 20 Aug 2021 21:33:19 +0000 (17:33 -0400)]
[InstCombine] add tests for min/max with nots and sub; NFC

2 years ago[ARM] Fix VQDMULH fold for scalar smin
David Green [Sat, 21 Aug 2021 15:33:18 +0000 (16:33 +0100)]
[ARM] Fix VQDMULH fold for scalar smin

Add a variant of mve-vqdmulh tests that uses min/max intrinsics
directly, including a scalar test that shows it misbehaving for min
intrinsics and a fix for the combine to prevent it from misbehaving.

2 years ago[flang] Refine output file generation
Andrzej Warzynski [Fri, 20 Aug 2021 10:25:11 +0000 (10:25 +0000)]
[flang] Refine output file generation

This patch cleans up the file generation code in Flang's frontend
driver. It improves the layering between
`CompilerInstance::CreateDefaultOutputFile`,
`CompilerInstance::CreateOutputFile` and their various clients.

* Rename `CreateOutputFile` as `CreateOutputFileImpl` and make it
  private. This method is an implementation detail.
* Instead of passing an `std::error_code` out parameter into
  `CreateOutputFileImpl`, have it return Expected<>. This is shorter
  and more idiomatic LLVM.
* Make `CreateDefaultOutputFile` (which calls `CreateOutputFileImpl`)
  issue an error when file creation fails. The error code from
  `CreateOutputFileImpl` is used to generate a meaningful diagnostic
  message.
* Remove error reporting from `PrintPreprocessedAction::ExecuteAction`.
  It only covered cases where output file generation fails, which is now
  handled in `CreateDefaultOutputFile` instead (see the previous point).
* Inline `AddOutputFile` into its only caller,
  `CreateDefaultOutputFile`.
* Switch from `llvm::buffer_ostream` to `llvm::buffer_unique_ostream`
  for non-seekable output streams. This simplifies the logic in the driver
  and was introduced for this very reason in [1].
* Make sure that the diagnostics from the prescanner when running `-E`
  (`PrintPreprocessedAction::ExecuteAction`) are printed before the actual
  output is generated.
* Update comments, add test.

NOTE: This patch relands [2]. As suggested by Michael Kruse in the
post-commit/post-revert review, I've added the following:
```
config.errc_messages = "@LLVM_LIT_ERRC_MESSAGES@"
```
in Flang's `lit.site.cfg.py.in`. This way, `%errc_ENOENT` in
output-paths.f90 gets the correct value on Windows as well as on Linux.

[1] https://reviews.llvm.org/D93260
[2] fd21d1e198e381a2b9e7af1701044462b2d386cd

Reviewed By: ashermancinelli

Differential Revision: https://reviews.llvm.org/D108390

2 years ago[lldb] Fix typo in the description of breakpoint options
Kirill Shmakov [Fri, 20 Aug 2021 12:27:37 +0000 (15:27 +0300)]
[lldb] Fix typo in the description of breakpoint options

2 years ago[gn build] Port 7f99337f9bcf
LLVM GN Syncbot [Sat, 21 Aug 2021 09:44:22 +0000 (09:44 +0000)]
[gn build] Port 7f99337f9bcf

2 years ago[ORC] Add EPCGenericMemoryAccess: generic executor memory access via EPC calls.
Lang Hames [Fri, 20 Aug 2021 05:52:42 +0000 (15:52 +1000)]
[ORC] Add EPCGenericMemoryAccess: generic executor memory access via EPC calls.

All ExecutorProcessControl subclasses must provide an
ExecutorProcessControl::MemoryAccess object that can be used to access executor
memory from the JIT process. The EPCGenericMemoryAccess class provides an
off-the-shelf MemoryAccess implementation for JITs that do not need (or cannot
provide) a specialized MemoryAccess implementation. This simplifies the process
of creating new ExecutorProcessControl implementations.

2 years ago[NFC][LoopIdiom] Let processLoopStoreOfLoopLoad take StoreSize as SCEV instead of...
eopXD [Thu, 19 Aug 2021 06:15:38 +0000 (23:15 -0700)]
[NFC][LoopIdiom] Let processLoopStoreOfLoopLoad take StoreSize as SCEV instead of unsigned

Taking a SCEV allows further modification of the function to optimize
cases where StoreSize / Stride is determined at runtime.

The plan is to let memcpy / memmove deal with runtime-determined sizes, just
like what D107353 did to memset.

Reviewed By: bmahjour

Differential Revision: https://reviews.llvm.org/D108289

2 years ago[libc] Add a new suite called "libc-long-running-tests".
Siva Chandra Reddy [Mon, 21 Jun 2021 06:05:29 +0000 (06:05 +0000)]
[libc] Add a new suite called "libc-long-running-tests".

This suite is helpful for adding long-running tests that take so long
to finish that they cannot be run on the public builders. They
will probably be run on special builders in the future.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D104816

2 years ago[CodeGen] Remove unused declaration setLiveInsUsed (NFC)
Kazu Hirata [Sat, 21 Aug 2021 02:19:54 +0000 (19:19 -0700)]
[CodeGen] Remove unused declaration setLiveInsUsed (NFC)

The corresponding definition was removed on Jan 20, 2017 in commit
710a4c1f3ddba3aa9313c72c43f9619afbc3e259.

2 years ago[OpenMP] Correctly add member expressions to OpenMP info
Joseph Huber [Fri, 20 Aug 2021 20:43:31 +0000 (16:43 -0400)]
[OpenMP] Correctly add member expressions to OpenMP info

Mapping expressions that have `this` as their base expression aren't
considered a valid base variable, and the rest of the runtime expects
this. However, if we have an expression with no value declaration, we can
try to extract it manually to provide more helpful debugging information.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108483

2 years ago[AArch64][GlobalISel] Add legalizer support for the @llvm.get.dynamic.area.offset...
Amara Emerson [Sat, 21 Aug 2021 00:04:36 +0000 (17:04 -0700)]
[AArch64][GlobalISel] Add legalizer support for the @llvm.get.dynamic.area.offset intrinsic.

This is just 0 on AArch64.

2 years ago[Bazel] Fix version defines
Geoffrey Martin-Noble [Fri, 20 Aug 2021 23:53:46 +0000 (16:53 -0700)]
[Bazel] Fix version defines

Some of these were the wrong version and some of them were the wrong
format. Did some hunting around to figure out what exactly they're
supposed to be. Since basically everything is derived from the LLVM
version we should probably make this a bit less hardcoded, but just
fixing the values for now.

Sources:
https://github.com/llvm/llvm-project/blob/b686fc7a1bea/clang/include/clang/Basic/Version.inc.in
https://github.com/llvm/llvm-project/blob/b686fc7a1bea/clang/CMakeLists.txt#L353-L363
https://github.com/llvm/llvm-project/blob/b686fc7a1bea/llvm/CMakeLists.txt#L13-L29
https://github.com/llvm/llvm-project/blob/b686fc7a1bea/lld/CMakeLists.txt#L131-L138

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108500

2 years ago[Driver] Remove discouraged -gcc-toolchain
Fangrui Song [Fri, 20 Aug 2021 23:36:42 +0000 (16:36 -0700)]
[Driver] Remove discouraged -gcc-toolchain

Space-separated driver options are uncommon, but Clang traditionally
did not do a good job of discouraging them. --gcc-toolchain= is the
preferred form.

This discouraged form appears to be rare, so we can just drop it.

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D108494

2 years ago[AArch64][GlobalISel] Don't contract cross-bank copies into truncating stores.
Amara Emerson [Fri, 20 Aug 2021 23:23:23 +0000 (16:23 -0700)]
[AArch64][GlobalISel] Don't contract cross-bank copies into truncating stores.

Truncating stores with GPR bank sources shouldn't be mutated into using FPR bank
sources, since those aren't supported.

Ideally this should be a selection failure in the tablegen patterns, but for now
avoid generating them.

2 years ago[Bazel] Reduce quote escaping
Geoffrey Martin-Noble [Fri, 20 Aug 2021 22:58:49 +0000 (15:58 -0700)]
[Bazel] Reduce quote escaping

There are a lot of unnecessary backslashes here that we can avoid to
reduce confusion.

Reviewed By: MaskRay

Differential Revision: https://reviews.llvm.org/D108495

2 years ago[MLIR][OMP] Ensure nested scf.parallel execute all iterations
William S. Moses [Thu, 19 Aug 2021 23:45:58 +0000 (19:45 -0400)]
[MLIR][OMP] Ensure nested scf.parallel execute all iterations

Presently, the lowering of nested scf.parallel loops to OpenMP creates one omp.parallel region with two (nested) OpenMP worksharing loops inside it. When lowered to LLVM and executed, this produces incorrect results. The reason is as follows:

An OpenMP parallel region runs its code with whatever number of threads is available to OpenMP. Within a parallel region, a worksharing loop divides the total number of requested iterations among the available threads and distributes them accordingly. For a single worksharing loop in a parallel region, this works as intended.

Now consider nested ws loops as follows:

omp.parallel {
   A: omp.ws %i = 0...10 {
      B: omp.ws %j = 0...10 {
          code(%i, %j)
      }
   }
}

Suppose we ran this on two threads. The first worksharing loop would decide to execute iterations 0, 1, 2, 3, 4 on thread 0 and iterations 5, 6, 7, 8, 9 on thread 1. The second worksharing loop would decide the same for its iterations. This means thread 0 would execute i in [0, 5) and j in [0, 5), while thread 1 would execute i in [5, 10) and j in [5, 10). The iterations with i in [5, 10), j in [0, 5) and with i in [0, 5), j in [5, 10) are therefore never executed, which is clearly wrong.

This leaves two options for a remedy:
1) Change the semantics of omp.wsloop to be distinct from those of the OpenMP runtime call, or equivalently #pragma omp for. This could then allow some lowering transformation to remedy the aforementioned issue. I don't think this is desirable from an abstraction standpoint.
2) When lowering an scf.parallel, always surround the wsloop with a new parallel region (thereby causing the innermost wsloop to use the number of threads available only to it).

This PR implements the latter change.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D108426

2 years ago[test] Migrate -gcc-toolchain with space separator to --gcc-toolchain=
Fangrui Song [Fri, 20 Aug 2021 22:24:58 +0000 (15:24 -0700)]
[test] Migrate -gcc-toolchain with space separator to --gcc-toolchain=

Space separated driver options are uncommon but Clang traditionally
did not do a good job. --gcc-toolchain= is the preferred form.

2 years ago[AArch64][GlobalISel] Legalize non-register-sized scalar G_BITREVERSE
Jessica Paquette [Wed, 18 Aug 2021 23:48:04 +0000 (16:48 -0700)]
[AArch64][GlobalISel] Legalize non-register-sized scalar G_BITREVERSE

Clamp types to [s32, s64] and make them a power of 2.

This matches SDAG's behaviour.

https://godbolt.org/z/vTeGqf4vT

Differential Revision: https://reviews.llvm.org/D108344

2 years ago[AArch64][GlobalISel] Legalize 32-bit + narrow G_SMULO + G_UMULO
Jessica Paquette [Tue, 17 Aug 2021 20:58:58 +0000 (13:58 -0700)]
[AArch64][GlobalISel] Legalize 32-bit + narrow G_SMULO + G_UMULO

SDAG lowers 32-bit and 64-bit G_SMULO + G_UMULO. We were missing the 32-bit
case.

For other sizes, make the 0th type a power of 2 and clamp it to either 32 bits
or 64 bits.

Right now, this will allow us to handle narrow types (e.g. s4, s24, etc.). The
LegalizerHelper doesn't support narrowing G_SMULO or G_UMULO right now. I think
we want clamping behaviour either way, so we might as well include it now to
be explicit.

Differential Revision: https://reviews.llvm.org/D108240

2 years ago[AArch64][GlobalISel] Clamp vectors of p0 when legalizing G_LOAD/G_STORE
Jessica Paquette [Fri, 20 Aug 2021 21:00:17 +0000 (14:00 -0700)]
[AArch64][GlobalISel] Clamp vectors of p0 when legalizing G_LOAD/G_STORE

We had a rule for <n x s64> but not one for <n x p0>. As a result, we would
fall back on types like <5 x p0>.

Differential Revision: https://reviews.llvm.org/D108484

2 years ago[AArch64][GlobalISel] Add regbankselect support for G_LROUND
Jessica Paquette [Thu, 19 Aug 2021 22:53:39 +0000 (15:53 -0700)]
[AArch64][GlobalISel] Add regbankselect support for G_LROUND

Destination is always a GPR, since the result is always an integer.

Source is always an FPR, since the source is always floating point.

Differential Revision: https://reviews.llvm.org/D108419

2 years ago[libunwind] Add UNW_AARCH64_* beside UNW_ARM64_*
Fangrui Song [Fri, 20 Aug 2021 21:26:27 +0000 (14:26 -0700)]
[libunwind] Add UNW_AARCH64_* beside UNW_ARM64_*

The original libunwind project defines UNW_AARCH64_* instead of UNW_ARM64_*.
Rename the enum members to match. This allows some applications with simple
`unw_init_local` usage to migrate to llvm-project libunwind.

Note: the canonical names of `UNW_ARM64_D{0..31}` are now `UNW_AARCH64_V{0..31}`,
to match the original libunwind.

UNW_ARM64_* are kept for now for compatibility. Some may be unneeded and can be
cleaned up in the future.

Reviewed By: #libunwind, compnerd

Differential Revision: https://reviews.llvm.org/D107996

2 years ago[AArch64][GlobalISel] Mark G_LROUND as legal for s64 dst + s32/s64 src.
Jessica Paquette [Thu, 19 Aug 2021 23:06:15 +0000 (16:06 -0700)]
[AArch64][GlobalISel] Mark G_LROUND as legal for s64 dst + s32/s64 src.

Matches SDAG's behaviour for these types.

Differential Revision: https://reviews.llvm.org/D108420

2 years ago[NFC] addAttribute(FunctionIndex) => addFnAttribute()
Arthur Eubanks [Fri, 20 Aug 2021 21:18:30 +0000 (14:18 -0700)]
[NFC] addAttribute(FunctionIndex) => addFnAttribute()

2 years ago[GlobalISel] Add G_LLROUND
Jessica Paquette [Fri, 20 Aug 2021 00:39:30 +0000 (17:39 -0700)]
[GlobalISel] Add G_LLROUND

Basically the same as G_LROUND. Handles the llvm.llround family of intrinsics.

Also add a helper function to the MachineVerifier for checking if all of the
(virtual register) operands of an instruction are scalars. Seems like a useful
thing to have.

Differential Revision: https://reviews.llvm.org/D108429

2 years ago[LoopPassManager] Assert that MemorySSA is preserved if used
Nikita Popov [Thu, 19 Aug 2021 18:56:09 +0000 (20:56 +0200)]
[LoopPassManager] Assert that MemorySSA is preserved if used

Currently it's possible to silently use a loop pass that does not
preserve MemorySSA in a loop-mssa pass manager because, unlike with
the legacy pass manager, we don't statically know which loop passes
preserve MemorySSA.

However, we can at least add a check after the fact that if
MemorySSA is used, then it should also have been preserved.
Hopefully this will reduce confusion such as that seen in
https://bugs.llvm.org/show_bug.cgi?id=51020.

Differential Revision: https://reviews.llvm.org/D108399

2 years ago[NFC][MLGO] Use std::move when moving protobufs
Mircea Trofin [Fri, 20 Aug 2021 19:50:20 +0000 (12:50 -0700)]
[NFC][MLGO] Use std::move when moving protobufs

Because of an odd linking problem, we need to temporarily support
building with TF C API 1.15 + tensorflow 2.50 pip package in
'development' mode scenarios. Protobuf Message 'Swap' is partially
implemented in the header (2.50) and relies on a symbol not found in TF
C API 1.15. std::move avoids that, at no semantic cost.

2 years agoRevert "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"
Florian Hahn [Fri, 20 Aug 2021 20:22:59 +0000 (21:22 +0100)]
Revert "[LoopVectorize][AArch64] Enable ordered reductions by default for AArch64"

This reverts commit f4122398e7c195147cde120d070f9b72905d7c91 to
investigate a crash exposed by it.

The patch breaks building the code below with `clang -O2 --target=aarch64-linux`

     int a;
     double b, c;
     void d() {
       for (; a; a++) {
         b += c;
         c = a;
       }
     }

2 years ago[DebugInfo] convert btf_tag attrs to DI annotations for record fields
Yonghong Song [Fri, 20 Aug 2021 19:52:51 +0000 (12:52 -0700)]
[DebugInfo] convert btf_tag attrs to DI annotations for record fields

Generate btf_tag annotations for record fields. The annotations
are represented as a DINodeArray in DebugInfo.

Differential Revision: https://reviews.llvm.org/D106616

2 years ago[mlir][linalg] Finish refactor of TC ops to YAML
Rob Suderman [Thu, 12 Aug 2021 23:20:56 +0000 (16:20 -0700)]
[mlir][linalg] Finish refactor of TC ops to YAML

Multiple operations were still defined as TC ops that had equivalent versions
as YAML operations. Reducing to a single compilation path guarantees that
frontends can lower to their equivalent operations without missing the
optimized fastpath.

Some operations are maintained purely for testing purposes (mainly conv{1,2,3}D,
as they are the sole tests in the vectorization transforms).

Differential Revision: https://reviews.llvm.org/D108169

2 years agoFix SEH table addresses for Windows
Daniel Paoliello [Fri, 20 Aug 2021 18:38:50 +0000 (21:38 +0300)]
Fix SEH table addresses for Windows

Issue Details:
The addresses for SEH tables for Windows are incorrect, as 1 was unconditionally being added to all addresses. +1 is required for the SEH end address (as it is exclusive), but the SEH start address is inclusive and so should be used as-is.

In the IP2State tables, the addresses are +1 for AMD64 to handle the return address for a call being after the actual call instruction, but are as-is for ARM and ARM64, as the `StateFromIp` function in the VC runtime automatically takes this into account and adjusts the address that it is looking up.

Fix Details:
* Split the `getLabel` function into two: `getLabel` (used for the SEH start address and ARM+ARM64 IP2State addresses) and `getLabelPlusOne` (for the SEH end address, and AMD64 IP2State addresses).

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D107784

2 years ago[TypePromotion] Remove unused IRBuilder object. NFC
Craig Topper [Fri, 20 Aug 2021 19:08:42 +0000 (12:08 -0700)]
[TypePromotion] Remove unused IRBuilder object. NFC

2 years ago[DebugInfo] generate btf_tag annotations for DIDerived types
Yonghong Song [Mon, 19 Jul 2021 07:12:15 +0000 (00:12 -0700)]
[DebugInfo] generate btf_tag annotations for DIDerived types

Generate btf_tag annotations for DIDerived types. More specifically,
the clang frontend generates the btf_tag annotations for record
fields. The annotations are represented as a DINodeArray
in DebugInfo. The following example illustrates how
annotations are encoded in IR:
      distinct !DIDerivedType(tag: DW_TAG_member, ..., annotations: !10)
      !10 = !{!11, !12}
      !11 = !{!"btf_tag", !"a"}
      !12 = !{!"btf_tag", !"b"}

Differential Revision: https://reviews.llvm.org/D106616

2 years ago[libc++] Remove test-suite annotations for unsupported Clang versions
Louis Dionne [Fri, 20 Aug 2021 15:42:38 +0000 (11:42 -0400)]
[libc++] Remove test-suite annotations for unsupported Clang versions

Differential Revision: https://reviews.llvm.org/D108471

2 years ago[libc++] Include <__iterator/distance.h> instead of <iterator> in a few algorithm...
Joe Loser [Fri, 20 Aug 2021 19:02:03 +0000 (15:02 -0400)]
[libc++] Include <__iterator/distance.h> instead of <iterator> in a few algorithm headers

A few headers in algorithm include `<iterator>` when
`<__iterator/distance.h>` would suffice. Change them
to just include `<__iterator/distance.h>`.

Differential Revision: https://reviews.llvm.org/D108393

2 years ago[Coverage][llvm-cov] Correctly export branch coverage in LCOV format
Christian Fetzer [Fri, 20 Aug 2021 18:24:44 +0000 (13:24 -0500)]
[Coverage][llvm-cov] Correctly export branch coverage in LCOV format

Commit 9f2967bcfe2f7d1fc02281f0098306c90c2c10a5 introduced support for
branch coverage including export to the LCOV format.

This commit corrects the LCOV field name for branches from BFH to BRH.
The mistake seems to have slipped in as a typo, because the correct field
name BRH is used in the comment section at the beginning of the file.

Differential Revision: https://reviews.llvm.org/D108358