Commits · 5ac69674bf4fbe4adaca4170a2ad60c8a32613ed · educg-net-26154-2315672 / llvm_project-2529

27 Feb, 2023 40 commits

[SPIR-V] Support TargetExtType for SPIR-V builtin types · 5ac69674

This patch adds support for TargetExtType/target(...) representing
SPIR-V builtin types. After D135202, target(...) is the preferred way
of representing SPIR-V builtin types in LLVM IR and the only one that
works in opaque pointer mode.

In order to maintain compatibility with LLVM IR generated by older
versions of Clang and LLVM/SPIR-V Translator, pointers-to-opaque-structs
denoting SPIR-V/OpenCL builtin types will be translated to equivalent
SPIR-V target extension types. This translation is only available in the
typed pointer mode (-opaque-pointers=0).

The relevant LIT tests with SPIR-V builtins were converted to use the
new target(...) notation.

Differential Revision: https://reviews.llvm.org/D144494

5ac69674

[SLP] Fixes crash in BoUpSLP::isGatherShuffledEntry() · a700fb3d
Vasileios Porpodas authored 2 years ago
```
Crash caused by: 708eb1b9

Differential Revision: https://reviews.llvm.org/D144895
```
a700fb3d

[AArch64] Avoid using intermediate integer registers for copying between... · 72105d10

Nilanjana Basu authored 2 years ago

[AArch64] Avoid using intermediate integer registers for copying between source and destination floating point registers

In post-isel code, there are cases with redundant copies from a source FPR to an intermediate GPR in order to copy to a destination FPR. In this patch, we identify these patterns in the post-isel peephole optimization and replace them with a direct FPR-to-FPR copy.
One example is the insertion of the scalar result of the 'uaddlv' neon intrinsic function into a destination vector. During the instruction selection phase, the 'uaddlv' result is copied to a GPR, and a vector insert instruction is matched separately to copy the previous result to a destination SIMD&FP register.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D142594

72105d10

[Clang] [AVR] Fix USHRT_MAX for 16-bit int. · 0fecac18

Daniel Thornburgh authored 3 years ago

For AVR, the definition of USHRT_MAX overflows.
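
To see why such a definition can overflow, recall that `int` is 16 bits wide on AVR. The following is a minimal sketch, assuming `USHRT_MAX` is derived from the signed maximum as `SHRT_MAX * 2 + 1` (the exact macro text is not shown in this log). The fixed-width aliases and helper names are illustrative stand-ins, not names from the patch:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Stand-ins for AVR's 16-bit `int` and `unsigned int` (illustrative aliases,
// not names taken from the patch).
using Int16 = std::int16_t;
using UInt16 = std::uint16_t;

constexpr long kShrtMax = std::numeric_limits<Int16>::max();

// The value USHRT_MAX must have, computed in a wider type so nothing wraps.
constexpr long kExpectedUShrtMax = kShrtMax * 2 + 1;

// True if `v` is representable in a signed 16-bit int.
constexpr bool fitsInSigned16(long v) {
  return v >= std::numeric_limits<Int16>::min() &&
         v <= std::numeric_limits<Int16>::max();
}

// True if `v` is representable in an unsigned 16-bit int.
constexpr bool fitsInUnsigned16(long v) {
  return v >= 0 && v <= std::numeric_limits<UInt16>::max();
}

// Runtime self-check: the result does not fit a signed 16-bit int, so the
// arithmetic has to be carried out in an unsigned type.
inline void checkUShrtMaxModel() {
  assert(!fitsInSigned16(kExpectedUShrtMax));
  assert(fitsInUnsigned16(kExpectedUShrtMax));
}
```

On a target with a 32-bit `int` the same expression is harmless, which is presumably why the problem only shows up on 16-bit-int targets such as AVR.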

Reviewed By: aaron.ballman, #clang-language-wg

Differential Revision: https://reviews.llvm.org/D144218

0fecac18

[clang-format-diff] Correctly parse start-of-file diffs · 50563944

Tamir Duberstein authored 2 years ago

Handle the case where the diff is a pure removal of lines. Before this
change start_line would end up as 0 which is rejected by clang-format.
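
The situation can be illustrated with a unified-diff hunk header such as `@@ -1,3 +0,0 @@`, where the `+` side reports start line 0 and a count of 0. Below is a minimal sketch of one possible handling (skipping hunks that add no lines). The real script is written in Python and may handle the case differently; `linesToFormat` is a hypothetical helper, not part of clang-format-diff:

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <string>
#include <utility>

// Parses the "+start[,count]" part of a unified-diff hunk header and returns
// the inclusive 1-based [first, last] line range to format, or std::nullopt
// when the hunk adds no lines (a pure removal) or is not a hunk header.
// Hypothetical helper for illustration; not the script's actual code.
inline std::optional<std::pair<int, int>> linesToFormat(
    const std::string& header) {
  static const std::regex kHunk(R"(^@@.*\+(\d+)(?:,(\d+))?)");
  std::smatch m;
  if (!std::regex_search(header, m, kHunk)) return std::nullopt;

  const int start = std::stoi(m[1].str());
  // A missing count means the hunk covers exactly one line.
  const int count = m[2].matched ? std::stoi(m[2].str()) : 1;

  // "@@ -1,3 +0,0 @@": nothing was added, so there is no valid line range.
  if (count == 0) return std::nullopt;

  assert(start >= 1 && "a non-empty hunk starts at line 1 or later");
  return std::make_pair(start, start + count - 1);
}
```

Returning an empty optional keeps a start line of 0 from ever reaching the formatter, which is the condition the commit message describes as rejected.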

Submitting on behalf of @tamird.

Differential Revision: https://reviews.llvm.org/D144291

50563944

[Pass][CHR] Move ControlHeightReduction to module optimization pipeline · 66673166

Rong Xu authored 2 years ago

This is a modified version of commit b3744233 by
Arthur (https://reviews.llvm.org/D143424).

Here we invoke the pass independently of PGOOPT. We now check if the
profile is available through the program summary. This ensures CHR is
called in distributed ThinLTO BE compilation (where PGOOPT might not
be created).

Differential Revision: https://reviews.llvm.org/D144769

66673166

[SCEV] Hoist common cleanup code to function. (NFC) · 2f3c748c

Florian Hahn authored 2 years ago

This allows for easier updating of common code in follow-on patches.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D144847

2f3c748c

[AArch64][GlobalISel] Reorder stack up-adjustment and register copies · 31d6a572

Amara Emerson authored 2 years ago

This change reorders the stack up-adjustment and return value copying phases of
machine-ir generation on AArch64. Doing so prevents a bug observed for fastcc
calls with >8 arguments, where the up-adjustment required from making that call
is placed in the wrong place relative to spill and reloading code.

See: https://github.com/llvm/llvm-project/issues/60972 for full issue
reproduction and context.

Patch contributed by Bruce Collie

Differential Revision: https://reviews.llvm.org/D144791

31d6a572

[AArch64] Don't remove free sext_inreg(vector_extract(x)) if it leads to multiple extracts · 06daa515

David Green authored 2 years ago

If we have sext_inreg(vector_extract(x)) but the top bits are not used, DAG
will try to remove the sext_inreg, using vector_extract(x) directly. This can
lead to multiple uses of both sext_inreg(vector_extract(x)) and
vector_extract(x), leading to the generation of both umov and smov extracts.
This adds a target hook to prevent that under AArch64 where the sext_inreg can
be considered free if there are multiple uses of the sext and no uses of the
vector_extract. This helps fix a small regression from D144550.

Differential Revision: https://reviews.llvm.org/D144850

06daa515

[MLIR] Add primitive builders for scf.if · e7b52c46
Frederik Gossen authored 2 years ago
```
Differential Revision: https://reviews.llvm.org/D144886
```
e7b52c46

[scudo] Only prepare PageMap entry for partial region · 0a0b6fa4

Chia-hung Duan authored 2 years ago

This reduces the size of PageMap, so we are more likely to use the
static local buffer. Note that this is currently supported only for the
single-region case, i.e. on SizeClassAllocator64. SizeClassAllocator32
needs a different way to save the PageMap.

Differential Revision: https://reviews.llvm.org/D142659

0a0b6fa4

[libc++][NFC] Format __split_buffer and move constructors that are marked... · 2aeda9aa

Nikolas Klauser authored 2 years ago

[libc++][NFC] Format __split_buffer and move constructors that are marked inline into the class body

Reviewed By: ldionne, #libc

Spies: libcxx-commits

Differential Revision: https://reviews.llvm.org/D142433

2aeda9aa

[libc++] Simplify the modules_include.sh.cpp script a bit · 411c799a

Nikolas Klauser authored 2 years ago

Reviewed By: #libc, ldionne

Spies: vvereschaka, libcxx-commits

Differential Revision: https://reviews.llvm.org/D144825

411c799a

[libc++] Improves clang-format settings. · de6827b5

Mark de Wever authored 2 years ago

Add a new test based .clang-format file which inherits from the generic
one. This moves some test specific formatting rules to the test
directory.

The main benefit is that headers are sorted, which makes it more likely
to catch these errors before creating a review instead of spotting the
error in the CI clang-tidy step.

Reviewed By: ldionne, philnik, #libc

Differential Revision: https://reviews.llvm.org/D144755

de6827b5

[libc++] Fixes operator& hijacking atomic types. · f41f3925

Mark de Wever authored 2 years ago

This uses std::addressof everywhere in atomic. This is not strictly
needed for the integral and floating point specializations. They should
not be used by user defined types. But it's easier to fix everything.
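
The hijacking problem can be reproduced outside of `std::atomic`. This is a minimal sketch, assuming a user-defined type that overloads unary `operator&`; the `Hijacker` type and the two helper functions are illustrative and do not come from the patch:

```cpp
#include <cassert>
#include <memory>

// A user-defined type that hijacks unary operator& (illustrative only).
struct Hijacker {
  int value = 42;
  Hijacker* operator&() { return nullptr; }
};

// Reads `value` through a pointer obtained with std::addressof, which
// bypasses the overloaded operator& and yields the object's real address.
inline int readThroughRealAddress(Hijacker& h) {
  Hijacker* p = std::addressof(h);
  assert(p != nullptr);
  return p->value;
}

// Shows what a plain `&h` yields for this type: the hijacked result.
inline bool plainAddressIsHijacked(Hijacker& h) {
  return &h == nullptr;
}
```

Library code that writes `&obj` on a user-provided type would pick up the overload, whereas `std::addressof(obj)` always returns the actual address.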

Note these changes are made using a WIP clang-tidy plugin.

Reviewed By: #libc, ldionne

Differential Revision: https://reviews.llvm.org/D144786

f41f3925

[LLVMContextImpl] Separate out integer constant ones · 86bdcdf0

Arthur Eubanks authored 2 years ago

Very small compile time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=6a7a8907e8334eaf551742148079c628f78e6ed7&to=454d1181fbdb9121f0c7a3ecf526520db32ab420&stat=instructions:u

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D144746

86bdcdf0

[LLVMContextImpl] Separate out integer constant zeroes · c3166753

Arthur Eubanks authored 2 years ago

Very small compile time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=a628ca4925f7249b4fbd3e932c9627b12e2770dd&to=6a7a8907e8334eaf551742148079c628f78e6ed7&stat=instructions:u

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D144745

c3166753

[SLP]Fix PR61018: Assertion `Mask[I] == UndefMaskElem && "Multiple uses · 007177bd

Alexey Bataev authored 2 years ago

of scalars."' failed.

Need to check for the reused indices when checking if 2 insertelement
instructions are from the same buildvector. If the indices are reused,
better not to match buildvectors and consider them as different,
otherwise need to track the order of insertelement operations.

007177bd

[AMDGPU] Update the CHECK autogenerated as it's expired · d514726d
zhongyunde authored 2 years ago
```
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D144771
```
d514726d

[Sema] Use isSVESizelessBuiltinType instead of isSizelessBuiltinType to prevent crashing on RISC-V. · 2e731117

Craig Topper authored 2 years ago

These 2 spots are protecting calls to SVE specific functions. If RISC-V
sizeless types end up in there we trigger assertions.

Use the more specific isSVESizelessBuiltinType() to avoid letting
RISC-V vectors through.

Reviewed By: asb, c-rhodes

Differential Revision: https://reviews.llvm.org/D144772

2e731117

[Flang][OpenMP][OpenACC] Error for loop with no control · 7d7633bd

Kiran Chandramohan authored 2 years ago

Issue an error if a DO construct associated with a loop does not have
loop control. Currently, it is issued only for the loop immediately
following the loop construct. This patch extends it to cases like
collapse where there is more than one loop associated. It also fixes
a crash since the existing code always expects loop control.
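
The extended check can be pictured on a toy model of a loop nest. This is a rough sketch under stated assumptions: `DoLoop` and `firstLoopWithoutControl` are hypothetical names and do not reflect Flang's parse tree or its semantic-check API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of one loop in a perfectly nested loop nest (hypothetical; not
// Flang's parse tree).
struct DoLoop {
  bool hasLoopControl;  // false for a bare `DO ... END DO`
};

// `nest` lists the loops outermost first. Returns the 0-based depth of the
// first associated loop that lacks loop control, or -1 if every one of the
// `collapse` associated loops has it. Loops deeper than `collapse` are not
// associated with the construct and are not checked.
inline int firstLoopWithoutControl(const std::vector<DoLoop>& nest,
                                   std::size_t collapse) {
  assert(collapse >= 1 && "a loop construct has at least one associated loop");
  for (std::size_t depth = 0; depth < collapse && depth < nest.size(); ++depth)
    if (!nest[depth].hasLoopControl) return static_cast<int>(depth);
  return -1;
}
```

With `collapse` equal to 1 only the outermost loop is inspected, which corresponds to the previous behavior described above; larger values extend the check to every associated loop.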

This is covered in OpenMP 4.5 standard, Section 2.7.1.
"The do-loop cannot be a DO WHILE or a DO loop without loop control."

OpenACC 3.3 covers this indirectly in Section 2.9.1.
"The trip count for all loops associated with the collapse clause must
be computable and invariant in all the loops".

Reviewed By: clementval

Differential Revision: https://reviews.llvm.org/D144290

7d7633bd

[OpenMP] Ignore implicit casts on assertion for `use_device_ptr` · 853d4059

Joseph Huber authored 2 years ago

There was an assertion triggering when invoking a captured member whose
initializer was in a base class. This patch fixes it by allowing the
assertion on implicit casts to the base class rather than only the base
class itself.

Fixes https://github.com/llvm/llvm-project/issues/61027

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D144873

853d4059

[Flang][OpenMP] NFC: Change a few message/comments to fit 80chars · 54acf9a3

Kiran Chandramohan authored 2 years ago

Changes are all in the OpenMP semantic checks file.

Reviewed By: SBallantyne

Differential Revision: https://reviews.llvm.org/D144874

54acf9a3

[mlir][Linalg] Reimplement hoisting on tensors as a subset-based transformation · 4521b113

Nicolas Vasilache authored 2 years ago

This revision significantly rewrites hoisting on tensors.
Previously, `vector.transfer_read/write` and `tensor.extract/insert_slice` would
be clumped together when looking for candidate pairs.
This would significantly increase the complexity of the logic and would not apply
independently to `tensor.extract/insert_slice`.

The new implementation decouples the cases and starts to cast the problem
as a generic matching subset extract/insert, which will be future proof when
other such operation pairs are introduced.

Lastly, the implementation makes the distinction clear between `vector.transfer_read/write`, for
which we allow bypasses of the disjoint subsets, and `tensor.extract/insert_slice`, for which we
do not yet allow it.

This can be extended in the future and unified once we have subset disjunction implemented more generally.

The algorithm can be rewritten to be less of a fixed point with interspersed canonicalizations.
As a consequence, the test explicitly adds a canonicalization to clean up the IR and verify we end up in the same state.

That extra canonicalization exhibited that one of the uses in one of the tests was dead, so we fix the appropriate test.

Differential Revision: https://reviews.llvm.org/D144656

4521b113

[mlir] Fix a -Wunused-variable warning, NFC · 779d54fd
Haojian Wu authored 2 years ago

779d54fd

[ConstExpr] Avoid creation of select constant expressions · 5d6dfba1

Nikita Popov authored 2 years ago

These expressions will now only be created if explicitly requested
in IR/bitcode (and by LowerTypeTests, which has a tricky to remove
use).

This is in preparation for removing these expressions entirely,
but also fixes #60983 in the meantime.

5d6dfba1

[MLIR] Add pass to deduplicate functions · b12bcf3f

Frederik Gossen authored 2 years ago

Deduplicate functions that are equivalent in all aspects but their symbol name.
The pass chooses one representative per equivalence class, erases the remainder, and updates function calls accordingly.
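
The idea of picking a representative per equivalence class can be sketched on a toy module. This is only an illustration under stated assumptions: functions are modeled as a name plus a list of text lines, calls are lines of the form `call <symbol>`, and `Function` and `deduplicate` are hypothetical names unrelated to the MLIR pass implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy function: a symbol name plus a body. Calls are written as
// "call <symbol>" lines. Hypothetical model, not MLIR's data structures.
struct Function {
  std::string name;
  std::vector<std::string> body;
};

// Keeps the first function seen for each distinct body, erases the others,
// and redirects calls to erased symbols to the kept representative.
inline std::vector<Function> deduplicate(const std::vector<Function>& funcs) {
  std::map<std::vector<std::string>, std::string> representative;
  std::map<std::string, std::string> renamed;  // erased symbol -> kept symbol
  std::vector<Function> kept;

  for (const Function& f : funcs) {
    assert(!f.name.empty() && "functions need a symbol name");
    auto [it, inserted] = representative.emplace(f.body, f.name);
    if (inserted)
      kept.push_back(f);
    else
      renamed[f.name] = it->second;
  }

  // Update call sites in the surviving functions.
  const std::string prefix = "call ";
  for (Function& f : kept) {
    for (std::string& line : f.body) {
      if (line.rfind(prefix, 0) != 0) continue;  // not a call line
      auto r = renamed.find(line.substr(prefix.size()));
      if (r != renamed.end()) line = prefix + r->second;
    }
  }
  return kept;
}
```

This sketch makes a single pass, so two callers that only become identical after their calls are redirected are not merged; whether the actual pass handles that case is not stated in the commit message.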

Differential Revision: https://reviews.llvm.org/D144738

b12bcf3f

[mlir] Port bazel for 115711c1 · 8877d8f5
Haojian Wu authored 2 years ago

8877d8f5
[MLIR] Expose region equivalence check through OperationEquivalence · 31fc47e3
Frederik Gossen authored 2 years ago
```
Differential Revision: https://reviews.llvm.org/D144735
```
31fc47e3

[InlineCost] Avoid ConstantExpr::getSelect() · 86d1ed9e

Nikita Popov authored 2 years ago

Instead use ConstantFoldSelectInstruction(), which will return
nullptr if it cannot be folded and a constant expression would
be produced instead.

In preparation for removing select constant expressions.

86d1ed9e

[mlir][sparse] Add checking parent op of SortOp · 9a29d875

Kohei Yamaguchi authored 2 years ago

Fix crash with segmentation fault caused by setting a parent operator
that is not func::FuncOp with sparse_tensor SortOp.

fixes https://github.com/llvm/llvm-project/issues/59988

Reviewed By: aartbik, wrengr

Differential Revision: https://reviews.llvm.org/D143874

9a29d875

[mlir][NFC] Cleanup Passes documentation · b46e78c7

Kohei Yamaguchi authored 2 years ago

- Fix the placement of the NVGPU dialect's pass
- Move a summary of `-finalize-memref-to-llvm` into description
- Fix broken links
- Replace back-quote dialect headers with single-quote headers for
  improved readability.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D142868

b46e78c7

[mlir][LinAlg][Transform][GPU] Add GPU memory hierarchy to the transform.promote op · 115711c1

Amir Mohammad Tavakkoli authored 2 years ago

In this patch we add support for copying a `memref.subview` to shared or private memory on the GPU. The global-to-shared memory copy is adopted from code implemented in IREE (https://github.com/iree-org/iree), but the private memory copy has not been implemented in IREE. This patch enables transferring a subview from `global->shared`, `global->private`, and `shared->private`.

Our final aim is to provide a copy layout as an affine map to the `transform.promote` op to support transposed memory copies. This map is a permutation of the original affine index map. Although this has been implemented and the user can copy data to an arbitrary layout, it is not included in this patch because we still have a problem changing the index maps of `linalg.generic` operations to the transformed index map. You can find more at the following links: ([[ https://github.com/tavakkoliamirmohammad/iree-llvm-fork/commit/4fd5f93355951ad0fb338858393ff409bd9c62f8 | Initial attempt to support layout map in promote op in transform dialect ]]) ([[ https://github.com/tavakkoliamirmohammad/iree-llvm-fork/commit/9062b5849f91d4defb84996392b71087dadf7a8c | Fix data transpose in shared memory ]])

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D144666

115711c1

[SLP]Fix a crash when trying to find reduced ops for the reduced value. · 5f53e85f

Alexey Bataev authored 2 years ago

Need to use the original reduced value, not the one the compiler gets after
reduction, since it may already have been replaced by an extractelement instruction.

5f53e85f

[InstCombine] Avoid ConstantExpr::getSelect() use (NFCI) · 3c2b1853
Nikita Popov authored 2 years ago
```
Instead let IRBuilder take care of constant folding.

In preparation for removing select constantexprs.
```
3c2b1853
[AArch64] Add some tests for multiple uses of extended vector extracts. NFC · 9e5bfa1a
David Green authored 2 years ago

9e5bfa1a
[mlir] Insert tensor.cast only when needed when folding tensor.cast into extract_slice. · 9fa61cbb
Alexander Belyaev authored 2 years ago
```
Differential Revision: https://reviews.llvm.org/D144868
```
9fa61cbb

[OHOS] Add support for OpenHarmony · c417b7a6

Pavel Kosov authored 2 years ago

Add support for OpenHarmony OS

General OpenHarmony OS discussion on discourse thread "[RFC] Add support for OpenHarmony OS"
https://discourse.llvm.org/t/rfc-add-support-for-openharmony-os/66656

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D138202

c417b7a6

[SME2][AArch64] Add multi-indexed multiply-add long long intrinsics · a9df6270

Kerry McLaughlin authored 2 years ago

Adds intrinsics for the following SME2 instructions (1, 2 & 4 vector):
 - smlall
 - umlall
 - smlsll
 - umlsll
 - sumlall
 - usmlall

NOTE: These intrinsics are still in development and are subject to future changes.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D143278

a9df6270

[GlobalOpt] Ignore only loaded / only stored global parts in global SRA heuristic · 49aa3777

Nikita Popov authored 2 years ago

When limiting the number of parts we split a global into, ignore
any parts that are either only loaded or only stored, because we
expect these to be optimized away after SRA.
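
The counting rule can be sketched as follows. This is a simplified model, not GlobalOpt's code: `GlobalPart`, `countPartsTowardLimit`, and `withinPartLimit` are hypothetical names, and the limit is passed as a parameter because its value is not given in the commit message. The sketch also ignores parts that are never accessed at all, which is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One piece of a global that SRA would split out (hypothetical model).
struct GlobalPart {
  bool isLoaded;
  bool isStored;
};

// Counts only the parts that are both loaded and stored. Parts that are only
// loaded or only stored are expected to be optimized away after SRA, so they
// do not count toward the limit. Parts with no accesses are skipped as well
// (an assumption of this sketch).
inline std::size_t countPartsTowardLimit(const std::vector<GlobalPart>& parts) {
  std::size_t count = 0;
  for (const GlobalPart& part : parts)
    if (part.isLoaded && part.isStored) ++count;
  return count;
}

// True if splitting the global stays within `maxParts` counted parts.
inline bool withinPartLimit(const std::vector<GlobalPart>& parts,
                            std::size_t maxParts) {
  assert(maxParts > 0 && "the limit must allow at least one part");
  return countPartsTowardLimit(parts) <= maxParts;
}
```

Under this rule a global with many write-only or read-only fields can still be split, since those fields no longer push the count over the limit.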

Differential Revision: https://reviews.llvm.org/D129857

49aa3777