Commits · 86bdcdf00e82865ee7ae3fcbf843a47235851817 · educg-net-26154-2315672 / llvm_project-2529

27 Feb, 2023 40 commits

[LLVMContextImpl] Separate out integer constant ones · 86bdcdf0

Arthur Eubanks authored 2 years ago

Very small compile time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=6a7a8907e8334eaf551742148079c628f78e6ed7&to=454d1181fbdb9121f0c7a3ecf526520db32ab420&stat=instructions:u

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D144746

86bdcdf0

[LLVMContextImpl] Separate out integer constant zeroes · c3166753

Arthur Eubanks authored 2 years ago

Very small compile time improvement:
https://llvm-compile-time-tracker.com/compare.php?from=a628ca4925f7249b4fbd3e932c9627b12e2770dd&to=6a7a8907e8334eaf551742148079c628f78e6ed7&stat=instructions:u

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D144745

c3166753

[SLP]Fix PR61018: Assertion `Mask[I] == UndefMaskElem && "Multiple uses · 007177bd

Alexey Bataev authored 2 years ago

of scalars."' failed.

Need to check for the reused indices when checking if 2 insertelement
instructions are from the same buildvector. If the indices are reused,
better not to match buildvectors and consider them as different,
otherwise need to track the order of insertelement operations.

007177bd

[AMDGPU] Update the CHECK autogenerated as it's expired · d514726d

zhongyunde authored 2 years ago

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D144771

d514726d

[Sema] Use isSVESizelessBuiltinType instead of isSizelessBuiltinType to prevent crashing on RISC-V. · 2e731117

Craig Topper authored 2 years ago

These 2 spots are protecting calls to SVE specific functions. If RISC-V
sizeless types end up in there we trigger assertions.

Use the more specific isSVESizelessBuiltinType() to avoid letting
RISC-V vectors through.

Reviewed By: asb, c-rhodes

Differential Revision: https://reviews.llvm.org/D144772

2e731117

[Flang][OpenMP][OpenACC] Error for loop with no control · 7d7633bd

Kiran Chandramohan authored 2 years ago

Issue error if a DO construct associated with a loop does not have
loop control. Currently, it is issued only for the loop immediately
following the loop construct. This patch extends it to cases like
collapse where there is more than one loop associated. It also fixes
a crash since the existing code always expects loop control.

This is covered in OpenMP 4.5 standard, Section 2.7.1.
"The do-loop cannot be a DO WHILE or a DO loop without loop control."

OpenACC 3.3 covers this indirectly in Section 2.9.1.
"The trip count for all loops associated with the collapse clause must
be computable and invariant in all the loops".

Reviewed By: clementval

Differential Revision: https://reviews.llvm.org/D144290

7d7633bd

[OpenMP] Ignore implicit casts on assertion for `use_device_ptr` · 853d4059

Joseph Huber authored 2 years ago

There was an assertion triggering when invoking a captured member whose
initializer was in a base class. This patch fixes it by allowing the
assertion on implicit casts to the base class rather than only the base
class itself.

Fixes https://github.com/llvm/llvm-project/issues/61027

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D144873

853d4059

[Flang][OpenMP] NFC: Change a few message/comments to fit 80chars · 54acf9a3

Kiran Chandramohan authored 2 years ago

Changes are all in the OpenMP semantic checks file.

Reviewed By: SBallantyne

Differential Revision: https://reviews.llvm.org/D144874

54acf9a3

[mlir][Linalg] Reimplement hoisting on tensors as a subset-based transformation · 4521b113

Nicolas Vasilache authored 2 years ago

This revision significantly rewrites hoisting on tensors.
Previously, `vector.transfer_read/write` and `tensor.extract/insert_slice` would
be clumped together when looking for candidate pairs.
This would significantly increase the complexity of the logic and would not apply
independently to `tensor.extract/insert_slice`.

The new implementation decouples the cases and starts to cast the problem
as a generic matching subset extract/insert, which will be future proof when
other such operation pairs are introduced.

Lastly, the implementation makes the distinction clear between `vector.transfer_read/write` for
which we allow bypasses of the disjoint subsets from `tensor.extract/insert_slice` for which we
do not yet allow it.

This can be extended in the future and unified once we have subset disjunction implemented more generally.

The algorithm can be rewritten to be less of a fixed point with interspersed canonicalizations.
As a consequence, the test explicitly adds a canonicalization to clean up the IR and verify we end up in the same state.

That extra canonicalization revealed that one of the uses in one of the tests was dead, so we fix the affected test.

Differential Revision: https://reviews.llvm.org/D144656

4521b113

[mlir] Fix a -Wunused-variable warning, NFC · 779d54fd

Haojian Wu authored 2 years ago

779d54fd

[ConstExpr] Avoid creation of select constant expressions · 5d6dfba1

Nikita Popov authored 2 years ago

These expressions will now only be created if explicitly requested
in IR/bitcode (and by LowerTypeTests, which has a tricky to remove
use).

This is in preparation for removing these expressions entirely,
but also fixes #60983 in the meantime.

5d6dfba1

[MLIR] Add pass to deduplicate functions · b12bcf3f

Frederik Gossen authored 2 years ago

Deduplicate functions that are equivalent in all aspects but their symbol name.
The pass chooses one representative per equivalence class, erases the remainder, and updates function calls accordingly.

Differential Revision: https://reviews.llvm.org/D144738

b12bcf3f
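
As a rough illustration of the behaviour described in this commit message (one representative per equivalence class, the rest erased, calls updated), here is a minimal C++ sketch. It is not the MLIR pass: `Function`, `deduplicate` and `updateCalls` are hypothetical names, and comparing bodies as plain strings is an assumption standing in for a structural equivalence check.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a function: a symbol name plus a body.
// Comparing bodies as strings is an assumption made for brevity.
struct Function {
  std::string name;
  std::string body;
};

// Keeps the first function of each equivalence class (identical body) and
// returns a map from every erased symbol name to its representative's name.
std::map<std::string, std::string>
deduplicate(std::vector<Function> &functions) {
  std::map<std::string, std::string> representativeByBody;
  std::map<std::string, std::string> replacements;
  std::vector<Function> kept;
  for (const Function &fn : functions) {
    assert(!fn.name.empty() && "every function needs a symbol name");
    auto result = representativeByBody.insert(std::make_pair(fn.body, fn.name));
    if (result.second)
      kept.push_back(fn); // First function seen with this body.
    else
      replacements[fn.name] = result.first->second; // Duplicate: erase it.
  }
  functions = std::move(kept);
  return replacements;
}

// Rewrites call targets so they refer to the surviving representatives.
void updateCalls(std::vector<std::string> &callees,
                 const std::map<std::string, std::string> &replacements) {
  for (std::string &callee : callees) {
    auto it = replacements.find(callee);
    if (it != replacements.end())
      callee = it->second;
  }
}
```

Calling `deduplicate` first and then passing its result to `updateCalls` mirrors the order in the description: representatives are chosen before any call site is rewritten.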

[mlir] Port bazel for 115711c1 · 8877d8f5

Haojian Wu authored 2 years ago

8877d8f5

[MLIR] Expose region equivalence check through OperationEquivalence · 31fc47e3

Frederik Gossen authored 2 years ago

Differential Revision: https://reviews.llvm.org/D144735

31fc47e3

[InlineCost] Avoid ConstantExpr::getSelect() · 86d1ed9e

Nikita Popov authored 2 years ago

Instead use ConstantFoldSelectInstruction(), which will return
nullptr if it cannot be folded and a constant expression would
be produced instead.

In preparation for removing select constant expressions.

86d1ed9e

[mlir][sparse] Add checking parent op of SortOp · 9a29d875

Kohei Yamaguchi authored 2 years ago

Fix a segmentation fault that occurred when the parent operation of a
sparse_tensor SortOp is not a func::FuncOp.

Fixes https://github.com/llvm/llvm-project/issues/59988

Reviewed By: aartbik, wrengr

Differential Revision: https://reviews.llvm.org/D143874

9a29d875

[mlir][NFC] Cleanup Passes documentation · b46e78c7

Kohei Yamaguchi authored 2 years ago

- Fix the location of the NVGPU dialect's pass
- Move a summary of `-finalize-memref-to-llvm` into description
- Fix broken links
- Replace back-quote dialect headers with single-quote headers for
  improved readability.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D142868

b46e78c7

[mlir][LinAlg][Transform][GPU] Add GPU memory hierarchy to the transform.promote op · 115711c1

Amir Mohammad Tavakkoli authored 2 years ago

In this patch we add support for copying a `memref.subview` to shared or private memory on the GPU. The global-to-shared memory copy is adopted from code implemented in IREE (https://github.com/iree-org/iree), but the private memory copy has not been implemented in IREE. This patch enables transferring a subview from `global->shared`, `global->private`, and `shared->private`.

Our final aim is to provide a copy layout as an affine map to the `transform.promote` op to support transposed memory copies. This map is a permutation of the original affine index map. Although this has been implemented and users can copy data to an arbitrary layout, it is not included in this patch because we still have a problem changing the index maps of `linalg.generic` operations to the transformed index map. You can find more in the following links ([[ https://github.com/tavakkoliamirmohammad/iree-llvm-fork/commit/4fd5f93355951ad0fb338858393ff409bd9c62f8 | Initial attempt to support layout map in promote op in transform dialect ]]) ([[ https://github.com/tavakkoliamirmohammad/iree-llvm-fork/commit/9062b5849f91d4defb84996392b71087dadf7a8c | Fix data transpose in shared memory ]])

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D144666

115711c1

[SLP]Fix a crash when trying to find reduced ops for the reduced value. · 5f53e85f

Alexey Bataev authored 2 years ago

Need to use original reduced value, not the one the compiler gets after
reduction, it may be replaced by the extractelement instruction already.

5f53e85f

[InstCombine] Avoid ConstantExpr::getSelect() use (NFCI) · 3c2b1853

Nikita Popov authored 2 years ago

Instead let IRBuilder take care of constant folding.

In preparation for removing select constantexprs.

3c2b1853

[AArch64] Add some tests for multiple uses of extended vector extracts. NFC · 9e5bfa1a

David Green authored 2 years ago

9e5bfa1a

[mlir] Insert tensor.cast only when needed when folding tensor.cast into extract_slice. · 9fa61cbb

Alexander Belyaev authored 2 years ago

Differential Revision: https://reviews.llvm.org/D144868

9fa61cbb

[OHOS] Add support for OpenHarmony · c417b7a6

Pavel Kosov authored 2 years ago

Add support for OpenHarmony OS

General OpenHarmony OS discussion on discourse thread "[RFC] Add support for OpenHarmony OS"
https://discourse.llvm.org/t/rfc-add-support-for-openharmony-os/66656

Reviewed By: DavidSpickett

Differential Revision: https://reviews.llvm.org/D138202

c417b7a6

[SME2][AArch64] Add multi-indexed multiply-add long long intrinsics · a9df6270

Kerry McLaughlin authored 2 years ago

Adds intrinsics for the following SME2 instructions (1, 2 & 4 vector):
 - smlall
 - umlall
 - smlsll
 - umlsll
 - sumlall
 - usmlall

NOTE: These intrinsics are still in development and are subject to future changes.

Reviewed By: david-arm

Differential Revision: https://reviews.llvm.org/D143278

a9df6270

[GlobalOpt] Ignore only loaded / only stored global parts in global SRA heuristic · 49aa3777

Nikita Popov authored 2 years ago

When limiting the number of parts we split a global into, ignore
any parts that are either only loaded or only stored, because we
expect these to be optimized away after SRA.

Differential Revision: https://reviews.llvm.org/D129857

49aa3777
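
The heuristic described in this commit message can be pictured with a small C++ sketch. This is not the GlobalOpt code: `PartUsage`, `countRelevantParts`, `shouldSplitGlobal` and the `maxParts` limit are hypothetical names, and the sketch assumes every part is either loaded, stored, or both.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of how one part of a global is accessed.
struct PartUsage {
  bool isLoaded;
  bool isStored;
};

// Counts only the parts that are both loaded and stored. Parts that are only
// loaded or only stored are skipped, since they are expected to be optimized
// away after SRA.
std::size_t countRelevantParts(const std::vector<PartUsage> &parts) {
  std::size_t count = 0;
  for (const PartUsage &part : parts)
    if (part.isLoaded && part.isStored)
      ++count;
  return count;
}

// Returns true if the number of relevant parts stays within the limit.
bool shouldSplitGlobal(const std::vector<PartUsage> &parts,
                       std::size_t maxParts) {
  assert(maxParts > 0 && "the limit is assumed to be positive");
  return countRelevantParts(parts) <= maxParts;
}
```

With this counting rule, a global with many write-only or read-only parts can still be split, because those parts no longer count against the limit.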

[libc++][ranges] Implement LWG-3860 range_common_reference_t is missing · a8ead919

Igor Zhukov authored 2 years ago

a8ead919

[mlir] Use the same name as the generated parameter name (NFC). · 01b9d355

Adrian Kuegel authored 2 years ago

When commenting for which parameter a value is passed, the same name
should be used as is used for the real parameter. In this case, the
parameter name is generated from the TransformOps.td file.

01b9d355

[flang][OpenMP] Handle lastprivate on sections construct · f49b6afc

Nimish Mishra authored 2 years ago

This patch adds support for lastprivate on sections construct.
One omp.sections operation can have several omp.section operations. As such, the privatization happens in the lexically last omp.section operation.

Reviewed By: kiranchandramohan, peixin

Differential Revision: https://reviews.llvm.org/D133686

f49b6afc

[Flang] Add Minloc to simplify intrinsics pass · 614cd721

Sacha Ballantyne authored 2 years ago

This patch adds minloc to the simplify intrinsics pass, supporting calls with KIND or MASK arguments while calls which have BACK, DIM or have a CHARACTER input array are rejected. This patch is targeting exchange2, and in benchmarks provides a ~11% improvement in performance.

Also included are some minor style changes / cleanup in simplifyIntrinsics.cpp.

Reviewed By: vzakhari

Differential Revision: https://reviews.llvm.org/D144103

614cd721

[LoopPredication] Account for critical edges when inserting assumes. PR26496 · a18ce47a

Max Kazantsev authored 2 years ago

Loop predication can insert assumes to preserve knowledge about some facts that
may otherwise be lost, because loop predication is a lossy transform. When a guard
is represented as branch by widenable condition, it should insert it in the guarded
block. However, if the guarded block has other predecessors than the guard block,
then the condition might not dominate it. Currently we generate invalid code here.

One possible fix here is to split the critical edge and insert the assume there, but
that would require modifying the CFG, which Loop Predication is not currently doing, and we
want to keep it that way.

The fix is to handle this case by inserting a Phi which takes `Cond` as input from the
guard block and `true` from any other blocks. This is valid in terms of IR and does
not introduce any new knowledge if we came from another block.

Differential Revision: https://reviews.llvm.org/D144859
Reviewed By: nikic, skatkov

a18ce47a

Reapply [InstCombine] Remove early constant fold · ee2f9d6d

Nikita Popov authored 2 years ago

The reported compile-time regression has been addressed in
47f9109d.

Additionally, this contains a change to immediately fold zext
with constant operand, even if it's used in a trunc. I'm not sure
if this is relevant for anything, but I noticed it as a behavioral
discrepancy when investigating this issue.

-----

InstCombine currently performs a constant folding attempt as part
of the main InstCombine loop, before visiting the instruction.
However, each visit method will also attempt to simplify the
instruction, which will in turn constant fold it. (Additionally,
we also constant fold instructions before the main InstCombine loop
and use a constant folding IR builder, so this is doubly redundant.)

There is one place where InstCombine visit methods currently don't
call into simplification, and that's casts. To be conservative,
I've added an explicit constant folding call there (though it has
no impact on tests).

This makes for a mild compile-time improvement and in particular
mitigates the compile-time regression from enabling load
simplification in be88b581.

Differential Revision: https://reviews.llvm.org/D144369

ee2f9d6d

[SelectionDAG] Transitively copy NodeExtraInfo on RAUW · 7f635b90

Marco Elver authored 2 years ago

During legalization of the SelectionDAG, some nodes are replaced with
arch-specific nodes. These may be complex nodes, where the root node no
longer corresponds to the node that should carry the extra info.

Fix the issue by copying extra info to the new node and all its new
transitive operands during RAUW. See code comments for more details.

This fixes the remaining pcsections-atomics.ll tests on X86.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D144677

7f635b90
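
The "new node and all its new transitive operands" idea from this commit message can be sketched in C++ as a worklist walk. This is a simplified model, not the SelectionDAG implementation: `Node`, `isNew` and `copyExtraInfoTransitively` are hypothetical, and marking replacement-created nodes with an explicit flag is an assumption made to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical DAG node: operand indices, a flag marking nodes created by
// the replacement, and the extra info (an empty string means "none").
struct Node {
  std::vector<int> operands;
  bool isNew;
  std::string extraInfo;
};

// Copies `info` to `root` and to every newly created node reachable from it
// through operands. Pre-existing operands are neither changed nor traversed.
void copyExtraInfoTransitively(std::vector<Node> &dag, int root,
                               const std::string &info) {
  assert(root >= 0 && static_cast<std::size_t>(root) < dag.size());
  std::vector<bool> visited(dag.size(), false);
  std::vector<int> worklist(1, root);
  while (!worklist.empty()) {
    int index = worklist.back();
    worklist.pop_back();
    if (visited[index])
      continue;
    visited[index] = true;
    dag[index].extraInfo = info;
    for (int operand : dag[index].operands)
      if (dag[operand].isNew)
        worklist.push_back(operand);
  }
}
```

The `visited` vector keeps the walk linear even when several new nodes share an operand.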

[X86][FixupBWInsts] Fix propagation of !pcsections metadata · d73da868

Marco Elver authored 2 years ago

Use MIMetadata() to propagate both DebugLoc and !pcsections metadata.

This fixes several of the non-native sized !pcsections tests in
pcsections-atomics.ll.

Reviewed By: dvyukov

Differential Revision: https://reviews.llvm.org/D144676

d73da868

[X86] Improve atomics test for !pcsections · a5653b82

Marco Elver authored 2 years ago

Extend pcsections-atomics.ll to exhaustively test all atomic ops up to
64 bits. This currently shows that some atomic operations do not end up
in PC sections. This will be addressed in a subsequent change.

Differential Revision: https://reviews.llvm.org/D144710

a5653b82

[X86] Move atomics test for !pcsections into separate file · ba63ddd5

Marco Elver authored 2 years ago

The pcsections.ll test primarily tests that the AsmPrinter produces the
right output in sections. This output is not easily covered by
update_llc_test_checks.py, and as such is hand written. This makes
maintenance rather burdensome. Instead, let's keep pcsections.ll as
simple as possible.

Move the more complex tests that primarily test that some atomic
operations end up in the PC section to pcsections-atomics.ll.

NFC.

Reviewed By: dvyukov, vitalybuka

Differential Revision: https://reviews.llvm.org/D144675

ba63ddd5

[InstCombine] Guard against many users when swapping icmp operands · 47f9109d

Nikita Popov authored 2 years ago

This addresses the compile-time regression reported on D144369.
If we don't fold constant operands early, then we might end up
walking very large use lists of constants here. Explicitly exclude
constants, and also limit the number of inspected users to avoid
degenerate cases like this.

This entire transform shouldn't be part of InstCombine in the
first place though.

47f9109d
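
The guard described in this commit message, excluding constants and capping the number of inspected users, can be illustrated with a short C++ sketch. It is not the InstCombine code: `Value`, `hasUserWithinLimit` and the integer user IDs are hypothetical, and the cap is a caller-supplied parameter rather than any value LLVM actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical value with a use list; constants may have very large ones.
struct Value {
  bool isConstant;
  std::vector<int> userIds;
};

// Looks for `wantedUser` among the users of `value`, but gives up for
// constants and after inspecting `maxUsersToInspect` users.
bool hasUserWithinLimit(const Value &value, int wantedUser,
                        std::size_t maxUsersToInspect) {
  assert(maxUsersToInspect > 0 && "the cap is assumed to be positive");
  if (value.isConstant)
    return false; // Never walk the use list of a constant.
  std::size_t inspected = 0;
  for (int user : value.userIds) {
    if (inspected++ == maxUsersToInspect)
      return false; // Cap reached: assume there is no match.
    if (user == wantedUser)
      return true;
  }
  return false;
}
```

Giving up early only means a possible optimization is missed, which is the safe direction for a compile-time guard of this kind.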

[clang-format] Fix assertion that doesn't hold under fuzzing. · 398cddf6

Manuel Klimek authored 2 years ago

398cddf6

[SVE] Add intrinsics for uniform dsp operations that explicitly undefine the... · ec67d703

chendewen authored 2 years ago

[SVE] Add intrinsics for uniform dsp operations that explicitly undefine the result for inactive lanes.

This patch adds new intrinsics for uniform dsp operations and changes the lowering for the following builtins to emit calls to the new aarch64.sve.###.u intrinsics.
  svsqsub_x
  svsqsub_n_x
  svuqsub_x
  svuqsub_n_x
  svsqsubr_x
  svsqsubr_n_x
  svuqsubr_x
  svuqsubr_n_x

Reviewed By: Paul Walker
Differential Revision: https://reviews.llvm.org/D144704

ec67d703

[clang-format] Add macro replacement to fuzzing. · f600a5ae

Manuel Klimek authored 2 years ago

f600a5ae

[bazel] Port Bazel for e7950fce · 0264ca43

Haojian Wu authored 2 years ago

0264ca43