- 27 Feb, 2023 (40 commits)
-
Arthur Eubanks authored
Very small compile-time improvement: https://llvm-compile-time-tracker.com/compare.php?from=6a7a8907e8334eaf551742148079c628f78e6ed7&to=454d1181fbdb9121f0c7a3ecf526520db32ab420&stat=instructions:u
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D144746
-
Arthur Eubanks authored
Very small compile-time improvement: https://llvm-compile-time-tracker.com/compare.php?from=a628ca4925f7249b4fbd3e932c9627b12e2770dd&to=6a7a8907e8334eaf551742148079c628f78e6ed7&stat=instructions:u
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D144745
-
Alexey Bataev authored
of scalars."' failed. Check for reused indices when deciding whether two insertelement instructions belong to the same buildvector. If indices are reused, it is better not to match the buildvectors and to treat them as different; otherwise, the order of the insertelement operations would need to be tracked.
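The sketch below illustrates the reused-index check. It reduces a chain of insertelement instructions to the lane indices it writes and flags any lane written more than once. Such a chain would then be treated as a different buildvector rather than matched. The model and the name `hasReusedIndices` are hypothetical and do not come from the SLP vectorizer.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical model: each insertelement in a chain is reduced to the lane index it writes.
// Returns true if some lane is written more than once, in which case the chain should not
// be matched as a single buildvector.
bool hasReusedIndices(const std::vector<unsigned>& laneIndices) {
  std::set<unsigned> seen;
  for (unsigned lane : laneIndices) {
    // insert().second is false when the lane has already been written.
    if (!seen.insert(lane).second)
      return true;
  }
  return false;
}
```

Bailing out on reuse keeps the check simple. The alternative would be to track the order of the insertelement operations.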
-
zhongyunde authored
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D144771
-
Craig Topper authored
These two spots guard calls to SVE-specific functions. If RISC-V sizeless types reach them, assertions are triggered. Use the more specific isSVESizelessBuiltinType() to avoid letting RISC-V vectors through.
Reviewed By: asb, c-rhodes
Differential Revision: https://reviews.llvm.org/D144772
-
Kiran Chandramohan authored
Issue an error if a DO construct associated with a loop construct does not have loop control. Currently, the error is issued only for the loop immediately following the loop construct. This patch extends the check to cases such as collapse, where more than one loop is associated. It also fixes a crash, since the existing code always expected loop control. This is covered in the OpenMP 4.5 standard, Section 2.7.1: "The do-loop cannot be a DO WHILE or a DO loop without loop control." OpenACC 3.3 covers this indirectly in Section 2.9.1: "The trip count for all loops associated with the collapse clause must be computable and invariant in all the loops."
Reviewed By: clementval
Differential Revision: https://reviews.llvm.org/D144290
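Here is a simplified sketch of the extended check. It assumes a loop nest modelled as a list of DO loops, outermost first, and uses a hypothetical `loopsWithoutControl` helper. The point is that every loop associated with the construct is inspected, for example up to the COLLAPSE depth, rather than only the first one. This is not the flang semantics code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of one DO loop in a nest.
struct DoLoop {
  bool hasLoopControl; // false for DO WHILE or a DO loop without loop control
};

// Returns the 1-based nesting levels of the associated loops (the first `associated`
// loops, e.g. the COLLAPSE argument) that lack loop control and so deserve an error.
std::vector<std::size_t> loopsWithoutControl(const std::vector<DoLoop>& nest,
                                             std::size_t associated) {
  std::vector<std::size_t> bad;
  for (std::size_t i = 0; i < associated && i < nest.size(); ++i) {
    if (!nest[i].hasLoopControl)
      bad.push_back(i + 1);
  }
  return bad;
}
```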
-
Joseph Huber authored
An assertion was triggered when invoking a captured member whose initializer was in a base class. This patch fixes it by letting the assertion accept implicit casts to the base class rather than only the base class itself.
Fixes https://github.com/llvm/llvm-project/issues/61027
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D144873
-
Kiran Chandramohan authored
Changes are all in the OpenMP semantic checks file.
Reviewed By: SBallantyne
Differential Revision: https://reviews.llvm.org/D144874
-
Nicolas Vasilache authored
This revision significantly rewrites hoisting on tensors. Previously, `vector.transfer_read/write` and `tensor.extract/insert_slice` were clumped together when looking for candidate pairs. This significantly increased the complexity of the logic and prevented it from applying independently to `tensor.extract/insert_slice`.
The new implementation decouples the two cases and starts to cast the problem as generic matching of subset extract/insert pairs, which should be future-proof when other such operation pairs are introduced.
Lastly, the implementation clearly distinguishes `vector.transfer_read/write`, for which bypassing disjoint subsets is allowed, from `tensor.extract/insert_slice`, for which it is not yet allowed. This can be extended in the future and unified once subset disjunction is implemented more generally.
The algorithm can be rewritten to be less of a fixed point with interspersed canonicalizations. As a consequence, the test explicitly adds a canonicalization to clean up the IR and verify that we end up in the same state. That extra canonicalization revealed that one of the uses in one of the tests was dead, so we fix the appropriate test.
Differential Revision: https://reviews.llvm.org/D144656
-
Haojian Wu authored
-
Nikita Popov authored
These expressions will now only be created if explicitly requested in IR/bitcode (and by LowerTypeTests, which has a use that is tricky to remove). This is in preparation for removing these expressions entirely, but it also fixes #60983 in the meantime.
-
Frederik Gossen authored
Deduplicate functions that are equivalent in all aspects but their symbol name. The pass chooses one representative per equivalence class, erases the remainder, and updates function calls accordingly.
Differential Revision: https://reviews.llvm.org/D144738
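The sketch below shows the representative-per-class idea on a toy module. It assumes functions are summarised by a body fingerprint and a list of callees. The `Func` type, `EquivKey` and `deduplicate` are hypothetical and are not the MLIR API or the pass's actual implementation.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical module model: a function has a symbol name, a body (reduced here to a
// string fingerprint) and the names of the functions it calls.
struct Func {
  std::string name;
  std::string body;
  std::vector<std::string> callees;
};

// Everything except the symbol name decides equivalence.
using EquivKey = std::pair<std::string, std::vector<std::string>>;

// Keeps the first function of each equivalence class as its representative, erases the
// others, and redirects calls to erased functions to the representative.
void deduplicate(std::vector<Func>& module) {
  std::map<EquivKey, std::string> representative; // equivalence key -> kept name
  std::map<std::string, std::string> replacement;  // erased name -> kept name
  std::vector<Func> kept;
  for (const Func& f : module) {
    auto [it, inserted] = representative.try_emplace(EquivKey{f.body, f.callees}, f.name);
    if (inserted)
      kept.push_back(f);
    else
      replacement[f.name] = it->second;
  }
  // Update call sites in the surviving functions.
  for (Func& f : kept)
    for (std::string& callee : f.callees)
      if (auto r = replacement.find(callee); r != replacement.end())
        callee = r->second;
  module = std::move(kept);
}
```

This single pass does not merge functions that only become identical after their calls have been redirected. Catching those would require repeating the process until nothing changes.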
-
Haojian Wu authored
-
Frederik Gossen authored
Differential Revision: https://reviews.llvm.org/D144735
-
Nikita Popov authored
Instead use ConstantFoldSelectInstruction(), which will return nullptr if it cannot be folded and a constant expression would be produced instead. In preparation for removing select constant expressions.
-
Kohei Yamaguchi authored
Fix a segmentation fault caused by a sparse_tensor SortOp whose parent operation is not a func::FuncOp.
Fixes https://github.com/llvm/llvm-project/issues/59988
Reviewed By: aartbik, wrengr
Differential Revision: https://reviews.llvm.org/D143874
-
Kohei Yamaguchi authored
- Fix the placement of the NVGPU dialect's pass
- Move the summary of `-finalize-memref-to-llvm` into its description
- Fix broken links
- Replace back-quoted dialect headers with single-quoted headers for improved readability
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D142868
-
Amir Mohammad Tavakkoli authored
This patch adds support for copying a `memref.subview` to shared or private memory on the GPU. The global-to-shared memory copy is adapted from code implemented in IREE (https://github.com/iree-org/iree), but the private memory copy has not been implemented in IREE. This patch enables transferring a subview `global->shared`, `global->private`, and `shared->private`.
Our final aim is to provide a copy layout as an affine map to the `transform.promote` op to support transposed memory copies. This map is a permutation of the original affine index map. Although this has been implemented and users can copy data to an arbitrary layout, it is not included in this patch because we still have problems changing the index maps of `linalg.generic` operations to the transformed index map. More details can be found in the following commits: "Initial attempt to support layout map in promote op in transform dialect" (https://github.com/tavakkoliamirmohammad/iree-llvm-fork/commit/4fd5f93355951ad0fb338858393ff409bd9c62f8) and "Fix data transpose in shared memory" (https://github.com/tavakkoliamirmohammad/iree-llvm-fork/commit/9062b5849f91d4defb84996392b71087dadf7a8c).
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D144666
-
Alexey Bataev authored
Use the original reduced value, not the one the compiler gets after reduction, since the latter may already have been replaced by an extractelement instruction.
-
Nikita Popov authored
Instead let IRBuilder take care of constant folding. In preparation for removing select constantexprs.
-
David Green authored
-
Alexander Belyaev authored
Differential Revision: https://reviews.llvm.org/D144868
-
Pavel Kosov authored
Add support for OpenHarmony OS.
General OpenHarmony OS discussion is in the Discourse thread "[RFC] Add support for OpenHarmony OS": https://discourse.llvm.org/t/rfc-add-support-for-openharmony-os/66656
Reviewed By: DavidSpickett
Differential Revision: https://reviews.llvm.org/D138202
-
Kerry McLaughlin authored
Adds intrinsics for the following SME2 instructions (1, 2 & 4 vector):
- smlall
- umlall
- smlsll
- umlsll
- sumlall
- usmlall
NOTE: These intrinsics are still in development and are subject to future changes.
Reviewed By: david-arm
Differential Revision: https://reviews.llvm.org/D143278
-
Nikita Popov authored
When limiting the number of parts we split a global into, ignore any parts that are either only loaded or only stored, because we expect these to be optimized away after SRA.
Differential Revision: https://reviews.llvm.org/D129857
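Here is a minimal sketch of the counting rule. It assumes each part of the global is summarised by whether it is loaded and whether it is stored. The `PartAccess` type and both functions are hypothetical, not GlobalOpt code. Only parts that are both loaded and stored count toward the limit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of how one part (slice) of a global is accessed.
struct PartAccess {
  bool loaded;
  bool stored;
};

// Parts that are only loaded or only stored are ignored, on the expectation that they
// are optimized away after SRA; only parts that are both loaded and stored are counted.
std::size_t countedParts(const std::vector<PartAccess>& parts) {
  std::size_t n = 0;
  for (const PartAccess& p : parts)
    if (p.loaded && p.stored)
      ++n;
  return n;
}

// Hypothetical limit check applied before deciding to split the global.
bool withinSplitLimit(const std::vector<PartAccess>& parts, std::size_t maxParts) {
  return countedParts(parts) <= maxParts;
}
```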
-
Igor Zhukov authored
-
Adrian Kuegel authored
When commenting for which parameter a value is passed, the same name should be used as is used for the real parameter. In this case, the parameter name is generated from the TransformOps.td file.
-
Nimish Mishra authored
This patch adds support for lastprivate on the sections construct. One omp.sections operation can have several omp.section operations. As such, the privatization happens in the lexically last omp.section operation.
Reviewed By: kiranchandramohan, peixin
Differential Revision: https://reviews.llvm.org/D133686
-
Sacha Ballantyne authored
This patch adds minloc to the simplify intrinsics pass. It supports calls with KIND or MASK arguments, while calls that have BACK or DIM arguments, or a CHARACTER input array, are rejected. This patch targets exchange2, where it provides a ~11% performance improvement in benchmarks.
Also included are some minor style changes and cleanup in simplifyIntrinsics.cpp.
Reviewed By: vzakhari
Differential Revision: https://reviews.llvm.org/D144103
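The C++ sketch below shows the rank-1 result that a MINLOC(ARRAY, MASK=...) call is expected to compute once simplified. It assumes Fortran semantics: the result is the 1-based position of the first minimum among the selected elements, or 0 if no element is selected. The `minloc` function is hypothetical, and the pass itself emits IR rather than C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Rank-1 MINLOC with MASK; BACK and DIM are not modelled (the pass rejects them).
std::size_t minloc(const std::vector<int>& array, const std::vector<bool>& mask) {
  assert(array.size() == mask.size());
  std::size_t result = 0; // 0 means "no element selected yet"
  for (std::size_t i = 0; i < array.size(); ++i) {
    if (!mask[i])
      continue;
    // Strict '<' keeps the first occurrence of the minimum (BACK absent or .FALSE.).
    if (result == 0 || array[i] < array[result - 1])
      result = i + 1;
  }
  return result;
}
```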
-
Max Kazantsev authored
Loop predication can insert assumes to preserve knowledge about facts that might otherwise be lost, because loop predication is a lossy transform. When a guard is represented as a branch on a widenable condition, the assume should be inserted in the guarded block. However, if the guarded block has predecessors other than the guard block, the condition might not dominate it, and we currently generate invalid code in that case.
One possible fix is to split the critical edge and insert the assume there, but that would require modifying the CFG, which Loop Predication currently does not do, and we want to keep it that way.
The fix is to handle this case by inserting a Phi that takes `Cond` as input from the guard block and `true` from any other block. This is valid IR and does not introduce any new knowledge when control arrives from another block.
Differential Revision: https://reviews.llvm.org/D144859
Reviewed By: nikic, skatkov
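To make the Phi's incoming values concrete, here is a hedged textual model. Block and value names are plain strings, and `buildAssumePhiIncoming` is hypothetical. The actual change builds an LLVM IR phi, not this data structure. The guard block contributes `Cond`, and every other predecessor contributes `true`, so the assume is trivially satisfied on those paths.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Returns (predecessor, incoming value) pairs for the phi placed in the guarded block.
std::vector<std::pair<std::string, std::string>>
buildAssumePhiIncoming(const std::vector<std::string>& preds, const std::string& guardBlock,
                       const std::string& cond) {
  std::vector<std::pair<std::string, std::string>> incoming;
  for (const std::string& pred : preds)
    // Only the edge from the guard block carries the guard condition.
    incoming.emplace_back(pred, pred == guardBlock ? cond : std::string("true"));
  return incoming;
}
```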
-
Nikita Popov authored
The reported compile-time regression has been addressed in 47f9109d. Additionally, this contains a change to immediately fold zext with a constant operand, even if it's used in a trunc. I'm not sure whether this is relevant for anything, but I noticed it as a behavioral discrepancy when investigating this issue.
-----
InstCombine currently performs a constant folding attempt as part of the main InstCombine loop, before visiting the instruction. However, each visit method will also attempt to simplify the instruction, which will in turn constant fold it. (Additionally, we also constant fold instructions before the main InstCombine loop and use a constant folding IR builder, so this is doubly redundant.)
There is one place where InstCombine visit methods currently don't call into simplification, and that's casts. To be conservative, I've added an explicit constant folding call there (though it has no impact on tests).
This makes for a mild compile-time improvement and, in particular, mitigates the compile-time regression from enabling load simplification in be88b581.
Differential Revision: https://reviews.llvm.org/D144369
-
Marco Elver authored
During legalization of the SelectionDAG, some nodes are replaced with arch-specific nodes. These may be complex nodes, where the root node no longer corresponds to the node that should carry the extra info. Fix the issue by copying extra info to the new node and all its new transitive operands during RAUW. See code comments for more details.
This fixes the remaining pcsections-atomics.ll tests on X86.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D144677
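The sketch below shows a traversal that copies the info to the new node and its new transitive operands. It assumes a toy DAG and a caller-supplied set of nodes that existed before the replacement. `Node` and `copyExtraInfo` are hypothetical and are not the SelectionDAG API. The walk stops at pre-existing nodes, so unrelated parts of the DAG keep their own info.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical DAG node: operands plus an optional piece of extra info (e.g. !pcsections).
struct Node {
  std::vector<Node*> operands;
  std::string extraInfo;
};

// Copies from's extra info onto `to` and every node reachable from `to` without passing
// through a node in `existing` (i.e. the nodes created by the replacement).
void copyExtraInfo(const Node& from, Node& to, const std::unordered_set<const Node*>& existing) {
  std::vector<Node*> worklist{&to};
  std::unordered_set<Node*> visited;
  while (!worklist.empty()) {
    Node* n = worklist.back();
    worklist.pop_back();
    // Skip pre-existing nodes and nodes already handled (DAGs can share operands).
    if (existing.count(n) || !visited.insert(n).second)
      continue;
    n->extraInfo = from.extraInfo;
    for (Node* op : n->operands)
      worklist.push_back(op);
  }
}
```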
-
Marco Elver authored
Use MIMetadata() to propagate both DebugLoc and !pcsections metadata. This fixes several of the non-native-sized !pcsections tests in pcsections-atomics.ll.
Reviewed By: dvyukov
Differential Revision: https://reviews.llvm.org/D144676
-
Marco Elver authored
Extend pcsections-atomics.ll to exhaustively test all atomic ops up to 64 bits. This currently shows that some atomic operations do not end up in PC sections, which will be addressed in a subsequent change.
Differential Revision: https://reviews.llvm.org/D144710
-
Marco Elver authored
The pcsections.ll test primarily tests that the AsmPrinter produces the right output in sections. This output is not easily covered by update_llc_test_checks.py and is therefore hand-written, which makes maintenance rather burdensome.
Instead, let's keep pcsections.ll as simple as possible and move the more complex tests, which primarily check that some atomic operations end up in the PC section, to pcsections-atomics.ll.
NFC.
Reviewed By: dvyukov, vitalybuka
Differential Revision: https://reviews.llvm.org/D144675
-
Nikita Popov authored
This addresses the compile-time regression reported on D144369. If we don't fold constant operands early, then we might end up walking very large use lists of constants here. Explicitly exclude constants, and also limit the number of inspected users to avoid degenerate cases like this. This entire transform shouldn't be part of InstCombine in the first place though.
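Here is a minimal sketch of the two guards described above: constants are skipped outright, and the number of inspected users is capped. The `Value` model, `anyUserMatches` and the choice to answer "no match" when bailing out are hypothetical and do not come from InstCombine.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical value model: whether the value is a constant, and its users.
struct Value {
  bool isConstant = false;
  std::vector<const Value*> users;
};

// Returns true if some user satisfies `pred`. Constants (whose use lists can be huge) are
// never walked, and after `maxUsers` users the search gives up and reports no match.
template <typename Pred>
bool anyUserMatches(const Value& v, Pred pred, std::size_t maxUsers) {
  if (v.isConstant)
    return false;
  std::size_t inspected = 0;
  for (const Value* u : v.users) {
    if (inspected++ == maxUsers)
      return false; // limit reached: bail out conservatively
    if (pred(*u))
      return true;
  }
  return false;
}
```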
-
Manuel Klimek authored
-
chendewen authored
[SVE] Add intrinsics for uniform DSP operations that explicitly undefine the result for inactive lanes.
This patch adds new intrinsics for uniform DSP operations and changes the lowering for the following builtins to emit calls to the new aarch64.sve.###.u intrinsics:
- svsqsub_x
- svsqsub_n_x
- svuqsub_x
- svuqsub_n_x
- svsqsubr_x
- svsqsubr_n_x
- svuqsubr_x
- svuqsubr_n_x
Reviewed By: Paul Walker
Differential Revision: https://reviews.llvm.org/D144704
-
Manuel Klimek authored
-
Haojian Wu authored
-