- 27 Feb, 2023 40 commits
-
David Green authored
If we have sext_inreg(vector_extract(x)) but the top bits are not used, DAG will try to remove the sext_inreg, using vector_extract(x) directly. This can lead to multiple uses of both sext_inreg(vector_extract(x)) and vector_extract(x), leading to the generation of both umov and smov extracts. This adds a target hook to prevent that under AArch64 where the sext_inreg can be considered free if there are multiple uses of the sext and no uses of the vector_extract. This helps fix a small regression from D144550. Differential Revision: https://reviews.llvm.org/D144850
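As a rough illustration of why the DAG is allowed to drop the sext_inreg when the top bits are unused, the sketch below models a sign-extending (smov-style) and a zero-extending (umov-style) lane extract in plain C++. The helper names are hypothetical and not part of LLVM; the point is only that both extracts agree on the low bits, so a user that reads only those bits can take either.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an smov-style extract: sign-extend an 8-bit lane to 32 bits.
inline std::uint32_t extractSigned(const std::int8_t *lanes, int i) {
  return static_cast<std::uint32_t>(static_cast<std::int32_t>(lanes[i]));
}

// Hypothetical model of a umov-style extract: zero-extend an 8-bit lane to 32 bits.
inline std::uint32_t extractUnsigned(const std::int8_t *lanes, int i) {
  return static_cast<std::uint32_t>(static_cast<std::uint8_t>(lanes[i]));
}

// A user that only looks at the low 8 bits of the extracted value.
inline std::uint8_t lowByte(std::uint32_t v) {
  return static_cast<std::uint8_t>(v);
}
```

For a negative lane the two extracts differ in the upper 24 bits but `lowByte` sees the same value, which is why mixing both forms only costs an extra instruction rather than changing results.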
-
Frederik Gossen authored
Differential Revision: https://reviews.llvm.org/D144886
-
Chia-hung Duan authored
This reduces the size of PageMap and we are more likely to use the static local buffer. Note that now this is only supported for single region case, i.e. on SizeClassAllocator64. For SizeClassAllocator32, it needs a different way to save the PageMap. Differential Revision: https://reviews.llvm.org/D142659
-
Nikolas Klauser authored
[libc++][NFC] Format __split_buffer and move constructors that are marked inline into the class body Reviewed By: ldionne, #libc Spies: libcxx-commits Differential Revision: https://reviews.llvm.org/D142433
-
Nikolas Klauser authored
Reviewed By: #libc, ldionne Spies: vvereschaka, libcxx-commits Differential Revision: https://reviews.llvm.org/D144825
-
Mark de Wever authored
Add a new test based .clang-format file which inherits from the generic one. This moves some test specific formatting rules to the test directory. The main benefit is that headers are sorted, which makes it more likely to catch these errors before creating a review instead of spotting the error in the CI clang-tidy step. Reviewed By: ldionne, philnik, #libc Differential Revision: https://reviews.llvm.org/D144755
-
Mark de Wever authored
This uses std::addressof everywhere in atomic. This is not strictly needed for the integral and floating point specializations. They should not be used by user defined types. But it's easier to fix everything. Note these changes are made using a WIP clang-tidy plugin. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D144786
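A minimal sketch of the problem std::addressof guards against: a user-defined type may overload unary `operator&`, in which case `&obj` no longer yields the object's address. The `Tricky` type and both helper templates are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <memory>

// Hypothetical user type whose overloaded operator& hides the real address.
struct Tricky {
  int value = 0;
  Tricky *operator&() { return nullptr; }
};

// Generic code using the built-in syntax picks up the overload.
template <class T>
auto naiveAddress(T &obj) -> decltype(&obj) {
  return &obj;
}

// std::addressof bypasses any overloaded operator& and returns the true address.
template <class T>
T *realAddress(T &obj) {
  return std::addressof(obj);
}
```

With a `Tricky t;`, `naiveAddress(t)` returns a null pointer while `realAddress(t)` points at `t`.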
-
Arthur Eubanks authored
Very small compile time improvement: https://llvm-compile-time-tracker.com/compare.php?from=6a7a8907e8334eaf551742148079c628f78e6ed7&to=454d1181fbdb9121f0c7a3ecf526520db32ab420&stat=instructions:u Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D144746
-
Arthur Eubanks authored
Very small compile time improvement: https://llvm-compile-time-tracker.com/compare.php?from=a628ca4925f7249b4fbd3e932c9627b12e2770dd&to=6a7a8907e8334eaf551742148079c628f78e6ed7&stat=instructions:u Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D144745
-
Alexey Bataev authored
of scalars."' failed. Need to check for reused indices when checking whether 2 insertelement instructions are from the same buildvector. If the indices are reused, it is better not to match the buildvectors and to consider them different; otherwise the order of the insertelement operations would need to be tracked.
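The reused-index check can be pictured with a simplified model in which a chain of insertelement operations is reduced to the list of lane indices it writes. This is only a sketch under that assumption; `hasReusedIndices` is a hypothetical helper, not the function used in the vectorizer.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Returns true if any lane index is written more than once by the chain.
// In that case the order of the inserts matters, so the chain should not be
// treated as a simple buildvector.
inline bool hasReusedIndices(const std::vector<unsigned> &insertIndices) {
  std::unordered_set<unsigned> seen;
  for (unsigned idx : insertIndices) {
    if (!seen.insert(idx).second)
      return true; // idx was already written by an earlier insert
  }
  return false;
}
```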
-
zhongyunde authored
Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D144771
-
Craig Topper authored
These 2 spots are protecting calls to SVE specific functions. If RISC-V sizeless types end up in there we trigger assertions. Use the more specific isSVESizelessBuiltinType() to avoid letting RISC-V vectors through. Reviewed By: asb, c-rhodes Differential Revision: https://reviews.llvm.org/D144772
-
Kiran Chandramohan authored
Issue an error if a DO construct associated with a loop does not have loop control. Currently, the error is issued only for the loop immediately following the loop construct. This patch extends it to cases like collapse, where more than one loop is associated. It also fixes a crash, since the existing code always expects loop control. This is covered in the OpenMP 4.5 standard, Section 2.7.1: "The do-loop cannot be a DO WHILE or a DO loop without loop control." OpenACC 3.3 covers this indirectly in Section 2.9.1: "The trip count for all loops associated with the collapse clause must be computable and invariant in all the loops". Reviewed By: clementval Differential Revision: https://reviews.llvm.org/D144290
-
Joseph Huber authored
There was an assertion triggering when invoking a captured member whose initializer was in a base class. This patch fixes it by allowing the assertion on implicit casts to the base class rather than only the base class itself. Fixes https://github.com/llvm/llvm-project/issues/61027 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D144873
-
Kiran Chandramohan authored
Changes are all in the OpenMP semantic checks file. Reviewed By: SBallantyne Differential Revision: https://reviews.llvm.org/D144874
-
Nicolas Vasilache authored
This revision significantly rewrites hoisting on tensors. Previously, `vector.transfer_read/write` and `tensor.extract/insert_slice` would be clumped together when looking for candidate pairs. This would significantly increase the complexity of the logic and would not apply independently to `tensor.extract/insert_slice`. The new implementation decouples the cases and starts to cast the problem as a generic matching subset extract/insert, which will be future proof when other such operation pairs are introduced. Lastly, the implementation makes the distinction clear between `vector.transfer_read/write` for which we allow bypasses of the disjoint subsets from `tensor.extract/insert_slice` for which we do not yet allow it. This can be extended in the future and unified once we have subset disjunction implemented more generally. The algorithm can be rewritten to be less of a fixed point with interspersed canonicalizations. As a consequence, the test explicitly adds a canonicalization to clean up the IR and verify we end up in the same state. That extra canonicalization exhibited that one of the uses in one of the tests was dead, so we fix the appropriate test. Differential Revision: https://reviews.llvm.org/D144656
-
Haojian Wu authored
-
Nikita Popov authored
These expressions will now only be created if explicitly requested in IR/bitcode (and by LowerTypeTests, which has a tricky to remove use). This is in preparation for removing these expressions entirely, but also fixes #60983 in the meantime.
-
Frederik Gossen authored
Deduplicate functions that are equivalent in all aspects but their symbol name. The pass chooses one representative per equivalence class, erases the remainder, and updates function calls accordingly. Differential Revision: https://reviews.llvm.org/D144738
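A minimal sketch of the idea, assuming functions can be compared by a canonical textual body: pick the first function seen in each equivalence class as the representative and map every other name to it. The `Func` struct and `pickRepresentatives` are hypothetical and much simpler than a real pass, which would compare IR structurally and then rewrite call sites.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a function: a symbol name plus a canonical body.
struct Func {
  std::string name;
  std::string body;
};

// Maps every function name to the representative of its equivalence class.
// Functions whose name maps to a different name can be erased, and calls to
// them redirected to the representative.
inline std::map<std::string, std::string>
pickRepresentatives(const std::vector<Func> &funcs) {
  std::map<std::string, std::string> bodyToRep;
  std::map<std::string, std::string> nameToRep;
  for (const Func &f : funcs) {
    // emplace keeps the first function seen for a given body.
    auto it = bodyToRep.emplace(f.body, f.name).first;
    nameToRep[f.name] = it->second;
  }
  return nameToRep;
}
```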
-
Haojian Wu authored
-
Frederik Gossen authored
Differential Revision: https://reviews.llvm.org/D144735
-
Nikita Popov authored
Instead use ConstantFoldSelectInstruction(), which will return nullptr if it cannot be folded and a constant expression would be produced instead. In preparation for removing select constant expressions.
-
Kohei Yamaguchi authored
Fix crash with segmentation fault caused by setting a parent operator that is not func::FuncOp with sparse_tensor SortOp. fixes https://github.com/llvm/llvm-project/issues/59988 Reviewed By: aartbik, wrengr Differential Revision: https://reviews.llvm.org/D143874
-
Kohei Yamaguchi authored
- Fix a place of NVGPU dialect's pass - Move a summary of `-finalize-memref-to-llvm` into description - Fix broken links - Replace back-quote dialect headers with single-quote headers for improved readability. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D142868
-
Amir Mohammad Tavakkoli authored
In this patch we are adding support for copying a `memref.subview` to the shared or private memory in GPU. The global to shared memory copy is adopted from code implemented in IREE (https://github.com/iree-org/iree), but the private memory copy part has not been implemented in IREE. This patch enables transferring a subview from `global->shared`, `global->private`, and `shared->private`. Our final aim is to provide a copy layout as an affine map to the `transform.promote` op to support transpose memory copy. This map is a permutation of the original affine index map. Although this has been implemented and the user can copy data to an arbitrary layout, this attempt is not included in this patch since we still have a problem with changing the index map of `linalg.generic` operations to the transformed index map. You can find more in the following links ([[ https://github.com/tavakkoliamirmohammad/iree-llvm-fork/commit/4fd5f93355951ad0fb338858393ff409bd9c62f8 | Initial attempt to support layout map in promote op in transform dialect ]]) ([[ https://github.com/tavakkoliamirmohammad/iree-llvm-fork/commit/9062b5849f91d4defb84996392b71087dadf7a8c | Fix data transpose in shared memory ]]) Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D144666
-
Alexey Bataev authored
Need to use the original reduced value, not the one the compiler gets after reduction, since it may already have been replaced by the extractelement instruction.
-
Nikita Popov authored
Instead let IRBuilder take care of constant folding. In preparation for removing select constantexprs.
-
David Green authored
-
Alexander Belyaev authored
Differential Revision: https://reviews.llvm.org/D144868
-
Pavel Kosov authored
Add support for OpenHarmony OS. General OpenHarmony OS discussion is on the Discourse thread "[RFC] Add support for OpenHarmony OS": https://discourse.llvm.org/t/rfc-add-support-for-openharmony-os/66656 Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D138202
-
Kerry McLaughlin authored
Adds intrinsics for the following SME2 instructions (1, 2 & 4 vector): smlall, umlall, smlsll, umlsll, sumlall, usmlall. NOTE: These intrinsics are still in development and are subject to future changes. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D143278
-
Nikita Popov authored
When limiting the number of parts we split a global into, ignore any parts that are either only loaded or only stored, because we expect these to be optimized away after SRA. Differential Revision: https://reviews.llvm.org/D129857
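The counting rule can be sketched as follows, assuming each candidate part is summarized by whether it is ever loaded and ever stored. The `PartUses` struct, both helpers, and the `maxParts` parameter are hypothetical and only illustrate the rule described above, not the actual GlobalOpt code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of how one part of a split global is accessed.
struct PartUses {
  bool loaded;
  bool stored;
};

// Only parts that are both loaded and stored count toward the limit; parts
// that are only loaded or only stored are expected to be optimized away.
inline std::size_t partsCountedTowardLimit(const std::vector<PartUses> &parts) {
  std::size_t counted = 0;
  for (const PartUses &p : parts) {
    if (p.loaded && p.stored)
      ++counted;
  }
  return counted;
}

// maxParts is an assumed limit supplied by the caller.
inline bool withinSplitLimit(const std::vector<PartUses> &parts,
                             std::size_t maxParts) {
  return partsCountedTowardLimit(parts) <= maxParts;
}
```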
-
Igor Zhukov authored
-
Adrian Kuegel authored
When commenting for which parameter a value is passed, the same name should be used as is used for the real parameter. In this case, the parameter name is generated from the TransformOps.td file.
-
Nimish Mishra authored
This patch adds support for lastprivate on the sections construct. One omp.sections operation can have several omp.section operations. As such, the privatization happens in the lexically last omp.section operation. Reviewed By: kiranchandramohan, peixin Differential Revision: https://reviews.llvm.org/D133686
-
Sacha Ballantyne authored
This patch adds minloc to the simplify intrinsics pass, supporting calls with KIND or MASK arguments while calls which have BACK, DIM or have a CHARACTER input array are rejected. This patch is targeting exchange2, and in benchmarks provides a ~11% improvement in performance. Also included are some minor style changes / cleanup in simplifyIntrinsics.cpp. Reviewed By: vzakhari Differential Revision: https://reviews.llvm.org/D144103
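For readers unfamiliar with the intrinsic, the semantics of a rank-1 MINLOC with a MASK argument can be sketched in C++ as below: the result is the 1-based position of the first minimum among the elements whose mask is true, or 0 if no element is selected. `minlocMasked` is a hypothetical helper written for illustration; it is not the code the pass generates.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 1-based index of the first minimum among masked-in elements; 0 if none.
inline std::size_t minlocMasked(const std::vector<int> &a,
                                const std::vector<bool> &mask) {
  assert(a.size() == mask.size());
  std::size_t best = 0; // 0 means "no element selected yet"
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (!mask[i])
      continue;
    // Strict comparison keeps the first occurrence of the minimum.
    if (best == 0 || a[i] < a[best - 1])
      best = i + 1;
  }
  return best;
}
```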
-
Max Kazantsev authored
Loop predication can insert assumes to preserve knowledge about some facts that may otherwise be lost, because loop predication is a lossy transform. When a guard is represented as branch by widenable condition, it should insert it in the guarded block. However, if the guarded block has other predecessors than the guard block, then the condition might not dominate it. Currently we generate invalid code here. One possible fix here is to split critical edge and insert the assume there, but in this case we should modify CFG, which Loop Predication is not currently doing, and we want to keep it that way. The fix is to handle this case by inserting a Phi which takes `Cond` as input from the guard block and `true` from any other blocks. This is valid in terms of IR and does not introduce any new knowledge if we came from another block. Differential Revision: https://reviews.llvm.org/D144859 Reviewed By: nikic, skatkov
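The Phi described above can be modelled as a function of the incoming edge: it yields `Cond` when control arrives from the guard block and `true` from any other predecessor. The enum and helper below are a hypothetical C++ sketch of that selection, not LLVM IR.

```cpp
#include <cassert>

// Hypothetical model of which predecessor control came from.
enum class Pred { GuardBlock, OtherBlock };

// Models phi [Cond, guard block], [true, any other block]: the value that
// would be passed to the assume in the guarded block.
inline bool assumeOperand(Pred from, bool cond) {
  return from == Pred::GuardBlock ? cond : true;
}
```

On the edge from the guard block the assume carries the guarded condition, while on other edges it is `assume(true)`, which adds no new knowledge.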
-
Nikita Popov authored
The reported compile-time regression has been addressed in 47f9109d. Additionally, this contains a change to immediately fold zext with constant operand, even if it's used in a trunc. I'm not sure if this is relevant for anything, but I noticed it as a behavioral discrepancy when investigating this issue. ----- InstCombine currently performs a constant folding attempt as part of the main InstCombine loop, before visiting the instruction. However, each visit method will also attempt to simplify the instruction, which will in turn constant fold it. (Additionally, we also constant fold instructions before the main InstCombine loop and use a constant folding IR builder, so this is doubly redundant.) There is one place where InstCombine visit methods currently don't call into simplification, and that's casts. To be conservative, I've added an explicit constant folding call there (though it has no impact on tests). This makes for a mild compile-time improvement and in particular mitigates the compile-time regression from enabling load simplification in be88b581. Differential Revision: https://reviews.llvm.org/D144369
-
Marco Elver authored
During legalization of the SelectionDAG, some nodes are replaced with arch-specific nodes. These may be complex nodes, where the root node no longer corresponds to the node that should carry the extra info. Fix the issue by copying extra info to the new node and all its new transitive operands during RAUW. See code comments for more details. This fixes the remaining pcsections-atomics.ll tests on X86. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D144677
-
Marco Elver authored
Use MIMetadata() to propagate both DebugLoc and !pcsections metadata. This fixes several of the non-native sized !pcsections tests in pcsections-atomics.ll. Reviewed By: dvyukov Differential Revision: https://reviews.llvm.org/D144676
-