- 27 Feb, 2023 40 commits
-
-
Michal Paszkowski authored
This patch adds support for TargetExtType/target(...) representing SPIR-V builtin types. After D135202, target(...) is the preferred way of representing SPIR-V builtin types in LLVM IR and the only one that works in opaque pointer mode. In order to maintain compatibility with LLVM IR generated by older versions of Clang and the LLVM/SPIR-V Translator, pointers-to-opaque-structs denoting SPIR-V/OpenCL builtin types will be translated to equivalent SPIR-V target extension types. This translation is only available in typed pointer mode (-opaque-pointers=0). The relevant LIT tests with SPIR-V builtins were converted to use the new target(...) notation. Differential Revision: https://reviews.llvm.org/D144494
-
Vasileios Porpodas authored
Crash caused by: 708eb1b9 Differential Revision: https://reviews.llvm.org/D144895
-
Nilanjana Basu authored
[AArch64] Avoid using intermediate integer registers for copying between source and destination floating point registers In post-isel code, there are cases with redundant copies from a source FPR to an intermediate GPR in order to copy to a destination FPR. In this patch, we identify these patterns in post-isel peephole optimization and replace them with a direct FPR-to-FPR copy. One example is the insertion of the scalar result of the 'uaddlv' NEON intrinsic function into a destination vector. During the instruction selection phase, the 'uaddlv' result is copied to a GPR, and a vector insert instruction is matched separately to copy the previous result to a destination SIMD&FP register. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D142594
-
Daniel Thornburgh authored
For AVR, the definition of USHRT_MAX overflows. Reviewed By: aaron.ballman, #clang-language-wg Differential Revision: https://reviews.llvm.org/D144218
-
Tamir Duberstein authored
Handle the case where the diff is a pure removal of lines. Before this change start_line would end up as 0 which is rejected by clang-format. Submitting on behalf of @tamird. Differential Revision: https://reviews.llvm.org/D144291
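The pure-removal case can be sketched roughly as follows. This is a minimal illustration of parsing a unified-diff hunk header, not the code from the patch: the function name `hunk_line_range` and the choice to skip hunks whose new-file start is 0 are assumptions made for the example.

```python
import re

# Matches a unified-diff hunk header such as "@@ -5,2 +4,0 @@" and captures
# the start line and (optional) line count on the new-file side.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))?")


def hunk_line_range(header):
    """Return a 1-based (start, end) range to format, or None to skip.

    Hypothetical helper: the handling of pure removals below is one
    possible policy, not necessarily the one used by the patch.
    """
    m = HUNK_RE.match(header)
    if m is None:
        return None
    start = int(m.group(1))
    # A missing count means the hunk covers exactly one line.
    count = int(m.group(2)) if m.group(2) is not None else 1
    if count == 0:
        # Pure removal: no lines were added, so treat the line at the
        # removal point as a one-line range.
        count = 1
    if start == 0:
        # Line 0 is not a valid 1-based line number, so skip the hunk
        # instead of passing a range starting at 0 to the formatter.
        return None
    return (start, start + count - 1)
```

With this policy, a header like `@@ -5,2 +4,0 @@` yields the one-line range `(4, 4)`, while `@@ -1,2 +0,0 @@` is skipped.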
-
Rong Xu authored
This is a modified version of commit b3744233 by Arthur (https://reviews.llvm.org/D143424). Here we invoke the pass independently of PGOOPT. We now check if the profile is available through the program summary. This ensures CHR is called in distributed ThinLTO BE compilation (where PGOOPT might not be created). Differential Revision: https://reviews.llvm.org/D144769
-
Florian Hahn authored
This allows for easier updating of common code in follow-on patches. Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D144847
-
Amara Emerson authored
This change reorders the stack up-adjustment and return value copying phases of machine-ir generation on AArch64. Doing so prevents a bug observed for fastcc calls with >8 arguments, where the up-adjustment required for making that call is placed in the wrong place relative to spill and reload code. See: https://github.com/llvm/llvm-project/issues/60972 for full issue reproduction and context. Patch contributed by Bruce Collie. Differential Revision: https://reviews.llvm.org/D144791
-
David Green authored
If we have sext_inreg(vector_extract(x)) but the top bits are not used, DAG will try to remove the sext_inreg, using vector_extract(x) directly. This can lead to multiple uses of both sext_inreg(vector_extract(x)) and vector_extract(x), leading to the generation of both umov and smov extracts. This adds a target hook to prevent that under AArch64 where the sext_inreg can be considered free if there are multiple uses of the sext and no uses of the vector_extract. This helps fix a small regression from D144550. Differential Revision: https://reviews.llvm.org/D144850
-
Frederik Gossen authored
Differential Revision: https://reviews.llvm.org/D144886
-
Chia-hung Duan authored
This reduces the size of PageMap, so we are more likely to use the static local buffer. Note that this is currently supported only for the single-region case, i.e. on SizeClassAllocator64. SizeClassAllocator32 needs a different way to save the PageMap. Differential Revision: https://reviews.llvm.org/D142659
-
Nikolas Klauser authored
[libc++][NFC] Format __split_buffer and move constructors that are marked inline into the class body Reviewed By: ldionne, #libc Spies: libcxx-commits Differential Revision: https://reviews.llvm.org/D142433
-
Nikolas Klauser authored
Reviewed By: #libc, ldionne Spies: vvereschaka, libcxx-commits Differential Revision: https://reviews.llvm.org/D144825
-
Mark de Wever authored
Add a new test based .clang-format file which inherits from the generic one. This moves some test specific formatting rules to the test directory. The main benefit is that headers are sorted, which makes it more likely to catch these errors before creating a review instead of spotting the error in the CI clang-tidy step. Reviewed By: ldionne, philnik, #libc Differential Revision: https://reviews.llvm.org/D144755
-
Mark de Wever authored
This uses std::addressof everywhere in atomic. This is not strictly needed for the integral and floating point specializations. They should not be used by user defined types. But it's easier to fix everything. Note these changes are made using a WIP clang-tidy plugin. Reviewed By: #libc, ldionne Differential Revision: https://reviews.llvm.org/D144786
-
Arthur Eubanks authored
Very small compile time improvement: https://llvm-compile-time-tracker.com/compare.php?from=6a7a8907e8334eaf551742148079c628f78e6ed7&to=454d1181fbdb9121f0c7a3ecf526520db32ab420&stat=instructions:u Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D144746
-
Arthur Eubanks authored
Very small compile time improvement: https://llvm-compile-time-tracker.com/compare.php?from=a628ca4925f7249b4fbd3e932c9627b12e2770dd&to=6a7a8907e8334eaf551742148079c628f78e6ed7&stat=instructions:u Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D144745
-
Alexey Bataev authored
of scalars."' failed. Need to check for reused indices when checking whether 2 insertelement instructions are from the same buildvector. If the indices are reused, it is better not to match the buildvectors and to consider them as different; otherwise we would need to track the order of the insertelement operations.
-
zhongyunde authored
Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D144771
-
Craig Topper authored
These 2 spots are protecting calls to SVE specific functions. If RISC-V sizeless types end up in there we trigger assertions. Use the more specific isSVESizelessBuiltinType() to avoid letting RISC-V vectors through. Reviewed By: asb, c-rhodes Differential Revision: https://reviews.llvm.org/D144772
-
Kiran Chandramohan authored
Issue an error if a DO construct associated with a loop does not have loop control. Currently, it is issued only for the loop immediately following the loop construct. This patch extends it to cases like collapse where there is more than one loop associated. It also fixes a crash since the existing code always expects loop control. This is covered in the OpenMP 4.5 standard, Section 2.7.1. "The do-loop cannot be a DO WHILE or a DO loop without loop control." OpenACC 3.3 covers this indirectly in Section 2.9.1. "The trip count for all loops associated with the collapse clause must be computable and invariant in all the loops". Reviewed By: clementval Differential Revision: https://reviews.llvm.org/D144290
-
Joseph Huber authored
There was an assertion triggering when invoking a captured member whose initializer was in a base class. This patch fixes it by allowing the assertion on implicit casts to the base class rather than only the base class itself. Fixes https://github.com/llvm/llvm-project/issues/61027 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D144873
-
Kiran Chandramohan authored
Changes are all in the OpenMP semantic checks file. Reviewed By: SBallantyne Differential Revision: https://reviews.llvm.org/D144874
-
Nicolas Vasilache authored
This revision significantly rewrites hoisting on tensors. Previously, `vector.transfer_read/write` and `tensor.extract/insert_slice` would be clumped together when looking for candidate pairs. This would significantly increase the complexity of the logic and would not apply independently to `tensor.extract/insert_slice`. The new implementation decouples the cases and starts to cast the problem as a generic matching subset extract/insert, which will be future proof when other such operation pairs are introduced. Lastly, the implementation makes the distinction clear between `vector.transfer_read/write` for which we allow bypasses of the disjoint subsets from `tensor.extract/insert_slice` for which we do not yet allow it. This can be extended in the future and unified once we have subset disjunction implemented more generally. The algorithm can be rewritten to be less of a fixed point with interspersed canonicalizations. As a consequence, the test explicitly adds a canonicalization to clean up the IR and verify we end up in the same state. That extra canonicalization exhibited that one of the uses in one of the tests was dead, so we fix the appropriate test. Differential Revision: https://reviews.llvm.org/D144656
-
Haojian Wu authored
-
Nikita Popov authored
These expressions will now only be created if explicitly requested in IR/bitcode (and by LowerTypeTests, which has a tricky to remove use). This is in preparation for removing these expressions entirely, but also fixes #60983 in the meantime.
-
Frederik Gossen authored
Deduplicate functions that are equivalent in all aspects but their symbol name. The pass chooses one representative per equivalence class, erases the remainder, and updates function calls accordingly. Differential Revision: https://reviews.llvm.org/D144738
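The idea can be sketched on a toy model. This is an illustration under simplifying assumptions, not the pass itself: functions are modeled as a mapping from symbol name to an opaque body string, call sites as a flat list of callee names, and `deduplicate` is a hypothetical name.

```python
def deduplicate(functions, calls):
    """Merge functions that differ only in their symbol name.

    functions: dict mapping symbol name -> body (an opaque, hashable value)
    calls:     list of callee names, one entry per call site
    Returns (kept_functions, rewritten_calls).
    """
    representative = {}  # body -> name of the first function with that body
    rename = {}          # every function name -> its representative's name
    for name, body in functions.items():
        # The first function seen with a given body represents its class.
        rename[name] = representative.setdefault(body, name)

    # Erase every function that is not the representative of its class.
    kept = {name: body for name, body in functions.items()
            if rename[name] == name}

    # Redirect call sites to the representative; unknown callees
    # (e.g. external symbols) are left untouched.
    new_calls = [rename.get(callee, callee) for callee in calls]
    return kept, new_calls
```

In this model, picking the first function seen as the representative is an arbitrary choice; any member of the equivalence class would do as long as every call site is updated consistently.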
-
Haojian Wu authored
-
Frederik Gossen authored
Differential Revision: https://reviews.llvm.org/D144735
-
Nikita Popov authored
Instead use ConstantFoldSelectInstruction(), which will return nullptr if it cannot be folded and a constant expression would be produced instead. In preparation for removing select constant expressions.
-
Kohei Yamaguchi authored
Fix a segmentation fault caused by setting a parent operator that is not func::FuncOp with sparse_tensor SortOp. Fixes https://github.com/llvm/llvm-project/issues/59988 Reviewed By: aartbik, wrengr Differential Revision: https://reviews.llvm.org/D143874
-
Kohei Yamaguchi authored
- Fix a place of NVGPU dialect's pass - Move a summary of `-finalize-memref-to-llvm` into description - Fix broken links - Replace back-quote dialect headers with single-quote headers for improved readability. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D142868
-
Amir Mohammad Tavakkoli authored
In this patch we add support for copying a `memref.subview` to shared or private memory on the GPU. The global to shared memory copy is adopted from code implemented in IREE (https://github.com/iree-org/iree), but the private memory copy part has not been implemented in IREE. This patch enables transferring a subview from `global->shared`, `global->private`, and `shared->private`. Our final aim is to provide a copy layout as an affine map to the `transform.promote` op to support transposed memory copies. This map is a permutation of the original affine index map. Although this has been implemented and the user can copy data to an arbitrary layout, it is not included in this patch since we still have a problem with changing the index maps of `linalg.generic` operations to the transformed index map. You can find more in the following links ([[ https://github.com/tavakkoliamirmohammad/iree-llvm-fork/commit/4fd5f93355951ad0fb338858393ff409bd9c62f8 | Initial attempt to support layout map in promote op in transform dialect ]]) ([[ https://github.com/tavakkoliamirmohammad/iree-llvm-fork/commit/9062b5849f91d4defb84996392b71087dadf7a8c | Fix data transpose in shared memory ]]) Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D144666
-
Alexey Bataev authored
Need to use the original reduced value, not the one the compiler gets after reduction; it may already have been replaced by the extractelement instruction.
-
Nikita Popov authored
Instead let IRBuilder take care of constant folding. In preparation for removing select constantexprs.
-
David Green authored
-
Alexander Belyaev authored
Differential Revision: https://reviews.llvm.org/D144868
-
Pavel Kosov authored
Add support for OpenHarmony OS General OpenHarmony OS discussion on discourse thread "[RFC] Add support for OpenHarmony OS" https://discourse.llvm.org/t/rfc-add-support-for-openharmony-os/66656 Reviewed By: DavidSpickett Differential Revision: https://reviews.llvm.org/D138202
-
Kerry McLaughlin authored
Adds intrinsics for the following SME2 instructions (1, 2 & 4 vector): - smlall - umlall - smlsll - umlsll - sumlall - usmlall NOTE: These intrinsics are still in development and are subject to future changes. Reviewed By: david-arm Differential Revision: https://reviews.llvm.org/D143278
-
Nikita Popov authored
When limiting the number of parts we split a global into, ignore any parts that are either only loaded or only stored, because we expect these to be optimized away after SRA. Differential Revision: https://reviews.llvm.org/D129857
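The counting rule described above can be sketched as follows. This is a toy model and not the GlobalOpt code: the `Part` record, the function names, and the `max_parts` parameter are all hypothetical, and the real limit is not stated in the commit message.

```python
from dataclasses import dataclass


@dataclass
class Part:
    """One piece a global would be split into (hypothetical model)."""
    offset: int
    is_loaded: bool
    is_stored: bool


def count_relevant_parts(parts):
    # Parts that are only loaded or only stored are expected to be
    # optimized away after SRA, so they do not count towards the limit.
    return sum(1 for p in parts if p.is_loaded and p.is_stored)


def should_split(parts, max_parts):
    # max_parts is an assumed, caller-supplied limit for this sketch.
    return count_relevant_parts(parts) <= max_parts
```

For example, a global with four parts of which two are both loaded and stored would count as two parts against the limit in this model, rather than four.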
-