[PATCH] D92086: Generalized PatternMatch & InstSimplify

Nikita Popov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 10 03:29:02 PDT 2022


nikic added a comment.

This fails to build with:

> lib/libLLVMInstCombine.a(InstructionCombining.cpp.o):InstructionCombining.cpp:function llvm::InstCombinerImpl::visitGEPOfGEP(llvm::GetElementPtrInst&, llvm::GEPOperator*): error: undefined reference to 'llvm::Value* llvm::simplifyAddInst(llvm::Value*, llvm::Value*, bool, bool, llvm::SimplifyQuery const&, llvm::MatcherContext&)'

Probably due to an invalid use of `inline`: an `inline` function must be defined in every translation unit that uses it, so defining it in only one .cpp file leaves other users with an undefined reference. You need to use `static inline` if you don't add `extern inline` definitions. (Or possibly a missing explicit template instantiation, but I think that wouldn't be used by InstCombine with this patch.)

I'm generally very skeptical about this proposal. This adds a lot of additional complexity to the area of LLVM that likely already has the highest ratio of review time and test volume to lines of code changed. After adding tests for baseline behavior, 16 commuted variants, nowrap/FMF variations, vector splat, undef-splat and non-splat variations, and negative tests for all used conditions (including all pattern-match conditions), I would definitely rather not add variation tests for VP and constrained-FP intrinsics (with which I am not familiar) as well.

I guess one could make an argument that if one uses the generalized matchers, there is nothing that could possibly go wrong, and there is no need to actually test these patterns or understand their correctness. I'm doubtful that this is actually true, though. It's almost certainly not true for InstCombine, but even in InstSimplify I expect that there would be some dangerous interaction with predicated div/rem opcodes: presumably, a zero divisor in a masked-out lane does not render the program undefined, and as such an InstSimplify fold to a poison value would be incorrect.
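The hazard can be illustrated with a small executable model of masked division (an illustrative sketch only, not LLVM's actual VP semantics; the name `vp_udiv` and the use of `None` for disabled lanes are made up for this example):

```python
def vp_udiv(x, y, mask):
    """Toy model of a predicated unsigned division: a lane is only
    executed when its mask bit is set, so a zero divisor in a
    masked-out lane is harmless."""
    out = []
    for xi, yi, mi in zip(x, y, mask):
        if mi:
            assert yi != 0, "enabled lane with zero divisor: undefined"
            out.append(xi // yi)
        else:
            # Disabled lane: its value is a don't-care, modeled as None.
            out.append(None)
    return out

# The divisor vector contains a zero, but only in a masked-out lane,
# so the operation is well-defined and must not be folded to poison:
print(vp_udiv([8, 9], [2, 0], [True, False]))  # [4, None]
```

Under this model, a fold that treats any zero divisor as grounds for returning poison would miscompile the masked-out case.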

Another concern here is compile-time, but I won't be able to test that until there is an actually buildable patch.

A possibly stupid question from someone who has not followed the VP design: Why does the predication need to happen on every single instruction? Naively, I'd assume that predication is only important for instructions that have trapping behavior, which is basically just load/store and div/rem class intrinsics. For everything else, it should be fine to represent this as a normal vector add, and then a single masking operation at the end. One could then propagate that masking backwards in the backend. That should ensure that normal instructions optimize fine, while instructions where the masking is semantically relevant (like div/rem) do not blindly participate in optimizations.

So basically, `vp.add(X, Y, M)` can be decomposed into `vp.mask(add(X, Y), M)`, where the `vp.mask` intrinsic can be pushed down and combined with other mask intrinsics. So `vp.add(vp.add(X, Y, M), Z, M)` becomes `vp.mask(add(vp.mask(add(X, Y), M), Z), M)` becomes `vp.mask(vp.mask(add(add(X, Y), Z), M), M)` becomes `vp.mask(add(add(X, Y), Z), M)`, at which point `add(add(X, Y), Z)` optimizes as usual.
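The rewrite chain above can be sketched as a tiny term-rewriting exercise (a toy model over nested tuples; the representation and rule set are illustrative, not LLVM's actual IR or combiner):

```python
def decompose(e):
    """vp.add(X, Y, M) -> vp.mask(add(X, Y), M), applied recursively."""
    if isinstance(e, tuple) and e[0] == "vp.add":
        _, x, y, m = e
        return ("vp.mask", ("add", decompose(x), decompose(y)), m)
    return e

def simplify(e):
    """Push vp.mask outward through add and merge duplicate masks."""
    if not isinstance(e, tuple):
        return e
    if e[0] == "vp.mask":
        inner, m = simplify(e[1]), e[2]
        # vp.mask(vp.mask(E, M), M) -> vp.mask(E, M)
        if isinstance(inner, tuple) and inner[0] == "vp.mask" and inner[2] == m:
            inner = inner[1]
        return ("vp.mask", inner, m)
    if e[0] == "add":
        x, y = simplify(e[1]), simplify(e[2])
        # add(vp.mask(E, M), Z) -> vp.mask(add(E, Z), M): legal in this
        # model because masked-out lanes of the result are don't-care.
        if isinstance(x, tuple) and x[0] == "vp.mask":
            return simplify(("vp.mask", ("add", x[1], y), x[2]))
        if isinstance(y, tuple) and y[0] == "vp.mask":
            return simplify(("vp.mask", ("add", x, y[1]), y[2]))
        return ("add", x, y)
    return e

expr = ("vp.add", ("vp.add", "X", "Y", "M"), "Z", "M")
print(simplify(decompose(expr)))
# ('vp.mask', ('add', ('add', 'X', 'Y'), 'Z'), 'M')
```

The result is the single trailing mask from the prose above, with the inner `add(add(X, Y), Z)` left free to optimize as an ordinary expression.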


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92086/new/

https://reviews.llvm.org/D92086
