[PATCH] D43306: [X86] Add pass to infer required-vector-width attribute based on size of function arguments and use of intrinsics

Tue Mar 27 15:01:43 PDT 2018

echristo added a comment.

> So for both modes, if there are any larger scalar types in the loop, those types will be multiplied by the VF factor and cause the vectorizer to create types that are larger than the maximum hardware register size. These types will exist all the way to codegen, and it is up to the SelectionDAG type legalization process or target-specific DAG combines to split these larger operations. In some cases the larger types are used to represent idioms that contain extends and truncates that we were able to combine into more complex instructions without doing a split. X86 probably has more pre-type-legalization, target-specific DAG combines here than most other targets, and in some of these combines we do our own splitting for type legalization. This prevents the type legalizer from needing to deal with target-specific nodes, though the type legalizer is capable of calling ReplaceNodeResults and LowerOperation for target-specific nodes.

This is interesting, and perhaps not what we should do.

> In addition to the vectorizer, I believe I've seen evidence of InstCombine creating larger types when it canonicalizes getelementptrs to have pointer-width index types by inserting sign extends. This shows up in vector GEPs used by gather and scatter intrinsics: if the GEP used a v16i32 index, InstCombine will turn it into v16i64. X86 tries to remove some of these extends with a DAG combine, but we're not great at finding a uniform base address for gathers, which causes the extend to be hidden. So we end up splitting gathers when their result type is the maximum hardware register width but their index is twice as large.

This is perhaps also not what we should do with vector accessed operations.

> Given this current behavior of IR passes with respect to hardware vector width, and the fact that codegen already has to handle illegal types, the first part of my changes (prefer-vector-width) makes the vectorizer behave similarly for prefer-vector-width=256 on an avx512f target and for an avx2 target, by making getRegisterBitWidth return 256 in both cases. So under prefer-vector-width=256, the vectorizer will calculate the same VF factor it would for an AVX2 target.

> The patch presented here overcomes this limitation by providing a required-vector-width attribute that can be used to tell codegen that nothing in the IR strictly requires wider vectors to avoid compilation failures or ABI mismatches. With it, we can tell the type legalization process and the X86 DAG combines that only 256-bit vectors are legal. Type legalization should then be carried out in a very similar way to AVX2, splitting the wider types from the vectorizer in the same way. But we also gain the AVX512VL-specific instructions, like scatter and masked operations, that aren't present in AVX2.

I don't think this part is what we should do. We shouldn't change the size of a register, but rather change the optimizers to prefer smaller vectors. Legal versus preferred is a big difference.

> What I'm looking for is a way to guarantee that, with prefer-vector-width=256, starting from scalar code and vectorizing it, we won't end up with 512-bit registers in the output. Even a few of them are enough to trigger the frequency reduction penalty I am trying to avoid. Given the current behavior of IR passes and their expectations of what codegen can handle, I don't see a way to make this guarantee from IR, and it would be easy to break in the future as new passes are added or existing passes are modified. The codegen type legalization process seems the best place to handle this, but we need to be able to communicate when it is safe to do so.

I think it might be a lot of work, but it's possible. I also think we might be abusing the legalizer too much here. For example, let's say I want to limit the vector size to 256 bits, but in some specific code I use intrinsics or other vector code to explicitly ask for 512-bit vectors. I think I should still be able to get that? Right now you'll say that the code isn't legal and try to lower it.

-eric

https://reviews.llvm.org/D43306