[PATCH] D70107: [VFABI] TargetLibraryInfo mappings in IR.

Francesco Petrogalli via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue May 5 08:36:06 PDT 2020


fpetrogalli added a comment.

In D70107#2020413 <https://reviews.llvm.org/D70107#2020413>, @anna wrote:

>


[...]

>> I think your starting point should be to make sure that the front end generates the exact list of functions you want to provide in vector form, using the attribute and the corresponding declarations. Once you have verified the declarations are there, check that the vectorizer vectorizes as expected. If not, improve whatever part of the middle-end opt pipeline is needed to make your input IR work.
> 
> I agree with all of the points. To restate: for a simple scalar function, we will have 5 vector forms generated (VF = 2, 4, 8, 16 and 32), and we'll have to record each of those declarations in the module. Is that right? This will functionally work for us (since we've tried a similar idea in our pipeline).

"is that right?" -> yes :)
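To make the count concrete, here is a minimal sketch. It assumes the `_ZGV_LLVM_<mask><VF><params>_<scalar>(<vector name>)` entry format used in the `vector-function-abi-variant` attribute. The helper `vfabi_variants` and the `vec_<name>_<VF>` vector names are hypothetical, chosen only for illustration.

```python
# Hypothetical sketch: the "vector-function-abi-variant" entries a front end
# might attach to one scalar function, one per fixed VF. Assumes the
# LLVM-internal ISA token "_LLVM_", unmasked variants ("N") and one vector
# parameter ("v") per argument; the vector function names are made up.

FIXED_VFS = [2, 4, 8, 16, 32]  # the five widths discussed in this thread


def vfabi_variants(scalar_name, num_args):
    """Return one 'mangled(vector_name)' entry per VF in FIXED_VFS."""
    params = "v" * num_args
    entries = []
    for vf in FIXED_VFS:
        mangled = f"_ZGV_LLVM_N{vf}{params}_{scalar_name}"
        vector_name = f"vec_{scalar_name}_{vf}"  # hypothetical naming scheme
        entries.append(f"{mangled}({vector_name})")
    return entries


variants = vfabi_variants("foo", 1)
print(",".join(variants))

# Every entry needs its own declaration in the module, so N scalar
# functions cost 5 * N vector declarations.
scalar_functions = ["foo", "bar", "baz"]
total = sum(len(vfabi_variants(name, 1)) for name in scalar_functions)
print(total)
```

With three scalar functions this gives 15 declarations. That is the "number of scalar functions * 5" figure mentioned below.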

> 
> 
>> 
>> 
>>> I was just curious whether we had some other attribute I hadn't noticed :) It's just that adding all these declarations from the front end means a lot of declarations going through each pass (number of scalar functions * 5).
>> 
>> I can see that this might not "look nice" from some points of view, but it is the best way to guarantee that the front end and the middle end are decoupled, so that each component can be unit-tested independently. In the past I have gone through testing a front end coupled with a back end; you don't wanna do that if you want to stay sane! :)
> 
> Ah, so there is some difference here. Our front end and LLVM are completely decoupled (more details here: https://llvm.org/devmtg/2017-10/slides/Reames-FalconKeynote.pdf), but we have a mechanism to query our Java VM from LLVM for anything we want more information about (in this case, the exact set of declarations). So, we can always guarantee that the correct set of declarations is retrieved. Building the signature at compile time without any input from the FE will be problematic (as you have pointed out above). I can see why the declarations are marked as required for the attribute.

Good that we reached the same conclusion after going through the same process of discovery!
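The requirement that every variant listed in the attribute has a matching declaration can be checked mechanically. The following is a minimal sketch. It models the module as a plain set of declared function names, which is an assumption made for illustration. `missing_declarations` is a hypothetical helper, not an LLVM API.

```python
import re

# Each attribute entry ends with the vector function name in parentheses,
# e.g. "_ZGV_LLVM_N2v_foo(vec_foo_2)".
_VARIANT_RE = re.compile(r"^(_ZGV\w+)\((\w+)\)$")


def missing_declarations(attribute_value, declared):
    """Return the vector names listed in the attribute that are not declared.

    `declared` is a hypothetical stand-in for the module: a set of the
    function names that have a declaration or definition.
    """
    missing = []
    for entry in attribute_value.split(","):
        m = _VARIANT_RE.match(entry.strip())
        if m is None:
            raise ValueError(f"malformed variant entry: {entry!r}")
        vector_name = m.group(2)
        if vector_name not in declared:
            missing.append(vector_name)
    return missing


attr = "_ZGV_LLVM_N2v_foo(vec_foo_2),_ZGV_LLVM_N4v_foo(vec_foo_4)"
declared = {"foo", "vec_foo_2"}
print(missing_declarations(attr, declared))  # ['vec_foo_4']
```

A check like this belongs in front-end unit tests. It lets the front end be validated without running the vectorizer at all.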

> 
> 
>> The pass assumes a power of 2 because the TLI assumes a power of 2. The pass doesn't know anything about the vectorizer.
>> 
>>> but once we start supporting any number for VF (for example, the middle end picks VF=6 and the backend decides what the correct VF is),
>> 
>> Of course, the TLI assumes a power of 2 because the vectorizer assumes a power of 2. It is a chain. If you want to vectorize with VF=6, I think you should start from the vectorizer.
> 
> Agreed, I was just pointing it out (and to be clear, this seems to be the assumption in various other parts of the vectorizer as well). :)

Yep.
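The power-of-2 chain (vectorizer, then TLI, then mapping pass) amounts to only ever enumerating doubling widths. Below is a minimal sketch of that enumeration. Both helpers are hypothetical and only mirror the assumption discussed above.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; uses the n & (n - 1) bit trick."""
    return n > 0 and (n & (n - 1)) == 0


def candidate_vfs(max_vf):
    """Power-of-two VFs from 2 up to max_vf, mirroring the assumption that
    only such widths are considered (and hence mapped)."""
    out = []
    vf = 2
    while vf <= max_vf:
        out.append(vf)
        vf *= 2
    return out


print(candidate_vfs(32))     # [2, 4, 8, 16, 32]
print(is_power_of_two(6))    # False: VF=6 is never requested
```

Supporting VF=6 would mean changing where the candidate list comes from, which is the vectorizer, rather than the mapping pass.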

>>> we will start having way too many declarations in the module (and the pass will also need to be updated).
>> 
>> I think you need to define how many are too many. Even if the IR file seems to have many unused declarations, those will not end up in an object file. They are also not useless, because other optimization passes could use them if needed. It seems to be the only way we can keep the scalar-to-vector mapping info in a useful place.
> 
> So, as stated previously in numbers, we have something like 5 * (number of scalar functions that have vector mappings). In our case, we will have 5 per scalar function because nothing prevents generating any power-of-2 VF.

Well, the cost model selects the VF that is most appropriate for the data type and the contents of the loop. For example, if your loop processes 64-bit scalars and your vector registers are 128 bits wide, it is very unlikely that the vectorizer will select anything other than VF = 2. So if your function processes 64-bit data, you should first emit only the 2-lane version of the vector function in the module. There is no need to generate the 4/8/16/32-lane ones if the vectorizer already picks up the 2-lane version.
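The rule of thumb above can be written as a one-line division: the number of elements that fill one vector register. This is a rough heuristic only. The real cost model weighs many more factors. Both helpers are hypothetical, and `first_variant_to_emit` assumes an unmasked, one-argument LLVM-internal variant.

```python
# Rough heuristic only: the real cost model weighs many more factors.
# This just computes how many elements fill one vector register.

def natural_vf(register_bits, element_bits):
    """Number of element_bits-wide lanes that fit in one register_bits register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the register width")
    return register_bits // element_bits


def first_variant_to_emit(scalar_name, register_bits, element_bits):
    """Mangled name of the single variant worth emitting first (hypothetical
    helper; assumes an unmasked, one-argument LLVM-internal variant)."""
    vf = natural_vf(register_bits, element_bits)
    return f"_ZGV_LLVM_N{vf}v_{scalar_name}"


print(natural_vf(128, 64))                     # 2
print(first_variant_to_emit("foo", 128, 64))   # _ZGV_LLVM_N2v_foo
print(natural_vf(128, 32))                     # 4
```

Starting from this single variant and adding wider ones only if the vectorizer turns out to want them keeps the number of declarations close to one per scalar function.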

> I remember seeing some "vector length agnostic" functions; perhaps those can be generated on the fly if we specify something like _vN rather than _v2, _v4, etc.?

Vector Length Agnostic (VLA) is currently used only when targeting the Scalable Vector Extension (SVE) of AArch64, because it relies on a property of the underlying hardware: the vector length is not fixed at compile time. You cannot use it for a fixed-width vector extension.
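The difference shows up in the mangled names. A fixed-width extension needs one variant per VF, while an SVE vector-length-agnostic variant is a single declaration. The following is a minimal sketch that assumes the AArch64 vector function ABI tokens: `n` for Advanced SIMD, `s` for SVE, `N`/`M` for unmasked/masked, and `x` in place of the lane count. The helper names are hypothetical.

```python
# Sketch of the naming difference, assuming the AArch64 vector function ABI
# tokens: "n" = Advanced SIMD (fixed width), "s" = SVE, "N" = unmasked,
# "M" = masked, and "x" in place of the lane count for a scalable vector.

def fixed_width_names(scalar_name, vfs):
    """Advanced SIMD: one unmasked variant per fixed VF."""
    return [f"_ZGVnN{vf}v_{scalar_name}" for vf in vfs]


def scalable_name(scalar_name):
    """SVE: a single masked variant whose lane count is only known at run time."""
    return f"_ZGVsMxv_{scalar_name}"


print(fixed_width_names("sinf", [2, 4]))
print(scalable_name("sinf"))
```

There is no `x` form for the `n` (Advanced SIMD) token. On a fixed-width extension, every lane count still needs its own name and declaration.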


Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D70107
