[PATCH] D21354: Remove redundant direct moves when extracting integers and converting to FP

Tue Jun 14 20:54:16 PDT 2016

amehsan added a comment.

This is how I think of this problem. Currently, when we encounter extract_element of a vector of integers, we generate shift and direct moves. That causes some inefficient patterns like the example for which I opened this bug.

I prefer to generalize the problem and, if possible, come up with a solution for the most general case instead of focusing on this particular pattern. The more general problem that I can see is this: when we extract an element of an integer vector, there are a number of possibilities.

1. All uses of the extracted element are in scalar integer operations.
2. The extracted element is going to be converted to floating point and used in scalar or vector FP calculations.
3. The extracted element is going to be used in some subsequent vector integer calculations (for example, inserted into a vector of int).
4. Some mix of the above three cases.
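
To make the four cases concrete, here is a minimal standalone sketch of the classification. It does not use any LLVM types: `UseKind`, `UseProfile` and `classifyUses` are hypothetical names introduced only for illustration, and a real implementation would derive the use kinds by walking the users of the extract_element node.

```cpp
#include <vector>

// Hypothetical use kinds mirroring cases (1)-(3) above; not LLVM types.
enum class UseKind { ScalarInt, ConvertToFP, VectorInt };

// Summary of how the extracted element is used; Mixed is case (4).
enum class UseProfile { NoUses, OnlyScalarInt, OnlyConvertToFP, OnlyVectorInt, Mixed };

// Classify the full set of users of one extracted element.
UseProfile classifyUses(const std::vector<UseKind> &Uses) {
  if (Uses.empty())
    return UseProfile::NoUses;

  bool SawInt = false, SawFP = false, SawVec = false;
  for (UseKind K : Uses) {
    switch (K) {
    case UseKind::ScalarInt:   SawInt = true; break;
    case UseKind::ConvertToFP: SawFP = true;  break;
    case UseKind::VectorInt:   SawVec = true; break;
    }
  }

  // More than one distinct kind of user means case (4).
  int DistinctKinds = int(SawInt) + int(SawFP) + int(SawVec);
  if (DistinctKinds > 1)
    return UseProfile::Mixed;
  if (SawInt)
    return UseProfile::OnlyScalarInt;
  if (SawFP)
    return UseProfile::OnlyConvertToFP;
  return UseProfile::OnlyVectorInt;
}
```

For instance, an element that feeds both a scalar add and an integer-to-FP conversion would be classified as `UseProfile::Mixed`.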

I am not familiar with all ISD opcodes, so the rest of this comment is missing some concrete details, but I hope my point is clear.

I think the right solution will look at all uses of the extract_element and then, based on those uses, make sure the DAG has the proper ISD opcodes so that the subsequent pattern matching generates the right code.

For example, if we only have (1), then we want the current pattern to be generated. If we have (1) and (2), then we need the direct move, but we don't want the users of type (2) to convert it back from integer to FP. I am not sure whether (2) and (3) have to be treated as two different cases. The only difference between them is the register class, and at that stage we are probably not yet aware of register classes.
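
The decision described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ExtractUse`, `LoweringPlan` and `planExtract` are hypothetical names, not LLVM APIs, and the sketch assumes that users of type (2) and (3) can be treated alike, which is exactly the point I am unsure about.

```cpp
#include <vector>

// Hypothetical use kinds for cases (1)-(3); not LLVM types.
enum class ExtractUse { ScalarInt, IntToFP, VectorInt };

// What the DAG would need to provide for one extracted element.
struct LoweringPlan {
  bool NeedsDirectMove;  // some scalar integer user needs the value moved out
  bool KeepInVectorReg;  // some FP or vector user should read the vector-side value
};

// Decide what is needed from the kinds of users alone.
// Assumption: cases (2) and (3) are handled the same way.
LoweringPlan planExtract(const std::vector<ExtractUse> &Uses) {
  LoweringPlan Plan{false, false};
  for (ExtractUse U : Uses) {
    if (U == ExtractUse::ScalarInt)
      Plan.NeedsDirectMove = true;  // case (1): keep the current pattern
    else
      Plan.KeepInVectorReg = true;  // cases (2)/(3): avoid the round trip
  }
  return Plan;
}
```

With only case (1) users, this asks for the direct move and nothing else. With (1) and (2) together, it asks for the direct move and also keeps the vector-side value available, so the conversion does not have to go through the moved integer.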

I believe it should be possible to have DAGCombine massage the DAG depending on what kind of users the extract_element has, so that some simple and generic pattern matching then generates efficient code. Hopefully, with this approach we do not need to hardcode many different patterns in the target description files.

If I am missing something about DAGCombine, or anything else, that makes this approach impractical, please let me know.

Repository:
  rL LLVM

http://reviews.llvm.org/D21354