[LLVMdev] X86 - Help on fixing a poor code generation bug
    Andrea Di Biagio 
    andrea.dibiagio at gmail.com
       
    Fri Dec  6 04:19:23 PST 2013
    
    
  
Hi Nadav and Cameron,
On Thu, Dec 5, 2013 at 5:58 PM, Nadav Rotem <nrotem at apple.com> wrote:
> Hi Andrea,
>
> Thanks for working on this. I can see two approaches to solving this problem. The first (the one you suggested) is to catch this pattern after register allocation. The second is to eliminate this redundancy during instruction selection. Can you please look into catching this pattern during iSel? The idea is that ADDSS performs an ADD plus a BLEND, and you can easily catch that combination. You can add a new target-specific DAGCombine or a TableGen pattern. You should also handle mul/add/sub.
>
> define <4 x float> @foo(<4 x float> %A, <4 x float> %B) nounwind readnone ssp uwtable {
>   %1 = extractelement <4 x float> %B, i32 0
>   %2 = extractelement <4 x float> %A, i32 0
>   %3 = fadd float %2, %1                              ; Both the fadd and the insertelement below should be matched into
>   %4 = insertelement <4 x float> %A, float %3, i32 0  ; an ADDSS, which does an ADD and a BLEND in one instruction.
>   ret <4 x float> %4
> }
>
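For reference, here is a small scalar model of the equivalence the quoted IR relies on. It is only a sketch in portable C++: the names Vec4, scalar_add_then_insert, add_then_blend and check_equivalence are made up for illustration and are not LLVM code, and the model ignores floating-point exception and NaN details. It shows that extracting lane 0, adding, and inserting the result back into %A gives the same vector as a full-width add followed by a blend that keeps only lane 0 of the sum, which is the behaviour a single ADDSS provides.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for <4 x float>; not an LLVM type.
using Vec4 = std::array<float, 4>;

// Mirrors the quoted IR line by line.
Vec4 scalar_add_then_insert(const Vec4 &a, const Vec4 &b) {
    float b0 = b[0];      // %1 = extractelement %B, 0
    float a0 = a[0];      // %2 = extractelement %A, 0
    float sum = a0 + b0;  // %3 = fadd %2, %1
    Vec4 r = a;           // %4 = insertelement %A, %3, 0
    r[0] = sum;
    return r;
}

// The "ADD plus BLEND" view: add every lane, then keep lane 0 of the
// sum and lanes 1..3 of the first operand.
Vec4 add_then_blend(const Vec4 &a, const Vec4 &b) {
    Vec4 sum{};
    for (std::size_t i = 0; i < sum.size(); ++i)
        sum[i] = a[i] + b[i];
    Vec4 r = a;
    r[0] = sum[0];
    return r;
}

// Sanity check on one sample input (values chosen to be exact in float).
void check_equivalence() {
    Vec4 a{1.0f, 2.0f, 3.0f, 4.0f};
    Vec4 b{10.0f, 20.0f, 30.0f, 40.0f};
    assert(scalar_add_then_insert(a, b) == add_then_blend(a, b));
}
```

On x86 the same operation is exposed to C and C++ code through the SSE intrinsic _mm_add_ss from <xmmintrin.h>; the model above avoids it only so the sketch builds on any target.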
I found out how to catch the pattern during ISel, and I have a patch
that I think fixes the problem.
I will upload the patch as soon as I have finished testing it.
Thanks for the feedback!
Andrea Di Biagio
SN Systems - Sony Computer Entertainment Group
    
    