[PATCH] D89697: * [x86] Implement smarter instruction lowering for FP_TO_UINT from vXf32 to vXi32 for SSE2 and AVX2 by using the exact semantic of the CVTTPS2SI instruction.

Mon Oct 19 09:08:20 PDT 2020

TomHender added a comment.

@RKSimon I ran it through llvm-mca now. It gives me a reciprocal throughput of 3.5 for Silvermont and 3 for Haswell for the new instruction sequence.

The problem I see with these values is that they seem inconsistent with the other entries in the cost table. Many of the existing entries are off by a large factor from what I get when I run the corresponding conversions through llvm-mca.
Like { ISD::SINT_TO_FP, MVT::v2f64, MVT::v16i8, 16*10 } for SSE2 for example. I don't understand how anyone came up with an astronomic cost of 160. @RKSimon's method gives a value of 3 for me (Godbolt-Link <https://gcc.godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(fontScale:14,j:1,lang:c%2B%2B,selection:(endColumn:14,endLineNumber:3,positionColumn:14,positionLineNumber:3,selectionStartColumn:14,selectionStartLineNumber:3,startColumn:14,startLineNumber:3),source:'using+v0+%3D+__attribute__((vector_size(2)))+signed+char%3B%0Ausing+v1+%3D+__attribute__((vector_size(16)))+double%3B%0Av1+get(v0+a)+%0A%7B%0A++++return+__builtin_convertvector(a,+v1)%3B%0A%7D'),l:'5',n:'0',o:'C%2B%2B+source+%231',t:'0')),k:42.8235294117647,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:clang_trunk,filters:(b:'0',binary:'1',commentOnly:'0',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'1',trim:'1'),fontScale:14,j:1,lang:c%2B%2B,libs:!(),options:'-O3+-g0+-msse2',selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'x86-64+clang+(trunk)+(Editor+%231,+Compiler+%231)+C%2B%2B',t:'0')),k:44.84967320261438,l:'4',m:50,n:'0',o:'',s:0,t:'0'),(g:!((h:ir,i:(editorid:1,fontScale:14,j:1,selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1)),l:'5',n:'0',o:'x86-64+clang+(trunk)+IR+Viewer+(Editor+%231,+Compiler+%231)',t:'0'),(h:tool,i:(args:'-mcpu%3Dslm',argsPanelShow:'1',compiler:1,editor:1,fontScale:14,stdin:'',stdinPanelShown:'1',toolId:llvm-mcatrunk,wrap:'1'),l:'5',n:'0',o:'llvm-mca+(trunk)+%231+with+x86-64+clang+(trunk)',t:'0')),header:(),l:'4',m:50,n:'0',o:'',s:1,t:'0')),k:57.1764705882353,l:'3',n:'0',o:'',t:'0')),l:'2',n:'0',o:'',t:'0')),version:4>). 
Neither the machine code generated by ancient LLVM versions nor the Agner Fog instruction table for the VIA Nano 3000 can explain a cost of this magnitude.
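
For readers who do not want to open the Godbolt link, the source encoded in its URL decodes to roughly the snippet below. According to the same URL it is compiled with clang trunk using `-O3 -g0 -msse2`, and the output is passed to llvm-mca with `-mcpu=slm`. Nothing here goes beyond what the link contains; note that it relies on the GCC/Clang vector extensions and `__builtin_convertvector`, so it is not portable C++.

```cpp
// Two signed 8-bit lanes in, two double lanes out.
using v0 = __attribute__((vector_size(2))) signed char;
using v1 = __attribute__((vector_size(16))) double;

// Element-wise signed-integer-to-double conversion of both lanes.
v1 get(v0 a)
{
    return __builtin_convertvector(a, v1);
}
```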

I must still be misunderstanding something.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D89697/new/

https://reviews.llvm.org/D89697