[PATCH] D42885: [AMDGPU] intrinsics for byte/short load/store

Mon Mar 11 13:57:27 PDT 2019

rtaylor added inline comments.

================
Comment at: lib/Target/AMDGPU/SIISelLowering.cpp:7834
+    DCI.DAG.viewGraph();
+    auto *M = cast<MemSDNode>(Src);
+    SDValue Ops[] = {
----------------
arsenm wrote:
> arsenm wrote:
> > rtaylor wrote:
> > > arsenm wrote:
> > > > rtaylor wrote:
> > > > > arsenm wrote:
> > > > > > This is missing a check on the source type. If you want to be fancier, you can split out the remainder bits into a new sign extend, but there probably isn't much reason to.
> > > > > Src is the BUFFER_LOAD_XXX. The only way this code is executed is if the Src is a BUFFER_LOAD_XXX. I'm not sure we need a redundant check here, do we?
> > > > The number of bits in the sext_inreg may not match the load's 8- or 16-bit source width. You can test this with something like
> > > > %load = call llvm.amdgcn.buffer.load.i8()
> > > > %ext = zext i8 %load to i32
> > > > %shl = shl i32 %ext, 27
> > > > %shr = ashr i32 %shl, 27
> > > > 
> > > > There will need to be more shifts to clear the extra bits in the loaded value.
> > > This should produce a buffer_load_sbyte, right? That is what it does currently.
> > But it needs additional shifts even after that. Right now you won't be clearing the extra bits in the low 8 that need to be cleared.
> You can either leave it as
> x = buffer_load_ubyte
> sext_inreg x, i27
> 
> or 
> x = buffer_load_sbyte
> sext_inreg x, i5
> 
> I'm not sure there's much practical difference between them
Right, I don't think there is. I'm working on doing the former. Thanks.
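
For reference, a minimal standalone sketch of the arithmetic under discussion. It is plain C++, not the SIISelLowering code, and the helper names (sextInReg, loadUByte, loadSByte) are hypothetical stand-ins for the DAG nodes. It assumes the shl/ashr by 27 on an i32 in the test case above amounts to sign-extending from the low 5 bits, so a signed byte load on its own gives a different value, while either load followed by the 5-bit sign extend agrees.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of sext_inreg: sign-extend the low `bits` bits of x
// to 32 bits. Equivalent to shl by (32 - bits) followed by ashr by the
// same amount.
static int32_t sextInReg(uint32_t x, unsigned bits) {
  assert(bits >= 1 && bits <= 32);
  const uint32_t mask = bits == 32 ? 0xFFFFFFFFu : ((1u << bits) - 1u);
  const uint32_t sign = 1u << (bits - 1);
  const uint32_t low = x & mask;
  // Do the subtraction in 64 bits so the result is always representable.
  const int64_t v = static_cast<int64_t>(low ^ sign) - static_cast<int64_t>(sign);
  return static_cast<int32_t>(v);
}

// Hypothetical model of buffer_load_ubyte: zero-extend the byte.
static uint32_t loadUByte(uint8_t b) { return b; }

// Hypothetical model of buffer_load_sbyte: sign-extend from 8 bits.
static uint32_t loadSByte(uint8_t b) {
  return static_cast<uint32_t>(sextInReg(b, 8));
}
```

With the byte 0x30, the 5-bit sign extend of the unsigned load yields -16, whereas the signed byte load alone yields 48. Applying the 5-bit sign extend after the signed load gives -16 again, which is why the two forms above are interchangeable as long as the narrower sext_inreg is kept.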

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D42885/new/

https://reviews.llvm.org/D42885