[llvm-dev] Inserting function call in LLVM code before vector instruction - do it in LoopVectorize.cpp? [was Re: Prologue and epilogue for vectorized code]

Wed May 11 13:44:27 PDT 2016

   Hello.
     I'm coming back to this question, rephrased a bit. I think the answer could also be
useful for the NVPTX LLVM back end, if it is ever extended to automatically generate code
for both the CPU and the NVIDIA device, together with the memory transfers (via
cudaMemcpy()).

     Given LLVM scalar and vector code, I want to generate code both for the scalar CPU and
for my research Connex SIMD unit. The CPU and the SIMD unit have different memory spaces,
so we need to transfer data from the CPU to the Connex SIMD unit via DMA to "synchronize"
the two memories.

     Therefore, in the LLVM IR with vector instructions I need to add (on the way to code
generation) a call to a function that performs the memory transfer from the CPU to my Connex
SIMD unit. More precisely, for the LLVM IR below (obtained with LLVM's opt tool):
       ...
       %8 = getelementptr inbounds [10000 x float], [10000 x float]* @A, i64 0, i64 %7
       %9 = bitcast float* %8 to <32 x float>*
       %wide.load = load <32 x float>, <32 x float>* %9, align 4
       [more...]
     On the CPU, I want to add a call to an external function writeDataToArray(), like this:
         ...
         %8 = getelementptr inbounds [10000 x float], [10000 x float]* @A, i64 0, i64 %7
         %9 = bitcast float* %8 to <32 x float>*
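         call void @writeDataToArray(<32 x float>* %9, i64 128, i64 0)
         ; 2nd argument: transfer size in bytes (32 floats x 4 bytes = 128)
         ; 3rd argument: offset to write to in the local memory of the SIMD unit
         ; (the void return type and i64 argument types are only assumed here)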
       and then run only the following code on the SIMD unit:
         %newVar = getelementptr inbounds <32 x float>, <32 x float>* inttoptr (i64 0 to <32 x float>*), i64 0
         %dst = load <32 x float>, <32 x float>* %newVar, align 4
         [more...]
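     To make the intended semantics concrete, here is a small host-side C++ model (not LLVM
code) of what the inserted call and the SIMD-side load are meant to do. Only the 32-lane
width and the 128-byte transfer size come from the IR above; the 4096-byte local memory
size, the names kLocalMemBytes, simdLocalMem and simdLoadVector, and this body of
writeDataToArray() are hypothetical assumptions for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of the Connex local memory as a separate byte buffer.
constexpr std::size_t kLocalMemBytes = 4096;  // assumed size, illustration only
std::array<unsigned char, kLocalMemBytes> simdLocalMem{};

constexpr std::size_t kVF = 32;                            // lanes in <32 x float>
constexpr std::size_t kVectorBytes = kVF * sizeof(float);  // 32 * 4 = 128 bytes

// Hypothetical stand-in for the DMA transfer done on the CPU side:
// copy sizeBytes from host memory to the SIMD local memory at localOffset.
void writeDataToArray(const void *src, std::size_t sizeBytes, std::size_t localOffset) {
  assert(localOffset + sizeBytes <= kLocalMemBytes);
  std::memcpy(simdLocalMem.data() + localOffset, src, sizeBytes);
}

// Models the SIMD-side `load <32 x float>` from a byte offset in local memory.
std::array<float, kVF> simdLoadVector(std::size_t localOffset) {
  assert(localOffset + kVectorBytes <= kLocalMemBytes);
  std::array<float, kVF> v{};
  std::memcpy(v.data(), simdLocalMem.data() + localOffset, kVectorBytes);
  return v;
}
```

     Calling writeDataToArray(&A[i], kVectorBytes, 0) and then simdLoadVector(0) should
give the same 32 values that %wide.load reads from A starting at index i.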

     Should I insert this function call in LLVM's
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp, in the method:
        /// Vectorize Load and Store instructions,
        virtual void vectorizeMemoryInstruction(Instruction *Instr);
     Or should I do it in a separate LLVM pass, or maybe in the back end?
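     One consideration that may affect the choice: vectorizeMemoryInstruction() widens the
memory instructions inside the vector loop body, so a call inserted there would run once per
vector iteration, whereas a separate pass could place a single transfer in a prologue before
the vector loop and another in an epilogue after it. The C++ model below (not LLVM code)
counts DMA calls for both placements. SimdUnit, toLocal, toHost, simdBody, perIteration and
hoisted are hypothetical names, and the hoisted version assumes the local memory can hold the
whole vectorized range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical model of the SIMD unit: separate local memory plus a DMA call counter.
struct SimdUnit {
  std::vector<float> local;
  int dmaCalls = 0;
  explicit SimdUnit(std::size_t elems) : local(elems, 0.0f) {}
  // Host -> local memory (plays the role of writeDataToArray()).
  void toLocal(const float *src, std::size_t count, std::size_t offset) {
    assert(offset + count <= local.size());
    std::memcpy(local.data() + offset, src, count * sizeof(float));
    ++dmaCalls;
  }
  // Local memory -> host (a hypothetical counterpart for the epilogue).
  void toHost(float *dst, std::size_t count, std::size_t offset) {
    assert(offset + count <= local.size());
    std::memcpy(dst, local.data() + offset, count * sizeof(float));
    ++dmaCalls;
  }
};

constexpr std::size_t kVF = 32;  // vector factor, as in <32 x float>

// Example vector body executed "on the SIMD unit" for one 32-element chunk.
void simdBody(SimdUnit &u, std::size_t offset) {
  for (std::size_t l = 0; l < kVF; ++l) u.local[offset + l] *= 2.0f;
}

// Placement 1: transfer in and out around every vector iteration.
int perIteration(std::vector<float> &a) {
  SimdUnit u(kVF);
  std::size_t vecEnd = a.size() / kVF * kVF;
  for (std::size_t i = 0; i < vecEnd; i += kVF) {
    u.toLocal(&a[i], kVF, 0);
    simdBody(u, 0);
    u.toHost(&a[i], kVF, 0);
  }
  for (std::size_t i = vecEnd; i < a.size(); ++i) a[i] *= 2.0f;  // scalar remainder on CPU
  return u.dmaCalls;
}

// Placement 2: one prologue transfer before the vector loop, one epilogue after it.
int hoisted(std::vector<float> &a) {
  std::size_t vecEnd = a.size() / kVF * kVF;
  SimdUnit u(vecEnd);  // assumes local memory fits the whole vectorized range
  if (vecEnd > 0) u.toLocal(a.data(), vecEnd, 0);
  for (std::size_t i = 0; i < vecEnd; i += kVF) simdBody(u, i);
  if (vecEnd > 0) u.toHost(a.data(), vecEnd, 0);
  for (std::size_t i = vecEnd; i < a.size(); ++i) a[i] *= 2.0f;  // scalar remainder on CPU
  return u.dmaCalls;
}
```

     For a 10000-element array both placements compute the same result, but the first makes
312 x 2 = 624 DMA calls and the second makes 2, at the cost of needing a local memory large
enough for 9984 floats.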


   Thank you,
     Alex


On 4/28/2016 1:46 AM, Alex Susu wrote:
>    Hello.
>      I'd like to generate a sort of prologue+epilogue for a code block that runs on a SIMD
> architecture and is produced by the LLVM loop vectorizer. My SIMD processor receives data
> from the CPU via DMA transfer and sends results back via DMA transfer or a FIFO.
>     It is exactly for these transfers that I need to write the prologue+epilogue, which is
> relatively simple, e.g. a call to a function like TransferViaDMA().
>     Although it doesn't seem to be very difficult, I'm curious what the best way to do it is.
>
>     I haven't found anyone who has written a prologue+epilogue for vector code (obtained from
> the loop vectorizer), and although it shouldn't be very different from the prologue+epilogue
> of a function, I'd still like to know the recommended approach.
>
>     Please let me know what you recommend.
>
>   Thank you,
>     Alex