<div dir="ltr"><div>Unfortunately, the front end I'm building for an existing interpreted language produces output like this far too often:<br><br>define void @foo(i8* nocapture %dest, i8* nocapture %src, i32 %len) nounwind {<br>


  %1 = tail call noalias i8* @malloc(i32 %len) nounwind<br>  tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %1, i8* %src, i32 %len, i32 1, i1 false)<br>  tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %1, i32 %len, i32 1, i1 false)<br>


  tail call void @free(i8* %1) nounwind<br>  ret void<br>}<br><br></div><div>I'd like to reduce this pattern to the following:<br><br>define void @foo(i8* nocapture %dest, i8* nocapture %src, i32 %len) nounwind {<br>


  tail call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dest, i8* %src, i32 %len, i32 1, i1 false)<br>  ret void<br>}<br><br></div>Optimising all cases of this pattern from within my front end's AST would be difficult. I'd rather implement this as an LLVM pass or two that run after other function passes have already cleaned up the mess I've made.<br>


<br>Has anyone written passes that detect and combine multiple memory copies originating from the same data, <br>and then eliminate stores and malloc / free pairs for local pointers that are never read from or captured?<br>
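<br>To make the two rewrites concrete, here is a rough sketch on a toy instruction list. It is not LLVM's API: the Inst record and both function names are hypothetical, and it assumes straight-line code, that %dest and %src do not overlap, and that the temporary is never captured. A real pass would have to check all of that.<br>

```python
from dataclasses import dataclass, replace


# Toy stand-in for IR instructions. NOT LLVM's API; every name is hypothetical.
@dataclass(frozen=True)
class Inst:
    op: str         # "malloc", "memcpy" or "free"
    dst: str        # malloc: result; memcpy: destination; free: pointer freed
    src: str = ""   # memcpy only: source pointer
    size: str = ""  # malloc / memcpy: size operand


def forward_copies(body):
    """Rewrite memcpy(d, tmp, n) to memcpy(d, src, n) when the instruction
    directly before it is memcpy(tmp, src, n).

    Assumes d and src do not overlap; aliasing is not modelled here.
    """
    out = []
    for inst in body:
        prev = out[-1] if out else None
        if (prev is not None and prev.op == "memcpy" and inst.op == "memcpy"
                and inst.src == prev.dst and inst.size == prev.size):
            inst = replace(inst, src=prev.src)
        out.append(inst)
    return out


def drop_write_only_allocs(body):
    """Remove each malloc whose result is never used as a memcpy source,
    together with the copies into it and its free.

    Capture is not modelled: the toy has no way to leak a pointer.
    """
    allocs = {i.dst for i in body if i.op == "malloc"}
    read = {i.src for i in body if i.op == "memcpy"}
    dead = allocs - read
    return [i for i in body if i.dst not in dead]


# The body of @foo from the first listing, in the toy encoding.
foo = [
    Inst("malloc", "%1", size="%len"),
    Inst("memcpy", "%1", "%src", "%len"),
    Inst("memcpy", "%dest", "%1", "%len"),
    Inst("free", "%1"),
]
reduced = drop_write_only_allocs(forward_copies(foo))
```

<br>Running both steps in that order leaves a single copy from %src to %dest, which is the reduced form I'm after. The order matters: until the second copy is redirected, the temporary is still read from and cannot be removed.<br>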


<br><br></div>