<div dir="auto"><div dir="auto">I'm trying to determine why offload outlining is done in Clang. As far back as 2012 or so, the docs suggest that outlining in IR would be better but doesn't work. A thesis by Novillo is cited in one of them that talks about making optimizations better, but it doesn't obviously say why outlining in IR doesn't work or what modifications would be required.</div><div dir="auto"><br></div><div dir="auto">There are challenges compiling GPU code under the SSA/SIMT model, but they're common to OpenCL/HIP. I think we basically opt out of a lot of optimizations at present.</div><div dir="auto"><br></div><div dir="auto">Moving memory ops across a target region won't work, but that's manageable. The live-in/live-out values are roughly what we copy into the object for early outlining.</div><div dir="auto"><br></div><div dir="auto">I'll ask this at the round table later, but any information the community can offer before that will be helpful.</div><div dir="auto"><br></div><div dir="auto">Thanks!</div><div dir="auto"><br></div><div dir="auto">Jon</div></div>