[Mlir-commits] [llvm] [mlir] [MLIR][OpenMP][OMPIRBuilder] Error propagation across callbacks (PR #112533)

Wed Oct 16 08:10:38 PDT 2024

skatrak wrote:

Thanks, Michael, for sharing your thoughts. I agree with you: I also think these errors are unrecoverable. The problem at the moment is that the OMPIRBuilder can't check whether the callbacks it invokes have failed, so its codegen functions continue execution assuming the current state is valid. So currently, when a callback fails for any of the reasons you mentioned (bugs, unsupported ops, etc.), the OMPIRBuilder can go on to dereference null pointers, trigger assertions later on, and so on (see #109755 for an example).
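A minimal, dependency-free sketch of this failure mode follows. It is not OMPIRBuilder code: `Region`, `VoidBodyGenCallback` and `createRegionWithVoidCallback` are hypothetical stand-ins, with a list of strings playing the role of the IR being built. Because the body callback returns `void`, the builder has no way to notice that the callback bailed out, so it keeps emitting code and reports success.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

namespace sketch {
// Hypothetical stand-in for the IR under construction: a list of "instructions".
using Region = std::vector<std::string>;

// Simplified shape of a callback that cannot report failure: it returns void.
using VoidBodyGenCallback = std::function<void(Region &)>;

// Hypothetical builder entry point, loosely modeled on an OMPIRBuilder codegen
// function. It invokes the body callback and then keeps going, assuming the
// callback populated the region correctly.
bool createRegionWithVoidCallback(Region &region,
                                  const VoidBodyGenCallback &bodyGen) {
  region.push_back("entry");
  bodyGen(region);          // If this gave up early, nobody can tell.
  region.push_back("exit"); // The builder continues as if all is well.
  return true;              // The caller also sees "success".
}
} // namespace sketch
```

Calling it with a callback that gives up immediately, e.g. `createRegionWithVoidCallback(r, [](Region &) {})`, still returns `true` and leaves a region holding only the builder's own `entry` and `exit` entries. In the real builder, this is the point where later code may run into missing blocks or null values.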

One solution, as I understand your proposal, would be to print a crash report and exit the program directly from within these callbacks, so that they never return. However, everywhere else in the omp dialect to LLVM IR translation, the expected way to handle errors like these is to emit MLIR diagnostics and then return early with a `failure()` value (grep for `emitError` to see many instances of this). This early exit propagates up the stack, through `mlir::translateModuleToLLVMIR()`, to its original caller, which then exits the program gracefully with an error code.
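The MLIR-side convention can be modeled roughly as below. This is a sketch under stated assumptions: `LogicalResult`, `success()`, `failure()`, `failed()` and `emitError()` are simplified stand-ins for the MLIR facilities of the same names (the real `emitError` returns an in-flight diagnostic that converts to `failure()`), and `convertOp`, `convertModule` and `runTranslation` are hypothetical. The point is the shape: report once at the failure site, return `failure()`, and let each caller simply forward it.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch {
// Simplified stand-in for mlir::LogicalResult (not the real type).
struct LogicalResult {
  bool ok;
};
inline LogicalResult success() { return {true}; }
inline LogicalResult failure() { return {false}; }
inline bool failed(LogicalResult r) { return !r.ok; }

// Stand-in for MLIR's diagnostic engine: collected error messages.
std::vector<std::string> diagnostics;

// Models `return op.emitError(...)`: record a diagnostic, yield failure().
LogicalResult emitError(const std::string &msg) {
  diagnostics.push_back("error: " + msg);
  return failure();
}

// Hypothetical per-op translation: unsupported ops produce a diagnostic.
LogicalResult convertOp(const std::string &opName) {
  if (opName == "omp.unsupported")
    return emitError("not yet implemented: " + opName);
  return success();
}

// Hypothetical module-level translation: stop at the first failure and
// forward it to the caller without emitting anything further.
LogicalResult convertModule(const std::vector<std::string> &ops) {
  for (const std::string &op : ops)
    if (failed(convertOp(op)))
      return failure();
  return success();
}

// Hypothetical driver: maps the translation result to a process exit code.
int runTranslation(const std::vector<std::string> &ops) {
  return failed(convertModule(ops)) ? 1 : 0;
}
} // namespace sketch
```

With an unsupported op in the list, `runTranslation` returns `1` after recording a single diagnostic, and the ops after it are never visited.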

The solution I'm proposing addresses long-standing TODO comments about supporting "error propagation in OpenMPIRBuilder" by:
1. Allowing callbacks passed to the OMPIRBuilder to return errors, so that OMPIRBuilder codegen functions can stop early instead of causing an unrelated crash.
2. Allowing OMPIRBuilder codegen functions that take these callbacks to also return errors, so they can forward them to their caller.
3. Turning errors forwarded by the OMPIRBuilder into MLIR diagnostics, so that the same error reporting approach is followed whether or not the error is related to the OMPIRBuilder.
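The three steps can be sketched together without LLVM as a dependency. Here `Error` and `Expected` are simplified stand-ins loosely modeled on `llvm::Error` and `llvm::Expected`, and `BodyGenCallback`, `createRegion` and `convertRegionOp` are hypothetical names, not the signatures used in this PR. Step 1 is the callback type returning `Error`, step 2 is the builder forwarding that error instead of continuing, and step 3 is the translation turning it into a diagnostic.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <vector>

namespace sketch {
// Stand-in for llvm::Error: empty means success, otherwise holds a message.
using Error = std::optional<std::string>;

// Stand-in for llvm::Expected<T>: either a value or an error message.
template <typename T> struct Expected {
  std::optional<T> value;
  std::string error;
  explicit operator bool() const { return value.has_value(); }
};

// Hypothetical stand-in for the IR under construction.
using Region = std::vector<std::string>;

// (1) Callbacks can now report failure to the builder.
using BodyGenCallback = std::function<Error(Region &)>;

// (2) Hypothetical builder function: stops early and forwards the error
// instead of continuing with a half-built region.
Expected<Region> createRegion(const BodyGenCallback &bodyGen) {
  Region region{"entry"};
  if (Error err = bodyGen(region))
    return {std::nullopt, *err};
  region.push_back("exit");
  return {region, ""};
}

// Stand-in for MLIR's diagnostic engine.
std::vector<std::string> diagnostics;

// (3) Hypothetical translation step: a forwarded error becomes a diagnostic
// and a failure result, the same way any other translation error is reported.
bool convertRegionOp(bool bodySupported) {
  Expected<Region> result = createRegion([&](Region &r) -> Error {
    if (!bodySupported)
      return std::string("unsupported op in region body");
    r.push_back("body");
    return std::nullopt;
  });
  if (!result) {
    diagnostics.push_back("error: " + result.error);
    return false;
  }
  return true;
}
} // namespace sketch
```

One design note: unlike `llvm::Error`, this stand-in does not enforce that every error is checked. The real type aborts if an error is destroyed unhandled in builds with LLVM's ABI-breaking checks enabled, which is part of what makes it a good fit for forwarding errors through the OMPIRBuilder.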

https://github.com/llvm/llvm-project/pull/112533