[LLVMdev] Let's get rid of neverHasSideEffects

Wed Aug 22 10:41:27 PDT 2012

On Aug 21, 2012, at 4:41 PM, Chris Lattner <clattner at apple.com> wrote:

> On Aug 21, 2012, at 3:45 PM, Jakob Stoklund Olesen <stoklund at 2pi.dk> wrote:
>>> I don't understand what you're saying.  Are you proposing that all properties (may load, store, side effects) be explicitly added to all instructions, and the pattern only be used to produce warnings?
>> 
>> Yes.
>> 
>> The side effect inference is worse than the load/store inference, but features like this work best when they work all the time. If all instructions had patterns, and we could accurately infer properties, it wouldn't be a problem.
> 
> Ok, I agree with that.  I think it's crazy that we still can't write a pattern for copy/move instructions, so we have to smatter attributes on them.  What are the other cases that we can't write a pattern for them?

Off the top of my head, I'm sure Jim and Owen could pile on:

- Copy/move instructions. We specifically don't want isel to emit these instructions. We prefer a generic COPY which can be recognized quickly and completely by SSA passes and coalescing/regalloc. The only attribute that needs to be set on a copy/move instruction is neverHasSideEffects - precisely the default I want to change.

- Big chunks of X86InstrSystem.td. Some of these have intrinsics, some don't. We want accurate attributes on all of them.

- Assembly-only instructions. Some instructions have variants that can be written in assembly, but that isel would never produce.

- Disassembly-only instructions. Some variants can't even be written in assembly, but a disassembler can decode it, and we want accurate attributes for binary analysis.

- Instructions that are produced by later codegen passes. Opcodes for branch relaxation, cbz formation on ARM, the 27 variants of xmm loads and stores on x86, etc.

- The non-pseudo versions of the x87 instruction set.

- Instructions that produce non-existant types. ARM's VLD1D3 instruction class loads 3 consecutive D-registers. There is no v3f64 or v3i64 type, and I don't think we want there to be.

- Instructions that produce illegal types. ARM's VLD1D4 instruction class could be modeled as a v4f64 load, but we don't support other operations on that type, and it wouldn't make sense to make it legal. If I put a pattern on such an instruction, I would rather have TableGen tell me that the pattern will never be able to match.

- Instructions that don't map 1-1 to their intrinsics. Some NEON intrinsics expand to two instructions.

- Instructions that read or write specific physregs, like DIV and MUL. Clearly a TableGen defect, as you mentioned.

- Instructions that define multiple values. Another TableGen shortcoming.

In many of these cases, we don't ever want isel emitting the instructions. I think adding disabled patterns to such instructions as a way of specifying properties would be backwards.

> I agree that it is sometimes convenient to write Pat<> patterns instead of putting a pattern on an instruction: why not make tblgen also infer predicates from Pat patterns that expand to a single instruction?

I would love to treat Pat<> and instruction patterns equally. Keep reading.

> Personally, I don't like the direction of making everything be redundantly specified with "let" clauses and in the patterns.  I agree that it is a problem that we're not inferring from Pat<> patterns, and that not all instructions are expressible with patterns, but I'd rather we solve *those* problems than throw out inference.

We need to fix the current approach. It appears capricious to someone who isn't closely familiar, and that is most of us. Two features are directly harmful: Inference silently falls back to plain guessing when it fails, and when it works, you get a warning about any redundantly specified properties.

I would compare it to a clang warning: "'x' is initialized, but I can prove that the value is always overwritten before being used." You remove the initializer on x to get rid of the warning. When the code later changes to read the uninitialized value, clang silently guesses that you wanted x to be 7. We would drive Doug out of town with pitchforks if clang did something like that.

We could improve the inference coverage to include Pat<>'s, and I think we should. However, it would only add caprice. Somebody changes a single-instruction Pat<> to a multi-instruction pattern, and we expect him to know that he needs to decorate the instruction with attributes? That's not a reasonable expectation.

A tolerable inference approach is possible, but it absolutely needs to:
- Don't guess when it fails, but complain loudly if some attributes are not given explicitly, and
- Permit and verify explicitly set attributes when it does work.

I don't want to do that because I think we can do better. I want to:

- Explicitly specify all instruction properties that differ from the sane defaults (not a load, not a store, no side effects). Exploiting the regularity in an instruction set to do this easily is actually one of TableGen's strong sides.

- Use the current inference code to emit warnings instead. This would significantly improve TableGen diagnostics.

- Analyze single- and multi-instruction Pat<>'s the same way instruction patterns are analyzed. Notice how easy it is to analyze multi-instruction patterns compared to the near-impossible task of using them for inference.

This has real benefits. If you factor your instruction classes well, you can tag entire classes of instructions with mayLoad/mayStore very easily. Obscure opcodes that would never get a pattern are likely to be tagged correctly, and you get free verification of the instructions that do have patterns. Any kind of pattern.

It also improves readability by making in-instruction patterns and Pat<>'s equals, and by treating all instruction properties equally. There will be much fewer surprises for the casual visitor.

/jakob