[llvm] CodeGen: Add -denormal-fp-math-bf16 flag (PR #90425)
Andy Kaylor via llvm-commits
llvm-commits at lists.llvm.org
Mon May 20 14:16:15 PDT 2024
andykaylor wrote:
> > Is this really a case of "defective instructions" or is it just a difference between the way that Intel processors understand the bfloat16 type compared to other architectures?
>
> As a format, it's just IEEE with a different combination of mantissa and exponent widths. Denormals have a specific and clear meaning here, and there's no implied flushing on computation.
I don't accept that. Denormals have a specific and clear meaning in the IEEE types, but once you say "This is like the IEEE type except..." you can no longer assume anything that isn't part of the specification for the new type. Is there an accepted standard specification for this type? I'm not sure how this could be adjudicated except by reference to actual implementations defining a de facto standard.
Here's something that comes close to a definition for the type, though it isn't presented as a formal specification:
https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus
That document says, "To ensure identical behavior for underflows, overflows, and NaNs, bfloat16 has the same exponent size as FP32. However, bfloat16 handles denormals differently from FP32: it flushes them to zero."
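To make the distinction concrete, here's a small C++ sketch of what that divergence looks like at the bit level. This is my own illustration, not code from this PR; the function name `decodeBF16` and the `flushDenormalsToZero` parameter are hypothetical, though the bit layout (1 sign bit, 8 exponent bits matching FP32, 7 mantissa bits) is the standard bfloat16 encoding:

```c++
#include <cstdint>
#include <cstdio>
#include <cmath>

// bfloat16 layout: 1 sign bit, 8 exponent bits (same as FP32), 7 mantissa bits.
// Illustrative decoder: interprets the bits under the usual IEEE-style rules,
// with an option to flush denormal encodings to zero as the Google document
// describes.
float decodeBF16(uint16_t bits, bool flushDenormalsToZero) {
  uint16_t sign = bits >> 15;
  uint16_t exp  = (bits >> 7) & 0xFF;
  uint16_t frac = bits & 0x7F;
  float s = sign ? -1.0f : 1.0f;

  if (exp == 0) {
    if (frac == 0 || flushDenormalsToZero)
      return s * 0.0f;                        // zero, or a flushed denormal
    // Graded denormal: (-1)^sign * 2^-126 * (frac / 128).
    return s * std::ldexp(static_cast<float>(frac) / 128.0f, -126);
  }
  if (exp == 0xFF)
    return frac ? NAN : s * INFINITY;         // NaN or infinity
  // Normal: (-1)^sign * 2^(exp - 127) * (1 + frac / 128).
  return s * std::ldexp(1.0f + static_cast<float>(frac) / 128.0f, exp - 127);
}

int main() {
  uint16_t smallestDenormal = 0x0001;         // exp field 0, frac field 1
  std::printf("IEEE-style interpretation: %g\n",
              decodeBF16(smallestDenormal, false));  // ~9.18e-41 (2^-133)
  std::printf("flush-to-zero:             %g\n",
              decodeBF16(smallestDenormal, true));   // 0
}
```

Under flush-to-zero, every encoding with a zero exponent field reads as +/-0, which is exactly why "IEEE with a narrower mantissa" and "what the hardware actually does" can disagree.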
> > The Intel white paper on bfloat16 (https://www.intel.com/content/www/us/en/content-details/671279/bfloat16-hardware-numerics-definition.html) says, "There is no need to support denormals; FP32, and therefore also BF16, offer more than enough range for deep learning training tasks."
>
> I don't know how to parse this comment. Denormals, in what type? I almost read this as "you don't need to handle fp16 denormals if you process in bf16 instead". At worst it's a subjective value judgement that bad behavior is OK, but I'm not sure that's what it's really saying.
I will admit that sounds less like a technical specification than a marketing pitch. I haven't spoken to any of the Intel hardware engineers responsible for the BF16 implementation in Intel processors, and any opinions I am expressing here do not represent Intel's official position. I'm just offering my own interpretation, which is that support for denormals is not required for bfloat16.
https://github.com/llvm/llvm-project/pull/90425