Project Status
This page provides information about the current development status of HGQ2 features and components.
Stable Features
The following features are well-tested and ready for production use (with hls4ml integration):
Note
Only the Vitis and Vivado (deprecated) backends are fully supported for hls4ml integration. Intra-layer heterogeneous activation quantization will not work with other backends. Heterogeneous weight quantization may work with other backends, but this is untested.
Core Functionality
✅ Quantization-aware training with fixed-point quantization
✅ Multi-backend support (TensorFlow, JAX, PyTorch)
✅ XLA compilation for accelerated training (JAX and TensorFlow only; PyTorch dynamo support is under investigation)
✅ hls4ml integration for FPGA deployment
Quantization Types
✅ kif (Keep-Integer-Fraction) fixed-point quantization
✅ kbi (Keep-Bit-Integer) fixed-point quantization
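The two parametrizations describe the same fixed-point format from different angles: kif keeps the number of integer and fractional bits, while kbi keeps the total bit width together with the number of integer bits. The following pure-Python sketch illustrates the idea; the function names are illustrative, not HGQ2's API, and the rounding and overflow behavior shown (round-to-nearest with saturation) is an assumption that in practice depends on the quantizer configuration.

```python
def q_kif(x: float, k: int, i: int, f: int) -> float:
    """Illustrative kif quantizer: sign bit k (0 or 1), i integer bits,
    f fractional bits, round-to-nearest with saturation (assumed modes)."""
    step = 2.0 ** -f                   # resolution of the format
    lo = -(2.0 ** i) if k else 0.0     # lower bound: signed vs. unsigned
    hi = 2.0 ** i - step               # largest representable value
    q = round(x / step) * step         # round to the nearest representable value
    return min(max(q, lo), hi)         # saturate to the representable range


def q_kbi(x: float, k: int, b: int, i: int) -> float:
    """Same quantizer in the kbi parametrization: total width b = k + i + f."""
    return q_kif(x, k, i, b - k - i)


print(q_kif(0.3, k=1, i=0, f=3))   # -> 0.25 (nearest multiple of 1/8)
print(q_kbi(5.7, k=0, b=6, i=3))   # -> 5.75 (unsigned, 3 fractional bits)
print(q_kif(2.0, k=1, i=0, f=3))   # -> 0.875 (saturated to the format's maximum)
```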
Layers
✅ Core layers: QDense
✅ Non-linear Activation layers: QUnaryFunctionLUT
✅ Convolutional layers: QConv1D, QConv2D, QConv3D (non-dilated only)
✅ Normalization: QBatchNormalization
✅ Basic operation layers: QAdd, QMultiply, QSubtract, QMaximum, QMinimum
Utilities
✅ trace_minmax: Calibration for bit-exact inference
✅ EBOP tracking: Resource estimation during training
✅ BetaScheduler: Dynamic adjustment of regularization strength
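To illustrate what a beta schedule does (this is a hypothetical ramp, not HGQ2's actual BetaScheduler API): the regularization strength beta is kept small early in training so quantization pressure does not dominate optimization, then grown toward its target value.

```python
def linear_beta_schedule(epoch: int, *, beta_max: float = 3e-6,
                         warmup_epochs: int = 10) -> float:
    """Hypothetical beta schedule: ramp the resource-regularization strength
    linearly from 0 to beta_max over warmup_epochs, then hold it constant.
    beta_max and warmup_epochs are illustrative values, not HGQ2 defaults."""
    if epoch >= warmup_epochs:
        return beta_max
    return beta_max * epoch / warmup_epochs


# Early epochs apply little quantization pressure; later epochs apply full pressure.
print(linear_beta_schedule(0))    # -> 0.0
print(linear_beta_schedule(5))    # ~ 1.5e-06
print(linear_beta_schedule(20))   # -> 3e-06
```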
Alpha/Beta Features
These features are functional but may undergo changes or improvements:
Beta Features
🟡 QEinsumDense, QEinsum: Only io_parallel is supported in hls4ml. Mostly stable, but the hls4ml implementation is not fully optimal. parallelization_factor is not implemented.
🟡 QEinsumDenseBatchnorm, QBatchNormDense: Fused batchnorm layers. We observe some instability during training. Once trained, conversion to firmware is regarded as stable.
🟡 QSoftmax: Bit-exact quantized softmax implementation for hls4ml conversion. May be unstable during training. Once trained, conversion to firmware is regarded as stable.
🟡 QDot: The hls4ml implementation supports only the i,i-> dot-product pattern, and only works in io_parallel. Use QEinsum instead if a more general dot product is needed.
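The i,i-> restriction refers to einsum subscript notation: the only contraction QDot maps to hls4ml is a plain inner product of two vectors. In standard einsum notation (shown here with NumPy purely for illustration) that pattern looks like:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# "i,i->" contracts the shared index i and leaves no output axes:
# a plain inner product, i.e. 1*4 + 2*5 + 3*6.
dot = np.einsum("i,i->", a, b)
print(dot)  # -> 32.0

# More general contractions, e.g. a batched dot product "bi,bi->b",
# fall outside this pattern and would need QEinsum instead.
```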
Alpha Features
🟠 Minifloat quantization: Training support is available, but hardware synthesis is not yet supported. Not widely tested.
🟠 QMultiHeadAttention: Only io_parallel is supported in hls4ml. The implementation is likely not optimal, but can generate synthesizable firmware.
🟠 QSum, QMeanPow2: No hls4ml support at the moment.
🟠 QConv3D: No hls4ml support.
Warning
QConv*D layers have greater numerical instability on the PyTorch backend than on the JAX and TensorFlow backends. Expect slight differences between the Keras model and the hls4ml model; these differences should not significantly affect model performance.
Version Compatibility
HGQ2 requires Keras v3
For hls4ml integration, the fork at https://github.com/calad0i/hls4ml/tree/da4ml-v2 is required
Reporting Issues
If you encounter any issues or bugs, please report them on the GitHub repository issue tracker with:
A minimal reproducible example
Your environment details (OS, backend, Python version, etc.)
Expected vs. actual behavior