High Granularity Quantization 2
HGQ2 (High Granularity Quantization 2) is a quantization-aware training framework built on Keras v3, targeting real-time deep learning applications on edge devices like FPGAs. It provides a comprehensive set of tools for creating and training quantized neural networks with minimal effort.
HGQ2 implements a gradient-based automatic bitwidth optimization and quantization-aware training algorithm. By leveraging gradients, it allows bitwidth optimization at arbitrary granularity, down to the per-weight and per-activation level.
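As a rough sketch of the idea (illustrative only, not HGQ2's actual quantizer parametrization), a fixed-point quantizer with a trainable number of fractional bits can use a straight-through estimator for the non-differentiable rounding, so that the bitwidth itself receives gradients:

import keras
from keras import ops

class ToyFractionalBitQuantizer(keras.layers.Layer):
    # Toy example, NOT the HGQ2 API: one trainable fractional bitwidth
    # per channel; HGQ2 generalizes this to arbitrary granularity.
    def build(self, input_shape):
        self.f = self.add_weight(
            name='f',
            shape=(input_shape[-1],),
            initializer=keras.initializers.Constant(4.0),
        )

    def call(self, x):
        scale = ops.power(2.0, self.f)  # quantization step is 2**-f
        r = x * scale
        # Straight-through estimator: round() in the forward pass,
        # identity gradient in the backward pass...
        r_q = r + ops.stop_gradient(ops.round(r) - r)
        # ...so gradients still reach self.f through `scale`. A resource
        # penalty on the bitwidths (e.g. EBOPs scaled by beta) then pulls
        # f down while the task loss pushes back.
        return r_q / scale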
Key Features
High Granularity: HGQ2 supports per-weight and per-activation bitwidth optimization, as well as any coarser granularity.
Automatic Quantization: Bitwidths are optimized via gradients; in general there is no need to tune them manually.
What you see is what you get: What you get from the Keras model is exactly what you get from the RTL model, subject only to machine float precision limitations.
Accurate Resource Estimation: EBOPs estimated by HGQ2 give a good indication of the actual resource usage on FPGA, either as an upper limit on LUTs (da4ml) or on LUT + 55 * DSP (hls4ml); see the conceptual sketch after this list.
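For intuition on what EBOPs measure, here is a conceptual count only (HGQ2 estimates this quantity on the fly during training): bit-operations of a dense layer tally one product of operand bitwidths per multiplication.

import numpy as np

def dense_bops(x_bits: np.ndarray, w_bits: np.ndarray) -> float:
    """Bit-operations of y = x @ W (conceptual sketch, not the HGQ2 API).

    x_bits: per-activation input bitwidths, shape (n_in,)
    w_bits: per-weight bitwidths, shape (n_in, n_out)
    An a-bit by b-bit multiply costs roughly a * b bit operations;
    summing over every multiply in the matmul gives the BOPs.
    """
    return float(np.sum(x_bits[:, None] * w_bits))

# Hypothetical bitwidths: 3-bit activations, mixed 2-6 bit weights
rng = np.random.default_rng(0)
print(dense_bops(np.full(16, 3), rng.integers(2, 7, size=(16, 8))))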
In addition, this framework improves upon the old HGQ implementation in the following aspects:
Scalability: HGQ2 supports TensorFlow, JAX, and PyTorch as backends. As XLA compilation in JAX and TensorFlow can significantly speed up training, HGQ2 can train 1.2-5 times faster than the previous implementation; see the backend-selection example after this list.
Quantizers:
- Fixed-point: While the previous implementation optimized only the number of fractional bits under a single parametrization of fixed-point numbers, HGQ2 supports multiple parametrizations and allows any part of them to be optimized via gradients.
- Minifloat: Training with minifloat quantization is supported, also with surrogate gradient support (alpha quality).
More Layers: More layers are supported now, including the powerful EinsumDense layer (and its batch-norm fused variant EinsumDenseBatchnorm) and the MultiHeadAttention layer with bit-accurate softmax and scaled dot-product attention.
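Since HGQ2 is built on Keras v3, the compute backend is selected the standard Keras way: set the KERAS_BACKEND environment variable before Keras is first imported. For example, to train on JAX with XLA:

import os
os.environ['KERAS_BACKEND'] = 'jax'  # or 'tensorflow' / 'torch'

import keras
assert keras.backend.backend() == 'jax'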
import keras
from hgq.layers import QDense, QConv2D
from hgq.config import LayerConfigScope, QuantizerConfigScope
# Setup quantization configuration
# These values are the defaults, just for demonstration purposes here
with (
    # Configuration scopes for setting the default quantizer type and overflow mode
    # The second scope overrides the first one for the 'datalane' place
    QuantizerConfigScope(place='all', default_q_type='kbi', overflow_mode='SAT_SYM'),
    QuantizerConfigScope(place='datalane', default_q_type='kif', overflow_mode='WRAP'),
    # Layer configuration scope for enabling EBOPs and setting the beta0 value
    LayerConfigScope(enable_ebops=True, beta0=1e-5),
):
    model = keras.Sequential([
        QConv2D(32, (3, 3), activation='relu'),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        QDense(10),
    ])
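The result is a regular Keras model, so it compiles and trains as usual. A minimal sketch with hypothetical random data (the input shape and labels are placeholders, not part of the example above):

import numpy as np

x = np.random.rand(128, 28, 28, 1).astype('float32')  # e.g. 28x28 grayscale images
y = np.random.randint(0, 10, size=(128,))             # 10-class integer labels

model.compile(
    optimizer='adam',
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
model.fit(x, y, epochs=1, batch_size=32)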
Index
API Reference:
- hgq package
- hgq.config package
- hgq.constraints package
- hgq.layers package
- Subpackages
- Submodules
- hgq.layers.activation module
- hgq.layers.batch_normalization module
- hgq.layers.conv module
- hgq.layers.einsum_dense_batchnorm module
- hgq.layers.linformer_attention module
- hgq.layers.multi_head_attention module
- hgq.layers.pooling module
- hgq.layers.softmax module
- Module contents
- QAdd
- QAveragePooling1D
- QAveragePooling2D
- QAveragePooling3D
- QAveragePow2
- QAvgPool1D
- QAvgPool2D
- QAvgPool3D
- QBatchNormDense
- QBatchNormalization
- QConv1D
- QConv2D
- QConv3D
- QDense
- QDot
- QEinsum
- QEinsumDense
- QEinsumDenseBatchnorm
- QGlobalAveragePooling1D
- QGlobalAveragePooling2D
- QGlobalAveragePooling3D
- QGlobalAvgPool1D
- QGlobalAvgPool2D
- QGlobalAvgPool3D
- QGlobalMaxPool1D
- QGlobalMaxPool2D
- QGlobalMaxPool3D
- QGlobalMaxPooling1D
- QGlobalMaxPooling2D
- QGlobalMaxPooling3D
- QLinformerAttention
- QMaxPool1D
- QMaxPool2D
- QMaxPool3D
- QMaxPooling1D
- QMaxPooling2D
- QMaxPooling3D
- QMaximum
- QMeanPow2
- QMinimum
- QMultiHeadAttention
- QMultiply
- QSoftmax
- QSubtract
- QSum
- QUnaryFunctionLUT
- Quantizer
- hgq.quantizer package
- hgq.regularizers package
- hgq.utils package
- qkeras package