High Granularity Quantization 2
HGQ2 (High Granularity Quantization 2) is a quantization-aware training framework built on Keras v3, targeting real-time deep learning applications on edge devices like FPGAs. It provides a comprehensive set of tools for creating and training quantized neural networks with minimal effort.
HGQ2 implements a gradient-based algorithm for automatic bitwidth optimization and quantization-aware training. By leveraging gradients, it allows bitwidth optimization at arbitrary granularity, up to the per-weight and per-activation level.
Key Features
- High Granularity: HGQ2 supports per-weight and per-activation bitwidth optimization, or any coarser granularity.
- Automatic Quantization: Bitwidths are optimized via gradients; in general there is no need to tune them manually (a minimal sketch follows this list).
- What you see is what you get: What you get from the Keras model is exactly what you get from the RTL model, still subject to machine float precision limitations.
- Accurate Resource Estimation: The EBOPs metric estimated by HGQ2 gives a good indication of actual resource usage on FPGA, as an upper limit on either LUT (da4ml) or LUT + 55 * DSP (hls4ml).
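HGQ2's actual quantizers are more involved, but the underlying idea of gradient-trained bitwidths can be illustrated in a few lines. The following is a minimal, self-contained Keras 3 sketch, not HGQ2's API: all names here are hypothetical. A per-element fractional bitwidth `f` is a trainable weight, and a straight-through estimator (STE) lets gradients flow to both the data and the bitwidth.

```python
import keras
from keras import ops


def ste_round(x):
    # Round in the forward pass; identity gradient in the backward pass.
    return x + ops.stop_gradient(ops.round(x) - x)


class PerElementQuantizer(keras.layers.Layer):
    """Hypothetical sketch: each element gets its own learnable fractional
    bitwidth. Gradients reach `f` through the differentiable scale factor."""

    def build(self, input_shape):
        # One trainable fractional bitwidth per element (high granularity).
        self.f = self.add_weight(
            shape=input_shape[1:],
            initializer=keras.initializers.Constant(4.0),
        )

    def call(self, x):
        scale = ops.power(2.0, ste_round(self.f))  # quantization step is 2**-f
        return ste_round(x * scale) / scale
```

In HGQ2 itself, the trainable quantizer parameters and their surrogate gradients are handled by the quantizer classes configured through QuantizerConfigScope, as shown in the example below.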
In addition, this framework improves upon the old HGQ implementation in the following aspects:
- Scalability: HGQ2 supports the TensorFlow, JAX, and PyTorch backends. Since XLA compilation in JAX and TensorFlow can significantly speed up training, HGQ2 can train 1.2-5 times faster than the previous implementation.
- Quantizers:
  - Fixed-point: While the previous implementation optimized only the number of fractional bits, with a single way of parameterizing fixed-point numbers, HGQ2 supports multiple parametrizations and allows any part of them to be optimized via gradients (see the sketch after this list).
  - Minifloat: Training with minifloat quantization is supported, also with surrogate gradient support (alpha quality).
- More Layers: More layers are supported now, including the powerful EinsumDense (and EinsumDenseBatchnorm) layer and the MultiHeadAttention layer with bit-accurate softmax and scaled dot-product attention (a hedged usage sketch follows below).
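To make "multiple parametrizations" concrete, here is a plain-NumPy sketch of the same fixed-point grid described two ways. The bit and overflow conventions below (k = sign, i = integer bits, f = fractional bits, b = i + f) are illustrative assumptions, not HGQ2's exact definitions; check the hgq.quantizer docs for the real ones.

```python
import numpy as np


def quantize_kif(x, k, i, f, overflow='WRAP'):
    """Quantize x onto a fixed-point grid with i integer and f fractional
    bits, signed if k. Conventions here are assumptions for illustration."""
    step = 2.0 ** -f
    q = np.round(x / step) * step
    lo = -(2.0 ** i) if k else 0.0
    hi = 2.0 ** i - step
    if overflow.startswith('SAT'):
        # SAT clips to the representable range; SAT_SYM clips symmetrically.
        lo_clip = -hi if (overflow == 'SAT_SYM' and k) else lo
        return np.clip(q, lo_clip, hi)
    # WRAP: out-of-range values wrap around, like dropped MSBs in hardware.
    span = hi - lo + step
    return (q - lo) % span + lo


def quantize_kbi(x, k, b, i, overflow='WRAP'):
    """Same grid, parameterized by total bits b instead of fractional bits."""
    return quantize_kif(x, k, i, b - i, overflow)
```

Because the two functions describe the same grid through different parameters, gradient-based optimization can target whichever component ('kbi' or 'kif') the chosen parametrization exposes.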
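For the attention layer, here is a hedged usage sketch. It assumes QMultiHeadAttention mirrors the constructor and call signature of keras.layers.MultiHeadAttention; that mirroring is an assumption, not confirmed by the text above.

```python
import keras
from hgq.layers import QMultiHeadAttention  # listed in the index below

# Assumption: constructor and call signature mirror keras.layers.MultiHeadAttention.
inputs = keras.Input(shape=(16, 32))              # (sequence length, features)
attn = QMultiHeadAttention(num_heads=4, key_dim=8)
outputs = attn(inputs, inputs)                    # self-attention: query = value
model = keras.Model(inputs, outputs)
```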
Minimal usage example:

```python
import keras

from hgq.layers import QDense, QConv2D
from hgq.config import LayerConfigScope, QuantizerConfigScope

# Set up the quantization configuration.
# These values are the defaults, shown here just for demonstration purposes.
with (
    # Default quantization type and overflow mode for all quantizers.
    QuantizerConfigScope(place='all', default_q_type='kbi', overflow_mode='SAT_SYM'),
    # The second scope overrides the first one for the 'datalane' place.
    QuantizerConfigScope(place='datalane', default_q_type='kif', overflow_mode='WRAP'),
    # Enable EBOPs estimation and set the beta0 value.
    LayerConfigScope(enable_ebops=True, beta0=1e-5),
):
    model = keras.Sequential([
        QConv2D(32, (3, 3), activation='relu'),
        keras.layers.MaxPooling2D((2, 2)),
        keras.layers.Flatten(),
        QDense(10),
    ])
```
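The resulting model trains like any other Keras model; as the configuration names above suggest, enable_ebops=True makes the estimated EBOPs enter training as a resource penalty with initial weight beta0. Below is a minimal training sketch with placeholder data; the input shape, optimizer, and hyperparameters are illustrative assumptions, not values from the docs.

```python
import numpy as np

# Placeholder data, purely for illustration (e.g., 28x28 grayscale images).
x = np.random.rand(256, 28, 28, 1).astype('float32')
y = np.random.randint(0, 10, size=256)

model.build((None, 28, 28, 1))
model.compile(
    optimizer='adam',
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=['accuracy'],
)
model.fit(x, y, epochs=1, batch_size=32)
```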
Index
API Reference:
- hgq package
- hgq.config package
- hgq.constraints package
- hgq.layers package
  - Subpackages
  - Submodules
    - hgq.layers.activation module
    - hgq.layers.batch_normalization module
    - hgq.layers.conv module
    - hgq.layers.einsum_dense_batchnorm module
    - hgq.layers.linformer_attention module
    - hgq.layers.multi_head_attention module
    - hgq.layers.pooling module
    - hgq.layers.softmax module
  - Module contents:
    QAdd, QAveragePooling1D, QAveragePooling2D, QAveragePooling3D, QAveragePow2, QAvgPool1D, QAvgPool2D, QAvgPool3D, QBatchNormDense, QBatchNormalization, QConv1D, QConv2D, QConv3D, QDense, QDenseT, QDot, QEinsum, QEinsumDense, QEinsumDenseBatchnorm, QGRU, QGlobalAveragePooling1D, QGlobalAveragePooling2D, QGlobalAveragePooling3D, QGlobalAvgPool1D, QGlobalAvgPool2D, QGlobalAvgPool3D, QGlobalMaxPool1D, QGlobalMaxPool2D, QGlobalMaxPool3D, QGlobalMaxPooling1D, QGlobalMaxPooling2D, QGlobalMaxPooling3D, QLinformerAttention, QMaxPool1D, QMaxPool2D, QMaxPool3D, QMaxPooling1D, QMaxPooling2D, QMaxPooling3D, QMaximum, QMeanPow2, QMinimum, QMultiHeadAttention, QMultiply, QSimpleRNN, QSoftmax, QSubtract, QSum, QUnaryFunctionLUT, Quantizer
- hgq.quantizer package
- hgq.regularizers package
- hgq.utils package
- qkeras package