da4ml.cmvm package
Subpackages
Submodules
da4ml.cmvm.api module
- da4ml.cmvm.api.jit_solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1) → Pipeline
Optimized implementation of a CMVM computation with cascaded two matrices.
- Parameters:
kernel (np.ndarray) – The input kernel matrix to be implemented.
method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc].
method1 (str, optional) – Optimization method for the second stage. When ‘auto’, it will select based on hard_dc and method0, by default ‘auto’
hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)
decompose_dc (int, optional) – Decomposition depth constraint, by default -2 per the signature; -1 means no constraint (follows hard_dc)
qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)
latencies (list[float] | None, optional) – List of input latencies, by default None (0. for all inputs)
adder_size (int, optional) – Size of the adder unit for latency computation, by default -1 (fixed cost for each addition)
carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)
- Returns:
A solution containing the optimized implementation of the CMVM computation with cascaded stages.
- Return type:
Pipeline
- da4ml.cmvm.api.minimal_latency(kernel: ndarray, qintervals: list[QInterval], latencies: list[float], carry_size: int = -1, adder_size: int = -1)
Fast latency calculation for a given kernel, QIntervals, and input latencies. When carry_size=-1 and every input has the same latency l, the result equals l + max(ceiling(log2(max(#CSD bits for each column, 1)))).
- Parameters:
kernel (np.ndarray) – The input kernel matrix.
qintervals (list[QInterval]) – List of QIntervals for each input.
latencies (list[float]) – List of latencies for each input
carry_size (int, optional) – The size of the carry unit for latency computation, by default -1 (fixed latency for each addition operation)
adder_size (int, optional) – The size of the adder unit for latency computation, by default -1 (fixed cost for each addition operation)
- Returns:
The minimal latency for the given kernel, QInterval, and input latencies.
- Return type:
float
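The closed-form case described for minimal_latency can be worked by hand. The following is a minimal sketch, assuming an integer-valued kernel, a constant input latency l, carry_size=-1, and that "#CSD bits" means the number of nonzero canonical-signed-digit digits summed over the entries of one column. It ignores any effect of the QIntervals, and the helper names are hypothetical, not part of da4ml.

```python
import math


def csd_nonzero_digits(n: int) -> int:
    """Count nonzero digits in the canonical signed digit form of an integer."""
    n = abs(n)
    count = 0
    while n:
        if n & 1:
            # Choose digit +1 or -1 so that the remainder is divisible by 4.
            n -= 2 - (n & 3)
            count += 1
        n >>= 1
    return count


def closed_form_latency(kernel, l=0.0):
    """l + max over columns of ceil(log2(max(#CSD digits in the column, 1)))."""
    depths = []
    for j in range(len(kernel[0])):
        digits = sum(csd_nonzero_digits(row[j]) for row in kernel)
        depths.append(math.ceil(math.log2(max(digits, 1))))
    return l + max(depths)


# Column 0 holds 3, 5, 7 (two CSD digits each, six in total);
# column 1 holds 1, 0, -2 (two digits in total).
kernel = [[3, 1], [5, 0], [7, -2]]
latency = closed_form_latency(kernel, l=0.0)
```

Six terms need an adder tree of depth 3, so the first column determines the result here.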
- da4ml.cmvm.api.solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1, search_all_decompose_dc: bool = True) → Pipeline
Solve the CMVM problem with cascaded two matrices.
- Parameters:
kernel (np.ndarray) – The input kernel matrix to be implemented.
method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc].
method1 (str, optional) – Optimization method for the second stage. When ‘auto’, it will select based on hard_dc and method0, by default ‘auto’
hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)
decompose_dc (int, optional) – Decomposition depth constraint, by default -2 per the signature; -1 means no constraint (follows hard_dc)
qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)
latencies (list[float] | None, optional) – List of input latencies, by default None (0. for all inputs)
adder_size (int, optional) – Size of the adder unit for latency computation, by default -1 (fixed cost for each addition)
carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)
search_all_decompose_dc (bool, optional) – If True, search for all possible decomposition depth constraints. If False, use the provided decompose_dc value. Default is True.
- Returns:
A solution containing the optimized implementation of the CMVM computation with cascaded stages.
- Return type:
Pipeline
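A call to solve might be wrapped as follows. This is a sketch only: the default values are copied from the signature above, while SOLVE_DEFAULTS and solve_with are hypothetical names introduced for illustration. The da4ml import is deferred so that the option check works even where the package is not installed.

```python
# Defaults copied from the documented solve() signature.
SOLVE_DEFAULTS = {
    "method0": "wmc",
    "method1": "auto",
    "hard_dc": -1,
    "decompose_dc": -2,
    "qintervals": None,
    "latencies": None,
    "adder_size": -1,
    "carry_size": -1,
    "search_all_decompose_dc": True,
}


def solve_with(kernel, **overrides):
    """Call solve() with the documented defaults plus explicit overrides."""
    unknown = set(overrides) - set(SOLVE_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown solve() options: {sorted(unknown)}")
    # Deferred import: requires da4ml to be installed at call time.
    from da4ml.cmvm.api import solve

    return solve(kernel, **{**SOLVE_DEFAULTS, **overrides})


# Intended use, with kernel being an np.ndarray:
#   pipeline = solve_with(kernel, hard_dc=2)
```

Rejecting unknown keys before the call catches mistakes such as passing inp_latencies where the signature expects latencies.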
- class da4ml.cmvm.api.solver_options_t
Bases:
TypedDict
- adder_size: int
- carry_size: int
- decompose_dc: int
- hard_dc: int
- method0: str
- method1: str
- offload_fn: None | Callable[[ndarray, FixedVariableArray], ndarray]
Callable taking in (constant_matrix, fixed_variable_array) and returning a boolean mask of which weights to offload to multiplication operations.
- search_all_decompose_dc: bool
da4ml.cmvm.types module
- class da4ml.cmvm.types.CombLogic(shape: tuple[int, int], inp_shifts: list[int], out_idxs: list[int], out_shifts: list[int], out_negs: list[bool], ops: list[Op], carry_size: int, adder_size: int, lookup_tables: tuple[LookupTable, ...] | None = None)
Bases:
NamedTuple
A combinational logic that describes a series of operations on input data to produce output data.
- shape
#input, #output
- Type:
tuple[int, int]
- inp_shifts
The shifts that should be applied to the input data.
- Type:
list[int]
- out_idxs
The indices of the output data in the buffer.
- Type:
list[int]
- out_shifts
The shifts that should be applied to the output data.
- Type:
list[int]
- out_negs
The signs of the output data.
- Type:
list[bool]
- carry_size
Size of the carry unit for the adder, used for cost and latency estimation.
- Type:
int
- adder_size
Elementary size of the adder, used for cost and latency estimation.
- Type:
int
- lookup_tables
Lookup table arrays for lookup operations, if any.
- Type:
tuple[LookupTable, …] | None
The core part of the comb logic is the operations in the ops list. For the exact operations executed with Op, refer to the Op class. After all operations are executed, the output data is read from data[op.out_idx] and multiplied by 2**out_shift.
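The output stage described above can be sketched in a few lines. This assumes that the buffer already holds the results of all operations, that each output is read at the index given in out_idxs and scaled by 2**out_shift, and that a True entry in out_negs means the output is negated. The function name read_outputs is hypothetical and not part of da4ml.

```python
def read_outputs(buffer, out_idxs, out_shifts, out_negs):
    """Read outputs from the buffer, apply the shifts, then the signs."""
    outputs = []
    for idx, shift, neg in zip(out_idxs, out_shifts, out_negs):
        value = buffer[idx] * 2.0**shift
        # Assumption: True in out_negs flips the sign of the output.
        outputs.append(-value if neg else value)
    return outputs


buffer = [1.0, 3.0, 5.0]
outputs = read_outputs(buffer, out_idxs=[2, 0], out_shifts=[1, -1], out_negs=[False, True])
```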
- adder_size: int
Alias for field number 7
- carry_size: int
Alias for field number 6
- property cost
Total cost of the solution.
- classmethod deserialize(data: list)
Load the solution from serialized data.
- property inp_kifs
KIFs of all input elements of the solution.
- property inp_latency
Latencies of all input elements of the solution.
- property inp_qint
Quantization intervals of the input elements.
- inp_shifts: list[int]
Alias for field number 1
- property kernel
The kernel represented by the solution, when applicable.
- property latency
Minimum and maximum latency of the solution.
- classmethod load(path: str | Path)
Load the solution from a file.
- lookup_tables: tuple[LookupTable, ...] | None
Alias for field number 8
- out_idxs: list[int]
Alias for field number 2
- property out_kifs
KIFs of all output elements of the solution.
- property out_latency
Latencies of all output elements of the solution.
- out_negs: list[bool]
Alias for field number 4
- property out_qint
Quantization intervals of the output elements.
- out_shifts: list[int]
Alias for field number 3
- predict(data: ndarray[tuple[Any, ...], dtype[_ScalarT]] | Sequence[ndarray[tuple[Any, ...], dtype[_ScalarT]]], n_threads: int = 0) → ndarray[tuple[Any, ...], dtype[float64]]
Predict the output of the solution for a batch of input data with the C++-backed DAIS interpreter. Cannot be used if the binary interpreter is not installed.
- Parameters:
data (NDArray|Sequence[NDArray]) – Input data to the model. The shape is ignored, and the number of samples is determined by the size of the data.
n_threads (int) – Number of threads to use for prediction. Negative or zero values will use maximum available threads, or the value of the DA_DEFAULT_THREADS environment variable if set. Default is 0. If OpenMP is not supported, this parameter is ignored.
- Returns:
Output of the model in shape (n_samples, output_size).
- Return type:
NDArray[np.float64]
- property ref_count: ndarray
The number of references to the output elements in the solution.
- save(path: str | Path)
Save the solution to a file.
- save_binary(path: str | Path, version: int = 0)
Dump the solution to a binary file.
- shape: tuple[int, int]
Alias for field number 0
- to_binary(version: int = 0) → ndarray[tuple[Any, ...], dtype[int32]]
- class da4ml.cmvm.types.DAState(shifts: tuple[ndarray[tuple[Any, ...], dtype[int8]], ndarray[tuple[Any, ...], dtype[int8]]], expr: list[ndarray[tuple[Any, ...], dtype[int8]]], ops: list[Op], freq_stat: dict[Pair, int], kernel: ndarray[tuple[Any, ...], dtype[float32]])
Bases:
NamedTuple
Internal state of the DA algorithm.
- expr: list[ndarray[tuple[Any, ...], dtype[int8]]]
Alias for field number 1
- kernel: ndarray[tuple[Any, ...], dtype[float32]]
Alias for field number 4
- shifts: tuple[ndarray[tuple[Any, ...], dtype[int8]], ndarray[tuple[Any, ...], dtype[int8]]]
Alias for field number 0
- class da4ml.cmvm.types.JSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
Bases:
JSONEncoder
- default(o)
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return super().default(o)
- class da4ml.cmvm.types.Op(id0: int, id1: int, opcode: int, data: int, qint: QInterval, latency: float, cost: float)
Bases:
NamedTuple
One single operation on the data buffer.
- Parameters:
id0 (int) – index of the first operand
id1 (int) – index of the second operand, or special opcode if negative
opcode (int) – 0: addition, 1: subtraction, 2: relu, 3: quantize, 4: const addition
data (int) – Data to be used in the operation
qint (QInterval) – Quantization interval of the resultant buffer
latency (float) – Latency of the data generated by this operation (t_available)
cost (float) – Cost of the operation
- cost: float
Alias for field number 6
- data: int
Alias for field number 3
- id0: int
Alias for field number 0
- id1: int
Alias for field number 1
- latency: float
Alias for field number 5
- opcode: int
Alias for field number 2
- class da4ml.cmvm.types.Pair(id0: int, id1: int, sub: bool, shift: int)
Bases:
NamedTuple
An operation representing data[id0] +/- data[id1] * 2**shift.
- id0: int
Alias for field number 0
- id1: int
Alias for field number 1
- shift: int
Alias for field number 3
- sub: bool
Alias for field number 2
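The expression data[id0] +/- data[id1] * 2**shift can be evaluated directly. The sketch below uses a local stand-in, PairSketch, that mirrors the documented fields so the example runs without da4ml. Both PairSketch and apply_pair are hypothetical names, and the sketch assumes sub=True selects the minus sign.

```python
from typing import NamedTuple


class PairSketch(NamedTuple):
    """Local stand-in mirroring the documented fields of Pair."""

    id0: int
    id1: int
    sub: bool
    shift: int


def apply_pair(data, pair):
    """Evaluate data[id0] +/- data[id1] * 2**shift."""
    term = data[pair.id1] * 2.0**pair.shift
    # Assumption: sub=True selects subtraction.
    return data[pair.id0] - term if pair.sub else data[pair.id0] + term


data = [3.0, 5.0]
added = apply_pair(data, PairSketch(id0=0, id1=1, sub=False, shift=1))
subtracted = apply_pair(data, PairSketch(id0=0, id1=1, sub=True, shift=-1))
```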
- class da4ml.cmvm.types.Pipeline(solutions: tuple[CombLogic, ...])
Bases:
NamedTuple
A pipeline with II=1, with each stage represented by a CombLogic.
- solutions
A tuple containing the individual CombLogic objects for each stage of the cascade.
- Type:
tuple[CombLogic, …]
- kernel
Only useful when the pipeline describes a linear operation. The overall kernel matrix which the cascaded solution implements: vec @ kernel = solution(vec). This is calculated as the matrix product of all individual solution kernels.
- Type:
NDArray[float32]
- cost
The total cost of the cascaded solution, computed as the sum of the costs of all stages.
- Type:
float
- latency
The minimum and maximum latency of the pipeline, determined by the last stage.
- Type:
tuple[float, float]
- inp_latency
Input latencies
- Type:
list[float]
- inp_shifts
Input shifts
- Type:
list[int]
- out_latencies
Output latencies
- Type:
list[float]
- out_shift
Output shifts
- Type:
list[int]
- out_neg
Output signs
- Type:
list[bool]
- shape
The shape of the corresponding kernel matrix.
- Type:
tuple[int, int]
- property cost
- classmethod deserialize(data: dict)
Load the solution from serialized data.
- property inp_latency
- property inp_qint
- property inp_shifts
- property kernel
- property latency
- classmethod load(path: str)
Load the solution from a file.
- property out_latencies
- property out_neg
- property out_qint
- property out_shift
- property reg_bits
The number of bits used for the register in the solution.
- save(path: str | Path)
Save the solution to a file.
- property shape
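The relation between the stage kernels and the overall kernel property can be checked with a small example. This sketch uses plain nested lists and a hypothetical matmul helper rather than da4ml objects; it assumes a two-stage linear pipeline and only illustrates that vec @ kernel equals the result of passing vec through the stages in order.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


# Hypothetical per-stage kernels: a 2x3 stage followed by a 3x1 stage.
stage_kernels = [
    [[1, 2, 0], [0, 1, 1]],
    [[1], [2], [3]],
]

# Overall kernel: the matrix product of all stage kernels, in order.
overall = stage_kernels[0]
for stage in stage_kernels[1:]:
    overall = matmul(overall, stage)

# Pushing a vector through the stages one by one gives the same answer.
vec = [[4, 5]]
staged = vec
for stage in stage_kernels:
    staged = matmul(staged, stage)
direct = matmul(vec, overall)
```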
- class da4ml.cmvm.types.Precision(keep_negative: bool, integers: int, fractional: int)
Bases:
NamedTuple
A class representing the precision of a quantized interval.
- fractional: int
Alias for field number 2
- integers: int
Alias for field number 1
- keep_negative: bool
Alias for field number 0
- class da4ml.cmvm.types.QInterval(min: float, max: float, step: float)
Bases:
NamedTuple
A class representing a quantized interval: [min, max] with a step size.
- max: float
Alias for field number 1
- min: float
Alias for field number 0
- step: float
Alias for field number 2
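A quantized interval and a precision describe the same fixed-point format from two angles. The sketch below shows one plausible mapping between them, not necessarily the one da4ml uses. It assumes the step is a power of two and that the integer bit count excludes the sign bit. QIntervalSketch, PrecisionSketch, and precision_for are hypothetical local stand-ins that mirror the documented field names.

```python
import math
from typing import NamedTuple


class QIntervalSketch(NamedTuple):
    """Local stand-in mirroring the documented fields of QInterval."""

    min: float
    max: float
    step: float


class PrecisionSketch(NamedTuple):
    """Local stand-in mirroring the documented fields of Precision."""

    keep_negative: bool
    integers: int
    fractional: int


def precision_for(qint):
    """Smallest fixed-point format covering [min, max] at the given step."""
    if qint.min == 0 and qint.max == 0:
        return PrecisionSketch(False, 0, 0)
    keep_negative = qint.min < 0
    # Assumption: step is a power of two, so log2 is exact.
    fractional = int(-math.log2(qint.step))
    lowest = round(qint.min / qint.step)
    highest = round(qint.max / qint.step)
    # Magnitude bits needed for the interval measured in units of step.
    bits = math.ceil(math.log2(max(abs(lowest), highest + 1)))
    return PrecisionSketch(keep_negative, bits - fractional, fractional)


default_inputs = precision_for(QIntervalSketch(-128, 127, 1))
unsigned_frac = precision_for(QIntervalSketch(0.0, 3.75, 0.25))
```

Under these assumptions, the default input interval [-128, 127, 1] corresponds to a sign bit plus seven integer bits, which is an 8-bit signed integer.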