da4ml.cmvm package

Subpackages

Submodules

da4ml.cmvm.api module

da4ml.cmvm.api.jit_solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1) → CascadedSolution

Optimized implementation of a CMVM computation using two cascaded matrices.

Parameters:
  • kernel (np.ndarray) – The input kernel matrix to be implemented.

  • method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc], by default ‘wmc’.

  • method1 (str, optional) – Optimization method for the second stage. If ‘auto’, the method is selected based on hard_dc and method0. By default ‘auto’.

  • hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)

  • decompose_dc (int, optional) – Decomposition depth constraint, by default -2. A value of -1 means no constraint (follows hard_dc).

  • qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)

  • latencies (list[float] | None, optional) – List of input latencies, by default None (0.0 for all inputs)

  • adder_size (int, optional) – Size of the adder unit, by default -1 (fixed cost for each addition)

  • carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)

Returns:

A solution containing the optimized implementation of the CMVM computation with cascaded stages.

Return type:

CascadedSolution
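As a minimal sketch of the documented defaults for jit_solve and solve: when qintervals or latencies is None, each input (one per kernel row, since the product is vec @ kernel) gets the interval [-128, 127] with step 1 and latency 0.0. The QInterval stand-in and the resolve_defaults helper below are hypothetical and are not part of da4ml.

```python
from typing import NamedTuple

import numpy as np


class QInterval(NamedTuple):
    """Hypothetical stand-in mirroring the fields of da4ml.cmvm.types.QInterval."""
    min: float
    max: float
    step: float


def resolve_defaults(kernel, qintervals=None, latencies=None):
    """Expand None arguments to the documented per-input defaults (hypothetical helper)."""
    n_in = np.asarray(kernel).shape[0]  # one row per input, since vec @ kernel
    if qintervals is None:
        # [-128, 127] with step 1, i.e. signed 8-bit integers
        qintervals = [QInterval(-128.0, 127.0, 1.0)] * n_in
    if latencies is None:
        latencies = [0.0] * n_in  # all inputs available at time 0
    return qintervals, latencies


qints, lats = resolve_defaults(np.zeros((3, 2)))
```

Explicitly supplied lists are passed through unchanged; only None is expanded.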

da4ml.cmvm.api.minimal_latency(kernel: ndarray, qintervals: list[QInterval], latencies: list[float], carry_size: int = -1, adder_size: int = -1)

Fast latency calculation for a given kernel, input QIntervals, and input latencies. When carry_size=-1 and every input has the same latency l, the result equals l + max_j ceil(log2(max(n_j, 1))), where n_j is the number of nonzero CSD digits in column j of the kernel.

Parameters:
  • kernel (np.ndarray) – The input kernel matrix.

  • qintervals (list[QInterval]) – List of QIntervals for each input.

  • latencies (list[float]) – List of latencies for each input

  • carry_size (int, optional) – The size of the carry unit for latency computation, by default -1 (fixed latency for each addition operation)

  • adder_size (int, optional) – The size of the adder unit for latency computation, by default -1 (fixed cost for each addition operation)

Returns:

The minimal latency for the given kernel, QInterval, and input latencies.

Return type:

float
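The closed-form bound quoted above can be checked by hand. The sketch below is a hypothetical re-implementation of that formula only, not da4ml's minimal_latency: it assumes an integer-valued kernel, a common input latency l, and the carry_size=-1 case, and it ignores qintervals. CSD digits are counted with the standard non-adjacent-form recurrence.

```python
import math

import numpy as np


def csd_nonzero_count(n: int) -> int:
    """Count nonzero digits in the canonical signed-digit (CSD/NAF) form of integer n."""
    count = 0
    while n != 0:
        if n & 1:
            # Choose digit +1 or -1 so the remainder is divisible by 4,
            # which forces the next digit to be 0 (no adjacent nonzeros).
            n -= 2 - (n & 3)
            count += 1
        n >>= 1
    return count


def latency_lower_bound(kernel, l: float = 0.0) -> float:
    """l + max over columns of ceil(log2(max(#CSD digits in column, 1))).

    Assumes an integer-valued kernel, the same latency l for every input,
    and one unit of latency per addition (the carry_size=-1 case).
    """
    k = np.asarray(kernel)
    if not np.array_equal(k, np.round(k)):
        raise ValueError("this sketch only handles integer-valued kernels")
    depths = []
    for col in k.T:  # each column produces one output (vec @ kernel)
        n_terms = sum(csd_nonzero_count(int(v)) for v in col)
        depths.append(math.ceil(math.log2(max(n_terms, 1))))
    return l + max(depths)
```

For example, latency_lower_bound([[3, 1], [5, 0]]) gives 2.0: column 0 needs 2 + 2 = 4 CSD terms (3 = 4 - 1, 5 = 4 + 1), hence ceil(log2(4)) = 2 adder levels.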

da4ml.cmvm.api.solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1, search_all_decompose_dc: bool = True) → CascadedSolution

Solve the CMVM problem with two cascaded matrices.

Parameters:
  • kernel (np.ndarray) – The input kernel matrix to be implemented.

  • method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc], by default ‘wmc’.

  • method1 (str, optional) – Optimization method for the second stage. If ‘auto’, the method is selected based on hard_dc and method0. By default ‘auto’.

  • hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)

  • decompose_dc (int, optional) – Decomposition depth constraint, by default -2. A value of -1 means no constraint (follows hard_dc).

  • qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)

  • latencies (list[float] | None, optional) – List of input latencies, by default None (0.0 for all inputs)

  • adder_size (int, optional) – Size of the adder unit, by default -1 (fixed cost for each addition)

  • carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)

  • search_all_decompose_dc (bool, optional) – If True, search for all possible decomposition depth constraints. If False, use the provided decompose_dc value. Default is True.

Returns:

A solution containing the optimized implementation of the CMVM computation with cascaded stages.

Return type:

CascadedSolution

da4ml.cmvm.types module

class da4ml.cmvm.types.CascadedSolution(solutions: tuple[Solution, ...])

Bases: NamedTuple

A solution that implements cascaded matrix-vector multiplications through multiple CMVM stages.

CascadedSolution represents a sequence of Solution objects where the output of each stage is fed as input to the next stage.

solutions

A tuple containing the individual Solution objects for each stage of the cascade.

Type:

tuple[Solution, …]

Properties
----------
kernel

The overall kernel matrix which the cascaded solution implements: vec @ kernel = solution(vec). This is calculated as the matrix product of all individual solution kernels.

Type:

NDArray[float32]

cost

The total cost of the cascaded solution, computed as the sum of the costs of all stages.

Type:

float

latency

The minimum and maximum latency of the cascaded solution.

Type:

tuple[float, float]

inp_qint

Input quantization intervals

Type:

list[QInterval]

inp_latency

Input latencies

Type:

list[float]

inp_shift

Input shifts

Type:

list[int]

out_qint

Output quantization intervals

Type:

list[QInterval]

out_latencies

Output latencies

Type:

list[float]

out_shift

Output shifts

Type:

list[int]

out_neg

Output signs

Type:

list[bool]

shape

The shape of the corresponding kernel matrix.

Type:

tuple[int, int]
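The kernel and cost properties follow from feeding each stage's output into the next. The NumPy check below uses plain matrices as hypothetical stand-ins for the stage Solution objects, with made-up per-stage costs, to illustrate that applying the stages in order matches vec @ kernel when kernel is the product of the stage kernels.

```python
from functools import reduce

import numpy as np

rng = np.random.default_rng(0)
# Hypothetical two-stage cascade: 4 inputs -> 3 intermediates -> 2 outputs.
stage_kernels = [
    rng.integers(-4, 5, size=(4, 3)).astype(np.float32),
    rng.integers(-4, 5, size=(3, 2)).astype(np.float32),
]
stage_costs = [5.0, 3.0]  # made-up per-stage costs

# Overall kernel: matrix product of the individual stage kernels.
kernel = reduce(np.matmul, stage_kernels)
# Total cost: sum of the per-stage costs.
cost = sum(stage_costs)

vec = rng.integers(-8, 8, size=4).astype(np.float32)
staged = vec
for k in stage_kernels:  # each stage's output is the next stage's input
    staged = staged @ k
```

Small integer entries keep the float32 arithmetic exact, so staged and vec @ kernel agree element for element.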

property cost
classmethod deserialize(data: dict)

Load the solution from a serialized dictionary.

property inp_latency
property inp_qint
property inp_shift
property kernel
property latency
classmethod load(path: str)

Load the solution from a file.

property out_latencies
property out_neg
property out_qint
property out_shift
property reg_bits

The number of bits used for the register in the solution.

save(path: str | Path)

Save the solution to a file.

property shape
solutions: tuple[Solution, ...]

Alias for field number 0

class da4ml.cmvm.types.DAState(shifts: tuple[ndarray[tuple[int, ...], dtype[int8]], ndarray[tuple[int, ...], dtype[int8]]], expr: list[ndarray[tuple[int, ...], dtype[int8]]], ops: list[Op], freq_stat: dict[Pair, int], kernel: ndarray[tuple[int, ...], dtype[float32]])

Bases: NamedTuple

Internal state of the DA algorithm.

expr: list[ndarray[tuple[int, ...], dtype[int8]]]

Alias for field number 1

freq_stat: dict[Pair, int]

Alias for field number 3

kernel: ndarray[tuple[int, ...], dtype[float32]]

Alias for field number 4

ops: list[Op]

Alias for field number 2

shifts: tuple[ndarray[tuple[int, ...], dtype[int8]], ndarray[tuple[int, ...], dtype[int8]]]

Alias for field number 0

class da4ml.cmvm.types.Op(id0: int, id1: int, opcode: int, data: int, qint: QInterval, latency: float, cost: float)

Bases: NamedTuple

A single operation on the data buffer.

Parameters:
  • id0 (int) – index of the first operand

  • id1 (int) – index of the second operand, or special opcode if negative

  • opcode (int) – 0: addition, 1: subtraction, 2: relu, 3: quantize, 4: const addition

  • data (int) – Data to be used in the operation

  • qint (QInterval) – Quantization interval of the resultant buffer

  • latency (float) – Latency of the data generated by this operation (t_available)

  • cost (float) – Cost of the operation

cost: float

Alias for field number 6

data: int

Alias for field number 3

id0: int

Alias for field number 0

id1: int

Alias for field number 1

latency: float

Alias for field number 5

opcode: int

Alias for field number 2

qint: QInterval

Alias for field number 4
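When inspecting a solution's ops, the documented opcode values can be mapped to readable names. The OpCode enum and opcode_name helper below are hypothetical conveniences that encode only the mapping listed above; values outside it (for example negative ones) are reported as unknown rather than guessed.

```python
from enum import IntEnum


class OpCode(IntEnum):
    """Hypothetical names for the documented Op.opcode values."""
    ADD = 0        # addition
    SUB = 1        # subtraction
    RELU = 2       # relu
    QUANTIZE = 3   # quantize
    CONST_ADD = 4  # const addition


def opcode_name(opcode: int) -> str:
    """Return a readable name, or 'UNKNOWN(<n>)' for undocumented values."""
    try:
        return OpCode(opcode).name
    except ValueError:
        return f"UNKNOWN({opcode})"
```

For example, [opcode_name(op.opcode) for op in sol.ops] would list the operation kinds of a solution sol.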

class da4ml.cmvm.types.Pair(id0: int, id1: int, sub: bool, shift: int)

Bases: NamedTuple

An operation representing data[id0] +/- data[id1] * 2**shift.

id0: int

Alias for field number 0

id1: int

Alias for field number 1

shift: int

Alias for field number 3

sub: bool

Alias for field number 2
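A Pair is a single shift-and-add term. The sketch below evaluates one against a plain list standing in for the data buffer; the Pair class mirrors the documented fields, and eval_pair is a hypothetical helper. A negative shift scales the second operand down.

```python
from typing import NamedTuple


class Pair(NamedTuple):
    """Stand-in mirroring the documented fields of da4ml.cmvm.types.Pair."""
    id0: int
    id1: int
    sub: bool
    shift: int


def eval_pair(data: list[float], p: Pair) -> float:
    """Compute data[id0] +/- data[id1] * 2**shift (hypothetical helper)."""
    term = data[p.id1] * 2.0**p.shift
    return data[p.id0] - term if p.sub else data[p.id0] + term


buf = [3.0, 5.0]
print(eval_pair(buf, Pair(0, 1, False, 2)))  # 3 + 5*4 = 23.0
print(eval_pair(buf, Pair(0, 1, True, -1)))  # 3 - 5/2 = 0.5
```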

class da4ml.cmvm.types.Precision(keep_negative: bool, integers: int, fractional: int)

Bases: NamedTuple

A class representing the precision of a quantized interval.

fractional: int

Alias for field number 2

classmethod from_qint(qint: QInterval, symmetric: bool = False)
integers: int

Alias for field number 1

keep_negative: bool

Alias for field number 0

property qint
class da4ml.cmvm.types.QInterval(min: float, max: float, step: float)

Bases: NamedTuple

A class representing a quantized interval: [min, max] with a step size.

classmethod from_kif(k: int | bool, i: int, f: int)
classmethod from_precision(prec: Precision)
max: float

Alias for field number 1

min: float

Alias for field number 0

property precision
step: float

Alias for field number 2

class da4ml.cmvm.types.Solution(shape: tuple[int, int], inp_shift: list[int], out_idxs: list[int], out_shifts: list[int], out_negs: list[bool], ops: list[Op], carry_size: int, adder_size: int)

Bases: NamedTuple

Represents a series of operations that can be applied to a vector of data. May represent a CMVM solution or a general neural network.

shape

(number of inputs, number of outputs)

Type:

tuple[int, int]

inp_shift

The shifts that should be applied to the input data.

Type:

list[int]

out_idxs

The indices of the output data in the buffer.

Type:

list[int]

out_shifts

The shifts that should be applied to the output data.

Type:

list[int]

out_negs

The signs of the output data.

Type:

list[bool]

ops

Core list of operations for generating each buffer element.

Type:

list[Op]

carry_size

Size of the carry unit of the adder.

Type:

int

adder_size

Elementary size of the adder.

Type:

int

The core of the solution is the list of operations in ops. For the exact semantics of each operation, refer to the Op class. After all operations are executed, output i is read from data[out_idxs[i]] and multiplied by 2**out_shifts[i], with its sign given by out_negs[i].
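The readout rule can be sketched with a toy evaluator. This is not da4ml's executor, and several details are assumptions made only for illustration: the buffer starts with the inputs scaled by 2**inp_shift, only opcodes 0 (addition) and 1 (subtraction) are supported, Op.data is taken to be the shift applied to the second operand (mirroring Pair), and out_negs[i]=True is taken to mean a sign flip. ToyOp and run_toy_solution are hypothetical names; opcodes 2 to 4 and negative id1 values are not modelled.

```python
from typing import NamedTuple


class ToyOp(NamedTuple):
    """Simplified, hypothetical op carrying only the fields this sketch uses."""
    id0: int
    id1: int
    opcode: int  # 0: addition, 1: subtraction (others not modelled)
    data: int    # ASSUMPTION: shift applied to the second operand


def run_toy_solution(x, inp_shift, ops, out_idxs, out_shifts, out_negs):
    # ASSUMPTION: buffer starts with the shifted inputs; each op appends one element.
    buf = [v * 2.0**s for v, s in zip(x, inp_shift)]
    for op in ops:
        term = buf[op.id1] * 2.0**op.data
        if op.opcode == 0:
            buf.append(buf[op.id0] + term)
        elif op.opcode == 1:
            buf.append(buf[op.id0] - term)
        else:
            raise NotImplementedError(f"opcode {op.opcode} not modelled")
    # Readout: data[out_idxs[i]] * 2**out_shifts[i]; out_negs[i]=True taken as a sign flip.
    return [
        (-1.0 if neg else 1.0) * buf[idx] * 2.0**shift
        for idx, shift, neg in zip(out_idxs, out_shifts, out_negs)
    ]


# y0 = x0 + 2*x1 and y1 = -(x0 - x1) * 2
ops = [ToyOp(0, 1, 0, 1), ToyOp(0, 1, 1, 0)]
y = run_toy_solution([3.0, 5.0], [0, 0], ops,
                     out_idxs=[2, 3], out_shifts=[0, 1], out_negs=[False, True])
```

With x = [3, 5], the buffer becomes [3, 5, 13, -2], so y is [13.0, 4.0].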

adder_size: int

Alias for field number 7

carry_size: int

Alias for field number 6

property cost

Total cost of the solution.

classmethod deserialize(data: dict)

Load the solution from a serialized dictionary.

property inp_latency

Latencies of all input elements of the solution.

property inp_qint

Quantization intervals of the input elements.

inp_shift: list[int]

Alias for field number 1

property kernel

The kernel represented by the solution, when applicable.

property latency

Minimum and maximum latency of the solution.

classmethod load(path: str | Path)

Load the solution from a file.

ops: list[Op]

Alias for field number 5

out_idxs: list[int]

Alias for field number 2

property out_latency

Latencies of all output elements of the solution.

out_negs: list[bool]

Alias for field number 4

property out_qint

Quantization intervals of the output elements.

out_shifts: list[int]

Alias for field number 3

property ref_count: ndarray

The number of references to the output elements in the solution.

save(path: str | Path)

Save the solution to a file.

save_binary(path: str | Path)

Dump the solution to a binary file.

shape: tuple[int, int]

Alias for field number 0

to_binary()
da4ml.cmvm.types.minimal_kif(qi: QInterval, symmetric: bool = False) → Precision

Calculate the minimal KIF for a given QInterval.

Parameters:
  • qi (QInterval) – The QInterval to calculate the KIF for.

  • symmetric (bool) – Only relevant if qi may be negative. If True, -2**i is treated as forbidden, so the representable range is symmetric around zero. Useful only in special cases. Default is False.

Returns:

A named tuple with the KIF values.

Return type:

Precision
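QInterval.from_kif and minimal_kif convert between a (keep_negative, integers, fractional) triple and a quantized interval. The sketch below assumes the common KIF convention in which i counts integer bits excluding the sign bit, so (1, 7, 0) is the signed 8-bit range [-128, 127] with step 1 (the solvers' default interval), and it assumes power-of-two steps. The functions are hypothetical re-implementations for illustration, not da4ml's code; the library may treat edge cases such as an all-zero interval differently.

```python
import math
from typing import NamedTuple


class QInterval(NamedTuple):
    min: float
    max: float
    step: float


class Precision(NamedTuple):
    keep_negative: bool
    integers: int
    fractional: int


def qint_from_kif(k: bool, i: int, f: int) -> QInterval:
    """ASSUMED convention: range [-k * 2**i, 2**i - 2**-f] with step 2**-f."""
    step = 2.0**-f
    low = -(2.0**i) if k else 0.0
    return QInterval(low, 2.0**i - step, step)


def minimal_kif_sketch(qi: QInterval, symmetric: bool = False) -> Precision:
    """Smallest (k, i, f) whose range covers qi; assumes qi.step is a power of two."""
    f = -round(math.log2(qi.step))
    k = qi.min < 0
    i = -f  # smallest candidate: the range [0, 0]
    while True:
        fits_high = qi.max <= 2.0**i - qi.step
        # With symmetric=True, -2**i itself is treated as forbidden.
        fits_low = (not k) or (-qi.min < 2.0**i if symmetric else -qi.min <= 2.0**i)
        if fits_high and fits_low:
            return Precision(k, i, f)
        i += 1
```

Under this convention the two functions round-trip: qint_from_kif(*minimal_kif_sketch(qi)) returns qi whenever qi is exactly a KIF range.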

Module contents

class da4ml.cmvm.Op(id0: int, id1: int, opcode: int, data: int, qint: QInterval, latency: float, cost: float)

Bases: NamedTuple

A single operation on the data buffer.

Parameters:
  • id0 (int) – index of the first operand

  • id1 (int) – index of the second operand, or special opcode if negative

  • opcode (int) – 0: addition, 1: subtraction, 2: relu, 3: quantize, 4: const addition

  • data (int) – Data to be used in the operation

  • qint (QInterval) – Quantization interval of the resultant buffer

  • latency (float) – Latency of the data generated by this operation (t_available)

  • cost (float) – Cost of the operation

cost: float

Alias for field number 6

data: int

Alias for field number 3

id0: int

Alias for field number 0

id1: int

Alias for field number 1

latency: float

Alias for field number 5

opcode: int

Alias for field number 2

qint: QInterval

Alias for field number 4

class da4ml.cmvm.QInterval(min: float, max: float, step: float)

Bases: NamedTuple

A class representing a quantized interval: [min, max] with a step size.

classmethod from_kif(k: int | bool, i: int, f: int)
classmethod from_precision(prec: Precision)
max: float

Alias for field number 1

min: float

Alias for field number 0

property precision
step: float

Alias for field number 2

class da4ml.cmvm.Solution(shape: tuple[int, int], inp_shift: list[int], out_idxs: list[int], out_shifts: list[int], out_negs: list[bool], ops: list[Op], carry_size: int, adder_size: int)

Bases: NamedTuple

Represents a series of operations that can be applied to a vector of data. May represent a CMVM solution or a general neural network.

shape

(number of inputs, number of outputs)

Type:

tuple[int, int]

inp_shift

The shifts that should be applied to the input data.

Type:

list[int]

out_idxs

The indices of the output data in the buffer.

Type:

list[int]

out_shifts

The shifts that should be applied to the output data.

Type:

list[int]

out_negs

The signs of the output data.

Type:

list[bool]

ops

Core list of operations for generating each buffer element.

Type:

list[Op]

carry_size

Size of the carry unit of the adder.

Type:

int

adder_size

Elementary size of the adder.

Type:

int

The core of the solution is the list of operations in ops. For the exact semantics of each operation, refer to the Op class. After all operations are executed, output i is read from data[out_idxs[i]] and multiplied by 2**out_shifts[i], with its sign given by out_negs[i].

adder_size: int

Alias for field number 7

carry_size: int

Alias for field number 6

property cost

Total cost of the solution.

classmethod deserialize(data: dict)

Load the solution from a serialized dictionary.

property inp_latency

Latencies of all input elements of the solution.

property inp_qint

Quantization intervals of the input elements.

inp_shift: list[int]

Alias for field number 1

property kernel

The kernel represented by the solution, when applicable.

property latency

Minimum and maximum latency of the solution.

classmethod load(path: str | Path)

Load the solution from a file.

ops: list[Op]

Alias for field number 5

out_idxs: list[int]

Alias for field number 2

property out_latency

Latencies of all output elements of the solution.

out_negs: list[bool]

Alias for field number 4

property out_qint

Quantization intervals of the output elements.

out_shifts: list[int]

Alias for field number 3

property ref_count: ndarray

The number of references to the output elements in the solution.

save(path: str | Path)

Save the solution to a file.

save_binary(path: str | Path)

Dump the solution to a binary file.

shape: tuple[int, int]

Alias for field number 0

to_binary()
da4ml.cmvm.minimal_latency(kernel: ndarray, qintervals: list[QInterval], latencies: list[float], carry_size: int = -1, adder_size: int = -1)

Fast latency calculation for a given kernel, input QIntervals, and input latencies. When carry_size=-1 and every input has the same latency l, the result equals l + max_j ceil(log2(max(n_j, 1))), where n_j is the number of nonzero CSD digits in column j of the kernel.

Parameters:
  • kernel (np.ndarray) – The input kernel matrix.

  • qintervals (list[QInterval]) – List of QIntervals for each input.

  • latencies (list[float]) – List of latencies for each input

  • carry_size (int, optional) – The size of the carry unit for latency computation, by default -1 (fixed latency for each addition operation)

  • adder_size (int, optional) – The size of the adder unit for latency computation, by default -1 (fixed cost for each addition operation)

Returns:

The minimal latency for the given kernel, QInterval, and input latencies.

Return type:

float

da4ml.cmvm.solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1, search_all_decompose_dc: bool = True) → CascadedSolution

Solve the CMVM problem with two cascaded matrices.

Parameters:
  • kernel (np.ndarray) – The input kernel matrix to be implemented.

  • method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc], by default ‘wmc’.

  • method1 (str, optional) – Optimization method for the second stage. If ‘auto’, the method is selected based on hard_dc and method0. By default ‘auto’.

  • hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)

  • decompose_dc (int, optional) – Decomposition depth constraint, by default -2. A value of -1 means no constraint (follows hard_dc).

  • qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)

  • latencies (list[float] | None, optional) – List of input latencies, by default None (0.0 for all inputs)

  • adder_size (int, optional) – Size of the adder unit, by default -1 (fixed cost for each addition)

  • carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)

  • search_all_decompose_dc (bool, optional) – If True, search for all possible decomposition depth constraints. If False, use the provided decompose_dc value. Default is True.

Returns:

A solution containing the optimized implementation of the CMVM computation with cascaded stages.

Return type:

CascadedSolution