da4ml.cmvm package

Subpackages

Submodules

da4ml.cmvm.api module

da4ml.cmvm.api.jit_solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1) → Pipeline

Optimized implementation of a CMVM computation with two cascaded matrices.

Parameters:
  • kernel (np.ndarray) – The input kernel matrix to be implemented.

  • method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc].

  • method1 (str, optional) – Optimization method for the second stage. When 'auto', the method is selected based on hard_dc and method0. Default is 'auto'.

  • hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)

  • decompose_dc (int, optional) – Decomposition depth constraint. The signature default is -2; a value of -1 means no constraint (follows hard_dc).

  • qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)

  • latencies (list[float] | None, optional) – List of input latencies, by default None (0. for all inputs)

  • adder_size (int, optional) – Size of the adder unit for latency computation, by default -1 (fixed cost for each addition)

  • carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)

Returns:

A pipeline containing the optimized implementation of the CMVM computation with cascaded stages.

Return type:

Pipeline

da4ml.cmvm.api.minimal_latency(kernel: ndarray, qintervals: list[QInterval], latencies: list[float], carry_size: int = -1, adder_size: int = -1)

Fast latency calculation for a given kernel, QIntervals, and input latencies. When carry_size=-1 and every input has the same latency l, the result equals l + max(ceiling(log2(max(#CSD bits for each column, 1)))).

Parameters:
  • kernel (np.ndarray) – The input kernel matrix.

  • qintervals (list[QInterval]) – List of QIntervals for each input.

  • latencies (list[float]) – List of latencies for each input

  • carry_size (int, optional) – The size of the carry unit for latency computation, by default -1 (fixed latency for each addition operation)

  • adder_size (int, optional) – The size of the adder unit for latency computation, by default -1 (fixed cost for each addition operation)

Returns:

The minimal latency for the given kernel, QInterval, and input latencies.

Return type:

float
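The closed form quoted for minimal_latency can be sketched in plain Python. This is an illustration only, not the library's implementation: it assumes an integer kernel, reads "#CSD bits for each column" as the total number of nonzero canonical-signed-digit digits in that column, and uses the hypothetical helper names csd_nonzero_digits and closed_form_latency.

```python
import math


def csd_nonzero_digits(n: int) -> int:
    """Count nonzero digits in the canonical signed digit (CSD) form of n."""
    n = abs(n)
    count = 0
    while n:
        if n & 1:
            # Use digit +1 when n % 4 == 1 and -1 when n % 4 == 3,
            # so that no two adjacent digits are nonzero.
            digit = 2 - (n & 3)
            n -= digit
            count += 1
        n >>= 1
    return count


def closed_form_latency(kernel: list[list[int]], input_latency: float) -> float:
    """l + max over columns of ceil(log2(max(#CSD digits in the column, 1)))."""
    depths = []
    for j in range(len(kernel[0])):
        digits = sum(csd_nonzero_digits(row[j]) for row in kernel)
        depths.append(math.ceil(math.log2(max(digits, 1))))
    return input_latency + max(depths)


# Column 0 holds 7, 3, 5 (two CSD digits each, six in total);
# column 1 holds 1, 0, 2 (two digits in total).
kernel = [[7, 1], [3, 0], [5, 2]]
latency = closed_form_latency(kernel, 0.0)
```

The inner max(..., 1) keeps an all-zero column from producing log2(0).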

da4ml.cmvm.api.solve(kernel: ndarray, method0: str = 'wmc', method1: str = 'auto', hard_dc: int = -1, decompose_dc: int = -2, qintervals: list[QInterval] | None = None, latencies: list[float] | None = None, adder_size: int = -1, carry_size: int = -1, search_all_decompose_dc: bool = True) → Pipeline

Solve the CMVM problem with two cascaded matrices.

Parameters:
  • kernel (np.ndarray) – The input kernel matrix to be implemented.

  • method0 (str, optional) – Optimization method for the first stage. Must be one of [wmc, wmc-dc, wmc-pdc, mc, mc-dc, mc-pdc].

  • method1 (str, optional) – Optimization method for the second stage. When 'auto', the method is selected based on hard_dc and method0. Default is 'auto'.

  • hard_dc (int, optional) – Hard depth constraint (additional latency allowed beyond minimal latency), by default -1 (no constraint)

  • decompose_dc (int, optional) – Decomposition depth constraint. The signature default is -2; a value of -1 means no constraint (follows hard_dc).

  • qintervals (list[QInterval] | None, optional) – List of quantization intervals for each input, by default None ([-128, 127, 1] for all inputs)

  • latencies (list[float] | None, optional) – List of input latencies, by default None (0. for all inputs)

  • adder_size (int, optional) – Size of the adder unit for latency computation, by default -1 (fixed cost for each addition)

  • carry_size (int, optional) – Size of the carry unit for latency computation, by default -1 (fixed latency for each addition)

  • search_all_decompose_dc (bool, optional) – If True, search for all possible decomposition depth constraints. If False, use the provided decompose_dc value. Default is True.

Returns:

A pipeline containing the optimized implementation of the CMVM computation with cascaded stages.

Return type:

Pipeline

class da4ml.cmvm.api.solver_options_t

Bases: TypedDict

adder_size: int
carry_size: int
decompose_dc: int
hard_dc: int
method0: str
method1: str
offload_fn: None | Callable[[ndarray, FixedVariableArray], ndarray]

Callable taking in (constant_matrix, fixed_variable_array) and returning a boolean mask of which weights to offload to multiplication operations.

search_all_decompose_dc: bool
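As a rough illustration of how such an options dictionary might be assembled and checked, the sketch below defines a local stand-in TypedDict rather than importing solver_options_t. The names SolverOptionsSketch, DEFAULTS, and make_options are hypothetical, offload_fn is left out, and the default values are copied from the solve signature listed above.

```python
from typing import TypedDict

# Allowed first-stage methods, as listed for the method0 parameter.
METHOD0_CHOICES = ("wmc", "wmc-dc", "wmc-pdc", "mc", "mc-dc", "mc-pdc")


class SolverOptionsSketch(TypedDict, total=False):
    """Local stand-in mirroring the documented keys (offload_fn omitted)."""

    adder_size: int
    carry_size: int
    decompose_dc: int
    hard_dc: int
    method0: str
    method1: str
    search_all_decompose_dc: bool


# Defaults copied from the documented solve signature.
DEFAULTS: SolverOptionsSketch = {
    "adder_size": -1,
    "carry_size": -1,
    "decompose_dc": -2,
    "hard_dc": -1,
    "method0": "wmc",
    "method1": "auto",
    "search_all_decompose_dc": True,
}


def make_options(**overrides) -> SolverOptionsSketch:
    """Merge overrides into the defaults and reject unknown keys or methods."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown option(s): {sorted(unknown)}")
    options: SolverOptionsSketch = {**DEFAULTS, **overrides}
    if options["method0"] not in METHOD0_CHOICES:
        raise ValueError(f"method0 must be one of {METHOD0_CHOICES}")
    return options


options = make_options(hard_dc=2, method0="mc-dc")
```

Every key in this sketch matches a keyword parameter in the solve signature shown earlier, so a dictionary built this way could presumably be unpacked as keyword arguments.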

da4ml.cmvm.types module

class da4ml.cmvm.types.CombLogic(shape: tuple[int, int], inp_shifts: list[int], out_idxs: list[int], out_shifts: list[int], out_negs: list[bool], ops: list[Op], carry_size: int, adder_size: int, lookup_tables: tuple[LookupTable, ...] | None = None)

Bases: NamedTuple

A combinational logic that describes a series of operations on input data to produce output data.

shape

#input, #output

Type:

tuple[int, int]

inp_shifts

The shifts that should be applied to the input data.

Type:

list[int]

out_idxs

The indices of the output data in the buffer.

Type:

list[int]

out_shifts

The shifts that should be applied to the output data.

Type:

list[int]

out_negs

The signs of the output data.

Type:

list[bool]

ops

Core list of operations for generating each buffer element.

Type:

list[Op]

carry_size

Size of the carry unit of the adder, used for cost and latency estimation.

Type:

int

adder_size

Elementary size of the adder, used for cost and latency estimation.

Type:

int

lookup_tables

Lookup table arrays for lookup operations, if any.

Type:

tuple[LookupTable, …] | None

The core of the comb logic is the list of operations in ops. For the exact operation each Op performs, refer to the Op class. After all operations are executed, output i is read from data[out_idxs[i]] and multiplied by 2**out_shifts[i]; out_negs holds the sign of each output.
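The output readout described above can be sketched as follows. This is an illustration only: read_outputs is a hypothetical helper, the buffer is a plain list, and treating a True entry in out_negs as a sign flip is an assumption drawn from the field description ("the signs of the output data").

```python
def read_outputs(buffer, out_idxs, out_shifts, out_negs):
    """Gather outputs from the data buffer, then apply shift and sign."""
    outputs = []
    for idx, shift, neg in zip(out_idxs, out_shifts, out_negs):
        value = buffer[idx] * 2.0**shift
        # Assumption: True in out_negs means the output is negated.
        outputs.append(-value if neg else value)
    return outputs


buffer = [1.0, 3.0, 4.0]
outputs = read_outputs(buffer, out_idxs=[2, 0], out_shifts=[1, -1], out_negs=[False, True])
```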

adder_size: int

Alias for field number 7

carry_size: int

Alias for field number 6

property cost

Total cost of the solution.

classmethod deserialize(data: list)

Reconstruct the solution from its serialized form (a list).

property inp_kifs

KIFs of all input elements of the solution.

property inp_latency

Latencies of all input elements of the solution.

property inp_qint

Quantization intervals of the input elements.

inp_shifts: list[int]

Alias for field number 1

property kernel

The kernel represented by the solution, when applicable.

property latency

Minimum and maximum latency of the solution.

classmethod load(path: str | Path)

Load the solution from a file.

lookup_tables: tuple[LookupTable, ...] | None

Alias for field number 8

ops: list[Op]

Alias for field number 5

out_idxs: list[int]

Alias for field number 2

property out_kifs

KIFs of all output elements of the solution.

property out_latency

Latencies of all output elements of the solution.

out_negs: list[bool]

Alias for field number 4

property out_qint

Quantization intervals of the output elements.

out_shifts: list[int]

Alias for field number 3

predict(data: ndarray[tuple[Any, ...], dtype[_ScalarT]] | Sequence[ndarray[tuple[Any, ...], dtype[_ScalarT]]], n_threads: int = 0) → ndarray[tuple[Any, ...], dtype[float64]]

Predict the output of the solution for a batch of input data using the C++-backed DAIS interpreter. This method cannot be used if the binary interpreter is not installed.

Parameters:
  • data (NDArray|Sequence[NDArray]) – Input data to the model. The shape is ignored, and the number of samples is determined by the size of the data.

  • n_threads (int) – Number of threads to use for prediction. Negative or zero values will use maximum available threads, or the value of the DA_DEFAULT_THREADS environment variable if set. Default is 0. If OpenMP is not supported, this parameter is ignored.

Returns:

Output of the model in shape (n_samples, output_size).

Return type:

NDArray[np.float64]
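Since the shape of data is ignored and only its size determines the number of samples, the batching can be pictured as below. This is a sketch under assumptions: split_into_samples is a hypothetical helper, the values are taken in row-major order, and the size is required to be a multiple of the input count.

```python
def split_into_samples(flat_values, n_inputs):
    """Cut a flat sequence into rows of n_inputs values each."""
    if len(flat_values) % n_inputs:
        raise ValueError("data size must be a multiple of the number of inputs")
    n_samples = len(flat_values) // n_inputs
    return [flat_values[i * n_inputs:(i + 1) * n_inputs] for i in range(n_samples)]


# Six values and three inputs give two samples, whatever the original shape was.
samples = split_into_samples([1, 2, 3, 4, 5, 6], n_inputs=3)
```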

property ref_count: ndarray

The number of references to the output elements in the solution.

save(path: str | Path)

Save the solution to a file.

save_binary(path: str | Path, version: int = 0)

Dump the solution to a binary file.

shape: tuple[int, int]

Alias for field number 0

to_binary(version: int = 0) → ndarray[tuple[Any, ...], dtype[int32]]

class da4ml.cmvm.types.DAState(shifts: tuple[ndarray[tuple[Any, ...], dtype[int8]], ndarray[tuple[Any, ...], dtype[int8]]], expr: list[ndarray[tuple[Any, ...], dtype[int8]]], ops: list[Op], freq_stat: dict[Pair, int], kernel: ndarray[tuple[Any, ...], dtype[float32]])

Bases: NamedTuple

Internal state of the DA algorithm.

expr: list[ndarray[tuple[Any, ...], dtype[int8]]]

Alias for field number 1

freq_stat: dict[Pair, int]

Alias for field number 3

kernel: ndarray[tuple[Any, ...], dtype[float32]]

Alias for field number 4

ops: list[Op]

Alias for field number 2

shifts: tuple[ndarray[tuple[Any, ...], dtype[int8]], ndarray[tuple[Any, ...], dtype[int8]]]

Alias for field number 0

class da4ml.cmvm.types.JSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: JSONEncoder

default(o)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)

class da4ml.cmvm.types.Op(id0: int, id1: int, opcode: int, data: int, qint: QInterval, latency: float, cost: float)

Bases: NamedTuple

One single operation on the data buffer.

Parameters:
  • id0 (int) – index of the first operand

  • id1 (int) – index of the second operand, or special opcode if negative

  • opcode (int) – 0: addition, 1: subtraction, 2: relu, 3: quantize, 4: const addition

  • data (int) – Data to be used in the operation

  • qint (QInterval) – Quantization interval of the resultant buffer

  • latency (float) – Latency of the data generated by this operation (t_available)

  • cost (float) – Cost of the operation

cost: float

Alias for field number 6

data: int

Alias for field number 3

id0: int

Alias for field number 0

id1: int

Alias for field number 1

latency: float

Alias for field number 5

opcode: int

Alias for field number 2

qint: QInterval

Alias for field number 4
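When inspecting an ops list, a name lookup built from the opcode values documented above can make summaries easier to read. The sketch below is illustrative: OPCODE_NAMES and opcode_histogram are hypothetical names, and the input is a plain list of integer opcodes rather than real Op tuples.

```python
from collections import Counter

# Opcode meanings as listed in the Op parameter description.
OPCODE_NAMES = {
    0: "addition",
    1: "subtraction",
    2: "relu",
    3: "quantize",
    4: "const addition",
}


def opcode_histogram(opcodes):
    """Count how often each named operation occurs in a list of opcodes."""
    return Counter(OPCODE_NAMES.get(code, f"unknown({code})") for code in opcodes)


hist = opcode_histogram([0, 0, 1, 2, 4, 0])
```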

class da4ml.cmvm.types.Pair(id0: int, id1: int, sub: bool, shift: int)

Bases: NamedTuple

An operation representing data[id0] +/- data[id1] * 2**shift.

id0: int

Alias for field number 0

id1: int

Alias for field number 1

shift: int

Alias for field number 3

sub: bool

Alias for field number 2
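The expression data[id0] +/- data[id1] * 2**shift can be evaluated directly. The sketch below uses a local NamedTuple stand-in (PairSketch) with the same field order instead of importing the class, and apply_pair is a hypothetical helper.

```python
from typing import NamedTuple


class PairSketch(NamedTuple):
    """Local stand-in with the same fields as the documented Pair."""

    id0: int
    id1: int
    sub: bool
    shift: int


def apply_pair(data, pair):
    """Evaluate data[id0] +/- data[id1] * 2**shift."""
    term = data[pair.id1] * 2.0**pair.shift
    return data[pair.id0] - term if pair.sub else data[pair.id0] + term


data = [3.0, 5.0]
added = apply_pair(data, PairSketch(0, 1, sub=False, shift=1))        # 3 + 5 * 2
subtracted = apply_pair(data, PairSketch(0, 1, sub=True, shift=-1))   # 3 - 5 / 2
```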

class da4ml.cmvm.types.Pipeline(solutions: tuple[CombLogic, ...])

Bases: NamedTuple

A pipeline with II=1, with each stage represented by a CombLogic.

solutions

A tuple containing the individual CombLogic objects for each stage of the cascade.

Type:

tuple[CombLogic, …]

Properties:
kernel

Only useful when the pipeline describes a linear operation. The overall kernel matrix which the cascaded solution implements: vec @ kernel = solution(vec). This is calculated as the matrix product of all individual solution kernels.

Type:

NDArray[float32]
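The relation vec @ kernel = solution(vec), with the overall kernel formed as the product of the stage kernels, can be checked on a toy cascade. The sketch below uses plain nested lists and hypothetical helper names (matmul, cascade_kernel); it does not call the library.

```python
from functools import reduce


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def cascade_kernel(stage_kernels):
    """Overall kernel of a linear cascade: the product of all stage kernels."""
    return reduce(matmul, stage_kernels)


k0 = [[1, 0, 1], [0, 1, 1]]    # stage 0 maps 2 inputs to 3 intermediates
k1 = [[1, 2], [3, 4], [5, 6]]  # stage 1 maps 3 intermediates to 2 outputs
overall = cascade_kernel([k0, k1])

vec = [[2, -1]]                       # one input sample as a row vector
staged = matmul(matmul(vec, k0), k1)  # run the stages one after another
direct = matmul(vec, overall)         # apply the combined kernel once
```

Both routes give the same output, which is why the combined kernel is only meaningful when every stage is linear.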

cost

The total cost of the cascaded solution, computed as the sum of the costs of all stages.

Type:

float

latency

The minimum and maximum latency of the pipeline, determined by the last stage.

Type:

tuple[float, float]

inp_qint

Input quantization intervals

Type:

list[QInterval]

inp_lat

Input latencies

Type:

list[float]

in_shift

Input shifts

Type:

list[int]

out_qint

Output quantization intervals

Type:

list[QInterval]

out_lat

Output latencies

Type:

list[float]

out_shift

Output shifts

Type:

list[int]

out_neg

Output signs

Type:

list[bool]

shape

The shape of the corresponding kernel matrix.

Type:

tuple[int, int]

property cost

classmethod deserialize(data: dict)

Reconstruct the solution from its serialized form (a dict).

property inp_latency

property inp_qint

property inp_shifts

property kernel

property latency

classmethod load(path: str)

Load the solution from a file.

property out_latencies

property out_neg

property out_qint

property out_shift

property reg_bits

The number of bits used for the register in the solution.

save(path: str | Path)

Save the solution to a file.

property shape

solutions: tuple[CombLogic, ...]

Alias for field number 0

class da4ml.cmvm.types.Precision(keep_negative: bool, integers: int, fractional: int)

Bases: NamedTuple

A class representing the precision of a quantized interval.

fractional: int

Alias for field number 2

integers: int

Alias for field number 1

keep_negative: bool

Alias for field number 0

class da4ml.cmvm.types.QInterval(min: float, max: float, step: float)

Bases: NamedTuple

A class representing a quantized interval: [min, max] with a step size.

max: float

Alias for field number 1

min: float

Alias for field number 0

step: float

Alias for field number 2
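To make the [min, max] with step notion concrete, the sketch below enumerates the grid points of an interval. QIntervalSketch is a local stand-in with the same field order, n_levels and levels are hypothetical helpers, and both endpoints are assumed to lie on the grid.

```python
from typing import NamedTuple


class QIntervalSketch(NamedTuple):
    """Local stand-in with the same fields as the documented QInterval."""

    min: float
    max: float
    step: float


def n_levels(qi):
    """Number of representable values, counting both endpoints."""
    return round((qi.max - qi.min) / qi.step) + 1


def levels(qi):
    """All representable values from min to max in increments of step."""
    return [qi.min + i * qi.step for i in range(n_levels(qi))]


default_like = QIntervalSketch(-128, 127, 1)  # the documented default interval
count = n_levels(default_like)
quarter_steps = levels(QIntervalSketch(0.0, 0.75, 0.25))
```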

da4ml.cmvm.types.minimal_kif(qi: QInterval, symmetric: bool = False) → Precision

Calculate the minimal KIF for a given QInterval.

Parameters:
  • qi (QInterval) – The QInterval to calculate the KIF for.

  • symmetric (bool) – Only relevant if qi may be negative. If True, -2**i will be regarded as forbidden. May be useful in special cases only. Default is False.

Returns:

A named tuple with the KIF values.

Return type:

Precision
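One way to picture the KIF computation is the sketch below. It is not the library's implementation: it assumes K is the sign flag (keep_negative), I counts integer bits excluding the sign, F counts fractional bits, and step is a power of two. QIntervalSketch, PrecisionSketch, and kif_sketch are hypothetical names.

```python
import math
from typing import NamedTuple


class QIntervalSketch(NamedTuple):
    min: float
    max: float
    step: float


class PrecisionSketch(NamedTuple):
    keep_negative: bool
    integers: int
    fractional: int


def kif_sketch(qi, symmetric=False):
    """Smallest (K, I, F) covering the interval, under the stated assumptions."""
    if qi.min == 0 and qi.max == 0:
        return PrecisionSketch(False, 0, 0)
    keep_negative = qi.min < 0
    fractional = -round(math.log2(qi.step))
    lo = round(qi.min / qi.step)  # interval bounds in units of step
    hi = round(qi.max / qi.step)
    if symmetric:
        # The most negative code is forbidden, so |x| must stay below 2**bits.
        magnitude = max(abs(lo), abs(hi)) + 1
    else:
        # Two's complement range: -2**bits <= x <= 2**bits - 1.
        magnitude = max(abs(lo), hi + 1)
    bits = (magnitude - 1).bit_length()  # ceil(log2(magnitude)) for magnitude >= 1
    return PrecisionSketch(keep_negative, bits - fractional, fractional)


signed_byte = kif_sketch(QIntervalSketch(-128, 127, 1))
signed_byte_sym = kif_sketch(QIntervalSketch(-128, 127, 1), symmetric=True)
unsigned_frac = kif_sketch(QIntervalSketch(0.0, 3.75, 0.25))
```

Under these assumptions, setting symmetric=True costs one extra integer bit for the interval [-128, 127], because -128 may no longer use the most negative code.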

Module contents