Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Mapping Expressions

A mapping expression defines where each tensor element sits in a buffer. This page covers the available mapping constructors and the equivalences between mappings.

Consider a tensor with axes![A = 8, B = 512]. The mapping expression m![A, B] places \(A\) as the major axis and \(B\) as the minor axis, requiring a buffer of 8 × 512 = 4096 elements. Buffer position 0 holds \(\{A=0, B=0\}\), position 1 holds \(\{A=0, B=1\}\), and so on through all 512 elements where \(A=0\) before moving to \(A=1\).

Axis Sizes

The axes! macro declares axis identifiers and their sizes. Throughout this section, assume the following axis sizes.

#![allow(unused)]
fn main() {
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;
axes![A = 8, B = 512];
}

Mapping Interface

A mapping expression like m![H, W] is a Rust type that describes how tensor indices map to buffer positions. Every mapping expression implements the M trait, which provides the buffer size and a buffer-index-to-tensor-index mapping function:

#![allow(unused)]
fn main() {
// Inside `furiosa_visa_std::prelude`...
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;
use std::fmt::Debug;


/// Tensor index: a map from axis identifiers to coordinate values.
pub struct Index { /* ... */ }

/// Constructs tensor indices.
/// `i![A: 2, B: 3]` creates an `Index` with A = 2 and B = 3.
macro_rules! i {
    () => {};
    /* ... */
}
}

A mapping defines what mathematical tensor a buffer represents. For example, HostTensor<bf16, m![A, B]> denotes a host memory buffer containing m![A, B]::SIZE elements of bf16 data, which is 4096 elements. We say a buffer holds a tensor \(T\) when:

  • For every buffer index i and tensor index ti,
  • if m![A, B]::map(i) = Some(ti),
  • then the i-th element of the buffer stores the value of tensor \(T\) at index ti.

Constructors

Mapping expressions are built by composing small constructors, each of which transforms or combines simpler mappings. These expressions use arithmetic-like operators (/, %, and # for padding) to concisely define the mapping between tensor and linear buffer indices.

Symbol

A symbol is a single uppercase letter whose size comes from the shape declaration. The mapping m![A] maps 8 buffer indices linearly to tensor indices along the axis: buffer index 0 holds i![] (empty tensor index), index 1 holds i![A: 1], index 2 holds i![A: 2], and so on:

#![allow(unused)]
fn main() {
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;

axes![A = 8];

type E = m![A]; // Symbol<Ident::A, 8>

#[test]
fn test_symbol() {
    for i in 0..E::SIZE {
        assert_eq!(E::map(i), Some(i![A: i]));
    }
    assert_eq!(E::map(E::SIZE), None);
}
}
// Trait implementation
impl<S: AxisName> M for Symbol<S> {
    const SIZE: usize = S::SIZE;

    fn to_value() -> Mapping {
        Mapping::Symbol {
            symbol: S::NAME,
            size: S::SIZE,
        }
    }

    fn map(i: usize) -> Option<Index> {
        if i < S::SIZE {
            let mut index = Index::new();
            Index::add_term(
                &mut index,
                Term {
                    inner: Atom::Symbol {
                        symbol: S::NAME,
                        size: S::SIZE,
                    },
                    stride: 1,
                    modulo: S::SIZE,
                },
                i,
            );
            Some(index)
        } else {
            None
        }
    }
}

Note

For every symbol A, the 0’th index i![A: 0] corresponds to the empty tensor index i![].

Pair

One way to store a 2D tensor with shape \(\{A=8, B=512\}\) is the pair mapping m![A, B]. This creates a buffer of 4096 elements where A is the major axis and B is the minor axis. The first 512 elements hold A = 0 and the next 512 elements hold A = 1. Buffer index 519 holds i![A: 1, B: 7] since 519 == 512 * 1 + 7.

The mapping Pair<L, R> maps the Cartesian product of two spaces into a linear buffer where L is the major dimension and R is the minor dimension. The size is L::SIZE * R::SIZE, and the mapping uses floor division and modulo to decompose indices. m![A, B, C, D] expands to Pair<A, Pair<B, Pair<C, D>>> and is right-associative.

#![allow(unused)]
fn main() {
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;

axes![A = 8, B = 512];

type E = m![A, B]; // Pair<m![A], m![B]>

#[test]
fn test_pair() {
    for i in 0..E::SIZE {
        assert_eq!(E::map(i), Some(i![A: i / <m![B]>::SIZE, B: i % <m![B]>::SIZE]));
    }
    assert_eq!(E::map(2 * <m![B]>::SIZE + 7), Some(i![A: 2, B: 7]));
    assert_eq!(E::map(E::SIZE), None);
}
}
// Trait implementation
impl<L, R> M for Pair<L, R>
where
    L: M,
    R: M,
{
    const SIZE: usize = L::SIZE * R::SIZE;

    fn to_value() -> Mapping {
        Mapping::Pair {
            left: RBox::new(L::to_value()),
            right: RBox::new(R::to_value()),
        }
    }

    fn map(i: usize) -> Option<Index> {
        let mut l = L::map(i / R::SIZE)?;
        let r = R::map(i % R::SIZE)?;
        Index::add(&mut l, r);
        Some(l)
    }
}

Identity

The identity mapping m![1] creates a single-element buffer that maps buffer index 0 to the empty tensor index i![]. It serves as the identity element for Pair: m![1, A] and m![A, 1] are both equivalent to m![A].

#![allow(unused)]
fn main() {
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;

type E = m![1]; // Identity

#[test]
fn test_identity() {
    assert_eq!(E::map(0), Some(i![]));
    assert_eq!(E::map(1), None);
}
}
// Trait implementation
impl M for Identity {
    const SIZE: usize = 1;

    fn to_value() -> Mapping {
        Mapping::Identity
    }

    fn map(i: usize) -> Option<Index> {
        if i == 0 { Some(Index::new()) } else { None }
    }
}

Padding

Padding aligns data to hardware requirements by adding unused buffer space. For example, the DMA engine requires rows to start on 64-byte boundaries. With axes![C = 13, D = 61], m![C, D] creates misaligned rows since 61 is not divisible by 64. m![C, D # 64] fixes this by aligning each row to 64-byte boundaries, using 3 extra elements per row.

#![allow(unused)]
fn main() {
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;

axes![C = 13, D = 61];

type E = m![C, D # 64]; // Pair<m![C], Padding<m![D], 64>>

#[test]
fn test_padding() {
    assert_eq!(E::map(0),  Some(i![C: 0, D: 0]));
    assert_eq!(E::map(60), Some(i![C: 0, D: 60]));
    assert_eq!(E::map(61), None); // padding
    assert_eq!(E::map(62), None); // padding
    assert_eq!(E::map(63), None); // padding
    assert_eq!(E::map(64), Some(i![C: 1, D: 0]));
}
}
// Trait implementation
impl<L, const SIZE: usize> M for Padding<L, SIZE>
where
    L: M,
{
    const SIZE: usize = SIZE;

    fn to_value() -> Mapping {
        Mapping::Padding {
            inner: RBox::new(L::to_value()),
            padding: SIZE,
            kind: PaddingKind::Top,
        }
    }

    fn map(i: usize) -> Option<Index> {
        L::map(i)
    }
}

Resize

Resize constrains a mapping to a smaller logical size by truncating indices beyond the new size, discarding elements outside that range. Unlike padding, which expands the buffer, Resize shrinks the logical view. The mapping m![D = 2] takes only the first 2 elements of axis D, producing indices D = 0 and D = 1.

#![allow(unused)]
fn main() {
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;

axes![C = 2, D = 3];
type E = m![C, D = 2]; // Pair<m![C], Resize<m![D], 2>>

#[test]
fn test_resize() {
    assert_eq!(E::map(0), Some(i![C: 0, D: 0]));
    assert_eq!(E::map(1), Some(i![C: 0, D: 1]));
    assert_eq!(E::map(2), Some(i![C: 1, D: 0]));
    assert_eq!(E::map(3), Some(i![C: 1, D: 1]));
    assert_eq!(E::map(4), None);
}
}
// Trait implementation
impl<L, const SIZE: usize> M for Resize<L, SIZE>
where
    L: M,
{
    const SIZE: usize = SIZE;

    fn to_value() -> Mapping {
        Mapping::Resize {
            inner: RBox::new(L::to_value()),
            resize: SIZE,
        }
    }

    fn map(i: usize) -> Option<Index> {
        if i < SIZE { L::map(i) } else { None }
    }
}

Tiling is implemented through indexed views, pure metadata transformations without data copies. The .tile() method extracts a tile by resizing one dimension to the tile size and offsetting into the buffer.

#![allow(unused)]
fn main() {
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;

axes![A = 8, B = 512];

fn tiles() {
let tensor = unsafe { HbmTensor::<bf16, m![1], m![A, B]>::from_addr(0) };
let view = tensor.view(); // HbmTensorView::<'_, bf16, m![1], m![A, B]>
let tile01 = view.tile::<m![B], 2, m![A, B = 2 # 4]>(0); // HbmTensorView::<'_, bf16, m![1], m![A, B = 2 # 4]>
let tile23 = view.tile::<m![B], 2, m![A, B = 2 # 4]>(2); // HbmTensorView::<'_, bf16, m![1], m![A, B = 2 # 4]>
}
}

The .tile() method takes three type parameters and one value parameter.

  • The tile dimension m![B] specifies which dimension to divide along.
  • The tile size 2 specifies the number of elements per tile.
  • The tile mapping m![A, B = 2 # 4] defines the resulting view’s mapping. The mapping B = 2 # 4 signifies that dimension B has a logical size of 2 within the view but exists within a physical footprint of 4. This is essential for preserving the original memory layout and stride calculations.
  • The starting index specifies which tile to extract. Passing 0 captures the range 0..2 for tile01, while passing 2 captures the range 2..4 for tile23.

Stride and Modulo

Stride (/) and modulo (%) decompose a single dimension into two: the outer (block index) and the inner (position within block). Consider the 512-element axis B divided into 8 blocks of 64 elements each. The mapping m![B / 64, B % 64] creates an 8 × 64 grid where the first dimension selects which block and the second dimension selects the position within that block. Buffer index 130 corresponds to block 2 at position 2 within that block, giving tensor index B = 64 × 2 + 2 = 130, equal to the flat-buffer result (since m![B / 64, B % 64] is equivalent to m![B]):

#![allow(unused)]
fn main() {
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;
axes![A = 8, B = 512];
type D1 = m![B / 64]; // stride with size 8
type D2 = m![B % 64]; // modulo with size 64

type E = m![B / 64, B % 64]; // equivalent to `m![B]`

#[test]
fn test_stride_modulo() {
    for i in 0..8 {
        assert_eq!(D1::map(i), Some(i![B / 64: i]));
    }
    assert_eq!(D1::map(8), None);

    for j in 0..64 {
        assert_eq!(D2::map(j), Some(i![B % 64: j]));
    }
    assert_eq!(D2::map(64), None);

    for i in 0..8 {
        for j in 0..64 {
            assert_eq!(
                E::map(64 * i + j),     // i![B / 64: i, B % 64: j]
                <m![B]>::map(64 * i + j), // equivalent to above
            );
        }
    }
    assert_eq!(E::map(512), None);
}
}
// Trait implementation
impl<L, const SIZE: usize> M for Stride<L, SIZE>
where
    L: M,
{
    const SIZE: usize = {
        assert!(L::SIZE % SIZE == 0, "Stride size must divide the original size");
        L::SIZE / SIZE
    };

    fn to_value() -> Mapping {
        Mapping::Stride {
            inner: RBox::new(L::to_value()),
            stride: SIZE,
        }
    }

    fn map(i: usize) -> Option<Index> {
        if i < Self::SIZE { L::map(i * SIZE) } else { None }
    }
}

impl<L, const SIZE: usize> M for Modulo<L, SIZE>
where
    L: M,
{
    const SIZE: usize = {
        assert!(L::SIZE % SIZE == 0, "Modulo size must divide the original size");
        SIZE
    };

    fn to_value() -> Mapping {
        Mapping::Modulo {
            inner: RBox::new(L::to_value()),
            modulo: SIZE,
        }
    }

    fn map(i: usize) -> Option<Index> {
        if i < Self::SIZE { L::map(i % L::SIZE) } else { None }
    }
}

Together, m![B / 64, B % 64] transforms axis B into an 8 × 64 grid. The mapping is equivalent to m![B] but expresses a different logical view of the same data, revealing block structure hidden in the flat representation.

Stride and modulo mappings can be visualized in tabular form. Consider the mapping m![B / 4, B % 4] with B::SIZE = 16. The following table shows how buffer indices are arranged: each row corresponds to a specific index of B / 4 (the stride axis), and each column corresponds to an index of B % 4 (the modulo axis):

i![B % 4: 0]i![B % 4: 1]i![B % 4: 2]i![B % 4: 3]
i![B / 4: 0]i![B: 0]i![B: 1]i![B: 2]i![B: 3]
i![B / 4: 1]i![B: 4]i![B: 5]i![B: 6]i![B: 7]
i![B / 4: 2]i![B: 8]i![B: 9]i![B: 10]i![B: 11]
i![B / 4: 3]i![B: 12]i![B: 13]i![B: 14]i![B: 15]

Stride and modulo factorize a single mapping into multiple dimensions. The expression m![B / n] creates an outer dimension indexing blocks of size n. The expression m![B % n] creates an inner dimension indexing positions within each block.

Modulo differs from resize in how it handles buffer size:

  • Resize shrinks the buffer by truncating indices beyond the new size.
  • Modulo preserves the original buffer size while partitioning it into equal-sized blocks.

These operations can be nested for complex decompositions. The following example splits B into three dimensions where the buffer’s bit layout differs from that of the tensor index.

#![allow(unused)]
fn main() {
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;
axes![A = 8, B = 512];
// B's bits: 6 - 8,  0 - 4,          5
// Values:   0 - 7, 0 - 31,      0 - 1
type E = m![B / 64, B % 32, B / 32 % 2];

#[test]
fn test_nested_stride() {
    for i in 0..8 {
        for j in 0..32 {
            for k in 0..2 {
                assert_eq!(
                    E::map(64 * i + 2 * j + k),
                    Some(i![B: 64 * i + j + 32 * k]),
                );
            }
        }
    }
    assert_eq!(E::map(512), None);
}
}

The buffer index decomposes as 64 * i + 2 * j + k where i selects the block, j selects position within the block, and k selects the sub-block. The tensor index B reconstructs as 64 * i + j + 32 * k, which rearranges the bit positions.

For example, buffer index 67 maps to B = 97:

  • Buffer 67 = 64 * 1 + 2 * 1 + 1 gives i=1, j=1, k=1
  • Tensor B = 64 * 1 + 1 + 32 * 1 = 97
  • Verify: 97 / 64 = 1, 97 % 32 = 1, (97 / 32) % 2 = 1

This kind of bit rearrangement maps naturally to hardware memory layouts where address bits are reordered for bank interleaving or cache efficiency. In binary, this rearranges bit positions: buffer 001_00001_1 becomes B = 001_1_00001. The buffer groups bits as [8:6]_[5:1]_[0] while B groups them as [8:6]_[5]_[4:0].

Tiling can operate on blocks rather than individual elements. The following example tiles by block using m![B / 32] and creates overlapping tiles:

#![allow(unused)]
fn main() {
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;
axes![A = 8, B = 512];
let tensor = unsafe { HbmTensor::<bf16, m![1], m![A, B]>::from_addr(0) };
for i in 0..15 {
    let tile = tensor.view().tile::<m![B / 32], 2, m![A, B / 32 = 2 # 16, B % 32]>(i);
}
}

With B = 512, the dimension B / 32 has 16 blocks numbered 0-15. Each tile takes 2 consecutive blocks starting at index i. Tile 0 covers blocks {0, 1}, tile 1 covers blocks {1, 2}, and so on through tile 14 covering blocks {14, 15}. These tiles overlap because consecutive tiles share one block: tiles 0 and 1 both include block 1.

The tile mapping B / 32 = 2 resizes the block dimension to 2 since each tile contains exactly 2 blocks. When tiling with a single block, B / 32 = 1 simplifies to the identity m![1] since the dimension has only one value.

Escape

For complex mappings, define type aliases and reference them using { ... }. With separate mappings L = m![A] and R = m![B], combining them as m![{ L }, { R }] produces the same result as m![A, B]:

#![allow(unused)]
fn main() {
extern crate furiosa_visa_std;
use furiosa_visa_std::prelude::*;
axes![A = 8, B = 512];
type L = m![A];
type R = m![B];
type E = m![{ L }, { R }]; // equivalent to `m![A, B]`
}

This escape syntax breaks down complex mappings into named, reusable components.

Advanced Constructors

Skewed Axis

A skewed axis creates a diagonal access pattern across two dimensions. Skewed axes introduce derived axis labels defined by arithmetic differences between existing axes; for example, B' = B - A defines a new axis B' whose coordinate at any point equals B minus A. Algorithms that process data along diagonals use this pattern, such as certain wavefront computations.

The expression m![A, B' = 4] with B' = B - A creates a mapping where each row is shifted relative to the previous one. The = operator specifies the logical size after skewing. The result wraps around using modular arithmetic.

For example, with axes![A = 4, B = 4] and B' = B - A:

(A, B’)(A, B)
(0, 0)(0, 0)
(0, 1)(0, 1)
(0, 2)(0, 2)
(0, 3)(0, 3)
(1, 0)(1, 1)
(1, 1)(1, 2)
(1, 2)(1, 3)
(1, 3)(1, 0)

When A = 1 and B' = 3, the original B coordinate wraps to 0 via modular arithmetic since B = (B' + A) % 4 = (3 + 1) % 4 = 0.

Indirect Sequencing

TODO: Document indirect sequencing patterns for non-contiguous memory access. This advanced constructor provides index-based memory access where the sequence of buffer positions is determined by an indirection table rather than a mathematical formula.

Sliding (Linear Combination)

Note

Linear combination expressions $(e1:n1, ..., ed:nd) combine multiple dimensions with specified strides. Formal definition: size_S($(e1:n1, ..., ed:nd)) = 1 + sum_k((size_S(ek) - 1) * nk). The mapping S, $(e1:n1, ..., ed:nd) |- si ~ ti holds if there exist si1...sid, ti1...tid such that for all k: S, ek |- sik ~ tik, si = sum_k(sik * nk), and ti = sum_k(tik * nk).

Linear combinations can encode outer sum: e1 * e2 is equivalent to $(e1 : size_S(e2), e2 : 1). However, outer sum is preferred because it’s more resilient to axis reordering. Changing e1 * e2 to e2 * e1 doesn’t require manual stride updates.

Sliding operations access overlapping data blocks, essential for convolutional neural networks. Consider a buffer of 9 elements representing a tensor with shape \(\{N=5, F=3\}\) where each row is a 3-element slice that slides one element at a time. The tensor element at \((N, F)\) maps to buffer index \(N + 2F\):

$$ \begin{array}{c|ccc} & F=0 & F=1 & F=2 \\ \hline N=0 & 0 & 2 & 4 \\ N=1 & 1 & 3 & 5 \\ N=2 & 2 & 4 & 6 \\ N=3 & 3 & 5 & 7 \\ N=4 & 4 & 6 & 8 \\ \end{array} $$

Note

In this sliding pattern, a single space index can map to multiple tensor indices. For example, space index 4 maps to {4_N}, {2_N, 1_F}, and {2_F} simultaneously. This illustrates the non-one-to-one nature of (S, e).maps(si, ti).

This can be expressed using a linear combination expression where the N axis has stride 1 and the F axis has stride 2, yielding a total size of 1 + (5-1)*1 + (3-1)*2 = 9.

Equivalent Mapping

Mappings E1 and E2 are equivalent when:

  • E1::SIZE == E2::SIZE
  • For every i, E1::map(i) == E2::map(i)

The equivalence relation is reflexive, symmetric, and transitive. Examples:

  • Identity of pairs: for every E, E is equivalent both to m![{ E }, 1] and m![1, { E }].
  • Stride-modulo decomposition: for every E whose size E::SIZE is divisible by n, E and m![{ E } / n, { E } % n] are equivalent.
  • Pair projection: for every A and B, m![[{ A }, { B }] / B::SIZE] is equivalent to m![A] and m![[{ A }, { B }] % B::SIZE] is equivalent to m![B].
  • Associativity of pairs: for every E1, E2, E3, m![{ E1 }, { E2 }, { E3 }], m![[{ E1 }, { E2 }], { E3 }], and m![{ E1 }, [{ E2 }, { E3 }]] are equivalent.
  • Idempotent operations: for every E, E is equivalent to m![{ E } / 1], to m![{ E } # E::SIZE], and to m![{ E } = E::SIZE].
  • Modulo by 1: For every E, m![E % 1] is equivalent to the identity mapping m![1].