Markov Buffer

Tier: Algorithms | ComponentType: 34 | Params: 5

Probabilistic audio slice reordering with similarity-weighted Markov chain transitions, cosine crossfade between slices, and controllable chaos blending.

Overview

MarkovBuffer divides a continuously recorded audio buffer into fixed-length slices and reorders them using a Markov chain. When one slice finishes playing, the next slice is chosen probabilistically based on a transition matrix. This creates shuffled, rearranged versions of the input audio -- from subtle reorderings to completely randomized playback.

The transition probabilities are derived from audio similarity between slices. Each slice is analyzed for RMS level and zero-crossing rate. Similar-sounding slices are more likely to follow each other, creating musically coherent transitions. The Chaos parameter blends between this similarity-weighted matrix and a uniform random distribution -- at 0.0, transitions strongly favor similar slices; at 1.0, any slice is equally likely to follow any other.

Freeze disables recording, locking the buffer contents for sustained manipulation of a captured moment. The buffer continues to play and transition between slices, but no new audio is recorded.

The transition matrix is rebuilt whenever slice parameters change or at slice boundaries (when not frozen), adapting to evolving input content. A 16-entry transition history is maintained in the snapshot for visualization.

File Locations

Component	Path
Header	`Sources/FolioDSP/include/FolioDSP/Algorithms/MarkovBuffer.h`
Implementation	`Sources/FolioDSP/src/Algorithms/MarkovBuffer.cpp`
Tests	`Tests/FolioDSPTests/MarkovBufferTests.swift`
Bridge	`Sources/FolioDSPBridge/src/FolioDSPBridge.mm` (MarkovBufferBridge)

Parameters

Index	Name	Description	Min	Max	Default Min	Default Max	Default	Unit
0	Slice Count	Number of audio slices	2.0	32.0	4.0	32.0	16.0
1	Slice Length	Duration of each slice	10.0	1000.0	20.0	500.0	100.0	ms
2	Chaos	Blend between similarity-weighted and uniform transitions	0.0	1.0	0.0	1.0	0.3
3	Freeze	Disable recording (0=record, 1=frozen)	0.0	1.0	0.0	1.0	0.0
4	Mix	Dry/wet blend	0.0	100.0	0.0	100.0	70.0	%

Processing Algorithm

The process() function executes these steps for each input sample; slice analysis and matrix construction (steps 2, 3, and 7) run only when triggered:

1. Buffer Recording

Unless frozen, input is written to a CircularBuffer<1048576> (~23.8 seconds at 44100 Hz):

\[\text{buffer}[\text{writePos}] = x \quad (\text{if not frozen})\]
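A minimal C++ sketch of the recording step follows. It assumes that `CircularBuffer` keeps a power-of-two capacity and wraps indices with a bit mask (the implementation notes mention mask-based addressing, but the real class is not shown here). `RingBuffer`, `write`, and the frozen handling are hypothetical, and the sketch also assumes that `writePos` stops advancing while frozen so that slice positions stay put.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for CircularBuffer<Size>: a power-of-two ring buffer
// whose indices wrap with a bit mask instead of a modulo.
template <std::size_t Size>
class RingBuffer {
    static_assert(Size > 0 && (Size & (Size - 1)) == 0, "Size must be a power of two");
    static constexpr std::size_t kMask = Size - 1;

public:
    // Write one sample at writePos and advance it. While frozen nothing is
    // written and (by assumption) writePos does not move.
    void write(float x, bool frozen) {
        if (frozen) return;
        data_[writePos_ & kMask] = x;
        ++writePos_;
    }

    // Linear interpolation between the two samples around a fractional position.
    // Negative or oversized positions wrap because the index is masked.
    float readInterp(double pos) const {
        const double floorPos = std::floor(pos);
        const auto i0 = static_cast<std::int64_t>(floorPos);
        const float frac = static_cast<float>(pos - floorPos);
        const float a = data_[static_cast<std::size_t>(i0) & kMask];
        const float b = data_[static_cast<std::size_t>(i0 + 1) & kMask];
        return a + (b - a) * frac;
    }

    std::uint64_t writePos() const { return writePos_; }

private:
    std::array<float, Size> data_{};
    std::uint64_t writePos_ = 0;
};
```

A `RingBuffer<1048576>` holds 4 MB of floats, so in practice it would live inside a heap- or statically-allocated processor object rather than on a thread's stack.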

2. Slice Analysis

When slice parameters (count or length) change, each slice is analyzed for two features:

\[\text{RMS}_i = \sqrt{\frac{1}{L} \sum_{n=0}^{L-1} s_i[n]^2}\]

\[\text{ZCR}_i = \frac{1}{L} \sum_{n=1}^{L-1} \mathbb{1}[\text{sign}(s_i[n]) \neq \text{sign}(s_i[n-1])]\]

where \(L\) is the slice length in samples and \(s_i[n]\) is sample \(n\) of slice \(i\).
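Both features can be computed in a single pass over a slice. This sketch assumes the slice is available as a contiguous array (in the processor it would be read out of the circular buffer) and treats a sample of exactly zero as non-negative when counting sign changes; `analyzeSlice` and `SliceFeatures` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

struct SliceFeatures {
    float rms = 0.0f;
    float zcr = 0.0f;
};

// RMS and zero-crossing rate of one slice s[0..L-1]. Both sums are divided
// by L, matching the formulas above. Zero counts as non-negative (assumption).
SliceFeatures analyzeSlice(const float* s, std::size_t L) {
    assert(s != nullptr || L == 0);
    SliceFeatures f;
    if (L == 0) return f;
    double sumSq = 0.0;
    std::size_t crossings = 0;
    for (std::size_t n = 0; n < L; ++n) {
        sumSq += static_cast<double>(s[n]) * s[n];
        // Count a crossing whenever the sign differs from the previous sample.
        if (n > 0 && ((s[n] >= 0.0f) != (s[n - 1] >= 0.0f))) ++crossings;
    }
    f.rms = static_cast<float>(std::sqrt(sumSq / static_cast<double>(L)));
    f.zcr = static_cast<float>(crossings) / static_cast<float>(L);
    return f;
}
```

For an alternating slice \([1, -1, 1, -1]\) this gives \(\text{RMS} = 1\) and \(\text{ZCR} = 3/4\), since there are three sign changes over four samples.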

3. Transition Matrix Construction

A similarity-weighted transition matrix is built and blended with a uniform distribution:

\[d_{ij} = \sqrt{(\text{RMS}_i - \text{RMS}_j)^2 + (\text{ZCR}_i - \text{ZCR}_j)^2}\]

\[\text{sim}_{ij} = \frac{1}{1 + 10 \cdot d_{ij}}\]

\[P_{ij}^{\text{raw}} = \text{sim}_{ij} \cdot (1 - c) + \frac{1}{N} \cdot c\]

\[P_{ij} = \frac{P_{ij}^{\text{raw}}}{\sum_k P_{ik}^{\text{raw}}}\]

where \(c\) is the chaos parameter and \(N\) is the slice count. Each row is normalized to sum to 1.0.
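A sketch of the matrix construction, written directly from the equations above with a fixed 32×32 array to mirror the 32-slice maximum. `buildTransitionMatrix` and `Matrix` are hypothetical names, and single-precision arithmetic is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

constexpr std::size_t kMaxSlices = 32;
using Matrix = std::array<std::array<float, kMaxSlices>, kMaxSlices>;

// d_ij = Euclidean distance in (RMS, ZCR), sim_ij = 1 / (1 + 10 d_ij),
// raw = sim_ij (1 - c) + c / N, then each row is normalized to sum to 1.
void buildTransitionMatrix(const float* rms, const float* zcr, std::size_t N,
                           float chaos, Matrix& P) {
    assert(N >= 1 && N <= kMaxSlices);
    const float uniform = 1.0f / static_cast<float>(N);
    for (std::size_t i = 0; i < N; ++i) {
        float rowSum = 0.0f;
        for (std::size_t j = 0; j < N; ++j) {
            const float dr = rms[i] - rms[j];
            const float dz = zcr[i] - zcr[j];
            const float d = std::sqrt(dr * dr + dz * dz);
            const float sim = 1.0f / (1.0f + 10.0f * d);
            P[i][j] = sim * (1.0f - chaos) + uniform * chaos;
            rowSum += P[i][j];
        }
        // rowSum > 0 because every sim term is positive.
        for (std::size_t j = 0; j < N; ++j) P[i][j] /= rowSum;
    }
}
```

As a worked example, take two slices whose RMS differs by 0.1 with equal ZCR: \(d_{01} = 0.1\) and \(\text{sim}_{01} = 0.5\). At \(c = 0.3\) the raw row for slice 0 is \([0.85, 0.5]\), which normalizes to about \([0.630, 0.370]\); at \(c = 0\) it is about \([0.667, 0.333]\), and at \(c = 1\) it is exactly \([0.5, 0.5]\). Two consequences of the formula follow: \(\text{sim}_{ii} = 1\), so with low chaos a slice is its own most likely successor; and because \(\text{sim}\) is not normalized before blending, the effective weight of the uniform term in row \(i\) is \(c / ((1 - c) S_i + c)\) with \(S_i = \sum_k \text{sim}_{ik} \geq 1\), which never exceeds \(c\).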

4. Slice Playback

Audio is read from the current slice with linear interpolation:

\[\text{basePos} = \text{writePos} - N \cdot L\]

\[\text{absPos} = \text{basePos} + \text{currentSlice} \cdot L + \text{readPos}\]

\[y_{\text{slice}} = \text{buffer.readInterp}(\text{absPos})\]
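The addressing arithmetic can be checked in isolation. This sketch computes \(\text{absPos}\) from the two equations above and wraps it into the buffer with a floor-modulo, which selects the same sample index as masking would for a power-of-two capacity. `sliceAbsPos` and its explicit `capacity` argument are hypothetical; positions are in samples, and slice 0 is the oldest slice in the region.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// absPos = (writePos - N*L) + currentSlice*L + readPos, wrapped into
// [0, capacity) so that positions before the start of the buffer wrap around.
double sliceAbsPos(std::int64_t writePos, int sliceCount, int sliceLen,
                   int currentSlice, double readPos, std::int64_t capacity) {
    assert(capacity > 0 && sliceLen > 0);
    const std::int64_t basePos =
        writePos - static_cast<std::int64_t>(sliceCount) * sliceLen;
    const double absPos = static_cast<double>(basePos)
                        + static_cast<double>(currentSlice) * sliceLen + readPos;
    // Floor-modulo: std::fmod keeps the sign of absPos, so fix up negatives.
    double wrapped = std::fmod(absPos, static_cast<double>(capacity));
    if (wrapped < 0.0) wrapped += static_cast<double>(capacity);
    return wrapped;
}
```

For example, with a 16-sample buffer, `writePos = 3`, and two 4-sample slices, slice 0 starts at \(3 - 8 = -5\), which wraps to index 11 before enough audio has been recorded to fill the slice region.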

5. Crossfade Envelope

Each slice is faded in and out with a raised-cosine window of up to 64 samples at each end, which smooths transitions at slice boundaries:

\[\text{fadeLen} = \min(64, L / 4)\]

\[w(p) = \begin{cases} \frac{1}{2}(1 - \cos(\pi \cdot p / \text{fadeLen})) & \text{if } p < \text{fadeLen} \\ \frac{1}{2}(1 - \cos(\pi \cdot (L - p) / \text{fadeLen})) & \text{if } p > L - \text{fadeLen} \\ 1 & \text{otherwise} \end{cases}\]

\[y_{\text{slice}} \mathrel{\times}= w(\text{readPos})\]
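A direct transcription of \(w(p)\) follows. It assumes \(L/4\) uses integer division and guards the degenerate case \(L < 4\), where \(\text{fadeLen}\) would be zero; `sliceEnvelope` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Fade envelope w(p): p is the read position within the slice in samples,
// L the slice length in samples, fadeLen = min(64, L / 4).
float sliceEnvelope(double p, int L) {
    assert(L > 0);
    const double kPi = 3.14159265358979323846;
    const int fadeLen = std::min(64, L / 4);
    if (fadeLen <= 0) return 1.0f;  // slices shorter than 4 samples: no fade
    if (p < fadeLen)                // fade-in: rises from 0 to 1
        return static_cast<float>(0.5 * (1.0 - std::cos(kPi * p / fadeLen)));
    if (p > L - fadeLen)            // fade-out: falls from 1 to 0 at p = L
        return static_cast<float>(0.5 * (1.0 - std::cos(kPi * (L - p) / fadeLen)));
    return 1.0f;
}
```

If Slice Length is converted to samples as ms × sample rate / 1000, the 10 ms minimum is 441 samples at 44.1 kHz, so \(L/4 \geq 110\) and the full 64-sample fade applies across the whole parameter range at that rate.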

6. Slice Transition

When the read position reaches the end of the current slice (\(\text{readPos} \geq L\)), the next slice is chosen probabilistically. A uniform random number \(r \in [0, 1]\) is generated, and the next slice \(j\) is selected by cumulative probability:

\[j = \min\{k : \sum_{m=0}^{k} P_{i,m} \geq r\}\]

where \(i\) is the current slice. The transition is recorded in the 16-entry history ring buffer.
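The selection step is inverse-CDF sampling over row \(i\) of \(P\). The sketch below pairs it with a xorshift32 generator and a 16-entry history ring. The 13/17/5 shift triple (Marsaglia's standard xorshift32) is an assumption, as are the names `XorShift32`, `chooseNextSlice`, and `TransitionHistory`; only the divide-by-0xFFFFFFFF mapping comes from the implementation notes.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// xorshift32 with the 13/17/5 shifts (assumed). The state must stay non-zero.
struct XorShift32 {
    std::uint32_t state;
    explicit XorShift32(std::uint32_t seed) : state(seed != 0 ? seed : 1u) {}
    std::uint32_t next() {
        state ^= state << 13;
        state ^= state >> 17;
        state ^= state << 5;
        return state;
    }
    // Map to [0, 1] by dividing by 0xFFFFFFFF.
    float nextUnit() { return static_cast<float>(next() / 4294967295.0); }
};

// Smallest k whose cumulative probability reaches r. Falls back to the last
// slice if float rounding leaves the row sum just below r.
std::size_t chooseNextSlice(const float* row, std::size_t N, float r) {
    assert(N > 0);
    float cumulative = 0.0f;
    for (std::size_t k = 0; k < N; ++k) {
        cumulative += row[k];
        if (cumulative >= r) return k;
    }
    return N - 1;
}

// Hypothetical 16-entry ring of recent slice indices, as exposed in the snapshot.
struct TransitionHistory {
    std::array<float, 16> entries{};
    std::size_t next = 0;
    void push(std::size_t slice) {
        entries[next] = static_cast<float>(slice);
        next = (next + 1) % entries.size();  // overwrite the oldest entry
    }
};
```

Because dividing by 0xFFFFFFFF can produce exactly 1.0 while accumulated float rounding can leave a row summing to slightly less than 1.0, the fallback to the last slice keeps the selection well defined.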

7. Matrix Rebuild

At each slice boundary (when not frozen), the departing slice is re-analyzed and the transition matrix is rebuilt, allowing the probabilities to track evolving input content.

8. Dry/Wet Mix

\[y = x \cdot (1 - \text{mix}) + y_{\text{slice}} \cdot \text{mix}\]

where \(\text{mix} = \text{mixPct} / 100\).

Core Equations

\[\text{sim}_{ij} = \frac{1}{1 + 10 \cdot \sqrt{\Delta\text{RMS}^2 + \Delta\text{ZCR}^2}}\]

\[P_{ij} = \frac{\text{sim}_{ij}(1-c) + c/N}{\sum_k \left[\text{sim}_{ik}(1-c) + c/N\right]}\]

\[y = x(1 - m) + \text{slice}[\text{markov}(P)] \cdot w(\text{readPos}) \cdot m\]

Snapshot Fields

Field	Type	Range	Description
Input Level	Float	0--1	Smoothed input amplitude
Output Level	Float	0--1	Smoothed output amplitude
Current Slice	Uint8	0--31	Index of the currently playing slice
Slice Count	Uint8	2--32	Total number of slices
Frozen	Bool	0--1	Whether recording is disabled
Transitions	Float[16]	0--31	Ring buffer of recent slice transition indices

Implementation Notes

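CircularBuffer<1048576> provides ~23.8 seconds at 44100 Hz. With 32 slices of 1000 ms each, the total slice region spans 32 seconds, which exceeds the buffer. Mask-based addressing keeps every read in bounds, but the part of the region older than the buffer length wraps onto more recent audio, so the oldest slices end up reading the same material as later slices rather than audio from 32 seconds ago.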
Similarity distance uses a scaling factor of 10 in the denominator (\(1 + 10d\)) to make the similarity function respond meaningfully to small differences in RMS and zero-crossing rate.
xorshift32 PRNG generates random numbers for slice selection. The output is mapped to \([0, 1]\) by dividing by 0xFFFFFFFF.
Matrix normalization ensures each row sums to 1.0, making it a valid probability distribution. This is done after blending similarity with uniform weights.
Re-analysis at boundaries keeps the transition matrix current with evolving audio content, but only when not frozen -- frozen mode preserves the captured transition structure.
Fixed 32-slice maximum with stack-allocated arrays. No heap allocation on the audio thread.
All parameters use std::atomic<float> for lock-free thread safety.
Snapshot emission is decimated to ~60 fps (every 735 samples at 44.1 kHz).
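The buffer-capacity note can be checked with a little arithmetic. This sketch computes the slice region \(N \cdot L\) in samples and how much of it exceeds `CircularBuffer<1048576>`; the function names and rounding \(L\) to the nearest sample are assumptions.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t kBufferSamples = 1048576;  // CircularBuffer<1048576>

// Total slice region N * L in samples, with L converted from milliseconds
// and rounded to the nearest sample (assumed).
std::int64_t sliceRegionSamples(int sliceCount, double sliceMs, double sampleRate) {
    assert(sliceCount > 0 && sliceMs > 0.0 && sampleRate > 0.0);
    const auto L = static_cast<std::int64_t>(sliceMs * sampleRate / 1000.0 + 0.5);
    return static_cast<std::int64_t>(sliceCount) * L;
}

// Samples of the slice region that wrap onto newer audio (0 when it fits).
std::int64_t aliasedSamples(int sliceCount, double sliceMs, double sampleRate) {
    const std::int64_t region = sliceRegionSamples(sliceCount, sliceMs, sampleRate);
    return region > kBufferSamples ? region - kBufferSamples : 0;
}
```

At 44.1 kHz with 1000 ms slices, 23 slices (1,014,300 samples) still fit, 24 slices overflow by 9,824 samples, and the maximum 32 × 1000 ms setting overflows by 362,624 samples, roughly 8.2 seconds.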

Equation Summary

y = x * (1 - mix) + slice[markov(chaos)] * crossfade * mix