Spectral Freeze

Tier: Algorithms | ComponentType: 35 | Params: 5

FFT-based spectral freeze with magnitude capture, phase advancement, diffusion noise, spectral tilt shaping, and feedback blending.

Overview

SpectralFreeze uses a Short-Time Fourier Transform (STFT) to analyze incoming audio and, when triggered, freezes the spectral magnitudes in place. The frozen spectrum is then continuously resynthesized with advancing synthesis phases, producing an infinitely sustaining, evolving drone from whatever audio was playing at the moment of capture.

The core of the effect is the separation of magnitude and phase. When Freeze is engaged, the magnitudes are captured and held constant while the synthesis phases continue advancing, maintaining the spectral content but allowing the waveform to evolve naturally. Without this phase advancement, the output would be a static, repeating waveform.

Diffusion adds random noise to the phase advancement, smearing the spectral content and creating a more ambient, diffuse texture. At 0.0, the resynthesis is clean and preserves the original spectral character precisely. At 1.0, the phases are heavily randomized, producing a wash of noise colored by the frozen spectrum.

Tilt reshapes the frozen spectrum by boosting or cutting high frequencies relative to low frequencies. Positive tilt brightens the frozen sound; negative tilt darkens it. Feedback blends the ongoing analysis magnitudes back into the frozen magnitudes, allowing the frozen spectrum to slowly track the input -- useful for creating sustained effects that gradually morph with new input.

The STFT uses a 4096-sample FFT with 75% overlap (1024-sample hop), introducing 3072 samples of latency (~70 ms at 44100 Hz).

File Locations

Role	Path
Header	`Sources/FolioDSP/include/FolioDSP/Algorithms/SpectralFreeze.h`
Implementation	`Sources/FolioDSP/src/Algorithms/SpectralFreeze.cpp`
Tests	`Tests/FolioDSPTests/SpectralFreezeTests.swift`
Bridge	`Sources/FolioDSPBridge/src/FolioDSPBridge.mm` (SpectralFreezeBridge)

Parameters

Index	Name	Description	Min	Max	Default Min	Default Max	Default	Unit
0	Freeze	Freeze toggle (0=pass, 1=frozen)	0.0	1.0	0.0	1.0	0.0
1	Diffusion	Phase noise amount during freeze	0.0	1.0	0.0	1.0	0.3
2	Tilt	Spectral brightness shaping	-1.0	1.0	-1.0	1.0	0.0
3	Feedback	Blend frozen spectrum with ongoing analysis	0.0	0.99	0.0	0.95	0.0
4	Mix	Dry/wet blend	0.0	100.0	0.0	100.0	100.0	%

STFT configuration:

Parameter	Value
FFT Size	4096 samples
Hop Size	1024 samples (75% overlap)
Bins	2048
Latency	3072 samples (~70 ms at 44100 Hz)
Window	Hann

Processing Algorithm

The process() function operates in two stages: per-sample I/O and per-hop spectral processing.

Per-Sample Processing

1. Input Ring Buffer

Each input sample is written to a circular ring buffer of length 4096:

\[\text{inputRing}[\text{writePos}] = x\]

2. Hop Counting

A counter increments once per sample. When it reaches 1024 (the hop size), processHop() is called and the counter starts over for the next hop:

\[\text{hopCounter} \mathrel{+}= 1\]

3. Output Reading

The output sample is read from the overlap-add accumulator:

\[y_{\text{wet}} = \text{outputAccum}[\text{readPos}]\]

\[\text{outputAccum}[\text{readPos}] = 0 \quad \text{(clear after read)}\]

4. Dry/Wet Mix

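\[y = x \cdot (1 - \text{mix}) + y_{\text{wet}} \cdot \text{mix}\]

Here \(\text{mix}\) is a fraction in \([0, 1]\); the Mix parameter is specified in percent, so this presumably corresponds to Mix / 100.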
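
The four per-sample steps above can be sketched as a single function. This is an illustrative outline, not the code in SpectralFreeze.cpp: the struct, its member names, the processHop callback, and the assumption that mix arrives already scaled to [0, 1] are all hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <functional>

// Hypothetical state layout; names are illustrative only.
struct FreezeIO {
    static constexpr std::size_t kFftSize = 4096;
    static constexpr std::size_t kHopSize = 1024;

    std::array<float, kFftSize> inputRing{};
    std::array<float, kFftSize> outputAccum{};
    std::size_t writePos = 0;
    std::size_t readPos = 0;
    std::size_t hopCounter = 0;
    std::function<void()> processHop = [] {};  // stand-in for the spectral stage

    // mix is assumed to be a fraction in [0, 1].
    float processSample(float x, float mix) {
        // 1. Write the input sample into the ring buffer.
        inputRing[writePos] = x;
        writePos = (writePos + 1) % kFftSize;

        // 2. Run the spectral stage once per hop.
        if (++hopCounter >= kHopSize) {
            hopCounter = 0;
            processHop();
        }

        // 3. Read the overlap-add accumulator and clear the slot.
        const float wet = outputAccum[readPos];
        outputAccum[readPos] = 0.0f;
        readPos = (readPos + 1) % kFftSize;

        // 4. Dry/wet mix.
        return x * (1.0f - mix) + wet * mix;
    }
};
```

In real-time code the std::function would more likely be a direct member call; it is used here only so the sketch stays independent of the spectral stage.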

Per-Hop Processing (every 1024 samples)

1. Window Input Frame

The most recent 4096 samples from the input ring are windowed with a Hann function:

\[x_w[n] = \text{inputRing}[(\text{writePos} + n) \bmod 4096] \cdot w_{\text{Hann}}[n]\]

where \(w_{\text{Hann}}[n] = \frac{1}{2}\left(1 - \cos\left(\frac{2\pi n}{N}\right)\right)\) and \(N = 4096\).
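
As a rough sketch, the window table and the framing step could look like the following. The function names are hypothetical, and the code assumes that writePos points at the oldest sample in the ring (the next slot to be overwritten), which is what the indexing in the formula implies.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / N)).
std::vector<float> makeHannWindow(std::size_t N) {
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<float> w(N);
    for (std::size_t n = 0; n < N; ++n) {
        const double angle = twoPi * static_cast<double>(n) / static_cast<double>(N);
        w[n] = static_cast<float>(0.5 * (1.0 - std::cos(angle)));
    }
    return w;
}

// Unwraps the ring buffer starting at writePos (oldest sample first)
// and multiplies each sample by the matching window coefficient.
std::vector<float> windowFrame(const std::vector<float>& ring,
                               std::size_t writePos,
                               const std::vector<float>& window) {
    const std::size_t N = ring.size();
    std::vector<float> frame(N);
    for (std::size_t n = 0; n < N; ++n) {
        frame[n] = ring[(writePos + n) % N] * window[n];
    }
    return frame;
}
```

A production version would precompute the window once and write into a preallocated frame buffer rather than returning a new vector per hop.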

2. Forward FFT

The windowed frame is transformed to frequency domain:

\[X[k] = \text{FFT}(x_w), \quad k = 0, \ldots, 2047\]

Magnitudes and phases are extracted:

\[|X[k]| = \sqrt{\text{Re}(X[k])^2 + \text{Im}(X[k])^2}\]

\[\angle X[k] = \text{atan2}(\text{Im}(X[k]),\ \text{Re}(X[k]))\]
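
A minimal sketch of the magnitude and phase extraction, assuming the spectrum is available as a vector of std::complex<float>. The actual FFT library and its data layout are not specified in this document, so PolarSpectrum and toPolar are hypothetical names.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical container for the polar form of a spectrum.
struct PolarSpectrum {
    std::vector<float> magnitude;  // |X[k]|
    std::vector<float> phase;      // angle of X[k], in radians
};

PolarSpectrum toPolar(const std::vector<std::complex<float>>& X) {
    PolarSpectrum out;
    out.magnitude.resize(X.size());
    out.phase.resize(X.size());
    for (std::size_t k = 0; k < X.size(); ++k) {
        // hypot computes sqrt(re^2 + im^2) without intermediate overflow.
        out.magnitude[k] = std::hypot(X[k].real(), X[k].imag());
        out.phase[k] = std::atan2(X[k].imag(), X[k].real());
    }
    return out;
}
```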

3. Freeze Capture

On the rising edge of the freeze parameter (transition from unfrozen to frozen), magnitudes and phases are captured:

\[|F[k]| = |X[k]|, \quad \phi_s[k] = \angle X[k]\]
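
The rising-edge capture could be sketched as below. FreezeState and its members are hypothetical names. The sketch takes the freeze flag as a bool; how the float Freeze parameter is thresholded into that flag is not stated here, so any threshold (for example 0.5) would be an assumption.

```cpp
#include <vector>

// Hypothetical state for edge-triggered capture.
struct FreezeState {
    bool wasFrozen = false;
    std::vector<float> frozenMag;   // |F[k]|
    std::vector<float> synthPhase;  // phi_s[k]

    // Call once per hop. Returns true only on the hop where capture happens.
    bool update(bool freezeOn,
                const std::vector<float>& mag,
                const std::vector<float>& phase) {
        const bool risingEdge = freezeOn && !wasFrozen;
        if (risingEdge) {
            frozenMag = mag;     // hold the magnitudes
            synthPhase = phase;  // seed the synthesis phases
        }
        wasFrozen = freezeOn;
        return risingEdge;
    }
};
```

Holding Freeze on does not recapture; the spectrum is only replaced after the flag has dropped and risen again. The vector assignments may allocate, so audio-thread code would normally copy into preallocated buffers instead.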

4. Frozen Resynthesis

While frozen, for each bin \(k\):

Tilt shaping adjusts the magnitude based on normalized bin position:

\[g_{\text{tilt}}[k] = 1 + t \cdot \left(\frac{2k}{N_{\text{bins}}} - 1\right)\]

\[|F'[k]| = |F[k]| \cdot \max(0,\ g_{\text{tilt}}[k])\]

Positive tilt boosts high bins and cuts low bins; negative tilt does the opposite.
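
The tilt gain is a straight line across the bin index, clamped at zero. A small sketch with a hypothetical function name:

```cpp
#include <algorithm>
#include <cstddef>

// g_tilt[k] = 1 + t * (2k / numBins - 1), clamped so it never goes negative.
inline float tiltGain(std::size_t k, std::size_t numBins, float tilt) {
    const float position =
        2.0f * static_cast<float>(k) / static_cast<float>(numBins) - 1.0f;
    return std::max(0.0f, 1.0f + tilt * position);
}
```

With tilt = 1 and 2048 bins, bin 0 gets a gain of 0, bin 1024 is left at 1, and the top bins approach 2 (about +6 dB). With tilt = -1 the line is mirrored, so bin 0 gets a gain of 2.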

Feedback blending gradually incorporates ongoing analysis:

\[|F[k]| \leftarrow |F[k]| \cdot (1 - f_b) + |X[k]| \cdot f_b\]
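
Because this update runs once per hop, it behaves like a one-pole smoother on each bin's magnitude: with silent input, the frozen magnitude shrinks by a factor of \((1 - f_b)\) every hop. A sketch with a hypothetical function name:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// |F[k]| <- |F[k]| * (1 - fb) + |X[k]| * fb, applied in place to every bin.
void blendFeedback(std::vector<float>& frozenMag,
                   const std::vector<float>& liveMag,
                   float feedback) {
    const std::size_t n = std::min(frozenMag.size(), liveMag.size());
    for (std::size_t k = 0; k < n; ++k) {
        frozenMag[k] = frozenMag[k] * (1.0f - feedback) + liveMag[k] * feedback;
    }
}
```

At feedback = 0 the frozen spectrum never changes, which matches the parameter's default.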

Phase advancement with diffusion noise:

\[\Delta\phi[k] = \angle X[k] - \angle X[k-1]\]

\[\phi_s[k] \mathrel{+}= \Delta\phi[k] + d \cdot \text{random}(-1, 1) \cdot \pi\]

where \(d\) is the diffusion parameter.

Reconstruction from polar to Cartesian:

\[Y_{\text{re}}[k] = |F'[k]| \cdot \cos(\phi_s[k])\]

\[Y_{\text{im}}[k] = |F'[k]| \cdot \sin(\phi_s[k])\]
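
The polar-to-Cartesian step, sketched with a hypothetical function name and std::complex<float> as an assumed output format:

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Y[k] = mag[k] * (cos(phase[k]) + j * sin(phase[k])).
std::vector<std::complex<float>> toCartesian(const std::vector<float>& mag,
                                             const std::vector<float>& phase) {
    const std::size_t n = mag.size() < phase.size() ? mag.size() : phase.size();
    std::vector<std::complex<float>> Y(n);
    for (std::size_t k = 0; k < n; ++k) {
        Y[k] = std::complex<float>(mag[k] * std::cos(phase[k]),
                                   mag[k] * std::sin(phase[k]));
    }
    return Y;
}
```

The explicit cos/sin form is used instead of std::polar because std::polar requires a non-negative magnitude; the clamped tilt gain should guarantee that, but the explicit form does not depend on it.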

When not frozen, the original FFT output passes through unchanged.

5. Inverse FFT

\[y_{\text{ifft}} = \text{IFFT}(Y)\]

6. Window and Overlap-Add

The IFFT output is windowed again and accumulated into the output buffer with synthesis gain compensation:

\[g_{\text{hop}} = \frac{\text{hopSize}}{\text{fftSize}}\]

\[\text{outputAccum}[(\text{readPos} + n) \bmod 4096] \mathrel{+}= y_{\text{ifft}}[n] \cdot w_{\text{Hann}}[n] \cdot g_{\text{hop}}\]

The double windowing (analysis + synthesis) with 75% overlap produces a smooth constant-gain reconstruction.
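
The constant-gain claim can be checked numerically. The sketch below (hypothetical function name) sums the squared periodic Hann window over every hop-shifted copy that covers a given output position. For a 4096-sample window and a 1024-sample hop the sum is 1.5 at every position. How \(g_{\text{hop}}\) and the FFT library's own scaling combine with that constant to reach the final output level is not specified in this document.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// For each position n in [0, hop), sums w[m]^2 over m = n, n + hop, n + 2*hop, ...
// where w is the periodic Hann window of length fftSize.
std::vector<double> hannSquaredOverlapSum(std::size_t fftSize, std::size_t hop) {
    const double twoPi = 2.0 * std::acos(-1.0);
    std::vector<double> sum(hop, 0.0);
    for (std::size_t n = 0; n < hop; ++n) {
        for (std::size_t m = n; m < fftSize; m += hop) {
            const double angle =
                twoPi * static_cast<double>(m) / static_cast<double>(fftSize);
            const double w = 0.5 * (1.0 - std::cos(angle));
            sum[n] += w * w;
        }
    }
    return sum;
}
```

A result that varies with n would indicate amplitude modulation at the hop rate; a flat result means the analysis/synthesis window pair reconstructs without ripple.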

Core Equations

\[|F[k]| = \text{captured magnitude at freeze onset}\]

\[\phi_s[k] \mathrel{+}= \Delta\phi[k] + d \cdot \text{noise} \cdot \pi\]

\[g_{\text{tilt}}[k] = 1 + t \cdot (2k/N_{\text{bins}} - 1)\]

\[Y[k] = |F[k]| \cdot g_{\text{tilt}}[k] \cdot e^{j\phi_s[k]}\]

Snapshot Fields

Field	Type	Range	Description
Input Level	Float	0--1	Smoothed input amplitude
Output Level	Float	0--1	Smoothed output amplitude
Frozen	Bool	0--1	Whether the spectrum is currently frozen
Diffusion	Float	0--1	Current diffusion amount
Spectrum	Float[32]	0--1	32-band downsampled magnitude spectrum

Implementation Notes

75% overlap (4096-sample window with 1024-sample hop) provides smooth reconstruction. The overlap-add of four Hann-windowed frames sums to a constant, ensuring artifact-free output when not frozen.
3072-sample latency (FFT size minus hop size) is inherent to the STFT architecture. At 44100 Hz, this is approximately 70 ms.
Spectrum snapshot downsamples the 2048 frequency bins to 32 bands by averaging groups of 64 bins. When frozen, the frozen magnitudes are displayed; otherwise, the live analysis magnitudes are shown.
xorshift32 PRNG generates diffusion noise, mapped to \([0, 1]\) then scaled to \([-1, 1] \cdot \pi\) for phase perturbation.
Phase delta estimation uses the difference between adjacent bin phases (\(\angle X[k] - \angle X[k-1]\)) as an approximation of the instantaneous frequency for synthesis phase advancement. This is a simplified approach compared to full phase vocoder phase tracking.
Double Hann windowing (analysis and synthesis) with 75% overlap creates an effective Hann-squared window whose overlap-add is constant.
All smoothed parameters (Diffusion, Tilt, Feedback, Mix) use ParamSmoother. Freeze is not smoothed (discrete toggle).
All parameters use std::atomic<float> for lock-free thread safety.
Snapshot emission is decimated to ~60 fps (every 735 samples at 44.1 kHz).
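
The spectrum snapshot reduction described in the notes above can be sketched as a plain group average. The function name is hypothetical, and any normalization of the bands into the 0--1 snapshot range is left out because it is not described here.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Averages consecutive groups of bins into 32 bands.
// With 2048 input bins, each band covers 64 bins.
std::array<float, 32> downsampleSpectrum(const std::vector<float>& mags) {
    std::array<float, 32> bands{};
    const std::size_t perBand = mags.size() / bands.size();
    if (perBand == 0) {
        return bands;  // not enough bins to fill the bands
    }
    for (std::size_t b = 0; b < bands.size(); ++b) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < perBand; ++i) {
            sum += mags[b * perBand + i];
        }
        bands[b] = sum / static_cast<float>(perBand);
    }
    return bands;
}
```

The caller would pass the frozen magnitudes while Freeze is engaged and the live analysis magnitudes otherwise.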

Equation Summary

\[Y[k] = |F'[k]| \cdot e^{j\phi_s[k]}, \quad \phi_s[k] \text{ advancing each hop with diffusion noise}\]