Spectral Freeze
Tier: Algorithms | ComponentType: 35 | Params: 5
FFT-based spectral freeze with magnitude capture, phase advancement, diffusion noise, spectral tilt shaping, and feedback blending.
Overview
SpectralFreeze uses a Short-Time Fourier Transform (STFT) to analyze incoming audio and, when triggered, freezes the spectral magnitudes in place. The frozen spectrum is then continuously resynthesized with advancing synthesis phases, producing an infinitely sustaining, evolving drone from whatever audio was playing at the moment of capture.
The core of the effect is the separation of magnitude and phase. When Freeze is engaged, the magnitudes are captured and held constant while the synthesis phases continue advancing, maintaining the spectral content but allowing the waveform to evolve naturally. Without this phase advancement, the output would be a static, repeating waveform.
Diffusion adds random noise to the phase advancement, smearing the spectral content and creating a more ambient, diffuse texture. At 0.0, the resynthesis is clean and preserves the original spectral character precisely. At 1.0, the phases are heavily randomized, producing a wash of noise colored by the frozen spectrum.
Tilt reshapes the frozen spectrum by boosting or cutting high frequencies relative to low frequencies. Positive tilt brightens the frozen sound; negative tilt darkens it. Feedback blends the ongoing analysis magnitudes back into the frozen magnitudes, allowing the frozen spectrum to slowly track the input -- useful for creating sustained effects that gradually morph with new input.
The STFT uses a 4096-sample FFT with 75% overlap (1024-sample hop), introducing 3072 samples of latency (~70 ms at 44100 Hz).
File Locations
| Item | Path |
|---|---|
| Header | Sources/FolioDSP/include/FolioDSP/Algorithms/SpectralFreeze.h |
| Implementation | Sources/FolioDSP/src/Algorithms/SpectralFreeze.cpp |
| Tests | Tests/FolioDSPTests/SpectralFreezeTests.swift |
| Bridge | Sources/FolioDSPBridge/src/FolioDSPBridge.mm (SpectralFreezeBridge) |
Parameters
| Index | Name | Description | Min | Max | Default Min | Default Max | Default | Unit |
|---|---|---|---|---|---|---|---|---|
| 0 | Freeze | Freeze toggle (0=pass, 1=frozen) | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | |
| 1 | Diffusion | Phase noise amount during freeze | 0.0 | 1.0 | 0.0 | 1.0 | 0.3 | |
| 2 | Tilt | Spectral brightness shaping | -1.0 | 1.0 | -1.0 | 1.0 | 0.0 | |
| 3 | Feedback | Blend frozen spectrum with ongoing analysis | 0.0 | 0.99 | 0.0 | 0.95 | 0.0 | |
| 4 | Mix | Dry/wet blend | 0.0 | 100.0 | 0.0 | 100.0 | 100.0 | % |
STFT configuration:
| Parameter | Value |
|---|---|
| FFT Size | 4096 samples |
| Hop Size | 1024 samples (75% overlap) |
| Bins | 2048 |
| Latency | 3072 samples (~70 ms at 44100 Hz) |
| Window | Hann |
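The derived figures in this table follow from the FFT size and hop size alone. The compile-time sketch below reproduces that arithmetic; the constant and function names are illustrative and are not taken from SpectralFreeze.h.

```cpp
#include <cstddef>

// Illustrative names; the identifiers in the FolioDSP sources may differ.
constexpr std::size_t kFftSize = 4096;
constexpr std::size_t kHopSize = 1024;

// Overlap fraction: 1 - hop / size = 0.75.
constexpr double kOverlap = 1.0 - static_cast<double>(kHopSize) / kFftSize;

// Latency as documented: FFT size minus hop size.
constexpr std::size_t kLatencySamples = kFftSize - kHopSize;

// Latency in milliseconds at a given sample rate.
constexpr double latencyMs(double sampleRate) {
    return 1000.0 * static_cast<double>(kLatencySamples) / sampleRate;
}

static_assert(kLatencySamples == 3072, "latency should be 3072 samples");
```

At 44100 Hz, `latencyMs` evaluates to roughly 69.7 ms, which the table rounds to ~70 ms.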
Processing Algorithm
The process() function operates in two stages: per-sample I/O and per-hop spectral processing.
Per-Sample Processing
1. Input Ring Buffer
Each input sample is written to a circular ring buffer of length 4096:
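A minimal sketch of such a ring buffer write, assuming a fixed-size array and a wrapping write index (the struct and member names are hypothetical):

```cpp
#include <array>
#include <cstddef>

// Hypothetical minimal ring buffer; not the actual SpectralFreeze members.
struct InputRing {
    static constexpr std::size_t kSize = 4096;  // one FFT frame
    std::array<float, kSize> data{};
    std::size_t writePos = 0;

    // Store one sample and advance the write index, wrapping at kSize.
    void push(float x) {
        data[writePos] = x;
        writePos = (writePos + 1) % kSize;
    }
};
```

After each write, `writePos` points at the oldest sample in the buffer, which is convenient when the frame is later read out in time order.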
2. Hop Counting
A counter increments per sample. When it reaches 1024 (the hop size), processHop() is called:
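The counting logic might look like the following sketch. `HopCounter` and `tick` are hypothetical names; the caller would invoke processHop() whenever `tick` returns true.

```cpp
#include <cstddef>

// Hypothetical hop counter: returns true on every hopSize-th sample.
struct HopCounter {
    std::size_t hopSize = 1024;
    std::size_t count = 0;

    bool tick() {
        if (++count < hopSize) return false;
        count = 0;  // restart counting for the next hop
        return true;
    }
};
```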
3. Output Reading
The output sample is read from the overlap-add accumulator:
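One common way to read from an overlap-add accumulator is read-and-clear, so that each slot can collect contributions from later frames. The sketch below assumes that scheme and a 4096-sample circular accumulator; the text does not state which scheme SpectralFreeze uses.

```cpp
#include <array>
#include <cstddef>

// Hypothetical accumulator; the read-and-clear policy is an assumption.
struct OutputAccumulator {
    static constexpr std::size_t kSize = 4096;
    std::array<float, kSize> data{};
    std::size_t readPos = 0;

    // Return the finished sample and zero its slot for reuse.
    float pop() {
        const float y = data[readPos];
        data[readPos] = 0.0f;
        readPos = (readPos + 1) % kSize;
        return y;
    }
};
```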
4. Dry/Wet Mix
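The Mix parameter is documented as a percentage from 0 to 100. A plausible sketch of the blend, assuming a linear crossfade between the dry input and the resynthesized output (the function name and the crossfade law are assumptions):

```cpp
// Linear dry/wet crossfade; mixPercent is the Mix parameter (0..100).
// The crossfade law is assumed, not confirmed by the source.
inline float dryWetMix(float dry, float wet, float mixPercent) {
    const float m = mixPercent / 100.0f;  // percent to 0..1
    return dry + m * (wet - dry);
}
```

With this form, 0% returns the dry signal, 100% returns the wet signal, and 50% is their average.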
Per-Hop Processing (every 1024 samples)
1. Window Input Frame
The most recent 4096 samples from the input ring, read in time order as \(x[n]\), are windowed with a Hann function:
\(x_w[n] = x[n] \cdot w_{\text{Hann}}[n]\)
where \(w_{\text{Hann}}[n] = \frac{1}{2}\left(1 - \cos\left(\frac{2\pi n}{N}\right)\right)\) and \(N = 4096\).
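The windowing step can be sketched as follows. `makeHann` implements the periodic Hann formula above; `windowFrame` assumes the ring's write index points at the oldest sample, which is an assumption about the buffer layout rather than a documented detail.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Periodic Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / N)).
std::vector<float> makeHann(std::size_t N) {
    const double twoPi = 6.283185307179586;
    std::vector<float> w(N);
    for (std::size_t n = 0; n < N; ++n)
        w[n] = static_cast<float>(0.5 * (1.0 - std::cos(twoPi * n / N)));
    return w;
}

// Copy the ring in time order (oldest first) and apply the window.
// writePos is assumed to index the oldest sample in the ring.
std::vector<float> windowFrame(const std::vector<float>& ring,
                               std::size_t writePos,
                               const std::vector<float>& window) {
    const std::size_t N = ring.size();
    std::vector<float> frame(N);
    for (std::size_t n = 0; n < N; ++n)
        frame[n] = ring[(writePos + n) % N] * window[n];
    return frame;
}
```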
2. Forward FFT
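The windowed frame \(x_w[n]\) is transformed to the frequency domain:
\(X[k] = \sum_{n=0}^{N-1} x_w[n] \, e^{-j 2\pi k n / N}\)
Magnitudes \(|X[k]|\) and phases \(\angle X[k]\) are then extracted from each bin.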
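Given complex FFT bins from any FFT routine, the polar split is a direct use of the standard library. This sketch uses `std::complex<float>` for the bins, which is an assumption about the data layout; the function name is illustrative.

```cpp
#include <complex>
#include <cstddef>
#include <vector>

// Split complex bins into magnitude |X[k]| and phase angle(X[k]).
void toPolar(const std::vector<std::complex<float>>& bins,
             std::vector<float>& mag,
             std::vector<float>& phase) {
    mag.resize(bins.size());
    phase.resize(bins.size());
    for (std::size_t k = 0; k < bins.size(); ++k) {
        mag[k] = std::abs(bins[k]);    // sqrt(re^2 + im^2)
        phase[k] = std::arg(bins[k]);  // atan2(im, re)
    }
}
```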
3. Freeze Capture
On the rising edge of the freeze parameter (transition from unfrozen to frozen), magnitudes and phases are captured:
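Rising-edge detection only needs the previous hop's freeze state. In the sketch below, the 0.5 threshold for interpreting the float parameter as a toggle is an assumption, and the struct and member names are hypothetical.

```cpp
#include <vector>

// Hypothetical freeze state; captures only on the off-to-on transition.
struct FreezeState {
    bool wasFrozen = false;
    std::vector<float> frozenMag;
    std::vector<float> synthPhase;

    // Call once per hop. Returns true when a capture happened.
    // Treating freezeParam >= 0.5 as "on" is an assumed threshold.
    bool update(float freezeParam,
                const std::vector<float>& mag,
                const std::vector<float>& phase) {
        const bool frozen = freezeParam >= 0.5f;
        const bool rising = frozen && !wasFrozen;
        if (rising) {
            frozenMag = mag;     // hold these magnitudes
            synthPhase = phase;  // starting point for phase advancement
        }
        wasFrozen = frozen;
        return rising;
    }
};
```

Holding Freeze at 1 across several hops does not recapture; the spectrum is only replaced after the parameter returns to 0 and rises again.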
4. Frozen Resynthesis
While frozen, for each bin \(k\):
Tilt shaping adjusts the magnitude based on normalized bin position:
Positive tilt boosts high bins and cuts low bins; negative tilt does the opposite.
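The exact tilt curve is not given in this text. Purely as an illustration of the described behavior, the sketch below assumes a linear gain centered on the middle of the spectrum; the function and the curve are hypothetical, and the real implementation may well use a different (for example dB-per-octave) law.

```cpp
#include <cstddef>

// ASSUMED linear tilt curve, not the documented formula.
// pos runs from 0 (lowest bin) to 1 (highest bin); gain is 1 at mid-spectrum.
inline float tiltGain(std::size_t k, std::size_t numBins, float tilt) {
    const float pos = static_cast<float>(k) / static_cast<float>(numBins - 1);
    return 1.0f + tilt * (2.0f * pos - 1.0f);
}
```

With tilt in \([-1, 1]\) the assumed gain stays in \([0, 2]\), so magnitudes never become negative.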
Feedback blending gradually incorporates ongoing analysis:
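The blending formula is also missing here. One simple reading is a per-hop linear interpolation from the frozen magnitudes toward the live analysis magnitudes, weighted by the feedback parameter. That weighting is an assumption, and the function name is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// ASSUMED blend: move each frozen magnitude toward the live magnitude
// by a fraction `feedback` per hop. feedback = 0 leaves the freeze untouched.
void blendFeedback(std::vector<float>& frozenMag,
                   const std::vector<float>& liveMag,
                   float feedback) {
    for (std::size_t k = 0; k < frozenMag.size() && k < liveMag.size(); ++k)
        frozenMag[k] += feedback * (liveMag[k] - frozenMag[k]);
}
```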
Phase advancement with diffusion noise:
where \(d\) is the diffusion parameter.
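A sketch of the phase update, assuming the form \(\phi[k] \leftarrow \phi[k] + \Delta\phi[k] + d \cdot \pi \cdot r\) with \(r\) uniform in \([-1, 1]\). The xorshift32 shifts are the standard 13/17/5 triple; the update form, the wrapping step, and all function names are assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Marsaglia's xorshift32; the state must be nonzero.
inline std::uint32_t xorshift32(std::uint32_t& state) {
    state ^= state << 13;
    state ^= state >> 17;
    state ^= state << 5;
    return state;
}

// Uniform noise in [-1, 1]: map to [0, 1] first, then rescale.
inline float bipolarNoise(std::uint32_t& state) {
    const double unit = xorshift32(state) / 4294967295.0;
    return static_cast<float>(2.0 * unit - 1.0);
}

// Advance each synthesis phase by its per-hop delta plus diffusion noise,
// then wrap into [-pi, pi]. The update form is assumed, not documented.
void advancePhases(std::vector<float>& phase,
                   const std::vector<float>& delta,
                   float diffusion,
                   std::uint32_t& rng) {
    const float pi = 3.14159265358979f;
    for (std::size_t k = 0; k < phase.size() && k < delta.size(); ++k) {
        const float p = phase[k] + delta[k] + diffusion * pi * bipolarNoise(rng);
        phase[k] = std::remainder(p, 2.0f * pi);
    }
}
```

At `diffusion = 0` the noise term vanishes and each bin advances by its delta alone; at `diffusion = 1` the perturbation spans a full turn, which effectively randomizes the phase.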
Reconstruction from polar to Cartesian:
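In C++ this conversion is available as `std::polar`. A minimal sketch, with a hypothetical wrapper name and assuming the shaped magnitude is non-negative:

```cpp
#include <complex>

// Rebuild a complex bin from magnitude and phase:
// (mag * cos(phase), mag * sin(phase)). mag must be non-negative.
inline std::complex<float> toCartesian(float mag, float phase) {
    return std::polar(mag, phase);
}
```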
When not frozen, the original FFT output passes through unchanged.
5. Inverse FFT
6. Window and Overlap-Add
The IFFT output is windowed again and accumulated into the output buffer with synthesis gain compensation:
The double windowing (analysis + synthesis) with 75% overlap produces a smooth constant-gain reconstruction.
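The constant-gain claim can be checked numerically. With a periodic Hann window applied twice and a hop of \(N/4\), the four overlapping squared windows sum to 1.5 at every output position, which suggests a compensation gain of \(1/1.5 = 2/3\). The sketch below verifies that sum and shows one way to do the accumulation; the function names are hypothetical, and the gain constant actually used in SpectralFreeze.cpp is not stated in this text.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Sum of the squared periodic-Hann windows that overlap at one position.
// For hop = N/4 the result is 1.5 regardless of pos.
double hannSquaredOverlapSum(std::size_t N, std::size_t hop, std::size_t pos) {
    const double twoPi = 6.283185307179586;
    double sum = 0.0;
    for (std::size_t n = pos % hop; n < N; n += hop) {
        const double w = 0.5 * (1.0 - std::cos(twoPi * n / N));
        sum += w * w;
    }
    return sum;
}

// Apply the synthesis window and gain, then add into a circular accumulator.
void overlapAdd(std::vector<float>& accum, std::size_t start,
                const std::vector<float>& frame,
                const std::vector<float>& window, float gain) {
    const std::size_t N = accum.size();
    for (std::size_t n = 0; n < frame.size(); ++n)
        accum[(start + n) % N] += frame[n] * window[n] * gain;
}
```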
Core Equations
Snapshot Fields
| Field | Type | Range | Unit | Description |
|---|---|---|---|---|
| Input Level | Float | 0--1 | | Smoothed input amplitude |
| Output Level | Float | 0--1 | | Smoothed output amplitude |
| Frozen | Bool | 0--1 | | Whether the spectrum is currently frozen |
| Diffusion | Float | 0--1 | | Current diffusion amount |
| Spectrum | Float[32] | 0--1 | | 32-band downsampled magnitude spectrum |
Implementation Notes
- 75% overlap (4096-sample window with 1024-sample hop) provides smooth reconstruction. The overlap-add of four Hann-windowed frames sums to a constant, ensuring artifact-free output when not frozen.
- 3072-sample latency (FFT size minus hop size) is inherent to the STFT architecture. At 44100 Hz, this is approximately 70 ms.
- Spectrum snapshot downsamples the 2048 frequency bins to 32 bands by averaging groups of 64 bins. When frozen, the frozen magnitudes are displayed; otherwise, the live analysis magnitudes are shown.
- xorshift32 PRNG generates diffusion noise, mapped to \([0, 1]\) then scaled to \([-1, 1] \cdot \pi\) for phase perturbation.
- Phase delta estimation uses the difference between adjacent bin phases (\(\angle X[k] - \angle X[k-1]\)) as an approximation of the instantaneous frequency for synthesis phase advancement. This is a simplified approach compared to full phase vocoder phase tracking.
- Double Hann windowing (analysis and synthesis) with 75% overlap creates an effective Hann-squared window whose overlap-add is constant.
- All smoothed parameters (Diffusion, Tilt, Feedback, Mix) use ParamSmoother. Freeze is not smoothed (discrete toggle).
- All parameters use std::atomic<float> for lock-free thread safety.
- Snapshot emission is decimated to ~60 fps (every 735 samples at 44.1 kHz).
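The spectrum snapshot note above describes averaging groups of 64 bins into 32 bands. A minimal sketch of that reduction, with a hypothetical function name and no claim about any normalization the implementation applies afterwards:

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Average equal-sized groups of bins into 32 bands.
// With 2048 input bins, each band covers 64 bins.
std::array<float, 32> downsampleSpectrum(const std::vector<float>& mag) {
    constexpr std::size_t kBands = 32;
    std::array<float, kBands> bands{};
    const std::size_t group = mag.size() / kBands;
    if (group == 0) return bands;  // too few bins: return zeros
    for (std::size_t b = 0; b < kBands; ++b) {
        float sum = 0.0f;
        for (std::size_t i = 0; i < group; ++i)
            sum += mag[b * group + i];
        bands[b] = sum / static_cast<float>(group);
    }
    return bands;
}
```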
Equation Summary
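\(Y[k] = |Y_{\text{frozen}}[k]| \cdot e^{j \phi_{\text{advancing}}[k]}\), where diffusion adds random noise to the advancing phase \(\phi_{\text{advancing}}[k]\).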