Onset Detector

Tier: Analysis | ComponentType: 37 | Params: 3

Spectral flux transient detection with adaptive threshold, cooldown, and optional band-split analysis. Audio passes through unchanged.

Overview

OnsetDetector uses short-time Fourier analysis to detect note onsets, transients, and percussive attacks in the audio stream. It accumulates samples into a 1024-sample frame, applies a Hann window, and computes the FFT every 256 samples (hop size). The spectral flux — the sum of positive magnitude differences between consecutive frames — measures how rapidly the spectrum is changing. Sudden increases in spectral flux correspond to transient events.

An adaptive threshold based on the running median of recent flux values prevents false triggers in sustained or noisy signals. The sensitivity parameter scales this threshold: higher values require larger spectral changes to trigger an onset. A cooldown timer enforces a minimum interval between detections, preventing repeated triggering from the same event.

When Band Split is enabled, the detector also computes flux in three frequency bands (low: 0-300 Hz, mid: 300-3000 Hz, high: 3000+ Hz), each with a reduced threshold. This allows downstream systems to distinguish between bass transients, midrange attacks, and high-frequency events.

This is an analysis-only component — audio passes through unchanged. All detection results are emitted via the snapshot pipeline.

File Locations

| | Path |
|---|---|
| Header | Sources/FolioDSP/include/FolioDSP/Analysis/OnsetDetector.h |
| Implementation | Sources/FolioDSP/src/Analysis/OnsetDetector.cpp |
| Tests | Tests/FolioDSPTests/OnsetDetectorTests.swift |
| Bridge | Sources/FolioDSPBridge/src/FolioDSPBridge.mm (OnsetDetectorBridge) |

Parameters

| Index | Name | Description | Min | Max | Default Min | Default Max | Default | Unit |
|---|---|---|---|---|---|---|---|---|
| 0 | Sensitivity | Threshold multiplier (higher = less sensitive) | 0.1 | 10.0 | 0.5 | 5.0 | 1.5 | |
| 1 | Min Interval | Minimum time between onsets (cooldown) | 5.0 | 500.0 | 20.0 | 200.0 | 50.0 | ms |
| 2 | Band Split | Enable per-band detection (0=off, 1=on) | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | |

Processing Algorithm

The process() function accumulates samples and manages cooldown. Every hop, analyzeFrame() runs the full detection pipeline:

1. Frame Accumulation

Samples are written into a 1024-sample circular buffer. Every 256 samples (hop size), the analysis frame is extracted:

\[N_{\text{FFT}} = 1024, \quad \text{hop} = 256\]
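The accumulation step can be sketched as follows. This is a minimal illustration, not the code in OnsetDetector.cpp: the class name `FrameAccumulator` and its members are hypothetical, and the only values taken from the text are the 1024-sample frame and the 256-sample hop.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper; names and layout are illustrative only.
class FrameAccumulator {
public:
    static constexpr std::size_t kFrameSize = 1024;  // N_FFT from the text
    static constexpr std::size_t kHopSize = 256;     // hop from the text

    // Stores one sample in the ring buffer.
    // Returns true each time a full hop (256 samples) has been written.
    bool push(float sample) {
        ring_[writeIndex_] = sample;
        writeIndex_ = (writeIndex_ + 1) % kFrameSize;
        if (++sinceLastHop_ == kHopSize) {
            sinceLastHop_ = 0;
            return true;
        }
        return false;
    }

    // Copies the most recent 1024 samples into `out`, oldest first.
    void extractFrame(std::array<float, kFrameSize>& out) const {
        for (std::size_t i = 0; i < kFrameSize; ++i) {
            out[i] = ring_[(writeIndex_ + i) % kFrameSize];
        }
    }

private:
    std::array<float, kFrameSize> ring_{};  // zero-initialised
    std::size_t writeIndex_ = 0;
    std::size_t sinceLastHop_ = 0;
};
```

A caller would invoke `push()` once per input sample and run the analysis only when it returns true, so consecutive frames share 768 samples (75% overlap).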

2. Hann Windowing

The frame is windowed to reduce spectral leakage before FFT:

\[x_w[n] = x[n] \cdot w[n], \quad w[n] = 0.5 - 0.5 \cos\!\left(\frac{2\pi n}{N}\right)\]

The Hann window values are retrieved from a precomputed 1024-entry lookup table with interpolation.
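As a rough sketch of the windowing step, the table below is built directly from the formula above. The function names are hypothetical, and the sketch indexes the table directly because the table and the frame have the same length here; the interpolated lookup used by the actual implementation is not reproduced.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Builds a periodic Hann table: w[n] = 0.5 - 0.5 * cos(2*pi*n / N).
// Hypothetical helper, not the library's own table code.
std::vector<float> makeHannTable(std::size_t n) {
    const double twoPi = 6.283185307179586;
    std::vector<float> table(n);
    for (std::size_t i = 0; i < n; ++i) {
        table[i] = static_cast<float>(0.5 - 0.5 * std::cos(twoPi * i / n));
    }
    return table;
}

// Multiplies the frame in place by the window: x_w[n] = x[n] * w[n].
// Only the overlapping length is processed if the sizes differ.
void applyWindow(std::vector<float>& frame, const std::vector<float>& window) {
    const std::size_t n =
        frame.size() < window.size() ? frame.size() : window.size();
    for (std::size_t i = 0; i < n; ++i) {
        frame[i] *= window[i];
    }
}
```

The window is zero at n = 0 and reaches 1 at n = N/2, so the frame edges are attenuated and the centre passes at full amplitude.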

3. Forward FFT

A real-to-complex FFT transforms the windowed frame into 512 frequency bins:

\[X[k] = \text{FFT}(x_w), \quad k \in [0, 511]\]

4. Magnitude Computation

The magnitude of each frequency bin is computed:

\[|X[k]| = \sqrt{\text{Re}(X[k])^2 + \text{Im}(X[k])^2}\]
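The magnitude step could look like the sketch below. It assumes the spectrum is available as a vector of `std::complex<float>`; the buffer layout the real FFT produces is not stated here, and `binMagnitudes` is a hypothetical name.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Computes |X[k]| = sqrt(Re^2 + Im^2) for every bin.
// Assumes an interleaved complex spectrum; a split or packed layout
// would need different indexing.
std::vector<float> binMagnitudes(const std::vector<std::complex<float>>& spectrum) {
    std::vector<float> mags(spectrum.size());
    for (std::size_t k = 0; k < spectrum.size(); ++k) {
        const float re = spectrum[k].real();
        const float im = spectrum[k].imag();
        mags[k] = std::sqrt(re * re + im * im);
    }
    return mags;
}
```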

5. Spectral Flux (Half-Wave Rectified)

Spectral flux measures the sum of positive magnitude increases across all bins. Only increases count — decreases are ignored, making the detector sensitive to energy appearing (onsets) rather than disappearing (offsets):

\[\Phi = \sum_{k=0}^{511} \max\!\left(0, \; |X_n[k]| - |X_{n-1}[k]|\right)\]
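The flux formula translates almost directly into code. The sketch below is illustrative: `spectralFlux` is a hypothetical name, and the two arguments are assumed to hold the magnitude spectra of the current and previous frames.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Half-wave rectified spectral flux:
// sum over bins of max(0, |X_n[k]| - |X_{n-1}[k]|).
float spectralFlux(const std::vector<float>& current,
                   const std::vector<float>& previous) {
    const std::size_t bins = std::min(current.size(), previous.size());
    float flux = 0.0f;
    for (std::size_t k = 0; k < bins; ++k) {
        // Decreases contribute nothing; only increases are summed.
        flux += std::max(0.0f, current[k] - previous[k]);
    }
    return flux;
}
```

For example, going from magnitudes {1, 2, 3, 0} to {2, 1, 3, 4} yields a flux of 5 (1 from the first bin, 4 from the last), while the reverse transition yields only 1.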

6. Adaptive Threshold

The threshold adapts to the signal's recent spectral activity using the median of the last 20 flux values, scaled by the sensitivity parameter:

\[\theta = \text{median}(\Phi_{\text{history}}[0..19]) \times \text{sensitivity}\]

The median is computed via insertion sort on a copy of the flux history buffer.
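A minimal sketch of the threshold computation, under stated assumptions: the function names are hypothetical, the history is modelled as a fixed 20-element array, and for the even-sized window the median is taken as the mean of the two middle values (the text does not say which convention the implementation uses).

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kMedianWindow = 20;  // last 20 flux values, per the text

// Takes the history by value so the caller's buffer is left untouched,
// then insertion-sorts the copy.
float medianOf(std::array<float, kMedianWindow> values) {
    for (std::size_t i = 1; i < values.size(); ++i) {
        const float key = values[i];
        std::size_t j = i;
        while (j > 0 && values[j - 1] > key) {
            values[j] = values[j - 1];
            --j;
        }
        values[j] = key;
    }
    // Assumption: average the two middle elements of the even-sized window.
    return 0.5f * (values[kMedianWindow / 2 - 1] + values[kMedianWindow / 2]);
}

// theta = median(history) * sensitivity
float adaptiveThreshold(const std::array<float, kMedianWindow>& history,
                        float sensitivity) {
    return medianOf(history) * sensitivity;
}
```

Insertion sort is a reasonable fit for a window this small: it needs no allocation and is fast on short, nearly sorted data.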

7. Onset Detection

An onset is declared when the spectral flux exceeds the threshold and the cooldown has expired:

\[\text{onset} = (\Phi > \theta) \;\wedge\; (\text{cooldown} = 0)\]

8. Onset Strength

The strength of the detected onset is proportional to how far the flux exceeds the threshold:

\[\text{strength} = \min\!\left(1, \; \frac{\Phi}{\theta} - 1\right)\]
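The onset decision and strength formula can be combined in one small function. This is a sketch, not the library's code: `OnsetResult` and `evaluateOnset` are hypothetical names, strength is reported as 0 when no onset fires, and the handling of a zero threshold is an assumption added to avoid dividing by zero.

```cpp
#include <algorithm>

// Hypothetical result type for illustration.
struct OnsetResult {
    bool onset;
    float strength;
};

// onset    = (flux > threshold) AND (cooldown == 0)
// strength = min(1, flux / threshold - 1), reported only when an onset fires
OnsetResult evaluateOnset(float flux, float threshold, int cooldownSamples) {
    OnsetResult result{false, 0.0f};
    if (flux > threshold && cooldownSamples == 0) {
        result.onset = true;
        // Assumption: with a zero threshold the ratio is unbounded,
        // so the strength is clamped to its maximum of 1.
        result.strength = (threshold > 0.0f)
            ? std::min(1.0f, flux / threshold - 1.0f)
            : 1.0f;
    }
    return result;
}
```

With a threshold of 2, a flux of 3 gives a strength of 0.5, and any flux of 4 or more saturates at 1.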

9. Cooldown

After an onset, a cooldown timer prevents retriggering:

\[\text{cooldown}_{\text{samples}} = \text{minInterval}_{\text{ms}} \times \frac{f_s}{1000}\]

The onset flag remains set during cooldown and clears when the counter reaches zero.
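The cooldown logic might be sketched like this. `cooldownSamplesFor` and `CooldownTimer` are hypothetical names, and rounding to the nearest sample is an assumption; the text does not say whether the implementation rounds or truncates.

```cpp
#include <cmath>

// cooldown_samples = minInterval_ms * fs / 1000
// Assumption: rounded to the nearest whole sample.
int cooldownSamplesFor(float minIntervalMs, float sampleRate) {
    return static_cast<int>(std::lround(minIntervalMs * sampleRate / 1000.0f));
}

// Hypothetical per-sample countdown. The onset flag stays set while the
// counter is running and clears when it reaches zero.
struct CooldownTimer {
    int remaining = 0;
    bool onsetFlag = false;

    void trigger(int samples) {
        remaining = samples;
        onsetFlag = true;
    }

    // Call once per audio sample.
    void tick() {
        if (remaining > 0) --remaining;
        if (remaining == 0) onsetFlag = false;
    }

    bool expired() const { return remaining == 0; }
};
```

At the default Min Interval of 50 ms and a 44.1 kHz sample rate this gives 2205 samples, or roughly 8.6 hops.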

10. Band-Split Detection (Optional)

When enabled, separate flux is computed for three frequency bands with bin ranges derived from the sample rate:

\[\text{binPerHz} = \frac{N/2}{f_s / 2}\]
\[\text{Low: } [0, 300 \cdot \text{binPerHz}), \quad \text{Mid: } [300 \cdot \text{binPerHz}, 3000 \cdot \text{binPerHz}), \quad \text{High: } [3000 \cdot \text{binPerHz}, 512)\]

Each band uses a reduced threshold:

\[\theta_{\text{band}} = \theta \times 0.3\]

11. Audio Passthrough

The input sample is returned unchanged. OnsetDetector is a pure analysis component with no effect on the audio signal.

Core Equations

\[\Phi = \sum_{k} \max(0, \; |X_n[k]| - |X_{n-1}[k]|)\]
\[\theta = \text{median}(\Phi_{\text{history}}) \times \text{sensitivity}\]
\[\text{onset} = \Phi > \theta\]

Snapshot Fields

| Field | Type | Range | Unit | Description |
|---|---|---|---|---|
| Spectral Flux | Float | 0–10 | | Current spectral flux magnitude |
| Threshold | Float | 0–10 | | Current adaptive threshold value |
| Onset | Bool | 0–1 | | Whether an onset was detected this frame |
| Strength | Float | 0–1 | | Onset strength (how far flux exceeds threshold) |
| Low Onset | Bool | 0–1 | | Onset detected in low band (0-300 Hz) |
| Mid Onset | Bool | 0–1 | | Onset detected in mid band (300-3000 Hz) |
| High Onset | Bool | 0–1 | | Onset detected in high band (3000+ Hz) |
| Flux History | Float[32] | 0–10 | | Ring buffer of recent spectral flux values |

Implementation Notes

  • FFT size is 1024 with a hop of 256 (75% overlap), giving ~172 Hz analysis rate at 44.1 kHz. The 512-bin resolution provides approximately 43 Hz per bin.
  • Half-wave rectification of the spectral difference is critical: it makes the detector respond only to bins whose magnitude increases, not to energy decaying. Without it, note releases and decays would also contribute flux and could register as onsets.
  • Adaptive threshold via median is more robust than a fixed threshold or mean-based threshold, since medians are resistant to outliers from the onsets themselves.
  • Each band threshold is the full-band threshold (median × sensitivity) scaled by 0.3, because per-band flux is inherently lower than total flux. This keeps band-level onset detection proportionally sensitive.
  • Cooldown operates at sample resolution, not hop resolution, providing fine-grained control over minimum onset interval.
  • The Sensitivity parameter uses ParamSmoother (smoothed = true) to prevent abrupt threshold changes during live performance.
  • All parameters use std::atomic<float> for lock-free thread safety.
  • Snapshot emission is decimated to ~60 fps (every 735 samples at 44.1 kHz).

Equation Summary

onset = flux > median*sensitivity