Pitch Tracker
Tier: Analysis | ComponentType: 36 | Params: 3
Monophonic YIN pitch detection with cumulative mean normalized difference, parabolic interpolation, and median filtering. Audio passes through unchanged.
Overview
PitchTracker implements the YIN algorithm for fundamental frequency estimation. It accumulates input samples into a 2048-sample frame buffer and runs analysis every 512 samples (hop size). The algorithm computes the difference function across half the frame, normalizes it with cumulative mean normalization (CMNDF), then searches for the first dip below the detection threshold. Parabolic interpolation refines the lag estimate to sub-sample accuracy, and a 5-element median filter stabilizes the output against octave jumps.
This is an analysis-only component — audio passes through the process() method unchanged. All pitch information is emitted via the snapshot pipeline. The detected frequency is converted to a MIDI note number for display, and a 32-element note history ring buffer provides a scrolling pitch trace.
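A minimal C++ sketch of a 32-slot note history, assuming one MIDI note is pushed per detection. The class name NoteHistory and its methods are hypothetical and are not taken from PitchTracker.h.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 32-slot ring buffer of recent MIDI note detections.
class NoteHistory {
public:
    static constexpr std::size_t kSize = 32;

    // Overwrites the oldest slot with the newest detection.
    void push(float midiNote) {
        notes_[head_] = midiNote;
        head_ = (head_ + 1) % kSize;
    }

    // Returns the i-th most recent entry (0 = newest). Slots never written read as 0.
    float recent(std::size_t i) const {
        return notes_[(head_ + kSize - 1 - (i % kSize)) % kSize];
    }

private:
    std::array<float, kSize> notes_{};
    std::size_t head_ = 0;  // next slot to write
};
```

A display can walk recent(0) through recent(31) to draw the scrolling trace from newest to oldest.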
The Threshold parameter controls detection strictness. A lag is accepted only when its CMNDF value falls below the threshold, so lower values require stronger periodicity (fewer detections but higher confidence), while higher values accept weaker pitch candidates (more detections but more false positives). The frequency range is bounded by Min Freq and Max Freq, which convert to lag search bounds internally.
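Because lag is inversely proportional to frequency, Max Freq sets the lower lag bound and Min Freq the upper one. The sketch below is one plausible conversion, assuming floor/ceil rounding and a clamp to the 1024-lag search range; lagBoundsFor and LagBounds are hypothetical names, and the rounding actually used in PitchTracker.cpp may differ.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>

struct LagBounds {
    std::size_t tauMin;  // shortest lag searched (highest frequency)
    std::size_t tauMax;  // longest lag searched (lowest frequency)
};

// Hypothetical helper: converts a frequency range in Hz to an inclusive lag range.
LagBounds lagBoundsFor(double sampleRate, double minFreq, double maxFreq,
                       std::size_t halfFrame = 1024) {
    const auto tauMin = static_cast<std::size_t>(std::floor(sampleRate / maxFreq));
    const auto tauMax = static_cast<std::size_t>(std::ceil(sampleRate / minFreq));
    // Lag 0 is never a valid period, and lags at or beyond half the frame
    // cannot be evaluated, so clamp both ends into [1, halfFrame - 1].
    return {std::max<std::size_t>(tauMin, 1), std::min(tauMax, halfFrame - 1)};
}
```

With the default 50 to 2000 Hz range at 44.1 kHz this gives lags 22 to 882. A Min Freq of 20 Hz would ask for lag 2205, which the clamp reduces to 1023.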
File Locations
| File | Path |
|---|---|
| Header | Sources/FolioDSP/include/FolioDSP/Analysis/PitchTracker.h |
| Implementation | Sources/FolioDSP/src/Analysis/PitchTracker.cpp |
| Tests | Tests/FolioDSPTests/PitchTrackerTests.swift |
| Bridge | Sources/FolioDSPBridge/src/FolioDSPBridge.mm (PitchTrackerBridge) |
Parameters
| Index | Name | Description | Min | Max | Default Min | Default Max | Default | Unit |
|---|---|---|---|---|---|---|---|---|
| 0 | Threshold | YIN detection threshold on the CMNDF (lower = stricter) | 0.01 | 0.8 | 0.05 | 0.5 | 0.15 | |
| 1 | Min Freq | Minimum detectable frequency | 20.0 | 500.0 | 30.0 | 200.0 | 50.0 | Hz |
| 2 | Max Freq | Maximum detectable frequency | 200.0 | 8000.0 | 500.0 | 5000.0 | 2000.0 | Hz |
Processing Algorithm
The process() function accumulates samples and triggers analysis every hop. The analysis pipeline executes these steps:
1. Frame Accumulation
Samples are written into a 2048-sample circular buffer. Every 512 samples (the hop size), analyzeFrame() runs the steps below on the most recent 2048 samples.
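The accumulation step can be sketched as a fixed-size ring buffer plus a hop counter. FrameAccumulator, push(), and linearize() are hypothetical names; the sketch assumes the analysis wants the frame in chronological order, oldest sample first.

```cpp
#include <array>
#include <cstddef>

// Hypothetical sketch: 2048-sample circular buffer that signals once per 512-sample hop.
class FrameAccumulator {
public:
    static constexpr std::size_t kFrameSize = 2048;
    static constexpr std::size_t kHopSize = 512;

    // Writes one sample; returns true when a full hop has elapsed and analysis should run.
    bool push(float x) {
        buffer_[writePos_] = x;
        writePos_ = (writePos_ + 1) % kFrameSize;
        if (++hopCounter_ == kHopSize) {
            hopCounter_ = 0;
            return true;
        }
        return false;
    }

    // Copies the ring into `out` in chronological order (oldest sample first).
    void linearize(std::array<float, kFrameSize>& out) const {
        for (std::size_t i = 0; i < kFrameSize; ++i) {
            out[i] = buffer_[(writePos_ + i) % kFrameSize];
        }
    }

private:
    std::array<float, kFrameSize> buffer_{};
    std::size_t writePos_ = 0;    // next slot to write, which is also the oldest sample
    std::size_t hopCounter_ = 0;  // samples since the last analysis trigger
};
```

The hop counter is independent of the write position, so the first analysis fires after 512 samples even though the buffer is still mostly zeros at that point.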
2. Difference Function
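The squared difference function measures how similar a signal is to a time-shifted version of itself:
\[ d(\tau) = \sum_{j=0}^{W-1} \left( x_j - x_{j+\tau} \right)^2, \qquad W = 1024 \]
where \(x_j\) is the \(j\)-th sample of the 2048-sample frame and \(W\) is half the frame length.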
At the true period \(\tau_0\), \(d(\tau_0)\) reaches a minimum because the signal aligns with itself.
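A direct O(W²) sketch of the difference function, assuming W is half the frame length as described in the Overview. differenceFunction is a hypothetical name, and PitchTracker.cpp may compute the same quantity differently.

```cpp
#include <cstddef>
#include <vector>

// Returns d(tau) for tau = 0 .. W-1, where W = frame.size() / 2.
std::vector<float> differenceFunction(const std::vector<float>& frame) {
    const std::size_t w = frame.size() / 2;
    std::vector<float> d(w, 0.0f);
    for (std::size_t tau = 0; tau < w; ++tau) {
        float sum = 0.0f;
        // j + tau stays below 2W, so every access is inside the frame.
        for (std::size_t j = 0; j < w; ++j) {
            const float diff = frame[j] - frame[j + tau];
            sum += diff * diff;
        }
        d[tau] = sum;
    }
    return d;
}
```

For a signal with a period of 8 samples, d(0) and d(8) are both exactly zero, while a half-period shift such as d(4) is large.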
3. Cumulative Mean Normalized Difference (CMNDF)
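The raw difference function is normalized to remove the dependence on signal amplitude:
\[ d'(\tau) = \begin{cases} 1, & \tau = 0 \\ \dfrac{d(\tau)}{\frac{1}{\tau} \sum_{j=1}^{\tau} d(j)}, & \tau > 0 \end{cases} \]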
This normalization ensures \(d'(0) = 1\) and prevents the trivial minimum at \(\tau = 0\) from being selected. Values of \(d'(\tau) < 1\) indicate periodicity stronger than average.
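The normalization can be computed in a single pass with a running sum, which keeps it O(W). This is a sketch: cumulativeMeanNormalized is a hypothetical name, and returning 1 for an all-zero (silent) frame is an assumption made here to avoid dividing by zero.

```cpp
#include <cstddef>
#include <vector>

// Converts d(tau) into d'(tau) using the running mean of d(1..tau).
std::vector<float> cumulativeMeanNormalized(const std::vector<float>& d) {
    std::vector<float> out(d.size(), 1.0f);  // d'(0) = 1 by definition
    float runningSum = 0.0f;
    for (std::size_t tau = 1; tau < d.size(); ++tau) {
        runningSum += d[tau];
        // d(tau) / ((1/tau) * runningSum), rearranged to avoid an extra division.
        // Assumed guard: a silent frame (runningSum == 0) is treated as aperiodic.
        out[tau] = runningSum > 0.0f
                       ? d[tau] * static_cast<float>(tau) / runningSum
                       : 1.0f;
    }
    return out;
}
```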
4. Absolute Threshold Search
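The algorithm finds the first lag \(\tau\) in the valid range where the CMNDF falls below the threshold \(\theta\), then continues to the local minimum:
\[ \tau^* = \min \{ \tau \in [\tau_{\min}, \tau_{\max}] : d'(\tau) < \theta \}, \quad \text{then } \tau^* \leftarrow \tau^* + 1 \text{ while } d'(\tau^* + 1) < d'(\tau^*) \]
where \(\tau_{\min} \approx f_s / f_{\max}\) and \(\tau_{\max} \approx f_s / f_{\min}\) come from Max Freq and Min Freq, and \(f_s\) is the sample rate.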
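A sketch of the absolute-threshold search over a precomputed CMNDF array. absoluteThreshold is a hypothetical name; returning -1 is this sketch's way of signalling the "no valid lag" case handled under Confidence.

```cpp
#include <cstddef>
#include <vector>

// Returns the selected lag, or -1 when no lag in [tauMin, tauMax] dips below the threshold.
long absoluteThreshold(const std::vector<float>& dPrime, std::size_t tauMin,
                       std::size_t tauMax, float threshold) {
    if (dPrime.empty()) return -1;
    if (tauMax >= dPrime.size()) tauMax = dPrime.size() - 1;
    for (std::size_t tau = tauMin; tau <= tauMax; ++tau) {
        if (dPrime[tau] < threshold) {
            // First dip found: keep descending while the next value is still smaller.
            while (tau + 1 <= tauMax && dPrime[tau + 1] < dPrime[tau]) {
                ++tau;
            }
            return static_cast<long>(tau);
        }
    }
    return -1;
}
```

Taking the first dip rather than the global minimum is what biases YIN toward the fundamental instead of a subharmonic at a longer lag.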
5. Parabolic Interpolation
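Sub-sample accuracy is achieved by fitting a parabola through three points around the best lag:
\[ \hat{\tau} = \tau^* + \frac{d'(\tau^* - 1) - d'(\tau^* + 1)}{2 \left[ d'(\tau^* - 1) - 2\,d'(\tau^*) + d'(\tau^* + 1) \right]} \]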
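A sketch of three-point parabolic interpolation. It is applied here to the CMNDF values; whether PitchTracker interpolates \(d'\) or the raw \(d\) is an assumption of this sketch, and parabolicInterpolate is a hypothetical name.

```cpp
#include <cstddef>
#include <vector>

// Fits a parabola through (tau-1, a), (tau, b), (tau+1, c) and returns its vertex position.
double parabolicInterpolate(const std::vector<float>& y, std::size_t tau) {
    if (tau == 0 || tau + 1 >= y.size()) {
        return static_cast<double>(tau);  // missing a neighbour: keep the integer lag
    }
    const double a = y[tau - 1];
    const double b = y[tau];
    const double c = y[tau + 1];
    const double denom = a - 2.0 * b + c;
    if (denom == 0.0) {
        return static_cast<double>(tau);  // three collinear points: no unique vertex
    }
    return static_cast<double>(tau) + 0.5 * (a - c) / denom;
}
```

Samples of \((x - 10.25)^2\) at lags 9, 10, and 11 are 1.5625, 0.0625, and 0.5625, and the vertex recovered from them is exactly 10.25.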
6. Frequency Estimation
The fundamental frequency is the sample rate divided by the refined lag:
\[ f_0 = \frac{f_s}{\hat{\tau}} \]
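Evaluating \(f_s / \tau\) at 44.1 kHz for the integer lags on either side of the true period shows why fractional lags matter, especially at high frequencies:
\[
\begin{aligned}
440\ \text{Hz}: &\quad \tau_0 = 44100 / 440 \approx 100.23, \quad 44100 / 100 = 441.0\ \text{Hz}, \quad 44100 / 101 \approx 436.6\ \text{Hz} \\
2000\ \text{Hz}: &\quad \tau_0 = 44100 / 2000 = 22.05, \quad 44100 / 22 \approx 2004.5\ \text{Hz}, \quad 44100 / 23 \approx 1917.4\ \text{Hz}
\end{aligned}
\]
One lag step spans about 4.4 Hz near 440 Hz but about 87 Hz near 2 kHz, so without interpolation the high-frequency estimates would be quantized very coarsely.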
7. Median Filter
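A 5-element median filter smooths the frequency output to reject octave jumps and spurious detections:
\[ \tilde{f}_0[n] = \operatorname{median}\left( f_0[n-4], \dots, f_0[n] \right) \]
where \(n\) indexes analysis hops.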
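A sketch of a 5-tap running median that sorts a scratch copy of the window with insertion sort. MedianFilter5 is a hypothetical name; returning the upper median while the window is still filling is an assumption of this sketch.

```cpp
#include <array>
#include <cstddef>

// Hypothetical 5-tap running median over per-hop frequency estimates.
class MedianFilter5 {
public:
    float process(float x) {
        history_[pos_] = x;
        pos_ = (pos_ + 1) % history_.size();
        if (count_ < history_.size()) ++count_;

        // Insertion sort on a copy of the filled slots; the history itself keeps arrival order.
        std::array<float, 5> sorted{};
        for (std::size_t i = 0; i < count_; ++i) sorted[i] = history_[i];
        for (std::size_t i = 1; i < count_; ++i) {
            const float key = sorted[i];
            std::size_t j = i;
            while (j > 0 && sorted[j - 1] > key) {
                sorted[j] = sorted[j - 1];
                --j;
            }
            sorted[j] = key;
        }
        return sorted[count_ / 2];  // middle element once the window is full
    }

private:
    std::array<float, 5> history_{};
    std::size_t pos_ = 0;
    std::size_t count_ = 0;
};
```

A single 880 Hz outlier in a run of 440 Hz estimates never reaches the output, while a genuine step from 220 Hz to 440 Hz appears on the third consecutive 440 Hz estimate.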
8. MIDI Note Conversion
The detected frequency is converted to a MIDI note number:
\[ m = 69 + 12 \log_2\!\left( \frac{\tilde{f}_0}{440\ \text{Hz}} \right) \]
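A sketch of the standard equal-tempered mapping, where A4 = 440 Hz is MIDI note 69 and each octave adds 12. frequencyToMidi is a hypothetical name; the result is fractional, so a display can show how far the pitch sits from the nearest semitone.

```cpp
#include <cmath>

// Converts a frequency in Hz to a (fractional) MIDI note number, A4 = 440 Hz = 69.
float frequencyToMidi(float hz) {
    return 69.0f + 12.0f * std::log2(hz / 440.0f);
}
```

For example, 880 Hz maps to about 81 and 261.63 Hz (middle C) to about 60.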
9. Confidence
Confidence is derived from the CMNDF value at the best lag. Lower CMNDF values indicate stronger periodicity:
\[ c = 1 - d'(\tau^*) \]
When no valid lag is found, confidence decays exponentially:
\[ c[n] = 0.95\, c[n-1] \]
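A per-hop sketch of the confidence rule. ConfidenceTracker is a hypothetical name, and the clamp to [0, 1] is a defensive assumption, since \(d'\) is non-negative and an accepted lag always has \(d'(\tau^*)\) below the threshold.

```cpp
#include <algorithm>

// Hypothetical confidence state, updated once per analysis hop.
struct ConfidenceTracker {
    float confidence = 0.0f;
    static constexpr float kDecay = 0.95f;

    // found: whether the threshold search returned a valid lag this hop.
    // cmndfAtBestLag: d'(tau*) for that lag (ignored when found is false).
    void update(bool found, float cmndfAtBestLag) {
        if (found) {
            confidence = std::clamp(1.0f - cmndfAtBestLag, 0.0f, 1.0f);
        } else {
            confidence *= kDecay;  // exponential decay while no pitch is detected
        }
    }
};
```

Starting from a confidence of 0.9, twenty hops without a detection leave about 0.9 × 0.95^20 ≈ 0.32.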
10. Audio Passthrough
The input sample is returned unchanged. PitchTracker is a pure analysis component with no effect on the audio signal.
Core Equations
Snapshot Fields
| Field | Type | Range | Unit | Description |
|---|---|---|---|---|
| Frequency | Float | 20–8000 | Hz | Detected fundamental frequency |
| Confidence | Float | 0–1 | | Detection confidence (1 - CMNDF at best lag) |
| MIDI Note | Float | 0–127 | | Frequency converted to MIDI note number |
| Tracking | Bool | 0–1 | | Whether a pitch is currently being tracked |
| Note History | Float[32] | 0–127 | | Ring buffer of recent MIDI note detections |
Implementation Notes
- Frame size is 2048 samples with a hop size of 512, giving an analysis rate of ~86 Hz at 44.1 kHz (44100 / 512 ≈ 86.1 frames per second). Because the YIN search spans half the frame (1024 lags), the lowest detectable frequency is approximately sampleRate / 1024, or about 43 Hz at 44.1 kHz.
- CMNDF is the key innovation of YIN over raw autocorrelation: it eliminates the need for an explicit threshold on the difference function magnitude, instead normalizing so that values below 1.0 indicate periodicity.
- Parabolic interpolation provides fractional-lag precision essential for accurate pitch tracking at high frequencies, where a single-sample error represents a large frequency difference.
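- The median filter uses a fixed 5-element window sorted with insertion sort. This rejects isolated octave errors (e.g., detecting the second harmonic instead of the fundamental) at the cost of a small delay on genuine pitch changes: a new pitch must fill 3 of the 5 slots, so the filtered output follows a step change about two hops (~23 ms at 44.1 kHz) after the raw estimate.
- Confidence decays by a factor of 0.95 per hop, so tracking is declared lost after approximately 20 hops (20 × 512 = 10,240 samples, ~232 ms at 44.1 kHz) without a valid detection.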
- All parameters use std::atomic<float> for lock-free thread safety.
- Snapshot emission is decimated to ~60 fps (every 735 samples at 44.1 kHz).
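The last two notes can be sketched together: parameters stored as std::atomic<float> so a UI thread can write them while the audio thread reads, and a counter that emits one snapshot every round(sampleRate / 60) samples. PitchTrackerParams and SnapshotDecimator are hypothetical names; the defaults mirror the Parameters table.

```cpp
#include <atomic>
#include <cmath>
#include <cstddef>

// Hypothetical parameter block; each field can be read or written without a lock.
struct PitchTrackerParams {
    std::atomic<float> threshold{0.15f};
    std::atomic<float> minFreq{50.0f};
    std::atomic<float> maxFreq{2000.0f};
};

// Hypothetical decimator: tick() once per sample, emit a snapshot when it returns true.
class SnapshotDecimator {
public:
    explicit SnapshotDecimator(double sampleRate, double fps = 60.0)
        : interval_(static_cast<std::size_t>(std::lround(sampleRate / fps))) {}

    bool tick() {
        if (++counter_ >= interval_) {
            counter_ = 0;
            return true;
        }
        return false;
    }

    std::size_t interval() const { return interval_; }

private:
    std::size_t interval_;  // samples between snapshots (735 at 44.1 kHz)
    std::size_t counter_ = 0;
};
```

Relaxed loads are usually enough for independent parameters like these, since each value is read on its own and no ordering with other memory is required.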
Equation Summary
f = YIN(CMNDF); passthrough