DDSP Resynth

DDSP Resynth implements Google's Differentiable Digital Signal Processing pipeline: extract pitch + loudness + harmonic distribution + noise filter from an…

Parameters

[Parameters not yet documented — likely declared in source outside register_animatable(). Add an override in _overrides/ if needed.]

Additional controls

Frame Rate (Hz) — Analysis frame rate for the entire DDSP pipeline, 50–250 Hz (default 100). Pitch, loudness, harmonics, and noise are all sampled at this rate. Higher = finer time resolution (catches faster pitch variations, expressive vibrato) but more compute and analysis frames. 100 Hz is the DDSP standard from Google’s original paper and works well for vocals / sustained instruments. Push to 150–200 Hz for very fast melodic content.
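As a rough illustration of what the frame rate implies, the hop between analysis frames is the sample rate divided by the frame rate. The sketch below assumes a 48 kHz sample rate in its examples and uses hypothetical function names; it is not taken from the K2K source.

```python
def hop_size(sample_rate_hz: int, frame_rate_hz: float) -> int:
    """Samples between consecutive analysis frames (rounded to nearest)."""
    # The 50-250 Hz range is the one documented for the Frame Rate control.
    if not 50 <= frame_rate_hz <= 250:
        raise ValueError("frame rate outside the documented 50-250 Hz range")
    return round(sample_rate_hz / frame_rate_hz)


def frame_count(num_samples: int, sample_rate_hz: int, frame_rate_hz: float) -> int:
    """Number of analysis frames needed to cover a clip (ceiling division)."""
    hop = hop_size(sample_rate_hz, frame_rate_hz)
    return -(-num_samples // hop)
```

At 48 kHz, 100 Hz gives a 480-sample hop and 200 Hz gives 240 samples, so doubling the frame rate doubles the number of frames to analyze.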

Mix — Blend between silence (0) and the full resynthesized signal (1). At 0 the node produces no synth output (the input still passes through the bypass logic if needed); at 1 the synth is at full level. Use intermediate values to tame an over-bright resynthesis or to layer the synth subtly under another source.

Pitch Quality — CREPE neural pitch detection model size:

Draft — tiny model, ~1 MB, fastest. Fine for development / quick previews; misses subtle pitch variations.
Standard — small model, ~5 MB, default. Best speed/accuracy trade-off for most material.
Precise — medium model, ~15 MB. Cleaner pitch tracks, especially on noisy or breathy sources.
Extreme — full model, ~90 MB. Research-grade accuracy. Use only when pitch fidelity is paramount and you can afford the inference time.

Models are auto-downloaded from HuggingFace on first use; subsequent runs reuse the cache.

Voicing Threshold — CREPE confidence threshold for marking frames as “voiced”, 0–1 (default 0.5). Lower = more frames count as having pitch (catches softer / breathier sections); higher = only frames with strong tonal content drive the synth (cleaner output, may drop syllables on whispered material). Tune to your source’s character.
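The gating described above can be sketched as a per-frame comparison. This is a minimal illustration, assuming that unvoiced frames are marked with 0.0 Hz and that the comparison is inclusive; both choices and the function name are assumptions, not confirmed K2K behavior.

```python
def gate_pitch(f0_hz, confidence, threshold=0.5):
    """Return the pitch track with unvoiced frames set to 0.0 Hz.

    A frame counts as voiced when its confidence is at or above the
    threshold (inclusive comparison is an assumption of this sketch).
    """
    if len(f0_hz) != len(confidence):
        raise ValueError("f0 and confidence must have one value per frame")
    return [f if c >= threshold else 0.0 for f, c in zip(f0_hz, confidence)]
```

Lowering the threshold from 0.5 to 0.3 in this sketch lets a frame with confidence 0.4 through, which matches the "catches softer / breathier sections" behavior described above.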

Pitch Fine Tune — Pitch correction in cents, ±100 (default −54). The nonzero default is deliberate: CREPE’s published model has a known pitch bias (~54 cents sharp on average), and the default compensates for it so the output aligns with concert pitch. If your source is referenced to a different tuning, adjust here.
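For reference, a correction in cents maps to a frequency ratio of 2^(cents/1200). The helper below is a hypothetical illustration of that standard formula, not the node's own code.

```python
def apply_cents(freq_hz: float, cents: float) -> float:
    """Shift a frequency by a number of cents (1200 cents = one octave)."""
    return freq_hz * 2.0 ** (cents / 1200.0)
```

With the default of −54 cents, a detected 440 Hz becomes roughly 426.5 Hz, a ratio of about 0.969.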

Loudness Weighting — Frequency weighting for loudness extraction:

A-weighted — perceptual (rolls off bass and very high treble; matches how humans judge loudness). Default; right for most music/voice.
C-weighted — flatter (less bass rolloff). Better when bass content matters perceptually.
Flat — pure RMS, no weighting. Use when you want raw energy rather than perceived loudness.
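To make the difference between the weightings concrete, the sketch below evaluates the standard analog A- and C-weighting curves (the IEC 61672 formulas) at a given frequency. How the node applies the weighting internally is not documented here, so treat this only as a picture of the curve shapes; the function names are hypothetical.

```python
import math


def a_weighting_db(freq_hz: float) -> float:
    """A-weighting gain in dB at a positive frequency (about 0 dB at 1 kHz)."""
    f2 = freq_hz * freq_hz
    ra = (12194.0 ** 2 * f2 * f2) / (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    return 20.0 * math.log10(ra) + 2.00


def c_weighting_db(freq_hz: float) -> float:
    """C-weighting gain in dB at a positive frequency (about 0 dB at 1 kHz)."""
    f2 = freq_hz * freq_hz
    rc = (12194.0 ** 2 * f2) / ((f2 + 20.6 ** 2) * (f2 + 12194.0 ** 2))
    return 20.0 * math.log10(rc) + 0.06
```

At 100 Hz the A curve sits near −19 dB while the C curve is only about −0.3 dB down, which is why C-weighting keeps bass content in the loudness estimate.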

Num Harmonics — Number of harmonic partials in the additive synthesizer, 1–100 (default 60). More harmonics = brighter / more detailed timbre but more compute. 60 is enough for most material; bump to 80–100 for very bright sources (saxophone, trumpet, fuzz guitar) where high partials carry character.
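One practical consequence: partials at or above the Nyquist frequency cannot be synthesized without aliasing, so the number of audible harmonics depends on pitch as well as on this setting. The sketch below assumes the synth drops such partials, as the reference DDSP implementation does; whether K2K does the same is an assumption, and the function name is hypothetical.

```python
import math


def usable_harmonics(f0_hz: float, sample_rate_hz: int, num_harmonics: int = 60) -> int:
    """Count harmonics k (1-based) with k * f0 strictly below Nyquist,
    capped at the Num Harmonics setting."""
    if f0_hz <= 0:
        return 0  # unvoiced frame: no harmonics
    nyquist = sample_rate_hz / 2.0
    below_nyquist = math.ceil(nyquist / f0_hz) - 1
    return max(0, min(num_harmonics, below_nyquist))
```

At 48 kHz, a 440 Hz note has only 54 partials below Nyquist, so raising the setting from 60 to 100 would change nothing for that note under this assumption; the extra partials matter for lower pitches.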

Interp Mode — How harmonic amplitudes are extracted from the input spectrum:

Nearest — fastest, snaps to closest bin. Loses sub-bin precision, may produce stepping on slow pitch glides.
Linear — default; interpolates between adjacent bins. Good balance.
Parabolic — quadratic peak interpolation. Most accurate, especially when harmonics fall between FFT bins. Use for sustained tonal material where timbral fidelity matters.
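The three modes can be sketched as three ways of reading a magnitude spectrum. The functions below are illustrative only: the names are hypothetical, and the parabolic variant is the textbook three-point peak fit, which may differ in detail from what the node does.

```python
def read_nearest(mags, pos):
    """Nearest: take the magnitude of the closest bin to a fractional position."""
    return mags[int(pos + 0.5)]


def read_linear(mags, pos):
    """Linear: interpolate between the two bins surrounding the position."""
    i = int(pos)
    frac = pos - i
    upper = mags[min(i + 1, len(mags) - 1)]
    return (1.0 - frac) * mags[i] + frac * upper


def parabolic_peak(mags, k):
    """Parabolic: fit a parabola through bins k-1, k, k+1 around a local peak.

    Returns (fractional peak position, interpolated peak magnitude).
    """
    if not 1 <= k <= len(mags) - 2:
        raise IndexError("peak bin needs a neighbor on each side")
    a, b, c = mags[k - 1], mags[k], mags[k + 1]
    denom = a - 2.0 * b + c
    if denom == 0:
        return float(k), b  # flat neighborhood: no refinement possible
    p = 0.5 * (a - c) / denom
    return k + p, b - 0.25 * (a - c) * p
```

The parabolic fit recovers a peak that lies between bins, which is where Nearest and Linear lose accuracy. In practice the fit is usually applied to log magnitudes.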

Harm Smoothing — Temporal smoothing applied to harmonic amplitudes between frames, 0–0.9 (default 0.1). 0 = no smoothing (harmonics jump frame-to-frame, can sound buzzy on fast material). Higher = smoother harmonic contours but loses transient detail. Tune to source: vocals/sustained instruments tolerate 0.2–0.4; percussive sources should stay near 0.
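One common way to implement this kind of control is a one-pole (exponential) smoother across frames. The sketch below assumes that form, with the smoothing value used directly as the feedback coefficient; the actual filter in the node may differ, and the function name is hypothetical.

```python
def smooth_frames(frames, smoothing):
    """Exponentially smooth per-harmonic amplitudes across frames.

    frames: list of frames, each a list of harmonic amplitudes.
    smoothing: 0 keeps the input unchanged; values near 0.9 react slowly.
    """
    if not 0.0 <= smoothing <= 0.9:
        raise ValueError("smoothing outside the documented 0-0.9 range")
    out = []
    prev = None
    for frame in frames:
        if prev is None:
            cur = list(frame)  # first frame passes through unchanged
        else:
            cur = [smoothing * p + (1.0 - smoothing) * x for p, x in zip(prev, frame)]
        out.append(cur)
        prev = cur
    return out
```

With smoothing 0.5, a step from 0 to 1 becomes 0, 0.5, 0.75 over successive frames, which shows why high values blur transients.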

Normalize Power — When on, harmonic amplitudes are power-normalized so the additive synth’s energy stays consistent regardless of how many partials are active. When off, harmonic levels are absolute. Default on — keeps perceived loudness stable.
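A minimal sketch of power normalization, assuming it means scaling each frame so the squared amplitudes sum to 1. The exact normalization target used by the node is not documented here, and the function name is hypothetical.

```python
import math


def normalize_power(amps):
    """Scale harmonic amplitudes so that their total power (sum of squares) is 1."""
    power = sum(a * a for a in amps)
    if power == 0:
        return list(amps)  # silent frame: nothing to normalize
    scale = 1.0 / math.sqrt(power)
    return [a * scale for a in amps]
```

Under this assumption, a frame with 60 equal partials and a frame with 10 equal partials end up with the same total power, so overall loudness is then set by the loudness track and level controls rather than by the partial count.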

Num Noise Bands — Filter bands for the noise component, 16–128 (default 65). The noise generator uses a filtered-noise model — more bands = finer spectral shaping of the breath/hiss/buzz portion of the sound. 65 is a sensible default; push to 100+ for material with rich noise components (breathy vocals, bowed strings).

Noise Smoothing — Temporal smoothing for noise filter coefficients, 0–0.9 (default 0.2). Same idea as Harm Smoothing, applied to the noise component instead of the harmonics. Higher = smoother noise color over time.

Floor (dB) — Noise floor below which the noise component is silent, −80 to −20 dB (default −60). Sets how aggressively quiet portions of the source get processed as noise. Higher (closer to 0) = more noise injection on quiet sections; lower = cleaner silences.

Harmonic Level (dB) — Output level for the harmonic (additive sine) component, −24 to +12 dB. Trim per-instrument-type — bright sources may need lower; mellow sources may need a boost.

Noise Level (dB) — Output level for the noise (filtered noise) component, −24 to +12 dB (default −12). Default is conservative — DDSP can sound buzzy if noise is too high. Tune by ear: pull up for breathy vocals / wind instruments, push down for clean tonal sources.

Harmonic Rolloff — Spectral tilt applied to harmonic amplitudes, −12 to +6 dB/octave. Negative = darker (high partials attenuated, like a low-pass shelf on the synth); positive = brighter. Use to taste-shape the timbre after extraction without re-running the analysis.
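A tilt in dB/octave can be turned into a per-harmonic gain, since harmonic k sits log2(k) octaves above the fundamental. The sketch below assumes the tilt is referenced to the fundamental (harmonic 1 is left unchanged); that reference point and the function name are assumptions.

```python
import math


def rolloff_gains(num_harmonics: int, rolloff_db_per_octave: float):
    """Linear gain for harmonics 1..num_harmonics under a dB/octave tilt."""
    return [
        10.0 ** (rolloff_db_per_octave * math.log2(k) / 20.0)
        for k in range(1, num_harmonics + 1)
    ]
```

At −6 dB/octave the gains fall off almost exactly as 1/k (1, 0.50, 0.33, 0.25, ...), which is the harmonic profile of a sawtooth wave.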

Noise Color — Noise spectrum tilt, −1 (pink) to +1 (blue), 0 = white. −1 weights the noise toward low frequencies; +1 toward highs. Subtle but useful for matching the noise band to the original source’s character.
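Pink noise has power proportional to 1/f and blue noise has power proportional to f, so in amplitude terms they correspond to f^(-1/2) and f^(+1/2). The sketch below assumes the control maps linearly onto that exponent and uses an arbitrary 1 kHz reference frequency; both the mapping and the reference are assumptions, and the function name is hypothetical.

```python
def noise_color_gain(freq_hz: float, color: float, ref_hz: float = 1000.0) -> float:
    """Amplitude gain for a noise band centered at freq_hz.

    color = -1 gives a pink tilt, 0 leaves the band unchanged (white),
    +1 gives a blue tilt. Gain is 1.0 at the reference frequency.
    """
    if not -1.0 <= color <= 1.0:
        raise ValueError("color outside the documented -1..+1 range")
    return (freq_hz / ref_hz) ** (color / 2.0)
```

Two octaves above the reference (4 kHz), the pink setting halves the amplitude and the blue setting doubles it, which is about 3 dB per octave in either direction.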

Output Gain (dB) — Master output gain for the synthesized signal, −24 to +12 dB. Final stage trim; sits after harmonic/noise summing.
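The level controls can be read as a simple gain chain: each component gets its own trim, the two are summed, and the master gain is applied to the sum. The sketch below follows that order and also applies Mix as a final scale toward silence; the position of Mix in the chain and the function names are assumptions.

```python
def db_to_gain(db: float) -> float:
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)


def output_sample(harm, noise, harm_db, noise_db, out_db, mix):
    """Combine one harmonic sample and one noise sample into an output sample.

    Order: per-component trims, sum, master output gain, then Mix
    (0 = silence, 1 = full synth). Placing Mix last is an assumption.
    """
    wet = db_to_gain(harm_db) * harm + db_to_gain(noise_db) * noise
    return mix * db_to_gain(out_db) * wet
```

For scale, the default Noise Level of −12 dB corresponds to a linear factor of about 0.25.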

Phase Mode — Phase reconstruction strategy for the synth output:

None — default. Synth uses raw additive phases (bin-coherent within itself but unrelated to the input). Cheapest. Sounds correct on its own; mixing back with the input can cause comb-filtering since phases don’t align.
RTPGHI (Recommended) — Real-Time Phase Gradient Heap Integration. Reconstructs phase from the synthesized magnitude using PGHI’s integration scheme. More natural sustains; slightly more expensive.
Anchored — uses the input as a phase reference. Best alignment with the original; useful when summing dry+wet. Most expensive.

Presets — Six one-click preset buttons that overwrite the synth parameters with sensible starting points: Sawtooth, Square, Breathy, Warm Pad, Bright Lead, (reset). They write into the harmonic/noise levels, rolloff, color, and smoothing — pitch/voicing/frame rate are untouched. Use to jump-start a sound design session, then tweak from there.

About DDSP Resynth

DDSP Resynth implements Google’s Differentiable Digital Signal Processing pipeline: extract pitch + loudness + harmonic distribution + noise filter from an input recording, then resynthesize via additive synthesis (sine bank for harmonics, filtered noise for the rest). The result is a fully synthetic version of the input that can be timbrally manipulated via the synthesis parameters (harmonic level, rolloff, noise color, etc.) — change them and the same melody comes out as a different instrument.
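The resynthesis stage described above can be sketched in a few lines: a bank of sines at integer multiples of the fundamental, plus a noise term. This is a deliberately simplified illustration with a constant pitch and unfiltered white noise (the node uses time-varying pitch and a filtered-noise model); all names are hypothetical.

```python
import math
import random


def synth_block(f0_hz, harmonic_amps, noise_amp, sample_rate_hz, num_samples, seed=0):
    """Render a block of samples: harmonic sine bank plus white noise.

    harmonic_amps[k-1] is the amplitude of harmonic k (frequency k * f0).
    Partials at or above Nyquist are skipped to avoid aliasing.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    nyquist = sample_rate_hz / 2.0
    out = []
    for n in range(num_samples):
        t = n / sample_rate_hz
        sample = 0.0
        for k, amp in enumerate(harmonic_amps, start=1):
            if k * f0_hz < nyquist:
                sample += amp * math.sin(2.0 * math.pi * k * f0_hz * t)
        # Simplification: plain white noise stands in for the filtered noise.
        sample += noise_amp * rng.uniform(-1.0, 1.0)
        out.append(sample)
    return out
```

Changing only harmonic_amps while keeping f0_hz fixed changes the timbre but not the melody, which is the property the synthesis parameters exploit.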

Best on: monophonic harmonic sources — voice, violin, flute, single-line synth leads. Anything with one clear pitch contour.

Worst on: polyphonic music (chords confuse the pitch extractor), inharmonic sounds (bells, metallic percussion — the harmonic model assumes integer multiples of the fundamental), and drums / transient-heavy material (DDSP smears transients).

Build requirement: K2K must be built with -DK2K_USE_ONNX=ON, because CREPE pitch detection runs on ONNX Runtime. In a build without ONNX the node falls back to a stub that only shows “ONNX Runtime not available”. Synthesis itself is native C++, but pitch extraction is the gating dependency.

UI is organized as an exclusive accordion: only one section is open at a time (Global / Pitch Extraction / Loudness Extraction / Harmonic Analysis / Noise Analysis / Synthesis / Presets). This keeps the panel compact despite the parameter count. Compare with Neural Stem Separator for source separation (different problem entirely — separation vs resynthesis), and with K2K’s other extractor nodes for non-neural feature extraction.

Generated 2026-05-05 from K2K_Dev@96730bdc by scripts/gen_lexique.py. Edit _intros/ or _overrides/, not this file.