Neural Stem Separator

Parameters

[Parameters not yet documented — likely declared in source outside register_animatable(). Add an override in _overrides/ if needed.]

Additional controls

Preview Stem — Which of the 4 separated stems to send to the MainViewer for preview/scope display: Drums, Bass, Other, Vocals. Doesn’t affect the actual outputs (all 4 are always produced) — just picks which one shows up in the scope and through audio playback when this node is the focused preview source. Switch between them while the node is selected to audition each stem in isolation.

Mix — Stem isolation amount, 0–100%. 0% = silence on the output ports. 100% = full neural separation. Intermediate values blend each stem with silence rather than with the dry input — at 50%, every stem is at half-amplitude. Mostly useful as a quick mute / level trim per node; for any serious mixing, route the 4 outputs into a Mixer or Stereo Mixer downstream and balance there.
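Blending with silence amounts to a plain linear gain on each stem. A minimal sketch in Python, assuming stems are lists of float samples; apply_mix is a hypothetical helper for illustration, not K2K's actual code:

```python
def apply_mix(stem, mix_percent):
    """Scale one stem by the Mix amount (0-100 %), i.e. blend it with silence."""
    if not 0.0 <= mix_percent <= 100.0:
        raise ValueError("mix_percent must be within 0..100")
    gain = mix_percent / 100.0
    return [sample * gain for sample in stem]


vocals = [0.5, -0.25, 1.0]
half = apply_mix(vocals, 50.0)   # every sample at half amplitude
muted = apply_mix(vocals, 0.0)   # silence on the output
```

Because the same gain is applied to every stem, the relative balance between stems never changes, which is why a downstream Mixer is the better tool for real balancing.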

Normalize Output — When on, each stem is rescaled to prevent inter-stage clipping (per-stem peak normalization). When off, stems retain Demucs’ raw output level — louder stems may need a Gain node downstream to fit your headroom budget. Default on; turn off only if you need bit-exact comparison with the model’s raw amplitudes.
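Per-stem peak normalization can be sketched as follows. The 1.0 full-scale ceiling and the choice to only attenuate (never boost) are assumptions for illustration; the page does not state the target level K2K uses, and peak_normalize is a hypothetical name.

```python
def peak_normalize(stem, ceiling=1.0):
    """Rescale a stem so its absolute peak does not exceed `ceiling`.

    Assumption: stems already at or below the ceiling are left untouched.
    """
    peak = max((abs(s) for s in stem), default=0.0)
    if peak <= ceiling:
        return list(stem)
    scale = ceiling / peak
    return [s * scale for s in stem]


hot_drums = [0.5, -2.0, 1.0]            # peaks at 2.0, would clip downstream
safe_drums = peak_normalize(hot_drums)  # scaled by 0.5 so the peak sits at 1.0
```

Each stem gets its own scale factor, so normalized stems no longer share the model's raw relative levels; that is the trade-off to keep in mind when comparing against raw amplitudes.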

Use Gpu — Toggles GPU execution for the ONNX inference. When on, and a GPU execution provider such as CUDA or DirectML is available, inference runs on the GPU, typically 5–20× faster than on CPU. When off, inference runs on the CPU, which is always available but slow: expect 30 s or more of inference for a 4-minute song. The toggle is a request: if no GPU provider is available, the runtime silently falls back to CPU. The setting takes effect on the next model load.
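The request-then-fallback behavior maps naturally onto ONNX Runtime's ordered execution-provider list. A sketch, assuming K2K does something along these lines; choose_providers is hypothetical, while the provider strings are ONNX Runtime's own identifiers. In a real session the available list would come from onnxruntime.get_available_providers().

```python
GPU_PROVIDERS = ("CUDAExecutionProvider", "DmlExecutionProvider")
CPU_PROVIDER = "CPUExecutionProvider"


def choose_providers(available, use_gpu):
    """Return an ordered provider list; the CPU entry is always last as the fallback."""
    chosen = []
    if use_gpu:
        # Keep only the GPU providers this runtime build actually offers.
        chosen = [p for p in GPU_PROVIDERS if p in available]
    chosen.append(CPU_PROVIDER)
    return chosen


with_gpu = choose_providers(["CUDAExecutionProvider", "CPUExecutionProvider"], use_gpu=True)
no_gpu = choose_providers(["CPUExecutionProvider"], use_gpu=True)   # falls back to CPU only
```

The resulting list can be passed as the providers argument of onnxruntime.InferenceSession, which tries the entries in order.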

About Neural Stem Separator

Neural Stem Separator runs Demucs v4 (htdemucs), currently the strongest open-source source-separation model, to split a mixed audio recording into 4 semantic stems: Drums, Bass, Other (everything that isn’t drums, bass, or vocals: guitars, synths, keys, fx), and Vocals. This is the killer feature for surgical work: frequency-based extractors can isolate spectral regions, but they can’t tell a vocal from a synth pad playing the same notes. Typical use cases: drop in a finished mix, extract the vocal, run it through a completely different processing chain than the music, and sum everything back together; mute the drums and use only Bass + Other + Vocals as a remix bed; or send the Drums stem alone into the slicer for sample chopping with no bleed.
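The recombination step in these workflows is a per-sample sum of the stems. A minimal sketch, assuming each stem is an equal-length list of float samples keyed by name; recombine and its gains argument are hypothetical and only illustrate the routing idea, not K2K's Mixer node:

```python
def recombine(stems, gains=None):
    """Sum equal-length stems sample by sample, with an optional per-stem gain."""
    gains = gains or {}
    length = len(next(iter(stems.values())))
    mix = [0.0] * length
    for name, samples in stems.items():
        if len(samples) != length:
            raise ValueError(f"stem {name!r} has a different length")
        gain = gains.get(name, 1.0)   # stems without an explicit gain pass at unity
        for i, sample in enumerate(samples):
            mix[i] += gain * sample
    return mix


stems = {
    "drums": [0.5, 0.0],
    "bass": [0.25, 0.25],
    "other": [0.0, 0.5],
    "vocals": [0.25, -0.25],
}
full_mix = recombine(stems)
# Remix bed without drums: Bass + Other + Vocals only.
bed = recombine(stems, gains={"drums": 0.0})
```

Processing a single stem differently fits the same pattern: replace that stem's sample list with the processed version before summing.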

Cost / setup: This is a SLOW node (10–30 s per pass for a 2–4 minute track on GPU; CPU is much slower). The model is large (~80 MB) and is downloaded automatically from Hugging Face into the model cache on first use, so the first run blocks until the download finishes. Subsequent runs reuse the cached weights. Audio is internally resampled to 44.1 kHz (Demucs’ native rate), processed in ~7.8 s segments, then concatenated and resampled back to your project rate.
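The segment-wise processing can be sketched as follows. The page does not say whether K2K overlaps or cross-fades segments, so this sketch assumes non-overlapping segments of exactly 7.8 s at 44.1 kHz and leaves resampling out; separate_segment is a hypothetical callback standing in for the model.

```python
MODEL_RATE = 44_100        # Demucs' native sample rate in Hz
SEGMENT_SECONDS = 7.8      # approximate segment length; exact value is an assumption


def segment_bounds(num_samples, rate=MODEL_RATE, seconds=SEGMENT_SECONDS):
    """Return (start, end) sample indices that cover the signal without overlap."""
    step = round(rate * seconds)
    return [(start, min(start + step, num_samples))
            for start in range(0, num_samples, step)]


def process_in_segments(samples, separate_segment):
    """Run `separate_segment` on each chunk and concatenate the results."""
    out = []
    for start, end in segment_bounds(len(samples)):
        out.extend(separate_segment(samples[start:end]))
    return out


# A 20 s signal at 44.1 kHz splits into three segments (7.8 s + 7.8 s + 4.4 s).
bounds = segment_bounds(20 * MODEL_RATE)
```

Under these assumptions a 4-minute track needs 31 model passes, which gives a feel for why the node is slow.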

Build requirement: Like all Neural-category nodes, this node requires K2K to be built with -DK2K_USE_ONNX=ON. In a build without ONNX, the node still appears but only displays an “ONNX Runtime not available” message.

Pair with: any non-spectral effect chain to process stems differently; a Mixer to recombine balanced stems; the Slicing category to chop isolated drum stems without melodic bleed.


Generated 2026-05-05 from K2K_Dev@96730bdc by scripts/gen_lexique.py. Edit _intros/ or _overrides/, not this file.