-
Notifications
You must be signed in to change notification settings - Fork 75
Operations
High-level operations are based, one way or another, on transforms and filters.
Here's the list of available operations:
- Convolution
- Cross-correlation
- Block convolution
- Resampling
- Time-stretching
- Rectification
- Envelope detection
- Spectral subtraction
- Signal reconstruction from spectrogram
- Harmonic-percussive source separation
- Modulation
- Deconvolution
Most operations can be carried out using the corresponding static methods of Operation
class:
// convolution
var conv = Operation.Convolve(signal, kernel);
var xcorr = Operation.CrossCorrelate(signal1, signal2);
// block convolution
var filtered = Operation.BlockConvolve(signal, kernel, 4096, FilteringMethod.OverlapAdd);
// resampling
var resampled = Operation.Resample(signal, 22050);
var interpolated = Operation.Interpolate(signal, 3);
var decimated = Operation.Decimate(signal, 2);
var updown = Operation.ResampleUpDown(signal, 3, 2);
// tsm
var stretch = Operation.TimeStretch(signal, 0.7, TsmAlgorithm.Wsola);
var cool = Operation.TimeStretch(signal, 16, TsmAlgorithm.PaulStretch);
// envelope following
var envelope = Operation.Envelope(signal);
// rectification
var halfRect = Operation.HalfRectify(signal);
var fullRect = Operation.FullRectify(signal);
// spectral subtraction
var clean = Operation.SpectralSubtract(signal, noise);
All of the above-mentioned functions are non-mutating and create new output signal for each input. These functions are OK for one-time call. For repeated operations it's usually better to call methods of special stateful classes responsible for each particular operation. Besides, there are some additional parameters that can be tweaked in methods (they weren't specified in the code above).
Convolver
class provides overloaded methods for carrying out fast convolution and cross-correlation via FFT. It works with its own internal buffers everytime its methods are called. Hence, this class should be used when processing signal sequentially, block after block.
var convolver = new Convolver(512); // FFT size
var kernel = new float[101];
// fill kernel
float[] output = new float[512];
convolver.Convolve(input1, kernel, output);
// do something with output
convolver.Convolve(input2, kernel, output);
// do something with output
convolver.Convolve(input3, kernel, output);
// do something with output
// cross-correlation has side-effect! it reverses second array
convolver.CrossCorrelate(input1, corr1, output);
// do something with output
// corr1 is now reversed
convolver.CrossCorrelate(input1, corr2, output);
// do something with output
// corr2 is now reversed
Remember, theoretical length of convolution/cross-correlation signal is input.Length + kernel.Length - 1
. So the length of output array must be at least the nearest power of 2 to this value.
ComplexConvolver
is similar class that convolves signals of type ComplexDiscreteSignal
.
Two well-known techniques of block convolution are implemented:
- Overlap-Add (
OlaBlockConvolver
class) - Overlap-Save (
OlsBlockConvolver
class)
Both classes implement IFilter
and IOnlineFilter
interfaces. Hence, they can be used as filters in offline and online processing of signals.
var kernel = firFilter.Kernel;
var processor = new OlaBlockConvolver(kernel, 4096);
// equivalent line:
var processor = OlaBlockConvolver.FromFilter(firFilter, 4096);
var filtered = processor.ApplyTo(signal); // like any filter
Online processing:
// Overlap-Add / Overlap-Save
FirFilter filter = new FirFilter(kernel);
var blockConvolver = OlsBlockConvolver.FromFilter(filter, 16384);
// processing loop:
// while new input sample is available
{
var outputSample = blockConvolver.Process(sample);
}
// or:
// while new input buffer is available
{
blockConvolver.Process(input, output);
}
Note that the output will always be "late" by FftSize - KernelSize + 1
samples. The property (Ola|Ols)BlockConvoler.HopSize
returns this value. So you might want to process first HopSize - 1
samples without storing the result anywhere (the samples will just get into delay line). For example, this is how offline method ApplyTo()
is implemented for block convolvers:
var firstCount = Math.Min(HopSize - 1, signal.Length);
int i = 0, j = 0;
for (; i < firstCount; i++) // first HopSize samples are just placed in the delay line
{
Process(signal[i]);
}
var filtered = new float[signal.Length + _kernel.Length - 1];
for (; i < signal.Length; i++, j++) // process
{
filtered[j] = Process(signal[i]);
}
var lastCount = firstCount + _kernel.Length - 1;
for (i = 0; i < lastCount; i++, j++) // get last 'late' samples
{
filtered[j] = Process(0.0f);
}
See also OnlineDemoForm code.
For double-precision block convolution the 64-bit versions of block convolvers are available:
OlsBlockConvolver64
OlaBlockConvolver64
Resampler
class provides methods for simple decimation, interpolation, up-down resampling (for small factors) and band-limited resampling:
// signal is sampled at 16kHz
var resampler = new Resampler();
var signal_22_5 = resampler.Resample(signal, 22050); // band-limited resampling
var signal_8 = resampler.Decimate(signal, 2); // downsample to 8 kHz
var signal_48 = resampler.Interpolate(signal, 3); // upsample to 48 kHz
var signal_24 = resampler.ResampleUpDown(signal, 3, 2); // resample to 24 kHz
For simple decimation/interpolation/resampling the three latter methods will work faster. Bandlimited resampling resampling is universal and will work for any sampling rates.
All methods use anti-aliasing low-pass filtering under the hood. By default, the lowpass filter is designed inside the routines (of order 101), but you can specify your own anti-aliasing filter as the 3rd parameter:
var lpFilter = DesignFilter.FirLp(301, 0.5f / 2);
var resampled = resampler.Decimate(signal, 2, lpFilter);
var fasterFilter = DesignFilter.FirLp(51, 0.5f / 3);
resampled = resampler.Interpolate(signal, 3, fasterFilter);
Operation
class provides methods for rectification. Unlike their DiscreteSignal
analogs, these methods don't affect initial signal:
var fullRect = Operation.FullRectify(signal);
var halfRect = Operation.HalfRectify(signal);
signal.FullRectify(); // in-place
signal.HalfRectify(); // in-place
EnvelopeFollower
class implements IOnlineFilter
interface. It's used, for instance, in AutoWah audio effect.
The constructor has three parameters: 1) sampling rate; 2) attack time; 3) release time.
var envelope = new EnvelopeFollower(samplingRate, 0.01f, 0.05f);
// while new input sample is available
{
var envelopeSample = envelope.Process(sample);
//...
}
envelope.Reset();
In principle, envelope detection could also be achieved with simple low-pass filtering, but EnvelopeFollower
usually gives better results.
Four well-known TSM algorithms are implemented. Each one is reflected in TsmAlgorithm
enum:
- Phase vocoder
- Phase vocoder with identity phase locking
- WSOLA (waveform similarity overlap-add)
- Paul stretch algorithm
In general, phase vocoder with phase locking (PVIPL) produces best results, so it's used by default in time-stretching operations. Wsola is usually good for speech signals. PaulStretch is different: it produces interesting sounds for large stretch factors (10 and more).
Each algorithm is coded in separate class implementing IFilter
interface.
var wsola = new Wsola(0.75, windowSize, hopSize, maxDelta);
wsola = new Wsola(0.75); // parameters will be estimated automatically
var pvipl = new PhaseVocoderPhaseLocking(0.75, hopSize, fftSize);
var pv = new PhaseVocoder(1.25, hopSize, fftSize);
var paulStretch = new PaulStretch(16, hopSize, fftSize);
var output1 = wsola.ApplyTo(signal);
var output2 = pv.ApplyTo(signal);
var output3 = pvipl.ApplyTo(signal);
var output4 = paulStretch.ApplyTo(signal);
Parameters fftSize
and hopSize
can be tweaked. But general recommendation is to set relatively small hop length (corresponding to about 8-15ms), while size of FFT must be at least 6-7 times longer (and power of 2). For example, in case of signals sampled at 16kHz parameters fftSize=1024, hopSize=128 are OK (the computations will take longer time, though. Bigger hop length will lead to faster processing and poorer results).
Here are the parameters for WSOLA (can be viewed as the starting point for tuning):
// parameters are for 22.05 kHz sampling rate, so they will be adjusted for an input signal
if (_stretch > 1.5)
{
_windowSize = 1024; // 46,4 ms
_hopAnalysis = 128; // 5,8 ms
}
else if (_stretch > 1.1)
{
_windowSize = 1536; // 69,7 ms
_hopAnalysis = 256; // 10,6 ms
}
else if (_stretch > 0.6)
{
_windowSize = 1536; // 69,7 ms
_hopAnalysis = 690; // 31,3 ms
}
else
{
_windowSize = 1024; // 46,4 ms
_hopAnalysis = 896; // 40,6 ms
}
SpectralSubtractor
class performs spectral subtraction according to
[1979] M.Berouti, R.Schwartz, J.Makhoul "Enhancement of Speech Corrupted by Acoustic Noise".
The class implements IFilter
and IOnlineFilter
interfaces.
// some noise signal is already measured or prepared
var subtractor = new SpectralSubtractor(noise, fftSize: 1024, hopSize: 300);
var clean = subtractor.ApplyTo(noisySignal);
// online:
// while input sample is available
{
var outputSample = subtractor.Process(inputSample);
//...
}
// noise can be re-estimated:
subtractor.EstimateNoise(newNoise);
clean = subtractor.ApplyTo(noisySignal);
This algorithm allows reconstructing a signal from a given power or magnitude spectrogram. The algorithm is iterative so we can specify number of iterations in Reconstruct()
method (by default, it's 20) or call Iterate()
method any number of times.
var stft = new Stft(512, 200);
// someone has computed spectrogram before like this:
// var spectrogram = stft.Spectrogram(signal);
// and we'd like to reconstruct signal from spectrogram:
var griffinLim = new GriffinLimReconstructor(spectrogram, stft); // for power spectrogram (default)
var griffinLim = new GriffinLimReconstructor(spectrogram, stft, 1); // for magnitude spectrogram
float[] reconstructed = griffinLim.Reconstruct();
var signal = new DiscreteSignal(16000, reconstructed);
// if we want more iterations:
float[] reconstructed = griffinLim.Reconstruct(50);
// step-by-step:
var reconstructed = griffinLim.Iterate(); // listen / visualize result
reconstructed = griffinLim.Iterate(reconstructed); // listen / visualize result
reconstructed = griffinLim.Iterate(reconstructed); // listen / visualize result
// done, the result is OK for us
HarmonicPercussiveSeparator
class performs HPSS according to
[2010] D.Fitzgerald. Harmonic/percussive separation using median filtering // 13th International Conference on Digital Audio Effects (DAFX10), Graz, Austria, 2010.
Conceptually the algorithm is pretty simple:
- compute signal spectrogram
- do median filtering along time axis (harmonic part)
- do median filtering along frequency axis (percussive part)
- post-process two filtered spectrograms using certain mask
- reconstruct signals from updated spectrograms (optional)
var masking = HpsMasking.Wiener2;
var hpss = new HarmonicPercussiveSeparator(2048, 400, 25, 17, masking);
var signals = hpss.EvaluateSignals(signal);
// ...or we can compute only spectrograms
var spectrograms = hpss.EvaluateSpectrograms(signal);
harmonicSignal = signals.Item1;
percussiveSignal = signals.Item2;
harmonicSpectrograms = spectrograms.Item1;
percussiveSpectrograms = spectrograms.Item2;
Constructor parameters are:
- FFT size for STFT
- hop size for STFT
- window size for "harmonic" median filter
- window size for "percussive" median filter
- masking mode (binary, Wiener1, Wiener2):
// we have two spectrograms;
// each sample in filtered spectrogram is post-processed based on itself (h) and
// the corresponding sample (p) of alternative spectrogram:
float BinaryMask(float h, float p) => h > p ? 1 : 0;
float WienerMask1(float h, float p) => h + p > 1e-10 ? h / (h + p) : 0;
float WienerMask2(float h, float p) => h + p > 1e-10 ? h * h / (h * h + p * p) : 0;
- Amplitude modulation / demodulation
- Frequency modulation / demodulation
- Linear frequency modulation
- Sinusoidal frequency modulation
- Phase modulation
var modulator = new Modulator();
var ring = modulator.Ring(carrier, modulatorSignal);
var modAmp = modulator.Amplitude(carrier, 20/*Hz*/, 0.5f);
var demodAmp = modulator.DemodulateAmplitude(modAmp);
var modFreq = modulator.Frequency(baseband, carrierAmplitude: 1, carrierFrequency: 16000/*Hz*/, deviation: 0.1f);
var demodFreq = modulator.DemodulateFrequency(modFreq);
var modPhase = modulator.Phase(baseband, 1, 16000/*Hz*/, deviation: 0.1f);
The deconvolution operation has well-known numerical problems when it's done via FFT. It will work OK if the corresponding polynomials can be divided.
var signal = new ComplexDiscreteSignal(1, new[] { 1, 0, 0, -27 });
var kernel = new ComplexDiscreteSignal(1, new[] { 1, -3 });
var c = new ComplexConvolver();
var deconvolved = c.Deconvolve(signal, kernel);
// deconvolved: { 1, 3, 9 }
// x^3 - 27 = (x - 3)(x^2 + 3x + 9)