-
Notifications
You must be signed in to change notification settings - Fork 75
Notes for non~experts in DSP
This chapter is based on the most common questions asked by NWaves users.
There are some not obvious usage and performance tips, so feel free to ask questions!
1) How to setup my filter and get it working?
2) I want to get MFCCs like in Kaldi/HTK
3) I want to get MFCCs like in librosa
4) MfccExtractor produces strange results
5) Tips for efficient online data processing
First of all, you need to define your filter parameters. Regardless of the filter type, at least one cutoff frequency must be specified. Since DSP deals with discrete-time signals and systems, you need to pass normalized frequency (the number within [0, 0.5] range). If you have frequency value in Hz then you need to divide it by the sampling frequency of your signal:
int samplingRate = signal.SamplingRate;
double freq = 1000; // 1000 Hz, for example
double f = freq / samplingRate; // normalize frequency onto [0, 0.5] range
int order = 5;
var filter = new Butterworth.HighPassFilter(order, f);
// another example - FIR lowpass filter:
order = 25;
var lpf = DesignFilter.FirWinLp(order, f);
Now the filter is ready to be applied to input data.
Offline processing means that filter processes entire signal / array and produces new signal as output.
Online processing means that filter processes data sample after sample (or chunk after chunk).
Filters have state, so call filter.Reset()
if you need to start filtering new data "from scratch".
// preferred way for offline filtering:
var filtered = filter.ApplyTo(signal);
// online filtering:
foreach (var sample in onlineSamples)
filteredSample = filter.Process(sample)
// or new chunk is available:
//
void ProcessData(float[] data)
{
filter.Process(data, output);
//filter.Process(data, data); // or in-place
}
Read more about online filtering
With IIR filters it's often preferable to work with their SOS decompositions.
Read more here:
https://colab.research.google.com/drive/1A_y7sTt3qJoQMyhSv-tOT_pv6-pk4a8d?usp=sharing
In most cases single precision is sufficient for filtering operations, so NWaves classes/functions operate on data with single precision. However, alternative classes for filtering with double precision are also available (we're talking here about filtering operation itself; filter design and analysis is always done with double precision). They can be found in Filters.Base64
namespace (+ OlsBlockConvolver64
and OlaBlockConvolver64
):
var tf = new Filters.Butterworth.BandPassFilter(4f/250, 8f/ 250, 5).Tf;
// it's ok: TransferFunction has always been with double precision, so use it here:
var filter = new Filters.Base64.IirFilter64(tf);
// offline filtering:
// now the filter carries out all its operations with double precision:
var filtered = signal.Samples.Select(s => filter.Process(s));
// filter.Process() accepts one sample of type 'double' and returns 'double'
// so we have Enumerable<double> and we can LINQ-materialize it whenever we want
// online filtering:
// while (double_sample)
// filteredSample = filter.Process(double_sample)
// we can downcast filteredSample to float if we need
Use MfccExtractor
class with MfccHtkOptions
:
var samplingRate = 16000;//Hz
var featureCount = 13;
var lowFreq = 200;//Hz
var highFreq = 6800;//Hz
var frameDuration = 0.025;//seconds
var opts = new MfccHtkOptions(samplingRate, featureCount, frameDuration, lowFreq, highFreq)
{
HopDuration = 0.010,//seconds
};
var extractor = new MfccExtractor(opts);
var mfccs = extractor.ParallelComputeFrom(signal);
Keep in mind that HTK does the following pre-processing: zero-mean and pre-emphasis. Turn these settings off in HTK config if possible, because HTK does this pre-processing per frame instead of entire signal (which is weird given that frames overlap).
Also, HTK DOESN't normalize signal! If it's normalized then multiply by 32768 before processing:
signal.Amplify(32768);
var mfccs = extractor.ComputeFrom(signal);
There are even more MFCC options - read here
There are couple of important nuances in librosa:
- htk = true or false
This parameter essentially defines the weights of mel-filterbank (HTK-style or Slaney-style).
- centering
In NWaves, like in many other frameworks, frames are not centered the way they are centered by default in librosa (in fact, I don't quite understand this 'centering' thing in librosa), so this parameter must be set to False.
Let's consider an example. Say we have the following setup in librosa:
mfccs = librosa.feature.mfcc(y, sr, n_mfcc=13,
dct_type=2, norm='ortho', window='hamming',
htk=False, n_mels=40, fmin=100, fmax=8000,
n_fft=1024, hop_length=int(0.010*sr), center=False)
In NWaves it is equivalent to:
int sr = 22050; // sampling rate
int fftSize = 1024;
double lowFreq = 100; // if not specified, will be 0
double highFreq = 8000; // if not specified, will be samplingRate / 2
int filterbankSize = 40; // or 24 for htk=true (usually)
// if 'htk=False' in librosa:
var melBank = FilterBanks.MelBankSlaney(filterbankSize, fftSize, sr, lowFreq, highFreq);
// if 'htk' parameter in librosa will be set to True, replace the previous line with these lines:
// var melBands = FilterBanks.MelBands(filterbankSize, sr, lowFreq, highFreq);
// var melBankHtk = FilterBanks.Triangular(fftSize, sr, melBands, null, Scale.HerzToMel);
var opts = new MfccOptions
{
SamplingRate = sr,
FrameDuration = (double)fftSize / sr,
HopDuration = 0.010,
FeatureCount = 13,
Filterbank = melBank, // or melBankHtk if htk=True
NonLinearity = NonLinearityType.ToDecibel, // mandatory
Window = WindowTypes.Hamming, // in librosa 'hann' is by default
LogFloor = 1e-10f, // mandatory
DctType="2N",
LifterSize = 0
};
var extractor = new MfccExtractor(opts);
var mfccs = extractor.ParallelComputeFrom(signal);
And there are even more options - read here
If you're getting a row of NANs at random frames samples, check that MfccExtractor
instance is not shared by several threads (it's not thread-safe). If you need parallelization and better performance, call extractor.ParallelComputeFrom()
method. If there's a crucial need for your own parallelization scheme, create a separate extractor for each task.
If filter order exceeds approx. 60-65, don't call filter.Process(sample)
. Switch to block convolvers instead:
var kernel = DesignFilter.FirWinLp(431, 0.12);
var filter = new FirFilter(kernel);
var blockConvolver = OlsBlockConvolver.FromFilter(filter, 2048);
// usually the second parameter is approx. 4N, where N is filter order
// or without creating filter object:
var blockConvolver = OlsBlockConvolver(kernel, 2048);
// processing loop:
// while new input sample is available
{
var outputSample = blockConvolver.Process(sample);
}
// or:
// while new input buffer is available
{
blockConvolver.Process(input, output);
// blockConvolver.Process(input, input); // or in-place
}
If you do IIR filtering and you face numerical problems with IirFilter/ZiFilter
then switch to second-order sections (SOS):
- obtain SOS via
DesignFilter.TfToSos()
- construct
FilterChain
and use it as a filter
var tf = new Butterworth.BandPassFilter(0.1, 0.16, 7).Tf;
TransferFunction[] sos = DesignFilter.TfToSos(tf);
var sosFilter = new FilterChain(sos);
var y = sosFilter.ApplyTo(x);
// or process samples online:
// ... outSample = sosFilter.Process(sample);
If the filter is still amplifying signal too much, try re-estimating gain:
// ========== setup filter ===========
var order = 10;
var filter = new Filters.Butterworth.BandPassFilter(4f / 250, 8f / 250, order);
var sos = DesignFilter.TfToSos(filter.Tf);
var filters = new FilterChain(sos);
// =========== filtering ==============
var gain = filters.EstimateGain();
var filteredSignal = filters.ApplyTo(signal, gain);
// or in a loop
//while (<inputSample is available>)
//{
// var outputSample = filters.Process(inputSample, gain);
// ...
//}