Feature extractors
(...under construction...)
- Feature extractors
- Post-processing
- Serialization
- Time-domain feature extractor
- Spectral feature extractor
- Pitch
- MPEG-7 feature extractor
- MFCC/PNCC
- LPC/LPCC
- AMS
- Creating your own feature extractor
All feature extractors inherit from the abstract class FeatureExtractor. It's relatively complex, so first we'll consider only the most important part. The rest is discussed in the section Creating your own feature extractor.
Each feature extractor has a ComputeFrom() method that takes a float[] array or a DiscreteSignal as input (and, optionally, the starting and ending positions for analysis). Extractors can and should be reused: create the extractor object once and call ComputeFrom() every time a new portion of data arrives and new features need to be calculated.
All feature extractors decompose a signal or an array of samples into a sequence of frames with a certain hop. So the constructor of any feature extractor takes at least (1) the sampling rate, (2) the number of features, (3) the frame duration and (4) the hop duration (both durations in seconds).
Properties:
- abstract int FeatureCount
- abstract List<string> FeatureDescriptions
- virtual List<string> DeltaFeatureDescriptions
- virtual List<string> DeltaDeltaFeatureDescriptions
- double FrameDuration (in seconds)
- double HopDuration (in seconds)
- int FrameSize (in samples)
- int HopSize (in samples)
- int SamplingRate
Methods:
- abstract List<FeatureVector> ComputeFrom()
- virtual List<FeatureVector> ParallelComputeFrom()
- virtual bool IsParallelizable()
Example:
var mfccExtractor = new MfccExtractor(signal.SamplingRate, 13, 0.025, 0.01);
var mfccVectors = mfccExtractor.ComputeFrom(signal);
// let's say we have some samples from external source
// process them with same extractor
// (for demo, let's also ignore first 100 samples):
var newVectors = mfccExtractor.ComputeFrom(samples, 100, samples.Length);
// (supposing that signal sampling rate is 16000 Hz)
// properties:
var count = mfccExtractor.FeatureCount; // 13
var names = mfccExtractor.FeatureDescriptions; // { "mfcc0", "mfcc1", ..., "mfcc12" }
var frameDuration = mfccExtractor.FrameDuration; // 0.025
var hopDuration = mfccExtractor.HopDuration; // 0.010
var frameSize = mfccExtractor.FrameSize; // 400 = 0.025 * 16000
var hopSize = mfccExtractor.HopSize; // 160 = 0.010 * 16000
Feature extractors return a collection of feature vectors (List<FeatureVector>). Each feature vector contains a time position (in seconds) and an array of features:
double TimePosition
float[] Features
Also, there is a method featureVector.Statistics() that returns a Dictionary of statistics keyed by name:
var stats = featureVector.Statistics();
var min = stats["min"];
var max = stats["max"];
var avg = stats["mean"];
Feature extractors can be parallelizable, i.e. they can create internal clones of themselves that simultaneously process different parts of the signal. In this case the IsParallelizable() method returns true. All extractors available in NWaves are parallelizable, except PnccExtractor. For example:
var lpcExtractor = new LpcExtractor(signal.SamplingRate, 16, 0.032, 0.015);
var lpcVectors = lpcExtractor.ParallelComputeFrom(signal);
Note. The ParallelComputeFrom() method uses all available CPU cores, and this behavior can't be changed. That's just because in version 0.9.0 I simply forgot to add a threadCount parameter to the method ((. In later versions this will be fixed, and it will be possible to tweak the number of cores used for computations. For now there's a workaround: the public helper method ParallelChunksComputeFrom(signal, threadCount), which does allow changing the number of cores. However, it's slightly inconvenient, since it returns not a single list of feature vectors but an array of List<FeatureVector>, one list per thread.
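If a single list is needed, the per-thread lists can be concatenated. The sketch below is only an illustration: the helper name ChunkHelper.Flatten is hypothetical (not part of NWaves), and it assumes that the chunks come back in signal order, which should be verified against the library version in use.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of NWaves.
public static class ChunkHelper
{
    // Concatenates the per-thread lists into one list, preserving chunk order.
    // Assumption: chunk i covers an earlier part of the signal than chunk i + 1.
    public static List<T> Flatten<T>(IEnumerable<List<T>> chunks) =>
        chunks.SelectMany(chunk => chunk).ToList();
}

// Intended usage (requires NWaves; shown as a comment to keep the sketch self-contained):
// var chunks = lpcExtractor.ParallelChunksComputeFrom(signal, 4);
// var lpcVectors = ChunkHelper.Flatten(chunks);
```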
Post-processing
- Mean subtraction
- Variance normalization
- Adding deltas and delta-deltas to existing feature vector
- Joining (merging feature vectors into one longer vector)
- Extracting the featuregram
FeaturePostProcessing.NormalizeMean(mfccVectors);
FeaturePostProcessing.NormalizeVariance(mfccVectors, bias: 0);
FeaturePostProcessing.AddDeltas(mfccs, includeDeltaDelta: false);
var totalVectors = FeaturePostProcessing.Join(mfccs, lpcs, lpccs);
IEnumerable<float[]> featureGram = mfccs.Featuregram();
The bias parameter of the NormalizeVariance() method is equal to 1 by default (so the estimate of variance is unbiased). It is the value subtracted from the number of vectors in the denominator of the variance estimate.
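Written out for a single feature over N vectors, the estimate presumably takes the standard form below. This is a reconstruction from the description of the bias parameter; the symbols N, x_i, mu and b (for bias) are our notation, not the library's.

```latex
\sigma^2 = \frac{1}{N - b} \sum_{i=1}^{N} (x_i - \mu)^2,
\qquad
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i
```

With b = 1 this is the unbiased sample variance; with b = 0 it is the population variance.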
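The AddDeltas() method extends each feature vector in the list with its deltas. Deltas are computed with a regression formula over N neighboring vectors on each side. In NWaves N is fixed at 2, so each delta depends on the two previous and the two next vectors.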
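Assuming the standard delta regression used in speech processing (the notation c_t for the feature vector at frame t and d_t for its delta is ours), the general formula and its N = 2 reduction read:

```latex
d_t = \frac{\sum_{n=1}^{N} n \, (c_{t+n} - c_{t-n})}{2 \sum_{n=1}^{N} n^2}
\qquad \xrightarrow{\; N = 2 \;} \qquad
d_t = \frac{(c_{t+1} - c_{t-1}) + 2 \, (c_{t+2} - c_{t-2})}{10}
```

The denominator 10 comes from 2 (1^2 + 2^2).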
Consequently, edge cases (the beginning and the end of the list) need special handling. By default, AddDeltas() adds two zero vectors at the beginning and two zero vectors at the end of the list (and that's perfectly fine in most cases). You can also prepend and append your own collections of feature vectors to specify the 'previous' and 'next' sets (there must be at least two vectors in each set; the method will unite all lists and calculate deltas in the united list, starting from the third vector and ending with the third from the end):
FeatureVector[] previous = new FeatureVector[2]; // for two previous vectors
FeatureVector[] next = new FeatureVector[2]; // for two vectors after the last one
// fill 'previous' and 'next' with values
FeaturePostProcessing.AddDeltas(mfccs, previous, next);
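To make the edge handling concrete, here is a minimal stand-alone sketch of a delta computation with N = 2 and zero margins. It is not the NWaves implementation: the class DeltaSketch is hypothetical, it works on plain float[] vectors, and it assumes the standard regression weights (neighbors at distance 1 weighted by 1, neighbors at distance 2 weighted by 2, the sum divided by 10).

```csharp
using System.Collections.Generic;

// Hypothetical illustration, not part of NWaves.
public static class DeltaSketch
{
    // Returns one delta vector per input vector (N = 2).
    // Missing neighbors before the first and after the last vector
    // are treated as zero vectors (the default margins described above).
    public static List<float[]> Deltas(IReadOnlyList<float[]> features)
    {
        var result = new List<float[]>(features.Count);
        if (features.Count == 0) return result;

        var dim = features[0].Length;
        var zero = new float[dim];

        // Out-of-range frame indices map to the zero vector.
        float[] At(int t) => (t < 0 || t >= features.Count) ? zero : features[t];

        for (var t = 0; t < features.Count; t++)
        {
            var delta = new float[dim];
            for (var j = 0; j < dim; j++)
            {
                delta[j] = (At(t + 1)[j] - At(t - 1)[j]
                            + 2 * (At(t + 2)[j] - At(t - 2)[j])) / 10;
            }
            result.Add(delta);
        }
        return result;
    }
}
```

Passing your own 'previous' and 'next' sets corresponds to replacing the zero vectors in At() with real neighbors.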
Featuregram is an analog of stft.Spectrogram. It simply skips the time positions and yields a 2D representation of feature values (float[]) from a List<FeatureVector>. Note that it returns IEnumerable.
Serialization
var mfccs = new MfccExtractor(signal.SamplingRate, 13).ComputeFrom(signal);
using (var csvFile = new FileStream("mfccs.csv", FileMode.Create))
{
var serializer = new CsvFeatureSerializer(mfccs);
await serializer.SerializeAsync(csvFile);
}
Time-domain feature extractor
The TimeDomainFeaturesExtractor class is the first representative of the multi-extractor family:
- TimeDomainFeaturesExtractor
- SpectralFeaturesExtractor
- Mpeg7SpectralFeaturesExtractor
These extractors compute several features at once using different formulae/routines. They accept a string containing the list of feature names separated by any of the separators ',', '+', '-', ';', ':'. If the string "all" or "full" is specified, the multi-extractor will compute all pre-defined features (returned by the FeatureSet public property).
TimeDomainFeaturesExtractor, as the name suggests, computes time-domain features such as energy, RMS, ZCR and entropy (all these methods are contained in the DiscreteSignal class):
var tdExtractor = new TimeDomainFeaturesExtractor(signal.SamplingRate, "all", 0.032, 0.02);
var tdVectors = tdExtractor.ComputeFrom(signal);
// compute only RMS and ZCR at each step:
tdExtractor = new TimeDomainFeaturesExtractor(signal.SamplingRate, "rms, zcr", 0.032, 0.02);
tdVectors = tdExtractor.ComputeFrom(signal);
// let's examine what is available:
var names = tdExtractor.FeatureSet; // { "energy, rms, zcr, entropy" }
Recognized keywords are:
- Energy: "e", "en", "energy"
- RMS: "rms"
- ZCR: "zcr", "zero-crossing-rate"
- Entropy: "entropy"
Keywords are case-insensitive.
You can also add your own feature together with the function that calculates it. This function must accept three parameters: 1) signal, 2) start position, 3) end position (exclusive). Code example:
var tdExtractor = new TimeDomainFeaturesExtractor(sr, "RMS", frameDuration, hopDuration);
// let's add two features:
// 1) "positives": fraction of samples with positive values
// 2) "avgStartEnd": just the average of start and end sample
tdExtractor.AddFeature("positives", CountPositives);
tdExtractor.AddFeature("avgStartEnd", (s, start, end) => { return (s[start] + s[end - 1]) / 2; } );
var count = tdExtractor.FeatureCount; // 3
var names = tdExtractor.FeatureDescriptions; // { "rms", "positives", "avgStartEnd" }
// ...
float CountPositives(DiscreteSignal signal, int start, int end)
{
var count = 0;
for (var i = start; i < end; i++)
{
if (signal[i] > 0) count++;
}
return (float)count / (end - start);
}
Spectral feature extractor
SpectralFeaturesExtractor computes spectral features such as centroid, spread, flatness, etc. All these methods are taken from the static class Spectral and can be calculated separately for one particular spectrum, without creating any extractor:
using System.Linq;
using NWaves.Features;
using NWaves.Transforms;
// prepare array of frequencies
// (simply spectral frequencies: 0, resolution, 2*resolution, ...)
var resolution = (float)samplingRate / fftSize;
var frequencies = Enumerable.Range(0, fftSize / 2 + 1)
.Select(f => f * resolution)
.ToArray();
var spectrum = new Fft(fftSize).MagnitudeSpectrum(signal);
// compute various spectral features
// (spectrum has length fftSize/2+1)
var centroid = Spectral.Centroid(spectrum, frequencies);
var spread = Spectral.Spread(spectrum, frequencies);
var flatness = Spectral.Flatness(spectrum, minLevel);
var noiseness = Spectral.Noiseness(spectrum, frequencies, noiseFreq);
var rolloff = Spectral.Rolloff(spectrum, frequencies, rolloffPercent);
var crest = Spectral.Crest(spectrum);
var decrease = Spectral.Decrease(spectrum);
var entropy = Spectral.Entropy(spectrum);
var contrast1 = Spectral.Contrast(spectrum, frequencies, 1);
//...
var contrast6 = Spectral.Contrast(spectrum, frequencies, 6);
Usually, the spectral frequencies are involved in calculations, but you can specify any frequency array you want:
var freqs = new float[] { 200, 300, 500, 800, 1200, 1600, 2500, 5000/*Hz*/ };
var centroid = Spectral.Centroid(spectrum, freqs);
var spread = Spectral.Spread(spectrum, freqs);
SpectralFeaturesExtractor usage example:
var extractor = new SpectralFeaturesExtractor(signal.SamplingRate, "all", 0.032, 0.02);
var vectors = extractor.ComputeFrom(signal);
// let's examine what is available:
var names = extractor.FeatureSet;
// { "centroid, spread, flatness, noiseness, rolloff, crest, entropy, decrease, c1+c2+c3+c4+c5+c6" }
Recognized keywords are:
- Spectral Centroid: "sc", "centroid"
- Spectral Spread: "ss", "spread"
- Spectral Flatness: "sfm", "flatness"
- Spectral Noiseness: "sn", "noiseness"
- Spectral Rolloff: "rolloff"
- Spectral Crest: "crest"
- Spectral Entropy: "ent", "entropy"
- Spectral Decrease: "sd", "decrease"
- Spectral Contrast in band 1,2,...: "c1", "c2", ...
Keywords are case-insensitive.
You can also add your own spectral feature together with the function that calculates it. This function must accept two parameters: 1) spectrum array, 2) array of frequencies. Code example:
var extractor = new SpectralFeaturesExtractor(sr, "sc", frameDuration, hopDuration);
// let's add new feature: relative position of the first peak
extractor.AddFeature("peakPos", FirstPeakPosition);
// ...
// in our case 'frequencies' array will be ignored
float FirstPeakPosition(float[] spectrum, float[] frequencies)
{
for (var i = 2; i < spectrum.Length - 2; i++)
{
if (spectrum[i] > spectrum[i - 2] && spectrum[i] > spectrum[i - 1] &&
spectrum[i] > spectrum[i + 2] && spectrum[i] > spectrum[i + 1])
{
return (float) i / spectrum.Length;
}
}
return 0;
}
Full list of constructor parameters:
- int samplingRate
- string featureList
- double frameDuration (0.0256 seconds by default)
- double hopDuration (0.010 seconds by default)
- int fftSize (by default 0, i.e. it will be calculated automatically as the closest power of 2 to FrameSize)
- float[] frequencies (by default null, i.e. spectral frequencies will be derived automatically)
- WindowTypes window (by default, WindowTypes.Hamming)
- IReadOnlyDictionary<string, object> parameters (by default, null)
The dictionary of parameters may contain the following keys:
- "noiseFrequency" (used for computing Spectral.Noiseness; by default 3000)
- "rolloffPercent" (used for computing Spectral.Rolloff; by default 0.85f)
Note. Spectral noiseness is an unconventional parameter. It is calculated as the ratio of the spectral energy in the high-frequency region to the total spectral energy.
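In symbols, with E_k the spectral energy in bin k, f_k the frequency of that bin and f_noise the "noiseFrequency" parameter, the ratio can be written as below. The notation is ours, and whether the library uses magnitude or power values, or how it treats the DC bin, is not stated here.

```latex
\text{noiseness} = \frac{\sum_{k \,:\, f_k \ge f_{\text{noise}}} E_k}{\sum_{k} E_k}
```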
Pitch
There are several pitch detection (estimation) techniques. They broadly fall into two groups:
- Time-domain techniques (auto-correlation, YIN, ZCR-based)
- Frequency-domain techniques (Harmonic product/sum spectrum, cepstrum)
The static class Pitch provides the following methods for pitch evaluation (estimation):
- FromAutoCorrelation
- FromYin
- FromZeroCrossingsSchmitt
- FromSpectralPeaks
- FromHps
- FromHss
- FromCepstrum
All of these methods have overloaded versions that accept either a DiscreteSignal or a float[] as the first parameter. For time-domain estimators the float[] array holds the signal samples; for frequency-domain estimators it holds the spectrum.
All methods except FromZeroCrossingsSchmitt accept the lower and upper frequencies of the range in which to search for pitch (by default 80 and 400 Hz, respectively):
var pitch = Pitch.FromAutoCorrelation(signal, start, end, 100, 500);
The method based on the number of zero crossings and a Schmitt trigger is good for estimating pitch in periodic sounds (e.g. a guitar string):
var pitch = Pitch.FromZeroCrossingsSchmitt(signal, start, end, -0.2f, 0.2f);
The last two parameters are optional thresholds for the Schmitt trigger. You can try tweaking them or simply omit them; in that case the thresholds will be estimated from the signal.
The YIN algorithm is implemented as described in:
De Cheveigné, A., Kawahara, H. YIN, a fundamental frequency estimator for speech and music. The Journal of the Acoustical Society of America, 111(4). - 2002.
var pitch = Pitch.FromYin(signal, start, end, 80, 400, 0.2f);
The last parameter is the YIN-specific "threshold for cumulative mean-difference function". By default it's 0.25f.
The first frequency-domain method is FromSpectralPeaks. It's a very simple technique that works surprisingly well. It simply finds the position of the first peak in the spectrum (a value is a peak if it's greater than its two left and two right neighbors):
var pitch = Pitch.FromSpectralPeaks(spectrum, sr, 80, 500 /*Hz*/);
pitch = Pitch.FromSpectralPeaks(signal, start, end, 80, 500 /*Hz*/);
The first overload accepts a spectrum array. The second accepts the signal and (optionally) the FFT size (since it will compute the spectrum). If you're not sure, ignore the fftSize parameter: the method will derive it automatically.
The same goes for two similar methods, Harmonic Sum Spectrum (HSS) and Harmonic Product Spectrum (HPS):
var pitch = Pitch.FromHss(spectrum, sr, 80, 500 /*Hz*/);
pitch = Pitch.FromHss(signal, start, end, 80, 500 /*Hz*/);
pitch = Pitch.FromHps(spectrum, sr, 80, 500 /*Hz*/);
pitch = Pitch.FromHps(signal, start, end, 80, 500 /*Hz*/);
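The peak rule described for FromSpectralPeaks (a value greater than its two left and two right neighbors) can be sketched in a few lines. This is an illustration under stated assumptions, not the library code: the class PeakPitchSketch is hypothetical, the spectrum is assumed to have fftSize / 2 + 1 bins, and 0 is returned when no peak lies inside the search range.

```csharp
using System;

// Hypothetical illustration, not part of NWaves.
public static class PeakPitchSketch
{
    // Returns the frequency (Hz) of the first spectral peak within [low, high],
    // or 0 if there is none.
    public static float FirstPeakFrequency(float[] spectrum, int samplingRate, float low, float high)
    {
        // Assumption: spectrum.Length == fftSize / 2 + 1
        var fftSize = (spectrum.Length - 1) * 2;
        var resolution = (float)samplingRate / fftSize;

        // Keep two bins of margin on both sides for the neighbor comparison.
        var start = Math.Max(2, (int)(low / resolution));
        var end = Math.Min(spectrum.Length - 3, (int)(high / resolution));

        for (var i = start; i <= end; i++)
        {
            if (spectrum[i] > spectrum[i - 2] && spectrum[i] > spectrum[i - 1] &&
                spectrum[i] > spectrum[i + 1] && spectrum[i] > spectrum[i + 2])
            {
                return i * resolution;
            }
        }
        return 0;
    }
}
```

HSS and HPS follow the same interface but score candidate frequencies by summing or multiplying the spectrum at their harmonics instead of taking the first local maximum.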
Here is how any of these methods can be added to the SpectralFeaturesExtractor algorithms:
var extractor = new SpectralFeaturesExtractor(signal.SamplingRate, "sc", 0.045, 0.015);
extractor.AddFeature("pitch_hss", (spectrum, fs) => { return Pitch.FromHss(spectrum, signal.SamplingRate, 80, 500); } );
var vectors = extractor.ComputeFrom(signal);
There's also a PitchExtractor class inherited from FeatureExtractor (however, it's created inside the NWaves.Features namespace). Currently, it's based only on the auto-correlation technique (since this method is universal and simply works more or less well). Each feature vector in the list contains one component: "pitch".
var sr = signal.SamplingRate;
var extractor = new PitchExtractor(sr, 0.04, 0.01/*sec*/, 80, 500/*Hz*/);
var pitches = extractor.ComputeFrom(signal);
The last two parameters in the constructor are the lower and upper frequencies of the range in which to search for the pitch.
If you need a pitch extractor based on another time-domain method (YIN or ZcrSchmitt), the TimeDomainFeaturesExtractor class can be used. Likewise, if you need a pitch extractor based on a certain spectral method (HSS or HPS), SpectralFeaturesExtractor can be used. Example:
var extractor = new TimeDomainFeaturesExtractor(sr, "en", 0.0256, 0.010);
extractor.AddFeature("yin", (s, start, end) => { return Pitch.FromYin(s, start, end, 80, 500); });
var pitches = extractor.ComputeFrom(signal);
MPEG-7 feature extractor
Mpeg7SpectralFeaturesExtractor follows MPEG-7 recommendations to evaluate the following features:
- Spectral features (MPEG-7)
- Harmonic features
- Perceptual features
It's a flexible extractor that allows varying almost everything. The difference between Mpeg7SpectralFeaturesExtractor and SpectralFeaturesExtractor is that the former calculates spectral features from the total energy in frequency bands, while the latter analyzes signal energy at particular frequencies (spectral bins). Also, the MPEG-7 extractor optionally computes harmonic features in addition to spectral features.
Hence, the constructors of these two classes are basically the same, except that the MPEG-7 extractor accepts an array of frequency bands Tuple<double, double, double>[] instead of an array of frequencies.
Recognized keywords for spectral and perceptual features are:
- Spectral Centroid: "sc", "centroid"
- Spectral Spread: "ss", "spread"
- Spectral Flatness: "sfm", "flatness"
- Spectral Noiseness: "sn", "noiseness"
- Spectral Rolloff: "rolloff"
- Spectral Crest: "crest"
- Spectral Entropy: "ent", "entropy"
- Spectral Decrease: "sd", "decrease"
- Perceptual Loudness: "loudness"
- Perceptual Sharpness: "sharpness"
Recognized keywords for harmonic features are:
- Harmonic Centroid: "hc", "hcentroid"
- Harmonic Spread: "hs", "hspread"
- Inharmonicity: "inh", "inharmonicity"
- Odd-to-Even Ratio: "oer", "oddevenratio"
- Tristimulus1: "t1"
- Tristimulus2: "t2"
- Tristimulus3: "t3"
Harmonic features can be calculated separately, using the corresponding methods of the static class Harmonic.
Harmonic features rely on the pitch and the harmonic peaks of the spectrum. The pitch track (a float[] array of pitches) can be precomputed by PitchExtractor. In this case you can call extractor.SetPitchTrack(pitchTrack) so that the extractor uses these precomputed values at each processing step. The second option is to calculate pitch at each step in the MPEG-7 extractor itself. By default, the simplest and fastest method, Pitch.FromSpectralPeaks(), is used for pitch evaluation, but you can set your own pitch estimation function.
Also, the method for harmonic peak detection must be set. It has quite a long signature, and by default it simply calls the static method Harmonic.Peaks(float[] spectrum, int[] peaks, float[] peakFrequencies, int samplingRate, float pitch = -1). Once again, you can set your own method, which should fill the arrays of peak indices and peak frequencies.
Phew! That was not easy. Let's take a look at the example:
var mpeg7Extractor = new Mpeg7SpectralFeaturesExtractor(sr, "all", 0.04, 0.015);
mpeg7Extractor.IncludeHarmonicFeatures("all", 12, GetPitch, GetPeaks, 80, 500 /*Hz*/);
// ...
float GetPitch(float[] spectrum)
{
return Pitch.FromHps(spectrum, signal.SamplingRate, 80, 600/*Hz*/);
// or any other user-defined algorithm
}
void GetPeaks(float[] spectrum, int[] peaks, float[] peakFrequencies, int samplingRate, float pitch = -1)
{
if (pitch < 0)
{
pitch = GetPitch(spectrum);
}
// fill peaks array
// fill peakFrequencies array
}
So, the IncludeHarmonicFeatures() method allows setting the functions for pitch and spectral peak detection. The second parameter (in this case 12) is the number of harmonic peaks to evaluate.
In the following example spectral and perceptual features are calculated in 12 mel frequency bands; harmonic features are included as well and calculated from pitch values precomputed with PitchExtractor (so everything related to pitch estimation is much simpler in this case):
var sr = signal.SamplingRate;
var fftSize = MathUtils.NextPowerOfTwo((int)(0.04 * sr));
// 12 overlapping mel bands in frequency range [0, 4200] Hz
var melBands = FilterBanks.MelBands(12, fftSize, sr, 0, 4200, true);
var pitchExtractor = new PitchExtractor(sr, 0.04, 0.015, high: 700/*Hz*/);
var pitchTrack = pitchExtractor.ComputeFrom(signal)
.Select(p => p.Features[0])
.ToArray();
var mpeg7Extractor = new Mpeg7SpectralFeaturesExtractor(sr, "all", 0.04, 0.015, fftSize, melBands);
mpeg7Extractor.IncludeHarmonicFeatures("all");
mpeg7Extractor.SetPitchTrack(pitchTrack);
var mpeg7Vectors = mpeg7Extractor.ParallelComputeFrom(signal);
var harmonicFeatures = mpeg7Extractor.HarmonicSet;
// "hcentroid, hspread, inharmonicity, oer, t1+t2+t3"
MFCC/PNCC
The MfccExtractor constructor has a lot of parameters, and there are broad possibilities for customization: you can pass your own filter bank, for instance a bark bank (then the algorithm becomes BFCC) or a gammatone bank (then it becomes GFCC), etc.
Full list of constructor parameters:
- int samplingRate
- int featureCount (number of MFCC coefficients)
- double frameDuration (0.0256 seconds by default)
- double hopDuration (0.010 seconds by default)
- int filterbankSize (20 by default)
- double lowFreq (0 by default, filter bank lower frequency)
- double highFreq (samplingRate / 2 by default, filter bank upper frequency)
- int fftSize (by default 0, i.e. it will be calculated automatically as the closest power of 2 to FrameSize)
- float[][] filterbank (by default null, i.e. the filterbank will be generated from the parameters above)
- int lifterSize (22 by default)
- double preEmphasis (positive pre-emphasis coefficient, by default 0 - no pre-emphasis)
- WindowTypes window (by default, WindowTypes.Hamming)
The constructors of the PNCC and SPNCC extractors have one additional parameter:
- int power (by default 15)
If power is set to 0, the Log10(x) operation will be applied to the spectrum before the DCT-II. Otherwise, the operation Pow(x, 1/power) will be applied.
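The choice between the two nonlinearities can be sketched as a single function. This is only an illustration of the rule stated above; the class NonlinearitySketch and the method name Compress are hypothetical and do not appear in NWaves.

```csharp
using System;

// Hypothetical illustration, not part of NWaves.
public static class NonlinearitySketch
{
    // power == 0 selects log10(x); any other value selects the root x^(1/power).
    public static float Compress(float x, int power) =>
        power == 0
            ? (float)Math.Log10(x)
            : (float)Math.Pow(x, 1.0 / power);
}
```

For instance, Compress(x, 15) applies the 15th root that corresponds to the default power value, while Compress(x, 0) gives the logarithmic compression used in conventional MFCC.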
var sr = signal.SamplingRate;
var mfccExtractor = new MfccExtractor(sr, 13, filterbankSize: 24, preEmphasis: 0.95);
var mfccVectors = mfccExtractor.ParallelComputeFrom(signal);
/* the following lines of code are equivalent to previous lines,
except that in this case intermediate signal will be created after
processing original signal with pre-emphasis filter:
var mfccExtractor = new MfccExtractor(sr, 13, filterbankSize: 24);
var preEmphasis = new PreEmphasisFilter(0.95);
var mfccVectors = mfccExtractor.ParallelComputeFrom(preEmphasis.ApplyTo(signal));
*/
var pnccExtractor = new PnccExtractor(sr, 13);
var pnccVectors = pnccExtractor.ComputeFrom(signal, /*from*/1000, /*to*/10000 /*sample*/);
FeaturePostProcessing.NormalizeMean(pnccVectors);
The default filter bank consists of triangular overlapping mel bands. If you specify your own filter bank, the parameters lowFreq, highFreq and filterbankSize will be ignored. For example, let's change mel bands to bark bands and obtain what is actually a BFCC extractor:
var barkbands = FilterBanks.BarkBands(16, 512, sr, 100/*Hz*/, 6500/*Hz*/, overlap: true);
var barkbank = FilterBanks.Triangular(512, sr, barkbands);
var bfccExtractor = new MfccExtractor(sr, 13, filterbank: barkbank, preEmphasis: 0.95);
var bfccVectors = bfccExtractor.ParallelComputeFrom(signal);
LPC/LPCC
Full list of constructor parameters:
- int samplingRate
- int order (LPC order, i.e. number of LPC coefficients)
- double frameDuration (0.0256 seconds by default)
- double hopDuration (0.010 seconds by default)
- double preEmphasis (positive pre-emphasis coefficient, by default 0 - no pre-emphasis)
- WindowTypes window (by default, WindowTypes.Rectangular)
The constructor of the LPCC extractor has one additional parameter:
- int lifterSize (by default 22)
var lpcExtractor = new LpcExtractor(sr, 16, 0.050, 0.020, 0.95);
var lpccExtractor = new LpccExtractor(sr, 16, 0.050, 0.020, 26, 0.95);
AMS
Amplitude Modulation Spectra (AMS) extractor.
Full list of constructor parameters:
- int samplingRate
- double frameDuration (0.0256 seconds by default)
- double hopDuration (0.010 seconds by default)
- int modulationFftSize (64 by default)
- int modulationHopSize (4 by default)
- int fftSize (by default 0, i.e. it will be calculated automatically as the closest power of 2 to FrameSize)
- IEnumerable<float[]> featuregram (null by default)
- float[][] filterbank (null by default)
- double preEmphasis (positive pre-emphasis coefficient, by default 0 - no pre-emphasis)
- WindowTypes window (WindowTypes.Rectangular by default)
If the filterbank is specified, then it will be used for calculations:
var extractor = new AmsExtractor(signal.SamplingRate, 0.0625, 0.02, 64, 16, filterbank: filterbank);
var features = extractor.ComputeFrom(signal);
You can also specify featuregram (i.e. compute AMS for various featuregrams, not only spectrograms):
var frameSize = 0.032;
var hopSize = 0.02;
var mfccExtractor = new MfccExtractor(signal.SamplingRate, 13, frameSize, hopSize);
var vectors = mfccExtractor.ComputeFrom(signal);
FeaturePostProcessing.NormalizeMean(vectors);
var featuregram = vectors.Featuregram();
var extractor = new AmsExtractor(signal.SamplingRate, frameSize, hopSize, 64, 16, featuregram: featuregram);
If neither the filterbank nor the featuregram is specified, the filterbank is generated automatically in AmsExtractor as 12 overlapping mel bands covering the frequency range from 100 to 3200 Hz.