-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathNOTES.txt
156 lines (102 loc) · 5.06 KB
/
NOTES.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
Frequency band picking:
- consider using pinv([ actual stack of cosines/sines we care about]) <-- problem is windowing (should really be per-frequency)
- consider repeated convolution idea; low-pass over and over again and pull out just the difference (problem w/ transient smearing?)
Thinking of frequency band smearing:
- consider question of simulated-ear-style receptor
- what do pipe-organ transients look like?
- is there a no-lag / one-sided lowpass? what does that mean?
- build test suite of finely swept waves and look at binning accuracy. What do we actually want?
- build test suite of finely swept waves and compute the matrix based on desired output(!)
How big is the test suite? How big is the matrix?
128 steps / octave (120 JND's but why not be power-of-two-y?)
2^4Hz (16Hz) - 2^14.3Hz (20161Hz) => 10.3 octaves => ~1316 steps/bins
What are we detecting? Some sort of minimal wavelet thing, I guess.
So call that maybe 4x wavelengths.
16Hz => 48000/16*4 = 1200 samples
2^(4+1315/128)Hz => 9.70 samples
xform is "I just want to multiply by a 1200 * 1315 matrix" (?)
xform is " ... where of course some parts are zero" (?)
"explain this sound in terms 1315 damped plucks starting right now, with [ideally] sparsity on the weights"
Still have a windowing problem because sound does continue.
Filter bank option seems great except it's letting through only little epsilons of sound.
(maybe the answer is to sum energy after each filtering and compare the total energies?)
Phased plucks on filter bank output seems interesting potentially.
TESTING PLAN:
- build test harness with phased plucks, constant sines
- use 'lil steps (smaller than 128/octave) to test
- produce plots of "spectrogram" vs frequency <-- frequency accuracy
- produce plots of fancy peak finding vs frequency <-- frequency accuracy
- produce plots of peak-response-power vs time <-- time accuracy
- baseline: FFT with peak finding, I guess?
- frequency bank method
- damped spring bank method
- reverse damped spring bank method
- some sort of pinv-like method where the goal is to put power into each coef but penalty only considers (gently) bandpassed version of the sound(?)
rather than dampped spring model, try a model where the sound is granularized and looped and the fit constant waves
----
Big ideas:
- samples get mixed in 8.24 fixed point (maybe?) @ 48kHz
- samples get arranged in log2hz vs time space
- samples can follow a curve in terms of speed vs time
- frequency
Using OTFFT getting some GPFs -- think it may be memory alignment but was having problems replicating. Trying to match -march flags between fs2 and otfft.o builds might help. (This is going to be a bit tricky on windows.)
time: integer sample offset @ 48000 kHz, float would work
log2hz: integer 8.24 would work, float would be fine
sample: 8.24 would work, float [might] be fine
Apparently convolution reverbs often use FFTs for the tail and not for the head (but maybe this is only a real-time limitation)
Usage:
- samples populate in left pane, click to preview/stop
- drag samples to main grid area, snap to time increments
- select several samples => block
- create reference block and snap to next measure
- CTRL+click / ALT+click to set loop bounds
- SPACE to play
- click to set play position (?)
d/dx c * 2^{ax+b}
= c * a * ln(2) * 2^{ax + b}
d/dx (1/(a*ln(2))) * 2^{ax + b}
= 2^{ax+b}
a^-1
= 2^{log2(1/a)}
= 2^{-log2(a)}
(1/(a*ln(2))) * 2^{ax + b}
= 1/ln(2) * 2^{ax + b - log2(a)}
Explicit integration:
\int_0^y 2^{ax + b} dx
2^{ax + b}
= 2^b * (2^a)^x
= 2^b * e^(log(2^a)x)
2^b * e^(log(2^a)x) / log(2^a) <--- THIS
d / dx ( 2^{ax + b} )
= log(2) (ax + b) 2^{ax + b}
----
Analysis Thoughts:
WANT: log(power) @ log2hz over the usable range in reasonable steps.
Wikipedia: JND is about 10 cents (0.6%) which gives 1,400 steps over 16 to 16000 Hz
1200 cents / octave => 120 JND steps per octave.
(says "1Hz steps below 500Hz", so 120 steps is overkill but okay)
So let's go with that and ask for 128 divisions per octave, from 2^4 Hz (16Hz) to 2^15 Hz (32kHz). That's 1408 values.
NOTE: at 48kHz we actually *can't* estimate 32kHz response. 20kHz would be fine. So call that 2^(14.3).
We could also aim for 2048 log-distributed values from 20Hz to 20kHz... seems good.
How do we get those values easily?
FFT of N samples gives convolutions with waves with periods
(DC "junk"), 1 / N, .., 1 / 2, (negative frequency "junk" which is complex conjugate)
(k / ( N / S)) Hz
k * (S / N) Hz
So to get smaller steps, decrease sampling rate (counter-intuitive?) or increase window size.
48kHz signal, compute a 2048 tap FFT, you get:
48000 / 2048 = 23.4375 Hz steps up to 24kHz
So that's good to about 1750/2048 = 7319Hz
So 24000 / 2048 = 11.718 Hz steps up to 12kHz
That's good to about 1530/2048 = 3485Hz
Or 16000 / 2048 = 7.8125Hz steps up to 8kHz
That's good to about 1410/2048 = 2325Hz
Then 12000 / 2048 = 5.8459Hz up to 6kHz
Then 6000 / 2048 = 2.929Hz up to 3kHz
Then 3000 / 2048 = 1.4648Hz up to 1.5kHz (at ~915)
----
TODOs:
- (analysis)
- export
- better fundamentals
- font characters for folder stuff