I’ve been playing with the HYPE framework recently, and I noticed that they use a SoundAnalyzer class to wrap the Flash Player’s native SoundMixer.computeSpectrum method. This method’s FFT mode is known to have some problems (not really, but more on that later), and as it turns out I’ve dealt with this some already. So here it is (after sitting in my drafts for longer than I care to admit), the Math behind Flash’s FFT!
First, let’s take a look at some of the raw data from the computeSpectrum method. When I originally intended to write this post I took the opportunity to capture aggregate results over a large number of frequency sweeps using an AIR application, and I’ve posted some of the data for you here.
| index | floor | peak | ceiling |
| 0 | 2 | 6 | 23995 |
| 1 | 2 | 61 | 20591 |
| 2 | 9 | 103 | 23995 |
| 3 | 2 | 139 | 20591 |
| 4 | 14 | 187 | 23995 |
| 5 | 7 | 218 | 20591 |
| … | … | … | … |
| 253 | 947 | 10901 | 23950 |
| 254 | 217 | 10947 | 23995 |
| 255 | 947 | 10990 | 23950 |
As I alluded to earlier, the long known problem with computeSpectrum’s FFT is that the majority of frequencies seem to be skewed toward the left side - but what they don’t mention in the docs is that it’s because the FFT results are still distributed linearly. Yep, even though the floor and ceiling values in my data were mostly useless (perhaps due to harmonics), you can see a rough pattern in the peak frequencies - an average linear distribution of about 43.07451!
…okay, now let me back up a second to explain that. The way that we naturally understand sound is logarithmic. In musical terms this means that an octave below the note A-440 (440 is the frequency) is A-220, but an octave above A-440 is actually A-880 (not 660!). Audibly we think of the distance between A-440 and A-220 to be the same as between A-440 and A-880, but when displayed in a linear distribution (like our FFT results) A-880 is twice the distance from A-440 as A-220 - and when you use this linear distribution to create visualizations, it doesn’t match our natural perception of the sound. So, just in case any of that got really confusing, I created a little Flash application to illustrate the issue better: FFTMath.
It’s important to note that this isn’t a flaw in Flash’s FFT results. FFT calculations are supposed to return linear frequency distributions, but this raw data needs to be redistributed into logarithmically spaced frequency bands before being used for visualizations. Before redistributing our linear FFT results though, we’re going to need some math to explain them.
After lots of searching on my magic number and a little bit of luck, I came across this amazing blog post: http://code.compartmental.net/2007/03/21/fft-averages/ - and Eureka!
Each point i in the FFT represents a frequency band centered on the frequency i/1024 * 44100 whose bandwidth is 2/1024 * 22050 = 43.0664062 Hz…
Alright math club, here’s the good part. Based on the math from this awesome blog post and the results from my testing, it looks like Flash uses a sample size of 1024 and a default sample rate of 44100 resulting in frequency bands of (2/1024) * (44100/2) or 43.0664062 Hz. Even though an FFT with 1024 samples should return 513 valid results below the Nyquist frequency (sampleRate/2), Flash further clips the data to 256 results for a top frequency of around 11,025 Hz - still much higher than needed for most visualizations.
To review:
- frequency = i/1024 * 44100
- bandwidth = (i==0) ? 1/1024 * 22050 : 2/1024 * 22050;
This information let’s us understand the results we’re getting from computeSpectrum. In my next post we’ll use this information to redistribute the data and create more accurate frequency visualizations in Flash.
Download: FrequencyAnalyzer.zip (2.37 MB)