Analysis of real sounds for Additive Synthesis by the Kawai K5000
Without going into too much mathematical depth, it is possible to find the frequency spectrum of a digitized sound by applying a Fourier transform to the numbers that make up the sounds. The Fourier transform is not magic, and some care is required in how you perform the analysis and interpret the results.
The K5000’s internal calculations start with the fundamental frequency of the note you’re playing. A above Middle C, for example, has a fundamental of 440 Hz. All the harmonics used by the K5000 are integer multiples of the fundamental – if you play an A440, the sound will be built up of varying quantities of 440 Hz, 880 Hz, 1320 Hz, 1760 Hz, and so on. They are simply added together – hence the term additive synthesis. Only “pure” sounds conform to this specification – metallic sounds contain non-integer harmonics, for example – so a pure additive sound can be flat or boring.
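As a rough illustration of "simply added together", here is a minimal Python sketch that sums sine waves at integer multiples of a fundamental. The function name, the amplitude list and the 100-sample length are my own assumptions for the example; this is not how the K5000 does it internally.

```python
import math

def additive_cycle(amplitudes, fundamental_hz=440.0, sample_rate=44100, n_samples=100):
    """Sum integer harmonics of the fundamental.

    amplitudes[k-1] is the level of harmonic k (hypothetical values,
    chosen only for illustration).
    """
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(a * math.sin(2 * math.pi * k * fundamental_hz * t)
                    for k, a in enumerate(amplitudes, start=1))
        samples.append(value)
    return samples

# The first four harmonics of A440, as listed in the text.
harmonic_freqs = [k * 440 for k in range(1, 5)]

# Roughly one cycle of a tone built from those four harmonics.
wave = additive_cycle([1.0, 0.5, 0.25, 0.125])
```

Changing the numbers in the amplitude list changes the timbre, which is essentially what the K5000's harmonic levels do.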
The K5000 uses several methods to make its sounds more interesting, starting with the envelopes available to each additive set to produce movement within the sound. Each additive set has a formant filter, which can be seen as a powerful and precise Graphic EQ that can also be given movement. There is layering of up to six additive or PCM (real sampled) sounds, which can be detuned from each other. Finally, there is a decent set of effects that act on the final result, including useful overdrive and chorus effects.
My first attempt to do sound analysis for the K5000 used Chris Dalton’s wav2add program, which analyses a single sound cycle and prints out the K5000 Additive parameters directly. He includes his C source code with his program, which was a great help in working out some details of the procedure for myself. Since wav2add only analyses a single sample cycle, it doesn’t have anything to say about the evolution of the sound over time, unless you feed it multiple single cycles taken from different sections of the file.
Most spectral analysis on a computer is done using the Fast Fourier Transform (FFT), which is a highly efficient method of calculating a Fourier transform. In its most common form it has one requirement: the length of the data you feed it must be an exact power of two (2, 4, 8, 16… 256, 512, 1024… 65536, 131072…). This is due to its divide-and-conquer algorithm, and that limitation isn’t a problem for most general spectral analysis purposes. It is a real problem when it comes to picking out the exact harmonics of musical sounds, however, because the frequencies at which the FFT reports its results are multiples of the sample rate divided by the data length, and those have no particular relationship to the pitch of the sound. The fundamental and integer harmonics of the sound can fall “between the cracks”, which makes the results useless for additive synthesis.
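To put numbers on the "between the cracks" problem, the following sketch compares where the analysis frequencies land for a few data lengths. The function name and the choice of lengths are assumptions for the example only.

```python
SAMPLE_RATE = 44100
FUNDAMENTAL = 440.0  # A above Middle C

def nearest_bin_error(window_length, harmonic):
    """Return (frequency of the closest analysis bin, error in Hz)."""
    bin_spacing = SAMPLE_RATE / window_length   # bins sit at multiples of this
    target = harmonic * FUNDAMENTAL
    bin_index = round(target / bin_spacing)
    bin_freq = bin_index * bin_spacing
    return bin_freq, bin_freq - target

# Two power-of-two lengths, then a length of about one cycle of A440.
for length in (128, 1024, 100):
    freq, err = nearest_bin_error(length, 1)
    print(f"N={length}: nearest bin {freq:.1f} Hz, error {err:+.1f} Hz")
```

With 128 samples the bins are about 344.5 Hz apart, so the closest one to 440 Hz is almost 100 Hz off. With 100 samples the first bin sits at 441 Hz, and its multiples stay close to the harmonics.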
For this reason I have been using the Discrete Fourier Transform (DFT) calculated directly, which doesn’t have the power-of-two requirement. The drawback of the DFT is that calculation times are much longer, but they are still acceptable on a modern PC for sensibly-sized samples. I’ve been limiting input samples to ~1 second at 44100 samples per second, and plotting up to 64 harmonics, even though harmonics higher than 16 are rarely usable.
For example: if you have a sound with a fundamental frequency of 440 Hz (A), and it was sampled at 44100 samples per second (CD-quality), a single cycle is 44100/440, or about 100.23 samples long. You have to round that figure to 100, but that is only about 0.2% off, a far better match than the nearest power of two (128). In practice, you don’t take one cycle, you “oversample”. With an oversample figure of 8, as I have been using most often, you analyse 8 cycles at a time, but stepping forward one cycle at a time. Using the same A440 example, 8 cycles will take up 8 * 44100 / 440 or ~802 samples. This gives an even better match to the real harmonics (only about 0.02% deviation), and the results are smoother over time, at the cost of blurring some small variations over time.
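The procedure described above can be sketched in Python roughly as follows. This is not a translation of the Mathematica notebook: the function names, the amplitude scaling and the synthetic test tone are all assumptions made for the illustration. The idea is that with a window of 8 cycles, harmonic h of the note lands on DFT bin 8*h, so only those bins need to be calculated.

```python
import cmath
import math

def harmonic_amplitudes(samples, start, window, cycles, n_harmonics):
    """Direct DFT of one window, evaluated only at the harmonic bins."""
    amps = []
    for h in range(1, n_harmonics + 1):
        k = h * cycles  # harmonic h falls on bin h * cycles
        acc = sum(samples[start + n] * cmath.exp(-2j * math.pi * k * n / window)
                  for n in range(window))
        amps.append(2 * abs(acc) / window)  # scale so a unit sine reads ~1.0
    return amps

def analyse(samples, sample_rate, fundamental_hz, cycles=8, n_harmonics=16):
    """Slide a window of `cycles` cycles along the sound, one cycle per step."""
    window = round(cycles * sample_rate / fundamental_hz)  # ~802 for A440
    hop = round(sample_rate / fundamental_hz)              # ~100, one cycle
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        frames.append(harmonic_amplitudes(samples, start, window,
                                          cycles, n_harmonics))
    return frames

# Synthetic test tone (hypothetical): A440 plus its 2nd harmonic at half level.
sr, f0 = 44100, 440.0
tone = [math.sin(2 * math.pi * f0 * n / sr)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * n / sr)
        for n in range(4410)]
frames = analyse(tone, sr, f0, n_harmonics=4)
```

Each entry in `frames` is one time step, holding the estimated level of every harmonic at that point in the sound. For a real sample you would replace `tone` with the sample data and supply the fundamental of the note that was played.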
Programming this in Mathematica was a tough nut to crack, but it’s finally working:
- wave3.nb is the Mathematica notebook – just the formulae without the data or the results, to keep it small. You will need a working copy of Mathematica to perform any calculations using this, of course.
- analysis.zip is a ZIP file with an example of the results, using a Mellotron String sample.
Both these files can be found in the box.net widget to the right of this page.
Unfortunately, Mathematica is a commercial product, and it isn’t cheap, so I’m in the process of translating this method to Scilab, an open source mathematics system that is freely available. To be continued…