[Update: Iain Peet pointed out that the definitions of “aliasing” and “anti-aliasing” in audio are more narrow than I thought when I wrote this post. Specifically, aliasing in audio refers to the artifacts you get when you shift or create frequencies beyond half the sample rate, and anti-aliasing refers to low-pass filtering audio signals in order to remove these artifacts. Both of these play no part in this post, and I have updated it to remove the terms.]

In this fifth installment of my series on dynamic audio programming in AS3, I want to take a quick look at the artifacts we introduce when we process audio signals without interpolation. We’ve already had a brief encounter with these, when we implemented a basic flanger effect back in part 3, which turned out to have a somewhat dirty, distorted sound to it. In this article, we’ll take a closer look at where this dirt came from, by looking at a naïve implementation of pitch shifting.

 

Pitch-shifting, and how not to do it

To be clear, for the purposes of this article, “pitch shifting” is what happens when you slow a recorded sound down (the pitch lowers, by an octave whenever the speed is halved) or speed it up (the pitch rises, by an octave whenever the speed is doubled). Pitch-shifting while preserving the original duration of a sound is an entirely different story, and a good deal more involved.
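As an aside, the relationship between playback speed and musical pitch is logarithmic: every doubling of the speed raises the pitch by one octave (12 semitones). If you want to shift by a given number of semitones, the speed factor is 2^(semitones/12). A quick sketch in plain JavaScript (the function name is mine, not from any library):

```javascript
// Convert a shift in semitones to a playback-speed factor: each octave
// (12 semitones) doubles the speed, so the factor is 2^(semitones / 12).
function semitonesToSpeed(semitones) {
  return Math.pow(2, semitones / 12);
}

console.log(semitonesToSpeed(12));  // 2   (one octave up -> double speed)
console.log(semitonesToSpeed(-12)); // 0.5 (one octave down -> half speed)
```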

With that out of the way, let’s look at what a basic implementation of the effect might look and sound like:

The following class stores a Vector.<Number> of mono sample data for a sound and writes a pitched version of that sound to an output Vector of arbitrary length whenever you call its process() method. It loops its assigned sample seamlessly (placing multiple copies into the output Vector if necessary), so it can be called whenever a dynamic sound needs more data:

package
{
  public class PitchShifter
  {
    protected var source:Vector.<Number>;
    protected var currentIndex:Number = 0;
    public var pitch:Number = 1;
    
    public function PitchShifter(monoSample:Vector.<Number>)
    {
      source = monoSample;
    }
    
    public function setSource(monoSample:Vector.<Number>):void
    {
      source = monoSample;
      currentIndex = 0;
    }
    
    public function process(monoDestination:Vector.<Number>):void 
    {
      var numSamples:int = monoDestination.length;
      var sourceLength:int = source.length;
      
      for (var i:int = 0; i<numSamples; i++) 
      {
        currentIndex += pitch;
        if (currentIndex >= sourceLength) currentIndex -= sourceLength;
        monoDestination[i] = source[int(currentIndex)];
      }
    }
  }
}
									

(Hint: if you’re trying this out and compiling from the Flash IDE, make sure you set your sound compression settings to a high bit rate!)

This implementation is pretty straightforward: for each sample in the output array, the currentIndex member (which is a floating-point Number, not an int) is incremented a little: by 1 if the effect is set to the original playback speed, by 0.5 if it's slowed to half speed, by 2.0 if the speed is doubled. The currentIndex is then cast to an int, so each output sample index corresponds to an integer index into the source sound's data.

The following example shows this code in action, with a knob that lets you change the playback speed and three different sound samples to try out:

[Embedded Flash demo — requires Adobe Flash Player]

Obviously something weird is happening to the sound when it's pitched in this manner. When pitching down, the sound does get lower, but it also takes on a ringing quality that wasn't there before. Pitching up sounds better, as long as the pitch factor is an integer; any non-integer value results in the same sort of distortion.

To better understand the root of the problem, let’s look at what should happen when you pitch down a basic sine wave, and what actually happens in our implementation.

Sine wave scaled up without interpolation

A sine wave scaled up without interpolation. Left: the original wave. Center: what the stretched wave should look like. Right: what the stretched wave actually looks like in our simple implementation.

The center part of this illustration shows how a sine wave should look after being stretched out to half its frequency. The right part shows what happens when we stretch samples with the naïve implementation given above: if the samples are pitched down by an octave (i.e. to 1/2 of their original frequency), the output is simply two identical samples in a row for every input sample.

What should be a smooth wave becomes a collection of jagged edges, and wherever these edges appear, distortion is added to the frequency spectrum. This gets worse when using this method to stretch by non-integer multiples – since the index into the sample array is always an integer, one sample of the original wave may be stretched out to three samples, the next to four, the next to three again, and so on!

The same thing happens when pitching up by a non-integer factor instead of down: at 2.5× the original speed, one input sample is skipped per output sample half of the time, and two input samples the other half. Again, these alternations introduce unwanted additional frequencies.
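The uneven patterns described in the last two paragraphs are easy to see directly. The following sketch mirrors the nearest-neighbor loop from the AS3 class above in plain JavaScript (the function name and index arithmetic are mine, for illustration only; I read the source index before incrementing, whereas the article's loop increments first, but the resulting pattern is the same):

```javascript
// Nearest-neighbor resampling: for each output sample, read the source at
// the truncated floating-point index, then advance the index by `pitch`.
function nearestNeighborStretch(source, pitch, outLength) {
  const out = [];
  let index = 0;
  for (let i = 0; i < outLength; i++) {
    out.push(source[Math.floor(index) % source.length]); // wrap for looping
    index += pitch;
  }
  return out;
}

// Pitching down to 0.75x: repetition counts come out uneven (2,1,1,2,1,1,...),
// which is exactly the jagged-edge pattern described above:
console.log(nearestNeighborStretch([0, 1, 2, 3, 4, 5, 6, 7], 0.75, 8));
// -> [0, 0, 1, 2, 3, 3, 4, 5]

// Pitching up to 2.5x: the number of skipped input samples alternates (1, 2, 1):
console.log(nearestNeighborStretch([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 2.5, 4));
// -> [0, 2, 5, 7]
```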

The following two illustrations show spectrograms of these artifacts. The first image shows the frequency distributions of a guitar chord when pitched up or down. The second image shows pitched sine waves.

pitched guitar sample without interpolation

Spectrogram of a guitar chord, pitched up and down to various speeds, without interpolation. Stretching to any speed other than an integer multiple creates copies across the frequency spectrum. Note that the zoom factors of the different samples along the time axis are not consistent (e.g. the 0.5x sample should actually be twice the length of the 1x version).

pitched sine wave without interpolation

Spectrogram of a sine wave stretched to various speeds without interpolation. Again, the zoom factors of the samples along the time axis are not consistent.

The sine wave illustrates the effects particularly well: when pitched up by an integer (2x, 4x), the spectral shape of the sound remains more or less intact. Pitching down by an integer (0.5x, 0.25x) introduces copies of the shape at different frequencies. Pitching up or down by a non-integer value creates a lot more of these copies, making the distortion of the sound even worse.

 

Interpolation

The solution to avoid artifacts when pitching a piece of audio down is the same as the solution to avoid jagged edges when scaling up a bitmap image: interpolation.

Remember that in our basic implementation we used a floating point index into the sample data. In order to get each sample for our output Vector, we incremented this index a little (depending on playback speed) and then cast it to an int before fetching the corresponding source sample. Even though for any non-integer playback speed the index should usually fall between two samples, our implementation rounded the index to the next smallest integer, discarding any sub-sample information and thereby introducing artifacts.

Adding simple linear interpolation improves things drastically:

Suppose that i is the current floating-point index into the samples Vector, and iint = int(i) is the next-smallest integer (the same as Math.floor(i) for non-negative values). The resulting output sample is then:

var f:Number = i-iint; // f goes from 0…1
var result:Number = samples[iint]*(1-f) + samples[iint+1]*f;
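As a quick sanity check, here is the same interpolation in plain JavaScript (the function name is mine), weighting each neighbor by its distance from the read position, just as in the process() override below:

```javascript
// Linear interpolation between two neighboring samples: the fractional part
// of the index determines how much weight each neighbor gets.
function lerpSample(samples, i) {
  const iint = Math.floor(i);
  const f = i - iint; // fractional part, 0..1
  return samples[iint] * (1 - f) + samples[iint + 1] * f;
}

// Reading halfway between samples 0 and 1 gives their average:
console.log(lerpSample([0, 10, 20], 0.5));  // 5
// A quarter of the way between samples 1 and 2:
console.log(lerpSample([0, 10, 20], 1.25)); // 12.5
```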

Applying that to our implementation results in the following code sample:

package
{
  public class AntialiasedPitchShifter extends PitchShifter
  {
    public function AntialiasedPitchShifter(monoSample:Vector.<Number>)
    {
      super(monoSample);
    }

    override public function process(monoDestination:Vector.<Number>):void 
    {
      var numSamples:int = monoDestination.length;
      // NOTE: we're reducing the source's length by one here, which effectively 
      // removes the last sample, but allows us to disregard the edge case of 
      // currentIndex falling between the last and first sample at the end
      // of a loop.
      var sourceLength:int = source.length-1;
      
      for (var i:int = 0; i<numSamples; i++) 
      {
        currentIndex += pitch;
        if (currentIndex >= sourceLength) currentIndex -= sourceLength;
        
        var index1:int = int(currentIndex);
        var index2:int = index1+1;
        var index2Factor:Number = currentIndex-index1;
        var index1Factor:Number = 1-index2Factor;
        
        monoDestination[i] = source[index1]*index1Factor + source[index2]*index2Factor;
      }
    }
    
  }
}
									

The following spectrogram shows how much of an improvement linear interpolation gives us over simple nearest-neighbor sampling:

pitched guitar sample with linear interpolation

Spectrogram of a guitar chord, pitched up and down with linear interpolation. Note that the spectral "echoes" are still present, but much fainter! Again, the zoom factors of the samples along the time axis are not consistent.

pitched sine wave with linear interpolation

Spectrogram of a sine wave pitched up and down with linear interpolation. (Again, the scaling of the different copies along the time axis is not consistent.)

The copies of the original sound across the frequency spectrum are substantially fainter, both when pitching down and when pitching up. To be sure, there are situations where this type of interpolation isn’t enough: If you play back the source at several times its original speed, high frequency content – such as very short spikes – can disappear (if one sample index is before a spike and the next sample index after it) or be substantially altered. In this case, each output sample should be a weighted average of all input samples between one index i and the next.
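One way to sketch that averaging idea (in plain JavaScript; this is an illustration of the principle, not code from the article) is to average every input sample that lies between one read position and the next:

```javascript
// Average all input samples whose indices fall between two successive read
// positions. When pitching up by large factors, this keeps the energy of
// short spikes that plain index-skipping would jump right over.
function averagedStep(samples, from, to) {
  const start = Math.floor(from);
  const end = Math.max(Math.floor(to), start + 1); // always cover >= 1 sample
  let sum = 0;
  for (let j = start; j < end; j++) sum += samples[j];
  return sum / (end - start);
}

// With a spike at index 2, a naive 4x read (indices 0, 4, ...) would miss it
// entirely, but averaging over [0, 4) keeps some of its energy:
console.log(averagedStep([0, 0, 1, 0, 0, 0, 0, 0], 0, 4)); // 0.25
```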

In the same vein, when you pitch down with linear interpolation, the stretched waveform is simply approximated with straight lines connecting the original wave’s samples. Using a better approximation method such as cubic interpolation would reduce unwanted artifacts even more, at the cost of a higher performance penalty.
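For the curious, one common choice of cubic interpolation is the Catmull-Rom form, which fits a curve through the two samples on each side of the read position. A sketch in plain JavaScript (again, an illustration, not the article's code; it needs one extra sample of headroom on each side of the index):

```javascript
// Catmull-Rom cubic interpolation: uses four neighboring samples to fit a
// smooth curve through the read position. Valid for 1 <= i < length - 2.
function cubicSample(samples, i) {
  const i1 = Math.floor(i);
  const f = i - i1; // fractional part, 0..1
  const p0 = samples[i1 - 1], p1 = samples[i1];
  const p2 = samples[i1 + 1], p3 = samples[i1 + 2];
  return p1 + 0.5 * f * (p2 - p0 +
    f * (2 * p0 - 5 * p1 + 4 * p2 - p3 +
    f * (3 * (p1 - p2) + p3 - p0)));
}

// On already-linear data, the cubic reproduces the straight line exactly:
console.log(cubicSample([0, 1, 2, 3, 4], 1.5)); // 1.5
```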

For most purposes however, linear interpolation produces good results, and it’s fast enough to be easily done in realtime in AS3.

The following example shows the principle in action, using the same sounds as the distorted version from the beginning of this article.

[Embedded Flash demo — requires Adobe Flash Player]

 

8 Responses to Realtime audio processing in Flash, part 5: Interpolation and pitching

  1. Iain Peet says:

    My apologies if this is a bit pedantic, but…

    This is not anti-aliasing at all (and you have not demonstrated aliasing). You are correct in that the effect you present is indeed a consequence of interpolation, which is why improved interpolation helps. (To go onto a truly pedantic tangent, your pitching methods are time-variant, and therefore don’t technically have frequency responses…)

    If you want to see aliasing, use a tone with frequency between 0.25 and 0.5 of your sampling frequency. Then, pitch it up by 2x or more. This will produce true aliasing, and you will find that improved interpolation does not help at all (even if you use perfect (Whittaker-Shannon) interpolation).

    See wikipedia for a description of aliasing. The TL;DR is a sampling artefact that occurs when you try to represent frequencies higher than half your sampling frequency.

    True anti-aliasing consists of applying some sort of low-pass filter before sampling.

  2. Philipp says:

    Hi Iain,
    Thanks very much for your comments!

    My understanding of aliasing is that the term refers to artifacts caused by bad sampling (and I thought WP’s introductory paragraph on Aliasing applies to the issues discussed in the article).

    I’m aware that shifting parts of a signal past the Nyquist frequency causes aliasing, and that the proper solution to that is low-pass filtering, but I thought this was just one case of aliasing/anti-aliasing, and the artifacts caused by bad interpolation were another.

    I have tried googling for the proper terminology to use here, but so far I’ve come up empty. If there is a better set of terms to describe the effects shown in the article, could you point me in the right direction?

  3. Kyle says:

    I’ve been reading through your articles on real time audio processing. I was just wondering if you are going to continue into the sound manager now? Looking forward to it.

  4. Philipp says:

    I do still want to continue the series, but there’s currently a lot on my plate, and the audio articles take a surprisingly long time to write.

    I’m considering re-writing this as a series of general purpose audio tutorials though, perhaps building on Processing (which is free and very easy to pick up). I feel that Flash is now definitely on the decline, and that’s not a motivating situation for writing as3 tutorials.

  5. iND says:

    Flash is not equivalent to Actionscript. AS3 can be used in Flex (which is being eliminated from Adobe), Flash, and Adobe Air. Unlike Flash, Actionscript is not on the decline, since you can very simply make apps for various platforms using the same code. In the near future, Flash will be able to be converted directly to HTML5/Javascript, so it will still remain at the center of the multimedia programming world. Note that in the new, Java-influenced code design that is coming to Actionscript, there will be the ability to do threads and use the GPU (not specifically sound, but removes a lot of resources from the CPU), and I imagine that there will be a lot of efficiency brought into the audio streams.

    You can make these articles based in AS3 but focus on the sine wave transforms. Given the base class that you have designed, I would be interested in how to make a synth that works both efficiently to transform the wave, and also allows the plugin idea for these transforms.

    That is, I would like to see the design for a flanger knob (code only, no Flash design needed) and how it would inject its effect on a wave, and, in an independent control, a low-pass filter, with the option to switch the effect to before the flanger, after, or both. With many more controls of this sort, the idea of a plugin seems natural, but how would it be designed?

    You may take a long time with these tuts, but they are the best I have seen relating to Actionscript. Please don’t abandon the effort. I know you are focusing for the moment on Unity, but Unity is expensive for independents.

  6. Philipp says:

    Thanks for the feedback!

    The good news is I’ve actually just started writing a few more tutorials for this series. They’re not quite yet ready to be published, but I expect I’ll post the first one sometime next week. The setup you said you’d like to see is exactly where I want to end up (but yeah, unfortunately that’ll take another while).

  7. IcyW says:

    Is there a way to display a spectrogram with this flash thingy?

    (Something like this: http://ua3vvm.qrz.ru/spectrogram/pwarb000.jpg)

  8. Camacho says:

    Hello everybody,
    I have to tell Philipp that his tutorials helped me so much!
    I just have a remark on this pitch shift.
    It just duplicates my sound and plays it back twice.

    So I just changed your version of the pitch shift by replacing this line :

    if (currentIndex >= sourceLength) currentIndex = (sourceLength-1);

    Now, once the sound is at the end, it doesn't play back again but just stays at the end.

    A big thank you, because you're the only one I could find on the web who explains sound processing so well.

    Thanks so much.

    Regards

    Mr Camacho / Web developer in France
    ->www.marco-camacho.com
