In this series of advanced ActionScript tutorials, I’ll give some practical examples of how to work with the sound API introduced in Flash Player 10 to process audio in real time (filtering, adding effects, etc.) or synthesize sound from scratch. My goal is to evolve this into a series of articles that starts at the very beginning but goes a good deal farther than other AS3 audio tutorials, which tend to explain the API and then stop there, as if the rest were easy to figure out on your own.

The current roadmap is as follows (I’ll edit this if plans change):

Part 1 gives an overview of what the API is and what it can and cannot do.
Part 2 will explore the different parts that make up the API.
In part 3, we’ll put all of that to use and create a simple effect that takes input from a microphone and turns it into a robot voice.
In part 4, we’ll review a bit of audio theory and explain how flangers, comb filters and chorus effects work.
In part 5, we’ll discuss interpolation by looking at pitch shifting implementations.
In part 6, we’ll start working on a sound manager that will allow you to mix and seamlessly string together pieces of a song. In a Flash game, this would give you the ability to create much less repetitive music while conserving file size.

In the installments following that, we’ll extend the sound manager with a flexible effect architecture. To get an idea of what you’ll be able to do with this, picture a Flash game in which entering a cavern would add a low pass filter and reverb to the game music. Finally, at the end of the series, I’ll give you some pointers on how to write your own software synthesizer in Flash.

In between, I’ll probably also write some posts to explain some basic audio concepts or show you how to write specific effects, so if that’s of interest to you, it might be worth keeping an eye out for updates even if you’re not particularly interested in AS3 development. Once we get the basics out of the way, the concepts are pretty independent of language or API.


The sound API

Back in the olden days, all that we could do to make some noise in ActionScript was to instantiate Sound objects from an SWF’s library, or load or stream them from the web. There were a few things one could tweak about a Sound instance, such as setting its volume and pitching or panning it, but the overall scope of what you could do with audio in Flash was very limited.

A few very smart folks came up with hacks that allowed dynamic audio processing by manipulating SWFs containing sound objects in memory (I think that’s how it worked – I didn’t pay as much attention at the time, so I confess I’m fuzzy on the details). These workarounds didn’t work consistently across player versions and platforms though, prompting the inception of the “Adobe, MAKE SOME NOISE” campaign. Adobe listened and added dynamic sound in Flash Player 10.

The introduction of the new sound API in ActionScript was a milestone similar to when Adobe first added the ability to directly manipulate pixels in a BitmapData: for most people it made little difference, but for those who like or need to work close to the metal, it opened up a world of possibilities to explore.

Granted, just as people still position Sprites and Bitmaps on a stage, you most likely won’t be abandoning Sound.play() any time soon. For most use cases the conventional method of instantiating and playing Sound objects will be sufficient, and it comes with lower latency and less overhead, in both CPU usage and development time.


What it can do

Just as the introduction of BitmapData objects enabled us to access an array of pixels that make up a bitmap image, the sound API enables us to access an array of samples. These samples are either part of a pre-loaded sound, part of a sound that is playing right now (so we can manipulate and change the sound output), or part of the sound input coming from a microphone.

That’s it. You basically get the equivalent of reading pixels in a pre-loaded bitmap, writing pixels to a bitmap in memory and reading pixels from a webcam. You’ll have to code any fancy processing yourself – and that includes even the basics that have already been (and continue to be) possible with Sound and SoundChannel objects, such as panning or changing the volume of the sound samples.
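To make this a bit more concrete, here’s a bare-bones sketch (untested, and "track.mp3" is just a placeholder URL) that reads the samples of a loaded MP3 with Sound.extract() and plays them back at half volume through a second, dynamic Sound object. The "volume control" is nothing more than a multiplication we do ourselves on every single sample:

    import flash.events.SampleDataEvent;
    import flash.media.Sound;
    import flash.net.URLRequest;
    import flash.utils.ByteArray;

    // In practice you'd wait for the MP3 to finish loading before
    // pulling samples out of it; error handling is omitted here.
    var source:Sound = new Sound(new URLRequest("track.mp3"));
    var output:Sound = new Sound();
    var readPosition:Number = 0;

    output.addEventListener(SampleDataEvent.SAMPLE_DATA, onSampleData);
    output.play();

    function onSampleData(event:SampleDataEvent):void
    {
        var buffer:ByteArray = new ByteArray();
        // Pull up to 8192 stereo sample pairs out of the loaded sound...
        var framesRead:Number = source.extract(buffer, 8192, readPosition);
        readPosition += framesRead;
        buffer.position = 0;

        // ...and hand them to the output at half volume. Panning, fades
        // and the like would work exactly the same way: sample by sample.
        for (var i:int = 0; i < framesRead; i++)
        {
            event.data.writeFloat(buffer.readFloat() * 0.5); // left
            event.data.writeFloat(buffer.readFloat() * 0.5); // right
        }
    }

Don’t worry if the individual pieces are unfamiliar – part 2 will go through them one by one.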

The sound API lets you process and synthesize sound in pretty much any manner you can imagine – you’ll just have to write the necessary ActionScript code yourself.

For an amazing example of what can be done, check out the Hobnox Audiotool!


What it can’t do

I’m sad to say you won’t be able to write the next version of Reason in Flash any time soon! (You might however be able to write the next version of ReBirth, so stay with me a little longer!)

There are three reasons why:

No MIDI – ActionScript still doesn’t support MIDI input. To my understanding, Hobnox Audiotool gets around this by actually using a Java applet for MIDI support and relaying the MIDI input to the Flash application via a local connection. It’s a wonderfully resourceful solution, but it relies on Java applet support (which is not always present and often broken) and apparently needs the user to redo the MIDI mapping every time they open an arrangement.

Audio processing is CPU intensive – dynamic audio in Flash is always a 44100 Hz stereo stream of 32-bit floating point data (in Flash this means the samples are of type Number). In the best case, you’ll be calculating 88200 numbers per second in ActionScript (the sketch after these three points puts those numbers into code), and ActionScript is horribly slow. For any non-trivial application this means you’ll be doing some serious optimization of your code, and even then, there’s a limit to what you’ll be able to achieve.

Latency – the single biggest issue with the sound API in Flash, however, is latency: the delay between triggering an audio event and the sound actually emanating from your speakers. For any live input (e.g. playing notes on a keyboard), you’ll want to keep latency as low as possible, probably somewhere below 10 milliseconds and definitely below 30 (for comparison, in a 160 BPM song, a 16th note is ~94ms long).
The most obvious contribution to latency in digital audio usually comes from the sound buffer size. The sound API lets you set this anywhere between 2048 and 8192 samples, so right off the bat your minimum latency will be 2048 samples at 44100 samples per second, which amounts to about 46ms. That, however, is just the start: according to Adobe’s Tinic Uro, the Flash player has an internal buffer which adds another 200 to 500 milliseconds of latency (note that the same article gave the minimum buffer size as 512, but that information is outdated). That’s a fifth to half a second between the moment you hit a note on your keyboard and the moment you hear it from your speakers. To be honest, I’m not even sure whether that figure already includes the latency of the sound card driver – which is traditionally horrible on Windows systems – or whether the driver’s latency is added on top. I do know, however, that the sound API’s latency can differ a lot across platforms: while it seems far less noticeable on Mac, we’ve seen huge delays on some Windows XP machines.
Either way, writing a software instrument in Flash that reacts to a skilled player’s live input is out of the question (unless the latency issues are fixed in future Flash players).
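To put some numbers on all of this, here’s a minimal (untested) sampleData handler that does nothing but generate a quiet sine tone using the smallest allowed buffer. The 440 Hz tone is an arbitrary choice; the comments spell out where the 88200 Numbers per second and the 46ms come from:

    import flash.events.SampleDataEvent;
    import flash.media.Sound;

    const SAMPLE_RATE:int = 44100;
    const BUFFER_SIZE:int = 2048; // smallest allowed value; 8192 is the maximum

    // Latency from the output buffer alone: 2048 / 44100, roughly 46ms.
    // The Flash player's internal buffering comes on top of that.
    trace("buffer latency: " + (BUFFER_SIZE / SAMPLE_RATE * 1000) + " ms");

    var phase:Number = 0;
    var sound:Sound = new Sound();
    sound.addEventListener(SampleDataEvent.SAMPLE_DATA, onSampleData);
    sound.play();

    function onSampleData(event:SampleDataEvent):void
    {
        // This handler fires roughly 21 times a second (44100 / 2048), and the
        // loop below runs 2048 times per call: 44100 sample frames, i.e. 88200
        // stereo Numbers, have to be produced for every second of audio.
        for (var i:int = 0; i < BUFFER_SIZE; i++)
        {
            var sample:Number = 0.25 * Math.sin(phase); // a quiet 440 Hz tone
            phase += 2 * Math.PI * 440 / SAMPLE_RATE;
            event.data.writeFloat(sample); // left
            event.data.writeFloat(sample); // right
        }
    }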

Writing something like ReBirth on the other hand is entirely possible. If your synthesizer is controlled by turning notes on and off in a sequencer-type user interface, latency problems become negligible. And while you probably wouldn’t want a Flash game’s sound effects to go through a significant delay, you could still have its music be affected by in-game events in interesting ways without a 200ms lag becoming too much of a problem.


9 Responses to Realtime audio processing in Flash, part 1: Introduction

  1. Amit Patel says:

    Philipp, this has been a fantastic series. Thank you!

  2. Philip says:

    Hi Philipp, you have a very good understanding of the latency issues here. Unfortunately I don’t myself, and have been stumped on finding the best way to measure the latency of the microphone (if even possible). To put it in layman’s terms, I’m trying to measure the time from when I say “beep” into my mic up until the point where the corresponding SampleDataEvent is received. I think I’ve been looking at this for far too long now… if you could shed any light on the best way to do this, I’d be eternally grateful :)

  3. Philipp says:

    I’m not sure about measuring the input side alone, but if you wanted to measure the latency of input and output combined, i.e. the time between you saying "beep" and the computer playing it back (like in a situation where you’d monitor a live recording over headphones), you could do the following: write a script that monitors the microphone and notes the time at which the input volume in the sample data rises above a threshold. Then run another script at the same time, which outputs a single loud burst to your speakers at a known point in time. The difference between the time at which the second script emits the sound and the time at which the first script detects it is the total latency of the system. Of course this only works if the user is cooperating, i.e. the speakers are actually on and the background noise level is low enough (rough sketch at the end of this comment).

    Another thing you could try, which might be closer to actually measuring the microphone latency by itself, is to detect a keypress both through the key event being fired and through the microphone picking up the sound of the key going down. If the latency of the key event were zero, the difference between the two timestamps would be the latency of the microphone input alone. I have no idea about the actual latency of keyboard input, but as Flash can do pretty twitchy games, I reckon it should be fairly low.
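
    For the first approach, something along these lines might work (untested and off the top of my head – the 0.25 threshold and the noise burst are arbitrary choices, and it assumes a microphone is available):

        import flash.events.SampleDataEvent;
        import flash.media.Microphone;
        import flash.media.Sound;
        import flash.utils.getTimer;

        var playTime:int = -1;   // when we handed the burst to Flash
        var detectTime:int = -1; // when the mic handler first heard it

        var mic:Microphone = Microphone.getMicrophone();
        mic.setSilenceLevel(0);
        mic.rate = 44;
        mic.addEventListener(SampleDataEvent.SAMPLE_DATA, onMicData);

        var beep:Sound = new Sound();
        beep.addEventListener(SampleDataEvent.SAMPLE_DATA, onBeepData);
        playTime = getTimer();
        beep.play();

        function onBeepData(event:SampleDataEvent):void
        {
            // a single short burst of noise, then stop feeding the sound
            for (var i:int = 0; i < 4096; i++)
            {
                var n:Number = Math.random() * 0.8 - 0.4;
                event.data.writeFloat(n); // left
                event.data.writeFloat(n); // right
            }
            beep.removeEventListener(SampleDataEvent.SAMPLE_DATA, onBeepData);
        }

        function onMicData(event:SampleDataEvent):void
        {
            // note the first time the input rises above the threshold
            while (event.data.bytesAvailable > 0)
            {
                var sample:Number = event.data.readFloat();
                if (detectTime < 0 && Math.abs(sample) > 0.25)
                {
                    detectTime = getTimer();
                    trace("round trip latency: " + (detectTime - playTime) + " ms");
                }
            }
        }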

  4. Philip says:

    Thanks Philipp, you’ve given me a few things to think about. Thanks for the great articles =)

  5. Alexander says:

    That’s fine! Thanks, Adobe! Now, please, allow us to use inline assembly in ActionScript, cause it makes little sense otherwise.

  6. Oddbrother says:

    Looking back at this issue, it seems that there’s less latency in later Windows OSes (and practically every other non-Windows OS out there) than there is in Windows XP, despite having the same Flash Player version.

  7. Danny says:

    Thanks Philipp! This series is helping me understand how sound works in Flash. One question: is it the same in Adobe AIR for mobile (iOS/Android)?

  8. Philipp says:

    I’m sorry to say that I don’t have any experience with AIR on mobile platforms…

  9. bob says:

    @Danny

    I’m actually working on a game for Android using AIR, and yes, audio latency is horrible.
    That’s why I ended up here today, to solve my problem…

    Basically, if a sprite is firing bullets with a machine gun, at 30fps, 2/3 of all the sound.play(); requests on each shot will just fail and produce no sound at all… Which is very unfortunate…
