Interaction for collecting audio from a mic (browser)

I’m developing a new product called PitchCake to help people improve their speaking skills.

One of the interactions I’ve been working on is the process of collecting audio from a browser. I’m using the Flash-based WAMI recorder, which kind of sucks. I’d like a hybrid solution that uses HTML5 and falls back on Flash, but I’m not there yet.
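
To make the hybrid idea concrete, here is a minimal sketch of the fallback decision. The names `CaptureBackend`, `BrowserCapabilities`, and `chooseBackend` are hypothetical, and how the two capability flags get detected in a real browser is deliberately left out; this only shows the order of preference I have in mind.

```typescript
// Hypothetical names throughout; this is a sketch, not PitchCake's actual code.
type CaptureBackend = "html5" | "flash" | "none";

// Assumed to be filled in by whatever feature detection the page performs.
interface BrowserCapabilities {
  hasGetUserMedia: boolean; // browser exposes an HTML5 microphone capture API
  hasFlash: boolean;        // Flash plugin is installed and enabled
}

// Prefer HTML5 capture, fall back on Flash, otherwise report no support.
function chooseBackend(caps: BrowserCapabilities): CaptureBackend {
  if (caps.hasGetUserMedia) return "html5";
  if (caps.hasFlash) return "flash";
  return "none";
}
```

A result of `"none"` would be the cue to show a "recording isn't supported in this browser" message instead of the record button.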

The current flow…

  1. User presses record button and speaks into the mic.
  2. Loading indicator appears while the audio is processed.
  3. Audio player lights up and plays back the pitch.
  4. User has option to try again and repeat steps 1-3 until they are satisfied.
  5. User clicks on submit pitch button when finished.

You can test it by signing up and doing a practice pitch.
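
The steps above can also be sketched as a small state machine, which may make it easier to reason about which controls should be active at each point. The state and action names are hypothetical, and the `stopRecording` action is an assumption, since the steps don't say how a recording ends.

```typescript
// Hypothetical state and action names for the five steps in the flow.
type State = "idle" | "recording" | "processing" | "review" | "submitted";
type Action = "pressRecord" | "stopRecording" | "audioReady" | "submit";

// Allowed transitions; any action not listed for a state is ignored.
const transitions: Record<State, Partial<Record<Action, State>>> = {
  idle: { pressRecord: "recording" },                         // step 1
  recording: { stopRecording: "processing" },                 // step 2 (assumed trigger)
  processing: { audioReady: "review" },                       // step 3
  review: { pressRecord: "recording", submit: "submitted" },  // steps 4 and 5
  submitted: {},
};

// Return the next state, or stay in the current one if the action is not allowed.
function next(state: State, action: Action): State {
  return transitions[state][action] ?? state;
}

// Example: record once, try again, then submit.
const actions: Action[] = [
  "pressRecord", "stopRecording", "audioReady", // first attempt
  "pressRecord", "stopRecording", "audioReady", // try again
  "submit",
];
let current: State = "idle";
for (const action of actions) {
  current = next(current, action);
}
console.log(current);
```

Written this way, the submit button only does anything in the `review` state, and pressing record during `processing` is ignored.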

So I have a few questions:

  1. Does the flow make sense to you?
  2. What could I do visually to make the interface more logical?
  3. It currently takes up to a minute for our server to process the audio before the user can hear the playback. How can we make the waiting experience more engaging?

Using Flash is the only realistic way to capture audio in the browser for at least the next couple of years, unless you only want a small fraction of your visitors to be able to use it. Don’t let HTML5 hype stand in the way of providing the best solution.

In step 2, I don’t understand why there is a processing delay. With Flash, you can capture audio into browser memory and play it back instantly, or record it to the server via RTMP and then play it back straight away.