irskep, steveasleep.com, slam jamsen, diordna, etc

Oscillator Drum Jams: The Audio Player

This is the second post in a series about my new app, Oscillator Drum Jams. If you’re just joining, start with Part 1.

You can download Oscillator Drum Jams at oscillatordrums.com.

With my audio assets in place, I started work on a proof of concept audio player and metronome.

The audio player in Oscillator has three requirements:
1. It must support multiple audio streams playing exactly in sync.
2. It must loop perfectly.
3. It must include a metronome that matches the audio streams at any tempo.

Making the audio player work involved solving a bunch of really easy problems and one really hard problem. I’m going to gloss over lots of detail in this post because I get a headache just thinking about it.

AudioKit

I used AudioKit, a Swift wrapper on top of Core Audio with lots of nice classes and utilities. My digital audio programming skills are above average but unsophisticated, and using AudioKit might have saved me time.

I say “might have saved me time” because using AudioKit also cost me time. They changed their public APIs several times in minor version bumps over the two years I worked on this project, and the documentation about the changes was consistently poor. I figured things out eventually by experimenting and reading the source code, but I wonder if I would have had an easier time learning Core Audio myself instead of dealing with a feature-rich framework that loves rapid change and hates documentation.

Time stretching is easy unless you want a metronome

Playing a bunch of audio tracks simultaneously and adjusting their speed is simple. Create a bunch of audio players, set them to loop, and add a time pitch that changes their speed and length without affecting their pitch.
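Because AudioKit’s APIs shifted repeatedly over the project’s lifetime, here is a framework-free sketch of just the arithmetic involved. The function names, and the idea of storing a recorded BPM per track, are my own assumptions, not AudioKit API:

```swift
// Converting a target tempo into a playback rate for a time-pitch effect,
// and computing a loop's new length. A time-pitch effect keeps pitch
// constant, so only speed and duration change.
func playbackRate(recordedBPM: Double, targetBPM: Double) -> Double {
    return targetBPM / recordedBPM
}

func stretchedDuration(originalSeconds: Double, rate: Double) -> Double {
    // rate > 1 plays faster (shorter loop); rate < 1 plays slower (longer).
    return originalSeconds / rate
}

// A loop recorded at 120 BPM, played back at 90 BPM:
let rate = playbackRate(recordedBPM: 120, targetBPM: 90)          // 0.75
let seconds = stretchedDuration(originalSeconds: 8, rate: rate)   // ≈ 10.67
```

The same `rate` value is applied to every track’s time-pitch effect, which is what keeps the stretched tracks in sync with each other.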

My first attempt at adding a metronome to these tracks was to keep doing more of the same: record the metronome to an audio track the same length as the music tracks and play them all simultaneously.

This syncs up perfectly, but sounds horrible when you play it faster or slower than the tempo it was recorded at, because each tick of a metronome is supposed to be a sharp transient. If you shorten the metronome loop, each tick becomes shorter, and because the time-stretching algorithm can’t preserve all of the transient’s information accurately, it gets distorted and harder to hear. If you lengthen the loop, the peak of each tick’s attack is stretched out, so the listener never hears a distinct “tick” that tells them exactly when the beat occurs.

My first solution to this was to use AudioKit’s built-in AKMetronome class. This almost worked, but because it was synchronized to beats per minute rather than to the sample length of the music tracks, tiny rounding discrepancies between the two clocks made it drift over time.
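To see why it drifts, here’s a back-of-the-envelope calculation in plain Swift (the tempo and loop length are chosen for illustration): a beat duration derived from BPM rarely lands on a whole number of audio frames, so a metronome that rounds each beat to the nearest frame accumulates error against a loop whose length is a fixed frame count.

```swift
// Why a BPM-driven metronome drifts against a fixed-length audio loop.
let sampleRate = 44_100.0
let bpm = 113.0            // hypothetical tempo
let beatsPerLoop = 16.0

// The music loop is an exact number of frames.
let loopFrames = (beatsPerLoop * 60.0 / bpm * sampleRate).rounded()

// A metronome that rounds each beat to the nearest frame:
let framesPerBeat = (60.0 / bpm * sampleRate).rounded()
let metronomeFramesPerLoop = framesPerBeat * beatsPerLoop

// The per-loop error in frames (here: 1 frame); every loop adds
// another one, and eventually the offset becomes audible.
let driftPerLoop = metronomeFramesPerLoop - loopFrames
```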

My second, third, and fourth solutions were increasingly hacky variations on my first solution.

My fifth, and successful, approach was to use a MIDI sequencer that triggers a callback function on each beat. On the first beat, the music loops are all triggered simultaneously and a metronome sound is played. On subsequent beats, only the metronome is played.
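Stripped of the MIDI and audio details, the scheme looks like this. `BeatSequencer` and its callbacks are a simplified stand-in I wrote for illustration, not AudioKit API:

```swift
// A sequencer fires once per beat; the first beat of each loop
// (re)triggers every music player in sync, so drift can't accumulate,
// and every beat plays the metronome sound.
struct BeatSequencer {
    let beatsPerLoop: Int
    let onLoopStart: () -> Void       // restart all music players here
    let onMetronomeTick: () -> Void   // play the metronome sample here

    func handleBeat(_ beatIndex: Int) {
        if beatIndex % beatsPerLoop == 0 {
            onLoopStart()
        }
        onMetronomeTick()
    }
}

// Simulate two bars of four beats:
var loopStarts = 0
var ticks = 0
let sequencer = BeatSequencer(
    beatsPerLoop: 4,
    onLoopStart: { loopStarts += 1 },
    onMetronomeTick: { ticks += 1 }
)
for beat in 0..<8 { sequencer.handleBeat(beat) }
// loopStarts == 2, ticks == 8
```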

Metronome timing is hard

With a metronome that never drifted, I still had an issue: the metronome would consistently play too late when the music was sped up, and too early when the music was slowed down.

The reason is obvious when you look at the waveforms:

Illustration of waveforms

The peak of each waveform doesn’t line up exactly with the mathematical location of each beat, because each instrument’s note has an attack time between the start of the beat and the peak of the waveform. When we slow down a loop, the attack time increases, but the metronome’s attack time stays the same, so the music starts to sound “late” relative to the metronome. If we speed it up, the attack time decreases, and the music starts to sound “early.”

To get around this, I did some hand-wavey math that nudges the metronome forward or backward in time relative to the time pitch adjustment applied to the music tracks.
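The actual formula in the app is hand-wavey, but the idea can be sketched like this; `attackSeconds` is a hypothetical tuning parameter I’ve introduced for illustration, not something the app measures:

```swift
// If a note's attack takes attackSeconds at normal speed, time-stretching
// by `rate` makes the peak land at attackSeconds / rate after the beat
// instead. Offsetting the metronome by the difference re-aligns its tick
// with the music's perceived peaks.
func metronomeOffset(attackSeconds: Double, rate: Double) -> Double {
    precondition(rate > 0, "playback rate must be positive")
    return attackSeconds / rate - attackSeconds
}

// rate < 1 (slowed down) -> positive offset: play the metronome later
// rate > 1 (sped up)     -> negative offset: play the metronome earlier
// rate == 1              -> zero offset
```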

This approach uses the CPU in real time, which adds risk of timing problems when the system is under load, but in practice it seems to work fine.

Continue to Part 4: The Interface