I’m currently brushing up on my signal processing knowledge for a small project that I’m working on. The basic premise of the project is that I want a program where I can plug in a microphone, play my trumpet at it, and have it give me feedback on my intonation (is my pitch correct, or am I sharp or flat?).
If you’ve played with a guitar tuner before, you’ll have an idea of the sort of thing I have in mind, but I want mine to respond instantly (or at least very quickly) and graphically, so that I can play more natural-sounding phrases and have it tell me how I’m doing.
This raises an interesting signal processing question: how can I figure out the precise musical pitch that people hear based on the samples that I’m reading from the microphone?
This post is part one of a two-parter. In this part, I’ll be writing about what sound is in a physical sense, how that relates to our musical understanding of sound, and what exactly the microphone does to let us work with sound in a programming context.
First Things First: What is Sound?
Physically, sound is the result of air vibrations. Your ears sense the vibrating air, and it’s interpreted by your brain as sound. If the vibrations are bigger and more powerful, then we hear the sound as louder. Whenever you hear something, somewhere something has disrupted the air and now it’s vibrating.
In the case of musical sound, we usually control the conditions of vibration a bit more carefully. Maybe we pluck a string of a very carefully tuned length and tension. The result is a set of air vibrations that, when we hear them, we can listen to and say “That is a guitar playing a C”.
Pitch and Frequency
When you listen to a note, and identify how high or low it sounds, or how sharp or flat it sounds, what you’re really judging is the frequency of the vibrations. Going back to the guitar example, if you tighten the string before plucking it, it will vibrate faster, and it will sound sharper.
Interestingly, when we listen to a scale of notes and hear the pitch as going up in even steps, the frequency is actually increasing exponentially! There are several online tools that you can use to play with this relation, like this one. Try entering 440Hz for the ‘A’ typically used in tuning an orchestra. If you enter 220Hz, half, you’ll hear the ‘A’ an octave down. If you enter 880Hz, double, you’ll hear the ‘A’ an octave up. In music notation that interval is an octave (eight scale steps down or up), but in terms of the frequency of vibrations it’s simply half or double.
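To make the exponential relation concrete, here’s a small sketch in Rust (the language I’m using for the project) that computes note frequencies under standard equal temperament, where each of the 12 semitones in an octave multiplies the frequency by the 12th root of 2:

```rust
/// Frequency in Hz of the note `semitones` semitones above (or,
/// if negative, below) A440, assuming equal temperament:
/// f = 440 * 2^(n/12).
fn note_frequency(semitones: i32) -> f64 {
    440.0 * 2f64.powf(semitones as f64 / 12.0)
}

fn main() {
    println!("A4: {:.2} Hz", note_frequency(0));   // 440.00
    println!("A3: {:.2} Hz", note_frequency(-12)); // 220.00
    println!("A5: {:.2} Hz", note_frequency(12));  // 880.00
}
```

Going 12 semitones in either direction multiplies or halves the frequency exactly, which is why the octave relation falls out of the formula.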
If you listen to the sine waves that the online tools let you generate, you’ll find that they don’t sound like any particular instrument, and tend to feel a bit lifeless. Real world instruments produce vibrations that are a bit more complicated than a simple sine wave, but the basic principle relating frequency and pitch still applies.
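As a rough illustration of that extra complexity, an instrument-like tone can be sketched as a fundamental sine wave plus quieter overtones at integer multiples of its frequency. The amplitudes below (1.0, 0.5, 0.25) are invented purely for illustration; every real instrument has its own characteristic mix:

```rust
use std::f64::consts::PI;

/// One sample (at time `t` seconds) of a crude instrument-like tone:
/// a fundamental at `freq` Hz plus two quieter overtones at 2x and 3x
/// the frequency. The overtone amplitudes here are made up.
fn tone_sample(freq: f64, t: f64) -> f64 {
    (2.0 * PI * freq * t).sin()
        + 0.5 * (2.0 * PI * 2.0 * freq * t).sin()
        + 0.25 * (2.0 * PI * 3.0 * freq * t).sin()
}

fn main() {
    // Every component sine starts at zero, so the tone does too.
    println!("sample at t=0: {}", tone_sample(440.0, 0.0));
}
```

The overtones change the shape and character of the wave, but the repetition rate is still set by the fundamental, so we still hear it as a 440 Hz ‘A’.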
The Microphone and Sampling
If sound is air vibrating, how do we get it inside the computer? There are many different types of microphones, but they generally operate on the same principle. First, you have some sort of membrane that is moved by the sound: it takes the vibrating air, which is very difficult to see or track, and translates it into a vibrating membrane, which is easy to track. A bit of electronics converts the movement of the membrane into an electrical signal representing where the membrane is as it vibrates back and forth. The precise nature of the membrane, and how its movement is measured, is what determines the type of microphone.
The electrical signal is then sampled to get it into the computer. Sampling is simply measuring the level of the signal at regular intervals and sending those measurements to the computer as numbers. For most computer sound cards, the default is to take 44100 samples (measurements) per second. So for every second of sound, your program can take in an array of 44100 numbers representing that sound.
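As a sketch of what such an array looks like, here’s how one second of a pure 440 Hz tone could be generated at 44100 samples per second. This is the reverse of recording (computing the samples rather than measuring them), but the resulting array has exactly the same shape as what the microphone delivers:

```rust
/// Generate one second of a pure sine tone at `freq` Hz,
/// sampled 44100 times per second: each sample is the signal
/// level at the instant t = i / 44100 seconds.
fn one_second_of_sine(freq: f64) -> Vec<f64> {
    const SAMPLE_RATE: usize = 44_100;
    (0..SAMPLE_RATE)
        .map(|i| {
            let t = i as f64 / SAMPLE_RATE as f64;
            (2.0 * std::f64::consts::PI * freq * t).sin()
        })
        .collect()
}

fn main() {
    let samples = one_second_of_sine(440.0);
    println!("{} samples, first few: {:?}", samples.len(), &samples[..3]);
}
```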
If your computer were to then take those same 44100 numbers and send them to your speakers, the speakers would make their membranes follow the same movement that your microphone’s membrane made, recreating the recorded sound as vibrating air.
Programming with a Microphone
How exactly you get those numbers from the microphone into your program will depend on which language you’re using and any other constraints you might have. Each operating system also has its own ideas about how to expose audio devices, so I personally prefer to find a library which abstracts away the differences between operating systems.
I’ve chosen to use PortAudio for my project, a cross-platform, open source C library that lets me write one codebase supporting Linux, Mac, and Windows. To be specific, I’m using Rust-PortAudio, a Rust binding to the C library. It lets me register a callback function which will be called at regular intervals (probably every few milliseconds in the final project) with an array of all of the samples that the microphone has taken in that time period.
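The real Rust-PortAudio setup involves stream settings and error handling that I’ll get to when I write about the implementation; the sketch below just mimics the shape of that interaction with a hand-rolled callback and a fake buffer, to show what the program actually receives each time it’s called:

```rust
/// Stand-in for the processing I'll hook into the audio callback.
/// The audio library calls something like this every few milliseconds
/// with the samples captured since the previous call.
fn on_samples(buffer: &[f32]) -> f32 {
    // For now, just report the peak level in the buffer;
    // part 2 is where actual pitch detection will go.
    buffer.iter().fold(0.0f32, |peak, s| peak.max(s.abs()))
}

fn main() {
    // Pretend the sound card just delivered a 256-sample buffer.
    let fake_buffer = vec![0.25f32; 256];
    println!("peak level: {}", on_samples(&fake_buffer));
}
```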
What Do You Do with the Samples?
That is where the more complicated signal processing starts. To find out what I’m going to do with the samples, you’ll have to read part 2, coming soon!