What’s your CAPTCHA saying?
Especially your audio CAPTCHA. Listen to it carefully.
Just the other day, I was trying to set up my FeedBurner email. Out of curiosity, I clicked on the accessibility icon to check out the audio CAPTCHA.
Three beeps later, a drone of background voices filled the recording, until a distinct female voice came up and started reading out the real letters. You don’t really pick it up the first time, until this voice says “Once again.” That’s when your brain snaps to full attention, reconfirming what the voice said earlier.
This audio CAPTCHA is brilliantly designed around human behaviour. It allows a human being to recognize the words, but ensures that machines cannot (or at least find it very difficult). How does this actually work? This is where knowledge of the human brain and its workings can help us decode this audio gibberish, and in turn give us pointers to how our brain works.
A computer would find it very tough to recognize the words because it lacks our selective attention: it processes the whole signal at once, background babble included. To effectively crack an audio CAPTCHA, you would need a library of sounds representing each character in the CAPTCHA’s alphabet. Because of distortion, some CAPTCHAs may require several recorded variants of the same character. Hence, machine learning techniques are needed to perform automated speech recognition on individual segments of the CAPTCHA.
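To make that segmentation-plus-library approach concrete, here is a minimal sketch. Everything in it is invented for illustration: toy amplitude lists stand in for real audio, and the `segment` and `match` functions, thresholds, and two-character “library” are assumptions, not any real cracking tool.

```python
# Illustrative sketch only: toy amplitude lists stand in for real audio,
# and the "library" of character sounds is invented for the example.

def segment(samples, threshold=0.5, min_gap=3):
    """Energy-based segmentation: collect indices whose absolute amplitude
    exceeds the noise threshold, merging runs separated by fewer than
    min_gap quiet samples. Each run is one candidate character sound."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    groups = []
    for i in loud:
        if groups and i - groups[-1][-1] < min_gap:
            groups[-1].append(i)
        else:
            groups.append([i])
    return [samples[g[0]:g[-1] + 1] for g in groups]

def match(seg, library):
    """Nearest-neighbour matching: return the library character whose
    template waveform is closest to the segment (mean absolute
    difference, plus a penalty for differing lengths)."""
    def distance(a, b):
        n = min(len(a), len(b))
        return sum(abs(a[i] - b[i]) for i in range(n)) / n + abs(len(a) - len(b))
    return min(library, key=lambda ch: distance(seg, library[ch]))

# Toy "library" of per-character sounds and a synthetic CAPTCHA signal:
# silence, the sound for "A", silence, the sound for "B", silence.
library = {"A": [1.0] * 5, "B": [-1.0] * 5}
signal = [0.0] * 3 + library["A"] + [0.0] * 5 + library["B"] + [0.0] * 3
decoded = "".join(match(s, library) for s in segment(signal))
print(decoded)  # prints "AB"
```

A real attack would need many distorted variants per character and proper signal-processing features rather than raw amplitudes, which is exactly the machine learning workload described above.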
The really hard task is teaching a computer how to process information in a way similar to how humans think. That sounds like a whole lot of work for cracking an audio file.
So, how is the audio CAPTCHA behaviourally designed for a human to correctly understand it?
The human brain is constantly taking in stimuli. It would be overloaded if it tried to tackle every detail of every object at once, so it simply doesn’t bother. We are all equipped to select the key aspects of a scene one at a time, and this attention system allows us to concentrate on one thing while the rest of the world falls into the background.
As humans, we are also not designed to do two activities simultaneously. Fittingly, the ‘cocktail party effect’ is our impressive and under-appreciated ability to tune our attention to just one voice from a multitude.
Our ability to separate one conversation from another is beautifully demonstrated in a classic study carried out by Colin Cherry, then at Imperial College London (Cherry, 1953). Cherry used the simple method of playing back two different messages at the same time to people, under a variety of conditions. In doing so he discovered just how good we are at filtering what we hear.
The human brain’s ability to separate sounds from the background rests on several characteristics of those sounds, including the gender of the speaker, the direction the sound is coming from, its pitch, and the speaking speed. No wonder that once the real person starts talking in the audio CAPTCHA, our brains immediately latch on to this new, distinct voice, trying to decode what it’s saying.
Moreover, it’s not just this cocktail party effect. Notice the start of the recording, with its three distinguishable tones. Quite a bit of research has been done on music and how it increases alertness. One study found that peaks of neural activity occur during the silence between two tones: the brain is anticipating the next sound, and that anticipation creates alertness. That is precisely the objective of the start of the audio CAPTCHA – get the listener more alert, so that they can distinguish the real voice from the background voices.
We’re still far from replicating the human brain. So until the artificial human brain comes to life, behavioural design of security will probably keep favouring us humans. As for the robot apocalypse, well, that’s reserved for another post…