How does Siri work?


Apple promotes it as “Siri. Your wish is its command.” As you’d expect with such a strapline, Siri is a sort of virtual personal assistant, one that responds to the spoken word rather than to typed text. But I bet you want to find out how Siri works!

This “tech genie” started life as a free iPhone app created by Siri Inc. The company was planning to expand to Android and BlackBerry devices, but all such plans were scrapped when it was acquired by Apple for a reported $200m in April 2010.

How Siri works

Siri allows you to send text messages, write emails, schedule meetings, look up phone numbers, set alarms and timers, ask for reminders, and ask about stocks and shares, the weather and local information, all just by using your voice. It can even dispense pearls of wisdom on the meaning of life.

Siri does this using a number of quick steps in order to understand and reply to your requests. Looking more deeply, Siri consists of three layers, which correspond to different steps: a speech-to-text analyser, a grammar analyser and a set of service providers.

When you speak...

A speech-to-text analyser transforms your voice into written text. First, Siri encodes the movement of air and the small changes in air pressure (your voice) into a compact sequence of numbers; this process is known as digitization. It then separates your digitized voice from the background noise, using mathematical operations to filter out the parts of the sound wave that it doesn’t recognise as speech and that are caused by external factors. This is easier on a phone such as the iPhone than on a computer, because the iPhone’s microphone technology incorporates some noise cancellation. A sound wave from speech is actually a mix of multiple waves at different frequencies.
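To make the idea of "a mix of multiple waves at different frequencies" concrete, here is a minimal Python sketch. It is not Siri's actual code: the sample rate, the two tones (50 Hz and 120 Hz) and the function names are all assumptions chosen for illustration. It digitizes one second of a made-up wave and uses a plain discrete Fourier transform to recover which frequencies it contains.

```python
import cmath
import math

SAMPLE_RATE = 400   # samples per second (assumed; kept small so the loop is fast)
NUM_SAMPLES = 400   # one second of "audio"


def make_signal():
    """Digitize one second of a wave that mixes a 50 Hz and a 120 Hz tone."""
    return [
        math.sin(2 * math.pi * 50 * n / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 120 * n / SAMPLE_RATE)
        for n in range(NUM_SAMPLES)
    ]


def dft_magnitudes(samples):
    """Return the strength of each frequency bin below half the sample rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        total = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        magnitudes.append(abs(total) / n)
    return magnitudes


def dominant_frequencies(samples, count=2):
    """Return the `count` strongest frequencies in Hz, lowest first."""
    magnitudes = dft_magnitudes(samples)
    bin_width = SAMPLE_RATE / len(samples)  # Hz covered by each bin
    strongest = sorted(
        range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True
    )[:count]
    return sorted(k * bin_width for k in strongest)


print(dominant_frequencies(make_signal()))  # prints [50.0, 120.0]
```

A real recogniser would work on short overlapping slices of sound and use a much faster transform, but the principle of turning a complex wave into a handful of meaningful numbers is the same.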
The complex wave is transformed by mathematical operations into a numerical representation of its important features.

[Image: spectrogram]

Once it’s recognised your voice...

With all this done, the grammar analyser has to guess which words are being spoken by splitting the sound into phonemes. English has about 40 speech sounds, or phonemes, so Siri must have a model of each phoneme in many different contexts and be able to tell them apart. The speech stream contains no word breaks, so the system has to work out on its own where each word begins and ends by finding strings of phonemes that match valid words.

It then needs to work out how it can help...

The third component is the set of service providers that Siri can send your commands to. As soon as it has decided which words were said, it takes action: Siri will make a call, send text messages, look things up, write emails, schedule meetings and so on. At this level, Siri can do everything you can do yourself (using the calendar app, maps app, etc.) purely through voice commands. This is of course the huge benefit of Siri, though this final step is actually the least technically sophisticated part of the whole process.
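The word-boundary search from the grammar-analysis step, finding strings of phonemes that match valid words, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny `LEXICON`, the simplified phoneme symbols and the `segment` function are all hypothetical, and a real recogniser would score many candidate splits by probability rather than accept the first one that fits.

```python
from functools import lru_cache

# Hypothetical mini-lexicon: phoneme sequences (simplified symbols) -> words.
LEXICON = {
    ("S", "EH", "T"): "set",
    ("AH",): "a",
    ("T", "AY", "M"): "time",
    ("T", "AY", "M", "ER"): "timer",
    ("AH", "L", "AA", "R", "M"): "alarm",
}
MAX_WORD_LEN = max(len(phonemes) for phonemes in LEXICON)


def segment(phonemes):
    """Split an unbroken phoneme stream into known words, or return None."""
    stream = tuple(phonemes)

    @lru_cache(maxsize=None)
    def solve(start):
        # Reaching the end of the stream means every phoneme was used.
        if start == len(stream):
            return ()
        longest_end = min(len(stream), start + MAX_WORD_LEN)
        for end in range(start + 1, longest_end + 1):
            word = LEXICON.get(stream[start:end])
            if word is None:
                continue
            rest = solve(end)
            if rest is not None:  # keep this word only if the remainder also fits
                return (word,) + rest
        return None  # no valid word starts here; the caller backtracks

    result = solve(0)
    return None if result is None else list(result)


print(segment(["S", "EH", "T", "AH", "T", "AY", "M", "ER"]))
# prints ['set', 'a', 'timer']
```

Note that the search first tries "time", finds a leftover sound that matches no word, and backtracks to "timer". That is the sense in which the system has to guess where one word ends and the next begins.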

May 07, 2014