
Building a Custom Voice Assistant: Part 1

November 27, 2018

It’s kind of an open secret that I love Amazon Alexa. My friends know it, my girlfriend knows it, hell, even my two roommates, who I scare the shit out of when I drop in via Alexa to randomly say hello in the kitchen, know it. This is all because over the past six months I’ve been slowly offloading small tasks to Alexa: turning on the lights, setting timers, playing music, etc. I’ve started experimenting with “scenes”, automated actions such as turning the lights and air conditioner on when I come home or off when I leave for work. It’s surprising how much these small automations have altered how I operate day to day. Looking forward, I hope to use them even more, but there’s a problem. See, to put it lightly, I’m not really a fan of Jeff Bezos’ piggy bank listening to everything I say. The problem doesn’t lie only with Amazon, though. No cloud-based voice assistant, whether it comes from Google, Facebook, or anyone else, can be trusted to protect your privacy.

So in an effort to minimize my data footprint, I’m gonna build a private voice assistant myself. I’ve seen numerous guides on building your own Amazon Alexa into a custom piece of hardware such as a Raspberry Pi, but those still send queries to Amazon. I want a private voice assistant that can replace Amazon Alexa’s functions for me while ensuring that my privacy is protected.

To make a long story about decentralization and privacy by design short(er), all of these companies’ business models operate by mining your data and then selling it. Amazon isn’t offering Echo Dots and Echo Spots at such insanely steep discounts for no reason; it intends to milk that data for all it’s worth. There’s even a new Echo Auto for only $25 that connects to your phone for use in the car. Knowing that the Amazon Alexa phone app requires location permissions, I’ll bet my Grandma’s lucky silver dollar that it’ll continuously send that data to Amazon. It’ll send anything it can store, because some have estimated that Amazon will hit $10 billion in Alexa-related sales in 2020 alone. Amazon knows exactly what it’s doing, and Alexa is going to be the main point of contact between Amazon and its customers in the next decade, so this pull into the internet of things is only going to ramp up.

Every single time I state an action, my words are translated to text, and that text is then parsed, categorized, and stored. I know because I can open my Alexa phone app and see in the history that the other day my roommates, 30 miles away from me, asked the Alexa in the kitchen if she likes handjobs. She didn’t respond, she’s always so coy, but anyway, I digress.

A lot of people are guilty of obfuscation when it comes to the cloud, making it seem far more complicated than it is. The cloud is just someone else’s computer, really. It’s also heavily insinuated that these complex voice assistants, often branded as “AI”, require processing in the cloud. The argument is that it takes too much processing power to compute these commands on the device.

That is not true. It is perfectly possible to process these sorts of commands locally on the device.
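To give a feel for how little horsepower the basic idea needs, here is a toy sketch of matching already-transcribed text to an action with nothing but a shell function. This is not how Alexa or Snips actually parse speech; the function name parse_command and the action names (lights_on, lights_off, timer_set, unknown) are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy local command parser: maps transcribed text to an action name.
# ASSUMPTION: the function name and action names are hypothetical,
# not part of any real assistant.
parse_command() {
  # Lowercase the input so the matching below is case-insensitive.
  local text
  text="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"

  # Simple keyword patterns; the first match wins.
  case "$text" in
    *"turn on"*light*)  echo "lights_on" ;;
    *"turn off"*light*) echo "lights_off" ;;
    *set*timer*)        echo "timer_set" ;;
    *)                  echo "unknown" ;;
  esac
}
```

Calling parse_command "Turn on the lights" prints lights_on, and anything it doesn’t recognize falls through to unknown. A real assistant handles far messier phrasing than this, but none of it needs to leave the device.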

This is how we’re told Amazon Alexa operates. Requests are processed through Amazon’s API on their servers, routed through the device manufacturer’s network, and then pushed to the device. For example, the RGB LED lights in my living room are from a company called Magic Home that requires its own account and sign-up process. This opens a can of worms and raises the question: how much info is being shared between Amazon and Magic Home? Is Amazon allowing Magic Home access to a lot more data than they should? Is my Alexa-connected coffee pot also sending private information somewhere?

The way to resolve this is by processing everything locally on the device. Allow my very shitty diagram to illustrate. I have been looking at a number of different solutions and I think I’m going to try out Snips first. Snips seems to be trying to do exactly what I had in mind: local processing of queries, all done in an open source environment so I can guarantee that my information isn’t sent anywhere I don’t want. I could even unplug my router and it should still operate normally, unlike Alexa, who has a stroke when you do that.

I know through the Magic Home app that the modules can be controlled over a local network; the devices themselves broadcast their own tiny wifi network that phones can connect to. If for some reason I can’t go that route and I have to include Magic Home’s servers in this process, I will at least have reviewed the messages myself and will implement a compensating control if possible, but I’ll cross that bridge if necessary. If worst comes to worst and I can’t use the small Magic Home LED module with an LED strip (found here), I can wire the LEDs directly with a MOSFET.
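If I do end up on the MOSFET route, brightness would most likely be set by PWM, which means turning a 0-255 color value into a duty cycle. Here is a rough sketch of that arithmetic, assuming integer input and a 0-100 percent duty range. The function name channel_to_duty is hypothetical, and nothing here touches actual GPIO pins.

```shell
#!/usr/bin/env bash
# Converts an 8-bit color channel value (0-255) into a PWM duty cycle
# in percent (0-100), rounded to the nearest whole percent.
# ASSUMPTION: input is an integer; the function name is hypothetical.
channel_to_duty() {
  local value="$1"

  # Clamp out-of-range input so the result always stays within 0-100.
  if [ "$value" -lt 0 ]; then value=0; fi
  if [ "$value" -gt 255 ]; then value=255; fi

  # Integer math: adding 127 before dividing by 255 rounds to nearest.
  echo $(( (value * 100 + 127) / 255 ))
}
```

So full white (255) maps to 100 percent, and a mid-level 128 lands at 50 percent.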

So how much is this gonna cost anyway?

Actually, not that much.


Step 1: Hardware

1.) Raspberry Pi 3 Kit with Clear Case and 2.5A Power Supply – $49.99

I found a decent kit on Amazon here that includes most of what we need, but damn, we’re already going over our budget. Just tell yourself you’re saving money in the long run by not paying with your data. Amazon is probably selling Echo Dots at huge losses anyway and there’s no way to financially compete. I went with the Pi 3 because I don’t want to have to worry about any performance bottlenecks. I’m not even putting thought into older Raspberry Pis. If we want to try this config with older Pis, we can do that later and perhaps find cheaper kits to use.




2.) 3.5mm Mini Portable Stereo Speaker for iPod

I bought this 3.5mm speaker, but I didn’t realize it required a battery. Don’t buy that one, buy this one. It has a USB connection that keeps the speaker powered. Also, don’t buy things on Amazon in a flurry.






3.) TONOR PC Microphone USB Computer Condenser Studio Mic

I’m looking for something omnidirectional. I imagine tweaking the microphone setup to find the sweet spot of sensitivity is going to be a chore. I’ve even seen other guides where people use a microphone array.

This’ll do.




4.) USB Memory Stick

I already have one of these lying around. You should be able to find a microSD online for less than $10. If you don’t have a microSD, you can boot from a USB stick instead, but only after you’ve already booted once from a microSD. Just get a microSD, you cheap bastard.






Step 2: Software

Take that microSD and install NOOBS on it here... right after you realize you don’t have a card reader and quickly buy one. Also, now that we’re realizing we’re missing some basic stuff, make sure you have a keyboard and mouse too.

I’m running Debian (found here) in a virtual machine in Oracle VM VirtualBox (found here) on my PC (found in my apartment). If you’re running Linux or Mac, you don’t need the virtual machine and can run the commands straight from the terminal. After you have the pi booted and running, make sure you enable SSH on it (how to here) and be sure to harden it (guide here) so it doesn’t spontaneously learn Mandarin.

Installing Curl, Node, NPM, and Snips

Open up a terminal on the virtual machine (or on Mac or Linux) and run the following.

curl -sL | sudo bash -

After that, run the following to make sure node.js (at least v7.5.0) and npm are installed.

sudo apt-get install nodejs

Verify both installs by running node -v and npm -v.
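If you’d rather not eyeball the version number against that v7.5.0 minimum, a small helper can compare them for you. This is just a sketch: version_ge is a name I made up, and it leans on sort -V (version sort), which the GNU coreutils on Debian should provide.

```shell
#!/usr/bin/env bash
# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# ASSUMPTION: `sort -V` is available (GNU coreutils on Debian has it);
# the function name is hypothetical.
version_ge() {
  # Sort the two versions; if the smaller one is the minimum we asked
  # for, then $1 meets or exceeds it.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Example usage (node -v prints something like "v8.11.1", so strip the "v"):
#   version_ge "$(node -v | sed 's/^v//')" 7.5.0 && echo "node is new enough"
```

A plain string comparison would get this wrong for something like 7.10.0 versus 7.5.0, which is why the version sort is doing the work here.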

Now run the following to install snips and sam.

npm install -g snips-sam


Connecting to the pi

Now, in the virtual Debian on my PC, I should be able to find the pi on the network by running the following command.

sam devices

Ordinarily, sam will list all the devices it detects and you should be able to connect directly by running sam connect raspberrypi.local. In this scenario, however, it wouldn’t detect my pi. Running ifconfig on the pi will display its IP address. Take that IP and run sam connect <ip address of raspberry pi>.

sam connect <ip address of pi>
Enter username for the device: pi
Enter password for the device: 
Connected to <ip address of pi>

Log in with your pi username and password and voilà, you’re connected and logged in on the pi. Anything run on this command line will be executed on the pi. It’s actually all downhill from here once you run the following.

sam init

Watch the command line go to work. It’ll take a few minutes, so go make a Shirley Temple in the meantime. Or if you have friends that are easily impressed, you can let them watch and then assume you’re practically Lisbeth Salander. Woah buddy, big league hacker shit here.


Configure Snips Hardware

After the install is complete, run the following to get a status on everything.

sam status

You should get something like this

sam status
Connected to device raspberrypi.local
OS version ................... Raspbian GNU/Linux 9 (stretch)
Installed assistant .......... Not installed
Status ....................... Installed, not running
Service status:
snips-analytics .............. 0.55.2 (not running)
snips-asr .................... 0.55.2 (not running)
snips-audio-server ........... 0.55.2 (running)
snips-dialogue ............... 0.55.2 (not running)
snips-hotword ................ 0.55.2 (not running)
snips-nlu .................... 0.55.2 (not running)
snips-skill-server ........... 0.55.2 (not running)
snips-tts .................... 0.55.2 (running)
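Most services showing as not running is presumably fine at this point, since no assistant is installed yet. If you want to re-check later without squinting at the whole list, here is a small sketch that filters that output down to the services that aren’t up. It assumes sam status prints plain text in the layout shown above, and services_not_running is a name I made up.

```shell
#!/usr/bin/env bash
# Reads `sam status` output on stdin and prints the name of every
# service marked "(not running)", one per line.
# ASSUMPTION: the output is plain text in the format shown above;
# the function name is hypothetical.
services_not_running() {
  # The first whitespace-separated field on each service line is its name.
  awk '/\(not running\)/ { print $1 }'
}

# Example usage:
#   sam status | services_not_running
```

Matching on the parenthesized "(not running)" keeps the overall "Status" line out of the results.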

Let’s quickly run through the main pieces and ensure everything is working. Run sam test speaker

With your speaker connected, you should hear a voice. I have an HDMI cord running from my pi and I heard the audio through the TV. So the output from the device is working. Let’s move on to the microphone and plug it in.

After plugging that bad boy in, run sam setup audio

This will allow you to select the microphone.

sam setup audio
Starting microphone setup...
What microphone do you use?
[1] Generic USB

After it’s selected, run sam test microphone

sam test microphone
Testing microphone
Say something in the microphone, then press Enter...

Try recording a quick joke (and press enter) to hear it back and realize just how unfunny you are.

Then run sam sound-feedback on

This adds the “ding” when you make a command.


Install Demo

We’re almost at the end. Run sam install demo

This should install and turn on the snips service and load it with a basic test app. The default test app just translates your speech into text via the STT (speech to text) API and then repeats it back with the TTS (text to speech) API. Once it’s done installing, Snips is ready to be operated via speech by speaking “Hey Snips, <say phrase to be repeated>”. You can probably ascertain it’s not perfect, but it’s usable and something to improve upon. Any custom commands and tweaks, including my own lighting setup and automations, I’ll document in part 2.


For now, you can say, “Hey Snips, the colossus of clout!” and you can marvel that you’ve made a digital Tommy “Repeat” Timmons from The Sandlot.