A Raspberry Pi with a speaker and microphone attached.

Building a Raspberry Pi Voice Assistant

That answers all your celestial questions

Heather Arthur
5 min read · Jan 6, 2020


This fall I did Recurse Center. When I thought about how to spend my time there, one area of interest stuck out to me — voice technology. I see my mom use dictation software and voice assistants often. The software is at once impressive and also has room for improvement, which makes for a perfect area to explore. My mom also loves astronomy. So, I decided to make a celestial voice assistant for her.

A celestial voice assistant is a voice assistant that helps you with your celestial questions, such as: “When will the sun set?” and “What phase is the moon today?”

I wanted it to be a small, physical device that you could just plug into the wall. Apparently, that’s a natural fit for the Raspberry Pi. I had never worked with a Raspberry Pi before, and my conception of it wasn’t quite right. Now, I would just explain it as “a small computer”. It was much more full-featured than I thought, and that made it pretty smooth to work with.

I’ll first talk about the hardware, then I’ll talk about the actual voice software part of it.

The hardware

For my initial, inexpensive setup I had these ingredients:

  • A Raspberry Pi (plus an SD card)
  • A Sense HAT
  • A PlayStation Eye microphone
  • A small speaker

Later, for my mom’s present, I upgraded to a sleeker ReSpeaker USB Mic Array, instead of the PlayStation Eye — which had an unfortunate webcam vibe but was otherwise a pretty good “far-field” microphone.

For development, I often just used earbuds plugged into the Pi’s audio jack, instead of a speaker, and spoke directly into an Adafruit USB Mini Microphone.

To put it all together, I attached the Sense HAT to the Pi’s GPIO pins, flashed the Raspbian OS onto the SD card according to these instructions, set up the wifi configuration, and did a few other things, which I documented here.

The software: voice assistant overview

You can write your own voice assistant software completely from scratch, but it would be a huge project. It’s easier to use an existing framework.

I settled, maybe too quickly, on one particular voice framework: Snips. I’ll just get this out of the way: Snips was bought by Sonos a couple of weeks ago and is shutting down its developer support soon. The code was never completely open sourced, so look elsewhere if you’re starting a voice project! It seems like Rhasspy is a good alternative to look into.

Snips operates as several different services that run on the device. Many of these components are common across voice frameworks: an audio component interfaces with the microphone to capture the audio as data, and a wakeword (or hotword) detector listens in that stream for a specific trigger phrase (e.g. “Alexa” or “Hey Snips”).

Then there’s the automatic speech recognition (ASR): a component that takes the audio data and tries to suss out the text of what the person was saying. The natural language understanding (NLU) component takes that text and parses it into “intents”.

Intents are kind of like the set of questions that the assistant can answer. An intent is often not mapped to one exact sentence, but several that a person could say to get the same result. An example is an intent to get the temperature. Someone could say either “What’s the temperature?” or “How hot is it right now?” to trigger this one intent.
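To make this concrete, here’s a toy sketch of the idea that several phrasings map to one intent. (A real NLU uses machine learning; this naive keyword matcher and the intent structure are just illustrative assumptions, not Snips’ actual format.)

```python
# Hypothetical intent definition: one intent, several example phrasings.
TEMPERATURE_INTENT = {
    "name": "GetTemperature",
    "utterances": [
        "what's the temperature",
        "how hot is it right now",
        "tell me the current temperature",
    ],
}

def detect_intent(text, intents):
    """Return the name of the first intent with a matching utterance.

    A real NLU generalizes beyond the exact training sentences; this
    toy version only matches them literally.
    """
    normalized = text.lower().strip("?!. ")
    for intent in intents:
        if normalized in intent["utterances"]:
            return intent["name"]
    return None

print(detect_intent("What's the temperature?", [TEMPERATURE_INTENT]))
```

Both “What’s the temperature?” and “How hot is it right now?” resolve to the single `GetTemperature` intent.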

Intent detection is the purpose of the NLU: each intent the assistant supports has to be trained into it, and the NLU uses machine learning so that it can flexibly recognize variations on the same intent.

Finally, there are “actions” to handle the intents, and a text-to-speech (TTS) component which takes the answer the action provides and says it out loud over a speaker in an artificial voice.
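The whole chain can be sketched as a sequence of stubbed functions. Every name and behavior here is a toy placeholder (real frameworks run these as separate services, not function calls), but the data flow is the same:

```python
def wakeword(audio):
    """Fires when the trigger phrase is heard (stubbed on text)."""
    return audio.startswith("hey snips")

def asr(audio):
    """Audio -> text. Stubbed: just drops the wakeword."""
    return audio.replace("hey snips ", "")

def nlu(text):
    """Text -> intent name. Stubbed keyword check."""
    return "MoonPhase" if "moon" in text else None

def action(intent):
    """Intent -> answer text."""
    return "The moon is a waning crescent" if intent == "MoonPhase" else "Sorry?"

def tts(answer):
    """Answer -> speech. Stubbed as a tagged string."""
    return f"[speaker] {answer}"

audio = "hey snips what phase is the moon"
if wakeword(audio):
    print(tts(action(nlu(asr(audio)))))
```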

The software: action code

Snips provides all of the standard components (the NLU is the only one open-sourced). The code I wrote was mainly action code to handle the intents that I’d specified to the NLU.

For Snips, all of the components (wakeword, ASR, NLU, TTS) run as separate services on the device. The components communicate by publishing and subscribing to topics on the network using the MQTT protocol.

This divide lets some services, like the wakeword detection and ASR, run on “satellite” microphone-only devices, while shipping off the intent processing and speech to a main, speaker-enabled device. It also allows a lot of flexibility in writing intent handlers, as you can write an action in any language you want, as long as it has an MQTT library, which most languages do.
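As an illustration of that flexibility, here’s a rough sketch of an action written against a plain MQTT client. The topic names follow the Snips convention (intents arrive on `hermes/intent/<intentName>`, and publishing to `hermes/dialogueManager/endSession` closes the session and speaks the reply); the handler logic and hard-coded answer are placeholders.

```python
import json

INTENT_TOPIC = "hermes/intent/MoonPhase"            # where the intent arrives
END_SESSION_TOPIC = "hermes/dialogueManager/endSession"

def end_session_payload(session_id, text):
    """Build the JSON payload that asks the dialogue manager to end
    the session and say `text` out loud over TTS."""
    return json.dumps({"sessionId": session_id, "text": text})

def run():
    # Needs the paho-mqtt package and a broker on the device; shown
    # here as a sketch, not executed.
    import paho.mqtt.client as mqtt

    def on_message(client, userdata, msg):
        session_id = json.loads(msg.payload)["sessionId"]
        client.publish(
            END_SESSION_TOPIC,
            end_session_payload(session_id, "The moon is a waning crescent"),
        )

    client = mqtt.Client()
    client.on_message = on_message
    client.connect("localhost", 1883)
    client.subscribe(INTENT_TOPIC)
    client.loop_forever()
```

The same pattern works in any language with an MQTT client library.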

Here’s an edited excerpt of the code that handles the MoonPhase intent, which tells you what phase the moon is currently in:
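A minimal sketch of such a handler, assuming the hermes-python bindings that Snips provided (the eight-phase lookup below is a stand-in for the real astronomy code, and the hard-coded cycle position is just for illustration):

```python
PHASE_NAMES = ["new moon", "waxing crescent", "first quarter",
               "waxing gibbous", "full moon", "waning gibbous",
               "last quarter", "waning crescent"]

def phase_name(fraction_of_cycle):
    """Map a position in the lunar cycle (0.0-1.0, new moon = 0)
    to one of eight phase names. Stand-in for the real computation."""
    index = round(fraction_of_cycle * 8) % 8
    return PHASE_NAMES[index]

def moon_phase_handler(hermes, intent_message):
    # In the real app the cycle position comes from an astronomy
    # library; 0.9 (late in the cycle) is hard-coded for the sketch.
    phase = phase_name(0.9)
    hermes.publish_end_session(intent_message.session_id,
                               "The moon is a %s" % phase)

def run():
    # Requires the (now discontinued) hermes-python package and a
    # running Snips platform; shown as a sketch, not executed here.
    from hermes_python.hermes import Hermes
    with Hermes("localhost:1883") as h:
        h.subscribe_intent("MoonPhase", moon_phase_handler).start()
```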

Hermes is a protocol over MQTT that Snips created for controlling sessions in the voice assistant. The call hermes.publish_end_session(session_id, msg) will, essentially, publish an MQTT message to the hermes/tts/say topic telling the TTS to say e.g. “The moon is a waning crescent” aloud.

The code uses the 8x8 LED display of the Sense HAT to display an image of the moon phase. There’s a very nice Python library for interfacing with the Sense HAT, as well as a web emulator for fast experimenting. Later, I used the Sense HAT’s magnetometer to get the orientation of the Pi for displaying which direction a celestial body was rising.
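The Sense HAT library’s `set_pixels` call takes a flat list of 64 RGB tuples for the 8x8 display. Here’s a rough sketch of building a crescent image that way (the two-overlapping-discs trick is my own simplification, not the app’s actual rendering):

```python
WHITE, BLACK = (255, 255, 255), (0, 0, 0)

def crescent_pixels():
    """Build a 64-entry pixel list (8 rows of 8) with a crude crescent:
    light a pixel if it's inside the moon disc but outside an
    offset 'shadow' disc."""
    pixels = []
    for y in range(8):
        for x in range(8):
            in_moon = (x - 3.5) ** 2 + (y - 3.5) ** 2 <= 12
            in_shadow = (x - 2.0) ** 2 + (y - 3.5) ** 2 <= 12
            pixels.append(WHITE if in_moon and not in_shadow else BLACK)
    return pixels

def show_on_hat():
    # Runs only on a Pi with a Sense HAT (or the web emulator);
    # shown as a sketch, not executed here.
    from sense_hat import SenseHat
    SenseHat().set_pixels(crescent_pixels())
```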

I used the astroplan library for getting the current moon phase, and ended up scraping charts for rising and setting times from the US Naval Observatory website. I wanted to compute the rising and setting times dynamically based on location, but that story is for another blog post. Let’s just say that this assistant is hyper-personalized to my mom.
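For the moon phase, astroplan exposes a `moon_illumination` function that returns the illuminated fraction for an astropy `Time`. Illumination alone doesn’t say whether the moon is waxing or waning, so one simple approach (an assumption on my part, not necessarily what the app does) is to compare today with tomorrow:

```python
def describe(illum_today, illum_tomorrow):
    """Turn two illumination fractions (0.0-1.0, today and tomorrow)
    into a rough phase description; waxing vs. waning comes from
    whether illumination is increasing."""
    if illum_today < 0.02:
        return "new moon"
    if illum_today > 0.98:
        return "full moon"
    trend = "waxing" if illum_tomorrow > illum_today else "waning"
    shape = "crescent" if illum_today < 0.5 else "gibbous"
    return f"{trend} {shape}"

def current_phase():
    # Requires astroplan and astropy; shown as a sketch, not
    # executed here.
    from astropy.time import Time, TimeDelta
    from astroplan import moon_illumination
    now = Time.now()
    tomorrow = now + TimeDelta(1, format="jd")  # one day later
    return describe(moon_illumination(now), moon_illumination(tomorrow))
```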

You can see all the code for the app here: https://github.com/harthur/celestial-snips-app

Celestial: the final product

The questions that Celestial answers as of now are:

  • “What time does the Sun/Moon/Jupiter/Venus/Mars/Orion rise?”
  • “When does the Sun/Moon/Jupiter/Venus/Mars/Orion set?”
  • “What moon phase is it?”
  • “When is the next full/new moon?”
  • “When can I see the space station?”

The assistant was well-received, and we had some good fun also asking Alexa when Orion rose and getting a nonsensical answer about shoe stores in Truckee (?). But my mom had a bit too much confidence in me when she tried asking Celestial, “What planets can I see in the sky tonight?”.
