Sunday, 8 January 2012

Voice control of PC

UPDATE: I lost the files for this but re-wrote some of the basics and uploaded them here.

When the iPhone 4S was released a few months ago, everyone seemed to go nuts over Siri. It looked pretty cool, it's true, but I had the sneaking suspicion that I'd seen it before. I had a look on my desktop and found, lurking at the back of the Start bar, Windows Speech Recognition. Now, in its natural state it's pretty limited - you can open and close programs, and in some Microsoft-brand applications do basic menu commands, but that's about it. However, it does come with the option of writing your own macros, and that's what I did. I spent a weekend writing the basic framework, and since then I've been adding features as I thought of them.

Over Christmas I mentioned to some friends that I'd been doing this, and they expressed an interest in seeing it in action, so I've put together a brief video showcasing some of its more basic abilities. To be honest, I've kind of forgotten everything that it can do, especially since I taught it how to learn things for itself. Things I'd still like to do include linking it to a chatbot (Cleverbot is meant to be quite good, or so I'm told), and adding more options in Google Documents.

Features (that I can think of off the top of my head) that were omitted from the video:

  • Directions in Google Maps
  • Creating events and reminders in Google Calendar
  • File management
  • Router management
  • Tracking visited websites in order to understand commands better (e.g. "Go to Cracked" will open the site, because I've been there a lot).
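
The history-tracking item in the list above is the one that probably benefits most from a concrete illustration. This is not the original macro code (that was lost); it's a minimal sketch in Python of how a spoken site name could be resolved against browsing history. The function names and the sample history are hypothetical, and how the visited URLs get collected in the first place is assumed to be handled elsewhere.

```python
from collections import Counter
from urllib.parse import urlparse


def build_site_index(visited_urls):
    """Count visits per host name (hypothetical helper)."""
    counts = Counter()
    for url in visited_urls:
        host = urlparse(url).hostname
        if host:
            counts[host.lower()] += 1
    return counts


def resolve_site(spoken_name, site_counts):
    """Return the most-visited host that has the spoken name as one of its
    dot-separated labels, or None if nothing matches."""
    key = spoken_name.strip().lower().replace(" ", "")
    matches = [host for host in site_counts if key in host.split(".")]
    if not matches:
        return None
    # Prefer the host that has been visited most often.
    return max(matches, key=lambda host: site_counts[host])


# Made-up history, purely for illustration.
history = [
    "http://www.cracked.com/a",
    "http://www.cracked.com/b",
    "http://cracked.example.org/",
    "https://www.google.com/maps",
]
index = build_site_index(history)
print(resolve_site("Cracked", index))  # www.cracked.com
```

Matching on whole host labels keeps "Cracked" from matching unrelated hosts that merely contain those letters, and ranking by visit count is what makes a frequently visited site win over an obscure one.
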
Of course, there's always more to do. As you can see, it doesn't always work perfectly, but it actually tends to work better when you speak normally than when you over-enunciate your words (see the "When is the next full moon" example).

A few points about the video. It's not fantastically high-quality, I know. I couldn't find any really good free desktop recording software, and the camera footage was shot using a decent-ish digital jobby that's really meant for photographs. I do have a much better mic, but it's a stand one, and this really needed a headset. Sorry for the spitting and popping noises; I know it's pretty horrible. The blurring is irritating, I know, but it would be rather rude to display the details of everyone else in my inbox/on Facebook, so there we go.

Hope you enjoy it - if people are really interested I could post bits of the code, but most of it isn't hugely complicated.



  1. This is very cool. I've only recently started playing around with the speech recognition. I've been curious whether this kind of thing was possible, especially since WSR's support / recognition for Google Chrome is rather underwhelming. I don't know anything about XML, but I may have to look into it if it means getting something this intuitive going.

    1. Thank you! You certainly don't need to know anything about XML. While it uses that kind of formatting, all the basics are WSR-specific. There is the option to add code snippets of VBScript or JScript (which is what I've done), so if there's any real coding to be done it's in those languages.

      Support for non-Microsoft applications is pretty minimal; if you really want to get it working with Chrome (everything is done through a browser now, isn't it?), then I'd suggest taking a look at the keyboard shortcuts and working from there. The vast majority of what's shown in the video can be achieved that way, though things like mining browser history and some of the more context-sensitive bits require more complicated code.

  2. Hi Roxton,

    I have recently taken notice of, and an interest in, WSR myself and was rather impressed at the functionality you were able to accomplish with this utility. My problem is, I don't know a lick of coding. My inquiry here today is whether you plan on releasing a post with fully explained function tutorials, a sort of scripting guide for what you have accomplished? If that is too time-consuming, do you plan on releasing a distributable .WSRMac file for the macros you have created?

    Thanks in advance and once again, nice work!

  3. I'm with Kevin on this one. Ever plan on releasing a how-to or anything of that sort to accomplish this?

  4. Well Roxton,

    I am impressed. When I first got my Win 7 machine I immediately shelved the voice recognition feature because it wouldn't do what I want. I have a bit of a strange accent (even when typing apparently).

    I used to code when I was younger, although the most complicated algorithm I've ever done is a Sudoku generation engine in Visual Basic--and only because someone said it couldn't be done.

    The other day, for some reason, I started using Windows Speech Recognition again, mostly because I'm not getting as much done as I need to. I'm amazed at how much more productive it makes me. And yet, I'm at the wall again. I did some searching, and I found your demonstration.

    I've since downloaded the Voice Recognition macro designer, but...
    It won't work. And, full disclosure, I had to download a not-so-legit copy because, for whatever reason, the Windows Validation tool no longer works, so you can't legitimately download it from Microsoft.

    I would appreciate some pointers. I can figure out most of the coding on my own, and I actually enjoy doing that. I just need to get the process started. As of now the macros won't even respond to my voice. Spending a bit of time getting it to work pays off well in the long run, and I need to get back into coding anyway. I feel like the world is running away without me while I'm stuck defending people USCIS targets for deportation.