All Articles

Native Speech Recognition

This is day 20 of my #javascript30 journey. JavaScript30 is the free course from Wes Bos that lets you brush up on your JavaScript skills by building 30 projects.

Yesterday we explored accessing & playing around with the webcam. You can keep track of all the projects we’re building here.

Today we’re exploring speech detection.


Day 20 - Native Speech Recognition

So… Speech Recognition is available directly in the browser with no need for libraries. That is amazing.

Browser support for SpeechRecognition is still limited; we will only be able to run this app in Chrome or Firefox.

Even though there are some limitations, this is still a really cool feature that I’m looking forward to implementing.

Accessing SpeechRecognition

We need to access SpeechRecognition on the window. Chrome exposes it behind a webkit prefix, so at the top of the script we alias both versions to the same name:

window.SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition  

Next up we will create a new instance of SpeechRecognition. We will also set interimResults to true. This lets us view the text as we are speaking (as opposed to waiting until we are finished speaking to print the text).

const recognition = new SpeechRecognition()  
recognition.interimResults = true  
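
The API also lets you hint which language should be transcribed via the standard `lang` property (this is part of the Web Speech API, not something specific to this project), for example:

recognition.lang = 'en-US'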

Printing the Text in the Browser

Once the browser has a stream of input coming in we need to print it out to the screen.

To do this we will create a paragraph. For each ‘pause’ in speech we want to create a new paragraph element.

We will only ever be editing the final element.

To do this we need to:

let p = document.createElement('p')  
const words = document.querySelector('.words')  
words.appendChild(p)  
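
This assumes the page already contains a matching element for the `.words` selector to find, e.g. a hypothetical `<div class="words"></div>` in the HTML.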

Now we need to add an event listener and convert the results into an Array:

recognition.addEventListener('result', e => {  
  // e.results is the list of recognition results; each result contains
  // alternatives, and each alternative carries the transcribed text
  const transcript = Array.from(e.results)
    .map(result => result[0])
    .map(result => result.transcript)
    .join('')
})

Recognition stops as soon as the user pauses. To keep transcribing, we add an event listener for when recognition ends and have it start listening again:

recognition.addEventListener('end', recognition.start)  
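
Note that nothing happens until recognition is started in the first place. The snippets here assume a single recognition.start() call is made once the listeners are wired up:

recognition.start()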

Now that we have the transcription stream coming through we need to output it to the `<p>` elements that we create.

We need to create a new `<p>` for each pause in speech:

recognition.addEventListener('result', e => {  
  const transcript = Array.from(e.results)
    .map(result => result[0])
    .map(result => result.transcript)
    .join('')

  // update the current paragraph with the latest (possibly interim) transcript
  p.textContent = transcript

  // once the phrase is final, start a fresh paragraph for the next one
  if (e.results[0].isFinal) {
    p = document.createElement('p')
    words.appendChild(p)
  }
})

Now we have a working transcript!
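
For reference, here is a minimal sketch of the whole script with the pieces above combined (assuming the page has a `.words` element):

window.SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition

const recognition = new SpeechRecognition()
recognition.interimResults = true

let p = document.createElement('p')
const words = document.querySelector('.words')
words.appendChild(p)

recognition.addEventListener('result', e => {
  const transcript = Array.from(e.results)
    .map(result => result[0])
    .map(result => result.transcript)
    .join('')

  p.textContent = transcript
  if (e.results[0].isFinal) {
    p = document.createElement('p')
    words.appendChild(p)
  }
})

// restart listening whenever a phrase ends, and kick everything off once
recognition.addEventListener('end', recognition.start)
recognition.start()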

You can play around with the speech detection & transcript here.

You can keep track of all the projects in this JavaScript30 challenge here.