Quick start for Golang, Google Cloud API, and Speech Recognition

Speech recognition is becoming increasingly powerful and helpful to developers across the world. In this short article, I would like to demonstrate how easy it is to set up in your own web application using Google’s speech API. I will assume you have basic programming experience. To get started, you will need to download Go from https://golang.org/dl/. You may need to set up environment variables depending on your operating system. Once you have installed Go, run a simple hello world program in a file called test.go to make sure everything is set up correctly:


package main

import "fmt"

func main() {
	fmt.Println("Hello ODSC!")
}

 

Then run

go run test.go

Hopefully, you will see the message Hello ODSC! in your console. You will now need to set up a Google Cloud API account and authenticate your code against it. Please follow the docs to do this: https://cloud.google.com/docs/authentication/getting-started#auth-cloud-implicit-go . You should also find a sound file to test with, one that includes someone speaking. Finally, import the following packages (each will be explained in more detail later):

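import (
	"fmt"
	"io/ioutil"
	// net/http is not used by the snippets in this article. Go rejects
	// unused imports, so drop this line unless you add an HTTP handler.
	"net/http"

	"golang.org/x/net/context"
	speech "cloud.google.com/go/speech/apiv1"
	speechpb "google.golang.org/genproto/googleapis/cloud/speech/v1"
)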
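Before creating the client, it can save some debugging time to confirm that your credentials are visible to the program. The sketch below assumes you followed the service-account route, where the GOOGLE_APPLICATION_CREDENTIALS environment variable points at a JSON key file. The helper name checkCredentials is hypothetical and is not part of Google's client library, which can also find credentials in other ways.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// checkCredentials is a hypothetical helper, not part of Google's library.
// It reports an error if path is empty, cannot be accessed, or is a
// directory rather than a key file.
func checkCredentials(path string) error {
	if path == "" {
		return errors.New("GOOGLE_APPLICATION_CREDENTIALS is not set")
	}
	info, err := os.Stat(path)
	if err != nil {
		return fmt.Errorf("cannot access key file %q: %w", path, err)
	}
	if info.IsDir() {
		return fmt.Errorf("%q is a directory, expected a key file", path)
	}
	return nil
}

func main() {
	// Assumption: credentials were configured through this variable.
	path := os.Getenv("GOOGLE_APPLICATION_CREDENTIALS")
	if err := checkCredentials(path); err != nil {
		fmt.Println("credentials problem:", err)
		return
	}
	fmt.Println("using credentials from", path)
}
```

A failure here does not always mean authentication will fail, but it is a cheap first thing to rule out.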

 

Once you are all set up, you need to create a Context. A context is a value passed along with a request that carries deadlines, cancellation signals, and other request-scoped data, so the caller can decide what happens when an API call completes, fails, or takes too long. Because we are going to be making requests to the Google Speech API over the network, it is important to be able to handle these scenarios.
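To make the "taken too long" case concrete, here is a minimal sketch using only the standard library context package. The functions slowCall and callWithTimeout are hypothetical stand-ins for a remote request; they are not part of the Speech API.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowCall is a stand-in for a remote API request. It finishes after
// workMillis milliseconds unless ctx is cancelled or times out first.
func slowCall(ctx context.Context, workMillis int) error {
	select {
	case <-time.After(time.Duration(workMillis) * time.Millisecond):
		return nil // the "request" completed
	case <-ctx.Done():
		return ctx.Err() // deadline exceeded or cancelled
	}
}

// callWithTimeout runs slowCall under a context that expires after
// timeoutMillis milliseconds and reports whether the call timed out.
func callWithTimeout(workMillis, timeoutMillis int) (bool, error) {
	ctx, cancel := context.WithTimeout(context.Background(),
		time.Duration(timeoutMillis)*time.Millisecond)
	defer cancel() // always release the timer

	err := slowCall(ctx, workMillis)
	return errors.Is(err, context.DeadlineExceeded), err
}

func main() {
	timedOut, err := callWithTimeout(10, 200) // work finishes in time
	fmt.Println("timed out:", timedOut, "error:", err)

	timedOut, err = callWithTimeout(200, 10) // deadline passes first
	fmt.Println("timed out:", timedOut, "error:", err)
}
```

A context created with context.WithTimeout can be passed anywhere a context.Context is accepted, so the same idea should carry over to the Speech client calls used in this article.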

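ctx := context.Background()

client, err := speech.NewClient(ctx)
if err != nil {
	fmt.Println(err)
	return // the client is not usable after an error
}
defer client.Close() // release the connection when main returns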

The first line creates a new empty context, named ctx. This is then passed to speech.NewClient, which uses the authentication you set up earlier to identify your account. The next block catches any error returned while creating the client. If you do get an error, you may need to reconfigure your authentication keys. We now need to specify a sound file and read it into memory as audio data:

fileDir := "Recording.wav"

audioData, err := ioutil.ReadFile(fileDir)
if err != nil {
	fmt.Println(err)
	return // nothing to send without the audio
}

The package used here is ioutil. Its ReadFile function reads the whole file into a byte slice, ready to be sent to the API (in Go 1.16 and later, os.ReadFile does the same job). Next, we need to provide certain settings as a speechpb.RecognitionConfig object, which is passed as part of the request to the main method, client.Recognize (which actually carries out the request):

response, err := client.Recognize(ctx, &speechpb.RecognizeRequest{
	Config: &speechpb.RecognitionConfig{
		Encoding:        speechpb.RecognitionConfig_LINEAR16,
		SampleRateHertz: 22050,
		LanguageCode:    "en-US",
	},
	Audio: &speechpb.RecognitionAudio{
		AudioSource: &speechpb.RecognitionAudio_Content{Content: audioData},
	},
})

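Most of the settings are self-explanatory and easy to experiment with. The sample rate is the number of audio samples per second, in hertz, and it should match the rate your file was actually recorded at (22050 here). Google's documentation lists 8000 to 48000 as valid values and recommends 16000 where you have the choice. The encoding field tells the API how the audio data is represented, where LINEAR16 refers to uncompressed 16-bit signed little-endian samples.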
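If you are unsure what sample rate your recording uses, the WAV header can tell you. The following is a rough sketch rather than a full WAV parser: it assumes the canonical layout in which the "fmt " chunk directly follows the RIFF header, and the names wavInfo, parseWAVHeader, and exampleHeader are hypothetical helpers introduced for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// wavInfo holds the header fields that matter for RecognitionConfig.
type wavInfo struct {
	AudioFormat   uint16 // 1 means uncompressed PCM
	Channels      uint16
	SampleRate    uint32 // candidate value for SampleRateHertz
	BitsPerSample uint16 // LINEAR16 expects 16
}

// parseWAVHeader reads the canonical RIFF/WAVE header layout.
// Assumption: the "fmt " chunk directly follows the RIFF header.
func parseWAVHeader(data []byte) (wavInfo, error) {
	if len(data) < 36 {
		return wavInfo{}, errors.New("data too short for a WAV header")
	}
	if string(data[0:4]) != "RIFF" || string(data[8:12]) != "WAVE" {
		return wavInfo{}, errors.New("not a RIFF/WAVE file")
	}
	if string(data[12:16]) != "fmt " {
		return wavInfo{}, errors.New("fmt chunk is not where this sketch expects it")
	}
	return wavInfo{
		AudioFormat:   binary.LittleEndian.Uint16(data[20:22]),
		Channels:      binary.LittleEndian.Uint16(data[22:24]),
		SampleRate:    binary.LittleEndian.Uint32(data[24:28]),
		BitsPerSample: binary.LittleEndian.Uint16(data[34:36]),
	}, nil
}

// exampleHeader builds a header for mono, 16-bit PCM at the given rate,
// so the sketch can run without a real recording.
func exampleHeader(sampleRate uint32) []byte {
	h := make([]byte, 44)
	copy(h[0:], "RIFF")
	copy(h[8:], "WAVE")
	copy(h[12:], "fmt ")
	binary.LittleEndian.PutUint32(h[16:], 16)           // fmt chunk size
	binary.LittleEndian.PutUint16(h[20:], 1)            // PCM
	binary.LittleEndian.PutUint16(h[22:], 1)            // mono
	binary.LittleEndian.PutUint32(h[24:], sampleRate)   // samples per second
	binary.LittleEndian.PutUint32(h[28:], sampleRate*2) // byte rate
	binary.LittleEndian.PutUint16(h[32:], 2)            // block align
	binary.LittleEndian.PutUint16(h[34:], 16)           // bits per sample
	copy(h[36:], "data")
	return h
}

func main() {
	info, err := parseWAVHeader(exampleHeader(22050))
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", info)
}
```

In the program from this article, you could call parseWAVHeader(audioData) on the bytes read from your file and pass int32(info.SampleRate) as SampleRateHertz instead of hard-coding the number.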

if err != nil {
	fmt.Println(err)
	return // response is not usable after an error
}

Again, we check and output any errors from the response. Finally, we can output the transcript:

for _, result := range response.Results {
	for _, alt := range result.Alternatives {
		fmt.Println(alt.Transcript)
	}
}
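The loop above prints every alternative of every result. If you only want one transcript for the whole file, one option is to take the first alternative of each result and join them. The sketch below uses small local types, result and alternative, as hypothetical stand-ins for the speechpb result types so that it runs without the Google packages. It also assumes the first alternative is the most likely one, which is how Google's documentation describes the ordering.

```go
package main

import (
	"fmt"
	"strings"
)

// alternative and result are hypothetical stand-ins that mirror only the
// fields this sketch needs from the speechpb response types.
type alternative struct {
	Transcript string
	Confidence float32
}

type result struct {
	Alternatives []alternative
}

// joinTopTranscripts takes the first alternative of each result, trims
// surrounding whitespace, and joins the pieces with single spaces.
// Results with no alternatives or an empty transcript are skipped.
func joinTopTranscripts(results []result) string {
	parts := make([]string, 0, len(results))
	for _, r := range results {
		if len(r.Alternatives) == 0 {
			continue
		}
		text := strings.TrimSpace(r.Alternatives[0].Transcript)
		if text != "" {
			parts = append(parts, text)
		}
	}
	return strings.Join(parts, " ")
}

func main() {
	// Made-up sample data for illustration only.
	results := []result{
		{Alternatives: []alternative{{Transcript: "hello "}, {Transcript: "yellow"}}},
		{}, // a result with no alternatives is ignored
		{Alternatives: []alternative{{Transcript: " world"}}},
	}
	fmt.Println(joinTopTranscripts(results))
}
```

With the real response, the same loop shape applies to response.Results and result.Alternatives.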

Hopefully, the API has returned an accurate speech-to-text transcription of your audio file, and your future projects may include a whole new user experience!




Caspar Wylie, ODSC


My name is Caspar Wylie, and I have been passionately programming computers for as long as I can remember. I am currently 17, and have taught myself to write code with initial help from an employee at Google in Mountain View, California, who truly motivated me. I program every day and am always putting new ideas into perspective. I try to keep a good balance between jobs and personal projects in order to advance my research and understanding. My interest in computers started with very basic electronic engineering when I was only 6, before I moved on to software development at the age of about 8. Since then, I have experimented with many different areas of computing, from web security to computer vision.
