Transcribe Speech to Text


Transcribe speech to text with Python or the Web Speech API:

Python

Make sure you have Python installed:

python --version

Python version 3 is recommended.

Install the SpeechRecognition module:

pip install SpeechRecognition

Create a script, speech_to_text.py, that transcribes the audio file Hello World.wav to text:

# speech_to_text.py
import speech_recognition as sr

r = sr.Recognizer()
filename = "Hello World.wav"

with sr.AudioFile(filename) as source:
    audio = r.record(source)  # read the whole file; listen() would stop at the first pause

text = r.recognize_google(audio)
print(text)

Run the script:

python speech_to_text.py # hello world
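
Because recognize_google sends the audio over the network, a single call can fail for transient reasons. The following is a minimal sketch of a retry wrapper with exponential backoff. The helper name transcribe_with_retry and its parameters are hypothetical and not part of SpeechRecognition; the recognizer function and the exception types to retry on are passed in by the caller.

```python
import time


def transcribe_with_retry(recognize, audio, retry_on=(Exception,),
                          attempts=3, delay=1.0, sleep=time.sleep):
    """Call recognize(audio), retrying with exponential backoff.

    Hypothetical helper: `recognize` is any callable that takes the audio
    and returns text; `retry_on` lists the exception types worth retrying.
    """
    for attempt in range(1, attempts + 1):
        try:
            return recognize(audio)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            # wait delay, 2*delay, 4*delay, ... between attempts
            sleep(delay * 2 ** (attempt - 1))
```

With the script above, this could be called as transcribe_with_retry(r.recognize_google, audio, retry_on=(sr.RequestError,)), assuming sr.RequestError is the error raised when the request to the service fails.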

Library

SpeechRecognition supports the following engines/APIs:

  • recognize_sphinx (works offline)
  • recognize_google
  • recognize_wit
  • recognize_bing
  • recognize_api
  • recognize_houndify
  • recognize_ibm

Pros:

  • Free
  • Fairly accurate

Cons:

  • API limitations (e.g., network timeout, file too big, rate limiting)
  • Transcript can be off
  • No punctuation marks
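
One way to work around the "file too big" limitation is to transcribe a long recording in shorter pieces. The sketch below uses only the standard library to measure a WAV file and plan the pieces; the function names and the 30-second default are assumptions chosen for illustration, not values documented by any of the APIs listed above.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a WAV file in seconds (standard library only)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def chunk_windows(total_seconds, chunk_seconds=30.0):
    """Split a recording into consecutive (offset, duration) windows.

    Hypothetical helper: the 30-second default is an arbitrary choice.
    """
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    windows = []
    offset = 0.0
    while offset < total_seconds:
        # the last window may be shorter than chunk_seconds
        duration = min(chunk_seconds, total_seconds - offset)
        windows.append((offset, duration))
        offset += chunk_seconds
    return windows
```

With SpeechRecognition, each window's duration could then be passed to r.record(source, duration=...) inside a single with sr.AudioFile(filename) block, since successive record calls should continue from where the previous one stopped. Each piece can be recognized separately and the resulting texts joined.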

See guide and GitHub repository for more details.

Web Speech API

If the audio input can be directed to your microphone, you can use the JavaScript Web Speech API instead.

See Web Speech API Demonstration.


