Speech recognition is one of the most useful features in various applications such as home automation and artificial intelligence. In this section, we will learn how to perform speech recognition using Python and Google's Speech API.

Here we use a microphone to supply the audio for speech recognition. The microphone is configured through a few parameters, described below.

To do this, we must install the SpeechRecognition module. A second module, PyAudio, is optional in general but required for the microphone input used here, since it provides access to the audio input devices.

sudo pip3 install SpeechRecognition
sudo apt-get install python3-pyaudio

For external or USB microphones, we need to select the exact device to avoid problems. On Linux, the 'lsusb' command displays information about the connected USB devices.
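To pick the right device in code, the microphone's position in the system's device list can be looked up by name. The following is a minimal sketch: find_device_index and list_system_microphones are hypothetical helpers introduced here, and the device names in the usage lines are made up. It assumes the SpeechRecognition call Microphone.list_microphone_names() (which needs PyAudio) as the source of the real device list.

```python
def find_device_index(device_names, wanted):
    """Return the index of the first device whose name contains `wanted`
    (case-insensitive), or None if no device matches."""
    wanted = wanted.lower()
    for index, name in enumerate(device_names):
        if name and wanted in name.lower():
            return index
    return None


def list_system_microphones():
    """Ask SpeechRecognition (via PyAudio) for the names of all audio devices."""
    import speech_recognition as spreg  # imported lazily: needs PyAudio installed
    return spreg.Microphone.list_microphone_names()


# Made-up device names, standing in for the output of list_system_microphones()
example_names = [
    'HDA Intel PCH: ALC3234 Analog (hw:0,0)',
    'USB PnP Sound Device: Audio (hw:1,0)',
    'default',
]
usb_index = find_device_index(example_names, 'usb')
print(usb_index)
```

The resulting index can then be passed to the microphone as spreg.Microphone(device_index=usb_index), alongside the sample rate and chunk size.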

The second parameter is the chunk size ('block size'). It specifies how much data is read at a time. The value is usually a power of 2, for example 1024 or 2048.

We must also specify the sampling rate, which determines how many samples per second are taken from the recorded signal for processing.
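The chunk size and the sampling rate together determine how much audio each read covers: a chunk of N samples at a rate of R samples per second spans N / R seconds. A small sketch of that arithmetic, using the values from the example code in this section (the two helper names are hypothetical):

```python
def is_power_of_two(n):
    """True if n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0


def chunk_duration_seconds(chunk_size, sample_rate):
    """Length of audio, in seconds, covered by one chunk of samples."""
    return chunk_size / sample_rate


sample_rate = 48000  # samples per second
data_size = 8192     # samples per chunk

print(is_power_of_two(data_size))                                # True
print(round(chunk_duration_seconds(data_size, sample_rate), 4))  # 0.1707
```

With these settings each chunk holds roughly a sixth of a second of audio; a smaller chunk size gives shorter reads at the cost of more of them.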

Since some background noise is usually unavoidable, we must calibrate the recognizer for the ambient noise to get accurate results.
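The idea behind such a calibration can be illustrated with a simple energy threshold: measure how loud the background is, then treat only clearly louder audio as speech. The sketch below is a simplified illustration of that idea, not the library's implementation; the helper names, the ratio of 1.5, and the sample values are all assumptions made for this example.

```python
import math


def rms(samples):
    """Root-mean-square energy of a sequence of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def calibrate_threshold(noise_samples, ratio=1.5):
    """Place the speech threshold a fixed ratio above the noise energy."""
    return rms(noise_samples) * ratio


def is_speech(samples, threshold):
    """Treat a chunk as speech only if its energy exceeds the threshold."""
    return rms(samples) > threshold


# Made-up 16-bit sample values: quiet background noise vs. a louder voice
background = [100, -100, 100, -100]
voice = [3000, -2800, 3100, -2900]

threshold = calibrate_threshold(background)
print(threshold)                         # 150.0
print(is_speech(background, threshold))  # False
print(is_speech(voice, threshold))       # True
```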

Steps to recognize speech

  • Gather the microphone-related information.

  • Configure the microphone using the chunk size, sample rate, ambient noise adjustment, and other settings.

  • Wait for a while to capture the sound.

    • Once speech is captured, try to convert it to text; if that fails, report an error.

  • Stop the process.

Example code

import speech_recognition as spreg

# Set up the sampling rate and the chunk size
sample_rate = 48000
data_size = 8192
recog = spreg.Recognizer()

with spreg.Microphone(sample_rate=sample_rate, chunk_size=data_size) as source:
    # Calibrate the recognizer against the background noise
    recog.adjust_for_ambient_noise(source)
    print('Tell Something: ')
    speech = recog.listen(source)

try:
    # Send the captured audio to Google's speech recognition service
    text = recog.recognize_google(speech)
    print('You said: ' + text)
except spreg.UnknownValueError:
    print('Unable to recognize the audio')
except spreg.RequestError as e:
    print("Request error from Google Speech Recognition service; {}".format(e))

Output

$ python3 318.speech_recognition.py
Tell Something: 
You said: here we consider the asymptotic notation Pico to calculate the upper bound of the time complexity, hence the definition of the Big-O notation is as follows
$

Without a microphone, we can also use audio files as input and convert them to text.

Example code

import speech_recognition as spreg

sound_file = 'sample_audio.wav'
recog = spreg.Recognizer()

with spreg.AudioFile(sound_file) as source:
    speech = recog.record(source)  # use record() instead of listen()
    try:
        text = recog.recognize_google(speech)
        print('The file contains: ' + text)
    except spreg.UnknownValueError:
        print('Unable to recognize the audio')
    except spreg.RequestError as e:
        print("Request error from Google Speech Recognition service; {}".format(e))

Output

$ python3 318a.speech_recognition_file.py 
The file contains: staying ahead of the trend, demand planning new technology, it also helps you advance in your career
$
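
Before passing a file to the recognizer, it can help to check what it contains. As a sketch using only Python's standard wave module (which is not part of SpeechRecognition), the following writes one second of silence to a temporary WAV file and reads back its sample rate, channel count, and duration. The helper name describe_wav and the file name are hypothetical.

```python
import os
import tempfile
import wave


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, 'rb') as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


# Write one second of 16-bit mono silence at 16 kHz as a stand-in input file
path = os.path.join(tempfile.mkdtemp(), 'silence_example.wav')
with wave.open(path, 'wb') as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 2 bytes per sample = 16-bit audio
    wav.setframerate(16000)
    wav.writeframes(b'\x00\x00' * 16000)

print(describe_wav(path))     # (16000, 1, 1.0)
```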