Speech Detection Setup

Human speech detection uses Google's Speech API. Audio files are recorded and then sent to Google's platform, which returns a transcript of what it believes was said, along with a confidence score.

To run speech detection, perform the following steps:

  1. Ensure there is no folder titled with the current year in the directory you are running from. If there is, delete it.
  2. Start roscore
  3. Start the logging node with rosrun detection_logger visual_logger
  4. Start the audio listener node with rosrun audio_raw_saver save_raw_data
  5. Start the audio detection node with rosrun audio_detection audio_detection
  6. Start the audio stream with roslaunch audio_capture capture.launch
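
The check in step 1 and the start order in steps 2 to 6 can be sketched in Python as follows. This is only an illustrative helper, not part of the packages above: the names STARTUP_COMMANDS and stale_year_folder are hypothetical, and it assumes the year folder is named with the four-digit year.

```python
import datetime
from pathlib import Path

# Commands copied from steps 2 to 6, in the order they should be started.
# Each one keeps running, so each would normally get its own terminal.
STARTUP_COMMANDS = [
    ["roscore"],
    ["rosrun", "detection_logger", "visual_logger"],
    ["rosrun", "audio_raw_saver", "save_raw_data"],
    ["rosrun", "audio_detection", "audio_detection"],
    ["roslaunch", "audio_capture", "capture.launch"],
]


def stale_year_folder(run_dir, today=None):
    """Return the folder named after the current year, or None if absent.

    Assumption: the folder name is the four-digit year (e.g. "2021").
    """
    today = today or datetime.date.today()
    candidate = Path(run_dir) / str(today.year)
    return candidate if candidate.is_dir() else None
```

Calling stale_year_folder(".") before starting anything shows whether step 1 still needs attention. The helper only reports the folder; deleting it is left to the operator.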

Audio will now be saved in a nested set of folders: a folder titled with the current year, inside it a folder titled with the current month, inside that a folder titled with the current day, and finally a folder titled "audio". Audio files are stored sequentially as .wav files named audio0.wav, audio1.wav, audio2.wav, and so on.
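
As a rough illustration of that layout, the sketch below builds the expected path of a given audio file. The function names are hypothetical, and the exact folder names are an assumption: the text does not say whether month and day are zero-padded numbers, so the zero_pad flag covers both possibilities.

```python
import datetime
from pathlib import Path


def audio_dir(run_dir, day, zero_pad=False):
    """Build <run_dir>/<year>/<month>/<day>/audio for the given date.

    Assumption: month and day folders are numeric; zero_pad switches
    between "3" and "03" because the real naming is not documented here.
    """
    month = f"{day.month:02d}" if zero_pad else str(day.month)
    dom = f"{day.day:02d}" if zero_pad else str(day.day)
    return Path(run_dir) / str(day.year) / month / dom / "audio"


def audio_file(run_dir, day, index, zero_pad=False):
    """Path of the index-th recording, following the audioN.wav pattern."""
    return audio_dir(run_dir, day, zero_pad) / f"audio{index}.wav"
```

For example, audio_file("run", datetime.date(2020, 3, 7), 2) gives run/2020/3/7/audio/audio2.wav under the unpadded assumption.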

There may be gaps in the numbering of these saved files. When a file is processed and no human voice is detected, the file is deleted automatically and no query is generated.
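
To see which recordings were discarded, one option is to list the indices missing from the "audio" folder. The sketch below is a hypothetical helper that relies only on the audioN.wav naming pattern and on numbering starting at 0.

```python
import re
from pathlib import Path

# Matches audio0.wav, audio1.wav, ... and captures the number.
AUDIO_NAME = re.compile(r"^audio(\d+)\.wav$")


def saved_indices(audio_dir):
    """Return the sorted indices of the audioN.wav files still on disk."""
    indices = []
    for entry in Path(audio_dir).iterdir():
        match = AUDIO_NAME.match(entry.name)
        if match:
            indices.append(int(match.group(1)))
    return sorted(indices)


def missing_indices(indices):
    """Return the indices absent between 0 and the highest saved index."""
    if not indices:
        return []
    present = set(indices)
    return [i for i in range(max(indices) + 1) if i not in present]
```

Files deleted after the highest surviving index cannot be detected this way, since nothing on disk marks where the sequence ended.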

When human voice is detected in a file, the file is kept, and a record of it is written to a folder titled "text", located alongside the "audio" folder. The record file is titled output.txt and stores the file type, the confidence of human voice, and the location and name of the audio file it refers to.
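
A minimal sketch for locating and reading that record file is shown below. The function names are hypothetical, and because the exact layout of the entries in output.txt is not described here, the sketch returns the raw lines rather than trying to parse individual fields.

```python
from pathlib import Path


def record_path(audio_dir):
    """Path of output.txt in the "text" folder next to the "audio" folder."""
    return Path(audio_dir).parent / "text" / "output.txt"


def read_records(audio_dir):
    """Return the non-empty lines of output.txt, or [] if it does not exist.

    Assumption: the file is plain text; no field layout is assumed.
    """
    path = record_path(audio_dir)
    if not path.is_file():
        return []
    with path.open(encoding="utf-8", errors="replace") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]
```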
