Speech Detection Setup
Human speech detection uses Google's Speech API. Audio is recorded to files, and each file is then sent to Google Speech, which returns a transcript of what it believes was said along with a confidence score.
To run speech detection, perform the following steps:
- Ensure the working directory contains no folder named after the current year. If one exists, delete it.
- Start the ROS master with
roscore
- Start the logging node with
rosrun detection_logger visual_logger
- Start the audio listener node with
rosrun audio_raw_saver save_raw_data
- Start the audio detection node with
rosrun audio_detection audio_detection
- Start the audio stream with
roslaunch audio_capture capture.launch
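The first step above (checking for a leftover year folder) can be sketched as a small Python script. This is only an illustration: the function name `stale_year_folder` is hypothetical, and the script assumes the folder is named with the four-digit year, which the steps above do not state explicitly.

```python
import datetime
from pathlib import Path


def stale_year_folder(working_dir, today=None):
    """Return the path of a leftover current-year folder, or None if absent."""
    today = today or datetime.date.today()
    # Assumption: the folder is named with the four-digit year, e.g. "2024".
    candidate = Path(working_dir) / str(today.year)
    return candidate if candidate.is_dir() else None


if __name__ == "__main__":
    leftover = stale_year_folder(Path.cwd())
    if leftover is None:
        print("No current-year folder found; safe to start the nodes.")
    else:
        print(f"Found {leftover}; delete it before starting the nodes.")
```

The script only reports the folder and leaves deletion to the operator, since removing recorded data automatically may not be what is wanted.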
Audio will now be saved in a nested directory: a folder named after the current year, containing a folder named after the current month, containing a folder named after the current day, containing a folder titled "audio". Audio files are stored sequentially as .wav files, with incrementing file names following the pattern audio0.wav, audio1.wav, audio2.wav, and so on.
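As a rough illustration of this layout, the following Python sketch builds the expected path of a given audio file. The helper name `audio_file_path` is hypothetical, and the sketch assumes the year, month, and day folders use plain numbers without zero padding; the actual naming used by the saver node may differ (for example "03" instead of "3").

```python
import datetime
from pathlib import Path


def audio_file_path(root, index, today=None):
    """Build the expected location of audio<index>.wav under root."""
    today = today or datetime.date.today()
    # Assumption: folders are named with unpadded numbers (2024/3/5).
    day_dir = Path(root) / str(today.year) / str(today.month) / str(today.day)
    return day_dir / "audio" / f"audio{index}.wav"


if __name__ == "__main__":
    # Path of the third recording made today, relative to the working directory.
    print(audio_file_path(".", 2))
```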
The numbering of the saved files may have gaps. When a file is processed and no human voice is detected, the file is automatically deleted and no query is generated.
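To see which recordings were kept and which were discarded, the gaps in the numbering can be listed with a sketch like the one below. The function name `kept_and_missing` is hypothetical. It relies on the audioN.wav pattern starting at audio0.wav, and it can only report gaps below the highest index still on disk.

```python
import re
from pathlib import Path

# Matches file names such as audio0.wav or audio17.wav.
AUDIO_NAME = re.compile(r"^audio(\d+)\.wav$")


def kept_and_missing(audio_dir):
    """Return (kept indices, missing indices) for an "audio" folder."""
    kept = []
    for entry in Path(audio_dir).iterdir():
        match = AUDIO_NAME.match(entry.name)
        if match:
            kept.append(int(match.group(1)))
    kept.sort()
    if not kept:
        return [], []
    present = set(kept)
    # Indices below the highest kept file that are absent were deleted.
    missing = [i for i in range(kept[-1] + 1) if i not in present]
    return kept, missing
```

Files with unrelated names in the folder are ignored rather than treated as errors.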
When human voice is detected in a file, the file is kept and a record of it is written to a folder titled "text", in the same directory as the "audio" folder. The record file is named output.txt and stores the file type, the confidence of human voice, and the location and name of the audio file it refers to.
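A minimal sketch for reading these records is shown below. The exact line layout of output.txt is not specified here, so the hypothetical helper `read_detection_records` makes no attempt to parse fields and simply returns each non-empty line as a raw string. It also assumes the file is UTF-8 text.

```python
from pathlib import Path


def read_detection_records(day_dir):
    """Return the non-empty lines of <day_dir>/text/output.txt."""
    output_file = Path(day_dir) / "text" / "output.txt"
    if not output_file.is_file():
        # No voice has been detected yet, so no record file exists.
        return []
    # Assumption: the record file is plain UTF-8 text, one record per line.
    with output_file.open(encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

Here `day_dir` is the folder for the current day, the one that contains both the "audio" and "text" folders.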