Verbatim transcription of interviews conducted with professionals for our projects is a laborious process, and online transcription tools may pose privacy risks to our data.
In this wiki you can find a repository that can be installed and run on your local machine. The code currently runs in a Linux environment with Python 3. Support for Windows will be added in the future.
The repository can be downloaded here: https://github.com/Joesher15/audio_to_text
The README of the repository explains how to proceed: https://github.com/Joesher15/audio_to_text/blob/main/README.md
Assuming Python 3 and pip are installed, the following terminal commands, run from the directory containing requirements.txt, create a new virtual environment and install all the necessary dependencies:
sudo apt install python3-venv
python3 -m venv audio_2_text_env
source audio_2_text_env/bin/activate
pip install -r requirements.txt
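To check that the environment created above is actually active before installing packages or running audio_text.py, Python can compare sys.prefix with sys.base_prefix: inside a venv the two differ. This is a generic sketch, not part of the repository; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # venv points sys.prefix at the environment directory, while
    # sys.base_prefix keeps pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run 'source audio_2_text_env/bin/activate' first.")
```

Running it with the same python that will run audio_text.py shows whether the activation step took effect.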
The input to the algorithm is an audio file (only WAV files are supported) and the output is a text file. The audio is split into 10-second files, and the first word of each line of the output is the name of the 10-second file from which that line was transcribed. Because the output of the speech recognition is not always accurate, these files can be used to check and correct the transcription by hand. The 10-second files are stored in a subdirectory named "split" inside the directory of the original audio file.
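The splitting step described above can be sketched with Python's standard wave module. This is only an illustration of the idea, assuming uncompressed PCM WAV input; the repository may split the audio differently, and the function split_wav and the chunk naming scheme <stem>_<index>.wav are hypothetical.

```python
import math
import wave
from pathlib import Path
from typing import List


def split_wav(path: str, chunk_seconds: int = 10) -> List[Path]:
    """Split a WAV file into consecutive chunks of at most chunk_seconds each.

    Chunks are written to a "split" subdirectory next to the input file.
    The naming scheme <stem>_<index>.wav is a hypothetical choice.
    """
    src = Path(path)
    out_dir = src.parent / "split"
    out_dir.mkdir(exist_ok=True)
    written = []
    with wave.open(str(src), "rb") as reader:
        params = reader.getparams()
        frames_per_chunk = params.framerate * chunk_seconds
        # The last chunk may be shorter than chunk_seconds.
        n_chunks = math.ceil(params.nframes / frames_per_chunk)
        for i in range(n_chunks):
            # readframes advances the read position, so chunks are consecutive.
            data = reader.readframes(frames_per_chunk)
            chunk_path = out_dir / f"{src.stem}_{i:03d}.wav"
            with wave.open(str(chunk_path), "wb") as writer:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected automatically when writing.
                writer.setparams(params)
                writer.writeframes(data)
            written.append(chunk_path)
    return written
```

A 25-second recording, for instance, yields three files: two of 10 seconds and one of 5 seconds.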
In audio_text.py, set the directory variable to the path of the directory containing the audio file, set the filename of the audio file, and then run the following command to generate the transcription:
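Before running the script, it can help to confirm that the configured file is a WAV file Python can read and to see how many 10-second files it will produce (the duration divided by 10, rounded up). The sketch below is hypothetical: describe_wav is not part of audio_text.py, and since the name of the filename variable is not given here, the file name is simply passed as an argument.

```python
import wave
from pathlib import Path
from typing import Tuple


def describe_wav(directory: str, filename: str, chunk_seconds: int = 10) -> Tuple[float, int]:
    """Validate a WAV file and return (duration in seconds, number of chunks)."""
    path = Path(directory) / filename
    # Only WAV input is supported, so reject other extensions early.
    if path.suffix.lower() != ".wav":
        raise ValueError(f"{path} is not a .wav file")
    # wave.open raises FileNotFoundError for a missing file and typically
    # wave.Error for a file it cannot parse as WAV.
    with wave.open(str(path), "rb") as reader:
        n_frames = reader.getnframes()
        rate = reader.getframerate()
    frames_per_chunk = rate * chunk_seconds
    # Integer ceiling division: a final partial chunk still counts as one file.
    n_chunks = -(-n_frames // frames_per_chunk)
    return n_frames / rate, n_chunks
```

For example, describe_wav("/path/to/recordings", "interview.wav") returns the duration and the expected number of files; the path is a placeholder.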
python audio_text.py
The console output shows the progress of the transcription, and the resulting text file is written to the directory specified by the user.
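Because every line of the output starts with the name of a 10-second file, the transcription can be loaded as (file name, text) pairs, for example to review each line against its audio in the "split" subdirectory. A minimal sketch, assuming the file name and the text are separated by whitespace, file names contain no spaces, and the file is UTF-8 encoded; read_transcript is a hypothetical helper, not part of the repository.

```python
from pathlib import Path
from typing import List, Tuple


def read_transcript(path: str) -> List[Tuple[str, str]]:
    """Parse the transcription file into (chunk_filename, text) pairs."""
    pairs = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        # The first whitespace-separated token is the chunk file name;
        # everything after it is the recognised text.
        parts = line.split(maxsplit=1)
        chunk = parts[0]
        text = parts[1] if len(parts) > 1 else ""  # no speech recognised
        pairs.append((chunk, text))
    return pairs
```

If the names in the output are bare file names, each one can be joined with the "split" directory, e.g. Path(directory) / "split" / chunk, to find the audio to listen to while correcting that line.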