How to Extract MP3 and Transcribe Audio on Linux
Disclaimer: Use at your own risk!
This guide walks you through setting up two simple tools on your Fedora-based Linux system:
A custom mp3 command that extracts audio from videos
A custom whisper command that transcribes audio into text
Apart from ffmpeg, which is installed system-wide, everything goes into your personal folders to keep your system tidy and easy to maintain.
Part 1: Create a Custom MP3 Command
What we’re doing:
You’ll make your own command called mp3. After setting it up, you’ll be able to type:
mp3 your-video.mp4
…and it will create your-video.mp3 by extracting just the audio from the video.
Step 1: Install ffmpeg
ffmpeg is a free tool that can convert video and audio files.
To install it, open your terminal and type:
sudo dnf install ffmpeg
Press Enter, type your password if asked, then press Enter again. This installs ffmpeg from Fedora’s software library.
Step 2: Make a personal folder for your commands
Your own commands should go into a folder called bin inside your home directory. If it doesn’t exist yet, create it:
mkdir -p ~/bin
This command creates the folder bin if it doesn’t already exist. ~ means your home folder. This is a safe place to store your scripts.
Step 3: Create your mp3 command
Now let’s make the mp3 script.
Open a new file in kwrite (or your favourite editor):
kwrite ~/bin/mp3
When kwrite opens, paste the following into the file:
#!/bin/bash
input="$1"
output="${input%.*}.mp3"
ffmpeg -i "$input" -vn -acodec libmp3lame -ab 192k "$output"
Here’s what all that means:
#!/bin/bash tells Linux to run this script with bash
input="$1" stores the first thing you type after mp3, which is the filename
output="${input%.*}.mp3" sets the output name by removing the old extension and adding .mp3
ffmpeg -i "$input" runs ffmpeg, the program that does the conversion, with your file as the input
-vn tells ffmpeg to skip the video
-acodec libmp3lame encodes the audio as MP3 using the LAME encoder
-ab 192k sets the audio bitrate to 192 kbit/s, which determines the quality
Save and close the file.
Now make the script executable — this means you’re allowed to run it:
chmod +x ~/bin/mp3
You only need to do this once.
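If the output="${input%.*}.mp3" line in the script looks cryptic, the short sketch below shows what it produces for a few file names. The helper name mp3_name is made up for this example and is not part of the script you just saved.

```shell
#!/bin/bash
# Hypothetical helper, for illustration only: builds the output name
# the same way the mp3 script does.
mp3_name() {
    local input="$1"
    # ${input%.*} removes the shortest trailing match of ".*",
    # that is, the last dot and everything after it.
    printf '%s\n' "${input%.*}.mp3"
}

mp3_name "your-video.mp4"            # your-video.mp3
mp3_name "2025-07-09 17-49-24.mkv"   # 2025-07-09 17-49-24.mp3
mp3_name "archive.tar.gz"            # archive.tar.mp3 (only the last extension goes)
mp3_name "notes"                     # notes.mp3 (no dot, so nothing is removed)
```

The quotes around "$1" and "$input" are what keep file names with spaces in one piece.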
Step 4: Make sure Linux knows where to find your command
Linux looks in special folders to find commands. Let’s make sure your ~/bin folder is one of them.
Check your PATH by typing:
echo $PATH
If you see something like /home/yourname/bin in the list, you’re done.
If not, add it by editing your .bashrc file:
kwrite ~/.bashrc
Add this line at the bottom of the file:
export PATH="$HOME/bin:$PATH"
Save and close, then apply the changes by typing:
source ~/.bashrc
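To confirm that ~/bin really is on your PATH without reading the long list by eye, a small check like the following can help. This is only a sketch, and the helper name path_contains is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if directory $1 is an entry in the
# colon-separated list $2.
path_contains() {
    # Wrapping the list in colons lets us match whole entries only,
    # so /home/you/bin does not match /home/you/binary.
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

if path_contains "$HOME/bin" "$PATH"; then
    echo "$HOME/bin is on your PATH"
else
    echo "$HOME/bin is not on your PATH yet"
fi
```

Once it is, typing command -v mp3 should print the full path of your script.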
Now your mp3 command will work from anywhere. Try it:
mp3 myvideo.mp4
This will create myvideo.mp3 in the same folder as your video.
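The script above assumes you always give it exactly one existing video file. If you would like friendlier error messages, a more defensive variant could look like the sketch below. The function name mp3_checked and the message texts are made up for this example; the ffmpeg options are the same as before, plus -n, which tells ffmpeg not to overwrite an existing output file.

```shell
#!/bin/bash
# Sketch of a more defensive mp3 command. The function name and the
# messages are illustrative, not part of the original script.
mp3_checked() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: mp3 <video-file>" >&2
        return 1
    fi

    local input="$1"
    local output="${input%.*}.mp3"

    if [ ! -f "$input" ]; then
        echo "mp3: no such file: $input" >&2
        return 1
    fi

    # Converting song.mp3 would produce song.mp3 again, so stop early.
    if [ "$input" = "$output" ]; then
        echo "mp3: $input is already an .mp3 file" >&2
        return 1
    fi

    # -n makes ffmpeg refuse to overwrite an existing output file.
    ffmpeg -n -i "$input" -vn -acodec libmp3lame -ab 192k "$output"
}
```

To use it, you would paste this into ~/bin/mp3 in place of the shorter script and add mp3_checked "$@" as the last line.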
Part 2: Install Whisper
What is Whisper?
Whisper is a speech recognition model from OpenAI that turns audio into text. whisper.cpp is a fast C/C++ port of it that runs entirely on your computer, so no internet connection is needed once it is set up.
We’ll install it into a projects folder in your home directory so it doesn’t clutter your system.
Step 1: Set up a projects folder
If you don’t already have one, make a folder to store your personal software:
mkdir -p ~/projects
Change into that folder:
cd ~/projects
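The next steps need git to download the source code, and cmake, make and a C++ compiler to build it. The sketch below checks whether those commands are available before you start. The helper name missing_tools is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: prints each named command that is NOT installed.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing="$(missing_tools git cmake make g++)"
if [ -z "$missing" ]; then
    echo "All build tools found."
else
    echo "Missing build tools:"
    echo "$missing"
fi
```

If anything is listed, on Fedora the matching packages are most likely git, cmake, make and gcc-c++ (the last one provides g++), which you can install with sudo dnf install followed by the package names.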
Step 2: Download whisper.cpp
Type this command to download the program from GitHub:
git clone https://github.com/ggerganov/whisper.cpp
This creates a folder called whisper.cpp.
Go into that folder:
cd whisper.cpp
Now make a new folder inside for building the program:
mkdir build
cd build
Step 3: Build the program
This part compiles the program. It turns the source code into something your computer can run.
First, set up the build:
cmake .. -DCMAKE_BUILD_TYPE=Release
Then build it:
cmake --build . --config Release
This creates several programs in the build/bin folder, including whisper-cli, which does the actual transcription.
Step 4: Download the speech model
Whisper needs a model to know how to recognize words.
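This command downloads the medium-sized model, which handles English as well as other languages (the English-only variant is called medium.en), and saves it as ggml-medium.bin in the models folder: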
../models/download-ggml-model.sh medium
The download is about 1.5 GB.
Part 3: Make a whisper Command You Can Use Anywhere
Now let’s make a shortcut so you can type whisper myaudio.mp3 and Whisper will transcribe the file into text right in the terminal window.
Step 1: Create the script
Open kwrite:
kwrite ~/bin/whisper
Paste this into the file:
#!/bin/bash
~/projects/whisper.cpp/build/bin/whisper-cli -m ~/projects/whisper.cpp/models/ggml-medium.bin -f "$1"
Save and close.
Make it executable:
chmod +x ~/bin/whisper
Now you can type:
whisper yourfile.mp3
…and the transcription will appear in your terminal window, ready to copy.
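If the command fails, the usual causes are a mistyped file name, a build that did not finish, or a model that did not download. A more defensive variant of the script could check for those first, as in the sketch below. The function name whisper_checked and the message texts are made up for this example; the paths are the ones used in this guide.

```shell
#!/bin/bash
# Sketch of a more defensive whisper command. The function name and the
# messages are illustrative; the paths match the ones used in this guide.
whisper_checked() {
    local root="$HOME/projects/whisper.cpp"
    local cli="$root/build/bin/whisper-cli"
    local model="$root/models/ggml-medium.bin"

    if [ "$#" -ne 1 ]; then
        echo "Usage: whisper <audio-file>" >&2
        return 1
    fi
    if [ ! -f "$1" ]; then
        echo "whisper: no such file: $1" >&2
        return 1
    fi
    if [ ! -x "$cli" ]; then
        echo "whisper: program not found at $cli (did the build finish?)" >&2
        return 1
    fi
    if [ ! -f "$model" ]; then
        echo "whisper: model not found at $model (did the download finish?)" >&2
        return 1
    fi

    "$cli" -m "$model" -f "$1"
}
```

To use it, you would paste this into ~/bin/whisper in place of the shorter script and add whisper_checked "$@" as the last line.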
That’s It!
You now have two simple, powerful tools:
mp3 to extract audio from any video
whisper to transcribe that audio into text
Everything is installed in your own folders, safely and cleanly. No extra software needed, no privacy concerns, just fast, local transcription on Linux.
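Since the two commands are usually run one after the other, they can be chained. The sketch below goes from a video straight to a text file saved next to it. The helper name transcribe_video is made up for this example, and it assumes the mp3 and whisper commands from this guide are on your PATH.

```shell
#!/bin/bash
# Hypothetical helper: video in, transcript text file out.
# Assumes the mp3 and whisper commands from this guide are on your PATH.
transcribe_video() {
    local video="$1"
    local base="${video%.*}"

    # Only transcribe if the audio extraction succeeded, and save
    # whatever whisper prints to <name>.txt next to the video.
    mp3 "$video" && whisper "${base}.mp3" > "${base}.txt"
}
```

Calling transcribe_video myvideo.mp4 would leave myvideo.mp3 and myvideo.txt in the same folder. The text file contains exactly what would otherwise have appeared in the terminal, which may include timestamps.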
Extra: Create a mic Command to Extract Just the Microphone Audio
If you record in OBS with multiple audio tracks (for example, one for game audio and one for your microphone), you can create a special command called mic that pulls out only the microphone audio (usually track 2).
This saves time and ensures you're always working from clean voice recordings for transcription.
Step 1: Create the script
Open a new script in kwrite:
kwrite ~/bin/mic
Then paste this:
#!/bin/bash
input="$1"
base="${input%.*}"
output="${base}.mic.mp3"
ffmpeg -i "$input" -map 0:a:1 -acodec libmp3lame -ab 192k "$output"
Here’s what the script does:
"$1" grabs the video file name you type in
It removes the extension from that name
Then it creates a new file name ending in .mic.mp3
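It uses ffmpeg with -map 0:a:1 to extract only audio track 2 (your mic) and convert it to MP3. ffmpeg counts audio tracks from 0, so 0:a:1 means the second audio track of the first input file.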
Save the file and close kwrite.
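If your microphone is not on track 2, the number after -map 0:a: in the script needs to change. The sketch below shows how an OBS track number translates to what ffmpeg expects, and how ffprobe (installed together with ffmpeg) can list the audio tracks in a recording. Both helper names are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: turn a 1-based audio track number (as shown in OBS)
# into ffmpeg's 0-based -map specifier.
track_to_map() {
    printf '0:a:%d\n' "$(( $1 - 1 ))"
}

# Hypothetical helper: list the audio streams in a file, one per line,
# so you can see how many tracks there are before choosing one.
list_audio_tracks() {
    ffprobe -v error -select_streams a \
        -show_entries stream=index,codec_name:stream_tags=title \
        -of csv=p=0 "$1"
}

track_to_map 1   # 0:a:0, the first audio track
track_to_map 2   # 0:a:1, the one the mic script uses
```

Running list_audio_tracks on one of your recordings should print one line per audio track. Whether a title appears on each line depends on how the recording was made.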
Step 2: Make it executable
To allow Linux to run the script like a command, type:
chmod +x ~/bin/mic
Step 3: Use it from any folder
As long as ~/bin is in your PATH (see earlier setup steps), you can run this from anywhere:
mic "2025-07-09 17-49-24.mkv"
It will create:
2025-07-09 17-49-24.mic.mp3
right in the same folder — just the microphone audio, ready for transcription.
Although this makes the process even more efficient, since I can now transcribe directly from the raw recording, ideally you want to transcribe from the final cut so the transcript accurately reflects what's said in the video you publish.