How to Extract MP3 and Transcribe Audio on Linux

Disclaimer: Use at your own risk!

This guide walks you through setting up two simple tools on your Fedora-based Linux system:

A custom mp3 command that extracts audio from videos
A custom whisper command that transcribes audio into text

Apart from ffmpeg, which is installed system-wide, everything goes into your personal folders to keep your system tidy and easy to maintain.

Part 1: Create a Custom MP3 Command

What we’re doing:

You’ll make your own command called mp3. After setting it up, you’ll be able to type:

mp3 your-video.mp4

…and it will create your-video.mp3 by extracting just the audio from the video.

Step 1: Install ffmpeg

ffmpeg is a free tool that can convert video and audio files.

To install it, open your terminal and type:

sudo dnf install ffmpeg

Press Enter, type your password if asked, then press Enter again. This installs ffmpeg from Fedora’s software library.
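Whether an installed ffmpeg can actually write MP3 files depends on how the package was built, so it may be worth a quick check before going further. The sketch below assumes the encoder shows up as libmp3lame in the output of ffmpeg -encoders; the helper name has_mp3_encoder is made up for this example.

```shell
# Hypothetical helper: reads an "ffmpeg -encoders" listing on stdin and
# succeeds if the LAME MP3 encoder (libmp3lame) appears in it.
has_mp3_encoder() {
    grep -q 'libmp3lame'
}

if ! command -v ffmpeg >/dev/null 2>&1; then
    echo "ffmpeg is not installed or not in PATH"
elif ffmpeg -hide_banner -encoders 2>/dev/null | has_mp3_encoder; then
    echo "ffmpeg is installed and can encode MP3"
else
    echo "ffmpeg is installed, but libmp3lame is not available"
fi
```

If the last message appears, the mp3 script in Step 3 will fail until an ffmpeg build with MP3 support is installed.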

Step 2: Make a personal folder for your commands

Your own commands should go into a folder called bin inside your home directory. If it doesn’t exist yet, create it:

mkdir -p ~/bin

This command creates the folder bin if it doesn’t already exist. ~ means your home folder. This is a safe place to store your scripts.

Step 3: Create your mp3 command

Now let’s make the mp3 script.

Open a new file in kwrite (or your favourite editor):

kwrite ~/bin/mp3

When kwrite opens, paste the following into the file:

#!/bin/bash
input="$1"
output="${input%.*}.mp3"
ffmpeg -i "$input" -vn -acodec libmp3lame -ab 192k "$output"

Here’s what all that means:

#!/bin/bash tells Linux to run this file with the bash shell
input="$1" stores the first thing you type after mp3, which is the filename
output="${input%.*}.mp3" sets the output name by removing the old extension and adding .mp3
ffmpeg -i "$input" runs ffmpeg with your file as the input
-vn tells ffmpeg to skip the video
-acodec libmp3lame selects the LAME MP3 encoder, so the output is an MP3
-ab 192k sets the audio bitrate to 192 kbit/s (a higher bitrate means better quality and a larger file)
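The ${input%.*} part is the least obvious piece of the script, so here is a small sketch you can run on its own to see what it produces. The function name output_name and the sample file names are made up for illustration; they are not part of the mp3 script.

```shell
# Hypothetical helper: applies the same expansion the mp3 script uses.
output_name() {
    # %.* removes the shortest trailing match of ".<anything>",
    # so only the last extension is dropped.
    printf '%s\n' "${1%.*}.mp3"
}

output_name "holiday.mp4"           # holiday.mp3
output_name "my video.final.mkv"    # my video.final.mp3
output_name "clips/intro.webm"      # clips/intro.mp3
```

Only the last extension is removed, and any folder in front of the name is kept, which is why the MP3 ends up next to the original video.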

Save and close the file.

Now make the script executable — this means you’re allowed to run it:

chmod +x ~/bin/mp3

You only need to do this once.

Step 4: Make sure Linux knows where to find your command

Linux looks in special folders to find commands. Let’s make sure your ~/bin folder is one of them.

Check your PATH by typing:

echo $PATH

If you see something like /home/yourname/bin in the list, you’re done.

If not, add it by editing your .bashrc file:

kwrite ~/.bashrc

Add this line at the bottom of the file:

export PATH="$HOME/bin:$PATH"

Save and close, then apply the changes by typing:

source ~/.bashrc

Now your mp3 command will work from anywhere. Try it:

mp3 myvideo.mp4

This will create myvideo.mp3 in the same folder as your video.
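One side effect of the export line from Step 4: each time .bashrc is sourced again in the same terminal, ~/bin is added to PATH one more time. That is harmless, but if you prefer a clean PATH, a guarded variant could look like the sketch below. The function name add_to_path is an assumption for this example, not something bash provides.

```shell
# Hypothetical helper: prepend directory $1 to PATH only if it is not
# already one of its colon-separated entries.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, do nothing
        *) PATH="$1:$PATH" ;;        # not present, put it in front
    esac
}

add_to_path "$HOME/bin"
export PATH
```

Wrapping PATH in colons on both sides means the pattern only matches whole entries, so /home/yourname/binaries would not be mistaken for /home/yourname/bin.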

Part 2: Install Whisper

What is Whisper?

Whisper is a speech recognition model from OpenAI that turns audio into text. whisper.cpp is a fast C/C++ port that runs entirely on your computer. Once the model is downloaded, no internet is needed.

We’ll install it into a projects folder in your home directory so it doesn’t clutter your system.

Step 1: Set up a projects folder

If you don’t already have one, make a folder to store your personal software:

mkdir -p ~/projects

Change into that folder:

cd ~/projects

Step 2: Download whisper.cpp

Type this command to download the program from GitHub:

git clone https://github.com/ggerganov/whisper.cpp

This creates a folder called whisper.cpp.

Go into that folder:

cd whisper.cpp

Now make a new folder inside for building the program:

mkdir build
cd build

Step 3: Build the program

This part compiles the program. It turns the source code into something your computer can run.

First, set up the build:

cmake .. -DCMAKE_BUILD_TYPE=Release

Then build it:

cmake --build . --config Release

This creates several programs in the build folder.
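The download and build steps quietly assume that git, cmake and a C++ compiler are already installed, which this guide has not covered. If one of the commands above stops with "command not found", a check along these lines may help. The helper name missing_tools is made up, and the package names in the hint are the usual Fedora ones.

```shell
# Hypothetical helper: print each given command name that cannot be found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools git cmake g++)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing build tools:" $missing
    echo "On Fedora these usually come from: sudo dnf install git cmake gcc-c++"
else
    echo "git, cmake and g++ are all available"
fi
```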

Step 4: Download the speech model

Whisper needs a model to know how to recognize words.

This command downloads the medium-sized model, which is multilingual and handles English as well (the English-only variant is called medium.en):

../models/download-ggml-model.sh medium

The download is about 1.5 GB.
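A download of that size can be interrupted, so it may be worth confirming that the file arrived before moving on. This sketch assumes the folder layout used in this guide and that the download script saves models as ggml-<name>.bin; the helper name model_file is made up.

```shell
# Assumed location, matching the folders used in this guide.
model_dir="$HOME/projects/whisper.cpp/models"

# Hypothetical helper: build the expected file path for a model name.
model_file() {
    printf '%s/ggml-%s.bin\n' "$model_dir" "$1"
}

model=$(model_file medium)
if [ -f "$model" ]; then
    echo "Model found: $model"
    du -h "$model"               # should be roughly the size quoted above
else
    echo "Model not found at $model"
fi
```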

Part 3: Make a whisper Command You Can Use Anywhere

Now let’s make a shortcut so you can type whisper myaudio.mp3 and Whisper will transcribe the file into text right in the terminal window.

Step 1: Create the script

Open kwrite:

kwrite ~/bin/whisper

Paste this into the file:

#!/bin/bash
~/projects/whisper.cpp/build/bin/whisper-cli -m ~/projects/whisper.cpp/models/ggml-medium.bin -f "$1"

Save and close.

Make it executable:

chmod +x ~/bin/whisper

Now you can type:

whisper yourfile.mp3

…and the transcription will appear in your terminal window, ready to copy.
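If you want plain text without the time stamps in front of each line, a small filter can strip them. This is only a sketch: it assumes each line starts with a stamp in the form [00:00:00.000 --> 00:00:04.000], and the function name strip_timestamps and the sample line are made up.

```shell
# Hypothetical filter: drop a leading "[hh:mm:ss.mmm --> hh:mm:ss.mmm]"
# stamp and the spaces after it from every line read on stdin.
strip_timestamps() {
    sed -E 's/^\[[0-9:.]+ --> [0-9:.]+\][[:space:]]*//'
}

# Demo on a made-up line in the assumed format:
printf '[00:00:00.000 --> 00:00:04.000]   Hello and welcome.\n' | strip_timestamps
```

With the function defined in your shell, something like whisper yourfile.mp3 | strip_timestamps > yourfile.txt would save the cleaned transcript to a file instead of printing it.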

That’s It!

You now have two simple, powerful tools:

mp3 to extract audio from any video
whisper to transcribe that audio into text

Everything is installed in your own folders, safely and cleanly. No extra software needed, no privacy concerns, just fast, local transcription on Linux.

Extra: Create a mic Command to Extract Just the Microphone Audio

If you're recording in OBS using multiple audio tracks, for example, one for game audio and one for your microphone, you can create a special command called "mic" that pulls out only the microphone audio (usually track 2).

This saves time and ensures you're always working from clean voice recordings for transcription.

Step 1: Create the script

Open a new script in kwrite:

kwrite ~/bin/mic

Then paste this:

#!/bin/bash
input="$1"
base="${input%.*}"
output="${base}.mic.mp3"
ffmpeg -i "$input" -map 0:a:1 -acodec libmp3lame -ab 192k "$output"

Here’s what the script does:

"$1" grabs the video file name you type in
It removes the extension from that name
Then it creates a new file name ending in .mic.mp3
It uses ffmpeg with -map 0:a:1 to extract only the second audio track (ffmpeg counts from 0, so track 2 is index 1) and convert it to MP3
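If your microphone sits on a different track, only the number after 0:a: needs to change. The sketch below shows the conversion from a track number counted from 1 to ffmpeg's zero-based index; the helper name map_for_track is made up for this example.

```shell
# Hypothetical helper: turn a 1-based audio track number into ffmpeg's
# zero-based stream specifier for the first input file.
map_for_track() {
    echo "0:a:$(( $1 - 1 ))"
}

map_for_track 1   # 0:a:0  (first audio track)
map_for_track 2   # 0:a:1  (second audio track, the mic in this guide)
```

To see which audio tracks a recording actually contains, running ffprobe -hide_banner on the file lists its streams.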

Save the file and close kwrite.

Step 2: Make it executable

To allow Linux to run the script like a command, type:

chmod +x ~/bin/mic

Step 3: Use it from any folder

As long as ~/bin is in your PATH (see earlier setup steps), you can run this from anywhere:

mic "2025-07-09 17-49-24.mkv"

It will create:

2025-07-09 17-49-24.mic.mp3

right in the same folder — just the microphone audio, ready for transcription.

This makes the process even more efficient, because you can transcribe directly from the raw recording. Ideally, though, you should transcribe from the final cut, so the transcript accurately reflects what is said in the video you publish.