Getting started with AI music using AudioCraft (why this is exciting)
AI music generation has become practical, fast, and genuinely creative.
With Meta’s AudioCraft (and its MusicGen models), you can go from a short text prompt — or even a guiding melody — to a polished audio sample in minutes.
This post walks you through a reliable setup, a browser-based workflow, and a melody-guided two-step process you can reuse for any style.
You will learn how to set up your environment and explore the capabilities of AudioCraft’s AudioGen and MusicGen models.
Instructions for this video: https://ko-fi.com/s/1cf3103014
More relevant content in this playlist: https://www.youtube.com/playlist?list=PLdkryDe59y4bxVvpexwR6PMTHH6_vFXjA
Check out our tutorial here: https://www.youtube.com/watch?v=zrDIY-JqNrU
Introduction
We’re going to set up AudioCraft locally, so you can run the MusicGen web app on your machine.
You’ll have full control over CUDA acceleration, ffmpeg, and dependencies, without mystery black boxes.
Then we’ll use two short workflows: a quick 10-second prompt-only generation to confirm everything works, and a melody-guided 30-second generation to style a rock track using a Bach clip as the seed.
By the end, you’ll have a dependable, repeatable process to turn short ideas into vivid AI music.
Environment setup and cloning AudioCraft
This section creates your conda environment, checks your CUDA toolkit, installs PyTorch with CUDA, and clones the AudioCraft repo.
After this, you’ll be ready to install the Python requirements and supporting tools like ffmpeg.
### Create a clean conda environment for AudioCraft work.
conda create --name audiocraft python=3.10.6

### Activate the new environment so all installs land in the right place.
conda activate audiocraft

### (Optional) Verify your local CUDA toolkit version for awareness/debugging.
nvcc --version

### Install PyTorch with CUDA 11.7 support via official channels.
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

### Fetch the AudioCraft source so you can run the MusicGen web UI locally.
git clone https://github.com/facebookresearch/audiocraft.git

### Enter the repository to install dependencies and run demos.
cd audiocraft
You’re isolating dependencies inside a specific environment so the rest of your system stays clean.
Using the pytorch and nvidia channels aligns versions of CUDA-enabled packages with PyTorch.
Cloning the repo ensures you have the demos and assets locally for the web UI.
Keeping the CUDA check optional keeps the flow lightweight but debuggable.
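Once the environment is built, a quick sanity check confirms PyTorch can actually see your GPU. This is a stdlib-guarded sketch (the `cuda_status` helper is our own name, not part of AudioCraft); it degrades gracefully if torch isn't installed yet:

```python
import importlib.util


def cuda_status() -> str:
    """Report whether PyTorch sees a CUDA device in the active environment."""
    if importlib.util.find_spec("torch") is None:
        return "torch not installed"
    import torch  # imported lazily so the check works even before install
    return "cuda available" if torch.cuda.is_available() else "cpu only"


print(cuda_status())
```

If this prints "cpu only", generation will still work but will be far slower; revisit the driver and PyTorch install before moving on.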
Installing Python requirements and ffmpeg
AudioCraft relies on specific Python libraries and ffmpeg for audio processing.
We’ll install the package in editable mode and add ffmpeg from conda-forge for convenience and stability.
### Install AudioCraft and its Python dependencies in editable mode.
pip install -e .

### Install ffmpeg so the web UI and rendering work as expected.
conda install -c conda-forge ffmpeg
pip install -e . gives you a developer-friendly install that tracks local changes.
ffmpeg is critical for decoding, encoding, and transforming audio—don’t skip this.
Using conda-forge provides a straightforward, well-maintained ffmpeg build.
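Because a missing ffmpeg often surfaces only later as silent or broken exports, it's worth verifying it is on PATH right after install. A stdlib-only sketch (the `tool_on_path` helper is our own naming):

```python
import shutil


def tool_on_path(name: str) -> bool:
    """Return True if a command-line tool (e.g. ffmpeg) is discoverable on PATH."""
    return shutil.which(name) is not None


print("ffmpeg found" if tool_on_path("ffmpeg") else "ffmpeg missing: install it from conda-forge")
```

Run this inside the activated conda environment, since conda-forge installs ffmpeg into the environment's own bin directory.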
Quick summary of what you’ll see:
A local URL will appear in the terminal — copy it into your browser.
You’ll get a simple interface to choose models, set durations, and add prompts.
You can run quick tests with short durations, then scale up once you’re happy with the style.
If you hit any errors, double-check that your environment is active and ffmpeg is installed.
A simple prompt-only test to validate your setup
We’ll start with a fast 10-second generation to confirm everything is working end-to-end.
This is perfect for quick iterations on style, timbre, and mood before committing to longer renders.
Short generations return faster, help you dial in the prompt wording, and reduce wasted GPU time.
Once you like the color of the sound, you can extend the duration.
In the web UI:
Choose the melody model for flexibility even without a melody input.
Set duration to 10 seconds.
Add a prompt such as: quiet medieval music, harp and flute
Generate and listen.
If it sounds in the ballpark, save it as a baseline.
Why this step matters:
Short runs surface potential issues quickly (driver/version mismatches, missing ffmpeg, GPU memory constraints).
You’ll learn how much the prompt phrasing affects instrumentation, space, and dynamics.
Ten seconds is enough to judge a vibe without waiting long.
Keep a notes file of prompts you like for reuse.
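If you prefer scripting over the web UI, the same 10-second baseline can be produced with AudioCraft's Python API. A minimal sketch, assuming audiocraft is installed via pip install -e . and the model weights download on first run (the `generate_baseline` wrapper and its defaults are our own):

```python
def generate_baseline(prompt: str, duration: int = 10, out_name: str = "baseline") -> str:
    """Sketch: prompt-only MusicGen generation mirroring the web-UI settings.

    Assumes audiocraft is installed in the active environment; the first
    call downloads the pretrained weights.
    """
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained("facebook/musicgen-melody")
    model.set_generation_params(duration=duration)
    wav = model.generate([prompt])  # one waveform per prompt in the batch
    # audio_write appends the file extension and applies loudness normalization.
    audio_write(out_name, wav[0].cpu(), model.sample_rate, strategy="loudness")
    return out_name


# Example (uncomment to run; a CUDA GPU makes this much faster):
# generate_baseline("quiet medieval music, harp and flute")
```

Scripting makes it easy to loop over a list of candidate prompts and keep the outputs side by side for comparison.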
Melody-guided generation for stylistic control
Next, we’ll guide a longer generation with a reference melody.
We’ll use a Bach audio clip from the repo’s assets folder to steer the phrasing and harmonic feel toward a rock style.
This combination—classical seed + rock prompt—often yields inspiring contrasts.
You can replace the Bach clip with your own melody later.
In the web UI:
Set duration to 30 seconds.
Select the Bach audio file from the assets folder.
Use a descriptive text prompt like: Rock music from the 90s
Generate and compare with your 10-second baseline.
Tips for better melody results:
Your melody clip shapes rhythm, contour, and groove — pick one that represents the feel you want.
Prompts still matter: instrumentation and era cues (e.g., “90s rock,” “analog synthwave,” “lo-fi jazz trio”) steer the production.
Try small prompt changes first (e.g., “punchy drums,” “clean guitar arpeggios,” “wide stereo”).
Iterate with 15–30 second clips before rendering full-length pieces.
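The melody-guided run can also be scripted. A sketch assuming audiocraft and torchaudio are installed and you run it from the repo root so assets/bach.mp3 resolves (the `generate_melody_guided` wrapper is our own name):

```python
def generate_melody_guided(prompt: str, melody_path: str = "assets/bach.mp3",
                           duration: int = 30, out_name: str = "melody_guided") -> str:
    """Sketch: melody-guided MusicGen generation with a reference clip."""
    import torchaudio
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained("facebook/musicgen-melody")
    model.set_generation_params(duration=duration)
    melody, sr = torchaudio.load(melody_path)
    # generate_with_chroma conditions the output on the reference melody's
    # chroma features as well as the text prompt.
    wav = model.generate_with_chroma([prompt], melody[None], sr)
    audio_write(out_name, wav[0].cpu(), model.sample_rate, strategy="loudness")
    return out_name


# Example (uncomment to run):
# generate_melody_guided("Rock music from the 90s")
```

Swapping melody_path for one of your own clips is the natural next experiment once the Bach seed works.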
Troubleshooting and performance notes (optional but handy)
If the app doesn’t start, confirm you’re inside the audiocraft directory and your conda env is active.
If CUDA isn’t detected, ensure your GPU drivers are up to date and the PyTorch build matches your CUDA runtime.
If audio exports are silent or broken, re-install ffmpeg from conda-forge in the same environment.
If you face VRAM limits, lower sample rate or duration, or try smaller models first.
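When debugging version mismatches, it helps to print exactly what is installed in the active environment. A stdlib-only sketch (the `report_versions` helper is our own):

```python
from importlib import metadata


def report_versions(pkgs=("torch", "torchaudio", "audiocraft")) -> list:
    """List installed versions of the packages that most often mismatch."""
    lines = []
    for pkg in pkgs:
        try:
            lines.append(f"{pkg}=={metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            lines.append(f"{pkg}: not installed")
    return lines


print("\n".join(report_versions()))
```

Include this output in any bug report or forum question: it answers the first question anyone will ask.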
Summary of the core workflow
Create a clean conda environment and install PyTorch with CUDA 11.7.
Clone AudioCraft, install dependencies in editable mode, and add ffmpeg.
Launch the MusicGen web UI and test a quick 10-second prompt generation.
Move to melody-guided 30-second clips to lock in style — then scale up.
### Create and activate the environment.
conda create --name audiocraft python=3.10.6
conda activate audiocraft

### Optional: check your CUDA toolkit.
nvcc --version

### Install PyTorch with CUDA 11.7 support.
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

### Clone and enter AudioCraft.
git clone https://github.com/facebookresearch/audiocraft.git
cd audiocraft

### Install AudioCraft (editable) and ffmpeg.
pip install -e .
conda install -c conda-forge ffmpeg

### Run the MusicGen web UI and open the shown URL in your browser.
python -m demos.musicgen_app
Post-run checklist:
Browser opens the local UI successfully.
10-second prompt-only generation completes and plays.
30-second melody-guided generation completes with the Bach clip.
Saved outputs are audible and match your creative intent.
Connect:
☕ Buy me a coffee — https://ko-fi.com/eranfeit
🖥️ Email : feitgemel@gmail.com
🤝 Fiverr : https://www.fiverr.com/s/mB3Pbb
Enjoy,
Eran