ACX Audio Requirements for AI Audiobooks: Complete Spec Guide

If your audiobook submission gets rejected by a distributor, the most common reason isn't the narration quality — it's a technical spec violation you could have caught in five minutes. RMS levels, noise floor, bit rate, sample rate: these numbers trip up first-time producers constantly, and when you're working with AI-generated audio, understanding them becomes even more important because you're often the one doing final quality control instead of a seasoned studio engineer.

This guide covers every ACX audio requirement you need to know, what each spec actually means in plain English, and how to verify your files before you submit — whether you're recording yourself, hiring a narrator, or generating audio with an AI platform.

What ACX Is and Why Its Specs Matter

ACX (Audiobook Creation Exchange) is Amazon's audiobook production and distribution marketplace. Books produced through ACX are sold on Audible, Amazon, and iTunes — three of the largest audiobook retail channels in the world. When you submit a finished audiobook to ACX, every audio file goes through an automated quality check before a human reviewer ever listens to it. Fail the technical check, and your book gets bounced back regardless of how good the narration sounds.

The ACX audio requirements in full: RMS loudness between -23 and -18 dBFS, peak level at or below -3 dBFS, noise floor at or below -60 dBFS, 44.1 kHz sample rate, and MP3 format at 192 kbps or higher with constant bit rate (CBR) encoding. Every chapter file must meet all of these simultaneously. Miss one, and the whole submission fails.

Understanding these specs isn't just bureaucratic box-ticking. They exist because Audible streams audio across wildly different listening environments — earbuds on a subway, car speakers, home theater systems. Standardized loudness and dynamic range ensure listeners don't have to adjust their volume every time a new chapter starts.

The ACX Audio Requirements Explained Spec by Spec

RMS Loudness: -23 to -18 dBFS

RMS (Root Mean Square) measures the average loudness of your audio over time. Think of it as a rough proxy for the "perceived volume" of the narration: not the loudest moment, but the sustained energy across the whole file.

ACX requires your RMS to land between -23 dBFS and -18 dBFS. Too quiet (below -23) and listeners struggle to hear it at normal volume. Too loud (above -18) and the audio sounds compressed and fatiguing. Most rejections happen because files are too quiet, a common issue when AI-generated audio hasn't been normalized or when home recordings are made at overly cautious gain settings.

If you're checking your files manually, tools like Adobe Audition, Audacity, or iZotope RX all have loudness analysis features that will show you RMS values per file.
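As a rough illustration of what those meters compute, here is a minimal sketch of an RMS measurement in dBFS. It assumes the audio has already been decoded to floating-point samples between -1.0 and 1.0, and it uses the convention where a full-scale square wave reads 0 dBFS; some meters reference a full-scale sine instead and read about 3 dB higher. The function names are illustrative, not taken from any particular tool.

```python
import math

def rms_dbfs(samples):
    """RMS level in dBFS for float samples in the range [-1.0, 1.0]."""
    if not samples:
        raise ValueError("no samples to measure")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    # 10*log10 of the mean square equals 20*log10 of the RMS amplitude.
    return 10 * math.log10(mean_square)

def in_acx_rms_window(level_db, low=-23.0, high=-18.0):
    """True when the level sits inside the -23 to -18 dBFS window."""
    return low <= level_db <= high

# Stand-in for narration: one second of a 440 Hz sine at amplitude 0.14.
rate = 44100
tone = [0.14 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
level = rms_dbfs(tone)
print(round(level, 1), in_acx_rms_window(level))
```

The test tone lands at roughly -20.1 dBFS, near the middle of the window. Real narration has pauses and dynamics, so measure whole chapter files rather than short excerpts.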

Peak Level: At or Below -3 dBFS

Peak level is the loudest single moment in your audio — a sharp consonant, a dramatic shout, or a breath that got too close to the microphone. ACX requires this to stay at or below -3 dBFS, leaving headroom so the audio doesn't clip or distort during playback or encoding.

Treat this as a "true peak" target, meaning one that accounts for intersample peaks that can appear during MP3 encoding. A file that reads -3 dBFS on a sample-peak meter in your DAW can exceed that level after encoding if you're not using a true peak limiter. Most professional mastering chains include one; if you're doing this yourself, make sure your limiter is set to true peak mode.
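To make the measurement concrete, the sketch below computes a plain sample peak in dBFS and the gain trim needed to get under the ceiling. It is deliberately not a true peak meter: true peak metering oversamples the signal to catch intersample peaks, which this example does not attempt. The helper names are hypothetical.

```python
import math

def peak_dbfs(samples):
    """Sample peak in dBFS for float samples in the range [-1.0, 1.0]."""
    peak = max(abs(s) for s in samples)
    return float("-inf") if peak == 0 else 20 * math.log10(peak)

def trim_needed_db(peak_db, ceiling_db=-3.0):
    """Gain change in dB (zero or negative) that brings the peak to the ceiling."""
    return min(0.0, ceiling_db - peak_db)

# A tiny made-up buffer whose loudest sample is 0.9 of full scale.
samples = [0.0, 0.5, -0.9, 0.3]
measured_peak = peak_dbfs(samples)
trim = trim_needed_db(measured_peak)
print(round(measured_peak, 2), round(trim, 2))
```

A plain gain trim like this lowers the RMS by the same amount, which is why mastering chains usually reach for a limiter instead: it tames the few loud peaks without pulling the whole file down.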

Noise Floor: At or Below -60 dBFS

The noise floor is the level of background sound when no one is speaking — the hiss of a room, the hum of a computer fan, the faint rumble of traffic outside. ACX requires this to be at or below -60 dBFS, which is a fairly demanding standard.

For AI-generated audio, this is typically a non-issue since the audio is synthesized rather than recorded in a physical space. However, if you're doing any post-processing — adding room tone, blending with music, or layering effects — you need to ensure that processing doesn't introduce noise above the threshold.

For human recordings, this is often the hardest spec to meet. ACX's own help documentation recommends treating your recording space and using a high-quality interface to keep the noise floor clean.
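One way to estimate a noise floor is to find the quietest stretch of the file and measure its RMS. The sketch below does that with half-second windows. The window length and the idea that the quietest window is pure room tone are assumptions of this example, and the synthetic signal stands in for a real recording.

```python
import math

def window_levels_dbfs(samples, window):
    """RMS level in dBFS of each consecutive, non-overlapping window."""
    levels = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        mean_square = sum(s * s for s in chunk) / window
        levels.append(float("-inf") if mean_square == 0
                      else 10 * math.log10(mean_square))
    return levels

def noise_floor_dbfs(samples, rate=44100, window_seconds=0.5):
    """Estimate the noise floor as the quietest window's RMS level."""
    levels = window_levels_dbfs(samples, int(rate * window_seconds))
    if not levels:
        raise ValueError("audio is shorter than one window")
    return min(levels)

# Synthetic demo at 8 kHz: half a second of "speech" (a sine tone),
# then half a second of very quiet hiss at amplitude 0.0005.
rate = 8000
speech = [0.14 * math.sin(2 * math.pi * 220 * n / rate) for n in range(rate // 2)]
hiss = [0.0005 if n % 2 == 0 else -0.0005 for n in range(rate // 2)]
floor = noise_floor_dbfs(speech + hiss, rate=rate)
print(round(floor, 1), floor <= -60.0)
```

If a chapter has no true pause of at least one window, this estimate will read higher than the actual room noise, so leave a little clean room tone at the head and tail of each file.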

Sample Rate: 44.1 kHz

Sample rate determines how many audio samples are captured per second. 44.1 kHz is the CD standard and the only sample rate ACX accepts. If you record or export at 48 kHz (common in video production) or 96 kHz (common in high-resolution audio workflows), you need to convert to 44.1 kHz before submission.

This conversion is straightforward in any audio software, but it needs to happen before your final loudness mastering: resampling after normalization can slightly alter your peak and RMS values.
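A quick way to catch a 48 kHz master before it reaches the mastering stage is to read the WAV header. The sketch below uses Python's standard wave module, which handles uncompressed PCM WAV files. It writes a short silent file only so the example can run on its own, and the file name is made up.

```python
import os
import tempfile
import wave

def wav_format(path):
    """Read sample rate, channel count, and bit depth from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bit_depth": wav.getsampwidth() * 8,
        }

def needs_resample(path, target_rate=44100):
    """True when the file must be converted to the target rate."""
    return wav_format(path)["sample_rate"] != target_rate

# Demo: create 10 ms of 16-bit mono silence at 48 kHz, then inspect it.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "chapter_01.wav")
    with wave.open(demo_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(48000)
        wav.writeframes(b"\x00\x00" * 480)
    info = wav_format(demo_path)
    resample = needs_resample(demo_path)

print(info, resample)
```

Running a check like this over every master before normalization keeps the order of operations right: resample first, then set loudness.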

File Format: MP3, 192 kbps CBR, Mono Recommended

ACX requires MP3 files at 192 kbps or higher, using constant bit rate (CBR) encoding. Variable bit rate (VBR) MP3s are not accepted even if the average bit rate is higher. ACX's published submission requirements accept mono or stereo as long as every file in the book uses the same channel format, and they strongly recommend mono for narration.

The mono recommendation surprises a lot of first-time producers. Stereo narration wastes file size and can cause phase issues on certain playback systems. Convert to mono before loudness mastering, not after: summing stereo to mono late in the chain can cause level changes that throw off your RMS.
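If you export from the command line, the format settings map onto a handful of ffmpeg options. The sketch below only builds the argument list and does not run anything; it assumes an ffmpeg build that includes the libmp3lame encoder, and the file names are placeholders. It also shows why CBR file sizes are predictable from duration alone.

```python
def mp3_export_command(source_wav, output_mp3, bitrate_kbps=192, mono=True):
    """Build an ffmpeg argument list for a 44.1 kHz CBR MP3 export."""
    if bitrate_kbps < 192:
        raise ValueError("ACX asks for 192 kbps or higher")
    command = ["ffmpeg", "-i", source_wav, "-ar", "44100"]  # resample to 44.1 kHz
    if mono:
        command += ["-ac", "1"]  # downmix to a single channel
    # Giving libmp3lame a fixed -b:a (and no -q:a) requests constant bit rate.
    command += ["-codec:a", "libmp3lame", "-b:a", f"{bitrate_kbps}k", output_mp3]
    return command

def cbr_size_megabytes(duration_seconds, bitrate_kbps=192):
    """Approximate audio payload size of a CBR file, in megabytes."""
    return duration_seconds * bitrate_kbps * 1000 / 8 / 1_000_000

command = mp3_export_command("chapter_01.wav", "02_chapter_01.mp3")
half_hour_mb = cbr_size_megabytes(30 * 60)
print(" ".join(command))
print(half_hour_mb)
```

At 192 kbps a 30-minute chapter comes to about 43 MB. Pass the list to subprocess.run once you have confirmed the options against your own ffmpeg version.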

File Structure and Submission Requirements

Beyond the audio specs, ACX has structural requirements for how your project is assembled:

Opening credits file: A short file (typically 30–60 seconds) that includes the book title, author name, and narrator name. This is required and must meet the same audio specs as all other files.
Closing credits file: Must include the copyright year, publisher (or "published by the author"), and a statement that the audiobook was produced by ACX.
Retail audio sample: A 1–5 minute excerpt from the book, separate from the chapter files, used as the preview on Audible.
Chapter files: Each chapter as a separate MP3. ACX does not accept a single combined file for the full book.
File naming: ACX recommends clear sequential naming (e.g., 01_opening_credits.mp3, 02_chapter_01.mp3) to avoid confusion during review.

Each individual file must meet the loudness, peak, and noise floor specs independently. ACX measures every file on its own, so a chapter with a long, very quiet passage can fail if that passage drags the file's overall RMS below -23 dBFS, even when the rest of the book sits comfortably in range.
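The sequential naming pattern is easy to generate rather than type by hand. This is a small sketch that follows the example names above; the exact pattern is this guide's suggestion, not a format ACX enforces, and the function name is hypothetical.

```python
def acx_file_names(chapter_count):
    """Return sequential MP3 names: opening credits, chapters, closing credits."""
    parts = ["opening_credits"]
    parts += [f"chapter_{n:02d}" for n in range(1, chapter_count + 1)]
    parts.append("closing_credits")
    # Pad the running index so files sort correctly even past 99 entries.
    width = max(2, len(str(len(parts))))
    return [f"{index:0{width}d}_{part}.mp3"
            for index, part in enumerate(parts, start=1)]

names = acx_file_names(3)
for name in names:
    print(name)
```

The retail audio sample is kept out of the sequence on purpose, since it is uploaded separately from the chapter files.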

AI Audiobooks and ACX: What You Need to Know

Here's the part that matters if you're planning to use AI narration: ACX currently prohibits AI-narrated audiobooks on its platform. Their guidelines require human performance, and submissions disclosing AI narration are not approved. This is a firm policy as of 2025, not a gray area.

That doesn't mean AI audio tools have no role in your workflow. Many authors use AI platforms to create a high-quality demo or proof-of-concept before hiring a human narrator, to check pacing and pronunciation before committing to a full recording session, or to produce audiobooks for distribution channels that do allow AI narration.

Platforms like Findaway Voices (which distributes to Spotify, Hoopla, and hundreds of other retailers) allow AI-narrated content with proper disclosure. Draft2Digital and PublishDrive also distribute to channels where AI narration is permitted. For many indie authors, these channels reach a substantial audience without the ACX restriction.

If you want to understand the full landscape of how to take a manuscript all the way to a finished audiobook — including which platforms accept AI narration and how to manage the production process — the ACX audio requirements section of our complete AI audiobook guide walks through every step in detail.

Common Rejection Reasons and How to Fix Them

Based on the most frequent ACX submission failures, here are the issues to check before you hit submit:

RMS too low: Narration was recorded or generated at conservative levels and never normalized. Fix with a loudness normalization pass targeting -20 dBFS RMS as a safe midpoint.
Missing opening or closing credits: These files are required and are often forgotten. Create them before you start mastering chapter files so they go through the same processing chain.
Stereo files mixed in with mono ones: Export settings weren't changed from the software default for some files. Always set your export to mono explicitly, for every file in the book.
VBR instead of CBR encoding: Some export presets default to VBR. In Audacity, select "Constant" under bit rate mode. In Adobe Audition, choose "Constant" in the MP3 export settings.
Sample rate mismatch: Files recorded at 48 kHz and exported without resampling. Check your project settings before recording begins.
Noise floor violations in human recordings: Background noise from HVAC, street noise, or computer fans. Treat the recording space, use a noise gate, or apply noise reduction in post — but apply it before loudness normalization.
True peak clipping after encoding: Peaks measured at exactly -3 dBFS in the DAW can exceed that after MP3 encoding. Set your true peak limiter to -3.5 dBFS to create a small safety margin.
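The first and last fixes in the list interact: raising a quiet file toward -20 dBFS RMS raises its peaks by the same amount. The arithmetic can be sketched as below. The measured values are made-up examples, and the function names are illustrative.

```python
def normalization_plan(rms_db, peak_db, target_rms_db=-20.0, peak_ceiling_db=-3.5):
    """Work out the gain needed to hit the RMS target and whether peaks overshoot."""
    gain_db = target_rms_db - rms_db
    peak_after_gain_db = peak_db + gain_db  # plain gain moves peaks one-for-one
    return {
        "gain_db": gain_db,
        "peak_after_gain_db": peak_after_gain_db,
        "needs_limiter": peak_after_gain_db > peak_ceiling_db,
    }

def apply_gain(samples, gain_db):
    """Scale float samples by a gain expressed in dB."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

# Hypothetical quiet chapter: RMS -27.5 dBFS, peak -9.0 dBFS.
quiet = normalization_plan(rms_db=-27.5, peak_db=-9.0)
# Hypothetical chapter with more headroom: RMS -24.0 dBFS, peak -12.0 dBFS.
roomy = normalization_plan(rms_db=-24.0, peak_db=-12.0)
print(quiet)
print(roomy)
```

The first file needs 7.5 dB of gain, which would push its peak to -1.5 dBFS, so a limiter has to follow the gain stage. The second needs only 4 dB and stays under the ceiling without one.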

Verifying Your Files Before Submission

Don't rely on your ears alone. Use a metering tool to confirm every file meets spec before uploading. Free options include:

Audacity's ACX Check plugin: Specifically designed for ACX submissions, it reads RMS, peak, and noise floor in one pass
Youlean Loudness Meter: Free VST/AU plugin with detailed LUFS and RMS readouts
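ffmpeg with the volumedetect or ebur128 filter: Command-line tool for batch analyzing large numbers of files (volumedetect reports mean and maximum volume in dB, ebur128 reports LUFS loudness)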

For a full novel (typically 8 to 12 hours of audio across 30 or more chapter files), batch processing and verification are worth the setup time. Checking 35 files manually is how mistakes get missed.
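For batch checks, one option is ffmpeg's volumedetect filter, which prints mean_volume and max_volume lines for each file (for example via ffmpeg -i chapter.mp3 -af volumedetect -f null -). The sketch below parses that output and flags files outside the RMS and peak limits. The log text is hard-coded, made-up sample data so the example runs without ffmpeg, and the file names are placeholders. Note that volumedetect reports a sample peak and says nothing about noise floor, so this covers only part of the spec.

```python
import re

# Matches lines such as "... mean_volume: -20.4 dB" in ffmpeg's log output.
VOLUME_LINE = re.compile(r"(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?) dB")

def parse_volumedetect(log_text):
    """Extract mean_volume and max_volume (in dB) from volumedetect output."""
    return {key: float(value) for key, value in VOLUME_LINE.findall(log_text)}

def check_levels(stats, rms_range=(-23.0, -18.0), peak_max=-3.0):
    """Return a list of human-readable problems; an empty list means a pass."""
    problems = []
    low, high = rms_range
    if not low <= stats["mean_volume"] <= high:
        problems.append(f"RMS {stats['mean_volume']} dB outside {low} to {high}")
    if stats["max_volume"] > peak_max:
        problems.append(f"peak {stats['max_volume']} dB above {peak_max}")
    return problems

# Illustrative, invented log excerpts keyed by file name.
SAMPLE_LOGS = {
    "02_chapter_01.mp3": (
        "[Parsed_volumedetect_0 @ 0x0] mean_volume: -20.4 dB\n"
        "[Parsed_volumedetect_0 @ 0x0] max_volume: -3.6 dB\n"
    ),
    "03_chapter_02.mp3": (
        "[Parsed_volumedetect_0 @ 0x0] mean_volume: -26.1 dB\n"
        "[Parsed_volumedetect_0 @ 0x0] max_volume: -8.2 dB\n"
    ),
}

report = {name: check_levels(parse_volumedetect(log))
          for name, log in SAMPLE_LOGS.items()}
for name, problems in report.items():
    print(name, "PASS" if not problems else problems)
```

In a real run you would capture ffmpeg's stderr for each chapter file and feed it to parse_volumedetect, then follow up with a separate noise floor check.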

StoryVox exports ACX-compliant MP3s by default — 192 kbps CBR, 44.1 kHz, mono — so the format side of the spec is handled automatically, leaving you to focus on loudness and quality review.

Getting your audio specs right the first time isn't a technical luxury — it's the difference between your book going live this week and spending another two weeks in revision cycles waiting for resubmission review.