Forget text-to-speech for your on-board announcements: pre-recorded audio is better.

Turn up the volume on pre-recorded audio.

By John Maglio, President ETA Transit Systems

Full disclosure: I’ve always disliked text-to-speech (TTS). It’s a wholly impersonal means of conveying critical information to passengers. From its sterile, robotic pronunciation to its complete lack of support for local dialects and multiple languages, TTS is a poor way to connect with your riders, and it fails to measure up to the benefits of pre-recorded audio.

I do understand the arguments for the use of TTS:

  • Convenient: You can quickly input a message and have it announced
  • Real-time announcements: Just type and go
  • Economical: No need to invest in high-capacity hard drives

And truth be told, I’d rather have a text-to-speech announcement system than nothing at all. It sure beats asking my drivers to take their eyes off the road to make a manual announcement over the loudspeaker. Safety, after all, is job one.

But here’s the rub. Text-to-speech systems do nothing to build your agency’s brand or convey to riders the nuance of local life.

  • Poor quality: No synthesized voice ever sounds as clear as a genuine human voice
  • Added maintenance: It takes a lot of effort to tweak pronunciation engines to correctly enunciate words and phrases

It’s part of my job to ride public transit systems, and I sometimes must stifle a laugh when I hear the text-to-speech audio botch the pronunciation of a stop.

What isn’t funny is watching passengers cringe at the error. It’s not hard to see that it bothers them, or that they strain to understand what the message says. That’s a disconnect of brand, value, and sense of community, and ultimately it hurts the relationship between agency and customer.

Pre-recorded audio enhances the agency’s relationship with its customers

Your riders are just as important to the lifeblood of your operations as your agency is to their ability to get from point A to point B. It’s an interdependent partnership, and it’s worth the effort to ensure the best possible transit experience. Adding value to that experience is straightforward and can be as simple as pronouncing the names of key streets and landmarks correctly. It’s a matter of local pride; it shows investment in the community.

Pre-recorded audio is a solution to the shortfalls of text-to-speech systems, and it’s one I firmly believe provides transit agencies with greater options to engage customers and deliver a highly localized experience.

Pre-recorded audio is more economical than you’ve been led to believe

The knock on pre-recorded audio has long centered around two issues:

  • Expensive: Audio generates large file sizes and requires high-capacity hard drives
  • Long turnaround/production time: Audio recordings are manually processed and require coordination between voice talent and production studios

Cost is never an issue to be taken lightly. In the specific case of file size, however, the concern about custom audio is rooted in decade-old thinking. The truth is that modern high-capacity hard drives are cheap.

Pre-recorded audio is faster to implement than you may think

The other argument, turnaround time, remains a valid concern. However, I would offer that it is not as large an obstacle as you may think. Agencies with an established relationship with an audio production facility often have high-priority recordings turned around by the next business day, sometimes sooner.

When you think about it, how often do you really need to generate a real-time on-board message? The truth is that it’s rare, usually relating to a temporary traffic delay or an emergency situation. By the next day, or even an hour later, that situation is no longer an issue. In most cases, your traveler information systems (e.g., bus-tracking websites, mobile apps, and station and on-board signage) have already pushed notifications out to your passengers and met the immediate need a TTS solution would otherwise fill.

In practice, most text-to-speech announcements serve long-standing purposes, such as permanent stops or repetitive alerts. Any change to routes or stations comes with a significant amount of planning and forethought, so why not make the effort to do it right?

I’ve been in the transit space since the early aughts. In all my time delivering transit technology to agencies across the country, I am hard-pressed to recall many instances that truly merited real-time text-to-speech capabilities. With the right partnership between agency and recording studio, and a thoughtful selection of local talent, custom audio can easily be produced and turned around the same day.

Use pre-recorded audio to span geographic and ethnic boundaries

Pre-recorded audio gives you the chance to deliver precisely the tone and personality you want when communicating with your passengers. Limitations specific to TTS engines (voice gender, dialect, multilingual support, and emotion) all disappear with custom recorded audio.

Recently, we implemented a custom-audio solution for South Florida’s Tri-Rail system that provided not one but three distinct sets of custom recordings (English, Spanish, and Creole) to better serve regular customers and tourists. The customer chose its voice talent, directed the proper enunciation, and approved the final audio before deployment, a process that proved very popular with riders, visitors to the region, and employees alike.

Compare the quality and options for the following announcement:

“Hillsboro Boulevard and NW 3rd Avenue Martin Luther King Junior Avenue, transfer location Broward County Transit Route 50”

Text-to-speech version

Recorded audio (English)

Recorded audio (Spanish)

Recorded audio (Creole)

But what about voice talent?

What if the person who recorded your initial audio moves or is no longer available? It’s not an unreasonable question, and it reflects a persistent challenge in maintaining a consistent audio library. In the nearly 20 years that I’ve advocated for custom audio, this issue has cropped up from time to time.

Our solution has been to re-record all audio for our transit customers at no additional charge. It’s not a sales pitch, nor is it a questionable business decision, at least not if providing the best available product for our clients is the end goal. Lifetime recording is factored into the initial budget, and truthfully, we plan on changing out audio every couple of years as routes are added, dropped, and altered. Still, delivering the highest-quality experience for passengers is the driving factor behind the decision; when quality of engagement and an enhanced rider experience are the goal, it’s worth it to us to provide this level of service.

Isn’t custom audio more expensive?

The answer truly depends on how one views the cost of a product or service. There’s the up-front cost of doing something, and then there’s the long-term cost; there’s the cost of having something done right, and the cost of having to repeatedly redo something because quality was sacrificed in favor of a lower initial investment.

What is the cost of having 100 audio files produced correctly over the span of a day or two, versus the cost of paying an employee to manually adjust and tweak the pronunciation of 100 text-to-speech files? How much does that really cost once you factor in employee time and lost productivity on other critical job functions?

Consider an everyday analogy: Have you ever bought cheaper paper towels to cut costs at the grocery store? Did you notice that those cheaper towels weren’t as absorbent, so you used twice as many to clean up a spilled glass of milk, or that it took twice as long to clean up the mess?

Is the up-front cheaper cost really a savings if it takes more time to complete, or results in a less efficient solution?

Pre-recorded audio delivers on its potential

Pre-recorded audio delivers a better end product in terms of clarity, quality, and options for transit agencies looking to build long-term brand value and deliver a better rider experience. Many of the factors that initially contributed to the rise of TTS (cost and turnaround) simply aren’t as relevant as they once were.

It’s not that text-to-speech is a bad solution; it’s just not as good a solution as custom audio. When the value of custom audio is factored in, including the boost to rider satisfaction and the improved overall experience, doesn’t it make sense to give pre-recorded audio another listen?

For a demonstration of ETA’s SPOT® Intelligent Transit System, please click here or call 1-800-382-0917.
