Over the past year and a half, audio and video apps have become a standard part of the workday. For many of us, what used to be a walk over to a coworker’s desk or an in-person meeting is now almost always a phone call or video chat. We spend so much time on calls that we might as well start paying rent for our Zoom boxes.
Unfortunately for surveillance teams, risky conversations are much more likely to take place on an audio call than in written communications, so it’s crucial to have an iron-clad aComms (audio communications) strategy in place. But audio surveillance isn’t easy. It’s a unique challenge that requires a unique solution.
“[Teams need to realize] how difficult audio surveillance is. Taking an audio recording, converting it to text, treating it like an email, and inserting it into your surveillance platform is not going to solve your problem,” said Jordan Domash, Relativity Trace GM.
Jordan and Marcin Kokott, Relativity Trace product manager, tackled this topic at Relativity Fest in a session all about aComms. They shared the challenges to look out for as you optimize your audio surveillance program, and how automation and AI in Relativity Trace can address them.
Challenge #1: Transcription is not an exact science.
Communications that happen over a keyboard—like text messages, emails, and chats—are black and white. There’s no mistaking the accuracy or content of what was said. The same, unfortunately, doesn’t apply to aComms.
On average, 80 to 90 percent accuracy in an audio transcription is considered “good.” That means that 10 to 20 percent of the words in your transcription are misused, missing, or just plain wrong. A scary thought, considering that some of those words might be important keywords. (It’s not hard to imagine WhatsApp being transcribed as what’s up.)
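To make the accuracy figure concrete, here is a minimal sketch of word error rate (WER), the usual way transcription accuracy is scored: the number of word substitutions, insertions, and deletions needed to turn the transcript into the reference, divided by the reference length. The function name and the sample sentences are illustrative assumptions, not anything from Relativity Trace.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from transcript
            insertion = curr[j - 1] + 1  # extra word in transcript
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example sentences: one substitution plus one insertion over 10 words.
reference = "please send it to me on whatsapp after the call"
transcript = "please send it to me on what's up after the call"
wer = word_error_rate(reference, transcript)
print(f"WER: {wer:.0%}, accuracy: {1 - wer:.0%}")  # WER: 20%, accuracy: 80%
```

In this toy case the transcript still scores as "good" at 80 percent accuracy, even though the one term a surveillance team would care about is the one that was lost.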
“It’s better to be right about the critical words than be right on average,” Marcin explained. Consider the below example:
The top transcription is more accurate than the bottom one overall, but it incorrectly transcribed some major keywords (surname, company name, and drug name), which makes it less useful through the lens of communication surveillance.
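One way to express "right about the critical words" as a number is keyword recall: of the terms you care about, what share survived transcription? The sketch below is a hedged illustration only. The function name, the keyword list, and both transcripts are invented for the example and are not taken from the session.

```python
import re


def keyword_recall(transcript: str, keywords: list[str]) -> float:
    """Share of single-word keywords that appear as whole words in the transcript."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    tokens = set(re.findall(r"[a-z0-9']+", transcript.lower()))
    found = sum(1 for keyword in keywords if keyword.lower() in tokens)
    return found / len(keywords)


# Invented keywords: a surname, an app name, and a company name.
critical_terms = ["smith", "whatsapp", "acme"]

# Transcript A gets the filler words right but garbles every critical term.
transcript_a = "ask smyth to message me on what's up before the acne call today"
# Transcript B is rougher overall but keeps all three critical terms.
transcript_b = "as smith to massage me on whatsapp for the acme call to day"

print(keyword_recall(transcript_a, critical_terms))  # 0.0
print(keyword_recall(transcript_b, critical_terms))  # 1.0
```

Scoring both metrics side by side is a simple way to see when a transcript that looks better on average is worse for alerting.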
That’s where keyword search alternatives come into play. In Relativity Trace, this nifty feature lets you search for key terms that the transcription model may have mis-transcribed. For example, if you know that “WhatsApp” is an important term, you can search for it across your communications and pinpoint any areas where the term was considered as an alternative.
It’s an easy way to ensure you’re not missing a critical alert due to an incorrect transcription.
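The idea behind searching alternatives can be sketched as follows, assuming the speech model keeps a short list of runner-up guesses for each stretch of audio alongside its best guess. The `Segment` structure and `find_term` function are hypothetical names for illustration; they do not describe how Relativity Trace stores or searches its transcripts.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One stretch of audio: the model's best guess plus its runner-up guesses."""
    best: str
    alternatives: list[str] = field(default_factory=list)


def find_term(segments: list[Segment], term: str) -> list[tuple[int, str]]:
    """Return (segment index, match type) for every segment that could be the term."""
    wanted = term.lower()
    hits = []
    for index, segment in enumerate(segments):
        if segment.best.lower() == wanted:
            hits.append((index, "best"))
        elif wanted in (alt.lower() for alt in segment.alternatives):
            hits.append((index, "alternative"))
    return hits


# Invented transcript: the model chose "what's up" but kept "whatsapp" as a runner-up.
call = [
    Segment("message"),
    Segment("me"),
    Segment("on"),
    Segment("what's up", alternatives=["whatsapp", "what sup"]),
]

print(find_term(call, "WhatsApp"))  # [(3, 'alternative')]
```

A plain text search over the best guesses alone would return nothing for this call, which is exactly the missed alert the feature is meant to prevent.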
Challenge #2: People speak multiple world languages.
With business being conducted at a global scale, your surveillance of course needs to account for the different languages that might be used across your organization. However, surveilling for multiple languages is not a straightforward task.
Twenty-three languages account for more than half of the world’s population, and each of those languages has its own regional variations, such as British English, American English, and Australian English. Additionally, multilingual speakers often have different accents and may switch languages within a single conversation.
“[Having one language model] creates a challenge,” Marcin said. “If you have different accents and variations, [you need] a specific language model and specific adaptation to capture [each language] correctly and get the accuracy you’re after.”
Relativity Trace was created with this exact problem in mind. With Trace, you can automatically transcribe in 26 global languages and dialects, including those multi-language calls in which speakers switch languages.
“Language switching is common in voice communications, and the ability to understand that a single conversation has multiple languages in it is something that’s critical,” Jordan said.
Challenge #3: Traders have a language of their own.
As if world languages weren’t enough, there’s yet another category of languages that surveillance teams need to consider: the business language.
Traders in particular have a unique lingo with fun terms like yard, bag, monkey, buy the dip, sell the rip, and don’t catch the falling knife. Plus, you also need to consider the acronyms, initialisms, and even codenames or projects that are specific to your organization.
“A lexicon will not have these words out of the box […] so models need to be built with these new words,” Marcin said. “At Relativity, we recognize that need and try to make it as easy as possible to add the context to the model.”
Relativity Trace customers, Marcin explained, can train the model on their own in a few easy ways. You can add specific jargon, phrases, words, or acronyms that you want the model to look for; you can copy and paste whole paragraphs; you can upload emails, chat, and other documents; or you can train the system on eComms data that you already have in Relativity.
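As a rough illustration of why existing eComms data is useful here, the sketch below scans a few documents for recurring words that a base lexicon does not know, which are the natural candidates for custom vocabulary. This is an assumption about one possible preprocessing step, not a description of how Trace trains its models. The function name, the tiny lexicon, and the "Falcon" codename are all made up.

```python
import re
from collections import Counter


def candidate_terms(documents: list[str], base_lexicon: set[str], min_count: int = 2) -> list[str]:
    """Recurring words that the base lexicon does not contain, most frequent first."""
    counts = Counter()
    for document in documents:
        counts.update(re.findall(r"[a-z][a-z0-9'-]*", document.lower()))
    return [
        word
        for word, count in counts.most_common()
        if count >= min_count and word not in base_lexicon
    ]


# Toy lexicon and invented messages, for illustration only.
lexicon = {"project", "update", "buy", "the", "on", "desk", "says", "sell", "not"}
messages = [
    "Project Falcon update: buy the dip on FALCON",
    "Falcon desk says sell the rip, not the dip",
]

print(candidate_terms(messages, lexicon))
```

The `min_count` threshold keeps one-off typos out of the list; a real lexicon would be far larger, so only genuine jargon and codenames would surface.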
Challenge #4: aComms is not just another data source—it’s a whole different category.
The constant explosion of data sources and types is a headache across the industry, but audio formats are on a whole different level. Your organization likely has several types of audio communications, including desk phones, mobile phones, trader turrets, collaboration software like Zoom or Microsoft Teams, and voice recorders. Each of those systems has a different set of metadata, and often multiple pieces of metadata for the same piece of content.
“There’s a huge need for normalization. A critical part of aComms surveillance is to not only pull [the audio], but also process it and match one voice recording to another to provide context for that. Having metadata and being able to match recordings is critical,” Marcin said.
Relativity Trace can capture data from any audio recording—but it’s “not a simple check-the-box exercise,” Jordan explained. Trace takes in all the different metadata for each recording, as well, so you can see the full context: What individuals are on the call? What’s the phone number? What was the duration of the call? What language was it in? And anything else you may need to get the full story.
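Normalization of this kind usually means mapping each system's metadata onto one shared record. The sketch below assumes two made-up source formats, a desk phone log and a meeting app export, with invented field names (`caller`, `duration_ms`, `attendees`, `length`). Real recording systems and Trace itself will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """Common shape that answers: who was on the call, for how long, in what language."""
    source: str
    participants: tuple[str, ...]
    duration_sec: int
    language: str


def from_desk_phone(raw: dict) -> CallRecord:
    # Hypothetical desk phone log: two parties, duration in milliseconds.
    return CallRecord(
        source="desk_phone",
        participants=(raw["caller"], raw["callee"]),
        duration_sec=int(raw["duration_ms"]) // 1000,
        language=raw.get("lang", "unknown"),
    )


def from_meeting_app(raw: dict) -> CallRecord:
    # Hypothetical meeting export: attendee list, duration as "HH:MM:SS".
    hours, minutes, seconds = (int(part) for part in raw["length"].split(":"))
    return CallRecord(
        source="meeting_app",
        participants=tuple(raw["attendees"]),
        duration_sec=hours * 3600 + minutes * 60 + seconds,
        language=raw.get("language", "unknown"),
    )


desk_call = from_desk_phone(
    {"caller": "+1-555-0100", "callee": "+1-555-0101", "duration_ms": 93500}
)
meeting = from_meeting_app(
    {"attendees": ["a.trader@example.com", "b.broker@example.com"],
     "length": "00:12:30", "language": "en-GB"}
)
print(desk_call.duration_sec, meeting.duration_sec)  # 93 750
```

Once every source lands in the same record type, questions like "which calls lasted more than ten minutes?" no longer depend on where the recording came from.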
“You can know someone’s phone number, Zoom account, Skype username, email address, Bloomberg Chat ID, Slack ID—all linked together in one platform,” Jordan said.
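Linking identifiers like these is a classic fit for a union-find (disjoint set) structure: every time two identifiers are known to belong to the same person, their groups are merged. The class below is a minimal sketch under that assumption. `IdentityGraph` and the sample identifiers are hypothetical and do not reflect Trace's internal design.

```python
class IdentityGraph:
    """Union-find over (channel, identifier) pairs that belong to the same person."""

    def __init__(self) -> None:
        self._parent: dict[tuple[str, str], tuple[str, str]] = {}

    def _find(self, item: tuple[str, str]) -> tuple[str, str]:
        self._parent.setdefault(item, item)
        while self._parent[item] != item:
            # Path halving: point each visited node at its grandparent.
            self._parent[item] = self._parent[self._parent[item]]
            item = self._parent[item]
        return item

    def link(self, a: tuple[str, str], b: tuple[str, str]) -> None:
        """Record that two identifiers belong to the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a: tuple[str, str], b: tuple[str, str]) -> bool:
        return self._find(a) == self._find(b)


# Invented identifiers for illustration.
phone = ("phone", "+1-555-0100")
email = ("email", "a.trader@example.com")
zoom = ("zoom", "atrader")
stranger = ("slack", "U999")

identities = IdentityGraph()
identities.link(phone, email)
identities.link(email, zoom)

print(identities.same_person(phone, zoom))      # True
print(identities.same_person(phone, stranger))  # False
```

The phone number and Zoom account were never linked directly, yet they resolve to the same person through the shared email address.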
Challenge #5: Standard communication surveillance review won’t cut it.
Once you finally gather and transcribe your audio data, there’s the big finale: the review itself. Many surveillance programs aren’t equipped to handle a fast and efficient review that encompasses all communications, both aComms and eComms.
Trace, however, integrates aComms with eComms into one surveillance workspace for a unified review. With a single dashboard, you can search for content and metadata across all your different data sources and mediums—audio, email, and chats—in one place.
Additionally, Trace includes an integrated audio player that highlights the text as you listen, “karaoke style” as Jordan put it. As you read, you can also jump straight to your alert terms or phrases—the portion of the call that generated the alert in the first place—saving you hours of review time.
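Jumping to the part of a call that raised the alert depends on the transcript carrying a timestamp for each word. Here is a minimal sketch, assuming a transcript stored as (word, start time in seconds) pairs. The function name, the lead-in padding, and the sample data are illustrative assumptions rather than Trace behavior.

```python
def alert_offsets(
    words: list[tuple[str, float]],
    alert_terms: list[str],
    lead_in_sec: float = 2.0,
) -> list[float]:
    """Playback positions (seconds) just before each occurrence of an alert term."""
    terms = {term.lower() for term in alert_terms}
    return [
        max(0.0, start - lead_in_sec)  # start slightly early for context
        for text, start in words
        if text.lower() in terms
    ]


# Invented excerpt of a long call: (word, start time in seconds).
transcript = [
    ("take", 0.5),
    ("this", 0.75),
    ("to", 0.875),
    ("whatsapp", 1.0),
    ("okay", 752.0),
    ("whatsapp", 754.5),
]

print(alert_offsets(transcript, ["WhatsApp"]))  # [0.0, 752.5]
```

A reviewer's player could seek straight to these offsets instead of playing the whole recording from the start.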
“Perfect alerting is not effective if you have to listen to an hour-long phone call. Trace allows you to listen to what matters most, ignore the rest, and speed through it really quickly all while having the context of the topics that occurred,” Jordan said.
Best Practice #1: In the end, remember to treat audio like audio.
Jordan and Marcin ended the session with a few key takeaways they wanted to drive home with the audience, but none more important than this: “Remember to treat audio like audio. It’s unique, it’s different, it’s imperfect,” said Jordan.
“A generic model that hasn’t been trained for the financial world is going to be inaccurate. Assume you’re in an imperfect science that needs specific technology.”