Enhancing Automatic Speech Recognition in Air Traffic Communications: Collaboration between A*STAR and AIR Lab

Introduction

In the high-stakes environment of air traffic communications, Automatic Speech Recognition (ASR) can help reduce communication errors between pilots and Air Traffic Controllers (ATCOs). ASR accuracy is therefore critical for both safety and efficiency: clear and precise transcription of pilot and controller commands prevents miscommunication, which can lead to critical errors in flight paths and aircraft positioning.

Nonetheless, ASR performance drops when the system encounters words it has not seen during training, or when speech is unclear or masked by noise. This is especially problematic when the affected words are part of airline callsigns. In such scenarios, it is beneficial to adapt the ASR engine with contextual information so that callsigns are transcribed more accurately.

Moreover, accurate ASR supports the swift processing of information, enabling faster decision-making and smoother coordination among the myriad flights occupying the skies at any given moment. In essence, enhancing ASR accuracy in this domain not only bolsters aviation safety, mitigating the risk of in-air collisions and other hazards, but also contributes to the fluid management of air traffic, reducing delays and improving the overall efficiency of the aviation system.

A*STAR’s Institute for Infocomm Research (A*STAR I²R) is a research institute under the Agency for Science, Technology and Research, Singapore’s lead public sector agency that spearheads economic-oriented research to advance scientific discovery and develop innovative technologies. AIR Lab (Aviation Innovation Research Laboratory) is a partnership between CAAS and Thales focused on the co-development of Air Traffic Management (ATM) capabilities through cutting-edge digital technologies. The two organisations have joined forces to explore ways to improve the accuracy of ASR for ATM. This collaboration marks a significant union between academia and industry to innovate and enhance aviation technologies.

Project overview

This project aims to utilise contextual callsign information provided in advance to improve the Callsign[1] Recognition Rate (CRR). The first method addresses the offline case, where the audio and contextual callsign information are available in advance, while the second method is developed for real-time situations. We found that both methods improve the CRR significantly in most cases.

The collaboration between A*STAR I²R and AIR Lab is a dynamic partnership built on shared roles and expertise, as well as seamless technology integration. Within this collaboration, AIR Lab provides an Application Programming Interface (API) for acquiring contextual callsign information. The system continuously queries this API to track flight context, ensuring that it remains updated with the latest flight information. By filtering and accumulating data over a configurable time frame, only relevant and timely callsigns are retained, enhancing the accuracy and efficiency of contextual biasing during ASR decoding. AIR Lab’s participation in this endeavour brings expertise and resources, contributing significantly to the success and advancement of the project.
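The polling-and-accumulation loop described above can be sketched as follows. This is a minimal illustration, not the actual system: the `fetch_flights()` stub, the payload shape, and the 30-minute window value are assumptions standing in for the real AIR Lab API and its configuration.

```python
import time
from collections import deque

WINDOW_SECONDS = 30 * 60  # assumed configurable accumulation window

def fetch_flights():
    # Stub standing in for one query to the contextual-information API;
    # the real endpoint and response format are not shown here.
    return [{"callsign": "SIA321"}, {"callsign": "TGW482"}]

def update_context(window, now=None):
    """Accumulate callsigns from one poll, then drop stale entries so
    only timely callsigns are kept to bias the ASR decoder."""
    now = time.time() if now is None else now
    for flight in fetch_flights():
        window.append((now, flight["callsign"]))
    while window and now - window[0][0] > WINDOW_SECONDS:
        window.popleft()
    # Deduplicate into a sorted biasing list for the decoder.
    return sorted({cs for _, cs in window})
```

In a running system, `update_context` would be called on a timer, and the returned list handed to the recogniser before each decoding pass.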

Methodologies

Specialising in advanced language intelligence that enhances machine comprehension and reasoning across sound, speech and text, A*STAR I²R’s Aural and Language Intelligence (ALI) team developed two methodologies to enhance the CRR.

Methodology 1 (for offline use cases): Contextual communication history is used to train a contextual language model, which replaces the standard language model and is integrated into the speech-to-text engine’s decoder.

Methodology 2 (for real-time use cases): Live contextual information from a time window interacts directly with the live speech recognition decoder to re-score the path probabilities with context information, partially eliminating errors caused by unclear pronunciation.
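The real-time re-scoring idea can be illustrated with a simple n-best re-ranking sketch. This is an assumption-laden toy, not the production decoder: the flat `CONTEXT_BOOST` bonus and the token-level matching are hypothetical choices to show how in-context callsigns can overtake acoustically similar alternatives.

```python
CONTEXT_BOOST = 2.0  # assumed log-probability bonus per in-context callsign

def rescore(hypotheses, context_callsigns):
    """Re-rank ASR n-best hypotheses (text, log_prob) pairs, boosting
    paths that contain a callsign expected in the current time window."""
    rescored = []
    for text, log_prob in hypotheses:
        bonus = sum(CONTEXT_BOOST for token in text.split()
                    if token in context_callsigns)
        rescored.append((text, log_prob + bonus))
    # Highest adjusted score first.
    return sorted(rescored, key=lambda h: h[1], reverse=True)
```

For example, if the acoustically preferred hypothesis contains “SIA231” but the context window says flight “SIA321” is active, the boost can flip the ranking in favour of the contextually correct callsign.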

Through these methodologies, significant improvements were observed in the callsign recognition rates:

  • Methodology 1 (for offline use cases): By utilising contextual information collected over approximately 30 minutes, a 77% relative reduction in the callsign error rate was achieved. This improvement resulted in a final callsign detection accuracy of 98.97%.
  • Methodology 2 (for real-time use cases): Using the AIR Lab contextual information API, a 23.96% relative reduction in the callsign error rate was achieved, resulting in a final callsign detection accuracy of 91.94%. With the next version of the AIR Lab API, the reduction is expected to reach 57.07%, for a final accuracy of 95.45%.
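To make the “relative reduction” figures concrete, the arithmetic for the real-time method can be checked directly: a 23.96% relative reduction yielding 91.94% accuracy implies a baseline callsign error rate of roughly 10.6%, and a 57.07% reduction from that same baseline gives the projected 95.45%. The baseline value below is inferred from the reported numbers, not stated in the source.

```python
# Figures reported for the real-time method (Methodology 2).
final_acc = 0.9194
rel_reduction = 0.2396

final_err = 1 - final_acc                       # residual error ≈ 8.06%
baseline_err = final_err / (1 - rel_reduction)  # inferred baseline ≈ 10.60%

# Projection with the next API version's 57.07% relative reduction.
projected_err = baseline_err * (1 - 0.5707)
projected_acc = 1 - projected_err               # ≈ 95.45%, as reported
```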

Moving forward, efforts will focus on enhancing the speed and accuracy of the ASR in decoding ATC speech.

In conclusion, this collaboration between A*STAR I²R and AIR Lab represents a pivotal step in advancing aviation technologies through innovative approaches in callsign recognition by putting words into action. By merging academic expertise with industry resources, significant reductions in error rates and notable accuracy improvements have been achieved. This partnership underscores the critical role of collaborative efforts in pushing the boundaries of research, paving the way for enhanced air traffic management capabilities and safer skies globally.

About AIR Lab

In September 2019, CAAS and Thales signed a follow-up agreement on the sidelines of the International Civil Aviation Organization (ICAO) Innovation Fair in Montreal to establish a joint lab, focusing on the development of an Open ATM system architecture. Since then, AIR Lab has been exploring advanced and open technologies to advance the future of air traffic management, leveraging Open Architecture and Artificial Intelligence and developing its Green Aviation Proof of Concept. The AIR Lab brings together multiple stakeholders in the ATM value chain – from air traffic controllers to AI experts, software engineers and local start-ups – creating a veritable ‘sandbox’ where the team can experiment and co-create future ATM technologies.
