TwinCAT Speech software: Speech input and output capabilities simplify plant operation and maintenance

“TwinCAT, start tracing!”

The human body has evolved into what could be called a system of perfectly harmonised functional units. In a figurative sense, the same applies to an automation system, which brings together intelligence, sensors, motion control and vision. With the introduction of TwinCAT Speech software, automation systems can now also learn to hear and speak.

TwinCAT enables automation systems to operate as efficiently and smoothly as the human body: an Industrial PC with a TwinCAT runtime provides the "intellectual capacity", TwinCAT Motion Control ensures precise, dynamic movements, and I/O interfaces connect to a wide variety of sensors and bus systems to supply information. More recently, visual abilities have been added with TwinCAT Vision as a fully integrated component. With the new TwinCAT Speech software module, listening and speaking capabilities now complete the analogy with human capabilities.

TwinCAT Speech allows multilingual input and output of queries or information in line with industry standards. This makes interacting with the automation system considerably more efficient and convenient. The technology can be applied across numerous industries in a wide range of applications, from machine design to building automation. When working on a machine component, for instance, operation and maintenance personnel can simply ask about the impact of changed settings on the current control or simulation application without having to use a conventional operator interface. In addition, appropriate alarm messages can be output acoustically when critical system values are reached.
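The acoustic alarm idea can be illustrated with a small sketch. This is not the TwinCAT Speech API: the class `SpokenAlarm` and its `speak` callback are hypothetical stand-ins for whatever speech-output call the runtime provides. The edge-triggered behaviour (announce once when the limit is first reached, rather than on every control cycle while the value stays critical) is an assumption about how such an alarm would sensibly be raised.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SpokenAlarm:
    """Hypothetical helper: announces an alarm once when a monitored value
    reaches its critical limit (rising edge), not on every scan cycle."""
    name: str
    limit: float
    speak: Callable[[str], None]  # placeholder for a speech-output call (assumption)
    _active: bool = field(default=False, init=False)

    def update(self, value: float) -> None:
        critical = value >= self.limit
        if critical and not self._active:
            # Rising edge: the value has just reached the critical limit.
            self.speak(
                f"Warning: {self.name} has reached {value:g}, limit is {self.limit:g}."
            )
        # Remember the state so the message is not repeated while still critical.
        self._active = critical
```

Called once per cycle with values 85, 91.5, 93, 88 and 95 against a limit of 90, the sketch speaks twice: once when 91.5 first crosses the limit and again when 95 crosses it after the value has dropped back below.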

Speech input is an offline function built on speech functionality integrated into the Windows operating system, so it works without an Internet or cloud connection. Speech output is available both offline and online: offline output uses the corresponding Windows functions, while online output uses Polly, the text-to-speech service from Amazon. In the online case, the realistic-sounding speech is synthesised with the aid of deep learning technologies. Multiple voices are supported, and audio files generated online can be cached.