Alexa and Friends

Speech Recognition in Continuing Education

Karlsruhe (GER)/Düsseldorf (GER), December 2019 - "Alexa, I’d like you to help me when I learn! Is a combination of voice user interfaces (VUI) and eLearning possible?" Prof. Dr. Michael Marmann of Düsseldorf’s University of Applied Sciences will address the issue of speech recognition in continuing education on 30 January at 15.30. Professor Marmann is involved intensively with the use and testing of emerging tools.

To a lot of people "Alexa, I’d like you to help me when I learn" still sounds like something in the future. How realistic is this expression nowadays?

Prof. Michael Marmann: If you say "Alexa, I’d like you to help me when I learn" to an Alexa device, you promptly receive the answer, "A ewe is a mature female sheep". Now you might comment, "Hmmmm! I actually learned something. It was unexpected and not at all what I wanted to learn, but at least I received an answer." Even this little example illustrates where problems lie at the moment. The input Alexa "understands" is limited, and the answers are correspondingly restricted.

This means that if you want to use Alexa specifically for learning, your only choice for the time being is to scour the entire Alexa Skills store and separate the wheat from the chaff. With a whole lot of luck, you’ll come across something suitable.

"Skills" is the official designation for Alexa apps that can be developed by content providers. So, for example, if an eLearning provider wants to design content for Alexa, it is usually done by developing a "skill". The range of these skills, though, is still very limited. For instance, if you take a look at the listings under "education and reference" in the Alexa Skills store, you will find a relatively large number of trainers for vocabulary and mental arithmetic, sets of flash cards for learning, and reference works, i.e. skills that work with quite simple user input and thus involve relatively little development effort.

If you really want to learn something, though, you’ll usually find yourself disillusioned pretty quickly. Yet language is an extremely powerful and easy-to-use instrument in the learning process. For example, it enables you to navigate a skill and to enter into a dialogue with it. Furthermore, there are new types of devices with integrated touch screens that make multimodal interactions and outputs possible, so you might well ask why the range of learning skills available is still as narrow as it is.

There is certainly a variety of reasons for this. For example, development processes for more complex skills require a great deal of effort and are absolutely not comparable to processes based on modern eLearning development tools. It also has to be assessed which interactions are possible at all. Since the operating concepts can be very different from those for classical eLearning applications, authors also have to rethink their approaches: storyboards for Alexa and similar tools look different, so it remains to be seen what the future will bring.


What are the technical requirements for eLearning based on voice user interfaces?

Prof. Michael Marmann: As I’ve indicated, the technical demands for the development of voice-navigated eLearning skills are already widely known, but given the current state of technology, a lot of effort is necessary. It can be done “by hand” using programming languages such as JavaScript or Python, or with authoring tools such as Voiceflow or Cognigy, or Dexter for chatbots, which can, for example, display dialogues and accept flow-based input.
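To give a rough idea of what developing "by hand" can involve, here is a minimal sketch in Python of the kind of simple vocabulary trainer mentioned earlier. It assumes the skill runs as a plain function that receives the request as a dictionary and returns the JSON envelope used by custom skills. The intent name "AnswerIntent", the slot name "answer", and the word list are hypothetical and chosen only for illustration; a real skill would also need an interaction model and considerably more error handling.

```python
# Hypothetical vocabulary-trainer skill, written "by hand" without an SDK.
# The intent name "AnswerIntent", the slot name "answer", and the word list
# are illustrative assumptions, not taken from any real skill.

VOCABULARY = {"Haus": "house", "Baum": "tree"}


def build_response(text, session_attributes=None, end_session=False):
    """Wrap plain text in the JSON envelope a custom skill returns."""
    return {
        "version": "1.0",
        "sessionAttributes": session_attributes or {},
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context=None):
    """Entry point: route the incoming request by its type."""
    request = event["request"]
    # Session attributes carry state (current word, score) between turns.
    attributes = (event.get("session") or {}).get("attributes") or {}

    if request["type"] == "LaunchRequest":
        word = next(iter(VOCABULARY))  # always start with the first word
        return build_response(
            f"What is the English word for {word}?",
            {"word": word, "correct": 0},
        )

    if (request["type"] == "IntentRequest"
            and request["intent"]["name"] == "AnswerIntent"):
        expected = VOCABULARY.get(attributes.get("word"))
        slots = request["intent"].get("slots") or {}
        answer = (slots.get("answer") or {}).get("value") or ""
        correct = attributes.get("correct", 0)
        if expected and answer.strip().lower() == expected:
            correct += 1
            text = "Correct!"
        elif expected:
            text = f"Not quite. The answer is {expected}."
        else:
            text = "Please start the trainer first."
        return build_response(text, {"correct": correct}, end_session=True)

    # Anything else: keep the session open and ask again.
    return build_response("Sorry, I did not understand that.", attributes)
```

Even this small example shows where the effort goes: every possible user input has to be anticipated and routed explicitly, and state such as the current word or the learner's score has to be passed along from turn to turn.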

Nonetheless, we’ve discovered that some of these tools are still in beta or at least "feel" like they are, so that when problems occur, it’s not clear what the source of the error is and where to look in order to fix it. This quickly becomes a real chore, but it's clear that things are constantly improving.


In your opinion, will deploying VUIs facilitate learning or rather render it banal?

Prof. Michael Marmann: That’s a good question! I’d say it has something to do with the didactic design. Depending on the handling of the dialogues and the conception of the vocal navigation commands, a great user experience can be generated, and in fact is likely to be, with simple linguistic input. Nothing is more exhausting than long Alexa monologues where, by the time you reach the end, you’ve already forgotten what the choices at the beginning of the monologue were. This might not be uninteresting in memory training, but in the learning process itself, it offers no advantages.

Actually, an eLearning author can make a lot of mistakes here. If the concept is intelligent, though, I maintain the use of VUIs can result in a very convenient way of interacting with a technical system. Yes, I clearly see the learning process being facilitated, or at least the interaction.

From a practical point of view, too, VUIs offer numerous advantages. Once, when I asked my students how they would like to have their learning digitally supported, several of them commented, "While doing something else - like ironing or cooking - at the same time!" Learning en passant using VUIs? Why not? It’s certainly not appropriate for all learning content, but neither is mobile learning.


Do you see this learning approach being especially suitable for specific target groups or particular types of learning materials?

Prof. Michael Marmann: We are currently doing a detailed investigation of what is technically, creatively, and conceptually possible, and we’ve set our focus on multimodal devices like Amazon’s touchscreen-equipped Echo Show 2. We believe that we’ll be able to present a couple of exciting practical examples at the upcoming LEARNTEC. By then, I’ll probably be able to answer this question better. The tendency seems to be, for example, the integration of almost any type of media, tools to monitor learning achievement, storage of users’ interactions and learning progress, individualization, etc. That is, functions that have long been standard in the context of eLearning still have development potential in regard to Alexa and similar tools.


Do you see a specific point in time when VUIs will be as ubiquitous in eLearning as mobile learning is today?

Prof. Michael Marmann: I’m firmly convinced that touchscreen-equipped VUIs, that is, ones featuring multimodal interaction and output, will have a firm place in knowledge and learning environments, but I’ll pass on predicting when and to what extent. During the bus or train trip to the university, mobile learning will probably remain the option of choice.