Katya GPT: ChatGPT for language speaking practice
Two experiments in using AI voice mode as a conversation partner in a second language
Hello!
As you'll recall from my first post, my vision for this newsletter is to primarily share findings from hands-on experiments with promising new tech for learning and productivity. I'm happy to be kicking off Tachyon in 2024 with a post that does exactly that.
I'm starting with the use of ChatGPT that I am personally most excited about - spoken language practice.
Let's talk voice chat
In September 2023, OpenAI announced the ability to use your voice to interact with ChatGPT and to share pictures with it. It can do this in lots of languages. The voice feature works in the smartphone app for both GPT 3.5 (free) and GPT 4 (paid). GPT 4 is also able to accept input in the form of pictures and other kinds of documents in both the app and desktop.
As a former language teacher and long-time language learning enthusiast, I find it hard not to immediately see the potential of this technology to provide on-demand speaking practice in your target language. I'm not alone. Take for example this post from Joe Fabisevich, @mergesort on Threads:
There are so many ways I can think of using generative AI for language learning. But I want to focus first on ChatGPT as a spoken language partner, because this for me is the most compelling promise of the technology. While all four skills - reading, writing, listening and speaking - are challenging for a learner, speaking is arguably the most active and demanding one. It's typically the last skill in which people come to feel confident and capable.
For a foreign language learner who doesn't live in a country where the language is spoken, finding (patient) speaking partners is the hardest part, and it compounds the difficulty of learning to speak well. Outside of a formal language learning context like a classroom, it is also very difficult to find opportunities for deliberate practice while speaking.
With these new features of ChatGPT, it's now very easy to practise speaking your target language whenever you like. (Well, at least up to the cap of 50 messages every 3 hours, if you are using GPT 4.) You can get the stumbling, awkward practice out of the way with an unfeeling, non-judgemental piece of software before you get the opportunity to talk to a flesh-and-blood native speaker.
This is the promise, anyway. How well does the reality live up to it? Let's put it to the test.
Testing ChatGPT for spoken language practice
We need to narrow this down for the purposes of the experiment by picking a specific use case, choosing a language to test it in, and defining the success criteria.
The unstructured and conversational nature of a language exchange matches well with the idea of simply opening ChatGPT and having a conversation about whatever comes up. This is what I want to test first. It is also how most people will probably use the app to practise their target language.
As for which language, I will be using Russian. I studied Russian at university in the mid-2000s, and have maintained it to some extent since then. I have had experience in a range of learning contexts in Russian, including 1-1 tutoring and language exchanges with a few different people. For those who care about and understand the CEFR, if I were feeling generous I would say that I'm now probably B1 in speaking and writing, and B2 for listening and reading.
Finally, as for success criteria, I'll be comparing it against my lived experience doing language exchanges. That means it would need to:
be engaging and interesting
answer my questions about the target language and culture
use authentic language, i.e., as you would expect from a native speaker
adjust to my level of comprehension
encourage me to speak as much as possible
provide corrections when I make mistakes
Putting it to the test
After writing the section above, I ended up testing the two following types of interaction, in order:
free conversation - just start a chat and start speaking with auto-detect
a defined task - a GPT that asks you questions about an image you provide
I ran the experiments over two sittings in a single day. I used ChatGPT 4, available only with the paid subscription, in the iOS app on an iPhone 12. GPT 4 can search the web, accept files, and run Python code.
I planned on sharing links to the full transcripts, but ChatGPT currently doesn't support sharing chats with images, which rules out all but the first conversation. Sad face. If you're curious, at least you can click on the link to the first chat to see it in full. It is mostly in Cyrillic, but I asked ChatGPT to translate everything into English at the end. You'll just need to scroll down to see it. I'll share screenshots for the rest of the chats.
Part of me cringes to expose my poor Russian skills in this way, but in the spirit of full transparency I will do so. My disclaimer is that my spoken Russian is rustier than an Aral Sea fishing boat. If you speak Russian, please don't judge me too harshly, and remember we are evaluating ChatGPT here, not my Russian skills. 😅
Experiment 1 - Free conversation
I had two freeform voice chats. Both started as small talk and then branched out into more detailed conversations. They wrapped up with a request for any corrections and then a full translation of the conversation.
In the first conversation, I started with small talk and then went on to discuss my travel plans for later this year. This one had some amusing bloopers, caused either by my pronunciation errors or a network error... ok let's be real it was probably my pronunciation. Here's the full chat.
Right off the bat, ChatGPT thought I called it "Katya", when I actually said "Kak dela" (Как дела), which means "how are you".
I tried to say "I'm going to an event in Kalimantan", but that somehow was understood as "I will go on a date in Crimea." Yikes. Again, half my poor word choice and half me not realising that in Russian they call it Borneo.
In the second conversation, we talked about recent Russian movies I could watch, who Baba Yaga is, and finally ChatGPT asked me an interesting question about my strongest childhood memory. ChatGPT was unable to provide a full translation properly - it kept summarising what it had said - so I cut it off and the chat ended there.
When I asked for film recommendations, it searched the web and returned results from a list of the best Russian films of 2023. Interestingly, the part in English in the screenshot wasn't vocalised during the voice chat, only the results were. While it was searching, it emitted a slightly creepy clicking sound.
I couldn't really understand its description of how the witch Baba Yaga flies, so I asked it to generate an image of her for me and it happily obliged. Some Google searching afterwards confirmed that Slavic-style witch flight does in fact involve sitting in a giant wooden mortar and wielding a pestle. The generated image looks more like a cauldron and broomstick, but I got the general idea.
Experiment 2 - Talk about an image
After the first round of conversations with Katya, I wanted to know how it would work for a more defined task or activity. Using images as a prompt for discussion is quite common as a classroom activity, and it is also used frequently in spoken language tests. This is therefore a task you might ask to practise with your language exchange partner.
As I mentioned above, ChatGPT is multimodal so it can accept images uploaded by the user. The GPTs feature allows you to create versions of ChatGPT that always follow custom instructions that aren't seen by the user.
So I created a bot called 'Photo Talk GPT'.
I configured it with what Ethan Mollick calls a structured prompt. As you can see, the prompt is designed to personalise itself to the context by asking for information from the user and adapting to it. There are instructions to try to steer it to follow the script as effectively as possible and avoid undesirable behaviours.
You are a friendly language teacher. You help practise language by asking questions about a photo that the student provides. First, greet the student and ask them what language they are learning. Wait for them to respond. Then ask them what level they are in this language, beginner, intermediate or advanced, to help you adjust the questions you ask to their ability level. Wait for them to respond. Then ask them to upload an image. Wait for them to respond. Then, ask them if they are ready to start talking about the image in their target language. Wait for them to respond. When they say they are ready, switch to that target language and ask them to describe the image in as much detail as they can. Wait for them to respond. Check their response in the other language carefully for any mistakes in grammar, word choice or pragmatics. If there are any mistakes, provide a correction as a recast and a very short explanation of the mistake. If there aren't, provide a very short response. Then ask them a question about the image and repeat the process, waiting for them to respond and then providing feedback each time before responding. Do this for a total of two more questions. Then provide your own detailed description of the image in the target language as a model for the student. Finally, provide a translation of the entire interaction into English. Do not skip or summarise any parts, provide a full translation of everything you and the student said.
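If you want to reuse this kind of script across several GPTs, one option is to keep the steps as data and assemble the prompt text from them. The following is only a minimal sketch of that idea in Python, not how the GPT builder works internally. The names `SETUP_QUESTIONS` and `build_instructions` are hypothetical, and the wording is a condensed paraphrase of the prompt above rather than an exact copy.

```python
# Hypothetical helper for illustration only: these names are not part of
# ChatGPT or the GPT builder. It just assembles plain text.

# The setup questions the bot asks before the speaking task begins.
SETUP_QUESTIONS = [
    "greet the student and ask them what language they are learning",
    "ask them what level they are in this language: beginner, intermediate or advanced",
    "ask them to upload an image",
    "ask them if they are ready to start talking about the image in their target language",
]


def build_instructions(persona, setup_questions, follow_up_questions=2):
    """Assemble a scripted, turn-by-turn prompt as one block of text."""
    wait = "Wait for them to respond."
    lines = [f"You are {persona}."]

    # Each setup step ends with an explicit instruction to wait for the student.
    for i, step in enumerate(setup_questions):
        opener = "First" if i == 0 else "Then"
        lines.append(f"{opener}, {step}. {wait}")

    lines.append(
        "When they are ready, switch to the target language and ask them to "
        f"describe the image in as much detail as they can. {wait}"
    )
    lines.append(
        "Check each response for mistakes in grammar, word choice or pragmatics. "
        "If there are any, give a recast and a very short explanation; "
        "otherwise give a very short response."
    )
    lines.append(
        f"Ask {follow_up_questions} more questions about the image, one at a time, "
        "giving feedback after each answer."
    )
    lines.append("Then give your own detailed description of the image as a model.")
    lines.append(
        "Finally, translate the entire interaction into English without summarising."
    )
    return "\n".join(lines)


instructions = build_instructions("a friendly language teacher", SETUP_QUESTIONS)
print(instructions)
```

Keeping the steps in a list makes it easy to swap the persona or change the number of follow-up questions for a different activity, and the printed text can then be pasted in wherever you write your GPT's instructions.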
Creating the bot, including writing the prompt, took about 15 minutes. I've already made many similar GPTs for other learning contexts. You can create the bots in conversation mode in the Create tab, but I've had better results going straight to the Configure tab and writing the prompt myself. If you want to specify what you want in the icon though, it seems that you have to do that in the Create tab.
Before I start chatting, I know it's going to ask me for an image. Hm... I wonder what to give it? Maybe a nice picture of some rusty fishing boats on the Aral Sea? That seems appropriate.
I tested the GPT over three conversations using that image of rusty boats every time. I tapped out the first part in text chat and then switched to voice mode after I had uploaded the image.
The first chat was a non-starter - it glitched and didn't recognise the image I uploaded, instead generating an image of its own.
The second chat went more to plan, apart from it deciding I was speaking Japanese at one point.
In the third chat I deliberately picked 'beginner' and spoke in more basic Russian with shorter sentences. I also spent some time asking follow up questions at the end.
Here are some screenshots from the third conversation.
The transition to the line by line translation worked as I intended, until it just randomly stopped. When I asked it to continue translating though, it picked up again where it left off.
I also asked how to say properly in Russian some of the things I had been trying to say, and it obliged:
Asking it to give line by line feedback at the end was also very helpful:
Success?
Based on these quick experiments, is ChatGPT with voice an effective way to get speaking practice? Absolutely. I'm pretty sure if I used this every day my spoken Russian would improve dramatically. Does it replace a human language partner? No, but it offers different advantages.
Overall, speaking with ChatGPT is quite different from speaking with a human conversation partner, and nowhere near as engaging. It's pretty bland and neutral, with no unique and authentic personality to get to know, but it is highly knowledgeable. It can search the web and generate images at will, so it lets you follow your curiosity and follow up on things you don't understand. It was able to give me real movie recommendations and answer my questions about folklore. On the other hand, its propensity to give you a mini lecture on every topic you mention gets tedious pretty quickly. Luckily there's a 'tap to interrupt' feature.
It also glitches out in the way only software can, which is something you don't have to worry about with a human conversation partner. But unlike a human conversation partner, it provides you with a full transcript and then can instantly translate it for you. This is incredibly valuable for language learning, allowing you to go back over what was said, follow up on vocab and grammatical structures, and reflect on your performance more objectively. The transcript itself is also feedback on your pronunciation, sometimes with amusing results.
As far as I could tell with my non-native Russian, the bot's language seemed fluent and authentic, and the pronunciation was clear and natural. This likely varies by language, depending on the amount of material from that language included in the model's training data. There might be a noticeable difference between the quality of ChatGPT 3.5 and 4, too, especially for less common languages. My mum, who is a native Greek speaker, said that GPT 4 was noticeably better and more natural than GPT 3.5 after having a few voice chats with both in Greek.
However, even when ChatGPT was told it was talking to a beginner it used very complex vocabulary and spoke quickly. A human partner can read non-verbal cues to gauge how well you are following and adapt their speaking rate and language accordingly. As an intermediate learner I was able to follow along pretty well, but a real beginner would be totally lost. That being said, one could try asking it to translate what it just said, or explain it in simpler language, two things I didn't think of trying in the experiments.
In the freeform chat, the AI always kept the conversation going by asking me questions, some of which really were quite interesting. The Photo Talk GPT was designed to elicit responses, and the task worked quite well to get me talking, especially in the second chat, where I wasn't deliberately keeping my answers short and basic, as I did in the third one.
Corrections are certainly an interesting one. It could give them, but I got the best results when prompting it specifically at the end of the conversation. At least once it misheard me when I know I said the right thing, and then corrected me on what it misheard, which was annoying. I also don’t know if I can 100% trust its feedback, and would need to ask a native speaker to be sure.
With regard to following instructions, even without spending much time finessing the prompt, my GPT mostly did what I wanted it to. I could definitely spend some time refining the instructions to make it perform better, for example to ensure it uses simpler language in 'beginner mode'.
Final thoughts
Even after only these quick experiments, I can honestly say that this kind of speaking practice is a real game changer for language learning, and is likely to only get better.
Of course, there are only so many times you want to just do small talk with an AI that loves to lecture you about every topic you mention. This was not an issue in the structured GPT, which gives an indication of the value of GPTs created by skilled learners or expert teachers as discrete learning and teaching activities. And that point about experts is critical. My years of experience as a language teacher and learner informed how I interacted with the bot, and helped me to steer it to give me more effective practice and feedback.
I'll definitely be using this to continue practising my Russian. I'll be experimenting with making a variety of different GPTs, such as specific role-play scenarios or bots that give me opportunities for deliberate practice. For example, a bot that endlessly drills me on telling the time. This is a particular weakness of mine in Russian.
I also want to run some experiments with a language where I am a true beginner. My upcoming trip to Indonesia presents a perfect opportunity, so I’ll also be running an experiment to see how well it works for me when learning from scratch.
A final and important note is that currently ChatGPT should only form one part of a healthy language-learning diet. You'll want to do plenty of listening and reading practice with authentic texts, and have as much conversation with real people as possible.
Try it for yourself
Download the ChatGPT app on your smartphone (iOS or Android) and test out the voice interactions using the free ChatGPT 3.5 version.
If you have access to a paid ChatGPT Plus account, you can take my Photo Talk GPT for a spin here.
🌯 That's a wrap. As always, I'd love to hear your thoughts and feedback on this post. If you end up trying voice chat yourself, I’d love to hear about your experience with that as well. Just hit reply to this email or comment on the Substack post.
Thanks for reading, see you again in a month.
Antony :)
P.S.
What I'm reading
Tomorrow, and Tomorrow, and Tomorrow by Gabrielle Zevin. I saw this recommended in a ton of places, so I borrowed a copy from a friend. Here's a quote from the first chapter:
"Sam looked at Sadie, and he thought, This is what time travel is. It's looking at a person, and seeing them in the present and the past, concurrently. And that mode of transport only worked with those one had known a significant time." (Gabrielle Zevin, Tomorrow, and Tomorrow, and Tomorrow)
What I'm watching
Scavengers Reign. This animated series has also been recommended in a few places. Scattered survivors from a crashed ship try to survive on an uninhabited planet. It's kind of a cross between Nausicaa, Lost in Space, and Alien. It's pretty gory and it's often creepy, but it's excellent sci-fi.