The SpeechToText singleton lets you convert recorded spoken audio into text.
Import Statement: import Felgo 4.0
Since: Felgo 4.1.0
This object is available as a singleton type from all QML components when Felgo is imported via import Felgo 4.0. It is accessible with the name SpeechToText.
The SpeechToText singleton converts recorded spoken audio into text using the platform-native speech-to-text frameworks. It currently supports Android and iOS, with both online transcription and on-device (offline) recognition.
To start recording and transcription on supported devices on Android and iOS, call startSpeechToText().
// Start speech to text recording in US English locale with online transcription
SpeechToText.startSpeechToText("en-US")
After the recording starts, recognitionActive is true.
On Android, the recording stops automatically when there is no more spoken input. You receive the results via the speechToTextResult signal.
To manually stop the recording, call stopSpeechToText().
If any errors occur, you get notified with speechToTextError.
Before starting the transcription, you should check if speech to text is supported on the current device with the properties recognitionAvailable or onDeviceRecognitionAvailable.
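The availability check can be sketched as follows. This is a minimal sketch, not a definitive implementation: preferring on-device transcription whenever it is available is an assumption made for illustration, and the page layout is hypothetical.

```qml
import QtQuick
import Felgo

App {
  NavigationStack {
    AppPage {
      title: qsTr("Availability Check")

      AppButton {
        anchors.centerIn: parent
        text: qsTr("Start")
        // Only allow starting if at least one transcription mode is available
        enabled: SpeechToText.recognitionAvailable || SpeechToText.onDeviceRecognitionAvailable
        onClicked: {
          // Assumption for this sketch: prefer on-device transcription when available
          const useOnDevice = SpeechToText.onDeviceRecognitionAvailable
          SpeechToText.startSpeechToText("en-US", useOnDevice)
        }
      }
    }
  }
}
```

Binding enabled to the two properties keeps the button state in sync without extra handler code.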
The following example shows speech to text in a textbox:
import QtQuick
import Felgo

App {
  id: app

  property bool onDevice: false
  property string localeId: "en-US"

  Connections {
    target: SpeechToText
    onSpeechToTextResult: (result, isFinal) => {
      appTextEdit.text = result
    }
    onSpeechToTextError: error => {
      console.warn("Error:", error)
    }
  }

  NavigationStack {
    AppPage {
      title: qsTr("Speech To Text")

      Column {
        spacing: dp(20)
        width: parent.width - dp(40)
        anchors.centerIn: parent

        AppText {
          text: qsTr("Available: %1").arg(SpeechToText.recognitionAvailable)
        }
        AppText {
          text: qsTr("OnDevice available: %1").arg(SpeechToText.onDeviceRecognitionAvailable)
        }
        AppText {
          text: qsTr("Active: %1").arg(SpeechToText.recognitionActive)
        }
        AppText {
          text: qsTr("Locale: %1").arg(app.localeId)
        }

        Row {
          spacing: dp(20)

          AppText {
            id: onDeviceSwitch
            text: qsTr("On Device")
          }
          AppSwitch {
            anchors.verticalCenter: onDeviceSwitch.verticalCenter
            onCheckedChanged: {
              app.onDevice = !app.onDevice
            }
          }
        }

        Flow {
          spacing: dp(20)
          width: parent.width

          AppButton {
            text: qsTr("Start")
            onClicked: {
              appTextEdit.reset()
              SpeechToText.startSpeechToText(app.localeId, app.onDevice)
            }
          }
          AppButton {
            text: qsTr("Stop")
            onClicked: {
              SpeechToText.stopSpeechToText()
            }
          }
          AppButton {
            text: qsTr("Cancel")
            onClicked: {
              appTextEdit.reset()
              SpeechToText.cancelSpeechToText()
            }
          }
        }

        AppTextEdit {
          id: appTextEdit
          width: parent.width
          enabled: false

          function reset() {
            appTextEdit.text = ""
          }
        }
      }
    }
  }
}
Some platforms require additional integration steps to use speech to text functionality.
iOS Integration
To use SpeechToText on iOS devices, make sure to add the Speech framework to your project. Add the following lines to your project file:
if(CMAKE_SYSTEM_NAME STREQUAL "iOS")
  target_link_libraries(yourAppTarget PRIVATE "-framework Speech")
endif()
You also need to add the 'NSMicrophoneUsageDescription' and 'NSSpeechRecognitionUsageDescription' keys to the Project-Info.plist file:
<key>NSMicrophoneUsageDescription</key>
<string>Use the microphone for Speech To Text</string>
<key>NSSpeechRecognitionUsageDescription</key>
<string>Use Speech To Text</string>
Android Integration
Open your AndroidManifest.xml file and make sure that the following permission is set:
<uses-permission android:name="android.permission.RECORD_AUDIO"/>
Also add the following queries tag to the manifest block:
<manifest ... >
  ...
  <!-- Speech To Text -->
  <queries>
    <intent>
      <action android:name="android.speech.RecognitionService" />
    </intent>
  </queries>
  ...
</manifest>
onDeviceRecognitionAvailable : bool
Read-only property that indicates whether on-device (offline) speech-to-text transcription is available on the current device.
Check this property before starting a new on-device transcription with:
// If SpeechToText.onDeviceRecognitionAvailable is true:
SpeechToText.startSpeechToText("en-US", true)
See also recognitionActive, startSpeechToText(), stopSpeechToText(), cancelSpeechToText(), speechToTextResult, and speechToTextError.
recognitionActive : bool
Read-only property that indicates whether a speech-to-text transcription is currently active.
You can start transcription with startSpeechToText(). Use stopSpeechToText() to stop or cancelSpeechToText() to cancel an active transcription.
See also startSpeechToText(), stopSpeechToText(), cancelSpeechToText(), speechToTextResult, and speechToTextError.
recognitionAvailable : bool
Read-only property that indicates whether online speech-to-text transcription is available on the current device.
Check this property before starting a new online transcription with:
// If SpeechToText.recognitionAvailable is true:
SpeechToText.startSpeechToText("en-US", false)
See also recognitionActive, startSpeechToText(), stopSpeechToText(), cancelSpeechToText(), speechToTextResult, and speechToTextError.
speechToTextError(string error)
Emitted when a start or stop call results in a speech-to-text transcription error.
The parameter error contains an error string, which you can use to react to the error in your UI.
Possible error strings are: init_error, not_supported, audio_error, client_error, server_error, permission_error, network_error, network_timeout, language_not_supported, language_unavailable, server_disconnected, too_many_requests, unknown_error, speech_timeout, no_match, busy
Note: The corresponding handler is onSpeechToTextError.
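One way to react to these error strings in the UI is sketched below. The errorMessage() helper is hypothetical, and the user-facing wording for each error string is an assumption based on the string names, not a documented meaning.

```qml
import QtQuick
import Felgo

App {
  id: app

  // Hypothetical helper: maps some error strings to user-facing messages.
  // The interpretations are assumptions derived from the error names.
  function errorMessage(error) {
    switch (error) {
    case "permission_error":
      return qsTr("Permission for the microphone or speech recognition is missing.")
    case "network_error":
    case "network_timeout":
      return qsTr("Please check your network connection.")
    case "no_match":
    case "speech_timeout":
      return qsTr("No speech was recognized. Please try again.")
    case "busy":
      return qsTr("The recognizer is busy. Please try again shortly.")
    default:
      // Fall back to showing the raw error string
      return qsTr("Speech to text failed: %1").arg(error)
    }
  }

  Connections {
    target: SpeechToText
    onSpeechToTextError: error => {
      errorText.text = app.errorMessage(error)
    }
  }

  NavigationStack {
    AppPage {
      title: qsTr("Error Handling")

      AppText {
        id: errorText
        anchors.centerIn: parent
      }
    }
  }
}
```

The default branch keeps unhandled error strings visible, so new or rare errors are not silently dropped.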
See also recognitionActive, startSpeechToText(), stopSpeechToText(), cancelSpeechToText(), and speechToTextResult.
speechToTextResult(string result, bool isFinal)
Emitted after a speech-to-text transcription was stopped and new text was transcribed.
If the recognition was started with partialResults, the signal is additionally emitted multiple times during an active speech-to-text transcription whenever new text is transcribed, with the isFinal parameter set to false.
The parameter result contains the transcribed string.
The parameter isFinal is false if partial results are emitted, otherwise true for a finished transcription.
Note: The corresponding handler is onSpeechToTextResult.
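A handler that separates partial from final results might look like the following minimal sketch. The two text items and their ids are hypothetical. The sketch only shows the receiving side and assumes the transcription was started elsewhere with partial results enabled.

```qml
import QtQuick
import Felgo

App {
  Connections {
    target: SpeechToText
    onSpeechToTextResult: (result, isFinal) => {
      if (isFinal) {
        // Finished transcription: keep the result and clear the preview
        finalText.text = result
        partialText.text = ""
      } else {
        // Partial result: show it as a live preview only
        partialText.text = result
      }
    }
  }

  NavigationStack {
    AppPage {
      title: qsTr("Partial Results")

      Column {
        anchors.centerIn: parent
        spacing: dp(20)

        AppText { id: partialText; color: "grey" }
        AppText { id: finalText }
      }
    }
  }
}
```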
See also recognitionActive, startSpeechToText(), stopSpeechToText(), cancelSpeechToText(), and speechToTextError.
void cancelSpeechToText()
Cancels speech-to-text transcription. After canceling, no result signal is emitted, but recognitionActive is set to false.
See also recognitionActive, startSpeechToText(), and stopSpeechToText().
Starts speech-to-text transcription on supported devices on Android and iOS.
The localeId specifies the target language for transcription.
The onDevice parameter can be used to start either an online transcription or an on-device transcription (offline). If you omit the onDevice parameter, online transcription is used.
To start transcription for US English language, call:
SpeechToText.startSpeechToText("en-US")
To start on-device transcription for US English language (not supported on all devices), call:
SpeechToText.startSpeechToText("en-US", true)
The transcription is emitted via the speechToTextResult signal after no more spoken input was detected on Android or stopSpeechToText() was called.
If you set the partialResults parameter to true, the partially recognized result is continuously emitted via the speechToTextResult signal.
If an error occurs, speechToTextError is emitted.
Note: Make sure to check the availability via the recognitionAvailable or onDeviceRecognitionAvailable properties before starting the transcription.
After the transcription starts, recognitionActive is true.
See also recognitionActive, stopSpeechToText(), cancelSpeechToText(), speechToTextResult, and speechToTextError.
void stopSpeechToText()
Manually stops speech-to-text transcription. The result is emitted via the speechToTextResult signal.
Usually you don't have to call this method, as the transcription automatically stops once there is no spoken input detected anymore.
After stopping speech-to-text transcription, recognitionActive is set to false.
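Because recognitionActive reflects the current state, a single button can toggle between starting and stopping. The following is a minimal sketch under that assumption. The single-button layout is hypothetical and only one possible design.

```qml
import QtQuick
import Felgo

App {
  NavigationStack {
    AppPage {
      title: qsTr("Start / Stop")

      AppButton {
        anchors.centerIn: parent
        // The label follows the current transcription state
        text: SpeechToText.recognitionActive ? qsTr("Stop") : qsTr("Start")
        onClicked: {
          if (SpeechToText.recognitionActive) {
            // Stop manually, the result arrives via speechToTextResult
            SpeechToText.stopSpeechToText()
          } else {
            SpeechToText.startSpeechToText("en-US")
          }
        }
      }
    }
  }
}
```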
See also recognitionActive, startSpeechToText(), cancelSpeechToText(), speechToTextResult, and speechToTextError.