
SpeechToText

The SpeechToText singleton lets you convert recorded spoken audio into text.

Import Statement: import Felgo 4.0
Since: Felgo 4.1.0


Detailed Description

This object is available as a singleton type from all QML components when Felgo is imported via import Felgo 4.0. It is accessible with the name SpeechToText.

The SpeechToText singleton converts recorded spoken audio into text using the platform-native speech-to-text frameworks. It currently supports Android and iOS, with both online transcription and on-device (offline) recognition.

To start recording and transcription on supported Android and iOS devices, call startSpeechToText().

 // Start speech to text recording in US English locale with online transcription
 SpeechToText.startSpeechToText("en-US")

After the recording starts, recognitionActive is true.

On Android, the recording stops automatically when no more spoken input is detected. You receive the results via the speechToTextResult signal.

To manually stop the recording, call stopSpeechToText().

If an error occurs, the speechToTextError signal is emitted.

Before starting the transcription, you should check if speech to text is supported on the current device with the properties recognitionAvailable or onDeviceRecognitionAvailable.
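
The availability check can be sketched as follows. This is a minimal example, not the only way to do it: the preferOnDevice property and the startTranscription() helper are hypothetical names introduced for illustration and are not part of the Felgo API.

```qml
import QtQuick
import Felgo

App {
  id: app

  // Hypothetical flag for illustration: prefer offline recognition when possible
  property bool preferOnDevice: true

  // Hypothetical helper: starts a transcription only if a supported mode exists
  function startTranscription(localeId) {
    if (preferOnDevice && SpeechToText.onDeviceRecognitionAvailable) {
      // On-device (offline) transcription
      SpeechToText.startSpeechToText(localeId, true)
    } else if (SpeechToText.recognitionAvailable) {
      // Online transcription
      SpeechToText.startSpeechToText(localeId, false)
    } else {
      console.warn("Speech to text is not available on this device")
    }
  }

  NavigationStack {
    AppPage {
      title: qsTr("Availability Check")

      AppButton {
        text: qsTr("Start")
        anchors.centerIn: parent
        onClicked: app.startTranscription("en-US")
      }
    }
  }
}
```

Falling back from on-device to online transcription is a design choice of this sketch. An app that must work offline would skip the fallback and show a message instead.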

Example Usage

The following example shows the transcribed text in a text field:

 import QtQuick
 import Felgo

 App {
   id: app

   property bool onDevice: false
   property string localeId: "en-US"

   Connections {
     target: SpeechToText

     // Qt 6 Connections use function syntax for signal handlers
     function onSpeechToTextResult(result, isFinal) {
       appTextEdit.text = result
     }

     function onSpeechToTextError(error) {
       console.warn("Error:", error)
     }
   }

   NavigationStack {
     AppPage {
       title: qsTr("Speech To Text")

       Column {
         spacing: dp(20)

         width: parent.width - dp(40)
         anchors.centerIn: parent

         AppText {
           text: qsTr("Available: %1").arg(SpeechToText.recognitionAvailable)
         }

         AppText {
           text: qsTr("OnDevice available: %1").arg(SpeechToText.onDeviceRecognitionAvailable)
         }

         AppText {
           text: qsTr("Active: %1").arg(SpeechToText.recognitionActive)
         }

         AppText {
           text: qsTr("Locale: %1").arg(app.localeId)
         }

         Row {
           spacing: dp(20)

           AppText {
             id: onDeviceSwitch
             text: qsTr("On Device")
           }

           AppSwitch {
             anchors.verticalCenter: onDeviceSwitch.verticalCenter
             onCheckedChanged: {
               // Follow the switch state instead of toggling, so both stay in sync
               app.onDevice = checked
             }
           }
         }

         Flow {
           spacing: dp(20)
           width: parent.width

           AppButton {
             text: qsTr("Start")

             onClicked: {
               appTextEdit.reset()
               SpeechToText.startSpeechToText(app.localeId, app.onDevice)
             }
           }

           AppButton {
             text: qsTr("Stop")

             onClicked: {
               SpeechToText.stopSpeechToText()
             }
           }

           AppButton {
             text: qsTr("Cancel")

             onClicked: {
               appTextEdit.reset()
               SpeechToText.cancelSpeechToText()
             }
           }
         }

         AppTextEdit {
           id: appTextEdit

           width: parent.width
           enabled: false

           function reset() {
             appTextEdit.text = ""
           }
         }
       }
     }
   }
 }

Integration steps

Some platforms require additional integration steps to use speech to text functionality.

iOS Integration

To use SpeechToText on iOS devices, make sure to add the Speech framework to your project. Add the following lines to your CMakeLists.txt file, replacing yourAppTarget with the name of your app target:

 if(CMAKE_SYSTEM_NAME STREQUAL "iOS")
   target_link_libraries(yourAppTarget PRIVATE "-framework Speech")
 endif()

You also need to add the 'NSMicrophoneUsageDescription' and 'NSSpeechRecognitionUsageDescription' keys to the Project-Info.plist file:

 <key>NSMicrophoneUsageDescription</key>
 <string>Use the microphone for Speech To Text</string>
 <key>NSSpeechRecognitionUsageDescription</key>
 <string>Use Speech To Text</string>

Android Integration

Open your AndroidManifest.xml file and make sure that the following permission is set:

 <uses-permission android:name="android.permission.RECORD_AUDIO"/>

Also add the following queries element inside the manifest element:

 <manifest ... >

   ...

   <!-- Speech To Text -->
   <queries>
       <intent>
           <action android:name="android.speech.RecognitionService" />
       </intent>
   </queries>

   ...

 </manifest>

Property Documentation

onDeviceRecognitionAvailable : bool

Readonly property to check if speech-to-text on-device (offline) transcription is available on the current device.

Check this property before starting a new on-device transcription with:

 // If SpeechToText.onDeviceRecognitionAvailable is true:
 SpeechToText.startSpeechToText("en-US", true)

See also recognitionActive, startSpeechToText(), stopSpeechToText(), cancelSpeechToText(), speechToTextResult, and speechToTextError.


recognitionActive : bool

Readonly property to check if a speech-to-text transcription is currently active.

You can start transcription with startSpeechToText(). Use stopSpeechToText() to stop or cancelSpeechToText() to cancel an active transcription.

See also startSpeechToText(), stopSpeechToText(), cancelSpeechToText(), speechToTextResult, and speechToTextError.
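
As an illustration, a single button can bind to recognitionActive to switch between starting and stopping. This is a minimal sketch that assumes the "en-US" locale and online transcription; only the documented property and methods are used.

```qml
import QtQuick
import Felgo

AppButton {
  // The label follows the recognitionActive property through a binding
  text: SpeechToText.recognitionActive ? qsTr("Stop") : qsTr("Start")

  onClicked: {
    if (SpeechToText.recognitionActive) {
      // Stop the running transcription and wait for the result signal
      SpeechToText.stopSpeechToText()
    } else {
      SpeechToText.startSpeechToText("en-US")
    }
  }
}
```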


recognitionAvailable : bool

Readonly property to check if speech-to-text online transcription is available on the current device.

Check this property before starting a new online transcription with:

 // If SpeechToText.recognitionAvailable is true:
 SpeechToText.startSpeechToText("en-US", false)

See also recognitionActive, startSpeechToText(), stopSpeechToText(), cancelSpeechToText(), speechToTextResult, and speechToTextError.


Signal Documentation

speechToTextError(string error)

Emitted when a start or stop call results in a speech-to-text transcription error.

The parameter error contains an error string, which you can use to react to the error in your UI.

Possible error strings are:

init_error, not_supported, audio_error, client_error, server_error, permission_error, network_error, network_timeout, language_not_supported, language_unavailable, server_disconnected, too_many_requests, unknown_error, speech_timeout, no_match, busy

Note: The corresponding handler is onSpeechToTextError.

See also recognitionActive, startSpeechToText(), stopSpeechToText(), cancelSpeechToText(), and speechToTextResult.
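
One way to react to the error string in the UI is sketched below. The describeError() helper is a hypothetical name, and the messages are illustrative guesses based on the error names, not documented meanings.

```qml
import QtQuick
import Felgo

Item {
  id: root

  // Hypothetical helper: maps some of the listed error strings to UI text.
  // The wording of each message is an assumption derived from the error name.
  function describeError(error) {
    switch (error) {
    case "permission_error":
      return qsTr("Permission for speech recognition was not granted.")
    case "network_error":
    case "network_timeout":
      return qsTr("A network problem occurred. Please try again.")
    case "speech_timeout":
    case "no_match":
      return qsTr("No speech was recognized. Please try again.")
    case "busy":
      return qsTr("Speech recognition is busy.")
    default:
      // Fall back to showing the raw error string
      return qsTr("Speech recognition failed: %1").arg(error)
    }
  }

  Connections {
    target: SpeechToText

    function onSpeechToTextError(error) {
      errorText.text = root.describeError(error)
    }
  }

  AppText {
    id: errorText
  }
}
```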


speechToTextResult(string result, bool isFinal)

Emitted after a speech-to-text transcription was stopped and new text was transcribed.

If the recognition was started with partialResults, the signal is additionally emitted multiple times during an active speech-to-text transcription, each time new text is transcribed, with the isFinal parameter set to false.

The parameter result contains the transcribed string.

The parameter isFinal is false if partial results are emitted, otherwise true for a finished transcription.

Note: The corresponding handler is onSpeechToTextResult.

See also recognitionActive, startSpeechToText(), stopSpeechToText(), cancelSpeechToText(), and speechToTextError.
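
The difference between partial and final results can be sketched like this. The liveText and finalText properties are hypothetical names for illustration; the example assumes the "en-US" locale and online transcription with partialResults enabled.

```qml
import QtQuick
import Felgo

Item {
  id: root

  // Hypothetical properties for illustration
  property string liveText: ""   // latest partial result while speaking
  property string finalText: ""  // result of the finished transcription

  Connections {
    target: SpeechToText

    function onSpeechToTextResult(result, isFinal) {
      if (isFinal) {
        // Finished transcription: keep the result and clear the preview
        root.finalText = result
        root.liveText = ""
      } else {
        // Partial result: update the live preview only
        root.liveText = result
      }
    }
  }

  Column {
    AppText { text: root.liveText }
    AppText { text: root.finalText }

    AppButton {
      text: qsTr("Dictate")
      // The third argument enables partial results
      onClicked: SpeechToText.startSpeechToText("en-US", false, true)
    }
  }
}
```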


Method Documentation

void cancelSpeechToText()

Cancels speech-to-text transcription. After canceling, no result signal is emitted, but recognitionActive is set to false.

See also recognitionActive, startSpeechToText(), and stopSpeechToText().


void startSpeechToText(string localeId, bool onDevice, bool partialResults)

Starts speech-to-text transcription on supported Android and iOS devices.

The localeId specifies the target language for transcription.

The onDevice parameter can be used to start either an online transcription or an on-device transcription (offline). If you omit the onDevice parameter, online transcription is used.

To start transcription for US English language, call:

 SpeechToText.startSpeechToText("en-US")

To start on-device transcription for US English language (not supported on all devices), call:

 SpeechToText.startSpeechToText("en-US", true)

The transcription is emitted via the speechToTextResult signal after stopSpeechToText() is called or, on Android, after no more spoken input is detected.

If you set the partialResults parameter to true, the partially recognized result is continuously emitted via the speechToTextResult signal.

If an error occurs, speechToTextError is emitted.

Note: Make sure to check the availability via the recognitionAvailable or onDeviceRecognitionAvailable properties before starting the transcription.

After the transcription starts, recognitionActive is true.

See also recognitionActive, stopSpeechToText(), cancelSpeechToText(), speechToTextResult, and speechToTextError.


void stopSpeechToText()

Manually stops speech-to-text transcription. The result is emitted via the speechToTextResult signal.

On Android, you usually don't have to call this method, because the transcription stops automatically once no more spoken input is detected.

After stopping speech-to-text transcription, recognitionActive is set to false.

See also recognitionActive, startSpeechToText(), cancelSpeechToText(), speechToTextResult, and speechToTextError.

