Several audio input languages are available (default: Luxembourgish). If the recording contains more than one speaker, setting diarization to 'On' separates the text by speaker and adds time codes for each speaker's turns. Note that diarization adds some extra time to the recognition process. Four output formats are available: plain text (txt), SubRip subtitles (srt), JSON (with or without time codes for words) and Praat TextGrid. These files can be downloaded through the link below the transcription. Recognition takes up to 5% of the audio file's duration. Once the recognition process has started, an estimated time and a timer are displayed to keep track of the progress.
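To make the 5% figure concrete, here is a minimal sketch that turns an audio duration into an upper bound for the recognition time. The function names are hypothetical and not part of LuxASR, and the bound does not include the extra time that diarization or translation may add.

```python
def max_recognition_seconds(audio_seconds: float, ratio: float = 0.05) -> float:
    """Upper bound on recognition time, assuming it is at most `ratio` of the audio length."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must not be negative")
    return audio_seconds * ratio


def format_mmss(seconds: float) -> str:
    """Format a duration in seconds as minutes:seconds."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}:{secs:02d}"


# A 60-minute recording: 3600 s * 0.05 = 180 s
print(format_mmss(max_recognition_seconds(3600)))  # 3:00
```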
As an experimental feature, translation of the recognized text into other languages has been added; the output is available in English, German, Portuguese, Spanish or French. Note that translations take more time to run and will run only for short audio files. The quality of these translations may vary.
The maximum upload size is 500 MB. The preferred file format for audio files is 'wav' with a sampling frequency of 16,000 Hz.
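A file can be checked against these limits before it is uploaded. The following is a rough sketch using only the Python standard library; the function name and the constants are hypothetical, and reading "500 MB" as 500 × 1024 × 1024 bytes is an assumption, since the exact limit applied by the server is not stated here.

```python
import os
import wave

# Assumption: "500 MB" is interpreted as 500 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 500 * 1024 * 1024
PREFERRED_RATE_HZ = 16_000


def check_wav_for_upload(path: str) -> list[str]:
    """Return a list of warnings; an empty list means no issue was found."""
    warnings = []

    size = os.path.getsize(path)
    if size > MAX_UPLOAD_BYTES:
        warnings.append(f"file is {size} bytes, above the {MAX_UPLOAD_BYTES}-byte limit")

    try:
        # The wave module only reads uncompressed PCM wav files.
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
    except (wave.Error, EOFError):
        warnings.append("file could not be read as a PCM wav file")
    else:
        if rate != PREFERRED_RATE_HZ:
            warnings.append(f"sampling rate is {rate} Hz, preferred is {PREFERRED_RATE_HZ} Hz")

    return warnings
```

Calling check_wav_for_upload("recording.wav") returns an empty list when the file is within the size limit and sampled at 16,000 Hz. Since 16,000 Hz is described as preferred rather than required, a different rate is reported as a warning, not as an error.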
We are opening API access now. The LuxASR API can be reached via:
curl -X POST "https://luxasr.uni.lu/v2/asr?diarization=Enabled&outfmt=text" \
-H "accept: application/json" \
-F "audio_file=@PATH/TO/AUDIO FILE;type=audio/wav"
The API returns the transcription in the specified output format.
diarization: Enabled (default) or Disabled to include or exclude speaker diarization.
outfmt:
text – plain text transcript (default)
json – detailed JSON output
srt – SubRip subtitle format
textgrid – Praat TextGrid format
Accepted audio formats are .wav, .mp3, and .m4a.
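The query string can also be assembled programmatically, which catches a mistyped parameter value before any request is sent. This is a small sketch that uses only the endpoint and the values listed above; the helper name build_asr_url is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "https://luxasr.uni.lu/v2/asr"
DIARIZATION_VALUES = ("Enabled", "Disabled")
OUTFMT_VALUES = ("text", "json", "srt", "textgrid")


def build_asr_url(diarization: str = "Enabled", outfmt: str = "text") -> str:
    """Build the request URL, rejecting values the API description does not list."""
    if diarization not in DIARIZATION_VALUES:
        raise ValueError(f"diarization must be one of {DIARIZATION_VALUES}")
    if outfmt not in OUTFMT_VALUES:
        raise ValueError(f"outfmt must be one of {OUTFMT_VALUES}")
    query = urlencode({"diarization": diarization, "outfmt": outfmt})
    return f"{BASE_URL}?{query}"


print(build_asr_url("Disabled", "srt"))
# https://luxasr.uni.lu/v2/asr?diarization=Disabled&outfmt=srt
```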
Below is a basic Python script that replicates the functionality of the curl command with added flexibility. You can specify the audio file and optionally choose whether to enable diarization and which output format to use.
import argparse
import os
import sys

import requests


def main():
    # Parse the audio path and the optional API parameters.
    parser = argparse.ArgumentParser(
        description="Send an audio file to the LuxASR API for transcription."
    )
    parser.add_argument(
        "audio_file",
        type=str,
        help="Path to the audio file (.wav, .mp3, .m4a)"
    )
    parser.add_argument(
        "--diarization",
        choices=["Enabled", "Disabled"],
        default="Enabled",
        help="Enable or disable speaker diarization (default: Enabled)"
    )
    parser.add_argument(
        "--outfmt",
        choices=["text", "json", "srt", "textgrid"],
        default="text",
        help="Output format: text, json, srt, or textgrid (default: text)"
    )
    args = parser.parse_args()

    # Stop early if the audio file does not exist.
    if not os.path.isfile(args.audio_file):
        print(f"Error: File '{args.audio_file}' not found.")
        sys.exit(1)

    url = f"https://luxasr.uni.lu/v2/asr?diarization={args.diarization}&outfmt={args.outfmt}"
    headers = {
        "accept": "application/json"
    }

    # Determine MIME type
    ext = args.audio_file.lower()
    if ext.endswith(".wav"):
        mime_type = "audio/wav"
    elif ext.endswith(".mp3"):
        mime_type = "audio/mpeg"
    elif ext.endswith(".m4a"):
        mime_type = "audio/mp4"
    else:
        mime_type = "application/octet-stream"

    # Upload the file as multipart form data; it must stay open during the request.
    with open(args.audio_file, "rb") as audio:
        files = {
            "audio_file": (os.path.basename(args.audio_file), audio, mime_type)
        }
        response = requests.post(url, headers=headers, files=files)

    print(response.text)


if __name__ == "__main__":
    main()
python luxasr_transcribe.py path/to/your_audio.wav --diarization Enabled --outfmt json
Replace path/to/your_audio.wav with your actual audio file. The --diarization and --outfmt options are optional and default to Enabled and text, respectively.
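Because the script prints the result to standard output, the transcript has to be written to a file separately if it should be kept. A possible sketch for choosing a file name that matches the output format is shown below. The helper output_path is hypothetical and the extension mapping is an assumption: .txt, .json and .srt follow the format names, and .TextGrid is the extension commonly used for Praat files.

```python
from pathlib import Path

# Assumed mapping from the outfmt value to a file extension.
EXTENSIONS = {
    "text": ".txt",
    "json": ".json",
    "srt": ".srt",
    "textgrid": ".TextGrid",
}


def output_path(audio_file: str, outfmt: str = "text") -> Path:
    """Derive the transcript file name from the audio file name."""
    if outfmt not in EXTENSIONS:
        raise ValueError(f"outfmt must be one of {sorted(EXTENSIONS)}")
    return Path(audio_file).with_suffix(EXTENSIONS[outfmt])


print(output_path("path/to/your_audio.wav", "json").as_posix())
# path/to/your_audio.json
```

The response text could then be saved with output_path(...).write_text(response.text, encoding="utf-8"). Alternatively, the output of the script can be redirected to a file in the shell.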
Note that the transcription and the translation are run on a dedicated server at the University of Luxembourg. All data thus stays within Luxembourg and the University’s network. Nobody has access to the uploaded audio or the text output. The audio data is streamed to this server and no files are stored on this server or in the network. No data is used to further train the model and no data is transferred to third parties.
Learn more about LuxASR. LuxASR is under constant development by Peter Gilles, Léopold Hillah, and Nina Hosseini-Kivanani at the University of Luxembourg and is supported by the Chambre des Députés du Grand-Duché de Luxembourg.