LuxASR

Automatic Speech Recognition for Luxembourgish

Use LuxASR to transcribe Luxembourgish speech to text. Either upload an audio/video file or use the microphone to record your speech, then hit Transcribe! LuxASR is fast: it can transcribe up to 170 words per second. Try the examples. LuxASR is also available as a smartphone app on the App Store and Google Play.

With this interface, we give access to our best-performing tool for automatic speech recognition (speech-to-text) of Luxembourgish. It has been trained on 150+ hours of carefully controlled pairs of audio and transcription snippets and achieves a word error rate below 10%, i.e. fewer than 10 errors per 100 running words (punctuation and capitalization included). We provide this tool to facilitate the transcription of Luxembourgish audio recordings into written text for research purposes, but also for general public use.
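To make the word error rate concrete: it is commonly computed as the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below illustrates that standard definition only; it is not the evaluation script used for LuxASR, and the function name and example sentence are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Tokens are compared verbatim, so a missing comma counts as one error.
wer = word_error_rate("Moien, wéi geet et dir haut?",
                      "Moien wéi geet et dir haut?")
```

Because the words are compared as written, punctuation and capitalization differences count as errors here, in line with the way the figure above is described.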

Available options

The input form offers audio language selection (Luxembourgish or German), a diarization switch, an optional number-of-speakers field to guide diarization, and an optional prompt to improve the spelling of names, brands, and technical terms (keep it under 900 characters).

For output, the web interface supports Text (enriched text, downloadable as TXT/DOCX), Word DOCX, Subtitles (SRT), Subtitles (VTT), MAXQDA transcript, Praat TextGrid, Praat TextGrid aligned (based on WhisperX hybrid alignment and MFA forced alignment for Luxembourgish), and JSON. When SRT or VTT is selected, you can also set the subtitle line length. The output files can be downloaded through the link below the transcription.

Recognition takes up to 5% of the audio file’s duration (e.g., 3 minutes for 60 minutes of audio). Once recognition has started, an estimated time and a timer are displayed so you can track progress.

As an experimental feature, translation can output the recognized text in English, French, German, Spanish, Portuguese, or Luxembourgish. Note that translations take more time to run and that their quality may vary. You can also try our stand-alone translation tool, LuxMT.

The maximum upload size is 500 MB. Audio files should be in WAV, MP3 or M4A format, video files in MP4 format.

Further information

Timecodes for SRT and WebVTT have been improved and should be largely correct. Also, it is now possible to select the length of the subtitle segments (default 42 characters). Line breaks follow syntactic boundaries for better readability. In case of a translation, the original transcription and the translation can be downloaded in separate files.

Long audio files (up to a file size of 500 MB) can be processed by LuxASR, although processing may take considerably longer. In case of problems, it is advisable to split the audio into shorter segments. Some users have reported problems with long files, so we have implemented a new queue system in which you can monitor the status of your transcription job. Please ensure that your browser connection is not interrupted.
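Splitting a long recording can be sketched as follows for uncompressed WAV input, using only Python's standard `wave` module. This is a minimal illustration, not part of LuxASR: the function name and the 10-minute default segment length are arbitrary assumptions, and compressed formats (MP3, M4A, MP4) would need an external tool instead.

```python
import wave
from pathlib import Path


def split_wav(path: str, out_dir: str, segment_seconds: int = 600) -> list[Path]:
    """Split a PCM WAV file into consecutive segments of at most segment_seconds."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_segment = src.getframerate() * segment_seconds
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break  # end of the source file
            target = out / f"{Path(path).stem}_{index:03d}.wav"
            with wave.open(str(target), "wb") as dst:
                dst.setparams(params)    # same channels, sample width, rate
                dst.writeframes(frames)  # frame count in the header is fixed up
            written.append(target)
            index += 1
    return written
```

Segments are cut at fixed time offsets, so a cut may fall in the middle of a word; choosing boundaries in pauses would give cleaner transcripts at the segment edges.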

Sometimes short turns by speakers or many overlapping speaker turns are not segmented correctly. This is a limitation of the diarization method we are using; we are trying to improve this. If you already know the speaker count, set Number of speakers in the input form to help stabilize segmentation.

LuxASR can transcribe different languages in the same audio file to a limited extent. The best results are achieved for Luxembourgish, as this is our main focus. Because German is closely related to Luxembourgish, transcriptions of German passages may contain errors or a mixture of Luxembourgish and German. As a workaround for German-only audio, set the input language to German.

Translation is still experimental. Long transcripts take longer to translate. For better translation quality, you can switch to a better but much slower model (10 times slower). For translation-only on plain text, you can also use LuxMT.

A copy button has been added to conveniently copy the entire transcript to another application. On mobile devices you can also use the share button.

Use output format maxqda to generate transcript text for MAXQDA import. Each speaker paragraph ends with a timestamp in MAXQDA-compatible form [hh:mm:ss.x].
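If you post-process MAXQDA transcripts yourself, the timestamp form can be produced and read back as sketched below. This assumes that the `x` in `[hh:mm:ss.x]` is a single digit for tenths of a second; the function names are hypothetical and not part of LuxASR.

```python
import re

# Assumption: "x" in [hh:mm:ss.x] is one digit (tenths of a second).
TIMESTAMP = re.compile(r"\[(\d{2}):(\d{2}):(\d{2})\.(\d)\]")


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as [hh:mm:ss.x]."""
    tenths = int(round(seconds * 10))
    hours, rem = divmod(tenths, 36000)
    minutes, rem = divmod(rem, 600)
    secs, tenth = divmod(rem, 10)
    return f"[{hours:02d}:{minutes:02d}:{secs:02d}.{tenth}]"


def parse_timestamps(text: str) -> list[float]:
    """Return all [hh:mm:ss.x] timestamps found in text, in seconds."""
    return [int(h) * 3600 + int(m) * 60 + int(s) + int(t) / 10
            for h, m, s, t in TIMESTAMP.findall(text)]
```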

Issues with the misplacement of the apostrophe in Luxembourgish ‘d’’ (e.g. in ‘d’Land’) have been fixed.

We provide some of our fine-tuned ASR models under an Open Weights license. The smaller models (tiny, base, small, medium) are thus available on Hugging Face to be integrated into your apps. Our most performant model (large-v3-turbo), which is used on this website, in our API and in our apps, is not yet freely available. Contact us if you are interested in acquiring a license.

If you encounter errors while using LuxASR, please report them to peter.gilles@uni.lu, including the exact time at which they occurred and the approximate file size of the audio.

We are now opening the API for limited access. We reserve the right to modify or suspend access to the API at any time. If you plan to integrate our service into another application, contact us first for permission and conditions. The LuxASR transcription API base URL is https://luxasr.uni.lu.

Deprecation notice: The old API flow is deprecated and no longer supported. Clients must use the queued /asr2 job flow and send the file as raw bytes in the request body (not multipart/form-data), with Content-Type: audio/* or video/*.

Step A: Submit transcription job
POST /asr2?...params... with the raw file bytes in the request body (the same bytes you would save to disk — not a multipart form upload).
Set Content-Type to a matching audio/* or video/* MIME type (e.g. audio/wav, audio/mpeg, audio/mp4, video/mp4) and optionally X-Filename with the original file name. The server checks that the payload is decodable media with an audio track (same formats as the web upload: wav, mp3, mp2, m4a, mp4, mov, qta, ogg, webm, wmv, wmav2, and other common audio/video containers).
Response: {"job_id":"<id>","status":"queued"}
HTTP: 202
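Step A can be sketched in Python with the standard library only. This is an illustrative client, not an official one: the function names and the small extension-to-MIME table (taken from the example types above) are assumptions, and the whole file is read into memory for simplicity.

```python
import json
import os
import urllib.parse
import urllib.request

BASE_URL = "https://luxasr.uni.lu"

# MIME types taken from the examples in the text; extend as needed.
CONTENT_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".m4a": "audio/mp4",
    ".mp4": "video/mp4",
}


def build_submit_request(filename: str, payload: bytes, **params: str) -> urllib.request.Request:
    """Build the POST /asr2 request with the raw file bytes as body."""
    suffix = os.path.splitext(filename)[1].lower()
    if suffix not in CONTENT_TYPES:
        raise ValueError(f"no Content-Type known for {suffix!r}")
    url = f"{BASE_URL}/asr2?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        data=payload,  # raw bytes, not multipart/form-data
        method="POST",
        headers={
            "Content-Type": CONTENT_TYPES[suffix],
            "X-Filename": os.path.basename(filename),
        },
    )


def submit(filename: str, **params: str) -> str:
    """Send the file and return the job_id from the 202 response."""
    with open(filename, "rb") as handle:
        request = build_submit_request(filename, handle.read(), **params)
    with urllib.request.urlopen(request, timeout=60) as response:
        if response.status != 202:
            raise RuntimeError(f"unexpected HTTP status {response.status}")
        return json.load(response)["job_id"]
```

A call such as `submit("/path/to/sample.wav", language="lb", outfmt="json")` would then return the job id to use in the following steps.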

Step B: Poll status
GET /v3/asr/jobs/<job_id> until status is completed or failed.
Status values:

queued
processing
completed
failed
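The polling step can be sketched as a loop that stops on either terminal status. The sketch assumes that the status endpoint returns JSON with a `status` field, like the submit response shown above; the function names, the 5-second interval, and the one-hour limit are illustrative choices, not documented values.

```python
import json
import time
import urllib.request

BASE_URL = "https://luxasr.uni.lu"
TERMINAL_STATUSES = ("completed", "failed")


def fetch_status(job_id: str) -> str:
    """GET the job status (assumes a JSON body with a "status" field)."""
    url = f"{BASE_URL}/v3/asr/jobs/{job_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["status"]


def wait_for_job(job_id, fetch=fetch_status, interval=5.0,
                 max_wait=3600.0, sleep=time.sleep) -> str:
    """Poll until the job is completed or failed; return the final status."""
    waited = 0.0
    while True:
        status = fetch(job_id)
        if status in TERMINAL_STATUSES:
            return status
        if waited >= max_wait:
            raise TimeoutError(f"job {job_id} still {status} after {waited:.0f}s")
        sleep(interval)
        waited += interval
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without network access and makes it easy to swap in a different HTTP client.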

Step C: Fetch transcription result
GET /v3/asr/jobs/<job_id>/result (only after status is completed). The response body format matches the outfmt query parameter.
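Since the result body follows `outfmt`, a client has to decode it accordingly. A minimal sketch, assuming UTF-8 responses and treating every format other than `json` as plain text; the function names are hypothetical, and the structure of the JSON output is not specified here, so it is returned as parsed.

```python
import json
import urllib.request

BASE_URL = "https://luxasr.uni.lu"


def decode_result(body: bytes, outfmt: str):
    """Decode a result body: parsed JSON for outfmt=json, text otherwise."""
    text = body.decode("utf-8")  # assumption: the server responds in UTF-8
    return json.loads(text) if outfmt == "json" else text


def fetch_result(job_id: str, outfmt: str):
    """GET the result of a completed job and decode it for the given outfmt."""
    url = f"{BASE_URL}/v3/asr/jobs/{job_id}/result"
    with urllib.request.urlopen(url, timeout=60) as response:
        return decode_result(response.read(), outfmt)
```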

Parameter usage

Send parameters in the query string (recommended when using a raw request body), e.g.:

language=lb
diarization=Enabled|Disabled (defaults to Enabled)
outfmt=colored_text|text|srt|vtt|maxqda|json|textgrid|textgrid_aligned
beam_size=5
min_silence_duration_ms=2000
maxlen=42 (for SRT and VTT)
prompt=... (optional; see below)
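The parameters listed above can be assembled and checked before submitting, as in this sketch. The function name and the client-side checks are assumptions for illustration; the server remains the authority on which values are accepted. The prompt parameter is left out here.

```python
from urllib.parse import urlencode

OUTPUT_FORMATS = {
    "colored_text", "text", "srt", "vtt",
    "maxqda", "json", "textgrid", "textgrid_aligned",
}


def build_query(language: str = "lb", diarization: bool = True,
                outfmt: str = "text", maxlen=None) -> str:
    """Return the query string for POST /asr2 from validated options."""
    if outfmt not in OUTPUT_FORMATS:
        raise ValueError(f"unknown outfmt: {outfmt!r}")
    params = {
        "language": language,
        "diarization": "Enabled" if diarization else "Disabled",
        "outfmt": outfmt,
    }
    if maxlen is not None:
        # maxlen only applies to the subtitle formats
        if outfmt not in ("srt", "vtt"):
            raise ValueError("maxlen is only meaningful for srt and vtt")
        params["maxlen"] = str(maxlen)
    return urlencode(params)


query = build_query(outfmt="srt", maxlen=42)
```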

Whisper prompt (prompt)
Optional short hint to improve spelling of uncommon names, brands, and technical terms across the entire recording (not only the opening segment). Keep it under 900 characters; text beyond that limit is ignored. Put the most important words first, use exact spelling and capitalization, and separate items with commas or periods. Example:

Iechternach, Weiswampach, Préizerdaul.

Only include words or phrases that may actually appear in the audio — avoid filler text you would not say aloud, as it can show up in the transcript. URL-encode the value in query strings (spaces as %20, commas as %2C).
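Preparing the prompt value can be sketched as follows: terms are joined in priority order, whole trailing terms are dropped once the 900-character limit would be reached, and the result is percent-encoded for the query string. The function name and the choice to drop whole terms rather than cut mid-word are assumptions made for this illustration.

```python
from urllib.parse import quote

MAX_PROMPT_CHARS = 900  # text beyond this limit is ignored by the service


def build_prompt(terms: list[str]) -> str:
    """Join terms (most important first) into a prompt under the limit."""
    kept = []
    for term in terms:
        term = term.strip()
        if not term:
            continue
        candidate = ", ".join(kept + [term]) + "."
        if len(candidate) >= MAX_PROMPT_CHARS:
            break  # drop this and all later terms instead of cutting a word
        kept.append(term)
    return ", ".join(kept) + "." if kept else ""


prompt = build_prompt(["Iechternach", "Weiswampach", "Préizerdaul"])
# safe="" also encodes "/" so the value cannot break the query string
encoded = quote(prompt, safe="")
```

The encoded value is what goes after `prompt=` in the request URL; with `safe=""`, spaces become `%20` and commas become `%2C`.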

Curl examples

A) /asr2 with WAV raw body

curl -X POST "https://luxasr.uni.lu/asr2?language=lb&diarization=Enabled&outfmt=json" \
-H "Content-Type: audio/wav" \
-H "X-Filename: sample.wav" \
--data-binary "@/path/to/sample.wav"

B) /asr2 with MP3 raw body and prompt

curl -X POST "https://luxasr.uni.lu/asr2?language=lb&diarization=Disabled&outfmt=text&prompt=Iechternach%2C%20Weiswampach%2C%20Pr%C3%A9izerdaul." \
-H "Content-Type: audio/mpeg" \
-H "X-Filename: sample.mp3" \
--data-binary "@/path/to/sample.mp3"

C) Poll + fetch

curl -s "https://luxasr.uni.lu/v3/asr/jobs/<job_id>"
curl -s "https://luxasr.uni.lu/v3/asr/jobs/<job_id>/result"

Data protection

Note that the transcription and the translation are run on a dedicated server at the University of Luxembourg. All data thus stays within Luxembourg and the University’s network. Nobody has access to the uploaded audio or the text output. The audio data is streamed to this server and no files are stored on this server or in the network. No data is used to further train the model and no data is transferred to third parties.

Contact

Learn more about LuxASR. LuxASR is under constant development by Peter Gilles, Léopold Hillah, and Nina Hosseini-Kivanani at the University of Luxembourg and is supported by the Chambre des Députés du Grand-Duché de Luxembourg.
