- Add input option "language" that can be passed at each request
- Add result of language detection (or given language) in the output, for each segment
- Add speaker identification ("speakerIdentification" option in "diarizationConfig")
- Do not fail when number conversion is requested with the environment variable LANGUAGE=*
- Improve heuristics to merge transcription and diarization results (for words in between two speaker turns)
- Add heuristics to avoid sending overly long speech segments to STT (limits the risk of memory overflow)
- Fix failure with token "- Et"
- Avoid a 1-hour timeout that was causing the Celery task to re-run (and fail)
- VAD: Improve heuristics about audio segment durations to better adapt to Whisper setting (minDuration=30)
- Preserve exponents ("²") in word normalization
- In full transcription: proper normalization of spaces before/after traditional punctuation marks (for French and English at least)
- In word normalization: improve distinction of characters (word / punctuation / symbol that can be pronounced / garbage symbol)
- Fix typo fr_FR -> fr-FR
- Fix inconsistency in transcription confidence score (now always computed from word confidence scores, not segment confidence scores)
- Remove punctuation in words (to avoid spaces as in "allez-vous ?")
- Fix possible worker conflict when multiple workers are running on the same file (the audio file could be deleted by the worker that first finishes)
- Fix speaker segment being split in two when diarization detects another speaker with no word assigned.
- Fix a bug in the formatting of the progression "status" ("StepState.PENDING" -> "pending"), which was introduced because the Python version was not pinned in the Dockerfile (Python 3.11 changes the behaviour when converting an enum to a string)
- Add options for VAD (minimum duration of segments, ...)
- Fix corner case of empty transcriptions
- Fix corner cases to assign words to speaker turns (overlapping diarization segments, words in between two segments)
- Added multifiles route and processing.
- Changed how words straddling two speaker turns are resolved in diarization
- Added diarization results within transcription result.
- Added recovery from a Redis search index drop.
- Added Bearer Authentication to swagger.
- Updated README.
- Added timestamp interpolation for non-consecutive diarization segments.
- Added Makefile for styling
- Refactored code to PEP8 (black)
- Reorganized repository folder structure.
- Added service discovery for subtasks
- Added service resolve and service resolve policy
- Added task logs and log query route
- Added possibility to upload a timestamps file.
- Fixed convertnumber converting spk id 1
- Fixed usersub not being applied to subtitles
- Fixed text cleaning and substitutions not being applied to subtitle chunks.
- Added raw_return and convert_number to VTT and SRT format
- Removed accept header check on /job/ route
- Cleanup
- Added: Text normalisation.
- Added: Text to Number.
- Added: Result presentation options as query string.
- Added: MongoDB error handler.
- Changed: Steps progression.
- Updated: README
- Updated: API specs.
- Updated: transcription_request test script.
- Added: A new route, /results/{result_id}, allows fetching a transcription result and specifying the result format.
- Changed: MongoDB server availability check timeout greatly reduced to prevent hanging when MongoDB is unavailable.
- Changed: The /job/{job_id} route now returns a ressource_id to be fetched from the /results/{result_id} route when the task is completed.
- Changed: Diarization is ignored when the number of speakers is 1
- Changed: GUNICORN_WORKER replaced with CONCURRENCY.
- Fixed: Transcription worker concurrency is now set using CONCURRENCY env variable.
- Updated: README.
- Updated: Swagger's document.
- Removed: no_cache request option has been removed.
- Added: Subtitling return format for VTT and SRT
- Added: Accept headers for subtitle formats
- Added: jobid in result database
- Changed: segment in TranscriptionResult will be equal to raw_segment in the absence of postprocessing
- Added: fetch result in db using jobid
- Moved: transcription-related file to workers/utils
- Updated: README
- Removed: no longer used formating.py file
- Removed: SubtitleConfig in TranscriptionConfig
- Added force_sync param for forced synchronous call
- Added vad processing to split large files into subfiles
- Added password variable for the service broker
- Changed API to the TranscriptionConfig format.
- Changed results return format
- Updated test_transcription.py
- Fixed WAV file not being converted when the sample rate was wrong
- Removed flower
- Updated swagger to OpenAPI 3.0 and added new specifications.
- Added wait-for-it for service dependencies
- Added LICENSE
- Added README
- Added swagger
- Fixed post-processing failing with speaker diarization
- Fixed transcription task initial state not returning proper format
- Removed unnecessary ENV variables
- Moved test/ to repository root
- Initial version
- Allow clients to perform asynchronous transcription requests
- Results are stored in a database
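
The enum formatting issue mentioned above ("StepState.PENDING" instead of "pending") can be illustrated with a small sketch. It assumes that `StepState` is declared with a `str` mix-in, which this changelog does not confirm; the member values and the `status_label` helper are hypothetical. Since Python 3.11, `format()` and f-strings on such an enum return the same text as `str()` (the qualified member name) rather than the member value, so reading `.value` explicitly should give the same result on every version.

```python
from enum import Enum


class StepState(str, Enum):
    # Hypothetical definition: only the name "StepState.PENDING" and the
    # expected output "pending" appear in the changelog.
    PENDING = "pending"
    STARTED = "started"
    DONE = "done"


def status_label(state: StepState) -> str:
    """Return the plain status string, independent of the Python version.

    Relying on f"{state}" is fragile: before Python 3.11 it yields the
    value for a str mix-in enum, from 3.11 onwards it yields the
    qualified member name.
    """
    return state.value


print(status_label(StepState.PENDING))
```

Pinning the Python version in the Dockerfile, as the fix describes, addresses the same problem from the deployment side; using `.value` simply removes the dependency on how the interpreter formats enums.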