CrisperWhisper is an advanced variant of OpenAI's Whisper, developed for fast, precise, verbatim speech-to-text (STT) with accurate word-level timestamps. Whereas the original Whisper tends to omit filler words and pauses, CrisperWhisper transcribes speech exactly as it is spoken, including filler words, pauses, stutters, and incomplete sentences. The goal is a verbatim transcript that preserves every detail of the spoken language.
CrisperWhisper is characterized by several key features:
According to its authors, CrisperWhisper significantly outperforms Whisper Large v3, particularly on datasets with a verbatim transcription style such as AMI and TED-LIUM. The improvement shows in both transcription accuracy and segmentation. Particularly notable is the precise capture of filler words and pauses, which matter for the analysis of speech patterns and cognitive processes.
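To illustrate how word-level timestamps can support this kind of speech-pattern analysis, the following minimal Python sketch collects filler words and measures silent gaps between consecutive words. It assumes the input follows the `chunks` format that the Hugging Face Transformers speech recognition pipeline returns with word timestamps, and that fillers are written as `[UH]` and `[UM]`. The function name `summarize_disfluencies` and the 0.3-second pause threshold are illustrative choices, not part of CrisperWhisper.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: fillers appear as these tokens in the transcript.
# Adjust the set if your model output uses a different convention.
FILLER_TOKENS = {"[UH]", "[UM]"}


def summarize_disfluencies(
    chunks: List[Dict],
    min_pause: float = 0.3,  # hypothetical threshold in seconds
) -> Dict[str, List[Tuple]]:
    """Collect filler words and inter-word pauses from word-level chunks.

    Each chunk is expected to look like
    {"text": " word", "timestamp": (start_seconds, end_seconds)}.
    """
    fillers: List[Tuple[str, Optional[float], Optional[float]]] = []
    pauses: List[Tuple[float, float]] = []
    prev_end: Optional[float] = None

    for chunk in chunks:
        word = chunk["text"].strip()
        start, end = chunk["timestamp"]

        if word.upper() in FILLER_TOKENS:
            fillers.append((word, start, end))

        # A pause is the silent gap between the previous word's end
        # and the current word's start, if it is long enough.
        if prev_end is not None and start is not None:
            if start - prev_end >= min_pause:
                pauses.append((prev_end, start))

        # Timestamps can be missing; keep the last known end time.
        if end is not None:
            prev_end = end

    return {"fillers": fillers, "pauses": pauses}


sample_chunks = [
    {"text": " So", "timestamp": (0.0, 0.2)},
    {"text": " [UH]", "timestamp": (0.25, 0.6)},
    {"text": " I", "timestamp": (1.2, 1.3)},
    {"text": " think", "timestamp": (1.3, 1.6)},
]
summary = summarize_disfluencies(sample_chunks)
print(summary)
```

In this hand-written sample, the filler `[UH]` is collected together with its timestamps, and the gap between it and the next word is recorded as a pause. Real transcripts would supply the chunks instead of the sample list.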
CrisperWhisper can be used in various applications, including:
The model can be integrated into common frameworks such as Transformers and Faster Whisper, which makes it flexible to deploy. A Streamlit app is also available for user-friendly operation.
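As a rough sketch of the Transformers integration, the snippet below wraps the generic Hugging Face `pipeline` API for automatic speech recognition with word-level timestamps. The model ID `nyrahealth/CrisperWhisper` is an assumption based on the project's repository name, and the helper names `build_transcriber`, `words_with_times`, and `transcribe_words` are hypothetical. The project README should be consulted for the recommended setup, which may differ from this minimal version.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Assumption: Hugging Face Hub ID of the model; verify against the project README.
MODEL_ID = "nyrahealth/CrisperWhisper"

Word = Tuple[str, Optional[float], Optional[float]]


def build_transcriber(model_id: str = MODEL_ID) -> Callable[[str], Dict[str, Any]]:
    """Create a speech recognition pipeline that returns word-level timestamps."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    return pipeline(
        "automatic-speech-recognition",
        model=model_id,
        chunk_length_s=30,         # process long audio in 30-second windows
        return_timestamps="word",  # request one timestamp pair per word
    )


def words_with_times(result: Dict[str, Any]) -> List[Word]:
    """Flatten pipeline output into (word, start, end) tuples."""
    return [
        (chunk["text"].strip(), chunk["timestamp"][0], chunk["timestamp"][1])
        for chunk in result.get("chunks", [])
    ]


def transcribe_words(
    audio_path: str,
    transcriber: Optional[Callable[[str], Dict[str, Any]]] = None,
) -> List[Word]:
    """Transcribe an audio file and return its words with timestamps."""
    if transcriber is None:
        transcriber = build_transcriber()  # downloads the model on first use
    return words_with_times(transcriber(audio_path))
```

Calling `transcribe_words("interview.wav")` (a placeholder file name) would load the model and return a list of words with their start and end times in seconds. Passing a prebuilt `transcriber` avoids reloading the model for each file.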
For Mindverse, a German company that develops AI-powered content tools, CrisperWhisper represents a valuable addition to the portfolio. The precise speech recognition technology can be integrated into various Mindverse solutions, such as chatbots, voicebots, AI search engines, and knowledge systems. This allows Mindverse customers to benefit from improved accuracy and efficiency in processing speech data.
Bibliography
Wagner, L., Thallinger, B., Zusag, M. (2024). CrisperWhisper: Accurate Timestamps on Verbatim Speech Transcriptions. INTERSPEECH 2024.
https://github.com/nyrahealth/CrisperWhisper/blob/main/README.md
https://replicate.com/collectiveai-team/crisperwhisper/readme
https://www.gradio.app/guides/real-time-speech-recognition
https://arxiv.org/html/2408.16589v1
https://openai.com/index/whisper/
https://www.isca-archive.org/interspeech_2024/zusag24_interspeech.html
https://github.com/SYSTRAN/faster-whisper
https://hub.docker.com/r/liquidinvestigations/openai-whisper-gradio