Is your feature request related to a problem? Please describe.
Nevron currently lacks the ability to process audio files or voice inputs. This limits its usability for scenarios where users may want to provide voice notes, meeting recordings, or podcasts for analysis, memory updates, or decision-making workflows.
Describe the solution you'd like
Add functionality to transcribe audio files into text using a speech-to-text backend, so that Nevron can process voice-based inputs and use the resulting transcripts in its workflows.
Proposed Implementation Steps:
- Audio File Support:
  - Support common audio file formats such as .mp3, .wav, and .flac.
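A first-pass format check could compare the file extension against an allow-list before any upload happens. This is a minimal sketch: `SUPPORTED_AUDIO_EXTENSIONS`, `UnsupportedAudioFormatError`, and `check_audio_extension` are hypothetical names, not existing Nevron code.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical allow-list; mirrors the formats named in this issue.
SUPPORTED_AUDIO_EXTENSIONS = frozenset({".mp3", ".wav", ".flac"})


class UnsupportedAudioFormatError(ValueError):
    """Raised when a file's extension is not in the allow-list."""


def check_audio_extension(path: str | Path) -> Path:
    """Return the path as a Path if its extension is supported, else raise."""
    p = Path(path)
    # Compare case-insensitively so "Meeting.WAV" is accepted too.
    suffix = p.suffix.lower()
    if suffix not in SUPPORTED_AUDIO_EXTENSIONS:
        raise UnsupportedAudioFormatError(
            f"Unsupported audio format {suffix or '<none>'!r} for {p.name}; "
            f"expected one of {sorted(SUPPORTED_AUDIO_EXTENSIONS)}"
        )
    return p
```

An extension check alone is easy to spoof, so it would only be the cheap first gate before the content and security checks discussed further down.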
- Integration with Speech-to-Text Services:
  - Use an external library or API for transcription:
    - Whisper API.
    - AssemblyAI.
  - Allow users to choose the transcription backend through settings.py.
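The backend choice could be modeled as a small shared interface plus a name-based lookup driven by a value from settings.py. This is a sketch under assumptions: the `TranscriptionBackend` protocol, the two class names, and `get_backend` are hypothetical. The SDK calls assume the `openai` v1.x client (`client.audio.transcriptions.create` with `model="whisper-1"`) and the `assemblyai` Python SDK (`aai.Transcriber(...).transcribe(...)`); both are imported lazily, so neither package is needed unless that backend is actually used.

```python
from __future__ import annotations

from pathlib import Path
from typing import Optional, Protocol


class TranscriptionBackend(Protocol):
    """Minimal interface every speech-to-text backend would implement."""

    def transcribe(self, path: Path, language: Optional[str] = None) -> str: ...


class WhisperBackend:
    """Sends audio to OpenAI's hosted Whisper model via the `openai` SDK (v1.x)."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def transcribe(self, path: Path, language: Optional[str] = None) -> str:
        from openai import OpenAI  # lazy import: only needed if this backend is chosen

        client = OpenAI(api_key=self.api_key)
        kwargs = {"language": language} if language else {}
        with open(path, "rb") as audio_file:
            result = client.audio.transcriptions.create(
                model="whisper-1", file=audio_file, **kwargs
            )
        return result.text


class AssemblyAIBackend:
    """Sends audio to AssemblyAI via the `assemblyai` SDK."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def transcribe(self, path: Path, language: Optional[str] = None) -> str:
        import assemblyai as aai  # lazy import: only needed if this backend is chosen

        aai.settings.api_key = self.api_key
        config = aai.TranscriptionConfig(language_code=language) if language else None
        transcript = aai.Transcriber(config=config).transcribe(str(path))
        if transcript.status == aai.TranscriptStatus.error:
            raise RuntimeError(f"AssemblyAI transcription failed: {transcript.error}")
        return transcript.text or ""


# Hypothetical mapping from the settings value to a backend class.
_BACKENDS = {"whisper": WhisperBackend, "assemblyai": AssemblyAIBackend}


def get_backend(name: str, api_key: str) -> TranscriptionBackend:
    """Pick a backend by the (case-insensitive) name configured in settings.py."""
    try:
        backend_cls = _BACKENDS[name.lower()]
    except KeyError:
        raise ValueError(
            f"Unknown transcription backend {name!r}; choose from {sorted(_BACKENDS)}"
        ) from None
    return backend_cls(api_key=api_key)
```

Keeping the lookup in one dictionary means adding a third backend later is a one-line registration plus a class with the same `transcribe` method.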
- Integration with Workflows:
  - Add a new execution tool that transcribes audio so workflows can use the resulting text.
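Nevron's actual execution-tool interface is not described in this issue, so the shape below is an assumption: a hypothetical `transcribe_audio_tool` function that takes a file path plus any callable backend and returns a hypothetical `ToolResult`, reporting failures to the workflow instead of raising.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional

# Hypothetical backend signature: (path, language) -> transcript text.
Transcriber = Callable[[Path, Optional[str]], str]


@dataclass
class ToolResult:
    """Hypothetical result shape handed back to the calling workflow."""

    success: bool
    output: str
    error: Optional[str] = None


def transcribe_audio_tool(
    file_path: str, transcribe: Transcriber, language: Optional[str] = None
) -> ToolResult:
    """Execution-tool entry point: audio file in, transcript text out."""
    path = Path(file_path)
    if not path.is_file():
        return ToolResult(success=False, output="", error=f"File not found: {file_path}")
    try:
        text = transcribe(path, language)
    except Exception as exc:
        # Report backend failures to the workflow rather than crashing it.
        return ToolResult(success=False, output="", error=str(exc))
    return ToolResult(success=True, output=text.strip())
```

Passing the backend in as a callable keeps the tool independent of which speech-to-text service is configured, and makes it easy to test with a fake.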
- Error Handling:
  - Handle errors such as:
    - Unsupported file formats.
    - Poor audio quality leading to incomplete transcriptions.
    - API errors during transcription.
  - Log detailed error messages for debugging.
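One way to cover the API-error and poor-quality cases is a small exception hierarchy plus a wrapper that logs and normalizes backend failures. This is a sketch: the exception names, the `nevron.audio` logger name, and the `min_words` heuristic for spotting unusable audio are all hypothetical; a real check might instead use confidence scores if the chosen backend returns them.

```python
from __future__ import annotations

import logging
from pathlib import Path
from typing import Callable, Optional

logger = logging.getLogger("nevron.audio")  # hypothetical logger name


class TranscriptionError(Exception):
    """Base class for audio transcription failures."""


class IncompleteTranscriptionError(TranscriptionError):
    """Transcript looks empty or truncated, e.g. because of poor audio quality."""


class TranscriptionAPIError(TranscriptionError):
    """The speech-to-text backend raised an error."""


def transcribe_with_checks(
    path: Path,
    transcribe: Callable[[Path, Optional[str]], str],
    language: Optional[str] = None,
    min_words: int = 1,
) -> str:
    """Run a backend and normalize its failures into TranscriptionError subclasses."""
    try:
        text = transcribe(path, language)
    except Exception as exc:
        # logger.exception also records the traceback, which helps debugging.
        logger.exception("Transcription backend failed for %s", path)
        raise TranscriptionAPIError(f"Backend error for {path.name}: {exc}") from exc

    word_count = len(text.split())
    if word_count < min_words:
        # Hypothetical heuristic: very few words suggests unusable audio.
        logger.warning("Transcript for %s has only %d word(s)", path, word_count)
        raise IncompleteTranscriptionError(
            f"Transcript for {path.name} has {word_count} word(s); "
            f"expected at least {min_words}"
        )
    return text
```

Chaining with `from exc` keeps the original backend exception available on `__cause__`, so logs show both the normalized error and its root cause.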
- Configuration Options:
  - Add configuration options to settings.py, including:
    - Maximum file size.
    - Transcription backend and API keys.
    - Language settings for transcription.
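How Nevron's settings.py is structured is not stated here, so the sketch below uses a plain frozen dataclass read from environment variables; the `AudioSettings` class, `check_file_size`, and every `AUDIO_*` variable name are hypothetical. The 25 MB default is an assumption chosen to match the OpenAI Whisper API's documented upload limit.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class AudioSettings:
    """Hypothetical audio options; the env var names below are assumptions."""

    max_file_size_mb: int = 25
    backend: str = "whisper"
    api_key: Optional[str] = None
    language: Optional[str] = None  # None: let the backend auto-detect, if it can

    @property
    def max_file_size_bytes(self) -> int:
        return self.max_file_size_mb * 1024 * 1024

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AudioSettings":
        """Build settings from environment variables, falling back to defaults."""
        return cls(
            max_file_size_mb=int(env.get("AUDIO_MAX_FILE_SIZE_MB", "25")),
            backend=env.get("AUDIO_TRANSCRIPTION_BACKEND", "whisper").lower(),
            api_key=env.get("AUDIO_TRANSCRIPTION_API_KEY"),
            language=env.get("AUDIO_TRANSCRIPTION_LANGUAGE") or None,
        )


def check_file_size(size_bytes: int, settings: AudioSettings) -> None:
    """Reject files above the configured limit before uploading them anywhere."""
    if size_bytes > settings.max_file_size_bytes:
        raise ValueError(
            f"Audio file is {size_bytes} bytes; limit is {settings.max_file_size_bytes}"
        )
```

Accepting any mapping in `from_env` lets tests pass a plain dict instead of mutating the real environment.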
- Unit Tests:
  - Write unit tests that validate audio transcription using sample audio files:
    - Clear audio, verifying the text output.
    - Poor-quality audio, verifying the expected errors.
    - Unsupported file formats.
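The three test cases could look roughly like this. It is a sketch only: `transcribe_file` is a hypothetical stand-in defined inline so the block runs on its own, and fake backends replace real sample audio files and API calls. It uses stdlib `unittest`, which pytest can also collect.

```python
import unittest
from pathlib import Path

SUPPORTED = {".mp3", ".wav", ".flac"}


class IncompleteTranscriptionError(Exception):
    """Raised when the transcript is too short to be useful."""


def transcribe_file(path: Path, backend, min_words: int = 1) -> str:
    """Hypothetical stand-in for the real pipeline so the tests below run alone."""
    if path.suffix.lower() not in SUPPORTED:
        raise ValueError(f"Unsupported audio format: {path.suffix}")
    text = backend(path)
    if len(text.split()) < min_words:
        raise IncompleteTranscriptionError(f"Transcript too short for {path.name}")
    return text


class TranscriptionTests(unittest.TestCase):
    def test_clear_audio_matches_expected_text(self):
        # Fake backend standing in for a clear sample recording.
        result = transcribe_file(Path("clear.wav"), lambda p: "hello from the meeting")
        self.assertEqual(result, "hello from the meeting")

    def test_poor_audio_raises(self):
        # Fake backend that "heard" nothing usable, as with very noisy audio.
        with self.assertRaises(IncompleteTranscriptionError):
            transcribe_file(Path("noisy.mp3"), lambda p: "")

    def test_unsupported_format_rejected(self):
        with self.assertRaises(ValueError):
            transcribe_file(Path("voice.ogg"), lambda p: "unused")


if __name__ == "__main__":
    unittest.main()
```

With real sample files, the clear-audio case would more likely assert that key phrases appear in the transcript than require an exact string match, since backends can differ in punctuation and casing.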
Additional Context
- We first need to check that uploaded audio is safe (i.e., contains no malware) before processing it.
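A cheap pre-check is to confirm that a file's leading bytes match the signature of the format its extension claims, which catches renamed executables and other mislabeled files. This is not malware scanning; a real check would hand the file to a dedicated scanner such as ClamAV. The function names below are hypothetical, and the MP3 frame-sync test is coarse (it can also match other MPEG audio streams).

```python
from __future__ import annotations

from pathlib import Path


def sniff_audio_format(header: bytes) -> str | None:
    """Return 'wav', 'flac', or 'mp3' if the leading bytes match that signature."""
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    if header[:4] == b"fLaC":
        return "flac"
    if header[:3] == b"ID3":  # MP3 starting with an ID3v2 tag
        return "mp3"
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return "mp3"  # bare MPEG audio frame sync: 11 leading set bits
    return None


def content_matches_extension(path: Path) -> bool:
    """True if the file's signature agrees with its (lowercased) extension."""
    with open(path, "rb") as f:
        header = f.read(12)
    expected = path.suffix.lower().lstrip(".")
    return sniff_audio_format(header) == expected
```

Running this before any upload keeps obviously mislabeled files away from both the scanner and the paid transcription API.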