Add Audio Transcription Functionality #94

@gromdimon

Description

Is your feature request related to a problem? Please describe.

Nevron currently lacks the ability to process audio files or voice inputs. This limits its usability for scenarios where users may want to provide voice notes, meeting recordings, or podcasts for analysis, memory updates, or decision-making workflows.


Describe the solution you'd like

Add functionality to transcribe audio files into text using a reliable speech-to-text solution. This feature will enable Nevron to process voice-based inputs and use the transcriptions in workflows.

Proposed Implementation Steps:

  1. Audio File Support:

    • Support common audio file formats such as .mp3, .wav, .flac.
  2. Integration with Speech-to-Text Services:

    • Use an external library or API for transcription:
      • Whisper API.
      • AssemblyAI.
    • Allow users to choose the transcription backend through settings.py.
  3. Integration with Workflows:

    • Add a new Execution tool so that workflows can invoke transcription.
  4. Error Handling:

    • Handle errors such as:
      • Unsupported file formats.
      • Poor audio quality leading to incomplete transcriptions.
      • API errors during transcription.
    • Log detailed error messages for debugging.
  5. Configuration Options:

    • Add configuration options to settings.py, including:
      • Maximum file size.
      • Transcription backend and API keys.
      • Language settings for transcription.
  6. Unit Tests:

    • Write unit tests to validate audio transcription functionality using sample audio files:
      • Clear audio with text output verification.
      • Poor quality audio with expected errors.
      • Unsupported file formats.
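As a starting point for discussion, steps 1, 2, 4 and 5 could be sketched roughly as below. This is a minimal sketch, not Nevron's actual code: every name in it (`TranscriptionSettings`, `validate_audio_file`, `transcribe`, the error classes) is hypothetical, the 25 MB default is an assumed placeholder, and the backends are passed in as plain callables so that no particular Whisper or AssemblyAI client API is assumed.

```python
import logging
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict

logger = logging.getLogger(__name__)

# Formats listed in step 1 of the proposal.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac"}


class TranscriptionError(Exception):
    """Base class for transcription failures (hypothetical)."""


class UnsupportedFormatError(TranscriptionError):
    """Raised when the file extension is not supported."""


class FileTooLargeError(TranscriptionError):
    """Raised when the file exceeds the configured size limit."""


class BackendError(TranscriptionError):
    """Raised when the backend is unknown or fails."""


@dataclass
class TranscriptionSettings:
    """Options that step 5 suggests adding to settings.py (names assumed)."""

    backend: str = "whisper"
    api_key: str = ""
    language: str = "en"
    max_file_size_bytes: int = 25 * 1024 * 1024  # assumed default


# A backend takes a file path and settings and returns the transcript.
Backend = Callable[[Path, TranscriptionSettings], str]


def validate_audio_file(path: Path, settings: TranscriptionSettings) -> None:
    """Reject unsupported extensions and oversized files."""
    suffix = path.suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        raise UnsupportedFormatError(f"Unsupported audio format: {suffix!r}")
    size = path.stat().st_size
    if size > settings.max_file_size_bytes:
        raise FileTooLargeError(
            f"{path.name} is {size} bytes; limit is {settings.max_file_size_bytes}"
        )


def transcribe(
    path: Path,
    settings: TranscriptionSettings,
    backends: Dict[str, Backend],
) -> str:
    """Validate the file, dispatch to the configured backend, wrap errors."""
    validate_audio_file(path, settings)
    backend = backends.get(settings.backend)
    if backend is None:
        raise BackendError(f"Unknown transcription backend: {settings.backend!r}")
    try:
        text = backend(path, settings)
    except TranscriptionError:
        raise
    except Exception as exc:
        # Log the full traceback for debugging, then raise a uniform error.
        logger.exception("Transcription backend %r failed", settings.backend)
        raise BackendError(f"Backend {settings.backend!r} failed: {exc}") from exc
    if not text.strip():
        # Poor audio quality may yield an empty transcript.
        raise TranscriptionError(f"Empty transcription for {path.name}")
    return text
```

Injecting the backends as a dictionary keeps the dispatch logic testable without network access: unit tests for step 6 can pass a fake callable, while the real Whisper or AssemblyAI wrappers are registered under their own keys.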

Additional Context

  • Before transcription, we first need to check that the audio file is safe (for example, that it does not contain malware).
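One cheap first-line check, sketched below, is to confirm that the file's leading bytes match its declared extension. This is not a malware scan and does not replace one; it only rejects files whose content obviously disagrees with their name. The function names are hypothetical, and the signatures used are the standard ones for WAV (`RIFF` ... `WAVE`), FLAC (`fLaC`) and MP3 (an `ID3` tag or an MPEG frame sync).

```python
from pathlib import Path
from typing import Optional


def sniff_audio_format(header: bytes) -> Optional[str]:
    """Guess the audio format from the first bytes of a file."""
    if len(header) >= 12 and header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return ".wav"
    if header[:4] == b"fLaC":
        return ".flac"
    if header[:3] == b"ID3":
        return ".mp3"  # MP3 with an ID3v2 tag
    if len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0:
        return ".mp3"  # bare MPEG audio frame sync (11 set bits)
    return None


def extension_matches_content(path: Path) -> bool:
    """Return True if the file's leading bytes agree with its extension."""
    with path.open("rb") as fh:
        header = fh.read(12)
    return sniff_audio_format(header) == path.suffix.lower()
```

A file that fails this check can be rejected before it ever reaches a transcription backend; anything stronger would need a dedicated scanner, which this issue does not specify.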

Metadata

Assignees

No one assigned

    Labels

    feature (New feature or request), future (Some issue for future)

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests
