SongGuess

    Interface Ai_Cf_Openai_Whisper_Large_V3_Turbo_Input

    interface Ai_Cf_Openai_Whisper_Large_V3_Turbo_Input {
        audio: string | { body?: object; contentType?: string };
        beam_size?: number;
        compression_ratio_threshold?: number;
        condition_on_previous_text?: boolean;
        hallucination_silence_threshold?: number;
        initial_prompt?: string;
        language?: string;
        log_prob_threshold?: number;
        no_speech_threshold?: number;
        prefix?: string;
        task?: string;
        vad_filter?: boolean;
    }
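
A minimal sketch of how an input object of this shape might be built and sent, under several assumptions not stated in this page: the model identifier is inferred from the interface name, the string form of `audio` is taken to be base64-encoded audio data, and `AiBinding` is a hypothetical stand-in for whatever runtime object executes the model. The local `WhisperTurboInput` type is a trimmed copy of the interface so the example compiles on its own.

```typescript
// Trimmed local copy of the interface so this sketch compiles on its own.
interface WhisperTurboInput {
  audio: string | { body?: object; contentType?: string };
  initial_prompt?: string;
  language?: string;
  task?: string;
  vad_filter?: boolean;
}

// Hypothetical binding shape: only a `run(model, input)` method is assumed.
interface AiBinding {
  run(model: string, input: WhisperTurboInput): Promise<unknown>;
}

// Assumed model identifier, inferred from the interface name.
const WHISPER_MODEL = "@cf/openai/whisper-large-v3-turbo";

// Combine the required `audio` field with any optional settings.
function buildInput(
  audio: WhisperTurboInput["audio"],
  options: Omit<WhisperTurboInput, "audio"> = {},
): WhisperTurboInput {
  return { ...options, audio };
}

// Send the request through whatever binding the runtime provides.
// `audioBase64` is assumed to hold base64-encoded audio data.
function transcribe(ai: AiBinding, audioBase64: string): Promise<unknown> {
  const input = buildInput(audioBase64, { task: "transcribe", vad_filter: true });
  return ai.run(WHISPER_MODEL, input);
}
```

Only `audio` is required; every other property can be omitted, in which case the model's own defaults presumably apply.
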

    Properties

    audio: string | { body?: object; contentType?: string }

    beam_size?: number

    The number of beams to use in beam search decoding. Higher values may improve accuracy at the cost of speed.

    compression_ratio_threshold?: number

    Threshold for filtering out segments with a high compression ratio, which often indicates repetitive or hallucinated text.

    condition_on_previous_text?: boolean

    Whether to condition on previous text during transcription. Setting this to false may help prevent hallucination loops.

    hallucination_silence_threshold?: number

    Optional threshold (in seconds) to skip silent periods that may cause hallucinations.

    initial_prompt?: string

    A text prompt that gives the model context about the contents of the audio.

    language?: string

    The language of the audio being transcribed or translated.

    log_prob_threshold?: number

    Threshold for filtering out segments with low average log probability, indicating low confidence.

    no_speech_threshold?: number

    Threshold for detecting no-speech segments. Segments with no-speech probability above this value are skipped.
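
The three filtering thresholds described above (`compression_ratio_threshold`, `log_prob_threshold`, `no_speech_threshold`) can be read as simple per-segment comparisons. The sketch below illustrates that reading only; it is not the model's actual decoding logic, which may combine these checks differently. `SegmentStats` and `isFilteredOut` are hypothetical names introduced for illustration, and the numeric values in the usage lines are example values rather than documented defaults.

```typescript
// Hypothetical per-segment statistics; these field names are not part of the API.
interface SegmentStats {
  compressionRatio: number; // higher suggests repetitive text
  avgLogProb: number;       // lower suggests low confidence
  noSpeechProb: number;     // higher suggests silence or noise
}

// The three threshold properties from the input interface.
interface FilterThresholds {
  compression_ratio_threshold?: number;
  log_prob_threshold?: number;
  no_speech_threshold?: number;
}

// Returns true when any configured threshold would reject the segment.
// A threshold left undefined is treated as "no check" in this sketch.
function isFilteredOut(stats: SegmentStats, t: FilterThresholds): boolean {
  if (t.compression_ratio_threshold !== undefined &&
      stats.compressionRatio > t.compression_ratio_threshold) {
    return true; // too repetitive
  }
  if (t.log_prob_threshold !== undefined &&
      stats.avgLogProb < t.log_prob_threshold) {
    return true; // too uncertain
  }
  if (t.no_speech_threshold !== undefined &&
      stats.noSpeechProb > t.no_speech_threshold) {
    return true; // probably not speech
  }
  return false;
}

// Example values only, chosen for illustration.
const exampleThresholds: FilterThresholds = {
  compression_ratio_threshold: 2.4,
  log_prob_threshold: -1.0,
  no_speech_threshold: 0.6,
};

const keep = isFilteredOut(
  { compressionRatio: 1.5, avgLogProb: -0.3, noSpeechProb: 0.1 },
  exampleThresholds,
);
```

Note the direction of each comparison: the compression ratio and no-speech probability reject a segment when they rise above their thresholds, while the average log probability rejects it when it falls below.
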

    prefix?: string

    A prefix prepended to the transcription output, which can guide the transcription result.

    task?: string

    The task to perform. Supported values are 'transcribe' and 'translate'.
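
Because `task` is typed as a plain `string`, a typo such as 'transcibe' would pass the type checker. One possible guard, sketched here with hypothetical names (`WhisperTask`, `isWhisperTask`, `withTask`) that are not part of the generated types, narrows the value to the two supported tasks before it reaches the input object.

```typescript
// The two task values named in the property description.
type WhisperTask = "transcribe" | "translate";

// Type guard that narrows an arbitrary string to a supported task.
function isWhisperTask(value: string): value is WhisperTask {
  return value === "transcribe" || value === "translate";
}

// Returns a copy of `input` with a validated `task`, or throws on anything else.
function withTask<T extends { task?: string }>(input: T, task: string): T {
  if (!isWhisperTask(task)) {
    throw new Error(`Unsupported task: ${task}`);
  }
  return { ...input, task };
}

// Usage: the original object is left untouched.
const base = { audio: "AAAA" };
const translated = withTask(base, "translate");
```

Validating at the call site keeps the generated interface unchanged while still catching unsupported values early.
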

    vad_filter?: boolean

    Whether to preprocess the audio with a voice activity detection (VAD) model.