Prompt support

Whisper also accepts a prompt, which looks useful for steering transcription (e.g. toward custom vocabulary).

https://github.com/openai/whisper/blob/main/whisper/transcribe.py

def transcribe(
    model: "Whisper",
    audio: Union[str, np.ndarray, torch.Tensor],
    *,
    verbose: Optional[bool] = None,
    temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
    condition_on_previous_text: bool = True,
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
    prepend_punctuations: str = "\"'“¿([{-",
    append_punctuations: str = "\"'.。,,!!??::”)]}、",
    **decode_options,
):
    """
    initial_prompt: Optional[str]
        Optional text to provide as a prompt for the first window. This can be used to provide, or
        "prompt-engineer" a context for transcription, e.g. custom vocabularies or proper nouns
        to make it more likely to predict those words correctly.
    """

Adding prompt support

Let's look at tag v0.8.

Accept it as a query-string parameter and add it to the options.

@app.post("/asr")
def transcribe(
    audio_file: UploadFile = File(...),
    language: Union[str, None] = Query(default=None, enum=LANGUAGE_CODES),
    task: Union[str, None] = Query(default="transcribe", enum=["transcribe", "translate"]),
    initial_prompt: Union[str, None] = Query(default=None),  # added
):
    audio = load_audio(audio_file.file)
    options_dict = {"language": language}
    if task:
        options_dict["task"] = task
    # added
    if initial_prompt:
        options_dict["initial_prompt"] = initial_prompt
    with model_lock:
        result = model.transcribe(audio, **options_dict)
    return result["text"]
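The option handling in the endpoint can be exercised on its own. A minimal sketch, with the FastAPI layer and the model call stripped away (the `build_options` helper is hypothetical, introduced here only to mirror the endpoint's logic):

```python
from typing import Optional


def build_options(language: Optional[str] = None,
                  task: Optional[str] = "transcribe",
                  initial_prompt: Optional[str] = None) -> dict:
    """Mirror the endpoint: optional fields are only added
    to the options dict when they are actually set."""
    options = {"language": language}
    if task:
        options["task"] = task
    if initial_prompt:
        options["initial_prompt"] = initial_prompt
    return options


# With a prompt, the key is forwarded to model.transcribe(...)
print(build_options(initial_prompt="strong mind"))
# Without one, the dict stays as before, so existing callers are unaffected.
print(build_options(language="ko"))
```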

Let's test it.

curl -F "audio_file=@kr.mp3" http://whisper/asr\?initial_prompt=strong%20mind
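The `%20` in the query string is just the URL encoding of the space in "strong mind". If you build the request from Python instead of curl, the standard library handles the encoding (a sketch; the host and file name follow the curl example above):

```python
from urllib.parse import quote

prompt = "strong mind"
encoded = quote(prompt)  # spaces become %20, matching the curl example
url = f"http://whisper/asr?initial_prompt={encoded}"
print(url)  # http://whisper/asr?initial_prompt=strong%20mind
```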

I haven't been able to properly test how much the prompt actually improves results. // TODO
