Prompt support
Whisper also appears to accept a prompt, which can be used to steer the transcription in various ways.
https://github.com/openai/whisper/blob/main/whisper/transcribe.py
def transcribe(
    model: "Whisper",
    audio: Union[str, np.ndarray, torch.Tensor],
    *,
    verbose: Optional[bool] = None,
    temperature: Union[float, Tuple[float, ...]] = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0),
    compression_ratio_threshold: Optional[float] = 2.4,
    logprob_threshold: Optional[float] = -1.0,
    no_speech_threshold: Optional[float] = 0.6,
    condition_on_previous_text: bool = True,
    initial_prompt: Optional[str] = None,
    word_timestamps: bool = False,
    prepend_punctuations: str = "\"'“¿([{-",
    append_punctuations: str = "\"'.。,，!！?？:：”)]}、",
    **decode_options,
):
    """
    initial_prompt: Optional[str]
        Optional text to provide as a prompt for the first window. This can be used to provide, or
        "prompt-engineer" a context for transcription, e.g. custom vocabularies or proper nouns
        to make it more likely to predict those word correctly.
    """
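As a minimal sketch of how the `initial_prompt` keyword from the signature above might be used directly, the helper below forwards a prompt to any object that exposes a `transcribe` method. The helper name `transcribe_with_prompt` and its parameters are illustrative assumptions, not part of Whisper.

```python
from typing import Any, Dict, Optional


def transcribe_with_prompt(model: Any, audio_path: str, prompt: Optional[str] = None) -> str:
    """Transcribe audio, optionally biasing the first window with a prompt.

    `model` is assumed to behave like a loaded Whisper model, i.e. it has a
    `transcribe(audio, **options)` method that returns a dict with a "text" key.
    """
    options: Dict[str, Any] = {}
    if prompt:
        # initial_prompt is the keyword documented in the signature above.
        options["initial_prompt"] = prompt
    result = model.transcribe(audio_path, **options)
    return result["text"]


# Usage with the real library (requires `pip install openai-whisper`):
#   import whisper
#   model = whisper.load_model("base")
#   print(transcribe_with_prompt(model, "kr.mp3", "strong mind"))
```

The prompt is only passed when it is non-empty, so the default behaviour is unchanged when no prompt is given.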
Supporting the prompt
Let's look at tag v0.8.
Accept the prompt as a query-string parameter and pass it into the options.
@app.post("/asr")
def transcribe(
    audio_file: UploadFile = File(...),
    language: Union[str, None] = Query(default=None, enum=LANGUAGE_CODES),
    task: Union[str, None] = Query(default="transcribe", enum=["transcribe", "translate"]),
    initial_prompt: Union[str, None] = Query(default=None),  # added here
):
    audio = load_audio(audio_file.file)
    options_dict = {"language": language}
    if task:
        options_dict["task"] = task
    # added here: forward the prompt only when one was supplied
    if initial_prompt:
        options_dict["initial_prompt"] = initial_prompt
    with model_lock:
        result = model.transcribe(audio, **options_dict)
    return result["text"]
Let's test it.
curl -F "audio_file=@kr.mp3" http://whisper/asr\?initial_prompt=strong%20mind
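The prompt in the curl call above has to be URL-encoded by hand (`strong%20mind`). As a rough sketch, the same URL could be built in Python so that spaces and non-ASCII prompts are encoded automatically. The function name `build_asr_url` is a hypothetical helper; the `/asr` path and the `task` and `initial_prompt` parameter names are taken from the endpoint above.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_asr_url(base_url: str, initial_prompt: Optional[str] = None,
                  task: str = "transcribe") -> str:
    """Build the /asr URL with a percent-encoded query string."""
    params = {"task": task}
    if initial_prompt:
        params["initial_prompt"] = initial_prompt
    # quote_via=quote encodes spaces as %20 (the default would use "+").
    return f"{base_url}/asr?{urlencode(params, quote_via=quote)}"


print(build_asr_url("http://whisper", "strong mind"))
# prints http://whisper/asr?task=transcribe&initial_prompt=strong%20mind
```

The audio file itself would still need to be sent as a multipart form field named `audio_file`, as in the curl command.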
I have not yet been able to properly test how much the prompt actually improves the results. //todo
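One possible way to approach this todo, sketched under the assumption that a hand-checked reference transcript is available, is to compute the word error rate (WER) of the output with and without the prompt and compare the two. The function names below are illustrative and this is not something the notes above have tried. For Korean, splitting on characters instead of whitespace may be a better fit.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Token-level edit distance divided by the reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference transcript must not be empty")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# One substituted word out of four reference words.
print(word_error_rate("a strong mind wins", "a strong mine wins"))  # prints 0.25
```

Running the same audio through `/asr` once without and once with `initial_prompt`, then comparing the two WER values against the same reference, would give a first indication of whether the prompt helps.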
