ffmpeg
## Convert subtitle format: SRT to WebVTT
ffmpeg -i output.srt output.vtt
## Convert subtitle format: SRT to ASS
ffmpeg -i output.srt output.ass
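The single-file conversions above extend naturally to a whole directory. The following is only a sketch: the function name `srt_to_vtt_all` and the `FFMPEG` override variable are hypothetical names introduced for illustration, not part of ffmpeg. It relies only on `-i` and the `-n` (never overwrite) option that appears in the help output.

```shell
#!/bin/sh
# Convert every .srt file in a directory to WebVTT, one ffmpeg call per file.
# FFMPEG is a hypothetical override hook (assumption) so the binary can be swapped.
FFMPEG="${FFMPEG:-ffmpeg}"

srt_to_vtt_all() {
    dir="${1:-.}"
    for src in "$dir"/*.srt; do
        # Skip the unexpanded glob when the directory has no .srt files.
        [ -e "$src" ] || continue
        # Replace the .srt suffix with .vtt for the output name.
        dst="${src%.srt}.vtt"
        # -n: never overwrite an existing output file.
        "$FFMPEG" -n -i "$src" "$dst"
    done
}
```

Calling `srt_to_vtt_all ./subs` would process every `.srt` file in `./subs`; changing the `.vtt` suffix to `.ass` should give the ASS variant in the same way.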
## Convert .m4a audio to a .wav file
ffmpeg -i song.m4a song.wav
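When the WAV file is meant for speech recognition, it can help to fix the sample rate and channel count explicitly with the `-ar` and `-ac` options listed in the help output. The sketch below assumes 16 kHz mono is the desired target (a common choice for speech models, but an assumption here); `to_wav_16k_mono` and `FFMPEG` are hypothetical names.

```shell
#!/bin/sh
# Convert any audio file to a 16 kHz mono WAV next to the source file.
# FFMPEG is a hypothetical override hook (assumption) so the binary can be swapped.
FFMPEG="${FFMPEG:-ffmpeg}"

to_wav_16k_mono() {
    src="$1"
    # Strip the last extension and append .wav (song.m4a -> song.wav).
    dst="${src%.*}.wav"
    # -vn: drop any video stream (e.g. cover art); -ar: sample rate in Hz;
    # -ac: number of audio channels; -n: never overwrite an existing file.
    "$FFMPEG" -n -i "$src" -vn -ar 16000 -ac 1 "$dst"
}
```

For example, `to_wav_16k_mono song.m4a` would write `song.wav` in the same directory.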
ffmpeg --help
ffmpeg version 5.1.4-0+deb12u1 Copyright (c) 2000-2023 the FFmpeg developers
built with gcc 12 (Debian 12.2.0-14)
configuration: --prefix=/usr --extra-version=0+deb12u1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libglslang --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librist --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libsvtav1 --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --disable-sndio --enable-libjxl --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-libplacebo --enable-librav1e --enable-shared
libavutil 57. 28.100 / 57. 28.100
libavcodec 59. 37.100 / 59. 37.100
libavformat 59. 27.100 / 59. 27.100
libavdevice 59. 7.100 / 59. 7.100
libavfilter 8. 44.100 / 8. 44.100
libswscale 6. 7.100 / 6. 7.100
libswresample 4. 7.100 / 4. 7.100
libpostproc 56. 6.100 / 56. 6.100
Hyper fast Audio and Video encoder
usage: ffmpeg [options] [[infile options] -i infile]... {[outfile options] outfile}...
Getting help:
-h -- print basic options
-h long -- print more options
-h full -- print all options (including all format and codec specific options, very long)
-h type=name -- print all options for the named decoder/encoder/demuxer/muxer/filter/bsf/protocol
See man ffmpeg for detailed description of the options.
Print help / information / capabilities:
-L show license
-h topic show help
-? topic show help
-help topic show help
--help topic show help
-version show version
-buildconf show build configuration
-formats show available formats
-muxers show available muxers
-demuxers show available demuxers
-devices show available devices
-codecs show available codecs
-decoders show available decoders
-encoders show available encoders
-bsfs show available bit stream filters
-protocols show available protocols
-filters show available filters
-pix_fmts show available pixel formats
-layouts show standard channel layouts
-sample_fmts show available audio sample formats
-dispositions show available stream dispositions
-colors show available color names
-sources device list sources of the input device
-sinks device list sinks of the output device
-hwaccels show available HW acceleration methods
Global options (affect whole program instead of just one file):
-loglevel loglevel set logging level
-v loglevel set logging level
-report generate a report
-max_alloc bytes set maximum size of a single allocated block
-y overwrite output files
-n never overwrite output files
-ignore_unknown Ignore unknown stream types
-filter_threads number of non-complex filter threads
-filter_complex_threads number of threads for -filter_complex
-stats print progress report during encoding
-max_error_rate maximum error rate ratio of decoding errors (0.0: no errors, 1.0: 100% errors) above which ffmpeg returns an error instead of success.
-vol volume change audio volume (256=normal)
Per-file main options:
-f fmt force format
-c codec codec name
-codec codec codec name
-pre preset preset name
-map_metadata outfile[,metadata]:infile[,metadata] set metadata information of outfile from infile
-t duration record or transcode "duration" seconds of audio/video
-to time_stop record or transcode stop time
-fs limit_size set the limit file size in bytes
-ss time_off set the start time offset
-sseof time_off set the start time offset relative to EOF
-seek_timestamp enable/disable seeking by timestamp with -ss
-timestamp time set the recording timestamp ('now' to set the current time)
-metadata string=string add metadata
-program title=string:st=number... add program with specified streams
-target type specify target file type ("vcd", "svcd", "dvd", "dv" or "dv50" with optional prefixes "pal-", "ntsc-" or "film-")
-apad audio pad
-frames number set the number of frames to output
-filter filter_graph set stream filtergraph
-filter_script filename read stream filtergraph description from a file
-reinit_filter reinit filtergraph on input parameter changes
-discard discard
-disposition disposition
Video options:
-vframes number set the number of video frames to output
-r rate set frame rate (Hz value, fraction or abbreviation)
-fpsmax rate set max frame rate (Hz value, fraction or abbreviation)
-s size set frame size (WxH or abbreviation)
-aspect aspect set aspect ratio (4:3, 16:9 or 1.3333, 1.7777)
-vn disable video
-vcodec codec force video codec ('copy' to copy stream)
-timecode hh:mm:ss[:;.]ff set initial TimeCode value.
-pass n select the pass number (1 to 3)
-vf filter_graph set video filters
-ab bitrate audio bitrate (please use -b:a)
-b bitrate video bitrate (please use -b:v)
-dn disable data
Audio options:
-aframes number set the number of audio frames to output
-aq quality set audio quality (codec-specific)
-ar rate set audio sampling rate (in Hz)
-ac channels set number of audio channels
-an disable audio
-acodec codec force audio codec ('copy' to copy stream)
-vol volume change audio volume (256=normal)
-af filter_graph set audio filters
Subtitle options:
-s size set frame size (WxH or abbreviation)
-sn disable subtitle
-scodec codec force subtitle codec ('copy' to copy stream)
-stag fourcc/tag force subtitle tag/fourcc
-fix_sub_duration fix subtitles duration
-canvas_size size set canvas size (WxH or abbreviation)
-spre preset set the subtitle options to the indicated preset
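As a small illustration of the per-file options listed above, `-ss` and `-t` can be combined to cut a short audio clip, for instance to try a transcription on a sample before processing a long recording. This is a sketch only; `clip_audio` and `FFMPEG` are hypothetical names, and placing `-ss`/`-t` before `-i` (so that they apply to the input) is one of several possible arrangements.

```shell
#!/bin/sh
# Cut a clip of a given duration starting at a given offset, audio only.
# FFMPEG is a hypothetical override hook (assumption) so the binary can be swapped.
FFMPEG="${FFMPEG:-ffmpeg}"

clip_audio() {
    src="$1"       # input file
    start="$2"     # start offset, e.g. 00:01:30 or 90
    duration="$3"  # clip length, e.g. 60 (seconds)
    dst="$4"       # output file
    # -ss/-t before -i apply to the input; -vn disables video;
    # -n refuses to overwrite an existing output file.
    "$FFMPEG" -n -ss "$start" -t "$duration" -i "$src" -vn "$dst"
}
```

For example, `clip_audio song.m4a 00:01:30 60 sample.wav` would take 60 seconds starting at 1 minute 30 seconds.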
whisper
./whisper --help
usage: whisper [-h] [--model MODEL] [--model_dir MODEL_DIR] [--device DEVICE]
[--output_dir OUTPUT_DIR] [--output_format {txt,vtt,srt,tsv,json,all}]
[--verbose VERBOSE] [--task {transcribe,translate}]
[--language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,yue,zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Cantonese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Mandarin,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba}]
[--temperature TEMPERATURE] [--best_of BEST_OF] [--beam_size BEAM_SIZE]
[--patience PATIENCE] [--length_penalty LENGTH_PENALTY]
[--suppress_tokens SUPPRESS_TOKENS] [--initial_prompt INITIAL_PROMPT]
[--condition_on_previous_text CONDITION_ON_PREVIOUS_TEXT] [--fp16 FP16]
[--temperature_increment_on_fallback TEMPERATURE_INCREMENT_ON_FALLBACK]
[--compression_ratio_threshold COMPRESSION_RATIO_THRESHOLD]
[--logprob_threshold LOGPROB_THRESHOLD]
[--no_speech_threshold NO_SPEECH_THRESHOLD]
[--word_timestamps WORD_TIMESTAMPS]
[--prepend_punctuations PREPEND_PUNCTUATIONS]
[--append_punctuations APPEND_PUNCTUATIONS]
[--highlight_words HIGHLIGHT_WORDS] [--max_line_width MAX_LINE_WIDTH]
[--max_line_count MAX_LINE_COUNT]
[--max_words_per_line MAX_WORDS_PER_LINE] [--threads THREADS]
audio
positional arguments:
audio audio file(s) to transcribe
optional arguments:
-h, --help show this help message and exit
--model MODEL name of the Whisper model to use (default: small)
--model_dir MODEL_DIR
the path to save model files; uses ~/.cache/whisper by default
(default: None)
--device DEVICE device to use for PyTorch inference (default: cpu)
--output_dir OUTPUT_DIR, -o OUTPUT_DIR
directory to save the outputs (default: .)
--output_format {txt,vtt,srt,tsv,json,all}, -f {txt,vtt,srt,tsv,json,all}
format of the output file; if not specified, all available
formats will be produced (default: all)
--verbose VERBOSE whether to print out the progress and debug messages (default:
True)
--task {transcribe,translate}
whether to perform X->X speech recognition ('transcribe') or
X->English translation ('translate') (default: transcribe)
--language {af,am,ar,as,az,ba,be,bg,bn,bo,br,bs,ca,cs,cy,da,de,el,en,es,et,eu,fa,fi,fo,fr,gl,gu,ha,haw,he,hi,hr,ht,hu,hy,id,is,it,ja,jw,ka,kk,km,kn,ko,la,lb,ln,lo,lt,lv,mg,mi,mk,ml,mn,mr,ms,mt,my,ne,nl,nn,no,oc,pa,pl,ps,pt,ro,ru,sa,sd,si,sk,sl,sn,so,sq,sr,su,sv,sw,ta,te,tg,th,tk,tl,tr,tt,uk,ur,uz,vi,yi,yo,yue,zh,Afrikaans,Albanian,Amharic,Arabic,Armenian,Assamese,Azerbaijani,Bashkir,Basque,Belarusian,Bengali,Bosnian,Breton,Bulgarian,Burmese,Cantonese,Castilian,Catalan,Chinese,Croatian,Czech,Danish,Dutch,English,Estonian,Faroese,Finnish,Flemish,French,Galician,Georgian,German,Greek,Gujarati,Haitian,Haitian Creole,Hausa,Hawaiian,Hebrew,Hindi,Hungarian,Icelandic,Indonesian,Italian,Japanese,Javanese,Kannada,Kazakh,Khmer,Korean,Lao,Latin,Latvian,Letzeburgesch,Lingala,Lithuanian,Luxembourgish,Macedonian,Malagasy,Malay,Malayalam,Maltese,Mandarin,Maori,Marathi,Moldavian,Moldovan,Mongolian,Myanmar,Nepali,Norwegian,Nynorsk,Occitan,Panjabi,Pashto,Persian,Polish,Portuguese,Punjabi,Pushto,Romanian,Russian,Sanskrit,Serbian,Shona,Sindhi,Sinhala,Sinhalese,Slovak,Slovenian,Somali,Spanish,Sundanese,Swahili,Swedish,Tagalog,Tajik,Tamil,Tatar,Telugu,Thai,Tibetan,Turkish,Turkmen,Ukrainian,Urdu,Uzbek,Valencian,Vietnamese,Welsh,Yiddish,Yoruba}
language spoken in the audio, specify None to perform language
detection (default: None)
--temperature TEMPERATURE
temperature to use for sampling (default: 0)
--best_of BEST_OF number of candidates when sampling with non-zero temperature
(default: 5)
--beam_size BEAM_SIZE
number of beams in beam search, only applicable when temperature
is zero (default: 5)
--patience PATIENCE optional patience value to use in beam decoding, as in
https://arxiv.org/abs/2204.05424, the default (1.0) is
equivalent to conventional beam search (default: None)
--length_penalty LENGTH_PENALTY
optional token length penalty coefficient (alpha) as in
https://arxiv.org/abs/1609.08144, uses simple length
normalization by default (default: None)
--suppress_tokens SUPPRESS_TOKENS
comma-separated list of token ids to suppress during sampling;
'-1' will suppress most special characters except common
punctuations (default: -1)
--initial_prompt INITIAL_PROMPT
optional text to provide as a prompt for the first window.
(default: None)
--condition_on_previous_text CONDITION_ON_PREVIOUS_TEXT
if True, provide the previous output of the model as a prompt
for the next window; disabling may make the text inconsistent
across windows, but the model becomes less prone to getting
stuck in a failure loop (default: True)
--fp16 FP16 whether to perform inference in fp16; True by default (default:
True)
--temperature_increment_on_fallback TEMPERATURE_INCREMENT_ON_FALLBACK
temperature to increase when falling back when the decoding
fails to meet either of the thresholds below (default: 0.2)
--compression_ratio_threshold COMPRESSION_RATIO_THRESHOLD
if the gzip compression ratio is higher than this value, treat
the decoding as failed (default: 2.4)
--logprob_threshold LOGPROB_THRESHOLD
if the average log probability is lower than this value, treat
the decoding as failed (default: -1.0)
--no_speech_threshold NO_SPEECH_THRESHOLD
if the probability of the <|nospeech|> token is higher than this
value AND the decoding has failed due to `logprob_threshold`,
consider the segment as silence (default: 0.6)
--word_timestamps WORD_TIMESTAMPS
(experimental) extract word-level timestamps and refine the
results based on them (default: False)
--prepend_punctuations PREPEND_PUNCTUATIONS
if word_timestamps is True, merge these punctuation symbols with
the next word (default: "'“¿([{-)
--append_punctuations APPEND_PUNCTUATIONS
if word_timestamps is True, merge these punctuation symbols with
the previous word (default: "'.。,,!!??::”)]}、)
--highlight_words HIGHLIGHT_WORDS
(requires --word_timestamps True) underline each word as it is
spoken in srt and vtt (default: False)
--max_line_width MAX_LINE_WIDTH
(requires --word_timestamps True) the maximum number of
characters in a line before breaking the line (default: None)
--max_line_count MAX_LINE_COUNT
(requires --word_timestamps True) the maximum number of lines in
a segment (default: None)
--max_words_per_line MAX_WORDS_PER_LINE
(requires --word_timestamps True, no effect with
--max_line_width) the maximum number of words in a segment
(default: None)
--threads THREADS number of threads used by torch for CPU inference; supercedes
MKL_NUM_THREADS/OMP_NUM_THREADS (default: 0)
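Putting a few of the options above together, a transcription run that writes only an SRT file might look like the sketch below. The function name `transcribe_to_srt` and the `WHISPER` override variable are hypothetical; the flags themselves are taken from the help output. `--fp16 False` is included on the assumption that inference runs on the CPU (the default device shown above).

```shell
#!/bin/sh
# Transcribe one audio file and write only the SRT output to a directory.
# WHISPER is a hypothetical override hook (assumption), e.g. WHISPER=./whisper.
WHISPER="${WHISPER:-whisper}"

transcribe_to_srt() {
    audio="$1"
    out_dir="${2:-.}"
    # --model small is the default shown in the help; stated here for clarity.
    # --task transcribe keeps the source language (no translation to English).
    # --output_format srt writes a single subtitle file instead of all formats.
    "$WHISPER" "$audio" --model small --task transcribe \
        --output_format srt --output_dir "$out_dir" --fp16 False
}
```

For example, `transcribe_to_srt song.wav ./subs` should leave an `.srt` file in `./subs`, which can then be converted to WebVTT or ASS with the ffmpeg commands at the top of these notes.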