
Onnxruntime

ONNX Runtime loader. This class is a process-level singleton: only one instance exists per process.

Methods

| Method | Description |
|---|---|
| `static loadOnce(string $filename = ''): self` | Loads and initializes the ONNX Runtime. On subsequent calls the argument is ignored and the existing instance is returned. |
| `static get(): ?self` | Returns the existing instance, or `null` if not yet initialized. |
| `supportedDevices(): string` | Returns available device information as a JSON string. |
| `static libVersionedFilename(): string` | Returns the versioned filename of the ONNX Runtime library (e.g., `libvoicevox_onnxruntime.1.17.3.dylib`). |
| `static libUnversionedFilename(): string` | Returns the unversioned filename of the ONNX Runtime library. |

Constants

| Constant | Description |
|---|---|
| `LIB_NAME` | Library base name (`voicevox_onnxruntime`) |
| `LIB_VERSION` | Recommended ONNX Runtime version |
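A minimal sketch of loading the runtime and inspecting `supportedDevices()`. It assumes that `$filename` accepts a path to the library file (the `$libDir` layout is hypothetical) and that the devices JSON is a flat object of booleans such as `{"cpu": true, "cuda": false, "dml": false}`, as in the VOICEVOX Core C API; check the actual output of your build. Class names are unqualified as in this reference, and `hasGpuBackend()` / `chooseAccelerationMode()` are hypothetical helpers.

```php
<?php
// Sketch only: assumes the extension classes are available unqualified.
// hasGpuBackend() and chooseAccelerationMode() are hypothetical helpers.

/**
 * Returns true if a supportedDevices() JSON string reports any non-CPU backend.
 * Assumption: the JSON is a flat object of booleans, e.g.
 * {"cpu": true, "cuda": false, "dml": false}.
 */
function hasGpuBackend(string $supportedDevicesJson): bool
{
    $devices = json_decode($supportedDevicesJson, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($devices)) {
        return false;
    }
    foreach ($devices as $name => $available) {
        if ($name !== 'cpu' && $available === true) {
            return true;
        }
    }
    return false;
}

/** Loads ONNX Runtime (once per process) from $libDir and picks an acceleration mode. */
function chooseAccelerationMode(string $libDir): AccelerationMode
{
    // Assumption: $filename may be a path. Later calls ignore it and return the same instance.
    $ort = Onnxruntime::loadOnce($libDir . '/' . Onnxruntime::libVersionedFilename());
    return hasGpuBackend($ort->supportedDevices())
        ? AccelerationMode::Gpu
        : AccelerationMode::Cpu;
}
```

Passing `AccelerationMode::Auto` to the `Synthesizer` leaves this decision to the library; an explicit check like this is mainly useful when you want to log or override the choice.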

OpenJtalk

Text analyzer using OpenJTalk.

Methods

| Method | Description |
|---|---|
| `__construct(string $openJtalkDictDir)` | Initializes with the path to the OpenJTalk dictionary directory. |
| `analyze(string $text): string` | Analyzes Japanese text and returns an accent phrase array as a JSON string. |
| `useUserDict(UserDict $userDict): void` | Applies a user dictionary. Call again if the dictionary is updated. |
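The sketch below turns the JSON returned by `analyze()` into a plain katakana reading. It assumes the accent phrase schema used by VOICEVOX (`moras` entries with a `text` field, plus an optional `pause_mora`); `moraText()` and `readingOf()` are hypothetical helpers, and the dictionary path is a placeholder.

```php
<?php
// Sketch only: assumes the extension classes are available unqualified.
// Assumed schema: [{"moras": [{"text": "コ", ...}, ...], "pause_mora": null|{"text": "、", ...}, ...}]

/** Concatenates the text of every mora (and pause mora) in an accent phrase array. */
function moraText(string $accentPhrasesJson): string
{
    $phrases = json_decode($accentPhrasesJson, true, 512, JSON_THROW_ON_ERROR);
    $out = '';
    foreach ($phrases as $phrase) {
        foreach ($phrase['moras'] as $mora) {
            $out .= $mora['text'];
        }
        // A pause mora, if present, follows the phrase (for example at a comma).
        if (isset($phrase['pause_mora'])) {
            $out .= $phrase['pause_mora']['text'];
        }
    }
    return $out;
}

/** Returns the reading OpenJTalk assigns to $text. */
function readingOf(OpenJtalk $jtalk, string $text): string
{
    return moraText($jtalk->analyze($text));
}

// Usage (requires the extension; the dictionary path is a placeholder):
// $jtalk = new OpenJtalk('/path/to/open_jtalk_dic');
// echo readingOf($jtalk, 'こんにちは');
```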

VoiceModelFile

Voice model file (.vvm file).

Methods

| Method | Description |
|---|---|
| `static open(string $path): self` | Opens a `.vvm` file. |
| `id(): string` | Returns the 16-byte voice model ID as a hex string. |
| `createMetasJson(): string` | Returns speaker metadata as a JSON string. |
| `close(): void` | Closes the file and releases resources. |
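A sketch of inspecting a `.vvm` file and always closing it, even when reading fails. The metadata schema (speakers with `name` and `styles`, each style with `name` and `id`) is assumed from VOICEVOX speaker metadata; `listStyles()` and `inspectVvm()` are hypothetical helpers.

```php
<?php
// Sketch only: assumes the extension classes are available unqualified.
// Assumed metas schema: [{"name": "...", "styles": [{"name": "...", "id": 0, ...}], ...}]

/**
 * Flattens speaker metadata into [speaker name, style name, style id] rows.
 *
 * @return list<array{0: string, 1: string, 2: int}>
 */
function listStyles(string $metasJson): array
{
    $rows = [];
    foreach (json_decode($metasJson, true, 512, JSON_THROW_ON_ERROR) as $speaker) {
        foreach ($speaker['styles'] as $style) {
            $rows[] = [$speaker['name'], $style['name'], (int) $style['id']];
        }
    }
    return $rows;
}

/** Reads a model's ID and styles, closing the file afterwards. */
function inspectVvm(string $path): array
{
    $model = VoiceModelFile::open($path);
    try {
        return ['id' => $model->id(), 'styles' => listStyles($model->createMetasJson())];
    } finally {
        $model->close();
    }
}
```

The style IDs collected this way are the `$styleId` values expected by the `Synthesizer` methods.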

Synthesizer

Main class for text-to-speech synthesis.

Methods

| Method | Description |
|---|---|
| `__construct(Onnxruntime $onnxruntime, OpenJtalk $openJtalk, AccelerationMode $accelerationMode = AccelerationMode::Auto, int $cpuNumThreads = 0)` | Initializes the synthesizer. |
| `onnxruntime(): Onnxruntime` | Returns the `Onnxruntime` instance held by this synthesizer. |
| `isGpuMode(): bool` | Returns whether GPU mode is enabled. |
| `metas(): string` | Returns metadata for all loaded speakers as a JSON string. |
| `loadVoiceModel(VoiceModelFile $model): void` | Loads a voice model. |
| `unloadVoiceModel(string $voiceModelId): void` | Unloads the voice model with the given hex ID. |
| `isLoadedVoiceModel(string $voiceModelId): bool` | Returns whether the voice model with the given hex ID is loaded. |
| `createAudioQuery(string $text, int $styleId): string` | Generates an AudioQuery JSON from Japanese text. |
| `createAudioQueryFromKana(string $kana, int $styleId): string` | Generates an AudioQuery JSON from AquesTalk-style kana notation. |
| `createAccentPhrases(string $text, int $styleId): string` | Generates an accent phrase array JSON from Japanese text. |
| `createAccentPhrasesFromKana(string $kana, int $styleId): string` | Generates an accent phrase array JSON from AquesTalk-style kana notation. |
| `replaceMoraData(string $accentPhrasesJson, int $styleId): string` | Returns the accent phrases with mora pitch and phoneme lengths regenerated for `$styleId`. |
| `replacePhonemeLength(string $accentPhrasesJson, int $styleId): string` | Returns the accent phrases with phoneme lengths regenerated for `$styleId`. |
| `replaceMoraPitch(string $accentPhrasesJson, int $styleId): string` | Returns the accent phrases with mora pitch regenerated for `$styleId`. |
| `synthesis(string $audioQueryJson, int $styleId, bool $enableInterrogativeUpspeak = true): string` | Synthesizes speech from an AudioQuery JSON. Returns WAV binary data. |
| `tts(string $text, int $styleId, bool $enableInterrogativeUpspeak = true): string` | Synthesizes speech from Japanese text in one step. Returns WAV binary data. |
| `ttsFromKana(string $kana, int $styleId, bool $enableInterrogativeUpspeak = true): string` | Synthesizes speech from AquesTalk-style kana notation. Returns WAV binary data. |
| `createSingFrameAudioQuery(string $scoreJson, int $styleId): string` | Generates a FrameAudioQuery JSON (the singing synthesis query) from a Score JSON. |
| `frameSynthesis(string $frameAudioQueryJson, int $styleId): string` | Synthesizes singing audio from a FrameAudioQuery JSON. Returns WAV binary data. |
| `createSingFrameF0(string $scoreJson, string $frameAudioQueryJson, int $styleId): string` | Generates per-frame F0 (fundamental frequency) values as a JSON array of floats. |
| `createSingFrameVolume(string $scoreJson, string $frameAudioQueryJson, int $styleId): string` | Generates per-frame volume values as a JSON array of floats. |
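`tts()` covers the common case in one call; the two-step path through `createAudioQuery()` and `synthesis()` is useful when you want to adjust the query in between. The sketch below raises the speaking speed before synthesizing. It assumes the AudioQuery field is named `speedScale` (as in the VOICEVOX engine) and that a model stays loaded after its `VoiceModelFile` is closed, as in VOICEVOX Core's own examples. The helper names, the 1.2 factor, and all paths are hypothetical.

```php
<?php
// Sketch only: assumes the extension classes are available unqualified.

/** Returns a copy of an AudioQuery JSON with "speedScale" replaced (field name assumed). */
function withSpeedScale(string $audioQueryJson, float $speedScale): string
{
    // Decode to objects (not arrays) so empty JSON objects survive the round trip.
    $query = json_decode($audioQueryJson, false, 512, JSON_THROW_ON_ERROR);
    $query->speedScale = $speedScale;
    return json_encode($query, JSON_THROW_ON_ERROR | JSON_PRESERVE_ZERO_FRACTION | JSON_UNESCAPED_UNICODE);
}

/** True if the bytes begin with a RIFF/WAVE header. */
function looksLikeWav(string $bytes): bool
{
    return strlen($bytes) >= 12
        && substr($bytes, 0, 4) === 'RIFF'
        && substr($bytes, 8, 4) === 'WAVE';
}

/** Loads the model if needed, synthesizes $text at 1.2x speed, and writes a WAV file. */
function speakToFile(Synthesizer $synth, string $vvmPath, int $styleId, string $text, string $outPath): void
{
    $model = VoiceModelFile::open($vvmPath);
    try {
        if (!$synth->isLoadedVoiceModel($model->id())) {
            $synth->loadVoiceModel($model);
        }
    } finally {
        // Assumption: the loaded model does not depend on the open file.
        $model->close();
    }

    $query = withSpeedScale($synth->createAudioQuery($text, $styleId), 1.2);
    $wav = $synth->synthesis($query, $styleId);
    if (!looksLikeWav($wav)) {
        throw new RuntimeException('synthesis() did not return WAV data');
    }
    file_put_contents($outPath, $wav);
}

// Usage (dictionary path, model path and style ID are placeholders):
// $synth = new Synthesizer(Onnxruntime::loadOnce(), new OpenJtalk('/path/to/open_jtalk_dic'));
// speakToFile($synth, '/path/to/model.vvm', 0, 'こんにちは', 'hello.wav');
```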

VoicevoxCore

Global utility functions for VOICEVOX Core.

Methods

| Method | Description |
|---|---|
| `static getVersion(): string` | Returns the VOICEVOX Core version as a SemVer string. |
| `static audioQueryCreateFromAccentPhrases(string $accentPhrasesJson): string` | Generates an AudioQuery JSON from an accent phrase array JSON. |
| `static audioQueryValidate(string $audioQueryJson): void` | Validates an AudioQuery JSON. Throws `VoicevoxException` if invalid. |
| `static accentPhraseValidate(string $accentPhraseJson): void` | Validates an AccentPhrase JSON. Throws `VoicevoxException` if invalid. |
| `static moraValidate(string $moraJson): void` | Validates a Mora JSON. Throws `VoicevoxException` if invalid. |
| `static scoreValidate(string $scoreJson): void` | Validates a Score JSON. Throws `VoicevoxException` if invalid. |
| `static noteValidate(string $noteJson): void` | Validates a Note JSON. Throws `VoicevoxException` if invalid. |
| `static frameAudioQueryValidate(string $frameAudioQueryJson): void` | Validates a FrameAudioQuery JSON. Throws `VoicevoxException` if invalid. |
| `static framePhonemeValidate(string $framePhonemeJson): void` | Validates a FramePhoneme JSON. Throws `VoicevoxException` if invalid. |
| `static ensureCompatible(string $scoreJson, string $frameAudioQueryJson): void` | Checks that a score and frame audio query are compatible. Throws `VoicevoxException` if not. |
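The validators pair naturally with the singing methods of `Synthesizer`: build a Score JSON, validate it, generate the frame query, and confirm the two are compatible before synthesis. The Score schema below (`notes` with `key`, `frame_length` and `lyric`, where a `null` key and empty lyric mark a rest) is an assumption based on VOICEVOX's singing API, which is exactly why the sketch routes it through `scoreValidate()`. `buildScoreJson()` and `sing()` are hypothetical helpers; the two style IDs are separate parameters because VOICEVOX models may use different styles for query generation and for singing.

```php
<?php
// Sketch only: assumes the extension classes are available unqualified.
// Assumed Score schema: {"notes": [{"key": 60|null, "frame_length": 45, "lyric": "ド"}, ...]}

/** Builds a Score JSON from [key, frameLength, lyric] triples; a null key marks a rest. */
function buildScoreJson(array $notes): string
{
    $out = [];
    foreach ($notes as [$key, $frameLength, $lyric]) {
        $out[] = ['key' => $key, 'frame_length' => $frameLength, 'lyric' => $lyric];
    }
    return json_encode(['notes' => $out], JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE);
}

/** Validates a score, derives its frame audio query, and returns the sung WAV data. */
function sing(Synthesizer $synth, string $scoreJson, int $queryStyleId, int $singStyleId): string
{
    VoicevoxCore::scoreValidate($scoreJson); // throws VoicevoxException on a malformed score
    $frameQuery = $synth->createSingFrameAudioQuery($scoreJson, $queryStyleId);
    VoicevoxCore::ensureCompatible($scoreJson, $frameQuery);
    return $synth->frameSynthesis($frameQuery, $singStyleId);
}

// Usage (style IDs are placeholders):
// $wav = sing($synth, buildScoreJson([[null, 15, ''], [60, 45, 'ド'], [62, 45, 'レ'], [null, 15, '']]), 0, 0);
```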

UserDict

User dictionary for registering custom word pronunciations.

Methods

| Method | Description |
|---|---|
| `__construct()` | Creates an empty user dictionary. |
| `load(string $path): void` | Loads a user dictionary from a file. |
| `save(string $path): void` | Saves the user dictionary to a file. |
| `addWord(string $surface, string $pronunciation, int $accentType, UserDictWordType $wordType = UserDictWordType::CommonNoun, int $priority = 5): string` | Adds a word and returns its UUID as a hex string. |
| `updateWord(string $wordUuid, string $surface, string $pronunciation, int $accentType, UserDictWordType $wordType = UserDictWordType::CommonNoun, int $priority = 5): void` | Updates the existing word identified by `$wordUuid`. |
| `removeWord(string $wordUuid): void` | Removes the word identified by `$wordUuid`. |
| `importDict(UserDict $other): void` | Imports words from another `UserDict`. |
| `toJson(): string` | Returns all words as a JSON string. |
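A sketch of registering a word and making it take effect. It assumes that pronunciations must be written in katakana, as in VOICEVOX's user dictionary, and checks this up front so the error is clearer; the library itself may report the problem through `VoicevoxException`. `isKatakana()` and `registerWord()` are hypothetical helpers, and the save path is a placeholder.

```php
<?php
// Sketch only: assumes the extension classes are available unqualified.

/** True if $s consists only of katakana plus the long-vowel mark (ー, which is not in the Katakana script). */
function isKatakana(string $s): bool
{
    return preg_match('/^[\p{Katakana}ー]+$/u', $s) === 1;
}

/** Adds a proper noun, saves the dictionary, and re-applies it to the analyzer. Returns the word UUID. */
function registerWord(UserDict $dict, OpenJtalk $jtalk, string $surface, string $pronunciation, int $accentType, string $savePath): string
{
    if (!isKatakana($pronunciation)) {
        throw new InvalidArgumentException("pronunciation must be katakana: {$pronunciation}");
    }
    $uuid = $dict->addWord($surface, $pronunciation, $accentType, UserDictWordType::ProperNoun);
    $dict->save($savePath);
    // useUserDict() must be called again after every change to the dictionary.
    $jtalk->useUserDict($dict);
    return $uuid;
}
```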

Enum: AccelerationMode

Hardware acceleration mode for the synthesizer.
| Case | Value | Description |
|---|---|---|
| `Auto` | 0 | Automatically selects the best available mode. |
| `Cpu` | 1 | Forces CPU mode. |
| `Gpu` | 2 | Forces GPU mode. |
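When the mode comes from a config file or command-line flag, the string can be mapped to a case by name. The sketch uses `cases()`, which every PHP 8.1 enum provides, so it does not depend on whether `AccelerationMode` is backed by the values in the table above; it works the same way for `UserDictWordType`. `enumCaseByName()` is a hypothetical helper.

```php
<?php
// Sketch only: enumCaseByName() is a hypothetical helper; it works for any PHP 8.1+ enum.

/** Returns the case of $enumClass whose name matches $name case-insensitively. */
function enumCaseByName(string $enumClass, string $name): UnitEnum
{
    foreach ($enumClass::cases() as $case) {
        if (strcasecmp($case->name, $name) === 0) {
            return $case;
        }
    }
    throw new InvalidArgumentException("unknown {$enumClass} case: {$name}");
}

// Usage (assumes the extension classes are available unqualified):
// $mode = enumCaseByName(AccelerationMode::class, $argv[1] ?? 'auto');
// $synth = new Synthesizer(Onnxruntime::loadOnce(), new OpenJtalk('/path/to/open_jtalk_dic'), $mode);
```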

Enum: UserDictWordType

Part-of-speech for user dictionary entries.
| Case | Value | Description |
|---|---|---|
| `ProperNoun` | 0 | Proper noun |
| `CommonNoun` | 1 | Common noun |
| `Verb` | 2 | Verb |
| `Adjective` | 3 | Adjective |
| `Suffix` | 4 | Suffix |

VoicevoxException

Thrown when a VOICEVOX Core C API call returns an error code. The exception message contains the error description from the library.
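A sketch of turning validation failures into messages instead of uncaught exceptions. JSON syntax is checked in PHP first (raising `JsonException`), and semantic checks are delegated to `VoicevoxCore::audioQueryValidate()`, whose `VoicevoxException` message carries the library's description. `audioQueryProblem()` is a hypothetical helper, and the class names are assumed to be available unqualified.

```php
<?php
// Sketch only: assumes the extension classes are available unqualified.

/** Returns null if the AudioQuery JSON is acceptable, otherwise a description of the problem. */
function audioQueryProblem(string $audioQueryJson): ?string
{
    try {
        json_decode($audioQueryJson, false, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        return 'malformed JSON: ' . $e->getMessage();
    }
    try {
        VoicevoxCore::audioQueryValidate($audioQueryJson);
    } catch (VoicevoxException $e) {
        // The message contains the error description from VOICEVOX Core.
        return 'invalid AudioQuery: ' . $e->getMessage();
    }
    return null;
}
```

Checking syntax separately keeps PHP-side parse errors distinguishable from errors reported by the library.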
Last modified on May 19, 2026