Basic speech synthesis

The following talk.php demonstrates how to synthesize speech from text and write it to a WAV file.
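PHP's FFI extension is usually unavailable in web server environments such as FPM, because the default ffi.enable=preload setting permits FFI only in CLI scripts and preloaded files. Run this script from the CLI with php talk.php.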
<?php

require __DIR__ . '/vendor/autoload.php';

use Revolution\Voicevox\Core\Enums\AccelerationMode;
use Revolution\Voicevox\Core\Onnxruntime;
use Revolution\Voicevox\Core\OpenJtalk;
use Revolution\Voicevox\Core\Synthesizer;
use Revolution\Voicevox\Core\VoiceModelFile;

// Adjust the path to match your voicevox_core installation
$voicevoxCoreDir = getenv('HOME') . '/.local/voicevox_core';
$onnxruntimeFilename = $voicevoxCoreDir . '/onnxruntime/lib/' . Onnxruntime::libVersionedFilename();
$dictDir = $voicevoxCoreDir . '/dict/open_jtalk_dic_utf_8-1.11';
$vvmPath  = $voicevoxCoreDir . '/models/vvms/0.vvm';

// Text and style ID to synthesize
$text    = 'この音声は、ボイスボックスを使用して、出力されています。';
$styleId = 0;
$outPath = './output.wav';

// Initialize
$onnxruntime = Onnxruntime::loadOnce($onnxruntimeFilename);
$openJtalk   = new OpenJtalk($dictDir);
$synthesizer = new Synthesizer($onnxruntime, $openJtalk, AccelerationMode::Auto);

// Load the voice model
$model = VoiceModelFile::open($vvmPath);
$synthesizer->loadVoiceModel($model);

// Synthesize speech
$audioQuery = $synthesizer->createAudioQuery($text, $styleId);
$wav        = $synthesizer->synthesis($audioQuery, $styleId);

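// Write the WAV bytes to disk and stop if the write fails
if (file_put_contents($outPath, $wav) === false) {
    fwrite(STDERR, 'Failed to write ' . $outPath . PHP_EOL);
    exit(1);
}
echo 'Wrote ' . $outPath . PHP_EOL;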
Run it:
php talk.php
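
Because FFI is often unavailable outside the CLI, it can help to check for it before loading the library. The following is a minimal sketch using only built-in PHP functions; ffiUnavailableReason() is a hypothetical helper, not part of the library, and it treats "cli" as the only SAPI allowed under ffi.enable=preload, which is a simplification.

```php
<?php

// Hypothetical helper (not part of the library): returns a message explaining
// why FFI cannot be used, or null when it looks usable.
function ffiUnavailableReason(bool $extensionLoaded, string $ffiEnable, string $sapi): ?string
{
    if (!$extensionLoaded) {
        return 'The ffi extension is not loaded.';
    }

    $setting = strtolower(trim($ffiEnable));

    // Boolean-true values enable FFI for every SAPI.
    if (in_array($setting, ['1', 'true', 'on', 'yes'], true)) {
        return null;
    }

    // "preload" (the default) permits FFI in CLI scripts and preloaded files.
    // Assumption: preloaded files are ignored here for simplicity.
    if ($setting === 'preload') {
        return $sapi === 'cli'
            ? null
            : "ffi.enable=preload does not permit FFI under the {$sapi} SAPI.";
    }

    return "ffi.enable={$ffiEnable} disables FFI.";
}

// Report the state of the current process.
$reason = ffiUnavailableReason(extension_loaded('ffi'), (string) ini_get('ffi.enable'), PHP_SAPI);
echo ($reason ?? 'FFI looks usable.') . PHP_EOL;
```

In talk.php, a non-null result could be treated as fatal, for example by printing the message and calling exit(1) before any library class is used.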

One-step synthesis with tts

If you want to skip the two-step createAudioQuery + synthesis flow, use the tts() method:
$wav = $synthesizer->tts($text, $styleId);
file_put_contents('./output.wav', $wav);
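
To confirm that the written bytes form a valid WAV file, the RIFF header can be read with plain PHP. This is a sketch, not a library feature: describeWav() is a hypothetical helper that walks the RIFF chunks and reports the fields of the standard 16-byte PCM "fmt " chunk.

```php
<?php

// Hypothetical helper (not part of the library): extracts basic format
// fields from RIFF/WAVE bytes.
function describeWav(string $wav): array
{
    $length = strlen($wav);
    if ($length < 12 || substr($wav, 0, 4) !== 'RIFF' || substr($wav, 8, 4) !== 'WAVE') {
        throw new InvalidArgumentException('Not a RIFF/WAVE byte string.');
    }

    $info = [];
    $offset = 12;

    // Each chunk is a 4-byte ID, a 4-byte little-endian size, then the payload.
    while ($offset + 8 <= $length) {
        $id   = substr($wav, $offset, 4);
        $size = unpack('V', substr($wav, $offset + 4, 4))[1];

        if ($id === 'fmt ' && $size >= 16 && $offset + 24 <= $length) {
            $info += unpack(
                'vformat/vchannels/VsampleRate/VbyteRate/vblockAlign/vbitsPerSample',
                substr($wav, $offset + 8, 16)
            );
        } elseif ($id === 'data') {
            $info['dataBytes'] = $size;
        }

        // Chunk payloads are padded to an even number of bytes.
        $offset += 8 + $size + ($size % 2);
    }

    return $info;
}

// Inspect the file from the examples above if it exists.
if (is_file('./output.wav')) {
    print_r(describeWav(file_get_contents('./output.wav')));
}
```

Dividing dataBytes by byteRate gives the approximate duration in seconds.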

Synthesis from kana notation

You can also synthesize from AquesTalk-style kana notation:
// In the kana string, ' marks the accent position; / and 、 separate accent phrases
$wav = $synthesizer->ttsFromKana("コノオンセイワ'、ボイスボックスオ'/シヨーシテ'、シュツリョクサレテイマ'ス。", $styleId);
file_put_contents('./output.wav', $wav);
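
A malformed kana string is easier to catch before it reaches the synthesizer. The sketch below assumes the VOICEVOX convention that accent phrases are separated by / or 、 and that each phrase carries exactly one accent mark ('). phrasesWithBadAccent() is a hypothetical helper and checks only that one rule, not the full notation.

```php
<?php

// Hypothetical helper (not part of the library): returns the accent phrases
// that do not contain exactly one accent mark (').
function phrasesWithBadAccent(string $kana): array
{
    // Assumption: "/" and "、" are the only accent phrase separators.
    $phrases = preg_split('~[/、]~u', $kana, -1, PREG_SPLIT_NO_EMPTY) ?: [];

    return array_values(array_filter(
        $phrases,
        fn (string $phrase): bool => substr_count($phrase, "'") !== 1
    ));
}

// The second phrase below has no accent mark, so it is reported.
print_r(phrasesWithBadAccent("コノオンセイワ'/テスト"));
```

An empty result means every phrase passed this check; it does not guarantee that ttsFromKana will accept the string.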

Using a user dictionary

Use UserDict to register custom word pronunciations:
use Revolution\Voicevox\Core\UserDict;
use Revolution\Voicevox\Core\Enums\UserDictWordType;

$userDict = new UserDict();
$uuid = $userDict->addWord(
    surface: 'ボイボ',
    pronunciation: 'ボイボ',
    accentType: 0,
    wordType: UserDictWordType::ProperNoun,
    priority: 10,
);

// Save the dictionary
$userDict->save('./mydict.json');

// Apply the dictionary to OpenJTalk
$openJtalk->useUserDict($userDict);

Adjusting AudioQuery

You can fine-tune the intonation and tempo by modifying the JSON returned by createAudioQuery before synthesis:
$audioQueryJson = $synthesizer->createAudioQuery($text, $styleId);

// Decode and adjust the JSON
$audioQuery = json_decode($audioQueryJson, true);
$audioQuery['speedScale'] = 1.2;   // Speed up by 1.2x
$audioQuery['pitchScale'] = 0.05;  // Raise pitch slightly

$wav = $synthesizer->synthesis(json_encode($audioQuery), $styleId);
file_put_contents('./output.wav', $wav);
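
When several fields are adjusted, a small wrapper around the decode, modify, and encode steps can catch misspelled keys early. This is a sketch under the assumption that the AudioQuery JSON is an object with top-level fields such as speedScale and pitchScale, as in the example above; adjustAudioQuery() is a hypothetical helper, not part of the library.

```php
<?php

// Hypothetical helper (not part of the library): overrides top-level
// AudioQuery fields and rejects keys that the query does not contain.
function adjustAudioQuery(string $audioQueryJson, array $overrides): string
{
    $query = json_decode($audioQueryJson, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($query)) {
        throw new InvalidArgumentException('AudioQuery JSON must be an object.');
    }

    foreach ($overrides as $key => $value) {
        // A key missing from the query is most likely a typo.
        if (!array_key_exists($key, $query)) {
            throw new InvalidArgumentException("Unknown AudioQuery field: {$key}");
        }
        $query[$key] = $value;
    }

    // Keep Japanese text readable instead of \uXXXX escapes.
    return json_encode($query, JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE);
}

// Stand-in JSON for illustration; a real query has more fields.
echo adjustAudioQuery('{"speedScale":1.0,"pitchScale":0.0}', ['speedScale' => 1.2]) . PHP_EOL;
```

The returned string can be passed to synthesis() in place of json_encode($audioQuery).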

Retrieving accent phrases

You can retrieve and inspect accent phrase information for a text:
$accentPhrasesJson = $synthesizer->createAccentPhrases($text, $styleId);
$accentPhrases = json_decode($accentPhrasesJson, true);

// Recompute mora pitch and phoneme length for the given style ID
// (pass a different style ID here to apply another style's prosody)
$updatedJson = $synthesizer->replaceMoraData($accentPhrasesJson, $styleId);
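
The decoded structure can be summarized for inspection. The sketch below assumes the layout used by VOICEVOX accent phrases, where each phrase has a moras list whose entries carry a text field, plus an accent position; verify these key names against your actual output. summarizeAccentPhrases() is a hypothetical helper.

```php
<?php

// Hypothetical helper (not part of the library): returns one line per
// accent phrase, such as "コンニチワ (accent: 5)".
function summarizeAccentPhrases(string $accentPhrasesJson): array
{
    $phrases = json_decode($accentPhrasesJson, true, 512, JSON_THROW_ON_ERROR);

    $lines = [];
    foreach ($phrases as $phrase) {
        // Assumed keys: "moras" (each with "text") and "accent".
        $text = implode('', array_column($phrase['moras'] ?? [], 'text'));
        $lines[] = sprintf('%s (accent: %s)', $text, $phrase['accent'] ?? '?');
    }

    return $lines;
}

// Stand-in JSON for illustration; real output contains more fields per mora.
$sample = '[{"moras":[{"text":"テ"},{"text":"ス"},{"text":"ト"}],"accent":1}]';
echo implode(PHP_EOL, summarizeAccentPhrases($sample)) . PHP_EOL;
```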

GPU mode

In environments where a GPU is available, specify AccelerationMode::Gpu for faster synthesis:
use Revolution\Voicevox\Core\Enums\AccelerationMode;

$synthesizer = new Synthesizer($onnxruntime, $openJtalk, AccelerationMode::Gpu);

// Check whether GPU mode is active
if ($synthesizer->isGpuMode()) {
    echo 'Running in GPU mode' . PHP_EOL;
}

Using multiple voice models

Load multiple .vvm files and switch between style IDs to use different character voices:
$model0 = VoiceModelFile::open($voicevoxCoreDir . '/models/vvms/0.vvm');
$model1 = VoiceModelFile::open($voicevoxCoreDir . '/models/vvms/1.vvm');

$synthesizer->loadVoiceModel($model0);
$synthesizer->loadVoiceModel($model1);

// Inspect speaker metadata included in the models
$metas = json_decode($synthesizer->metas(), true);

// Unload a model when it is no longer needed
$synthesizer->unloadVoiceModel($model0->id());
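
To find which style IDs the loaded models provide, the metadata can be flattened into a lookup table. The sketch below assumes the VOICEVOX metadata layout, a list of speakers that each have a name and a styles list with id and name fields; check the key names against the JSON that metas() returns. listStyles() is a hypothetical helper.

```php
<?php

// Hypothetical helper (not part of the library): maps each style ID to
// a "speaker (style)" label.
function listStyles(string $metasJson): array
{
    $styles = [];
    foreach (json_decode($metasJson, true, 512, JSON_THROW_ON_ERROR) as $speaker) {
        // Assumed keys: "name" and "styles" on speakers, "id" and "name" on styles.
        foreach ($speaker['styles'] ?? [] as $style) {
            $styles[$style['id']] = sprintf('%s (%s)', $speaker['name'] ?? '?', $style['name'] ?? '?');
        }
    }
    ksort($styles);

    return $styles;
}

// Stand-in JSON with placeholder names for illustration.
$sample = '[{"name":"Speaker A","styles":[{"id":2,"name":"Soft"},{"id":0,"name":"Normal"}]}]';
print_r(listStyles($sample));
```

Any key of the returned array can be used as $styleId, provided the model that contains it is still loaded.
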
Last modified on May 17, 2026