• OpenAI has built a voice cloning tool called Voice Engine that can generate synthetic speech matching any voice from a 15-second sample.
• Voice Engine powers the voice capabilities in ChatGPT and OpenAI's text-to-speech API, and has been used by Spotify for podcast dubbing.
• The model was trained on a mix of licensed and publicly available speech data, though OpenAI has not disclosed further details.
• Voice Engine generates speech on the fly rather than building a custom model for each voice, which keeps pricing low, at around $1 per hour of audio.
• It lacks controls to adjust characteristics like pitch and tone, though it aims to mimic the expressiveness of the sample voice.
• The tool could commoditize voice acting work, though OpenAI is exploring actor compensation models.
• Voice cloning carries risks like harassment, fraud, and election interference via deepfakes.
• OpenAI is limiting initial Voice Engine access and use cases while exploring mitigations like watermarking.
• Planned safeguards include requiring users to read randomized text aloud to prove consent.
• OpenAI is reluctant to commit to a general release until safety issues from the pilot are understood.