Bryce's Piper TTS Voices

About

My goal is to create several voice models for use with the Piper text-to-speech software. I want more voices to be available without restrictive licenses. Many widely available datasets, and thus the voices created from them, carry "for research and learning purposes only"-style licenses.

Voice Models

Voice	Notes	Sample
LJSpeech (medium)	US English female voice. Single speaker. Trained from scratch for 1000 epochs on medium quality settings using the LJ Speech dataset. I re-encoded the recordings to a sample rate of 22050 Hz so they would match other voices released for Piper TTS. License: public domain. Downloads: .onnx file \| config json file \| .ckpt file (for training)
LJSpeech (high)	US English female voice. Single speaker. Trained from scratch for 2000 epochs on high quality settings using the LJ Speech dataset. I re-encoded the recordings to a sample rate of 22050 Hz so they would match other voices released for Piper TTS. License: public domain. Downloads: .onnx file \| config json file \| .ckpt file (for training)
Jenny (Dioco)	UK English female voice (Irish). Single speaker. Trained from scratch for 287 epochs on medium quality settings using the ~30 hour Jenny (Dioco) dataset. I re-encoded the recordings to a sample rate of 22050 Hz so they would match other voices released for Piper TTS. License: attribution required, similar to CC BY. See the link for license details. I impose no further license or restrictions. Downloads: .onnx file \| config json file
Clean 100	US English voice. Multi-speaker, with 244 speakers. Trained from scratch on medium quality settings for 1000 epochs. This comes from the "Clean 100" subset of the LibriTTS-R dataset, a sound-improved version of the LibriTTS dataset, which is itself derived from the original materials of the LibriSpeech corpus. I removed the files the dataset authors list as "failed speech restoration." The '100' refers to the 100 hours of recordings in the original dataset. The sample is just the first speaker in the set. A zip containing a sample for each speaker in the set can be downloaded here. License: CC BY 4.0. This is the license of the original dataset. I impose no further license or restrictions. Downloads: .onnx file \| config json file
Cori (high)	UK English female voice. Single speaker. Trained from scratch on high quality settings for 500 epochs. I put together the dataset, which ended up with about 24 hours of recordings. All recordings came from LibriVox.org. License: public domain. Downloads: .onnx file \| config json file \| .ckpt file (for training)
Cori (medium)	UK English female voice. Single speaker. Trained from scratch on medium quality settings for 640 epochs. I put together the dataset, which ended up with about 24 hours of recordings. All recordings came from LibriVox.org. License: public domain. Downloads: .onnx file \| config json file \| .ckpt file (for training)
Kristin	US English female voice. Single speaker. Trained from scratch on medium quality settings for 2000 epochs. I put together the dataset, which ended up with about 11.5 hours of recordings. All recordings came from LibriVox.org. License: public domain. Downloads: .onnx file \| config json file \| .ckpt file (for training)
John	US English male voice. Single speaker. Fine-tuned from Kristin (above) on medium quality settings for an additional 600 epochs. I put together the dataset, which ended up with about 12.5 hours of recordings. All recordings came from LibriVox.org. License: public domain. Downloads: .onnx file \| config json file \| .ckpt file (for training)
Bryce	US English male voice. Single speaker. Fine-tuned for an additional 1000 epochs from an unreleased voice that had been trained for 2500 epochs. This is my own voice; I only recorded about 750 samples. License: public domain. Downloads: .onnx file \| config json file \| .ckpt file (for training)
Norman	US English male voice. Single speaker. Trained from scratch on medium quality settings for 1200 epochs. I put together the dataset, which ended up with about 15.5 hours of recordings. All recordings came from LibriVox.org. I forgot to save the .ckpt file on this one, sorry. License: public domain. Downloads: .onnx file \| config json file
ManyVoice	A mix of 12 US and 4 UK English voices. 16 total speakers. Trained from scratch on medium quality settings for 400 epochs. I put together the dataset, which ended up with (I think) about 8 hours of recordings per voice, with a couple of voices having only 4. All recordings were public domain and came from LibriVox.org. I used various tools to improve the sound quality of the recordings. To my ear, the speakers that I've also trained as solo voices sound the same there as they do in this multi-voice model. The sample is just one speaker in the set. A zip containing a sample for each speaker in the set can be downloaded here. License: public domain. Downloads: .onnx file \| config json file
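
Several of the notes above mention re-encoding recordings to 22050 Hz before training. The notes do not say which tool was used, so the following is only a minimal sketch that assumes ffmpeg is installed and on the PATH. The function names and file names are hypothetical.

```python
import subprocess
import wave

TARGET_RATE = 22050  # sample rate in Hz, as mentioned in the voice notes


def needs_resampling(wav_path, target_rate=TARGET_RATE):
    """Return True if the WAV file's sample rate differs from the target."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getframerate() != target_rate


def ffmpeg_resample_command(src, dst, target_rate=TARGET_RATE):
    """Build an ffmpeg command that re-encodes src to the target sample rate.

    -y overwrites dst if it exists, -ar sets the output audio sample rate.
    """
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(target_rate), str(dst)]


def resample(src, dst, target_rate=TARGET_RATE):
    """Run ffmpeg (assumed to be installed) to write a resampled copy of src."""
    subprocess.run(ffmpeg_resample_command(src, dst, target_rate), check=True)
```

Calling `resample("chapter01.wav", "chapter01_22050.wav")` would write a 22050 Hz copy of a hypothetical source file, and `needs_resampling` can be used first to skip files that are already at the target rate.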

Note: Feel free to use these for any legal and ethical purpose. If anybody wants to upload these to Hugging Face or somewhere similar, you have my blessing.
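
To try one of the voices, the downloaded .onnx file and its config json file can be passed to the Piper command-line program. The sketch below assumes the `piper` executable is installed and on the PATH. The file names are hypothetical placeholders for whatever you downloaded, and the helper functions are illustrative only.

```python
import subprocess


def piper_command(model_path, output_path, config_path=None, speaker=None):
    """Build a Piper command line for one voice.

    speaker is a numeric speaker ID, only meaningful for multi-speaker
    voices such as Clean 100 or ManyVoice.
    """
    cmd = ["piper", "--model", str(model_path), "--output_file", str(output_path)]
    if config_path is not None:
        cmd += ["--config", str(config_path)]
    if speaker is not None:
        cmd += ["--speaker", str(speaker)]
    return cmd


def synthesize(text, model_path, output_path, config_path=None, speaker=None):
    """Send text to Piper on standard input and write a WAV file."""
    cmd = piper_command(model_path, output_path, config_path, speaker)
    subprocess.run(cmd, input=text.encode("utf-8"), check=True)
```

For example, `synthesize("Hello from Piper.", "kristin.onnx", "hello.wav", config_path="kristin.onnx.json")` would write `hello.wav` using hypothetical local copies of the Kristin files. For a multi-speaker voice, add something like `speaker=3` to pick one of the speakers.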