ChatTTS-UI for Mac, the Ultimate Tool for Text-to-Speech Conversion, API Supported!
The download link is: https://www.patreon.com/user/shop/chattts-ui-for-mac-ultimate-tool-for-to-249930?u=122446863&utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=productshare_creator&utm_content=join_link.
Recently, there’s an exciting project called ChatTTS-UI. It’s a simple local web interface that uses ChatTTS to convert text to speech and supports external API calls. The project is open-source and can be found here: https://github.com/jianchang512/ChatTTS-ui
Features of ChatTTS-UI
Compared to the official usage page, ChatTTS-UI has the following features:
- Mixed effects of text, numbers, and symbol control characters
- Optimization of Chinese text normalization
- Added speed control (speed 1-9)
- Fixed voice tone
- Added compile=true to enable inference optimization
- Numbers are converted to corresponding language pronunciations
- Added Chinese-English word segmentation function
- Improved interface and API
- Added top_p and top_k parameter controls
- Multi-line text synthesized line by line
- Custom voice tone seed value
- Option to skip the refine text stage
- Online text synthesis via web page
- API interface support
Installation Guide
To simplify the installation process, I have packaged the tool into a standalone startup package. Users can run it with a simple click without the need for configuring a complex Python environment. Here are the detailed steps to obtain and install the application:
Download the Application
Note: Only Macs with M1, M2, or M3 series chips are supported.
Installation Steps
Download the DMG image file from the link above and drag the app file into the Applications folder.
After copying and completing the installation, do not open the app from the Launchpad the first time. Instead, right-click it in the Applications folder and choose Open.
The software will automatically open the operation interface in the default browser, and you can start using it in the browser.
ChatTTS Usage Tips
Control Symbols Available in Text
You can intersperse control symbols in the original text to be synthesized. Currently, two control symbols are supported: laughter and pause.
- [laugh] represents laughter
- [uv_break] represents a pause
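For example, in the text "Hello [uv_break] and welcome [laugh]", a pause follows "Hello" and the sentence ends with laughter.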
In the actual synthesis, [laugh] will be replaced by laughter, and [uv_break] will introduce a pause. The intensity of laughter and pauses can be controlled by passing prompt in the params_refine_text parameter.
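As a minimal sketch of how such a prompt string could be assembled, the helper below builds values like [oral_2][laugh_0][break_6]. The function name build_refine_prompt and the level ranges (oral 0-9, laugh 0-2, break 0-7) are assumptions made for illustration and are not stated in this article, so check them against the ChatTTS version you use.

```python
def build_refine_prompt(oral=None, laugh=None, brk=None):
    """Build a prompt string such as '[oral_2][laugh_0][break_6]'."""
    # Assumed upper bounds for each control level; adjust if your version differs.
    limits = {"oral": 9, "laugh": 2, "break": 7}
    levels = {"oral": oral, "laugh": laugh, "break": brk}
    parts = []
    for name, level in levels.items():
        if level is None:
            continue  # leave this control out of the prompt
        if not 0 <= level <= limits[name]:
            raise ValueError(f"{name} level must be in 0..{limits[name]}, got {level}")
        parts.append(f"[{name}_{level}]")
    return "".join(parts)


# The dictionary that would be passed as params_refine_text
params_refine_text = {"prompt": build_refine_prompt(oral=2, laugh=0, brk=6)}
print(params_refine_text)
```

Leaving an argument as None omits that control from the prompt, so only the controls you set appear in the string.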
Skipping the Refine Text Stage
The actual synthesis will re-organize (refine) the text with control symbols. If you do not want this, you can set the skip_refine_text parameter to True.
Fixing the Voice Tone
By default, each synthesis picks a random voice tone, so the voice changes from one run to the next. To keep the voice fixed, manually set a random seed before sampling the random speaker.
Through testing, 2222, 7869, and 6653 are male voices, and 3333, 4099, and 5099 are female voices. You can test different seed numbers to find more voices.
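The reason a seed pins down the voice can be illustrated with a small stand-in. The sketch below does not call ChatTTS. It uses Python's standard random module in place of the real speaker sampling, and sample_speaker and its dim parameter are hypothetical names chosen for this illustration.

```python
import random


def sample_speaker(seed, dim=8):
    """Stand-in for speaker sampling: draw a Gaussian vector from a seeded generator."""
    rng = random.Random(seed)  # fixing the seed fixes every draw that follows
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


voice_a = sample_speaker(2222)
voice_b = sample_speaker(2222)
voice_c = sample_speaker(3333)

print(voice_a == voice_b)  # same seed, so the two vectors are compared for equality
print(voice_a == voice_c)  # different seeds, compared the same way
```

The same seed always reproduces the same vector, which is why reusing a seed number such as 2222 keeps returning the same voice.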
API Request Tutorial
Request Method: POST
Request URL: http://127.0.0.1:9966/tts
Request Parameters:
- text: str | Required, the text to be synthesized
- voice: int | Optional, default 2222, the number determining the voice tone, 2222 | 7869 | 6653 | 4099 | 5099, you can choose one or pass any value to use a random voice tone
- prompt: str | Optional, default empty, set laughter, pause, for example [oral_2][laugh_0][break_6]
- temperature: float | Optional, default 0.3
- top_p: float | Optional, default 0.7
- top_k: int | Optional, default 20
- skip_refine: int | Optional, default 0, 1=skip refine text, 0=do not skip
- custom_voice: int | Optional, default 0, the seed value for a custom voice tone; must be an integer greater than 0. If set, it takes precedence and voice is ignored
- is_split: int | Optional, default 0, 1=convert numbers to text for correct pronunciation, 0=keep unchanged
Response: JSON data
Successful response:
audio_files is an array of dictionaries; each element has the form {filename: absolute path of the wav file, url: downloadable wav URL}.
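A response shaped like this could be handled roughly as follows. This is a sketch under assumptions: the helper name extract_audio_urls, the msg field used for the error text, and the sample path and URL are all hypothetical and should be checked against what your installation actually returns.

```python
def extract_audio_urls(response):
    """Return the downloadable URLs listed in a /tts response dictionary."""
    files = response.get("audio_files")
    if not files:
        # 'msg' is an assumed field name for the error description
        raise RuntimeError(response.get("msg", "no audio_files in response"))
    return [item["url"] for item in files]


# Hypothetical response, shaped like the description above
sample = {
    "audio_files": [
        {
            "filename": "/tmp/example.wav",
            "url": "http://127.0.0.1:9966/static/wavs/example.wav",
        }
    ]
}
print(extract_audio_urls(sample))
```

Raising an exception when audio_files is missing keeps error handling in one place instead of checking every response by hand.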
Error response:
API Call Code
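One way to call the endpoint from Python, using only the standard library, is sketched below. The parameter names and defaults come from the list above. The helper names build_payload and synthesize are hypothetical, and sending the parameters as form-encoded data is an assumption about what the server expects.

```python
import json
import urllib.parse
import urllib.request

TTS_URL = "http://127.0.0.1:9966/tts"


def build_payload(text, voice=2222, prompt="", temperature=0.3, top_p=0.7,
                  top_k=20, skip_refine=0, custom_voice=0, is_split=0):
    """Collect the request parameters, using the defaults listed above."""
    if not text:
        raise ValueError("text is required")
    return {
        "text": text,
        "voice": voice,
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "skip_refine": skip_refine,
        "custom_voice": custom_voice,
        "is_split": is_split,
    }


def synthesize(text, **options):
    """POST the parameters as form data (an assumption) and return the decoded JSON."""
    body = urllib.parse.urlencode(build_payload(text, **options)).encode("utf-8")
    request = urllib.request.Request(TTS_URL, data=body, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Inspect the payload without contacting the server
print(build_payload("Hello [uv_break] world", voice=3333))
```

With ChatTTS-UI running locally, calling synthesize("Hello [uv_break] world", voice=3333) sends the request and returns the parsed JSON. Without the application running, the call fails with a connection error.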
Successful call result:
Error result: