ChatTTS-UI for Mac, the Ultimate Tool for Text-to-Speech Conversion, API Supported!

The download link is: https://www.patreon.com/user/shop/chattts-ui-for-mac-ultimate-tool-for-to-249930?u=122446863&utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=productshare_creator&utm_content=join_link.

Recently, there’s an exciting project called ChatTTS-UI. It’s a simple local web interface that uses ChatTTS to convert text to speech and supports external API calls. The project is open-source and can be found here: https://github.com/jianchang512/ChatTTS-ui

Features of ChatTTS-UI

Compared to the official usage page, ChatTTS-UI has the following features:

  • Mixed text, numbers, and control symbols in a single input
  • Optimization of Chinese text normalization
  • Added speed control (speed 1-9)
  • Fixed (repeatable) voice tone
  • Added compile=true to enable inference optimization
  • Numbers are converted to corresponding language pronunciations
  • Added Chinese-English word segmentation function
  • Improved interface and API
  • Added top_p and top_k parameter controls
  • Multi-line text synthesized line by line
  • Custom voice tone seed value
  • Option to skip the refine text stage
  • Online text synthesis via web page
  • API interface support

Installation Guide

To simplify installation, I have packaged the tool into a standalone package. It runs with a single click, with no need to configure a complex Python environment. The steps below explain how to obtain and install the application:

Download the Application

Note: Only Macs with M1/M2/M3 series chips are supported.

Installation Steps

  1. Download the DMG image file from the link above and drag the app file into the Applications folder.

  2. After the copy completes, do not launch the app from Launchpad the first time. Instead, right-click the app in the Applications folder and choose Open.

  3. The app automatically opens its web interface in your default browser, where you can start using it.

ChatTTS Usage Tips

Control Symbols Available in Text

You can intersperse control symbols in the original text to be synthesized. Currently, two control symbols are supported: laughter and pause.

  • [laugh] represents laughter
  • [uv_break] represents a pause

For example:

text="Hello [uv_break] friends, I heard today is a good day, isn't it[uv_break]?[laugh]"

In the actual synthesis, [laugh] will be replaced by laughter, and [uv_break] will introduce a pause. The intensity of laughter and pauses can be controlled by passing prompt in the params_refine_text parameter.

chat.infer([text], params_refine_text={"prompt":'[oral_2][laugh_0][break_6]'})
chat.infer([text], params_refine_text={"prompt":'[oral_2][laugh_2][break_4]'})
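
The prompt string in the calls above follows a regular pattern, so it can be assembled from three integer levels. The helper below is a minimal sketch; build_refine_prompt is a hypothetical name, not part of ChatTTS, and it does not check the upper bound of each level because the text above does not state one.

```python
def build_refine_prompt(oral, laugh, pause):
    """Build a prompt such as '[oral_2][laugh_0][break_6]'.

    Hypothetical helper (not a ChatTTS function). It only checks that each
    level is a non-negative integer; valid upper bounds are not verified.
    """
    for name, level in (("oral", oral), ("laugh", laugh), ("break", pause)):
        if not isinstance(level, int) or level < 0:
            raise ValueError(f"{name} level must be a non-negative integer")
    return f"[oral_{oral}][laugh_{laugh}][break_{pause}]"


# Reproduces the first prompt used above.
prompt = build_refine_prompt(oral=2, laugh=0, pause=6)
print(prompt)
```

The result can then be passed as params_refine_text={"prompt": prompt}.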

Skipping the Refine Text Stage

The actual synthesis will re-organize (refine) the text with control symbols. If you do not want this, you can set the skip_refine_text parameter to True.

chat.infer([text], skip_refine_text=True, params_refine_text={"prompt":'[oral_2][laugh_0][break_6]'})

Fixing the Voice Tone

By default, each synthesis picks a random voice tone, so the voice changes from one run to the next. To keep the voice consistent, set a random seed manually and then sample a random speaker.

import torch

torch.manual_seed(2222)  # fix the seed so the sampled speaker is repeatable
rand_spk = chat.sample_random_speaker()
chat.infer([text], use_decoder=True, params_infer_code={'spk_emb': rand_spk})

In testing, seeds 2222, 7869, and 6653 produce male voices, and seeds 3333, 4099, and 5099 produce female voices. You can try other seed values to find more voices.

API Request Tutorial

Request Method: POST
Request URL: http://127.0.0.1:9966/tts

Request Parameters:

  • text: str | Required, the text to be synthesized
  • voice: int | Optional, default 2222. The number that determines the voice tone. Choose one of 2222 | 7869 | 6653 | 4099 | 5099, or pass any other value to use a random voice tone
  • prompt: str | Optional, default empty. Controls laughter and pauses, for example [oral_2][laugh_0][break_6]
  • temperature: float | Optional, default 0.3
  • top_p: float | Optional, default 0.7
  • top_k: int | Optional, default 20
  • skip_refine: int | Optional, default 0, 1=skip refine text, 0=do not skip
  • custom_voice: int | Optional, default 0. The seed value for a custom voice tone; it must be an integer greater than 0. If set, it takes precedence and voice is ignored
  • is_split: int | Optional, default 0, 1=convert numbers to text for correct pronunciation, 0=keep unchanged
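
The parameter list above can be captured as a small payload builder that fills in the documented defaults. This is a sketch only: TTS_DEFAULTS and build_tts_payload are hypothetical names, and the validation shown is an assumption about what is worth checking on the client side, not server behavior.

```python
# Defaults copied from the parameter list above.
TTS_DEFAULTS = {
    "prompt": "",
    "voice": 2222,
    "temperature": 0.3,
    "top_p": 0.7,
    "top_k": 20,
    "skip_refine": 0,
    "custom_voice": 0,
    "is_split": 0,
}


def build_tts_payload(text, **overrides):
    """Return a form-data dict for POST /tts (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    unknown = set(overrides) - set(TTS_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    payload = {"text": text, **TTS_DEFAULTS, **overrides}
    # custom_voice is either 0 (unset) or a positive integer seed.
    if payload["custom_voice"] < 0:
        raise ValueError("custom_voice must be 0 or a positive integer")
    return payload


payload = build_tts_payload("Hello [uv_break] friends", voice=3333, is_split=1)
print(payload)
```

The returned dict can be passed as the data argument of requests.post('http://127.0.0.1:9966/tts', data=payload).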

Response: JSON data

Successful response:

{"code": 0, "msg": "ok", "audio_files": [dict1, dict2]}

Here audio_files is an array of dictionaries. Each element has the form {filename: absolute path of the wav file, url: downloadable wav URL}.

Error response:

{"code": 1, "msg": "reason for error"}
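
Given the two response shapes described above, a caller will usually check code before touching audio_files. The function below is a minimal sketch of that check; extract_audio_urls is a hypothetical name, and it assumes the response has already been decoded into a Python dict (for example with res.json()).

```python
def extract_audio_urls(result):
    """Return the wav URLs from a decoded /tts response.

    Hypothetical helper: raises RuntimeError with the server's msg
    when code is not 0.
    """
    if result.get("code") != 0:
        raise RuntimeError(f"TTS request failed: {result.get('msg', 'unknown error')}")
    return [item["url"] for item in result.get("audio_files", [])]


# Example with a hand-written response dict (illustrative values only).
sample = {
    "code": 0,
    "msg": "ok",
    "audio_files": [
        {"filename": "/tmp/example.wav",
         "url": "http://127.0.0.1:9966/static/wavs/example.wav"},
    ],
}
print(extract_audio_urls(sample))
```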

API Call Code

import requests

# Send the text and synthesis options as form data to the local ChatTTS-UI server.
res = requests.post('http://127.0.0.1:9966/tts', data={
    "text": "If you don't understand, no need to fill it out",
    "prompt": "",
    "voice": "3333",
    "temperature": 0.3,
    "top_p": 0.7,
    "top_k": 20,
    "skip_refine": 0,
    "custom_voice": 0,
    "is_split": 1
})
print(res.json())

Successful call result:

{"code": 0, "msg": "ok", "audio_files": [{"filename": "E:/python/chattts/static/wavs/20240601-22_12_12-c7456293f7b5e4dfd3ff83bbd884a23e.wav", "url": "http://127.0.0.1:9966/static/wavs/20240601-22_12_12-c7456293f7b5e4dfd3ff83bbd884a23e.wav"}]}

Error result:

{"code": 1, "msg": "error"}
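
Since a successful result carries a downloadable url for each wav file, the audio can be saved locally with the standard library alone. This is a sketch under assumptions: save_audio_files is a hypothetical helper, and it expects the audio_files array from a successful response, where every entry has a url key.

```python
import os
import urllib.request
from urllib.parse import urlparse


def save_audio_files(audio_files, dest_dir="."):
    """Download every entry of an audio_files array into dest_dir.

    Hypothetical helper: each entry is expected to be a dict with a "url" key.
    Returns the list of local paths that were written.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved_paths = []
    for entry in audio_files:
        url = entry["url"]
        # Reuse the server-side file name (last path segment of the URL).
        file_name = os.path.basename(urlparse(url).path)
        target = os.path.join(dest_dir, file_name)
        urllib.request.urlretrieve(url, target)
        saved_paths.append(target)
    return saved_paths
```

With the call code above, this could be used as save_audio_files(res.json()["audio_files"], "output") once code is confirmed to be 0.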