ChatTTS-UI for Mac, the Ultimate Tool for Text-to-Speech Conversion, API Supported!

The download link is: https://www.patreon.com/user/shop/chattts-ui-for-mac-ultimate-tool-for-to-249930?u=122446863&utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=productshare_creator&utm_content=join_link.

Recently, there’s an exciting project called ChatTTS-UI. It’s a simple local web interface that uses ChatTTS to convert text to speech and supports external API calls. The project is open-source and can be found here: https://github.com/jianchang512/ChatTTS-ui

Features of ChatTTS-UI

Compared to the official usage page, ChatTTS-UI has the following features:

Mixed effects of text, numbers, and symbol control characters
Optimization of Chinese text normalization
Added speed control (speed 1-9)
Fixed voice tone
Added compile=true to enable inference optimization
Numbers are converted to corresponding language pronunciations
Added Chinese-English word segmentation function
Improved interface and API
Added top_p and top_k parameter controls
Multi-line text synthesized line by line
Custom voice tone seed value
Option to skip the refine text stage
Online text synthesis via web page
API interface support

Installation Guide

To simplify the installation process, I have packaged the tool into a standalone startup package. Users can run it with a simple click without the need for configuring a complex Python environment. Here are the detailed steps to obtain and install the application:

Download the Application

Note: The app only supports Macs with M1/M2/M3 series chips.

Installation Steps

Download the DMG image file from the link above and drag the app file into the Applications folder.
After copying the app, do not open it from Launchpad the first time. Instead, right-click it in the Applications folder and choose Open.
The software will automatically open the operation interface in the default browser, and you can start using it in the browser.

ChatTTS Usage Tips

Control Symbols Available in Text

You can intersperse control symbols in the original text to be synthesized. Currently, two control symbols are supported: laughter and pause.

[laugh] represents laughter
[uv_break] represents a pause

For example:

`text="Hello [uv_break] friends, I heard today is a good day, isn't it[uv_break]?[laugh]"`

In the actual synthesis, [laugh] will be replaced by laughter, and [uv_break] will introduce a pause. The intensity of laughter and pauses can be controlled by passing prompt in the params_refine_text parameter.

`chat.infer([text], params_refine_text={"prompt":'[oral_2][laugh_0][break_6]'})`
`chat.infer([text], params_refine_text={"prompt":'[oral_2][laugh_2][break_4]'})`
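The prompt string can also be assembled in code rather than typed by hand. Here is a minimal sketch; `build_refine_prompt` is a hypothetical helper and not part of ChatTTS, and it only checks that each level is a non-negative integer, since the valid ranges are not stated here:

```python
def build_refine_prompt(oral: int, laugh: int, pause: int) -> str:
    """Assemble a refine-text prompt such as '[oral_2][laugh_0][break_6]'.

    Hypothetical helper: the upper limit of each level is not checked,
    only that it is a non-negative integer.
    """
    for name, level in (("oral", oral), ("laugh", laugh), ("break", pause)):
        if not isinstance(level, int) or level < 0:
            raise ValueError(f"{name} level must be a non-negative integer, got {level!r}")
    return f"[oral_{oral}][laugh_{laugh}][break_{pause}]"


# Build the prompt used in the first example above.
prompt = build_refine_prompt(oral=2, laugh=0, pause=6)
print(prompt)
```

The resulting string can then be passed as the value of "prompt" in params_refine_text.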

Skipping the Refine Text Stage

The actual synthesis will re-organize (refine) the text with control symbols. If you do not want this, you can set the skip_refine_text parameter to True.

`chat.infer([text], skip_refine_text=True, params_refine_text={"prompt":'[oral_2][laugh_0][break_6]'})`

Fixing the Voice Tone

By default, each synthesis uses a randomly chosen voice tone, so the same text can sound different from one run to the next. To fix the voice, manually set a random seed and then sample a random speaker.

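import torch

# `chat` is an already initialized ChatTTS instance; fixing the seed makes the sampled speaker repeatable
torch.manual_seed(2222)
rand_spk = chat.sample_random_speaker()
chat.infer([text], use_decoder=True, params_infer_code={'spk_emb': rand_spk})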

In testing, seeds 2222, 7869, and 6653 produce male voices, and 3333, 4099, and 5099 produce female voices. You can try other seed values to find more voices.
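To keep track of the seeds you have tried, a small lookup table can help. The sketch below only records the six seeds listed above; `TESTED_SEEDS` and `pick_seed` are hypothetical names for illustration, not part of ChatTTS or ChatTTS-UI:

```python
# Seeds reported in the text above; the male/female labels come from the author's testing.
TESTED_SEEDS = {
    "male": [2222, 7869, 6653],
    "female": [3333, 4099, 5099],
}


def pick_seed(gender: str, index: int = 0) -> int:
    """Return one of the tested seeds for the given gender label.

    The index wraps around, so any non-negative index is accepted.
    """
    seeds = TESTED_SEEDS.get(gender)
    if seeds is None:
        raise ValueError(f"unknown gender label: {gender!r}")
    return seeds[index % len(seeds)]


seed = pick_seed("female")
print(seed)
```

The returned value is what you would pass to torch.manual_seed before sampling a speaker.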

API Request Tutorial

Request Method: POST
Request URL: http://127.0.0.1:9966/tts

Request Parameters:

text: str | Required, the text to be synthesized
voice: int | Optional, default 2222, the number that determines the voice tone. Choose one of 2222 | 7869 | 6653 | 4099 | 5099, or pass any other value to use a random voice tone
prompt: str | Optional, default empty, set laughter, pause, for example [oral_2][laugh_0][break_6]
temperature: float | Optional, default 0.3
top_p: float | Optional, default 0.7
top_k: int | Optional, default 20
skip_refine: int | Optional, default 0, 1=skip refine text, 0=do not skip
custom_voice: int | Optional, default 0, the seed value for a custom voice tone. It must be an integer greater than 0; if set, it takes precedence and voice is ignored
is_split: int | Optional, default 0, 1=convert numbers to text for correct pronunciation, 0=keep unchanged
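The parameters above can be collected into a request body with their documented defaults. This is a minimal sketch: `build_tts_payload` is a hypothetical helper, and its client-side checks only mirror the descriptions in the list above; how the server itself validates input is not covered here.

```python
def build_tts_payload(text, voice=2222, prompt="", temperature=0.3, top_p=0.7,
                      top_k=20, skip_refine=0, custom_voice=0, is_split=0):
    """Build the form data for POST /tts using the defaults listed above.

    Hypothetical helper; the checks below are client-side assumptions
    derived from the parameter descriptions.
    """
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text is required and must not be empty")
    if skip_refine not in (0, 1) or is_split not in (0, 1):
        raise ValueError("skip_refine and is_split must be 0 or 1")
    if custom_voice < 0:
        raise ValueError("custom_voice must be 0 (unset) or a positive integer")
    return {
        "text": text,
        "voice": voice,
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "skip_refine": skip_refine,
        "custom_voice": custom_voice,  # when > 0, takes precedence over voice
        "is_split": is_split,
    }


payload = build_tts_payload("Hello [uv_break] friends", voice=3333, is_split=1)
print(payload)
```

The returned dictionary can be passed as the data argument of a POST request to the URL above.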

Response: JSON data

Successful response:

`{code:0, msg:ok, audio_files:[dict1, dict2]}`

Where audio_files is an array of dictionaries, each element dict is {filename: absolute path of the wav file, url: downloadable wav URL}

Error response:

`{code:1, msg:reason for error}`
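Since both responses share the code and msg fields, a caller can branch on code before reading audio_files. Here is a minimal sketch; `extract_audio_urls` is a hypothetical helper, and the sample dictionary (including the a.wav file name) is made up for illustration:

```python
from typing import List


def extract_audio_urls(response: dict) -> List[str]:
    """Return the downloadable wav URLs from a parsed /tts response.

    Raises RuntimeError when the response carries a non-zero code.
    """
    if response.get("code") != 0:
        raise RuntimeError(f"TTS request failed: {response.get('msg')}")
    return [item["url"] for item in response.get("audio_files", [])]


# Made-up sample shaped like the successful response described above.
sample = {
    "code": 0,
    "msg": "ok",
    "audio_files": [
        {"filename": "/tmp/a.wav", "url": "http://127.0.0.1:9966/static/wavs/a.wav"},
    ],
}
print(extract_audio_urls(sample))
```

With the requests library, the input would typically be the result of res.json().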

API Call Code

import requests

res = requests.post('http://127.0.0.1:9966/tts', data={
  "text": "If you don't understand, no need to fill it out",
  "prompt": "",
  "voice": "3333",
  "temperature": 0.3,
  "top_p": 0.7,
  "top_k": 20,
  "skip_refine": 0,
  "custom_voice": 0,
  "is_split": 1
})
print(res.json())

Successful call result:

{code:0, msg:'ok', audio_files:[{filename: E:/python/chattts/static/wavs/20240601-22_12_12-c7456293f7b5e4dfd3ff83bbd884a23e.wav, url: http://127.0.0.1:9966/static/wavs/20240601-22_12_12-c7456293f7b5e4dfd3ff83bbd884a23e.wav}]}

Error result:

`{code:1, msg:"error"}`
