Speech Cloning: Text-To-Speech Using VITS

Journal Title: Engineering and Technology Journal - Year 2024, Vol 9, Issue 05

Abstract

Voice is one of the most common and natural ways for humans to communicate. It is becoming the primary interface for AI voice assistants such as Amazon Alexa, as well as for cars, smart home devices, and similar products. As human-machine communication becomes more common, researchers are exploring technology that mimics genuine speech. Speech cloning is the practice of copying or mimicking another person's speech, usually with the help of modern technology and artificial intelligence (AI). It entails producing a synthetic, cloned version of someone's voice that sounds very similar to the actual speaker. The objective is to produce speech that is indistinguishable from that of the real speaker in both tone and intonation. Instant Voice Cloning (IVC) in text-to-speech (TTS) synthesis refers to a TTS model's capacity to copy the voice of any reference speaker from a short audio sample, without requiring additional speaker-specific training. This approach is also referred to as zero-shot TTS. IVC gives users the flexibility to tailor the generated voice, which is valuable across diverse real-world applications. Examples include media content creation, personalized chatbots, and multi-modal interaction between humans and computers or large language models.

Authors and Affiliations

Utkarsh Verma, Dr. Padmanaban R

  • EP ID EP735380
  • DOI 10.47191/etj/v9i05.10

How To Cite

Utkarsh Verma, Dr. Padmanaban R (2024). Speech Cloning: Text-To-Speech Using VITS. Engineering and Technology Journal, 9(05), -. https://europub.co.uk/articles/-A-735380