Voice-change, Lip-sync, Text-to-speech, Music/Audio tools for projects
mindsong
Posts: 1,701
A place to collect thoughts, tips, and tools on voice-changing, lip-sync, text-to-speech, and audio processing for our projects (esp. animations).
Add your own tips, links, resources, etc. below, or send them to me and we'll maintain a knowledgebase here...
Because of the overlap between tools and workflows, I'd say that anything you know of in this domain is useful, be it Carrara-specific, or using any one of the standard Carrara plugins (Poser, DS, Iclone, etc. :)
Standalone utils and scripts are always relevant as well. Anything related to making a 3D mouth move, or generating the sound that goes with it is welcomed.
--ms
Comments
Voice Changing tools:
These are standalone tools that work in batch or realtime to alter incoming sound streams (usually voice) from one frequency range and/or timbre to another - e.g. male to female, or child to adult, etc.
Most also have settings that can be used to produce cartoony or robotic voices as well. In my experience, most of the outputs end up sounding a bit synthetic, but if you are willing to 'bend' your own voice on the microphone toward the target sound (e.g. a male trying to sound more female), and use each application's sound adjustments with restraint, some pretty compelling outputs can be produced, and presets can be saved for these settings. The results have no copyright constraints (assuming the inputs aren't copyrighted...). Once a preset is saved, some of these tools allow for batch conversion, allowing for consistent pre-recording and conversion of full animation voice sequences, using multiple characters.
The inputs and outputs can generally be standard soundfiles (WAVE, MP3, AAC, etc.), that can come from microphones, sound-files, audio-streams, etc., then be processed (in many ways) and used with lip-sync tools for our 3D efforts, and also inserted into the final video-edits.
Any products/tools mentioned below are NOT endorsed, but simply available, and I have no affiliation with any of these products or companies other than possibly being an owner/user. YMMV
Commercial Voice Changing Software
Product: Screaming Bee's Morphvox Series
Source: https://screamingbee.com
Platform(s): Windows
Notes: Free and Payfor versions for both realtime and batch voice conversion. Some good multi-voice and script-writing utilities as well. Presets available for realistic and cartoon/fantasy voices.
Product: Audio4Fun Voice Changer Series
Source: https://www.audio4fun.com/voice-changer.htm
Platform(s): Windows
Notes: Various versions for realtime and batch conversion. Presets available for realistic and cartoon/fantasy voices.
Text-to-speech (TTS) and speech-to-text:
These are tools that attempt to convert text to audio, and audio to text.
In our 3D domain (esp. animation), text-to-speech is probably the most relevant, as sound would typically be the most useful end-product. That said, any tool that lets us rework our data in ways that let creative folks work toward their/our target goals will enable creative spirits. At any rate, all tools and techniques are welcomed and encouraged.
Almost all mainstream computer environments have basic text-to-speech capabilities built in - usually as a tool to support users with disabilities, etc. Similarly, speech-to-text is also available in the form of Apple's 'SIRI' and Microsoft's 'Cortana'.
As inexpensive computing capacity becomes available, these tools are becoming increasingly sophisticated: they're quickly becoming more sensitive to linguistic and idiomatic differences, but this also adds to the complexity of using them.
As we return (technologically) to our story-telling roots, these tools will become more prolific, capable, and interesting to us in our creative endeavors.
Text-to-speech tools:
Microsoft Windows 'Text-to-speech' (built-in, with extensions):
Speech-to-text tools:
Apple's SIRI - native to current macOS/iOS devices
Microsoft's Cortana - native to Windows devices
Nuance: Dragon Dictate Series: https://www.nuance.com/dragon.html
IBM's speech-to-text: https://www.ibm.com/watson/services/speech-to-text/
Google's text-to-speech: https://cloud.google.com/text-to-speech/ - from interesting thread here
related (google) from REIVAX
DNA Software (almost all Japanese) free TTS application: http://dnasoft.web.fc2.com/soft/texttowav/index.html (from the same discussion thread above)
Lip-Sync Resources:
mcjaudiomation by MCasual: (free, but donate!) https://sites.google.com/site/mcasualsdazscripts2/mcjaudiomation DAZ/Poser Animation controlled by sound file contents. This little gem creates Poser-style PZ2 animation streams (tied to any figure sliders you like), based on the ongoing energy levels in sound files. While the examples in the documentation map sounds to VU meters, lights, speaker movement, etc., it can also drive a cartoon mouth or emotion sliders with elegance. Windows w/ DS scripts.
DAZ Inc. sound-to-motion mapping tools for lip-sync:
These work with any figures that have an available '*.DMC' viseme/slider mapping file. Most DAZ figures have some form of DMC file available, and many non-DAZ figures have some that are available on sites like sharecg.com.
Mimic Pro for Carrara: microphone input to figure viseme (defined mouth shape) motions.
Mimic Live: (DAZ Studio, but can be exported to Carrara) microphone input to figure viseme (defined mouth shape) motions. Windows?
Mimic Lite: No longer available 'lite' version of the standalone Mimic Pro utility (also no longer available? toolfarm.com?) for Poser/DAZ figures Windows
Mimic Pro: No longer available standalone sound to viseme mapping tool. Exports PZ2 files for conversion/import to other tools. Last known to be available at www.toolfarm.com. Windows
DAZ Studio 4.x 32bit - 'lip-sync' (built-in plugin) - only found in the 32-bit versions of DAZ Studio (a Carrara Plugin :), this plugin leverages the early DAZ lip-sync tool libraries to enable sound-to-viseme mapping in DAZ figures that have so-called DMC mapping files available. Results can be exported as PZ2 Pose presets, or duf files for importing into Carrara.
Papagayo lip-sync tool - http://www.lostmarble.com/papagayo/ and Python version: https://morevnaproject.org/papagayo-ng/ - don't know much about this one, but it's been around for a long time and might be useful in your workflow. Outputs to Moho (2D animation tool), and Blender. MacOS and Windows. Update: It looks like a DS script has been written to import Papagayo outputs to DS, available at sharecg: Papagayo to DS Importer. Forum thread with instructions: https://www.daz3d.com/forums/discussion/336526/alternative-audio-based-lipsinc-for-daz-studio
Relevant Links (forums/discussions/tutorials - anything that can eventually be used in Carrara):
https://www.daz3d.com/forums/discussion/336526/alternative-audio-based-lipsinc-for-daz-studio
which mentions:
https://www.sharecg.com/v/88621/view/8/Script/Script-For-Importer-Files-Lip-Sync-in-DAZ-Studio
Other Audio Tools (music, MIDI, sound editors, DAWs, video-sound, etc.):
Sound Editors: (there are zillions of these, but a few stand out for popularity/price/etc.)
Audacity - Free/Open Source sound editor: https://www.audacityteam.org/
Mature full-featured sound recorder, editor, and conversion tool.
Windows, MacOS, and Linux
Magix Music Maker (and other DAWs): https://www.magix.com/us/music/ - Free base software with lo-rez loops, payfor add-on instruments and hi-rez sound loop collections. Kind of like the DAZ Studio SW/content model, but for music. Note that there are both personal-use and commercial-use license limitations on these sound loops, with prices to match...! Works like AniBlocks or NLA blocks but with sounds (MIDI and sound samples).
IK Multimedia's series, especially 'SampleTank' : https://www.ikmultimedia.com/ - Full range of sample/MIDI composition tools with beginner->pro versions and sound sample collections for sale. I believe these samples are all assumed to be used as professional commercial outputs. (anyone know otherwise?)
Music Notation and lyrics to audio/sound files:
Myriad Software - Musical Notation to audio tools: https://www.myriad-online.com/en/products/virtualsinger.htm
Hello mindsong
One fun thing in VBScript: this reads the computer's current time aloud. Save this text as xxxx.vbs
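' Speak the current time aloud with the built-in Windows SAPI voice
Dim texte, lecture
Set lecture = CreateObject("SAPI.SpVoice")
' "Il est" is French for "It is"
texte = "Il est " & Time()
lecture.Speak texte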
then click on it.
PS: sorry, this is in French.
And now one for speech-to-text. You can try it, it's free:
https://www.ibm.com/watson/services/speech-to-text/
And one virtual singer. The creators are French, but the software comes in many languages and runs on Windows (32/64-bit) and Mac:
https://www.myriad-online.com/en/products/virtualsinger.htm
@REIVAX: Virtualsinger - the voice of Stephen Hawking singing "Strangers in the Night" - that's really funny!
I've never heard of virtual singer - but it seems like a lot of fun! The example of Strangers in the night sounds like Stephen Hawking!
I have used virtual singer in Myriad Melody Assistant for probably 10 years
but not as frequently now, as I tend to do music without lyrics
Here's a fun one to try...
MUSIC: g(4)¦c(8)b(8)f(8)g(8)a(4)f(4)¦e(4)f(4)c(4)e(4)¦f(8)e(8)d(4)a(4)g(8)a(8)¦e(2+4)
WORDS: I ¦have to do a lit-tle ¦house-work ba-by, ¦when I feel an-gry at ¦you.
¦ - barline
(4) crotchet, quarter note
(8) quaver, eighth note
(2+4) dotted minim, 3 beats
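For anyone who wants to check a line like that before pasting it anywhere, here is a rough Python sketch that reads the MUSIC line using only the legend above. It assumes a crotchet counts as one beat (which is what "(2+4) = 3 beats" implies), and the names `beats` and `parse` are made up for this example; it is not how Virtual Singer itself parses anything.

```python
import re
from fractions import Fraction

# The MUSIC line from the post above, unchanged.
MUSIC = ("g(4)¦c(8)b(8)f(8)g(8)a(4)f(4)¦e(4)f(4)c(4)e(4)"
         "¦f(8)e(8)d(4)a(4)g(8)a(8)¦e(2+4)")

# A note is a letter a-g followed by a duration code in parentheses.
NOTE = re.compile(r"([a-g])\(([\d+]+)\)")


def beats(code):
    """Convert a duration code to beats, assuming a crotchet is one beat.

    "4" -> 1 beat, "8" -> 1/2 beat, "2+4" -> 2 + 1 = 3 beats.
    """
    return sum(Fraction(4, int(part)) for part in code.split("+"))


def parse(music):
    """Split on barlines and return a list of bars of (note, beats) pairs."""
    return [[(name, beats(code)) for name, code in NOTE.findall(bar)]
            for bar in music.split("¦")]


bars = parse(MUSIC)
bar_lengths = [sum(length for _, length in bar) for bar in bars]

for number, (bar, total) in enumerate(zip(bars, bar_lengths), start=1):
    notes = " ".join(f"{name}:{length}" for name, length in bar)
    print(f"bar {number}: {notes}  (total {total} beats)")
```

The first bar is a one-beat pickup and the last bar holds three beats, so together they make up one full bar of four.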
If that were a music score I could sing it sight-reading, but with letters and numbers I would really have to think about it, and I'm not on my PC right now. I guess most people are used to piano rolls and numbers now. I have to use bar lines, having been raised with them learning piano from 7yo, and just cannot cope with other DAW software at all.
The music is Troika from Lieutenant Kije
An early video
I literally have hundreds of them BTW
A more recent one
That's fab, Wendy - couldn't pick out a single word of what was being 'sung' but liked the note sliding. Maybe 'Mmmmmmm' would work better?
Like the Booty Fall Doll - very arty !
I downloaded the trial and here's my first attempt - straight out the box, seems pretty easy to use...
Now with piano accompaniment (takes me back to A-Level Music where we had to harmonise [Bach] chorales) and continuing lyrics...
Thanks to all for the great inputs already. It looks like this thread is already striking a chord...
I'll try to coordinate the contents in the TOC/header notes as time goes on.
neato!
--ms
hello all
Speech breathing
the pdf
http://www.arishapiro.com/SpeechBreathingwithstudy_VHCIE2019.pdf
Perhaps you don't know the dance work from arishapiro: dance character animation and simulation, with physics
http://www.arishapiro.com/dance/
Somebody give the man a virtual inhaler
Interesting when pointed out... I haven't thought about it, but it does have an impact on the continuity/realism of the speech.
Someone pointed out that humans generally blink before they change/redirect their gaze. Once you notice it, it's kind of distracting when you see it everywhere...
cool links!
--ms
hi wendy
it uses Python and pygubu
I added this tool to the reference posts (above), but it bears explicit mention:
MCasual, our local DS freebie script hero, wrote some scripts and a sound analysis utility (windows) that binds soundfile characteristics (energy levels) to arbitrary poser/DAZ sliders of any sort. It's called 'mcjaudiomation' (free, but donate!) from https://sites.google.com/site/mcasualsdazscripts2/mcjaudiomation.
This little gem creates Poser-style PZ2 animation streams (tied to any figure sliders you like), based on the ongoing energy levels in sound files. While the examples in the documentation tie sounds to things like VU meters, lights, speaker movement, etc., it can also drive a figure's mouth or emotion sliders with a certain elegance - works really well for cartoon vocals.
The results can be imported into Carrara or be used in any other workflow that starts with PZ2 'pose preset' or duf files. I have mimic-pro, which does sophisticated sound analysis to map visemes and the like, and I find that this far more basic approach works pretty darned well in comparison.
I presume it could be used by someone to drive any motion by simply making well-timed sounds (vocal or otherwise), to drive arbitrary sliders. E.g. saying "tick tock tick tock" to control a clock pendulum, etc.
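To make the "energy levels drive a slider" idea concrete, here is a minimal Python sketch of the general technique. It is not MCasual's code and I don't know how mcjaudiomation computes its values; this just takes the RMS loudness of each animation frame's worth of samples and scales it to 0..1. The function names, the 30 fps default, and the 16-bit mono WAV restriction are all assumptions for the example, and it stops short of writing a PZ2 file.

```python
import math
import sys
import wave
from array import array


def read_mono_16bit(path):
    """Return (samples, sample_rate) from a 16-bit mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles 16-bit mono WAV files")
        samples = array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
        if sys.byteorder == "big":
            samples.byteswap()  # WAV data is little-endian
        return samples, wav.getframerate()


def slider_keys(samples, sample_rate, fps=30):
    """Return one 0..1 slider value per animation frame.

    Each value is the RMS energy of that frame's samples, divided by the
    loudest frame so the slider peaks at exactly 1.0.
    """
    window = max(1, sample_rate // fps)  # samples per animation frame
    rms = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms.append(math.sqrt(sum(s * s for s in chunk) / len(chunk)))
    peak = max(rms, default=0.0)
    return [r / peak if peak else 0.0 for r in rms]


# Synthetic test signal: half a second of silence, then half a second of tone.
rate = 8000
silence = [0] * (rate // 2)
tone = [int(20000 * math.sin(2 * math.pi * 440 * n / rate))
        for n in range(rate // 2)]
keys = slider_keys(silence + tone, rate, fps=10)
print([round(k, 2) for k in keys])
```

With a real recording you would call `read_mono_16bit("yourfile.wav")` first and pass the result to `slider_keys`; the list it returns has one value per frame, which is the sort of thing you would then key onto a mouth-open or pendulum slider.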
--ms
After all this time, I finally bought my microphone. Went USB.
Blue Microphones - Snowball USB Cardioid and Omnidirectional Electret Condenser Vocal Microphone
ahem mee mee mee maa maa maa moh moh moh muuu muu moo
doh ray mee fah soh lah tee dohh
Windows 10 has piles of text-to-speech voices you can get for Narrator, including Australian accents, but without a registry hack you can only access 4 of them through other apps (e.g. Balabolka, iClone etc.): the American and British English ones, one of each gender.
the registry hack scares me too much to try
there is another hack for Cortana too
https://www.ghacks.net/2018/08/11/unlock-all-windows-10-tts-voices-system-wide-to-get-more-of-them/
for now I have used Narrator
prepared text
Audacity is set to use Stereo Mix as my microphone, and my non-existent motherboard sound output for playback, as I have the Nvidia one on my monitor speaker
Cool stuff!
It's been a long time since I've used Mimic Pro for Carrara, and even longer since using Mimic Pro (standalone) for Poser.
The latter creates PZ2 animated pose (or is it the Face files?) with the sound injected into it, so when you apply the pose (or is it expression FC2?), the sound comes with it. Pretty cool. It's also a workshop for tweaking the visemes, expressions, and extra motion to your liking before writing the final file.
The thing that I really love about Mimic Pro for Carrara (besides the fact that it works directly in Carrara and works really well) is that we can create our own viseme shapes as NLA poses individually for any given character - so I can make Rosie talk like Rosie (visually), Dart talk like... well... me, the bad guy talk like the bad guy, etc.
Okay, after saying all of that, I am eager to try mCasual's plugin!
I see there is an Audacity pro version. Undecided on it.
Don't
It's a rip-off, like the people who sell a version of Blender 3D.
It's open-source software.
https://en.wikipedia.org/wiki/Audacity_(audio_editor)
The feature I really need is background noise removal.
My place has a noisy refrigerator.
The stores don't plug in the refrigerators, so you can't hear them before buying.
would love to give my actors Australian accents
what kind of accent would be good for a minotaur?
Greek?