this post was submitted on 25 Jul 2024
Technology
I was never able to get appreciably better results from 11 labs than from a (minorly) trained RVC model :/ The long-script problem is something pretty much any text-to-something model suffers from. The longer the context, the lower the cohesion ends up.
I do rotoscoping with SDXL i2i and ControlNet posing together. Without ControlNet, I found the output tends to smear. Do you just do image2image?
The voice library 11labs added includes some really reliable and expressive models. I've only trained a few voice clones, but I find them totally usable for swapping out short lines to avoid having to bring a subject back in to record. I'll fabricate a sentence or two, but for longer-form stuff, I only use AI for the rough cuts. Then I'll practically record as a last step, once everything's gone through revision cycles. The "generate a few and chop 'em together" method is fine for short clips, but becomes tedious for longer stuff.
Funnily enough, when I say roto, I really just mean tracing the subject to remove it from the background. Background removal's so baked into things now, I dunno if people even think of it as roto. But I mostly still prefer the Adobe solutions on this, like Roto Brush in After Effects, for the AI/manual collaboration. As for roto in the A Scanner Darkly sense, I've played with a few of the video-to-video models, but mostly as a lark for fluff B-roll.