this post was submitted on 01 Aug 2023
Technology
You put it really succinctly.
These models are trained on images that generally depict a lot of concepts at once. If I use a simple prompt like "monkey", I'm probably going to get a lot of images of monkeys in a natural setting, because during training the model is given images essentially labelled as "monkey, tree, fruit, foliage, green" and so forth. Over time this pattern repeats, and the model starts associating one concept with another. Because a lot of pictures that depict "monkey" also depict "foliage, nature" and what have you, it makes sense for it to fill in the rest of "monkey" with "foliage, green, nature" and less with "volcano, playful dog, ice cream", since those concepts haven't been very prominent when presented with "monkey". That is essentially the problem.
Here is the result for "monkey".
And here is "monkey, volcano, playful dog, ice cream".
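As a rough analogy for the association between "monkey" and its usual surroundings described above, here is a toy sketch that counts which tags show up alongside "monkey" in a handful of made-up captions. The captions and the `cooccurrence` helper are invented for illustration; a real diffusion model keeps no explicit counts like this and instead picks up such associations implicitly in its weights.

```python
from collections import Counter

# Hypothetical caption tags, invented for illustration only.
CAPTIONS = [
    {"monkey", "tree", "fruit", "foliage", "green"},
    {"monkey", "foliage", "nature"},
    {"monkey", "tree", "nature"},
    {"volcano", "lava", "smoke"},
    {"playful dog", "ice cream", "park"},
]

def cooccurrence(captions, concept):
    """Count how often each other tag appears in captions containing `concept`."""
    counts = Counter()
    for tags in captions:
        if concept in tags:
            counts.update(tags - {concept})
    return counts

counts = cooccurrence(CAPTIONS, "monkey")
print(counts.most_common())
```

In this toy data "foliage" appears with "monkey" twice and "volcano" never does, so anything that fills in the unspecified parts of a "monkey" image from these statistics would lean toward foliage.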
The datasets are permeated with these kinds of examples. If you prompt for "nurse", you'll get a lot of women in blue, hospital-style clothing, inside hospitals or against nondescript backgrounds, and very few men.
Here's "photo of a nurse".
The more verbose and specific you get, though, the likelier it is that you'll get the outcome you want. Here, for example, is "male (nurse:1) (wearing white scrubs:1) with pink hair skydiving".
This was so outlandish that without the "(wearing white scrubs:1)" it just put the skydiver in a random pink outfit. Even with the added weight on "wearing white scrubs", it has a tendency to put the subject in something pink. Without the extra weight on "(nurse:1)" it gave me generic pink (mostly) white men.
If we were to fix the biases present in our society, we'd possibly see fewer biases in the datasets, and subsequently less bias in the final models. The issue, I believe, isn't so much a technological one as a societal one. The machines are only racist because we are teaching them to be, by example.
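For anyone unfamiliar with the `(text:weight)` notation in the skydiving prompt, here is a minimal sketch of how such a prompt could be split into weighted pieces. It assumes the emphasis syntax popularised by the AUTOMATIC1111 web UI, where the number is a multiplier on how much attention a phrase gets; the thread doesn't say which front end was used, so treat that as an assumption. Under that convention a weight of 1 is the default, and a value above 1 (say 1.3) would be needed to push a term harder. The `parse_prompt` helper is hypothetical and ignores nesting and escaped parentheses.

```python
import re

# Matches "(some text:1.2)" style spans; assumed syntax, see the note above.
WEIGHTED = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")

def parse_prompt(prompt):
    """Split a prompt into (text, weight) pairs; unweighted text gets 1.0."""
    parts = []
    pos = 0
    for m in WEIGHTED.finditer(prompt):
        plain = prompt[pos:m.start()].strip()
        if plain:
            parts.append((plain, 1.0))
        parts.append((m.group(1).strip(), float(m.group(2))))
        pos = m.end()
    tail = prompt[pos:].strip()
    if tail:
        parts.append((tail, 1.0))
    return parts

parts = parse_prompt(
    "male (nurse:1) (wearing white scrubs:1) with pink hair skydiving"
)
for text, weight in parts:
    print(f"{weight:.1f}  {text}")
```

Every piece of that prompt comes out with weight 1.0 in this sketch, which suggests the parentheses were mostly grouping the phrases rather than boosting them.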
More to the point, there are so many parameters that can be tweaked. Throwing your image into a “generator” without knowing what controls you have, what the prompt is doing, what model it’s using, etc. is like saying “the internet” is toxic because you saw a webpage that had a bad word on it somewhere.
I put her actual photo into SD 1.5 (the same model you used) with 30 step Euler, 5.6 cfg and 0.7 noise and got these back. I’d say 3/4 of them are Asian (and the model had a 70% chance to influence that away if it were truly “biased” in the way the article implies), obviously none of them look like the original lady because that’s not how it works. You could generate a literally infinite number of similar-looking women who won’t look like this lady with this technique.
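To make those settings a bit more concrete, here is a minimal sketch of how a denoising strength is commonly turned into a step count. It assumes the convention used by the Hugging Face diffusers img2img pipeline, where only about `steps * strength` of the scheduled steps are actually run on the input image; other front ends may round or schedule differently, and `img2img_steps` is a hypothetical helper name.

```python
def img2img_steps(num_inference_steps, strength):
    """Number of denoising steps actually run for a given strength (0.0 to 1.0)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)

for strength in (0.2, 0.7, 1.0):
    print(strength, img2img_steps(30, strength))
```

Under that assumption, 30 steps at 0.7 means 21 steps of the model reworking the photo, while 0.2 leaves it only 6 steps, which is why low values stay so close to the original.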
The issue isn’t so much that the models are biased. That is both tautologically obvious and, as mentioned previously, probably preferred over randomly choosing anything not specified. For instance, your monkey prompt didn’t ask for forest, so should it have generated a pool hall or plateau just to fill something in? The amount of specificity anyone would need would be way too high; people might be generated without two eyes because it wasn’t asked for. The issue is that the models don’t reflect the biases of all users. It’s not so much that it made bad choices but that it made choices that the user wouldn’t have made. When the user thinks “person”, she thinks “Asian person” because this user lives in Asia and that’s where her biases toward “person” start, so seeing a model biased toward people from the Indian subcontinent doesn’t meet her biases. On top of that, it may simply be impossible to have a generic “all people” model, given that all people are likely to interpret its biases differently.
With a much lower denoising value I was able to get basically her but airbrushed. It does need a higher denoising value in order to achieve any sort of "creativity" with the image though, so at least with the tools and "skill" I have with said tools, there's a fair bit of manual editing needed in order to get a "professional linkedin" photo.
Yeah, if she wants the image to be transformed, lower denoising won’t really do it.
Honestly, I know what she did, because I had the same expectation out of the system. She threw in an image and was expecting to receive an infinite number of variations of specifically her but in the style of a “LinkedIn profile photo”, as though by providing the single image, it would map her face to a generic 3d face and then apply that in a variety of different poses, lighting situations and clothing. Rather, what it does is learn the relations between elements in the photo combined with a healthy amount of static noise and then work its way toward something described in the prompt. With enough static noise and enough bias in the model, it might interpret lots of fuzzy stuff around her eyes as “eyes”, but specifically Caucasian eyes since it wasn’t specified in the prompt and it just sees noise around eyes. It’s similarly easy to get a model like Chillout (as someone mentioned in another thread) to bias toward Asian women.
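The "image plus a healthy amount of static noise" step can be sketched with the standard diffusion noising formula, where the noised input is sqrt(k) * image + sqrt(1 - k) * noise and k is the fraction of the original signal kept. This is a toy version on a flat list of numbers; the `add_noise` name and the sample values are made up, and how a UI's denoising slider maps to k depends on the scheduler, which is not modelled here.

```python
import math
import random

def add_noise(pixels, noise, signal_kept):
    """Blend an image with noise; signal_kept=1.0 keeps the image, 0.0 is pure noise."""
    a = math.sqrt(signal_kept)
    b = math.sqrt(1.0 - signal_kept)
    return [a * p + b * n for p, n in zip(pixels, noise)]

rng = random.Random(0)
image = [0.1, 0.5, 0.9, 0.3]  # stand-in for pixel (or latent) values
noise = [rng.gauss(0.0, 1.0) for _ in image]

lightly_noised = add_noise(image, noise, 0.9)  # still mostly the photo
heavily_noised = add_noise(image, noise, 0.1)  # mostly static
print(lightly_noised)
print(heavily_noised)
```

The more of the input is replaced by static, the less the model has to go on besides the prompt and its own learned defaults, which is where details like eye shape can drift.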
(This is the same picture, just a model change, same parameters. Prompt is: “professional quality linkedin profile photo of a young professional”)
After looking at a number of different photos it’s also easy to start to see where the model is overfit toward a specific look, which is a problem in a technical sense (in that it’s just bad at generating a variety of people, which is probably the intention of the model) and in an ethical sense (in that it’s also subjectively biased).
You could get a more controlled result if you use inpainting. I resized the image in an image editor first, because it gave me really strange results when I gave it the strange resolution of the image from the article. I masked out her face and had the model go ham on the rest of the image, while leaving the face and the hairline untouched.
After that I removed the mask, changed the denoising strength to 0.2, applied the "face fix", and this is the end result.
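The masking idea can be sketched as a simple per-pixel blend: protected pixels come from the original, everything else from the model's output. This is a simplified illustration on a flat list of values; real inpainting pipelines typically work on latents and often use soft, blurred masks, and the `composite` helper and sample numbers are made up.

```python
def composite(original, generated, mask):
    """Keep original pixels where mask is 0, take generated pixels where mask is 1."""
    return [
        m * g + (1 - m) * o
        for o, g, m in zip(original, generated, mask)
    ]

original = [10, 20, 30, 40]   # pretend the middle two values are the face
generated = [99, 98, 97, 96]  # whatever the model produced
mask = [1, 0, 0, 1]           # 1 = let the model repaint, 0 = protect

result = composite(original, generated, mask)
print(result)  # [99, 20, 30, 96]
```

The face values pass through untouched while the surroundings are replaced, which is why this approach keeps the person recognisable.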
It's usable, but I think it's a weird use-case for these kinds of tools. Like yeah you can retouch photos if you're careful enough, but at the same time, wouldn't it be easier to just dress up and take some new photos? I dunno, the idea of using an AI generated profile image feels kind of sketchy to me, but then again we have had beautification filters available for ages - my work phone, a Google Pixel 6, comes with it built into the camera application. Every time the camera opens on my face I get this weird uncanny feeling.
Anyway. The article does touch upon a problem that definitely worries me too
I really hope no company would use an image model to analyse candidates' profile photos, because as an ugly person that makes me want to scream. However, this has been a problem in the past: Amazon developed a tool for use by recruiters, which turned out to have a bias against women. I can easily see a "CV analysis tool" having a bias against people with names of non-European origin, for example.
At this point I think it's impossible to put the genie back into the bottle. Given the chance I definitely would, but I think all we can do now is try to mitigate potential harm caused by these tools as much as possible.