Cavendish

joined 1 year ago
[–] Cavendish@lemmynsfw.com 2 points 1 year ago (2 children)

Exactly. Think of a portrait-orientation image: top 25% sky, next 25% head, bottom 50% torso. That will come out way different than top 60% head, bottom 40% chest. Keywords like "closeup, medium shot, cowboy shot" are less effective for me, but that's what you see in lots of tutorial posts for controlling composition through prompting alone. You can even go crazy with the positioning. A portrait photo split vertically, with the head in the left column and the body in the right column, will make them lean over or arch back, etc.
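
To make the percentages concrete, here is a rough sketch that turns a list of region ratios into pixel rows. The function name `region_rows` and the 832-pixel height are assumptions for illustration only. It just does the arithmetic and is not how any particular regional prompting extension computes its regions.

```python
def region_rows(height, ratios):
    """Split an image height into horizontal bands proportional to `ratios`.

    Returns a list of (top, bottom) pixel rows, one pair per region.
    """
    total = sum(ratios)
    bounds = []
    top = 0
    cumulative = 0
    for ratio in ratios:
        cumulative += ratio
        # Round the running total so the last band always ends at `height`.
        bottom = round(height * cumulative / total)
        bounds.append((top, bottom))
        top = bottom
    return bounds


# The two layouts described above, on an assumed 832-pixel-tall image.
print(region_rows(832, [25, 25, 50]))  # sky / head / torso
print(region_rows(832, [60, 40]))      # head / chest
```

In the first layout the head gets about a quarter of the frame, in the second it gets more than half, which is why the framing shifts from a wider shot toward a close-up.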

[–] Cavendish@lemmynsfw.com 6 points 1 year ago (10 children)

You can get a lot of interesting pose variety by messing with the aspect ratio. See also regional prompting to carve out spaces within the larger frame. I find putting head/hair/face prompts in their own region, then scaling that region, to be extremely effective in controlling close-up to wide shot framing.

[–] Cavendish@lemmynsfw.com 1 points 1 year ago

Stable Diffusion always (? I think anyway) puts prompt metadata into the output image. The problem is that it's easy to strip it out when converting to jpg or other formats. Even just uploading to Lemmy will strip the metadata. That is why I use catbox.moe, which preserves all of that info.
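
For anyone who wants to check whether a PNG still carries its prompt, here is a minimal sketch using only the Python standard library. It assumes the prompt sits in an uncompressed tEXt chunk under a `parameters` keyword. Compressed (zTXt) or international (iTXt) chunks are ignored, and chunk CRCs are not verified. The function name and the file name in the usage comment are hypothetical.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(data: bytes) -> dict:
    """Return keyword -> text pairs from a PNG's uncompressed tEXt chunks."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    texts = {}
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"tEXt":
            # tEXt data is "keyword", a NUL byte, then Latin-1 text.
            keyword, _, text = body.partition(b"\x00")
            texts[keyword.decode("latin-1")] = text.decode("latin-1")
        if chunk_type == b"IEND":
            break
        pos += 12 + length
    return texts


# Usage (hypothetical file name):
# with open("image.png", "rb") as f:
#     print(read_text_chunks(f.read()).get("parameters", "no prompt found"))
```

If this returns an empty dict for an image that went through a conversion or an upload, the metadata was most likely stripped along the way.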

[–] Cavendish@lemmynsfw.com 5 points 1 year ago

Post it again, and screw the downvotes. I thought that image was pretty good and am kicking myself now for not saying so before you took it down. Please don't lurk, this community needs more diversity.

[–] Cavendish@lemmynsfw.com 3 points 1 year ago (2 children)

Normally, you can use a tool like pngchunk.com to read the metadata, but I just realized that I did an SD Upscale on this one and it didn't preserve the prompt. Sorry about that, I'll put the entire metadata dump below in a spoiler. I'm not sure it'll be that helpful though. This image uses a custom LoRA I'm working on that I haven't released yet, and it's complicated by the fact that I started with one model (ToonYou beta 6) for the first 40% of the generation, then switched to a realistic checkpoint merge for the last 60% as a refiner.

prompt


{ "parameters": "Photo of a young 21yo woman posing in (cav_rdrguarma:1.5),\nmasterwork, best quality, soft shadow\n(Photograph with film grain, Sony A7 camera, f1.2, shallow depth of field, 85mm lens),\nnight photo of a jungle \n\nADDCOMM\n\n(sun-kissed to honey ombre hair color in a voluminous curls style:1.2),\nADDROW\n\npendant necklace,\n(light khaki smocked bodice sundress with a flowy skirt and puff sleeves:1.1) (top pulled down showing breasts:1.2) droptop \n\n(flat breast, normal_nipples :1.3), \n(tan lines, beauty marks:0.6)\n\n(SkinHairDetail:0.5)\n\nNegative prompt: (child, childlike) BadDream UnrealisticDream Asian-Less-Neg\r\namateur, blurry, logo, watermark, signature, cropped, out of frame, worst quality, low quality, jpeg artifacts, poorly lit, overexposed, underexposed, glitch, error, out of focus, \r\n(semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, digital art, anime, manga:1.3), \r\n(poorly drawn hands, poorly drawn face:1.2), deformed iris, deformed pupils, morbid, duplicate, mutilated, extra fingers, mutated hands, poorly drawn eyes, mutation, deformed, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, incoherent,\r\ngrayscale, jeans, denim\nSteps: 50, Sampler: DPM++ 2S a Karras, CFG scale: 7, Seed: 1046635747, Size: 640x832, Model hash: e8d456c42e, Model: toonyou_beta6, VAE hash: 63aeecb90f, VAE: vae-ft-mse-840000-ema-pruned.safetensors, Clip skip: 2, RP Active: True, RP Divide mode: Matrix, RP Matrix submode: Rows, RP Mask submode: Mask, RP Prompt submode: Prompt, RP Calc Mode: Attention, RP Ratios: "1,1", RP Base Ratios: 0.2, RP Use Base: False, RP Use Common: True, RP Use Ncommon: False, RP Change AND: False, RP LoRA Neg Te Ratios: 0, RP LoRA Neg U Ratios: 0, RP threshold: 0.4, RP LoRA Stop Step: 0, RP LoRA Hires Stop Step: 0, RP Flip: False, Lora hashes: 
"cav_rdrguarma-v4: 0540b2c6b046, cav_rdrguarma-v4: 0540b2c6b046, droptop: 24494c0ed389, Breasts_Helper_Trail_v2: 470f04826a09", TI hashes: "SkinHairDetail: edf710bf1ea5, BadDream: 758aac443515, UnrealisticDream: a77451e7ea07, Asian-Less-Neg: 22d2f003e76f", Refiner: Cavendish_Hotel [8ead5e5021], Refiner switch at: 0.4, Version: v1.6.0, Hashes: {"vae": "735e4c3a44", "embed:Asian-Less-Neg": "22d2f003e7", "embed:BadDream": "758aac4435", "embed:SkinHairDetail": "edf710bf1e", "embed:UnrealisticDream": "a77451e7ea", "lora:cav_rdrguarma-v4": "aac45a2863", "lora:droptop": "917fcd35a6", "lora:Breasts_Helper_Trail_v2": "124fe77b5d", "model": "e8d456c42e"}\nTemplate: Photo of a young 21yo woman posing in (cav_rdrguarma:1.5),\nmasterwork, best quality, soft shadow\n(Photograph with film grain, Sony A7 camera, f1.2, shallow depth of field, 85mm lens),\nnight photo of a jungle \n\n, \n\n(sun-kissed to honey ombre hair color in a voluminous curls style:1.2),\nBREAK Photo of a young 21yo woman posing in (cav_rdrguarma:1.5),\nmasterwork, best quality, soft shadow\n(Photograph with film grain, Sony A7 camera, f1.2, shallow depth of field, 85mm lens),\nnight photo of a jungle \n\n, \n\npendant necklace,\n(light khaki smocked bodice sundress with a flowy skirt and puff sleeves:1.1) (top pulled down showing breasts:1.2) droptop \n\n(flat breast, normal_nipples :1.3), \n(tan lines, beauty marks:0.6)\n\n(SkinHairDetail:0.5)\n\nNegative Template: (child, childlike) BadDream UnrealisticDream Asian-Less-Neg\r\namateur, blurry, logo, watermark, signature, cropped, out of frame, worst quality, low quality, jpeg artifacts, poorly lit, overexposed, underexposed, glitch, error, out of focus, \r\n(semi-realistic, cgi, 3d, render, sketch, cartoon, drawing, digital art, anime, manga:1.3), \r\n(poorly drawn hands, poorly drawn face:1.2), deformed iris, deformed pupils, morbid, duplicate, mutilated, extra fingers, mutated hands, poorly drawn eyes, mutation, deformed, dehydrated, bad anatomy, bad 
proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, incoherent,\r\ngrayscale, jeans, denim" }
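
The dump lists `Steps: 50` and `Refiner switch at: 0.4`. As a quick sketch of what that split means in step counts, assuming the switch point is simply the fraction of total steps (the exact rounding used by the web UI is not confirmed here, and `refiner_split` is a made-up helper name):

```python
def refiner_split(steps: int, switch_at: float) -> tuple[int, int]:
    """Return (base model steps, refiner steps) for a given switch fraction."""
    base_steps = round(steps * switch_at)
    return base_steps, steps - base_steps


print(refiner_split(50, 0.4))  # values taken from the metadata dump above
```

So roughly the first 20 steps run on the first model and the remaining 30 on the refiner checkpoint.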

[–] Cavendish@lemmynsfw.com 1 points 1 year ago

Thank you! I use a bunch of different custom merges. See this very large xy comparison grid I posted earlier: https://files.catbox.moe/1k6mmr.jpg

I'd recommend Absolute Reality or LazyMix for an off-the-shelf model.

[–] Cavendish@lemmynsfw.com 5 points 1 year ago (2 children)

There's not much out there on training LoRAs that aren't anime characters, and that just isn't my thing. I don't know a chibi from a booru, and most of those tutorials sound like gibberish to me. So I'm kind of just pushing buttons and seeing what happens over lots of iterations.

For this, I settled on the class "place". I tried "location" but it gave me strange results, like lots of pictures of maps and GPS-type screens. I didn't use any regularization images. Like you mentioned, I couldn't think of what to use. I think the regularization would be more useful in face training anyway.

I read that a batch size of one gave more detailed results, so I set it there and never changed it. I also didn't use any repeats since I had 161 images.

I did carefully tag each photo with a caption .txt file using Utilities > BLIP Captioning in Kohya_ss. That improved results over the versions I made with no tags. Results improved again dramatically when I went back and manually cleaned up the captions to be more consistent. For instance, consolidating building, structure, barn, church, house all to just cabin.
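
The caption cleanup can also be scripted instead of done by hand. This is only a sketch of the idea, not the process actually used above: the synonym list comes from the example in the comment, while the function names and the folder path in the usage comment are hypothetical. Plural forms like "houses" are deliberately left alone.

```python
import re
from pathlib import Path

# Words from the example above that should all collapse to one tag.
SYNONYMS = ["building", "structure", "barn", "church", "house"]
PATTERN = re.compile(r"\b(" + "|".join(SYNONYMS) + r")\b", re.IGNORECASE)


def consolidate(caption: str, target: str = "cabin") -> str:
    """Replace every whole-word synonym in a caption with the target tag."""
    return PATTERN.sub(target, caption)


def rewrite_captions(folder: str) -> int:
    """Rewrite all .txt captions in a folder; return how many files changed."""
    changed = 0
    for path in Path(folder).glob("*.txt"):
        old = path.read_text(encoding="utf-8")
        new = consolidate(old)
        if new != old:
            path.write_text(new, encoding="utf-8")
            changed += 1
    return changed


print(consolidate("a wooden barn next to a house"))

# Usage (hypothetical folder, back it up first since files are overwritten):
# rewrite_captions("train/captions")
```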

Epochs was 150, which gave me 24,150 steps. Is that high or low? I have no idea. They say 2000 steps or so for a face, and a full location is way more complex than a single face... It seems to work, but it took me 8 different versions to get a model I was happy with.
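
The step count follows from the numbers given in this comment. A small sketch of the arithmetic, assuming total steps are images times repeats, divided by batch size, times epochs (`total_steps` is a made-up helper, and "no repeats" is read as a repeat count of 1):

```python
import math


def total_steps(images: int, epochs: int, repeats: int = 1, batch_size: int = 1) -> int:
    """Estimate training steps from dataset size and training settings."""
    steps_per_epoch = math.ceil(images * repeats / batch_size)
    return steps_per_epoch * epochs


# 161 images, 150 epochs, no repeats, batch size of one.
print(total_steps(images=161, epochs=150))
```

With these inputs the result matches the 24,150 steps quoted above. Raising the batch size would cut the step count proportionally for the same number of epochs.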

Let me know what ends up working for you. I'd love to have more discussions about this stuff. As a reward for reading this far, here's a sneak peek at my next lora based on RDR2's Guarma island. https://files.catbox.moe/w1jdya.png. Still a work in progress.

[–] Cavendish@lemmynsfw.com 4 points 1 year ago

🫡

Nice work!

[–] Cavendish@lemmynsfw.com 5 points 1 year ago (2 children)

AckbarItsATrap.gif

[–] Cavendish@lemmynsfw.com 3 points 1 year ago

Thank you, it still feels like magic to me, so it's fun to see how SD reacts to different inputs.

[–] Cavendish@lemmynsfw.com 3 points 1 year ago

I get a lot of half shirts and sweaters with very unconventional cut outs for sure. SD has trouble "bunching" fabric for the lift/reveal shots, so it likes to just cut things off.

I do like that black & blue knit+leather number though. It's unusual, but really cute. I did a double take when that came out of the diffusion soup.
