this post was submitted on 20 Jan 2025
47 points (88.5% liked)

Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ


Of course, I'm not in favor of this "AI slop" we're having this century (although I admit it has some good, legitimate uses, but greed always speaks louder). Still, I wonder if it will suffer some kind of piracy, if it already is, or if people simply aren't interested in "pirated AI".

all 46 comments
[–] lukewarm_ozone@lemmy.today 5 points 1 day ago

Incredibly weird that this thread was up for two days without anyone posting a link to the actual answer to OP's question, which is g4f.

[–] Zementid@feddit.nl 2 points 1 day ago

Well... sufficient local processing power would enable personalized "creators" pre-trained to provide certain content (e.g. a game). Those things will definitely be pirated, hacked, and modded.

As they currently already are...

There are groups that give access to pirated AI. When I was a student, I used them to make projects. As for how they get access to it: they usually jailbreak websites that offer free trials and automate the account-creation process. The higher-quality ones scam big companies out of startup credits. Then there are also some leaked keys.

Anyway, that's what I would call "pirated AI" (not the locally run AI).

[–] 3dmvr@sh.itjust.works 2 points 1 day ago

It's already free. You can't pirate cloud services, but Stable Diffusion is free and DeepSeek is free; you just need the hardware to run them.

[–] daniskarma@lemmy.dbzer0.com 6 points 2 days ago* (last edited 2 days ago) (1 children)

Not pirated, but my country, Spain, released an open AI model completely for free. Everything is open: the training data, the models, everything. It's supposedly ethically trained on open data (I haven't personally dug into the training data, but it is published).

It's focused on Spanish and the regional languages of Spain, but I think it can also do things in English.

Not piracy per se, as it's completely legal, but it's something you can run without depending on any business.

[–] jherazob@beehaw.org 2 points 1 day ago (1 children)

First I've heard of it, any links?

[–] daniskarma@lemmy.dbzer0.com 3 points 1 day ago (1 children)
[–] jherazob@beehaw.org 2 points 1 day ago

Thanks! Need to see if they have documented their datasets AND whether those are actually public.

[–] arararagi@ani.social 4 points 1 day ago

Meta's model was pirated in a sense; someone leaked it early last year, I think. But Llama isn't that impressive, and after using it on WhatsApp it seems like nothing got better.

[–] SweetCitrusBuzz@beehaw.org 39 points 3 days ago* (last edited 3 days ago) (2 children)

I mean they stole people's actual work already, so they're the bad kind of pirates.

[–] catloaf@lemm.ee 1 points 3 days ago (2 children)

How is what they're doing different from, say, an IPTV provider?

[–] SweetCitrusBuzz@beehaw.org 4 points 3 days ago

I'm talking about the data sets LLMs use, just so we're on the same page.

[–] notfromhere@lemmy.ml 0 points 3 days ago (2 children)

Just like people steal movies from the high seas? I hope this is sarcasm.

[–] keksi@sopuli.xyz 13 points 3 days ago (1 children)

More like: they took content and make money from it without paying the content creators.

[–] INHALE_VEGETABLES@aussie.zone 2 points 1 day ago (1 children)
[–] Bronzebeard@lemm.ee 1 points 1 day ago

Explain how.

Because training an AI is similar to training a person: you give it a bunch of examples to learn the rules from, then it applies what it learned to the prompt it's given. The training data is not included in the end result.

[–] SweetCitrusBuzz@beehaw.org 14 points 3 days ago (1 children)

Nope, there's a difference there: they aren't taking something from ordinary people who need the money in order to survive. Actors, producers, directors, etc. have already been paid, and besides, Hollywood etc. aren't exactly using that money to give back to society in any meaningful way most of the time.

[–] notfromhere@lemmy.ml -4 points 3 days ago (1 children)

They did not take money from anyone. Aren't we on the piracy community? What is with the double standards? It's theft if it's against the Little Guy(tm), but it's civil copyright violation if it's against the Corpos?

[–] SweetCitrusBuzz@beehaw.org 21 points 3 days ago (2 children)

I'm against corporations. Not actual people. I don't see how that's double standards at all.

[–] Pippipartner@discuss.tchncs.de 1 points 1 day ago (1 children)

Best comment I read in a long time ❤️

[–] SweetCitrusBuzz@beehaw.org 2 points 1 day ago

Thank you 💜

[–] notfromhere@lemmy.ml 1 points 2 days ago

That tracks.

[–] Aceticon@lemmy.dbzer0.com 26 points 3 days ago* (last edited 3 days ago) (2 children)

I'm pretty sure those things are trained on content which was obtained without paying royalties to the creators, hence by definition pirated content - so that would count as "piracy around them".

On the opposite side, as far as I know the things created with Generative AI so far can't be copyrighted, hence by definition can't be pirated as they've always belonged to the Public Domain.

As for the engines themselves, there are good fully open source options out there which can be locally installed (if you have enough memory in your graphics card) and there seem to be thriving communities around it (at least it looks like it from what bit I dipped into that stuff so far). I'm not sure if it's at all possible to pirate the closed source engines since I expect those things are designed to be deployed to very specific server farm architectures.

[–] OminousOrange@lemmy.ca 6 points 2 days ago* (last edited 2 days ago) (1 children)

There are quite a few options for running your own LLM. Ollama makes it fairly easy to run (with a big selection of models - there's also Hugging Face with even more models to suit various use cases) and OpenWebUI makes it easy to operate.

Some self-hosting experience doesn't hurt, but it's pretty straightforward to configure if you follow along with Networkchuck in this video.
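For anyone who'd rather skip the video, the setup described above boils down to a few commands. A rough sketch (the model tag `llama3.2` is just an example; any model from the Ollama library works):

```shell
# Install Ollama on Linux/macOS using the official one-line installer
curl -fsSL https://ollama.com/install.sh | sh

# Pull an example model and chat with it from the terminal
ollama pull llama3.2
ollama run llama3.2 "Hello, who are you?"

# Run Open WebUI in Docker, pointing it at the local Ollama instance
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main
```

After that, the web interface should be reachable at http://localhost:3000.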

[–] can@sh.itjust.works 1 points 2 days ago (1 children)

Any that are easier to set up on a phone? I tried something before but had trouble despite having enough RAM.

[–] OminousOrange@lemmy.ca 3 points 2 days ago

Not that I'm familiar with. I would guess that the limited processing power of a phone would bring a pretty poor experience though.

[–] mindbleach@sh.itjust.works 3 points 1 day ago

Training is transformative use. Sluicing data through a pile of linear algebra, to mechanically distill the essence of words like "fantasy," is not what copyright protects against.

[–] heavydust@sh.itjust.works 17 points 3 days ago (1 children)

That makes no sense. Define pirated AI first.

[–] Grandwolf319@sh.itjust.works 12 points 3 days ago

Yeah, the whole of generative AI feels like legal piracy (that they charge for), given how they source their training data.

[–] PerogiBoi@lemmy.ca 14 points 3 days ago (2 children)

There already is. You can download models that are similar to or better than ChatGPT from Hugging Face. I run different models locally to create my own useless AI slop without paying for anything.

[–] domi@lemmy.secnd.me 1 points 2 days ago (2 children)

Which model would you say is better than GPT-4? All I tried are cool but are not quite on GPT-4 level.

The very newly released Deepseek R1 "reasoning model" from China beats OpenAI's o1 model on multiple areas, it seems – and you can even see all the steps of the pre-answering "thinking" that's hidden from the user in o1. It's a huge model, but it (and the paper about it) will probably positively impact future "open source" models in general, now the "thinking" cat's outta the bag. Though, it can't think about Tiananmen Square or Taiwan's autonomy – but many derivative models will probably be modified to effectively remove such Chinese censorship.

[–] PerogiBoi@lemmy.ca 1 points 2 days ago

I’ve had good success with mistral

[–] maxprime@lemmy.ml 3 points 3 days ago (1 children)

Are you referring to ollama?

[–] PerogiBoi@lemmy.ca 5 points 3 days ago (1 children)

No, because that is just a runtime/API that runs LLMs locally. GPT4All is an all-in-one solution that can run .gguf files. Same with KoboldAI.

[–] maxprime@lemmy.ml 2 points 3 days ago

Cool I’ll check that out

[–] 31337@sh.itjust.works 6 points 3 days ago

Some of the "open" models seem to have augmented their training data with OpenAI and Anthropic outputs (i.e., they sometimes say they're ChatGPT or Claude). I guess that may be considered piracy. There are a lot of customer-service bots that just hook into OpenAI APIs and don't have a lot of guardrails, so you can do stuff like ask a car dealership's customer service bot to write you Python code. Actual piracy would require someone leaking the model.

[–] Kaboom@reddthat.com 5 points 3 days ago (2 children)

You can just run Automatic1111 locally if you want to generate images. I don't know what the text equivalent is though, but I'm sure there's one out there.

There's no real need for pirate ai when better free alternatives exist.

[–] lukewarm_ozone@lemmy.today 2 points 1 day ago* (last edited 1 day ago)

There’s no real need for pirate ai when better free alternatives exist.

There's plenty of open-source models, but they very much aren't better, I'm afraid to say. Even if you have a powerful workstation GPU and can afford to run the serious 70B opensource models at low quantization, you'll still get results significantly worse than the cutting-edge cloud models. Both because the most advanced models are proprietary, and because they are big and would require hundreds of gigabytes of VRAM to run, which you can trivially rent from a cloud service but can't easily get in your own PC.

The same goes for image generation - compare results from proprietary services like midjourney to the ones you can get with local models like SD3.5. I've seen some clever hacks in image generation workflows - for example, using image segmentation to detect a generated image's face and hands and then a secondary model to do a second pass over these regions to make sure they are fine. But AFAIK, these are hacks that modern proprietary models don't need, because they have gotten over those problems and just do faces and hands correctly the first time.

This isn't to say that running transformers locally is always a bad idea; you can get great results this way - but people saying it's better than the nonfree ones is mostly cope.
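The "hundreds of gigabytes of VRAM" point above can be sanity-checked with back-of-envelope arithmetic. A minimal sketch (weights only; it ignores KV cache and activation overhead, which add more on top):

```python
def model_vram_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough VRAM needed just to hold the weights, in gigabytes (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 70B model at full fp16 vs. aggressive 4-bit quantization:
print(model_vram_gb(70, 16))  # 140.0 GB -- needs multiple datacenter GPUs
print(model_vram_gb(70, 4))   # 35.0 GB -- barely fits a 48 GB workstation card
```

This is why the big proprietary models are rented from the cloud rather than run at home, and why local setups lean on low-bit quantization.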

[–] Zikeji@programming.dev 3 points 3 days ago* (last edited 3 days ago)

There are quite a few text equivalents. text-generation-webui looks and feels like Automatic1111, and supports a few backends to run the LLMs. My personal favorite is open-webui for that look and feel, and then there is Silly Tavern for RP stuff.

For generation backends I prefer ollama due to how simple it is, but there are other options.

[–] AceFuzzLord@lemm.ee 3 points 3 days ago (1 children)

Not sure if it counts in any way as piracy per se, but there is at least a jailbroken version of Bing's Copilot AI (the Sydney version) using SydneyQT from Juzeon on GitHub.

[–] can@sh.itjust.works 1 points 2 days ago

Tried to get Bing to find the jailbreak for me but couldn't quite get it.

[–] mindbleach@sh.itjust.works 1 points 2 days ago

Dixie.Flatline-TiNYiSO

Jailbreaking LLMs and Diffusers is a thing, but I wouldn't call it piracy.

[–] petrescatraian@libranet.de 0 points 3 days ago

@incognito08 AI could be a direction in piracy too imo