Computer Help and Support

douglas9

(4,481 posts) Sun Jul 16, 2023, 04:33 AM

The shady world of Brave selling copyrighted data for AI training

I'm fairly certain that I was not the only person in the world who thought to himself, "Did they just yoink the entire Internet and bundle it together into a glorified copy and paste machine?" upon the release of ChatGPT.

And even though there are some concerns about the type of data that was used to train OpenAI's latest model, the overall stance of OpenAI and other companies working on similar projects seems to be that it is fair use. Whether that is going to hold up in the long run remains to be seen.

After Google published an announcement saying they're interested in exploring alternatives to robots.txt to provide broader control over AI-related content issues, I was curious to see what other search engines are doing in regard to AI, both in dealing with AI-generated content and in handling data.

Personally, I'm not a big fan of these conglomerates ingesting other people's work and then reselling it, which also leads me to the story I'm going to talk about today.

https://stackdiary.com/brave-selling-copyrighted-data-for-ai-training/

2 replies

Tetrachloride

(8,478 posts)

1. Intriguing

Reply to douglas9 (Original post)

Sun Jul 16, 2023, 06:43 AM

usonian

(14,352 posts)

2. It's the latest rage. Do ya remember the metaverse?

Reply to douglas9 (Original post)

Sun Jul 16, 2023, 09:46 AM

That's why El0n is walling in Twitter. Now it's his personal, secret trove of training data.

There are too many articles to post on the scraping of copyrighted works (beyond "fair use") and on the stripping of copyright notices during the data harvest. Lawsuits and more to come. The repurposing of the works is also at issue: ripping off content to "create" a flood of similar, competing content.

Those are the current arguments being raised.
https://venturebeat.com/ai/what-sarah-silvermans-lawsuit-against-openai-and-meta-really-means-the-ai-beat/

A giant shitshow. These are broad strokes. Techies can scan Hacker News (https://news.ycombinator.com/newest) and other sites for more. Lots more. HN has a search box, and the results can be sorted by popularity or date. For the most popular items at any given time: https://news.ycombinator.com/best

And then, those cyber criminals:
WormGPT - The Generative AI Tool Cybercriminals Are Using to Launch BEC Attacks | SlashNext

https://slashnext.com/blog/wormgpt-the-generative-ai-tool-cybercriminals-are-using-to-launch-business-email-compromise-attacks/

Did your boss write that email? Maybe not. Sure looks genuine.
