Is ChatGPT Stealing from Authors?

Karistina Lafae
4 min read · Sep 25, 2023

As a writer who has published literally millions of words online, both personally and professionally, I’m practically guaranteed to have had at least some of my writing used to train large language models (LLMs) like ChatGPT. As a self-published author hoping to become traditionally published once my first major work in progress (WIP) is completed, I do have some skin in the game when it comes to AI writing models.

Digital image of a man sitting at a computer desk surrounded by a city strewn with piles and piles of books and papers, symbolic of a person using ChatGPT.
Image created by Karistina Lafae using Midjourney and Paint Shop Pro

While there are a variety of options available in the AI writing game, including Sudowrite, Poe, Claude, NovelAI, and others, ChatGPT is the most accessible to amateur writers and small-time professional writers like myself, due to its price point of FREE. (There is also, of course, a paid version that offers greater functionality.) So I decided to do a layman’s analysis of whether authors like George R.R. Martin, famously one of the authors suing OpenAI, are right to claim that ChatGPT is plagiarizing their work.

ChatGPT was reportedly trained on about 570 GB (5.7e+11 bytes) worth of text data from a dataset known as Common Crawl, plus additional data from Wikipedia (which is freely licensed under Creative Commons, though not actually in the public domain) and other sources.

I’m looking in my own Documents folder, and the Scrivener file for my 100k WIP is only 108 KB. If it weren’t in Scrivener format, but plain text, it would be far smaller. (It’s 100,026 words long, to be precise. Quintessentially roundable to 100k.)
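
To put those two numbers side by side, here is a quick back-of-the-envelope sketch in Python. It assumes decimal units (1 KB = 1,000 bytes), which matches the 570 GB = 5.7e+11 bytes conversion above, and it takes the 108 KB file size at face value as a stand-in for one novel-length manuscript. The function names are made up for illustration only.

```python
# Sizes as quoted in the article, in decimal units (1 KB = 1,000 bytes).
# Assumption: the 108 KB Scrivener file stands in for one whole manuscript.
TRAINING_DATA_BYTES = 570 * 10**9  # 570 GB = 5.7e+11 bytes
MANUSCRIPT_BYTES = 108 * 10**3     # 108 KB


def share_of_training_data(file_bytes, corpus_bytes=TRAINING_DATA_BYTES):
    """Return the fraction of the corpus that one file of this size represents."""
    return file_bytes / corpus_bytes


def manuscripts_that_fit(file_bytes, corpus_bytes=TRAINING_DATA_BYTES):
    """Return how many whole files of this size would fit in the corpus."""
    return corpus_bytes // file_bytes


share = share_of_training_data(MANUSCRIPT_BYTES)
count = manuscripts_that_fit(MANUSCRIPT_BYTES)

print(f"Share of training data: {share:.7%}")
print(f"Manuscripts of this size that would fit: {count:,}")
```

By this rough math, a single manuscript of that size would make up about 0.000019% of the training data, and it would take roughly 5.28 million files of the same size to fill the full 570 GB.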


Written by Karistina Lafae

Queer Disabled Immunocompromised Author | Sudowrite Teacher | Midjourney Guide | Opinions are my own | Chaotic Good Bisexual Polyamorous Faerie Godmother
