This is an automated archive made by the Lemmit Bot.
The original was posted on /r/datahoarder by /u/lonelyroom-eklaghor on 2024-11-04 14:45:49+00:00.
First things first, the literal link from where you can request your Reddit data. If you have an alt account that holds a lot of evidence relevant to a legal problem, then I HIGHLY advise you to request your own data. Unencrypted messages are a bane, but a boon too.
I don't know much about the laws involved, but I used GDPR to access my data. If you know more about GDPR or the other applicable laws, feel free to add any additional legal info in the comments.
Importing the files into your device
What do you get?
A zip file containing a bunch of CSV files, which can be opened in any spreadsheet program you know.
How am I going to show it? (you can skip this part if you prefer spreadsheet software)
I will be using SQLite to show whatever is out there (SQLite is a small, self-contained SQL database engine that keeps the whole database in a single file; its query syntax is close to other flavours of SQL, such as MySQL or Oracle SQL). If you want to follow my steps, you can download DB Browser for SQLite (not a web browser lol) as well as the actual SQLite command-line tool (if you want, you can load the files into any SQL flavour you know). The following steps are written for Windows PCs, though both programs are available for Windows, macOS and Linux (macOS users: the sqlite3 command-line tool normally comes preinstalled, so you may only need to download DB Browser).
After unzipping the folder, make a new database on the DB Browser (give it a name) and close the "Edit Table Definition" window that opens.
From there, go to File > Import > Table from CSV file. Open the folder and select all the files. Then, tick the checkboxes "Column names in First Line", "Trim Fields?", and "Separate Tables".
A screenshot of the Import CSV File window, of GiantJupiter45 (my old account)
After importing all that, save the file and exit DB Browser, or if you want, you can type your SQL queries right there in DB Browser instead.
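If you would rather script the import than click through DB Browser, here is a minimal Python sketch of the same idea using only the standard library. It is not what I used above; it assumes the export is UTF-8 CSV with column names in the first line, and the function names `quote_ident` and `import_csv_folder` are made up for this example.

```python
import csv
import sqlite3
from pathlib import Path


def quote_ident(name):
    """Quote a table or column name so SQLite treats it as an identifier."""
    return '"' + name.replace('"', '""') + '"'


def import_csv_folder(folder, db_path):
    """Load each CSV in `folder` into its own all-TEXT table in `db_path`."""
    conn = sqlite3.connect(db_path)
    for csv_file in sorted(Path(folder).glob("*.csv")):
        # newline="" lets the csv module handle line breaks inside quoted fields.
        # utf-8-sig is an assumption about the export; it also accepts plain UTF-8.
        with open(csv_file, newline="", encoding="utf-8-sig") as handle:
            rows = list(csv.reader(handle))
        if not rows:
            continue  # nothing to import from an empty file
        header = [name.strip() for name in rows[0]]
        table = quote_ident(csv_file.stem)  # chat_history.csv -> chat_history
        columns = ", ".join(f"{quote_ident(name)} TEXT" for name in header)
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
        placeholders = ", ".join("?" for _ in header)
        # Strip surrounding whitespace (like "Trim Fields?") and skip rows
        # whose field count does not match the header.
        data = [
            [field.strip() for field in row]
            for row in rows[1:]
            if len(row) == len(header)
        ]
        conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", data)
    conn.commit()
    return conn
```

You would call it as something like `import_csv_folder("reddit_export", "reddit.db")`, where both names are placeholders for your own unzipped folder and database file. Running it twice on the same database file adds the rows a second time, so start from a fresh file.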
After exiting DB Browser, launch SQLite in the command prompt by entering sqlite3 .db (with the name of the database file you saved in front of .db). Now, just do a small thing for clarity: enter .mode box. Then, you can use ChatGPT to get a lot of SQL queries, or if you know SQL, you can type them out yourself.
The rest of the tutorial is for everyone, but we'll mention the SQLite-specific queries too as we move along.
Analyzing what files are present
We haven't yet looked at which files are actually there. Let's check just that.
If you are on SQLite, just enter .table or .tables. It will show you all the files that Reddit has shared as part of the respective data request policy (please comment if there is any legal detail you'd like to add regarding the California laws or GDPR mentioned on the data request page). Under GDPR, this is what I got:
A screenshot of all the files I got
account_gender, approved_submitter_subreddits, chat_history, checkfile, comment_headers, comment_votes, comments, drafts, friends, gilded_content, gold_received, hidden_posts, ip_logs, linked_identities, linked_phone_number, message_headers, messages, moderated_subreddits, multireddits, payouts, persona, poll_votes, post_headers, post_votes, posts, purchases, saved_comments, saved_posts, scheduled_posts, sensitive_ads_preferences, statistics, stripe, subscribed_subreddits, twitter, user_preferences.
That's all.
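If you are working from a script instead of the sqlite3 prompt, a rough equivalent of .tables is to read SQLite's built-in sqlite_master catalog. This is only a sketch: the helper names `list_tables` and `row_counts` are mine, and the row counts are an extra that .tables does not show.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [name for (name,) in rows]


def row_counts(conn):
    """Map each table name to its number of rows."""
    counts = {}
    for table in list_tables(conn):
        # Quote the name so unusual table names cannot break the query.
        quoted = '"' + table.replace('"', '""') + '"'
        counts[table] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts
```

The row counts are a quick way to spot which files actually contain something, since several of the tables may well be empty for your account.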
Check them out yourself. You may check out this answer from Reddit Support for more details.
The most concerning part is that Reddit stores your chat history and IP logs, so it can tell what you said in which room. I'll explain just the chat history; you'll be able to figure out the rest yourself.
Chat History
.schema shows how all the tables are structured, but .schema chat_history will show the structure of only the table named chat_history.
CREATE TABLE IF NOT EXISTS "chat_history" (
"message_id" TEXT,
"created_at" TEXT,
"updated_at" TEXT,
"username" TEXT,
"message" TEXT,
"thread_parent_message_id" TEXT,
"channel_url" TEXT,
"subreddit" TEXT,
"channel_name" TEXT,
"conversation_type" TEXT
);
"Create table if not exists" is basically an SQL query, nothing to worry about.
So, message_id is unique, username just gives you the username of the one who sent the message, and message is basically... well, whatever was written. thread_parent_message_id, as you may understand, is the ID of the parent message from which a thread in the chat started, you know, those replies basically.
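To get the same column list from a script, one option is SQLite's pragma_table_info table-valued function. A minimal sketch, assuming your table is named chat_history as in my export; the helper name `table_columns` is hypothetical.

```python
import sqlite3


def table_columns(conn, table):
    """Return (column name, declared type) pairs for one table."""
    rows = conn.execute("SELECT name, type FROM pragma_table_info(?)", (table,))
    return rows.fetchall()
```

Calling `table_columns(conn, "chat_history")` should list the same ten columns shown in the schema, and an unknown table name simply gives back an empty list.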
About channel_url:
channel_url is the most important thing in this. It lets you get all the messages of a "room" (either a direct message to someone, a group, or a subreddit channel). What can you do to get all the messages you've had in a room?
Simple. For each row, you will have a link in the channel_url column, which resembles https://chat.reddit.com/room/!:reddit.com, where the room ID sits between the ! and the :reddit.com.
Enter a query something like this, with your room ID placed between the two % symbols:
SELECT * FROM chat_history WHERE channel_url LIKE '%%';
Here, the % symbol on both sides signifies that zero or more characters may appear in place of that symbol. You can also try out something like this, since the URL remains the same within a room (and this one's safer):
SELECT * FROM chat_history WHERE channel_url = (SELECT channel_url FROM chat_history WHERE username = '');
where the recipient's username goes between the quotes, without that "u slash", and the recipient should have messaged at least once in that room, otherwise you won't be able to find it. Note that with = only the first room found for that username is used. Also, some people may have their original Reddit usernames shown instead of their changed usernames, so be careful with that.
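Both lookups can also be run from Python with bound parameters, which avoids pasting IDs and usernames into the SQL text by hand. This is a sketch under assumptions: the function names are mine, the created_at values are assumed to sort correctly as text, and the second function deliberately uses IN instead of = so that it returns every room the person has posted in, not just the first one.

```python
import sqlite3


def messages_in_room(conn, room_id):
    """Return (created_at, username, message) for one room, oldest first."""
    # LIKE treats % and _ in room_id as wildcards, so the match can be
    # slightly broader than an exact substring search.
    pattern = f"%{room_id}%"
    return conn.execute(
        "SELECT created_at, username, message FROM chat_history "
        "WHERE channel_url LIKE ? ORDER BY created_at",
        (pattern,),
    ).fetchall()


def messages_with_user(conn, username):
    """Return messages from every room in which `username` posted at least once."""
    return conn.execute(
        "SELECT created_at, username, message FROM chat_history "
        "WHERE channel_url IN "
        "(SELECT channel_url FROM chat_history WHERE username = ?) "
        "ORDER BY created_at",
        (username,),
    ).fetchall()
```

Because it uses IN, `messages_with_user` will also pull in groups and subreddit channels that person spoke in, so you may want to narrow the results down afterwards.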
The fields "subreddit" and "channel_name" are applicable for subreddit channels.
Lastly, conversation_type will tell you which is which. Basically, what I was calling a subreddit channel is known as community, what I was calling a group is known as private_group, and DMs are direct.
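To see how your chats split across these types, a GROUP BY on conversation_type is enough. A small sketch, with the helper name `count_by_conversation_type` made up for illustration:

```python
import sqlite3


def count_by_conversation_type(conn):
    """Map each conversation_type value to its number of messages."""
    rows = conn.execute(
        "SELECT conversation_type, COUNT(*) FROM chat_history "
        "GROUP BY conversation_type ORDER BY COUNT(*) DESC"
    )
    return dict(rows.fetchall())
```

The result is a plain dictionary keyed by whatever values appear in your own export.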
Conclusion
Regarding the chat history, if these DMs contain sensitive information that is essential to you, I highly advise importing them into a database before you try to deal with them, because these files can be HUGE. Either use MS Access or some form of SQL for this.
In case you want to learn SQL, then a video to learn it:
I myself learnt from this amazing guy.
Also, I hope that this guide gives you a little push on analyzing your Reddit data.