Santabot

27 readers
1 users here now

Santabot is an automated moderation tool, designed to reduce moderation load and remove bad actors in a way that is transparent to all and mostly resistant to abuse and evasion.

This is a community devoted to meta discussion for the bot.

founded 5 months ago
MODERATORS
auk
1
-311
submitted 5 months ago* (last edited 1 month ago) by auk to c/santabot
 
 

Santa is a robot moderator. Santa will decide if you're naughty or nice. Santa has no chill.

Hi everyone!

The slrpnk admins were nice enough to let me try a little moderation experiment. I made a moderation bot called Santa, which tries to ease the amount of busywork for moderators, and reduce the level of unpleasantness in conversations.

If someone's interactions are attracting a lot of downvotes compared to their upvotes, they are probably not contributing to the community, even if they are not technically breaking any rules. That's the simple core of it. Then, on top of that, the bot gives more weight to users that other people upvote frequently, so it is much more accurate than simply adding up the up and down vote totals. In testing, it seemed to do a pretty good job figuring out who was productive and not.

Most people upvote more than they downvote. To accumulate a largely negative opinion from the community, your content has to be very dislikable. The current configuration bans less than 3% of the users that it evaluates, but there are some vocal posters in that 3%, which is the whole point.

It is currently live and moderating !pleasantpolitics@slrpnk.net. It is experimental. Please don't test it by posting bad content there. If you have a generally good posting history, it will probably let you get away with being obnoxious, and it won't be a good test. Test it by posting good things that you think will attract real-life jerks, and let it test its banhammer against them instead of you.

FAQ

Q: I just saw content that wasn't pleasant!

A: "Pleasant" was the wrong word for the test community. People will sometimes say things you find unpleasant, potentially more so, since the human moderation is lighter. That's by design. Many Lemmy communities contain a large amount of content which is "polite" or "civil" but which in total is detracting significantly from the experience. I do plan to allow content which is offensive, up to a certain point, as long as it doesn't become a dominant force.

The theory is that we're all adults, and we can handle an occasional rude comment or viewpoint we don't like. If someone is a habitual line-stepper, then they will get shown the door, but part of the whole point is that the good actors can be free of a moderator looking over their shoulder on every comment deciding whether or not they're allowed to say it.

That's not to mean this is a "free speech" community. If content that's offensive for the sake of offensiveness starts to proliferate, then I'll probably put rules into place to address it. But you will find content that is not "pleasant."

Q: Why was my comment deleted?

A: Sorry. If you haven't posted a lot in the recent past, but you've been getting some downvotes, the bot will err on the side of caution and not let you post. This isn't a perfect solution, since it starts to verge on removing unpopular viewpoints, but it's necessary to protect the community from malicious content from throwaway accounts.

If you don't have a lot of recent activity in your account, but you've posted some unpopular things, Santa may come after you. It may not be fair. The best thing to do is to post productively and actively outside of controversial topics, wait a few days, and try again.

Q: Why was I banned?

A: You may be a jerk. Sorry you had to find out this way.

It's not hard to accumulate more weighted upvotes than downvotes. In the current configuration, 99% of the users on Lemmy manage it. If you are one of the 1%, it's because you have enough posting history that the bot has observed a firm community consensus that your contributions are more negative than positive.

The bot is not making a decision about you. The community is. If you are banned, it's because you are being downvoted overwhelmingly. The viewpoint you are expressing is probably not the issue. The Lemmy community is very tolerant of a wide variety of views. Some people may disagree with you and you may find that oppressive, but the bot will not ban you simply because some users argue with you when you say certain things. Those users are allowed to have their view, just like you have yours.

If you find you are banned and you're willing to hear suggestions about how to present your argument without everyone downvoting you, leave a comment. Reducing your downvotes will help the bot recognize you as reasonable, but it will also probably help you get your point across more successfully. In order for the bot to ban you, you have to be received overwhelmingly negatively by the community, which probably means you're not convincing very many people of what you're saying.

If you're not willing to hear those suggestions and simply want to insist that it's everyone else that is the problem, the bot is being evil to you, your free speech is being infringed, and I am a tyrant if I don't let you into the community to annoy everybody, I would respectfully request that you take it somewhere else.

Q: How long do bans last?

A: Bans are transient and based on user sentiment going back one month from the present day. If you have not posted much in the last month, even a single downvoted comment could result in a ban. If that happened to you, it should be easy to reverse the ban in a few days by engaging and posting outside of the moderated community, showing good faith and engagement, and bringing your average back up.

If you are at all a frequent poster on Lemmy and received a ban, you might have some negative rank in your average, and your ban may be indefinite until your habitual type of postings and interactions changes, and your previous interactions age past the one month limit.

Q: How can I avoid getting banned?

A: Engage positively with the community, respect others’ opinions, and contribute constructively. Santabot’s algorithm values the sentiment of trusted community members, so positive interactions are key.

If you want to hear examples of positive and negative content from your history, let me know and I can help. Pure voting totals are not always a good guideline to what the bot is reacting to.

Q: How does it work?

A: The code is in a Codeberg repository. There's a more detailed description of the algorithm there, or you can look at the code.

Q: Won't this create an echo chamber?

A: It might. I looked at its moderation decisions a lot and it's surprisingly tolerant of unpopular opinions as long as they're accompanied by substantial posting outside of the unpopular opinion. More accurately, the Lemmy community is surprisingly tolerant of a wide range of opinions, and that consensus is reflected when the bot parses the global voting record.

If you're only posting your unpopular opinion, or you tend to get in arguments about it, then that's going to be a problem, much more than someone who expresses an unusual opinion but still in a productive fashion or alongside a lot of normal interactions.

If you feel strongly that some particular viewpoint, or some particular person's ability to stand up for it, is going to be censored, post a comment below with your concerns, and we can talk. It's a fair concern, and there might be cases where it's justified, and the bot's behavior needs to be adjusted. Without some particular case to reference, though, it's impossible to address the concern, so please be specific if you want to do this.

Q: Won't people learn to fake upvotes for themselves and trick the bot?

A: They might. The algorithm is resistant to it but not perfectly. I am worried about that, to be honest, much more than about the bot's decisions about aboveboard users being wrong all that often.

Q: Why doesn't the bot notify for bans?

There are a few users who get banned or unbanned very day, as the pattern of user comments and votes changes over time. It's important that bans be "lightweight," and always reversible for anyone who is banned. It's not a heavy thing like most Lemmy moderation. It already bothers me that the flow of bans creates spam in the modlog. I don't want to amplify that to DM spam across all of Lemmy.

I did have functionality at one point to notify for certain situations, and it triggered once, and that user complained to me that my bot was notifying them about a ban in a community they had never heard of and didn't care about at all. I think they were right to complain. I don't want to send out spam. Multiplying that interaction by 100 user actions per month isn't something I want to do.

I do want to make sure it's transparent to people why they are banned, and what they can do to get unbanned, if it comes up. If anyone has any ideas about how I can make it more clear to people who do try to post and find they are banned, that they are banned and why, I'm open to the suggestion. I've tried to do this, but I found that the people who are banned aren't interested in any reasonable conversation about any of that, so I doubt that anything I could do on my end would make it work any better. You have to be very unreasonable for the bot to blacklist you outright.

What do you think?

It may sound like I've got it all figured out, but I don't think I do. Please let me know what you think. The bot is live on !pleasantpolitics@slrpnk.net so come along and give it a try. Post controversial topics and see if the jerks arrive and overwhelm the bot. Or, just let me know in the comments. I'm curious what the community thinks.

Thank you!

2
2
submitted 2 months ago* (last edited 2 months ago) by auk to c/santabot
 
 

I promised to start sending spot-checks on Santa's judgements. Here they are. I'll post them as individual comments, because the formatting is more understandable that way.

NOT BANNED indicates someone who isn't banned by the algorithm, but almost.

BANNED indicates someone who is banned by the algorithm, but only just.

For each user, you'll see a snapshot of what comments they have left recently that garnered negative rank (red stripes, downvotes) or positive rank (blue stripes, upvotes). Time goes left to right, the top part of the bar shows their bad or good rank, and the bottom part of the bar links it up to the key down below, showing which posts led to what rankings. Then, down below that, there's a quote showing a representative example of why the algorithm thinks they're a problem.

My hope is that by showing where the boundary is, how obnoxious you have to be and how consistently your comments have to get downvotes in order for the algorithm to ban you, people will better understand how it works.

But who knows? Maybe I have it all wrong, and this robot moderator is ushering in a whole new robot dystopia right here on Slrpnk, and this is the evidence. Let me know what you think in the comments.

Edit: Explained the bar code more, and the purpose behind this whole thing.

3
-307
submitted 3 months ago* (last edited 3 months ago) by auk to c/santabot
 
 

After last week's post and talking with some people, here is my proposal for bringing transparency to the Santa bot. I can set it up to, every week, post a spot-check of some of its judgements, using this visualization format:

The bar code section at the top shows red for negative rank (downvotes), and blue for positive rank (upvotes). Time goes left to right, over the course of the past month, with a stripe for each impactful comment and post. Then, below the big red/blue bar code, there's a key with some other colors, linking particular sections to participation in particular threads.

In this case, you can see that there were some unpopular things said by this person, but outweighed by normal positive participation. This is generally what it looks like if someone posts controversial opinions but isn't trolling. You can see some red, but it's not a clear pattern. It's just that they say things sometimes that people don't like.

Compare that with this:

That's a pretty large amount of pretty bright red, and looking over the comments, what's garnering the downvotes isn't even their opinion. It is long exchanges bitterly attacking any person who disagrees with their viewpoint, and being noisy and abrasive in general. That's a lot more unpopular on Lemmy than a simple unpopular opinion. This user is banned by Santa. This is what I mean by "unpleasant," when I say "pleasant politics."

What I propose to do is to regularly post some of these breakdowns, along with example comments, for people who are just barely over the line on one side or the other. The goal is twofold:

  1. People can see how Santa works instead of it just being me telling people to trust me.
  2. It sheds some light on what people on Lemmy give a lot of hate for, hopefully helping people to be able to engage and communicate their point better. Usually, when people are getting widely downvoted, it's not at all the opinion they're expressing. It's their delivery. Either intentionally or not, they're being blaring or confrontational with their delivery of it, showing disrespect to anyone who thinks differently, getting in long angry disputes, and suchlike. For some reason, this is usually coupled with claiming that they're not being well-received because their opinion is unpopular, even though the core of the opinion is more often than not something that's widely popular, or at least tolerated, when people express it more productively.

Here is one of those snapshots, exactly as I would set up Santa to post it periodically. The goal is that by posting what things look like just on the good side of the line, and just on the bad side, it sheds light on where the line is and what it looks like.


Users who are not banned, but getting close

Example one (not banned)

Example comment, from 7 Takeaways From the Seemingly Endless Fire Season | While the Line fire burns in Southern California, what can we learn from how a changing climate has affected an expanding fire season?:

California is not a good example of wildfires caused by climate change. California is an excellent example of how not letting natural fires burn over the last few decades has created unhealthy forests full of dead tree/bushes that are now powderkegs waiting to go off.

California, especially things like redwoods, evolved to NEED a cleansing fire every so often. THAT is our natural climate and we have been fighting against THAT for years.

This isn't "climate change". This is "the climate isn't what we humans want so we tried to change it and now we're suffering the effects of that."

Example two (not banned)

Example comment, from Craaaawling in my skiiin:

Bad taste then, bad taste now. No development or change in perspective. If this is you, you should be disappointed in yourself. Grow as you age.

Edit: hahaha people did NOT like this comment. They're BAD, guys. Whiny, cringey, melodramatic. It's music for a 12 year old.

Examples of users that are banned, but only just

Example one (banned)

Example comment, from Mexico will amend its constitution this weekend to require all judges to be elected:

Between these two options:

  1. indulging in the delusion of neutral judges and letting the elite pick the ones who do the best job of pretending to be neutral while representing their interests

  2. discarding the illusion of neutral judges and picking ones who openly state (and ideally have a record) that they will seek to pursue and enact justice as both they and the better part of the population interpret it

I think one of these is clearly superior for "promoting justice". Do you disagree?

Yes, I disagree. I already stated why.

But you yourself admitted that there may be no such thing as "neutral," "apolitical" justices. If there aren't, what good does pretending do?

Example two (banned)

Example comment, from People who hate fat people disguise their hate as science.:

why not hate them for both scientific reasons and viscerally?


Thoughts?

4
-304
submitted 3 months ago* (last edited 3 months ago) by auk to c/santabot
 
 

The steady stream of people who are telling me that the Santa moderation bot is going to delete anyone who's downvoted or disagrees with the group, is continuing unabated.

Here's an olive branch: You've got a point. It's just a black box and I juggle the parameters to some secret process to ban the people who got some downvotes, I can understand how that comes across as toxic. I might or might not be lying about taking careful time to look over its judgements and make sure that I think the impact is more positive than negative, but at the end of the day, it doesn't matter. You still have to trust my intentions and trust the bot to make good decisions, and trusting that to an automated system rarely works out well.

To me, delegating the moderation of the community to the segment of that community that's trusted and consistently upvoted by the rest of us is better than giving it to a handful of people who wield unilateral power according to random rules. I like the bot's judgements most of the time when I look at them. The question is simply whether this algorithm is actually doing that delegation effectively, or if it's just banhammering anyone who gets a couple of downvotes. I'm confident that it's doing the first thing almost all of the time.

In talks behind the scenes with other moderators, I've been going into a lot of detail about specific users and going back and forth about judgements. I also do a ton of checking behind the scenes. I don't want to do that publicly. I think it would be deeply informative to post a list of the "top ten" and "bottom ten" users, and go into detail about why the low-ranked users got where they are, but that's probably not a good idea.

What I would like to do is share that information on some level, so that people can see what's going on, instead of it being me relaying that everything's good. It's tough because I can't break down every level of detail without invading all kinds of people's privacy. That said, I do think that there's a way to be found to open up the process so people can see and give input to what's going on.

One happy medium I could do would be to have the bot post its spot-check automatically about once a week. It could pick out one random user who's barely on the borderline, and post a couple of the worst comments they made. Usually, when I'm messing around with its parameters, that's what I am trying to do. There are some comments that are clearly toxicity that have no business anywhere. There are some comments that are clearly free speech, and even if they're getting downvotes, they deserve to be heard. Then there are some comments that are on the borderline between. My goal is to set up the parameters so that the borderline rank value for a ban matches up with the users who are on that borderline.

I can see some upsides and downsides to posting that publicly. What do people think, though? What would you want to see, in order to make an informed decision about what you think of this whole approach?

5
 
 
  1. It would be extraordinarily easy to bot it and just silence anyone you want.
  2. I agree, moderation is absolutely necessary to maintaine civil discussion, but silencing people, because they have unpopular opinions, is a really bad idea.
  3. I love lemmy because it is the ultimate embodiment of decentralised free speech. This destroys that.
  4. If I were a bad actor, hypothetically, let's just say lammy.ml or haxbear and I decided I wanted to silence anyone who disagrees with what I have to say. Then I could just make a fork of this project to only value my instances votes and censor anyone who doesn't agree with what my community thinks.
  5. This tool simply acts as a force multiplier for those who want to use censorship as a tool for mass silencing of descent.

Yes, I've read the Q&A, But I can simply think of more ways to abuse this bot for bad than it can be used for good.

6
 
 

Everything's been working smoothly, with nothing to report about the moderation bot. The community has been quiet but productive, which was precisely the goal, and the bot working smoothly with no issues. However, something almost went wrong in a particular entertaining fashion which I thought I would share.

The algorithm for classifying troll users doesn't have any polarity. It only knows which users are opposed to which other users. 50% of the time, it'll get its whole ranking system backwards, so the troll users are the normal ones, and everyone else gets negative rank, because the math works just as well under that ranking regime. Generally this isn't a problem, because there's a step:

        # Flip the sign if we arrived at a majority-negative ranking, which can happen
        if -min_val > max_val:
            rank[1:] *= -1

The most popular user is always more popular than the least popular troll is unpopular, by quite a big margin, so that works fine.

However. Things have changed. MediaBiasFactChecker@lemmy.world is so unpopular that it's almost (1% margin) more unpopular than the highest-rank user is popular. If that had happened, the whole polarity would have flipped, every user would have been banned, all the trolls would have been unbanned. Mass hysteria. I only happened to notice it before it happened and stop the bot. It's on track to be the least popular user on Lemmy, with about 5 times lower rank than some of the most notorious trolls.

Have fun with this information. I started checking the median rank of all users, instead. Thanks MediaBiasFactChecker.

7
 
 

Some people have been accusing me of creating this bot so I can manifest a one-viewpoint echo chamber. They tell me that they already know that I'm trying to create an echo chamber, anything I say otherwise is a lie, and they're not interested in talking about the real-world behavior of the bot, even when I offer to fix anything that seems like a real echo chamber effect that it's creating.

I don't think it's creating an echo chamber. We've had a Zionist, an opponent of US imperialism, a lot of centrists, some never-Bideners, some fact checking, and one "fuck you." The code to delete downvoted comments from throwaway accounts is pretty much working, but it's only been triggered once. Someone said Mike Johnson's ears were ugly and that made him a bad person, which everyone hated and downvoted, so the bot deleted it since the person that said it didn't have other recent history to be able to use to categorize them. I sent the user a note explaining how the throwaway detection works.


I want to list out the contentious topics from the week, and how I judge the bot's performance and the result for each one, to see if the community agrees with me about how things are looking:

Biden's supreme court changes

I like the performance here. The pleasant comments have a diversity of opinion, but people aren't fighting or shouting their opinions back and forth at each other. The lemmy.world section looks argumentative and low-quality.

Blue MAGA

I don't love the one-sidedness of the pleasant comments section. It's certainly more productive with less argumentation, which is good, but there are only two representatives of one of the major viewpoints chiming in, which starts to sound like an attempt at an echo chamber.

I read the lemmy.world version for a while, and I started to think the result here is acceptable. The pleasant version still has people who have every ability to speak up for the minority viewpoint, but it was limited to people who were being coherent about it, and giving reasons. A lot of the people who spoke up in the lemmy.world version, on both sides, were combative and got engaged in long hostile exchanges, without listening or backing up what they were saying. That's what I don't want.

Biden's Palestine policy

I don't love "fuck you." I debated whether it was protected political speech expressing a viewpoint on the article, or a personal attack, and I couldn't decide, so I left it up. For one thing, I think it's good to err on the side of letting people say what they want to the admins, to bend over backwards just slightly to avoid a situation where some users or their viewpoints are more special, or shielded from firm disagreement, than others. And yes, I recognize the irony.

This one is my least favorite comments section. The user who's engaging in a hostile exchange of short messages has a lot of "rank" to be able to say what they want, and the current model assumes that since people generally like their comments, they should be allowed to speak their mind. The result, however, is starting to look combative to me. It's still far better than the exchanges from lemmy.world, but I don't love it.

What does everyone else think? I don't know if anyone but me cares about these issues in this depth, but I'm interested in hearing any feedback.

8
 
 

It took longer than I thought, but I came up with a promising approach for throwaway accounts. The bot can't use the same parameter set to accounts with only a few interactions as it does for normal accounts, without getting it either too loose for the new accounts or too strict for the old accounts. I had to make a special stricter setting for any account that only has a few interactions in its recent history.

1.3% of users have enough interaction data to judge for sure that people have problems with them, and they get banned just like before. 2% more users on top of that will trigger the stricter filter if they try to post, and get a polite message that they need to interact more before they can participate. 97% of users don't need to worry about any of this, just like before.

I think that approach will work. It's not done yet but I have the parameters in place for it. I think the bot is doing a good job. I was expecting it to get it wrong a few times, and I have found a couple of users it made mistakes on, but it's doing better than I thought it would.

9
 
 

Hi everyone.

I am mostly pleased with how Santabot performed in its first live test today, but I also found issues which I am fixing. I made it a lot more strict and fixed a bug. Details to follow:

Mostly, I was surprised and pleased that the bot was coming up with right judgements for people I was talking with, but this thread has one comment that seems reasonable, and one comment that seemed like exactly the kind of inflammatory content I wanted the bot to eliminate. I looked into it for a while, and eventually had to revisit my attitude toward users that have a lot of "positive rank" counterbalancing also a lot of "negative rank." I think that the ratio of upvoted content to negative content that it's reasonable to ask someone to produce should be higher than 1:1. So I made it 2:1. That eliminated the user posting the offending comment while leaving the user posting the legitimate comment. I didn't have time to go into a full detailed analysis, but I did some spot checks on how that affected its other judgements, and generally I liked what I saw:

  • The percentage of total users banned has gone way up, from 0.4% to 2.8%.
  • @Kaboom@reddthat.com, @LibertyLizard@slrpnk.net, and other specific users that I looked into and thought should remain unbanned, are still unbanned by the bot's analysis.
  • Some users who weren't banned by this morning's configuration, but which seemed like they should be, are now banned again.

In addition to the algorithm issues, I found a bad bug. The bot is supposed to remove recent postings when banning someone, to stop throwaway accounts from coming in and posting and then being banned but with the content staying up. However, the bot was simply removing all recent content from all users whenever it banned anybody, and my test suite was too simple-minded to catch the problem until it was live and removed a random innocent user's comment and sent them a message that they were banned. Oops. I partly fixed it once I saw it. Maybe I shouldn't have used Futurama Santa as the avatar.

That's the update. More to come. If you have questions, comments, or concerns, please by all means say something and I'll do my best at a response.