Many of us have dashed off a mean-spirited reply in the heat of the moment. Now, Twitter wants to appeal to the good inside even the most callous trolls in an attempt to improve the tone of its social network.

From Thursday, the company will roll out a new prompt to users who are about to send a tweet that its algorithms believe could be “harmful or offensive”. Those who try to send such a message will be asked if they “want to review this before tweeting”, with the options to edit, delete, or send anyway.

The feature, coming first to iPhones and later to Android devices, has been in testing for the past year, and the social network says it has meaningfully reduced the volume of abuse.

“These tests ultimately resulted in people sending less potentially offensive replies across the service, and improved behavior on Twitter,” wrote Anita Butler and Alberto Parrella, a director of product design and a product manager at the company respectively. “We learned that: if prompted, 34% of people revised their initial reply or decided to not send their reply at all. After being prompted once, people composed, on average, 11% fewer offensive replies in the future; if prompted, people were less likely to receive offensive and harmful replies back.”

Initial tests provoked some criticism, Butler and Parrella admitted, since the algorithms which attempted to discern abusive language “struggled to capture the nuance in many conversations and often didn’t differentiate between potentially offensive language, sarcasm, and friendly banter”. Users who were part of the test reported tweets being flagged for simply using swear words, even in friendly messages to mutual followers.

“If two accounts follow and reply to each other often, there’s a higher likelihood that they have a better understanding of preferred tone of communication,” the pair said, explaining how they have avoided such errors.
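The heuristic the pair describe could be sketched roughly as follows. This is not Twitter's actual code, and all names, thresholds, and the upstream toxicity score are assumptions for illustration only: the idea is simply that frequent two-way interaction raises the bar before a prompt is shown.

```python
# Hypothetical sketch of a mutual-interaction signal suppressing
# false-positive prompts. The toxicity score is assumed to come from
# an upstream classifier; all names and thresholds are invented.

def should_prompt(toxicity_score: float,
                  mutual_follow: bool,
                  replies_between: int,
                  base_threshold: float = 0.7) -> bool:
    """Decide whether to show a 'review this before tweeting' prompt.

    Accounts that follow and often reply to each other are assumed to
    share an understanding of tone, so the threshold is raised for them.
    """
    threshold = base_threshold
    if mutual_follow and replies_between >= 5:
        # Frequent two-way interaction: require a much stronger
        # signal before interrupting the user.
        threshold += 0.2
    return toxicity_score >= threshold
```

Under these invented numbers, a swear word in friendly banter that scores moderately would trigger the prompt between strangers but not between mutuals who reply to each other often.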

Unlike many experiments in AI moderation, the social network can afford to err on the side of caution, since the penalties for guessing wrong are a simple pop-up, rather than censorship, account bans, or worse.

Twitter has been leading the way in attempts to “nudge” users into better behaviour on social networks by adding “friction” to undesirable activities. The company also warns users who are about to retweet an article they have not read that the headline “may not tell the full story”, and recommends they click through to read the piece – but still allows them to continue regardless.

In October and November last year, in an effort to “encourage people to add their own commentary prior to amplifying content” in the run-up to the US elections, the company temporarily altered the retweet button so that it would default to a “quote tweet”. Again, users could ignore the prompt if they desired, but Twitter said at the time that “we hope it will encourage everyone to not only consider why they are amplifying a tweet, but also increase the likelihood that people add their own thoughts, reactions and perspectives to the conversation”.

Others have proposed adding even greater friction. Novelist and technologist Robin Sloan, for instance, has suggested putting delays on retweets, and capping the maximum number of people who can see any individual message. “Social media platforms should run small, and slow, and cool to the touch,” he wrote in 2019.