How to get automated content moderation right

Every minute, 240,000 photos are shared by Facebook users, 65,000 images are posted on Instagram and 575,000 tweets are posted on Twitter. Never mind all the content on Reddit, photos shared on Tinder and fancy vacation digs posted on AirBnB.

That’s a lot of user-generated content—all of it potentially harmful to your platform users and, by extension, to your platform’s reputation.

Trust and safety matters, so all of it needs to be moderated. And given the almost unfathomable volume of user-generated content, moderation efforts usually require some degree of automation.

We’ve put together this handy introduction to automated content moderation, including all the tools, models, methods and metrics you should be considering.

For a comprehensive overview of general content moderation, head to The ultimate guide to content moderation.

Table of contents:

  1. What is automated content moderation?
  2. Automated content moderation tools
  3. The strengths and weaknesses of automated systems
  4. Balancing automated and human content moderation
  5. Content moderation methods and how automation fits in
  6. Metrics for measuring your automated systems

1. What is automated content moderation?

Automated content moderation is performed by automated tools, some of which leverage AI/ML models and some of which use rule-based logic, such as filters that tag objectionable words, images and video. These tools usually augment teams of human content moderators, who would otherwise be unable to review the full volume of user-generated content on their platform.

The efficiency of automated content moderation tools enables online platforms to quickly and cost-effectively screen vast amounts of content, helping preserve user trust without slowing down content uploads.

Automated tools can screen content either pre- or post-upload, depending on your organization’s preference. They are effective at catching unambiguously unacceptable content, but the limitations of AI mean they won’t be able to make a sound decision about all content.

Instead, many systems flag potentially unacceptable content and relay it to human content moderators for review.

Before moving on to the opportunities and risks that come with automated content moderation, let’s take a closer look at the types of tools you can deploy.

2. Automated content moderation tools

User-generated content isn’t just high-volume—it’s also highly varied, coming in many types and media formats. Here’s a list of the most commonly deployed tools to manage the array of content types:

For moderating unambiguously unacceptable content:

  • Automated filters such as word filters (which detect banned words and can flag up or block messages containing them) and IP ban lists (which stop repeat offenders)
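To make the idea concrete, here is a minimal Python sketch of a word filter combined with an IP ban list. The `BANNED_WORDS` and `BANNED_IPS` sets, the `check_message` function and the choice to flag (rather than block) on a word match are hypothetical placeholders, not the API of any particular moderation product.

```python
import re
from dataclasses import dataclass

# Hypothetical policy lists; in practice these are maintained by your trust and safety team.
BANNED_WORDS = {"badword", "scamlink"}
BANNED_IPS = {"203.0.113.7"}  # documentation-range address used as a placeholder


@dataclass
class FilterResult:
    action: str  # "block", "flag" or "allow"
    reason: str


def check_message(text: str, sender_ip: str) -> FilterResult:
    # The IP ban list stops repeat offenders before their text is even inspected.
    if sender_ip in BANNED_IPS:
        return FilterResult("block", f"sender IP {sender_ip} is banned")
    # Lowercase, then extract runs of letters, digits and apostrophes so that
    # case and punctuation don't hide a banned word.
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    hits = sorted(words & BANNED_WORDS)
    if hits:
        # This sketch flags for review; a stricter policy could block instead.
        return FilterResult("flag", "banned words: " + ", ".join(hits))
    return FilterResult("allow", "no filter matched")


print(check_message("Buy now: BadWord!", "198.51.100.20").action)  # flag
```

Exact-match filters like this are cheap and predictable, but also easy to evade with deliberate misspellings, which is why they are best reserved for unambiguous cases.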

For moderating user-generated text content, such as comments, reviews and marketplace listings:

  • Natural language processing (NLP) algorithms can be used to parse text and predict its meaning, including its emotional tone (through sentiment analysis)

For moderating imagery in user-uploaded pictures and videos:

  • Computer vision/image recognition can recognise potentially unacceptable visual content in images and video; for example, object recognition can be used to identify particular objects such as weapons—and even to detect camera angles that may be associated with unsuitable content
  • Some tools use digital hashing, which converts known images and videos from an existing database into hashes (digital signatures) that can then be used to identify other iterations of the same content, such as re-uploads and lightly edited copies
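As an illustration of how hashing can match altered copies, the sketch below implements average hashing, one of the simplest perceptual hashing techniques, in Python. It assumes the image has already been decoded, converted to grayscale and downscaled to a small grid (8x8 is common; 4x4 is used here for brevity), and the `max_distance` threshold of 5 bits is an arbitrary assumption. Production systems such as Microsoft’s PhotoDNA or Meta’s open-source PDQ use more robust algorithms.

```python
from typing import Iterable, List


def average_hash(pixels: List[List[int]]) -> int:
    """Average hash of a grayscale image already downscaled to a small grid.

    Each bit is 1 if the corresponding pixel is brighter than the image's mean.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def matches_known_content(candidate: int, known_hashes: Iterable[int],
                          max_distance: int = 5) -> bool:
    """True if the candidate hash is within max_distance bits of any known hash."""
    return any(hamming_distance(candidate, h) <= max_distance for h in known_hashes)


# Toy 4x4 "image" from the known-content database and a uniformly brightened copy.
original = [
    [10, 200, 10, 200],
    [200, 10, 200, 10],
    [10, 200, 10, 200],
    [200, 10, 200, 10],
]
brightened = [[p + 20 for p in row] for row in original]
known = {average_hash(original)}
print(matches_known_content(average_hash(brightened), known))  # True
```

Because each bit only records whether a pixel is brighter than the image’s average, a uniform brightness shift leaves the hash unchanged, so a lightly edited re-upload still lands within a few bits of the original.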

For moderating audio in audio and video files:

  • Audio algorithms can convert speech into human-readable text and detect inappropriate content within that text

For moderating content in general:

  • Metadata filtering searches the metadata of files to identify content fitting certain parameters; it is commonly used to identify copyright infringement, but because metadata is easily manipulated, this safeguard is easy to bypass
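A minimal Python sketch of metadata filtering, assuming uploads arrive with their metadata already parsed into a dictionary. The field names and the `PROTECTED_RIGHTS_HOLDERS` watch list are hypothetical, and the example also shows why the safeguard is easy to bypass.

```python
# Hypothetical watch list of rights holders whose content should be reviewed.
PROTECTED_RIGHTS_HOLDERS = {"example studios", "example records"}

# Hypothetical metadata fields to inspect (loosely modelled on common EXIF/ID3-style tags).
FIELDS_TO_CHECK = ("copyright", "artist", "publisher")


def metadata_flags(metadata: dict) -> list:
    """Return the reasons, if any, why a file's metadata matches the watch list."""
    reasons = []
    for name in FIELDS_TO_CHECK:
        value = str(metadata.get(name, "")).strip().lower()
        if value in PROTECTED_RIGHTS_HOLDERS:
            reasons.append(f"{name} matches watched rights holder '{value}'")
    return reasons


# The same file, before and after the uploader strips its metadata.
original = {"title": "Trailer", "copyright": "Example Studios"}
stripped = {"title": "Trailer"}

print(metadata_flags(original))  # one match on the copyright field
print(metadata_flags(stripped))  # [] : the edited copy passes unnoticed
```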

3. The strengths and weaknesses of automated systems

For online platforms that rely on user-generated content, content moderation would be practically impossible without some degree of automation. High-volume platforms would either grind to a halt or become filled with illegal and unacceptable content.

At the same time, automated systems can’t make sound decisions about all content without help from human moderators.

That’s because AI, like humans, has strengths and weaknesses. Here are some of the biggest:

Automated content moderation strengths:

  • They’re fast: Automated systems can screen thousands of pieces of user-generated content in a matter of seconds, and are particularly adept at screening routine cases that look identical or near identical.
  • They’re highly scalable: Automated systems can easily and cost-effectively scale up or down as demand dictates, whereas scaling a team of human content moderators is slow and costly.
  • They’re not emotionally vulnerable: Automated systems catch a lot of upsetting content and prevent human moderators from being exposed to it. (Human content moderators need to be carefully looked after; see our article on Protecting the wellbeing of content moderators.)

Weaknesses of automated systems:

  • They’re fallible: Algorithms make mistakes—and, being so fast, are capable of making the same mistake thousands of times before it’s corrected.
  • They can be worked around: Determined users can learn what they can and can’t get past AI and exploit those loopholes to smuggle problematic content onto your platform. Human moderators are needed to identify workarounds like this and ensure the algorithm is trained to protect against them.
  • They’re often trained with poor data: ML models require huge datasets to learn, but those datasets are too often built from raw data harvested from the internet, crowdsourced, or created by armies of underpaid and insufficiently trained workers who may not represent a wide range of human identities. Bad data leads to questionable and often biased models.
  • They lack cultural and contextual understanding: Automation remains a relatively blunt instrument when it comes to interpreting human behavior, and automated moderation tools still struggle to understand the intent behind user-generated content. Local cultural context also presents a big problem for algorithms—they can be flummoxed by words that are offensive in one culture or context and inoffensive in another, for example.
  • They’re not transparent: It’s difficult to get insights into how complex algorithms work (particularly if they’re powering a third-party tool), and therefore difficult to find the root cause when things go wrong.

Many countries are now implementing severe penalties for failure to adequately moderate harmful and illegal content. Companies that fail to comply with the EU’s Digital Services Act, for example, face fines of up to 6% of their global annual turnover.

Given these circumstances, relying entirely on automated tools for content moderation is risky. Depending on your platform’s needs, you’re probably going to need human moderators to work in concert with your automated tools.

4. Balancing automated and human content moderation

There are three options to consider here:

Option 1: Manual content moderation

Here, human content moderators carry out the vast majority of content moderation. Automation is minimal, although automated filters may be applied to block simple issues like offensive language and spam.

Large platforms that choose this option tend to rely on outsourced contract workers to complete this work, while smaller platforms either employ full-time, in-house moderators or rely on their platform users to review and moderate content.

Option 2: Automated content moderation

That is, fully automated content moderation: automated tools flag and remove inappropriate user-generated content, and human intervention is almost nonexistent.

Due to the limitations of AI, platforms rarely rely on fully automated moderation—although, provided there are clear parameters for flagging and removing the content uploaded to your platform, ML models that have been trained on enough data can achieve reasonably high levels of accuracy.

Option 3: Hybrid content moderation

This hybrid approach—the most popular one—incorporates elements of manual and automated content moderation.

Automated tools flag and prioritize content cases for human content moderators to review. The decisions made by the human moderators are then fed into the algorithms to refine them.

Both fully human and fully automated content moderation are flawed, and weaknesses on both sides are largely mitigated by combining the best of both worlds.

Hybrid moderation is also the most flexible of the three approaches. There are numerous ways in which automation can help human content moderators do their jobs, and vice versa, and this partnership of human and machine can be tailored to fit a range of content moderation methods.
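The hybrid loop described above can be sketched in Python as confidence-based routing. The sketch assumes a model that outputs a probability that an item violates policy; the thresholds, the `HybridModerator` class and its methods are hypothetical, and real thresholds would be tuned per platform and per policy area.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical thresholds on the model's violation probability.
AUTO_REMOVE_AT = 0.95      # at or above: remove without human review
AUTO_APPROVE_BELOW = 0.20  # below: publish without human review


@dataclass(order=True)
class ReviewItem:
    priority: float                        # negated score: riskiest item pops first
    content_id: str = field(compare=False)
    score: float = field(compare=False)


class HybridModerator:
    def __init__(self) -> None:
        self.review_queue: list = []       # min-heap of ReviewItem
        self.training_labels: list = []    # (content_id, violates) pairs for retraining

    def route(self, content_id: str, score: float) -> str:
        """Route one item using the model's probability that it violates policy."""
        if score >= AUTO_REMOVE_AT:
            return "removed"
        if score < AUTO_APPROVE_BELOW:
            return "published"
        heapq.heappush(self.review_queue, ReviewItem(-score, content_id, score))
        return "queued"

    def next_for_review(self) -> Optional[ReviewItem]:
        """Hand the highest-risk queued item to a human moderator."""
        return heapq.heappop(self.review_queue) if self.review_queue else None

    def record_human_decision(self, content_id: str, violates: bool) -> None:
        # Each human decision becomes a labelled example for the next retraining run.
        self.training_labels.append((content_id, violates))
```

Items the model is confident about never reach a person, the riskiest uncertain items are reviewed first, and every human decision is stored so it can be fed back into the model.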


We’ve mentioned outsourcing a few times here, and if you’d like to learn more about the pros and cons of that approach, we direct you to our article Outsourcing content moderation: how to get it right.

Alternatively, if you think you’d prefer to build your own team of content moderators, you can learn more about that approach in our article How to build your own content moderation team.

5. Content moderation methods and how automation fits in

There are four primary methods of content moderation, two centralized and two decentralized.

Depending on the nature of your platform, the size of your audience and the type of content your users upload, you might employ more than one method and even a mix of all four.

Centralized content moderation methods
Under a centralized model, your platform will employ large teams of moderators (usually augmented by AI), and will either train, manage and direct these teams internally or outsource them to a business process outsourcing (BPO) provider.

This model makes it easier to apply your platform’s content policies consistently. The two primary centralized content moderation methods are:

  • Pre-moderation: All user-submitted content is placed in a moderator’s queue before it can be approved and made visible to the platform’s users
    • Automated systems can pre-moderate the majority of content so quickly that it can be effectively posted in real-time, while a minority of content will be relayed to human moderators to review
  • Post-moderation (proactive): User-submitted content is immediately displayed publicly and placed in a queue for moderators to approve or remove
    • Automated systems can be used here to proactively search for and remove inappropriate content and fake accounts—relieving your human content moderators of the burden of looking for needles in the haystack
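The difference between the two centralized methods comes down to when content becomes visible. This Python sketch assumes an upstream automated check that returns `"approve"`, `"reject"` or `"unsure"`; `Post`, `submit` and `proactive_sweep` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Post:
    post_id: str
    text: str
    visible: bool = False


def submit(post: Post, mode: str, auto_decision: str, human_queue: List[Post]) -> None:
    """Apply pre- or post-moderation to a new post.

    auto_decision is the hypothetical automated check's verdict:
    "approve", "reject" or "unsure".
    """
    if mode == "pre":
        # Nothing is shown until a decision exists. Confident automated approvals
        # are fast enough that the post appears effectively in real time.
        post.visible = auto_decision == "approve"
        if auto_decision == "unsure":
            human_queue.append(post)  # the minority that needs a human
    elif mode == "post":
        # Published immediately and queued for moderators to approve or remove.
        # The automated verdict is not used at submission time in this mode.
        post.visible = True
        human_queue.append(post)
    else:
        raise ValueError(f"unknown moderation mode: {mode!r}")


def proactive_sweep(posts: List[Post], is_violation: Callable[[Post], bool]) -> List[str]:
    """Post-moderation helper: automatically take down already published posts."""
    removed = []
    for post in posts:
        if post.visible and is_violation(post):
            post.visible = False
            removed.append(post.post_id)
    return removed
```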

Decentralized content moderation methods
Under a decentralized model, the responsibility of enforcing the platform’s content policies falls to your platform’s users—usually overseen by a small team of full-time moderation staff.

This model enables you to bring a wider diversity of viewpoints to bear on moderation—while saving the cost of employing a large team of human moderators.

The two primary decentralized content moderation methods are:

  • Reactive moderation: Community members are encouraged to flag content that breaches platform rules. This can be the sole method of moderation your platform uses or—more likely—a safety net to back up centralized moderation models.
    • Automated tools can process and triage user-flagged content and relay relevant content to your human content moderators.
  • Distributed moderation: In this democratic method, responsibility for moderating every piece of user-generated content is distributed among a group of people. For example, a platform may have a rating system that enables (and/or obliges) platform users to vote to determine if a piece of user-generated content adheres to the platform’s guidelines.
    • Automation here can flag sufficiently downrated content for the attention of human content moderators.
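Both decentralized methods rely on automation to turn community signals into a manageable queue. Here is a Python sketch, assuming flags arrive as `(content_id, reason)` pairs and votes as simple up/down counts; the reason weights, thresholds and function names are all hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical severity weights per flag reason; unknown reasons count as 1.0.
REASON_WEIGHTS = {"spam": 1.0, "harassment": 3.0, "violence": 5.0}


def triage_user_flags(flags: List[Tuple[str, str]],
                      min_score: float = 3.0) -> List[Tuple[str, float]]:
    """Reactive moderation: aggregate user flags into a ranked list for human review."""
    scores = Counter()
    for content_id, reason in flags:
        scores[content_id] += REASON_WEIGHTS.get(reason, 1.0)
    # Highest weighted score first; items below min_score are not escalated yet.
    return [(cid, score) for cid, score in scores.most_common() if score >= min_score]


def needs_moderator_attention(upvotes: int, downvotes: int,
                              min_votes: int = 10, max_approval: float = 0.3) -> bool:
    """Distributed moderation: escalate content the community has rated down enough."""
    total = upvotes + downvotes
    if total < min_votes:
        return False  # too few votes to trust the community signal yet
    return upvotes / total <= max_approval
```

Weighting flags by reason lets a single report of violence outrank several spam reports, and the minimum vote count stops a handful of early downvotes from escalating content on their own.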

You can find success with any of these methods so long as the approach is a good fit for your platform and organization.

But whatever method you choose, you should continuously monitor the effectiveness of your automated moderation tools.

6. Metrics for measuring your automated systems

The performance of your content moderation algorithms should improve over time as they’re fed more data and their flagging decisions are checked against the actions taken by your human moderators.

Some metrics organizations track include:

  • Average time to process items or items processed per hour
  • Percentage of items processed by AI vs. items processed by human moderators
  • Percentage of items flagged
  • Percentage of flagged items rejected/published
  • Accuracy level of predictions (number of correct predictions divided by total number of predictions)
  • Percentage of moderation decisions appealed against by platform users
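Most of these metrics fall out of a simple log of moderation records. The Python sketch below assumes a hypothetical `ModerationRecord` schema and treats the human moderator’s decision as ground truth when computing accuracy; your own logging will almost certainly differ.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModerationRecord:
    seconds_to_process: float
    decided_by: str                        # "ai" or "human"
    flagged: bool                          # did the automated system flag the item?
    final_action: str                      # "published" or "rejected"
    predicted_violation: Optional[bool]    # model prediction; None if never scored
    actual_violation: bool                 # ground truth, e.g. the human decision
    appealed: bool                         # did the user appeal the decision?


def summarize(records: List[ModerationRecord]) -> dict:
    if not records:
        raise ValueError("need at least one record")
    n = len(records)
    flagged = [r for r in records if r.flagged]
    scored = [r for r in records if r.predicted_violation is not None]
    correct = sum(r.predicted_violation == r.actual_violation for r in scored)
    return {
        "avg_seconds_per_item": sum(r.seconds_to_process for r in records) / n,
        "pct_decided_by_ai": 100 * sum(r.decided_by == "ai" for r in records) / n,
        "pct_flagged": 100 * len(flagged) / n,
        # Share of flagged items that ended up rejected (the rest were published).
        "pct_flagged_rejected": (
            100 * sum(r.final_action == "rejected" for r in flagged) / len(flagged)
            if flagged else 0.0
        ),
        # Accuracy = correct predictions / total predictions.
        "accuracy": correct / len(scored) if scored else None,
        "pct_appealed": 100 * sum(r.appealed for r in records) / n,
    }
```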

Tracking the performance of your human content moderators will also help you understand how well your automated systems are doing, and whether you have the right balance of automated and human content moderation.

Your content moderation needs

So there it is: the basics of automated content moderation. But keep in mind that the field is evolving rapidly as the volume of content grows, Web 3.0 starts to take shape, and new tools and needs emerge.

Many organizations choose to outsource this complex task to BPOs like Webhelp—in fact, we provide moderation services to over 200 clients worldwide.

Our solutions are built around pragmatic engineering: we use external and internal tools that help us solve a range of problems, in partnership with a team of over 4,000 scientifically selected and managed content moderators who make decisions about 1 billion pieces of content a year, in more than 25 languages.

So if you’re looking to scale your platform’s content moderation quickly and with high quality, and need a helping hand, get in touch.
