Image Moderation

Global disturbing-image pre-check, admin controls, and operational behavior.

Steve now supports a global image moderation pre-step that runs before the normal workflow pipeline.

Its purpose is to stop submissions that contain:

  • explicit nudity or sexual imagery
  • visible genitals or exposed breasts presented as nude content
  • graphic violence, blood, open wounds, or severe injury
  • self-harm, corpse imagery, assault aftermath, or other disturbing scenes

Where it runs

The moderation pre-step is executed near the top of convex/engine/process.ts, after the submission enters processing but before:

  • enhancement
  • AI extraction
  • fraud checks
  • Open Loyalty sync

If a submission is flagged as unsafe, the pipeline marks it failed and stops there.
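A minimal sketch of that early-return gate, using illustrative names (`applyModerationVerdict`, `Submission` are assumptions; the real logic lives in convex/engine/process.ts):

```typescript
type ModerationVerdict = { safe: boolean; reason?: string };

interface Submission {
  id: string;
  status: "processing" | "failed" | "complete";
  failureReason?: string;
}

// Applies the moderation verdict before enhancement, AI extraction,
// fraud checks, and Open Loyalty sync run.
function applyModerationVerdict(
  submission: Submission,
  verdict: ModerationVerdict
): Submission {
  if (!verdict.safe) {
    // Unsafe: mark the submission failed and stop; downstream stages are skipped.
    return {
      ...submission,
      status: "failed",
      failureReason: verdict.reason ?? "blocked by image moderation",
    };
  }
  // Safe: the normal workflow pipeline continues unchanged.
  return submission;
}
```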

Configuration model

Image moderation is a global config, not a workflow-version stage.

It is stored in the imageModerationConfig table with two primary controls:

  • enabled: on/off switch for the pre-step
  • prompt: the editable moderation prompt template

This keeps the safety gate consistent across all workflows instead of making every workflow maintain its own moderation policy.
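As a rough sketch, a config row could look like the following; only `enabled` and `prompt` come from the text above, and the default values shown are assumptions:

```typescript
// Illustrative shape of a row in the imageModerationConfig table.
interface ImageModerationConfig {
  enabled: boolean; // on/off switch for the moderation pre-step
  prompt: string;   // editable moderation prompt template
}

// Hypothetical platform default; the {{image_legend}} placeholder is
// filled in with the uploaded file labels at call time.
const defaultModerationConfig: ImageModerationConfig = {
  enabled: true,
  prompt:
    "Review the submitted images ({{image_legend}}) and flag explicit nudity, " +
    "graphic violence, or other disturbing content.",
};
```

Because this table is global, every workflow reads the same gate instead of carrying its own moderation policy.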

Admin controls

Super admins can manage the feature from:

Configuration -> Image Moderation

The settings page allows them to:

  • enable or disable the moderation pre-step
  • edit the moderation prompt
  • reset the prompt back to the platform default

The prompt supports the {{image_legend}} placeholder, which is replaced with the uploaded file labels before the model call is made.
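That substitution can be sketched as a small helper (the function name and label format here are illustrative, not the actual implementation):

```typescript
// Replaces every occurrence of {{image_legend}} in the prompt template
// with a comma-separated list of the uploaded file labels.
function renderModerationPrompt(template: string, fileLabels: string[]): string {
  return template.split("{{image_legend}}").join(fileLabels.join(", "));
}
```

For example, `renderModerationPrompt("Check {{image_legend}}.", ["image 1: receipt.jpg"])` yields `"Check image 1: receipt.jpg."`.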

Model routing

Image moderation uses a dedicated AI pipeline ID:

image_moderation

Default behavior:

  • provider: openrouter
  • default model: google/gemini-3.1-pro-preview

This is separate from the workflow's normal OCR or analysis model selection.
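A routing entry for this pipeline ID might look like the following; the registry shape is an assumption, while the provider and model values are the defaults stated above:

```typescript
// Illustrative per-pipeline routing defaults; image moderation resolves
// its own model independently of the workflow's OCR/analysis selection.
const pipelineDefaults = {
  image_moderation: {
    provider: "openrouter",
    model: "google/gemini-3.1-pro-preview",
  },
} as const;
```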

Audit and visibility

When moderation runs, Steve records:

  • token usage under sourceType: image_moderation
  • review timeline events such as content_moderation_complete or content_moderation_blocked
  • a failure reason when the submission is rejected by the pre-step

That means moderation usage appears in the admin usage dashboard as its own traffic category.
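A sketch of what those records could look like, assuming hypothetical helper and field names (only `sourceType: image_moderation` and the two event names come from the text):

```typescript
interface TokenUsageRecord {
  sourceType: string;   // usage category shown in the admin dashboard
  inputTokens: number;
  outputTokens: number;
}

// Attributes a moderation model call to its own traffic category.
function recordModerationUsage(inputTokens: number, outputTokens: number): TokenUsageRecord {
  return { sourceType: "image_moderation", inputTokens, outputTokens };
}

type ModerationTimelineEvent =
  | "content_moderation_complete"
  | "content_moderation_blocked";

// Picks the review timeline event for a moderation outcome.
function timelineEventFor(safe: boolean): ModerationTimelineEvent {
  return safe ? "content_moderation_complete" : "content_moderation_blocked";
}
```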

Operational effect

When enabled:

  1. A submission enters processing.
  2. The moderation model evaluates the uploaded images.
  3. If safe, the normal pipeline continues.
  4. If unsafe, the submission is marked failed and downstream stages are skipped.

When disabled:

  1. The submission skips the moderation pre-step.
  2. The normal workflow pipeline starts immediately.
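The two flows above can be sketched as a single branch on the `enabled` flag (the stage names and function are illustrative, mirroring the stage list earlier on this page):

```typescript
type PipelineStage =
  | "moderation"
  | "enhancement"
  | "ai_extraction"
  | "fraud_checks"
  | "open_loyalty_sync";

// When moderation is enabled it runs first; when disabled, the normal
// workflow pipeline starts immediately.
function stagesToRun(moderationEnabled: boolean): PipelineStage[] {
  const normal: PipelineStage[] = [
    "enhancement",
    "ai_extraction",
    "fraud_checks",
    "open_loyalty_sync",
  ];
  return moderationEnabled ? ["moderation", ...normal] : normal;
}
```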