# Image Moderation
Global disturbing-image pre-check, admin controls, and operational behavior.
Steve now supports a global image moderation pre-step that runs before the normal workflow pipeline.
Its purpose is to stop submissions that contain:
- explicit nudity or sexual imagery
- visible genitals or exposed breasts presented as nude content
- graphic violence, blood, open wounds, or severe injury
- self-harm, corpse imagery, assault aftermath, or other disturbing scenes
## Where it runs

The moderation pre-step runs near the top of `convex/engine/process.ts`, after the submission enters processing but before:
- enhancement
- AI extraction
- fraud checks
- Open Loyalty sync
If a submission is flagged as unsafe, the pipeline marks it failed and stops there.
## Configuration model
Image moderation is a global config, not a workflow-version stage.
It is stored in the `imageModerationConfig` table with two primary controls:

- `enabled`: on/off switch for the pre-step
- `prompt`: the editable moderation prompt template
This keeps the safety gate consistent across all workflows instead of making every workflow maintain its own moderation policy.
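As a rough sketch, the config record can be modeled as below. This is illustrative only: the real schema lives in the Convex project, and the assumption that a missing config row means "disabled" is mine, not stated in this doc.

```typescript
// Illustrative shape of the global moderation config; only `enabled` and
// `prompt` are documented fields, everything else here is an assumption.
interface ImageModerationConfig {
  enabled: boolean; // global on/off switch for the pre-step
  prompt: string;   // editable moderation prompt template
}

// Assumed behavior: if no config row exists yet, treat the pre-step as off.
function isModerationEnabled(config: ImageModerationConfig | null): boolean {
  return config?.enabled ?? false;
}
```

Because the config is global, every workflow reads the same `enabled`/`prompt` pair rather than carrying its own moderation policy.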
## Admin controls

Super admins can manage the feature from the settings page, which allows them to:
- enable or disable the moderation pre-step
- edit the moderation prompt
- reset the prompt back to the platform default
The prompt supports the `{{image_legend}}` placeholder, which is replaced with the uploaded file labels before the model call is made.
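A minimal sketch of that substitution, assuming the legend is a newline-separated list of file labels (the helper name and legend format are hypothetical, not the engine's real API):

```typescript
// Hypothetical helper: replaces every occurrence of {{image_legend}} in the
// prompt template with a newline-separated legend of uploaded file labels.
function renderModerationPrompt(template: string, fileLabels: string[]): string {
  return template.split("{{image_legend}}").join(fileLabels.join("\n"));
}

const prompt = renderModerationPrompt(
  "Review the images below for disturbing content.\n{{image_legend}}",
  ["Image 1: receipt.jpg", "Image 2: storefront.png"],
);
```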
## Model routing
Image moderation uses a dedicated AI pipeline ID:
Default behavior:

- provider: `openrouter`
- default model: `google/gemini-3.1-pro-preview`
This is separate from the workflow's normal OCR or analysis model selection.
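The routing defaults above can be sketched as a simple fallback merge. Only the default provider and model values come from this doc; the override shape and the `resolveModerationModel` helper are assumptions for illustration.

```typescript
// Default route for the dedicated moderation pipeline (values from this doc).
interface ModelRoute {
  provider: string;
  model: string;
}

const MODERATION_DEFAULTS: ModelRoute = {
  provider: "openrouter",
  model: "google/gemini-3.1-pro-preview",
};

// Hypothetical resolver: any explicit override wins, otherwise the defaults
// apply. Workflow OCR/analysis model selection is routed separately.
function resolveModerationModel(override?: Partial<ModelRoute>): ModelRoute {
  return { ...MODERATION_DEFAULTS, ...override };
}
```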
## Audit and visibility
When moderation runs, Steve records:
- token usage under `sourceType: image_moderation`
- review timeline events such as `content_moderation_complete` or `content_moderation_blocked`
- a failure reason when the submission is rejected by the pre-step
That means moderation usage appears in the admin usage dashboard as its own traffic category.
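The identifiers above can be pinned down in a small mapping. The constant and function here are hypothetical scaffolding; only the string values (`image_moderation`, `content_moderation_complete`, `content_moderation_blocked`) come from this doc.

```typescript
// Source type under which moderation token usage is recorded.
const MODERATION_SOURCE_TYPE = "image_moderation";

// Hypothetical helper mapping a moderation verdict to the review timeline
// event name recorded for the submission.
function moderationTimelineEvent(safe: boolean): string {
  return safe ? "content_moderation_complete" : "content_moderation_blocked";
}
```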
## Operational effect
When enabled:
- A submission enters `processing`.
- The moderation model evaluates the uploaded images.
- If safe, the normal pipeline continues.
- If unsafe, the submission is marked `failed` and downstream stages are skipped.
When disabled:
- The submission skips the moderation pre-step.
- The normal workflow pipeline starts immediately.