Alert Engine Guide

The Alert Engine transforms raw container signals into actionable alerts and safe remediation steps. This guide explains the building blocks, workflows, and best practices for crafting reliable rules.

Core concepts

Concept	Description
Rule	A saved definition that listens for events (logs, status changes, performance thresholds) and executes actions.
Trigger	One condition per rule (keyword match, metric threshold, or container event) that determines when actions fire.
Scope	Determines which containers a rule inspects (all, specific labels/groups, or explicit container IDs).
Action	What happens when the trigger fires: notify, restart, stop, kill, start, or run a script.
Advanced Settings	The "Advanced Settings" panel in the UI (Gatekeeper & keyword tabs) where cooldowns, verification delays, rate limits, and keyword behaviour are configured.

Rule lifecycle

Create: Start from a template or blank rule within the Alert Engine UI.
Scope: Select containers/groups. Use include/exclude lists for precision.
Trigger: Choose a trigger type (keywords, container events, metrics) and configure thresholds.
Actions: Add one or more actions with optional delays between steps.
Advanced Settings: Configure cooldowns, max executions, verification delays, backoff, and keyword behaviour.
Activate: Enable the rule. Evaluations begin immediately.
Review: Inspect alert history, acknowledgements, and audit logs to tune behavior.

Trigger types

Trigger	Description	Example
Log keyword	Matches one or many substrings (ANY/ALL) in container logs. Optional timeline settings require N matches within M minutes before firing.	Alert when `OutOfMemoryError` appears 3 times in 2 minutes for `backend-*` containers.
Performance metric (LogForge Pro)	Evaluates CPU, memory, or restart counters against a threshold, with optional sustained-time windows.	Trigger when memory usage stays above 85% for 5 minutes.
Container event	Reacts to lifecycle events emitted by the LogForge backend (`start`, `stop`, `die`, `oom`, etc.), with optional "N events in M minutes" thresholds.	Notify when a database container restarts twice within 10 minutes.

Each rule supports only one trigger type; create additional rules if you need to combine different signal types.
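The "N matches within M minutes" behaviour described for keyword and container-event triggers can be pictured as a rolling window over event timestamps. The following is a minimal sketch of that idea, not LogForge's actual implementation; `WindowedTrigger` and `line_matches` are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

from collections import deque
from datetime import datetime, timedelta


class WindowedTrigger:
    """Hypothetical helper: fires once `threshold` events fall inside a rolling `window`."""

    def __init__(self, threshold: int, window: timedelta) -> None:
        self.threshold = threshold
        self.window = window
        self._hits: deque[datetime] = deque()

    def record(self, at: datetime) -> bool:
        """Record one matching event and report whether the trigger should fire."""
        self._hits.append(at)
        # Drop hits that have aged out of the rolling window.
        while self._hits and at - self._hits[0] > self.window:
            self._hits.popleft()
        return len(self._hits) >= self.threshold


def line_matches(line: str, keywords: list[str], mode: str = "ANY") -> bool:
    """Substring match in ANY or ALL mode (assumed semantics for the keyword options)."""
    found = (kw in line for kw in keywords)
    return all(found) if mode == "ALL" else any(found)


if __name__ == "__main__":
    # Mirrors the table example: `OutOfMemoryError` 3 times in 2 minutes.
    trigger = WindowedTrigger(threshold=3, window=timedelta(minutes=2))
    start = datetime(2024, 1, 1, 12, 0, 0)
    for offset in (0, 50, 100):
        line = "java.lang.OutOfMemoryError: Java heap space"
        if line_matches(line, ["OutOfMemoryError"]):
            fired = trigger.record(start + timedelta(seconds=offset))
            print(offset, fired)
```

In this sketch the third match, 100 seconds after the first, is the one that fires; three matches spread over more than two minutes would not.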

Actions

Action	Details
Notify	Sends the alert payload to one or more channels configured in the Notifier service. Supports templated bodies and includes context (container, rule, timestamps).
Restart / Stop / Start / Kill	Executes Docker lifecycle operations via the backend. Guardrails stop repeated restarts if the container fails health checks.
Run script	Executes the first executable `.sh` script found under `/logforge-scripts/` inside the container. Ensure the directory exists, scripts are executable, and a shell (`/bin/sh`) is present.
Delay	Chain actions with delays to stage responses (e.g., notify immediately, restart after 30 seconds if not acknowledged).

Each action has additional safeguards:

Verification delay: Wait for a steady state before confirming success.
Max executions: Cap the number of times the action runs within a cooldown window.
Cooldown: Minimum wait before the rule can fire again.
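To show how a cooldown and an execution cap interact, here is a rough sketch of a gate that an action must pass before it runs. It is an illustration under stated assumptions rather than the engine's real logic: the `Gatekeeper` class, its field names, and the separate `budget_window` used for counting executions are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Gatekeeper:
    """Hypothetical guardrail: enforces a cooldown and caps runs per window."""

    cooldown: timedelta          # minimum wait between two runs
    max_executions: int          # cap on runs inside `budget_window`
    budget_window: timedelta     # assumption: the window the cap is counted over
    _runs: list[datetime] = field(default_factory=list)

    def allow(self, now: datetime) -> bool:
        """Return True and record the run if both guardrails permit it."""
        # Forget runs that have left the budget window.
        self._runs = [t for t in self._runs if now - t < self.budget_window]
        if self._runs and now - self._runs[-1] < self.cooldown:
            return False  # still cooling down
        if len(self._runs) >= self.max_executions:
            return False  # execution budget exhausted
        self._runs.append(now)
        return True


if __name__ == "__main__":
    gate = Gatekeeper(
        cooldown=timedelta(minutes=10),
        max_executions=2,
        budget_window=timedelta(hours=1),
    )
    start = datetime(2024, 1, 1, 12, 0, 0)
    for minutes in (0, 5, 10, 30, 61):
        print(minutes, gate.allow(start + timedelta(minutes=minutes)))
```

With these values the attempts at 0, 10, and 61 minutes are allowed; the attempt at 5 minutes is blocked by the cooldown and the one at 30 minutes by the execution cap.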

Templates

The UI includes templates covering common reliability and security cases:

Crash loop detection
High memory or CPU usage
Log spike / noisy errors
TLS certificate renewal reminder
Security keyword detection
Container start/stop notifications

Templates are editable after import. Use them to ensure guardrails are pre-populated.

Building a rule: example

Goal: Restart a worker if it throws repeated queue errors and notify Slack.

Scope: Containers tagged with group `workers`.
Trigger: Log keyword `Failed to fetch job`, firing after 3 matches within 60 seconds.
Actions:
- Notify Slack channel #on-call (immediate).
- Delay 30 seconds.
- Restart container. Verification delay 45 seconds.
Guardrails:
- Cooldown: 10 minutes.
- Max executions per hour: 2.
- Abort if the container was restarted manually in the last 5 minutes.

This pattern avoids restart storms while keeping operators informed.
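The timing of the action chain in this example can be worked out with a few lines of arithmetic. The sketch below is only illustrative: `Step` and `schedule` are hypothetical names, the Delay step is modelled as a pause before the following action, and each action is assumed to complete instantly.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """Hypothetical description of one action in a chain."""

    name: str
    delay_before_s: int = 0   # pause before the action starts
    verify_after_s: int = 0   # verification delay after the action has run


def schedule(steps: list[Step]) -> list[tuple[str, int, int]]:
    """Return (name, start offset, verified-by offset) in seconds for each step."""
    timeline = []
    clock = 0
    for step in steps:
        clock += step.delay_before_s
        start = clock
        clock += step.verify_after_s
        timeline.append((step.name, start, clock))
    return timeline


if __name__ == "__main__":
    plan = [
        Step("notify #on-call"),
        Step("restart container", delay_before_s=30, verify_after_s=45),
    ]
    for name, start, verified in schedule(plan):
        print(f"t+{start}s {name} (verified by t+{verified}s)")
```

Under these assumptions the notification goes out at trigger time, the restart begins 30 seconds later, and its result is confirmed 75 seconds after the trigger, well inside the 10-minute cooldown.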

Alert history & insights

The Alerts dashboard shows the latest events, total alert count, and a rolling view of recent triggers.
Switch to the Stats sub-tab to explore trend charts, rule and container breakdowns, and timeline analytics.
The Free edition retains only the most recent alerts (the retention limit is displayed at the top of the page); upgrading lifts that limit for deeper history.
Use the built-in filters (rule, container, timeframe) to focus on the signals that matter, then export the data manually if needed.

Troubleshooting rules

Verify rule definitions in the Alert Engine UI and confirm the trigger preview matches your intent.
Review backend logs (`docker compose logs alert-engine-backend`) for evaluation errors or guardrail messages.
Ensure the Notifier service is reachable if notifications fail; inspect the Notifier dashboard (Logs tab) for recent delivery attempts and response codes.
For script actions, confirm the container has `/logforge-scripts/` with an executable `.sh` script and that `/bin/sh` is available.
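The script-action check above can be scripted from the host. The following sketch assumes the `docker` CLI is on your `PATH` and that you know the container name; `preflight_command` and `executable_scripts` are hypothetical helpers, and the probe lists every executable `.sh` file rather than guessing which one the engine treats as "first".

```python
from __future__ import annotations

import subprocess

SCRIPT_DIR = "/logforge-scripts"


def preflight_command(container: str) -> list[str]:
    """Build a `docker exec` argv that prints executable .sh files in SCRIPT_DIR."""
    probe = (
        f"for f in {SCRIPT_DIR}/*.sh; do "
        'if [ -f "$f" ] && [ -x "$f" ]; then echo "$f"; fi; '
        "done"
    )
    # Running through /bin/sh also proves that the shell exists in the container.
    return ["docker", "exec", container, "/bin/sh", "-c", probe]


def executable_scripts(container: str) -> list[str]:
    """Run the probe and return the executable scripts it found."""
    result = subprocess.run(
        preflight_command(container),
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        # Typical causes: container not running, or /bin/sh missing from the image.
        raise RuntimeError(result.stderr.strip() or "docker exec failed")
    return result.stdout.split()


if __name__ == "__main__":
    # "worker-1" is a placeholder container name.
    print(executable_scripts("worker-1"))
```

An empty list means the directory is missing or contains no executable `.sh` file; a `RuntimeError` usually points to a stopped container or an image without `/bin/sh`.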

Advance to the Automation Playbooks for high-level strategies that combine multiple rules.
