Monitoring

Dashboards

Platform Health — request rate, p95 latency, error ratio, CPU/mem per service.
Attendance Funnel — punches ingested, processed, errored, flagged by code (per 5 min).
Queue Health — Horizon queue depth + throughput per queue.
Business KPIs — check-in completion rate by unit, late-check-in ratio, week-over-week trends.
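
As an illustration of the per-5-minute grouping behind the Attendance Funnel panel, here is a minimal Python sketch. The event shape (an epoch-seconds timestamp plus a stage name) and the names `bucket_start` and `funnel_counts` are assumptions made for this example, not part of the platform.

```python
from collections import Counter
from typing import Iterable, Tuple

BUCKET_SECONDS = 300  # 5-minute buckets, matching the dashboard granularity


def bucket_start(epoch_s: int) -> int:
    """Round an epoch timestamp down to the start of its 5-minute bucket."""
    return epoch_s - epoch_s % BUCKET_SECONDS


def funnel_counts(events: Iterable[Tuple[int, str]]) -> Counter:
    """Count events per (bucket start, stage).

    `events` yields (epoch_seconds, stage) pairs, where stage is a
    hypothetical label such as "ingested", "processed" or "errored".
    """
    counts: Counter = Counter()
    for ts, stage in events:
        counts[(bucket_start(ts), stage)] += 1
    return counts
```

Each key of the returned counter corresponds to one point on one series of the panel.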

Alerts

Alert	Threshold	Severity
`/attendance/punch` error ratio	> 2% over 5 min	page
Queue depth `punches`	> 5,000 for 10 min	page
DB CPU	> 80% sustained 10 min	notify
Backup job failure	any	page
TLS cert expiry	< 14 days	notify
Login failure rate	> 20% over 10 min	notify (possible attack)
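
The first two rows of the table can be read as simple predicates over a window of samples. The following Python sketch shows one possible reading; `WindowCounts`, `punch_error_alert` and `queue_depth_alert` are hypothetical names, and treating "for 10 min" as "every sample in the window is above the limit" is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class WindowCounts:
    """Request counts for one evaluation window (hypothetical shape)."""
    total: int
    errors: int


def error_ratio(window: WindowCounts) -> float:
    """Errors divided by total requests; 0.0 when there was no traffic."""
    return window.errors / window.total if window.total else 0.0


def punch_error_alert(window: WindowCounts, threshold: float = 0.02) -> bool:
    """True when the 5-minute error ratio is strictly above 2%."""
    return error_ratio(window) > threshold


def queue_depth_alert(samples: Sequence[int], limit: int = 5000) -> bool:
    """True when every depth sample in the 10-minute window exceeds the limit.

    An empty window never fires, so a gap in metrics does not page anyone.
    """
    return bool(samples) and all(depth > limit for depth in samples)
```

Requiring every sample to breach the limit keeps a short spike from paging, which matches the "for 10 min" wording under the stated assumption.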

SLOs

Punch ingestion — 99.9% of requests in 2026 are accepted (HTTP 2xx), with p95 latency within 500 ms.
Processing lag — 99% of punches produce an attendance row within 60 seconds.
Export — 95% of exports delivered under 2 minutes.
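
To make the targets concrete, here is a minimal Python sketch of how a p95 latency figure and a "fraction within a limit" figure might be computed from raw samples. The nearest-rank percentile method and the function names are assumptions for illustration; the platform's metrics backend may compute percentiles differently.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("percentile of an empty sample is undefined")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def fraction_within(values: Sequence[float], limit: float) -> float:
    """Share of samples that are at or below `limit`."""
    if not values:
        raise ValueError("fraction of an empty sample is undefined")
    return sum(v <= limit for v in values) / len(values)


def processing_lag_slo_met(lags_s: Sequence[float]) -> bool:
    """Processing lag target: at least 99% of punches within 60 seconds."""
    return fraction_within(lags_s, 60) >= 0.99
```

For example, `percentile(latencies_ms, 95) <= 500` checks the ingestion latency bound, and `fraction_within(export_times_s, 120) >= 0.95` checks the export target.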

Log Taxonomy

Structured fields on every log line:

{ "ts": "...", "level": "...", "trace_id": "...", "user_id": 42,
  "org_id": 1, "unit_id": 12, "action": "attendance.punch",
  "outcome": "accepted", "flags": ["LATE_CHECK_IN"] }
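
A small helper can keep these fields consistent across services. The Python sketch below emits one JSON object per line with the fields listed above; the function name `log_line` and the ISO 8601 UTC format for `ts` are assumptions, since the sample elides the actual timestamp format.

```python
import json
from datetime import datetime, timezone
from typing import Iterable


def log_line(level: str, trace_id: str, user_id: int, org_id: int,
             unit_id: int, action: str, outcome: str,
             flags: Iterable[str] = ()) -> str:
    """Serialize one structured log record as a single JSON line."""
    record = {
        # Assumed format: ISO 8601 in UTC.
        "ts": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "trace_id": trace_id,
        "user_id": user_id,
        "org_id": org_id,
        "unit_id": unit_id,
        "action": action,
        "outcome": outcome,
        "flags": list(flags),
    }
    return json.dumps(record)
```

Calling `log_line("info", "abc123", 42, 1, 12, "attendance.punch", "accepted", ["LATE_CHECK_IN"])` yields a record with the same shape as the sample above; the level and trace ID values here are made up.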

Synthetic Monitoring

A canary job runs every 5 minutes:

Logs in as a seeded synthetic employee.
Performs a check-in with simulated GPS.
Verifies the attendance row appears within 30 s.
Alerts if the end-to-end pipeline breaks, independent of per-host probes.
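
The steps above can be sketched as a single polling function. This Python example is a sketch under assumptions: `login`, `check_in` and `row_exists` are hypothetical callables standing in for the real API client, and the one-second poll interval is arbitrary.

```python
import time
from typing import Callable


def run_canary(login: Callable[[], str],
               check_in: Callable[[str], str],
               row_exists: Callable[[str], bool],
               timeout_s: float = 30.0,
               poll_s: float = 1.0,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run one canary cycle; return True if the attendance row appears in time.

    login()         -> session token for the seeded synthetic employee
    check_in(token) -> identifier of the submitted punch
    row_exists(id)  -> whether the attendance row is visible yet
    """
    token = login()
    punch_id = check_in(token)
    deadline = clock() + timeout_s
    while True:
        if row_exists(punch_id):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

A scheduler would call this every 5 minutes and raise an alert when it returns `False` or raises. Injecting `clock` and `sleep` keeps the function testable without real waiting.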
