Files
imap-copier/docs/superpowers/specs/2026-07-03-scheduled-task-runs-design.md
T
2026-07-03 12:40:26 +07:00

9.9 KiB

Scheduled (recurring) task runs — design

Date: 2026-07-03 Status: awaiting user review

Context

Migrations are run manually today (the operator clicks "Run migration"). For ongoing source→destination sync, the operator wants a task to run itself on a recurring interval (e.g. every 1h or 6h) without babysitting it. Requirements from the user:

  • Per-task recurrence interval; the task runs automatically.
  • Never start a run while the previous one is still running; measure the interval from the completion of the last run (not its start).
  • Show when the next run is due, in the browser's local time.
  • A separate modal showing a log of runs with their status and totals.
  • Deleting a task must clean up all this data along with the task.

Decisions confirmed with the user:

  • Run-log detail: per-run totals only (reuse runs); no per-account snapshot.
  • Enable gate: a schedule can be enabled only when all accounts test OK.
  • Breaker: if a scheduled run finishes with errors, disable the schedule and mark the task broken (red icon in the tasks list and the task detail). Manual runs never trip the breaker.
  • First run: one interval after enabling (schedule_anchor + interval).
  • Intervals: presets 1 / 3 / 6 / 12 / 24 h, plus Off.

Current architecture (as-is)

  • runs table already has started_at, finished_at, status, and totals (migrations/0001_init.up.sql:35-44). CreateRun/FinishRun in internal/store/runs.go. No list-by-task query yet.
  • Orchestrator.Run(ctx, taskID) gates on gateOK (all accounts test ok) and TryMarkTaskRunning (atomic single-run), creates a run, and launches runAll asynchronously; runAll sets the task status and calls FinishRun (internal/orchestrator/orchestrator.go).
  • ON DELETE CASCADE from tasks already removes runs, accounts, and migrated_messages when a task is deleted.
  • main.go wires store → orchestrator → hub → server and calls ResetRunningOnStartup to clear phantom "running" after a restart. No scheduler.
  • Task DTO / TS Task interface: internal/httpapi/tasks.go, web/src/api.ts.

Decision: in-process polling scheduler

A single background goroutine (a Scheduler) started from main.go, ticking every 30 s. Each tick queries the DB for schedulable tasks and triggers the due ones through the existing Orchestrator.Run. The DB is the source of truth, so the scheduler is stateless and restart-safe (paired with the existing ResetRunningOnStartup). Chosen over per-task timers (restart-fragile, in-memory state) and external cron (breaks the single-binary model, needs auth wiring).

30 s poll precision is ample for hour-scale intervals.

Data model

Migration 0004_task_scheduling:

-- up
ALTER TABLE tasks ADD COLUMN schedule_interval_seconds INT NOT NULL DEFAULT 0;
ALTER TABLE tasks ADD COLUMN schedule_anchor TIMESTAMPTZ;
ALTER TABLE tasks ADD COLUMN broken BOOLEAN NOT NULL DEFAULT false;
ALTER TABLE runs  ADD COLUMN trigger TEXT NOT NULL DEFAULT 'manual';
-- down
ALTER TABLE runs  DROP COLUMN trigger;
ALTER TABLE tasks DROP COLUMN broken;
ALTER TABLE tasks DROP COLUMN schedule_anchor;
ALTER TABLE tasks DROP COLUMN schedule_interval_seconds;
  • schedule_interval_seconds — 0 = Off. Presets map to seconds (3600, 10800, 21600, 43200, 86400).
  • schedule_anchor — set to now() when a schedule is enabled; the baseline for the first run when no finished run exists yet.
  • broken — the schedule breaker tripped; task needs operator attention.
  • runs.trigger'manual' | 'scheduled'; drives the breaker and enriches the run log.

store.Task gains ScheduleIntervalSeconds int64, ScheduleAnchor *time.Time, Broken bool. store.Run gains Trigger string, plus StartedAt time.Time and FinishedAt *time.Time (needed for the log + due computation; not currently scanned).

Cascade / deletion

No new cleanup code. runs cascade-delete on task delete already; the schedule columns live on tasks and vanish with the row. The in-memory scheduler polls the DB, so a deleted task simply stops appearing. (Confirmed against migrations/0001_init.up.sql FKs.)

Store additions

  • SetTaskSchedule(ctx, id, intervalSeconds int64) error — sets interval, sets schedule_anchor = now() when enabling (interval>0) / leaves anchor when disabling, and clears broken on any call (any explicit schedule change un-breaks the task; enabling is additionally gated on all-accounts-OK in the HTTP layer, disabling is always allowed).
  • SetTaskBroken(ctx, id) errorbroken = true, schedule_interval_seconds = 0 in one UPDATE (the breaker).
  • CreateRun(ctx, taskID, trigger string) — extend the existing signature to record the trigger.
  • ListRunsByTask(ctx, taskID) ([]Run, error) — newest first, for the modal; scans id, started_at, finished_at, status, totals, trigger.
  • LastFinishedRunAt(ctx, taskID) (*time.Time, error) — most recent run with a non-null finished_at, for due computation. (Or fold into ListSchedulable.)
  • ListSchedulableTasks(ctx) ([]ScheduledTask, error) — tasks with schedule_interval_seconds > 0 AND NOT broken AND status <> 'running', joined with their last finished run's finished_at, so the scheduler decides in one query per tick. Returns the fields dueAt needs.

GetTask/ListTasks SELECT + Scan extend for the three new task columns.

Scheduler (internal/scheduler/)

New package, one focused responsibility.

  • type Scheduler struct { store *store.Store; orch *orchestrator.Orchestrator }
  • func (s *Scheduler) Start(ctx context.Context)time.NewTicker(30s) loop; on each tick calls s.tick(ctx); stops on ctx.Done().
  • func (s *Scheduler) tick(ctx)ListSchedulableTasks, then for each due task calls s.orch.Run(ctx, taskID) with trigger = "scheduled". Run errors (ErrAlreadyRunning, ErrNotTested) are logged, not fatal.
  • Pure, unit-tested decision function: func dueAt(interval time.Duration, anchor time.Time, lastFinished *time.Time) time.TimelastFinished + interval if a finished run exists, else anchor + interval. A task is due when now >= dueAt(...). Kept pure so the tick logic is testable without a DB or clock injection at the call site.

main.go starts it: go scheduler.New(st, orch).Start(context.Background()).

Trigger threading + breaker

Orchestrator.Run gains a trigger string parameter (manual callers pass "manual"; the scheduler passes "scheduled"). Run forwards it to CreateRun and into runAll. In runAll, after the final status is computed:

if trigger == "scheduled" && totErr > 0 {
    _ = o.store.SetTaskBroken(ctx, task.ID)
    o.hub.Publish(Event{Type: "task_broken", TaskID: task.ID, ...})
}

The HTTP handleRun passes "manual". This keeps the breaker precise: only scheduled runs trip it.

HTTP API

  • PUT /api/tasks/{id}/schedule — body {interval_seconds: int}. Enabling (interval>0) first checks gateOK (all accounts test OK) via ListAccountsByTask; returns 409 with a clear message if not. Then SetTaskSchedule. Disabling (0) always allowed.
  • GET /api/tasks/{id}/runsListRunsByTask → run-log rows.
  • Task DTO (tasks.go) gains schedule_interval_seconds, broken, and a computed next_run_at *string (RFC3339, UTC; null when off / running / broken). next_run_at is computed server-side from the same dueAt logic so the client only formats it. ListTasks DTO also includes broken (for the red icon in the list).

Frontend

  • api.ts: Task gains schedule_interval_seconds?, broken?, next_run_at?. New setTaskSchedule(taskId, intervalSeconds) and listRuns(taskId) (+ a Run interface).
  • TaskDetail.tsx: in the run-control panel, an interval <select> (Off/1h/3h/6h/12h/24h) bound to setTaskSchedule; a "Next run: " line formatting next_run_at with toLocaleString() (shows "running…" when a run is active, nothing when off); a red broken badge when broken. A Runs button opens a new modal.
  • RunLogModal.tsx (new): lists listRuns(taskId) rows — started/finished in browser-local time, trigger, a StatusBadge, and copied/skipped/errors. Reuses the existing Modal + StatusBadge + .tbl patterns.
  • Tasks.tsx: red icon/indicator on rows where broken.
  • WS: on task_broken/run_done, TaskDetail reloads (existing reload wiring); Tasks list refreshes.

Error handling

  • Enable-while-not-tested → 409, surfaced in the existing role="alert" banner.
  • Scheduler Run errors are logged and skipped; a broken task is skipped every tick until the operator re-enables (which clears broken).
  • Restart: ResetRunningOnStartup clears phantom running; the scheduler recomputes due times from persisted finished_at/anchor.

Testing

  • store: SetTaskSchedule (sets interval+anchor, clears broken), SetTaskBroken (interval→0, broken→true), CreateRun records trigger, ListRunsByTask ordering + fields, GetTask returns new columns.
  • scheduler (pure): dueAt / due decision — not-yet-due, due-from-finished, first-run-from-anchor, running skipped, broken skipped.
  • orchestrator: scheduled run with errors trips SetTaskBroken; manual run with errors does not.
  • cascade: deleting a task removes its runs (extend existing cascade test).
  • E2E (prod, per project rules): enable a schedule on a tested task → "Next run" shows correct local time; (with a short test interval) the run auto-starts; the Runs modal shows the entry with scheduled trigger; break an account → the next scheduled run disables the schedule and marks the task broken (red icon in list + detail).

Out of scope (YAGNI)

  • Arbitrary cron expressions; catch-up/backfill of missed runs (run once when due); notifications; per-account breakdown in the run log (totals chosen); configurable poll interval.