imap-copier/docs/superpowers/specs/2026-07-03-scheduled-task-runs-design.md

# Scheduled (recurring) task runs — design

**Date:** 2026-07-03
**Status:** awaiting user review

## Context

Migrations are run manually today (the operator clicks "Run migration"). For
ongoing source→destination sync, the operator wants a task to run itself on a
recurring interval (e.g. every 1h or 6h) without babysitting it. Requirements
from the user:

- Per-task recurrence interval; the task runs automatically.
- Never start a run while the previous one is still running; measure the interval
  from the **completion** of the last run (not its start).
- Show when the next run is due, in the **browser's local time**.
- A separate modal showing a log of runs with their status and totals.
- Deleting a task must clean up all this data along with the task.

Decisions confirmed with the user:

- **Run-log detail:** per-run totals only (reuse `runs`); no per-account snapshot.
- **Enable gate:** a schedule can be enabled only when **all accounts test OK**.
- **Breaker:** if a *scheduled* run finishes with errors, disable the schedule and
  mark the task **broken** (red icon in the tasks list and the task detail).
  Manual runs never trip the breaker.
- **First run:** one interval after enabling (`schedule_anchor + interval`).
- **Intervals:** presets 1 / 3 / 6 / 12 / 24 h, plus Off.

### Current architecture (as-is)

- `runs` table already has `started_at`, `finished_at`, `status`, and totals
  (`migrations/0001_init.up.sql:35-44`). `CreateRun`/`FinishRun` in
  `internal/store/runs.go`. No list-by-task query yet.
- `Orchestrator.Run(ctx, taskID)` gates on `gateOK` (all accounts test ok) and
  `TryMarkTaskRunning` (atomic single-run), creates a run, and launches `runAll`
  asynchronously; `runAll` sets the task status and calls `FinishRun`
  (`internal/orchestrator/orchestrator.go`).
- `ON DELETE CASCADE` from `tasks` already removes `runs`, `accounts`, and
  `migrated_messages` when a task is deleted.
- `main.go` wires store → orchestrator → hub → server and calls
  `ResetRunningOnStartup` to clear phantom "running" after a restart. No scheduler.
- Task DTO / TS `Task` interface: `internal/httpapi/tasks.go`, `web/src/api.ts`.

## Decision: in-process polling scheduler

A single background goroutine (a `Scheduler`) started from `main.go`, ticking
every **30 s**. Each tick queries the DB for schedulable tasks and triggers the
due ones through the existing `Orchestrator.Run`. The DB is the source of truth,
so the scheduler is stateless and restart-safe (paired with the existing
`ResetRunningOnStartup`). Chosen over per-task timers (restart-fragile, in-memory
state) and external cron (breaks the single-binary model, needs auth wiring).

30 s poll precision is ample for hour-scale intervals.

## Data model

**Migration `0004_task_scheduling`:**

```sql
-- up
ALTER TABLE tasks ADD COLUMN schedule_interval_seconds INT NOT NULL DEFAULT 0;
ALTER TABLE tasks ADD COLUMN schedule_anchor TIMESTAMPTZ;
ALTER TABLE tasks ADD COLUMN broken BOOLEAN NOT NULL DEFAULT false;
ALTER TABLE runs  ADD COLUMN trigger TEXT NOT NULL DEFAULT 'manual';
-- down
ALTER TABLE runs  DROP COLUMN trigger;
ALTER TABLE tasks DROP COLUMN broken;
ALTER TABLE tasks DROP COLUMN schedule_anchor;
ALTER TABLE tasks DROP COLUMN schedule_interval_seconds;
```

- `schedule_interval_seconds` — 0 = Off. Presets map to seconds (3600, 10800,
  21600, 43200, 86400).
- `schedule_anchor` — set to `now()` when a schedule is enabled; the baseline for
  the first run when no finished run exists yet.
- `broken` — the schedule breaker tripped; task needs operator attention.
- `runs.trigger` — `'manual' | 'scheduled'`; drives the breaker and enriches the
  run log.

`store.Task` gains `ScheduleIntervalSeconds int64`, `ScheduleAnchor *time.Time`,
`Broken bool`. `store.Run` gains `Trigger string`, plus `StartedAt time.Time` and
`FinishedAt *time.Time` (needed for the log + due computation; not currently
scanned).

### Cascade / deletion

No new cleanup code. `runs` cascade-delete on task delete already; the schedule
columns live on `tasks` and vanish with the row. The in-memory scheduler polls
the DB, so a deleted task simply stops appearing. (Confirmed against
`migrations/0001_init.up.sql` FKs.)

## Store additions

- `SetTaskSchedule(ctx, id, intervalSeconds int64) error` — sets interval, sets
  `schedule_anchor = now()` when enabling (interval>0) / leaves anchor when
  disabling, and clears `broken` on any call (any explicit schedule change
  un-breaks the task; enabling is additionally gated on all-accounts-OK in the
  HTTP layer, disabling is always allowed).
- `SetTaskBroken(ctx, id) error` — `broken = true, schedule_interval_seconds = 0`
  in one UPDATE (the breaker).
- `CreateRun(ctx, taskID, trigger string)` — extend the existing signature to
  record the trigger.
- `ListRunsByTask(ctx, taskID) ([]Run, error)` — newest first, for the modal;
  scans `id, started_at, finished_at, status, totals, trigger`.
- `LastFinishedRunAt(ctx, taskID) (*time.Time, error)` — most recent run with a
  non-null `finished_at`, for due computation. (Or fold into `ListSchedulable`.)
- `ListSchedulableTasks(ctx) ([]ScheduledTask, error)` — tasks with
  `schedule_interval_seconds > 0 AND NOT broken AND status <> 'running'`, joined
  with their last finished run's `finished_at`, so the scheduler decides in one
  query per tick. Returns the fields `dueAt` needs.

`GetTask`/`ListTasks` SELECT + Scan extend for the three new task columns.

## Scheduler (`internal/scheduler/`)

New package, one focused responsibility.

- `type Scheduler struct { store *store.Store; orch *orchestrator.Orchestrator }`
- `func (s *Scheduler) Start(ctx context.Context)` — `time.NewTicker(30s)` loop;
  on each tick calls `s.tick(ctx)`; stops on `ctx.Done()`.
- `func (s *Scheduler) tick(ctx)` — `ListSchedulableTasks`, then for each due task
  calls `s.orch.Run(ctx, taskID)` with `trigger = "scheduled"`. `Run` errors
  (`ErrAlreadyRunning`, `ErrNotTested`) are logged, not fatal.
- **Pure, unit-tested decision function:**
  `func dueAt(interval time.Duration, anchor time.Time, lastFinished *time.Time) time.Time`
  → `lastFinished + interval` if a finished run exists, else `anchor + interval`.
  A task is due when `now >= dueAt(...)`. Kept pure so the tick logic is testable
  without a DB or clock injection at the call site.

`main.go` starts it: `go scheduler.New(st, orch).Start(context.Background())`.

### Trigger threading + breaker

`Orchestrator.Run` gains a `trigger string` parameter (manual callers pass
`"manual"`; the scheduler passes `"scheduled"`). `Run` forwards it to
`CreateRun` and into `runAll`. In `runAll`, after the final status is computed:

```
if trigger == "scheduled" && totErr > 0 {
    _ = o.store.SetTaskBroken(ctx, task.ID)
    o.hub.Publish(Event{Type: "task_broken", TaskID: task.ID, ...})
}
```

The HTTP `handleRun` passes `"manual"`. This keeps the breaker precise: only
scheduled runs trip it.

## HTTP API

- **`PUT /api/tasks/{id}/schedule`** — body `{interval_seconds: int}`. Enabling
  (interval>0) first checks `gateOK` (all accounts test OK) via
  `ListAccountsByTask`; returns **409** with a clear message if not. Then
  `SetTaskSchedule`. Disabling (0) always allowed.
- **`GET /api/tasks/{id}/runs`** — `ListRunsByTask` → run-log rows.
- **Task DTO** (`tasks.go`) gains `schedule_interval_seconds`, `broken`, and a
  computed `next_run_at *string` (RFC3339, UTC; null when off / running / broken).
  `next_run_at` is computed server-side from the same `dueAt` logic so the client
  only formats it. `ListTasks` DTO also includes `broken` (for the red icon in the
  list).

## Frontend

- **`api.ts`**: `Task` gains `schedule_interval_seconds?`, `broken?`,
  `next_run_at?`. New `setTaskSchedule(taskId, intervalSeconds)` and
  `listRuns(taskId)` (+ a `Run` interface).
- **`TaskDetail.tsx`**: in the run-control panel, an interval `<select>`
  (Off/1h/3h/6h/12h/24h) bound to `setTaskSchedule`; a "Next run: <local time>"
  line formatting `next_run_at` with `toLocaleString()` (shows "running…" when a
  run is active, nothing when off); a red **broken** badge when `broken`. A
  **Runs** button opens a new modal.
- **`RunLogModal.tsx`** (new): lists `listRuns(taskId)` rows — started/finished in
  browser-local time, `trigger`, a `StatusBadge`, and copied/skipped/errors.
  Reuses the existing `Modal` + `StatusBadge` + `.tbl` patterns.
- **`Tasks.tsx`**: red icon/indicator on rows where `broken`.
- WS: on `task_broken`/`run_done`, `TaskDetail` reloads (existing reload wiring);
  `Tasks` list refreshes.

## Error handling

- Enable-while-not-tested → 409, surfaced in the existing `role="alert"` banner.
- Scheduler `Run` errors are logged and skipped; a broken task is skipped every
  tick until the operator re-enables (which clears `broken`).
- Restart: `ResetRunningOnStartup` clears phantom running; the scheduler recomputes
  due times from persisted `finished_at`/`anchor`.

## Testing

- **store**: `SetTaskSchedule` (sets interval+anchor, clears broken),
  `SetTaskBroken` (interval→0, broken→true), `CreateRun` records trigger,
  `ListRunsByTask` ordering + fields, `GetTask` returns new columns.
- **scheduler (pure)**: `dueAt` / due decision — not-yet-due, due-from-finished,
  first-run-from-anchor, running skipped, broken skipped.
- **orchestrator**: scheduled run with errors trips `SetTaskBroken`; manual run
  with errors does not.
- **cascade**: deleting a task removes its runs (extend existing cascade test).
- **E2E** (prod, per project rules): enable a schedule on a tested task → "Next
  run" shows correct local time; (with a short test interval) the run auto-starts;
  the Runs modal shows the entry with `scheduled` trigger; break an account → the
  next scheduled run disables the schedule and marks the task broken (red icon in
  list + detail).

## Out of scope (YAGNI)

- Arbitrary cron expressions; catch-up/backfill of missed runs (run once when due);
  notifications; per-account breakdown in the run log (totals chosen); configurable
  poll interval.