Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
9.9 KiB
Scheduled (recurring) task runs — design
Date: 2026-07-03 Status: awaiting user review
Context
Migrations are run manually today (the operator clicks "Run migration"). For ongoing source→destination sync, the operator wants a task to run itself on a recurring interval (e.g. every 1h or 6h) without babysitting it. Requirements from the user:
- Per-task recurrence interval; the task runs automatically.
- Never start a run while the previous one is still running; measure the interval from the completion of the last run (not its start).
- Show when the next run is due, in the browser's local time.
- A separate modal showing a log of runs with their status and totals.
- Deleting a task must clean up all this data along with the task.
Decisions confirmed with the user:
- Run-log detail: per-run totals only (reuse
runs); no per-account snapshot. - Enable gate: a schedule can be enabled only when all accounts test OK.
- Breaker: if a scheduled run finishes with errors, disable the schedule and mark the task broken (red icon in the tasks list and the task detail). Manual runs never trip the breaker.
- First run: one interval after enabling (
schedule_anchor + interval). - Intervals: presets 1 / 3 / 6 / 12 / 24 h, plus Off.
Current architecture (as-is)
runstable already hasstarted_at,finished_at,status, and totals (migrations/0001_init.up.sql:35-44).CreateRun/FinishRunininternal/store/runs.go. No list-by-task query yet.Orchestrator.Run(ctx, taskID)gates ongateOK(all accounts test ok) andTryMarkTaskRunning(atomic single-run), creates a run, and launchesrunAllasynchronously;runAllsets the task status and callsFinishRun(internal/orchestrator/orchestrator.go).ON DELETE CASCADEfromtasksalready removesruns,accounts, andmigrated_messageswhen a task is deleted.main.gowires store → orchestrator → hub → server and callsResetRunningOnStartupto clear phantom "running" after a restart. No scheduler.- Task DTO / TS
Taskinterface:internal/httpapi/tasks.go,web/src/api.ts.
Decision: in-process polling scheduler
A single background goroutine (a Scheduler) started from main.go, ticking
every 30 s. Each tick queries the DB for schedulable tasks and triggers the
due ones through the existing Orchestrator.Run. The DB is the source of truth,
so the scheduler is stateless and restart-safe (paired with the existing
ResetRunningOnStartup). Chosen over per-task timers (restart-fragile, in-memory
state) and external cron (breaks the single-binary model, needs auth wiring).
30 s poll precision is ample for hour-scale intervals.
Data model
Migration 0004_task_scheduling:
-- up
ALTER TABLE tasks ADD COLUMN schedule_interval_seconds INT NOT NULL DEFAULT 0;
ALTER TABLE tasks ADD COLUMN schedule_anchor TIMESTAMPTZ;
ALTER TABLE tasks ADD COLUMN broken BOOLEAN NOT NULL DEFAULT false;
ALTER TABLE runs ADD COLUMN trigger TEXT NOT NULL DEFAULT 'manual';
-- down
ALTER TABLE runs DROP COLUMN trigger;
ALTER TABLE tasks DROP COLUMN broken;
ALTER TABLE tasks DROP COLUMN schedule_anchor;
ALTER TABLE tasks DROP COLUMN schedule_interval_seconds;
schedule_interval_seconds— 0 = Off. Presets map to seconds (3600, 10800, 21600, 43200, 86400).schedule_anchor— set tonow()when a schedule is enabled; the baseline for the first run when no finished run exists yet.broken— the schedule breaker tripped; task needs operator attention.runs.trigger—'manual' | 'scheduled'; drives the breaker and enriches the run log.
store.Task gains ScheduleIntervalSeconds int64, ScheduleAnchor *time.Time,
Broken bool. store.Run gains Trigger string, plus StartedAt time.Time and
FinishedAt *time.Time (needed for the log + due computation; not currently
scanned).
Cascade / deletion
No new cleanup code. runs cascade-delete on task delete already; the schedule
columns live on tasks and vanish with the row. The in-memory scheduler polls
the DB, so a deleted task simply stops appearing. (Confirmed against
migrations/0001_init.up.sql FKs.)
Store additions
SetTaskSchedule(ctx, id, intervalSeconds int64) error— sets interval, setsschedule_anchor = now()when enabling (interval>0) / leaves anchor when disabling, and clearsbrokenon any call (any explicit schedule change un-breaks the task; enabling is additionally gated on all-accounts-OK in the HTTP layer, disabling is always allowed).SetTaskBroken(ctx, id) error—broken = true, schedule_interval_seconds = 0in one UPDATE (the breaker).CreateRun(ctx, taskID, trigger string)— extend the existing signature to record the trigger.ListRunsByTask(ctx, taskID) ([]Run, error)— newest first, for the modal; scansid, started_at, finished_at, status, totals, trigger.LastFinishedRunAt(ctx, taskID) (*time.Time, error)— most recent run with a non-nullfinished_at, for due computation. (Or fold intoListSchedulable.)ListSchedulableTasks(ctx) ([]ScheduledTask, error)— tasks withschedule_interval_seconds > 0 AND NOT broken AND status <> 'running', joined with their last finished run'sfinished_at, so the scheduler decides in one query per tick. Returns the fieldsdueAtneeds.
GetTask/ListTasks SELECT + Scan extend for the three new task columns.
Scheduler (internal/scheduler/)
New package, one focused responsibility.
type Scheduler struct { store *store.Store; orch *orchestrator.Orchestrator }func (s *Scheduler) Start(ctx context.Context)—time.NewTicker(30s)loop; on each tick callss.tick(ctx); stops onctx.Done().func (s *Scheduler) tick(ctx)—ListSchedulableTasks, then for each due task callss.orch.Run(ctx, taskID)withtrigger = "scheduled".Runerrors (ErrAlreadyRunning,ErrNotTested) are logged, not fatal.- Pure, unit-tested decision function:
func dueAt(interval time.Duration, anchor time.Time, lastFinished *time.Time) time.Time→lastFinished + intervalif a finished run exists, elseanchor + interval. A task is due whennow >= dueAt(...). Kept pure so the tick logic is testable without a DB or clock injection at the call site.
main.go starts it: go scheduler.New(st, orch).Start(context.Background()).
Trigger threading + breaker
Orchestrator.Run gains a trigger string parameter (manual callers pass
"manual"; the scheduler passes "scheduled"). Run forwards it to
CreateRun and into runAll. In runAll, after the final status is computed:
if trigger == "scheduled" && totErr > 0 {
_ = o.store.SetTaskBroken(ctx, task.ID)
o.hub.Publish(Event{Type: "task_broken", TaskID: task.ID, ...})
}
The HTTP handleRun passes "manual". This keeps the breaker precise: only
scheduled runs trip it.
HTTP API
PUT /api/tasks/{id}/schedule— body{interval_seconds: int}. Enabling (interval>0) first checksgateOK(all accounts test OK) viaListAccountsByTask; returns 409 with a clear message if not. ThenSetTaskSchedule. Disabling (0) always allowed.GET /api/tasks/{id}/runs—ListRunsByTask→ run-log rows.- Task DTO (
tasks.go) gainsschedule_interval_seconds,broken, and a computednext_run_at *string(RFC3339, UTC; null when off / running / broken).next_run_atis computed server-side from the samedueAtlogic so the client only formats it.ListTasksDTO also includesbroken(for the red icon in the list).
Frontend
api.ts:Taskgainsschedule_interval_seconds?,broken?,next_run_at?. NewsetTaskSchedule(taskId, intervalSeconds)andlistRuns(taskId)(+ aRuninterface).TaskDetail.tsx: in the run-control panel, an interval<select>(Off/1h/3h/6h/12h/24h) bound tosetTaskSchedule; a "Next run: " line formattingnext_run_atwithtoLocaleString()(shows "running…" when a run is active, nothing when off); a red broken badge whenbroken. A Runs button opens a new modal.RunLogModal.tsx(new): listslistRuns(taskId)rows — started/finished in browser-local time,trigger, aStatusBadge, and copied/skipped/errors. Reuses the existingModal+StatusBadge+.tblpatterns.Tasks.tsx: red icon/indicator on rows wherebroken.- WS: on
task_broken/run_done,TaskDetailreloads (existing reload wiring);Taskslist refreshes.
Error handling
- Enable-while-not-tested → 409, surfaced in the existing
role="alert"banner. - Scheduler
Runerrors are logged and skipped; a broken task is skipped every tick until the operator re-enables (which clearsbroken). - Restart:
ResetRunningOnStartupclears phantom running; the scheduler recomputes due times from persistedfinished_at/anchor.
Testing
- store:
SetTaskSchedule(sets interval+anchor, clears broken),SetTaskBroken(interval→0, broken→true),CreateRunrecords trigger,ListRunsByTaskordering + fields,GetTaskreturns new columns. - scheduler (pure):
dueAt/ due decision — not-yet-due, due-from-finished, first-run-from-anchor, running skipped, broken skipped. - orchestrator: scheduled run with errors trips
SetTaskBroken; manual run with errors does not. - cascade: deleting a task removes its runs (extend existing cascade test).
- E2E (prod, per project rules): enable a schedule on a tested task → "Next
run" shows correct local time; (with a short test interval) the run auto-starts;
the Runs modal shows the entry with
scheduledtrigger; break an account → the next scheduled run disables the schedule and marks the task broken (red icon in list + detail).
Out of scope (YAGNI)
- Arbitrary cron expressions; catch-up/backfill of missed runs (run once when due); notifications; per-account breakdown in the run log (totals chosen); configurable poll interval.