docs: sync session-persistence spec to leaner RestartSurface-based design

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 15:20:02 +07:00
parent 1f69973606
commit e37faf49d3
@@ -119,48 +119,49 @@ table is a `const`/static, not inline literals in branching logic.
 ### 5. Protocol — `crates/spacesh-proto/src/message.rs`
-- `Cmd::StartSurface { surface_id: SurfaceId, resume: bool }` start a stopped
-  surface. `resume = true` builds `command + resume_args(command)` (falling
-  back to the original args when no resume mapping exists); `resume = false`
-  builds the original `command + args`. cwd and geometry come from the spec.
-- `Cmd::GetSnapshot { surface_id: SurfaceId }` → response carries
-  `Option<SnapshotView>`.
-- `SnapshotView { ansi, cols, rows, cursor_row, cursor_col }` — a proto-level
-  mirror of core `Snapshot`, so `spacesh-proto` does not depend on
-  `spacesh-core`. The daemon converts core `Snapshot` into `SnapshotView` at
-  the protocol boundary.
+The codebase already has `Cmd::RestartSurface { surface_id }` (starts a stopped
+surface from its spec, guarded by `is_running`) and an `Attach` response that
+already carries `{ snapshot, cols, rows, cursor_row, cursor_col, stopped }`.
+So no new command or wire type is needed beyond one field:
+- Extend `Cmd::RestartSurface` with `#[serde(default)] resume: bool`. `resume =
+  true` builds `command + resume_args(command)` (falling back to the original
+  args when no mapping exists); `resume = false` keeps the original
+  `command + args` (today's behavior). The `#[serde(default)]` keeps old frames
+  decoding to `resume = false`.
+- No `GetSnapshot`, no `StartSurface`, no `SnapshotView`: a stopped-panel
+  `Attach` returns the **disk** snapshot (see §6) using the existing response
+  shape.
 `spacesh-core::snapshot::Snapshot` gains `Deserialize` (alongside `Serialize`)
-so it can be loaded back from disk into `SnapshotRecord` conversions in tests
-and the store.
+so the store can load it back from disk.
 ### 6. Server handlers — `crates/spaceshd/src/server.rs`
-- `StartSurface`: look up the spec; if missing → error response. Build a
-  `SpawnSpec` with resume-or-plain args, the spec's cwd, and current geometry;
-  `spawn_surface_deferred(...)`; `registry.set_live(handle)`; broadcast
-  `workspace_changed` so all clients flip `running` to true.
-- `GetSnapshot`: read from the snapshot store, convert to `SnapshotView`,
-  return `Option`.
-- On surface close/remove: call `snapshot_store.remove(sid)` (via the writer or
-  a direct store handle) so stale files do not accumulate.
+- `RestartSurface { surface_id, resume }`: unchanged flow (spec lookup,
+  `spawn_from_spec`, `set_live`, `SurfaceRestarted` broadcast). When `resume`,
+  spawn with a spec whose `args` are replaced by `config.resume_args(command)`
+  (when present); otherwise spawn the original spec.
+- `Attach` for a **stopped** surface: instead of returning the empty
+  `{ snapshot: "", stopped: true }`, load the disk snapshot via the snapshot
+  store and return `{ snapshot: <ansi>, cols, rows, cursor_row, cursor_col,
+  stopped: true }`. Missing file → empty snapshot, still `stopped: true`.
+- Surface close/remove (`Close`, `CloseWorkspace`, `remove_surface` paths):
+  send a remove to the snapshot writer so stale `<sid>.json` files do not
+  accumulate.
 ### 7. App — `app/src` and `app/src-tauri`
-- `socketBridge.ts`: `startSurface(id, resume)`, `getSnapshot(id)`, and a
-  `SnapshotView` type.
-- `app/src-tauri/src/bridge.rs`: `start_surface` and `get_snapshot` invoke
-  handlers forwarding to the daemon, wired into the Tauri `invoke_handler` and
-  the JS bridge.
-- `LayoutEngine.tsx` / `TerminalView.tsx`: when a surface's `running === false`,
-  render a stopped overlay instead of a live terminal:
-  - fetch `getSnapshot(id)` and paint the ANSI into a read-only, dimmed
-    `xterm` instance for visual context;
-  - centered controls: **Resume**`startSurface(id, true)`,
-    **Restart fresh**`startSurface(id, false)`;
-  - on success the daemon's `workspace_changed` sets `running = true`, the
-    overlay unmounts, and the normal live `TerminalView` mounts.
-  - a small "stopped" indicator in the panel header.
+- `socketBridge.ts`: `restartSurface(id, resume = false)` gains the `resume`
+  arg; `AttachResult` gains optional `cursor_row`/`cursor_col`/`stopped`.
+- `app/src-tauri/src/bridge.rs`: `restart_surface` forwards a `resume: bool`
+  arg into `Cmd::RestartSurface`.
+- `LayoutEngine.tsx` stopped branch (`running[id] === false`): paint the disk
+  snapshot into a dimmed, read-only `xterm` behind the controls, and offer two
+  buttons — **Resume** → `restartSurface(id, true)` and **Restart fresh** →
+  `restartSurface(id, false)`. On success the daemon's `workspace_changed`
+  flips `running` to true, the overlay unmounts, and the live `TerminalView`
+  mounts.
 ## Data flow
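The resume rule that §5 and §6 describe for `RestartSurface` (replace the spec's `args` with `config.resume_args(command)` when `resume` is set and a mapping exists, otherwise keep the original spec) can be sketched in a few lines of Rust. This is a minimal illustration, not the daemon's code: `SurfaceSpec`, `ResumeConfig`, `with_mapping`, and `spec_for_restart` are hypothetical names, and only `resume_args` is taken from the spec text. It uses the standard library only.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the persisted surface spec (command + args only).
#[derive(Debug, Clone, PartialEq)]
pub struct SurfaceSpec {
    pub command: String,
    pub args: Vec<String>,
}

/// Hypothetical config holding the command -> resume-args table.
#[derive(Debug, Default)]
pub struct ResumeConfig {
    mappings: HashMap<String, Vec<String>>,
}

impl ResumeConfig {
    /// Builder-style helper (an assumption, for the example only).
    pub fn with_mapping(mut self, command: &str, args: &[&str]) -> Self {
        let args = args.iter().map(|a| a.to_string()).collect();
        self.mappings.insert(command.to_string(), args);
        self
    }

    /// Resume arguments for `command`, or `None` when no mapping exists.
    pub fn resume_args(&self, command: &str) -> Option<&[String]> {
        self.mappings.get(command).map(Vec::as_slice)
    }
}

/// Chooses the spec to spawn: resume args replace the original args only when
/// `resume` is true and a mapping exists; every other case is a plain restart.
pub fn spec_for_restart(spec: &SurfaceSpec, resume: bool, config: &ResumeConfig) -> SurfaceSpec {
    let args = match (resume, config.resume_args(&spec.command)) {
        (true, Some(resume_args)) => resume_args.to_vec(),
        _ => spec.args.clone(),
    };
    SurfaceSpec {
        command: spec.command.clone(),
        args,
    }
}
```

Because the fallback is the original spec, a resume request for a command without a mapping degrades to a plain restart instead of failing the spawn, which matches the error-handling rule later in the spec.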
@@ -170,18 +171,17 @@ running surface ──(on exit)────────────────
 daemon shutdown ──(final pass over live)────────────▶ writer task ──▶ <sid>.json
 reboot ▶ daemon cold start ▶ Registry::restore(state.json) ▶ all surfaces stopped
-client ▶ GetSnapshot(sid) ▶ paint dimmed read-only screen + Resume/Restart
-user clicks Resume ▶ StartSurface{resume:true} ▶ spawn(command + resume_args, cwd)
-workspace_changed(running=true) ▶ live TerminalView mounts
+client ▶ Attach(sid) [stopped] ▶ disk snapshot ▶ paint dimmed read-only screen
+user clicks Resume ▶ RestartSurface{resume:true} ▶ spawn(command + resume_args, cwd)
+SurfaceRestarted + running=true ▶ live TerminalView mounts
 ```
 ## Error handling
-- Missing/corrupt snapshot file → `GetSnapshot` returns `None`; the overlay
-  shows an empty dimmed panel with the Resume/Restart controls (still usable).
-- `StartSurface` on an unknown/already-running surface → error response; client
-  ignores or surfaces a toast. No duplicate actor: guard on
-  `registry.is_running(sid)`.
+- Missing/corrupt snapshot file → stopped `Attach` returns an empty snapshot;
+  the overlay shows an empty dimmed panel with the Resume/Restart controls.
+- `RestartSurface` on an already-running surface → no-op ok (existing
+  `is_running` guard); unknown surface → `NOT_FOUND`.
 - Resume command for an agent without a mapping → falls back to the original
   spec args (plain restart), never fails the spawn.
 - Writer task failure to write one file is logged and dropped; it must not stall
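Two of the error-handling rules in this hunk are easy to get subtly wrong, so a small Rust sketch may help: `RestartSurface` is a no-op success on an already-running surface and `NOT_FOUND` on an unknown one, and a stopped `Attach` with no readable snapshot still answers with `stopped: true` and an empty snapshot. Everything here is a simplified assumption: the `Registry` below is a toy with two sets, `SurfaceId` is aliased to `u64`, and `RestartOutcome`, `RestartError`, `AttachResponse`, and `attach_stopped` are hypothetical names. The geometry and cursor fields of the real response are omitted.

```rust
use std::collections::HashSet;

/// Assumption: the real `SurfaceId` type is not shown in the spec.
pub type SurfaceId = u64;

#[derive(Debug, PartialEq)]
pub enum RestartOutcome {
    /// The surface was stopped and is now live.
    Started,
    /// The surface was already live; nothing was spawned.
    AlreadyRunning,
}

#[derive(Debug, PartialEq)]
pub enum RestartError {
    /// No spec exists for this surface (the spec's `NOT_FOUND`).
    NotFound,
}

/// Toy registry: which surfaces have a spec, and which of them are live.
#[derive(Debug, Default)]
pub struct Registry {
    specs: HashSet<SurfaceId>,
    live: HashSet<SurfaceId>,
}

impl Registry {
    pub fn add_stopped(&mut self, sid: SurfaceId) {
        self.specs.insert(sid);
    }

    pub fn is_running(&self, sid: SurfaceId) -> bool {
        self.live.contains(&sid)
    }

    /// Applies the guard order from the spec: unknown surface first,
    /// then the `is_running` no-op, then the actual start.
    pub fn restart(&mut self, sid: SurfaceId) -> Result<RestartOutcome, RestartError> {
        if !self.specs.contains(&sid) {
            return Err(RestartError::NotFound);
        }
        if self.is_running(sid) {
            return Ok(RestartOutcome::AlreadyRunning);
        }
        self.live.insert(sid);
        Ok(RestartOutcome::Started)
    }
}

/// Reduced attach response: snapshot text plus the stopped flag.
#[derive(Debug, PartialEq)]
pub struct AttachResponse {
    pub snapshot: String,
    pub stopped: bool,
}

/// `disk_snapshot` is `None` when the file is missing or failed to parse.
pub fn attach_stopped(disk_snapshot: Option<String>) -> AttachResponse {
    AttachResponse {
        snapshot: disk_snapshot.unwrap_or_default(),
        stopped: true,
    }
}
```

Returning success for the already-running case keeps a double click on Resume harmless, since the second request can never create a second actor.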
@@ -202,9 +202,9 @@ user clicks Resume ▶ StartSurface{resume:true} ▶ spawn(command + resume_args
   built-in default, then `None`; missing section defaults cleanly.
 - **surface actor:** `SurfaceMsg::Snapshot` returns the current grid contents;
   `dirty` is true after output and false immediately after a snapshot.
-- **server:** `StartSurface{resume:true}` builds `command + resume_args`;
-  `{resume:false}` builds `command + args`; `GetSnapshot` returns the saved
-  view; `is_running` guard prevents a second actor.
+- **server:** `RestartSurface{resume:true}` spawns with `command + resume_args`;
+  `{resume:false}` spawns with `command + args`; stopped `Attach` returns the
+  saved disk snapshot; `is_running` guard prevents a second actor.
 - **registry:** starting a stopped surface re-populates the live map and the
   view flips `running` to true.
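The surface-actor test case in this hunk (`dirty` is true after output and false immediately after a snapshot) describes a small state machine, sketched below in Rust. This is an illustration under assumptions: the real actor is message-driven via `SurfaceMsg::Snapshot` and holds a terminal grid, while this `SurfaceActor` is a hypothetical synchronous struct that stores the grid as a plain string.

```rust
/// Hypothetical, synchronous stand-in for the surface actor's snapshot state.
#[derive(Debug, Default)]
pub struct SurfaceActor {
    grid: String,
    dirty: bool,
}

impl SurfaceActor {
    /// Records new terminal output and marks the grid as changed.
    pub fn on_output(&mut self, text: &str) {
        self.grid.push_str(text);
        self.dirty = true;
    }

    /// Returns the current grid contents and clears the dirty flag,
    /// mirroring what the spec expects from `SurfaceMsg::Snapshot`.
    pub fn snapshot(&mut self) -> String {
        self.dirty = false;
        self.grid.clone()
    }

    /// True when output has arrived since the last snapshot.
    pub fn is_dirty(&self) -> bool {
        self.dirty
    }
}
```

Clearing the flag inside `snapshot` lets a periodic writer skip surfaces that produced no output since the previous pass.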