# Session Persistence (resurrect + resume) Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** After a daemon restart (reboot / battery / `kill -9`) the user can bring each panel back: it shows its last on-screen state and offers a one-click **Resume** that respawns the agent with its session-continue flag (e.g. `claude --continue`).

**Architecture:** The daemon already persists structure (`state.json`) and already shows stopped panels with a restart overlay; `RestartSurface` already respawns a stopped surface from its spec. This plan adds (1) periodic on-disk snapshots of each surface's visible screen, (2) a `[resume]` config map producing resume args, (3) a `resume` flag on `RestartSurface`, and (4) painting the saved screen behind the overlay plus a Resume button. We reuse `spacesh_core::snapshot::snapshot_ansi` (the live-reattach serializer) for the on-disk snapshot.

**Tech Stack:** Rust (tokio actors, serde, alacritty_terminal grid), Tauri 2 bridge, React/TS + xterm.js.

**Spec:** `docs/superpowers/specs/2026-06-15-session-persistence-design.md`

---

## Orientation (read before starting)

Key existing code this plan builds on:

- `crates/spacesh-core/src/snapshot.rs`: `Snapshot { ansi, cols, rows, cursor_row, cursor_col }` (derives `Serialize` only) and `snapshot_ansi(&GridSurface) -> Snapshot`.
- `crates/spaceshd/src/state_store.rs`: `JsonStateStore` pattern: atomic write (temp → `sync_all` → rename), corrupt-file tolerance. Mirror this for snapshots.
- `crates/spaceshd/src/surface.rs`: surface actor. `spawn_from_spec` → `spawn_surface_deferred` → `run_actor`; eager `spawn_surface` for tests. `SurfaceMsg` enum. `run_actor` owns `grid: GridSurface` and exits via `exit_tx.send((id, code))` after `pty.wait()`.
- `crates/spaceshd/src/server.rs`: `serve(socket, store, event_store)`, the `router` single-task loop over `ServerMsg`, `handle_request`, `RestartSurface` handler, the stopped-`Attach` branch, and ~12 `serve(...)` callsites in `#[cfg(test)]`.
- `crates/spaceshd/src/config.rs`: `Config` with `#[serde(default)]` sub-tables.
- `crates/spacesh-proto/src/message.rs`: `Cmd::RestartSurface { surface_id }`.
- `app/src/LayoutEngine.tsx`: `Leaf` renders the `running[id] === false` overlay ("Process exited" + Restart button).
- `app/src/socketBridge.ts`: `restartSurface`, `AttachResult`. `app/src-tauri/src/bridge.rs`: `restart_surface`, `attach` invoke handlers.

Build/test commands: `cargo test -p spacesh-core`, `cargo test -p spacesh-proto`, `cargo test -p spaceshd`, and `cd app && npx tsc --noEmit`.

---

## Task 1: `Snapshot` gains `Deserialize`

**Files:**
- Modify: `crates/spacesh-core/src/snapshot.rs`
- Test: same file (`#[cfg(test)]` module)

- [ ] **Step 1: Write the failing test**

Add to the `tests` module in `crates/spacesh-core/src/snapshot.rs`:

```rust
#[test]
fn snapshot_round_trips_through_json() {
    let mut g = GridSurface::new(20, 4);
    g.feed(b"hello");
    let snap = snapshot_ansi(&g);
    let json = serde_json::to_string(&snap).unwrap();
    let back: Snapshot = serde_json::from_str(&json).unwrap();
    assert_eq!(back.ansi, snap.ansi);
    assert_eq!((back.cols, back.rows), (snap.cols, snap.rows));
    assert_eq!((back.cursor_row, back.cursor_col), (snap.cursor_row, snap.cursor_col));
}
```

- [ ] **Step 2: Run test to verify it fails**

Run: `cargo test -p spacesh-core snapshot_round_trips_through_json`
Expected: FAIL, because `Snapshot` does not implement `Deserialize` (compile error `the trait bound Snapshot: Deserialize<'_> is not satisfied`).
- [ ] **Step 3: Add the derive**

In `crates/spacesh-core/src/snapshot.rs`, change the `Snapshot` derive and the `serde` import:

```rust
use serde::{Deserialize, Serialize};
```

```rust
/// Serializable snapshot returned by `attach` and persisted to disk.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Snapshot {
    /// ANSI byte dump suitable for `xterm.write()`.
    pub ansi: String,
    pub cols: u16,
    pub rows: u16,
    /// 1-based cursor position.
    pub cursor_row: u16,
    pub cursor_col: u16,
}
```

(`PartialEq` is added so tests can compare snapshots directly.)

- [ ] **Step 4: Run test to verify it passes**

Run: `cargo test -p spacesh-core snapshot_round_trips_through_json`
Expected: PASS. Also run `cargo test -p spacesh-core` and confirm everything is green.

- [ ] **Step 5: Commit**

```bash
git add crates/spacesh-core/src/snapshot.rs
git commit -m "feat(core): Snapshot derives Deserialize + PartialEq for disk persistence"
```

---

## Task 2: `snapshot_store` (per-surface disk store)

**Files:**
- Create: `crates/spaceshd/src/snapshot_store.rs`
- Modify: `crates/spaceshd/src/main.rs` (add `mod snapshot_store;`)
- Test: in the new file's `#[cfg(test)]` module

- [ ] **Step 1: Register the module**

In `crates/spaceshd/src/main.rs`, add to the module list (keep alphabetical near `state_store`):

```rust
mod snapshot_store;
```

- [ ] **Step 2: Write the store and its tests**

Create `crates/spaceshd/src/snapshot_store.rs` with the types, the implementation, and the test module together (the crate will not compile between Step 1 and this step, because `mod snapshot_store;` points at a file that does not exist yet):

```rust
use std::path::PathBuf;

use spacesh_core::snapshot::Snapshot;
use spacesh_proto::SurfaceId;

/// Stores one visible-screen snapshot per surface as `<dir>/<surface_id>.json`.
pub trait SnapshotStore: Send + Sync {
    fn save(&self, sid: &SurfaceId, snap: &Snapshot);
    fn load(&self, sid: &SurfaceId) -> Option<Snapshot>;
    fn remove(&self, sid: &SurfaceId);
}

/// Writer command: persist or delete a surface's snapshot.
/// Shared by the router ticker, the close/remove paths, and each actor's
/// on-exit dump, so a single channel type flows everywhere.
pub enum SnapshotMsg {
    Save(SurfaceId, Snapshot),
    Remove(SurfaceId),
}

/// A no-op store for tests and contexts that do not persist snapshots.
pub struct NullSnapshotStore;

impl SnapshotStore for NullSnapshotStore {
    fn save(&self, _sid: &SurfaceId, _snap: &Snapshot) {}
    fn load(&self, _sid: &SurfaceId) -> Option<Snapshot> { None }
    fn remove(&self, _sid: &SurfaceId) {}
}

/// JSON file store. Filenames are the surface id (e.g. `s_1f.json`); ids are
/// `^[a-z]_[0-9a-f]+$` so they are always safe path components.
pub struct JsonSnapshotStore {
    dir: PathBuf,
}

impl JsonSnapshotStore {
    pub fn new(dir: PathBuf) -> Self {
        let _ = std::fs::create_dir_all(&dir);
        Self { dir }
    }

    fn path(&self, sid: &SurfaceId) -> PathBuf {
        self.dir.join(format!("{}.json", sid.0))
    }
}

impl SnapshotStore for JsonSnapshotStore {
    fn save(&self, sid: &SurfaceId, snap: &Snapshot) {
        let path = self.path(sid);
        let tmp = path.with_extension("json.tmp");
        let Ok(bytes) = serde_json::to_vec(snap) else { return };
        if std::fs::write(&tmp, &bytes).is_err() {
            return;
        }
        // Best-effort fsync of the temp file before the rename publishes it.
        if let Ok(f) = std::fs::File::open(&tmp) {
            let _ = f.sync_all();
        }
        let _ = std::fs::rename(&tmp, &path);
    }

    fn load(&self, sid: &SurfaceId) -> Option<Snapshot> {
        let bytes = std::fs::read(self.path(sid)).ok()?;
        serde_json::from_slice(&bytes).ok()
    }

    fn remove(&self, sid: &SurfaceId) {
        let _ = std::fs::remove_file(self.path(sid));
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    fn tmp_dir(name: &str) -> PathBuf {
        let n = std::time::SystemTime::now()
            .duration_since(std::time::UNIX_EPOCH)
            .unwrap()
            .as_nanos();
        let p = std::env::temp_dir().join(format!("spacesh-snap-{name}-{n}"));
        std::fs::create_dir_all(&p).unwrap();
        p
    }

    fn sample() -> Snapshot {
        Snapshot { ansi: "\u{1b}[mhello".into(), cols: 80, rows: 24, cursor_row: 1, cursor_col: 6 }
    }

    #[test]
    fn save_then_load_round_trips() {
        let dir = tmp_dir("roundtrip");
        let store =
            JsonSnapshotStore::new(dir.clone());
        let sid = SurfaceId("s_1".into());
        store.save(&sid, &sample());
        assert_eq!(store.load(&sid), Some(sample()));
        let _ = std::fs::remove_dir_all(dir);
    }

    #[test]
    fn missing_loads_none() {
        let store = JsonSnapshotStore::new(tmp_dir("missing"));
        assert_eq!(store.load(&SurfaceId("s_none".into())), None);
    }

    #[test]
    fn corrupt_loads_none() {
        let dir = tmp_dir("corrupt");
        let store = JsonSnapshotStore::new(dir.clone());
        let sid = SurfaceId("s_2".into());
        std::fs::write(dir.join("s_2.json"), b"{ not json").unwrap();
        assert_eq!(store.load(&sid), None);
        let _ = std::fs::remove_dir_all(dir);
    }

    #[test]
    fn remove_deletes_file() {
        let dir = tmp_dir("remove");
        let store = JsonSnapshotStore::new(dir.clone());
        let sid = SurfaceId("s_3".into());
        store.save(&sid, &sample());
        assert!(store.load(&sid).is_some());
        store.remove(&sid);
        assert_eq!(store.load(&sid), None);
        let _ = std::fs::remove_dir_all(dir);
    }

    #[test]
    fn null_store_is_inert() {
        let store = NullSnapshotStore;
        let sid = SurfaceId("s_4".into());
        store.save(&sid, &sample());
        assert_eq!(store.load(&sid), None);
        store.remove(&sid);
    }
}
```

- [ ] **Step 3: Run tests to verify they pass**

The module body above already contains the implementation, so this task writes test + impl together (the store is pure I/O with no logic worth a red-then-green split beyond compilation).

Run: `cargo test -p spaceshd snapshot_store`
Expected: PASS, 5 tests (`save_then_load_round_trips`, `missing_loads_none`, `corrupt_loads_none`, `remove_deletes_file`, `null_store_is_inert`).
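The temp → `sync_all` → rename pattern used by the store can also be written so that the sync happens on the same file handle that performed the write. The following is a minimal std-only sketch of that variant, not the plan's `JsonSnapshotStore`; the helper names `atomic_write` and `read_opt` are hypothetical and exist only for illustration.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: write `bytes` to `path` via a sibling temp file.
/// The temp file is written and fsynced through one handle, then renamed
/// over the destination so readers see either the old or the new content.
pub fn atomic_write(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // `s_1.json` becomes `s_1.json.tmp`, matching the plan's temp naming.
    let tmp = path.with_extension("json.tmp");
    {
        let mut f = File::create(&tmp)?;
        f.write_all(bytes)?;
        f.sync_all()?; // flush data and metadata before publishing
    } // handle is closed here, before the rename
    fs::rename(&tmp, path)
}

/// Hypothetical helper: read a file, mapping any I/O error to `None`
/// (the same "missing means None" behaviour the store's `load` has).
pub fn read_opt(path: &Path) -> Option<Vec<u8>> {
    fs::read(path).ok()
}
```

Unlike the store's `save`, this sketch propagates errors with `?`; the plan deliberately swallows them because a lost snapshot is not fatal to the daemon.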
- [ ] **Step 4: Commit**

```bash
git add crates/spaceshd/src/snapshot_store.rs crates/spaceshd/src/main.rs
git commit -m "feat(daemon): per-surface JSON snapshot store (atomic write, corrupt-tolerant)"
```

---

## Task 3: Resume config + snapshot interval

**Files:**
- Modify: `crates/spaceshd/src/config.rs`
- Test: same file (`#[cfg(test)]` module)

- [ ] **Step 1: Write the failing test**

Add to the `tests` module in `crates/spaceshd/src/config.rs`:

```rust
#[test]
fn resume_args_user_then_default_then_none() {
    let mut c = Config::default();
    // built-in defaults present without any config
    assert_eq!(c.resume_args("claude").as_deref(), Some(&["--continue".to_string()][..]));
    assert_eq!(c.resume_args("codex").as_deref(), Some(&["resume".to_string()][..]));
    // a path is reduced to its basename before lookup
    assert_eq!(c.resume_args("/usr/local/bin/claude").as_deref(), Some(&["--continue".to_string()][..]));
    // unknown command → None
    assert_eq!(c.resume_args("bash"), None);
    // user override wins over the default
    c.resume.commands.insert("claude".into(), vec!["--resume".into(), "last".into()]);
    assert_eq!(c.resume_args("claude"), Some(vec!["--resume".into(), "last".into()]));
}

#[test]
fn snapshot_interval_defaults_to_5s() {
    let c = Config::default();
    assert_eq!(c.snapshot_interval_secs(), 5);
}

#[test]
fn parses_resume_table_and_interval() {
    let dir = std::env::temp_dir().join(format!("spacesh-cfg-resume-{}", std::process::id()));
    std::fs::create_dir_all(&dir).unwrap();
    let path = dir.join("config.toml");
    std::fs::write(&path, "snapshot_interval_secs = 10\n[resume.commands]\ngemini = [\"--resume\"]\n").unwrap();
    let c = Config::from_path(&path);
    assert_eq!(c.snapshot_interval_secs(), 10);
    assert_eq!(c.resume_args("gemini"), Some(vec!["--resume".into()]));
    let _ = std::fs::remove_file(&path);
}
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `cargo test -p spaceshd resume_args_user_then_default_then_none`
Expected: FAIL with a compile error: no field `resume`, no
method `resume_args`/`snapshot_interval_secs`.

- [ ] **Step 3: Implement config additions**

In `crates/spaceshd/src/config.rs`, add the struct and a default table, and extend `Config`:

```rust
/// Built-in resume args for known agents, used when config has no override.
/// (command basename, resume args)
const DEFAULT_RESUME: &[(&str, &[&str])] = &[
    ("claude", &["--continue"]),
    ("codex", &["resume"]),
    ("deepseek", &["resume"]),
];

#[derive(Debug, Clone, Default, Deserialize, Serialize)]
pub struct ResumeConfig {
    /// command basename -> args that continue its previous session.
    #[serde(default)]
    pub commands: std::collections::HashMap<String, Vec<String>>,
}
```

Add the fields to `Config`:

```rust
#[derive(Debug, Clone, Default, Deserialize, Serialize)]
pub struct Config {
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub default_shell: Option<String>,
    #[serde(default)]
    pub terminal: TerminalConfig,
    #[serde(default)]
    pub appearance: AppearanceConfig,
    #[serde(default)]
    pub resume: ResumeConfig,
    /// How often (seconds) the daemon dumps changed grids to disk.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub snapshot_interval_secs: Option<u64>,
}
```

Add the resolver methods in the `impl Config` block:

```rust
/// Resume args for a command, by basename: user map → built-in default → None.
pub fn resume_args(&self, command: &str) -> Option<Vec<String>> {
    let base = std::path::Path::new(command)
        .file_name()
        .map(|s| s.to_string_lossy().to_string())
        .unwrap_or_else(|| command.to_string());
    if let Some(args) = self.resume.commands.get(&base) {
        return Some(args.clone());
    }
    DEFAULT_RESUME.iter()
        .find(|(name, _)| *name == base)
        .map(|(_, args)| args.iter().map(|s| s.to_string()).collect())
}

/// Snapshot dump cadence in seconds (config → default 5, clamped to [1, 3600]).
pub fn snapshot_interval_secs(&self) -> u64 {
    self.snapshot_interval_secs.unwrap_or(5).clamp(1, 3600)
}
```

- [ ] **Step 4: Run tests to verify they pass**

Run: `cargo test -p spaceshd config`
Expected: PASS, including the three new tests and the existing config tests.

- [ ] **Step 5: Commit**

```bash
git add crates/spaceshd/src/config.rs
git commit -m "feat(daemon): [resume] config map + snapshot_interval_secs with built-in defaults"
```

---

## Task 4: Actor `Snapshot` message + dirty flag + on-exit dump

**Files:**
- Modify: `crates/spaceshd/src/surface.rs`
- Test: same file (`#[cfg(test)]` module)

This adds a snapshot channel threaded through every spawn entry point. The channel carries `SnapshotMsg` (defined in Task 2) to the writer (Task 5); here the actor only ever sends `SnapshotMsg::Save(id, snap)` on exit and answers on-demand `SurfaceMsg::Snapshot` requests. Add the import at the top of `surface.rs`: `use crate::snapshot_store::SnapshotMsg;`.

- [ ] **Step 1: Write the failing tests**

Add to the `tests` module in `crates/spaceshd/src/surface.rs`. Note the existing test helper `spawn_surface(...)` signature gains a trailing `snapshot_tx`; these tests use it.

```rust
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn snapshot_msg_returns_grid_and_tracks_dirty() {
    let _serial = crate::test_support::serial();
    let pty = PtyHandle::spawn(spec("printf DIRTYME; sleep 0.4")).unwrap();
    let (state_tx, _s) = mpsc::unbounded_channel();
    let (exit_tx, _e) = mpsc::unbounded_channel();
    let (snap_tx, _snap_rx) = mpsc::unbounded_channel();
    let handle = spawn_surface(
        SurfaceId("s_1".into()), WorkspaceId("w_1".into()),
        pty, 80, 24, false, state_tx, exit_tx, snap_tx,
    );

    // Give the child time to print.
    tokio::time::sleep(Duration::from_millis(150)).await;

    let (reply_tx, reply_rx) = oneshot::channel();
    handle.tx.send(SurfaceMsg::Snapshot { reply: reply_tx }).await.unwrap();
    let (snap, dirty) = reply_rx.await.unwrap();
    assert!(snap.ansi.contains("DIRTYME"), "snapshot: {:?}", snap.ansi);
    assert!(dirty, "first snapshot after output should be dirty");

    // Immediately snapshot again with no new output → not dirty.
    let (reply_tx, reply_rx) = oneshot::channel();
    handle.tx.send(SurfaceMsg::Snapshot { reply: reply_tx }).await.unwrap();
    let (_snap2, dirty2) = reply_rx.await.unwrap();
    assert!(!dirty2, "second snapshot with no new output should be clean");
}

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn final_snapshot_sent_on_exit() {
    let _serial = crate::test_support::serial();
    let pty = PtyHandle::spawn(spec("printf BYE")).unwrap(); // exits immediately
    let (state_tx, _s) = mpsc::unbounded_channel();
    let (exit_tx, _e) = mpsc::unbounded_channel();
    let (snap_tx, mut snap_rx) = mpsc::unbounded_channel();
    let _handle = spawn_surface(
        SurfaceId("s_x".into()), WorkspaceId("w_1".into()),
        pty, 80, 24, false, state_tx, exit_tx, snap_tx,
    );

    let msg = tokio::time::timeout(Duration::from_secs(2), snap_rx.recv()).await.unwrap().unwrap();
    match msg {
        crate::snapshot_store::SnapshotMsg::Save(sid, snap) => {
            assert_eq!(sid.0, "s_x");
            assert!(snap.ansi.contains("BYE"), "final snapshot: {:?}", snap.ansi);
        }
        _ => panic!("expected a Save message on exit"),
    }
}
```

- [ ] **Step 2: Run tests to verify they fail**

Run: `cargo test -p spaceshd snapshot_msg_returns_grid_and_tracks_dirty`
Expected: FAIL with a compile error: `SurfaceMsg::Snapshot` variant missing and `spawn_surface` takes too few arguments.
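The contract these two tests pin down for the dirty flag (set when output arrives, reported and cleared when a snapshot is taken) can be sketched in isolation from the actor. This is an illustrative stand-in only: `Screen`, `feed`, and `snapshot` are hypothetical names, and a plain `String` replaces the real `GridSurface`.

```rust
/// Hypothetical stand-in for the actor's grid plus its dirty flag.
pub struct Screen {
    text: String,
    dirty: bool,
}

impl Screen {
    pub fn new() -> Self {
        Self { text: String::new(), dirty: false }
    }

    /// Output arrived: update the screen and mark it dirty.
    pub fn feed(&mut self, bytes: &[u8]) {
        self.text.push_str(&String::from_utf8_lossy(bytes));
        self.dirty = true;
    }

    /// Return the current contents plus "changed since the last snapshot?",
    /// clearing the flag in the same step so the next call reports clean.
    pub fn snapshot(&mut self) -> (String, bool) {
        let was_dirty = std::mem::replace(&mut self.dirty, false);
        (self.text.clone(), was_dirty)
    }
}
```

Reading and clearing the flag in one step is what lets a periodic ticker skip surfaces that produced no output since the previous tick.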
- [ ] **Step 3: Add the message variant and snapshot channel**

In `crates/spaceshd/src/surface.rs`:

Add the variant to `SurfaceMsg`:

```rust
pub enum SurfaceMsg {
    Input(Vec<u8>),
    Resize { cols: u16, rows: u16 },
    Attach { reply: oneshot::Sender<broadcast::Receiver<Vec<u8>>> },
    /// Attach with snapshot: subscribe AND capture the grid in one actor turn.
    AttachSnapshot { reply: oneshot::Sender<(Snapshot, broadcast::Receiver<Vec<u8>>)> },
    /// On-demand snapshot without subscribing; bool = dirty since last snapshot.
    Snapshot { reply: oneshot::Sender<(Snapshot, bool)> },
    Close,
}
```

Thread a `snapshot_tx: mpsc::UnboundedSender<SnapshotMsg>` parameter through `spawn_from_spec`, `spawn_surface`, `spawn_surface_deferred`, and `run_actor`. For each, add the parameter (last position) and pass it down.

`spawn_from_spec` signature + body:

```rust
#[allow(clippy::too_many_arguments)]
pub fn spawn_from_spec(
    id: SurfaceId,
    workspace_id: WorkspaceId,
    spec: &SurfaceSpec,
    extra_env: Vec<(String, String)>,
    hooks_active: bool,
    state_tx: mpsc::UnboundedSender<(SurfaceId, SurfaceState)>,
    exit_tx: mpsc::UnboundedSender<(SurfaceId, i32)>,
    snapshot_tx: mpsc::UnboundedSender<SnapshotMsg>,
) -> std::io::Result<SurfaceHandle> {
    let mut env = vec![("SPACESH_SURFACE_ID".to_string(), id.0.clone())];
    env.extend(extra_env);
    let spawn_spec = SpawnSpec {
        command: spec.command.clone(),
        args: spec.args.clone(),
        cwd: std::path::PathBuf::from(&spec.cwd),
        cols: spec.cols,
        rows: spec.rows,
        env,
    };
    Ok(spawn_surface_deferred(id, workspace_id, spawn_spec, spec.cols, spec.rows, hooks_active, state_tx, exit_tx, snapshot_tx))
}
```

`spawn_surface` (eager, test path):

```rust
#[allow(clippy::too_many_arguments)]
pub fn spawn_surface(
    id: SurfaceId,
    workspace_id: WorkspaceId,
    pty: PtyHandle,
    cols: u16,
    rows: u16,
    hooks_active: bool,
    state_tx: mpsc::UnboundedSender<(SurfaceId, SurfaceState)>,
    exit_tx: mpsc::UnboundedSender<(SurfaceId, i32)>,
    snapshot_tx: mpsc::UnboundedSender<SnapshotMsg>,
) -> SurfaceHandle {
    let (tx, rx) = mpsc::channel::<SurfaceMsg>(64);
    let (bcast, _) = broadcast::channel::<Vec<u8>>(BROADCAST_CAP);
    tokio::spawn(run_actor(id.clone(), pty, cols, rows, hooks_active, bcast, rx, state_tx, exit_tx, Vec::new(), snapshot_tx));
    SurfaceHandle { id, workspace_id, tx }
}
```

`spawn_surface_deferred`: add `snapshot_tx: mpsc::UnboundedSender<SnapshotMsg>` as the final parameter; inside the pre-spawn loop, answer the new message with the empty grid; and pass `snapshot_tx` into `run_actor`. In the pre-spawn `select!`, add:

```rust
Some(SurfaceMsg::Snapshot { reply }) => {
    let snap = snapshot_ansi(&GridSurface::new(cols, rows));
    let _ = reply.send((snap, false));
}
```

and change the spawn call:

```rust
Ok(pty) => run_actor(actor_id, pty, cols, rows, hooks_active, bcast, rx, state_tx, exit_tx, prebuf, snapshot_tx).await,
```

`run_actor`: add `snapshot_tx: mpsc::UnboundedSender<SnapshotMsg>` as the final parameter. Introduce a `dirty` flag, set it when output arrives, clear it on a snapshot, answer the new message, and send the final snapshot on exit. The relevant edits inside `run_actor`'s grid block:

Declare alongside the other loop locals:

```rust
let mut dirty = false;
```

In the `SurfaceMsg::AttachSnapshot` arm, after building `snap`, also clear dirty (the screen has just been handed out fresh):

```rust
Some(SurfaceMsg::AttachSnapshot { reply }) => {
    let sub = bcast.subscribe();
    let snap = snapshot_ansi(&grid);
    dirty = false;
    let _ = reply.send((snap, sub));
}
```

Add the new arm next to it:

```rust
Some(SurfaceMsg::Snapshot { reply }) => {
    let snap = snapshot_ansi(&grid);
    let was_dirty = dirty;
    dirty = false;
    let _ = reply.send((snap, was_dirty));
}
```

In the PTY output arm, when bytes arrive (the `Some(bytes) =>` branch), set `dirty = true;` after extending `pending`:

```rust
Some(bytes) => {
    pending.extend_from_slice(&bytes);
    dirty = true;
    if flush_deadline.is_none() {
        flush_deadline = Some(Instant::now() + FLUSH_INTERVAL);
    }
    if pending.len() >= FLUSH_BYTES {
        flush(&mut pending, &mut grid, &mut osc, &mut deterministic, &mut last_state, &detect_id, &bcast, &state_tx);
        flush_deadline = None;
    }
}
```

Replace the exit tail of the block (currently `let code = pty.wait(); let _ = exit_tx.send((actor_id, code));`) with a version that sends a final snapshot first:

```rust
        let final_snap = snapshot_ansi(&grid);
        let _ = snapshot_tx.send(SnapshotMsg::Save(actor_id.clone(), final_snap));
        let code = pty.wait();
        let _ = exit_tx.send((actor_id, code));
    }
}
```

> Note: `actor_id` is currently moved into `detect_id`/used once; clone as needed so it is available for both the snapshot send and `exit_tx`. If the compiler reports a move, change the earlier `let detect_id = id;` / `let actor_id = id.clone();` setup so both `actor_id` (cloneable) and `detect_id` exist, and use `actor_id.clone()` for the snapshot send.

Update the existing in-file tests `attach_receives_output` and `attach_snapshot_reflects_prior_output` (and any other `spawn_surface(...)` callers in this file's tests) to pass a snapshot sender. Add `let (snap_tx, _snap_rx) = mpsc::unbounded_channel();` before each `spawn_surface` call and append `, snap_tx` to the call.

- [ ] **Step 4: Run tests to verify they pass**

Run: `cargo test -p spaceshd -- surface`
Expected: PASS, covering the two new tests plus the pre-existing surface tests (now passing the extra arg).

- [ ] **Step 5: Commit**

```bash
git add crates/spaceshd/src/surface.rs
git commit -m "feat(daemon): actor Snapshot message + dirty tracking + final snapshot on exit"
```

---

## Task 5: Snapshot writer task

**Files:**
- Modify: `crates/spaceshd/src/snapshot_store.rs`
- Test: same file (`#[cfg(test)]` module)

The writer owns the store and serializes all disk writes off the router/actor hot paths. It accepts saves and removes over one channel.

- [ ] **Step 1: Add the writer and its test**

Add to `crates/spaceshd/src/snapshot_store.rs` (`SnapshotMsg` was already defined in Task 2; this task adds only the writer + its test). The test needs tokio:

```rust
/// Spawn the writer task; returns the sender used by the router and actors.
pub fn spawn_writer(
    store: std::sync::Arc<dyn SnapshotStore>,
) -> tokio::sync::mpsc::UnboundedSender<SnapshotMsg> {
    let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<SnapshotMsg>();
    tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            match msg {
                SnapshotMsg::Save(sid, snap) => store.save(&sid, &snap),
                SnapshotMsg::Remove(sid) => store.remove(&sid),
            }
        }
    });
    tx
}
```

Test:

```rust
#[tokio::test]
async fn writer_saves_and_removes() {
    let dir = tmp_dir("writer");
    let store: std::sync::Arc<dyn SnapshotStore> =
        std::sync::Arc::new(JsonSnapshotStore::new(dir.clone()));
    let tx = spawn_writer(store.clone());
    let sid = SurfaceId("s_w".into());

    tx.send(SnapshotMsg::Save(sid.clone(), sample())).unwrap();
    // Poll until the writer has flushed (bounded).
    let mut saved = None;
    for _ in 0..50 {
        if let Some(s) = store.load(&sid) {
            saved = Some(s);
            break;
        }
        tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    }
    assert_eq!(saved, Some(sample()));

    tx.send(SnapshotMsg::Remove(sid.clone())).unwrap();
    let mut gone = false;
    for _ in 0..50 {
        if store.load(&sid).is_none() {
            gone = true;
            break;
        }
        tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    }
    assert!(gone, "writer should have removed the snapshot file");
    let _ = std::fs::remove_dir_all(dir);
}
```

- [ ] **Step 2: Run test to verify it passes**

Implementation is included above (the writer is a thin loop).

Run: `cargo test -p spaceshd writer_saves_and_removes`
Expected: PASS.
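Funnelling every save and remove through one channel means a single consumer applies them in the order they were sent, so a `Remove` queued after a `Save` for the same surface cannot be overtaken by it. The sketch below illustrates that property with a std thread and an in-memory map in place of the tokio task and the JSON store; `Msg`, `MemStore`, and `spawn_mem_writer` are hypothetical names used only here.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

/// Hypothetical analogue of `SnapshotMsg`, keyed by plain strings.
pub enum Msg {
    Save(String, String),
    Remove(String),
}

/// Hypothetical in-memory stand-in for the on-disk snapshot store.
pub type MemStore = Arc<Mutex<HashMap<String, String>>>;

/// Spawn a single consumer that applies messages in send order.
/// The thread ends once every `Sender` clone has been dropped.
pub fn spawn_mem_writer(store: MemStore) -> (Sender<Msg>, JoinHandle<()>) {
    let (tx, rx) = channel::<Msg>();
    let handle = thread::spawn(move || {
        for msg in rx {
            let mut map = store.lock().unwrap();
            match msg {
                Msg::Save(id, snap) => {
                    map.insert(id, snap);
                }
                Msg::Remove(id) => {
                    map.remove(&id);
                }
            }
        }
    });
    (tx, handle)
}
```

Returning the `JoinHandle` lets a caller drop the sender and join the thread to know that all queued messages were applied, which avoids the bounded polling the tokio test above relies on.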
- [ ] **Step 3: Commit**

```bash
git add crates/spaceshd/src/snapshot_store.rs
git commit -m "feat(daemon): snapshot writer task (Save/Remove over one channel)"
```

---

## Task 6: Server wiring (store param, ticker, stopped-Attach reads disk, remove on close)

**Files:**
- Modify: `crates/spaceshd/src/server.rs`
- Modify: `crates/spaceshd/src/main.rs`
- Test: `crates/spaceshd/src/server.rs` (`#[cfg(test)]`)

- [ ] **Step 1: Thread the snapshot store into `serve` and `router`**

In `crates/spaceshd/src/server.rs`:

Add imports near the other `use crate::...` lines:

```rust
use crate::snapshot_store::{SnapshotStore, SnapshotMsg, spawn_writer};
```

Change `serve` to accept the store, build the writer + ticker, and pass both the writer sender and an `Arc` clone (for reads) into `router`:

```rust
pub async fn serve(
    socket: &Path,
    store: Arc,       // existing parameter, type unchanged
    event_store: Arc, // existing parameter, type unchanged
    snapshot_store: Arc<dyn SnapshotStore>,
) -> Result<()> {
    let listener = UnixListener::bind(socket)?;
    let (router_tx, router_rx) = mpsc::channel::<ServerMsg>(256);
    // ... existing exit_tx / state_tx bridges unchanged ...

    let snapshot_tx = spawn_writer(snapshot_store.clone());

    // Periodic snapshot tick → router.
    let tick_router = router_tx.clone();
    let interval_secs = crate::config::Config::load().snapshot_interval_secs();
    tokio::spawn(async move {
        let mut tick = tokio::time::interval(Duration::from_secs(interval_secs));
        tick.tick().await; // consume the immediate first tick
        loop {
            tick.tick().await;
            if tick_router.send(ServerMsg::SnapshotTick).await.is_err() {
                break;
            }
        }
    });

    let persister = persist::spawn(store.clone(), Duration::from_millis(500));
    let initial = store.load().unwrap_or_default();
    let event_persister = event_store::spawn(event_store.clone(), Duration::from_millis(500));
    let event_initial = event_store.load().unwrap_or_default();
    let started_at_ms = now_millis();
    let shutdown = tokio::spawn(router(
        router_rx, router_tx.clone(), exit_tx, state_tx, persister, initial,
        event_persister, event_initial, started_at_ms,
        snapshot_store, snapshot_tx,
    ));
    // ... existing accept loop unchanged ...
}
```

Add `SnapshotTick` to the `ServerMsg` enum (around line 23):

```rust
enum ServerMsg {
    // ... existing variants ...
    SnapshotTick,
}
```

Change `router`'s signature to take the two new params (final positions):

```rust
async fn router(
    mut rx: mpsc::Receiver<ServerMsg>,
    router_tx: mpsc::Sender<ServerMsg>,
    exit_tx: mpsc::UnboundedSender<(SurfaceId, i32)>,
    state_tx: mpsc::UnboundedSender<(SurfaceId, SurfaceState)>,
    persister: Persister,
    initial: crate::state_store::PersistState,
    event_persister: EventPersister,
    event_initial: crate::event_log::EventLogState,
    started_at_ms: u64,
    snapshot_store: Arc<dyn SnapshotStore>,
    snapshot_tx: mpsc::UnboundedSender<SnapshotMsg>,
) {
```

- [ ] **Step 2: Handle `SnapshotTick` and thread the snapshot sender to spawns**

In the `router` match loop, add the tick arm.
It snapshots each live surface and forwards dirty ones to the writer:

```rust
ServerMsg::SnapshotTick => {
    let ids: Vec<SurfaceId> = reg.live_ids();
    for sid in ids {
        let Some(handle) = reg.live(&sid) else { continue };
        let (reply_tx, reply_rx) = oneshot::channel();
        if handle.tx.send(SurfaceMsg::Snapshot { reply: reply_tx }).await.is_err() {
            continue;
        }
        if let Ok((snap, dirty)) = reply_rx.await {
            if dirty {
                let _ = snapshot_tx.send(SnapshotMsg::Save(sid.clone(), snap));
            }
        }
    }
}
```

This needs a `live_ids()` accessor on `Registry`. In `crates/spaceshd/src/registry.rs` add:

```rust
/// Ids of all currently-live surfaces.
pub fn live_ids(&self) -> Vec<SurfaceId> {
    self.live.keys().cloned().collect()
}
```

Pass `snapshot_tx.clone()` into every `spawn_from_spec(...)` call inside `handle_request`. There are four callsites (NewSurface, SplitSurface, ApplyPreset, RestartSurface). Each currently ends `..., state_tx.clone(), exit_tx.clone())`; change to `..., state_tx.clone(), exit_tx.clone(), snapshot_tx.clone())`.

To make `snapshot_tx` reachable inside `handle_request`, add it as a parameter to `handle_request` and pass it from the `ServerMsg::Request` arm:

```rust
ServerMsg::Request { id, cmd, client, out } => {
    handle_request(id, cmd, client, out, &mut reg, &mut subs, &clients, &router_tx,
        &exit_tx, &state_tx, &persister, &mut event_log, &event_persister,
        started_at_ms, &mut config, &snapshot_store, &snapshot_tx).await;
}
```

and in `handle_request`'s signature add the two trailing params:

```rust
snapshot_store: &Arc<dyn SnapshotStore>,
snapshot_tx: &mpsc::UnboundedSender<SnapshotMsg>,
```

- [ ] **Step 3: Stopped-`Attach` returns the disk snapshot; close/remove deletes it**

In the `Cmd::Attach` handler, replace the stopped-panel branch (the `else` that returns the empty snapshot) with a disk read:

```rust
} else {
    // stopped panel: no live stream. Paint the last on-disk screen if we have one.
    match snapshot_store.load(&surface_id) {
        Some(snap) => {
            let _ = out.send(ok(id, serde_json::json!({
                "snapshot": snap.ansi,
                "cols": snap.cols,
                "rows": snap.rows,
                "cursor_row": snap.cursor_row,
                "cursor_col": snap.cursor_col,
                "stopped": true,
            }))).await;
        }
        None => {
            let _ = out.send(ok(id, serde_json::json!({ "snapshot": "", "cols": 0, "rows": 0, "stopped": true }))).await;
        }
    }
}
```

In the `Cmd::Close` handler and `Cmd::CloseWorkspace` handler, after the surface(s) are removed, drop their snapshot files. For `Close { surface_id }` add, right after `reg.remove_surface(&surface_id)` (or wherever the removal happens):

```rust
let _ = snapshot_tx.send(SnapshotMsg::Remove(surface_id.clone()));
```

For `CloseWorkspace { workspace_id }`, the handler already collects `let ids = reg.close_workspace(&workspace_id);`. After the existing cleanup loop, add:

```rust
for sid in &ids {
    let _ = snapshot_tx.send(SnapshotMsg::Remove(sid.clone()));
}
```

- [ ] **Step 4: Update `main.rs` to build and pass the store**

In `crates/spaceshd/src/main.rs`, in `run_daemon`, after the event store is built:

```rust
let snapshots_dir = lifecycle::spacesh_dir()?.join("snapshots");
let snapshot_store: std::sync::Arc<dyn snapshot_store::SnapshotStore> =
    std::sync::Arc::new(snapshot_store::JsonSnapshotStore::new(snapshots_dir));

eprintln!("spaceshd listening on {}", sock.display());
server::serve(&sock, store, event_store, snapshot_store).await
```

- [ ] **Step 5: Fix all `serve(...)` test callsites**

In `crates/spaceshd/src/server.rs`'s `#[cfg(test)]` module there are ~12 calls of the form `serve(&sockX, store, event_store)` (and `..._b` variants). Append a `NullSnapshotStore` argument to each.
Add this import inside the test module:

```rust
use crate::snapshot_store::NullSnapshotStore;
```

and change each call, e.g.:

```rust
tokio::spawn(async move {
    let _ = serve(&sock_for_task, store2, event_store, std::sync::Arc::new(NullSnapshotStore)).await;
});
```

Apply the same `, std::sync::Arc::new(NullSnapshotStore)` insertion before `.await` to **every** `serve(...)` call in the test module (~12 sites, including the `_b` second-daemon ones). Compilation will fail until all are updated; use the compiler errors as the checklist.

- [ ] **Step 6: Write the stopped-Attach integration test**

Add a new test in the `server.rs` test module. It starts a daemon with a real `JsonSnapshotStore` over a temp dir, opens a workspace + surface, lets it print, forces a snapshot tick by waiting (or by closing the surface so the on-exit final snapshot lands), then re-attaches a fresh client and asserts the disk snapshot comes back for the stopped surface.

```rust
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn stopped_attach_returns_disk_snapshot() {
    let _serial = crate::test_support::serial();
    let dir = unique_tmp_dir("stopped-snap"); // use the module's existing temp-dir helper
    let sock = dir.join("sock");
    // `Arc<_>` infers the concrete store type; it coerces to the trait object at the `serve` call.
    let store: std::sync::Arc<_> =
        std::sync::Arc::new(crate::state_store::JsonStateStore::new(dir.join("state.json")));
    let event_store: std::sync::Arc<_> =
        std::sync::Arc::new(crate::event_store::JsonEventStore::new(dir.join("events.json")));
    let snap_store: std::sync::Arc<dyn crate::snapshot_store::SnapshotStore> =
        std::sync::Arc::new(crate::snapshot_store::JsonSnapshotStore::new(dir.join("snapshots")));
    let sock2 = sock.clone();
    tokio::spawn(async move {
        let _ = serve(&sock2, store, event_store, snap_store).await;
    });
    wait_for_socket(&sock).await; // module helper

    let mut c = connect(&sock).await; // module helper
    let ws = open_workspace(&mut c, dir.to_str().unwrap()).await; // adapt to existing helpers
    // The trailing backslash continues the string literal across the line break.
    let sid = new_surface(&mut c, &ws, Some("/bin/sh"), vec!["-c".into(), "printf SNAPDISK; sleep \
0.2".into()]).await; // Let it print and exit; the actor sends a final snapshot on exit. tokio::time::sleep(Duration::from_millis(500)).await; // Fresh client attaches to the now-stopped surface. let mut c2 = connect(&sock).await; let r = req(&mut c2, 99, Cmd::Attach { surface_id: spacesh_proto::SurfaceId(sid.clone()) }).await; let data = res_data(&r); assert_eq!(data["stopped"], serde_json::json!(true)); assert!(data["snapshot"].as_str().unwrap().contains("SNAPDISK"), "snapshot: {:?}", data["snapshot"]); let _ = std::fs::remove_dir_all(dir); } ``` > Adapt the helper calls (`unique_tmp_dir`, `wait_for_socket`, `connect`, `open_workspace`/`new_surface`, `req`, `res_data`) to the exact helpers already used by the neighbouring tests (see `reattach_returns_snapshot_with_prior_output` for the established pattern). The assertions are the contract: `stopped == true` and the ANSI contains the printed marker. - [ ] **Step 7: Run tests** Run: `cargo test -p spaceshd` Expected: PASS — all daemon tests including the new `stopped_attach_returns_disk_snapshot`. Watch for any missed `serve(...)` callsite (compile error) and fix. - [ ] **Step 8: Commit** ```bash git add crates/spaceshd/src/server.rs crates/spaceshd/src/main.rs crates/spaceshd/src/registry.rs git commit -m "feat(daemon): periodic snapshot ticker + stopped-attach reads disk snapshot + cleanup on close" ``` --- ## Task 7: Protocol — `RestartSurface` gains `resume` **Files:** - Modify: `crates/spacesh-proto/src/message.rs` - Test: same file (`#[cfg(test)]`) - [ ] **Step 1: Write the failing test** Add to the `tests` module in `crates/spacesh-proto/src/message.rs`: ```rust #[test] fn restart_surface_resume_defaults_false_and_round_trips() { // Legacy frame without `resume` decodes to false. let legacy = r#"{"kind":"req","id":5,"cmd":{"cmd":"restart_surface","args":{"surface_id":"s_1"}}}"#; let env: Envelope = serde_json::from_str(legacy).unwrap(); match env { Envelope::Req { cmd: Cmd::RestartSurface { resume, .. }, .. 
        } => assert!(!resume),
        _ => panic!("wrong variant"),
    }
    // resume=true round-trips.
    let e = Envelope::Req {
        id: 6,
        cmd: Cmd::RestartSurface { surface_id: SurfaceId("s_1".into()), resume: true },
    };
    let back: Envelope = serde_json::from_str(&serde_json::to_string(&e).unwrap()).unwrap();
    assert_eq!(back, e);
}
```

- [ ] **Step 2: Run test to verify it fails**

Run: `cargo test -p spacesh-proto restart_surface_resume`
Expected: FAIL, because `Cmd::RestartSurface` has no `resume` field.

- [ ] **Step 3: Add the field**

In `crates/spacesh-proto/src/message.rs`, change the variant:

```rust
RestartSurface {
    surface_id: SurfaceId,
    #[serde(default)]
    resume: bool,
},
```

- [ ] **Step 4: Run test to verify it passes**

Run: `cargo test -p spacesh-proto`
Expected: all green.

> This breaks the daemon and Tauri callers that construct `Cmd::RestartSurface`. They are fixed in Tasks 8 and 9; if you build the whole workspace now it will fail to compile there. That is expected and resolved by the next tasks.

- [ ] **Step 5: Commit**

```bash
git add crates/spacesh-proto/src/message.rs
git commit -m "feat(proto): RestartSurface gains resume flag (defaults false)"
```

---

## Task 8: Server honors `resume`

**Files:**
- Modify: `crates/spaceshd/src/server.rs`
- Test: same file (`#[cfg(test)]`)

- [ ] **Step 1: Write the failing test for the pure helper**

Add a unit test (no process spawn) for a helper that swaps args when resuming:

```rust
#[test]
fn resume_spec_swaps_args_when_mapped() {
    use spacesh_proto::workspace::SurfaceSpec;
    let spec = SurfaceSpec {
        command: "claude".into(),
        args: vec!["--foo".into()],
        cwd: "/tmp".into(),
        agent_label: Some("claude".into()),
        cols: 80,
        rows: 24,
        autostart: false,
    };
    let cfg = crate::config::Config::default();
    // resume=false → original args
    let plain = resume_spec(&spec, false, &cfg);
    assert_eq!(plain.args, vec!["--foo".to_string()]);
    // resume=true with a default mapping → resume args
    let resumed = resume_spec(&spec, true, &cfg);
    assert_eq!(resumed.args,
        vec!["--continue".to_string()]);
    // resume=true for an unmapped command → original args (graceful fallback)
    let mut shell = spec.clone();
    shell.command = "bash".into();
    let resumed_shell = resume_spec(&shell, true, &cfg);
    assert_eq!(resumed_shell.args, shell.args);
}
```

- [ ] **Step 2: Run test to verify it fails**

Run: `cargo test -p spaceshd resume_spec_swaps_args_when_mapped`
Expected: FAIL, because `resume_spec` is not defined.

- [ ] **Step 3: Implement the helper and use it in the handler**

Add the helper near `spawn_env` in `crates/spaceshd/src/server.rs`:

```rust
/// Build the spawn spec for a (re)start. When `resume` and the command has a
/// resume mapping, its args are replaced with the resume args; otherwise the
/// original spec args are kept.
fn resume_spec(
    spec: &spacesh_proto::workspace::SurfaceSpec,
    resume: bool,
    cfg: &crate::config::Config,
) -> spacesh_proto::workspace::SurfaceSpec {
    let mut out = spec.clone();
    if resume {
        if let Some(args) = cfg.resume_args(&spec.command) {
            out.args = args;
        }
    }
    out
}
```

Update the `Cmd::RestartSurface` handler to destructure `resume` and spawn from the resume spec:

```rust
Cmd::RestartSurface { surface_id, resume } => {
    if reg.is_running(&surface_id) {
        let _ = out.send(ok(id, serde_json::Value::Null)).await;
        return; // already running
    }
    let Some(spec) = reg.surface_spec(&surface_id) else {
        let _ = out.send(err(id, "NOT_FOUND", "surface")).await;
        return;
    };
    let spec = resume_spec(&spec, resume, config);
    let ws_id = reg.workspace_of(&surface_id).unwrap();
    let (env, hooks_active) = spawn_env(&surface_id, &spec);
    match crate::surface::spawn_from_spec(surface_id.clone(), ws_id.clone(), &spec, env, hooks_active, state_tx.clone(), exit_tx.clone(), snapshot_tx.clone()) {
        Ok(handle) => {
            spawn_output_bridge(surface_id.clone(), &handle, router_tx.clone());
            reg.set_live(handle);
            reg.set_state(&surface_id, spacesh_proto::SurfaceState::Idle);
            broadcast_evt(clients, &Envelope::Evt(Evt::SurfaceRestarted { surface_id:
                surface_id.clone() }));
            let _ = out.send(ok(id, serde_json::Value::Null)).await;
        }
        Err(e) => {
            let _ = out.send(err(id, "SPAWN_FAILED", &e.to_string())).await;
        }
    }
}
```

> `config` is the `&mut Config` already in scope in `handle_request`; pass it as `&*config` / `config` to `resume_spec` (which takes `&Config`). Adjust the borrow as the compiler requires (e.g. `resume_spec(&spec, resume, config)` where `config: &mut Config` coerces to `&Config`). Note: the `snapshot_tx.clone()` added to this `spawn_from_spec` call is the same one threaded in Task 6 Step 2; ensure all four spawn callsites carry it.

- [ ] **Step 4: Run tests to verify they pass**

Run: `cargo test -p spaceshd resume_spec_swaps_args_when_mapped`
Expected: PASS. Then `cargo test -p spaceshd`: all green.

- [ ] **Step 5: Commit**

```bash
git add crates/spaceshd/src/server.rs
git commit -m "feat(daemon): RestartSurface honors resume; swap to resume_args when mapped"
```

---

## Task 9: Tauri bridge + socketBridge resume arg

**Files:**
- Modify: `app/src-tauri/src/bridge.rs`
- Modify: `app/src/socketBridge.ts`
- Test: `cd app && npx tsc --noEmit`

- [ ] **Step 1: Update the Tauri command**

In `app/src-tauri/src/bridge.rs`, change `restart_surface` to accept and forward `resume`:

```rust
#[tauri::command]
pub async fn restart_surface(state: BridgeState<'_>, surface_id: String, resume: bool) -> Result<serde_json::Value, String> {
    data_of(state.request(Cmd::RestartSurface { surface_id: SurfaceId(surface_id), resume }).await.map_err(|e| e.to_string())?)
}
```

Keep the return type the handler already has; `Result<serde_json::Value, String>` above assumes it matches what `data_of` returns. (Any other place in `bridge.rs` constructing `Cmd::RestartSurface` must pass `resume`. The version-handshake/attach code does not; only this handler builds it.)
- [ ] **Step 2: Update the JS binding and AttachResult**

In `app/src/socketBridge.ts`:

```ts
export interface AttachResult {
  snapshot: string;
  cols: number;
  rows: number;
  cursor_row?: number;
  cursor_col?: number;
  stopped?: boolean;
}

export async function restartSurface(surfaceId: string, resume = false): Promise<void> {
  await invoke("restart_surface", { surfaceId, resume });
}
```

- [ ] **Step 3: Verify types compile**

Run: `cd app && npx tsc --noEmit`
Expected: PASS (no type errors). Note: existing callers of `restartSurface(id)` remain valid because `resume` defaults to `false`.

Also build the Rust side: `cargo check -p spaceshd` and `cargo check --manifest-path app/src-tauri/Cargo.toml` (or `cargo check` in `app/src-tauri`). Expected: clean.

- [ ] **Step 4: Commit**

```bash
git add app/src-tauri/src/bridge.rs app/src/socketBridge.ts
git commit -m "feat(app): plumb resume flag through restart_surface bridge + binding"
```

---

## Task 10: Stopped overlay (paint last screen + Resume button)

**Files:**
- Modify: `app/src/LayoutEngine.tsx`
- Test: `cd app && npx tsc --noEmit` + manual check

- [ ] **Step 1: Add a read-only snapshot painter component**

In `app/src/LayoutEngine.tsx`, add a small component that fetches the stopped surface's disk snapshot via `attachSurface` and paints it into a dimmed, read-only xterm. Import what is needed at the top of the file:

```tsx
import { useEffect, useRef } from "react";
import { Terminal } from "@xterm/xterm";
import { attachSurface } from "./socketBridge";
```

(Confirm against `TerminalView.tsx` for the exact xterm import path and theme/font options it uses; mirror them so the dimmed preview matches the live terminal's look. Reuse the same `font`/`palette` props already threaded into `Leaf`.)
```tsx
function StoppedSnapshot({ surfaceId, font, palette }: { surfaceId: string; font: TermFont; palette: TermPalette }) {
  const hostRef = useRef<HTMLDivElement>(null);
  useEffect(() => {
    const host = hostRef.current;
    if (!host) return;
    const term = new Terminal({
      fontFamily: font.family,
      fontSize: font.size,
      theme: palette,
      cursorBlink: false,
      disableStdin: true,
      convertEol: false,
      scrollback: 0,
    });
    term.open(host);
    let disposed = false;
    void attachSurface(surfaceId, () => {}).then((res) => {
      if (!disposed && res.snapshot) term.write(res.snapshot);
    });
    return () => {
      disposed = true;
      term.dispose();
    };
  }, [surfaceId, font, palette]);
  // Host element for the read-only preview; give it the dimmed, behind-the-overlay styling.
  return <div ref={hostRef} />;
}
```

> Use the exact `TermFont`/`TermPalette` types already defined/imported in this file for the `font`/`palette` props (see `Leaf`'s props). If `TerminalView` wraps `Terminal` construction in a helper, prefer reusing that helper instead of constructing `Terminal` directly.

- [ ] **Step 2: Render the snapshot + Resume button in the stopped branch**

Replace the `if (running[id] === false) { ... }` block in `Leaf` with one that layers the snapshot behind centered controls and adds a Resume button. Keep the existing `RotateCw`/`Minimize2` imports; add `Play` from lucide-react at the file's icon import.

```tsx
if (running[id] === false) {
  return card(
Stopped
{zoomed === id && ( )}
  );
}
```

- [ ] **Step 3: Verify types compile**

Run: `cd app && npx tsc --noEmit`
Expected: PASS.

- [ ] **Step 4: Manual verification**

Build and run (`make reinstall` then launch, or `make dev`). Steps:

1. Open a workspace, add a `claude` (or shell) panel, let it print output.
2. Quit the GUI and `pkill -x spaceshd` (simulate reboot), then relaunch the app.
3. The panel shows its **last screen dimmed** with **Resume** + **Restart fresh**.
4. Click **Resume** → the agent relaunches (for claude/codex with its continue flag) and the live terminal returns.

Confirm keypress→echo still feels instant and that focus switches show no prompt-duplication regression.

- [ ] **Step 5: Commit**

```bash
git add app/src/LayoutEngine.tsx
git commit -m "feat(app): stopped panel paints last screen + Resume/Restart fresh controls"
```

---

## Final verification

- [ ] Run the full suite:

  ```bash
  cargo test
  cd app && npx tsc --noEmit
  ```

  Expected: all Rust tests pass; tsc clean.

- [ ] Dispatch a final code review over the whole branch, then use **superpowers:finishing-a-development-branch** to merge.

## Notes / gotchas

- **Snapshot tick blocks the router briefly** while it awaits each live actor's reply. Visible-screen snapshots are tiny and the await is per-surface and sequential; with a 5s cadence this is negligible. Do not move the disk write into the router; it stays in the writer task.
- **Resume is best-effort.** A new process is started; the literal in-flight process cannot survive a daemon death. For agents without a resume mapping, Resume == Restart fresh (original args).
- **`actor_id` move in `run_actor`:** the final-snapshot send needs `actor_id` before `exit_tx` consumes it; clone as the compiler directs.
- **Do not silently skip any `serve(...)` test callsite** (Task 6 Step 5): the compiler enumerates them; fix every one.
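
The `actor_id` note above can be illustrated with a small std-only sketch. It is not the daemon's code: `SurfaceId`, `SnapMsg::Final`, and `finish_actor` are hypothetical stand-ins, and `std::sync::mpsc` replaces the tokio channels the actor presumably uses. The point it shows is only the ownership order: clone the id for the final-snapshot message, then let the exit tuple take the original.

```rust
use std::sync::mpsc;

// Hypothetical stand-in for the daemon's surface id type.
#[derive(Debug, Clone, PartialEq)]
struct SurfaceId(String);

// Hypothetical message type; `Final` is an assumed name for the on-exit snapshot.
#[derive(Debug, PartialEq)]
enum SnapMsg {
    Final(SurfaceId, String),
}

/// Tail of an actor loop: send the final snapshot, then report the exit.
fn finish_actor(
    actor_id: SurfaceId,
    screen: String,
    exit_code: i32,
    snapshot_tx: &mpsc::Sender<SnapMsg>,
    exit_tx: &mpsc::Sender<(SurfaceId, i32)>,
) {
    // Clone for the snapshot message, because the exit tuple below takes ownership.
    let _ = snapshot_tx.send(SnapMsg::Final(actor_id.clone(), screen));
    // `actor_id` is moved here and cannot be used afterwards.
    let _ = exit_tx.send((actor_id, exit_code));
}

fn main() {
    let (snapshot_tx, snapshot_rx) = mpsc::channel();
    let (exit_tx, exit_rx) = mpsc::channel();
    finish_actor(SurfaceId("s_1".into()), "hello".into(), 0, &snapshot_tx, &exit_tx);
    println!("{:?}", snapshot_rx.recv().unwrap());
    println!("{:?}", exit_rx.recv().unwrap());
}
```

Swapping the two sends without the clone fails to compile with a use-after-move error on `actor_id`, which is the compiler hint the note refers to.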