Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Session Persistence (resurrect + resume) Implementation Plan
For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.
Goal: After a daemon restart (reboot / battery / kill -9) the user can bring each panel back: it shows its last on-screen state and offers a one-click Resume that respawns the agent with its session-continue flag (e.g. claude --continue).
Architecture: The daemon already persists structure (state.json) and already shows stopped panels with a restart overlay; RestartSurface already respawns a stopped surface from its spec. This plan adds (1) periodic on-disk snapshots of each surface's visible screen, (2) a [resume] config map producing resume args, (3) a resume flag on RestartSurface, and (4) painting the saved screen behind the overlay plus a Resume button. We reuse spacesh_core::snapshot::snapshot_ansi (the live-reattach serializer) for the on-disk snapshot.
Tech Stack: Rust (tokio actors, serde, alacritty_terminal grid), Tauri 2 bridge, React/TS + xterm.js.
Spec: docs/superpowers/specs/2026-06-15-session-persistence-design.md
Orientation (read before starting)
Key existing code this plan builds on:
- crates/spacesh-core/src/snapshot.rs: Snapshot { ansi, cols, rows, cursor_row, cursor_col } (derives Serialize only) and snapshot_ansi(&GridSurface) -> Snapshot.
- crates/spaceshd/src/state_store.rs: the JsonStateStore pattern, with atomic write (temp → sync_all → rename) and corrupt-file tolerance. Mirror this for snapshots.
- crates/spaceshd/src/surface.rs: the surface actor. spawn_from_spec → spawn_surface_deferred → run_actor; eager spawn_surface for tests. The SurfaceMsg enum. run_actor owns grid: GridSurface and exits via exit_tx.send((id, code)) after pty.wait().
- crates/spaceshd/src/server.rs: serve(socket, store, event_store), the router single-task loop over ServerMsg, handle_request, the RestartSurface handler, the stopped-Attach branch, and ~12 serve(...) callsites in #[cfg(test)].
- crates/spaceshd/src/config.rs: Config with #[serde(default)] sub-tables.
- crates/spacesh-proto/src/message.rs: Cmd::RestartSurface { surface_id }.
- app/src/LayoutEngine.tsx: Leaf renders the running[id] === false overlay ("Process exited" + Restart button).
- app/src/socketBridge.ts: restartSurface, AttachResult.
- app/src-tauri/src/bridge.rs: the restart_surface and attach invoke handlers.
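For reference, the temp → sync_all → rename pattern named for JsonStateStore can be sketched as a standalone helper. This is a minimal illustration under stated assumptions, not the store's actual code: the name atomic_write is hypothetical, and it writes and syncs through a single file handle rather than reopening the temp file.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: write `bytes` to `path` so readers never see a
/// half-written file. Write a sibling temp file, fsync it, then rename it
/// over the target.
fn atomic_write(path: &Path, bytes: &[u8]) -> io::Result<()> {
    // Keep the temp file in the same directory so the rename stays on one filesystem.
    let tmp = path.with_extension("json.tmp");
    {
        let mut f = File::create(&tmp)?;
        f.write_all(bytes)?;
        // Push the contents to disk before the rename makes them visible.
        f.sync_all()?;
    }
    // On POSIX filesystems rename() replaces the destination atomically.
    fs::rename(&tmp, path)
}
```

Full POSIX durability would additionally fsync the parent directory after the rename; the sketch stops at the same point as the pattern described above.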
Build/test commands: cargo test -p spacesh-core, cargo test -p spacesh-proto, cargo test -p spaceshd, and cd app && npx tsc --noEmit.
Task 1: Snapshot gains Deserialize
Files:
- Modify: crates/spacesh-core/src/snapshot.rs
- Test: same file (#[cfg(test)] module)
Step 1: Write the failing test
Add to the tests module in crates/spacesh-core/src/snapshot.rs:
```rust
#[test]
fn snapshot_round_trips_through_json() {
    let mut g = GridSurface::new(20, 4);
    g.feed(b"hello");
    let snap = snapshot_ansi(&g);
    let json = serde_json::to_string(&snap).unwrap();
    let back: Snapshot = serde_json::from_str(&json).unwrap();
    assert_eq!(back.ansi, snap.ansi);
    assert_eq!((back.cols, back.rows), (snap.cols, snap.rows));
    assert_eq!((back.cursor_row, back.cursor_col), (snap.cursor_row, snap.cursor_col));
}
```
- Step 2: Run test to verify it fails
Run: cargo test -p spacesh-core snapshot_round_trips_through_json
Expected: FAIL, because Snapshot does not implement Deserialize (compile error: the trait bound Snapshot: Deserialize<'_> is not satisfied).
- Step 3: Add the derive
In crates/spacesh-core/src/snapshot.rs, change the Snapshot derive and the serde import:
```rust
use serde::{Deserialize, Serialize};

/// Serializable snapshot returned by `attach` and persisted to disk.
#[derive(Debug, Clone, PartialEq, Serialize, Deserialize)]
pub struct Snapshot {
    /// ANSI byte dump suitable for `xterm.write()`.
    pub ansi: String,
    pub cols: u16,
    pub rows: u16,
    /// 1-based cursor position.
    pub cursor_row: u16,
    pub cursor_col: u16,
}
```
(PartialEq is added so tests can compare snapshots directly.)
- Step 4: Run test to verify it passes
Run: cargo test -p spacesh-core snapshot_round_trips_through_json
Expected: PASS. Also run cargo test -p spacesh-core — all green.
- Step 5: Commit
```bash
git add crates/spacesh-core/src/snapshot.rs
git commit -m "feat(core): Snapshot derives Deserialize + PartialEq for disk persistence"
```
Task 2: snapshot_store — per-surface disk store
Files:
- Create: crates/spaceshd/src/snapshot_store.rs
- Modify: crates/spaceshd/src/main.rs (add mod snapshot_store;)
- Test: in the new file's #[cfg(test)] module
Step 1: Register the module
In crates/spaceshd/src/main.rs, add to the module list (keep alphabetical near state_store):
```rust
mod snapshot_store;
```
- Step 2: Write the store and its tests
Create crates/spaceshd/src/snapshot_store.rs with the following contents. The implementation and the tests are written together (see Step 3), so this task has no separate failing step:
```rust
use std::path::PathBuf;

use spacesh_core::snapshot::Snapshot;
use spacesh_proto::SurfaceId;

/// Stores one visible-screen snapshot per surface as `<dir>/<surface_id>.json`.
pub trait SnapshotStore: Send + Sync {
    fn save(&self, sid: &SurfaceId, snap: &Snapshot);
    fn load(&self, sid: &SurfaceId) -> Option<Snapshot>;
    fn remove(&self, sid: &SurfaceId);
}

/// Writer command: persist or delete a surface's snapshot. Shared by the
/// router ticker, the close/remove paths, and each actor's on-exit dump, so a
/// single channel type flows everywhere.
pub enum SnapshotMsg {
    Save(SurfaceId, Snapshot),
    Remove(SurfaceId),
}

/// A no-op store for tests and contexts that do not persist snapshots.
pub struct NullSnapshotStore;

impl SnapshotStore for NullSnapshotStore {
    fn save(&self, _sid: &SurfaceId, _snap: &Snapshot) {}
    fn load(&self, _sid: &SurfaceId) -> Option<Snapshot> { None }
    fn remove(&self, _sid: &SurfaceId) {}
}

/// JSON file store. Filenames are the surface id (e.g. `s_1f.json`); ids are
/// `^[a-z]_[0-9a-f]+$` so they are always safe path components.
pub struct JsonSnapshotStore {
    dir: PathBuf,
}

impl JsonSnapshotStore {
    pub fn new(dir: PathBuf) -> Self {
        let _ = std::fs::create_dir_all(&dir);
        Self { dir }
    }

    fn path(&self, sid: &SurfaceId) -> PathBuf {
        self.dir.join(format!("{}.json", sid.0))
    }
}

impl SnapshotStore for JsonSnapshotStore {
    fn save(&self, sid: &SurfaceId, snap: &Snapshot) {
        // Atomic write, mirroring JsonStateStore: temp file → sync_all → rename.
        let path = self.path(sid);
        let tmp = path.with_extension("json.tmp");
        let Ok(bytes) = serde_json::to_vec(snap) else { return };
        if std::fs::write(&tmp, &bytes).is_err() { return; }
        if let Ok(f) = std::fs::File::open(&tmp) { let _ = f.sync_all(); }
        let _ = std::fs::rename(&tmp, &path);
    }

    fn load(&self, sid: &SurfaceId) -> Option<Snapshot> {
        // Missing and corrupt files both load as None.
        let bytes = std::fs::read(self.path(sid)).ok()?;
        serde_json::from_slice(&bytes).ok()
    }

    fn remove(&self, sid: &SurfaceId) {
        let _ = std::fs::remove_file(self.path(sid));
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    fn tmp_dir(name: &str) -> PathBuf {
        let n = std::time::SystemTime::now().duration_since(std::time::UNIX_EPOCH).unwrap().as_nanos();
        let p = std::env::temp_dir().join(format!("spacesh-snap-{name}-{n}"));
        std::fs::create_dir_all(&p).unwrap();
        p
    }

    fn sample() -> Snapshot {
        Snapshot { ansi: "\u{1b}[mhello".into(), cols: 80, rows: 24, cursor_row: 1, cursor_col: 6 }
    }

    #[test]
    fn save_then_load_round_trips() {
        let dir = tmp_dir("roundtrip");
        let store = JsonSnapshotStore::new(dir.clone());
        let sid = SurfaceId("s_1".into());
        store.save(&sid, &sample());
        assert_eq!(store.load(&sid), Some(sample()));
        let _ = std::fs::remove_dir_all(dir);
    }

    #[test]
    fn missing_loads_none() {
        let dir = tmp_dir("missing");
        let store = JsonSnapshotStore::new(dir.clone());
        assert_eq!(store.load(&SurfaceId("s_none".into())), None);
        let _ = std::fs::remove_dir_all(dir);
    }

    #[test]
    fn corrupt_loads_none() {
        let dir = tmp_dir("corrupt");
        let store = JsonSnapshotStore::new(dir.clone());
        let sid = SurfaceId("s_2".into());
        std::fs::write(dir.join("s_2.json"), b"{ not json").unwrap();
        assert_eq!(store.load(&sid), None);
        let _ = std::fs::remove_dir_all(dir);
    }

    #[test]
    fn remove_deletes_file() {
        let dir = tmp_dir("remove");
        let store = JsonSnapshotStore::new(dir.clone());
        let sid = SurfaceId("s_3".into());
        store.save(&sid, &sample());
        assert!(store.load(&sid).is_some());
        store.remove(&sid);
        assert_eq!(store.load(&sid), None);
        let _ = std::fs::remove_dir_all(dir);
    }

    #[test]
    fn null_store_is_inert() {
        let store = NullSnapshotStore;
        let sid = SurfaceId("s_4".into());
        store.save(&sid, &sample());
        assert_eq!(store.load(&sid), None);
        store.remove(&sid);
    }
}
```
- Step 3: Run tests to verify they pass
The module body above already contains the implementation, so this task writes test + impl together (the store is pure I/O with no logic worth a red-then-green split beyond compilation).
Run: cargo test -p spaceshd snapshot_store
Expected: PASS — 5 tests (save_then_load_round_trips, missing_loads_none, corrupt_loads_none, remove_deletes_file, null_store_is_inert).
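The store's doc comment relies on surface ids matching ^[a-z]_[0-9a-f]+$, which makes them safe path components. If ids could ever arrive from an untrusted source, a guard along these lines could run before path() joins the id onto the directory. This is a std-only sketch (no regex crate); the name is_safe_surface_id is hypothetical.

```rust
/// Hypothetical guard: true only if `id` matches `^[a-z]_[0-9a-f]+$`,
/// i.e. one lowercase letter, an underscore, then one or more lowercase hex digits.
fn is_safe_surface_id(id: &str) -> bool {
    let b = id.as_bytes();
    b.len() >= 3
        && b[0].is_ascii_lowercase()
        && b[1] == b'_'
        && b[2..].iter().all(|c| c.is_ascii_digit() || (b'a'..=b'f').contains(c))
}
```

Some ids used in this plan's tests (s_none, s_w, s_x) fall outside the hex pattern, so a real guard would need either a relaxed pattern or renamed test ids.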
- Step 4: Commit
```bash
git add crates/spaceshd/src/snapshot_store.rs crates/spaceshd/src/main.rs
git commit -m "feat(daemon): per-surface JSON snapshot store (atomic write, corrupt-tolerant)"
```
Task 3: Resume config + snapshot interval
Files:
- Modify: crates/spaceshd/src/config.rs
- Test: same file (#[cfg(test)] module)
Step 1: Write the failing test
Add to the tests module in crates/spaceshd/src/config.rs:
```rust
#[test]
fn resume_args_user_then_default_then_none() {
    let mut c = Config::default();
    // built-in defaults present without any config
    assert_eq!(c.resume_args("claude").as_deref(), Some(&["--continue".to_string()][..]));
    assert_eq!(c.resume_args("codex").as_deref(), Some(&["resume".to_string()][..]));
    // a path is reduced to its basename before lookup
    assert_eq!(c.resume_args("/usr/local/bin/claude").as_deref(), Some(&["--continue".to_string()][..]));
    // unknown command → None
    assert_eq!(c.resume_args("bash"), None);
    // user override wins over the default
    c.resume.commands.insert("claude".into(), vec!["--resume".into(), "last".into()]);
    assert_eq!(c.resume_args("claude"), Some(vec!["--resume".into(), "last".into()]));
}

#[test]
fn snapshot_interval_defaults_to_5s() {
    let c = Config::default();
    assert_eq!(c.snapshot_interval_secs(), 5);
}

#[test]
fn parses_resume_table_and_interval() {
    let dir = std::env::temp_dir().join(format!("spacesh-cfg-resume-{}", std::process::id()));
    std::fs::create_dir_all(&dir).unwrap();
    let path = dir.join("config.toml");
    std::fs::write(&path,
        "snapshot_interval_secs = 10\n[resume.commands]\ngemini = [\"--resume\"]\n").unwrap();
    let c = Config::from_path(&path);
    assert_eq!(c.snapshot_interval_secs(), 10);
    assert_eq!(c.resume_args("gemini"), Some(vec!["--resume".into()]));
    // Remove the whole temp dir, not just the file.
    let _ = std::fs::remove_dir_all(&dir);
}
```
- Step 2: Run tests to verify they fail
Run: cargo test -p spaceshd resume_args_user_then_default_then_none
Expected: FAIL — compile error: no field resume, no method resume_args/snapshot_interval_secs.
- Step 3: Implement config additions
In crates/spaceshd/src/config.rs, add the struct and a default table, and extend Config:
```rust
/// Built-in resume args for known agents, used when config has no override.
/// (command basename, resume args)
const DEFAULT_RESUME: &[(&str, &[&str])] = &[
    ("claude", &["--continue"]),
    ("codex", &["resume"]),
];

#[derive(Debug, Clone, Default, Deserialize, Serialize)]
pub struct ResumeConfig {
    /// command basename -> args that continue its previous session.
    #[serde(default)]
    pub commands: std::collections::HashMap<String, Vec<String>>,
}
```
Add the fields to Config:
```rust
#[derive(Debug, Clone, Default, Deserialize, Serialize)]
pub struct Config {
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub default_shell: Option<String>,
    #[serde(default)]
    pub terminal: TerminalConfig,
    #[serde(default)]
    pub appearance: AppearanceConfig,
    #[serde(default)]
    pub resume: ResumeConfig,
    /// How often (seconds) the daemon dumps changed grids to disk.
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub snapshot_interval_secs: Option<u64>,
}
```
Add the resolver methods in the impl Config block:
```rust
/// Resume args for a command, by basename: user map → built-in default → None.
pub fn resume_args(&self, command: &str) -> Option<Vec<String>> {
    let base = std::path::Path::new(command)
        .file_name()
        .map(|s| s.to_string_lossy().to_string())
        .unwrap_or_else(|| command.to_string());
    if let Some(args) = self.resume.commands.get(&base) {
        return Some(args.clone());
    }
    DEFAULT_RESUME.iter()
        .find(|(name, _)| *name == base)
        .map(|(_, args)| args.iter().map(|s| s.to_string()).collect())
}

/// Snapshot dump cadence in seconds (config → default 5, clamped to [1, 3600]).
pub fn snapshot_interval_secs(&self) -> u64 {
    self.snapshot_interval_secs.unwrap_or(5).clamp(1, 3600)
}
```
- Step 4: Run tests to verify they pass
Run: cargo test -p spaceshd config
Expected: PASS — including the three new tests and the existing config tests.
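The two resolution rules can be checked in isolation. This std-only sketch uses hypothetical free functions standing in for the Config methods: command_basename reproduces the basename reduction in resume_args, and effective_interval reproduces the default-and-clamp rule of snapshot_interval_secs.

```rust
use std::path::Path;

/// Reduce a command to the key used for the resume map; fall back to the raw
/// string when the path has no file-name component (e.g. "/").
fn command_basename(command: &str) -> String {
    Path::new(command)
        .file_name()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_else(|| command.to_string())
}

/// Default to 5 seconds, then clamp into [1, 3600].
fn effective_interval(configured: Option<u64>) -> u64 {
    configured.unwrap_or(5).clamp(1, 3600)
}
```

One consequence of keying on the basename: a command launched through a wrapper (for example env with claude as its first argument) resolves to the wrapper's name and gets no resume args unless the user adds an entry for that wrapper.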
- Step 5: Commit
```bash
git add crates/spaceshd/src/config.rs
git commit -m "feat(daemon): [resume] config map + snapshot_interval_secs with built-in defaults"
```
Task 4: Actor Snapshot message + dirty flag + on-exit dump
Files:
- Modify: crates/spaceshd/src/surface.rs
- Test: same file (#[cfg(test)] module)
This adds a snapshot channel threaded through every spawn entry point. The
channel carries SnapshotMsg (defined in Task 2) to the writer (Task 5); here
the actor only ever sends SnapshotMsg::Save(id, snap) on exit and answers
on-demand SurfaceMsg::Snapshot requests. Add the import at the top of
surface.rs: use crate::snapshot_store::SnapshotMsg;.
- Step 1: Write the failing tests
Add to the tests module in crates/spaceshd/src/surface.rs. Note the existing test helper spawn_surface(...) signature gains a trailing snapshot_tx; these tests use it.
```rust
#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn snapshot_msg_returns_grid_and_tracks_dirty() {
    let _serial = crate::test_support::serial();
    let pty = PtyHandle::spawn(spec("printf DIRTYME; sleep 0.4")).unwrap();
    let (state_tx, _s) = mpsc::unbounded_channel();
    let (exit_tx, _e) = mpsc::unbounded_channel();
    let (snap_tx, _snap_rx) = mpsc::unbounded_channel();
    let handle = spawn_surface(SurfaceId("s_1".into()), WorkspaceId("w_1".into()), pty, 80, 24, false, state_tx, exit_tx, snap_tx);
    // Give the child time to print.
    tokio::time::sleep(Duration::from_millis(150)).await;
    let (reply_tx, reply_rx) = oneshot::channel();
    handle.tx.send(SurfaceMsg::Snapshot { reply: reply_tx }).await.unwrap();
    let (snap, dirty) = reply_rx.await.unwrap();
    assert!(snap.ansi.contains("DIRTYME"), "snapshot: {:?}", snap.ansi);
    assert!(dirty, "first snapshot after output should be dirty");
    // Immediately snapshot again with no new output → not dirty.
    let (reply_tx, reply_rx) = oneshot::channel();
    handle.tx.send(SurfaceMsg::Snapshot { reply: reply_tx }).await.unwrap();
    let (_snap2, dirty2) = reply_rx.await.unwrap();
    assert!(!dirty2, "second snapshot with no new output should be clean");
}

#[tokio::test(flavor = "multi_thread", worker_threads = 2)]
async fn final_snapshot_sent_on_exit() {
    let _serial = crate::test_support::serial();
    let pty = PtyHandle::spawn(spec("printf BYE")).unwrap(); // exits immediately
    let (state_tx, _s) = mpsc::unbounded_channel();
    let (exit_tx, _e) = mpsc::unbounded_channel();
    let (snap_tx, mut snap_rx) = mpsc::unbounded_channel();
    let _handle = spawn_surface(SurfaceId("s_x".into()), WorkspaceId("w_1".into()), pty, 80, 24, false, state_tx, exit_tx, snap_tx);
    let msg = tokio::time::timeout(Duration::from_secs(2), snap_rx.recv()).await.unwrap().unwrap();
    match msg {
        crate::snapshot_store::SnapshotMsg::Save(sid, snap) => {
            assert_eq!(sid.0, "s_x");
            assert!(snap.ansi.contains("BYE"), "final snapshot: {:?}", snap.ansi);
        }
        _ => panic!("expected a Save message on exit"),
    }
}
```
- Step 2: Run tests to verify they fail
Run: cargo test -p spaceshd snapshot_msg_returns_grid_and_tracks_dirty
Expected: FAIL — compile error: SurfaceMsg::Snapshot variant missing and spawn_surface takes too few arguments.
- Step 3: Add the message variant and snapshot channel
In crates/spaceshd/src/surface.rs:
Add the variant to SurfaceMsg:
```rust
pub enum SurfaceMsg {
    Input(Vec<u8>),
    Resize { cols: u16, rows: u16 },
    Attach { reply: oneshot::Sender<broadcast::Receiver<Vec<u8>>> },
    /// Attach with snapshot: subscribe AND capture the grid in one actor turn.
    AttachSnapshot { reply: oneshot::Sender<(Snapshot, broadcast::Receiver<Vec<u8>>)> },
    /// On-demand snapshot without subscribing; bool = dirty since last snapshot.
    Snapshot { reply: oneshot::Sender<(Snapshot, bool)> },
    Close,
}
```
Thread a snapshot_tx: mpsc::UnboundedSender<SnapshotMsg> parameter through spawn_from_spec, spawn_surface, spawn_surface_deferred, and run_actor. For each, add the parameter (last position) and pass it down.
spawn_from_spec signature + body:
```rust
#[allow(clippy::too_many_arguments)]
pub fn spawn_from_spec(
    id: SurfaceId,
    workspace_id: WorkspaceId,
    spec: &SurfaceSpec,
    extra_env: Vec<(String, String)>,
    hooks_active: bool,
    state_tx: mpsc::UnboundedSender<(SurfaceId, SurfaceState)>,
    exit_tx: mpsc::UnboundedSender<(SurfaceId, i32)>,
    snapshot_tx: mpsc::UnboundedSender<SnapshotMsg>,
) -> std::io::Result<SurfaceHandle> {
    let mut env = vec![("SPACESH_SURFACE_ID".to_string(), id.0.clone())];
    env.extend(extra_env);
    let spawn_spec = SpawnSpec {
        command: spec.command.clone(),
        args: spec.args.clone(),
        cwd: std::path::PathBuf::from(&spec.cwd),
        cols: spec.cols,
        rows: spec.rows,
        env,
    };
    Ok(spawn_surface_deferred(id, workspace_id, spawn_spec, spec.cols, spec.rows, hooks_active, state_tx, exit_tx, snapshot_tx))
}
```
spawn_surface (eager, test path):
```rust
#[allow(clippy::too_many_arguments)]
pub fn spawn_surface(
    id: SurfaceId,
    workspace_id: WorkspaceId,
    pty: PtyHandle,
    cols: u16,
    rows: u16,
    hooks_active: bool,
    state_tx: mpsc::UnboundedSender<(SurfaceId, SurfaceState)>,
    exit_tx: mpsc::UnboundedSender<(SurfaceId, i32)>,
    snapshot_tx: mpsc::UnboundedSender<SnapshotMsg>,
) -> SurfaceHandle {
    let (tx, rx) = mpsc::channel::<SurfaceMsg>(64);
    let (bcast, _) = broadcast::channel::<Vec<u8>>(BROADCAST_CAP);
    tokio::spawn(run_actor(id.clone(), pty, cols, rows, hooks_active, bcast, rx, state_tx, exit_tx, Vec::new(), snapshot_tx));
    SurfaceHandle { id, workspace_id, tx }
}
```
spawn_surface_deferred: add snapshot_tx: mpsc::UnboundedSender<SnapshotMsg> as the final parameter; inside the pre-spawn loop, answer the new message with the empty grid; and pass snapshot_tx into run_actor. In the pre-spawn select!, add:
```rust
Some(SurfaceMsg::Snapshot { reply }) => {
    let snap = snapshot_ansi(&GridSurface::new(cols, rows));
    let _ = reply.send((snap, false));
}
```
and change the spawn call:
```rust
Ok(pty) => run_actor(actor_id, pty, cols, rows, hooks_active, bcast, rx, state_tx, exit_tx, prebuf, snapshot_tx).await,
```
run_actor: add snapshot_tx: mpsc::UnboundedSender<SnapshotMsg> as the final parameter. Introduce a dirty flag, set it when output arrives, clear it on a snapshot, answer the new message, and send the final snapshot on exit. The relevant edits inside run_actor's grid block:
Declare alongside the other loop locals:
```rust
let mut dirty = false;
```
Leave the SurfaceMsg::AttachSnapshot arm unchanged and do not clear dirty there. Its snapshot goes to a client, not to disk, so clearing the flag on attach would let the next tick skip a screen that was never persisted.
Add the new arm next to it:
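```rust
Some(SurfaceMsg::Snapshot { reply }) => {
    // Apply buffered output first: `dirty` was set when those bytes arrived,
    // so the snapshot must include them before the flag is cleared.
    if !pending.is_empty() {
        flush(&mut pending, &mut grid, &mut osc, &mut deterministic, &mut last_state, &detect_id, &bcast, &state_tx);
        flush_deadline = None;
    }
    let snap = snapshot_ansi(&grid);
    let was_dirty = dirty;
    dirty = false;
    let _ = reply.send((snap, was_dirty));
}
```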
In the PTY output arm, when bytes arrive (the Some(bytes) => branch), set dirty = true right after extending pending:
```rust
Some(bytes) => {
    pending.extend_from_slice(&bytes);
    dirty = true;
    if flush_deadline.is_none() {
        flush_deadline = Some(Instant::now() + FLUSH_INTERVAL);
    }
    if pending.len() >= FLUSH_BYTES {
        flush(&mut pending, &mut grid, &mut osc, &mut deterministic, &mut last_state, &detect_id, &bcast, &state_tx);
        flush_deadline = None;
    }
}
```
Replace the exit tail of the block (currently let code = pty.wait(); let _ = exit_tx.send((actor_id, code));) with a final snapshot first:
```rust
        let final_snap = snapshot_ansi(&grid);
        let _ = snapshot_tx.send(SnapshotMsg::Save(actor_id.clone(), final_snap));
        let code = pty.wait();
        let _ = exit_tx.send((actor_id, code));
    }
}
```
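Note: actor_id is currently moved into detect_id or used only once; clone as needed so it is available for both the snapshot send and exit_tx. If the compiler reports a move, change the earlier let detect_id = id; / let actor_id = id.clone(); setup so that both actor_id (cloneable) and detect_id exist, and use actor_id.clone() for the snapshot send. Two more details for the exit tail: if the output loop can end with bytes still in pending, flush them before taking final_snap so the last output is included; and skip the final Save when the loop ended because of SurfaceMsg::Close, because the router deletes the snapshot file on close (Task 6) and a Save arriving after that Remove would recreate it.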
Update the existing in-file tests attach_receives_output and attach_snapshot_reflects_prior_output (and any other spawn_surface(...) callers in this file's tests) to pass a snapshot sender. Add let (snap_tx, _snap_rx) = mpsc::unbounded_channel(); before each spawn_surface call and append , snap_tx to the call.
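Output reaches the grid only when pending is flushed, so a snapshot taken inside the flush window could clear dirty while missing the buffered bytes, and the ticker would then never persist them. This simplified, std-only model of the actor shows the ordering that avoids this: flush, then snapshot, then clear the flag. ActorModel is hypothetical, and a String stands in for GridSurface.

```rust
/// Hypothetical model of the actor's buffering: output lands in `pending`
/// and reaches the grid only on flush.
struct ActorModel {
    pending: Vec<u8>,
    grid: String, // stands in for GridSurface
    dirty: bool,
}

impl ActorModel {
    fn new() -> Self {
        Self { pending: Vec::new(), grid: String::new(), dirty: false }
    }

    /// PTY output arm: buffer the bytes and mark the screen dirty.
    fn on_output(&mut self, bytes: &[u8]) {
        self.pending.extend_from_slice(bytes);
        self.dirty = true;
    }

    /// Stand-in for `flush`: apply buffered bytes to the grid.
    fn flush(&mut self) {
        self.grid.push_str(&String::from_utf8_lossy(&self.pending));
        self.pending.clear();
    }

    /// Snapshot arm: flush first so the returned screen covers every byte
    /// that set `dirty`, then report and clear the flag.
    fn snapshot(&mut self) -> (String, bool) {
        if !self.pending.is_empty() {
            self.flush();
        }
        let was_dirty = std::mem::replace(&mut self.dirty, false);
        (self.grid.clone(), was_dirty)
    }
}
```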
- Step 4: Run tests to verify they pass
Run: cargo test -p spaceshd -- surface
Expected: PASS — the two new tests plus the pre-existing surface tests (now passing the extra arg).
- Step 5: Commit
```bash
git add crates/spaceshd/src/surface.rs
git commit -m "feat(daemon): actor Snapshot message + dirty tracking + final snapshot on exit"
```
Task 5: Snapshot writer task
Files:
- Modify: crates/spaceshd/src/snapshot_store.rs
- Test: same file (#[cfg(test)] module)
The writer owns the store and serializes all disk writes off the router/actor hot paths. It accepts saves and removes over one channel.
- Step 1: Add the writer and its test
Add to crates/spaceshd/src/snapshot_store.rs (SnapshotMsg was already defined in Task 2; this task adds only the writer and its test). The test is async, so it uses #[tokio::test] like the actor tests in surface.rs:
```rust
/// Spawn the writer task; returns the sender used by the router and actors.
pub fn spawn_writer(store: std::sync::Arc<dyn SnapshotStore>) -> tokio::sync::mpsc::UnboundedSender<SnapshotMsg> {
    let (tx, mut rx) = tokio::sync::mpsc::unbounded_channel::<SnapshotMsg>();
    tokio::spawn(async move {
        while let Some(msg) = rx.recv().await {
            match msg {
                SnapshotMsg::Save(sid, snap) => store.save(&sid, &snap),
                SnapshotMsg::Remove(sid) => store.remove(&sid),
            }
        }
    });
    tx
}
```
Add the test to the file's tests module:
```rust
#[tokio::test]
async fn writer_saves_and_removes() {
    let dir = tmp_dir("writer");
    let store: std::sync::Arc<dyn SnapshotStore> = std::sync::Arc::new(JsonSnapshotStore::new(dir.clone()));
    let tx = spawn_writer(store.clone());
    let sid = SurfaceId("s_w".into());
    tx.send(SnapshotMsg::Save(sid.clone(), sample())).unwrap();
    // Poll until the writer has flushed (bounded).
    let mut saved = None;
    for _ in 0..50 {
        if let Some(s) = store.load(&sid) { saved = Some(s); break; }
        tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    }
    assert_eq!(saved, Some(sample()));
    tx.send(SnapshotMsg::Remove(sid.clone())).unwrap();
    let mut gone = false;
    for _ in 0..50 {
        if store.load(&sid).is_none() { gone = true; break; }
        tokio::time::sleep(std::time::Duration::from_millis(10)).await;
    }
    assert!(gone, "writer should have removed the snapshot file");
    let _ = std::fs::remove_dir_all(dir);
}
```
- Step 2: Run test to verify it passes
Implementation is included above (the writer is a thin loop). Run:
```bash
cargo test -p spaceshd writer_saves_and_removes
```
Expected: PASS.
- Step 3: Commit
```bash
git add crates/spaceshd/src/snapshot_store.rs
git commit -m "feat(daemon): snapshot writer task (Save/Remove over one channel)"
```
Task 6: Server wiring — store param, ticker, stopped-Attach reads disk, remove on close
Files:
- Modify: crates/spaceshd/src/server.rs
- Modify: crates/spaceshd/src/main.rs
- Modify: crates/spaceshd/src/registry.rs (add live_ids)
- Test: crates/spaceshd/src/server.rs (#[cfg(test)])
- Step 1: Thread the snapshot store into serve and router
In crates/spaceshd/src/server.rs:
Add imports near the other use crate::... lines:
```rust
use crate::snapshot_store::{SnapshotStore, SnapshotMsg, spawn_writer};
```
Change serve to accept the store, build the writer + ticker, and pass both the writer sender and an Arc clone (for reads) into router:
```rust
pub async fn serve(
    socket: &Path,
    store: Arc<dyn StateStore>,
    event_store: Arc<dyn EventStore>,
    snapshot_store: Arc<dyn SnapshotStore>,
) -> Result<()> {
    let listener = UnixListener::bind(socket)?;
    let (router_tx, router_rx) = mpsc::channel::<ServerMsg>(256);
    // ... existing exit_tx / state_tx bridges unchanged ...

    let snapshot_tx = spawn_writer(snapshot_store.clone());

    // Periodic snapshot tick → router.
    let tick_router = router_tx.clone();
    let interval_secs = crate::config::Config::load().snapshot_interval_secs();
    tokio::spawn(async move {
        let mut tick = tokio::time::interval(Duration::from_secs(interval_secs));
        // If the router falls behind, skip missed ticks instead of firing them in a burst.
        tick.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Skip);
        tick.tick().await; // consume the immediate first tick
        loop {
            tick.tick().await;
            if tick_router.send(ServerMsg::SnapshotTick).await.is_err() { break; }
        }
    });

    let persister = persist::spawn(store.clone(), Duration::from_millis(500));
    let initial = store.load().unwrap_or_default();
    let event_persister = event_store::spawn(event_store.clone(), Duration::from_millis(500));
    let event_initial = event_store.load().unwrap_or_default();
    let started_at_ms = now_millis();
    let shutdown = tokio::spawn(router(
        router_rx, router_tx.clone(), exit_tx, state_tx,
        persister, initial, event_persister, event_initial,
        started_at_ms, snapshot_store, snapshot_tx,
    ));
    // ... existing accept loop unchanged ...
}
```
Add SnapshotTick to the ServerMsg enum (around line 23):
```rust
enum ServerMsg {
    // ... existing variants ...
    SnapshotTick,
}
```
Change router's signature to take the two new params (final positions):
```rust
async fn router(
    mut rx: mpsc::Receiver<ServerMsg>,
    router_tx: mpsc::Sender<ServerMsg>,
    exit_tx: mpsc::UnboundedSender<(SurfaceId, i32)>,
    state_tx: mpsc::UnboundedSender<(SurfaceId, SurfaceState)>,
    persister: Persister,
    initial: crate::state_store::PersistState,
    event_persister: EventPersister,
    event_initial: crate::event_log::EventLogState,
    started_at_ms: u64,
    snapshot_store: Arc<dyn SnapshotStore>,
    snapshot_tx: mpsc::UnboundedSender<SnapshotMsg>,
) {
```
- Step 2: Handle SnapshotTick and thread the snapshot sender to spawns
In the router match loop, add the tick arm. It snapshots each live surface and forwards dirty ones to the writer:
```rust
ServerMsg::SnapshotTick => {
    let ids: Vec<SurfaceId> = reg.live_ids();
    for sid in ids {
        let Some(handle) = reg.live(&sid) else { continue };
        let (reply_tx, reply_rx) = oneshot::channel();
        if handle.tx.send(SurfaceMsg::Snapshot { reply: reply_tx }).await.is_err() { continue; }
        if let Ok((snap, dirty)) = reply_rx.await {
            if dirty {
                let _ = snapshot_tx.send(SnapshotMsg::Save(sid.clone(), snap));
            }
        }
    }
}
```
This needs a live_ids() accessor on Registry. In crates/spaceshd/src/registry.rs add:
```rust
/// Ids of all currently-live surfaces.
pub fn live_ids(&self) -> Vec<SurfaceId> {
    self.live.keys().cloned().collect()
}
```
Pass snapshot_tx.clone() into every spawn_from_spec(...) call inside handle_request. There are four callsites (NewSurface, SplitSurface, ApplyPreset, RestartSurface). Each currently ends ..., state_tx.clone(), exit_tx.clone()); change to ..., state_tx.clone(), exit_tx.clone(), snapshot_tx.clone()). To make snapshot_tx reachable inside handle_request, add it as a parameter to handle_request and pass it from the ServerMsg::Request arm:
```rust
ServerMsg::Request { id, cmd, client, out } => {
    handle_request(id, cmd, client, out, &mut reg, &mut subs, &clients,
        &router_tx, &exit_tx, &state_tx, &persister,
        &mut event_log, &event_persister, started_at_ms, &mut config,
        &snapshot_store, &snapshot_tx).await;
}
```
and in handle_request's signature add the two trailing params:
```rust
snapshot_store: &Arc<dyn SnapshotStore>,
snapshot_tx: &mpsc::UnboundedSender<SnapshotMsg>,
```
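The SnapshotTick arm is a request/reply loop: ask each live actor for (snapshot, dirty) over a one-shot reply channel and forward only the dirty ones. This std-only model replaces the tokio actors with threads and std::sync::mpsc channels (Msg, spawn_actor, and tick are hypothetical names) to capture that contract, including skipping actors that have gone away.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for SurfaceMsg::Snapshot.
enum Msg {
    Snapshot { reply: mpsc::Sender<(String, bool)> },
}

/// Toy actor: a fixed screen plus a dirty flag that is cleared on each snapshot.
fn spawn_actor(screen: &'static str, initially_dirty: bool) -> mpsc::Sender<Msg> {
    let (tx, rx) = mpsc::channel::<Msg>();
    thread::spawn(move || {
        let mut dirty = initially_dirty;
        for msg in rx {
            match msg {
                Msg::Snapshot { reply } => {
                    let _ = reply.send((screen.to_string(), dirty));
                    dirty = false;
                }
            }
        }
    });
    tx
}

/// One tick: request a snapshot from every actor and return the dirty ones
/// as (surface id, screen) pairs ready to hand to the writer.
fn tick(actors: &[(String, mpsc::Sender<Msg>)]) -> Vec<(String, String)> {
    let mut to_save = Vec::new();
    for (id, tx) in actors {
        let (reply_tx, reply_rx) = mpsc::channel();
        if tx.send(Msg::Snapshot { reply: reply_tx }).is_err() {
            continue; // actor has exited
        }
        if let Ok((screen, dirty)) = reply_rx.recv() {
            if dirty {
                to_save.push((id.clone(), screen));
            }
        }
    }
    to_save
}
```

Because the real arm awaits each reply in turn inside the single router task, one slow actor delays the whole tick and any requests queued behind it. At a 5-second cadence that is a reasonable trade-off, but it is worth keeping in mind as the number of surfaces grows.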
- Step 3: Stopped-Attach returns the disk snapshot; close/remove deletes it
In the Cmd::Attach handler, replace the stopped-panel branch (the else that returns the empty snapshot) with a disk read:
```rust
} else {
    // stopped panel: no live stream. Paint the last on-disk screen if we have one.
    match snapshot_store.load(&surface_id) {
        Some(snap) => {
            let _ = out.send(ok(id, serde_json::json!({
                "snapshot": snap.ansi, "cols": snap.cols, "rows": snap.rows,
                "cursor_row": snap.cursor_row, "cursor_col": snap.cursor_col, "stopped": true,
            }))).await;
        }
        None => {
            let _ = out.send(ok(id, serde_json::json!({ "snapshot": "", "cols": 0, "rows": 0, "stopped": true }))).await;
        }
    }
}
```
In the Cmd::Close handler and Cmd::CloseWorkspace handler, after the surface(s) are removed, drop their snapshot files. For Close { surface_id } add, right after reg.remove_surface(&surface_id) (or wherever the removal happens):
```rust
let _ = snapshot_tx.send(SnapshotMsg::Remove(surface_id.clone()));
```
For CloseWorkspace { workspace_id }, the handler already collects let ids = reg.close_workspace(&workspace_id);. After the existing cleanup loop, add:
```rust
for sid in &ids { let _ = snapshot_tx.send(SnapshotMsg::Remove(sid.clone())); }
```
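The writer applies messages strictly in channel order, so a Save that arrives after a Remove recreates the file. That is what would happen if a surface's on-exit Save reached the writer after the router's close-time Remove. This std-only model of the writer loop makes the ordering hazard concrete; an in-memory map stands in for the snapshots directory, and the simplified SnapshotMsg (String payloads) and apply_in_order are hypothetical.

```rust
use std::collections::HashMap;

/// Simplified writer command: (surface id, screen) instead of the real types.
enum SnapshotMsg {
    Save(String, String),
    Remove(String),
}

/// Apply messages in the order received, as the writer task does.
fn apply_in_order(msgs: Vec<SnapshotMsg>) -> HashMap<String, String> {
    let mut files = HashMap::new();
    for msg in msgs {
        match msg {
            SnapshotMsg::Save(id, snap) => {
                files.insert(id, snap);
            }
            SnapshotMsg::Remove(id) => {
                files.remove(&id);
            }
        }
    }
    files
}
```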
- Step 4: Update main.rs to build and pass the store
In crates/spaceshd/src/main.rs, in run_daemon, after the event store is built:
```rust
let snapshots_dir = lifecycle::spacesh_dir()?.join("snapshots");
let snapshot_store: std::sync::Arc<dyn snapshot_store::SnapshotStore> =
    std::sync::Arc::new(snapshot_store::JsonSnapshotStore::new(snapshots_dir));
eprintln!("spaceshd listening on {}", sock.display());
server::serve(&sock, store, event_store, snapshot_store).await
```
- Step 5: Fix all serve(...) test callsites
In crates/spaceshd/src/server.rs's #[cfg(test)] module there are ~12 calls of the form serve(&sockX, store, event_store) (and ..._b variants). Append a NullSnapshotStore argument to each. Add this import inside the test module:
```rust
use crate::snapshot_store::NullSnapshotStore;
```
and change each call, e.g.:
```rust
tokio::spawn(async move {
    let _ = serve(&sock_for_task, store2, event_store, std::sync::Arc::new(NullSnapshotStore)).await;
});
```
Apply the same , std::sync::Arc::new(NullSnapshotStore) insertion before .await to every serve(...) call in the test module (~12 sites, including the _b second-daemon ones). Compilation will fail until all are updated — use the compiler errors as the checklist.
- Step 6: Write the stopped-Attach integration test
Add a new test in the server.rs test module. It starts a daemon with a real JsonSnapshotStore over a temp dir, opens a workspace and a surface whose command prints a marker and then exits, waits for the child to exit so the actor's on-exit final snapshot lands on disk, then attaches a fresh client and asserts that the disk snapshot comes back for the stopped surface. Do not close the surface to force the write: Close deletes the snapshot file.
```rust
#[tokio::test(flavor = "multi_thread", worker_threads = 4)]
async fn stopped_attach_returns_disk_snapshot() {
    let _serial = crate::test_support::serial();
    let dir = unique_tmp_dir("stopped-snap"); // use the module's existing temp-dir helper
    let sock = dir.join("sock");
    let store: std::sync::Arc<dyn crate::state_store::StateStore> =
        std::sync::Arc::new(crate::state_store::JsonStateStore::new(dir.join("state.json")));
    let event_store: std::sync::Arc<dyn crate::event_store::EventStore> =
        std::sync::Arc::new(crate::event_store::JsonEventStore::new(dir.join("events.json")));
    let snap_store: std::sync::Arc<dyn crate::snapshot_store::SnapshotStore> =
        std::sync::Arc::new(crate::snapshot_store::JsonSnapshotStore::new(dir.join("snapshots")));
    let sock2 = sock.clone();
    tokio::spawn(async move { let _ = serve(&sock2, store, event_store, snap_store).await; });
    wait_for_socket(&sock).await; // module helper
    let mut c = connect(&sock).await; // module helper
    let ws = open_workspace(&mut c, dir.to_str().unwrap()).await; // adapt to existing helpers
    let sid = new_surface(&mut c, &ws, Some("/bin/sh"), vec!["-c".into(), "printf SNAPDISK; sleep 0.2".into()]).await;
    // Let it print and exit; the actor sends a final snapshot on exit.
    tokio::time::sleep(Duration::from_millis(500)).await;
    // Fresh client attaches to the now-stopped surface.
    let mut c2 = connect(&sock).await;
    let r = req(&mut c2, 99, Cmd::Attach { surface_id: spacesh_proto::SurfaceId(sid.clone()) }).await;
    let data = res_data(&r);
    assert_eq!(data["stopped"], serde_json::json!(true));
    assert!(data["snapshot"].as_str().unwrap().contains("SNAPDISK"), "snapshot: {:?}", data["snapshot"]);
    let _ = std::fs::remove_dir_all(dir);
}
```
Adapt the helper calls (unique_tmp_dir, wait_for_socket, connect, open_workspace/new_surface, req, res_data) to the exact helpers already used by the neighbouring tests (see reattach_returns_snapshot_with_prior_output for the established pattern). The assertions are the contract: stopped == true and the ANSI contains the printed marker.
- Step 7: Run tests
Run: cargo test -p spaceshd
Expected: PASS for all daemon tests, including the new stopped_attach_returns_disk_snapshot. A missed serve(...) callsite shows up as a compile error; fix each one.
- Step 8: Commit
```bash
git add crates/spaceshd/src/server.rs crates/spaceshd/src/main.rs crates/spaceshd/src/registry.rs
git commit -m "feat(daemon): periodic snapshot ticker + stopped-attach reads disk snapshot + cleanup on close"
```
Task 7: Protocol — RestartSurface gains resume
Files:
- Modify: crates/spacesh-proto/src/message.rs
- Test: same file (#[cfg(test)])
Step 1: Write the failing test
Add to the tests module in crates/spacesh-proto/src/message.rs:
```rust
#[test]
fn restart_surface_resume_defaults_false_and_round_trips() {
    // Legacy frame without `resume` decodes to false.
    let legacy = r#"{"kind":"req","id":5,"cmd":{"cmd":"restart_surface","args":{"surface_id":"s_1"}}}"#;
    let env: Envelope = serde_json::from_str(legacy).unwrap();
    match env {
        Envelope::Req { cmd: Cmd::RestartSurface { resume, .. }, .. } => assert!(!resume),
        _ => panic!("wrong variant"),
    }
    // resume=true round-trips.
    let e = Envelope::Req { id: 6, cmd: Cmd::RestartSurface { surface_id: SurfaceId("s_1".into()), resume: true } };
    let back: Envelope = serde_json::from_str(&serde_json::to_string(&e).unwrap()).unwrap();
    assert_eq!(back, e);
}
```
- Step 2: Run test to verify it fails
Run: cargo test -p spacesh-proto restart_surface_resume
Expected: FAIL — Cmd::RestartSurface has no resume field.
- Step 3: Add the field
In crates/spacesh-proto/src/message.rs, change the variant:
```rust
RestartSurface {
    surface_id: SurfaceId,
    #[serde(default)]
    resume: bool,
},
```
- Step 4: Run test to verify it passes
Run: cargo test -p spacesh-proto — all green.
This breaks the daemon and Tauri callers that construct Cmd::RestartSurface. Tasks 8 and 9 fix them. If you build the whole workspace now, it fails to compile at those callers; that is expected.
- Step 5: Commit
```bash
git add crates/spacesh-proto/src/message.rs
git commit -m "feat(proto): RestartSurface gains resume flag (defaults false)"
```
Task 8: Server honors resume
Files:
- Modify: crates/spaceshd/src/server.rs
- Test: same file (#[cfg(test)])
Step 1: Write the failing test for the pure helper
Add a unit test (no process spawn) for a helper that swaps args when resuming:
```rust
#[test]
fn resume_spec_swaps_args_when_mapped() {
    use spacesh_proto::workspace::SurfaceSpec;
    let spec = SurfaceSpec {
        command: "claude".into(), args: vec!["--foo".into()], cwd: "/tmp".into(),
        agent_label: Some("claude".into()), cols: 80, rows: 24, autostart: false,
    };
    let cfg = crate::config::Config::default();
    // resume=false → original args
    let plain = resume_spec(&spec, false, &cfg);
    assert_eq!(plain.args, vec!["--foo".to_string()]);
    // resume=true with a default mapping → resume args
    let resumed = resume_spec(&spec, true, &cfg);
    assert_eq!(resumed.args, vec!["--continue".to_string()]);
    // resume=true for an unmapped command → original args (graceful fallback)
    let mut shell = spec.clone();
    shell.command = "bash".into();
    let resumed_shell = resume_spec(&shell, true, &cfg);
    assert_eq!(resumed_shell.args, shell.args);
}
```
- Step 2: Run test to verify it fails
Run: cargo test -p spaceshd resume_spec_swaps_args_when_mapped
Expected: FAIL — resume_spec not defined.
- Step 3: Implement the helper and use it in the handler
Add the helper near spawn_env in crates/spaceshd/src/server.rs:
```rust
/// Build the spawn spec for a (re)start. When `resume` and the command has a
/// resume mapping, its args are replaced with the resume args; otherwise the
/// original spec args are kept.
fn resume_spec(
    spec: &spacesh_proto::workspace::SurfaceSpec,
    resume: bool,
    cfg: &crate::config::Config,
) -> spacesh_proto::workspace::SurfaceSpec {
    let mut out = spec.clone();
    if resume {
        if let Some(args) = cfg.resume_args(&spec.command) {
            out.args = args;
        }
    }
    out
}
```
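resume_spec relies on Config::resume_args, which reads the [resume] config map and is not shown in this section. Below is a minimal std-only sketch of what that lookup could look like. It assumes a hypothetical ResumeConfig that holds the map as command name → args. Only the claude → --continue default comes from this plan. Matching on the command's file name, so that an absolute path also hits, is an assumption; the real Config may compare the command string exactly.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Hypothetical stand-in for the [resume] config table:
/// command name -> args that continue the previous session.
pub struct ResumeConfig {
    pub resume: HashMap<String, Vec<String>>,
}

impl Default for ResumeConfig {
    fn default() -> Self {
        let mut resume = HashMap::new();
        // The one mapping named in this plan; others would come from the user's config.
        resume.insert("claude".to_string(), vec!["--continue".to_string()]);
        ResumeConfig { resume }
    }
}

impl ResumeConfig {
    /// Resume args for `command`, keyed by its file name so that
    /// "/usr/local/bin/claude" and "claude" both match (assumption).
    /// Returns None for unmapped or empty commands.
    pub fn resume_args(&self, command: &str) -> Option<Vec<String>> {
        let name = Path::new(command).file_name()?.to_str()?;
        self.resume.get(name).cloned()
    }
}
```

Returning None for unmapped commands is what lets resume_spec keep the original args, so Resume on a plain shell behaves like Restart fresh.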
Update the Cmd::RestartSurface handler to destructure resume and spawn from the resume spec:
```rust
Cmd::RestartSurface { surface_id, resume } => {
    if reg.is_running(&surface_id) {
        let _ = out.send(ok(id, serde_json::Value::Null)).await;
        return; // already running
    }
    let Some(spec) = reg.surface_spec(&surface_id) else {
        let _ = out.send(err(id, "NOT_FOUND", "surface")).await;
        return;
    };
    let spec = resume_spec(&spec, resume, config);
    let ws_id = reg.workspace_of(&surface_id).unwrap();
    let (env, hooks_active) = spawn_env(&surface_id, &spec);
    match crate::surface::spawn_from_spec(surface_id.clone(), ws_id.clone(), &spec, env, hooks_active, state_tx.clone(), exit_tx.clone(), snapshot_tx.clone()) {
        Ok(handle) => {
            spawn_output_bridge(surface_id.clone(), &handle, router_tx.clone());
            reg.set_live(handle);
            reg.set_state(&surface_id, spacesh_proto::SurfaceState::Idle);
            broadcast_evt(clients, &Envelope::Evt(Evt::SurfaceRestarted { surface_id: surface_id.clone() }));
            let _ = out.send(ok(id, serde_json::Value::Null)).await;
        }
        Err(e) => { let _ = out.send(err(id, "SPAWN_FAILED", &e.to_string())).await; }
    }
}
```
config is the &mut Config already in scope in handle_request. resume_spec takes &Config, so pass config directly (e.g. resume_spec(&spec, resume, config), where config: &mut Config coerces to &Config) or as &*config. Adjust the borrow as the compiler requires.
Note: the snapshot_tx.clone() added to this spawn_from_spec call is the same one threaded in Task 6 Step 2 — ensure all four spawn callsites carry it.
- Step 4: Run tests to verify they pass
Run: cargo test -p spaceshd resume_spec_swaps_args_when_mapped
Expected: PASS. Then cargo test -p spaceshd — all green.
- Step 5: Commit
```bash
git add crates/spaceshd/src/server.rs
git commit -m "feat(daemon): RestartSurface honors resume — swap to resume_args when mapped"
```
Task 9: Tauri bridge + socketBridge resume arg
Files:
- Modify: app/src-tauri/src/bridge.rs
- Modify: app/src/socketBridge.ts
- Test: cd app && npx tsc --noEmit
Step 1: Update the Tauri command
In app/src-tauri/src/bridge.rs, change restart_surface to accept and forward resume:
```rust
#[tauri::command]
pub async fn restart_surface(state: BridgeState<'_>, surface_id: String, resume: bool) -> Result<Value, String> {
    data_of(state.request(Cmd::RestartSurface { surface_id: SurfaceId(surface_id), resume }).await.map_err(|e| e.to_string())?)
}
```
(Any other place in bridge.rs that constructs Cmd::RestartSurface must pass resume. The version-handshake and attach code do not construct it; only this handler does.)
- Step 2: Update the JS binding and AttachResult
In app/src/socketBridge.ts:
```ts
export interface AttachResult {
  snapshot: string;
  cols: number;
  rows: number;
  cursor_row?: number;
  cursor_col?: number;
  stopped?: boolean;
}

export async function restartSurface(surfaceId: string, resume = false): Promise<void> {
  await invoke("restart_surface", { surfaceId, resume });
}
```
- Step 3: Verify types compile
Run: cd app && npx tsc --noEmit
Expected: PASS (no type errors). Note: existing callers of restartSurface(id) remain valid because resume defaults to false.
Also build the Rust side: cargo check -p spaceshd and cargo check --manifest-path app/src-tauri/Cargo.toml (or cargo check in app/src-tauri).
Expected: clean.
- Step 4: Commit
```bash
git add app/src-tauri/src/bridge.rs app/src/socketBridge.ts
git commit -m "feat(app): plumb resume flag through restart_surface bridge + binding"
```
Task 10: Stopped overlay — paint last screen + Resume button
Files:
- Modify: app/src/LayoutEngine.tsx
- Test: cd app && npx tsc --noEmit + manual check
Step 1: Add a read-only snapshot painter component
In app/src/LayoutEngine.tsx, add a small component that fetches the stopped surface's disk snapshot via attachSurface and paints it into a dimmed, read-only xterm. Import what is needed at the top of the file:
```tsx
import { useEffect, useRef } from "react";
import { Terminal } from "@xterm/xterm";
import { attachSurface } from "./socketBridge";
```
(Confirm against TerminalView.tsx for the exact xterm import path and theme/font options it uses; mirror them so the dimmed preview matches the live terminal's look. Reuse the same font/palette props already threaded into Leaf.)
```tsx
function StoppedSnapshot({ surfaceId, font, palette }: { surfaceId: string; font: TermFont; palette: TermPalette }) {
  const hostRef = useRef<HTMLDivElement | null>(null);
  useEffect(() => {
    const host = hostRef.current;
    if (!host) return;
    // Read-only preview: no cursor blink, no input, no scrollback.
    const term = new Terminal({
      fontFamily: font.family,
      fontSize: font.size,
      theme: palette,
      cursorBlink: false,
      disableStdin: true,
      convertEol: false,
      scrollback: 0,
    });
    term.open(host);
    let disposed = false;
    // For a stopped surface, attach returns the last on-disk snapshot.
    attachSurface(surfaceId, () => {})
      .then((res) => {
        if (disposed || !res.snapshot) return;
        // Match the grid size the snapshot was serialized at, so lines land where they did.
        if (res.cols > 0 && res.rows > 0) term.resize(res.cols, res.rows);
        term.write(res.snapshot);
      })
      .catch(() => { /* no snapshot available: leave the preview blank */ });
    return () => { disposed = true; term.dispose(); };
  }, [surfaceId, font, palette]);
  return <div ref={hostRef} style={{ position: "absolute", inset: 0, opacity: 0.45, pointerEvents: "none" }} />;
}
```
Use the exact TermFont/TermPalette types already defined or imported in this file for the font/palette props (see Leaf's props). If TerminalView wraps Terminal construction in a helper, reuse that helper instead of constructing Terminal directly.
- Step 2: Render the snapshot + Resume button in the stopped branch
Replace the if (running[id] === false) { ... } block in Leaf with one that layers the snapshot behind centered controls and adds a Resume button. Keep the existing RotateCw/Minimize2 imports; add Play from lucide-react at the file's icon import.
```tsx
if (running[id] === false) {
  return card(
    <div style={{ position: "relative", height: "100%", width: "100%" }}>
      <StoppedSnapshot surfaceId={id} font={font} palette={palette} />
      <div style={{ position: "absolute", inset: 0, display: "flex", alignItems: "center", justifyContent: "center", flexDirection: "column", gap: 10, color: COLORS.textSecondary, background: "rgba(0,0,0,0.35)" }}>
        <div style={{ fontFamily: FONT.mono, fontSize: 13 }}>Stopped</div>
        <div style={{ display: "flex", gap: 8 }}>
          <button onClick={() => void restartSurface(id, true)}
            style={{ display: "flex", alignItems: "center", gap: 6, padding: "6px 14px", background: COLORS.accent, color: COLORS.bgApp, border: "none", borderRadius: 7, fontSize: 12, fontWeight: 600 }}>
            <Play size={13} /> Resume
          </button>
          <button onClick={() => void restartSurface(id, false)}
            style={{ display: "flex", alignItems: "center", gap: 6, padding: "6px 14px", background: COLORS.bgElevated, color: COLORS.textPrimary, border: `1px solid ${COLORS.borderStrong}`, borderRadius: 7, fontSize: 12 }}>
            <RotateCw size={13} /> Restart fresh
          </button>
          {zoomed === id && (
            <button onClick={() => void setZoom(workspaceId, null)}
              style={{ display: "flex", alignItems: "center", gap: 6, padding: "6px 14px", background: "transparent", color: COLORS.textSecondary, border: `1px solid ${COLORS.borderStrong}`, borderRadius: 7, fontSize: 12 }}>
              <Minimize2 size={13} /> Exit zoom
            </button>
          )}
        </div>
      </div>
    </div>
  );
}
```
- Step 3: Verify types compile
Run: cd app && npx tsc --noEmit
Expected: PASS.
- Step 4: Manual verification
Build and run (make reinstall then launch, or make dev). Steps:
- Open a workspace, add a claude (or shell) panel, and let it print some output.
- Quit the GUI and run pkill -x spaceshd (simulates a reboot), then relaunch the app.
- The panel shows its last screen, dimmed, with Resume and Restart fresh buttons.
- Click Resume → the agent relaunches (claude/codex with its continue flag) and the live terminal returns.

Also confirm that keypress→echo still feels instant and that prompts are not duplicated when switching focus.
- Step 5: Commit
```bash
git add app/src/LayoutEngine.tsx
git commit -m "feat(app): stopped panel paints last screen + Resume/Restart fresh controls"
```
Final verification
- Run the full suite:
```bash
cargo test
cd app && npx tsc --noEmit
```
Expected: all Rust tests pass; tsc clean.
- Dispatch a final code review over the whole branch, then use superpowers:finishing-a-development-branch to merge.
Notes / gotchas
- The snapshot tick blocks the router while it awaits each live actor's reply. The awaits are sequential, so each tick stalls the router for the sum of the reply times. Visible-screen snapshots are small, so at a 5s cadence this is negligible. Do not move the disk write into the router; it stays in the writer task.
- Resume is best-effort. A new process is started; the literal in-flight process cannot survive a daemon death. For agents without a resume mapping, Resume == Restart fresh (original args).
- actor_id move in run_actor: the final-snapshot send needs actor_id before exit_tx.send consumes it, so clone it as the compiler directs.
- Do not silently skip any serve(...) test callsite (Task 6 Step 5): the compiler enumerates them; fix every one.