When a Shiny app's WebSocket connection drops — whether from a network blip, laptop sleep, server error, or any other cause — the user sees a semi-transparent gray overlay with no recovery path (the "gray screen of death"). The session is permanently destroyed server-side, and all state is lost.
A session$allowReconnect() mechanism already exists, but it is off by default, works only on certain hosting platforms, and performs a cold restart (new session, server function re-runs, all server-side state lost).
Keep the R session alive across WebSocket disconnects so clients can transparently reconnect to the same session with no state loss. Short disconnects (network blips) should be invisible to the user. Longer disconnects show a subtle reconnect UI. Fatal R errors show an informative overlay.
- End users: Better default experience without developer effort
- App developers: New callbacks and configuration for advanced control
Today a session is binary: closed = FALSE (alive) or closed = TRUE (dead). We introduce a three-state lifecycle:
CONNECTED <--> SUSPENDED --> CLOSED
CONNECTED: Normal operation. WebSocket active, reactivity flowing, flushes write to client.
SUSPENDED: WebSocket gone, but session alive.
- self$closed stays FALSE (reactive graph keeps running)
- New flag: self$suspended <- TRUE
- Output observers are suspended (reversible via resume())
- Timers (invalidateLater, reactiveTimer) continue running; they invalidate contexts, but output observers won't recompute until resumed
- closedCallbacks (onSessionEnded) are NOT fired
- Messages are buffered up to a configurable cap (see "Message Buffering" below)
- A grace period timer starts; if it expires, transition to CLOSED
CLOSED: Same as today. closedCallbacks fire, observers destroyed, timers cancelled, session removed.
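The lifecycle above can be read as a small state machine. The following is a minimal sketch, not Shiny's implementation; allowed_transitions and transition() are hypothetical names used only for illustration, and the table follows the diagram literally.

```r
# Hypothetical transition table for the three-state lifecycle.
# It mirrors the diagram as drawn; a direct CONNECTED -> CLOSED edge
# (for example on a fatal error) would be one extra entry.
allowed_transitions <- list(
  CONNECTED = c("SUSPENDED"),
  SUSPENDED = c("CONNECTED", "CLOSED"),
  CLOSED    = character(0)   # terminal state: no way back
)

# Return the new state, or fail loudly on a transition the table forbids.
transition <- function(from, to) {
  if (!to %in% allowed_transitions[[from]]) {
    stop(sprintf("Illegal transition: %s -> %s", from, to))
  }
  to
}

state <- "CONNECTED"
state <- transition(state, "SUSPENDED")  # WebSocket dropped
state <- transition(state, "CONNECTED")  # client reconnected in time
state <- transition(state, "SUSPENDED")  # dropped again
state <- transition(state, "CLOSED")     # grace period expired
```

Keeping the table explicit makes the key property easy to check: SUSPENDED is reversible, CLOSED is not.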
On reconnect (SUSPENDED -> CONNECTED):
- Swap private$websocket to the new WebSocket
- Set self$suspended <- FALSE
- Resume all output observers (they recompute and flush to the new socket)
- Client sends a resume message (not init) with current input values
- Server calls manageInputs() without re-running serverFunc()
Instead of wsClosed() immediately tearing everything down, we split into two methods:
suspendSession() (called from ws$onClose):
- Set self$suspended <- TRUE
- Call output$suspend() for all outputs (reversible)
- Start grace period timer
- Move session from appsByToken to a new suspendedSessions map
closeSession() (called when grace period expires, or on fatal error):
- Set self$closed <- TRUE
- Fire closedCallbacks (onSessionEnded, observer destroy, timer cancel, etc.)
- Remove from suspendedSessions
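The split between the two methods can be sketched with plain environments. This is an illustrative model under stated assumptions, not Shiny's code: new_session(), suspend_session(), close_session(), and expire_suspended() are hypothetical helpers, and the grace period timer is modelled as a stored deadline checked by a sweep rather than a real timer.

```r
# Registries modelled as environments keyed by session token.
appsByToken       <- new.env()
suspendedSessions <- new.env()

new_session <- function(token) {
  s <- new.env()
  s$token <- token
  s$closed <- FALSE
  s$suspended <- FALSE
  s$deadline <- NULL
  s$ended_callbacks_fired <- FALSE
  assign(token, s, envir = appsByToken)
  s
}

# Called from the WebSocket's onClose: reversible.
suspend_session <- function(s, now, timeout = 60) {
  s$suspended <- TRUE
  s$deadline <- now + timeout          # stands in for the grace period timer
  rm(list = s$token, envir = appsByToken)
  assign(s$token, s, envir = suspendedSessions)
  invisible(s)
}

# Called when the grace period expires or on a fatal error: irreversible.
close_session <- function(s) {
  s$closed <- TRUE
  s$ended_callbacks_fired <- TRUE      # stands in for firing closedCallbacks
  if (exists(s$token, envir = suspendedSessions, inherits = FALSE)) {
    rm(list = s$token, envir = suspendedSessions)
  }
  invisible(s)
}

# Sweep that closes every suspended session whose deadline has passed.
expire_suspended <- function(now) {
  for (token in ls(suspendedSessions, all.names = TRUE)) {
    s <- get(token, envir = suspendedSessions)
    if (now >= s$deadline) close_session(s)
  }
}

s <- new_session("abc")
suspend_session(s, now = 0, timeout = 60)
expire_suspended(now = 30)   # still within the grace period
expire_suspended(now = 61)   # grace period over: session is closed
```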
The suspendedSessions map runs parallel to appsByToken and is keyed by session token. When a new WebSocket arrives with a reconnect token, look it up here instead of creating a new ShinySession.
In the WebSocket handler (server.R), before creating a new session:
- Check for reconnect token (query param ?reconnect_token=<token>)
- If token found in suspendedSessions:
  - Cancel grace period timer
  - Call resumeSession(newWebSocket):
    - private$websocket <- newWebSocket
    - self$suspended <- FALSE
    - Attach message handlers to new WebSocket
    - Resume all output observers
    - Send config message (so client knows resume succeeded)
  - Move session back to appsByToken
  - Call requestFlush() to push accumulated state
- If token not found (expired or new client):
  - Create new ShinySession as today
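The lookup-or-create decision above might look roughly like the sketch below. It models only the registry bookkeeping; add_suspended() and handle_connection() are hypothetical names, sockets are represented by plain strings, and observer resumption, the config message, and the flush are left out.

```r
appsByToken       <- new.env()
suspendedSessions <- new.env()

# Hypothetical helper: place a suspended session in the registry.
add_suspended <- function(token) {
  s <- new.env()
  s$token <- token
  s$suspended <- TRUE
  s$websocket <- NULL
  assign(token, s, envir = suspendedSessions)
  s
}

# Decide whether an incoming WebSocket resumes a session or starts a new one.
# reconnect_token is NULL when the query string carries no token.
handle_connection <- function(ws, reconnect_token = NULL, new_token) {
  found <- !is.null(reconnect_token) &&
    exists(reconnect_token, envir = suspendedSessions, inherits = FALSE)
  if (found) {
    s <- get(reconnect_token, envir = suspendedSessions)
    s$websocket <- ws            # swap in the new socket
    s$suspended <- FALSE
    rm(list = reconnect_token, envir = suspendedSessions)
    assign(reconnect_token, s, envir = appsByToken)
    return(list(resumed = TRUE, session = s))
  }
  # Token missing, unknown, or expired: start fresh, as today.
  s <- new.env()
  s$token <- new_token
  s$suspended <- FALSE
  s$websocket <- ws
  assign(new_token, s, envir = appsByToken)
  list(resumed = FALSE, session = s)
}

add_suspended("tok-1")
r1 <- handle_connection("socket-B", reconnect_token = "tok-1", new_token = "tok-2")
r2 <- handle_connection("socket-C", reconnect_token = "stale", new_token = "tok-3")
```

Here r1 resumes the existing session under its original token, while r2 falls through to a fresh session because its token is unknown.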
Messages are buffered up to a global cap (total bytes across all message types). This covers:
- Output values: Already self-deduplicate in invalidatedOutputValues / invalidatedOutputErrors Maps (keyed by output name, last value wins)
- Input update messages: Accumulated in inputMessageQueue, bounded by input count
- Custom messages (session$sendCustomMessage): The primary unbounded concern; buffered under the cap
- Progress/notifications: Ephemeral, buffered under the same cap
If the cap is reached, stop buffering and flag that a full recompute will happen on resume. Notify the client of this on reconnect so it can inform the user if appropriate.
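A byte-capped buffer along these lines could be sketched as follows. The helper names (new_buffer, buffer_output, buffer_custom) are hypothetical, messages are assumed to be already-serialized strings, and keeping previously buffered content after an overflow is an assumption; the text above only requires that buffering stops and a flag is set.

```r
new_buffer <- function(cap_bytes = 1e6) {
  b <- new.env()
  b$cap <- cap_bytes
  b$bytes <- 0
  b$outputs <- list()     # keyed by output name: last value wins
  b$custom <- list()      # appended in arrival order
  b$overflowed <- FALSE   # TRUE means: full recompute on resume
  b
}

msg_size <- function(x) nchar(x, type = "bytes")

# Account for `delta` bytes; flag overflow instead of exceeding the cap.
charge <- function(b, delta) {
  if (b$bytes + delta > b$cap) {
    b$overflowed <- TRUE
    return(FALSE)
  }
  b$bytes <- b$bytes + delta
  TRUE
}

buffer_output <- function(b, name, json) {
  if (b$overflowed) return(invisible(FALSE))
  old <- if (is.null(b$outputs[[name]])) 0 else msg_size(b$outputs[[name]])
  # Replacing a value only costs the size difference.
  if (charge(b, msg_size(json) - old)) b$outputs[[name]] <- json
  invisible(!b$overflowed)
}

buffer_custom <- function(b, json) {
  if (b$overflowed) return(invisible(FALSE))
  if (charge(b, msg_size(json))) b$custom[[length(b$custom) + 1]] <- json
  invisible(!b$overflowed)
}

b <- new_buffer(cap_bytes = 20)
buffer_output(b, "plot", "aaaaaaaaaa")   # 10 bytes buffered
buffer_output(b, "plot", "bbbbb")        # replaces the old value: 5 bytes
buffer_custom(b, "cccccccccc")           # 15 bytes in total
buffer_custom(b, "dddddddddd")           # would reach 25 > 20: overflow flagged
```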
- session$onDisconnected(callback): fires when entering SUSPENDED state
- session$onReconnected(callback): fires when resuming from SUSPENDED
- session$onSessionEnded(): unchanged semantics, just delayed until CLOSED
When the WebSocket closes without a preceding fatal error message:
- 0-5 seconds: No visual change. App looks normal. Client silently attempts reconnect every 1.5s, sending the session token.
- 5s-timeout: Subtle, non-blocking banner at top of page: "Connection lost. Reconnecting..." with a manual "Reconnect now" link. App remains visible and readable — no overlay.
- Timeout reached / server rejects token: Overlay: "Session expired. Reload to start fresh."
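The timing rules above reduce to a small pure function. The real client logic would live in the browser; the sketch below uses R only to make the thresholds concrete. reconnect_ui_state() and its state names are hypothetical, and treating exactly 5 seconds as the start of the banner phase is an assumption.

```r
# Map time since disconnect (seconds) to what the user should see.
reconnect_ui_state <- function(elapsed, timeout = 60, silent_window = 5,
                               token_rejected = FALSE) {
  if (token_rejected || elapsed >= timeout) return("expired_overlay")
  if (elapsed < silent_window) return("silent")
  "banner"
}

reconnect_ui_state(2)                         # short blip: nothing shown
reconnect_ui_state(12)                        # non-blocking banner
reconnect_ui_state(75)                        # grace period exceeded
reconnect_ui_state(2, token_rejected = TRUE)  # server no longer knows the token
```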
The server sends a {"type": "error", "message": "...", "fatal": true} message before closing the WebSocket. The client shows an overlay immediately with:
- Error details (respecting shiny.sanitize.errors)
- "Reload" button: clean slate, fresh session
- "Reload and restore inputs" button, with a note: "This will attempt to restore your previous inputs, but the error may recur if it was caused by a specific input combination."
- Include session token in WebSocket URL: ws://host/websocket?reconnect_token=<token>
- On connection, send a resume message (with current input values) instead of init
- If server responds with the same session ID in config, resume succeeded; hide reconnect UI
- If server responds with a new session ID (token changed), the old session expired; treat as fresh start
While disconnected, user interactions (typing, clicking) are captured and sent as an update message after reconnect, so nothing is lost.
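One way to picture this capture step is a pending-input queue that is drained into a single message on reconnect. The sketch below is an assumption-laden model: capture_input() and drain_inputs() are hypothetical, collapsing repeated changes to the last value per input is an assumption, and the method/data message shape is only illustrative.

```r
pending <- new.env()

# Record an input change made while disconnected; last value per input wins.
capture_input <- function(name, value) {
  assign(name, value, envir = pending)
}

# Build the single update message sent after reconnect, then clear the queue.
drain_inputs <- function() {
  keys <- ls(pending, all.names = TRUE)
  data <- mget(keys, envir = pending)
  rm(list = keys, envir = pending)
  list(method = "update", data = data)
}

capture_input("n", 10)
capture_input("name", "a")
capture_input("n", 25)      # overwrites the earlier value of n
msg <- drain_inputs()
```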
options(
shiny.reconnect = TRUE, # Enable/disable (default TRUE)
shiny.reconnect.timeout = 60, # Grace period in seconds (default 60)
shiny.reconnect.bufferSize = 1e6 # Max buffered bytes during SUSPENDED (~1MB default)
)

# Override grace period
session$setReconnectTimeout(seconds)
# Disable for this session
session$setReconnectTimeout(0)
# React to lifecycle events
session$onDisconnected(function() { ... })
session$onReconnected(function() { ... })

- session$allowReconnect(TRUE) becomes a wrapper for setReconnectTimeout(getOption("shiny.reconnect.timeout"))
- session$allowReconnect(FALSE) becomes setReconnectTimeout(0)
- onSessionEnded fires later than before (delayed by grace period). For apps needing immediate cleanup, setReconnectTimeout(0) restores old behavior.
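How the global options and the per-session override might combine is sketched below. The option names come from this proposal and do not exist in Shiny today; effective_timeout() and allow_reconnect_timeout() are hypothetical helpers, and letting a global shiny.reconnect = FALSE win over a per-session override is an assumption.

```r
# Resolve the grace period (seconds) for one session.
effective_timeout <- function(session_override = NULL) {
  # Assumption: the global switch disables reconnect for every session.
  if (!isTRUE(getOption("shiny.reconnect", TRUE))) return(0)
  if (!is.null(session_override)) return(session_override)
  getOption("shiny.reconnect.timeout", 60)
}

# Backward-compatible mapping of allowReconnect(TRUE/FALSE) onto a timeout.
allow_reconnect_timeout <- function(allow) {
  if (isTRUE(allow)) getOption("shiny.reconnect.timeout", 60) else 0
}

old <- options(shiny.reconnect = TRUE, shiny.reconnect.timeout = 120)
effective_timeout()             # global setting applies
effective_timeout(0)            # per-session opt-out
allow_reconnect_timeout(FALSE)  # legacy call maps to a zero timeout
options(old)                    # restore previous option values
```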
If a new WebSocket arrives with a reconnect token while the old WebSocket's onClose hasn't fired yet, accept the new one and discard the old. Tokens are unique per session, so no cross-session collision.
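One possible way to make "accept the new one and discard the old" safe is to have the close handler check that the closing socket is still the session's current socket. The sketch below illustrates that guard; it is an assumption about how the rule could be implemented, and accept_reconnect() and on_close() are hypothetical names with sockets represented by string IDs.

```r
session <- new.env()
session$websocket_id <- "ws-old"
session$suspended <- FALSE

# A reconnect arrives before the old socket's onClose has fired.
accept_reconnect <- function(session, new_id) {
  session$websocket_id <- new_id
  session$suspended <- FALSE
  invisible(session)
}

# onClose handler: suspend only if the closing socket is still the current one.
on_close <- function(session, closing_id) {
  if (!identical(closing_id, session$websocket_id)) return(invisible(FALSE))
  session$suspended <- TRUE
  invisible(TRUE)
}

accept_reconnect(session, "ws-new")
on_close(session, "ws-old")   # late close of the discarded socket: ignored
```

Without the identity check, the late onClose from the old socket would suspend a session that has just been resumed.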
The token is a 128-bit random value, so guessing it is infeasible. Interception risk over plain HTTP is the same as for session cookies generally. Document that HTTPS is strongly recommended.
If the R process dies, there's no session to resume. Client retries, gets rejected (token not found), falls through to "session expired" overlay. This is where bookmarking remains the right tool.
Apps holding expensive resources (DB connections, large data) hold them longer during the grace period. setReconnectTimeout(0) provides an escape hatch. Document this tradeoff.
Frozen values thaw in onFlushed callbacks, which don't fire while suspended. On resume, the first flush thaws them. May cause a brief stale-value flash. Acceptable for v1.
Mid-stream uploads are lost on disconnect. On reconnect, user would need to re-upload. No attempt to resume uploads in v1.
Extended tasks run in background processes (mirai/future) and are WebSocket-agnostic. During SUSPENDED:
- Background work continues and completes normally
- on_success / on_error set rv_status and rv_value / rv_error via reactive values
- Output observers are suspended, so they don't recompute yet
- On reconnect, observers resume, read task$result(), and flush the completed value
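The sequence in this list can be simulated without any reactive machinery. The sketch below is a toy model, not ExtendedTask's implementation: the session is an environment, task_result stands in for the task's reactive value, and flush_outputs() and resume() are hypothetical helpers.

```r
session <- new.env()
session$suspended <- TRUE       # client is disconnected
session$task_result <- NULL     # stands in for the task's reactive value
session$sent <- list()          # what has been flushed to the client

# Output observers only push values while the session is not suspended.
flush_outputs <- function(session) {
  if (session$suspended || is.null(session$task_result)) {
    return(invisible(FALSE))
  }
  session$sent[["result"]] <- session$task_result
  invisible(TRUE)
}

# Background work finishes regardless of the WebSocket state.
on_success <- function(session, value) {
  session$task_result <- value
  flush_outputs(session)
}

resume <- function(session) {
  session$suspended <- FALSE
  flush_outputs(session)
}

on_success(session, 42)                       # completes while suspended
sent_while_suspended <- length(session$sent)  # nothing has been sent yet
resume(session)                               # result is delivered on reconnect
```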
This is one of the strongest arguments for session persistence — today, disconnecting during a long computation loses it entirely. With this design, the result is waiting on reconnect.
bind_task_button state may be briefly stale on reconnect (showing "running" when task is complete). The output flush corrects this quickly.
Queued invocations in invocation_queue chain and complete normally during SUSPENDED.
- Automatic state serialization — That's bookmarking territory
- Cross-process session migration — Different problem entirely
- Changes to reactive graph or observer semantics — Beyond the suspend/resume lifecycle
- Upload resume — Too complex for v1