Review of the conversation about whether epoch/must-refetch should be part of the core Durable Streams protocol or remain an implementation detail.
The discussion mentions 400 vs 410 but doesn't explore that 410 Gone is cacheable by default per the HTTP spec (RFC 9110), while 400 is not. This could be important: a cached 410 tells clients "don't bother asking again," which has different implications than an uncacheable 400.
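A minimal sketch of that difference, assuming a fetch-style server handler; the route shape, header values, and `lookupStream` helper are invented for illustration and are not part of the protocol:

```ts
// Sketch only: contrasting the caching behaviour of 400 vs 410.
type Stream = { epoch: string };

// Placeholder for a real storage lookup.
function lookupStream(path: string): Stream | null {
  return null;
}

export function handleRead(req: Request): Response {
  const url = new URL(req.url);
  const stream = lookupStream(url.pathname);

  if (stream === null) {
    // 410 Gone is cacheable by default (RFC 9110 lists it among the
    // heuristically cacheable status codes), so a CDN can keep answering
    // "this stream is gone" without touching origin. Making that explicit:
    return new Response("stream gone", {
      status: 410,
      headers: { "Cache-Control": "public, max-age=3600" },
    });
  }

  const offset = url.searchParams.get("offset");
  if (offset === null || !offset.startsWith(stream.epoch)) {
    // 400 is not cacheable by default: every retry reaches origin. Fine for
    // transient client mistakes, wasteful for a permanently stale offset.
    return new Response("invalid offset", { status: 400 });
  }

  return new Response("…stream data…", { status: 200 });
}
```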
Sam briefly mentions a "307 redirect" as an option but nobody explores it. This could be elegant: a request with offset=-1 returns a redirect to an epoch-specific URL (e.g., /stream?offset=epoch123_0), and the response at that URL can be cached indefinitely. The first request always hits origin, but the redirect target is immutable.
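A sketch of that idea, under assumed URL shapes (nothing here is from the spec): the redirect itself always hits origin, but the epoch-qualified target never changes, so it can carry a long-lived Cache-Control.

```ts
// Hypothetical: offset=-1 redirects to an epoch-qualified, immutable offset.
function currentEpoch(path: string): string {
  return "epoch123"; // placeholder for a real lookup
}

export function handleOffsetRequest(req: Request): Response {
  const url = new URL(req.url);

  if (url.searchParams.get("offset") === "-1") {
    // The redirect always hits origin, but it is tiny and cheap.
    const target = new URL(url.toString());
    target.searchParams.set("offset", `${currentEpoch(url.pathname)}_0`);
    return Response.redirect(target.toString(), 307);
  }

  // The epoch-qualified URL identifies an immutable position, so the body
  // can be cached essentially forever by any intermediary.
  return new Response("…data from that offset…", {
    status: 200,
    headers: { "Cache-Control": "public, max-age=31536000, immutable" },
  });
}
```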
What happens to idempotent producers when an epoch changes? If a producer is mid-batch when the stream resets, does their (producerId, epoch, seq) become invalid? How should producers handle this?
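One possible producer-side policy, sketched with a hypothetical error shape (`{ code: "EPOCH_CHANGED", epoch }`) and endpoint; the protocol defines none of this today, which is exactly the gap being flagged:

```ts
interface ProducerState {
  producerId: string;
  epoch: string;
  seq: number;
}

async function appendBatch(
  state: ProducerState,
  records: string[],
): Promise<ProducerState> {
  const res = await fetch("https://example.test/stream/append", {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ ...state, records }),
  });

  if (res.status === 409) {
    const err = (await res.json()) as { code: string; epoch: string };
    if (err.code === "EPOCH_CHANGED") {
      // The old (producerId, epoch, seq) triple is no longer meaningful.
      // One policy: adopt the new epoch, reset seq, and resend the batch,
      // but only if the batch is safe to replay against a fresh stream.
      // A real producer would also bound the number of retries.
      const fresh = { ...state, epoch: err.epoch, seq: 0 };
      return appendBatch(fresh, records);
    }
  }

  return { ...state, seq: state.seq + records.length };
}
```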
The conversation assumes clients can always "just restart" but doesn't address:
- What if the client has uncommitted local changes (collaborative editing)?
- What if restarting means re-processing expensive operations?
- Should there be a "soft restart" vs "hard restart" distinction?
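A sketch of what that soft/hard split could look like in a client library; "soft" and "hard" restart are invented names, not protocol terms:

```ts
type RestartKind = "soft" | "hard";

interface RestartHandler {
  // Soft restart: the stream was reset but local state can be rebased;
  // the library refetches from the beginning and replays uncommitted changes.
  onSoftRestart(replayPending: () => Promise<void>): Promise<void>;
  // Hard restart: local state is unsalvageable (or too expensive to rebuild
  // silently); surface the restart to the application/user.
  onHardRestart(reason: string): Promise<void>;
}

async function handleStreamReset(
  kind: RestartKind,
  handler: RestartHandler,
  pending: Array<() => Promise<void>>,
): Promise<void> {
  if (kind === "soft") {
    await handler.onSoftRestart(async () => {
      for (const op of pending) await op(); // re-apply local, uncommitted ops
    });
  } else {
    await handler.onHardRestart("stream was recreated; local state discarded");
  }
}
```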
What happens during the gap between stream deletion and recreation? A client could:
- Start following the old stream from cache
- Reach the server which now has a new stream with the same path
- Get valid responses from the new stream thinking it's the old one
The epoch prevents this, but is this edge case explicitly covered?
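A sketch of the client-side guard the epoch provides, assuming the epoch is exposed in a response header (the header names here are invented for illustration):

```ts
interface FollowState {
  epoch: string | null; // epoch the client believes it is following
  offset: string;
}

async function fetchNext(state: FollowState): Promise<FollowState> {
  const res = await fetch(
    `https://example.test/stream?offset=${encodeURIComponent(state.offset)}`,
  );
  const serverEpoch = res.headers.get("x-stream-epoch"); // assumed header

  if (state.epoch !== null && serverEpoch !== state.epoch) {
    // Same path, different stream: the old stream was deleted and a new one
    // created in the gap. Without this check the client would happily mix
    // the cached tail of the old stream with the head of the new one.
    throw new Error("epoch changed: must refetch from the beginning");
  }

  return {
    epoch: serverEpoch,
    offset: res.headers.get("x-next-offset") ?? state.offset, // assumed header
  };
}
```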
Neither side addresses how this affects the conformance test suite. Testing epoch/must-refetch behavior across all clients adds significant test matrix complexity.
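A rough illustration of why the matrix grows: every client under test has to be exercised against every epoch transition the server can produce. The dimensions below are assumptions about what such a suite would need to cover, not the actual conformance suite.

```ts
const clients = ["typescript", "python", "elixir"]; // conformance targets (assumed)
const scenarios = [
  "epoch-change-mid-read",
  "epoch-change-mid-write",
  "stale-offset-after-recreate",
  "cached-410-then-recreate",
];

for (const client of clients) {
  for (const scenario of scenarios) {
    // Each cell is a full end-to-end run: start a server, drive the client,
    // force the transition, assert on the observed behaviour.
    console.log(`conformance: ${client} × ${scenario}`);
  }
}
```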
Kyle's ending suggestion about encoding streams as "restartable" is interesting but underdeveloped. This could be the middle ground:
- Restartable streams: clients auto-handle restart dance
- Non-restartable streams: errors surface to the user
- Server declares this via header, client behavior follows
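A sketch of that header-driven behavior, with an invented header name ("x-stream-restartable"); the point is the shape, not the spelling:

```ts
type OnEpochChange =
  | { kind: "auto-restart" } // library silently refetches and replays
  | { kind: "surface-error"; error: Error }; // application must decide

function policyFor(res: Response): OnEpochChange {
  const restartable = res.headers.get("x-stream-restartable") === "true";
  return restartable
    ? { kind: "auto-restart" }
    : { kind: "surface-error", error: new Error("stream was reset") };
}
```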
Sam argues "just document it as a pattern" but patterns that aren't enforced drift. How do you ensure third-party implementations get this right? The protocol either guarantees it or it doesn't.
If a client is restarting, should writes be blocked? Buffered? What's the expected behavior for a bidirectional use case?
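One way a client could answer that (buffer while restarting, flush afterwards), sketched with invented names; the alternative, rejecting writes outright, is noted in a comment. The protocol currently says nothing either way.

```ts
class WriteGate {
  private restarting = false;
  private buffer: string[] = [];

  beginRestart(): void {
    this.restarting = true;
  }

  async write(record: string, send: (r: string) => Promise<void>): Promise<void> {
    if (this.restarting) {
      // Option A: buffer and flush once the restart completes.
      this.buffer.push(record);
      return;
      // Option B (not shown): throw, forcing the caller to handle it.
    }
    await send(record);
  }

  async endRestart(send: (r: string) => Promise<void>): Promise<void> {
    this.restarting = false;
    for (const r of this.buffer.splice(0)) await send(r);
  }
}
```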
The fundamental tension:
| Kyle's View | Sam's View |
|---|---|
| Protocol = foundation for sync apps | Protocol = transport layer |
| Encode epoch explicitly | Epoch is opaque in offset |
| Guarantee caching works | Let implementations decide |
| One protocol, no extensions | Extensions for higher-level concerns |
Sam's point that "invalid offset error = must-refetch" is technically correct, but Kyle's concern about ensuring clients actually implement this correctly across languages is valid. The question is: where do you draw the line between "protocol guarantees" and "implementation guidance"?
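For what it's worth, the client code looks the same whichever side of that line it falls on. A sketch, assuming 400/410 signal an invalid offset and -1 means "from the beginning":

```ts
async function readFrom(offset: string): Promise<Response> {
  const res = await fetch(
    `https://example.test/stream?offset=${encodeURIComponent(offset)}`,
  );

  if (res.status === 400 || res.status === 410) {
    // "Invalid offset error = must-refetch": whether this is a protocol
    // guarantee or implementation guidance, the behaviour is identical.
    // (A real client would bound this retry to avoid looping on -1 errors.)
    return readFrom("-1");
  }

  return res;
}
```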