
@TheYkk
Last active June 8, 2025 09:32

πŸš€ High-Level Goal
Support a 64 v 64 (128 total) β€œHell-Let-Loose–style” FPS with Godot clients and an authoritative Rust server, while keeping latency low (< 80 ms RTT budget) and bandwidth reasonable for both clients (< 250 kbps) and the server box (< 25 Mbps).

────────────────────────────────────────

1. Core Design Pillars
────────────────────────────────────────
β€’ Authoritative server – no trust in clients
β€’ UDP first, with a light reliability/ordering layer (think ENet/Laminar/QUIC)
β€’ Fixed-rate server simulation tick, client-side prediction + interpolation
β€’ Delta-compressed, relevance-filtered snapshots (a.k.a. interest management)
β€’ Multi-threaded ECS simulation on the server; network I/O kept lock-free
β€’ Single box for 128 players, but layout is shard-friendly if we ever split

────────────────────────────────────────
2. Top-Level Architecture
────────────────────────────────────────
Godot Client <-UDP/QUIC-> Rust β€œGame-Core” (authoritative) <-TCP-> Lobby / DB

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”   inputs   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   events  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Godot  │───────────►│ Net Front   │──────────►│  Match   β”‚
β”‚ Client │◄───────────│  Gate (IO)  │◄──────────│  Lobby   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜ snapshots  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜            β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                             β”‚
                         lock-free
                         channels
                             β”‚
                       β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”
                       β”‚ Game ECS  β”‚
                       β”‚  (Bevy?)  β”‚
                       β””β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
                             β”‚
                       β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”
                       β”‚  Worker   β”‚
                       β”‚ Threads   β”‚
                       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Why two layers inside the server?
β€’ Net Front Gate = purely async I/O, packet (de)frag, (de)crypt, acks.
β€’ Game ECS = deterministic world updated at fixed Ξ”t, batch-consumes inputs, emits snapshots.

────────────────────────────────────────
3. Transport & Packet Layout
────────────────────────────────────────
Transport: UDP (or QUIC if you want built-in encryption + congestion control).
Max safe datagram size: 1200 bytes (stays under the path MTU of most home connections, avoiding IP fragmentation).

Packet Header (9 bytes):

uint16  seq_id
uint16  ack_of_remote
uint32  ack_bitfield    (acks for the 32 packets before ack_of_remote)
uint8   flags           (bit0=reliable, bit1=frag, bit2=control…)
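As a minimal sketch, the header above (its four fields sum to 2 + 2 + 4 + 1 = 9 bytes) can be packed and unpacked with plain little-endian byte manipulation; the struct and method names here are hypothetical:

```rust
// Hypothetical sketch: packing the header fields above into raw bytes.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct PacketHeader {
    pub seq_id: u16,
    pub ack_of_remote: u16,
    pub ack_bitfield: u32,
    pub flags: u8,
}

impl PacketHeader {
    pub const SIZE: usize = 9; // 2 + 2 + 4 + 1

    pub fn encode(&self) -> [u8; Self::SIZE] {
        let mut buf = [0u8; Self::SIZE];
        buf[0..2].copy_from_slice(&self.seq_id.to_le_bytes());
        buf[2..4].copy_from_slice(&self.ack_of_remote.to_le_bytes());
        buf[4..8].copy_from_slice(&self.ack_bitfield.to_le_bytes());
        buf[8] = self.flags;
        buf
    }

    pub fn decode(buf: &[u8]) -> Option<Self> {
        if buf.len() < Self::SIZE {
            return None; // truncated datagram
        }
        Some(Self {
            seq_id: u16::from_le_bytes([buf[0], buf[1]]),
            ack_of_remote: u16::from_le_bytes([buf[2], buf[3]]),
            ack_bitfield: u32::from_le_bytes([buf[4], buf[5], buf[6], buf[7]]),
            flags: buf[8],
        })
    }
}
```

Fixed-width, explicitly-endian encoding keeps the wire format identical regardless of host architecture.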

Payload = 1..N β€œmessages”, TLV-encoded inside the datagram:

Msg-Types (1-byte id + 1-byte len if <256):

00 Heartbeat / ping
01 InputCmd (bitfield buttons 2B + 3Γ—pos32 or delta16 + uint8 tick)
02 SnapshotDelta (compressed)
03 SnapshotBaseline (full state if delta lost)
04 Event/RPC (grenade exploded, chat, UI)
05 StreamFrag (map chunk, voice, etc.)

Reliability:
β€’ β€œreliable” flag + sliding window resends.
β€’ Unreliable for InputCmds (they become obsolete quickly).
β€’ Semi-reliable for SnapshotBaselines.
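The `ack_of_remote` + `ack_bitfield` pair drives the sliding-window resends. A hedged sketch of the receiver-side bookkeeping (struct and function names are illustrative, with RFC 1982-style wrap-aware sequence comparison):

```rust
// Hypothetical sketch of ack tracking: remember the newest remote seq_id and a
// 32-bit window of the packets received before it.
pub struct AckState {
    pub latest_seq: u16, // newest remote seq_id received
    pub ack_bits: u32,   // bit n set => packet (latest_seq - 1 - n) was received
}

/// True if `a` is newer than `b` under u16 wraparound.
fn seq_newer(a: u16, b: u16) -> bool {
    a != b && a.wrapping_sub(b) < 0x8000
}

impl AckState {
    pub fn on_receive(&mut self, seq: u16) {
        if seq_newer(seq, self.latest_seq) {
            let shift = seq.wrapping_sub(self.latest_seq) as u32;
            // Slide the window forward; the old latest becomes bit (shift - 1).
            self.ack_bits = if shift > 32 {
                0 // everything older fell out of the window
            } else if shift == 32 {
                1 << 31
            } else {
                (self.ack_bits << shift) | (1 << (shift - 1))
            };
            self.latest_seq = seq;
        } else {
            // Late or duplicate packet: set its bit if still inside the window.
            let behind = self.latest_seq.wrapping_sub(seq) as u32;
            if (1..=32).contains(&behind) {
                self.ack_bits |= 1 << (behind - 1);
            }
        }
    }
}
```

The sender walks the returned bitfield each ack and re-queues any reliable message older than 32 packets that never got its bit set.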

────────────────────────────────────────
4. Tick & Time Model
────────────────────────────────────────
Simulation tick = 60 Hz (Ξ”t = 16.66 ms)
Networking tick = 20 Hz (every 3rd sim tick we send a snapshot)

                Client Render (144 Hz)
      β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
Timeline β†’    β”‚I I Iβ”‚I I Iβ”‚I I Iβ”‚ …          (inputs @ 60 Hz)
Server Sim    β”‚S  β”‚S  β”‚S  β”‚S  β”‚S  β”‚S  β”‚S  …  (60 Hz)
              β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”˜
Snapshot Tx            β–²        β–²        β–²   (20 Hz)

Interpolation buffer: 2.5 ticks β‰ˆ 40 ms
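The fixed-step loop implied by these rates can be sketched as follows (std-only; the `sim`/`snapshot` callbacks stand in for the real systems):

```rust
// Hypothetical sketch: simulate at 60 Hz, emit a snapshot every 3rd tick
// (20 Hz), absorbing wall-clock jitter with an absolute next-tick deadline.
use std::time::{Duration, Instant};

const SIM_DT: Duration = Duration::from_micros(16_666); // β‰ˆ 60 Hz
const SNAPSHOT_EVERY: u64 = 3;                          // 60 / 3 = 20 Hz

fn run_ticks(mut sim: impl FnMut(u64), mut snapshot: impl FnMut(u64), num_ticks: u64) {
    let mut next_tick = Instant::now();
    for tick in 0..num_ticks {
        sim(tick);
        if tick % SNAPSHOT_EVERY == 0 {
            snapshot(tick);
        }
        next_tick += SIM_DT;
        // Sleep off any remaining budget; a late tick just runs immediately.
        if let Some(wait) = next_tick.checked_duration_since(Instant::now()) {
            std::thread::sleep(wait);
        }
    }
}
```

Advancing an absolute deadline (rather than sleeping a fixed amount) keeps the long-run tick rate exact even when individual ticks run long.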

Client-side:
β€’ Sends InputCmd every render frame (capped at 60 Hz).
β€’ Predicts locally.
β€’ Keeps 100 ms of input history; on mismatch vs. authoritative state β‡’ smooth rewind/correct.
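The rewind/correct step can be sketched like this, with a 1-D position standing in for the full player state (all names hypothetical):

```rust
// Hypothetical sketch of reconciliation: rewind to the server-confirmed state,
// then re-apply every input the server has not yet acknowledged.
#[derive(Clone, Copy)]
struct InputCmd {
    tick: u32,
    move_dir: f32, // -1.0 .. 1.0
}

const SPEED: f32 = 5.0;     // m/s
const DT: f32 = 1.0 / 60.0; // fixed sim step

fn step(pos: f32, input: &InputCmd) -> f32 {
    pos + input.move_dir * SPEED * DT
}

/// Start from the authoritative state, replay inputs newer than `server_tick`.
fn reconcile(server_pos: f32, server_tick: u32, history: &[InputCmd]) -> f32 {
    history
        .iter()
        .filter(|cmd| cmd.tick > server_tick)
        .fold(server_pos, |pos, cmd| step(pos, cmd))
}
```

In practice the corrected state is blended toward over a few frames rather than snapped, which is the β€œsmooth” part of rewind/correct.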

Server:
β€’ Collects all inputs with tick ID ≀ current tick.
β€’ Simulates physics, hit-scan.
β€’ Serializes state diff vs. last ACKed snapshot per client.
β€’ Runs interest mgmt: spatial hash + LOS + team filter.

────────────────────────────────────────
5. Interest / Relevance Management
────────────────────────────────────────
World split into 3-D grid cells (e.g. 32 m cubes).
For each client we ship only entities within a radius R = 250 m that fall inside a 120Β° forward FOV, plus team markers.
Typical relevant entity count:

β€’ Players: β‰ˆ 40
β€’ Projectiles (bullets & tracers): β‰ˆ 30 (fade quickly)
β€’ Grenades / effects: 10
β€’ Buildables / vehicles: 20
TOTAL β‰ˆ 100 entities / player on average.
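A minimal sketch of the relevance test described above, using the 32 m cell size and the 250 m / 120Β° cone (function names hypothetical; the FOV check works in the XZ plane):

```rust
// Hypothetical sketch: grid bucketing plus radius + forward-cone filtering.
const CELL: f32 = 32.0; // meters, matches the 32 m cubes above

fn cell_of(pos: [f32; 3]) -> (i32, i32, i32) {
    (
        (pos[0] / CELL).floor() as i32,
        (pos[1] / CELL).floor() as i32,
        (pos[2] / CELL).floor() as i32,
    )
}

/// Within `radius` AND inside the 120Β° cone around `forward` (unit XZ vector).
/// Team markers would bypass this filter entirely.
fn is_relevant(viewer: [f32; 3], forward: [f32; 2], entity: [f32; 3], radius: f32) -> bool {
    let dx = entity[0] - viewer[0];
    let dz = entity[2] - viewer[2];
    let d2 = dx * dx + dz * dz;
    if d2 > radius * radius {
        return false; // out of range
    }
    if d2 < 1.0 {
        return true; // right on top of us: always relevant
    }
    let inv_d = 1.0 / d2.sqrt();
    let cos_angle = (dx * forward[0] + dz * forward[1]) * inv_d;
    cos_angle >= (120.0f32 / 2.0).to_radians().cos() // 60Β° half-angle
}
```

The grid keeps the candidate set small (only nearby cells are scanned), so the cone test runs on tens of entities rather than thousands.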

Entity State Quantization (per delta entry):

field                       size
─────────────────────────────────
id (uint16)                 2 B
position (x,y,z int16)      6 B   (centimeter accuracy inside a 2 km map)
yaw/pitch (2Γ—int16)         4 B
velocity (packed int16Γ—3)   6 B
state bits                  1 B
─────────────────────────────────
TOTAL                      ~19 B  β†’ delta often ~10 B after XOR & RLE
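A hedged sketch of the quantization itself: int16 centimeters relative to a per-cell origin give Β±327 m of range (comfortably covering any relevant entity), and yaw maps the full circle onto the int16 range. Function names are illustrative:

```rust
// Hypothetical sketch of the int16 quantization in the table above.

/// Meters -> int16 centimeters relative to `origin` (must be within ~Β±327 m).
fn quantize_pos(world_m: f32, origin_m: f32) -> i16 {
    ((world_m - origin_m) * 100.0).round() as i16
}

fn dequantize_pos(q: i16, origin_m: f32) -> f32 {
    origin_m + q as f32 / 100.0
}

/// Yaw in degrees [0, 360) -> int16 spanning the whole circle.
fn quantize_yaw(deg: f32) -> i16 {
    ((deg / 360.0) * 65536.0).round() as u16 as i16
}
```

Quantizing against a local origin rather than world zero is what keeps centimeter precision over a 2 km map inside 16 bits.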

Bandwidth per client (down): 100 entities Γ— 10 B Γ— 20 Hz = 20 kB/s β‰ˆ 160 kbps

Bandwidth per client (up): InputCmd 8 B Γ— 60 Hz = 480 B/s β‰ˆ 4 kbps

Server aggregate: Down: 20 kB/s Γ— 128 = 2.5 MB/s β‰ˆ 20 Mbps
Up: 0.48 kB/s Γ— 128 = 61 kB/s β‰ˆ 0.5 Mbps

Well within a single gig-E NIC.

────────────────────────────────────────
6. Server Threading & Scaling
────────────────────────────────────────
CPU budget (per tick):
β€’ Physics + ECS: ~100 Β΅s per player β†’ 128 Γ— 100 Β΅s = 12.8 ms
β€’ Overhead / pathing / extras β†’ 2.0 ms
Total β†’ 14.8 ms < 16.6 ms budget πŸ’š

Implementation:
β€’ 1 async thread (Tokio/Quinn) for recv/send (zero-copy to/from mpsc channel).
β€’ N-1 worker threads (rayon or Bevy_schedule) own ECS; partition by entity or system.
β€’ End of tick = barrier; snapshot builder runs, pushes bytes back to Net thread.
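A minimal sketch of the channel plumbing between the two layers, using `std::sync::mpsc` as a stand-in for a lock-free queue (e.g. crossbeam's); all names are hypothetical:

```rust
// Hypothetical sketch: a net thread pushes decoded InputCmds into a channel;
// the sim side drains the whole batch, steps, and sends snapshot bytes back.
use std::sync::mpsc;
use std::thread;

#[derive(Debug)]
pub struct InputCmd {
    pub player: u16,
    pub tick: u32,
    pub buttons: u16,
}

/// One tick's worth of plumbing: returns how many inputs the sim consumed
/// and the snapshot bytes it would hand back to the net thread.
pub fn demo_tick(num_inputs: u32) -> (usize, Vec<u8>) {
    let (input_tx, input_rx) = mpsc::channel::<InputCmd>();

    // Net-IO thread: in reality this parses UDP datagrams.
    let net = thread::spawn(move || {
        for tick in 0..num_inputs {
            input_tx.send(InputCmd { player: 1, tick, buttons: 0b0001 }).unwrap();
        }
        // Dropping input_tx closes the channel so the sim's drain terminates.
    });

    // Sim side: drain everything queued for this tick, then build a snapshot.
    let batch: Vec<InputCmd> = input_rx.iter().collect();
    net.join().unwrap();

    let snapshot = vec![batch.len() as u8]; // stand-in for real delta serialization
    (batch.len(), snapshot)
}
```

The key property is that the sim thread never blocks on the socket: it only ever consumes whatever the net thread has already queued.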

Memory:
Baseline entity (archetype) ~256 B; 5 000 live entities β†’ 1.3 MB.
Plenty of headroom; 32 GB RAM box is luxurious.

────────────────────────────────────────
7. Will It Still Work for 128 Players?
────────────────────────────────────────
We already designed for 128 total. Stress-test scenario: everybody in one courtyard.

Entity count might double to 200 relevant.
Bandwidth per client β†’ 40 kB/s (β‰ˆ320 kbps) still OK.
Server outbound β†’ 5 MB/s (β‰ˆ40 Mbps) still < 1/20th gig-E.
CPU: bullet spam could spike physics to 25 ms β†’ mitigation:
β€’ cap projectile simulation (hitscan on server, clients draw fake tracers)
β€’ off-thread async jobs for explosions etc.

So yes, still viable on one modern 8-core (Ryzen 5 7600, Xeon E-2288G, etc.). For >128, you’d shard or open β€œregion servers” (same exe, different port).

────────────────────────────────────────
8. Special Topics / Trade-offs
────────────────────────────────────────
Anti-cheat:
β€’ Server validates hits; client only raycasts for FX.
β€’ CRC on resources, obfuscation of packet opcodes.
β€’ Optional: kernel driver not covered here.

Matchmaking & Persistence:
β€’ Separate micro-service; server receives a β€œSpawnBlob” (loadout, cosmetics).
β€’ At end of match flush stats via TCP to DB.

Voice:
Don’t mix in main data path; use separate SFU or Vivox-like relay.

Tick vs. event-driven alternative?
If you want to ditch the fixed 60 Hz tick, you could go β€œsnapshot-based” with variable Ξ”t (the Apex approach), but that complicates determinism and physics. For indie scope, a classic fixed tick is safer.

────────────────────────────────────────
9. Checklist Summary
────────────────────────────────────────
βœ… UDP + reliability layer (seq/acks)
βœ… 60 Hz sim / 20 Hz snapshots, client interp 100 ms
βœ… Entity relevance + delta compression to keep < 320 kbps per user
βœ… Rust: ECS (Bevy/Legion/Shipyard) + Tokio/Quinn net I/O
βœ… 8-core box, 40 Mbps peak outbound, < 16 ms/frame CPU
βœ… Scales to 128 players; >128 β‡’ shard or stream sections to sub-servers

You now have a concrete yet implementation-agnostic blueprint for building the server & protocol. Happy fragging! πŸ”«


TheYkk commented Jun 8, 2025

πŸ‹οΈβ€β™‚οΈ Authoritative Physics & Collision Handling
(what runs on the Rust server every 60 Hz tick)

────────────────────────────────────────

1. Guiding Principles
────────────────────────────────────────
β€’ 100 % authoritative: only the server decides β€œwhat touched what”.
β€’ Fixed Ξ”t (16.66 ms) to keep integration stable and deterministic.
β€’ Keep it cheap (≀ 2 ms of the 16 ms budget) – no full-blown rigid-body chaos, only what an infantry-centric FPS really needs.
β€’ Give clients a mirror-lite version for prediction; small divergences are OK because reconciliation corrects them.

────────────────────────────────────────
2. Physics Scope for an HLL-Style FPS
────────────────────────────────────────
A. Player locomotion – capsule vs. static level geometry, jump, step-up, ladder.
B. Bullets / hitscan – instant ray checks (99 % of shots).
C. Projectiles – grenades, rockets: parabolic flight + explosion radius.
D. Environment – static meshes, trigger volumes, no destructibles (keep first release simple).
E. Vehicles – if/when added, approximate with single convex hull, no wheel suspension simulation initially.

────────────────────────────────────────
3. Tech Choice
────────────────────────────────────────
Use the Rapier3D crate (MIT-licensed, by Dimforge). Reasons:

β€’ Pure Rust – perf & FFI-friendly.
β€’ Deterministic when you pin the same compiler flags and CPU float mode (no SSE-vs-AVX divergence).
β€’ Already has broad-phase (SAP), narrow-phase (GJK/EPA) and CCD.
β€’ Integrates cleanly with Bevy ECS (via bevy_rapier) or any custom ECS.

Alternative: roll your own capsule-only solver β†’ even faster, but higher upfront cost. Start with Rapier, profile, replace later if necessary.

────────────────────────────────────────
4. Collision Pipeline per Tick
────────────────────────────────────────

loop {                                // one iteration per 60 Hz tick (pseudocode)
    // 1. Collect inputs (already queued by Net-IO thread)  
    apply_player_inputs(dt);

    // 2. External forces  
    add_gravity();
    apply_friction();

    // 3. Broad Phase  
    rapier.update_broad_phase();      // uniform grid + SAP

    // 4. Narrow Phase  
    rapier.compute_narrow_phase();    // capsule-mesh, ray, sphere

    // 5. Solve contacts & integrate  
    rapier.step_island_solver();      // penetration correction

    // 6. Hitscan / Ray Tests  
    process_hitscan_requests();       // see Β§5

    // 7. Explosions & AoE overlaps  
    evaluate_overlap_queries();

    // 8. Write-back to ECS (Transform, Velocity)  
    publish_new_state();

    // 9. Snapshot packing happens after this
}

Average cost on Ryzen 5 5600:
β€’ 128 dynamic capsules + 200 static colliders β‰ˆ 0.4 ms
β€’ 2 000 raycasts (1 full-auto MG burst) β‰ˆ 0.3 ms
β€’ 50 grenade projectiles β‰ˆ 0.2 ms
TOTAL β‰ˆ 0.9 ms

So we’re safely below 2 ms.

────────────────────────────────────────
5. Bullets β‰  Rigid Bodies
────────────────────────────────────────
β€’ 99 % of weapons modeled as hitscan:
– Collect all fire events this tick.
– Raycast from the muzzle along the aim direction out to weapon range via Rapier’s query API (no insertion of dynamic bodies).
– First intersection decides hit; store β€œHitEvent” component, later resolved into damage & FX.

β€’ Tracers are purely cosmetic on the client (draw a ribbon between start & impact point after server response).

Benefits: zero per-frame memory churn, no tunneling issues, trivial network traffic (only send HitEvent).
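The β€œfirst intersection decides hit” rule can be sketched against spheres standing in for hitboxes (the real server would use Rapier’s query pipeline instead; all names here are hypothetical):

```rust
// Hypothetical sketch: fire a ray against target spheres, keep the nearest hit.
#[derive(Debug, PartialEq)]
pub struct HitEvent {
    pub target: usize,
    pub distance: f32,
}

/// Ray vs. sphere: nearest positive hit distance, if any. `dir` must be unit length.
fn ray_sphere(origin: [f32; 3], dir: [f32; 3], center: [f32; 3], r: f32) -> Option<f32> {
    let oc = [center[0] - origin[0], center[1] - origin[1], center[2] - origin[2]];
    let proj = oc[0] * dir[0] + oc[1] * dir[1] + oc[2] * dir[2];
    let d2 = oc[0] * oc[0] + oc[1] * oc[1] + oc[2] * oc[2] - proj * proj;
    if d2 > r * r {
        return None; // ray passes outside the sphere
    }
    let t = proj - (r * r - d2).sqrt();
    (t > 0.0).then_some(t)
}

/// First intersection within range wins; later resolved into damage & FX.
pub fn hitscan(
    origin: [f32; 3],
    dir: [f32; 3],
    range: f32,
    targets: &[([f32; 3], f32)], // (center, radius)
) -> Option<HitEvent> {
    targets
        .iter()
        .enumerate()
        .filter_map(|(i, &(c, r))| ray_sphere(origin, dir, c, r).map(|t| (i, t)))
        .filter(|&(_, t)| t <= range)
        .min_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(target, distance)| HitEvent { target, distance })
}
```

Because nothing is inserted into the physics world, a full-auto burst is just a batch of read-only queries.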

────────────────────────────────────────
6. Grenades / Rockets (Slow Movers)
────────────────────────────────────────
β€’ Insert as lightweight RigidBodyType::KinematicPositionBased.
β€’ Integrate with gravity: pos += vel * dt; vel += g * dt.
β€’ Continuous Collision Detection enabled so fast movers don’t clip through walls between ticks.
β€’ On impact OR fuse-timeout β†’ spawn ExplosionEvent.
β€’ Explosion = overlap query of spheres within radius – O(#entities in that cell).

Network payload: only grenade spawn (reliable) + grenade despawn/explosion event (reliable). Intermediate positions are not networked; clients lerp.
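The integrator above (`pos += vel * dt; vel += g * dt`) plus the fuse timeout can be sketched directly; impact handling would come from the CCD sweep and is omitted here (names hypothetical):

```rust
// Hypothetical sketch of the grenade integrator: explicit Euler in the order
// the text gives, exploding on fuse timeout at the fixed 60 Hz step.
const DT: f32 = 1.0 / 60.0;
const GRAVITY: f32 = -9.81; // m/sΒ², applied on the Y axis

pub struct Grenade {
    pub pos: [f32; 3],
    pub vel: [f32; 3],
    pub fuse_ticks: u32,
}

pub enum TickResult {
    Flying,
    Exploded([f32; 3]), // explosion position feeds the overlap query
}

impl Grenade {
    pub fn tick(&mut self) -> TickResult {
        for axis in 0..3 {
            self.pos[axis] += self.vel[axis] * DT; // pos += vel * dt
        }
        self.vel[1] += GRAVITY * DT;               // vel += g * dt
        if self.fuse_ticks == 0 {
            return TickResult::Exploded(self.pos);
        }
        self.fuse_ticks -= 1;
        TickResult::Flying
    }
}
```

Since clients only receive spawn and explosion events, this loop runs server-side only; clients reproduce the same parabola locally for display.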

────────────────────────────────────────
7. Static World Representation
────────────────────────────────────────
β€’ Export level geometry from Godot as aggregate triangle mesh; pre-baked into Rapier’s TriMesh on server start.
β€’ For broad-phase culling the mesh is internally split into BVH nodes; no per-tick cost.
β€’ Doors / bridges that move? Represent as separate kinematic bodies switched by gameplay scripts.

Memory footprint: 2 Γ— compressed mesh size (BVH + verts). Typical 1 kmΒ² map ~ 30 MB – fine.

────────────────────────────────────────
8. Player Prediction on Client
────────────────────────────────────────
Server uses Rapier.
The client ships with a subset of the same code, compiled as a GDExtension (or to WebAssembly):

β€’ Step-up height, slope limit, gravity must match server constants.
β€’ Disable expensive CCD & contact manifold generation client-side (not needed for prediction).
β€’ Divergence <2 cm over 200 ms is usually unnoticeable; when it exceeds threshold β†’ reconciliation.

To ensure numeric parity, build server and client with identical codegen settings, e.g.

cargo rustc --release -- -C target-cpu=x86-64-v2

(same target-cpu and opt-level on both, no fast-math-style flags), or ship a custom fixed-point math module just for locomotion.
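The β€œcustom fixed-point module” option can be sketched in a few lines: 16.16 fixed-point arithmetic is bit-exact on every CPU, sidestepping float-mode divergence entirely. A minimal, hypothetical sketch:

```rust
// Hypothetical sketch: 16.16 fixed-point scalar for deterministic locomotion.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Fx(i32); // upper 16 bits integer part, lower 16 bits fraction

impl Fx {
    pub const ONE: Fx = Fx(1 << 16);

    pub fn from_int(v: i32) -> Fx {
        Fx(v << 16)
    }

    pub fn add(self, rhs: Fx) -> Fx {
        Fx(self.0.wrapping_add(rhs.0))
    }

    /// Widen to i64 so the intermediate product cannot overflow.
    pub fn mul(self, rhs: Fx) -> Fx {
        Fx(((self.0 as i64 * rhs.0 as i64) >> 16) as i32)
    }

    /// For display/interpolation only; never feed this back into the sim.
    pub fn to_f32(self) -> f32 {
        self.0 as f32 / 65536.0
    }
}
```

Only the locomotion path (move speed, gravity, step-up) needs this treatment; cosmetic math can stay in floats.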

────────────────────────────────────────
9. Determinism vs β€œGood Enough”
────────────────────────────────────────
We do not need lock-step determinism across all hardware, only β€œserver as the single source of truth”. Therefore:

β€’ Clients may drift a bit; server snaps them back.
β€’ Spectator replay uses server log, so always correct.
β€’ Future e-sport / anti-cheat hardening β‡’ move to fixed-point math to make server re-simulation easier in the cloud; not a V1 requirement.

────────────────────────────────────────
10. Profiling & Regression
────────────────────────────────────────
β€’ Benchmarks (cargo criterion) that run the physics step with recorded input traces – catch perf regressions.
β€’ Integration test: spawn 128 dummy capsules + 5 000 random raycasts, assert no panics and tick <2 ms on CI’s m5zn.metal reference machine.

────────────────────────────────────────
11. Extensibility Hooks
────────────────────────────────────────
β€’ Vehicles later? Stick a convex hull collider + apply engine force; still fits.
β€’ Destructible walls? Spawn new static collider chunks and mark them β€œdestroyed” after HP ≀0 β†’ update BVH once.

────────────────────────────────────────
12. Recap Cheat-Sheet
────────────────────────────────────────
Physics engine: Rapier3D (server) + stripped client mirror
Tick: 60 Hz fixed step
Collision path: Broad Phase β†’ Narrow Phase β†’ Island Solver
Bullets: instant raycasts, no rigid bodies
Projectiles: kinematic, CCD on, overlap query on explode
Static map: baked triangle mesh BVH
Perf budget: < 1 ms of CPU per tick for 128 players

This gives you authoritative, efficient, and maintainable physics that stays within your latency and CPU budgets while scaling cleanly to 64 v 64 battles.
