The DistSender circuit breaker prevents CockroachDB's DistSender from getting
stuck on non-functional replicas. The DistSender normally relies on receiving a
NotLeaseHolderError (NLHE) from a replica to redirect to other replicas. If a
replica is stuck or unreachable, it never returns an NLHE, so without a circuit
breaker the DistSender can keep retrying the same non-functional replica
indefinitely instead of routing around it.
Single-node CockroachDB (n2-standard-16, 64GB RAM) running a KV workload at
~20% CPU with GOGC=off and GOMEMLIMIT=51GiB. The live heap is ~480MB, but
with GOGC disabled, the heap grows to ~50GB before GC triggers (driven entirely
by the memory limit). GC runs roughly every 24 seconds.
Single-node CockroachDB (n2-standard-16), KV workload at ~20-25% CPU.
GODEBUG=gctrace=1,gcpacertrace=1.
pacer: assist ratio=+1.966144e+000 (scan 226 MB in 1660->1736 MB) workers=4++0.000000e+000
pacer: 27% CPU (25 exp.) for 151835216+1501680+2831530 B work (155682658 B exp.) in 1741434408 B -> 1766226960 B (∆goal -54389166, cons/mark +1.702709e-001)
gc 20311 @11135.890s 0%: 0.099+11+0.098 ms clock, 1.5+5.1/45/49+1.5 ms cpu, 1660->1684->434 MB, 1736 MB goal, 1 MB stacks, 2 MB globals, 16 P
pacer: sweep done at heap size 458MB; allocated 23MB during sweep; swept 218348 pages at +1.681737e-004 pages/byte
OTel Datadog exporter inflates counter metric rates by ~3x
Summary
The Datadog cockroachdb.sys.gc.assist.ns metric (and likely all Prometheus
counter-type metrics) reports a rate ~3x higher than the actual rate when using
.as_rate(). The root cause appears to be a mismatch between the OTel
Prometheus scrape interval (30s) and the interval metadata submitted to Datadog
by the OTel Datadog exporter (suspected 10s, matching the batch processor
timeout).
[correctness] highDiskSpaceUtilization comment is now stale (capacity_model.go:703-724): The comment explains that fractionUsed = load/capacity = LogicalBytes / (LogicalBytes / diskUtil) = diskUtil. Under the new model, load=Used, capacity=Used+Available — the math still recovers actual disk utilization, but the comment references the old LogicalBytes-based derivation and is now misleading.
[correctness] minCapacity floor is dramatically lower than the old floor (physical_model.go): The old model had cpuCapacityFloorPerStore = 0.1 * 1e9 (0.1 cores). The new minCapacity = 1.0 means 1 ns/s — effectively zero CPU capacity. The old floor existed to prevent utilization from going to infinity on overloaded nodes (its comment explains this in detail). If a store has non-zero load and capacity=1 ns/s, utilization becomes astronomically large — exactly the blowup the old floor was designed to prevent.
Review: PR #161454 — kvserver: thread in correct engine when destroying and subsuming replicas
Summary
This PR replaces two uses of kvstorage.TODOReadWriter(b.batch) in
replicaAppBatch.runPostAddTriggersReplicaOnly with a new
b.ReadWriter() helper that correctly separates the state engine batch
(b.batch) from the raft engine batch (b.RaftBatch()). This is part of the
broader effort to logically separate the state and raft engines in the apply
stack (issue #161059). The change is correct, small, and follows the pattern
established by earlier work in this effort.
Review: PR #79134 — kv: support FOR {UPDATE,SHARE} SKIP LOCKED
Summary
This PR implements the KV portion of SKIP LOCKED support for
SELECT ... FOR UPDATE SKIP LOCKED and SELECT ... FOR SHARE SKIP LOCKED.
The change spans the MVCC scanner, KV concurrency control, optimistic
evaluation, timestamp cache, refresh spans, and the lock table. The SQL
optimizer still rejects SKIP LOCKED (the SQL portion was extracted into
a separate PR, #83627), so this is plumbing-only from the KV side.
Review: PR #164677 — changefeedccl: add roachtest for CDC rolling restarts with KV workload
Summary
This PR adds a roachtest that exercises changefeeds during rolling node
drain+restart cycles and introduces a COCKROACH_CHANGEFEED_TESTING_SLOW_RETRY
env var for reaching max backoff behavior quickly. The test is well-structured
and the motivation is clear. There are a few structural and correctness issues
worth addressing.