@sibljon
Created April 16, 2026 10:19
Backend Integration Test Flakiness Report

Scope: 35 failed integration_test CI runs over ~2 days (April 15–16, 2026)
Total tests in suite: ~674
TL;DR: One data race is responsible for roughly 75% of all failures. Fix it and most of the CI pain goes away.


The main culprit: a data race in analytics

When Kunal saw 31 failures in a single run, those weren't 31 broken tests. They were one race condition killing the test binary and taking down everything running at that moment.

Go's -race flag instruments every memory read and write. When two goroutines access the same memory concurrently without synchronization, and at least one of them is writing, the detector kills the test binary immediately. Every test that was in flight at that moment gets marked FAIL. That's why a single race produces 20–40 failures at once, many with durations of 0.00s.

The race:

  • Reader: svc/analytics/firehose.go:133 — the firehose background goroutine is calling json.Marshal(e) on a buffered analytics event
  • Writer: cmd/svc/billing/internal/utils/utils.go:1314 — billing's SyncCustomerDataToSupport is doing append(tr, "Billing_Parent") on a slice

These seem unrelated, but they share a backing array. Here's how:

  1. Billing builds tagsToAdd/tagsToRemove slices and calls UpdateThread via the threading fused client
  2. The fused client is in-process — no serialization, no copy. The request proto holds the exact same []string slice headers
  3. The threading handler stores in.AddTags / in.RemoveTags in a ThreadEventUpdateTags struct and queues it as an analytics event's Attributes (server.go:5331)
  4. The firehose flushes and marshals that event, reading the slice's backing array
  5. Meanwhile billing's loop hits the next org iteration and does append(tr, "Billing_Parent") — if the slice has spare capacity, this writes into that same backing array in place

The fix is one line in threading/server.go:5330: copy the slices before storing them.

```go
// Before
AddTags:    in.AddTags,
RemoveTags: in.RemoveTags,

// After
AddTags:    append([]string(nil), in.AddTags...),
RemoveTags: append([]string(nil), in.RemoveTags...),
```

The directory service's UpdateTags handler likely has the same pattern and should get the same treatment.

Most affected tests (all cascade victims; the tests themselves are fine):

| Test | Failures (35 runs) | Example run |
| --- | --- | --- |
| TestTwimulator_IncomingSIPCall_Transfer | 16 | run 24491638845 |
| TestCallRecordingTranscription_EndToEnd | 16 | run 24471400445 |
| TestTwimulator_IncomingSoftphoneCall_Transfer | 15 | run 24489987763 |
| TestAITranscriptionSettings_Mutations | 15 | run 24487987796 |
| TestTwimulator_OutgoingSIPCall_Transfer | 14 | run 24498542318 |

In failed runs with this race, you'll see `testing.go:1712: race detected during execution of test` in the output. Tests marked 0.00s were killed by the race before they got a chance to run.


Second culprit: MySQL deadlock on ai_transcription_configuration

About 15% of failures come from a deadlock unrelated to the race. Three transcription tests run in parallel, each creating an AI transcription configuration for a test org. They all INSERT into integration_excomms.ai_transcription_configuration concurrently and deadlock on the secondary index idx_endpoint_voicemail_config_org_id.

Error: Error 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

Affected tests:

  • TestCallRecordingTranscription_EndToEnd (example)
  • TestAudioMessageTranscription_EndToEnd (example)
  • TestAITranscriptionSettings_Mutations

The fix here is either adding retry logic on deadlock (MySQL error 1213) in the DAL layer for this table, or removing t.Parallel() from these three tests to serialize them (simpler but slower).


Everything else (~10%)

Two smaller issues that the race has been masking; expect them to surface once it's fixed:

Async call transfer timeouts: TestTwimulator_OutgoingSIPCall_Transfer fails with Condition never satisfied at sip.outgoing.star_menu_transfer_test.go:261. The test polls for a warm transfer to complete, but under CI load the state machine runs slower than the poll window. Increasing the timeout or reducing parallelism for call tests would help.

Missing PhoneTreeNodeDescription: TestTwimulator_IncomingCall_CallFlowMenuHangup fails because a field is set asynchronously and the assertion runs before it's populated. 5 failures across 35 runs. Small but genuine.


Priority order

  1. Fix the data race (threading/server.go:5330, directory UpdateTags handler) — eliminates ~75% of all failures. This is a real bug, not just a test issue: the race corrupts analytics events, silently writing wrong tag data to Firehose in production.

  2. Add deadlock retry for ai_transcription_configuration — eliminates ~15% of remaining failures.

  3. Increase timeout in TestTwimulator_OutgoingSIPCall_Transfer — small, targeted fix for the remaining call transfer flake.

The race fix alone should take most runs from 20–30 failures to 0–2.
