@nathanshelly
Created November 18, 2020 09:14

👋 First off, thanks for your product! I recently introduced it at my company and have had nothing but positive feedback so far.

I noticed an odd race condition in percy start --detached between spawning the agent and running the health check. It causes the command to exit with an error even though the agent process still starts in the background. It only occurs with the --detached flag.

I'm trying to start the agent from a Node script using execa.command. Unfortunately this error causes execa to bail, which kills the agent.

I investigated a naive fix that seems to work around the problem, though definitely not in a clean way.

current behavior

Note: The PERCY_TOKEN in the following examples was created to be purposefully "leaked" here

Here is the current behavior:

[image: without_sleep]

Notice how the health check fails and nothing is running on the port immediately after it finishes, yet a second or two later the process shows up in lsof's output. If I use a fake PERCY_TOKEN (e.g. PERCY_TOKEN=foo yarn percy start --detached) I get the same health check error, but the process never starts (since there is no valid project to create a build for).

To reproduce this behavior, run the following in a new directory:

> yarn init
> yarn add @percy/script
> PERCY_TOKEN=<fill-in-your-token-here> yarn percy start --detached

naive fix

I tested a naive fix by inserting a 2 second sleep between the call that runs Percy detached and the failing health check. Basically, I inserted the following on L60 of that same file:

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

await sleep(2000)

This avoids the race condition:

[image: with_sleep]

Note that when spawning via execa, 2000ms isn't enough. Bumping it up to 5000ms seemed to succeed consistently.

I also tried awaiting the spawn call that I think is the culprit here, but that didn't seem to have any effect.
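Since the sleep duration that works depends on how the command is launched (2000ms directly, 5000ms via execa), a less timing-sensitive variant might be to retry the health check until it passes or a deadline expires. The following is only a sketch of that idea, not Percy's actual code: waitUntilHealthy and its timeoutMs/intervalMs options are hypothetical names, and check stands in for whatever function performs the existing health check.

```javascript
// Resolves after `ms` milliseconds (same helper as in the naive fix).
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Hypothetical helper: calls `check` repeatedly until it resolves,
// or throws once `timeoutMs` has elapsed without a successful check.
// `check` is assumed to be an async function that rejects while the
// agent is not yet listening.
async function waitUntilHealthy(check, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let lastError;
  do {
    try {
      // First successful check ends the wait.
      return await check();
    } catch (error) {
      // Remember the failure and try again after a short pause.
      lastError = error;
      await sleep(intervalMs);
    }
  } while (Date.now() < deadline);
  throw new Error(
    `health check still failing after ${timeoutMs}ms: ${lastError && lastError.message}`
  );
}
```

With something like this, the fixed await sleep(2000) would become await waitUntilHealthy(check), which returns as soon as the agent answers instead of always waiting the full duration.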
