@hamidzr
Last active February 8, 2024 19:14
Deployment Guide and FAQs

Hamid's Deployment Guide

Frequently Asked Questions (FAQs)

Accessing AWS Instances

Q: How do I access any AWS instance?
A: Use AWS Systems Manager (SSM). In the AWS console, open the EC2 instance and select Connect; the resulting session gives you root access to the instance. Optionally, add your SSH key to the instance for future access.

Updating Master Configuration

Q: How do I update the master config?
A: First, back up the master config located at /usr/local/determined/etc/master.yaml. After editing and saving it, check with other developers that it is okay to restart the master, especially on a shared cluster. To restart, run the following command as a privileged user: docker restart determined-master.
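The backup step above can be sketched as a small helper. This is a minimal sketch, not part of Determined: backup_config is a hypothetical name, and the timestamp suffix is just one possible naming scheme.

```shell
# Sketch: copy a config file to a timestamped backup next to it and
# print the backup path. backup_config is a hypothetical helper.
backup_config() {
    local config="$1"
    local backup
    backup="${config}.bak.$(date +%Y%m%d-%H%M%S)"
    # -p preserves mode and timestamps of the original file.
    cp -p "$config" "$backup" && echo "$backup"
}

# Possible usage on the master instance, as a privileged user:
#   backup_config /usr/local/determined/etc/master.yaml
#   "${EDITOR:-vi}" /usr/local/determined/etc/master.yaml
#   docker restart determined-master
```

Keeping the backup beside the original makes it easy to restore the previous config if the master fails to come back up after the restart.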

Deploying to AWS

Q: How do I deploy to AWS?
A: Make sure your det CLI is a version that contains the latest updates. Deploy as usual with det deploy aws up, adding the --deployment-type lore flag. To deploy a specific version of Lore, also pass the --lore-version flag with the appropriate tag. Lore is then accessible through the det master address at DET_MASTER/lore.
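As an illustration, the command line described above could be assembled like this. It is a sketch under assumptions: lore_deploy_cmd is a hypothetical helper, the cluster ID and keypair values are placeholders, and the function only prints the command so it can be reviewed before running. The --deployment-type lore and --lore-version flags are the ones named in this guide.

```shell
# Sketch: build the deploy command and print it for review (does not run it).
# Arguments: <cluster-id> <keypair> [lore-version-tag]
lore_deploy_cmd() {
    local cluster_id="$1"
    local keypair="$2"
    local lore_version="${3:-}"
    local cmd="det deploy aws up --cluster-id ${cluster_id} --keypair ${keypair} --deployment-type lore"
    # Pin a specific Lore version only when a tag is given.
    if [ -n "$lore_version" ]; then
        cmd="${cmd} --lore-version ${lore_version}"
    fi
    echo "$cmd"
}

# Example (placeholder values):
#   lore_deploy_cmd my-cluster my-keypair abc1234
```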

Finding Specific Lore Tags

Q: How do I find these tags?
A: Check GitHub for the CI job named build-backend-server-image and request a build on the commit you want. The tag is either a name given to a version of Lore or the first N characters of the commit hash, which appears at the tail end of the Docker images that we build for Lore. Currently N = 7.

Example: open the CI workflow view at https://github.com/determined-ai/lore/runs/18464117418 to find the tag for a commit and/or to request that the image for it be built.
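For the commit-hash form of the tag, the rule above (first N characters, currently 7) can be sketched as follows. lore_tag_from_hash is a hypothetical helper name, and the default of 7 would need to change if N changes.

```shell
# Sketch: derive a hash-based tag by taking the first N characters
# of a full commit hash (N defaults to 7, per this guide).
lore_tag_from_hash() {
    local hash="$1"
    local n="${2:-7}"
    printf '%s\n' "$hash" | cut -c "1-${n}"
}

# Example, from inside a Lore checkout:
#   lore_tag_from_hash "$(git rev-parse HEAD)"
```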

Running the Server Locally

Q: How do I run the server locally?
A: To be determined (TBD).

Connecting to Agents

Q: How do I connect to agents?
A: Run the orch.py script from the hamid branch of the determined-ai/dev-scripts repository: https://github.com/determined-ai/dev-scripts/blob/hamid/orch.py. There are other routes as well; #det-halp is a good place to start.

Resource Pool Reconfiguration

Q: How do I get a specific resource pool configuration or resource?
A: Update the master config to include the resource pool config you want, then restart the master.

Ensuring Agent Availability

Q: How do I ensure agents don't go away?
A: Set the minimum agent count to a value greater than 0 in the resource pool config, which is specified in the master config.
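A rough way to spot pools that can still scale down to zero is to scan the master config for low minimum counts. This sketch assumes the pools use a min_instances key under provider, as in Determined's master configuration reference; check_min_instances is a hypothetical helper, and it is a line-based scan, not a real YAML parser.

```shell
# Sketch: print every min_instances line whose value is below 1.
# Returns non-zero if at least one such line is found.
check_min_instances() {
    local config="$1"
    awk '
        BEGIN { bad = 0 }
        /^[ \t]*min_instances:/ {
            # $2 is the value after the key; "+ 0" forces a numeric comparison.
            if ($2 + 0 < 1) { print NR ": " $0; bad = 1 }
        }
        END { exit bad }
    ' "$config"
}

# Example:
#   check_min_instances /usr/local/determined/etc/master.yaml
```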

Setting Up Multiple FastAPI Servers

Q: How do I set up multiple FastAPI servers in a single cluster/master instance?
A: Prefer a dedicated instance; only do this if you are sure you need it. Each additional server needs its own environment, its own server process, and its own entry in the master config.

Make sure the environment variables below are set correctly. You can look up their values in the master config file on the master instance.

export VITE_API_URL=UPDATEME
export VITE_BASE_PATH="/UPDATEME"
export DB_USER=postgres
export DB_PASSWORD=postgres
export DB_HOST=UPDATEME
export DB_PORT=5432
export DB_NAME=lore

Run the server with:

python lore/backend/server.py --routing_mode user
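Before starting the server, it can help to confirm that none of the variables above are unset or still hold the UPDATEME placeholder. The following is a minimal sketch; require_env is a hypothetical helper, not part of Lore.

```shell
# Sketch: fail if any named variable is empty or still a placeholder.
require_env() {
    local missing=0
    local name value
    for name in "$@"; do
        # Indirect lookup of the variable called $name (names are trusted here).
        eval "value=\${$name:-}"
        if [ -z "$value" ] || [ "$value" = "UPDATEME" ] || [ "$value" = "/UPDATEME" ]; then
            echo "missing or placeholder value: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   require_env VITE_API_URL VITE_BASE_PATH DB_USER DB_PASSWORD DB_HOST DB_PORT DB_NAME \
#       && python lore/backend/server.py --routing_mode user
```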

Then, update the master config to include a line for your prefix and port:

__internal:
  proxied_servers:
    ...
    - destination: 'http://172.17.0.1:9081/hamid'
      path_prefix: '/hamid'
    - destination: 'http://172.17.0.1:9051/caleb'
      path_prefix: '/caleb'

After the master has been restarted with the new config, access Lore through the prefix you just added, for example DET_MASTER/hamid.
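The proxied_servers entries above follow a fixed pattern, so a new one can be generated from a prefix and a port. This is only a sketch: proxied_server_entry is a hypothetical helper, and the 172.17.0.1 host is copied from the entries in this guide and may differ on other setups.

```shell
# Sketch: print a proxied_servers entry for a given prefix and port,
# indented to sit under "proxied_servers:" in the master config.
proxied_server_entry() {
    local prefix="$1"
    local port="$2"
    printf "    - destination: 'http://172.17.0.1:%s/%s'\n" "$port" "$prefix"
    printf "      path_prefix: '/%s'\n" "$prefix"
}

# Example:
#   proxied_server_entry hamid 9081
```

Printing the entry and pasting it by hand keeps the master config edit explicit, which matters on a shared cluster.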

Through Docker

We can achieve a similar effect through lore docker images. TODO.

Local Deployment

TODO

Resources

Appendix

Lore server update script

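#!/bin/bash

# Check out the given Lore commit, rebuild it, and start the server.
# Consider running this in a tmux session.

set -ex

ghash=$1

if [ -z "$ghash" ]; then
    echo "Usage: $0 <git-hash>" >&2
    exit 1
fi

# Fetch from all remotes so the requested commit is available locally.
git fetch --all
git checkout "$ghash"
# Note: pipenv shell opens an interactive subshell; the steps below run once it exits.
pipenv shell || true
pip install -e .
python tools/build_generated_code.py
make -C web

python ./lore/backend/server.py --routing_mode user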

hamidzr commented Feb 8, 2024

moved to dev-docs repo
