Q: How do I access any AWS instance?
A: Use AWS Systems Manager (SSM) to gain access. In the AWS console, navigate to the EC2 instance and select the Connect button; this provides root access to the instance. Optionally, add your SSH key for future access.
Q: How do I update the master config?
A: First, back up the master config located at /usr/local/determined/etc/master.yaml. After editing and saving it, check with other developers that it is OK to restart the master, especially if it is part of a shared cluster. To restart, run the following command as a privileged user: docker restart determined-master.
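The backup step can be sketched as a small helper that writes a timestamped copy next to the original. The function name backup_config and the .bak.<timestamp> suffix are illustrative choices, not an existing convention on the cluster.

```shell
# Copy a config file to a timestamped backup beside it and print the backup path.
# backup_config is a hypothetical helper name used for illustration.
backup_config() {
    src=$1
    dest="${src}.bak.$(date +%Y%m%d-%H%M%S)"
    # -p preserves mode and timestamps so the backup mirrors the original file.
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Example (run as a privileged user on the master instance):
#   backup_config /usr/local/determined/etc/master.yaml
#   ...edit master.yaml and confirm the restart with other developers...
#   docker restart determined-master
```

If an edit goes wrong, copying the printed backup path back over master.yaml and restarting the master restores the previous state.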
Q: How do I deploy to AWS?
A: Make sure your det CLI version includes the latest updates. Deploy as usual with det deploy aws up, adding the --deployment-type lore flag. To deploy a specific version of Lore, also pass the --lore-version flag with the appropriate tag.
Lore is accessible at the det master address: DET_MASTER/lore.
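As a rough sketch, the command line described above can be assembled like this. The helper only prints the command so it can be reviewed before running. The function name lore_deploy_cmd and the example values are hypothetical; --cluster-id and --keypair are assumed here to be the usual det deploy aws up arguments, while --deployment-type lore and --lore-version come from the answer above.

```shell
# Print (do not run) a deploy command for the given cluster ID, keypair, and optional Lore tag.
# lore_deploy_cmd is a hypothetical helper; check the flags against your det CLI version.
lore_deploy_cmd() {
    cluster_id=$1
    keypair=$2
    lore_tag=${3:-}
    cmd="det deploy aws up --cluster-id $cluster_id --keypair $keypair --deployment-type lore"
    # Pin a specific Lore version only when a tag was supplied.
    if [ -n "$lore_tag" ]; then
        cmd="$cmd --lore-version $lore_tag"
    fi
    printf '%s\n' "$cmd"
}

# Example:
#   lore_deploy_cmd my-cluster my-keypair 0123456
```

Printing rather than executing keeps the sketch safe to try; once the printed command looks right, run it directly.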
Q: How do I find these tags?
A: Check GitHub for the CI job named build-backend-server-image and request a build on the commit you want.
The tag is either a name given to a version of Lore or the first N characters of the commit hash, which appears at the tail end of the names of the Docker images that we build for Lore. Currently N == 7.
Example: click "view CI workflow" at https://github.com/determined-ai/lore/runs/18464117418 to get the tag for a commit and/or request that the image for it be built.
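For hash-based tags, the rule above (first N characters of the commit hash, with N currently 7) can be sketched as follows. The function and variable names are illustrative, and the sketch only derives the tag; it does not check that an image with that tag has actually been built.

```shell
# Number of leading hash characters used as the tag (currently 7, per the answer above).
LORE_TAG_LEN=7

# Print the first LORE_TAG_LEN characters of a commit hash.
# lore_tag_from_commit is a hypothetical helper name.
lore_tag_from_commit() {
    printf "%.${LORE_TAG_LEN}s\n" "$1"
}

# Example, from inside a lore checkout:
#   lore_tag_from_commit "$(git rev-parse HEAD)"
```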
Q: How do I run the server locally?
A: To be determined (TBD).
Q: How do I connect to agents?
A: Run the script found at determined/dev-scripts/..orch.py from the hamid branch: https://github.com/determined-ai/dev-scripts/blob/hamid/orch.py
There are other routes; #det-halp is a good place to start.
Q: How do I get a specific resource pool configuration or resource?
A: Update the master config to include the resource pool config you want, then restart the master.
Q: How do I ensure agents don't go away?
A: Set the minimum agent count to more than 0 in the resource pool config, which is specified in the master config.
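A quick check that a config file carries a nonzero minimum might look like the sketch below. It assumes the minimum agent count is expressed as a min_instances key in the resource pool's provider settings; that key name is an assumption here, so adjust the pattern if your master config names it differently. The function name is hypothetical.

```shell
# Succeed if the file contains a min_instances value greater than 0.
# Assumes the key is named min_instances; has_min_agents is a hypothetical helper.
has_min_agents() {
    grep -Eq '^[[:space:]]*min_instances:[[:space:]]*[1-9][0-9]*' "$1"
}

# Example:
#   has_min_agents /usr/local/determined/etc/master.yaml && echo "agents will be kept"
```

This is a plain text match, not a YAML parse, so it cannot tell which resource pool a matching line belongs to.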
Q: How do I set up multiple FastAPI servers in a single cluster/master instance?
A: Only do this instead of using a dedicated instance if you're sure you need it. Each additional server needs its own entry in the master config.
First, make sure the environment variables are set correctly. You can look up the values in the master config file on the master instance.
export VITE_API_URL=UPDATEME
export VITE_BASE_PATH="/UPDATEME"
export DB_USER=postgres
export DB_PASSWORD=postgres
export DB_HOST=UPDATEME
export DB_PORT=5432
export DB_NAME=lore
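Since several of these values start out as UPDATEME placeholders, a small pre-flight check can catch a forgotten one before the server starts. This is a sketch only; check_lore_env is a hypothetical helper, and the variable list simply mirrors the exports above.

```shell
# Return non-zero if any variable from the list above is unset, empty,
# or still contains the UPDATEME placeholder.
check_lore_env() {
    missing=0
    for name in VITE_API_URL VITE_BASE_PATH DB_USER DB_PASSWORD DB_HOST DB_PORT DB_NAME; do
        # Indirect lookup of the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        case "$value" in
            ''|*UPDATEME*)
                echo "unset or placeholder: $name" >&2
                missing=1
                ;;
        esac
    done
    return "$missing"
}

# Example:
#   check_lore_env && echo "environment looks complete"
```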
Run the server with:
python lore/backend/server.py --routing_mode user
Then, update the master config to include a line for your prefix and port:
__internal:
  proxied_servers:
    ...
    - destination: 'http://172.17.0.1:9081/hamid'
      path_prefix: '/hamid'
    - destination: 'http://172.17.0.1:9051/caleb'
      path_prefix: '/caleb'
Access lore through the prefix you just added.
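To avoid typos when adding an entry, the two lines for a new prefix can be generated rather than typed. The sketch below follows the pattern of the hamid and caleb entries above: the prefix appears both in the destination URL and in path_prefix. The function name, the default host (taken from those entries), and the four-space indentation are assumptions; match the indentation to the existing proxied_servers list in your file.

```shell
# Print a proxied_servers entry for a prefix and port.
# proxy_entry is a hypothetical helper; the default host is the one used in the entries above.
proxy_entry() {
    prefix=$1
    port=$2
    host=${3:-172.17.0.1}
    printf "    - destination: 'http://%s:%s/%s'\n" "$host" "$port" "$prefix"
    printf "      path_prefix: '/%s'\n" "$prefix"
}

# Example:
#   proxy_entry hamid 9081
```

Paste the output under proxied_servers in the master config, then restart the master as described earlier.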
We can achieve a similar effect through lore docker images. TODO.
TODO
- https://docs.determined.ai/
- Helper for using det deploy with Lore: https://github.com/determined-ai/lore/blob/main/tools/det-deploy-helper.sh
Lore server update script
#!/bin/bash
# Check out the given git hash, rebuild, and start the Lore server.
# Consider running in a tmux session.
set -ex
ghash=$1
if [ -z "$ghash" ]; then
    echo "Usage: $0 <git-hash>" >&2
    exit 1
fi
git fetch -a
git checkout "$ghash"
# Try to enter the pipenv environment; failure is ignored.
pipenv shell || true
pip install -e .
python tools/build_generated_code.py
make -C web
python ./lore/backend/server.py --routing_mode user
moved to dev-docs repo