@serafdev
Last active July 18, 2024 16:35
MicroCeph with Internal Network Interface

Command on the "master" node

My VM setup has multiple network interfaces, but only one of them should be used for node-to-node communication. MicroCeph kept picking the external IP and external subnet during setup. The following command overrides those defaults so that the correct IP is used (here, the 10.10.103.0/24 subnet carries the internal traffic):

sudo microceph cluster bootstrap --microceph-ip 10.10.103.218 --mon-ip 10.10.103.218 --public-network 10.10.103.0/24 --cluster-network 10.10.103.0/24
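
If you are not sure which local address sits on the internal subnet, a small helper can pick it out of the `ip -o -4 addr show` output and print the matching bootstrap command. This is only a sketch: `pick_ip` and `bootstrap_cmd` are hypothetical helper names (not part of MicroCeph), the 192.168.1.20 address is a made-up external address, and the match is a plain string-prefix match that assumes a /24-style subnet like the one above.

```shell
#!/bin/sh
# Sketch only: pick_ip and bootstrap_cmd are hypothetical helpers,
# not MicroCeph commands.

# Read `ip -o -4 addr show` output on stdin and print the first address
# that starts with the given prefix (e.g. "10.10.103.").
pick_ip() {
    awk -v prefix="$1" '
        $3 == "inet" {
            split($4, a, "/")    # drop the /24 suffix from "addr/len"
            if (index(a[1], prefix) == 1) { print a[1]; exit }
        }'
}

# Print (do not run) the bootstrap command for a given IP and subnet.
bootstrap_cmd() {
    printf 'sudo microceph cluster bootstrap --microceph-ip %s --mon-ip %s --public-network %s --cluster-network %s\n' \
        "$1" "$1" "$2" "$2"
}

# Demo with captured sample output (the eth0 address is invented);
# on a real node, pipe `ip -o -4 addr show` into pick_ip instead.
sample='2: eth0    inet 192.168.1.20/24 brd 192.168.1.255 scope global eth0
3: eth1    inet 10.10.103.218/24 brd 10.10.103.255 scope global eth1'
internal_ip=$(printf '%s\n' "$sample" | pick_ip "10.10.103.")
bootstrap_cmd "$internal_ip" "10.10.103.0/24"
```

Printing the command instead of running it lets you review the chosen IP before bootstrapping.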

After that, run the usual microceph init or microceph cluster add NODE_NAME; the correct IP will be embedded in the generated token.

Command on the "slave" nodes

To join, you also need to specify which IP the joining node should use:

sudo microceph cluster join --microceph-ip 10.10.103.222 THE_TOKEN_FROM_MASTER

WARNING: Before doing anything, read the documentation and make sure you understand the relevant Ceph features. Manipulating the database directly is HIGH risk: back up your VMs, and if you are working with rook-ceph and Kubernetes, DRAIN YOUR NODES.

When removing a node, I had issues with the database becoming corrupted and a row stuck at 'PENDING', even after uninstalling Ceph on that specific node. A few simple manipulations of the database solved that:

Delete the member from the internal_cluster_members table:

sudo microceph cluster sql 'select * from internal_cluster_members;'
sudo microceph cluster sql 'delete from internal_cluster_members where id=NODE_ID_FROM_PREVIOUS_COMMAND;'
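
Because the delete statement is pasted together by hand, it can help to validate the ID before building the command. The sketch below assumes the `id` column holds plain integers (check the output of the `select` first); `delete_member_cmd` is a hypothetical helper name, and the ID `3` is only an example value.

```shell
#!/bin/sh
# Sketch only: delete_member_cmd is a hypothetical helper, not a MicroCeph command.
# It prints (does not run) the delete command, refusing any ID that is not
# made of digits only.
delete_member_cmd() {
    case "$1" in
        ''|*[!0-9]*)
            echo "error: member id must contain digits only" >&2
            return 1
            ;;
    esac
    printf "sudo microceph cluster sql 'delete from internal_cluster_members where id=%s;'\n" "$1"
}

# Example with a made-up ID; use the one shown by the select above.
delete_member_cmd 3
```

Review the printed command, then run it yourself once you are sure the ID belongs to the stuck member.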

Sometimes the token needs to be regenerated, which fails with an error on the UNIQUE constraint of the token table. I personally just deleted all generated tokens (they are one-time use). If you have other tokens you want to keep, select and delete only the affected row, as in the step above; otherwise simply delete them all:

sudo microceph cluster sql 'delete from internal_token_records;'

To get a better understanding of the state, you can inspect the whole database schema with the following command and then run a few select * queries:

sudo microceph cluster sql '.schema'
