The original goal of this ticket was to handle the case when a volume would
unexpectedly change its IP address on its existing network due to, e.g.,
operator changes, migration, or bugs.
The first implementation, tried at
https://github.com/joyent/sdc-vmapi/tree/ZAPI-793, updates the
{{internal_metadata}} of VMs that mount a given volume during their {{start}}
workflow, refreshing the IP address of any volume that they require.
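To make that more concrete, here's a rough sketch of the kind of task involved; the VOLAPI path, the {{docker:nfsvolumes}} key name and the {{filesystem_path}} field (assumed to look like {{<ip>:/exported/path}}) are assumptions on my part, not the actual ZAPI-793 code:
{code:javascript}
// Rough sketch of a workflow task that refreshes the IP addresses of the
// NFS volumes a VM mounts, by asking VOLAPI for each volume's current
// state and rewriting the VM's internal_metadata. The "/volumes/<uuid>"
// path, the "docker:nfsvolumes" key and the "filesystem_path" field are
// assumptions, not the actual ZAPI-793 code.
var http = require('http');

// Fetch the current state of one volume from VOLAPI.
function getVolume(volapiHost, volumeUuid, cb) {
    http.get({
        host: volapiHost,
        path: '/volumes/' + volumeUuid
    }, function (res) {
        var body = '';
        res.on('data', function (chunk) { body += chunk; });
        res.on('end', function () { cb(null, JSON.parse(body)); });
    }).on('error', cb);
}

// Rewrite the volume list in the VM's internal_metadata so that every
// entry points at its volume's current filesystem path (and thus IP).
function refreshVolumeIps(vm, volapiHost, cb) {
    var volumes =
        JSON.parse(vm.internal_metadata['docker:nfsvolumes'] || '[]');
    var pending = volumes.length;

    if (pending === 0) {
        return cb(null, vm);
    }

    volumes.forEach(function (vol) {
        getVolume(volapiHost, vol.uuid, function (err, current) {
            if (!err) {
                vol.filesystem_path = current.filesystem_path;
            }
            if (--pending === 0) {
                vm.internal_metadata['docker:nfsvolumes'] =
                    JSON.stringify(volumes);
                cb(null, vm);
            }
        });
    });
}
{code}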
While that approach solves part of the original use case, it has the
fundamental limitation that users need to stop _and_ start every VM that
mounts a volume whose IP address changed.
It doesn't handle the case when VMs mounting volumes are rebooted by users, or
reboot automatically.
We could definitely use the same approach as
https://github.com/joyent/sdc-vmapi/tree/ZAPI-793 in the VMAPI {{reboot}}
workflow, but the system would still not handle a VM reboot not initiated by a
VMAPI workflow.
To handle all use cases, it seems like a lower-level approach would be
necessary.
At this point, we can step back and wonder whether the original use case
(handling unexpected IP address changes) is worth the potential implementation
complexity and limitations.
If this were the only use case, my answer would be that it's not worth it.
However, there is another use case that potentially changes this trade-off.
The {{AttachVolumeToNetwork}} and {{DetachVolumeFromNetwork}} endpoints
described at
https://github.com/joyent/rfd/tree/master/rfd/0026#attachvolumetonetwork-post-volumesidattachtonetwork-mvp-milestone
would allow users to change on which network(s) a given volume is reachable.
Without any way for existing VMs that mount those volumes to update how they
connect to them, users would need to recreate those VMs. That might not be an
acceptable limitation.
I'll assume for now that we want to solve that use case. Going back to
potential lower-level approaches, I've identified two of them:
1. use DNS names instead of IP addresses when automatically mounting NFS
   volumes
2. update the IP address of volumes at the init subsystem level ({{lxinit}}
   for Docker and LX VMs, {{mdata-fetch}} for "infrastructure" (SmartOS)
   containers)
Approach #2 would still require users to reboot VMs mounting volumes, but
would handle the case of unplanned reboots.
I have questions for both approaches.
Using DNS names would require the following guarantees in Triton (see the
sketch after this list):
1. Every VM that can mount NFS volumes has access to a DNS server able to
   resolve NFS volumes' host names. This should always be the case for VMs on
   networks that have a NAT zone, since they should be able to query Joyent's
   public DNS, which would answer with CNS entries corresponding to any NFS
   volume. However, it's not clear to me that all fabric networks are
   guaranteed to have NAT zones (e.g., currently, or at least until recently,
   there were cases where Terraform would _not_ create NAT zones when creating
   instances on a fabric network). It's even less clear whether instances
   provisioned on non-fabric networks would be guaranteed to have access to a
   DNS service able to serve those records. Since we're currently discussing
   allowing VMs on non-fabric networks to mount NFS volumes, this could be a
   relevant use case.
2. We would need to verify that the NFS client implementation on SmartOS (and
   potentially Linux and other systems for KVM) retries DNS name lookups when
   various operations fail with timeouts or errors because the NFS server
   serving the volumes' data is unreachable.
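Here's a rough sketch of what approach #1 looks like from the client side; the CNS-style hostname is made up:
{code:javascript}
// Rough sketch of the client-side check approach #1 relies on: the mount
// path embeds a stable DNS name ("<hostname>:/exports/data") instead of
// an IP address, so the only hard requirement on the VM's network is
// that the name resolves. The CNS-style hostname below is made up.
var dns = require('dns');

var volumeHostname =
    'myvolume.svc.account-uuid.us-east-1.cns.example.com';

dns.resolve4(volumeHostname, function (err, addresses) {
    if (err) {
        // No usable DNS service on this network: approach #1 cannot
        // work here, which is exactly the guarantee point 1 is about.
        console.error('cannot resolve %s: %s',
            volumeHostname, err.message);
        return;
    }
    // Note: the NFS client, not this check, must redo the lookup when
    // the server becomes unreachable (the concern in point 2 above).
    console.log('%s currently resolves to %j', volumeHostname, addresses);
});
{code}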
Updating IP addresses at the init subsystem level would probably be implemented
in {{lxinit}}, in {{mdata-fetch}}, and in user scripts for KVM instances. I
don't foresee specific issues with that in {{lxinit}}, since requests to
internal services (e.g. VOLAPI) would be performed by code controlled by the
implementation, and could not be abused by users (although we could imagine a
large number of LX containers stuck in an autoreboot loop sending requests to
VOLAPI for IP addresses, it seems we could still mitigate that using, e.g., a
cache).
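To make that mitigation concrete, here's a rough sketch of the kind of TTL cache that could sit in front of VOLAPI lookups; the names and the TTL value are illustrative only, and the real code would live in the init-level implementation rather than in a standalone script:
{code:javascript}
// Rough sketch of a TTL cache in front of VOLAPI IP lookups, so that a
// crowd of containers stuck in an autoreboot loop doesn't translate 1:1
// into VOLAPI requests. The 60s TTL and the lookup callback's shape are
// illustrative assumptions.
var CACHE_TTL_MS = 60 * 1000;
var cache = {};

function cachedVolumeIp(volumeUuid, lookupFn, cb) {
    var entry = cache[volumeUuid];
    var now = Date.now();

    if (entry && (now - entry.when) < CACHE_TTL_MS) {
        // Fresh enough: answer without hitting VOLAPI at all.
        return cb(null, entry.ip);
    }

    lookupFn(volumeUuid, function (err, ip) {
        if (err) {
            // Prefer serving a stale address over failing the boot.
            return entry ? cb(null, entry.ip) : cb(err);
        }
        cache[volumeUuid] = { ip: ip, when: now };
        cb(null, ip);
    });
}
{code}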
However, performing that update in {{mdata-fetch}} or in user scripts would
imply exposing to users tools able to refresh the IP address of a volume. It
seems that, without being careful, those tools could be used to DoS the
internal services that they would depend on.
Moreover, while it seems that this could be implemented in the metadata agent
for infrastructure (SmartOS) containers, it's also not clear how that
interface would be exposed to KVM VMs. It seems that it would require
customizing guest images (similar to cloud-init for Ubuntu images).
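For illustration, here's a rough sketch of what a guest-side refresh could look like, assuming the guest tooling ships {{mdata-get}}; the {{volumes}} key and its JSON shape are assumptions on my part:
{code:javascript}
// Rough sketch of a guest-side refresh for a KVM instance, assuming the
// guest image ships mdata-get. The "volumes" key and its JSON shape are
// made up; a real implementation would also compare against the current
// mount table and remount entries whose IP address changed.
var execFile = require('child_process').execFile;

execFile('mdata-get', ['volumes'], function (err, stdout, stderr) {
    if (err) {
        console.error('mdata-get failed: %s', stderr || err.message);
        return;
    }
    JSON.parse(stdout).forEach(function (vol) {
        console.log('volume %s should be mounted from %s',
            vol.name, vol.filesystem_path);
    });
});
{code}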
As a result, this is my current position on how we should move forward on
this, in order:
1. Determine whether the use case of changing the network reachability of
   volumes needs to be solved (that is, whether requiring users to
   destroy/recreate mounting VMs is acceptable in this case).
2. If that use case needs to be solved, implement the approach used for
   https://github.com/joyent/sdc-vmapi/tree/ZAPI-793 for the reboot and start
   workflows and document usage and limitations. Otherwise, just close this
   ticket.
3. Evaluate the feasibility of a robust and safe implementation for updating
   IP addresses of volumes at the init subsystem level for all brands/types
   of instances/machines. Document limitations (users need to reboot VMs
   mounting volumes).
4. Evaluate the feasibility of using DNS names to refer to NFS volumes
   everywhere.
Thoughts?