@gwithers
Last active August 29, 2015 14:19
HQ Field Migration

HQ/Field Deployment Overview

Overview

To minimize risk, we will deploy in stages with back-end functionality performing data migration. This will reduce the risk of a big-bang event where we'd be in a longer window of downtime. This will involve 1-2 deployments of HQ and 2 deployments of Field. Details follow.

Timeline

  1. Pre-Release Deployment: deployment of the API-complete production HQ, with the API and data model frozen
  2. Deployment of Field with the API push/migrate back-end to HQ but the existing Administrative UI
  3. Release Day Deployment: (optional) deployment of HQ in final visible form
  4. Field shutdown (optional) (see below)
  5. Data validation steps
  6. Deployment of Field with HQ-enabled UI

Detailed Steps

Pre-release Deployment HQ

A normal deployment of HQ should be put into production. This should be complete in terms of data model and APIs. The production database should be ready to accept data from Field. This follows the test phases starting this week. This must be available before the Field release.

This release of HQ must support the abilities of Field to post Account and Project creations (which eventually will not be needed).

Pre-release Deployment Field

A release of Field will be deployed into production. This release incorporates the back-end functionality that pushes all active changes made through the administrative functionality to HQ via the back-end queuing system. A feature gate prevents the UI changes from appearing, making this very similar to the first phase 1 demonstration.

This release also enables production to execute the active migration script which will walk the Field database and send all data to HQ.

Once deployed, the migration from Field -> HQ will be invoked. This will initially be throttled on the Field side by limiting the number of fibres sending data and the number of workers sending account data. Once the rate is established and Field and HQ are known to be running within tolerance, the throughput of the queue can be increased to complete the migration over a shorter time period.
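The throttling described above could be sketched roughly as follows. This is only an illustration in Python, not the Field job server's actual code: `Throttle`, `migrate_all`, `send_account`, and `pool_size` are hypothetical names, and threads stand in for whatever fibres/workers the real system uses. The point is that the cap on in-flight sends starts low and can be raised while the migration is running.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Throttle:
    """Caps how many sends are in flight at once; the cap can be raised later."""

    def __init__(self, limit):
        self._limit = limit
        self._active = 0
        self._cond = threading.Condition()

    def set_limit(self, limit):
        # Raise (or lower) the cap while the migration is running.
        with self._cond:
            self._limit = limit
            self._cond.notify_all()

    def __enter__(self):
        with self._cond:
            while self._active >= self._limit:
                self._cond.wait()
            self._active += 1
        return self

    def __exit__(self, *exc):
        with self._cond:
            self._active -= 1
            self._cond.notify_all()
        return False


def migrate_all(account_ids, send_account, throttle, pool_size=8):
    """Send every account to HQ, never exceeding the throttle's current cap.

    send_account is a hypothetical callable that pushes one account to HQ.
    """
    def job(account_id):
        with throttle:
            return send_account(account_id)

    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        return list(pool.map(job, account_ids))
```

Starting with something like `Throttle(2)` and calling `set_limit` with a larger value once both systems look healthy mirrors the "start throttled, then increase" approach without needing a redeploy.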

OPEN ISSUE: expected duration will come from the QA tests occurring soon

Detailed Steps

  1. Deploy HQ
  2. Deploy Field
  3. Manually select migration for 5 accounts as initial smoke test
  4. Deploy job server to migrate data
  5. Execute script to fill queue for job

The queue will be monitored at least daily to see how the migration is proceeding.
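As a rough aid for the daily check, the remaining duration can be estimated from queue depth samples. The sketch below is an assumption about how that might be done, not an existing tool; `estimate_days_remaining` and the sample format are hypothetical, and it assumes the queue drains at a roughly constant rate.

```python
def estimate_days_remaining(depth_samples):
    """Estimate days until the migration queue is empty.

    depth_samples is a list of (day_number, queue_depth) pairs ordered by day.
    Returns None when there is not enough data or the queue is not draining.
    """
    if len(depth_samples) < 2:
        return None
    (first_day, first_depth) = depth_samples[0]
    (last_day, last_depth) = depth_samples[-1]
    if last_depth == 0:
        return 0.0
    elapsed = last_day - first_day
    drained = first_depth - last_depth
    if elapsed <= 0 or drained <= 0:
        return None
    rate_per_day = drained / elapsed
    return last_depth / rate_per_day
```

For example, samples of 1000, 800, and 600 queued items on days 0, 1, and 2 imply a drain rate of 200 per day and an estimate of 3 more days. A `None` result (queue not shrinking) is itself a signal that the throttle or the job server needs attention.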

Pre-release data validation

As the data arrives in HQ production, differential queries and dump/compares can be made on an account and project basis to track the validity of the data during the process.
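One way to picture the differential check is a comparison of row counts keyed by account and entity type. This is a minimal sketch under assumptions: the counts are taken to have already been produced by queries on each side, and `count_mismatches` and the key shape are hypothetical.

```python
def count_mismatches(field_counts, hq_counts):
    """Compare per-account row counts from Field and HQ.

    Each argument maps (account_key, entity_name) -> row count, where
    account_key is an identifier both systems agree on (not a local id).
    Returns a sorted list of (key, field_count, hq_count) for every
    key whose counts differ; a key missing on one side counts as 0.
    """
    keys = set(field_counts) | set(hq_counts)
    mismatches = []
    for key in keys:
        in_field = field_counts.get(key, 0)
        in_hq = hq_counts.get(key, 0)
        if in_field != in_hq:
            mismatches.append((key, in_field, in_hq))
    return sorted(mismatches)
```

An empty result for an account suggests its data has fully arrived; a non-empty one points at the specific account and entity to dump and compare in detail.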

As was done with QA testing for migration, we should completely push a small set of known accounts and perform deep validation there. The process should then continue at a higher level, based on counts and data extracts. One approach, used by Field in the Rails 3 migration, would be to use SQL to create a CSV that is neutral, that is, free of local ids. An example would be a sorted list of users including:

email, autodesk id, oxygen uid

Both systems can generate a file with the agreed subset of columns that could be compared by simple differencing tools.
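The neutral-extract comparison could look something like the sketch below. It is written in Python for illustration only; the column names `autodesk_id` and `oxygen_uid` are assumed spellings of the fields listed above, and the rows are assumed to come from each system's own SQL query.

```python
import csv
import difflib
import io

# Agreed subset of columns; deliberately excludes local database ids.
NEUTRAL_COLUMNS = ["email", "autodesk_id", "oxygen_uid"]


def neutral_csv(rows, columns=NEUTRAL_COLUMNS):
    """Render rows (dicts) as sorted CSV text containing only the agreed columns."""
    records = sorted(
        tuple("" if row.get(col) is None else str(row[col]) for col in columns)
        for row in rows
    )
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    writer.writerows(records)
    return buf.getvalue()


def diff_extracts(field_rows, hq_rows):
    """Return unified-diff lines between the two neutral extracts (empty if equal)."""
    field_lines = neutral_csv(field_rows).splitlines()
    hq_lines = neutral_csv(hq_rows).splitlines()
    return list(
        difflib.unified_diff(
            field_lines, hq_lines, fromfile="field", tofile="hq", lineterm=""
        )
    )
```

Because the rows are sorted and local ids are dropped, two systems holding the same users produce identical text even if their primary keys differ, so any diff line corresponds to a user missing or changed on one side.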

OPEN QUESTION: some validation work is already ongoing, so the above is a sketch, but it should be agreed upon by QA.

Release Day

As release day approaches, the queue of active migration should be known to be empty and high-level validation complete, including detailed validation on individual, selected accounts.

HQ is deployed to final form. If applicable, this removes any Feature Gates that may have blocked Field visibility in the HQ interface.

Field is optionally stopped from a web perspective. This downtime is not necessary but would allow for final validation of counts of data transferred. I do not believe we should do this unless it will provide a clear benefit beyond noting that the queues are empty. Administrative data is less likely to be under constant change, so a shutdown is very unlikely to provide simplification or benefits.

If this is deemed necessary, we need to:

  1. Prepare maintenance page
  2. Shutdown
  3. Turn on maintenance page

We will craft the maintenance page data regardless.

Field is deployed with the HQ-enabled Administrative interfaces on as primary.

Final end-to-end smoke testing is performed. This should include, but not be limited to:

  • test of administration of existing Field project from Field (common)
  • test of creation of new Field project for existing account
  • test of new account/project and pushed data from HQ to Field
  • test of free trial experience and workflow

Timing

As timing data comes from the experiments with QA, this plan should be tightened in terms of timeframes between pre-release deployment and final deployment.

The individual pushes of Field and HQ as described above do NOT include any bulk data transfer and thus should be on par with standard product release cycles. Neither product is, at this time, employing long-running branches, so there should not be a significant buildup of product-specific migrations.

The release should, to reduce intrusion, be scheduled so all parties are available for the deployment.

INTERNAL NOTES FOR REVIEWERS

I'm expecting we should get times from QA and augment this so we have an expected set of timings so we can say something like:

Deploy Field: T + 15 minutes: deploy complete, web up etc.
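Once QA timings exist, a run sheet in that "T + N minutes" form could be generated from a start time and a list of offsets. The sketch below is illustrative only: `run_sheet` is a hypothetical helper, and any start time or offsets fed to it are placeholders until real numbers are available.

```python
from datetime import datetime, timedelta


def run_sheet(start, steps):
    """Build 'label: T + N minutes (wall-clock time)' lines for a deployment.

    start is the datetime of T; steps is a list of (label, minutes_after_T).
    """
    lines = []
    for label, minutes in steps:
        at = start + timedelta(minutes=minutes)
        lines.append(f"{label}: T + {minutes} minutes ({at:%Y-%m-%d %H:%M})")
    return lines
```

Keeping the offsets relative to T means the same sheet can be re-rendered if the chosen deployment window moves.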

Also, we need to crisp up the "when" so we have a stake in the ground, likely burning a Saturday afternoon or something for us.

@cthurrott

Good work so far on this. I'm concerned that we're 2.5 weeks out from release and don't have more firm answers, but we can only know what we know.

As you note, we need relative dates on these. For example, Pre-release Deployment HQ = Final Release - 5 days so we can put these milestones on the schedule (and hit them).

The Pre-release Deployment Field is non-trivial for our users (we will be locking parts of the UI) so we will need to send release messaging one week before this deployment.

As @lester suggested in email, we should plan for downtime on release day and not use it. So we need to have another announcement (or combine with the one above).

On Release Day we need to figure out a window that works for Shanghai, Boston, and I think SF too. Getting on Ops calendar can be difficult so best to pick a date very soon and defer if necessary.
http://www.timeanddate.com/worldclock/meetingtime.html?iso=20150515&p1=43&p2=237&p3=224

Timings - what specific times do we need and how do we get the answers? Who owns that?

@cthurrott

@gwithers

Was chatting with Joe and realized we need a Contingencies section. What could go wrong during deployment, and how would we deal with it? Joe suggested doing a trial run on QA where we pull the plug midstream and see what happens.
