Estimated time: 20 minutes
Welcome to the Foundations trial environment!
In this tutorial we'll go through the process of optimizing and serving a simple text generator model using Foundations.
This trial environment provides you with a fully managed Foundations setup, including:
- 10 GPUs
- Foundations, TensorFlow, and the Python scientific stack (NumPy, pandas, etc.) pre-installed
- An in-browser IDE
Foundations is infrastructure-agnostic and can be set up on-premise or on the cloud. It can be used with any development environment.
In this trial we will start by taking some basic model code and using it to explore some of Foundations' key features.
This is what we'll achieve today:
- With minimal effort, we will optimize our model with an architecture and hyperparameter search on a cluster of machines with GPUs on Google Cloud Platform.
- We will track and share metrics to assess model performance.
- We will see how Foundations automatically tracks the parameters and results of these experiments in a dashboard, enabling full reproducibility.
- Finally, we'll select the best model and serve it to a demo web app.
Let's submit a job with Foundations. Run the following command in the terminal:
$ foundations deploy --env scheduler --job-directory experiments/text_generation_simple
Congratulations! The job is running and the model is training remotely on GPUs.
Any code can be submitted in this way without modifications.
Now let's scale up our experimentation with Foundations.
To the right of this pane, you will see main.py. This code was quickly assembled by one of our machine learning engineers without using Foundations.
The model is a GRU (gated recurrent unit) language generator. We will train it on some Shakespearean text, and the resulting model will be able to synthesize new text that sounds (ostensibly) like Shakespeare.
We're going to optimize the model performance using an architecture and hyperparameter search.
Without Foundations, running a search over many architectures and sets of hyperparameters is messy and difficult to manage. Foundations makes this straightforward! We're going to write a simple script to immediately kick off a set of jobs of a random search on our cluster.
In the editor, right click on the experiment_management/ folder and create a new file called submit_jobs.py. Add in the following code:
import foundations
import numpy as np
# Constant for the number of models to be submitted
NUM_JOBS = 100
# generate_params returns randomly generated architecture specifications
# and hyperparameters in the form of a dictionary
def generate_params():
    params = {
        "rnn_layers": np.random.randint(1, 4),
        "rnn_units": int(np.random.choice([128, 256, 512])),
        "batch_size": int(np.random.choice([32, 64, 128, 256])),
        "learning_rate": np.random.choice([0.001, 0.01, 0.005]),
        "embedding_dim": np.random.randint(128, 257),
        "epochs": np.random.randint(5, 21),
        "seq_length": 100,
        "temperature": np.random.choice([.2, .3, .4]),
        "dataset_url": "https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt",
    }
    return params

# A loop that calls the deploy method from the
# Foundations SDK, which takes in a parameters dictionary
# and the entrypoint script for our code (main.py)
for _ in range(NUM_JOBS):
    foundations.deploy(
        env="scheduler",
        job_directory="experiments/text_generation_simple",
        entrypoint="main.py",
        project_name="text_generation_simple",
        params=generate_params(),
    )
Start by adding an import statement to the top of main.py:
import foundations
Beginning on line 7, the code has a locally defined parameters dictionary. Replace that with the following line:
params = foundations.load_parameters()
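The dictionary passed as params in submit_jobs.py is what load_parameters() returns inside the deployed job, so the rest of main.py can read the searched values by key. A minimal sketch (the variable names are illustrative; the keys are the ones produced by generate_params()):

# Illustrative only: read the searched hyperparameters by key
batch_size = params["batch_size"]
learning_rate = params["learning_rate"]
num_epochs = params["epochs"]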
In main.py we have lines which print useful information about our model. It's easy to get Foundations to log them.
Around line 30, there is the following code:
train_loss = model.test(dataset_train, steps_per_epoch_train)
print("Final train loss: {}".format(train_loss))
test_loss = model.test(dataset_test, steps_per_epoch_test)
print("Final test loss: {}".format(test_loss))
# Change the model to test mode
model.set_test_mode(checkpoint_dir='./training_checkpoints')
# Prompt the model to output text in the desired format
generated_text = model.generate(start_string=u"ROMEO: ", num_characters_to_generate=25)
print("Sample generated text: \n{}".format(generated_text))
To track any performance metric using Foundations, you can call log_metric. Let's add the following lines to the bottom of main.py:
foundations.log_metric("train loss", train_loss)
foundations.log_metric("test loss", test_loss)
foundations.log_metric("sample output", generated_text)
Foundations can track any number or string in any part of your project code this way.
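For example, if you also wanted to record how long training took, a minimal sketch (the timing code below is illustrative and not part of main.py) would be:

import time

start_time = time.time()
# ... training runs here ...
foundations.log_metric("train time (s)", round(time.time() - start_time, 1))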
In order to serve the model later, we'll need to prepare the predict.py entrypoint and create a configuration file.
Open predict.py and add an import statement to the top:
import foundations
Also replace the params dictionary with:
params = foundations.load_parameters()
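For orientation, here is a hypothetical sketch of how the serving entrypoint in predict.py could look after this change. The set_test_mode and generate calls mirror the main.py excerpt above; build_model is a hypothetical placeholder for however predict.py actually constructs the model:

import foundations

params = foundations.load_parameters()

def generate_prediction(start_string="ROMEO: "):
    # build_model is a hypothetical stand-in for the model construction
    # code already in predict.py; it is not part of the Foundations SDK
    model = build_model(params)
    model.set_test_mode(checkpoint_dir='./training_checkpoints')
    return model.generate(start_string=start_string, num_characters_to_generate=25)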
At the bottom of this window you'll see a terminal. Type the following command to launch the script we just wrote:
$ python experiment_management/submit_jobs.py
That's it! The jobs are deployed and run in the background, and Foundations uses the full capacity of the available compute resources to explore our architecture and parameter space by training many models concurrently.
Let's take a look at how they're doing.
Foundations provides a dashboard that allows teams to monitor and manage multiple projects across a cluster. We can take a look at the parameters and performance metrics of all the jobs we submitted.
Click here to open the dashboard. Each job will show up in the dashboard upon submission, along with an icon indicating the run status. Refresh the page in order to see updated statuses and metrics.
| Icon | Status |
|---|---|
| Green | Job completed |
| Green (flashing) | Currently running |
| Yellow | Queued |
| Red | Job exited with an error |
Some jobs will already be completed. Remember that we added a sample of generated output as a metric above: hover over a few examples in the sample output column to see how our models are doing.
Foundations provides a standard format to seamlessly package machine learning models for production.
We've included a configuration file, foundations_package_manifest.yaml, which tells Foundations to serve generate_prediction(...) from predict.py.
We will use the predict.py entrypoint and the package manifest yaml prepared earlier to serve the model.
On the dashboard, select a job to serve. It is recommended to choose the one with the lowest test loss, or perhaps your favourite generated text example. Copy the job_id.
In the terminal, enter:
$ foundations serve start <JOB_ID>
Foundations automatically retrieves the bundle associated with the job_id and wraps it in a REST API. Requests can be made to the entrypoints specified by the foundations_package_manifest.yaml file.
Click here to go to a demo webapp that makes a REST call to the model being served. For the Model Name field, use MODEL_IP_ADDRESS. Now click "Generate" to output generated text from your served model!
We want to hear what you think about Foundations!
- Fill out this feedback survey
- Tell us what you thought of Foundations via email
- Tweet us @Dessa with your best model-generated text using #FoundationsML