@Groostav
Created April 26, 2024 03:47
rom description 2024-04-25

Reduced Order Modelling Toolkit

version 0.1. by Empower Operations. Copyright 2023.


This repository contains releases (sans source code) for the Reduced Order Modelling "ROM" toolkit by Empower Operations "empower-rom".

This is the culmination of work from research by Empower Operations under grant from the Government of Canada. We hope it helps you accelerate engineering!

Overview

A "Reduced Order Model" (ROM) is a mechanism to greatly speed up simulation that leverages AI techniques and a large body of simulation data as a reference. This technique involves scanning a simulation dataset that consists of input operational load conditions and output simulation results (typically mesh files) with learning algorithms, and saving the result to a compact format. Then, when a user wants a simulation result but does not have the time to run the full simulation, they can use the AI to infer one by combining the previously scanned dataset with the user's new operational load conditions.

In this way a ROM can be used to accelerate simulation such that it can be used in real-time applications.

Usage:

There are three broad steps you need to create and run a ROM:

0. Set up input data

The Reduced Order Modelling toolkit assumes that you have a dataset that can be made into a reduced order model. This dataset should have a set of inputs, typically "loading conditions" that are controlled operationally: inlet velocities, input voltages, water temperatures, machine RPM, and so on. Then there is the simulation output mesh data, which is typically very large.

Before we can do anything with the toolkit there are two pieces of data that must be assembled:

  1. Input load condition vectors table file aka load vectors
  2. Output simulation mesh snapshots columns aka mesh snapshots

Note this documentation uses the phrase "mesh snapshots" to describe all simulation output types. This is because most output is expected to be characteristics involving a mesh, but it need not be. empower-rom will work with any structured output up to sizes of one million entries. In such a case "mesh snapshot" becomes a misnomer, so I will ask the reader to replace "mesh snapshot" with "simulation snapshot" or "output snapshot" as they read.

Load Vectors

The load vectors are a table, with each column representing a parameter and each row representing values for a specific state of the machine in each of the parameters.

An example load vectors file:

heat-diffuser-load-vectors.csv

inlet_velocity,inlet_temperature,outlet_velocity,outlet_temperature
0.013,24.7,0.007,64.4
0.014,39.21,0.005,68.43
0.009,14.35,0.008,73.22
0.011,15.26,0.005,57.19
0.013,42.48,0.009,74.85

This file encodes 5 different measured (or simulated) operational conditions, or "load vectors", one per row. Each load vector describes 4 parameter values. In this case the first is cold stream velocity, the second is cold stream temperature, the third is hot stream velocity and the fourth is hot stream temperature.
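As a quick sanity check before building, a load vectors file can be parsed into parameter names and numeric rows. The following is a minimal Node.js sketch; `parseLoadVectors` is a hypothetical helper (not part of empower-rom), and it assumes a simple CSV with a header row and no quoted fields.

```javascript
// Hypothetical helper (not part of empower-rom): parse a load vectors CSV
// with a header row into parameter names and numeric rows.
function parseLoadVectors(csvText) {
  const lines = csvText.split(/\r?\n/).filter((line) => line.trim() !== '');
  const parameters = lines[0].split(',').map((name) => name.trim());
  const vectors = lines.slice(1).map((line, index) => {
    // an empty field would silently become 0 with Number(''), so treat it as NaN
    const values = line.split(',').map((field) => (field.trim() === '' ? NaN : Number(field)));
    if (values.length !== parameters.length || values.some((value) => Number.isNaN(value))) {
      throw new Error('bad load vector on data row ' + (index + 1) + ': ' + line);
    }
    return values;
  });
  return { parameters, vectors };
}

// first two load vectors from the example file above
const exampleCsv = [
  'inlet_velocity,inlet_temperature,outlet_velocity,outlet_temperature',
  '0.013,24.7,0.007,64.4',
  '0.014,39.21,0.005,68.43',
].join('\n');

const parsed = parseLoadVectors(exampleCsv);
console.log(parsed.parameters.length + ' parameters, ' + parsed.vectors.length + ' load vectors');
```

Throwing on a malformed row is a deliberate choice here: a bad row is easier to diagnose before the build than after it.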

Mesh Snapshots

Simulation output mesh snapshots are more difficult to work with. The mesh snapshots are a list of columns of nodal mesh values, that is, a column of values each of which indicates some measurable or interesting quantity at a corresponding mesh node in the output space. The ROM will be produced by memorizing these meshes such that it can predict the resulting mesh for a given load vector.

By convention, simulations output mesh data in column-wise order, so empower-rom expects column-oriented mesh snapshots. This means that instead of each row representing a mesh, each column will represent a mesh.
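If your simulation tool exports one mesh per row instead, the data needs to be transposed before it is handed to empower-rom. A minimal sketch, assuming the snapshots fit in memory; `toColumnOrientedCsv` is a hypothetical helper name and the numbers are illustrative only.

```javascript
// Hypothetical helper: takes an array of meshes (one array of nodal values
// per simulation) and produces column-oriented CSV text, one mesh per column.
function toColumnOrientedCsv(meshes) {
  const nodeCount = meshes[0].length;
  if (meshes.some((mesh) => mesh.length !== nodeCount)) {
    throw new Error('all meshes must have the same number of nodes');
  }
  const rows = [];
  for (let node = 0; node < nodeCount; node++) {
    // row `node` holds the value at that mesh node for every simulation
    rows.push(meshes.map((mesh) => mesh[node]).join(','));
  }
  return rows.join('\n') + '\n';
}

// two simulations with three mesh nodes each (illustrative values)
const columnCsv = toColumnOrientedCsv([
  [46.7, 45.2, 44.6],
  [53.3, 51.5, 50.9],
]);
console.log(columnCsv);
```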

An example of a mesh snapshots file:

heat-diffuser-mesh-snapshots-abbreviated.csv

4.67E+01,5.33E+01,5.42E+01,5.12E+01,4.53E+01
4.52E+01,5.15E+01,5.31E+01,5.08E+01,4.52E+01
4.46E+01,5.09E+01,5.27E+01,5.07E+01,4.52E+01
4.58E+01,5.23E+01,5.36E+01,5.10E+01,4.53E+01
4.50E+01,5.13E+01,5.30E+01,5.08E+01,4.52E+01
4.25E+01,4.85E+01,5.13E+01,5.02E+01,4.51E+01
4.61E+01,5.26E+01,5.38E+01,5.11E+01,4.53E+01
4.45E+01,5.08E+01,5.27E+01,5.07E+01,4.52E+01
4.43E+01,5.05E+01,5.25E+01,5.06E+01,4.52E+01
4.51E+01,5.14E+01,5.31E+01,5.08E+01,4.52E+01
4.41E+01,5.03E+01,5.24E+01,5.06E+01,4.52E+01
4.52E+01,5.15E+01,5.31E+01,5.08E+01,4.52E+01
4.47E+01,5.09E+01,5.28E+01,5.07E+01,4.52E+01
4.51E+01,5.13E+01,5.30E+01,5.08E+01,4.52E+01
4.44E+01,5.06E+01,5.26E+01,5.06E+01,4.52E+01
4.69E+01,5.34E+01,5.43E+01,5.12E+01,4.53E+01
4.58E+01,5.21E+01,5.35E+01,5.09E+01,4.53E+01
4.61E+01,5.24E+01,5.37E+01,5.10E+01,4.53E+01
4.74E+01,5.39E+01,5.46E+01,5.13E+01,4.54E+01
4.58E+01,5.22E+01,5.36E+01,5.10E+01,4.53E+01
4.66E+01,5.31E+01,5.41E+01,5.11E+01,4.53E+01
4.68E+01,5.33E+01,5.42E+01,5.12E+01,4.53E+01
4.75E+01,5.41E+01,5.47E+01,5.14E+01,4.54E+01
4.77E+01,5.43E+01,5.49E+01,5.14E+01,4.54E+01
4.79E+01,5.45E+01,5.50E+01,5.15E+01,4.54E+01
4.86E+01,5.53E+01,5.55E+01,5.16E+01,4.54E+01
4.97E+01,5.65E+01,5.62E+01,5.19E+01,4.55E+01
4.88E+01,5.55E+01,5.56E+01,5.17E+01,4.54E+01
5.00E+01,5.69E+01,5.65E+01,5.20E+01,4.55E+01
4.93E+01,5.60E+01,5.60E+01,5.18E+01,4.55E+01
...

1570 lines omitted for brevity

Each column corresponds to a single mesh, and each mesh corresponds to the output from a simulation for a provided load vector (row in the above table). Thus the number of columns here must match the number of rows from the load-vectors file.

Note that the mesh snapshots file has 5 columns, where the load vectors file has 5 rows. The first row in the load vectors file must correspond to the first column in the mesh snapshots file.
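This row-to-column correspondence is easy to get wrong, so it can be worth checking before a build. The sketch below is a hypothetical pre-flight check (not part of empower-rom); it assumes the load vectors file has a header row and the mesh snapshots file does not, as in the examples above.

```javascript
// Hypothetical pre-flight check: the number of load vector rows must equal
// the number of mesh snapshot columns.
function checkShapes(loadVectorsCsv, meshSnapshotsCsv) {
  const nonEmptyLines = (text) => text.split(/\r?\n/).filter((line) => line.trim() !== '');
  const loadVectorCount = nonEmptyLines(loadVectorsCsv).length - 1; // minus the header row
  const snapshotRows = nonEmptyLines(meshSnapshotsCsv);
  const snapshotCount = snapshotRows[0].split(',').length;
  // every mesh node (row) must carry one value per snapshot (column)
  const ragged = snapshotRows.findIndex((row) => row.split(',').length !== snapshotCount);
  if (ragged !== -1) {
    throw new Error('mesh snapshots row ' + (ragged + 1) + ' has a different column count');
  }
  return { loadVectorCount, snapshotCount, matches: loadVectorCount === snapshotCount };
}

const shapes = checkShapes(
  'inlet_velocity,inlet_temperature\n0.013,24.7\n0.014,39.21\n',
  '4.67E+01,5.33E+01\n4.52E+01,5.15E+01\n4.46E+01,5.09E+01\n'
);
console.log(shapes);
```

Note that a count match does not prove the ordering is right; the first load vector still has to be the one that produced the first column.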

Build

Given that we have a well formatted load-vectors file and a well formatted mesh-snapshots file, we can do part 1 of the ROM process: convert these files into a ROM file.

This description will use the heat exchanger dataset: Heat exchanger model

A ROM file is the resulting "memory" from our software scanning the load vectors and corresponding mesh snapshots. We will use the empower-rom build functionality to create the ROM file. First we can run it with --help to see its usage:

PS C:\Users\geoff> .\empower-rom.exe build --help
usage: empower-rom build [-h] --training-load-vectors <input-path.csv>
                         --training-mesh-snapshots <input-path.csv> --output-rom
                         <output-path.blob> [--print-metadata]

Create a Reduce Order Model (ROM) blob file from a loading condition vectors training file
and a mesh vector snapshots file

options:
  -h, --help            show this help message and exit
  --training-load-vectors <input-path.csv>
                        loading condition vectors to train the model with, row-major (eg
                        'model/training-load-vectors.csv')
  --training-mesh-snapshots <input-path.csv>
                        mesh snapshot vectors to train the model with, column-major (eg
                        'model/training-file.csv')
  --output-rom <output-path.blob>
                        path to create the output from pickle file (eg 'model/rom.blob').
  --output-metadata <>    
                        path to print metadata (including bounds) about the training process
                        (eg 'model/meta.json', '-' for stdout, which is the default)

To Generate a ROM file, we need:

  • the load vectors csv file
  • the mesh snapshots csv file

We pass these files in to --training-load-vectors and --training-mesh-snapshots respectively.

We also attach the --print-metadata flag, which will cause empower-rom to emit some information about the training process. This is not necessary, but it will give us an overview of our model:

C:\Users\geoff> .\empower-rom.exe `
>> build `
>> --print-metadata `
>> --training-load-vectors tests/models/heat_exchanger/training-inputs.csv `
>> --training-mesh-snapshots tests/models/heat_exchanger/training-snapshots.csv `
>> --output-rom tests/models/heat_exchanger/generated-rom.blob
{
  "bounds": {
    "inlet_velocity": {
      "min": 0.007,
      "max": 0.015
    },
    "inlet_temperature": {
      "min": 10.34,
      "max": 49.59
    },
    "outlet_velocity": {
      "min": 0.005,
      "max": 0.009
    },
    "outlet_temperature": {
      "min": 40.34,
      "max": 79.62
    }
  },
  "reconst_rmse": 0.03
}

You can see the same output by using the command empower-rom --print-metadata tests/models/heat_exchanger/generated-rom.blob after you've created the ROM file.
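The bounds in this metadata give the range of each parameter seen in the training data. One plausible use, which is an assumption here rather than documented behaviour, is to check that a new load vector lies inside those ranges before asking for a prediction, since a learned model is generally less trustworthy outside its training range. `findOutOfBounds` is a hypothetical helper; the metadata shape is copied from the output above, and the helper assumes the load vector values are in the same order as the keys of "bounds".

```javascript
// metadata as printed by the build step above (abbreviated to two parameters)
const metadata = {
  bounds: {
    inlet_velocity: { min: 0.007, max: 0.015 },
    inlet_temperature: { min: 10.34, max: 49.59 },
  },
};

// Hypothetical helper: returns the names of parameters whose value falls
// outside the training bounds. Assumes values follow the key order of `bounds`.
function findOutOfBounds(bounds, loadVector) {
  return Object.entries(bounds)
    .filter(([, range], index) => loadVector[index] < range.min || loadVector[index] > range.max)
    .map(([name]) => name);
}

console.log(findOutOfBounds(metadata.bounds, [0.009, 22.38])); // []
console.log(findOutOfBounds(metadata.bounds, [0.02, 22.38]));  // [ 'inlet_velocity' ]
```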

After calling empower-rom build ... the file generated-rom.blob is created. This file contains the "memory" from the algorithm training.

We can now use this memory "ROM" blob to speed up our simulations.

Predict

Now that we have a ROM, we want to use it to actually make some fast predictions about how different load vectors will behave. To do this we use the empower-rom predict function.

We can start with --help to see its usage:

C:\Users\geoff> .\empower-rom.exe predict --help
usage: empower-rom predict [-h] --rom <rom-path.blob> --load-vectors <input-path.csv>
                           --prediction-snapshots <output-path.csv>

Use a Reduced Order Model (ROM) file to make a prediction of the output mesh snapshot for a
given input load vector

options:
  -h, --help            show this help message and exit
  --rom <rom-path.blob>
                        ROM file created with 'build' from desired load-mesh records (eg
                        'model/rom-file.blob', see 'build --help')
  --load-vectors <input-path.csv>
                        loading condition vector to predict for (eg 'model/load-vector.csv',
                        '-' for stdin)
  --prediction-snapshots <output-path.csv>
                        path to put output mesh vector predicted by ROM as a CSV file (eg
                        'model/snapshot-prediction.csv', '-' for stdout)

To run a prediction, we would use:

  • the generated-rom.blob file for --rom
  • a single load vector, expressed as a csv file, validation-input-1.csv passed to --load-vectors

for the following command line:

C:\Users\geoff> cat .\tests\models\heat_exchanger\validation-input-1.csv
0.009,22.38,0.006,60.36
C:\Users\geoff> .\empower-rom.exe `
>> predict `
>> --rom tests/models/heat_exchanger/generated-rom.blob `
>> --load-vectors tests/models/heat_exchanger/validation-input-1.csv `
>> --prediction-snapshots tests/models/heat_exchanger/prediction.csv
C:\Users\geoff> cat .\tests\models\heat_exchanger\prediction.csv
46.771531,45.229516,44.644142,45.866941,45.062784,42.554229,46.160094,44.571782,44.354437,45.146634,44.174272,45.261169,44.731962,45.096272,44.434267,46.910324,45.779198,46.072229,47.391633,45.867290,46.640306,46.840946,47.548046...

note: at time of writing, "load vectors" is a misnomer as it only supports one load vector. We are working to add support for multiple predictions.
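To judge a prediction against a held-out simulation result, the predicted CSV line can be parsed and compared node by node. A minimal sketch: `parseSnapshot` and `rmse` are hypothetical helpers, the reference values are illustrative only, and root-mean-square error is just one reasonable metric (this document does not say how "reconst_rmse" is computed).

```javascript
// Hypothetical helper: turn one line of prediction CSV into an array of numbers.
function parseSnapshot(csvLine) {
  return csvLine.trim().split(',').map(Number);
}

// Hypothetical helper: root-mean-square error between two equally sized snapshots.
function rmse(predicted, reference) {
  if (predicted.length !== reference.length) {
    throw new Error('snapshots must have the same number of entries');
  }
  const sumOfSquares = predicted.reduce(
    (sum, value, node) => sum + (value - reference[node]) ** 2,
    0
  );
  return Math.sqrt(sumOfSquares / predicted.length);
}

// first three predicted values from the output above
const predicted = parseSnapshot('46.771531,45.229516,44.644142\n');
// illustrative reference values; in practice these come from a real simulation run
const reference = [46.7, 45.2, 44.6];
console.log('rmse: ' + rmse(predicted, reference).toFixed(4));
```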

This took several seconds. Why?

Performance of ROM predictions is important. Much of the time above is spent loading and caching DLLs.

While this was much faster than running the simulation, it still took several seconds, which may be too slow for many applications. Windows itself takes time to start any application, which sets a floor on how quickly an external application can respond. The ROM blob file may take several seconds to load. Finally, empower-rom also consists of several internal DLLs in a compressed format that must be extracted and loaded.

To increase performance we employ this concept: start empower-rom predict once, and then re-use the "warm" instance to do predictions. We call this "streaming mode".

Predict Streaming Mode

empower-rom predict can use the standard-input stream ("stdin") and standard-out stream ("stdout") to take load-vectors and output mesh-snapshot predictions respectively. In this way, your application can send a stream of requests to predict and read the responses. This avoids having to re-load the application and incur all the costs of doing so. In this way, empower-rom can be started once at the beginning of your application, and then re-used for as long as the application is alive.

As an example, we can achieve the same result as above using streams:

C:\Users\geoff> cat .\tests\models\heat_exchanger\validation-input-1.csv `
>> | .\empower-rom.exe predict `
>>   --rom .\tests\models\heat_exchanger\generated-rom.pickle `
>>   --load-vectors - `
>>   --prediction-snapshots - `
>> | Out-File .\tests\models\heat_exchanger\prediction.csv
C:\Users\geoff\Code\rom\OASISAIROM [1_add_nuitka_and_cli ≡ +2 ~0 -0 !]> cat .\tests\models\heat_exchanger\prediction.csv
46.771531,45.229516,44.644142,45.866941,45.062784,42.554229,46.160094,44.571782,44.354437,45.146634,44.174272,45.261169,44.731962,45.096272,44.434267,46.910324,45.779198,46.072229,47.391633,45.867290,46.640306,46.840946,47.548046,47.713045...

note 1: using a hyphen '-' to indicate stdin and stdout is a Unix convention. It is not an operator or special shell syntax, but rather empower-rom giving special treatment to a file named '-', and deciding to use stdin and stdout rather than reading-from/writing-to a file named '-'.
note 2: in Windows PowerShell 5.1, Out-File uses UTF-16LE encoding with a byte-order mark (BOM) by default (PowerShell 6 and later default to UTF-8 without a BOM), so the two prediction.csv files produced by the two methods may not be byte-for-byte identical.
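If your application reads a prediction file that may have been written either way, it can look for a byte-order mark before decoding. A minimal Node.js sketch; `decodeCsvBytes` is a hypothetical helper that handles only UTF-16LE and UTF-8 and falls back to UTF-8 when no BOM is found.

```javascript
// Hypothetical helper: decode file bytes to text, honouring a UTF-16LE or
// UTF-8 byte-order mark if one is present (defaults to UTF-8 otherwise).
function decodeCsvBytes(bytes) {
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return bytes.subarray(2).toString('utf16le');
  }
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return bytes.subarray(3).toString('utf8');
  }
  return bytes.toString('utf8');
}

// the same text as it would be stored with and without a UTF-16LE BOM
const utf16Bytes = Buffer.concat([
  Buffer.from([0xff, 0xfe]),
  Buffer.from('46.77,45.22', 'utf16le'),
]);
const utf8Bytes = Buffer.from('46.77,45.22', 'utf8');
console.log(decodeCsvBytes(utf16Bytes) === decodeCsvBytes(utf8Bytes)); // true
```

In use, the argument would typically be the result of `fs.readFileSync(path)` called without an encoding, which returns a Buffer.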

This command takes the same amount of time as the previous command. So why is it better?

The difference between the above command and the previous command is the use of the hyphen - and the use of pipes |. The thing to note here is that empower-rom predict is configured to take input from its standard input stream stdin, and parse it. This input stream is being filled by data from the cat command before it. Similarly, we take the data from the standard output stream stdout of empower-rom predict and pipe it to the command Out-File.

In a deployed software case, using your environment of choice (dotnet, java, or native) you can start empower-rom predict once, and then keep a reference to the subprocess's stdin and stdout, and send data to it on demand. The startup time of the process will not be reduced, but the time from writing data to the stdin stream and receiving data from the stdout stream should be on the order of milliseconds.

In this way, your application can integrate with empower-rom as per the pseudo code:

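// JavaScript (Node.js) pseudo code:
// important dependency in nodeJS: https://nodejs.org/api/child_process.html
// examples: https://www.javatpoint.com/nodejs-child-process
const child_process = require('child_process');
const readline = require('readline');

// to run a process to build the rom (execFileSync blocks until the build finishes)
// note: `build --help` lists the metadata option that takes a path as --output-metadata
child_process.execFileSync('empower-rom.exe', [
  'build',
  '--training-load-vectors', 'heat_exchanger/training-load-vectors.csv',
  '--training-mesh-snapshots', 'heat_exchanger/training-mesh-snapshots.csv',
  '--output-rom', 'heat_exchanger/generated-rom.pickle',
  '--print-metadata', 'outputs/output_bounds_info.json'
]);

// to use the ROM in slow file-mode (blocking, so the elapsed time covers the whole run):
var exec_start = Date.now();

child_process.execFileSync('empower-rom.exe', [
  'predict',
  '--rom', 'heat_exchanger/generated-rom.pickle',
  '--load-vectors', 'heat_exchanger/validation-load-vector-1.csv',
  '--prediction-snapshots', 'heat_exchanger/prediction.csv'
]);

var exec_time = Date.now() - exec_start;
console.log("exec took: " + exec_time + "ms");

// to use in faster mode, first start the process somewhere:
var predict_rom_proc = child_process.spawn('empower-rom.exe', [
  'predict',
  '--rom', 'heat_exchanger/generated-rom.pickle',
  '--load-vectors', '-', // note: using '-' as a path will be interpreted as stdin
  '--prediction-snapshots', '-' // using '-' means stdout
]);
predict_rom_proc.stdin.setDefaultEncoding('utf-8');

// read stdout one line at a time, since a large snapshot can arrive in several chunks
// (assumes each prediction is written as one newline-terminated CSV line)
var prediction_lines = readline.createInterface({ input: predict_rom_proc.stdout });

// send one load vector and wait for its prediction (one request at a time)
function predict(load_vector_csv) {
  var next_line = new Promise((resolve) => prediction_lines.once('line', resolve));
  predict_rom_proc.stdin.write(load_vector_csv + "\n"); // write the new loading conditions to the pipe
  return next_line;
}

async function main() {
  // the first request also pays for process startup and ROM loading, so warm up first
  await predict("0.009,22.38,0.006,60.36");

  var stream_start = Date.now();
  var output = await predict("0.009,22.38,0.006,60.36");
  var time = Date.now() - stream_start;
  console.log("streaming took: " + time + "ms");

  predict_rom_proc.stdin.end(); // close the pipe once no more predictions are needed
}

main();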

With this, you should be able to clearly see that a streaming prediction against an already-running process takes an order of magnitude less time than exec mode.
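One detail to watch when reading a subprocess's stdout: a stream delivers data in chunks of arbitrary size, so one prediction line may arrive spread across several 'data' events, or two lines may arrive in one. The sketch below shows the buffering needed to recover whole lines. `LineAssembler` is a hypothetical name, and it assumes each prediction is written as one newline-terminated line; Node's built-in readline module does the same job.

```javascript
// Hypothetical helper: collects stream chunks and hands back complete lines.
class LineAssembler {
  constructor() {
    this.pending = ''; // text received so far that has not ended in a newline
  }

  // feed one chunk of text; returns the lines this chunk completed
  push(chunk) {
    const parts = (this.pending + chunk).split('\n');
    // the last part is incomplete (or '' when the chunk ended on a newline)
    this.pending = parts.pop();
    return parts.map((line) => line.replace(/\r$/, ''));
  }
}

const assembler = new LineAssembler();
console.log(assembler.push('46.77,45.2'));     // []
console.log(assembler.push('2,44.64\n47.1'));  // [ '46.77,45.22,44.64' ]
console.log(assembler.push('0\n'));            // [ '47.10' ]
```

In a real integration, each chunk from the stdout 'data' event would be passed to `push`, and each returned line handed to whichever request is waiting.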
