@pbacon-blaber
Created March 10, 2021 19:05

DynamoDB Advanced Queries (Lambda)

A Lambda function that performs advanced queries on Virtru's DynamoDB tables. It was adapted from the devhacks db-crawler to run as an AWS Lambda function. The lambdas are:

  • develop01: lambda-dynamodb-advanced-queries-develop01-us-west-2
  • staging: lambda-dynamodb-advanced-queries-staging-us-west-2
  • production: lambda-dynamodb-advanced-queries-production-us-east-2

Additional Documentation

Additional documentation can be found in Confluence.

Lambda Payload (Arguments)

The specifics of the advanced query are supplied as a JSON file, which is passed to the lambda function as the payload. Below is an example of that file. (Note that every item in the configuration object is under the scanOptions field, so that other data can be added to the lambda payload later on.)

{
  "scanOptions": {
    "baseTable": "policies",
    "index": {
      "name": "orgIdSecureEmailSentAtIndex",
      "keys": {
        "orgId": "d302f05d-d99f-4dc1-b769-0a8345870172",
        "secureEmailSentAt": {
          "operation": "between",
          "values": [
            "2020-03-01T14:33:39Z",
            "2020-03-23T14:33:39Z"
          ]
        }
      }
    },
    "options": {
      "scanDelay": 100,
      "scanLimit": 200,
      "delimeter": ","
    },
    "startKey": {
      "uuid": "7ebd5a2a-a16e-4a87-96d8-0fc425247bac"
    },
    "filters": [
      {
        "field": "version",
        "value": "3.0.0"
      },
      {
        "logic": "OR",
        "filters": [
          {
            "field": "sentFrom",
            "operation": "IS_IN",
            "value": [
              "[email protected]",
              "[email protected]"
            ]
          },
          {
            "logic": "NOT",
            "filter": {
              "field": "displayName",
              "operation": "INCLUDES",
              "value": "hello"
            }
          }
        ]
      }
    ],
    "counters": [
      {
        "field": "created",
        "operation": "AFTER",
        "value": "2019-01-16T16:14:25.862Z",
        "allowUndefined": true
      }
    ],
    "output": [
      "uuid",
      "displayName",
      "sentFrom",
      "secureEmailSentAt"
    ],
    "outFile": "example-out.csv"
  }
}
  • baseTable REQUIRED string - the Virtru table to scan; omit the environment suffix of the table name (use policies rather than policies-develop01-us-west-2)
  • index object - information about the index on the table you would like to use; if not specified, the lambda will scan the entire table
    • name REQUIRED string - the name of the index to use
    • keys REQUIRED object - contains the information about the keys and their conditions on the index
      • {key} REQUIRED string|object - if a string, it is matched exactly against that index key; if an object, it can describe a more advanced condition
        • operation REQUIRED string - the operation to perform
        • values REQUIRED array(string) - an array of values to be used with the operation; for example, if operation is "between" and values is ["a", "b"], the query would look like {key} is between "a" and "b"
  • options object - options for the scan
    • scanDelay number - the time (in ms) to wait between each batch call to the database, used to manage load on the tables (Default: 100)
    • scanLimit number - the size of each batch to retrieve from the database (Default: 200)
    • delimeter string - the delimiter for the CSV file (Default: ",")
  • startKey object - specify a start key/value pair to resume a scan from where it left off
  • filters array(object) - an array of filters applied to the scan results to decide which items are placed in the output file; by default the entries of the array are ANDed together (see Filter Operations for more info on these)
    • field string - the field to query for (cannot specify both field and logic)
    • operation string - the operation to perform on the field (Default: "EQUALS")
    • value string|array - the value to compare the field to with the given operation (REQUIRED when field is specified)
    • logic string - the logic to apply to the filter or filters specified in the same object; supported operations are AND and OR, which require a filters array in the object, and NOT, which requires a filter object (cannot specify both field and logic)
    • filter|filters object|array - a single filter or a list of filters that the logical operation will be applied to; these have the same shape as the entries of the enclosing filters array, so they can be nested (REQUIRED when logic is defined)
  • counters array(object) - a list of conditions to count on; if specified, the items that match each entry will be counted and logged at the end of execution
  • output REQUIRED array(string) - an array of fields to output as the columns in the CSV file
  • outFile string - the name of the output file (Default: {timestamp}.csv)
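
To make the index, options, and startKey fields more concrete, here is a minimal sketch of how a scanOptions object could be translated into DynamoDB Query parameters. It is not the lambda's actual code: buildQueryParams and its tableSuffix argument are hypothetical names, and only the "between" operation is handled because it is the only one this document shows.

```javascript
// Hypothetical helper: maps scanOptions onto DynamoDB Query parameters
// (DocumentClient-style plain values). Not taken from the real lambda.
function buildQueryParams(scanOptions, tableSuffix) {
  const { baseTable, index, options = {}, startKey } = scanOptions;
  const params = {
    // Assumed naming: baseTable plus an environment suffix.
    TableName: `${baseTable}-${tableSuffix}`,
    Limit: options.scanLimit ?? 200, // documented default batch size
  };
  if (startKey) params.ExclusiveStartKey = startKey; // resume an earlier run
  if (!index) return params; // no index: the caller would scan the whole table

  const names = {};
  const values = {};
  const conditions = Object.entries(index.keys).map(([key, cond], i) => {
    names[`#k${i}`] = key;
    if (typeof cond === "string") {
      // A plain string is an exact match on the key.
      values[`:k${i}`] = cond;
      return `#k${i} = :k${i}`;
    }
    if (cond.operation === "between") {
      const [low, high] = cond.values;
      values[`:k${i}a`] = low;
      values[`:k${i}b`] = high;
      return `#k${i} BETWEEN :k${i}a AND :k${i}b`;
    }
    throw new Error(`Unsupported key operation: ${cond.operation}`);
  });

  return {
    ...params,
    IndexName: index.name,
    KeyConditionExpression: conditions.join(" AND "),
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };
}

const exampleParams = buildQueryParams(
  {
    baseTable: "policies",
    index: {
      name: "orgIdSecureEmailSentAtIndex",
      keys: {
        orgId: "d302f05d-d99f-4dc1-b769-0a8345870172",
        secureEmailSentAt: {
          operation: "between",
          values: ["2020-03-01T14:33:39Z", "2020-03-23T14:33:39Z"],
        },
      },
    },
    output: ["uuid"],
  },
  "develop01-us-west-2"
);
console.log(exampleParams.KeyConditionExpression);
```

The scanDelay option is not modeled here; in a paginated loop it would be the pause between successive calls.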

Output

The lambda will output your CSV file to the following S3 buckets:

  • develop01: virtru-com-us-west-2-lambda-dynamodb-advanced-queries-develop01
  • staging: virtru-com-us-west-2-lambda-dynamodb-advanced-queries-staging
  • production: virtru-com-us-east-2-lambda-dynamodb-advanced-queries-production

The file will have the name specified in the outFile parameter, or be given a timestamp name if none is specified.
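
As an illustration of how the output, delimeter, and outFile settings fit together, the sketch below formats items as CSV text and picks a file name. The helper names, the header row, the quoting rules, and the ISO timestamp format are all assumptions; the document does not show how the lambda actually writes its file.

```javascript
// Hypothetical helpers; the real lambda's CSV writer is not shown in this document.

// Quote a cell when it contains the delimiter, a double quote, or a newline.
function escapeCell(value, delimiter) {
  if (value === undefined || value === null) return "";
  const text = String(value);
  const needsQuotes = text.includes(delimiter) || /["\r\n]/.test(text);
  return needsQuotes ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text with one column per entry in the output array.
// The delimiter argument corresponds to the payload's "delimeter" option.
function toCsv(items, output, delimiter = ",") {
  const line = (cells) => cells.map((c) => escapeCell(c, delimiter)).join(delimiter);
  const header = line(output); // assumed: a header row of field names
  const rows = items.map((item) => line(output.map((field) => item[field])));
  return [header, ...rows].join("\n");
}

// Use outFile when given, otherwise fall back to a timestamp-based name.
function resolveOutFile(outFile, now = new Date()) {
  return outFile || `${now.toISOString()}.csv`; // assumed timestamp format
}

console.log(toCsv([{ uuid: "1", displayName: "a,b" }], ["uuid", "displayName"]));
console.log(resolveOutFile("example-out.csv"));
```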

Filter Operations

Each filter object describes an operation that is applied to every item returned by the base query, as follows:

item[field] `OPERATION` value 

Below are the possible operation types:

| Operation Name | JS Operation | Restricted Types | Description |
|---|---|---|---|
| EQUALS | field === value | None | Simple equals comparison |
| NOT_EQUALS | field !== value | None | Simple not equals comparison |
| LESS_THAN | field < value | None | Simple less than comparison |
| GREATER_THAN | field > value | None | Simple greater than comparison |
| LESS_THAN_EQUALS | field <= value | None | Simple less than or equals comparison |
| GREATER_THAN_EQUALS | field >= value | None | Simple greater than or equals comparison |
| BEFORE | field.isBefore(value) | Date Strings | Parses field and value as momentjs objects and calls .isBefore |
| AFTER | field.isAfter(value) | Date Strings | Parses field and value as momentjs objects and calls .isAfter |
| INCLUDES | field.includes(value) | Strings | Simple string includes method |
| ENDS_WITH | field.endsWith(value) | Strings | Simple string endsWith method |
| STARTS_WITH | field.startsWith(value) | Strings | Simple string startsWith method |
| IS_DEFINED | field !== undefined | None | Checks that field is not undefined |
| IS_UNDEFINED | field === undefined | None | Checks that field is undefined |
| IS_IN | value.includes(field) | Arrays | Checks whether value, as an array, includes field |
| IS_NOT_IN | !value.includes(field) | Arrays | Checks whether value, as an array, does not include field |

Note that while it is possible to use operations like LESS_THAN or GREATER_THAN on values that aren't numbers, the script performs a plain JS comparison, so some results may be unexpected (for example, strings are compared lexicographically, so "10" < "9" is true).
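
The following sketch shows one way the filter operations and the nested AND/OR/NOT logic could be evaluated against a single item. It is an illustration under assumptions, not the lambda's source: OPERATIONS, matches, and matchesAll are hypothetical names, BEFORE and AFTER use Date.parse instead of momentjs to keep the example dependency-free, string operations return false for non-string fields, and IS_DEFINED is taken to mean field !== undefined.

```javascript
// Hypothetical operation table mirroring the documented operation names.
const isString = (x) => typeof x === "string";
const OPERATIONS = {
  EQUALS: (f, v) => f === v,
  NOT_EQUALS: (f, v) => f !== v,
  LESS_THAN: (f, v) => f < v,
  GREATER_THAN: (f, v) => f > v,
  LESS_THAN_EQUALS: (f, v) => f <= v,
  GREATER_THAN_EQUALS: (f, v) => f >= v,
  // Date.parse stands in for momentjs here (an assumption of this sketch).
  BEFORE: (f, v) => Date.parse(f) < Date.parse(v),
  AFTER: (f, v) => Date.parse(f) > Date.parse(v),
  INCLUDES: (f, v) => isString(f) && f.includes(v),
  ENDS_WITH: (f, v) => isString(f) && f.endsWith(v),
  STARTS_WITH: (f, v) => isString(f) && f.startsWith(v),
  IS_DEFINED: (f) => f !== undefined,
  IS_UNDEFINED: (f) => f === undefined,
  IS_IN: (f, v) => v.includes(f),
  IS_NOT_IN: (f, v) => !v.includes(f),
};

// Evaluate one filter object, recursing into nested logic filters.
function matches(item, filter) {
  if (filter.logic) {
    switch (filter.logic) {
      case "AND":
        return filter.filters.every((f) => matches(item, f));
      case "OR":
        return filter.filters.some((f) => matches(item, f));
      case "NOT":
        return !matches(item, filter.filter);
      default:
        throw new Error(`Unknown logic: ${filter.logic}`);
    }
  }
  const operation = OPERATIONS[filter.operation || "EQUALS"]; // documented default
  if (!operation) throw new Error(`Unknown operation: ${filter.operation}`);
  return operation(item[filter.field], filter.value);
}

// The top-level filters array is ANDed together.
const matchesAll = (item, filters = []) => filters.every((f) => matches(item, f));

const exampleFilters = [
  { field: "version", value: "3.0.0" },
  {
    logic: "OR",
    filters: [
      { field: "sentFrom", operation: "IS_IN", value: ["alice@example.test"] },
      { logic: "NOT", filter: { field: "displayName", operation: "INCLUDES", value: "hello" } },
    ],
  },
];
console.log(matchesAll({ version: "3.0.0", sentFrom: "alice@example.test" }, exampleFilters));
```

The counters entries use the same field/operation/value shape, so the same matches function could be reused to count items; the allowUndefined flag from the example payload is not modeled here.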

Invoking

You can invoke the lambda from the command line with the AWS CLI. Below is an example of invoking the develop01 lambda with an options.json file in the current directory:

aws lambda invoke \
  --function-name lambda-dynamodb-advanced-queries-develop01-us-west-2 \
  --payload file://./options.json \
  response.json
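
Hand-written payload files are easy to get wrong (a trailing comma or a missing brace makes the JSON invalid), so one option is to generate options.json from a script. The sketch below is a hypothetical helper, not part of the lambda: writeOptionsFile checks only the two fields this document marks as REQUIRED at the top level, wraps the object under scanOptions, and writes it with JSON.stringify.

```javascript
const fs = require("node:fs");

// Hypothetical helper: writes a payload file in the { "scanOptions": ... } shape.
function writeOptionsFile(filePath, scanOptions) {
  if (typeof scanOptions.baseTable !== "string") {
    throw new Error("scanOptions.baseTable is required and must be a string");
  }
  if (!Array.isArray(scanOptions.output)) {
    throw new Error("scanOptions.output is required and must be an array");
  }
  const payload = { scanOptions };
  // JSON.stringify always emits valid JSON, so no trailing commas can slip in.
  fs.writeFileSync(filePath, JSON.stringify(payload, null, 2) + "\n");
  return payload;
}

// Example (commented out so that loading this file has no side effects):
// writeOptionsFile("./options.json", {
//   baseTable: "policies",
//   output: ["uuid", "displayName", "sentFrom", "secureEmailSentAt"],
//   outFile: "example-out.csv",
// });
```

The resulting file can then be passed to the aws lambda invoke command with --payload file://./options.json.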