An AWS Lambda function that performs advanced queries on Virtru's DynamoDB tables. It was adapted from the devhacks db-crawler to run as a Lambda function. The deployed lambdas are:
- develop01: `lambda-dynamodb-advanced-queries-develop01-us-west-2`
- staging: `lambda-dynamodb-advanced-queries-staging-us-west-2`
- production: `lambda-dynamodb-advanced-queries-production-us-east-2`

Additional documentation can be found in Confluence.
The query is described in a JSON file that is passed to the lambda as its invocation payload. An example file is shown below. Every configuration item lives under the `scanOptions` field, which leaves room to pass other data in the payload later on.
```json
{
  "scanOptions": {
    "baseTable": "policies",
    "index": {
      "name": "orgIdSecureEmailSentAtIndex",
      "keys": {
        "orgId": "d302f05d-d99f-4dc1-b769-0a8345870172",
        "secureEmailSentAt": {
          "operation": "between",
          "values": [
            "2020-03-01T14:33:39Z",
            "2020-03-23T14:33:39Z"
          ]
        }
      }
    },
    "options": {
      "scanDelay": 100,
      "scanLimit": 200,
      "delimeter": ","
    },
    "startKey": {
      "uuid": "7ebd5a2a-a16e-4a87-96d8-0fc425247bac"
    },
    "filters": [
      {
        "field": "version",
        "value": "3.0.0"
      },
      {
        "logic": "OR",
        "filters": [
          {
            "field": "sentFrom",
            "operation": "IS_IN",
            "value": [
              "[email protected]",
              "[email protected]"
            ]
          },
          {
            "logic": "NOT",
            "filter": {
              "field": "displayName",
              "operation": "INCLUDES",
              "value": "hello"
            }
          }
        ]
      }
    ],
    "counters": [
      {
        "field": "created",
        "operation": "AFTER",
        "value": "2019-01-16T16:14:25.862Z",
        "allowUndefined": true
      }
    ],
    "output": [
      "uuid",
      "displayName",
      "sentFrom",
      "secureEmailSentAt"
    ],
    "outFile": "example-out.csv"
  }
}
```
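The `index.keys` section maps naturally onto a DynamoDB Query key condition. The sketch below shows one way that translation could work; it is not taken from the lambda's source. `buildKeyCondition` is a hypothetical name, only exact string matches and `"between"` are handled, and the attribute values are plain JavaScript values as the DynamoDB DocumentClient expects (an assumption about how the lambda talks to DynamoDB).

```javascript
// Hypothetical sketch: turns the `index.keys` section of a payload into the
// key-condition parameters of a DynamoDB Query request (DocumentClient style,
// i.e. plain JS values rather than {S: "..."} attribute maps).
// Only exact string matches and the "between" operation are handled here.
function buildKeyCondition(keys) {
  const conditions = [];
  const names = {};
  const values = {};

  Object.entries(keys).forEach(([key, spec], i) => {
    // Name placeholders avoid clashes with DynamoDB reserved words.
    const name = `#k${i}`;
    names[name] = key;

    if (typeof spec === "string") {
      // A plain string means the key must match it exactly.
      values[`:v${i}`] = spec;
      conditions.push(`${name} = :v${i}`);
    } else if (spec.operation === "between") {
      const [low, high] = spec.values;
      values[`:v${i}lo`] = low;
      values[`:v${i}hi`] = high;
      conditions.push(`${name} BETWEEN :v${i}lo AND :v${i}hi`);
    } else {
      throw new Error(`Unsupported key operation: ${spec.operation}`);
    }
  });

  return {
    KeyConditionExpression: conditions.join(" AND "),
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };
}

// The keys from the example payload above.
const params = buildKeyCondition({
  orgId: "d302f05d-d99f-4dc1-b769-0a8345870172",
  secureEmailSentAt: {
    operation: "between",
    values: ["2020-03-01T14:33:39Z", "2020-03-23T14:33:39Z"],
  },
});
console.log(params.KeyConditionExpression);
// #k0 = :v0 AND #k1 BETWEEN :v1lo AND :v1hi
```

These three fields, together with `TableName` and `IndexName`, are the shape a Query request takes; the real lambda may build them differently.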
- `baseTable`: REQUIRED string - the Virtru table to scan. Do not include the environment suffix of the table name (use `policies`, not `policies-develop01-us-west-2`).
- `index`: object - the table index to query. If not specified, the lambda scans the entire table.
  - `name`: REQUIRED string - the name of the index to use
  - `keys`: REQUIRED object - the keys of the index and the conditions on them
    - `{key}`: REQUIRED string|object - if a string, the key must match it exactly; if an object, it describes a more advanced condition:
      - `operation`: REQUIRED string - the operation to perform
      - `values`: REQUIRED array(string) - the values used by the operation. For example, if `operation` is `"between"` and `values` is `["a", "b"]`, the query becomes `{key} is between "a" and "b"`.
- `options`: object - options for the scan
  - `scanDelay`: number - the time (in ms) to wait between batch calls to the database, used to manage load on the tables (Default: `100`)
  - `scanLimit`: number - the number of items to retrieve from the database per batch (Default: `200`)
  - `delimeter`: string - the delimiter for the CSV file (Default: `","`); note that the key itself is spelled `delimeter`
- `startKey`: object - a key/value pair from which to resume a previous scan where it left off
- `filters`: array(object) - filters applied to the scan results before they are written to the output file; by default, the filters in the array are `AND`ed together (see the filter operations below for details)
  - `field`: string - the field to filter on (cannot be specified together with `logic`)
  - `operation`: string - the operation to apply to the field (Default: `"EQUALS"`)
  - `value`: string - the value to compare the field against with the given operation; an array for `IS_IN` and `IS_NOT_IN` (REQUIRED when `field` is specified)
  - `logic`: string - the logical operation to apply to the filter(s) in the same object: `AND` and `OR` require a `filters` array, and `NOT` requires a single `filter` object (cannot be specified together with `field`)
  - `filter|filters`: object|array - the single filter or list of filters that the logical operation is applied to. These take the same form as the `filters` entries they are defined in, so they can be nested (REQUIRED when `logic` is specified)
- `counters`: array(object) - conditions to count on; if specified, the items matching each entry are counted and the totals are logged at the end of execution
- `output`: REQUIRED array(string) - the fields to output as columns in the CSV file
- `outFile`: string - the name of the output file (Default: `{timestamp}.csv`)
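The parameter rules above can be checked locally before invoking the lambda. The following is a minimal sketch, not the lambda's own validation: `resolveScanOptions` and `validateFilter` are hypothetical names, and rejecting a filter that has neither `field` nor `logic` is an assumption the reference implies but does not state.

```javascript
// Hypothetical sketch: enforces the REQUIRED fields documented above and
// fills in the documented defaults for `options`.
const OPTION_DEFAULTS = { scanDelay: 100, scanLimit: 200, delimeter: "," };

// Recursively checks one filter entry (leaf or logical).
function validateFilter(filter) {
  const hasField = filter.field !== undefined;
  const hasLogic = filter.logic !== undefined;
  // Assumption: exactly one of `field` / `logic` must be present.
  if (hasField === hasLogic) {
    throw new Error("A filter must specify exactly one of `field` or `logic`");
  }
  if (hasField) {
    if (!("value" in filter)) {
      throw new Error(`Filter on "${filter.field}" is missing \`value\``);
    }
    return;
  }
  if (filter.logic === "AND" || filter.logic === "OR") {
    if (!Array.isArray(filter.filters)) {
      throw new Error(`${filter.logic} requires a \`filters\` array`);
    }
    filter.filters.forEach(validateFilter);
  } else if (filter.logic === "NOT") {
    if (typeof filter.filter !== "object" || filter.filter === null) {
      throw new Error("NOT requires a `filter` object");
    }
    validateFilter(filter.filter);
  } else {
    throw new Error(`Unsupported logic: ${filter.logic}`);
  }
}

// Validates a payload and returns its scanOptions with defaults applied.
function resolveScanOptions(payload) {
  const scan = payload.scanOptions;
  if (!scan || typeof scan.baseTable !== "string") {
    throw new Error("`baseTable` is required");
  }
  if (!Array.isArray(scan.output) || scan.output.length === 0) {
    throw new Error("`output` is required");
  }
  if (scan.index && (!scan.index.name || !scan.index.keys)) {
    throw new Error("`index` requires `name` and `keys`");
  }
  (scan.filters || []).forEach(validateFilter);
  return { ...scan, options: { ...OPTION_DEFAULTS, ...scan.options } };
}

const resolved = resolveScanOptions({
  scanOptions: { baseTable: "policies", output: ["uuid"], options: { scanLimit: 50 } },
});
// scanLimit is overridden; scanDelay and delimeter fall back to their defaults.
console.log(resolved.options);
```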
The lambda writes the CSV file to the S3 bucket for its environment:
- develop01: `virtru-com-us-west-2-lambda-dynamodb-advanced-queries-develop01`
- staging: `virtru-com-us-west-2-lambda-dynamodb-advanced-queries-staging`
- production: `virtru-com-us-east-2-lambda-dynamodb-advanced-queries-production`

The file is given the name specified in the `outFile` parameter, or a timestamp name if `outFile` is not specified.
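To locate a run's output, combine the environment's bucket with the file name. This sketch assumes the file is written at the root of the bucket and that the default timestamp name is an ISO 8601 string; neither detail is documented, and `outputLocation` is a hypothetical helper.

```javascript
// Hypothetical sketch: computes the S3 URI of a run's CSV from the bucket
// list above. Assumes objects are written at the bucket root and that the
// default name is an ISO 8601 timestamp (the real format is undocumented).
const OUTPUT_BUCKETS = {
  develop01: "virtru-com-us-west-2-lambda-dynamodb-advanced-queries-develop01",
  staging: "virtru-com-us-west-2-lambda-dynamodb-advanced-queries-staging",
  production: "virtru-com-us-east-2-lambda-dynamodb-advanced-queries-production",
};

function outputLocation(env, scanOptions, now = new Date()) {
  const bucket = OUTPUT_BUCKETS[env];
  if (!bucket) throw new Error(`Unknown environment: ${env}`);
  // Fall back to a timestamp name when outFile is not given.
  const key = scanOptions.outFile ?? `${now.toISOString()}.csv`;
  return `s3://${bucket}/${key}`;
}

console.log(outputLocation("develop01", { outFile: "example-out.csv" }));
// s3://virtru-com-us-west-2-lambda-dynamodb-advanced-queries-develop01/example-out.csv
```

The resulting URI can be passed to `aws s3 cp` to download the CSV.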
A filter object with a `field` describes an operation that is evaluated against each item returned by the base query, as follows:

`item[field] OPERATION value`

The supported operations are:
| Operation Name | JS Operation | Restricted Types | Description |
|---|---|---|---|
| EQUALS | `field === value` | None | Simple equals comparison |
| NOT_EQUALS | `field !== value` | None | Simple not-equals comparison |
| LESS_THAN | `field < value` | None | Simple less-than comparison |
| GREATER_THAN | `field > value` | None | Simple greater-than comparison |
| LESS_THAN_EQUALS | `field <= value` | None | Simple less-than-or-equals comparison |
| GREATER_THAN_EQUALS | `field >= value` | None | Simple greater-than-or-equals comparison |
| BEFORE | `field.isBefore(value)` | Date strings | Parses `field` and `value` as moment.js objects and calls `.isBefore` |
| AFTER | `field.isAfter(value)` | Date strings | Parses `field` and `value` as moment.js objects and calls `.isAfter` |
| INCLUDES | `field.includes(value)` | Strings | Simple string `includes` check |
| ENDS_WITH | `field.endsWith(value)` | Strings | Simple string `endsWith` check |
| STARTS_WITH | `field.startsWith(value)` | Strings | Simple string `startsWith` check |
| IS_DEFINED | `field !== undefined` | None | Checks that the field is defined |
| IS_UNDEFINED | `field === undefined` | None | Checks that the field is undefined |
| IS_IN | `value.includes(field)` | Arrays | Checks whether the array `value` includes `field` |
| IS_NOT_IN | `!value.includes(field)` | Arrays | Checks whether the array `value` does not include `field` |
Note that although operations like `LESS_THAN` and `GREATER_THAN` can be used on non-numeric values, the script performs a plain JS comparison, so some results may be unexpected. For example, strings are compared character by character, so `"10" < "9"` is `true`.
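The operation table and the `AND`/`OR`/`NOT` logic described earlier can be summarized as a small recursive evaluator. This is a sketch of the documented semantics, not the lambda's code: `matches`, `matchesAll`, and `countMatches` are hypothetical names; `Date.parse` stands in for moment.js in `BEFORE`/`AFTER`; string operations return `false` for non-string fields instead of throwing; counter entries are assumed to share the filter shape; and the undocumented `allowUndefined` counter flag is ignored.

```javascript
// Hypothetical sketch of the documented filter semantics.
// Date.parse replaces moment.js so there are no dependencies; the two agree
// for ISO 8601 timestamps with an explicit offset, like those in the example.
const isString = (x) => typeof x === "string";

const OPERATIONS = {
  EQUALS: (field, value) => field === value,
  NOT_EQUALS: (field, value) => field !== value,
  LESS_THAN: (field, value) => field < value,
  GREATER_THAN: (field, value) => field > value,
  LESS_THAN_EQUALS: (field, value) => field <= value,
  GREATER_THAN_EQUALS: (field, value) => field >= value,
  BEFORE: (field, value) => Date.parse(field) < Date.parse(value),
  AFTER: (field, value) => Date.parse(field) > Date.parse(value),
  // Assumption: non-string fields simply fail string operations.
  INCLUDES: (field, value) => isString(field) && field.includes(value),
  ENDS_WITH: (field, value) => isString(field) && field.endsWith(value),
  STARTS_WITH: (field, value) => isString(field) && field.startsWith(value),
  IS_DEFINED: (field) => field !== undefined,
  IS_UNDEFINED: (field) => field === undefined,
  IS_IN: (field, value) => value.includes(field),
  IS_NOT_IN: (field, value) => !value.includes(field),
};

// Evaluates one filter object (leaf or logical) against an item.
function matches(item, filter) {
  switch (filter.logic) {
    case "AND":
      return filter.filters.every((f) => matches(item, f));
    case "OR":
      return filter.filters.some((f) => matches(item, f));
    case "NOT":
      return !matches(item, filter.filter);
    case undefined: {
      const op = OPERATIONS[filter.operation ?? "EQUALS"]; // documented default
      if (!op) throw new Error(`Unknown operation: ${filter.operation}`);
      return op(item[filter.field], filter.value);
    }
    default:
      throw new Error(`Unknown logic: ${filter.logic}`);
  }
}

// Top-level filters are ANDed together.
const matchesAll = (item, filters = []) => filters.every((f) => matches(item, f));

// Number of items satisfying each counter entry.
const countMatches = (items, counters) =>
  counters.map((counter) => items.filter((item) => matches(item, counter)).length);

// The filters from the example payload, with illustrative addresses.
const filters = [
  { field: "version", value: "3.0.0" },
  {
    logic: "OR",
    filters: [
      { field: "sentFrom", operation: "IS_IN", value: ["alice@example.com", "bob@example.com"] },
      { logic: "NOT", filter: { field: "displayName", operation: "INCLUDES", value: "hello" } },
    ],
  },
];

const items = [
  { version: "3.0.0", sentFrom: "alice@example.com", displayName: "hello world" }, // kept: sender listed
  { version: "3.0.0", sentFrom: "carol@example.com", displayName: "quarterly report" }, // kept: no "hello"
  { version: "3.0.0", sentFrom: "carol@example.com", displayName: "hello again" }, // dropped: fails both OR branches
  { version: "2.0.0", sentFrom: "alice@example.com", displayName: "report" }, // dropped: wrong version
];

const kept = items.filter((item) => matchesAll(item, filters));
console.log(kept.length); // 2
```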
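You can invoke the lambda from the command line with the AWS CLI. The example below invokes the develop01 lambda with an `options.json` file in the current directory and writes the lambda's response to `response.json`:

```shell
aws lambda invoke \
  --function-name lambda-dynamodb-advanced-queries-develop01-us-west-2 \
  --payload file://./options.json \
  response.json
```

With AWS CLI version 2, also pass `--cli-binary-format raw-in-base64-out` so that the payload file is sent as raw JSON rather than being decoded as base64.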
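Hand-edited payload files are easy to break with a trailing comma or an unbalanced brace. One option is to generate `options.json` from a script, as in this sketch; `writePayload` is a hypothetical helper and the payload values are illustrative only.

```javascript
// Hypothetical helper: writes an invocation payload as strictly valid JSON
// (JSON.stringify never emits trailing commas), wrapped in `scanOptions`.
const fs = require("fs");

function writePayload(path, scanOptions) {
  const payload = { scanOptions };
  fs.writeFileSync(path, JSON.stringify(payload, null, 2) + "\n");
  return payload;
}

writePayload("options.json", {
  baseTable: "policies",
  filters: [{ field: "version", value: "3.0.0" }],
  output: ["uuid", "displayName"],
  outFile: "example-out.csv",
});
```

Running it with Node creates `options.json` in the current directory, ready for the `aws lambda invoke` command.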