@opethe1st
Last active February 16, 2020 13:13
A proposed DSL for data validation.

Data validation language

So I think it would be nice to have a DSL just for data validation (think JSON Schema, but only the bits that deal with validation: no annotations and a simpler referencing system).

Why a DSL?

A DSL means that the language can be designed from the ground up for the best user experience and then compiled to native languages (the approach protobufs use, for example). I don't propose any syntax or semantics that is radically different from what exists in standard programming languages today, so I don't expect the learning curve to be steep.

Challenges with a DSL?

  • No tooling support. General-purpose languages come with tons of support, such as linters and autocomplete. If this is going to be actually successful, it would need these tools too.
  • Need to write compilers for different languages.

So let's dive in.

# boolean operations to combine schemas. syntax stolen from F#
validator =
    | str
    | array
    | object

validator =
    & str
    & object  # obviously impossible to satisfy, since an instance cannot be both a str and an object
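
One way to read `|` and `&` is as combinators over predicates. Here is a minimal Python sketch, assuming a validator is just a callable that returns a bool. The names `any_of`, `all_of`, `is_str`, `is_array` and `is_object` are hypothetical and not part of the proposal.

```python
from typing import Any, Callable

# Assumption: a validator is any callable that takes an instance and returns a bool.
Validator = Callable[[Any], bool]


def is_str(instance: Any) -> bool:
    return isinstance(instance, str)


def is_array(instance: Any) -> bool:
    return isinstance(instance, list)


def is_object(instance: Any) -> bool:
    return isinstance(instance, dict)


def any_of(*validators: Validator) -> Validator:
    """Models `| a | b | c`: valid when at least one validator accepts."""
    return lambda instance: any(v(instance) for v in validators)


def all_of(*validators: Validator) -> Validator:
    """Models `& a & b`: valid only when every validator accepts."""
    return lambda instance: all(v(instance) for v in validators)


union = any_of(is_str, is_array, is_object)
impossible = all_of(is_str, is_object)  # no instance is both a str and an object
```

With this reading, `union("a")` accepts while `impossible` rejects every instance, which matches the comment in the second example.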

list and every item must be a string

validator = [str]  # type that every item in the list/sequence needs to satisfy. short form for
validator = iterable<str>

list and every item must be a string or int

validator = [str | int]  # string or int items

Nested validators, object + properties named data and children and data is required.

(required syntax borrows from gql :D)

Tree = {data!: any, children: [Tree]} # short form for
Tree = mapping<data!: any, children: iterable<Tree>>

Nested list

NestedList = [NestedList | str]
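
The recursive definitions of `Tree` and `NestedList` map naturally onto recursive functions. The following is only a sketch under a few assumptions that the proposal does not settle: lists stand in for `iterable`, dicts stand in for `mapping`, and properties other than `data` and `children` are ignored. The function names are hypothetical.

```python
from typing import Any


def is_nested_list(instance: Any) -> bool:
    """Models NestedList = [NestedList | str]."""
    if not isinstance(instance, list):
        return False
    # Every item is either a string or, recursively, another nested list.
    return all(isinstance(item, str) or is_nested_list(item) for item in instance)


def is_tree(instance: Any) -> bool:
    """Models Tree = {data!: any, children: [Tree]}."""
    if not isinstance(instance, dict):
        return False
    if "data" not in instance:  # `data!` marks the property as required
        return False
    children = instance.get("children", [])  # `children` is optional
    return isinstance(children, list) and all(is_tree(child) for child in children)
```

A compiler for the DSL would presumably generate something of this shape for each named validator, so that self-references resolve by name.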

Constant - when applied, the instance must be 4. "`" is used to denote constants

validator = `4`
validator = `"a string"`

User-defined validators are capitalized; provided types are lowercase

range06 = min(0) & max(6)

Parameterization of validators and values

range<start: int, end: int> = min(start) & max(end)
range<end: int> = min(0) & max(end)
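
Parameterized validators behave like functions that return validators. A rough Python sketch, assuming `min` and `max` are inclusive bounds (consistent with the separate `exclusiveMin`/`exclusiveMax` later on) and that the instance is already known to be a number. The names `minimum`, `maximum`, `both`, `in_range` and `up_to` are hypothetical.

```python
from typing import Any, Callable

Validator = Callable[[Any], bool]


def minimum(bound: float) -> Validator:
    # Assumption: inclusive lower bound, numeric instances only.
    return lambda instance: instance >= bound


def maximum(bound: float) -> Validator:
    # Assumption: inclusive upper bound, numeric instances only.
    return lambda instance: instance <= bound


def both(first: Validator, second: Validator) -> Validator:
    """Models `first & second`."""
    return lambda instance: first(instance) and second(instance)


def in_range(start: float, end: float) -> Validator:
    """Models range<start: int, end: int> = min(start) & max(end)."""
    return both(minimum(start), maximum(end))


def up_to(end: float) -> Validator:
    """Models the one-parameter form range<end: int> = min(0) & max(end)."""
    return in_range(0, end)


range06 = up_to(6)
```

Python has no overloading by arity, so the two `range<...>` forms get two names here; the DSL could resolve them by parameter count instead.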

case statement

case =
    str: rangeLength<1, 6>
    int & range<1, 100>: True
    default: None

rangeLength<start, end> =
    & minLength(start)
    & maxLength(end)

naming a validator

StringRangeLength =
    case =
        str: rangeLength<1, 6>
        default: None

Questions/Thoughts

What is the basis of all constraint operations? Define the smallest subset on top of which everything else could be built. Does this need expressions? Say I want to check that exactly five things validate in a list.

How would I implement features that exist in JSON Schema: patternProperties, additionalProperties, contains, minContains, maxContains, etc.?

New idea! Regex inspired syntax describing schemas that apply to arrays

This is mostly descriptive, to show that this can simulate every schema that can be written with JSON Schema, and to demonstrate the core schema.

items<schema> = [^, schema*, $]
contains<schema> = [schema+]
minContains<schema, num> = [schema{num,}]
maxContains<schema, num> = [schema{, num}]
minItems<num> = [any{num,}]
maxItems<num> = [any{, num}]
range<start, end> = [any{start, end}]
rangeItems<schema, start, end> = [schema{start, end}]
unique = unique
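
Most of the array keywords above reduce to counting how many items match a schema and comparing that count with bounds. A small Python sketch of that idea, following the JSON Schema meaning of `contains` (other items may be anything) instead of a strict regex reading. All function names are hypothetical.

```python
from typing import Any, Callable, Optional

Validator = Callable[[Any], bool]


def is_str(instance: Any) -> bool:
    return isinstance(instance, str)


def items(schema: Validator) -> Validator:
    """Models items<schema>: every item must match."""
    return lambda inst: isinstance(inst, list) and all(schema(i) for i in inst)


def contains_between(schema: Validator, low: int = 0,
                     high: Optional[int] = None) -> Validator:
    """Models [schema{low, high}] as a bound on the number of matching items."""
    def validate(instance: Any) -> bool:
        if not isinstance(instance, list):
            return False
        matches = sum(1 for item in instance if schema(item))
        return matches >= low and (high is None or matches <= high)
    return validate


def contains(schema: Validator) -> Validator:
    """Models contains<schema>: at least one matching item."""
    return contains_between(schema, low=1)


def exactly(schema: Validator, num: int) -> Validator:
    """The earlier question: exactly `num` items must validate."""
    return contains_between(schema, low=num, high=num)
```

Under this reading, `minContains`, `maxContains`, `minItems` and `maxItems` are all `contains_between` with one bound left open, and the last two use a schema that accepts anything.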

object keywords

Profile = {"name": maxLength<20>}           # properties
Profile = {"x-[a-z]+": minLength<10>}       # patternProperties
Profile = {rest=~any}                      # additionalProperties
Profile = {required=`["name", "surname"]`}    # required
Profile = {"x-[a-z]+": any, min=1, max=3}  # min and maxProperties
Profile = {propertyNames="x-[a-z]+"}                # propertyNames 
Profile = {                                 # DependentRequired
    dependentRequired=`{"name": ["surname", "address"]}`
}
Profile =
    & {"name": maxLength<20>}
    & {required=`["name", "surname"]`}
    & {dependentRequired=`{"name": ["surname", "address"]}`}
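
To make the object keywords concrete, here is a hedged Python sketch covering only properties, patternProperties and required. It assumes patterns are matched with an unanchored regex search, as JSON Schema does, and that length checks reject non-strings. `check_object`, `max_length` and `min_length` are hypothetical names.

```python
import re
from typing import Any, Callable, Dict, Iterable, Optional

Validator = Callable[[Any], bool]


def max_length(limit: int) -> Validator:
    return lambda value: isinstance(value, str) and len(value) <= limit


def min_length(limit: int) -> Validator:
    return lambda value: isinstance(value, str) and len(value) >= limit


def check_object(instance: Any,
                 properties: Optional[Dict[str, Validator]] = None,
                 pattern_properties: Optional[Dict[str, Validator]] = None,
                 required: Iterable[str] = ()) -> bool:
    if not isinstance(instance, dict):
        return False
    if any(name not in instance for name in required):
        return False
    for key, value in instance.items():
        # Named properties are only checked when present.
        if properties and key in properties and not properties[key](value):
            return False
        # Every pattern that matches the key applies its validator to the value.
        for pattern, validator in (pattern_properties or {}).items():
            if re.search(pattern, key) and not validator(value):
                return False
    return True


def profile(instance: Any) -> bool:
    return check_object(
        instance,
        properties={"name": max_length(20)},
        pattern_properties={"x-[a-z]+": min_length(10)},
        required=("name", "surname"),
    )
```

This corresponds to combining three of the `Profile` fragments with `&`; the remaining keywords (rest, min/max, propertyNames, dependentRequired) would follow the same pattern.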


number

min = {3,}
max = {,5}
exclusiveMin = {3-,}
exclusiveMax = {,5-}
range = {3, 5}
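
The five number forms can be read as one interval check with optional, possibly exclusive, bounds. A short Python sketch under that assumption; `interval` and its parameter names are hypothetical, and booleans are rejected because Python treats them as ints.

```python
from typing import Any, Callable, Optional

Validator = Callable[[Any], bool]


def interval(low: Optional[float] = None, high: Optional[float] = None,
             exclusive_low: bool = False, exclusive_high: bool = False) -> Validator:
    def validate(instance: Any) -> bool:
        if isinstance(instance, bool) or not isinstance(instance, (int, float)):
            return False
        if low is not None:
            if instance < low or (exclusive_low and instance == low):
                return False
        if high is not None:
            if instance > high or (exclusive_high and instance == high):
                return False
        return True
    return validate


minimum = interval(low=3)                                # min = {3,}
maximum = interval(high=5)                               # max = {,5}
exclusive_min = interval(low=3, exclusive_low=True)      # exclusiveMin = {3-,}
exclusive_max = interval(high=5, exclusive_high=True)    # exclusiveMax = {,5-}
in_range = interval(low=3, high=5)                       # range = {3, 5}
```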

boolean combinators

caseSchema =
    | caseSchema1: then1
    | caseSchema2: then2
    | default: then3
    
not<schema> = ~schema
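
The proposal does not spell out the semantics of `case`, so the sketch below assumes a first-match reading: the first branch whose condition accepts the instance decides which `then` validator applies, and `default` applies when no branch matches. `case`, `negate`, `accept` and `reject` are hypothetical names.

```python
from typing import Any, Callable, Tuple

Validator = Callable[[Any], bool]


def case(*branches: Tuple[Validator, Validator], default: Validator) -> Validator:
    """First-match dispatch: (condition, then) pairs, then a default."""
    def validate(instance: Any) -> bool:
        for condition, then in branches:
            if condition(instance):
                return then(instance)
        return default(instance)
    return validate


def negate(schema: Validator) -> Validator:
    """Models not<schema> = ~schema."""
    return lambda instance: not schema(instance)


def is_str(instance: Any) -> bool:
    return isinstance(instance, str)


def is_int(instance: Any) -> bool:
    return isinstance(instance, int) and not isinstance(instance, bool)


def short_string(instance: str) -> bool:
    return 1 <= len(instance) <= 6


def accept(_: Any) -> bool:
    return True


def reject(_: Any) -> bool:
    return False


# Assumption: the default branch rejects; the proposal leaves this open.
validator = case((is_str, short_string), (is_int, accept), default=reject)
```

Under this reading, `case` is expressible with `|`, `&` and `~` alone, which may matter for the question about the smallest core.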
@opethe1st
Author

Oh, an idea just popped into my head. I can make logic the basis! So have things like "for all", "there exists", "belongs to", etc.

@shalvah

shalvah commented Feb 15, 2020

Hehe... That's a lovely idea, but is going to raise the complexity (implementation) a lot.

@shalvah

shalvah commented Feb 15, 2020

Don't recommend the = for case. The very last example (named validator + case) looks confusing.

@opethe1st
Author

what do you mean by raise the implementation a lot? For example?

@opethe1st
Author

and naming, you would rather have braces?

@shalvah

shalvah commented Feb 15, 2020

and naming, you would rather have braces?

I don't have a preference, but it's often best to stick to accepted conventions unless they're clearly faulty/detrimental to your goals, so yes, braces could work.

@shalvah

shalvah commented Feb 15, 2020

what do you mean by raise the implementation a lot? For example?

Oh, I was working on a DSL not too long ago, and it's amazing how complexity rises when you decide to add a simple feature that would totally make the language kickass. Not saying you shouldn't do it.

@shalvah

shalvah commented Feb 15, 2020

Also, came across this recently. Might be relevant (or not) - http://jsonlogic.com

@opethe1st
Author

thanks for interacting! Btw I added some extra stuff towards the end inspired by regex.
