Created May 27, 2014 18:14
At its core, the system is about:
- Having rough/flexible data structures
- Performing transformations on those data structures
- Finding the core "must be correct" structures we want to reason about
- Defining and naming those structures with relative precision,
  hardening them into definite declarations
- Validating and verifying that input and output conform to those
  now-hardened structures
When the system started out, it was built entirely on flexible maps,
with varying degrees of rigor about what each map had to contain.
Now that the system is maturing, we want to be able to apply rather
large-scale restructuring to it and to reason about it with relatively
reliable rules. That brings us up against the problem of defining and
validating those core data structures and their transformations.
We have a few tools to fill these roles:
- maps are the basic, embryonic structure: fast and flexible to
  work with
- records start to reify dynamically typed tuples with definite
  field sets, and they support abstract protocols
- prismatic/schema describes compound data structures at runtime
  and lets you define functions and validators in terms of those
  structures. It is not "pervasive": you don't need to annotate an
  entire namespace, validation is not always active, and validation
  always asks "does this one given value conform to the schema?"
  Another way to think of it is as a runtime contracts library
  whose contracts are structural.
- core.typed describes values and functions at compile time. It is
  pervasive in two senses: a) you need to annotate an entire
  namespace at a time, along with its dependencies, and b) like
  traditional static type checking, it attempts to reason _for all
  possible values of the given types_. This provides stronger
  guarantees earlier. In practice, though, it is harder to build out,
  especially when the namespace is changing or deals with flexible
  manipulation of complex maps.
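The "does this one given value conform?" style of checking can be made concrete with a small structural validator. This is a Python analogue written for illustration, not prismatic/schema's API. `validate`, `REPORT_SCHEMA`, and the field names are hypothetical. Note that it says nothing about any value other than the one it is handed, which is the key contrast with core.typed's reasoning over all values of a type.

```python
from typing import Any

# Hypothetical schema: each required key maps to an expected Python type.
# A nested dict describes a nested map, like a compound schema.
REPORT_SCHEMA = {
    "id": int,
    "title": str,
    "author": {"name": str, "email": str},
}


def validate(schema: Any, value: Any, path: str = "value") -> Any:
    """Return value unchanged if it conforms to schema, else raise ValueError.

    Only this one value is checked; nothing is proven about other values.
    """
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            raise ValueError(f"{path}: expected a map, got {type(value).__name__}")
        missing = schema.keys() - value.keys()
        if missing:
            raise ValueError(f"{path}: missing keys {sorted(missing)}")
        # Design choice for this sketch: unexpected keys are rejected.
        extra = value.keys() - schema.keys()
        if extra:
            raise ValueError(f"{path}: unexpected keys {sorted(extra)}")
        for key, sub_schema in schema.items():
            validate(sub_schema, value[key], f"{path}.{key}")
        return value
    if not isinstance(value, schema):
        raise ValueError(
            f"{path}: expected {schema.__name__}, got {type(value).__name__}")
    return value


good = {"id": 1, "title": "Q2", "author": {"name": "Ana", "email": "a@x.org"}}
validate(REPORT_SCHEMA, good)  # passes and returns the same map

try:
    validate(REPORT_SCHEMA, {**good, "id": "1"})
except ValueError as err:
    print(err)  # value.id: expected int, got str
```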
In our system, we're applying these tools to:
- the backend record structures, which have been around for a
  little while
- the web API, for which we want both:
  - validation that incoming EDN to the API is well structured
  - tests that outgoing EDN is well structured
- more generally, confirming that the functional transformations we
  expect are still happening the way we want, even as the code
  changes (e.g. via compile-time checking and the test suite)
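The web API goals can be pictured as a wrapper that checks a handler's incoming payload and its outgoing result. It has a switch so that validation need not always be active. This is a Python analogue only; the real system works with EDN and prismatic/schema. `validated`, `check`, `VALIDATION_ENABLED`, and `get_report_status` are hypothetical names.

```python
import functools
from typing import Any, Callable

# Hypothetical switch: validation can be on in tests and off elsewhere.
VALIDATION_ENABLED = True


def check(schema: dict, value: Any, label: str) -> None:
    """Flat structural check: every schema key present with the right type."""
    if not isinstance(value, dict):
        raise ValueError(f"{label}: expected a map")
    for key, expected in schema.items():
        if key not in value:
            raise ValueError(f"{label}: missing key {key!r}")
        if not isinstance(value[key], expected):
            raise ValueError(f"{label}: {key!r} should be {expected.__name__}")


def validated(in_schema: dict, out_schema: dict) -> Callable:
    """Wrap a one-argument handler so its request and response are checked."""
    def decorate(fn: Callable[[dict], dict]) -> Callable[[dict], dict]:
        @functools.wraps(fn)
        def wrapper(payload: dict) -> dict:
            if VALIDATION_ENABLED:
                check(in_schema, payload, "request")   # incoming data
            result = fn(payload)
            if VALIDATION_ENABLED:
                check(out_schema, result, "response")  # outgoing data
            return result
        return wrapper
    return decorate


# Hypothetical handler: incoming payload -> outgoing payload.
@validated({"report_id": int}, {"report_id": int, "status": str})
def get_report_status(payload: dict) -> dict:
    return {"report_id": payload["report_id"], "status": "open"}


print(get_report_status({"report_id": 7}))  # {'report_id': 7, 'status': 'open'}
```

Checking both sides of the handler catches malformed requests before they reach the transformation. It also catches regressions in the transformation itself.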
We've previously been accumulating a disorganized mishmash of concepts
and usage. This commit starts to rectify the situation:
1. It roughly reconciles the prismatic/schema entries with the core
   defrecord entries in the `report` namespace. schema provides a
   macro, `schema.core/defrecord`, that defines both in a single
   declaration.
2. It attempts to do the data transformation to and from the web API
   in terms of these records and schemas.
3. It starts adding some basic tests asserting that the transformation
   functions work in terms of these structures.
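Defining the record and its field types in one declaration, then testing the API transformations against it, can be sketched with a Python dataclass that checks its own field types on construction. This is an analogue of the idea, not of schema's `defrecord` macro. `Report`, `to_api`, `from_api`, and the fields are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Report:
    """Hypothetical record: one declaration supplies both the field set
    and the field types that __post_init__ enforces at construction."""
    id: int
    title: str
    score: float

    def __post_init__(self) -> None:
        for f in fields(self):
            value = getattr(self, f.name)
            if not isinstance(value, f.type):
                raise TypeError(f"Report.{f.name}: expected {f.type.__name__}, "
                                f"got {type(value).__name__}")


def to_api(report: Report) -> dict:
    """Record -> web API map (hypothetical wire format)."""
    return {"id": report.id, "title": report.title, "score": report.score}


def from_api(data: dict) -> Report:
    """Web API map -> record; construction re-validates every field."""
    return Report(id=data["id"], title=data["title"], score=data["score"])


# Basic tests: the transformations preserve the record's structure.
r = Report(id=1, title="Weekly", score=0.5)
assert from_api(to_api(r)) == r
assert set(to_api(r)) == {f.name for f in fields(Report)}
```

Because validation lives in the record itself, any `from_api` input with a wrong field type fails at the boundary rather than deeper in the code.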
This should give us a much stronger foundation to start standing on in
terms of solid data structures and data validation.
Furthermore, a few hard-won understandings came out of the process:
1. Nothing is more concise or more flexible than Clojure's built-in
   map support. Both schema and core.typed add non-trivial structural
   scaffolding. That scaffolding is useful for testing that things are
   still the way you want them, but it is time-consuming and annoying
   to lay down for everything, *especially* if you are trying to
   rapidly experiment, try things out, and move things around.
   Everything should begin as maps and stay that way for quite a
   while, until you're *sure* that you want to solidify a structure.
2. schema is nontrivially easier and faster to use than core.typed,
   and more flexible for common cases (e.g. structuring plain maps).
   It is also, I think, more clearly written and documented. It also
   allows anonymous schemas, which can come in handy in, say, unit
   tests, without having to fully reify a schema.
3. However, core.typed delivers a more complete reasoning structure
   about the code, AND importantly runs at compile time, AND reinforces
   the important point that *the correctness of the code should be able
   to be reasoned about at compile time*. It handles most data types
   well. The hardest case is complex manipulation of heterogeneous
   maps, which unfortunately makes up a lot of common Clojure code
   before the maps get reified into records.
   What ti
4. For laying down initial layers of security, I would recommend
   small sets of plain unit tests and then simple prismatic/schema
   validators. Both of these check only *small sets of cases*, but they
   do it easily. schema gives you a more clearly structured way to
   define and validate a structure (reusable across a number of cases),
   which is very good. BUT it does require that you reify a structure,
   which quickly leads to a proliferating number of closely related
   schemas for non-essential data structures. This is bad: you want
   to keep the number of named concepts low and powerful.
5. The new generative testing tools (test.generative and test.check)
   may be useful in conjunction with schema, since they generate a lot
   of test data across the domain, and you can then run all of those
   cases through the schema. This is still technically "check by
   case," but it blankets many more cases.
6. Finally, core.typed is very powerful but very slow to work with and
   hard to change, and reasoning and debugging require great care.
   I would only recommend it for the most fixed, unchanging parts of
   the codebase.
   In addition, core.typed may be able to play a useful role in
   eliminating the need for some mocking and stubbing. It helps when you
   just want to assert through inference that the proper types of
   output come from the proper types of input, without manually
   shoving mock objects in as inputs.
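The point about generative testing can be sketched as follows. Generate many random inputs, push each through a transformation, and check every output against a structural schema. This stdlib-only Python sketch is hypothetical throughout (`normalize`, `OUTPUT_SCHEMA`, `gen_raw`, `check_many`). It mimics the idea rather than test.check's API, and unlike test.check it does no shrinking of failing cases.

```python
import random
import string

# Hypothetical transformation under test: normalize a raw report map.
def normalize(raw: dict) -> dict:
    return {"id": int(raw["id"]),
            "title": raw["title"].strip() or "untitled",
            "tags": sorted(set(raw.get("tags", [])))}


# Structure every normalized output is expected to conform to.
OUTPUT_SCHEMA = {"id": int, "title": str, "tags": list}


def conforms(schema: dict, value: dict) -> bool:
    """Exactly the schema's keys, each holding a value of the expected type."""
    return (set(value) == set(schema)
            and all(isinstance(value[k], t) for k, t in schema.items()))


def gen_raw(rng: random.Random) -> dict:
    """Generate one random input map; 'tags' is sometimes absent."""
    raw = {"id": str(rng.randint(0, 10_000)),
           "title": "".join(rng.choice(string.ascii_letters + " ")
                            for _ in range(rng.randint(0, 12)))}
    if rng.random() < 0.5:
        raw["tags"] = [rng.choice("abc") for _ in range(rng.randint(0, 4))]
    return raw


def check_many(n: int = 500, seed: int = 42) -> int:
    """Run n generated cases through normalize and the schema check.

    Still case-by-case checking, but across far more cases than
    hand-written unit tests would cover. Returns the number checked.
    """
    rng = random.Random(seed)  # fixed seed keeps runs reproducible
    for _ in range(n):
        raw = gen_raw(rng)
        out = normalize(raw)
        if not conforms(OUTPUT_SCHEMA, out):
            raise AssertionError(f"non-conforming output {out!r} for {raw!r}")
    return n


print(check_many())  # 500
```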
To recap: start with maps and mild unit tests, and stay with them as
far as they can get you. Then consider moving up to schema, and
possibly generative testing, if necessary. Move up to core.typed only
when you're really sure the security will be worth the time, EXCEPT
PERHAPS when a namespace contains low-hanging fruit, e.g. code that is
both important and simple enough to cover with core.typed efficiently.