TLDR; How do I use it?
It is very simple, just use the newest
clj-kondo
version and
you are good to go. We are able to detect Datalog in the position of first
argument to DataScript’s, Datomic’s and Datahike’s q
function. I am not sure
whether there are other situations where you know that a data structure
corresponds to a query of these functions, but we could also support a metadata
hint for example.
Why support linting for Datalog?
We use Datalog daily, either to build
frontends with DataScript, backends
with Datomic or with our composable database stack
Datahike. These three systems all use
the Datalog dialect that has first been introduced by Datomic. A benefit of this
dialect is that it is simply described as a symbolic expression built out of
edn. Its ability to bind and join input
data over multiple data sources as well as invoke pure Clojure functions is
extremely
powerful even
without managing durable data. But also, whether beginner or expert, one still
often makes mistakes in query formulations and you need to invoke a query first
before you get feedback. Datalog is a very concise language for which we have an
informative parser that both checks syntax and part of its semantics, so my
colleague Konrad suggested that it would be very useful to extend the linter of
clj-kondo
and tighten the feedback loop for writing Datalog queries as shown
in the picture above. So I just did that.
Implementing a linter
Let’s see how easy implementing static analysis can be. Static analysis means
that your code is not really run, i.e. it is interpreted in a way to prove its
(in)consistencies without causing any side effects, e.g. in a type checker or a
verification tool. clj-kondo
is a linter, a tool that does analysis on your
code in a separate context, that means it is not interfering with the REPL and
not stopping you from running inconsistent code. This allows you to care of
consistency whenever you have figured out what you actually want to do, compared
to languages that need intrinsic type safety to be run. So clj-kondo
needs to
analyze your code, but instead of executing (compiling and running) it, it just
tries to find (likely) inconsistencies, e.g. unbound variable errors or
redundant code warnings. To add the Datalog analyzer I have added a case in the
analysis of function
calls
for the three query functions of DataScript, Datomic and Datahike for the
following function:
(defn analyze-datalog [{:keys [:findings] :as ctx} expr]
(let [children (next (:children expr))
query-raw (first children)
quoted? (when query-raw
(= :quote (tag query-raw)))
;; 1. Let's try to unpack the first argument (query) if it is a quoted literal.
datalog-node (when quoted?
(when-let [edn-node (first (:children query-raw))]
(when (one-of (tag edn-node) [:vector :map])
edn-node)))]
(when datalog-node
(try
;; 2. Call the external parser.
(datalog/parse (sexpr datalog-node))
nil
(catch Exception e
;; 3. Finally we can extract the error from the exception
;; and point to the source.
(findings/reg-finding! findings
(node->line (:filename ctx) query-raw
:error :datalog-syntax
(.getMessage e))))))))
This code mainly follows 3 steps. First it uses
rewrite-clj to extract the symbolic
expression for the first argument and tries to unpack it into a Datalog literal.
Second it calls our
datalog-parser, which is
derived from DataScript, to check whether it would succeed. Third if it throws
an Exception we extract the error message from the parser and register it with
reg-finding!
for the original query-raw
expression. The only thing missing
is to define the :datalog-syntax
type in the default configuration map in
config.clj.
We also had to make sure that the parser does not use reflection so that it is
compatible with GraalVM’s native image
support that is
used to compile clj-kondo
. This functionality has only added a few hundred
kilobytes to the final static binary. Overall it was a very pleasing experience
and doable in two afternoons. Done.
Pushing things further
Depending on how much you work in a specific domain with formally specified language abstractions providing a linter for it can pay off quickly. I am sure it was even worth the effort if I would just have done it for myself. It might also be good to improve your parser/checker from the perspective of an interactive helper, maybe we should try to catch more errors by making sure all variables passed to predicates are properly bound?
In general I think static analysis is extremely underexplored in Clojure,
partially because of the very good arguments Rich Hickey
made against global static type
checking and the resulting puzzling mentality that distracts from the actual
problem solving in the vaguely formalized domains of everyday programming. Here
is a good clarifying article about
his talk . But we can try to proof consistencies locally and for some domains
stronger static guarantees might be worth it, e.g. if the cost of uncaught
downstream errors is high. Clojure’s immutable nature makes it a superb
target for such analysis. clj-kondo
already provides a simple extrinsic type
system that works very well for me. Static analysis for Clojure is also provided
by joker and
Cursive.
The Clojure ecosystem can borrow plenty of good ideas in this direction from the tower of languages and libraries provided in Racket. While this community is much more academic, they are our friendly cousins in parenthesis. I interact with researchers working in Racket daily and I am pretty sure they would help you if you want to port one of their languages or libraries to Clojure. core.typed is a port of Typed Racket for instance.
A pretty mind boggling approach is to use Datalog itself for analysis, e.g. with
clindex, which I plan to use in my
research compiler. Facebook is using a
similar approach in Glean and
Rust’s borrow checker can be
described
in Datalog. Semmle has been bought by Github and is
integrating its Datalog based static analysis now. There are also many other
interesting ideas around Datalog in the programming language community. e.g.
Lambda Prolog and
datafun. I will try to collect some
interesting type system related resources for clj-kondo
here, feel free to reach out to me
or Michiel Borkent if you have ideas and want to
contribute.
Finally I want to thank Michiel Borkent for clj-kondo
and for guiding me
through the process, Nikita Tonsky for providing the
original Datalog parser, Mistral Constratin for
pointers to static analysis and his work on data flow analysis for Datalog
itself and Ron
Garcia for teaching me a thing or two about
static analysis ;).
Let’s bring more static analysis and Datalog to Clojure for even more fun and profit!