secure information flow analysis: my first steps

During the last months and have been reading a lot about information flow analysis, with the remarkable Eduardo Bonelli's guidance.

Some months ago, as an exercise, I wrote two analyzers for a really short command set of Python (while, if and assign). Before remove that directory, it occurred to me that may exists a remote possibility that someone might find it interesting. So here it is, with a quick and dirty introduction to secure information flow.

The goal, in short words, is to avoid that variables tagged as secret (high confidential level) doesn't leak to public variables (low confidential level). This may happen in two ways:

  • Explicit: A high variable is assigned to a low variable
  • public:=secret

  • Implicit: A low variable content depends on the content of a high variable
  • if secret == true
    then public:=true
    else public:=false

If there is no leak, we said that the code satisfies non-interference (wikipedia link). You can learn more about secure information flow analysis in the web. In my humble opinion, this is a good introduction.

A typical way (certainly not the only one) to detect these leaks is with type systems. This was the approach in both analyzers. The first one is a sort of an implementation of a fundation paper, by Volpano et.al.. I made an algorithm version (probably wrong) of the typing rules exposed in the paper. The code is here. This type of analyzers are called Denning-style, because Denning and Denning introduced those concepts in a 1977 paper.

The second analyzer (the code is here) is based on the formalism presented by Hunt and Sands in this paper. It's a dynamic analyzer (Denning-style analyzers are static), which means that the non-interference can be broken in subprograms and still be good as a whole. This may be a little tricky. For example, this code is secure (the leak was overwritten with a 0) even when a subprogram (without the last line) is insecure:
public:=secret
public:=0

Anyway, that's all for now. The analyzers are written in Python, using the Abstract Syntax Trees module and python-lattice (yes, this is what that stupid library is for). If you want to play more, here is a tarball with the code, the papers and few examples to analyze.