Lenses for philosophers

Lens tutorials are the new monad tutorials, I hear. (This is neat, since monads and lenses were both discovered in the year 1958.) The thing is, after independently rediscovering lenses and working on them for a year and a half before Jeremy Gibbons made the connection, I have a very different perspective on them. This post is based on a talk I gave at the 7th international workshop on bidirectional transformations in Nice. My aim is to move fast and break things, where the things in question are your preconceptions about what lenses are and what they can be used for. Much of this will be a history of lenses, which includes at least 9 independent rediscoveries.

The earliest discovery of lenses (that I know of) was by Kurt Gödel in 1958, as part of his dialectica interpretation. (Actually it presented at a lecture at Yale in 1941, but not published until 1958. Gödel was quite the perfectionist.) The dialectica interpretation is a proof translation from Heyting arithmetic (the intuitionistic first-order theory of arithmetic) into system T (a logic with no quantifiers, but with variables of higher-order types). Gödel’s original aim was philosophical, part of an extended Hilbert programme studying the relative consistency of different logical features. Combined with Spector’s bar recursion, it is used to extract constructive information from borderline-nonconstructive proofs in analysis (see applied proof theory).

The dialectica interpretation translates a logical formula \varphi to a double-indexed family of logical formulas |\varphi|^{x : X}_{s : S}, where x and s are free variables of types X and S. This translation has the property that \varphi is equivalent to \exists x \forall s . |\varphi|^x_s. This means that we can view x as a possible proof and s as a possible counter-proof. The formula is true iff there exists a proof that ‘defeats’ every possible counter-proof.

The definition of |\varphi|^{x : X}_{s : S} from \varphi is defined by structural recursion on formulas \varphi. For example, if we already know |\varphi|^{x : X}_{s : S} and |\psi|^{y : Y}_{r : R} then the interpretation of the conjunction \varphi \wedge \psi is defined by

|\varphi \wedge \psi|^{x : X, y : Y}_{s : S, r : R} = |\varphi|^x_s \wedge |\psi|^y_r

Formally this is really a product type |\varphi \wedge \psi|^{(x, y) : X \times Y}_{(s, r) : S \times R}, but it looks nicer to write tuples of variables instead.

The most interesting thing about the dialectica interpretation (and this is a universal opinion, not just me) is how it interprets (intuitionistic) implication. Suppose we already know |\varphi|^{x : X}_{s : S} and |\psi|^{y : Y}_{r : R}. Then \varphi \to \psi is interpreted as

|\varphi \to \psi|^{v : X \to Y, u : X \times R \to S}_{x : X, r : R} = |\varphi|^x_{u (x, r)} \to |\psi|^{v (x)}_r

 A proof of \varphi \to \psi is a lens from proofs of \varphi to proofs of \psi! Crucially, if we cut together a proof of \varphi \to \psi and a proof of \psi \to \chi, the resulting proof of \varphi \to \chi is obtained by ordinary lens composition. But is it reasonable to call them lenses, though?

What is a lens actually? A lens consists of two parts. The first part is an ordinary function f : X \to Y. We also have some sort of data floating over X and Y. These might be, for example, sets of update actions, or it could be possible amounts of profit, or it could be counter-proofs as in the dialectica interpretation. The last thing that defines a lens is a backpropagation rule, which takes a value x and the data associated to f (x), and gives the data associated to x. For example if you know x and you are given an update action on f (x), the lens tells you the corresponding update action on x. Similarly, if you know x and you know the monetary value of f (x), then the lens tells you the monetary value of x.

The dialectica interpretation was put in a modern form beginning in 1989 by Valeria de Paiva, replacing logic with categories. Much insight about the structure of lenses can be learned from de Paiva’s papers, beginning with the fact that the category of lenses can be made a sound model of linear logic. For example in Morphisms of Open Games I use the fact that certain coproducts of lenses exist in order to handle unbalanced extensive-form game trees; I could have saved myself the trouble by reading page 13 of de Paiva’s 1991 tech report. Keep in mind though that the majority of her work is on a variant that looks like the category of isos.

Going back a bit to 1982, Frank Oles defines lenses in chapter 6 of his PhD thesis, together with all 3 lens laws, and the interpretation that they are bidirectional transformations of datatypes. Oles calls view and update ‘forgetting’ and ‘replacement’. I had never heard of this, until Bob Atkey pointed it out to me just after I published this blog post, when I added this paragraph at adjusted the numbers a bit. After a reverse citation search, through, I found that Johnson, Rosebrugh and Wood attribute leneses to Oles in these two papers.

In 1995 lenses and the lens laws appear again, in Martin Hoffmann and Benjamin Pierce’s paper Positive Subtyping, with reference back to Oles. According to the paper, it was an independent discovery and the link, which they call a “coincidence”, was made by John Reynolds and Bob Tennent.

The next stop is in 2005, when the ‘modern era’ of lenses began with the publication of Combinators for Bi-directional Tree Transformations by Foster, Greenwald, Moore, Pierce and Schmitt. They introduced the term ‘lens’, wrote down the 3 lens laws again, and classified lenses into ‘well behaved’ and ‘very well behaved’ according to those laws. This paper is a rare chance to correctly use the term seminal, in the sense that it spawned the entire field of bx (‘bidirectional transformations’), but it didn’t appear in a vacuum: from their references it is clear that database theorists had been working towards the idea for some time. They actually reference both Oles and Hoffman-Pierce, but not many people noticed, and FGMPS are usually credited with the discovery of lenses.

The lens laws are really the crux of everything I’m saying. From my earlier general and informal definition of lenses, it is possible to justify the (associative) composition law of lenses, but it is not possible to justify the lens laws. The laws are firmly properties of update actions — similar laws also appear in Hoare logic for example. For many other lens-like things the laws are either false, or more likely fail to even type-check. I considered calling the general thing, namely “things that look like lenses and compose like lenses”, something like ‘pre-lenses’ or ‘quasi-lenses’. But I have the opinion that if it walks like a duck and composes like a duck, then you ought to call it a duck.

The lens laws are undoubtedly important, and not just because they’re true in the case of updates. Very well behaved lenses are equivalent to the much simpler constant-complement lenses, and are also equivalently coalgebras of a comonad. Most of the theoretical work that uses the term ‘lenses’ takes the laws as given. Many bx researchers define ‘lenses’ to be what FGMPS called ‘well behaved lenses’, and consider that something not satisfying the lens laws is no lens at all.

Around the same time as the FGMPS paper in the mid-2000s, Haskell programmers started talking about functional references, based on getters and setters from object oriented programming. The motivation is that Haskell’s encapsulation of effects, while important, is quite awkward when working with an inherently state-heavy application such as videogames. The origins of the idea are firmly folklore and probably lost in the depths of IRC, but Edward Kmett wrote a history, and the proper attribution is probably to Luke Palmer’s blog post here.

Although the connection was made fairly quickly (I don’t know who made it) and functional references were renamed to lenses, bx theorists and functional programmers have remained fairly separate. While bx theorists focussed mainly on applications, functional programmers pushed down into the foundations: van Laarhoven lenses and profunctor lenses, despite being motivated by a quirk of how the GHC dialect of Haskell handles polymorphism, are very interesting and nontrivial things from a theoretical point of view. The mistake that the average lens tutorial makes is to present the van Laarhoven isomorphism as a central concept, rather than an implementation detail.

While bx researchers focussed on monomorphic lenses X \to Y (where X and Y are sets or otherwise datatypes) which consist of a view function X \to Y and an update function X \times Y \to X, functional programmers switched to polymorphic lenses after they were introduced by Russell O’Connor. These consist (after some serious de-obfuscation) of a view function X \to Y and an update function X \times R \to S. But crucially, the parameters X,Y,R,S are types in a polymorphic typesystem that can have type variables in common. Kmett then pointed out that this polymorphism can be used to talk about the lens laws. Sadly this was quite informal, and as far as I know the lens laws for 4-variant polymorphic lenses have remained handwaved ever sense.

It’s time to jump forward in time again. It’s 2015 and I write down the play/coplay functions of open games, and their composition law. For a fixed strategy, an open game’s play function takes an observation to an action; and its coplay function takes an observation and the payoffs resulting from an action, and gives the payoffs associated to the observation. The delicate interaction between play and coplay is a crucial component of how game theory is made compositional. Coplay is probably also the hardest thing to understand about open games, and for a long time I used phrases like ‘weird/entangled information flow’. I know now that the term ‘bidirectional’ is precisely what I meant.

At this point I knew that play and coplay looked a bit like the dialectica interpretation/categories, but I didn’t think the link was significant.

After I had been working on open games for about 1.5 years and written my thesis about them, I moved to Oxford and took the chance to clean up the mathematical foundations. In particular I isolated the ‘weird information flow’ parts from the game-theoretic parts. Initially I called the category whose morphisms are play/coplay pairs PC. Then I noticed that PC is fibred over the category of sets by extracting the play part. Then I noticed that this fibration is the fibrewise opposite of Bart Jacobs’ simple fibration. (The reference on this is his book Categorical logic and type theory, which can be downloaded as an 800-page pdf here, but be warned it is not for the faint of heart!) The simple fibration is used in (and named for) the categorical semantics of simple type theory, where it models the interaction between typing contexts and terms.

So I renamed PC ‘the op-simple fibration’. I gave a talk at Oxford’s OASIS seminar, partly focussing on the op-simple fibration as a category that I thought was of independent interest. Jeremy Gibbons was in the audience, and he commented, essentially, “This looks like lenses.” At this point I’d heard of lenses but I thought they were something from applied functional programming that wasn’t relevant to me. (In particular, I associated them with Template Haskell.) I went to check and, sure enough, it was lenses. At first this was just a formal renaming, and it was another year and a half before I started to intuit the meaning of this connection, and even that was only by generalising the understanding of lenses as I explained earlier. Additionally at this point I found exercise 1.10.11 in Jacobs’ book which explicitly links the op-simple fibration to dialectica categories, but I still didn’t think they were close enough to be significant. (There’s no way I can hide the fact that I’m stubbornly wrong more often that is good for me. Nearly everything I’m writing in this article is stuff I refused to believe at first, so don’t attribute any of it to me.)

There’s a catch to calling these things ‘lenses’ though: They are 4-variant (i.e. they consist of functions X \to Y and X \times R \to S), but not polymorphic. This means that the lens laws don’t even type check. The good news, though, is that since functional programmers handwave the lens laws anyway, nearly everything continues to apply: in particular, you can define van Laarhoven and profunctor lenses in the non-polymorphic setting. I call them ‘bimorphic lenses’, although I also like Cezar Ionescu’s suggestion ‘outlaw lenses’.

At this point I was contacted out of the blue by Mitchell Riley, a PhD student at Wesleyan. He had read my paper connecting lenses and open games, and showed me this paper. Again, it took me embarrassingly long to accept this connection; admittedly the paper is written in the fine tradition of Australian category theory and is totally unreadable to me. The double construction turns out to be equivalent to a lawless version of constant complement lenses that at the time was even more folkloric than usual for the functional programming community. Since then it was used by Guillaume Boisseau and Jeremy Gibbons in this paper, and Mitchell’s insight turned out to be crucial (and extremely well-timed) for my (still ongoing) work on Bayesian open games with Joe Bolt and Philipp Zahn.

The next thing that happened was in late 2017 when Brendan Fong, David Spivak and Rémy Tuyéras dropped the paper Backprop as Functor on the arXiv, which got a lot of people quite excited. It made it to the top of my reading list in early 2018, just before David came to Oxford for an OASIS seminar of his own about the paper. As soon as I started reading it I saw immediately that they had independently reinvented a big chunk of the category of open games, and in particular their category of open learners is to monomorphic lenses (almost) exactly as open games is to bimorphic lenses. This paper was my personal tipping point where I had enough examples of lens-like things to see the common structure, and I was able to finally have intuition for coplay, perhaps my  key invention, by analogy to it.

Here is how David (implicitly) explained lenses in machine learning. Suppose we fix the parameters of a learning algorithm. Those parameters define a function, which is the forwards part of a lens. For example, suppose inputs are images and the outputs are booleans, where the intent is for the output to be true iff the image is of a cat. For some given parameters, the algorithm will (correctly or not) classify images into cats and non-cats. Now suppose we have a picture which is classified as a non-cat, but is in fact a cat. The update part (which FST call request) takes the original input and the updated output, and it updates the input by producing a slightly modified image that moves it towards being classified with the given output. This procedure is sometimes called dreaming, and it can produce beautiful, disturbing, psychodelic images. The only subtlety is that update corresponds to updating the input (as it should be for a lens), not to the more obvious step of updating the parameters (i.e. learning).

During the seminar history came full circle, and I played the role of Jeremy, commenting “This looks like lenses.” The next day, me and David sat down and proved that open learners monoidally embed into open games. I later learned that Mike Johnson had contacted them around the same time to tell them about the connection between open learners and lenses, and they responded by introducing him to open games. I spoke to Mike and Bob Rosebrugh at bx’2018, where I learned many interesting things, and I also learned that Bob claims to have independently rediscovered lenses too (while working in industry, if I remember correctly). I only mention this because it allowed me to claim 9 rather than 8 independent rediscoveries in my clickbaity first paragraph.

There is one more connection I know about in this wacky web of ideas. It turns out that David Spivak also independently rediscovered bimorphic lenses, and didn’t connect it to the Backprop paper. This one was realised by Christina Vasilakopoulou after I spent most of the Applied Category Theory workshop in Leiden telling everybody who would listen about all these connections, rather like this:

But I’m going to keep this last one a mystery, so we have some surprise left when we get around to writing a paper.

9 thoughts on “Lenses for philosophers”

Leave a Reply to Categorical cybernetics: A manifesto – Jules Hedges Cancel reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s