Some Philosophical Implications of Deacon's Model of Symbolic Reference

This is an attempt from a few years ago to assemble some thoughts on Terrence Deacon's The Symbolic Species. It is presented here as written then with a few minor edits.

A hierarchy of referential processes

In his book The Symbolic Species, Terrence Deacon sketches a theory of human language aiming to explain how it is qualitatively different from other forms of animal communication, and not just a more advanced form of the same kind of communication 1. I will summarize his model and attempt to describe how it leads to particular stances on a selection of problems in the philosophy of language and of linguistics.

Deacon argues that different modes of reference entail properties that constrain or allow different degrees of expressiveness, learnability, and other formal characteristics affecting how they can be used. It is the mode he terms symbolic that he holds responsible for the unique properties of human languages. This symbolic mode of communication in turn allows symbolic thought.

The symbolic mode is contrasted with two simpler modes, iconic and indexical, on which it depends. He then argues that human communication is the only animal communication that uses all three modes. These three modes are not meant to be properties of signs themselves, but rather modes of cognitive processing, and he argues that symbolic processing cannot occur without simultaneous iconic and indexical processing.

Deacon's terminology in brief

As these terms have all been used by other philosophers and linguists in different ways, we must head off confusion. The word 'symbol' is often used to refer to a mark, such as a character or other grapheme. We will refer to that usage with the term 'sign vehicle'. Similarly, the word 'indexical' has a different meaning historically in discussion of the semantic reference pertaining to certain words including demonstratives. Deacon's use of the term is related and so does have implications for that type of reference, which I will touch on only tangentially.

The simplest mode of reference in this model is iconic. What is meant by iconic reference is the mere recognition that x is similar to y and thus invokes it. As such, iconic reference is basic to any cognitive process; it reflects the formation of a category. This is basic because all learning requires enough stimulus generalization to respond the same way to different stimuli based on their similarity. But such generalization can also be used in iconic sign vehicles that bring to mind a category and thus represent it.

Iconic reference is necessary as a first step of language processing even in sign vehicles that can be processed further as indices or symbols. For example, a written character must be recognized as being an instance of its class before it can be interpreted. Similarly, this is reflected in the type-token distinction of words in corpus analysis, where we refer to a type as the class of the word as separate from each of its token instances.
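The type-token distinction can be made concrete with a small counting sketch. This is only an illustration: the function name `type_token_counts`, the example sentence, and the naive whitespace tokenization are all assumptions of the sketch, not anything taken from Deacon.

```python
from collections import Counter

def type_token_counts(text: str) -> Counter:
    """Map each word type to the number of its token instances."""
    # Naive lowercasing and whitespace splitting (an assumption of this sketch).
    tokens = text.lower().split()
    return Counter(tokens)

counts = type_token_counts("the cup is on the table by the other cup")
num_tokens = sum(counts.values())  # every occurrence counts as a token
num_types = len(counts)            # each distinct form counts once as a type
print(num_tokens, num_types, counts["the"])  # 10 7 3
```

Recognizing each occurrence of 'the' as belonging to the same type is the iconic step; nothing here yet says what the word indicates.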

Indexical reference builds on iconic reference by the addition of association. To refer indexically requires that two categories are already established via iconic reference. The association of these categories allows one to index, or indicate the other. Because these are established by co-occurrence, there is potential for implication of causal relation. Note also that indexical reference is transitive. If smoke indicates fire and a smoke alarm indicates smoke, then a smoke alarm indicates fire.
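The transitivity of indexical reference can be sketched as following a chain of learned associations. The dictionary of associations and the function name `indicates` are hypothetical, chosen only to mirror the smoke alarm example.

```python
def indicates(associations: dict[str, set[str]], sign: str) -> set[str]:
    """Everything `sign` indicates, following associations transitively."""
    seen: set[str] = set()
    stack = list(associations.get(sign, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Whatever `current` indicates is also indicated by `sign`.
            stack.extend(associations.get(current, ()))
    return seen

# Each entry is one learned co-occurrence: the key indicates the values.
associations = {"smoke alarm": {"smoke"}, "smoke": {"fire"}}
print(indicates(associations, "smoke alarm"))  # contains both 'smoke' and 'fire'
```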

As with iconicity, associations are also characteristic of all learning processes and not unique to human communication. They do not even require cognition. Biochemical communication, for example in the human body, is based on what are often termed signals, because they indicate a physiological state. For example, high levels of insulin can indicate conditions favourable for growth, so systems in the body respond to that signal with growth promoting activity.

Indexicality is a critical stage in linguistic processing. For example, the process of associating the word 'cup' with things in the world that hold liquid for drinking entails iconic relationships between instances of the word 'cup' and iconic relationships between instances of drinking vessels establishing both categories before (or at the same time as) the association between them can be made. Indexical reference is also ubiquitous in animal communication. For example, vervet alarm calls differentiate predator types. Likewise, in human to animal communication, a dog can be taught associations between spoken words and behaviours such as sitting or events such as feeding.

The mode of reference that sets human language apart is what Deacon terms symbolic. Symbolic reference depends on a sign vehicle already having been processed indexically. It then entails further indexical reference among other symbols in a set. What makes a word a symbol is not just that it indexically points to a conceptually or physically present referent. It must also refer to other words, not only reflected in a thesaurus-like web of substitutability, but in semantic ontologies and usage constraints constituting grammar. This aspect of symbolic reference is responsible for its combinatorially expansive generative power.

However, this also poses a fundamental problem in language learnability. If learning to symbolically process a word requires learning its relationship with every other word, then the task quickly becomes intractable. The puzzle of how children seemingly effortlessly learn language, despite this "poverty of the stimulus" is a long-standing question, and is the motivation for cognitive hypotheses of innateness, and even of the hard-coded grammar modules proposed by Chomsky that lack only a small set of parameters by which to fit the native language 2.

In AI terms, Chomsky's view says that the language learning problem requires high bias in the bias-variance tradeoff of learning models. That is, most of the information is already in the model. This would seem to be confirmed by the fact that computational models that learn co-occurrence relationships between contiguous vocabulary items on a large scale are still very limited in ability to recognize or generate well formed language. Not only do they fail to capture long distance relationships, but there is never enough data to capture the behaviour of rare collocations, let alone predict the behaviour of unseen ones.
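The data sparsity problem for models of contiguous co-occurrence can be seen even in a toy bigram model. The sketch below is a plain maximum-likelihood estimate over an invented three-sentence corpus; the names `bigram_model` and `prob` are assumptions for illustration, and no particular published model is implied.

```python
from collections import Counter

def bigram_model(sentences):
    """Maximum-likelihood estimate of P(next | previous) from raw pair counts."""
    pair_counts = Counter()
    first_counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            pair_counts[(prev, nxt)] += 1
            first_counts[prev] += 1

    def prob(prev, nxt):
        if first_counts[prev] == 0:
            return 0.0  # the context word itself was never seen
        return pair_counts[(prev, nxt)] / first_counts[prev]

    return prob

corpus = ["a cup of tea", "a cup of coffee", "a pot of tea"]
prob = bigram_model(corpus)
print(prob("of", "tea"))   # a pair seen in the corpus gets positive probability
print(prob("of", "soup"))  # 0.0: a perfectly good pair that was simply never seen
```

Without some further source of generalization, such a model can say nothing about 'of soup' except that it never happened.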

While there is certainly evidence of bias in human brains conferring ability to process language, Deacon's view is that the machinery for learning these relationships is not something added to the ability to process symbols, but is already present simply by having these three modes available, and the ability to switch among them. What is necessary to reduce the size of the language learning search space is a way to constrain options. Through a process of "desymbolization", Deacon proposes that some words and contexts call for only indexical reference to other words, and knowing which mode of reference to use is part of what is learned. So, for example, the word 'cup' can in principle occur in speech with any other known word. However, the phrase 'a cup of ___' has a much smaller set of likely completions, and 'not my cup of ___' even fewer. This is because grammatical function words represent only indexically. They point to specific relationships between symbols. It does turn out that computational models that learn implicit distribution patterns are much more powerful than those limited to co-occurrence between contiguous items.
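How a longer grammatical frame narrows the set of likely completions can be sketched by counting what follows a context phrase. The corpus below is invented for illustration, and `completions` is a hypothetical helper, so the specific counts mean nothing beyond the toy data.

```python
def completions(sentences, context):
    """Set of words observed immediately after the given context phrase."""
    context_words = context.lower().split()
    n = len(context_words)
    found = set()
    for sentence in sentences:
        words = sentence.lower().split()
        # Slide over every position where the context could be followed by a word.
        for i in range(len(words) - n):
            if words[i:i + n] == context_words:
                found.add(words[i + n])
    return found

corpus = [
    "she dropped the cup yesterday",
    "the cup broke",
    "a cup of tea",
    "a cup of coffee",
    "that is not my cup of tea",
    "my cup is empty",
]
for frame in ("cup", "a cup of", "not my cup of"):
    print(frame, "->", sorted(completions(corpus, frame)))
```

In this toy data the bare word 'cup' is followed by four different words, 'a cup of' by two, and 'not my cup of' by one.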

Once the responsibility for the complexity of human language is seen as an inherent property of optional symbolic reference, the answer to the question of how humans can acquire complex grammars ceases to require either more intelligence than other creatures or more cognitive infrastructure than what is necessary for symbolic reference itself. And since symbolic reference is built hierarchically from simpler forms of reference already present in other animal communication, and even other important non-verbal human communication, there need only be an enhanced disposition to refer in this mode.

Even the important property of recursion follows emergently from symbolic reference itself, because symbolic reference allows substitution across type levels. For example, symbols can replace deictic indexicals, such as 'that boy' being replaced with 'he', or larger grammatical units such as entire arguments replaced with 'that', or, more abstractly, the implicit recursion of successively applying adjectives to noun phrases, such as 'the little red hen'.

Iconic reference and indexical reference are constrained by the necessities of similarity of form and correlation in time or space, respectively. That means they cannot be substituted except when those similarities or correlations apply transitively. This constraint is transcended in the symbolic mode. We don't actually need to have seen two words occur together to learn a symbolic relationship between them! In fact, we learn strong associations between words that occur in alternation. For example, we associate words such as 'coffee' and 'tea', even if each tends to occur in exclusion of the other. This implies both that symbolic properties such as recursion cannot be added to a system without symbolic reference, and that once symbolic reference is in place, nothing more is required for properties such as recursion to be possible.
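The point that words occurring in alternation can still be strongly associated has a simple distributional analogue: two words that never appear together can nonetheless share the same neighbours. The following is a minimal sketch under stated assumptions (an invented corpus, a two-word context window, cosine similarity as the measure); it is not a model proposed by Deacon.

```python
from collections import Counter
from math import sqrt

def context_vectors(sentences, window=2):
    """For each word, count the words appearing within `window` positions of it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            neighbours = words[lo:i] + words[i + 1:hi]
            vectors.setdefault(word, Counter()).update(neighbours)
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

corpus = [
    "i drink coffee every morning",
    "i drink tea every morning",
    "the dog chased a squirrel",
]
vectors = context_vectors(corpus)
print(vectors["coffee"]["tea"])                       # 0: they never co-occur
print(cosine(vectors["coffee"], vectors["tea"]))      # high: same contexts
print(cosine(vectors["coffee"], vectors["squirrel"])) # 0.0: no shared contexts
```

The association between 'coffee' and 'tea' here comes entirely from substitutability in the same frame, not from correlation in time or space.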

I hope through the following examples to further clarify how this symbolic distinction operates.

Linguistic relativism

Deacon reasons that since the structural properties of human language emerge from the symbolic mode of reference, the habitual use of symbolic reference for communication in turn affects the way we think, making it different from other kinds of thought based on other forms of reference. Because this mode is almost entirely restricted to humans, this could be viewed as a species version of strong linguistic relativism 3.

The relativistic idea that there is a relationship between the particular language one speaks and the form of one's thoughts is usually attributed to Sapir and Whorf. It has been criticized as being either trivial in its weak form of mere influence, or implausible in its strong form of limitation on what thoughts are possible. It's not particularly contentious that, for example, linguistic distinctions our language requires can predispose us to make those distinctions when thinking about objects, especially when we are preparing to talk about them, but just because a certain meaning is more easily accessible in one language than another, does not mean it cannot be arrived at in another. However, if the comparison is among species rather than within, the strong form becomes a more attractive hypothesis. We know from the study of formal languages that different generative rules in symbolic systems concretely affect expressibility in mathematically provable ways. Not having symbolic reference at all seems to preclude many kinds of thought, particularly thought involving modal and logical operators or other counterfactuals.

If we think about animals in close human contact, for example dogs, it's clear that we can teach some of them to "understand" words, but this understanding appears to stay strictly within indexical interpretation. A dog can learn to associate a word or phrase with a behaviour to perform, or with an event to happen now, such as "Let's go for a walk" or "dinner time". These are indexical understandings only, because they associate a word category with a conceptual category that is, crucially, coincident in time and place. If the co-occurrence between the word and event ceased, the association would eventually be extinguished, too. But imagine trying to verbally present a dog with an alternative between a walk and a meal, or to tell him a story about yesterday's walk. You cannot ask a dog whether he believes there is a squirrel outside.

Consider trying to replicate the "marshmallow test" of delayed gratification. In the original Stanford experiment, young children were given a marshmallow, and a choice. If they did not eat the marshmallow when the staff left the room, they would soon be given two marshmallows to eat. The idea is that two marshmallows are better than one, and that a child who has the ability to temporarily inhibit the impulse to eat the one given knowledge of the contingent second, will have better outcomes in life.

I submit that the corresponding test performed on dogs can only test obedience, because the dog cannot be taught to understand the trade being offered, even if they "know" the word for the treat in question. We could train the dog not to eat a treat because he knows he'll be rewarded. We could even train him to reject a lesser treat, say, a strawberry, in anticipation of a slice of beef. But this would be a mere association of the strawberry with the initiation of a performance: do not eat the strawberry and be rewarded with beef. It would require training exactly because it could not be explained in words. In other words, the dog can learn an association between behaviour and reward; this is an indexical reference, a particular instance of an "if x then y" relation. But the dog can't seem to learn "if x then y" itself as a hypothetical structure and then apply it to new referents. If other animals had symbolic tools with which to organize perception, we would expect to see expression of symbolic reference in their native communication, or at least the ability to be taught to use it with humans.

Although much of the focus of the discourse on linguistic relativism among human languages is lexical (the canonical example being the existence of many words for snow in the Arctic), Whorf's original descriptions focused on what he deemed grammatical differences. That is, he noted that in order to communicate we have to agree on a system of organizing our perceptions, and asserted that it is this organization which then shapes our thought 4. Specifically, he wrote about classifications that would either be explicitly marked grammatically, for example with grammatical morphemes, or implicitly through distribution patterns.

Arguably, from a strictly lexical perspective, we should expect other animals to be able to associate words with particular sets of stimuli in ways that could illustrate different ways of organizing the perceptual world. We know that we can teach lab animals complex tasks that entail such distinctions. Experiencing categories is not a linguistic act. It requires only iconic reference and is not unique to humans. Using a word to refer to a category is only indexical reference and is also demonstrably teachable to some animals as in our dog example.

What is linguistic and symbolic is using categories to constrain grammatical constructions. For this purpose categories can be based on semantic properties grounded in experience the way 'a cup of ___' imposes selectional restrictions 5, but they can also be completely arbitrary such as grammatical gender. Having the ability to organize categories around grammatical functions is critical for reducing the search space of relations between words. It's also essential for the recursive ability to carry forward reference through a chain of substitution.

Sense and reference

Deacon hypothesizes that the predicate-argument structure of linguistic communication results from the dependence of symbolic reference on indexical reference 6. The linguistic function of a predicate is to categorize its argument. In this sense, the argument is what the discourse is "about", and therefore its reference requires a kind of concreteness of instantiation, or grounding to something "real", while the predicate only refers to a symbolic category that needn't itself be thus instantiated. Thus the predicate derives grounding via its argument and, according to Deacon, requires it to be meaningful. Concreteness is a property of indexical reference, because of its requirement for co-occurrence or contiguity. What an index refers to must be something present or proximate. But because indexical reference has transitivity, this proximity requirement for words that refer indexically can be linked in a chain. To see this, consider sentences (1) - (4).

  1. The girl sat in the chair.

  2. She was too heavy.

  3. It broke.

  4. That's a shame.

In Deacon's analysis, in (1) 'the' serves in both cases as a form of deixis that grounds the sentence indexically to specific referents. However, once they have been incorporated into the sentence, they can be substituted symbolically in a way that carries the reference. 'She', 'it', and 'that' make use of grammatical category to transitively maintain a chain of referential grounding.

There is a tradition in analytic philosophy to seek a direct correspondence between language and metaphysical reality; for words and expressions to "attach" via reference to things in the world. In Wittgenstein's Tractatus, he lays out a kind of logico-linguistic atomism, where every proposition can be analyzed down to "names": elements that directly refer to objects in the world.

There have been many different interpretations of what it means for an expression to refer to objects and how that reference happens. Gottlob Frege distinguished between sense and reference, saying "We let a sign express its sense and designate its nominatum." 7. In other words, in this view, sense is a set of properties that an expression describes, and that description, in addition to corresponding to the properties, also refers to something specific by them, insofar as it narrows down the potential referential candidates to the single thing in the world that has those properties. If something does exist in the world with the properties, the expression also corresponds to a true proposition. He also pointed out that an expression can have a sense and yet not have a nominatum; not refer to anything existing in the world, if there is nothing having the properties described.

A distinction was subsequently proposed by Keith Donnellan about definite descriptions 8. He suggested that when a definite description narrows a subject to something specific that we don't know the referent for, it merely attributes, but doesn't refer. For example, his claim is that in a sentence such as 'The person who sat in my chair was too heavy.' if we don't know who sat in the chair, then the expression 'The person who sat in my chair' doesn't refer, but merely attributes, having a meaning akin to there exists a person x such that x sat in my chair. Whereas if we do believe we know who sat in the chair, then the expression refers to that person in a "rigid" way. That is, in the attributive case, any subsequent attribution is meant to apply to whoever sat in the chair, whereas in the "referring" case, subsequent attributions are meant to apply to the person we believed sat in the chair, even if it turns out that they didn't sit in the chair after all, or if there is contention over that point.

In both cases there is an act of predication, so Deacon's hypothesis would require an indexical reference in the phrase either way. However, I don't think his hypothesis requires that the chain of reference terminate in a rigid reference or Wittgensteinian atom. Rather, the chain of reference need only terminate in some mutually agreed upon focus of attention. And this focus could be entirely fictional, such as the Jabberwock, or incompletely specified, so long as it is held constant across substitutions.

The requirement that every predicate be a symbol grounded by its argument via indexical reference, and that a reference chain need trace back only to a mutual focus of attention, may explain a further example in Donnellan's argument. Donnellan pointed out that a definite description in a predicate need neither refer nor attribute (and thus carry the implication that there exists a referent). His example contrasts the questions:

  5. Is de Gaulle the King of France?

  6. Is the King of France de Gaulle?

Normally, we would not consider (5) to contain the presupposition that there is a king of France, whereas we would in (6). Deacon's hypothesis would explain this by noting that the first predication requires an indexical grounding, because it begins the communication. Therefore it asserts its description as referring to something, even if it is only the attributive sense of asserting that the thing exists. The second predication derives its indexical reference from the first, so it need not itself be grounded. In this way, (5) does not assert there is a king of France, but only asks if the referent, de Gaulle, fits that symbolic (and therefore hypothetical), category of being king of France. In contrast, (6) first asserts the king of France has a referent before asking if it can be merged with the category of being de Gaulle.

Knowledge of language

While the above discussion suggests that expressions of language refer, at least transitively, to objects in the world, I have glossed over the difference between reference and meaning. Potential differences are sometimes illustrated by embedding propositions that look like they have the same meaning from a reference point of view into belief statements. Hence the classic example from Frege of 'the morning star' and 'the evening star' which both refer to Venus can be used to show the difference.

  7. I believe that the morning star is the morning star.

  8. I believe that the morning star is the evening star.

If meaning were merely reference, (7) and (8) would have the same meaning. Meaning appears to have more to do with ideas in the mind than objects in the world. Given that only the individual has access to his own ideas, this led Locke to somewhat absurdly conclude that language must be private — absurd because the putative function of language is to communicate, and if meaning is necessarily private then nothing can be communicated 9.

If what words or expressions refer to are ideas, then we may be able to make a helpful analogy between mental representations and indexical reference. One theory of associative memory is that the hippocampus organizes it by index. For example, the word 'cup' can be an index into associative memory which ties together perceptual fragments of actual experience (though they may be merged or idealized over time). In this way, the associations with the word 'cup' for an individual are indeed private. However, because 'cup' can also be processed symbolically, that means there are associations with other symbols and how those symbols interact via grammar. Grammar, unlike associative memory, is procedural and therefore processed in part in the cerebellum 10. Because the word 'cup' and its grammatical occurrences are exactly what is shared when people use language to communicate, this symbol-to-symbol structure is negotiated in public, regardless of the idiosyncratic associative contents of its indexical grounding. In other words, perhaps what is private is not language, the symbolic system, but merely associations in the pre-verbal processing of meanings.

Debate about whether language can be private does sometimes center on discussion of grammatical rules. For example, Saul Kripke and others point out that there can be many, perhaps even infinitely many rules that a given data set of usage seen at a given point in time is obeying, and therefore there is no way to distinguish whether you and I are following the same rules or whether our rules merely agree thus far, but are each, essentially, private and different. To be sure, even within what is considered correct usage, there are idiosyncrasies in production such that linguistic forensics can sometimes determine authorship just by statistical analysis.

In analogy to formal languages, many have sought to characterize human languages as infinite sets of utterances that would qualify as grammatically correct English. This analogy leads naturally to treating the problem as an inference puzzle to find the formal language that would generate the appropriate grammar. In contrast, Deacon's view is that language in the wild is not so much deemed correct or incorrect, but understandable or not, and that this depends exactly on whether the indexical references can be derived from immediate sources of joint attention, linguistic or non-linguistic. This would explain tolerance of "broken" English by foreigners and children, or the fragments characteristic of dialogue. Interestingly, this does seem to imply that language requires outside verification to judge, denying the ability of language to be private.

A related problem in language knowledge that Deacon specifically aimed to address was the Chomskian idea that the complexity of grammar is so high that humans must have innate grammatical structures that need only be parameterized to function. They seem to agree that the induction of a rule set is implausible, but Deacon's solution is to consider the complexity inherent in the symbolic mode of reference, and the constraints to be exactly what is imposed by generic learning constraints in the human brain. Thus, we incorporate desymbolization exactly to the degree needed to cope with our cognitive limitations, such as short term memory capacity. In other words, the forms of grammatical constraints human languages have are a direct result of our limited cognitive ability to deal with symbolic reference, and this is the source of apparent universality. The structure is the product of the process, not the constraint that has to be learned.

In "Language and Problems of Knowledge", Chomsky goes so far as to say that we must have innate concepts that need only be given labels [1]. Otherwise, he says, we could not explain that children at their peak of language acquisition learn many words a day, sometimes after a single occurrence. But this is not so remarkable when we remember that concepts are not symbolic, they are merely indexical, and therefore do not require language. It's possible that when children are in the phase of acquiring lexical items at this rate, they have a backlog of concepts waiting for labels, not because those concepts were innate, but because they were acquired non-verbally before the rest of their language ability was ready to label them, and the phase corresponds to a sudden readiness to acquire words symbolically, because they have figured out how to do this mode of reference.


Deacon's model of symbolic reference distinguishes a hierarchy of modes of reference, such that the first two are ubiquitous in learning and communication in all living systems, whereas the symbolic mode, building on the others, is unique to human language. These distinctions cast new light on some old problems. While they do not directly bear on differences between human languages, they clarify how certain word-like communications in other animals differ from full-fledged symbolic words while retaining indexical properties, and how this limitation on their use in communication likely imposes limitations in thought.

Conceiving of reference as a mode of processing rather than something inherent in sign vehicles themselves allows a shift in the objective of linguistic analyses from finding assertions about things in the world to maintaining shared focus on a small, but growing set of ideas as they are transformed and created through predications. This ability to focus on ideas via symbolic reference allows a kind of cognition that is not dependent on the proximity or reality of any object, and yet is always linked to a shared context.

The general constraints of memory are sufficient to require limitations on symbol-to-symbol relations that would be reflected as a kind of universal in any language. Therefore, according to Deacon, we need no special innate representation of grammar, but only a capacity to optionally use symbolic reference, in order to end up with language structures sharing basic properties. Non-human animal communication is not simplified symbolic language, but rather non-symbolic.


1. Deacon, Terrence William and International Society for Science and Religion. The Symbolic Species: The Co-Evolution of Language and the Brain. Cambridge: International Society for Science and Religion, 2007.

2. Chomsky, Noam. “Language and Problems of Knowledge.” Teorema: Revista Internacional de Filosofía 16, no. 2 (1997): 5–33.

3. Deacon spends many pages describing a set of experiments in which the researchers successfully (though with difficulty) teach chimps to make a modest number of symbolic references. This is important for two reasons: first it shows that symbolic reference is a process, not an innate brain structure, that you either have or you don't, and second it shows that nonetheless it is much easier for humans to make that particular leap, so there must be an innate facility.

4. Scholz, Barbara C., Francis Jeffry Pelletier, and Geoffrey K. Pullum. “Philosophy of Linguistics.” In The Stanford Encyclopedia of Philosophy, edited by Edward N. Zalta, Summer 2020. Metaphysics Research Lab, Stanford University, 2020.

5. selectional restrictions:

6. Deacon, Terrence W. “Beyond the Symbolic Species.” In The Symbolic Species Evolved, edited by Theresa Schilhab, Frederik Stjernfelt, and Terrence Deacon, 6:9–38. Biosemiotics. Dordrecht: Springer Netherlands, 2012.

7. Frege, G.: 1892, ‘On Sense and Nominatum’, in Martinich, Aloysius, ed. The Philosophy of Language. New York, N.Y: Oxford University Press, 2010.

8. Donnellan, Keith S. “Reference and Definite Descriptions.” The Philosophical Review 75, no. 3 (July 1966): 281.

9. Locke, John, and Frederick Ryland. Locke on Words: An Essay Concerning Human Understanding. W.S. Sonnenschein & Company, 1882.

10. See e.g. Adamaszek, Michael, and Kenneth C. Kirkby. “Cerebellum and Grammar Processing.” In The Linguistic Cerebellum, 81–105. Elsevier, 2016.

I did it!

It's been a long time since I wrote anything here; everything I've written has been over at Mostly Fat or the companion book site, being health or nutrition related in nature.

I've had some subsymbol themed ideas to write, but after six years I was afraid I might forget how to use Nikola, or that the code would have changed so dramatically that I wouldn't be able to get my site to build at all. So I had been putting it off.

But I finally tried it today, and it worked!

I did run into a snag with the old posts written in html. It seems that where Nikola used to read "meta" files for certain information, it now expects to read that information out of the html head. That was easy enough to fix once I figured out the issue.

There are still some warnings and formatting changes that I need to clean up, but this seems good enough to at least try deploying! I'm eager to start writing here again.

Notes on Fragment Grammars

Last week I read Fragment Grammars: Exploring Computation and Reuse in Language by Timothy J. O'Donnell, Joshua B. Tenenbaum, and Noah D. Goodman.

As I mentioned in my journal, the authors of this tech report promise to generalise adaptor grammars (a variety of PCFG that uses a Pitman-Yor process to adapt its probabilities based on context) by using a heterogeneous lexicon (one that is not married to some prescriptivist notion of linguistic category such as word or morpheme, and thus can include items smaller or larger than words). The "lexicon" is chosen to optimise between storing a relatively small number of tiny units which require a lot of computation to structure together vs. storing a large number of long strings which cover large stretches of text, but aren't very flexible or general. In other words, it's a tradeoff in compression vs. computation.

Here are my impressions on first reading.

What I really love about this tech report is that it unpacks a lot of typically presumed knowledge right inside it.

So if you didn't know about PCFGs or memoization, or Chinese restaurant processes, or non-parametric Bayesian methods before, you can get a lot of what you need to know right there. Of course, the reason a typical conference or journal paper doesn't include such thorough background is simply that there isn't the space for it. Moreover, one can usually assume that the audience has the appropriate background, or knows how to acquire it. Nonetheless, I find it a great pleasure to read something that assumes an educated audience that isn't intimidated by statistical models or equations, but might not know every niche term involved in such a specialised task.

Here are some ways in which reading this paper helped me to grok non-parametric Bayesian techniques.

I had never thought of LDA and related algorithms as stochastic memoisation, which is how they are described here.

"A stochastic memoizer wraps a stochastic procedure [i.e a sampler] in another distribution, called the memoization distribution, which tells us when to reuse one of the previously computed values, and when to compute a fresh value from the underlying procedure. To accomplish this, we generalize the notion of a memotable so that it stores a distribution for each procedure–plus–arguments combination."

I like this description because it is immediately understandable to someone who has used dynamic programming. We know the value of limiting recomputation (and, again, if you don't, the classic Fibonacci example is right in the paper!), and now we see this generalised to probabilistically either using the cached value or resampling. As the authors explain:

"If we wrap such a random procedure in a deterministic memoizer, then it will sample a value the first time it is applied to some arguments, but forever after, it will return the same value by virtue of memoization. It is natural to consider making the notion of memoization itself stochastic, so that sometimes the memoizer returns a value computed earlier, and sometimes it computes a fresh value."

I have seen several different presentations of LDA, and not once was it described in this intuitive way.

Further, we can see how using the Chinese Restaurant Process, which is biased to sample what has been sampled before, acts as a "simplicity bias":

"all else being equal, when we use the CRP as a stochastic memoizer we favor reuse of previously computed values."
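To make the idea concrete for myself, here is a minimal sketch of a CRP-style stochastic memoizer. It is my own illustration, not code from the paper: the class name CRPMemoizer and the parameter alpha are assumptions, and for simplicity it pools counts by value rather than keeping separate "tables", which only matches the CRP when repeated fresh values are treated as the same item.

```python
import random
from collections import Counter


class CRPMemoizer:
    """Stochastic memoizer: reuse an earlier value or sample a fresh one.

    With n values returned so far for a given argument tuple, a fresh value
    is drawn with probability alpha / (n + alpha); otherwise an earlier value
    is reused with probability proportional to how often it was returned.
    """

    def __init__(self, procedure, alpha=1.0, rng=None):
        self.procedure = procedure   # the underlying sampler
        self.alpha = alpha           # higher alpha means more fresh draws
        self.rng = rng or random.Random()
        self.tables = {}             # args -> Counter of value -> times returned

    def __call__(self, *args):
        counts = self.tables.setdefault(args, Counter())
        n = sum(counts.values())
        # The first call always computes; later calls compute with
        # probability alpha / (n + alpha).
        if n == 0 or self.rng.random() < self.alpha / (n + self.alpha):
            value = self.procedure(*args)
        else:
            values = list(counts)
            weights = [counts[v] for v in values]
            value = self.rng.choices(values, weights=weights)[0]
        counts[value] += 1
        return value


rng = random.Random(0)
roll = CRPMemoizer(lambda: rng.randint(1, 1000), alpha=1.0, rng=rng)
print([roll() for _ in range(20)])
```

Setting alpha to 0 recovers the deterministic memoizer from the quote: the first call samples, and every later call returns that same value.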

An assumption that Gibbs sampling relies on was made clear to me in the explanation of exchangeability.

"Intuitively, exchangeability says that the order in which we observed some data will not make a difference to our inferences about it. Exchangeability is an important property in Bayesian statistics, and our inference algorithms below will rely on it crucially. It is also a desirable property in cognitive models."

"Pitman-Yor processes, multinomial-Dirichlet distributions, and beta-Binomial distributions are all exchangeable, which means that we are free to treat any expression e(i) ∈ E as if it were the last expression sampled during the creation of E. Our sampling algorithm leverages this fact by (re-)sampling each p(i) ∈ P for each expression in turn."

Even though I knew that exchangeability was necessary for taking products, that is, that permutations don't affect the joint distribution, I hadn't thought about the way this frees us in our sampling order. If we wanted to add some kind of recency effects to our models, order would, of course, become important.

The real meat of the paper, though, is in describing Fragment Grammars as contrasted with Adaptor Grammars.

This will likely be the topic of the next post.

Two inotify Pitfalls

All I wanted was to track some files.

I have a set of directories populated with files that are hardlinked from other places. This serves as a kind of database for associating tags to files. I wanted to write a daemon that notifies me of changes to the files that might affect the consistency of the database. For example, if the file is removed, I want my database to let go of its reference.

The Linux kernel has a set of system calls for this called inotify, and Twisted has a module to support that API. Not only that, but the module documentation has everything you need.

from twisted.internet import inotify
from twisted.python import filepath

def notify(ignored, filepath, mask):
    """
    For historical reasons, an opaque handle is passed as first
    parameter. This object should never be used.

    @param filepath: FilePath on which the event happened.
    @param mask: inotify event as hexadecimal masks
    """
    print("event %s on %s" % (
        ', '.join(inotify.humanReadableMask(mask)), filepath))

notifier = inotify.INotify()
notifier.startReading()
notifier.watch(filepath.FilePath("/some/directory"), callbacks=[notify])

Pitfall #1

Great! I copy the code, change the path from "/some/directory" to "/home/amber/somedir", and start it up.

  • I make a file, "f", and am notified of its creation, and some attribute change.

$ touch f
event create on FilePath('/home/amber/somedir/f')
event attrib on FilePath('/home/amber/somedir/f')
  • I hard link it with another name

$ ln f h
event create on FilePath('/home/amber/subdir/h')

Ok. It logged the creation of h for me, but hasn't this changed the links attribute of f? Why wasn't I notified of that?

  • I try modifying it.

$ echo hello >> h
event modify on FilePath('/home/amber/subdir/h')

I'm further nonplussed. This event should have modified both f and h, but I was only notified of the one used in the command.

Finally I try what I really want.

  • I make a hard link to f from outside of "somedir/", and modify it through there.

$ ln f ../h
$ echo hello >> ../h
(no response)

What's going on?

Inotify takes a pathname. If the pathname is a directory, then it watches the directory, but this is not the same as watching each file in the directory.
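The underlying situation can be reproduced with nothing but the standard library. This is a small sketch, not part of the daemon: the helper name link_info and the temporary paths are my own for illustration. It shows that a name inside the watched directory and a name outside it can point at the same inode, so a write through the outside name never touches a directory entry that the directory watch knows about.

```python
import os
import tempfile


def link_info(path):
    """Return (inode number, hard-link count) for a path."""
    st = os.stat(path)
    return st.st_ino, st.st_nlink


with tempfile.TemporaryDirectory() as root:
    watched = os.path.join(root, "somedir")
    os.mkdir(watched)
    inside = os.path.join(watched, "f")
    outside = os.path.join(root, "h")
    with open(inside, "w") as handle:
        handle.write("hello\n")
    os.link(inside, outside)            # same effect as `ln f ../h`
    inside_info = link_info(inside)
    outside_info = link_info(outside)
    print(inside_info, outside_info)    # same inode, link count of 2
```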

Pitfall #2

Glad to have figured out my error, I try again, modifying the pathname argument in the daemon to "somedir/f". I remove all those files, touch f, and start the daemon again. This time it does what I want.

$ ln f h
event attrib on FilePath('/home/amber/subdir/f')
$ ln f h
$ echo hello >> h
event attrib on FilePath('/home/amber/subdir/f')
event modify on FilePath('/home/amber/subdir/f')
$ ln f ../h
$ echo hello >> ../h
event attrib on FilePath('/home/amber/subdir/f')
event modify on FilePath('/home/amber/subdir/f')

But wait!

I was about to call it good, when I decided to try modifying the file with vim or emacs. I deleted all those files again, touched f, and this time modified it with vim. On saving the file, I get this:

event move_self on FilePath('/home/amber/subdir/f')
event attrib on FilePath('/home/amber/subdir/f')
event delete_self on FilePath('/home/amber/subdir/f')

What's going on?

It turns out that vim and emacs, and who knows what else, have a trick to save backups while in use.

To see what happens, I edited the daemon to watch the directory again, and also to print some stats about the files:

from twisted.internet import inotify, reactor
from twisted.python import filepath
import os

def notify(ignored, filepath, mask):
    """
    For historical reasons, an opaque handle is passed as first
    parameter. This object should never be used.

    @param filepath: FilePath on which the event happened.
    @param mask: inotify event as hexadecimal masks
    """
    print("event %s on %s" % (', '.join(inotify.humanReadableMask(mask)), filepath))
    # After each event, list every file in the watched directory with its inode.
    for f in os.listdir(fp):
        stat = os.stat(os.path.join(fp, f))
        print("%s mode: %s inode: %s" % (f, stat.st_mode, stat.st_ino))

fp = "/home/amber/subdir"
notifier = inotify.INotify()
notifier.startReading()
notifier.watch(filepath.FilePath(fp), callbacks=[notify])
reactor.run()

Now I run it and open f with vi.

$ vi f
event create on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event create on FilePath('/home/amber/subdir/.f.swpx')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event delete on FilePath('/home/amber/subdir/.f.swpx')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event delete on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event create on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event attrib on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363


event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363

(modify manually)

event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363

(save manually)

event create on FilePath('/home/amber/subdir/4913')
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event attrib on FilePath('/home/amber/subdir/4913')
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event delete on FilePath('/home/amber/subdir/4913')
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event moved_from on FilePath('/home/amber/subdir/f')
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event moved_to on FilePath('/home/amber/subdir/f~')
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event create on FilePath('/home/amber/subdir/f')
f mode: 33204 inode: 2307371
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event modify on FilePath('/home/amber/subdir/f')
f mode: 33204 inode: 2307371
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event attrib on FilePath('/home/amber/subdir/f')
f mode: 33204 inode: 2307371
.f.swp mode: 33188 inode: 2307363
event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307371
.f.swp mode: 33188 inode: 2307363
event delete on FilePath('/home/amber/subdir/f~')
f mode: 33204 inode: 2307371
.f.swp mode: 33188 inode: 2307363


event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307371
.f.swp mode: 33188 inode: 2307363
event delete on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307371

As far as I can tell, the result is as if they have renamed f to something else like f~, copied the contents of f~ to a new file named f, modified f, and finally deleted f~. This is essentially copy-on-write.

But inotify, while taking pathnames as arguments and returning pathnames, actually tracks inodes. So simply using an editor has the effect of moving the file to a different inode and thereby breaks inotify!

This is ultimately a consequence of using aliases to files (pathnames) as if they were canonical references to files (inodes).
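The effect is easy to reproduce without an editor. The sketch below is my own approximation of the rename-then-rewrite sequence seen in the log, not the editors' actual code, and the function name save_like_an_editor is made up for illustration. It reports the inode behind the pathname before and after the "save".

```python
import os
import tempfile


def save_like_an_editor(path, new_text):
    """Mimic a rename-then-rewrite save.

    Returns (inode before, inode after) for the same pathname.
    """
    before = os.stat(path).st_ino
    backup = path + "~"
    os.rename(path, backup)              # f -> f~ (f~ keeps the old inode)
    with open(path, "w") as handle:      # a brand-new file named f
        handle.write(new_text)
    after = os.stat(path).st_ino         # read while f~ still holds the old inode
    os.remove(backup)                    # finally drop f~
    return before, after


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "f")
    with open(path, "w") as handle:
        handle.write("old\n")
    before, after = save_like_an_editor(path, "new\n")
    print(before, after, before != after)
```

The pathname is unchanged, but anything that was tracking the original inode is now tracking a file that no longer exists.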

Post Script: Lucky break?

As it happens, the behaviour of vim and emacs is different when the inode holding the file has more than one reference. I can prevent the inode from disappearing by making a hardlink to the file before opening it with an editor. The editor must have recognised that it can't move inodes willy-nilly when other pathnames depend on it. This maps exactly to my original scenario, and therefore might make it safe for me to use. On the other hand, my whole confidence in the behaviour is undermined, and I am reluctant to rely on it.

Pólya's Urn

Skip to interactive demo: Try it!

Balls in Urns

If you have studied probability, you are probably familiar with the canonical balls-in-an-urn allegory for understanding discrete probability distributions. For example, you could imagine an urn containing 1 red ball and 3 green balls. Drawing a ball from the urn at random represents sampling from a probability distribution where the probability of one outcome is \(25\%\) and the probability of the other outcome is \(75\%\). We can extend this idea in a variety of ways.

Pólya's Urn

In Pólya's Urn, the extension is that whenever you draw a ball from the urn, you not only replace it, but you add an extra ball of the same colour. So if you happened to draw a green ball in the example above, then the ratio would change from \(1:3\) to \(1:4\). That means on the next draw, you would now have only a \(20\%\) chance of drawing red. On the other hand, if you happened to have drawn red, then the ratio would change to \(2:3\), giving red a probability of \(40\%\).

This process is interesting, because it has the property that the more often you observe something, the more likely you are to observe it again.

Different starting conditions

The way the distribution changes over time depends on the starting conditions.

One of each

Let's imagine the simplest case, in which we start with one each of two colours, red and green. The following table shows the probabilities of getting red on the first three draws, and how each draw changes the probability of the next by changing the proportion of colours in the urn.

R:G before draw   P(red)   new R:G if red   new R:G if green
1:1               1/2      2:1              1:2
2:1               2/3      3:1              2:2
1:2               1/3      2:2              1:3
3:1               3/4      4:1              3:2
2:2               1/2      3:2              2:3
1:3               1/4      2:3              1:4

There are more ways to have drawn two of one colour, and one of the other, than 3 of one colour. However, because of the way drawing a particular colour reinforces itself, there is a \(50\%\) chance of drawing the same colour every time over the first three draws.

First three draws   Probability

RRR   \(1/2 \times 2/3 \times 3/4 = 1/4\)
RRG   \(1/2 \times 2/3 \times 1/4 = 1/12\)
RGR   \(1/2 \times 1/3 \times 1/2 = 1/12\)
RGG   \(1/2 \times 1/3 \times 1/2 = 1/12\)
GRR   \(1/2 \times 1/3 \times 1/2 = 1/12\)
GRG   \(1/2 \times 1/3 \times 1/2 = 1/12\)
GGR   \(1/2 \times 2/3 \times 1/4 = 1/12\)
GGG   \(1/2 \times 2/3 \times 3/4 = 1/4\)
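The probability of each sequence of three draws can be checked by exact enumeration. This is a small sketch using exact fractions; the function name sequence_probability is my own, and it assumes the one-of-each starting urn with one extra ball added per draw.

```python
from fractions import Fraction
from itertools import product


def sequence_probability(sequence, start=None):
    """Exact probability of drawing `sequence` (e.g. "RGR") from a Pólya urn
    that adds one extra ball of the drawn colour after each draw."""
    urn = dict(start or {"R": 1, "G": 1})
    prob = Fraction(1)
    for colour in sequence:
        prob *= Fraction(urn[colour], sum(urn.values()))
        urn[colour] += 1          # replace the ball and add one more like it
    return prob


table = {"".join(seq): sequence_probability(seq) for seq in product("RG", repeat=3)}
for seq, prob in table.items():
    print(seq, prob)
print("total:", sum(table.values()))
```

Every ordering with the same colour counts comes out with the same probability, and the eight probabilities sum to 1.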

Ten of each

Now suppose that we start with 10 each of red and green balls. In this case, simply drawing a red ball the first time does not change the probability that it will be drawn again nearly as significantly as with the \(1:1\) starting conditions. The probability of drawing 3 of the same colour in a row falls to \(2 \times 10/20 \times 11/21 \times 12/22 = 2/7 \approx 29\%\).

We can view the starting conditions as a list of numbers, one for each starting colour, and call it alpha (\(\alpha\)). So our first example had \(\alpha = [1, 3]\), our second example had \(\alpha = [1, 1]\), and our third example had \(\alpha = [10, 10]\).

Higher returns

On the other hand, imagine if we started with 1 each of red and green, but instead of increasing the number of balls by 1 when we draw a colour, we increased it by 10. Now every draw has a much stronger effect. The probability of drawing the same colour 3 times in a row would now be \(2 \times 1/2 \times 11/12 \times 21/22 = 7/8 \approx 88\%\).

We could even have a particular increase number for each colour, and have another list, called beta (\(\beta\)).
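The three "same colour in a row" figures so far can be computed from \(\alpha\) and \(\beta\) directly. This is a sketch under the definitions above, with a hypothetical function name same_colour_run; it takes one \(\beta\) per colour, as just suggested.

```python
from fractions import Fraction


def same_colour_run(alpha, beta, draws=3):
    """Probability that the first `draws` draws are all the same colour.

    alpha[i] is the starting count of colour i; beta[i] is how many balls of
    colour i are added each time that colour is drawn.
    """
    total = Fraction(0)
    for start, step in zip(alpha, beta):
        prob = Fraction(1)
        for k in range(draws):
            # After k draws of this colour it has start + k*step balls,
            # and the urn holds sum(alpha) + k*step balls in all.
            prob *= Fraction(start + k * step, sum(alpha) + k * step)
        total += prob
    return total


print(same_colour_run([1, 1], [1, 1]))      # 1/2
print(same_colour_run([10, 10], [1, 1]))    # 2/7
print(same_colour_run([1, 1], [10, 10]))    # 7/8
```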

More colours

Another way to change the starting conditions is to increase the number of colours. If our starting urn had one each of 10 different colours, then, again, when we draw the first ball, it has much less of an effect on the chance of drawing it again. We can call the number of colours \(n\).

Try it!

Use the sliders to choose \(n\) colours, and a single \(\alpha\) and \(\beta\) for all colours. Try drawing balls from the urn, and see how the urn changes. At any time you can display the urn in rank order or reset to the current slider position.
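For readers without the sliders to hand, a rough stand-in for the demo fits in a few lines. This is only a sketch: the function name polya_urn and the seed argument are my own, and like the sliders it uses a single \(\alpha\) and \(\beta\) for all colours.

```python
import random


def polya_urn(n, alpha, beta, draws, seed=None):
    """Simulate a Pólya urn with n colours, alpha starting balls of each
    colour, and beta balls added whenever a colour is drawn.

    Returns the final ball counts, one per colour.
    """
    rng = random.Random(seed)
    counts = [alpha] * n
    for _ in range(draws):
        # Pick a colour with probability proportional to its current count.
        colour = rng.choices(range(n), weights=counts)[0]
        counts[colour] += beta
    return counts


final = polya_urn(n=5, alpha=1, beta=1, draws=100, seed=42)
print(sorted(final, reverse=True))   # the urn in rank order
```

Running it with different seeds shows how much the final shape of the urn depends on the first few draws, especially when \(\alpha\) is small or \(\beta\) is large.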


The Grammaticality Continuum

Yesterday I was thinking about implementing Latent Dirichlet Allocation (LDA). LDA is used for topic modelling — inducing a set of topics, such that a set of natural language documents can be represented by a mixture of those topics. This is then used to estimate document similarity, and related information retrieval tasks.

The first step in such a project is to tokenise — to break up the text into words, removing attached punctuation, and regularising things like capitalisation. When looking at the words in a document for the purposes of topic modelling, it seems appropriate to merge word forms with the same root, or stem, instead of having each form of the "same" word represented individually. The canonical way to tokenise for topic modelling involves stemming, and it also involves removing stop words — words like "the", and "and" that are more syntactic than semantic.
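As a rough illustration of that pipeline, here is a minimal tokeniser sketch. The stop list is a tiny made-up sample, the function name tokenise is my own, and stemming is deliberately left out because a real stemmer would normally come from a library.

```python
import re

# A tiny illustrative stop list; real lists are much longer.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in"}


def tokenise(text, remove_stop_words=True):
    """Lower-case the text, split it into word tokens (dropping attached
    punctuation), and optionally drop stop words. No stemming is done."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


sentence = "The dog bit the man, and the man bit back!"
print(tokenise(sentence))
print(tokenise(sentence, remove_stop_words=False))
```

Keeping remove_stop_words as a switch makes it easy to compare a model trained with and without the stop words.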

I am not entirely convinced that this latter is appropriate. The reason is that the grammaticality of words exists on a continuum. Even the word "the" carries semantic weight, though its main function is probably to signal the boundaries of syntactic chunks.

My favourite example of the syntactic function of "the" comes from Petr Beckmann's book The Structure of Language: A New Approach, which has profoundly influenced me since my undergraduate days. In it he shows how the following headline is disambiguated by the placement of "the" before or after "biting":

"Man Refuses to Give up Biting Dog"

A couple of years ago at the NAACL conference, there was a session where a few prominent computational linguists presented their favourite papers from the past. Eugene Charniak presented Kenneth Church's 2000 COLING paper: Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to \(p/2\) than \(p^2\). It introduced a measure of adaptation for language models based on how much a recent occurrence of a word increases its tendency to occur beyond what is already expected.
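To give a feel for what such a measure looks like, here is a rough sketch in the spirit of that idea. It is not the paper's exact estimator: the split of each document into a first half (history) and a second half (test), and the function name adaptation, are my assumptions for illustration.

```python
def adaptation(docs, word):
    """Estimate (prior, adapted) probabilities for `word`.

    Each document is a list of tokens, split into a first half (history)
    and a second half (test).
    prior   = fraction of documents whose test half contains the word
    adapted = the same fraction among documents whose history contains it
    """
    in_test = in_history = in_both = 0
    for tokens in docs:
        middle = len(tokens) // 2
        history, test = set(tokens[:middle]), set(tokens[middle:])
        in_test += word in test
        in_history += word in history
        in_both += (word in history) and (word in test)
    prior = in_test / len(docs)
    adapted = in_both / in_history if in_history else None
    return prior, adapted


docs = [
    "noriega said noriega would".split(),
    "the market fell the market".split(),
    "rain is expected in spain".split(),
    "the talks ended the day".split(),
]
print(adaptation(docs, "noriega"))
```

In this toy corpus the word is rare overall, but seeing it once in a document makes a second occurrence far more likely than the prior suggests.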

Charniak used this paper as a background with which to present a new idea about the way the prior likelihood of a word predicts its future occurrences. He divided words into sets according to how well their priors predicted them. Stop words were most likely to be true to priors, and content words least, with verbs at the end of the spectrum.

At the time I took this as evidence for the stratification of grammaticality. Because of this stratification, treating stop words as a special set appears arbitrary and oversimplified. I expect that leaving stop words in a topic model would simply result in having some topics that are distributed more evenly throughout the corpus of documents. These topics would discriminate among documents poorly. However, this result should also be part of a continuum. It may be useful to account for the distribution of topics when using them in LDA, in a way analogous to inverse document frequency in latent semantic analysis.

More generally, I am interested in the phenomenon that words vary in their semantic and syntactic load. Even just within semantics, a morpheme may carry more than one meaning that cannot be decomposed linguistically. For example, "uncle" is male, and while we could assign "uncle" a set of semantic features in a computer system in order to reason about its meaning, those features are only implicit in English. In logographic writing systems this is all the more apparent.

This simultaneity of features in an apparently linear system is, to me, one of the most interesting aspects of language, and one of the reasons computational linguistics is difficult and rewarding.

Productively Lost

Yesterday Hacker School's resident Mel Chua shared her work on educational psychology theory for Hacker School 1. I had seen an earlier iteration of this talk from PyCon video archives, and it was useful to me then. However, this time I had more relevant experience with which to understand it. Hacker School is the first time I have had such a fluid and indeterminate educational experience. Even graduate school was more structured, and with more fixed goals.

I have previously compared Hacker School to a game of Tetris, in which new exciting things are constantly dropping from the sky, and you can't get them all and fit them all into your life. Eventually you will lose, but it is fun to try, anyway. I like this analogy, but in some ways it is too passive. Hacker School (and life in general, if you let it) is more like a giant maze with more and more doors appearing all the time. Many paths connect to each other, and you may find yourself back where you were before, but from a new perspective. Here I can see more clearly than ever before the unboundedness of the space of learning, and this makes the idea of a best path through it almost laughable. That's not to say that there are no poor ways to learn. Only that there are many good ways.

One central message from Mel's talk was the idea of being Productively Lost. Being your own guide in an infinite maze makes being lost natural. The question is how to make the best of your learning given that situation.

Mel talked about using measurement to guide learning, in analogy with Test-Driven-Design. She talked about how to most effectively join an open source project so that you can maximise your interactions and contributions for everyone's benefit, and for your own development. There was also a section on motivation, self-efficacy, and attitudes.

She mentioned different learning styles, and followed up later in the day with a workshop on the topic. I found this enormously helpful, because instead of just coming out with a label, which I have done in the past with this kind of theory, I was able to see strategies that make better use of my strengths. By reviewing my experiences at Hacker School so far, and relating them to these axes, I feel I am in a better position to enhance my learning experiences deliberately.

Mel also talked about the progression of learning. Learning tends to follow a cyclical pattern of periods of assimilation of new ideas into an existing mental model followed by a paradigm shift that requires accommodation. Accommodation is needed when new ideas are fitting less well into the existing model, and an extensive refactorisation makes everything fit more naturally. This stage is slow and uncomfortable, and may even feel like a regression. After this, there is a shorter period during which learning new things with the new model is fast and rewarding, before reaching another steadier state of assimilation.

Even though I have taken my own way on some critical aspects of my life, much of my life is characterisable by following paths that were set by someone else, or were simply unexamined pursuit of "the way things are done". Applying to Hacker School in the first place was a big, intimidating step away from this pattern that stretched my courage. It rivals the most rewarding decisions of my life so far. The increased autonomy and competence I am developing here feels like a new freedom, a tipping point into a feedback loop of self-expression and creative action that goes way beyond any particular programming concept I have learned while here.

Becoming comfortable with this fundamental lostness, and yet feeling adequate to navigate it, is ultimately much more empowering than the security of excelling at following well-lit paths, sanctioned and rewarded by others.



Slides from last year's version here:

Addresses and Contents

What are we naming?

'The name of the song is called "Haddocks' Eyes."'

'Oh, that's the name of the song, is it?' Alice said, trying to feel interested.

'No, you don't understand,' the Knight said, looking a little vexed. 'That's what the name is called. The name really is "The Aged Aged Man."'

'Then I ought to have said "That's what the song is called"?' Alice corrected herself.

'No, you oughtn't: that's quite another thing! The song is called "Ways and Means": but that's only what it's called, you know!'

'Well, what is the song, then?' said Alice, who was by this time completely bewildered.

'I was coming to that,' the Knight said. 'The song really is "A-sitting On A Gate": and the tune's my own invention.'

Alice Through the Looking Glass

Mutability applies to content

I have been designing a system that seeks to change the way we name our data: instead of naming the hierarchical way, using directories (or folders), it will name the category way, using tags. Because the system needed an ID for every file it tagged, and perhaps for no other good reason than that I am fond of content-based hashing, I chose to use a content-based hash for the IDs. However, this ID is not a good match for mutable files. I had been putting off thinking about how to handle mutable files, figuring I could add that functionality later.

When I was finally ready to confront mutable files, I realised that I was running up against a fundamental issue:

People often conflate addresses with contents when naming things

Take for example the problem of citation. If you cite a book or a research paper, then what you are citing is the contents. When we first started wanting to cite URLs, we treated them as though we were addressing content, but we aren't. URLs point to content that can change.

Of course, this issue shows up in many areas of computer science whenever we use references.


In a filesystem, we usually refer to files by their paths. The relationships between paths, addresses (inodes), and contents are shown here:

That is, a pathname refers to a single inode, but not necessarily vice versa. An inode has exactly one set of contents, but those contents may be replicated in many inodes in the system. I didn't recognise at first that the problem of mutable and immutable files is the distinction between addresses and contents.

As far as file systems go, I don't know of any that make the distinction between contents and addresses, except Tahoe-LAFS. The reference to an immutable file in Tahoe is content based for integrity and deduplication. The reference to a mutable file just specifies a unique "slot".

I've decided to follow this paradigm in Protagonist. Both addresses and contents should be eligible for tagging. Tagging the box will use the inode. Tagging the contents will use the hash.
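The two kinds of ID can be sketched in a few lines. This is an illustration rather than Protagonist's actual code: the function names, the choice of SHA-256, and pairing the inode with the device number are all my assumptions.

```python
import hashlib
import os
import tempfile


def content_id(path):
    """ID for the contents: SHA-256 of the file's bytes (assumed hash)."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def address_id(path):
    """ID for the box: (device, inode) of whatever the path points at."""
    st = os.stat(path)
    return st.st_dev, st.st_ino


with tempfile.TemporaryDirectory() as root:
    path = os.path.join(root, "note.txt")
    with open(path, "w") as handle:
        handle.write("first draft\n")
    ids_before = content_id(path), address_id(path)
    with open(path, "a") as handle:        # edit in place: same inode
        handle.write("second thoughts\n")
    ids_after = content_id(path), address_id(path)
    print(ids_before[0] != ids_after[0], ids_before[1] == ids_after[1])
```

An in-place edit changes the content ID but leaves the address ID alone, which is why a tag on a mutable file wants the second kind.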

How I transformed a Blogger blog to a Nikola two-blog website

In the last post I described why I wanted to move my blog, and what went into my choices. My goal in exporting my site from Blogger to Nikola was to separate my content into two separate blogs on a single site: one which is the primary blog, and the other which resides in the background. I wanted them to be archived separately. I also wanted there to be a front page that didn't change every time I wrote a new post.

Importing the blog

Nikola has a plugin to import a blog from Blogger, so the first step was to import the blog. To import, I took the following steps:

  • First, I exported my blog from Blogger, which gave me a file called blog-07-05-2014.xml. Then,

$ sudo pip install nikola
$ nikola plugin -i import_blogger
$ nikola import_blogger -o site blog-07-05-2014.xml # The -o tells nikola what directory to make the site in.

After this, I had some cleanup to do. For one thing, Nikola renamed all the Blogger files (in a sensible way). Blogger has date-derived subdirectories for posts, whereas in Nikola all the posts are kept in one folder, called "posts". So when the new site is up, redirects will be required for anyone with the old link.

The import gave me a file called url_map.csv, which contained all the information necessary to redirect the old links to their new locations, but as far as I could tell, those redirects still had to be encoded into the configuration file to take effect. Since I only had a few, I did this manually. For every line in the url_map, I inserted a tuple into the REDIRECTIONS list in the configuration file.
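With more than a handful of posts, this step could be scripted. The sketch below rests on guesses: I am assuming url_map.csv has two columns (old URL, new URL) and that each REDIRECTIONS entry is an (old path, new URL) tuple, so check both against your own file and Nikola version. The example.blogspot.com URLs and the function name are made up.

```python
import csv
import io
from urllib.parse import urlparse


def redirections_from_url_map(csv_text):
    """Turn rows of (old URL, new URL) into (old path, new URL) tuples.

    Assumes a two-column CSV; rows that do not start with a URL are skipped.
    """
    pairs = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2 or not row[0].startswith("http"):
            continue                      # blank lines or a header row
        old_path = urlparse(row[0]).path.lstrip("/")
        pairs.append((old_path, row[1]))
    return pairs


sample = (
    "http://example.blogspot.com/2014/05/first-post.html,/posts/first-post.html\n"
    "http://example.blogspot.com/2014/06/second-post.html,/posts/second-post.html\n"
)
REDIRECTIONS = redirections_from_url_map(sample)
print(REDIRECTIONS)
```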

The result at this point was a Nikola blog that contained everything the old blog did.

Making it not a blog

There is a document in the Nikola project describing how to make a non-blog site. The instructions boil down to changing three lines of the configuration file.

Posts and Pages

As usual, Nikola distinguishes two different kinds of text-derived files.

  • Posts are blog files. They are archived, indexed by tag, and ordered by date for display.

  • Pages are essentially independent.

So to make a blog into a non-blog, you simply manipulate the variable POSTS defined in the configuration file. POSTS and PAGES are lists describing where to find posts or pages respectively, where to render them, and what template to use for them. We let the POSTS list be empty, so everything on the site will be a page.

POSTS = []
PAGES = [
    ("pages/*.rst", "", "story.tmpl"),
    ("pages/*.txt", "", "story.tmpl"),
]

The two entries for PAGES are here to allow either txt or rst, but the one that comes first is what will be used when you use the new_post command.

So you can create a page called "index.html" by running new_post -p, and giving it the name index.

$ nikola new_post -p


Creating New Page

Title: index

INFO: new_page: Your page's text is at: pages/index.rst

Since we put it at "", which is the top level of the site, it will be what you see on the "front page".

Unfortunately, this creates a conflict, because when you build the site, the blog part is already wired to make an index.html file in the top-level directory, since that's how blogs normally appear. So you intercept this by adding this line to the configuration file:

INDEX_PATH = "blog"

This just makes it so the blog index is now created under the folder blog, instead of the top level, and it no longer conflicts.

It also means that now you can have a regular blog under the subdirectory blog, by putting options back in for POSTS:

POSTS = [("posts/*.rst", "blog", "post.tmpl")]

But this is not enough for us, because we have imported posts from Blogger that were also being found from the POSTS list. There is another entry in POSTS that we need back, that tells Nikola to also collect and render existing html files, such as those we imported. So we need to add:

("posts/*.html", "blog", "post.tmpl")

back into the list.
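Putting the pieces so far together, the relevant part of the configuration might look like the following. This is only a summary sketch of the settings quoted in this section, not a complete configuration file.

```python
# Sketch of the settings discussed so far (not a full Nikola config).
POSTS = [
    ("posts/*.rst", "blog", "post.tmpl"),    # newly written posts
    ("posts/*.html", "blog", "post.tmpl"),   # posts imported from Blogger
]
PAGES = [
    ("pages/*.rst", "", "story.tmpl"),       # first entry: used for new pages
    ("pages/*.txt", "", "story.tmpl"),
]
INDEX_PATH = "blog"   # blog index goes under /blog, leaving "" for the front page
```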

Two blogs

At this point I had part of what I wanted. I had a front page separate from my blog, and all of my previous blog reachable from the site. But I still wanted to have two blogs, a primary one for themed entries, and a journal for unstructured reflections.

Here are the steps I took to factor the blogs apart:

  • Made a new directory for the journal posts called "journal", and moved the appropriate files into it.

  • Added ("journal/*.rst", "journal", "story.tmpl") and ("journal/*.html", "journal", "story.tmpl") to the PAGES list, so old and new journal entries can be found for rendering.

  • Updated the REDIRECTIONS to reflect those changes.

This worked to render them and include them in the site, but the journal articles were not indexed. That meant that if I knew the URL, I could go to the article, but a visitor to the site could never discover them.

To add indexing I had to add to


But again, this created a conflict with multiple files called "index.html" trying to go in the same folder. So I also had to change the name of the index. I chose "index.htm", so that the server would automatically redirect.

INDEX_FILE = "index.htm"

Finally, I wanted the journal to be findable without knowing the directory name "journal", so I updated the navigation links:

     ("/archive.html", "Archive"),
     ("/categories/", "Tags"),
     ("/blog/", "Blog"),
     ("/journal/", "Journal"),
     ("/rss.xml", "RSS feed"),

As an extra configuration tweak, I set TAG_PAGES_ARE_INDEXES = True, so that when you go to the page for a given tag, it renders the posts themselves, rather than a list.

I would like to do that for my journal index as well, but that feature is not yet general, so if you navigate to "Journal" you will get a list of posts, and unfortunately, since it is a journal, they are named by date. Moreover, their tags aren't collected.

Parting thoughts

All in all, I'm satisfied with the move. I got a lot of help from the Nikola community, and my main requirements are fulfilled.

There are a few remaining troubles.


I signed up with Disqus, and think I have initiated the process of importing my old comments, which I was reluctant to lose. It takes an unspecified amount of time to complete, so I'm hoping that will take care of itself now, but I'm uncertain.

Orphaned rst

Because of the way I built my Blogger site (writing in reStructuredText, converting to HTML, and uploading), I still have the original, pristine rst files on my local system, but Nikola doesn't use them. It uses the backported HTML from Blogger. Injecting the old rst files into Nikola, however, would require manually editing them all to include the correct headers and timestamps. This seems like a lot of work, and I'm not willing to do it right now.

Moving my site off of Blogger

I was having two problems with the setup of this website. First, I wanted to factor out the experiment in vulnerability and transparency that I have been doing by keeping a log of my daily goals, progress, and insights at Hacker School. I like the experiment, but I wanted it to be separate from the articles I write more deliberately.

The second problem was that I wanted to migrate from Blogger.

Blogger has advantages. It is easy to set up, and freely hosted. It has themes, comments, and a variety of plugins. You can export your stuff if you want to, so you aren't completely locked in.

However, Blogger is not a good match for someone who wants fine-grained control over her content. My use of Blogger for the three sites I have hosted on it has consisted of the following elaborate dance:

  1. Edit my post in reStructuredText 1.

  2. Convert my post into HTML using a custom script I adapted from rst2blogger 2.

  3. Cut and paste my post into the Blogger compose form, click Preview, and see if it looks OK.

  4. Repeat until all typos and other issues are resolved.

  5. Click Publish.

The result of all this work was a site that looked more-or-less how I wanted it to in some ways, but was frustrating in others. For example,

  • I couldn't change the CSS styles that went with a given theme (and some of them were really dysfunctional).

  • I couldn't make the site not a blog. The blog assumption is that your most recent content ought to be your most prominent, and this is not an appropriate assumption for some of my use cases. I sometimes found myself putting off making a post that was less compelling until I knew I could follow it with a better one quickly! The restrictiveness wasn't serving me.

Also, a little independence from Google seems healthy.

What I really wanted was a static site, with no dynamic logic and no server-side database. I wanted my whole site complete and rendered on my local machine to do with what I liked. In other words, I wanted a static site generator.

Choosing a static site generator

My criteria were:

  • Open source

  • Python

  • Works to generate non-blog sites.

This gave me two choices that I knew of: Pelican and Nikola.

I have had my eyes on Pelican for some time now. I didn't like the way the resulting websites looked, though, until recently. They had a jarring "I'm a programmer, not a designer" feel. Being a programmer and not a designer, I can't articulate it much more precisely than that. These days, the sites look fine to me. The docs are well written, the code looks good, and people whose opinions about such things I respect use it.

In the meantime, I also found out about Nikola, which was recommended by another respected coder-friend.

I decided to use Nikola, in part because it has a plugin to import from Blogger, and in part because the above-mentioned friend offered to help me.

In the next post, I'll describe how I ported my Blogger blog into a Nikola blog.


I used to write in pure HTML, but after much goading from Zooko, I switched to rst. I'm glad; I find rst more flexible.


I didn't want the script to deploy the post for me, because I didn't want to trust it with my Google authentication, so I took that part out. I also changed some heading styles that Blogger doesn't render well.