Notes on Fragment Grammars

Last week I read Fragment Grammars: Exploring Computation and Reuse in Language by Timothy J. O'Donnell, Joshua B. Tenenbaum, and Noah D. Goodman.

As I mentioned in my journal, the authors of this tech report promise to generalise adaptor grammars (a variety of PCFG that uses a Pitman-Yor process to adapt its probabilities based on context) by using a heterogeneous lexicon (one that is not married to some prescriptivist notion of linguistic category such as word or morpheme, and thus can include items smaller or larger than words). The "lexicon" is chosen to optimise between storing a relatively small number of tiny units which require a lot of computation to structure together vs. storing a large number of long strings which cover large stretches of text, but aren't very flexible or general. In other words, it's a tradeoff between compression and computation.

Here are my impressions on first reading.

What I really love about this tech report is that it unpacks a lot of typically presumed knowledge right inside it.

So if you didn't know about PCFGs or memoization, or Chinese restaurant processes, or non-parametric Bayesian methods before, you can get a lot of what you need to know right there. Of course, the reason a typical conference or journal paper doesn't include such thorough background is simply that there isn't the space for it. Moreover, one can usually assume that the audience has the appropriate background, or knows how to acquire it. Nonetheless, I find it a great pleasure to read something that assumes an educated audience that isn't intimidated by statistical models or equations, but might not know every niche term involved in such a specialised task.

Here are some ways in which reading this paper helped me to grok non-parametric Bayesian techniques.

I had never thought of LDA and related algorithms as stochastic memoisation, which is how they are described here.

"A stochastic memoizer wraps a stochastic procedure [i.e a sampler] in another distribution, called the memoization distribution, which tells us when to reuse one of the previously computed values, and when to compute a fresh value from the underlying procedure. To accomplish this, we generalize the notion of a memotable so that it stores a distribution for each procedure–plus–arguments combination."

I like this description because it is immediately understandable to someone who has used dynamic programming. We know the value of limiting recomputation (and, again, if you don't, the classic Fibonacci example is right in the paper!), and now we see this generalised to probabilistically either using the cached value or resampling. As the authors explain:

"If we wrap such a random procedure in a deterministic memoizer, then it will sample a value the first time it is applied to some arguments, but forever after, it will return the same value by virtue of memoization. It is natural to consider making the notion of memoization itself stochastic, so that sometimes the memoizer returns a value computed earlier, and sometimes it computes a fresh value."

I have seen several different presentations of LDA, and not once was it described in this intuitive way.
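To make the quoted idea concrete, here is a minimal sketch of a stochastic memoizer. It assumes a Chinese-restaurant-style rule for the memoization distribution: after n earlier values, a fresh value is computed with probability alpha / (n + alpha), and otherwise an earlier value is reused in proportion to how often it has already been returned. The names `crp_memoize`, `alpha` and `rng` are my own for illustration and are not taken from the paper.

```python
import random


def crp_memoize(sampler, alpha=1.0, rng=random):
    """Wrap `sampler` so repeated calls tend to reuse earlier values.

    With n values returned so far for these arguments, a fresh value is
    drawn with probability alpha / (n + alpha); otherwise one of the n
    earlier returns is reused, chosen uniformly, so values that were
    returned often are reused often. `alpha` must be positive.
    """
    history = {}  # arguments -> list of every value returned so far

    def memoized(*args):
        past = history.setdefault(args, [])
        if rng.random() < alpha / (len(past) + alpha):
            value = sampler(*args)      # compute a fresh value
        else:
            value = rng.choice(past)    # reuse, in proportion to past counts
        past.append(value)
        return value

    return memoized


# Usage: a die whose rolls are stochastically memoized.
roll = crp_memoize(lambda: random.randint(1, 6), alpha=1.0)
rolls = [roll() for _ in range(20)]
```

A deterministic memoizer is the limiting case in which a fresh value is computed only on the first call; a larger `alpha` makes fresh computation more likely.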

Further, we can see the Chinese Restaurant Process, which is biased to sample what has been sampled before, as a "simplicity bias":

"all else being equal, when we use the CRP as a stochastic memoizer we favor reuse of previously computed values."

An assumption that Gibbs sampling relies on was made clear to me in the explanation of exchangeability.

"Intuitively, exchangeability says that the order in which we observed some data will not make a difference to our inferences about it. Exchangeability is an important property in Bayesian statistics, and our inference algorithms below will rely on it crucially. It is also a desirable property in cognitive models."

"Pitman-Yor processes, multinomial-Dirichlet distributions, and beta-Binomial distributions are all exchangeable, which means that we are free to treat any expression e(i) ∈ E as if it were the last expression sampled during the creation of E. Our sampling algorithm leverages this fact by (re-)sampling each p(i) ∈ P for each expression in turn."

Even though I knew that exchangeability was necessary for taking products, that is, that permutations don't affect the joint distribution, I hadn't thought about the way this frees us in our sampling order. If we wanted to add some kind of recency effects to our models, order would, of course, become important.

The real meat of the paper, though, is in describing Fragment Grammars as contrasted with Adaptor Grammars.

This will likely be the topic of the next post.

Two inotify Pitfalls

All I wanted was to track some files.

I have a set of directories populated with files that are hardlinked from other places. This serves as a kind of database for associating tags to files. I wanted to write a daemon that notifies me of changes to the files that might affect the consistency of the database. For example, if the file is removed, I want my database to let go of its reference.

The Linux kernel has a set of system calls for this called inotify, and Twisted has a module to support that API. Not only that, but the module documentation has everything you need.

from twisted.internet import inotify
from twisted.python import filepath

def notify(ignored, filepath, mask):
    """
    For historical reasons, an opaque handle is passed as first
    parameter. This object should never be used.

    @param filepath: FilePath on which the event happened.
    @param mask: inotify event as hexadecimal masks
    """
    print("event %s on %s" % (
        ', '.join(inotify.humanReadableMask(mask)), filepath))

notifier = inotify.INotify()
notifier.startReading()
notifier.watch(filepath.FilePath("/some/directory"), callbacks=[notify])

Pitfall #1

Great! I copy the code, change the path from "/some/directory" to "/home/amber/somedir", and start it up.

  • I make a file, "f", and am notified of its creation, and some attribute change.
$ touch f
event create on FilePath('/home/amber/somedir/f')
event attrib on FilePath('/home/amber/somedir/f')
  • I hard link it with another name
$ ln f h
event create on FilePath('/home/amber/subdir/h')

Ok. It logged the creation of h for me, but hasn't this changed the links attribute of f? Why wasn't I notified of that?

  • I try modifying it.
$ echo hello >> h
event modify on FilePath('/home/amber/subdir/h')

I'm further nonplussed. This command should have modified both f and h, but I was only notified of the one used in the command.

Finally I try what I really want.

  • I make a hard link to f from outside of "somedir/", and modify it through there.

$ ln f ../h
$ echo hello >> ../h
(no response)

What's going on?

Inotify takes a pathname. If the pathname is a directory, then it watches the directory, but this is not the same as watching each file in the directory.

Pitfall #2

Glad to have figured out my error, I try again, modifying the pathname argument in the daemon to "somedir/f". I remove all those files, touch f, and start the daemon again. This time it does what I want.

$ ln f h
event attrib on FilePath('/home/amber/subdir/f')
$ ln f h
$ echo hello >> h
event attrib on FilePath('/home/amber/subdir/f')
event modify on FilePath('/home/amber/subdir/f')
$ ln f ../h
$ echo hello >> ../h
event attrib on FilePath('/home/amber/subdir/f')
event modify on FilePath('/home/amber/subdir/f')

But wait!

I was about to call it good, when I decided to try modifying the file with vim or emacs. I deleted all those files again, touched f, and this time modified it with vim. On saving the file, I get this:

event move_self on FilePath('/home/amber/subdir/f')
event attrib on FilePath('/home/amber/subdir/f')
event delete_self on FilePath('/home/amber/subdir/f')

What's going on?

It turns out that vim and emacs, and who knows what else, have a trick to save backups while in use.

To see what happens, I edited the daemon to watch the directory again, and also to print some stats about the files:

from twisted.internet import inotify, reactor
from twisted.python import filepath
import os

def notify(ignored, filepath, mask):
    """
    For historical reasons, an opaque handle is passed as first
    parameter. This object should never be used.

    @param filepath: FilePath on which the event happened.
    @param mask: inotify event as hexadecimal masks
    """
    print("event %s on %s" % (', '.join(inotify.humanReadableMask(mask)), filepath))
    # After every event, list each file in the watched directory
    # together with its mode and inode number.
    for f in os.listdir(fp):
        stat = os.stat(os.path.join(fp, f))
        print(f, "mode:", stat.st_mode, "inode:", stat.st_ino)

fp = "/home/amber/subdir"
notifier = inotify.INotify()
notifier.startReading()
notifier.watch(filepath.FilePath(fp), callbacks=[notify])
reactor.run()

Now I run it and open f with vi.

$ vi f
event create on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event create on FilePath('/home/amber/subdir/.f.swpx')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event delete on FilePath('/home/amber/subdir/.f.swpx')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event delete on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event create on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event attrib on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363


event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363

(modify manually)

event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363

(save manually)

event create on FilePath('/home/amber/subdir/4913')
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event attrib on FilePath('/home/amber/subdir/4913')
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event delete on FilePath('/home/amber/subdir/4913')
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event moved_from on FilePath('/home/amber/subdir/f')
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event moved_to on FilePath('/home/amber/subdir/f~')
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event create on FilePath('/home/amber/subdir/f')
f mode: 33204 inode: 2307371
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event modify on FilePath('/home/amber/subdir/f')
f mode: 33204 inode: 2307371
f~ mode: 33204 inode: 2307370
.f.swp mode: 33188 inode: 2307363
event attrib on FilePath('/home/amber/subdir/f')
f mode: 33204 inode: 2307371
.f.swp mode: 33188 inode: 2307363
event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307371
.f.swp mode: 33188 inode: 2307363
event delete on FilePath('/home/amber/subdir/f~')
f mode: 33204 inode: 2307371
.f.swp mode: 33188 inode: 2307363


event modify on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307371
.f.swp mode: 33188 inode: 2307363
event delete on FilePath('/home/amber/subdir/.f.swp')
f mode: 33204 inode: 2307371

As far as I can tell, the result is as if they have renamed f to something else like f~, copied the contents of f~ to a new file named f, modified f, and finally deleted f~. This is essentially copy-on-write.

But inotify, while taking pathnames as arguments and returning pathnames, actually tracks inodes. So simply using an editor has the effect of moving the file to a different inode and thereby breaks inotify!

This is ultimately a consequence of using aliases to files (pathnames) as if they were canonical references to files (inodes).
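The effect can be reproduced without any editor. The sketch below imitates the rename-then-recreate sequence from the log above using only the standard library, and compares the inode behind the pathname before and after. The helper names `inode_of` and `save_by_rename` are hypothetical, and the sequence is a simplification of what the log shows, not a description of vim's internals.

```python
import os
import tempfile


def inode_of(path):
    """Return the inode number that `path` currently refers to."""
    return os.stat(path).st_ino


def save_by_rename(path, new_text):
    """Imitate the editor-style save: move the original aside, write a
    new file under the old name, then delete the backup."""
    backup = path + "~"
    os.rename(path, backup)            # the old inode is now named f~
    with open(path, "w") as handle:    # a brand-new inode is created at f
        handle.write(new_text)
    os.remove(backup)                  # last link gone: old inode is freed


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "f")
    with open(target, "w") as handle:
        handle.write("hello\n")

    before = inode_of(target)
    save_by_rename(target, "hello again\n")
    after = inode_of(target)

    print("same pathname, same inode?", before == after)
```

On a POSIX filesystem the two inode numbers differ, because the new file is created while the old inode still exists under the backup name. A watch attached to the old inode therefore sees a move and a delete, and nothing about the new file.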

Post Script: Lucky break?

As it happens, the behaviour of vim and emacs is different when the inode holding the file has more than one reference. I can prevent the inode from disappearing by making a hardlink to the file before opening it with an editor. The editor must have recognised that it can't move inodes willy-nilly when other pathnames depend on it. This maps exactly to my original scenario, and therefore might make it safe for me to use. On the other hand, my whole confidence in the behaviour is undermined, and I am reluctant to rely on it.
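I don't know how the editors actually make this decision, but the information they would need is easy to get: the link count of the inode. The sketch below, with the hypothetical helper `has_other_names`, shows that count changing when a hard link is added.

```python
import os
import tempfile


def has_other_names(path):
    """True when the inode behind `path` has more than one hard link,
    so replacing the inode would strand the other names."""
    return os.stat(path).st_nlink > 1


with tempfile.TemporaryDirectory() as workdir:
    original = os.path.join(workdir, "f")
    alias = os.path.join(workdir, "h")
    with open(original, "w") as handle:
        handle.write("hello\n")

    alone = has_other_names(original)     # only one name so far
    os.link(original, alias)              # same effect as `ln f h`
    shared = has_other_names(original)    # now two names share the inode
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
```

A daemon that depends on the inode staying put could make the same check before trusting a watch, rather than assuming the editor will.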

Pólya's Urn

Skip to interactive demo: Try it!

Balls in Urns

If you have studied probability, you are probably familiar with the canonical balls-in-an-urn allegory for understanding discrete probability distributions. For example, you could imagine an urn containing 1 red ball and 3 green balls. Drawing a ball from the urn at random represents sampling from a probability distribution where the probability of one outcome is \(25\%\) and the probability of the other outcome is \(75\%\). We can extend this idea in a variety of ways.

Pólya's Urn

In Pólya's Urn, the extension is that whenever you draw a ball from the urn, you not only replace it, but you add an extra ball of the same colour. So if you happened to draw a green ball in the example above, then the ratio would change from \(1:3\) to \(1:4\). That means on the next draw, you would now have only a \(20\%\) chance of drawing red. On the other hand, if you happened to have drawn red, then the ratio would change to \(2:3\), giving red a probability of \(40\%\).

This process is interesting, because it has the property that the more often you observe something, the more likely you are to observe it again.

Different starting conditions

The way the distribution changes over time depends on the starting conditions.

One of each

Let's imagine the simplest case, in which we start with one each of two colours, red and green. The following table shows the probabilities of getting red on the first three draws, and how each draw changes the probability of the next by changing the proportion of colours in the urn.

Draw   p(Draw)    new R:G
R      \(1/2\)    2:1
R      \(2/3\)    3:1
R      \(3/4\)    4:1

There are more ways to have drawn two of one colour, and one of the other, than 3 of one colour. However, because of the way drawing a particular colour reinforces itself, there is a \(50\%\) chance of drawing the same colour every time over the first three draws.

First three draws   probability
RRR                 \(1/2 \times 2/3 \times 3/4 = 1/4\)
RRG                 \(1/2 \times 2/3 \times 1/4 = 1/12\)
RGR                 \(1/2 \times 1/3 \times 1/2 = 1/12\)
RGG                 \(1/2 \times 1/3 \times 1/2 = 1/12\)
GRR                 \(1/2 \times 1/3 \times 1/2 = 1/12\)
GRG                 \(1/2 \times 1/3 \times 1/2 = 1/12\)
GGR                 \(1/2 \times 2/3 \times 1/4 = 1/12\)
GGG                 \(1/2 \times 2/3 \times 3/4 = 1/4\)
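The probability of each three-draw sequence can be checked by multiplying exact fractions while updating the urn after each draw. The sketch below does this for the one-of-each urn. The function name `sequence_probability` and its `increment` parameter (how many balls are added per draw) are my own naming for illustration.

```python
from fractions import Fraction
from itertools import product


def sequence_probability(draws, start, increment=1):
    """Exact probability of seeing `draws` in that order.

    `start` maps each colour to its initial ball count; after every
    draw, `increment` extra balls of the drawn colour are added.
    """
    urn = dict(start)
    probability = Fraction(1)
    for colour in draws:
        probability *= Fraction(urn[colour], sum(urn.values()))
        urn[colour] += increment
    return probability


# All eight sequences of three draws from an urn starting at R:G = 1:1.
table = {
    "".join(seq): sequence_probability(seq, {"R": 1, "G": 1})
    for seq in product("RG", repeat=3)
}
for name, p in table.items():
    print(name, p)
```

The eight values sum to 1, and any two sequences with the same number of reds and greens get the same probability whatever the order of the draws. The two single-colour sequences together account for half of the total.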

Ten of each

Now suppose that we start with 10 each of red and green balls. In this case, simply drawing a red ball the first time does not change the probability that it will be drawn again nearly as significantly as with the \(1:1\) starting conditions. The probability of drawing 3 of the same colour in a row falls to \(2 \times 10/20 \times 11/21 \times 12/22 = 2/7 \approx 29\%\).

We can view the starting conditions as a list of numbers, one for each starting colour, and call it alpha (\(\alpha\)). So our first example had \(\alpha = [1, 3]\), our second example had \(\alpha = [1, 1]\), and our third example had \(\alpha = [10, 10]\).

Higher returns

On the other hand, imagine if we started with 1 each of red and green, but instead of increasing the number of balls by 1 when we draw a colour, we increased it by 10. Now every draw has a much stronger effect. The probability of drawing the same colour 3 times in a row would now be \(2 \times 1/2 \times 11/12 \times 21/22 = 7/8 \approx 88\%\).

We could even have a particular increase number for each colour, and have another list, called beta (\(\beta\)).

More colours

Another way to change the starting conditions is to increase the number of colours. If our starting urn had one each of 10 different colours, then, again, when we draw the first ball, it has much less of an effect on the chance of drawing it again. We can call the number of colours \(n\).
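The three knobs described so far (\(n\) colours, a starting count \(\alpha\), and an increase \(\beta\)) are enough to simulate the urn. This is a minimal sketch that assumes a single \(\alpha\) and \(\beta\) shared by all colours. The function name `polya_urn` and the `seed` argument are illustrative choices of mine.

```python
import random
from collections import Counter


def polya_urn(n_colours, alpha, beta, n_draws, seed=None):
    """Simulate an urn with `n_colours` colours, `alpha` balls of each
    colour to start, and `beta` extra balls added after each draw.
    Returns the drawn colours and the final urn contents."""
    rng = random.Random(seed)
    urn = Counter({colour: alpha for colour in range(n_colours)})
    drawn = []
    for _ in range(n_draws):
        colours = list(urn)
        # Pick a colour with probability proportional to its ball count.
        colour = rng.choices(colours, weights=[urn[c] for c in colours])[0]
        urn[colour] += beta   # the ball goes back, plus `beta` more like it
        drawn.append(colour)
    return drawn, urn


drawn, urn = polya_urn(n_colours=3, alpha=1, beta=1, n_draws=100, seed=0)
print(urn.most_common())   # urn contents in rank order
```

Running it with different seeds shows how much the final proportions depend on the first few draws when \(\alpha\) is small relative to \(\beta\).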

Try it!

Use the sliders to choose \(n\) colours, and a single \(\alpha\) and \(\beta\) for all colours. Try drawing balls from the urn, and see how the urn changes. At any time you can display the urn in rank order or reset to the current slider position.



The Grammaticality Continuum

Yesterday I was thinking about implementing Latent Dirichlet Allocation (LDA). LDA is used for topic modelling — inducing a set of topics, such that a set of natural language documents can be represented by a mixture of those topics. This is then used to estimate document similarity, and related information retrieval tasks.

The first step in such a project is to tokenise — to break up the text into words, removing attached punctuation, and regularising things like capitalisation. When looking at the words in a document for the purposes of topic modelling, it seems appropriate to merge word forms with the same root, or stem, instead of having each form of the "same" word represented individually. The canonical way to tokenise for topic modelling involves stemming, and it also involves removing stop words — words like "the", and "and" that are more syntactic than semantic.

I am not entirely convinced that this latter is appropriate. The reason is that the grammaticality of words exists on a continuum. Even the word "the" carries semantic weight, though its main function is probably to signal the boundaries of syntactic chunks.
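A small sketch of the tokenising pipeline described above shows what is lost. It lower-cases, splits into words, optionally drops stop words, and optionally stems. The stop list is a tiny illustrative one, and `crude_stem` is a toy suffix stripper standing in for a real stemmer, not an implementation of any published algorithm. The two test phrases are my own examples.

```python
import re

# A tiny illustrative stop list; real lists are much longer.
STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in", "is"}


def crude_stem(word):
    """Strip a few common English suffixes (a toy stand-in for a
    real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def tokenise(text, remove_stop_words=True, stem=True):
    """Lower-case the text, split it into alphabetic words, and
    optionally drop stop words and reduce each word to a crude stem."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    if stem:
        words = [crude_stem(w) for w in words]
    return words


first = "Stop the running water"
second = "Stop running the water"
print(tokenise(first) == tokenise(second))
print(tokenise(first, remove_stop_words=False)
      == tokenise(second, remove_stop_words=False))
```

With stop words removed the two phrases produce identical token lists, so the position of "the", which is the only thing separating the two readings, is no longer available to the model. With stop words kept, the lists differ.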

My favourite example of the syntactic function of "the" comes from Petr Beckmann's book The Structure of Language: A New Approach, which has profoundly influenced me since my undergraduate days. In it he shows how the following headline is disambiguated by the placement of "the" before or after "biting":

"Man Refuses to Give up Biting Dog"

A couple of years ago at the NAACL conference, there was a session where a few prominent computational linguists presented their favourite papers from the past. Eugene Charniak presented Kenneth Church's 2000 COLING paper: Empirical Estimates of Adaptation: The chance of Two Noriegas is closer to \(p/2\) than \(p^2\). It introduced a measure of adaptation for language models based on how much a recent occurrence of a word increases its tendency to occur beyond what is already expected.

Charniak used this paper as a background with which to present a new idea about the way the prior likelihood of a word predicts its future occurrences. He divided words into sets according to how well their priors predicted them. Stop words were most likely to be true to priors, and content words least, with verbs at the end of the spectrum.

At the time I took this as evidence for the stratification of grammaticality. Because of this stratification, treating stop words as a special set appears arbitrary and oversimplified. I expect that leaving stop words in a topic model would simply result in having some topics that are distributed more evenly throughout the corpus of documents. These topics would discriminate among documents poorly. However, this result should also be part of a continuum. It may be useful to account for the distribution of topics when using them in LDA, in a way analogous to inverse document frequency in latent semantic analysis.
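For reference, the word-level quantity that this analogy borrows from can be sketched as follows: the inverse document frequency of a word is taken here as log(N / df), where N is the number of documents and df the number of documents containing the word. Applying the same weighting to topics is only the suggestion made above and is not implemented here. The toy corpus and the function name are illustrative.

```python
import math
from collections import Counter


def inverse_document_frequency(documents):
    """Map each word to log(N / df), where N is the number of documents
    and df is the number of documents containing the word."""
    n_documents = len(documents)
    document_frequency = Counter()
    for words in documents:
        document_frequency.update(set(words))   # count each word once per doc
    return {
        word: math.log(n_documents / df)
        for word, df in document_frequency.items()
    }


corpus = [
    "the urn holds the red ball".split(),
    "the daemon watches the file".split(),
    "the grammar reuses the fragment".split(),
]
idf = inverse_document_frequency(corpus)
```

A word that appears in every document, like "the" here, gets a weight of zero, while a word confined to one document gets the largest weight. A topic spread evenly over the corpus would be down-weighted in the same graded way rather than removed outright.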

More generally, I am interested in the phenomenon that words vary in their semantic and syntactic load. Even just within semantics, a morpheme may carry more than one meaning that cannot be decomposed linguistically. For example, "uncle" is male, and while we could assign "uncle" a set of semantic features in a computer system in order to reason about its meaning, those features are only implicit in English. In logographic writing systems this is all the more apparent.

This simultaneity of features in an apparently linear system is, to me, one of the most interesting aspects of language, and one of the reasons computational linguistics is difficult and rewarding.

Productively Lost

Yesterday Hacker School's resident Mel Chua shared her work on educational psychology theory for Hacker School [1]. I had seen an earlier iteration of this talk from PyCon video archives, and it was useful to me then. However, this time I had more relevant experience with which to understand it. Hacker School is the first time I have had such a fluid and indeterminate educational experience. Even graduate school was more structured, and with more fixed goals.

I have previously compared Hacker School to a game of Tetris, in which new exciting things are constantly dropping from the sky, and you can't get them all and fit them all into your life. Eventually you will lose, but it is fun to try, anyway. I like this analogy, but in some ways it is too passive. Hacker School (and life in general, if you let it) is more like a giant maze with more and more doors appearing all the time. Many paths connect to each other, and you may find yourself back where you were before, but from a new perspective. Here I can see more clearly than ever before the unboundedness of the space of learning, and this makes the idea of a best path through it almost laughable. That's not to say that there are no poor ways to learn. Only that there are many good ways.

One central message from Mel's talk was the idea of being Productively Lost. Being your own guide in an infinite maze makes being lost natural. The question is how to make the best of your learning given that situation.

Mel talked about using measurement to guide learning, in analogy with Test-Driven-Design. She talked about how to most effectively join an open source project so that you can maximise your interactions and contributions for everyone's benefit, and for your own development. There was also a section on motivation, self-efficacy, and attitudes.

She mentioned different learning styles, and followed up later in the day with a workshop on the topic. I found this enormously helpful, because instead of just coming out with a label, which I have done in the past with this kind of theory, I was able to see strategies that make better use of my strengths. By reviewing my experiences at Hacker School so far, and relating them to these axes, I feel I am in a better position to enhance my learning experiences deliberately.

Mel also talked about the progression of learning. Learning tends to follow a cyclical pattern of periods of assimilation of new ideas into an existing mental model followed by a paradigm shift that requires accommodation. Accommodation is needed when new ideas are fitting less well into the existing model, and an extensive refactorisation makes everything fit more naturally. This stage is slow and uncomfortable, and may even feel like a regression. After this, there is a shorter period during which learning new things with the new model is fast and rewarding, before reaching another steadier state of assimilation.

Even though I have taken my own way on some critical aspects of my life, much of my life is characterisable by following paths that were set by someone else, or were simply unexamined pursuit of "the way things are done". Applying to Hacker School in the first place was a big, intimidating step away from this pattern that stretched my courage. It rivals the most rewarding decisions of my life so far. The increased autonomy and competence I am developing here feels like a new freedom, a tipping point into a feedback loop of self-expression and creative action that goes way beyond any particular programming concept I have learned while here.

Becoming comfortable with this fundamental lostness, and yet feeling adequate to navigate it, is ultimately much more empowering than the security of excelling at following well-lit paths sanctioned and rewarded by others.


[1] Slides from last year's version here:

Addresses and Contents

What are we naming?

'The name of the song is called "Haddocks' Eyes."'

'Oh, that's the name of the song, is it?' Alice said, trying to feel interested.

'No, you don't understand,' the Knight said, looking a little vexed. 'That's what the name is called. The name really is "The Aged Aged Man."'

'Then I ought to have said "That's what the song is called"?' Alice corrected herself.

'No, you oughtn't: that's quite another thing! The song is called "Ways and Means": but that's only what it's called, you know!'

'Well, what is the song, then?' said Alice, who was by this time completely bewildered.

'I was coming to that,' the Knight said. 'The song really is "A-sitting On A Gate": and the tune's my own invention.'

Alice Through the Looking Glass

Mutability applies to content

I have been designing a system that seeks to change the way we name our data: instead of naming the hierarchical way, using directories (or folders), it will name the category way, using tags. Because the system needed an ID for every file it tagged, and perhaps for no other good reason than that I am fond of content-based hashing, I chose to use a content-based hash for the IDs. However, this ID is not a good match for mutable files. I had been putting off thinking about how to handle mutable files, figuring I could add that functionality later.

When I was finally ready to confront mutable files, I realised that I was running up against a fundamental issue:

People often conflate addresses with contents when naming things

Take for example the problem of citation. If you cite a book or a research paper, then what you are citing is the contents. When we first started wanting to cite URLs, we treated them as though we were addressing content, but we aren't. URLs point to content that can change.

Of course, this issue shows up in many areas of computer science whenever we use references.


In a filesystem, we usually refer to files by their paths. The relationships between paths, addresses (inodes), and contents are shown here:

That is, a pathname refers to a single inode, but not necessarily vice versa. An inode has exactly one set of contents, but those contents may be replicated in many inodes in the system. I didn't recognise at first that the problem of mutable and immutable files is the distinction between addresses and contents.

As far as file systems go, I don't know of any that make the distinction between contents and addresses, except Tahoe-LAFS. The reference to an immutable file in Tahoe is content based for integrity and deduplication. The reference to a mutable file just specifies a unique "slot".

I've decided to follow this paradigm in Protagonist. Both addresses and contents should be eligible for tagging. Tagging the box will use the inode. Tagging the contents will use the hash.
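The two kinds of ID can be sketched with the standard library. This is only an illustration of the distinction, not Protagonist's code: the choice of SHA-256 for the hash, the pairing of the inode with its device number, and the names `content_id` and `address_id` are all assumptions of mine.

```python
import hashlib
import os
import tempfile


def content_id(path):
    """Identify the contents: a SHA-256 digest of the file's bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def address_id(path):
    """Identify the box: the (device, inode) pair behind the pathname."""
    info = os.stat(path)
    return (info.st_dev, info.st_ino)


with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "note.txt")
    with open(path, "w") as handle:
        handle.write("first draft\n")
    box_before, hash_before = address_id(path), content_id(path)

    with open(path, "a") as handle:      # edit in place: same box
        handle.write("second thoughts\n")
    box_after, hash_after = address_id(path), content_id(path)
```

After the in-place edit the address ID is unchanged while the content ID is different, so a tag attached to the box follows the file through edits and a tag attached to the contents stays with one exact version.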

How I transformed a Blogger blog to a Nikola two-blog website

In the last post I described why I wanted to move my blog, and what went into my choices. My goal in exporting my site from Blogger to Nikola was to separate my content into two separate blogs on a single site: one which is the primary blog, and the other which resides in the background. I wanted them to be archived separately. I also wanted there to be a front page that didn't change every time I wrote a new post.

Importing the blog

Nikola has a plugin to import a blog from Blogger, so the first step was to import the blog. To import, I took the following steps:

  • First, I exported my blog from Blogger, which gave me a file called blog-07-05-2014.xml. Then,
$ sudo pip install nikola
$ nikola plugin -i import_blogger
$ nikola import_blogger -o site blog-07-05-2014.xml # The -o tells nikola what directory to make the site in.

After this, I had some cleanup to do. For one thing, Nikola renamed all the Blogger files (in a sensible way). Blogger has date-derived subdirectories for posts, whereas in Nikola all the posts are kept in one folder, called "posts". So when the new site is up, redirects will be required for anyone with the old link.

The import gave me a file called url_map.csv, which contained all the information necessary to redirect the old links to their new locations, but as far as I could tell, those redirects still had to be encoded into the configuration file to take effect. Since I only had a few, I did this manually. For every line in the url_map, I inserted a tuple into the REDIRECTIONS list in the configuration file.
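With more than a handful of posts, the manual step could be scripted. The sketch below is hedged heavily: it assumes each row of url_map.csv holds the old URL followed by the new URL with no header row, and that the "from" side of each redirect should be a path relative to the site root. Neither assumption is checked against the importer's actual output, the sample row is invented, and `redirections_from_rows` is a hypothetical helper.

```python
import csv
import io
from urllib.parse import urlparse


def redirections_from_rows(rows):
    """Turn (old_url, new_url) pairs into (from, to) tuples.

    The "from" side is made relative by dropping the scheme, host and
    leading slash; the "to" side is kept as a site-absolute path.
    """
    pairs = []
    for old_url, new_url in rows:
        old_path = urlparse(old_url).path.lstrip("/")
        new_path = "/" + urlparse(new_url).path.lstrip("/")
        pairs.append((old_path, new_path))
    return pairs


# Hypothetical two-column sample standing in for url_map.csv.
sample = io.StringIO(
    "http://example.com/2014/05/old-post.html,"
    "http://example.com/posts/old-post.html\n"
)
REDIRECTIONS = redirections_from_rows(csv.reader(sample))
print(REDIRECTIONS)
```

For a real site, the `io.StringIO` sample would be replaced by the opened url_map.csv, and the printed list checked by hand before pasting it into the configuration.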

The result at this point was a Nikola blog that contained everything the old blog did.

Making it not a blog

There is a document in the Nikola project describing how to make a non-blog site. The instructions boil down to changing three lines of the configuration file.

Posts and Pages

As usual, Nikola distinguishes two different kinds of text-derived files.

  • Posts are blog files. They are archived, indexed by tag, and ordered by date for display.
  • Pages are essentially independent.

So to make a blog into a non-blog, you simply manipulate the variable POSTS defined in the configuration file. POSTS and PAGES are lists describing where to find posts or pages respectively, where to render them, and what template to use for them. We let the POSTS list be empty, so everything on the site will be a page.

POSTS = []
PAGES = [
        ("pages/*.rst", "", "story.tmpl"),
        ("pages/*.txt", "", "story.tmpl"),
]

The two entries for PAGES are here to allow either txt or rst, but the one that comes first is what will be used when you use the new_post command.

So you can create a page called "index.html" by running new_post -p, and giving it the name index.

$ nikola new_post -p


Creating New Page

Title: index

INFO: new_page: Your page's text is at: pages/index.rst

Since we put it at "", which is the top level of the site, it will be what you see on the "front page".

Unfortunately, this creates a conflict, because when you build the site, the blog part is already wired to make an index.html file in the top-level directory, since that's how blogs normally appear. So you intercept this by adding the line:

INDEX_PATH = "blog"

This just makes it so the blog index is now created under the folder blog, instead of the top level, and it no longer conflicts.

It also means that now you can have a regular blog under the subdirectory blog, by putting options back in for POSTS:

POSTS = [("posts/*.rst", "blog", "post.tmpl")]

But this is not enough for us, because we have imported posts from Blogger that were also being found from the POSTS list. There is another entry in POSTS that we need back, that tells Nikola to also collect and render existing html files, such as those we imported. So we need to add:

("posts/*.html", "blog", "post.tmpl")

back into the list.

Two blogs

At this point I had part of what I wanted. I had a front page separate from my blog, and all of my previous blog reachable from the site. But I still wanted to have two blogs, a primary one for themed entries, and a journal for unstructured reflections.

Here are the steps I took to factor the blogs apart:

  • Made a new directory for the journal posts called "journal", and moved the appropriate files into it.
  • Added ("journal/*.rst", "journal", "story.tmpl") and ("journal/*.html", "journal", "story.tmpl") to the PAGES list, so old and new journal entries can be found for rendering.
  • Updated the REDIRECTIONS to reflect those moves.

This worked to render them and include them in the site, but the journal articles were not indexed. That meant that if I knew the URL, I could go to the article, but a visitor to the site could never discover them.

To add indexing I had to add to


But again, this created a conflict with multiple files called "index.html" trying to go in the same folder. So I also had to change the name of the index. I chose "index.htm", so that the server would automatically redirect.

INDEX_FILE = "index.htm"

Finally, I wanted the journal to be findable without knowing the directory name "journal", so I updated the navigation links:

     ("/archive.html", "Archive"),
     ("/categories/", "Tags"),
     ("/blog/", "Blog"),
     ("/journal/", "Journal"),
     ("/rss.xml", "RSS feed"),
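In context, these tuples sit inside the NAVIGATION_LINKS setting, which maps a language code to a tuple of (URL, label) pairs. The sketch below assumes a single-language site with DEFAULT_LANG set to "en"; that value is an assumption, so adjust it to match your own conf.py.

```python
DEFAULT_LANG = "en"  # assumption: a single-language English site

# Links shown in the site's navigation bar, per language.
NAVIGATION_LINKS = {
    DEFAULT_LANG: (
        ("/archive.html", "Archive"),
        ("/categories/", "Tags"),
        ("/blog/", "Blog"),
        ("/journal/", "Journal"),
        ("/rss.xml", "RSS feed"),
    ),
}
```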

As an extra configuration tweak, I set TAG_PAGES_ARE_INDEXES = True, so that when you go to the page for a given tag, it renders the posts themselves, rather than a list.

I would like to do that for my journal index as well, but that feature is not yet general, so if you navigate to "Journal" you will get a list of posts, and unfortunately, since it is a journal, they are named by date. Moreover, their tags aren't collected.

Parting thoughts

All in all, I'm satisfied with the move. I got a lot of help from the Nikola community, and my main requirements are fulfilled.

There are a few remaining troubles.


I signed up with Disqus, and think I have initiated the process of importing my old comments, which I was reluctant to lose. It takes an unspecified amount of time to complete, so I'm hoping that will take care of itself now, but I'm uncertain.

Orphaned rst

Because of the way I built my Blogger site (writing in reStructuredText, converting to HTML, and uploading), I still have the original, pristine rst files on my local system, but Nikola doesn't use them. It uses the backported HTML from Blogger. Injecting the old rst files into Nikola, however, would require manually editing them all to include the correct headers and timestamps. This seems like a lot of work, and I'm not willing to do it right now.
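A hypothetical sketch of how that editing could be scripted rather than done by hand. It assumes that Nikola's reStructuredText posts begin with comment-style metadata lines (".. title:", ".. slug:", ".. date:"), and that the title and original timestamp of each post can be looked up somewhere, for example in the Blogger export. The function name, its arguments, and the example values are made up for illustration.

```python
def add_nikola_header(rst_text, title, slug, date):
    """Return rst_text with Nikola-style metadata comments prepended.

    Hypothetical helper: title, slug and date must come from elsewhere,
    for example from the Blogger export of the same post.
    """
    header = "\n".join([
        ".. title: " + title,
        ".. slug: " + slug,
        ".. date: " + date,
    ])
    # A blank line separates the metadata block from the post body.
    return header + "\n\n" + rst_text


# Example values only; a real run would loop over the old rst files.
post = add_nikola_header(
    "Some body text.\n",
    title="An example post",
    slug="an-example-post",
    date="2014-03-01 12:00:00 UTC",
)
print(post)
```

The hard part is not the prepending but recovering the right title and timestamp for each file, which this sketch leaves open.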

Moving my site off of Blogger

I was having two problems with the setup of this website. First, I wanted to factor out the experiment in vulnerability and transparency that I have been doing by keeping a log of my daily goals, progress, and insights at Hacker School. I like the experiment, but I wanted it to be separate from articles I write more deliberately.

The second problem was that I wanted to migrate from Blogger.

Blogger has advantages. It is easy to set up, and freely hosted. It has themes, comments, and a variety of plugins. You can export your stuff if you want to, so you aren't completely locked in.

However, Blogger is not a good match for someone who wants fine-grained control over her content. My use of Blogger for the three sites I have hosted on it has consisted of the following elaborate dance:

  1. Edit my post in reStructuredText [1].
  2. Convert my post into HTML using a custom script I adapted from rst2blogger [2].
  3. Cut and paste my post into the Blogger compose form, click Preview, and see if it looks ok.
  4. Repeat until all typos and other issues are resolved.
  5. Click Publish.

The result of all this work was a site that looked more-or-less how I wanted it to in some ways, but was frustrating in others. For example,

  • I couldn't change the CSS styles that went with a given theme (and some of them were really dysfunctional).
  • I couldn't make the site not a blog — the blog assumption is that your most recent content ought to be your most prominent, and this is not an appropriate assumption for some of my use cases. I sometimes found myself putting off making a post that was less compelling until I knew I could follow it with a better one quickly! The restrictiveness wasn't serving me.

Also, a little independence from Google seems healthy.

What I really wanted was a static site, with no dynamic logic and no server-side database. I wanted my whole site complete and rendered on my local machine to do with what I liked. In other words, I wanted a static site generator.

Choosing a static site generator

My criteria were:

  • Open source
  • Python
  • Works to generate non-blog sites.

This gave me two choices that I knew of: Pelican, and Nikola.

I have had my eyes on Pelican for some time now. I didn't like the way the resulting websites looked, though, until recently. They had a jarring "I'm a programmer, not a designer" feel. Being a programmer and not a designer, I can't articulate it much more precisely than that. These days, the sites look fine to me. The docs are well written, the code looks good, and people whose opinions about such things I respect use it.

In the meantime, I also found out about Nikola, which was recommended by another respected coder-friend.

I decided to use Nikola, in part because it has a plugin to import from Blogger, and in part because the above-mentioned friend offered to help me.

In the next post, I'll describe how I ported my Blogger blog into a Nikola blog.

[1] I used to write in pure HTML, but after much goading from Zooko, I switched to rst. I'm glad; I find rst more flexible.
[2] I didn't want the script to deploy it, because I didn't want to trust the script with my Google authentication, so I took that part out. I also changed some heading styles that Blogger doesn't render well.

Literally ironic semantic shifts are actually very typical

I recently saw a story about a whimsical browser plug-in written by Mike Lazer-Walker (incidentally, a Hacker Schooler) that substitutes all occurrences of the word 'literally' with the word 'figuratively'. I posted it with (almost) no comment on Facebook, to which the only response was from a Waterloo friend, who actually found the substitution annoying, because, he argued, 'figuratively' is not what is usually meant.

Of course, he is right about that. 'Literally', 'very', 'really', 'truly', and in some cases 'actually' [1] share a common semantic shift. They all originally meant something like 'do not take the following as exaggeration or metaphor' (insofar as any language use can be non-metaphorical), and have all become intensifiers. This is, of course, somewhat ironic (in the situational sense of 'ironic', which is neither the literary sense, nor the emerging sense of 'not what I wanted' [2]).

I don't cringe when I hear 'very', except insofar as it is often vacuous. That I do cringe on hearing 'literally' used as an intensifier is probably because, unlike 'very', the current use of which is very old, this meaning of 'literally' emerged in my adulthood, and so it sounds unnatural to me. When wearing my linguist hat, I like to embrace diversity and change, rather than taking the stance that my dialect is 'correct'. The strongest stance I can fairly take is that in my dialect, the new meaning is unacceptable, or that using this meaning has certain social implications about the speaker.

But rather than taking the plug-in to be suggesting an actual solution, I take it in the spirit of Mark Twain's advice: "Substitute 'damn' every time you're inclined to write 'very;' your editor will delete it and the writing will be just as it should be." Once the reader or writer sees 'figuratively', the appropriate transformation should be more readily obvious.

More likely, it is simply an elaborate joke made by a clever person while exploring how to write plug-ins. In any case, I liked the idea, both for its wit, and for the other fun linguistics ideas it points to, including, for example, tranzizzle (hat tip to Noah Smizzle).

The question I am left with is, given this development, what word do I use now if I want the literal meaning of literal?

[1] 'Actually' is often used to make a contrast.
[2] See this comic:

Transparent Learning

I have applied to Hacker School for the Summer 2014 batch. I'm immensely excited about it for various reasons, but the one I wanted to mention here now is the attitude toward learning that Hacker School promotes.

One of the few Rules of Conduct at Hacker School is not to feign surprise when someone doesn't know something.

[No feigning surprise] means you shouldn't act surprised when people say they don't know something. This applies to both technical things ("What?! I can't believe you don't know what the stack is!") and non-technical things ("You don't know who RMS is?!"). Feigning surprise has absolutely no social or educational benefit: When people feign surprise, it's usually to make them feel better about themselves and others feel worse. And even when that's not the intention, it's almost always the effect. As you've probably already guessed, this rule is tightly coupled to our belief in the importance of people feeling comfortable saying "I don't know" and "I don't understand."

The other side of the coin of not feigning surprise is being transparent about your own learning.

When I meet smart people I admire, I am usually eager to show them that I am like them. So I look for opportunities to demonstrate that we share ideas and values. In a programming community, this could translate into finding a reason to bring up topics such as stacks or RMS. There is nothing inherently wrong with this. It is normal and healthy to establish common ground. It feels good.

The problem comes when you want to interact with someone as you continue to learn. If you can't comfortably say “I don't know what a stack is.”, then you deny yourself and your peers the opportunity to collaboratively change that. More importantly, your silence reinforces the idea that it is not okay to not know or not understand. It is more subtle than feigning surprise, but not necessarily less powerful.

In the process of applying to Hacker School, I've looked at the blogs of some of the facilitators. In her blog, Allison Kaptur takes this concept of transparency a step further. For example, in a post in which she teaches about the Python interpreter, Allison writes (emphasis mine):

There are four steps that python takes when you hit return: lexing, parsing, compiling, and interpreting. Lexing is breaking the line of code you just typed into tokens. The parser takes those tokens and generates a structure that shows their relationship to each other (in this case, an Abstract Syntax Tree). The compiler then takes the AST and turns it into one (or more) code objects. Finally, the interpreter takes each code object executes the code it represents.

I’m not going to talk about lexing, parsing, or compiling at all today, mainly because I don’t know anything about these steps yet. Instead, we’ll suppose that all that went just fine, and we’ll have a proper python code object for the interpreter to interpret.

To me, this feels like radical activism. Even if I don't get into Hacker School, I want to learn this skill of portraying myself authentically, even if it exposes some vulnerability. Fundamentally, it's about separating self-worth from knowledge, and getting over Imposter Syndrome.

As a step in this direction, I am posting the fact that I have applied to Hacker School, even though I may not be admitted. If that happens, I will be very disappointed, and it will be embarrassing for that to be public, but I want to say that it's okay to fail at things, and it's okay to make mistakes.