Markov chain text generator

Back in December, I was learning about Markov chains in my linear algebra class, and I read on Wikipedia that they could be used to generate real-looking text from a bunch of source texts. Since I wanted to procrastinate on my college applications anyway, I decided to make my own text generator!

How it works is that it takes a source document and records the probability that certain words appear after each word. For example, if the sentence is “I think that I will think of the will that I wrote,” it would record that “I” is followed by “think” a third of the time, “will” a third of the time, and “wrote” a third of the time. “Think” is followed by “that” and “of” half of the time each, and “that” is followed by “I” a hundred percent of the time, etc.

Visually:

Markov chain diagram

Once the program records the probability, it can generate realistic-sounding texts. It’ll start with the root node (I) and follow random paths until it reaches a node that doesn’t go anywhere. For example, a text it might generate is: “I will think of the will that I will think that I wrote.” Obviously, it doesn’t sound perfect, but if you feed it more source text, it usually produces some pretty hilarious text.

Here’s the demo. Paste some source text into the first box, click Add, and repeat. Then, click Generate, and it should produce some Markov chain text! You can keep clicking Generate for new text. I’ve prepopulated the source textbox with some text about lions from Wikipedia, but feel free to try it with something else!

Sadly, the generator wasn’t able to produce a usable essay for my college applications.


Here’s the HTML:

<textarea id="in" rows="15" cols="50"></textarea><br />
<button id="add">Add</button> <button id="generate" disabled="disabled">Generate</button><br />
<textarea id="out" rows="15" cols="50"></textarea>

And here’s the JavaScript to go with it:

// Returns the source text
function get() { return document.getElementById('in').value; }

// Clears the source textbox
function clear() { document.getElementById('in').value = ''; }

// Writes to the output textbox
function set(v) { document.getElementById('out').value = v; }

// Holds the state information
var cache = {
    '_START': []
};

document.getElementById('add').onclick = function () {
    // Get the source text and split it into words
    var text = get().split(/\s+/g);
    
    if (!text.length)
        return;

    document.getElementById('generate').disabled = false;

    // Add it to the start node's list of possibility
    cache['_START'].push(text[0]);
    
    // Now go through each word and add it to the previous word's node
    for (var i = 0; i < text.length - 1; i++) {
        if (!cache[text[i]])
            cache[text[i]] = [];
        cache[text[i]].push(text[i + 1]);
        
        // If it's the beginning of a sentence, add the next word to the start node too
        if (text[i].match(/\.$/))
            cache['_START'].push(text[i + 1]);
    }
    clear();
};

document.getElementById('generate').onclick = function () {
    // Start with the root node
    var currentWord = '_START';
    var str = '';
    
    // Generate 300 words of text
    for (var i = 0; i < 300; i++) {
        
        // Follow a random node, append it to the string, and move to that node
        var rand = Math.floor(Math.random() * cache[currentWord].length);
        str += cache[currentWord][rand];
        
        // No more nodes to follow? Start again. (Add a period to make things look better.)
        if (!cache[cache[currentWord][rand]]) {
            currentWord = '_START';
            if (!cache[currentWord][rand].match(/\.$/))
                str += '. ';
            else
                str += ' ';
        } else {
            currentWord = cache[currentWord][rand];
            str += ' ';
        }
    }
    set(str);
}

March 26, 2012, 10:45pm by Casey
Categories: Programming | Tags: , | Permalink | Leave a comment

Leave a Reply

Required fields are marked *