﻿ Markov chain text generator | Bits of Pancake

## Markov chain text generator

Back in December, I was learning about Markov chains in my linear algebra class, and I read on Wikipedia that they could be used to generate real-looking text from a bunch of source texts. Since I wanted to procrastinate on my college applications anyway, I decided to make my own text generator!

How it works is that it takes a source document and records the probability that certain words appear after each word. For example, if the sentence is “I think that I will think of the will that I wrote,” it would record that “I” is followed by “think” a third of the time, “will” a third of the time, and “wrote” a third of the time. “Think” is followed by “that” and “of” half of the time each, and “that” is followed by “I” a hundred percent of the time, etc.

Visually:

Once the program records the probability, it can generate realistic-sounding texts. It’ll start with the root node (I) and follow random paths until it reaches a node that doesn’t go anywhere. For example, a text it might generate is: “I will think of the will that I will think that I wrote.” Obviously, it doesn’t sound perfect, but if you feed it more source text, it usually produces some pretty hilarious text.

Here’s the demo. Paste some source text into the first box, click Add, and repeat. Then, click Generate, and it should produce some Markov chain text! You can keep clicking Generate for new text. I’ve prepopulated the source textbox with some text about lions from Wikipedia, but feel free to try it with something else!

Sadly, the generator wasn’t able to produce a usable essay for my college applications.

Here’s the HTML:

<textarea id="in" rows="15" cols="50"></textarea><br />
<textarea id="out" rows="15" cols="50"></textarea>

And here’s the JavaScript to go with it:

// Returns the source text
function get() { return document.getElementById('in').value; }

// Clears the source textbox
function clear() { document.getElementById('in').value = ''; }

// Writes to the output textbox
function set(v) { document.getElementById('out').value = v; }

// Holds the state information
var cache = {
'_START': []
};

// Get the source text and split it into words
var text = get().split(/\s+/g);

if (!text.length)
return;

document.getElementById('generate').disabled = false;

// Add it to the start node's list of possibility
cache['_START'].push(text[0]);

// Now go through each word and add it to the previous word's node
for (var i = 0; i < text.length - 1; i++) {
if (!cache[text[i]])
cache[text[i]] = [];
cache[text[i]].push(text[i + 1]);

// If it's the beginning of a sentence, add the next word to the start node too
if (text[i].match(/\.$/)) cache['_START'].push(text[i + 1]); } clear(); }; document.getElementById('generate').onclick = function () { // Start with the root node var currentWord = '_START'; var str = ''; // Generate 300 words of text for (var i = 0; i < 300; i++) { // Follow a random node, append it to the string, and move to that node var rand = Math.floor(Math.random() * cache[currentWord].length); str += cache[currentWord][rand]; // No more nodes to follow? Start again. (Add a period to make things look better.) if (!cache[cache[currentWord][rand]]) { currentWord = '_START'; if (!cache[currentWord][rand].match(/\.$/))
str += '. ';
else
str += ' ';
} else {
currentWord = cache[currentWord][rand];
str += ' ';
}
}
set(str);
}


March 26, 2012, 10:45pm by Casey