jamais-vu/markov-analysis

markov-analysis

Generates random text in the style of a given corpus.

This code is my solution to Exercise 13.8 of Think Python by Allen Downey, released under the CC BY-NC 3.0 license.

Usage

mashup(n, text_length, *files)

Generates a string of text_length random words from the user-specified text files (*files). Words are generated one at a time: each new word is chosen at random, weighted by how often it follows the previous n words (referred to as an n-gram) in the source texts.
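To make the "probability distribution" idea concrete, here is a minimal sketch of how the next word could be drawn. It is not the code in this repository: the helper names suffix_distribution and pick_next are hypothetical, and the sample sentence is made up for illustration.

```python
import random
from collections import Counter


def suffix_distribution(words, prefix):
    """Count which words follow `prefix` (a tuple of n words) in `words`."""
    n = len(prefix)
    counts = Counter()
    for i in range(len(words) - n):
        if tuple(words[i:i + n]) == prefix:
            counts[words[i + n]] += 1
    return counts


def pick_next(counts, rng=random):
    """Draw one suffix with probability proportional to its count."""
    suffixes = list(counts)
    weights = [counts[s] for s in suffixes]
    return rng.choices(suffixes, weights=weights, k=1)[0]


# Illustrative input only; with n = 1 the prefix is a single word.
words = "the cat sat on the mat and the cat ran".split()
counts = suffix_distribution(words, ("the",))
print(counts)             # "cat" follows "the" twice, "mat" once
print(pick_next(counts))  # "cat" is twice as likely as "mat"
```

Storing every observed suffix in a plain list and calling random.choice on it gives the same distribution without explicit weights, at the cost of more memory.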

Goals

Exercise 13.8

  1. Write a program to read a text from a file and perform Markov analysis. The result should be a dictionary that maps from prefixes to a collection of possible suffixes.

  2. Add a function to the previous program to generate random text based on the Markov analysis.

  3. Once your program is working, you might want to try a mash-up: if you combine text from two or more books, the random text you generate will blend the vocabulary and phrases from the sources in interesting ways.
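The three steps above can be sketched roughly as follows. This is an illustrative outline, not the solution in this repository or the one in the book: the names read_words, markov_analysis and generate are assumptions, and the two short "texts" stand in for real books.

```python
import random
from collections import defaultdict


def read_words(path):
    """Read a text file and split it into whitespace-separated words."""
    with open(path, encoding="utf-8") as f:
        return f.read().split()


def markov_analysis(words, n=2, chain=None):
    """Map each n-word prefix (a tuple) to the list of words that follow it.

    Passing an existing `chain` adds another source to it (the mash-up).
    Assumes n >= 1.
    """
    chain = defaultdict(list) if chain is None else chain
    for i in range(len(words) - n):
        prefix = tuple(words[i:i + n])
        chain[prefix].append(words[i + n])
    return chain


def generate(chain, text_length, rng=random):
    """Walk the chain and return a string of text_length words."""
    if not chain or text_length <= 0:
        return ""
    prefix = rng.choice(list(chain))
    output = list(prefix)
    while len(output) < text_length:
        suffixes = chain.get(prefix)
        if not suffixes:
            # Dead end (prefix seen only at the end of a text): restart.
            prefix = rng.choice(list(chain))
            output.extend(prefix)
            continue
        word = rng.choice(suffixes)
        output.append(word)
        prefix = prefix[1:] + (word,)
    return " ".join(output[:text_length])


# Two tiny in-memory sources; real use would call read_words on each file.
text_a = "the quick brown fox jumps over the lazy dog".split()
text_b = "the quick red fox sleeps under the old tree".split()

chain = markov_analysis(text_a, n=2)
markov_analysis(text_b, n=2, chain=chain)  # blend in the second source

print(chain[("the", "quick")])  # ['brown', 'red']
print(generate(chain, 12, rng=random.Random(0)))
```

Analysing each source separately and merging into one dictionary means no prefix spans the boundary between two texts, which simply concatenating the files would allow.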

Further Exploration

My solution is likely not the most efficient, and since writing it I have become more familiar with Python. I would like to develop intuitions for how to determine which algorithms or data structures suit which applications, and how to efficiently implement these in Python.

Some options:

  • Refactor my solution.

  • Become familiar with timeit or cProfile. Profile my code so I know which parts are slowing it down.

  • Compare my original solution to my refactored solution, or to the solution provided by the author.

  • Add an option for using characters as the n-gram items.
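As a rough starting point for the profiling and character-level options, the sketch below times chain construction with timeit for both kinds of n-gram item. The function names tokenize and build_chain, the unit parameter, and the sample text are all hypothetical and not part of the current code.

```python
import timeit
from collections import defaultdict


def tokenize(text, unit="word"):
    """Split text into n-gram items: words or single characters."""
    if unit == "word":
        return text.split()
    if unit == "char":
        return list(text)
    raise ValueError(f"unknown unit: {unit!r}")


def build_chain(items, n):
    """Map each n-item prefix (a tuple) to the list of items that follow it."""
    chain = defaultdict(list)
    for i in range(len(items) - n):
        chain[tuple(items[i:i + n])].append(items[i + n])
    return chain


# Made-up sample text, repeated so the timings are measurable.
sample = "the cat sat on the mat " * 200

for unit in ("word", "char"):
    items = tokenize(sample, unit)
    seconds = timeit.timeit(lambda: build_chain(items, 2), number=20)
    print(f"{unit}: {len(items)} items, {seconds:.4f} s for 20 builds")
```

timeit answers "how long does this call take"; for a per-function breakdown of where the time goes, cProfile.run on the same call would be the next step.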
