Plotting the classics
In this example, we will explore statistics for two classic novels: The Adventures of Huckleberry Finn by Mark Twain, and Little Women by Louisa May Alcott. The text of any book can be read by a computer at great speed. Books published before 1923 are currently in the public domain, meaning that everyone has the right to copy or use the text in any way. Project Gutenberg is a website that publishes public domain books online. Using Python, we can load the text of these books directly from the web.
This example is meant to illustrate some of the broad themes of this text. Don’t worry if the details of the program don’t yet make sense. Instead, focus on interpreting the images generated below. Later sections of the text will describe most of the features of the Python programming language used below.
First, we read the text of both books into lists of chapters, called huck_finn_chapters
and little_women_chapters
. In Python, a name cannot contain any spaces, and so we will often use an underscore _
to stand in for a space. The =
in the lines below give a name on the left to the result of some computation described on the right. A uniform resource locator or URL is an address on the Internet for some content; in this case, the text of a book. The #
symbol starts a comment, which is ignored by the computer but helpful for people reading the code.
# Read two books, fast!
huck_finn_url = 'https://www.inferentialthinking.com/data/huck_finn.txt'
huck_finn_text = read_url(huck_finn_url)
huck_finn_chapters = huck_finn_text.split('CHAPTER ')[44:]
little_women_url = 'https://www.inferentialthinking.com/data/little_women.txt'
little_women_text = read_url(little_women_url)
little_women_chapters = little_women_text.split('CHAPTER ')[1:]
While a computer cannot understand the text of a book, it can provide us with some insight into the structure of the text. The name huck_finn_chapters
is currently bound to a list of all the chapters in the book. We can place them into a table to see how each chapter begins.
# Display the chapters of Huckleberry Finn in a table.
Table().with_column('Chapters', huck_finn_chapters)
Each chapter begins with a chapter number in Roman numerals, followed by the first sentence of the chapter. Project Gutenberg has printed the first word of each chapter in upper case.