P04: Working with text files

Find a target string and then keep the next N lines of text

  • Here we will read in the text of a book (Frankenstein by Mary Wollstonecraft (Godwin) Shelley)

  • We’ll then search for a specific string in the book…in this case we’ll look for the first sentence of Chapter 5

  • We’ll use .readlines() to read the book into a list, where each line of text from the book is an entry in the list (and remember, .readlines() is parsing the text based on where the '\n' (newline) characters are).

  • Then we’ll get rid of everything else in the book except for the few lines at the start of Chapter 5

  • I will show you two methods…one using a counter that we initialize and increment ourselves, and one using the built-in enumerate function, which gives you a counter in a for loop that is initialized and incremented automatically.

  • I’ll also show you the break syntax for a for loop. break allows you to exit the loop if a certain condition is met. In this case, we will exit the loop when we find the text that we’re looking for!

  • Last - there are many ways to do this task…so don’t feel constrained by this approach when you’re working on the problem set…
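Before working with the real book, it may help to see what .readlines() returns. The sketch below does not open frankenstein.txt; it assumes a tiny made-up "file" built with io.StringIO (the name fake_file and its contents are hypothetical) so that it runs anywhere.

```python
import io

# a small in-memory stand-in for a text file (made-up content, not the book)
fake_file = io.StringIO('first line\nsecond line\n\nfourth line\n')

# .readlines() returns a list with one string per line,
# and each string keeps its trailing '\n'
lines = fake_file.readlines()
print(lines)

# .rstrip('\n') removes the trailing newline from one line at a time
stripped = []
for line in lines:
    stripped.append(line.rstrip('\n'))
print(stripped)
```

Note that the blank line in the file shows up as the entry '\n' in the list, which is why empty strings can appear later when the text is cleaned and split.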

# open our file for reading...
with open('frankenstein.txt', 'r') as f:
    # read the entire file...with each line 
    # returned as a string in a list
    # we'll call the list 'book'
    book = f.readlines()

# now we can search the book for the text that we want, 
# which in this case is the first sentence of Chapter 5, which 
# starts with the string 'It was on a dreary night' 
search_target = 'It was on a dreary night'

# how many lines of text do we want to keep 
# after we find the search target?
keep_lines = 50

# initialize a counter (int object)
cnt = 0

# loop over each line in the book
for line in book:
    
    # test to see if the current line has search_target in it
    if search_target in line:
        # store the index of the line where Chapter 5 starts
        start_index = cnt
        
        # exit the for loop by calling 'break'
        break
        
    # if the current line does not have our search_target
    # then we'll increment the counter to keep track of how
    # many lines we've read...
    else:
        cnt += 1
        
# now use slicing to keep keep_lines lines of text,
# starting with the line that contains search_target
book = book[start_index:start_index + keep_lines]

print(book)

# Related to ps3, Q1: How would you do the slicing if you 
# wanted to keep **all** of the text in the book after search_target?
# And how would you do the slicing if you wanted to keep all of the text
# after the search_target while excluding the line that contained the 
# search_target? 
['It was on a dreary night of November that I beheld the accomplishment\n', 'of my toils. With an anxiety that almost amounted to agony, I\n', 'collected the instruments of life around me, that I might infuse a\n', 'spark of being into the lifeless thing that lay at my feet. It was\n', 'already one in the morning; the rain pattered dismally against the\n', 'panes, and my candle was nearly burnt out, when, by the glimmer of the\n', 'half-extinguished light, I saw the dull yellow eye of the creature\n', 'open; it breathed hard, and a convulsive motion agitated its limbs.\n', '\n', 'How can I describe my emotions at this catastrophe, or how delineate\n', 'the wretch whom with such infinite pains and care I had endeavoured to\n', 'form? His limbs were in proportion, and I had selected his features as\n', 'beautiful. Beautiful! Great God! His yellow skin scarcely covered\n', 'the work of muscles and arteries beneath; his hair was of a lustrous\n', 'black, and flowing; his teeth of a pearly whiteness; but these\n', 'luxuriances only formed a more horrid contrast with his watery eyes,\n', 'that seemed almost of the same colour as the dun-white sockets in which\n', 'they were set, his shrivelled complexion and straight black lips.\n', '\n', 'The different accidents of life are not so changeable as the feelings\n', 'of human nature. I had worked hard for nearly two years, for the sole\n', 'purpose of infusing life into an inanimate body. For this I had\n', 'deprived myself of rest and health. I had desired it with an ardour\n', 'that far exceeded moderation; but now that I had finished, the beauty\n', 'of the dream vanished, and breathless horror and disgust filled my\n', 'heart. Unable to endure the aspect of the being I had created, I\n', 'rushed out of the room and continued a long time traversing my\n', 'bed-chamber, unable to compose my mind to sleep. At length lassitude\n', 'succeeded to the tumult I had before endured, and I threw myself on the\n', 'bed in my clothes, endeavouring to seek a few moments of forgetfulness.\n', 'But it was in vain; I slept, indeed, but I was disturbed by the wildest\n', 'dreams. I thought I saw Elizabeth, in the bloom of health, walking in\n', 'the streets of Ingolstadt. Delighted and surprised, I embraced her,\n', 'but as I imprinted the first kiss on her lips, they became livid with\n', 'the hue of death; her features appeared to change, and I thought that I\n', 'held the corpse of my dead mother in my arms; a shroud enveloped her\n', 'form, and I saw the grave-worms crawling in the folds of the flannel.\n', 'I started from my sleep with horror; a cold dew covered my forehead, my\n', 'teeth chattered, and every limb became convulsed; when, by the dim and\n', 'yellow light of the moon, as it forced its way through the window\n', 'shutters, I beheld the wretch—the miserable monster whom I had\n', 'created. He held up the curtain of the bed; and his eyes, if eyes they\n', 'may be called, were fixed on me. His jaws opened, and he muttered some\n', 'inarticulate sounds, while a grin wrinkled his cheeks. He might have\n', 'spoken, but I did not hear; one hand was stretched out, seemingly to\n', 'detain me, but I escaped and rushed downstairs. I took refuge in the\n', 'courtyard belonging to the house which I inhabited, where I remained\n', 'during the rest of the night, walking up and down in the greatest\n', 'agitation, listening attentively, catching and fearing each sound as if\n', 'it were to announce the approach of the demoniacal corpse to which I\n']
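One thing to watch for with the counter-and-break approach: if search_target never appears in the book, start_index is never assigned, and the slicing line will raise a NameError. One possible way to guard against that, sketched below, is to set start_index to None before the loop and check it afterwards. The short book list and the search string here are made up for illustration; they are not the real file.

```python
# a tiny stand-in for the 'book' list (made-up lines, not the real file)
book = ['Chapter 4\n', 'some text\n', 'Chapter 5\n',
        'It was on a dreary night\n', 'more text\n']

# a target that does NOT appear in the list
search_target = 'this string is not in the book'
keep_lines = 2

# use None as a placeholder that means "not found yet"
start_index = None

cnt = 0
for line in book:
    if search_target in line:
        start_index = cnt
        break
    cnt += 1

# only slice if the loop actually found the target
if start_index is None:
    kept = []
    print('search_target was not found')
else:
    kept = book[start_index:start_index + keep_lines]
    print(kept)
```

If you change search_target to 'It was on a dreary night', the else branch runs instead and kept holds the two lines starting at the match.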

Find a target string and then keep the next N lines of text, but this time use enumerate

  • See here for a nice explanation…

# open our file for reading...
with open('frankenstein.txt', 'r') as f:
    # read the entire file...with each line 
    # returned as a string in a list
    # we'll call the list 'book'
    book = f.readlines()

# now we can search the book for the text that we want, 
# which in this case is the first sentence of Chapter 5, which 
# starts with the string 'It was on a dreary night' 
search_target = 'It was on a dreary night'

# how many lines of text do we want to keep 
# after we find the search target?
keep_lines = 50

# loop over each line in the book using enumerate
# enumerate will automatically create and increment a counter 
# that we'll call 'cnt' and it will give you each line
# in the book just like a normal for loop. 
for cnt, line in enumerate(book):
    
    # test to see if the current line has search_target in it
    if search_target in line:
        # store the index of the line where Chapter 5 starts
        start_index = cnt
        
        # exit the loop by calling 'break'
        break
        

# now use slicing to keep keep_lines lines of text,
# starting with the line that contains search_target
book = book[start_index:start_index + keep_lines]

print(book)

# Related to ps3, Q1: How would you do the slicing if you 
# wanted to keep **all** of the text in the book after search_target?
# And how would you do the slicing if you wanted to keep all of the text
# after the search_target while excluding the line that contained the 
# search_target? 
['It was on a dreary night of November that I beheld the accomplishment\n', 'of my toils. With an anxiety that almost amounted to agony, I\n', 'collected the instruments of life around me, that I might infuse a\n', 'spark of being into the lifeless thing that lay at my feet. It was\n', 'already one in the morning; the rain pattered dismally against the\n', 'panes, and my candle was nearly burnt out, when, by the glimmer of the\n', 'half-extinguished light, I saw the dull yellow eye of the creature\n', 'open; it breathed hard, and a convulsive motion agitated its limbs.\n', '\n', 'How can I describe my emotions at this catastrophe, or how delineate\n', 'the wretch whom with such infinite pains and care I had endeavoured to\n', 'form? His limbs were in proportion, and I had selected his features as\n', 'beautiful. Beautiful! Great God! His yellow skin scarcely covered\n', 'the work of muscles and arteries beneath; his hair was of a lustrous\n', 'black, and flowing; his teeth of a pearly whiteness; but these\n', 'luxuriances only formed a more horrid contrast with his watery eyes,\n', 'that seemed almost of the same colour as the dun-white sockets in which\n', 'they were set, his shrivelled complexion and straight black lips.\n', '\n', 'The different accidents of life are not so changeable as the feelings\n', 'of human nature. I had worked hard for nearly two years, for the sole\n', 'purpose of infusing life into an inanimate body. For this I had\n', 'deprived myself of rest and health. I had desired it with an ardour\n', 'that far exceeded moderation; but now that I had finished, the beauty\n', 'of the dream vanished, and breathless horror and disgust filled my\n', 'heart. Unable to endure the aspect of the being I had created, I\n', 'rushed out of the room and continued a long time traversing my\n', 'bed-chamber, unable to compose my mind to sleep. At length lassitude\n', 'succeeded to the tumult I had before endured, and I threw myself on the\n', 'bed in my clothes, endeavouring to seek a few moments of forgetfulness.\n', 'But it was in vain; I slept, indeed, but I was disturbed by the wildest\n', 'dreams. I thought I saw Elizabeth, in the bloom of health, walking in\n', 'the streets of Ingolstadt. Delighted and surprised, I embraced her,\n', 'but as I imprinted the first kiss on her lips, they became livid with\n', 'the hue of death; her features appeared to change, and I thought that I\n', 'held the corpse of my dead mother in my arms; a shroud enveloped her\n', 'form, and I saw the grave-worms crawling in the folds of the flannel.\n', 'I started from my sleep with horror; a cold dew covered my forehead, my\n', 'teeth chattered, and every limb became convulsed; when, by the dim and\n', 'yellow light of the moon, as it forced its way through the window\n', 'shutters, I beheld the wretch—the miserable monster whom I had\n', 'created. He held up the curtain of the bed; and his eyes, if eyes they\n', 'may be called, were fixed on me. His jaws opened, and he muttered some\n', 'inarticulate sounds, while a grin wrinkled his cheeks. He might have\n', 'spoken, but I did not hear; one hand was stretched out, seemingly to\n', 'detain me, but I escaped and rushed downstairs. I took refuge in the\n', 'courtyard belonging to the house which I inhabited, where I remained\n', 'during the rest of the night, walking up and down in the greatest\n', 'agitation, listening attentively, catching and fearing each sound as if\n', 'it were to announce the approach of the demoniacal corpse to which I\n']
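If it is not obvious what enumerate is handing back on each pass through the loop, the small sketch below may help. It uses a made-up three-item list (words is a hypothetical name, not part of the book code) so the output is easy to read.

```python
# a short made-up list to loop over
words = ['alpha', 'beta', 'gamma']

# on each pass, enumerate gives back a counter and the current item
for cnt, w in enumerate(words):
    print(cnt, w)

# under the hood, each pass produces an (index, item) tuple
pairs = list(enumerate(words))
print(pairs)

# the optional 'start' argument changes where the counter begins
numbered = list(enumerate(words, start=1))
print(numbered)
```

The counter starts at 0 by default, which is what we want here because list indices also start at 0, so cnt can be used directly as start_index for slicing.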

Counting words in text…

  • Here we can count the occurrences of each word in the book (or in the part of the book that you have left after slicing the book list in the code cell above).

  • Notice that there are some empty strings '' in the list that we need to deal with (i.e. things that are not words, so we need to check for these so they don’t get counted). We can do that in a few ways that I’ll write out below.

  • Our general algorithm here will be:

    • convert the list to a string to make it easier to clean

    • remove the newline characters using the .replace() method (if your string contains other characters that you don’t want, you can define a string that holds all of the unwanted characters and then use a for loop over each_letter in that string to .replace() each one).

    • convert back to a list

    • initialize an empty dictionary

    • loop over all unique words in the book - use set to get the unique words

      • in this loop check to make sure we’re not considering the empty strings ''

    • whenever we find a real word, create a new key:value pair in a dictionary with the word as the key and an initial value of 0

    • Now that the dictionary is set up and we have one key for each unique word, we can loop over all the words in the book, including repeated words, and increment the value associated with each key

# turn the book list into a string to make it easier to remove things we don't want (like newline 
# characters)
book_str = ''.join(book)

# convert to lower case
book_str = book_str.lower()

# clean out the newline characters
book_str = book_str.replace('\n', ' ')

# turn back into a list based on the location of spaces
book_lst_clean = book_str.split(' ')

# init a dictionary with a key for each unique 
# word, and 0 for the starting count
wc = {}
for w in set(book_lst_clean):
    # because there are some empty strings in our list
    # we can check here and we'll skip them 
    if w != '':
        wc[w] = 0

# now loop over **all** words in the book, even the repeated words
# and count them up!
for w in book_lst_clean:
    if w != '':
        wc[w] += 1

# have a look at the dictionary...
print(wc)
{'they': 3, 'forehead,': 1, 'tumult': 1, 'down': 1, 'while': 1, 'at': 3, 'burnt': 1, 'hard,': 1, 'or': 1, 'body.': 1, 'fearing': 1, 'clothes,': 1, 'curtain': 1, 'as': 6, 'each': 1, 'vanished,': 1, 'folds': 1, 'opened,': 1, 'downstairs.': 1, 'endeavouring': 1, 'vain;': 1, 'hear;': 1, 'features': 2, 'beneath;': 1, 'accidents': 1, 'beautiful.': 1, 'changeable': 1, 'moderation;': 1, 'traversing': 1, 'some': 1, 'stretched': 1, 'nearly': 2, 'finished,': 1, 'one': 2, 'formed': 1, 'endured,': 1, 'ardour': 1, 'detain': 1, 'were': 4, 'covered': 2, 'by': 3, 'light': 1, 'breathless': 1, 'sleep.': 1, 'refuge': 1, 'hand': 1, 'bed;': 1, 'endure': 1, 'worked': 1, 'only': 1, 'if': 2, 'out,': 2, 'first': 1, 'imprinted': 1, 'wrinkled': 1, 'did': 1, 'jaws': 1, 'was': 7, 'panes,': 1, 'room': 1, 'different': 1, 'dreams.': 1, 'pains': 1, 'of': 27, 'corpse': 2, 'hard': 1, 'every': 1, 'lassitude': 1, 'beheld': 2, 'appeared': 1, 'threw': 1, 'limbs.': 1, 'proportion,': 1, 'few': 1, 'bloom': 1, 'wretch—the': 1, 'her,': 1, 'attentively,': 1, 'thing': 1, 'describe': 1, 'contrast': 1, 'already': 1, 'dull': 1, 'health,': 1, 'death;': 1, 'grave-worms': 1, 'inhabited,': 1, 'the': 46, 'pattered': 1, 'half-extinguished': 1, 'complexion': 1, 'human': 1, 'me.': 1, 'rest': 2, 'mind': 1, 'breathed': 1, 'might': 2, 'sole': 1, 'bed-chamber,': 1, 'pearly': 1, 'dew': 1, 'where': 1, 'far': 1, 'toils.': 1, 'indeed,': 1, 'flannel.': 1, 'way': 1, 'filled': 1, 'rushed': 2, 'monster': 1, 'years,': 1, 'feelings': 1, 'during': 1, 'into': 2, 'change,': 1, 'rain': 1, 'created.': 1, 'endeavoured': 1, 'not': 2, 'limbs': 1, 'to': 12, 'arms;': 1, 'black,': 1, 'horror;': 1, 'disgust': 1, 'flowing;': 1, 'through': 1, 'aspect': 1, 'his': 10, 'glimmer': 1, 'dismally': 1, 'teeth': 2, 'agitation,': 1, 'enveloped': 1, 'saw': 3, 'in': 11, 'i': 33, 'lips.': 1, 'whom': 2, 'infinite': 1, 'when,': 2, 'agitated': 1, 'demoniacal': 1, 'cheeks.': 1, 'moments': 1, 'eyes': 1, 'be': 1, 'how': 2, 'elizabeth,': 1, 'long': 1, 'eye': 1, 
'health.': 1, 'wildest': 1, 'light,': 1, 'shutters,': 1, 'limb': 1, 'lustrous': 1, 'walking': 2, 'and': 22, 'these': 1, 'deprived': 1, 'heart.': 1, 'scarcely': 1, 'life': 3, 'myself': 2, 'up': 2, 'hair': 1, 'seek': 1, 'instruments': 1, 'lifeless': 1, 'infuse': 1, 'candle': 1, 'beauty': 1, 'luxuriances': 1, 'mother': 1, 'seemingly': 1, 'compose': 1, 'set,': 1, 'approach': 1, 'greatest': 1, 'slept,': 1, 'creature': 1, 'a': 11, 'form,': 1, 'inanimate': 1, 'catastrophe,': 1, 'lips,': 1, 'beautiful!': 1, 'embraced': 1, 'convulsive': 1, 'straight': 1, 'feet.': 1, 'are': 1, 'with': 6, 'emotions': 1, 'continued': 1, 'but': 7, 'shroud': 1, 'disturbed': 1, 'spoken,': 1, 'almost': 2, 'on': 4, 'succeeded': 1, 'unable': 2, 'dream': 1, 'my': 13, 'desired': 1, 'being': 2, 'an': 3, 'before': 1, 'dead': 1, 'me,': 2, 'belonging': 1, 'sleep': 1, 'time': 1, 'her': 3, 'open;': 1, 'horrid': 1, 'against': 1, 'delighted': 1, 'colour': 1, 'form?': 1, 'bed': 1, 'spark': 1, 'anxiety': 1, 'kiss': 1, 'around': 1, 'whiteness;': 1, 'such': 1, 'chattered,': 1, 'exceeded': 1, 'muscles': 1, 'amounted': 1, 'from': 1, 'yellow': 3, 'grin': 1, 'accomplishment': 1, 'its': 2, 'skin': 1, 'two': 1, 'dim': 1, 'horror': 1, 'motion': 1, 'that': 8, 'morning;': 1, 'surprised,': 1, 'wretch': 1, 'watery': 1, 'listening': 1, 'black': 1, 'sockets': 1, 'called,': 1, 'seemed': 1, 'work': 1, 'catching': 1, 'thought': 2, 'he': 3, 'forced': 1, 'sound': 1, 'held': 2, 'livid': 1, 'which': 3, 'november': 1, 'care': 1, 'ingolstadt.': 1, 'night,': 1, 'may': 1, 'started': 1, 'so': 1, 'god!': 1, 'out': 1, 'took': 1, 'announce': 1, 'have': 1, 'hue': 1, 'it': 7, 'house': 1, 'inarticulate': 1, 'for': 3, 'courtyard': 1, 'delineate': 1, 'window': 1, 'remained': 1, 'lay': 1, 'had': 9, 'moon,': 1, 'collected': 1, 'eyes,': 2, 'arteries': 1, 'same': 1, 'more': 1, 'crawling': 1, 'now': 1, 'great': 1, 'agony,': 1, 'muttered': 1, 'created,': 1, 'convulsed;': 1, 'this': 2, 'selected': 1, 'fixed': 1, 'night': 1, 'sounds,': 1, 'became': 2, 
'escaped': 1, 'dun-white': 1, 'dreary': 1, 'shrivelled': 1, 'forgetfulness.': 1, 'nature.': 1, 'streets': 1, 'can': 1, 'miserable': 1, 'purpose': 1, 'infusing': 1, 'length': 1, 'cold': 1}
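Notice in the output above that punctuation stays attached to the words, so 'hard,' and 'hard' are counted as two different keys. The algorithm above mentions looping over a string of unwanted characters and calling .replace() on each one; a minimal sketch of that idea follows. The sample sentence and the particular set of characters in unwanted are assumptions chosen for illustration, so adjust them for your own text.

```python
# a short made-up sample that has punctuation attached to words
text = 'It breathed hard, and I had worked hard. Beautiful! Great God!'

# characters we want to strip out (an assumed list; add more if needed)
unwanted = '.,;:!?'

# convert to lower case first, as in the cell above
clean = text.lower()

# loop over each unwanted character and replace it with nothing
for each_letter in unwanted:
    clean = clean.replace(each_letter, '')

# split into words based on the location of spaces
words = clean.split(' ')
print(words)
```

After cleaning, both occurrences of 'hard' are identical strings, so they would land on the same dictionary key when counted.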

A slightly more compact way to count all the words in text using just one for loop

  • This might be easier to understand (or it might be harder), so I’m just writing it out in case it helps…

  • I am using the continue statement (see below)…

  • Note that there are several other ways you could achieve the same result, so long as you are checking to make sure that the current word is not an empty string '' somehow

# turn the book list into a string to make it easier to remove things we don't want (like newline 
# characters)
book_str = ''.join(book)

# convert to lower case
book_str = book_str.lower()

# clean out the newline characters
book_str = book_str.replace('\n', ' ')

# turn back into a list based on the location of spaces
book_lst_clean = book_str.split(' ')

# init an empty dictionary
wc = {}   # or you can use dict()

# now loop over **all** words in the book, even the repeated words
# and count them up!
for w in book_lst_clean:
    
    # if w is not a word (i.e. it is an empty string)
    # then continue, where continue means "go back to the 
    # top of the for loop and skip the rest of the code in
    # the loop"
    if not w:
        continue
    
    # if w is a word, and it's not already in our dictionary
    # then make a new key and assign a value of 0
    if w not in wc:
        wc[w] = 0
    
    # increment a counter each time the word w appears...
    # note that you only get to this line of code if w
    # is indeed a word (the if not w: continue
    # line above will prevent you from getting here if w is
    # not a word)
    wc[w] += 1

# have a look at the dictionary...
print(wc)
{'it': 7, 'was': 7, 'on': 4, 'a': 11, 'dreary': 1, 'night': 1, 'of': 27, 'november': 1, 'that': 8, 'i': 33, 'beheld': 2, 'the': 46, 'accomplishment': 1, 'my': 13, 'toils.': 1, 'with': 6, 'an': 3, 'anxiety': 1, 'almost': 2, 'amounted': 1, 'to': 12, 'agony,': 1, 'collected': 1, 'instruments': 1, 'life': 3, 'around': 1, 'me,': 2, 'might': 2, 'infuse': 1, 'spark': 1, 'being': 2, 'into': 2, 'lifeless': 1, 'thing': 1, 'lay': 1, 'at': 3, 'feet.': 1, 'already': 1, 'one': 2, 'in': 11, 'morning;': 1, 'rain': 1, 'pattered': 1, 'dismally': 1, 'against': 1, 'panes,': 1, 'and': 22, 'candle': 1, 'nearly': 2, 'burnt': 1, 'out,': 2, 'when,': 2, 'by': 3, 'glimmer': 1, 'half-extinguished': 1, 'light,': 1, 'saw': 3, 'dull': 1, 'yellow': 3, 'eye': 1, 'creature': 1, 'open;': 1, 'breathed': 1, 'hard,': 1, 'convulsive': 1, 'motion': 1, 'agitated': 1, 'its': 2, 'limbs.': 1, 'how': 2, 'can': 1, 'describe': 1, 'emotions': 1, 'this': 2, 'catastrophe,': 1, 'or': 1, 'delineate': 1, 'wretch': 1, 'whom': 2, 'such': 1, 'infinite': 1, 'pains': 1, 'care': 1, 'had': 9, 'endeavoured': 1, 'form?': 1, 'his': 10, 'limbs': 1, 'were': 4, 'proportion,': 1, 'selected': 1, 'features': 2, 'as': 6, 'beautiful.': 1, 'beautiful!': 1, 'great': 1, 'god!': 1, 'skin': 1, 'scarcely': 1, 'covered': 2, 'work': 1, 'muscles': 1, 'arteries': 1, 'beneath;': 1, 'hair': 1, 'lustrous': 1, 'black,': 1, 'flowing;': 1, 'teeth': 2, 'pearly': 1, 'whiteness;': 1, 'but': 7, 'these': 1, 'luxuriances': 1, 'only': 1, 'formed': 1, 'more': 1, 'horrid': 1, 'contrast': 1, 'watery': 1, 'eyes,': 2, 'seemed': 1, 'same': 1, 'colour': 1, 'dun-white': 1, 'sockets': 1, 'which': 3, 'they': 3, 'set,': 1, 'shrivelled': 1, 'complexion': 1, 'straight': 1, 'black': 1, 'lips.': 1, 'different': 1, 'accidents': 1, 'are': 1, 'not': 2, 'so': 1, 'changeable': 1, 'feelings': 1, 'human': 1, 'nature.': 1, 'worked': 1, 'hard': 1, 'for': 3, 'two': 1, 'years,': 1, 'sole': 1, 'purpose': 1, 'infusing': 1, 'inanimate': 1, 'body.': 1, 'deprived': 1, 'myself': 2, 
'rest': 2, 'health.': 1, 'desired': 1, 'ardour': 1, 'far': 1, 'exceeded': 1, 'moderation;': 1, 'now': 1, 'finished,': 1, 'beauty': 1, 'dream': 1, 'vanished,': 1, 'breathless': 1, 'horror': 1, 'disgust': 1, 'filled': 1, 'heart.': 1, 'unable': 2, 'endure': 1, 'aspect': 1, 'created,': 1, 'rushed': 2, 'out': 1, 'room': 1, 'continued': 1, 'long': 1, 'time': 1, 'traversing': 1, 'bed-chamber,': 1, 'compose': 1, 'mind': 1, 'sleep.': 1, 'length': 1, 'lassitude': 1, 'succeeded': 1, 'tumult': 1, 'before': 1, 'endured,': 1, 'threw': 1, 'bed': 1, 'clothes,': 1, 'endeavouring': 1, 'seek': 1, 'few': 1, 'moments': 1, 'forgetfulness.': 1, 'vain;': 1, 'slept,': 1, 'indeed,': 1, 'disturbed': 1, 'wildest': 1, 'dreams.': 1, 'thought': 2, 'elizabeth,': 1, 'bloom': 1, 'health,': 1, 'walking': 2, 'streets': 1, 'ingolstadt.': 1, 'delighted': 1, 'surprised,': 1, 'embraced': 1, 'her,': 1, 'imprinted': 1, 'first': 1, 'kiss': 1, 'her': 3, 'lips,': 1, 'became': 2, 'livid': 1, 'hue': 1, 'death;': 1, 'appeared': 1, 'change,': 1, 'held': 2, 'corpse': 2, 'dead': 1, 'mother': 1, 'arms;': 1, 'shroud': 1, 'enveloped': 1, 'form,': 1, 'grave-worms': 1, 'crawling': 1, 'folds': 1, 'flannel.': 1, 'started': 1, 'from': 1, 'sleep': 1, 'horror;': 1, 'cold': 1, 'dew': 1, 'forehead,': 1, 'chattered,': 1, 'every': 1, 'limb': 1, 'convulsed;': 1, 'dim': 1, 'light': 1, 'moon,': 1, 'forced': 1, 'way': 1, 'through': 1, 'window': 1, 'shutters,': 1, 'wretch—the': 1, 'miserable': 1, 'monster': 1, 'created.': 1, 'he': 3, 'up': 2, 'curtain': 1, 'bed;': 1, 'if': 2, 'eyes': 1, 'may': 1, 'be': 1, 'called,': 1, 'fixed': 1, 'me.': 1, 'jaws': 1, 'opened,': 1, 'muttered': 1, 'some': 1, 'inarticulate': 1, 'sounds,': 1, 'while': 1, 'grin': 1, 'wrinkled': 1, 'cheeks.': 1, 'have': 1, 'spoken,': 1, 'did': 1, 'hear;': 1, 'hand': 1, 'stretched': 1, 'seemingly': 1, 'detain': 1, 'escaped': 1, 'downstairs.': 1, 'took': 1, 'refuge': 1, 'courtyard': 1, 'belonging': 1, 'house': 1, 'inhabited,': 1, 'where': 1, 'remained': 1, 'during': 1, 
'night,': 1, 'down': 1, 'greatest': 1, 'agitation,': 1, 'listening': 1, 'attentively,': 1, 'catching': 1, 'fearing': 1, 'each': 1, 'sound': 1, 'announce': 1, 'approach': 1, 'demoniacal': 1}
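As one example of the "several other ways" mentioned above, here is a sketch that avoids the empty-string check entirely. It relies on two standard Python behaviors: calling .split() with no argument splits on any run of whitespace (spaces or newlines) and never returns empty strings, and the dictionary .get() method returns a default value when a key is missing. The short book_str below is made up for illustration.

```python
# a made-up string with double spaces and newlines in it
book_str = 'the  rain\npattered the\n\npanes the'

# .split() with no argument splits on any whitespace
# and does not produce empty strings
words = book_str.lower().split()
print(words)

# init an empty dictionary
wc = {}

# .get(w, 0) returns the current count for w,
# or 0 if w is not in the dictionary yet
for w in words:
    wc[w] = wc.get(w, 0) + 1

print(wc)
```

This is not necessarily better than the versions above; it just trades the explicit if checks for built-in behavior, which some people find easier to read.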

Find the most common word in text.

  • To find the most common word, you can loop over our word count dictionary (wc, defined above)

  • Basic approach:

    • Initialize a counter (let’s call it max_count) to 0

    • Loop over key:value pairs in wc using the .items() method

    • If the current value exceeds max_count, then update max_count with that value and also store the current key

# init max_count to 0
max_count = 0

# loop over the key:value pairs in wc
for k,v in wc.items():
    
    # if the current value exceeds the previous value of 
    # max_count, then reassign max_count to the current value
    # and save the associated key (which is the actual word)
    if v > max_count:
        max_count = v
        most_common_word = k
        
print(f'The most common word is "{most_common_word}", it occurred {max_count} times')
    
The most common word is "the", it occurred 46 times
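For comparison, Python also has built-in tools that can do the same search without writing the loop yourself. The sketch below shows two of them: max() with a key function, and collections.Counter from the standard library. The small wc dictionary here is just a handful of entries copied from the output above, not the full word count.

```python
from collections import Counter

# a few entries copied from the word count output above
wc = {'the': 46, 'i': 33, 'of': 27, 'and': 22}

# max() loops over the keys of wc, and key=wc.get tells it
# to compare the keys by their associated values
most_common_word = max(wc, key=wc.get)
max_count = wc[most_common_word]
print(f'The most common word is "{most_common_word}", it occurred {max_count} times')

# Counter can be built from an existing dictionary of counts;
# .most_common(n) returns the n highest (word, count) pairs
top_two = Counter(wc).most_common(2)
print(top_two)
```

Writing the loop by hand, as in the cell above, is still worth practicing because it shows exactly what these shortcuts are doing.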