python workout: exercise 15

rainfall

problem

Another use for dicts is to accumulate data over the life of a program. In this exercise, you’ll use a dict for just that. Specifically, write a function, get_rainfall, that tracks rainfall in a number of cities. Users of your program will enter the name of a city; if the city name is blank, then the function prints a report (which I’ll describe) before exiting. If the city name isn’t blank, then the program should also ask the user how much rain has fallen in that city (typically measured in millimeters). After the user enters the quantity of rain, the program again asks them for a city name, rainfall amount, and so on, until the user presses Enter instead of typing the name of a city. When the user enters a blank city name, the program exits, but first it reports how much total rainfall there was in each city. Thus, if I enter

Boston
5
New York
7
Boston
5
[Enter; blank line]

the program should output

Boston: 10
New York: 7

attempts

Seems straightforward. We just need to grow and maintain a dict from user input. And because we want the total rainfall per city, we might as well use collections.defaultdict(int), where a city we haven’t seen yet starts at 0, and accumulate rainfall amounts directly. It should make printing the results easy too:

from collections import defaultdict

def get_rainfall():
    rainfalls = defaultdict(int)
    while True:
        city = input()
        if not city:
            break
        rainfalls[city] += int(input())
    for city, rainfall in rainfalls.items():
        print(f"{city}: {rainfall}")

>>> get_rainfall()
Boston
5
New York
7
Boston
5

Boston: 10
New York: 7

We should probably handle invalid input when converting user rainfall to an int. Other than that, I’m happy.
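
Here is a minimal sketch of that input handling, not the book’s solution. The `read_rainfall` helper and the `read` parameter are names made up for this example; `read` defaults to the built-in `input` and only exists so the function can be driven without a keyboard. Returning the totals as a plain dict is also an addition for illustration.

```python
from collections import defaultdict


def read_rainfall(prompt="", read=input):
    """Keep asking until the user enters a whole number of millimeters.

    `read` is a hypothetical hook (the built-in input by default).
    """
    while True:
        raw = read(prompt)
        try:
            return int(raw)
        except ValueError:
            # int() raises ValueError for text such as "five" or "3.2"
            print(f"{raw!r} is not a whole number, try again.")


def get_rainfall(read=input):
    rainfalls = defaultdict(int)
    while True:
        city = read("")
        if not city:
            break
        rainfalls[city] += read_rainfall(read=read)
    for city, rainfall in rainfalls.items():
        print(f"{city}: {rainfall}")
    # Returned only so the totals can be inspected; the exercise just prints.
    return dict(rainfalls)
```

Note that this still accepts negative numbers, since `int("-5")` is valid; rejecting those would need one more check.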

solution

The book’s implementation:

def get_rainfall():
    rainfall = {}

    while True:
        city_name = input('Enter city name: ')
        if not city_name:
            break

        mm_rain = input('Enter mm rain: ')
        rainfall[city_name] = rainfall.get(city_name, 0) + int(mm_rain)

    for city, rain in rainfall.items():
        print(f'{city}: {rain}')

get_rainfall()

beyond the exercise

average rainfall

  • problem

    Instead of printing just the total rainfall for each city, print the total rainfall and the average rainfall for reported days. Thus, if you were to enter 30, 20, and 40 for Boston, you would see that the total was 90 and the average was 30.

  • attempts

    We have a couple of options here.

    The first option would be to have 2 collections.defaultdict instances: one to track total rainfall, and another to track the number of recorded days.

    Another option would be to store a kind of object or tuple in a single dictionary, with one field for the total rainfall and another for the number of recorded days.

    I think the first approach is the simplest:

    from collections import defaultdict
    
    def get_rainfall():
        rainfalls, n_reported_days = defaultdict(int), defaultdict(int)
        while True:
            city = input()
            if not city:
                break
            rainfalls[city] += int(input())
            n_reported_days[city] += 1
        for city, rainfall in rainfalls.items():
            avg_rainfall = rainfall / n_reported_days[city]
            print(f"{city}: {rainfall} total & {avg_rainfall} average.")
    
    >>> get_rainfall()
    Boston
    30
    Boston
    20
    New York
    5
    Boston
    40
    
    Boston: 90 total & 30.0 average.
    New York: 5 total & 5.0 average.
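
For comparison, the second option (a single dictionary) could look roughly like the sketch below. To keep it easy to run, it takes the readings as a list of (city, mm) pairs instead of calling input(); the `summarize` name and that input shape are assumptions for this example.

```python
from collections import defaultdict


def summarize(readings):
    """Return {city: (total, average)} from an iterable of (city, mm) pairs."""
    # One dict; each value is a two-item list: [total rainfall, reported days].
    stats = defaultdict(lambda: [0, 0])
    for city, mm in readings:
        stats[city][0] += mm
        stats[city][1] += 1
    return {city: (total, total / days) for city, (total, days) in stats.items()}


readings = [("Boston", 30), ("Boston", 20), ("New York", 5), ("Boston", 40)]
for city, (total, average) in summarize(readings).items():
    print(f"{city}: {total} total & {average} average.")
```

It keeps everything about a city in one place, at the cost of remembering which index is which. A small class or a namedtuple would make the fields self-describing.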
    

response codes and ip addresses

  • problem

    Open a log file from a Unix/Linux system—for example, one from the Apache server. For each response code (i.e., three-digit code indicating the HTTP request’s success or failure), store a list of IP addresses that generated that code.

  • attempts

    The most basic approach would be to split each line in the log file and retrieve the IP address and response code with hardcoded indexes. It’s then as simple as using a collections.defaultdict to accumulate the response codes and their associated IP addresses.

    The only trouble: logs aren’t so easily parsed.

    We can get the IP address with a simple whitespace str.split(). But for the response code, we have to be slightly more clever. In a log line, the response code is always 3 characters long and immediately follows the quoted request, separated by a space. So we can find the index of the request’s closing double quote (the second double quote in the line) and take the 3 characters that come after the following space:

    from collections import defaultdict
    path = 'files/apache_logs.log'
    
    codes = defaultdict(list)
    
    with open(path, 'r') as f:
        lines = f.readlines()[::20] # we don't need so many logs
        for line in lines:
            items = line.split()
            ip = items[0]
            end_of_request_idx = line.index('" ')
            code = line[end_of_request_idx+2:end_of_request_idx+5]
            codes[code].append(ip)
    for code, ips in codes.items():
        print(f'{code}: {ips}')
    
    200: ['66.249.73.135', '128.179.155.97', '66.249.73.135', '66.249.73.135', '174.26.93.238', '66.249.73.135', '5.10.83.98', '93.80.29.12', '64.131.102.243', '50.16.19.13', '165.139.161.160', '46.105.14.53', '108.184.124.170', '72.4.104.94', '49.206.120.190', '74.76.53.142', '199.16.156.125', '23.30.147.145', '79.197.82.119', '94.253.195.219', '46.105.14.53', '200.31.173.106', '200.31.173.106', '204.62.56.3', '204.62.56.3', '150.162.56.185', '5.10.83.73', '24.130.53.65', '46.105.14.53', '46.105.14.53', '216.151.137.35', '78.19.193.147', '134.192.71.41', '173.231.106.34', '66.249.73.135', '103.245.44.13', '82.130.48.164', '209.85.238.199', '31.4.197.143', '184.66.149.103', '184.66.149.103', '81.34.53.43', '173.164.44.34', '209.17.114.78', '66.249.73.135', '38.99.236.50', '38.99.236.50']
    206: ['173.252.110.119']
    304: ['66.249.73.185']
    404: ['208.91.156.11']
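
An alternative to index arithmetic is a regular expression. The sketch below assumes each line follows the usual Apache common/combined layout (`ip ident user [timestamp] "request" status ...`); the pattern, the `codes_to_ips` name, and the sample lines are all made up for illustration and are not taken from the log file used above.

```python
import re
from collections import defaultdict

# Assumed layout: ip ident user [timestamp] "request" status ...
# The request part tolerates backslash-escaped quotes inside it.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:[^"\\]|\\.)*" (?P<code>\d{3})\b'
)


def codes_to_ips(lines):
    """Map each response code to the list of IPs that produced it."""
    codes = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that don't fit the assumed layout
        codes[match["code"]].append(match["ip"])
    return dict(codes)


# Hypothetical sample lines (documentation-range IP addresses).
sample_lines = [
    '203.0.113.7 - - [17/May/2015:10:05:03 +0000] "GET /index.html HTTP/1.1" 200 1234',
    '198.51.100.23 - - [17/May/2015:10:05:04 +0000] "GET /missing HTTP/1.1" 404 209',
]
for code, ips in codes_to_ips(sample_lines).items():
    print(f'{code}: {ips}')
```

The main gain is that malformed lines are skipped rather than silently producing a garbage code, which the slicing approach can't detect.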
    

word length frequencies

  • problem

    Read through a text file on disk. Use a dict to track how many words of each length are in the file—that is, how many three-letter words, four-letter words, five-letter words, and so on. Display your results.

  • attempts

    We really only need a collections.defaultdict(int) instance that maps each word length to the number of words of that length:

    from collections import defaultdict
    path = '/files/some_text.txt'
    
    word_lengths = defaultdict(int)
    
    with open(path, 'r') as f:
        for line in f:
            for word in line.strip().split():
                word_lengths[len(word)] += 1
    
    for length, count in word_lengths.items():
        print(f'{length} letter words: {count}')
    
    2 letter words: 45
    3 letter words: 63
    5 letter words: 66
    6 letter words: 59
    4 letter words: 62
    7 letter words: 46
    1 letter words: 23
    13 letter words: 2
    10 letter words: 17
    9 letter words: 14
    8 letter words: 26
    11 letter words: 6
    17 letter words: 1
    16 letter words: 1
    15 letter words: 1
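
The output above comes out in insertion order, which is hard to scan. A small variation, sketched below, swaps in collections.Counter for the defaultdict and sorts by length before printing. It works on any iterable of lines (an open file included); the `word_length_counts` name and the sample lines are just for this example.

```python
from collections import Counter


def word_length_counts(lines):
    """Count words by length; punctuation still counts toward a word's length."""
    counts = Counter()
    for line in lines:
        # str.split() with no arguments already ignores surrounding whitespace.
        counts.update(len(word) for word in line.split())
    return counts


sample = ["the quick brown fox", "jumps over the lazy dog"]
# Sorting the (length, count) pairs orders the report by word length.
for length, count in sorted(word_length_counts(sample).items()):
    print(f'{length} letter words: {count}')
```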
    
mail@jonahv.com