2026-01-27

python workout: exercise 17

problem

Write a function, called how_many_different_numbers, that takes a single list of integers and returns the number of different integers it contains.

attempts

The first thing that comes to mind is to just use a set and get its length/size:

def how_many_different_numbers(numbers: list[int]) -> int:
    return len(set(numbers))

numbers = [1, 2, 3, 1, 2, 3, 4, 1]

print(how_many_different_numbers(numbers))

solution

The book’s implementation:

def how_many_different_numbers(numbers):
    unique_numbers = set(numbers)
    return len(unique_numbers)

Pretty much the same thing.
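As a quick sanity check, here is a minimal sketch of the same function run against a couple of edge cases. The extra inputs (an empty list and a list of repeated values) are made up for illustration and are not from the book.

```python
def how_many_different_numbers(numbers: list[int]) -> int:
    # A set keeps a single copy of each value, so its length is the distinct count.
    return len(set(numbers))


print(how_many_different_numbers([1, 2, 3, 1, 2, 3, 4, 1]))  # 4
print(how_many_different_numbers([]))                        # 0
print(how_many_different_numbers([7, 7, 7]))                 # 1
```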

beyond the exercise

server log ip addresses

problem

Read through a server (e.g. Apache or nginx) log file. What were the different IP addresses that tried to access your server?

attempts

We already have some fake Apache logs we can parse. We know how to extract the IP addresses (i.e. the first piece of data in each row). We can then just construct a set from these:

path = 'files/apache_logs.log'

with open(path, 'r') as f:
    lines = f.readlines()[:50] # we don't need so many
    unique_ips = {line.split()[0] for line in lines}
    print(unique_ips)

{'179.179.206.176', '157.55.33.15', '189.127.128.209', '208.93.0.48', '190.198.191.75', '177.6.142.6', '188.192.27.241', '68.14.231.140', '128.118.108.67', '107.170.9.55', '173.213.99.1', '198.27.64.9', '66.249.73.135', '74.125.176.83', '187.45.193.158', '212.101.243.11', '50.16.19.13', '80.108.184.97', '216.172.140.128', '77.234.68.135', '180.76.5.22', '68.180.224.225', '46.105.14.53', '128.179.155.97', '5.10.83.105', '208.115.111.72', '208.91.156.11', '184.151.222.45', '198.46.149.143'}
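Note that f.readlines() loads the whole file before slicing off the first 50 lines. A possible alternative, sketched below, uses itertools.islice to stop reading after the limit. The helper name unique_ips and the sample log lines are hypothetical, and io.StringIO stands in for the real file so the snippet runs on its own.

```python
import io
from itertools import islice

# Hypothetical log lines in Apache combined format, for illustration only.
SAMPLE_LOG = """\
203.0.113.5 - - [17/May/2015:10:05:03 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "curl/8.0"
198.51.100.7 - - [17/May/2015:10:05:04 +0000] "GET /missing HTTP/1.1" 404 512 "-" "curl/8.0"
203.0.113.5 - - [17/May/2015:10:05:05 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "curl/8.0"
"""


def unique_ips(lines, limit=50):
    # islice stops after `limit` lines instead of reading everything first.
    # Blank lines are skipped so line.split()[0] cannot raise IndexError.
    return {line.split()[0] for line in islice(lines, limit) if line.strip()}


print(sorted(unique_ips(io.StringIO(SAMPLE_LOG))))  # ['198.51.100.7', '203.0.113.5']
```

With a real file, the same function should accept the open file object directly, e.g. unique_ips(f) inside the with block.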

server log response codes

problem

Reading from that same server log, what response codes were returned to users?

attempts

Again, we already know how to parse the codes from our fake Apache log thanks to python workout: exercise 15. Constructing a set from them is straightforward:

path = 'files/apache_logs.log'

with open(path, 'r') as f:
    lines = f.readlines()[:50] # we don't need so many
    unique_codes = set()
    for line in lines:
        # the status code is the 3 characters after the request's closing quote
        end_of_request_idx = line.index('" ')
        code = line[end_of_request_idx+2:end_of_request_idx+5]
        unique_codes.add(code)
    print(unique_codes)

{'200', '304', '404'}
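One caveat with line.index('" ') is that it raises ValueError on any line that doesn't contain a quoted request. A rough alternative, assuming the status code is always the three-digit field right after the request's closing quote, is a regular expression that simply skips lines that don't match. The sample lines and the names STATUS_RE and unique_status_codes are hypothetical.

```python
import re

# Hypothetical log lines in Apache combined format, for illustration only.
SAMPLE_LINES = [
    '203.0.113.5 - - [17/May/2015:10:05:03 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "curl/8.0"',
    '198.51.100.7 - - [17/May/2015:10:05:04 +0000] "GET /missing HTTP/1.1" 404 512 "-" "curl/8.0"',
    'this line is not a log entry',
]

# Closing quote of the request, a space, three digits, then another space.
STATUS_RE = re.compile(r'" (\d{3}) ')


def unique_status_codes(lines):
    codes = set()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:  # skip lines that don't look like log entries
            codes.add(match.group(1))
    return codes


print(sorted(unique_status_codes(SAMPLE_LINES)))  # ['200', '404']
```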

directory file suffixes

problem

Use os.listdir to get the names of files in the current directory. What file extensions (i.e., suffixes following the final . character) appear in that directory? It’ll probably be helpful to use os.path.splitext.

attempts

We’ll use a fake/AI-generated os.listdir output for simplicity. We can see from the docs, and from running the function ourselves, that it returns a list of names. All that’s left is to run os.path.splitext on each element, take the suffix (the second item of the returned pair), and add it to a set:

import os
listdir_output = ['README.md', 'main.py', 'config.yaml', '.gitignore', 'requirements.txt',
 'data_processing.py', 'utils.py', '__pycache__', 'tests', 'output.csv',
 'notes.txt', 'analysis_v2.ipynb', 'logo.png', 'schema.json', '.env',
 'report_final.pdf', 'dataset.parquet', 'Dockerfile', 'run_experiments.sh',
 'models', 'archive.zip', 'credentials.example.json', 'CHANGELOG.md',
 'setup.py', 'thumbnail.jpg', 'error_log.txt', '.DS_Store', 'venv',
 'presentation_deck.pptx', 'budget_2024.xlsx']

unique_suffixes = {os.path.splitext(item)[1] for item in listdir_output}
print(unique_suffixes)

{'', '.md', '.jpg', '.txt', '.json', '.pdf', '.png', '.pptx', '.xlsx', '.sh', '.yaml', '.zip', '.parquet', '.csv', '.py', '.ipynb'}

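The only drawback is that the result includes the “empty suffix”, which comes from names with no extension: directories like tests, files like Dockerfile, and dotfiles like .gitignore (os.path.splitext ignores leading dots). We can remove it with unique_suffixes.discard(''). If we don’t mind it, we have a nice one-liner: {os.path.splitext(item)[1] for item in os.listdir()}.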
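If we do want to leave the empty suffix out from the start, one option is to filter inside the comprehension. This is a sketch only: it needs Python 3.8+ for the := operator, the function name unique_suffixes is made up, and the names are a handful borrowed from the fake listing above.

```python
import os

# A few names borrowed from the fake os.listdir output.
names = ['README.md', 'main.py', '.gitignore', 'tests',
         'credentials.example.json', 'Dockerfile']


def unique_suffixes(names):
    # The := expression stores the suffix so splitext runs once per name;
    # the `if` drops names whose suffix is the empty string.
    return {ext for name in names if (ext := os.path.splitext(name)[1])}


print(sorted(unique_suffixes(names)))  # ['.json', '.md', '.py']
```

Calling unique_suffixes(os.listdir()) should give the same kind of result for the real current directory.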