python workout: exercise 17
problem
Write a function, called how_many_different_numbers, that takes a single list of integers and returns the number of different integers it contains.
attempts
The first thing that comes to mind is to just use a set and get its length/size:
def how_many_different_numbers(numbers: list[int]) -> int:
    return len(set(numbers))
numbers = [1, 2, 3, 1, 2, 3, 4, 1]
print(how_many_different_numbers(numbers))
4
solution
The book’s implementation:
def how_many_different_numbers(numbers):
    unique_numbers = set(numbers)
    return len(unique_numbers)
Pretty much the same thing.
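To see what the set is doing for us, here is a minimal sketch that does the same count by hand and checks that both versions agree. The helper count_distinct_manually and the sample lists are made up for illustration; they are not from the book.

```python
def how_many_different_numbers(numbers: list[int]) -> int:
    return len(set(numbers))


def count_distinct_manually(numbers: list[int]) -> int:
    # Hypothetical helper: remembers the values already seen in a plain list.
    seen: list[int] = []
    for number in numbers:
        if number not in seen:  # linear scan, so quadratic time overall
            seen.append(number)
    return len(seen)


# Made-up samples, including the empty list and an all-duplicates list.
samples = [[], [7], [1, 1, 1], [1, 2, 3, 1, 2, 3, 4, 1]]
for sample in samples:
    assert how_many_different_numbers(sample) == count_distinct_manually(sample)
    print(sample, how_many_different_numbers(sample))
```

The manual version rescans the seen list for every element, whereas a set does that membership check by hashing, which is why the one-liner is also the faster choice on long lists.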
beyond the exercise
server log ip addresses
problem
Read through a server (e.g. Apache or nginx) log file. What were the different IP addresses that tried to access your server?
attempts
We already have some fake Apache logs we can parse, and we know how to extract the IP addresses (i.e. the first piece of data in each row). We can then just construct a set from these:
path = 'files/apache_logs.log'
with open(path, 'r') as f:
    lines = f.readlines()[:50]  # we don't need so many
unique_ips = {line.split()[0] for line in lines}
print(unique_ips)
{'179.179.206.176', '157.55.33.15', '189.127.128.209', '208.93.0.48', '190.198.191.75', '177.6.142.6', '188.192.27.241', '68.14.231.140', '128.118.108.67', '107.170.9.55', '173.213.99.1', '198.27.64.9', '66.249.73.135', '74.125.176.83', '187.45.193.158', '212.101.243.11', '50.16.19.13', '80.108.184.97', '216.172.140.128', '77.234.68.135', '180.76.5.22', '68.180.224.225', '46.105.14.53', '128.179.155.97', '5.10.83.105', '208.115.111.72', '208.91.156.11', '184.151.222.45', '198.46.149.143'}
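One possible variation, sketched under a few assumptions: instead of loading the whole file with readlines() and slicing, itertools.islice can stop after the first N lines, and blank lines can be skipped so that split()[0] never fails on an empty row. The function name unique_ips, its limit parameter, and the sample log lines below are all made up for illustration; the real log file is not reproduced here.

```python
import io
from itertools import islice

# Made-up sample lines in Apache log style (not taken from the real file).
SAMPLE_LOG = io.StringIO(
    '203.0.113.5 - - [17/May/2015:10:05:03 +0000] "GET /index.html HTTP/1.1" 200 1024\n'
    '198.51.100.7 - - [17/May/2015:10:05:04 +0000] "GET /missing HTTP/1.1" 404 512\n'
    '\n'
    '203.0.113.5 - - [17/May/2015:10:05:09 +0000] "GET /about.html HTTP/1.1" 200 2048\n'
)


def unique_ips(lines, limit=50):
    # Hypothetical helper: reads at most `limit` lines lazily and skips
    # blank ones, since ''.split()[0] would raise an IndexError.
    return {line.split()[0] for line in islice(lines, limit) if line.strip()}


print(sorted(unique_ips(SAMPLE_LOG)))  # ['198.51.100.7', '203.0.113.5']
```

An open file object can be passed in place of the StringIO, since both yield one line per iteration.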
server log response codes
problem
Reading from that same server log, what response codes were returned to users?
attempts
Again, we already know how to parse the codes from our fake Apache log thanks to python workout: exercise 15. Constructing a set from them is straightforward:
path = 'files/apache_logs.log'
with open(path, 'r') as f:
    lines = f.readlines()[:50]  # we don't need so many
unique_codes = set()
for line in lines:
    # the request is quoted, so the code is the 3 characters after its closing '" '
    end_of_request_idx = line.index('" ')
    code = line[end_of_request_idx+2:end_of_request_idx+5]
    unique_codes.add(code)
print(unique_codes)
{'200', '304', '404'}
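The index-and-slice approach raises a ValueError on any line that has no '" ' in it. As a rough alternative, not the method used above, a regular expression can look for three digits right after the closing quote of the request and simply skip lines that do not match. The pattern, the function name unique_status_codes, and the sample lines are assumptions made for this sketch.

```python
import re

# Three digits surrounded by spaces, right after a closing double quote.
STATUS_RE = re.compile(r'" (\d{3}) ')


def unique_status_codes(lines):
    # Hypothetical helper: collects the status code of every line that matches.
    codes = set()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:  # malformed lines are skipped instead of raising
            codes.add(match.group(1))
    return codes


# Made-up sample lines in Apache log style (not taken from the real file).
sample_lines = [
    '203.0.113.5 - - [17/May/2015:10:05:03 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [17/May/2015:10:05:04 +0000] "GET /missing HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    'this line is not a log entry',
    '203.0.113.5 - - [17/May/2015:10:05:09 +0000] "GET /about.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

print(sorted(unique_status_codes(sample_lines)))  # ['200', '404']
```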
directory file suffixes
problem
Use os.listdir to get the names of files in the current directory. What file extensions (i.e., suffixes following the final . character) appear in that directory? It’ll probably be helpful to use os.path.splitext.
attempts
We’ll use a fake/AI-generated os.listdir output for simplicity. We can see from the docs, and from running the function ourselves, that it returns a list. All that’s left is parsing each element, extracting its suffix with os.path.splitext, and adding it to a set:
import os
listdir_output = [
    'README.md', 'main.py', 'config.yaml', '.gitignore', 'requirements.txt',
    'data_processing.py', 'utils.py', '__pycache__', 'tests', 'output.csv',
    'notes.txt', 'analysis_v2.ipynb', 'logo.png', 'schema.json', '.env',
    'report_final.pdf', 'dataset.parquet', 'Dockerfile', 'run_experiments.sh',
    'models', 'archive.zip', 'credentials.example.json', 'CHANGELOG.md',
    'setup.py', 'thumbnail.jpg', 'error_log.txt', '.DS_Store', 'venv',
    'presentation_deck.pptx', 'budget_2024.xlsx',
]
unique_suffixes = {os.path.splitext(item)[1] for item in listdir_output}
print(unique_suffixes)
{'', '.md', '.jpg', '.txt', '.json', '.pdf', '.png', '.pptx', '.xlsx', '.sh', '.yaml', '.zip', '.parquet', '.csv', '.py', '.ipynb'}
The only drawback is that it includes the “empty suffix”, which comes from directories, files without an extension, and dotfiles such as .gitignore. We can remove it with unique_suffixes.discard(''). If we don’t mind it, we have a nice one-liner in: {os.path.splitext(item)[1] for item in os.listdir()}.
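If the empty suffix is unwanted, one way to keep everything in a single comprehension is to filter it out with an assignment expression (Python 3.8+). This is only a sketch: the names list is a shortened, partly made-up stand-in for a real os.listdir() result, and 'archive.tar.gz' is added to show that splitext only returns the part after the final dot.

```python
import os

# Shortened, partly made-up stand-in for an os.listdir() result.
names = ['README.md', '.gitignore', 'tests', 'archive.tar.gz', 'main.py']

# Keep a suffix only when it is non-empty; dotfiles and names without
# an extension give '' from os.path.splitext and are dropped.
suffixes = {ext for name in names if (ext := os.path.splitext(name)[1])}

print(sorted(suffixes))  # ['.gz', '.md', '.py']
```

Swapping names for os.listdir() gives the same result for the current directory without the later discard('') call.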