
python workout: exercise 17

problem

Write a function, called how_many_different_numbers, that takes a single list of integers and returns the number of different integers it contains.

attempts

The first thing that comes to mind is to just use a set and get its length/size:

def how_many_different_numbers(numbers: list[int]) -> int:
    return len(set(numbers))

numbers = [1, 2, 3, 1, 2, 3, 4, 1]

print(how_many_different_numbers(numbers))
4

solution

The book’s implementation:

def how_many_different_numbers(numbers):
    unique_numbers = set(numbers)
    return len(unique_numbers)

Pretty much the same thing.
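
For comparison, here is a minimal sketch of what the set is doing for us, written as an explicit loop. The name how_many_different_numbers_manual is made up for illustration and is not from the book. It also shows why the set version is preferable: the membership check against a list is a linear scan, so this variant gets slow on long inputs.

```python
def how_many_different_numbers_manual(numbers: list[int]) -> int:
    # hypothetical helper, not from the book: remember each value the first time we see it
    seen = []
    for number in numbers:
        if number not in seen:  # linear scan of a list, so O(n^2) overall
            seen.append(number)
    return len(seen)


numbers = [1, 2, 3, 1, 2, 3, 4, 1]

# both approaches should agree on the count
print(how_many_different_numbers_manual(numbers) == len(set(numbers)))
```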

beyond the exercise

server log ip addresses

  • problem

    Read through a server (e.g. Apache or nginx) log file. What were the different IP addresses that tried to access your server?

  • attempts

    We already have some fake Apache logs we can parse, and we know how to extract the IP addresses (i.e. the first piece of data in each row). We can then just construct a set from them:

    path = 'files/apache_logs.log'
    
    with open(path, 'r') as f:
        lines = f.readlines()[:50] # we don't need so many
        unique_ips = {line.split()[0] for line in lines}
        print(unique_ips)
    
    {'179.179.206.176', '157.55.33.15', '189.127.128.209', '208.93.0.48', '190.198.191.75', '177.6.142.6', '188.192.27.241', '68.14.231.140', '128.118.108.67', '107.170.9.55', '173.213.99.1', '198.27.64.9', '66.249.73.135', '74.125.176.83', '187.45.193.158', '212.101.243.11', '50.16.19.13', '80.108.184.97', '216.172.140.128', '77.234.68.135', '180.76.5.22', '68.180.224.225', '46.105.14.53', '128.179.155.97', '5.10.83.105', '208.115.111.72', '208.91.156.11', '184.151.222.45', '198.46.149.143'}
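
One thing the comprehension above does not handle is a blank line: ''.split() returns an empty list, so indexing [0] raises IndexError. Here is a small sketch that skips such lines, assuming the IP is always the first whitespace-separated field. The helper name unique_ips and the sample lines are made up for illustration and are not taken from the real log file.

```python
from collections.abc import Iterable


def unique_ips(lines: Iterable[str]) -> set[str]:
    # hypothetical helper: the IP is the first whitespace-separated field;
    # lines that are empty or only whitespace are skipped
    return {line.split()[0] for line in lines if line.strip()}


# made-up sample lines in Apache-like format, not from the real log
sample_lines = [
    '203.0.113.7 - - [17/May/2015:10:05:03 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '198.51.100.23 - - [17/May/2015:10:05:10 +0000] "GET /missing HTTP/1.1" 404 512',
    '203.0.113.7 - - [17/May/2015:10:05:12 +0000] "GET /style.css HTTP/1.1" 304 0',
    '',
]

print(sorted(unique_ips(sample_lines)))
```

Because the helper accepts any iterable of strings, it could also be fed itertools.islice(f, 50) on the open file, which would avoid reading the whole log into memory the way f.readlines()[:50] does.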
    

server log response codes

  • problem

    Reading from that same server log, what response codes were returned to users?

  • attempts

    Again, we already know how to parse the codes from our fake Apache log thanks to python workout: exercise 15. Constructing a set from them is straightforward:

    path = 'files/apache_logs.log'
    
    with open(path, 'r') as f:
        lines = f.readlines()[:50] # we don't need so many
        unique_codes = set()
        for line in lines:
            # the status code is the 3 characters right after the request's closing quote
            end_of_request_idx = line.index('" ')
            code = line[end_of_request_idx+2:end_of_request_idx+5]
            unique_codes.add(code)
        print(unique_codes)
    
    {'200', '304', '404'}
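
The index-and-slice approach above raises ValueError on any line that lacks the '" ' marker. As a hedged alternative, here is a sketch that uses a regular expression instead and simply skips lines that do not match. It assumes the common/combined log layout, where the quoted request is followed by a three-digit status and then the response size. The names STATUS_RE and unique_codes, and the sample lines, are made up for illustration.

```python
import re

# assumption: the quoted request is followed by ' STATUS SIZE', as in Apache's common log format
STATUS_RE = re.compile(r'" (\d{3}) ')


def unique_codes(lines):
    # hypothetical helper: collect the distinct status codes as strings
    codes = set()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:  # skip lines that do not look like log entries
            codes.add(match.group(1))
    return codes


# made-up sample lines, not from the real log
sample_lines = [
    '203.0.113.7 - - [17/May/2015:10:05:03 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '198.51.100.23 - - [17/May/2015:10:05:10 +0000] "GET /missing HTTP/1.1" 404 512',
    '203.0.113.7 - - [17/May/2015:10:05:12 +0000] "GET /style.css HTTP/1.1" 304 0',
    'not a log line',
]

print(sorted(unique_codes(sample_lines)))
```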
    

directory file suffixes

  • problem

    Use os.listdir to get the names of files in the current directory. What file extensions (i.e., suffixes following the final . character) appear in that directory? It’ll probably be helpful to use os.path.splitext.

  • attempts

    We’ll use a fake/AI-generated os.listdir output for simplicity. We can see from the docs, and from running the function ourselves, that it returns a list. All that’s left is parsing each element, extracting its suffix with os.path.splitext, and adding it to a set:

    import os
    listdir_output = ['README.md', 'main.py', 'config.yaml', '.gitignore', 'requirements.txt',
     'data_processing.py', 'utils.py', '__pycache__', 'tests', 'output.csv',
     'notes.txt', 'analysis_v2.ipynb', 'logo.png', 'schema.json', '.env',
     'report_final.pdf', 'dataset.parquet', 'Dockerfile', 'run_experiments.sh',
     'models', 'archive.zip', 'credentials.example.json', 'CHANGELOG.md',
     'setup.py', 'thumbnail.jpg', 'error_log.txt', '.DS_Store', 'venv',
     'presentation_deck.pptx', 'budget_2024.xlsx']
    
    unique_suffixes = {os.path.splitext(item)[1] for item in listdir_output}
    print(unique_suffixes)
    
    {'', '.md', '.jpg', '.txt', '.json', '.pdf', '.png', '.pptx', '.xlsx', '.sh', '.yaml', '.zip', '.parquet', '.csv', '.py', '.ipynb'}
    

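    The only drawback is that it includes the “empty suffix”, which comes from names with no extension (directories such as tests, and dotfiles such as .gitignore). We can remove it with unique_suffixes.discard(''). If we don’t mind it, we have a nice one-liner: {os.path.splitext(item)[1] for item in os.listdir()}.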
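
As a variation, the same idea can be sketched with pathlib, whose Path.suffix should agree with os.path.splitext for names like these, combined with the discard step. The helper name unique_suffixes is made up for illustration, and the names are a handful borrowed from the fake listing.

```python
from pathlib import Path


def unique_suffixes(names):
    # hypothetical helper: Path(name).suffix is the final dot-suffix, or '' if there is none
    suffixes = {Path(name).suffix for name in names}
    suffixes.discard('')  # drop the empty suffix; discard does not raise if it is absent
    return suffixes


# a few names borrowed from the fake listing
names = ['README.md', '.gitignore', 'credentials.example.json', 'tests', 'CHANGELOG.md']

print(sorted(unique_suffixes(names)))
```

Using discard rather than remove matters here: remove raises KeyError when the element is missing, which would happen in a directory where every name has an extension.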
