back

python workout: exercise 12

word with most repeated letters

problem

Write a function, most_repeating_word, that takes a sequence of strings as input. The function should return the string that contains the greatest number of repeated let- ters. In other words

  • For each word, find the letter that appears the most times.
  • Find the word whose most-repeated letter appears more than any other.

That is, if words is set to

words = ['this', 'is', 'an', 'elementary', 'test', 'example']

then your function should return elementary.

attempts

Well, the book just tells us how to solve this. We use collections.Counter and its most_common() method.

I was going to suggest using Counter :(

It just means that implementing most_repeating_word will be straightforward to implement. We’ll just use max on the sequence of words and pass a lambda expression as the key argument. And the anonymous function will just map a word to a Counter instance and return the count of the most common letter, using the most_common method:

from collections.abc import Sequence
from collections import Counter

def most_repeating_word(words: Sequence[str]) -> str:
    return max(words, key=lambda w: Counter(w).most_common()[0][1])

words = ['this', 'is', 'an', 'elementary', 'test', 'example']

print(most_repeating_word(words))
elementary

The nice thing about this approach, we can use a comprehension to return the tuple of letter frequencies in each word to resolve ties in most repeated letter, i.e. words with the same number of most repeated letters can be ordered with respect to their second most repeated letter, etc:

from collections.abc import Sequence
from collections import Counter

def most_repeating_word(words: Sequence[str]) -> str:
    return sorted(words, key=lambda w: tuple(freq for _, freq in Counter(w).most_common()))

words = ['this', 'is', 'an', 'elementary', 'test', 'example', 'bannana']

print(most_repeating_word(words))
['is', 'an', 'this', 'test', 'example', 'elementary', 'bannana']

This way, “bannana” is ahead of “elementary”. Because “elementary” has one repeated letter which appears three times (i.e. ’e’). So does “bannana” with ‘a’. But “bannana”’s second most repeated letter (i.e. ’n’) also appears three times, which is more than “elementary”’s second most repeated letter.

solution

The book’s implementation:

from collections import Counter
import operator

WORDS = ['this', 'is', 'an',
         'elementary', 'test', 'example']

def most_repeating_letter_count(word):
    return Counter(word).most_common(1)[0][1]

def most_repeating_word(words):
    return max(words,
               key=most_repeating_letter_count)


print(most_repeating_word(WORDS))
elementary

I like the idea of putting the Counter logic in its own named function to isolate the awkward indexing. But I don’t see the need to pass an argument to most_common() – it does nothing to alleviate the indexing that follows.

I also don’t know why it’s importing operator; it doesn’t make use of it?

beyond the exercise

greatest number of repeated vowels

  • problem

    Instead of finding the word with the greatest number of repeated letters, find the word with the greatest number of repeated vowels.

  • attempts

    This requires some sort of filtering.

    We can either filter for vowels before we create a Counter instance or after, with the results of most_common().

    I think filtering after makes more sense. And I don’t think it will come with a peformance/efficiency cost.

    Something like:

    from collections.abc import Sequence
    from collections import Counter
    
    def most_repeating_word(words: Sequence[str]) -> str:
        return max(words,
                   key=lambda w: tuple(count for letter, count in Counter(w).most_common() if letter in 'aeiou'))
    
    words = ['this', 'is', 'an', 'elementary', 'test', 'example', 'bannana']
    
    print(most_repeating_word(words))
    
    elementary
    

    At this point, the lambda’s getting a bit too long for comfort:

    from collections.abc import Sequence
    from collections import Counter
    
    def most_repeated_vowels_count(word: str) -> tuple[int]:
        freqs = Counter(word).most_common()
        return tuple(count for letter, count in freqs if letter in 'aeiou')
    
    
    def most_repeating_word(words: Sequence[str]) -> str:
        return max(words,
                   key=most_repeated_vowels_count)
    
    words = ['this', 'is', 'an', 'elementary', 'test', 'example', 'bannana']
    
    print(most_repeating_word(words))
    
    elementary
    

reading etc/passwd

  • problem

    Write a program to read /etc/passwd on a Unix computer. The first field contains the username, and the final field contains the user’s shell, the command inter- preter. Display the shells in decreasing order of popularity, such that the most pop- ular shell is shown first, the second most popular shell second, and so forth.

  • attempts

    Looking at a snippet of a generated /etc/passwd, our lines to parse will look something like:

    root:x:0:0:root:/root:/bin/bash
    

    Each field is separated by colons and the last field is what we care about.

    So, IO aside, we just need to:

    • read each line in the file
    • extract each shell in line
    • pass the shells to Counter
    • use most_common() to print the shells from most to least common

    Counter and most_common():

    from collections import Counter
    
    def most_common_shell(path: str) -> None:
        with open(path, 'r') as f:
            shells = [line.strip().split(':')[-1] for line in f.readlines()]
            freqs = Counter(shells)
            for shell, _ in Counter(shells).most_common():
                print(shell)
    
    most_common_shell("/files/sample_passwd")
    
    /usr/sbin/nologin
    /usr/bin/fish
    /bin/sh
    /bin/bash
    /bin/zsh
    /usr/bin/zsh
    /bin/dash
    /usr/bin/bash
    /bin/sync
    /usr/bin/dash
    

    We could reduce iterations by building Counter progressively, reading lines one at a time. But this is fine.

sorted usernames

  • problem

    For an added challenge, after displaying each shell, also show the usernames (sorted alphabetically) who use each of those shells.

  • attempts

    The only difference here is that we need to keep track of which shell is used by which user.

    But this shouldn’t be too hard. Because the parsing isn’t any more complicated and we can just maintain a dictionary of which shells are used by which users for the final printing using collections.defaultdict:

    from collections import Counter, defaultdict
    
    def most_common_shell(path: str) -> None:
        with open(path, 'r') as f:
            shell_to_user = defaultdict(list)
            shell_counter = Counter()
    
            for line in f.readlines():
                fields = line.strip().split(':')
                user, shell = fields[0], fields[-1]
    
                shell_to_user[shell].append(user)
                shell_counter[shell] += 1
    
            for shell, _ in shell_counter.most_common():
                print(f"{shell} is used by {sorted(shell_to_user[shell])}")
    
    most_common_shell("/files/sample_passwd")
    
    /usr/sbin/nologin is used by ['bin', 'daemon', 'nobody', 'sys', 'systemd-coredump']
    /usr/bin/fish is used by ['diana', 'docker', 'games', 'henry', 'proxy']
    /bin/sh is used by ['eve', 'james', 'list', 'lp', 'uucp']
    /bin/bash is used by ['gnats', 'grace', 'root', 'www-data']
    /bin/zsh is used by ['alice', 'backup', 'isabel']
    /usr/bin/zsh is used by ['frank', 'man']
    /bin/dash is used by ['charlie', 'mail']
    /usr/bin/bash is used by ['bob', 'news']
    /bin/sync is used by ['sync']
    /usr/bin/dash is used by ['irc']
    

    Manually incrementing a counter like this just makes it a defaultdict(int) at this point with the addition of some useful methods.

    Overall, I don’t think it’s perfect. But it does the job.

mail@jonahv.com