python workout: exercise 12
word with most repeated letters
problem
Write a function,
most_repeating_word, that takes a sequence of strings as input. The function should return the string that contains the greatest number of repeated let- ters. In other words
- For each word, find the letter that appears the most times.
- Find the word whose most-repeated letter appears more than any other.
That is, if words is set to
words = ['this', 'is', 'an', 'elementary', 'test', 'example']then your function should return
elementary.
attempts
Well, the book just tells us how to solve this. We use collections.Counter and
its most_common() method.
I was going to suggest using Counter :(
It just means that implementing most_repeating_word will be straightforward to
implement. We’ll just use max on the sequence of words and pass a lambda
expression as the key argument. And the anonymous function will just map a word
to a Counter instance and return the count of the most common letter, using the
most_common method:
from collections.abc import Sequence
from collections import Counter
def most_repeating_word(words: Sequence[str]) -> str:
return max(words, key=lambda w: Counter(w).most_common()[0][1])
words = ['this', 'is', 'an', 'elementary', 'test', 'example']
print(most_repeating_word(words))
elementary
The nice thing about this approach, we can use a comprehension to return the tuple of letter frequencies in each word to resolve ties in most repeated letter, i.e. words with the same number of most repeated letters can be ordered with respect to their second most repeated letter, etc:
from collections.abc import Sequence
from collections import Counter
def most_repeating_word(words: Sequence[str]) -> str:
return sorted(words, key=lambda w: tuple(freq for _, freq in Counter(w).most_common()))
words = ['this', 'is', 'an', 'elementary', 'test', 'example', 'bannana']
print(most_repeating_word(words))
['is', 'an', 'this', 'test', 'example', 'elementary', 'bannana']
This way, “bannana” is ahead of “elementary”. Because “elementary” has one repeated letter which appears three times (i.e. ’e’). So does “bannana” with ‘a’. But “bannana”’s second most repeated letter (i.e. ’n’) also appears three times, which is more than “elementary”’s second most repeated letter.
solution
The book’s implementation:
from collections import Counter
import operator
WORDS = ['this', 'is', 'an',
'elementary', 'test', 'example']
def most_repeating_letter_count(word):
return Counter(word).most_common(1)[0][1]
def most_repeating_word(words):
return max(words,
key=most_repeating_letter_count)
print(most_repeating_word(WORDS))
elementary
I like the idea of putting the Counter logic in its own named function to
isolate the awkward indexing. But I don’t see the need to pass an argument to
most_common() – it does nothing to alleviate the indexing that follows.
I also don’t know why it’s importing operator; it doesn’t make use of it?
beyond the exercise
greatest number of repeated vowels
-
problem
Instead of finding the word with the greatest number of repeated letters, find the word with the greatest number of repeated vowels.
-
attempts
This requires some sort of filtering.
We can either filter for vowels before we create a
Counterinstance or after, with the results ofmost_common().I think filtering after makes more sense. And I don’t think it will come with a peformance/efficiency cost.
Something like:
from collections.abc import Sequence from collections import Counter def most_repeating_word(words: Sequence[str]) -> str: return max(words, key=lambda w: tuple(count for letter, count in Counter(w).most_common() if letter in 'aeiou')) words = ['this', 'is', 'an', 'elementary', 'test', 'example', 'bannana'] print(most_repeating_word(words))elementaryAt this point, the
lambda’s getting a bit too long for comfort:from collections.abc import Sequence from collections import Counter def most_repeated_vowels_count(word: str) -> tuple[int]: freqs = Counter(word).most_common() return tuple(count for letter, count in freqs if letter in 'aeiou') def most_repeating_word(words: Sequence[str]) -> str: return max(words, key=most_repeated_vowels_count) words = ['this', 'is', 'an', 'elementary', 'test', 'example', 'bannana'] print(most_repeating_word(words))elementary
reading etc/passwd
-
problem
Write a program to read
/etc/passwdon a Unix computer. The first field contains the username, and the final field contains the user’s shell, the command inter- preter. Display the shells in decreasing order of popularity, such that the most pop- ular shell is shown first, the second most popular shell second, and so forth.
-
attempts
Looking at a snippet of a generated
/etc/passwd, our lines to parse will look something like:root:x:0:0:root:/root:/bin/bashEach field is separated by colons and the last field is what we care about.
So, IO aside, we just need to:
- read each line in the file
- extract each shell in line
- pass the shells to
Counter - use
most_common()to print the shells from most to least common
Counterandmost_common():from collections import Counter def most_common_shell(path: str) -> None: with open(path, 'r') as f: shells = [line.strip().split(':')[-1] for line in f.readlines()] freqs = Counter(shells) for shell, _ in Counter(shells).most_common(): print(shell) most_common_shell("/files/sample_passwd")/usr/sbin/nologin /usr/bin/fish /bin/sh /bin/bash /bin/zsh /usr/bin/zsh /bin/dash /usr/bin/bash /bin/sync /usr/bin/dashWe could reduce iterations by building
Counterprogressively, reading lines one at a time. But this is fine.
sorted usernames
-
problem
For an added challenge, after displaying each shell, also show the usernames (sorted alphabetically) who use each of those shells.
-
attempts
The only difference here is that we need to keep track of which shell is used by which user.
But this shouldn’t be too hard. Because the parsing isn’t any more complicated and we can just maintain a dictionary of which shells are used by which users for the final printing using
collections.defaultdict:from collections import Counter, defaultdict def most_common_shell(path: str) -> None: with open(path, 'r') as f: shell_to_user = defaultdict(list) shell_counter = Counter() for line in f.readlines(): fields = line.strip().split(':') user, shell = fields[0], fields[-1] shell_to_user[shell].append(user) shell_counter[shell] += 1 for shell, _ in shell_counter.most_common(): print(f"{shell} is used by {sorted(shell_to_user[shell])}") most_common_shell("/files/sample_passwd")/usr/sbin/nologin is used by ['bin', 'daemon', 'nobody', 'sys', 'systemd-coredump'] /usr/bin/fish is used by ['diana', 'docker', 'games', 'henry', 'proxy'] /bin/sh is used by ['eve', 'james', 'list', 'lp', 'uucp'] /bin/bash is used by ['gnats', 'grace', 'root', 'www-data'] /bin/zsh is used by ['alice', 'backup', 'isabel'] /usr/bin/zsh is used by ['frank', 'man'] /bin/dash is used by ['charlie', 'mail'] /usr/bin/bash is used by ['bob', 'news'] /bin/sync is used by ['sync'] /usr/bin/dash is used by ['irc']Manually incrementing a counter like this just makes it a
defaultdict(int)at this point with the addition of some useful methods.Overall, I don’t think it’s perfect. But it does the job.