python workout: exercise 7
ubbi dubbi
problem
This exercise is meant to help you practice thinking in this way. Here, you’ll implement a translator from English into another secret children’s language, Ubbi Dubbi (http://mng.bz/90zl). (This was popularized on the wonderful American children’s program Zoom, which was on television when I was growing up.) The rules of Ubbi Dubbi are even simpler than those of Pig Latin, although programming a translator is more complex and requires a bit more thinking.
In Ubbi Dubbi, every vowel (a, e, i, o, or u) is prefaced with ub. Thus milk becomes mubilk (m-ub-ilk) and program becomes prubogrubam (prub-ogrub-am). In theory, you only put an ub before every vowel sound, rather than before each vowel. Given that this is a book about Python and not linguistics, I hope that you’ll forgive this slight difference in definition.
Ubbi Dubbi is enormously fun to speak, and it’s somewhat magical if and when you can begin to understand someone else speaking it. Even if you don’t understand it, Ubbi Dubbi sounds extremely funny. See some YouTube videos on the subject, such as http://mng.bz/aRMY, if you need convincing.
For this exercise, you’ll write a function (called ubbi_dubbi) that takes a single word (string) as an argument. It returns a string, the word’s translation into Ubbi Dubbi. So if the function is called with octopus, the function will return the string uboctubopubus. And if the user passes the argument elephant, you’ll output ubelubephubant.
As with the original Pig Latin translator, you can ignore capital letters, punctuation, and corner cases, such as multiple vowels combining to create a new sound. When you do have two vowels next to one another, preface each of them with ub. Thus, soap will become suboubap, despite the fact that oa combines to a single vowel sound.
attempts
The first thing that comes to mind is our previous Pig Latin implementation and the book’s point that Ubbi Dubbi translation might require multiple string modifications.
So simple string formatting won’t be enough in this case.
But I’m struggling to see why the construct-a-list-of-substrings-to-join-into-a-string approach couldn’t work here:
def ubbi_dubbi(word: str) -> str:
    translation = []
    for letter in word:
        if letter in 'aeiou':
            translation.append('ub')
        translation.append(letter)
    return ''.join(translation)
print(ubbi_dubbi('octopus'))
print(ubbi_dubbi('elephant'))
uboctubopubus
ubelubephubant
solution
The book’s implementation:
def ubbi_dubbi(word):
    output = []
    for letter in word:
        if letter in 'aeiou':
            output.append(f'ub{letter}')
        else:
            output.append(letter)
    return ''.join(output)
print(ubbi_dubbi('python'))
pythubon
What’s interesting is that I considered using string formatting inside the if clause. But I decided against it to avoid the explicit else. Plus I like the continuity of always adding a word’s letter to translation, only intervening to make an addition.
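As an aside, the same one-pass substitution can also be sketched with a translation table instead of an explicit loop. This is only an illustrative variant, not the book’s solution: the names UBBI_TABLE and ubbi_dubbi_table are made up for this example, and it relies on str.maketrans accepting a dict that maps single characters to replacement strings.

```python
# Hypothetical variant: map each lowercase vowel to 'ub' + that vowel.
UBBI_TABLE = str.maketrans({vowel: f'ub{vowel}' for vowel in 'aeiou'})


def ubbi_dubbi_table(word: str) -> str:
    """Translate a lowercase word into Ubbi Dubbi using str.translate."""
    return word.translate(UBBI_TABLE)


print(ubbi_dubbi_table('octopus'))  # uboctubopubus
print(ubbi_dubbi_table('soap'))     # suboubap
```

Like both loop versions above, this only handles lowercase vowels.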
beyond the exercise
handle capitalised words
-
problem
If a word is capitalized (i.e., the first letter is capitalized, but the rest of the word isn’t), then the Ubbi Dubbi translation should be similarly capitalized.
-
attempts
Again, we did something similar with Pig Latin in exercise 5. What stops us from doing the same here?
We only need to check whether the first letter is capitalised and act accordingly.
So we could perform that check at the start, before the iteration.
But we have to be careful because if the word starts with a vowel, we would have to prepend “ub” and the ‘u’ would need to be capitalised…
…
Why not be recursive then?
We could treat the first letter of a word as a word, call ubbi_dubbi on it, and capitalise its first letter if the original first letter was.
But we would still have to take care when the first letter is capitalised: we can’t pass a capital letter to our existing implementation of ubbi_dubbi, which only checks for lowercase vowels.
…
If we assume that only the first letter of a capitalised word will ever be capitalised, that would simplify things.
But there’s no need to do that here.
Let’s just go with the most naive approach and not try to remove repetition from the start:
def ubbi_dubbi(word: str) -> str:
    translation = []
    first_letter = word[0]
    if first_letter in 'aeiouAEIOU':
        if first_letter == first_letter.upper():
            translation.append(f'Ub{first_letter.lower()}')
        else:
            translation.append(f'ub{first_letter}')
    else:
        translation.append(first_letter)
    for letter in word[1:]:
        if letter in 'aeiou':
            translation.append('ub')
        translation.append(letter)
    return ''.join(translation)
print(ubbi_dubbi('octopus'))
print(ubbi_dubbi('Octopus'))
print(ubbi_dubbi('python'))
print(ubbi_dubbi('Python'))
uboctubopubus
Uboctubopubus
pythubon
Pythubon
This works, but I don’t like the duplication of the translation logic for the first letter.
Wouldn’t it be easier to just maintain a boolean of whether the word is capitalised or not and make the change right at the end?
Like so:
def ubbi_dubbi(word: str) -> str:
    is_capitalised = word[0] == word[0].upper()
    translation = []
    # Iterate over the lowercased word so a leading capital vowel still gets its 'ub'.
    for letter in word.lower():
        if letter in 'aeiou':
            translation.append('u')
            translation.append('b')
        translation.append(letter)
    if is_capitalised:
        translation[0] = translation[0].upper()
    return ''.join(translation)
print(ubbi_dubbi('octopus'))
print(ubbi_dubbi('Octopus'))
print(ubbi_dubbi('python'))
print(ubbi_dubbi('Python'))
uboctubopubus
Uboctubopubus
pythubon
Pythubon
I massively prefer this. All it takes is to split the addition of the ‘ub’ substring into 2 appends, and to iterate over the lowercased word so that a capitalised leading vowel is still recognised as a vowel.
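The “remember the capitalisation, fix it at the end” idea can also be sketched with str.capitalize. This is a hedged alternative, not the book’s solution: the function name ubbi_dubbi_capitalised is made up here, and it assumes, as the problem statement does, that at most the first letter of the input is a capital.

```python
def ubbi_dubbi_capitalised(word: str) -> str:
    """Translate a word, restoring a leading capital afterwards (sketch)."""
    # Translate the lowercased word so every vowel is recognised.
    translated = ''.join(
        f'ub{letter}' if letter in 'aeiou' else letter
        for letter in word.lower()
    )
    # word[:1] is '' for an empty string, so this also avoids an IndexError.
    if word[:1].isupper():
        return translated.capitalize()
    return translated


print(ubbi_dubbi_capitalised('Octopus'))  # Uboctubopubus
print(ubbi_dubbi_capitalised('python'))   # pythubon
```

Because the translation is built from the lowercased word, str.capitalize lowercasing the rest of the string has no further effect here.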
remove author names
-
problem
In academia, it’s common to remove the authors’ names from a paper submitted for peer review. Given a string containing an article and a separate list of strings containing authors’ names, replace all names in the article with _ characters.
-
attempts
This feels like it’s coming a bit out of left field.
Shouldn’t we just make use of str.replace here?
The hint being the requirement to “replace all names in the article”.
I guess the only trouble we could have here is overlapping names. Namely, if one author’s name is a substring of another’s, e.g. “Sam” and “Samantha”. In which case, performing str.replace('Sam') before str.replace('Samantha') would ruin the latter.
But we could mitigate this by working at the word level. That is, if we assume each author’s name appears in the article padded by a space on either side, we could do str.replace(' Sam ') instead.
But this assumption might be too constraining, i.e. not generalisable enough.
The other option that comes to mind is to use a sliding window of some kind. Or to first split the article into words and perform a substitution for each author name.
I think all this can get pretty complicated depending on how many edge cases we would want this function to cover.
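To make the overlap concern concrete, here is a small sketch of the plain str.replace idea. The helper name naive_removal and the sample sentence are invented for illustration only.

```python
def naive_removal(article: str, authors: list[str]) -> str:
    """Replace each name in turn with str.replace (hypothetical helper)."""
    for name in authors:
        article = article.replace(name, '_')
    return article


text = 'Samantha thanked Sam.'
print(naive_removal(text, ['Sam', 'Samantha']))  # _antha thanked _.
print(naive_removal(text, ['Samantha', 'Sam']))  # _ thanked _.
print(naive_removal(text, [' Sam ']))            # Samantha thanked Sam.
```

The first call shows the shorter name clobbering the longer one, the second shows that the order of replacement matters, and the third shows how the space-padding assumption misses a name followed by punctuation.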
Since the book has been fairly generous with assumptions so far, I think it’s fair to assume each name will appear as a word, i.e. a string separated by whitespace without punctuation.
In which case, I want to go with the splitting substitution approach:
def author_removal(article: str, authors: list[str]) -> str:
    author_names = set(authors)  # build the lookup set once, not once per word
    words = article.split()
    cleaned_article = ['_' if w in author_names else w for w in words]
    return ' '.join(cleaned_article)
article = """Joe et al, developped a new and incredible drug. In collaboratoin with Jane Joe managed to synthesise this new and wonderful substance that could cure all ailments. And after long clinical trials ran by Jane the drug is now ready for the market."""
print(author_removal(article, authors=['Joe', 'Jane']))
_ et al, developped a new and incredible drug. In collaboratoin with _ _ managed to synthesise this new and wonderful substance that could cure all ailments. And after long clinical trials ran by _ the drug is now ready for the market.
It’s not the most thorough solution. But it works. And the principle is easily extendable with more sophisticated splitting. So let’s leave it there.
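One possible form of “more sophisticated splitting” is a regular expression with word boundaries, which copes with names followed by punctuation and with one name being a prefix of another. This is a sketch under assumptions, not the book’s approach: the name remove_authors and the sample sentence are invented, and it assumes every author name starts and ends with a letter or digit so that \b behaves as expected.

```python
import re


def remove_authors(article: str, authors: list[str]) -> str:
    """Replace whole-word occurrences of each author name with '_' (sketch)."""
    if not authors:
        # An empty alternation would match at every word boundary.
        return article
    # Try longer names first, and escape them so they are matched literally.
    names = sorted(authors, key=len, reverse=True)
    pattern = r'\b(?:' + '|'.join(re.escape(name) for name in names) + r')\b'
    return re.sub(pattern, '_', article)


sample = "Samantha thanked Sam, and Joe's team."
print(remove_authors(sample, ['Sam', 'Samantha', 'Joe']))
# _ thanked _, and _'s team.
```

Unlike the split-and-join version, this leaves the article’s original whitespace and line breaks untouched.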
url-encode characters
-
problem
In URLs, we often replace special and nonprintable characters with a % followed by the character’s ASCII value in hexadecimal. For example, if a URL is to include a space character (ASCII 32, aka 0x20), we replace it with %20. Given a string, URL-encode any character that isn’t a letter or number. For the purposes of this exercise, we’ll assume that all characters are indeed in ASCII (i.e., one byte long), and not multibyte UTF-8 characters. It might help to know about the ord (http://mng.bz/EdnJ) and hex (http://mng.bz/nPxg) functions.
-
attempts
With a bit of research, it seems we can use str.isalnum to test whether a character is alphanumeric. From there, we should be able to use ord to get the ASCII value of our non-alphanumeric character in base 10, before converting that to its base 16 string representation using hex:
def url_encode(string: str) -> str:
    encoded = []
    for char in string:
        if char.isalnum():
            encoded.append(char)
        else:
            # hex() returns e.g. '0x20': drop the '0x' prefix and pad to two digits
            encoded.append(f'%{hex(ord(char))[2:].zfill(2)}')
    return "".join(encoded)
print(url_encode("a test string#"))
That should do it.
By assumption, our input won’t contain any non-ASCII characters, so we don’t need to check for them.
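As a cross-check, the sketch below builds the escape with a format specification instead of slicing the output of hex, and compares the result with the standard library’s urllib.parse.quote. The name url_encode_padded is made up for this example. It uses uppercase hex digits, which is a common convention for percent-encoding, and zero-pads to two digits so that a character such as a newline becomes %0A rather than %A.

```python
from urllib.parse import quote


def url_encode_padded(string: str) -> str:
    """Percent-encode every non-alphanumeric ASCII character (sketch)."""
    return ''.join(
        # :02X formats the code point as two uppercase hex digits.
        char if char.isalnum() else f'%{ord(char):02X}'
        for char in string
    )


print(url_encode_padded('a test string#'))  # a%20test%20string%23
print(url_encode_padded('line\nbreak'))     # line%0Abreak
print(quote('a test string#', safe=''))     # a%20test%20string%23
print(quote('a_b', safe=''), url_encode_padded('a_b'))  # a_b a%5Fb
```

The last line shows where the exercise’s rule and quote differ: quote never escapes the characters _ . - ~, whereas the exercise asks for anything that isn’t a letter or number to be encoded.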