python workout: exercise 7

ubbi dubbi

problem

This exercise is meant to help you practice thinking in this way. Here, you’ll implement a translator from English into another secret children’s language, Ubbi Dubbi (http://mng.bz/90zl). (This was popularized on the wonderful American children’s program Zoom, which was on television when I was growing up.) The rules of Ubbi Dubbi are even simpler than those of Pig Latin, although programming a translator is more complex and requires a bit more thinking.

In Ubbi Dubbi, every vowel (a, e, i, o, or u) is prefaced with ub. Thus milk becomes mubilk (m-ub-ilk) and program becomes prubogrubam (prub-ogrub-am). In theory, you only put an ub before every vowel sound, rather than before each vowel. Given that this is a book about Python and not linguistics, I hope that you’ll forgive this slight difference in definition.

Ubbi Dubbi is enormously fun to speak, and it’s somewhat magical if and when you can begin to understand someone else speaking it. Even if you don’t understand it, Ubbi Dubbi sounds extremely funny. See some YouTube videos on the subject, such as http://mng.bz/aRMY, if you need convincing.

For this exercise, you’ll write a function (called ubbi_dubbi) that takes a single word (string) as an argument. It returns a string, the word’s translation into Ubbi Dubbi. So if the function is called with octopus, the function will return the string uboctubopubus. And if the user passes the argument elephant, you’ll output ubelubephubant.

As with the original Pig Latin translator, you can ignore capital letters, punctuation, and corner cases, such as multiple vowels combining to create a new sound. When you do have two vowels next to one another, preface each of them with ub. Thus, soap will become suboubap, despite the fact that oa combines to a single vowel sound.

attempts

The first thing that comes to mind is our previous pig latin implementation and the book’s point that ubbi dubbi translation might require multiple string modifications.

So simple string formatting won’t be enough in this case.

But I’m struggling to see why the construct-a-list-of-substrings-to-join-into-string approach couldn’t work here:

def ubbi_dubbi(word: str) -> str:
    translation = []
    for letter in word:
        if letter in 'aeiou':
            translation.append('ub')
        translation.append(letter)
    return ''.join(translation)

print(ubbi_dubbi('octopus'))
print(ubbi_dubbi('elephant'))
uboctubopubus
ubelubephubant

solution

The book’s implementation:

def ubbi_dubbi(word):
    output = []
    for letter in word:
        if letter in 'aeiou':
            output.append(f'ub{letter}')
        else:
            output.append(letter)
    return ''.join(output)

print(ubbi_dubbi('python'))
pythubon

What’s interesting is that I considered using string formatting inside the if clause. But I decided against it to avoid the explicit else. Plus I like the continuity of always adding the word’s letter to translation, only intervening to add the ‘ub’ prefix before a vowel.
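Either version can be checked against the examples given in the problem statement. Here is a minimal sketch that reuses the list-and-join version from my attempt; the expected dictionary is just my own way of collecting the book’s examples:

```python
def ubbi_dubbi(word: str) -> str:
    """Prefix every lowercase vowel with 'ub'."""
    translation = []
    for letter in word:
        if letter in 'aeiou':
            translation.append('ub')
        translation.append(letter)
    return ''.join(translation)


# Word -> translation pairs quoted in the problem statement.
expected = {
    'milk': 'mubilk',
    'program': 'prubogrubam',
    'octopus': 'uboctubopubus',
    'elephant': 'ubelubephubant',
    'soap': 'suboubap',
}

for word, translation in expected.items():
    assert ubbi_dubbi(word) == translation, word
print('all examples pass')
```

The soap case is the useful one: it confirms that two adjacent vowels each get their own ub.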

beyond the exercise

handle capitalised words

  • problem

    If a word is capitalized (i.e., the first letter is capitalized, but the rest of the word isn’t), then the Ubbi Dubbi translation should be similarly capitalized.

  • attempts

    Again, we did something similar with Pig Latin in exercise 5. What stops us from doing the same here?

    We only need to check whether the first letter is capitalised and act accordingly.

    So we could perform that check at the start, before the iteration.

    But we have to be careful because if the word starts with a vowel, we would have to prepend “ub” and the ‘u’ would need to be capitalised…

    Why not be recursive then?

    We could treat the first letter of a word as a word, call ubbi_dubbi on it, and capitalise its first letter if the original first letter was.

    But we would still have to take care in the case where the first letter is capitalised – we can’t pass a capital letter to our existing implementation of ubbi_dubbi.

    If we assume that only the first letter of a capitalised word is ever capitalised, things get simpler.

    But there’s no need to do that here.

    Let’s just go with the most naive approach and not try to remove repetition from the start:

    def ubbi_dubbi(word: str) -> str:
        translation = []
    
        first_letter = word[0]
        if first_letter in 'aeiouAEIOU':
            if first_letter == first_letter.upper():
                translation.append(f'Ub{first_letter.lower()}')
            else:
                translation.append(f'ub{first_letter}')
        else:
            translation.append(first_letter)
    
        for letter in word[1:]:
            if letter in 'aeiou':
                translation.append('ub')
            translation.append(letter)
    
        return ''.join(translation)
    
    print(ubbi_dubbi('octopus'))
    print(ubbi_dubbi('Octopus'))
    print(ubbi_dubbi('python'))
    print(ubbi_dubbi('Python'))
    
    uboctubopubus
    Uboctubopubus
    pythubon
    Pythubon
    

    This works, but I don’t like the duplication of the translation logic for the first letter.

    Wouldn’t it be easier to just maintain a boolean of whether the word is capitalised or not and make the change right at the end?

    Like so:

    def ubbi_dubbi(word: str) -> str:
        is_capitalised = word[0].isupper()
        translation = []
        # Lowercase the first letter so a leading capital vowel is still detected.
        for letter in word[0].lower() + word[1:]:
            if letter in 'aeiou':
                translation.append('u')
                translation.append('b')
            translation.append(letter)
    
        if is_capitalised:
            translation[0] = translation[0].upper()
        return ''.join(translation)
    
    print(ubbi_dubbi('octopus'))
    print(ubbi_dubbi('Octopus'))
    print(ubbi_dubbi('python'))
    print(ubbi_dubbi('Python'))
    
    uboctubopubus
    Uboctubopubus
    pythubon
    Pythubon
    

    I massively prefer this. And all it takes is to split the addition of the ‘ub’ substring into 2 appends.
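Another way to avoid duplicating the translation logic is to keep the two concerns apart: translate a lowercased copy of the word, then restore the capital afterwards. This is only a sketch. The names ubbi_dubbi_lower and ubbi_dubbi_capitalised are made up for illustration, and it relies on the exercise’s assumption that only the first letter can be a capital:

```python
def ubbi_dubbi_lower(word: str) -> str:
    """Translate an all-lowercase word (the original exercise)."""
    return ''.join(f'ub{c}' if c in 'aeiou' else c for c in word)


def ubbi_dubbi_capitalised(word: str) -> str:
    """Translate a word, carrying over a leading capital letter."""
    translated = ubbi_dubbi_lower(word.lower())
    # word[:1] is '' for an empty string, so this check never raises.
    if word[:1].isupper():
        return translated.capitalize()
    return translated


for word in ('octopus', 'Octopus', 'python', 'Python'):
    print(ubbi_dubbi_capitalised(word))
```

Because the capital is applied to the finished translation, a word starting with a capital vowel such as Octopus comes out as Uboctubopubus without any special case.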

remove author names

  • problem

    In academia, it’s common to remove the authors’ names from a paper submitted for peer review. Given a string containing an article and a separate list of strings containing authors’ names, replace all names in the article with _ characters.

  • attempts

    This feels like it’s coming a bit out of left field.

    Shouldn’t we just make use of str.replace here?

    The hint being the requirement to “replace all names in the article”.

    I guess the only trouble we could have here is if there are overlapping names maybe? Namely, if one author’s name is the substring of another’s, e.g. “Sam” and “Samantha”. In which case, performing str.replace('Sam') before str.replace('Samantha') would ruin the latter.

    But we could mitigate this by working at the word level. That is, if we assume each author’s name appears in the article padded by a space on either side, we could do str.replace(' Sam ') instead.

    But this assumption might be too constraining – not generalisable enough.

    The other option that comes to mind is to use a sliding window of some kind. Or to first split the article into words and perform a substitution for each author name.

    I think all this can get pretty complicated depending on how many edge cases we would want this function to cover.

    Since the book has been fairly generous with assumptions so far, I think it’s fair to assume each name will appear as a word, i.e. a string separated by whitespace without punctuation.

    In which case, I want to go with the splitting substitution approach:

    def author_removal(article: str, authors: list[str]) -> str:
        # Build the lookup set once rather than once per word.
        author_set = set(authors)
        words = article.split()
        cleaned_article = ['_' if w in author_set else w for w in words]
        return ' '.join(cleaned_article)
    
    article = """Joe et al, developed a new and incredible drug. In collaboration with Jane Joe managed to synthesise this new and wonderful substance that could cure all ailments. And after long clinical trials run by Jane the drug is now ready for the market."""
    print(author_removal(article, authors=['Joe', 'Jane']))
    
    _ et al, developed a new and incredible drug. In collaboration with _ _ managed to synthesise this new and wonderful substance that could cure all ailments. And after long clinical trials run by _ the drug is now ready for the market.
    

    It’s not the most thorough solution. But it works. And the principle is easily extendable with more sophisticated splitting. So let’s leave it there.
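One possible form of more sophisticated splitting is a regular expression with word boundaries, which would also cover the Sam/Samantha overlap and names followed by punctuation. This is a sketch rather than the book’s solution. It assumes every name starts and ends with a letter, so that \b behaves as expected, and the function name author_removal_regex is my own:

```python
import re


def author_removal_regex(article: str, authors: list[str]) -> str:
    """Replace each whole-word occurrence of an author name with '_'."""
    if not authors:
        return article
    # Longest names first, so a full name such as 'Sam Smith' is tried
    # before the shorter 'Sam'. re.escape protects any regex metacharacters.
    names = sorted(authors, key=len, reverse=True)
    pattern = re.compile(r'\b(?:' + '|'.join(map(re.escape, names)) + r')\b')
    return pattern.sub('_', article)


text = 'Sam, Samantha and Joe wrote this. Sam agrees.'
print(author_removal_regex(text, ['Sam', 'Samantha', 'Joe']))
# _, _ and _ wrote this. _ agrees.
```

Unlike the split-and-join version, this leaves the article’s original whitespace and punctuation untouched.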

url-encode characters

  • problem

    In URLs, we often replace special and nonprintable characters with a % followed by the character’s ASCII value in hexadecimal. For example, if a URL is to include a space character (ASCII 32, aka 0x20), we replace it with %20. Given a string, URL-encode any character that isn’t a letter or number. For the purposes of this exercise, we’ll assume that all characters are indeed in ASCII (i.e., one byte long), and not multibyte UTF-8 characters. It might help to know about the ord (http://mng.bz/EdnJ) and hex (http://mng.bz/nPxg) functions.

  • attempts

    With a bit of research, it seems we can use str.isalnum to test whether a character is alphanumeric. From there, we should be able to use ord to get the ASCII value of each non-alphanumeric character in base 10, before converting it to its base-16 string representation using hex:

    def url_encode(string: str) -> str:
        encoded = []
        for char in string:
            if char.isalnum():
                encoded.append(char)
            else:
                # zfill pads code points below 16 (e.g. a newline) to two hex digits
                encoded.append(f'%{hex(ord(char))[2:].zfill(2)}')
        return "".join(encoded)
    
    print(url_encode("a test string#"))
    
    a%20test%20string%23

    That should do it.

    By assumption, our input only contains ASCII characters, so we don’t need to handle anything else.
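One edge case worth keeping in mind: hex drops leading zeros, so a character with a code point below 16 (a newline, say) needs padding to reach the usual two hex digits. A format specification can do the conversion and the padding in one step. A minimal sketch under the same ASCII-only assumption; the name url_encode_padded is mine:

```python
def url_encode_padded(string: str) -> str:
    """Percent-encode every non-alphanumeric ASCII character."""
    # ':02X' renders the code point as uppercase hex, zero-padded to width 2.
    return ''.join(
        char if char.isalnum() else f'%{ord(char):02X}'
        for char in string
    )


print(url_encode_padded('a test string#'))  # a%20test%20string%23
print(url_encode_padded('line\nbreak'))     # line%0Abreak
```

Outside of an exercise, the standard library’s urllib.parse.quote is the usual tool for this, though it leaves a few punctuation characters such as - and _ unencoded, so its output would not match the exercise’s rule exactly.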

mail@jonahv.com