back

hackerrank: sherlock and the valid string

problem

Sherlock and the Valid String | HackerRank

attempts

final

There were too many attempts for me to justify including them all. The crux here was handling the edge cases.

Because, on the surface, it seems like we can easily solve the problem with a collections.Counter. Using it, we’d get back the frequencies of each character in the input string. So, checking the simplest case where all the characters appear the same number of times would be as simple as checking that the size of the set of the character frequencies is 1. All that’s then left would be to handle the case where the set happens to have 2 frequencies; any more than that and we know the string to be invalid.

With that said, handling the case where the set of frequencies is of size 2 isn’t as straightforward. Because there isn’t enough information in a set of 2 frequencies to test for validity.

In particular, a set of frequencies doesn’t tell us how many characters have those frequencies – it only tells us how many different character frequencies there are.

What we really need is the frequency distribution of frequencies (cumulative frequency?).

If we have this, the conditions for validity become clear in the case where there are only 2 distinct character frequencies in the string:

  1. The less frequent of the 2 frequencies must occur only once (we can only remove 1 character from the string, so there can be at most one differing frequency) AND
  2. the less frequent of the 2 frequencies must either be 1 (so that removing it would remove that only occurency of that character from the string) or exactly one more than the more frequent of the 2 frequencies (so that removing one occurence of the character will remove all discrepancies)

We can get this cumulative frequency using another Counter instance built atop of the first.

All together, this gives:

from collections import Counter
def isValid(s):
    c = Counter(s)
    freqs = [freq for _, freq in c.items()]
    freqs_counter = Counter(freqs)
    if len(freqs_counter) == 1:
        return 'YES'
    if len(freqs_counter) == 2:
        (f1, c1), (f2, c2) = freqs_counter.most_common()
        if c2 == 1 and (f2-1 == f1 or f2 == 1):
            return 'YES'
    return 'NO'
print(isValid("aabbc"))
YES

solutions

I like the double Counter method. But what did others come up with?

I like this python solution in the dicussions:

from collections import Counter

def isValid(s):
    sv = sorted(Counter(s).values())  # Count character frequencies and sort them

    # If all frequencies are the same, it's valid
    if len(set(sv)) == 1:
        return "YES"

    # Check cases where one frequency can be removed or adjusted to match the others
    if sv[0] == 1 and sv[1] == sv[-1]:
        return "YES"

    # Check if we can make it valid by reducing the highest frequency by one
    if sv[-1] - sv[-2] == 1 and sv[0] == sv[-2]:
        return "YES"

    return "NO"

By sorting the values, the test case sv[1] == sv[-1] is a clean way of checking that the remaining character frequencies are all equal.

I also like how it’s segmented into 3 clear cases.

It’s use of values() is also nice - we should steal it. Doing so would do away with our list comprehension, giving a cleaner final solution:

from collections import Counter
def isValid(s):
    freqs_counter = Counter(Counter(s).values())
    if len(freqs_counter) == 1:
        return 'YES'
    if len(freqs_counter) == 2:
        (freq1, count1), (freq2, count2) = freqs_counter.most_common()
        if count2 == 1 and (freq2-1 == freq1 or freq2 == 1):
            return 'YES'
    return 'NO'
print(isValid("aabbc"))
YES

This is the new and improved version I’m happy with!

mail@jonahv.com