hackerrank: sherlock and the valid string
problem
Sherlock and the Valid String | HackerRank
attempts
final
There were too many attempts for me to justify including them all. The crux here was handling the edge cases.
Because, on the surface, it seems like we can easily solve the problem with a
collections.Counter. Using it, we’d get back the frequencies of each character
in the input string. So, checking the simplest case where all the characters
appear the same number of times would be as simple as checking that the size of
the set of the character frequencies is 1. All that’s then left would be to
handle the case where the set happens to have 2 frequencies; any more than that
and we know the string to be invalid.
With that said, handling the case where the set of frequencies is of size 2 isn’t as straightforward. Because there isn’t enough information in a set of 2 frequencies to test for validity.
In particular, a set of frequencies doesn’t tell us how many characters have those frequencies – it only tells us how many different character frequencies there are.
What we really need is the frequency distribution of frequencies (cumulative frequency?).
If we have this, the conditions for validity become clear in the case where there are only 2 distinct character frequencies in the string:
- The less frequent of the 2 frequencies must occur only once (we can only remove 1 character from the string, so there can be at most one differing frequency) AND
- the less frequent of the 2 frequencies must either be 1 (so that removing it would remove that only occurency of that character from the string) or exactly one more than the more frequent of the 2 frequencies (so that removing one occurence of the character will remove all discrepancies)
We can get this cumulative frequency using another Counter instance built atop
of the first.
All together, this gives:
from collections import Counter
def isValid(s):
c = Counter(s)
freqs = [freq for _, freq in c.items()]
freqs_counter = Counter(freqs)
if len(freqs_counter) == 1:
return 'YES'
if len(freqs_counter) == 2:
(f1, c1), (f2, c2) = freqs_counter.most_common()
if c2 == 1 and (f2-1 == f1 or f2 == 1):
return 'YES'
return 'NO'
print(isValid("aabbc"))
YES
solutions
I like the double Counter method. But what did others come up with?
I like this python solution in the dicussions:
from collections import Counter
def isValid(s):
sv = sorted(Counter(s).values()) # Count character frequencies and sort them
# If all frequencies are the same, it's valid
if len(set(sv)) == 1:
return "YES"
# Check cases where one frequency can be removed or adjusted to match the others
if sv[0] == 1 and sv[1] == sv[-1]:
return "YES"
# Check if we can make it valid by reducing the highest frequency by one
if sv[-1] - sv[-2] == 1 and sv[0] == sv[-2]:
return "YES"
return "NO"
By sorting the values, the test case sv[1] == sv[-1] is a clean way of checking
that the remaining character frequencies are all equal.
I also like how it’s segmented into 3 clear cases.
It’s use of values() is also nice - we should steal it. Doing so would do away
with our list comprehension, giving a cleaner final solution:
from collections import Counter
def isValid(s):
freqs_counter = Counter(Counter(s).values())
if len(freqs_counter) == 1:
return 'YES'
if len(freqs_counter) == 2:
(freq1, count1), (freq2, count2) = freqs_counter.most_common()
if count2 == 1 and (freq2-1 == freq1 or freq2 == 1):
return 'YES'
return 'NO'
print(isValid("aabbc"))
YES
This is the new and improved version I’m happy with!