back

python workout: exercise 16

dictdiff

problem

Write a function, dictdiff, that takes two dicts as arguments. The function returns a new dict that expresses the difference between the two dicts.

If there are no differences between the dicts, dictdiff returns an empty dict. For each key-value pair that differs, the return value of dictdiff will have a key-value pair in which the value is a list containing the values from the two different dicts. If one of the dicts doesn’t contain that key, it should contain None. The following provides some examples:

d1 = {'a':1, 'b':2, 'c':3}
d2 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d1, d1))
print(dictdiff(d1, d2))
d3 = {'a':1, 'b':2, 'd':3}
d4 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d3, d4))
d5 = {'a':1, 'b':2, 'd':4}
print(dictdiff(d1, d5))

:+results:

{'c': [3, None], 'd': [None, 4]}

attempts

first

Imagine if this was recursive (i.e. nested dicts) 😌.

From the problem statement, it really just seems like we should expect simple values; no complex ones like lists, tuples, or other dictionaries.

And even if we did, we could do the simple thing and enforce an exact match between values in different dictionaries. The equality operator (i.e. ==) should be sufficient for that.

In that spirit, I think a solution should be just as simple as iterating through each key-value pair in the first dictionary and deciding whether or not the value is the same as the one stored at the same key in the other dictionary.

Something like:

from typing import Hashable, Any

def dictdiff(d1: dict[Hashable, Any], d2: dict[Hashable, Any]) -> dict[Hashable, list[Any]]:
    diff = {}
    seen = set()
    for key, value in d1.items():
        if key in d2:
            if value != d2[key]:
                diff[key] = [value, d2[key]]
        else:
            diff[key] = [value, None]
        seen.add(key)

    for key, value in d2.items():
        if key in seen:
            continue
        diff[key] = [None, value]
    return diff

d1 = {'a':1, 'b':2, 'c':3}
d2 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d1, d1))
print(dictdiff(d1, d2))
print()
d3 = {'a':1, 'b':2, 'd':3}
d4 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d3, d4))
print()
d5 = {'a':1, 'b':2, 'd':4}
print(dictdiff(d1, d5))
{}
{'c': [3, 4]}

{'d': [3, None], 'c': [None, 4]}

{'c': [3, None], 'd': [None, 4]}

This works. But I’m not happy with it.

Iterating twice isn’t nice, nor is keeping track of the keys we’ve visited in d1 so that we skip them when adding the keys in d2 that are not in d1.

second

The alternative that comes to mind is to initiate diff to a deep copy of d1. And only iterate d2’s items. That way, we are sure to have all the keys and values of d1 and only need to look at d2’s items to delete and modify the existing pairs while adding the necessary ones.

Something like:

from typing import Hashable, Any
from copy import deepcopy

def dictdiff(d1: dict[Hashable, Any], d2: dict[Hashable, Any]) -> dict[Hashable, list[Any]]:
    diff = deepcopy(d1)
    for key, val2 in d2.items():
        val1 = diff.get(key, None)
        if val1:
            if val1 == val2:
                del diff[key]
            else:
                diff[key] = [val1, val2]
        else:
            diff[key] = [None, val2]

        # Need something here to handle the keys that are only in d1 and not d2

But this won’t work particularly well either – we need a way of handling all the keys that are present in d1 and not d2.

third

Actually, we could get around this by initialising diff differently. We’d just lose deepcopy in the process, restricting dictdiff’s domain:

from typing import Hashable, Any
from copy import deepcopy

def dictdiff(d1: dict[Hashable, Any], d2: dict[Hashable, Any]) -> dict[Hashable, list[Any]]:
    diff = {key:[val,None] for key,val in d1.items()}
    for key, val2 in d2.items():
        val1, _ = diff.get(key, [None, None])
        if val1:
            if val1 == val2:
                del diff[key]
            else:
                diff[key] = [val1, val2]
        else:
            diff[key] = [None, val2]
    return diff

d1 = {'a':1, 'b':2, 'c':3}
d2 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d1, d1))
print(dictdiff(d1, d2))
print()
d3 = {'a':1, 'b':2, 'd':3}
d4 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d3, d4))
print()
d5 = {'a':1, 'b':2, 'd':4}
print(dictdiff(d1, d5))
{}
{'c': [3, 4]}

{'d': [3, None], 'c': [None, 4]}

{'c': [3, None], 'd': [None, 4]}

Again, it works. But god is it inelegant as well (and stunted without deepcopy).

fourth

I think I’ve found it.

We want to use the get() method with a None as default and simply iterate through the set of keys from both dicts, constructing a new output dictionary as we go:

from typing import Hashable, Any

def dictdiff(d1: dict[Hashable, Any], d2: dict[Hashable, Any]) -> dict[Hashable, list[Any]]:
    diff = {}
    keys = set(d1) | set(d2)
    for key in keys:
        val1, val2 = d1.get(key, None), d2.get(key, None)
        if val1 != val2:
            diff[key] = [val1,val2]
    return diff

d1 = {'a':1, 'b':2, 'c':3}
d2 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d1, d1))
print(dictdiff(d1, d2))
print()
d3 = {'a':1, 'b':2, 'd':3}
d4 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d3, d4))
print()
d5 = {'a':1, 'b':2, 'd':4}
print(dictdiff(d1, d5))
{}
{'c': [3, 4]}

{'c': [None, 4], 'd': [3, None]}

{'c': [3, None], 'd': [None, 4]}

Much better!

We iterate through the union of the set of each dictionary’s keys, adding only the mismatched values to the final output – simple.

solution

The book’s implementation:

def dictdiff(first, second):
    output = {}
    all_keys = first.keys() | second.keys()
    for key in all_keys:
        if first.get(key) != second.get(key):
            output[key] = [first.get(key),
                           second.get(key)]
    return output

d1 = {'a':1, 'b':2, 'c':3}
d2 = {'a':1, 'b':2, 'd':4}
print(dictdiff(d1, d2))
{'c': [3, None], 'd': [None, 4]}

I didn’t know the dict_keys returned by keys() was a set-like object. It would have saved us the conversion in our final attempt.

I also didn’t know that get() returns None by default. It would have saved us the explicit default argument.

The one thing I don’t understand though, is why not extract the values into dedicated valules to avoid the duplications of get()?

beyond the exercise

merge dicts

  • problem

    The dict.update method merges two dicts. Write a function that takes any number of dicts and returns a dict that reflects the combination of all of them. If the same key appears in more than one dict, then the most recently merged dict’s value should appear in the output.

  • attempts

    I assume the “most recently merged” dict is just the last one we merge as part of iterating through the dict arguments. As in, to merge an arbitrary number of dicts, we will need to iterate through each dict argument one by one. And that order will just be the argument order for the sake of simplicity.

    It’s then just a question of building an output dict as before:

    def dict_merge(*dicts):
        output = {}
        for d in dicts:
            output.update(d)
        return output
    
    test_dicts = [
        {},
        {"a": 1},
        {"b": 2},
        {"a": 1, "b": 2},
        {"x": 3, "y": 4},
        {"a": 10, "z": 5},
        {"a": 1, "b": 2, "c": 3},
        {"b": 20, "d": 4},
        {"a": None, "e": "test"},
        {"f": [1, 2], "g": {"h": 3}}
    ]
    print(dict_merge(*test_dicts))
    
    {'a': None, 'b': 20, 'x': 3, 'y': 4, 'z': 5, 'c': 3, 'd': 4, 'e': 'test', 'f': [1, 2], 'g': {'h': 3}}
    

even-odd dict

  • problem

    Write a function that takes any even number of arguments and returns a dict based on them. The even-indexed arguments become the dict keys, while the odd-numbered arguments become the dict values. Thus, calling the function with the arguments (‘a’, 1, ‘b’, 2) will result in the dict {‘a’:1, ‘b’:2} being returned.

  • attempts

    This should be straightforward. We can enforce the arguments be even by throwing a ValueError and we can use slicing and zip to construct the dictionary in a single iteration. We’ll call the function list2dict:

    def list2dict(*args):
        if len(args) % 2 == 1:
            raise ValueError(f"list2dict takes an even number of arguments: {len(args)} were given")
        return {key:value for key, value in zip(args[::2], args[1::2])}
    
    print(list2dict('a', 1, 'b', 2))
    
    {'a': 1, 'b': 2}
    

    Clean and simple.

    We could also save the slicing and use itertools.batched() instead:

    from itertools import batched
    def list2dict(*args):
        if len(args) % 2 == 1:
            raise ValueError(f"list2dict takes an even number of arguments: {len(args)} were given")
        return {key:value for key, value in batched(args, 2)}
    
    print(list2dict('a', 1, 'b', 2))
    
    {'a': 1, 'b': 2}
    

dict partitioning

  • problem

    Write a function , dict_partition, that takes one dict (d) and a function (f) as arguments. dict_partition will return two dicts, each containing key-value pairs from d. The decision regarding where to put each of the key-value pairs will be made according to the output from f, which will be run on each key- value pair in d. If f returns True, then the key-value pair will be put in the first output dict. If f returns False, then the key-value pair will be put in the second output dict.

  • attempst

    I’m assuming f will take a key value pair as input and not just one or the other.

    This just boils down to iterating over key value pairs:

    def dict_partition(d, f):
        d1, d2 = {}, {}
        for key, value in d.items():
            if f(key,value):
                d1[key] = value
            else:
                d2[key] = value
        return d1, d2
    
    test_dict = {
        'name': 'Alice',
        'age': 30,
        'city': 'Berlin',
        'job': 'Engineer',
        'active': True,
        'salary': 75000,
        'hobby': 'photography'
    }
    test_f = lambda k, v: len(k) > 3
    print(dict_partition(test_dict, test_f))
    
    ({'name': 'Alice', 'city': 'Berlin', 'active': True, 'salary': 75000, 'hobby': 'photography'}, {'age': 30, 'job': 'Engineer'})
    
mail@jonahv.com