python workout: exercise 16
dictdiff
problem
Write a function,
dictdiff, that takes two dicts as arguments. The function returns a new dict that expresses the difference between the two dicts.If there are no differences between the dicts,
dictdiffreturns an empty dict. For each key-value pair that differs, the return value ofdictdiffwill have a key-value pair in which the value is a list containing the values from the two different dicts. If one of the dicts doesn’t contain that key, it should containNone. The following provides some examples:d1 = {'a':1, 'b':2, 'c':3} d2 = {'a':1, 'b':2, 'c':4} print(dictdiff(d1, d1)) print(dictdiff(d1, d2))d3 = {'a':1, 'b':2, 'd':3} d4 = {'a':1, 'b':2, 'c':4} print(dictdiff(d3, d4))d5 = {'a':1, 'b':2, 'd':4} print(dictdiff(d1, d5)):+results:
{'c': [3, None], 'd': [None, 4]}
attempts
first
Imagine if this was recursive (i.e. nested dicts) 😌.
From the problem statement, it really just seems like we should expect simple values; no complex ones like lists, tuples, or other dictionaries.
And even if we did, we could do the simple thing and enforce an exact match
between values in different dictionaries. The equality operator (i.e. ==) should
be sufficient for that.
In that spirit, I think a solution should be just as simple as iterating through each key-value pair in the first dictionary and deciding whether or not the value is the same as the one stored at the same key in the other dictionary.
Something like:
from typing import Hashable, Any
def dictdiff(d1: dict[Hashable, Any], d2: dict[Hashable, Any]) -> dict[Hashable, list[Any]]:
diff = {}
seen = set()
for key, value in d1.items():
if key in d2:
if value != d2[key]:
diff[key] = [value, d2[key]]
else:
diff[key] = [value, None]
seen.add(key)
for key, value in d2.items():
if key in seen:
continue
diff[key] = [None, value]
return diff
d1 = {'a':1, 'b':2, 'c':3}
d2 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d1, d1))
print(dictdiff(d1, d2))
print()
d3 = {'a':1, 'b':2, 'd':3}
d4 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d3, d4))
print()
d5 = {'a':1, 'b':2, 'd':4}
print(dictdiff(d1, d5))
{}
{'c': [3, 4]}
{'d': [3, None], 'c': [None, 4]}
{'c': [3, None], 'd': [None, 4]}
This works. But I’m not happy with it.
Iterating twice isn’t nice, nor is keeping track of the keys we’ve visited in d1
so that we skip them when adding the keys in d2 that are not in d1.
second
The alternative that comes to mind is to initiate diff to a deep copy of d1. And
only iterate d2’s items. That way, we are sure to have all the keys and values
of d1 and only need to look at d2’s items to delete and modify the existing
pairs while adding the necessary ones.
Something like:
from typing import Hashable, Any
from copy import deepcopy
def dictdiff(d1: dict[Hashable, Any], d2: dict[Hashable, Any]) -> dict[Hashable, list[Any]]:
diff = deepcopy(d1)
for key, val2 in d2.items():
val1 = diff.get(key, None)
if val1:
if val1 == val2:
del diff[key]
else:
diff[key] = [val1, val2]
else:
diff[key] = [None, val2]
# Need something here to handle the keys that are only in d1 and not d2
But this won’t work particularly well either – we need a way of handling all
the keys that are present in d1 and not d2.
…
third
Actually, we could get around this by initialising diff differently. We’d just
lose deepcopy in the process, restricting dictdiff’s domain:
from typing import Hashable, Any
from copy import deepcopy
def dictdiff(d1: dict[Hashable, Any], d2: dict[Hashable, Any]) -> dict[Hashable, list[Any]]:
diff = {key:[val,None] for key,val in d1.items()}
for key, val2 in d2.items():
val1, _ = diff.get(key, [None, None])
if val1:
if val1 == val2:
del diff[key]
else:
diff[key] = [val1, val2]
else:
diff[key] = [None, val2]
return diff
d1 = {'a':1, 'b':2, 'c':3}
d2 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d1, d1))
print(dictdiff(d1, d2))
print()
d3 = {'a':1, 'b':2, 'd':3}
d4 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d3, d4))
print()
d5 = {'a':1, 'b':2, 'd':4}
print(dictdiff(d1, d5))
{}
{'c': [3, 4]}
{'d': [3, None], 'c': [None, 4]}
{'c': [3, None], 'd': [None, 4]}
Again, it works. But god is it inelegant as well (and stunted without deepcopy).
…
fourth
I think I’ve found it.
We want to use the get() method with a None as default and simply iterate
through the set of keys from both dicts, constructing a new output dictionary as
we go:
from typing import Hashable, Any
def dictdiff(d1: dict[Hashable, Any], d2: dict[Hashable, Any]) -> dict[Hashable, list[Any]]:
diff = {}
keys = set(d1) | set(d2)
for key in keys:
val1, val2 = d1.get(key, None), d2.get(key, None)
if val1 != val2:
diff[key] = [val1,val2]
return diff
d1 = {'a':1, 'b':2, 'c':3}
d2 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d1, d1))
print(dictdiff(d1, d2))
print()
d3 = {'a':1, 'b':2, 'd':3}
d4 = {'a':1, 'b':2, 'c':4}
print(dictdiff(d3, d4))
print()
d5 = {'a':1, 'b':2, 'd':4}
print(dictdiff(d1, d5))
{}
{'c': [3, 4]}
{'c': [None, 4], 'd': [3, None]}
{'c': [3, None], 'd': [None, 4]}
Much better!
We iterate through the union of the set of each dictionary’s keys, adding only the mismatched values to the final output – simple.
solution
The book’s implementation:
def dictdiff(first, second):
output = {}
all_keys = first.keys() | second.keys()
for key in all_keys:
if first.get(key) != second.get(key):
output[key] = [first.get(key),
second.get(key)]
return output
d1 = {'a':1, 'b':2, 'c':3}
d2 = {'a':1, 'b':2, 'd':4}
print(dictdiff(d1, d2))
{'c': [3, None], 'd': [None, 4]}
I didn’t know the dict_keys returned by keys() was a set-like object. It would
have saved us the conversion in our final attempt.
I also didn’t know that get() returns None by default. It would have saved us
the explicit default argument.
The one thing I don’t understand though, is why not extract the values into
dedicated valules to avoid the duplications of get()?
beyond the exercise
merge dicts
-
problem
The
dict.updatemethod merges two dicts. Write a function that takes any number of dicts and returns a dict that reflects the combination of all of them. If the same key appears in more than one dict, then the most recently merged dict’s value should appear in the output.
-
attempts
I assume the “most recently merged” dict is just the last one we merge as part of iterating through the dict arguments. As in, to merge an arbitrary number of dicts, we will need to iterate through each dict argument one by one. And that order will just be the argument order for the sake of simplicity.
It’s then just a question of building an
outputdict as before:def dict_merge(*dicts): output = {} for d in dicts: output.update(d) return output test_dicts = [ {}, {"a": 1}, {"b": 2}, {"a": 1, "b": 2}, {"x": 3, "y": 4}, {"a": 10, "z": 5}, {"a": 1, "b": 2, "c": 3}, {"b": 20, "d": 4}, {"a": None, "e": "test"}, {"f": [1, 2], "g": {"h": 3}} ] print(dict_merge(*test_dicts)){'a': None, 'b': 20, 'x': 3, 'y': 4, 'z': 5, 'c': 3, 'd': 4, 'e': 'test', 'f': [1, 2], 'g': {'h': 3}}
even-odd dict
-
problem
Write a function that takes any even number of arguments and returns a dict based on them. The even-indexed arguments become the dict keys, while the odd-numbered arguments become the dict values. Thus, calling the function with the arguments (‘a’, 1, ‘b’, 2) will result in the dict {‘a’:1, ‘b’:2} being returned.
-
attempts
This should be straightforward. We can enforce the arguments be even by throwing a
ValueErrorand we can use slicing andzipto construct the dictionary in a single iteration. We’ll call the functionlist2dict:def list2dict(*args): if len(args) % 2 == 1: raise ValueError(f"list2dict takes an even number of arguments: {len(args)} were given") return {key:value for key, value in zip(args[::2], args[1::2])} print(list2dict('a', 1, 'b', 2)){'a': 1, 'b': 2}Clean and simple.
We could also save the slicing and use
itertools.batched()instead:from itertools import batched def list2dict(*args): if len(args) % 2 == 1: raise ValueError(f"list2dict takes an even number of arguments: {len(args)} were given") return {key:value for key, value in batched(args, 2)} print(list2dict('a', 1, 'b', 2)){'a': 1, 'b': 2}
dict partitioning
-
problem
Write a function ,
dict_partition, that takes one dict (d) and a function (f) as arguments.dict_partitionwill return two dicts, each containing key-value pairs fromd. The decision regarding where to put each of the key-value pairs will be made according to the output fromf, which will be run on each key- value pair ind. IffreturnsTrue, then the key-value pair will be put in the first output dict. IffreturnsFalse, then the key-value pair will be put in the second output dict.
-
attempst
I’m assuming
fwill take a key value pair as input and not just one or the other.This just boils down to iterating over key value pairs:
def dict_partition(d, f): d1, d2 = {}, {} for key, value in d.items(): if f(key,value): d1[key] = value else: d2[key] = value return d1, d2 test_dict = { 'name': 'Alice', 'age': 30, 'city': 'Berlin', 'job': 'Engineer', 'active': True, 'salary': 75000, 'hobby': 'photography' } test_f = lambda k, v: len(k) > 3 print(dict_partition(test_dict, test_f))({'name': 'Alice', 'city': 'Berlin', 'active': True, 'salary': 75000, 'hobby': 'photography'}, {'age': 30, 'job': 'Engineer'})