Managing (aka. cleaning up) a nested dictionary in Python

We are storing configurations. Let’s assume we are working on a store and we want to save prices for different products. But there are a set of configurations we have to consider¹:

We are working over multiple countries.
Every country has got some locations.
Every location has a set of articles.
For an article, I might have multiple sizes.
Finally, for every size I have got prices.

Now let’s say that we want to store all of that in a Python dictionary. Luckily, the set of options is not too large, so that we can easily store it in a nested dictionary, which looks something like that:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
prices = {
    "AT": {
        "Graz": {
            "shoe": {
                "size-42": 99.99
            }
        }
    },
    "DE": { ... }
}

Now this all works out fine. All my prices are pushed to a Kafka topic so that multiple downstream consumers can poll all the prices and have the live prices available at all time.

Then there is a deletion request - say this shoe in size 42 was the last article in Graz and this now also does not get sold any more. Ideally, I want to drop the full AT object from my prices, as there is essentially nothing more to store for Austria.

The challenge

Now the simple way is the following: Remove the size, then find out that the shoe is empty and remove the shoe. Next, find out that Graz now is empty and remove Graz. Finally, remove Austria. These are just some nested if statements.

But I did not study maths and development of algorithms to just write a bunch of nested if stamements! So let’s over-engineer the problem and find a proper soltion that can work with as many layers as needed.

My solution

So let’s take another approach. This definitely is a recursive problem. I did choose a semi-recursive concept, though.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
keys = ["AT", "Graz", "shoe", "size-42"]
prices = ...

def delete_attribute(keys, prices)

    children = [prices]

    for key in keys[:-1]:
        if key not in children[-1]:
            # If some key did not exist at
            # all, then do not go more down.
            # I then have nothing to do anyways.
            return
        children.append(children[-1][key])

    # Now, delete the desired size
    children[-1].pop(keys[-1], None)

    for key in reversed(keys[:-1]):
        if not children.pop(-1):
            children[-1].pop(key)
        else:
            # Dictionary not empty, stop
            # Do not remove anything further
            return

delete_attributes(keys, prices)

Note that as given in my case, the relevant key attributes are passed to me in an ordered list. This is important for the rest to work out.

I put the whole concept into a function. That way, I can easily stop at any time if required by just leaving the function. Now what happens:

I setup a children list which for now just holds the whole prices object.
Now I iterate over each key, except for the last one, in order. For every key element, I add another entry to the children array which just contains the dictionary for this respective key.
- If my key does not exist then this means that something went wrong. For example, assume that I wanted to delete the socks in size 43 from the Graz store. Since there are no socks in Graz, there is nothing to delete. So in the for loop when key == "socks", the check key not in children[-1] is True and we will stop.

Now the relevant state obviously is the children list. After the first for loop, it will look as follows:

1
2
3
4
5
6
[
  {'AT': {'Graz': {'shoe': {'size-42': 99.99}}}}, # This is `prices`
  {'Graz': {'shoe': {'size-42': 99.99}}}, # prices["AT"]
  {'shoe': {'size-42': 99.99}}, # prices["AT"]["Graz"]
  {'size-42': 99.99} # prices["AT"]["Graz"]["shoe"]
]

Note that every element of children is a reference to an already existing dictionary, and Dictionaries in Python are passed by reference (unless explicitly copied). This has two relevant advantages:

When editing an element of children then also the prices dictionary gets updated accordingly.
children does not really take a lot of storage space. Not that we should take about efficiency in Python.

Now in the last element of children, it is simple to remove the size-42 attribute. This is what happens in the next line.
Finally, there is another for loop, which does the cleanup. We will now traverse the same keys as before, but in reverse order. For every key, we pop the last element of children. This directly returns the dictionary to the currently considered key.
With this dictionary we just popped, we only do one thing: We check if it still has elements. The pop and check happens combined in line 20, and we then react on it accordingly:
- If the dictionary is not empty (the else part) then we stop here, as nothing else has to be done.
- If the dictionary is empty, then we want to remove this key from it’s parent dictionary. Since we popped the value of the current key, children[-1] already is the parent dictionary of the key, and a pop here is easy!
  We don’t even need a .pop(..., None) here, as we know that the key we want to pop definitely exists.

Conclusion

This is a totally overpowered version of cleaning up a nested dictionary. But i like it, as I can use the exact same code, no matter how deeply the dictionary is nested!

Please note that price data should be stored in another format, a database for example. But I did have to work with such a concept, altough it was not article-price-related. ↩︎

The challenge#

My solution#

Conclusion#

Comments

The challenge

My solution

Conclusion