python - Quickest way to dedupe list in dict

Tuesday, June 26, 2018

I have a dict containing lists and need a fast way to dedupe the lists.

I know how to dedupe a list in isolation using the set() function, but in this case I want a fast way of iterating through the dict, deduping each list on the way.

hello = {'test1':[2,3,4,2,2,5,6], 'test2':[5,5,8,4,3,3,8,9]}

I'd like it to end up like this:

hello = {'test1':[2,3,4,5,6], 'test2':[5,8,4,3,9]}

Though I don't necessarily need to have the original order of the lists preserved.

I've tried using a set like this, but it's not quite right: the loop replaces goodbye on every iteration instead of adding to it, so I lose the first key.

for key, value in hello.items(): goodbye = {key: set(value)}

>>> goodbye
{'test2': set([8, 9, 3, 4, 5])}
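The loop above rebinds goodbye to a brand-new one-key dict on every pass, which is why only the last key survives. A minimal sketch of one way around that, assuming a dict comprehension is acceptable here; the names deduped and deduped_ordered are made up for illustration. The second variant uses dict.fromkeys to keep the first occurrence of each item, and relies on dicts preserving insertion order (guaranteed from Python 3.7).

```python
hello = {'test1': [2, 3, 4, 2, 2, 5, 6], 'test2': [5, 5, 8, 4, 3, 3, 8, 9]}

# Build the whole result in one expression instead of reassigning inside a loop.
# set() drops duplicates but makes no promise about element order.
deduped = {key: list(set(values)) for key, values in hello.items()}

# If the original order matters: dict.fromkeys keeps the first occurrence of
# each item, and dicts preserve insertion order on Python 3.7+.
deduped_ordered = {key: list(dict.fromkeys(values)) for key, values in hello.items()}
```

If the lists are only ever used for membership checks, keeping the values as sets (dropping the list() call) avoids one conversion per key.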

EDIT: Following PM 2Ring's comment below, I'm now populating the dict differently to avoid duplicates in the first place. Previously I was using lists, but a set never stores the same item twice, so duplicates are dropped as they are added:

>>> my_numbers = {}
>>> my_numbers['first'] = [1,2,2,2,6,5]
>>> from collections import defaultdict
>>> final_list = defaultdict(set)

>>> for n in my_numbers['first']: final_list['test_first'].add(n)
... 
>>> final_list['test_first']
set([1, 2, 5, 6])

As you can see, the final output is a deduped set, as required.
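For what it's worth, the per-item loop in the edit can probably be collapsed into a single call. This is only a sketch of the same idea using set.update, which adds every element of an iterable at once; the variable names are carried over from the session above.

```python
from collections import defaultdict

my_numbers = {'first': [1, 2, 2, 2, 6, 5]}
final_list = defaultdict(set)

# update() adds each element of the iterable; items already present are ignored.
final_list['test_first'].update(my_numbers['first'])
```

Despite its name, final_list['test_first'] is a set here, so wrap it in list() or sorted() if a list is needed later.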
