I have a dict containing lists and need a fast way to dedupe the lists.
I know how to dedupe a single list with set(), but here I want a fast way to iterate through the dict, deduping each list along the way.
hello = {'test1':[2,3,4,2,2,5,6], 'test2':[5,5,8,4,3,3,8,9]}
I'd like the result to look like this:
hello = {'test1':[2,3,4,5,6], 'test2':[5,8,4,3,9]}
I don't necessarily need the original order of the lists preserved, though.
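For reference, if order did matter, one possible sketch is to dedupe through dict keys, which keep insertion order in Python 3.7 and later. This assumes the list items are hashable; the name deduped is only illustrative.

```python
hello = {'test1': [2, 3, 4, 2, 2, 5, 6], 'test2': [5, 5, 8, 4, 3, 3, 8, 9]}

# dict.fromkeys keeps the first occurrence of each item, in order
# (dicts preserve insertion order in Python 3.7+).
deduped = {key: list(dict.fromkeys(values)) for key, values in hello.items()}

print(deduped)
```

This gives lists rather than sets, so the result matches the shape shown above.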
I've tried using a set like this, but it isn't quite right (it doesn't seem to iterate properly, and I lose the first key):
for key, value in hello.items(): goodbye = {key: set(value)}
>>> goodbye
{'test2': set([8, 9, 3, 4, 5])}
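The likely cause is that the loop rebinds goodbye to a brand-new one-item dict on every pass, so only the last key seen survives. A minimal sketch of one way around that is to build the whole dict in a single dict comprehension:

```python
hello = {'test1': [2, 3, 4, 2, 2, 5, 6], 'test2': [5, 5, 8, 4, 3, 3, 8, 9]}

# Build every key/value pair in one expression instead of
# reassigning goodbye inside the loop.
goodbye = {key: set(value) for key, value in hello.items()}

print(goodbye)
```

Wrapping set(value) in list(...) would give lists back instead of sets, though in no guaranteed order.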
EDIT: Following PM 2Ring's comment below, I'm now populating the dict differently to avoid duplicates in the first place. Previously I was using lists, but a set silently ignores values it already contains, so duplicates are never added:
>>> my_numbers = {}
>>> my_numbers['first'] = [1,2,2,2,6,5]
>>> from collections import defaultdict
>>> final_list = defaultdict(set)
>>> for n in my_numbers['first']: final_list['test_first'].add(n)
...
>>> final_list['test_first']
set([1, 2, 5, 6])
As you can see, the final output is a deduped set, as required.
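The same approach can be sketched a little more compactly with set.update, which adds every item of an iterable in one call, so the explicit loop isn't needed. The names are the same ones used in the session above.

```python
from collections import defaultdict

my_numbers = {'first': [1, 2, 2, 2, 6, 5]}
final_list = defaultdict(set)

# update() adds each element of the list; repeats are ignored by the set.
final_list['test_first'].update(my_numbers['first'])

print(final_list['test_first'])
```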