How to implement effective way of set_intersection for duplicate values | Sololearn: Learn to code for FREE!


How to implement effective way of set_intersection for duplicate values

Hi All I tried to implement the better way of finding set_intersection for non sorted unique elements of the input... I am stuck at how can I make it work for duplicate values as well.. please throw some light on how to achieve the same ? More details is present as code comment for below code:

7/23/2020 4:59:00 AM

Ketan Lalcheta

5 Answers

New Answer


Oh! Like, a multiset intersection or so. Hmm, how about this: keep a `std::unordered_map<T, std::tuple<int, int>>` where all the keys are your vector elements and the values are tuples, where the first element is the number of occurrences in list `a` and the second element is the number of occurrences in list `b`. You can build that map in O(n). Then, filter list `b`. Foreach `x` in `b`: Look `x` up in the map. Are both elements in the tuple > 0? If no, throw `x` away. If yes, keep `x` and decrement both elements in the tuple by 1. Should be O(n)!


After you intersect you could filter your vector by running through it once and putting each element into a new set. Unless the element is already present, then you found a duplicate. You can even do that in the same single loop. Also I don't believe that's really O(n). It's O(n) with a big asterisk, as inserting into a set is only O(1) in the average case, with O(n) in the worst case so you have O(n²) worst case performance. Also the sort-and-remove algorithm is so simple that it probably works better for average sized arrays anyway. I guess you'd have to test it.


Schindlabua true that algorithm function are almost most of the time great compared to the custom developed... however , that set_intersection misplace output sequence input need to be sorted before passing to function..... So yes , it's always almost a better choice to use already available untill we are not sure about fully custom developed. I am sorry but I could not understand your first para... So , request you to elaborate on how to achieve custom developed algorithm for the finding of common elements when we have duplicates in the input... I feel I have mis articulated the problem... Let me try with example: input a = {10,2,3,5,10,5,6,9} input b = {5,5,9,7,10}should have answer as {5,5,9,10} for common elements and many thanks for your help 👍


Thinking about it you only need to count how many `a`s there are. So scratch the tuple and make it a unordered_map<T, int>! That's almost what you are already doing in your first example anyway.


Perfect 👍 thanks Schindlabua