0

Mobile number and similarity

There is a very huge database and every user only inserts into system using phone number (for simplicity, we need to provide only 10 digit number). As soon as user registered, we have to count how Many odd digits are there and how many even digits are there in phone number. Requirement is to notify new user saying that your phone number has similarity with X users (X is number of people whose phone number have same odd and even digits count). What data structure is to be used here ? I would go with map of int,int as key will be no of odd digits count and corresponding user count will be value of map. Any other data structure will be good here ? This map solution seems to me a constant space Requirement and any thoughts please ? P.s. : do not go into database please. Requirement is from C++ and In memory storage of data.

17th Jun 2025, 5:22 PM
Ketan Lalcheta
Ketan Lalcheta - avatar
4 Answers
+ 2
A simple array of 11 (int) or (long) elements should work, representing 0 through 10 count of odd digits. How to use it: Initialize the array elements to zero. As each phone number is entered, count odd digits. Use the count as an array index. At that index increment the array value. Return the array value as the number of users having that same count of odd digits. The count of even digits may be calculated simply as 10 - odd count.
17th Jun 2025, 5:48 PM
Brian
Brian - avatar
0
can't you use char instead? it's smaller than int. and you really only need to store the odd numbers count OR the even numbers count. storing both seems redundant if the phone number is always 10 digit so odd/even can be calculated from 10-n. that issue aside, odd/even count seems like a bad metric for determining similarity. i could suggest computing the Levenshtein distance instead but your condition is to present to the user how many similar numbers there are in the database. computing a phone number's similarity with every number in the database and tallying the results would be very slow. if the database is being constructed from scratch, encoding the data as they are stored and storing them in an ordered manner would greatly speed up the retrieval. you can build up a graph of similar numbers and use that as a metric for comparison. you would need to have that system running in the background updating the statistics. https://sololearn.com/compiler-playground/clbIyg73905r/?ref=app
18th Jun 2025, 1:45 AM
Bob_Li
Bob_Li - avatar
0
Bob_Li using char type would not work for the array that I described. In my design the odd (or even) counts are not stored, but rather used as the index to look up how many records match. E.g., int similarity[11] {}; size_t odd_count; // loop // get phone number // count odd digits cout << ++similiarity[odd_count]; Each element must be able to hold potentially up to the total number of records in this "very huge database".
18th Jun 2025, 2:39 AM
Brian
Brian - avatar
0
Brian I was thinking of Ketan Lalcheta's idea of using map, but instead of (int, int), it would be (int, char) instead, since int feels too big just to store a count of 0 to 10. edit: reading the question again, it would seem like I totally misunderstood the description. oops... so the second int is not the odd or even digits count but the number of 'similar' phone numbers? wouldn't everyone of these need to be updated every time a phone number is added, changed or deleted? but there's still the odd/even as a measure of similarity. it doesn't make sense... counting even digits: 2222222222 and 8888888888 1234567890 and 9034512687 would come up as similar because they have the same even digit count
18th Jun 2025, 3:10 AM
Bob_Li
Bob_Li - avatar