[Assignment] - dictionnary based numbers compression | Sololearn: Learn to code for FREE!
New course! Every coder should learn Generative AI!
Try a free lesson
+ 2

[Assignment] - dictionnary based numbers compression

You want to compress a set of numbers s=[s0,s1,...sn-1] to k numbers [v0...vk-1] with n>k. Each number in s will be replaced with the nearest of the numbers in v. ie si will be replaced by vi such that min(abs(si-vp for p in 0..k-1))=abs(si-vi) To maximize the quality of the compression, you will try to minimize the squared error, ie Sum for i in [0..n-1] (min(Si-vj)^2 for j in [0...k-1]) Program the function compress(s,k) returning v0...vk-1

10th Jun 2018, 3:19 PM
VcC
VcC - avatar
17 Answers
+ 3
yeah, I was pondering if I write a generic n-dimensional solution or if I stick to one dimension and get some optimization on the distance calculation, ordering and averaging. Your proposal is to stick in 1D, well! Let's try it.
14th Jun 2018, 6:24 AM
Edward
Edward - avatar
+ 3
The week was full, so only today I had some time to work on it. https://code.sololearn.com/cvv8C7XYd7sJ/?ref=app
16th Jun 2018, 6:43 AM
Edward
Edward - avatar
+ 3
Edward VcC I've found that by choosing the initial values based on the largest gaps you get an error that is smaller in most cases. Edit: In some cases! https://code.sololearn.com/cyNv1QxMoawy/#py
23rd Jun 2018, 7:27 AM
Louis
Louis - avatar
+ 2
I tried a few things, but I am not close to where you and Edward are. Started having a look at K-means. Pythons graphics limitations are frustrating. Found this nice online sample. check it out https://miguelmota.com/blog/k-means-clustering-in-javascript/demo/
16th Jun 2018, 4:55 PM
Louis
Louis - avatar
+ 2
VcC what I could figure out playing around with k-means in octave is that in well defined problems, where k has the size of well grouped points, the algorithm is really good. In ill defined problems like k 1 or 2 bigger than the points groups the algorithm gets very sensitive to initial conditions, not necessarily delivering the global minimum. If you want I can send the code from octave (i.e. Matlab) so you can play with it. Louis for you as well.
18th Jun 2018, 8:56 PM
Edward
Edward - avatar
+ 2
interesting !
23rd Jun 2018, 9:13 AM
VcC
VcC - avatar
+ 1
You sure are !
13th Jun 2018, 7:02 PM
VcC
VcC - avatar
+ 1
Edward kmeans for dimension 1 can be optimized versus the naive version i propose.
14th Jun 2018, 5:49 AM
VcC
VcC - avatar
+ 1
Edward what is funny is that when you compare kmean with the exact solution given by my codes you often have differences even for small sets...
16th Jun 2018, 7:31 AM
VcC
VcC - avatar
+ 1
yes, Kmeans does not guarantee the optimum solution and is VERY dependent on the inicial choice of starting points. But Kmean implementations are relatively fast and can do a job that is 99,99% good at a fraction of the time. Specially at large sets (above many thousend) kmeans tends to perform very well. In your program this can be seen.
16th Jun 2018, 12:16 PM
Edward
Edward - avatar
+ 1
Additionay, Kmeans is developed to separate unlabeled data in categories and not to express the data in a mathematical precise compressed form.
16th Jun 2018, 12:23 PM
Edward
Edward - avatar
+ 1
Edward kmeans can be up to 20% more distance than the optimal. Agree it is fast but there are cases where it gives local minimums which are fnot so close from the optimal.
16th Jun 2018, 5:17 PM
VcC
VcC - avatar
+ 1
Edward the problem stated here is exactly what kmeans are made for- reduce n points to k centroids while minimizing the squared distance
17th Jun 2018, 6:03 AM
VcC
VcC - avatar
0
cc Louis Cépagrave Nevfy LukArToDo Edward any proposition ?
13th Jun 2018, 2:01 PM
VcC
VcC - avatar
0
Thanks VcC. I'm not sure if I'm clever enough, but I'll give it a go.
13th Jun 2018, 2:51 PM
Louis
Louis - avatar
0
VcC I'm assuming that Si is a list of floats you're trying to convert to integer while minimizing the error. The list Vi has same number of elements as Si (k). I do not recognize the issue...my statement has a mathematical closed form (quite simple), so maybe I'm missing your point here. [edit] I cheated and looked to your code in order to understand what you mean ... so list S has n elements and list V has k elements, and n>=k. Since I cheated (and I knew the optimized version of the algorithm because I took the class, can be solved with 3 or 4 lines of NumPy within a loop) I consider myself an "outrunner" not participating in the contest.
13th Jun 2018, 8:46 PM
Edward
Edward - avatar