0
General guidelines about how to improve performance
When I participate in the assignments here I sometimes recognize that, while doing basically the same, my program does it better or worse than others (quicker or slower, higher or lower numbers). Often the reason will lie in the details of the code, the language and the environment; but I'm still wondering: Are there general guidelines or even checklists, what sort of code leads to better or worse performance and how to improve on that? What's your personal tricks to increase efficiency and effectiveness?
4 Answers
+ 2
0. Worry about performance last!
Making code fast usually takes time and isn't that important. It'll eat up your development time..
If you are going to optimize, optimize only the "hot" parts of your code, those parts that run over and over.
1. Profiling, profiling..
Performance issues can pop up anywhere so testing and fixing and testing is the only way to get rid of them. That said there are a few guidelines and tricks:
2. Big O
You can do a rough performance estimate using big O notation. I happen to have a write-up about that, though it needs some reworking:
https://code.sololearn.com/WS8SBRFRVxcn/?ref=app
Long story short: Avoid nested loops and use the right data structures. If your code is Î(nÂł) no amount of performance tricks will help.
(continued in comment 2)
+ 2
4. Profiling...
Performance issues come in all sorts of unexpected forms. Here's a really famous StackOverflow answer: https://stackoverflow.com/questions/11227809/why-is-it-faster-to-process-a-sorted-array-than-an-unsorted-array
Cache locality and Big O are the big ones to know (maybe SIMD and multithreading aswell), for everything else seeing what works and what doesn't will do the trick.
+ 1
for python I use the timeit function to measure process time, especially when I am debating different variations of code implementation. Obviously the faster the better, but I like to set the number argument to 1,000,000. This removes operating system interference and it gives a more accurate idea of which method/function is more efficient... at least for me
+ 1
3. Understanding the cache
Your CPU has a couple of different caches where it will store values that are being used a lot, because fetching things from RAM is 100 times as slow as the cache.
A way to put it is that your "inner loops" should only work on a few values if possible, as to not produce cache misses.
3.1 Cache locality
When you produce a cache miss and the CPU has to load things from RAM into the cache, it will typically grab 64 bytes with one go. So if you only use 1 byte of those 64 then that's a waste. A very real example is when you are looping through 2D arrays or images:
for(int x = 0; x < width; x++){
for(int y = 0; y < height; y++){
image[y * width + x] = ...
}
}
This looks ok, but images are usually stored row after row; so y being the inner loop means that you will jump rows and produce a cache miss on every single iteration.
for(int y = 0; y < height; y++){
for(int x = 0; x < width; x++){
}
}
This works on pixels one after the other and is much faster.
(3)