Has anyone had a "real world" need to evaluate, classify, and validate an algorithm's complexity using Big O notation?
After 22+ years of software development across various tech stacks and platforms throughout a wide range of application architectures, I've not once been faced with a situation where I needed to evaluate the performance tradeoffs with accuracy between the explicit Big Os of two algorithms, like say... O(n^2) and O(n log n). Nor have I had to explicitly qualify an algorithm as O(n) vs O(n^2) to know which of the two was more optimized. (Thank goodness for that.) That's not to say I've not evaluated algorithms to optimize them. I've just never needed to go all Big O to make that determination. So... outside of academia, I'm curious if anyone has had a real world need to evaluate, classify, and validate an algorithm's complexity using Big O notation?
11/21/2018 1:48:20 AMDavid Carroll
21 AnswersNew Answer
I predict the answer from everyone will be a Big (n)O.
thanks for sharing your experience about O. I secretly thought it was a waste of time but I had no proof. Your 22 years experience is a major proof.
i agree with haris though, it makes sense. let s say you have large database of clients, let s imagine you have database for healthcare data of all citizens of the USA, sorting and searching algorithm will certainly matter. working with db that large would be a gold mine but less frequent than usual opportunity
Jay Matthews , Sreejith You use differential equations & calculus every day without realizing it. ☺ Some examples: Ever heard of a differential in a car transmission? https://en.m.wikipedia.org/wiki/Differential_(mechanical_device) https://en.m.wikipedia.org/wiki/Differential_analyser Also for games that have realistic physics engines you 10000000000000% will need calculus & differential equations. Application of topology & differential equations for neural networks: http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Sreejith - Twice in my career I needed integration to solve programming problems: once in medical imaging, and once for a control system. I did the integrations by hand and used the answers in the programs. Many programmers are doing discrete calculus without realizing it when they implement loops with incremental calculations. I have known Middle School neophyte programmers who implemented PID control code for robots, and with no calculus training. I never used Laplace Transform outside of class, though I have an ardent dream to invent the equivalent for software systems analysis. Fourier Transform and Taylor's Theorem have been useful on occasion - again in medical imaging and process control. I toyed with both for stock market analysis, too. Yet, in all the optimizing I have done there has been no call to use Big O(), which does not solve problems. It only quantifies what developers already do intuitively.
Algorithm efficiency matters when handling big data. Sorting algorithms are a very good example of real world need for big O. E.g. you need a list of 1000000 customers sorted by some criteria, lets say alphabetically. You can sort this case by case which will take a big amount of time. Or you can sort this with "smarter/faster" algos. ☺
I have found I have time complexity to be the most important when I haven't thought about it in advance. If something runs slower than expected then it's something to think about. I'm not sure if you would count my examples. I designed and implemented a compression and decompression algorithm. I couldn't see any record of anyone trying what I had done. Turns out my compression algorithm is O(n^2) when every one in use is linear. It did technically pass my originally design specs, as I was optimising for compression size and DEcompression time. I was working on a potential alternative to AES encryption. (More of a wild idea than something production ready.) I tried to code a rush test version, something simpler than the real algorithm. It was very slow if the message was more than 16k, and unusable if more than 64k. Turns out that algorithm was exponential! There was a simple fix though.
One of many experiences for me (many moons ago) involved a preprocessor pipeline component I implemented for splitting up massive flat files, ranging in size from 1GB to 10TB, into much smaller chunks to be asynchronously processed in parallel and then reassembling the completion status for each record across all those chunks back into a single acknowledgement file within 24 hours. Prior to me coming into this particular task, the process was taking days, maybe weeks to complete and would often error out. The massive files were being dropped from various mainframe archive dumps to be on-boarded into another system. Fun times. At no point was Big O notation and analysis required in this effort.
I've, also, done my fair share of big data related development as well. Big data optimizations for me involved the reshaping of data with ETL tooling and processes into denormalized multi-dimensional data stores where the intersection of slices are defined and used to quickly generate reports. Performance optimizations centered around reducing the time delay for real time data to be reflected in BI reports. Again, no Big O analysis involved. These are just two examples where I would have assumed Big O to play a role or at least have some value in real world development. There is no question that I'm always evaluating algorithm optimizations by reviewing execution paths that might be superfluous or could be short-circuited. However, this isn't remotely close to using Big O in a real world application.
Console Indeed... I do agree that Big O notation can be useful for testing efficiency of algorithms. However, it just occurred to me that in the many years of reading countless technical articles, SDKs, release notes, and performance benchmarks of various algorithms used in numerous libraries, I don't recall Big O notation being used to qualify their differences. It's highly possible I'm reading the wrong articles or somehow glossed over those sections where Big O was emphasized. However, I suspect I'm not the only one noticing the absence of Big O outside of academia and esoteric white papers published by academic types. Rather, the real world seems to lean toward and value actual performance metrics in time, space, accuracy, capacity, cycles, etc. Then, this data is easily comprehended and burned into our memory with visualized benchmark tables, charts, and the like. Perhaps, this is just one of the many disconnects between theoritical academia and the real world.
Did you ever use integration or differentiation or laplace for developing any software? Especially laplace transformation
LOL... Brian You get the prize until I have a legit use case submitted. Thanks for that. Sreejith I can't say that I've ever implemented any logic involving calculus. I suspect there may be specialized/niche applications where Big O assessment is common. Examples might include building any of the following: - a 3D game engine or anything interfacing directly with GPUs - an engine for traversing a node graph to isolate the optimal path based on a set of conditions - an engine for encoding video - an algorithm for optimizing predictive analysis based on a high volume of data points for something like predicting consumer behavior based on internet activity - an algorithm for decoding and cryptanalysis - an engine for simulating complex calculations like synthetic benchmark tools Regardless... I'm only speculating and unsure if, even in these scenarios, Big O analysis was a common practice in these edge cases.
Thanks to everyone who have chimed in so far. If nothing else, this question managed to flush out a few other experienced developers not previously on my radar and hear from a few of the regulars who make this community so great. 😉 It's nice to meet you Jared Bird and Brian. And as always, it's great to see responses from Paul, Jay Matthews, and Haris. 🤙 It looks like many of us could swap some interesting stories regarding the critical need for algorithm optimizations from our past experiences.
Console I totally get what you're saying. However, have you, personally, utilized Big O to optimize an algorithm for a real world purpose - used outside of an academic context. If so, I'd love to hear about it. Otherwise, we both agree what Big O should be good for. However, surprisingly, it's heavily underutilized in the real world.
Haris I think I just got smarter by reading that article on Neural Networks Manifolds Topology. Or... did I get dumber. I can't tell. I'm fighting off a cold. 🤓 Thanks for the link.
David Carroll My pleasure. I hope I helped you to get smarter (even though you already are plenty) Get well soon! ☺
I have some more applications: routing, scanning, financial processes. In these cases I've worked on projects where it was necessary to calculate this before production, because the corresponding test environments would have been too large and expensive (update: or just impossible to get). And since it's obviously not possible to scale this linear, it's necessary to calculate before relying on the good intuition of dozens of developers not to crash the production. And additionally: It's very useful to calculate the required hardware environment before go live.
i have had quite some knowledge on Big Os and coupled with my mathematical experience, it's a logical method of testing the performance of algorithms. most of the powerful stl algorithms in c++ use the Big O notation to test the efficiency of the algorithms. For my years in c++ i haven't really tested an algorithm with the Big O but in case any algorithm designer cares about efficiency in his algorithm, then the Big O will definitely help you.
David Carroll yeah but the Big O still plays a role even in real world problems. considering the property of time, I'll say efficiency is a function of time and speed is also a function of time too. efficiency and speed are the properties the Big O focuses on. so imagine you write a searching algorithm that needs to search data in a large data base, I'll really appreciate speed and efficiency of the algorithm. in general what I'm trying to say is; the Big O is a very important concept for designers of algorithms that take on Big Data.
I do back-of-the-napkin estimations all the time. It's pretty easy to O(n²) yourself. Say, you are doing LINQ queries in a loop. I try to keep track of that as I go and if I can't fix it immediately I'll at least look at it later. You probably do it aswell but you don't call it Big O. But Big O is really not an exact science. At least as far as coding is concerned, most of the time.