On the perils of time complexity
This is yet another post on my not-long-not-short experience in CS pedagogy. There are many things in our intro CS education that are IMO overemphasized, and one of them is time complexity. We have conditioned generations of students into assessing the efficiency (and thus, to a great extent, the quality) of an algorithm by its time complexity, and failed to acknowledge other real-world factors that are just as, if not more, important.
Time complexity is taught almost as soon as one has any grasp of algorithms. One's first dabble in data structures or sorting algorithms is immediately followed by the introduction of Big-O notation, and each new algorithm introduced must undergo the scrutiny of time complexity analysis. We know that "linked lists have O(1) insertion time while arrays have O(n)"; we know that "quicksort has O(n log n) average time complexity while bubble sort has O(n²)"; we know that "hash tables have O(1) lookup time while binary search trees have O(log n)". Therefore, many people, either spontaneously or after being conditioned, jump to the conclusion that "use linked lists instead of arrays when updates are frequent", "use quicksort instead of bubble sort", "use hash tables instead of binary search trees", etc. Many realize the nuance when they become more seasoned engineers, but others are forever stuck in the mindset of "time complexity is efficiency, and efficiency is quality". Here are some of said nuances that are often overlooked:
- Data is often small enough that the time doesn't matter anyway. Sure, if you are designing a data structure for storing a billion users, you'd better make it as performant as possible. But here's my empirical conjecture: most of the data structures you deal with will hold no more than 10,000 items, 100,000 at most. At the scale of a few hundred or a few thousand items, any algorithm at or below O(n²) will finish within milliseconds. In such cases, you should prioritize readability and simplicity over performance.
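To put rough numbers on this (figures vary by machine, and a compiled language would be faster still), here is a quick timing sketch: even a deliberately quadratic sort is fast at the scale of a few hundred items.

```python
import random
import time

def insertion_sort(items):
    """Plain O(n^2) insertion sort -- the kind of algorithm
    Big-O analysis tells us to avoid."""
    out = list(items)
    for i in range(1, len(out)):
        j = i
        while j > 0 and out[j - 1] > out[j]:
            out[j - 1], out[j] = out[j], out[j - 1]
            j -= 1
    return out

data = [random.randrange(10**6) for _ in range(500)]
start = time.perf_counter()
result = insertion_sort(data)
elapsed = time.perf_counter() - start
print(f"sorted {len(data)} items in {elapsed * 1000:.1f} ms")
assert result == sorted(data)
```

Even in pure Python, this finishes in tens of milliseconds; swapping in the built-in `sorted` would save effectively nothing at this scale.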
- Even when the time is not completely negligible, a difference in time complexity may not translate to a significant difference in actual runtime. The hidden constants and lower-order terms are often glossed over in theoretical analysis, but carry significant weight in practice. The example I always cite is linked lists: while linked lists boast O(1) insertion time and O(n) traversal time, their cache-unfriendliness means that each sequential access can easily be 100x slower than on an array (a DRAM miss versus a cache hit), a gap you can only make up on the asymptotic end.
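A rough way to feel this (the names are illustrative): traverse the same data stored contiguously and as a chain of heap-allocated nodes. Pure Python blunts the cache effect, since interpreter overhead dominates, but the shape of the experiment carries over to compiled languages, where the gap is far larger.

```python
import time

class Node:
    """A singly linked list node: one heap allocation per element."""
    __slots__ = ("value", "next")
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

N = 200_000
array = list(range(N))

# Build a linked list with the same contents.
head = None
for v in reversed(array):
    head = Node(v, head)

def sum_array(arr):
    total = 0
    for v in arr:
        total += v
    return total

def sum_linked(node):
    total = 0
    while node is not None:
        total += node.value
        node = node.next  # pointer chase on every step
    return total

assert sum_array(array) == sum_linked(head)

for name, fn, arg in [("array", sum_array, array), ("linked", sum_linked, head)]:
    start = time.perf_counter()
    fn(arg)
    print(f"{name}: {(time.perf_counter() - start) * 1000:.1f} ms")
```

Both traversals are O(n), yet the contiguous version consistently wins; the asymptotic analysis alone cannot tell you that.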
- Your runtime may be dwarfed by other parts of the system. Congratulations on optimizing your scheduling queue from 100ms to 20ms! But if your database query takes 600ms, your API call takes 800ms, and your rendering takes 300ms, then overall you only improved your system by less than 5%. Even worse, if your component is part of a slow pipeline, it may simply starve instead of yielding any improvement at all.
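The arithmetic above is just Amdahl's law in disguise; a small sketch (the component names and latencies are the hypothetical ones from the example) makes it concrete.

```python
def overall_improvement(parts_ms, component, new_ms):
    """Fraction of total latency saved by speeding up one component."""
    old_total = sum(parts_ms.values())
    new_total = old_total - parts_ms[component] + new_ms
    return (old_total - new_total) / old_total

pipeline = {"queue": 100, "db": 600, "api": 800, "render": 300}
saving = overall_improvement(pipeline, "queue", 20)
print(f"{saving:.1%}")  # 80 ms saved out of 1800 ms total -> 4.4%
```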
- Other engineering concerns—compiler optimizability, extra allocation, sorting stability, edge-case performance, concurrency, etc.—are often overlooked in introductory courses, but are just as important as time complexity in real-world applications. What looks elegant on paper may actually be gnarly in practice.
I specifically have a bone to pick with linked lists. I don't know why they continue to be a staple of CS education; in my opinion, their use cases are about as niche as those of Fibonacci heaps, tries, or order statistic trees, despite the nice theoretical guarantees. Case in point: in my 7 years of Advent of Code (AoC), I've happened to need linked lists exactly once (and, for that matter, order statistic trees exactly once), and AoC is already supposed to be designed in a way that elicits the most arcane data structures and algorithms. Meanwhile, linked lists go against practically every other principle of good data structure design: avoid frequent allocations/deallocations, keep memory together, stay close to compiler primitives, etc. The preface of the book Learn Rust With Entirely Too Many Linked Lists says it best:
We should all as a community say no to linked lists as a "standard" data structure. It's a fine data structure with several great use cases, but those use cases are exceptional, not common.
The quote is followed by a detailed analysis of linked lists' pitfalls and the few cases where they are warranted. Assuming we are not sitting in an advanced systems programming class, I believe the only remaining reasons to teach linked lists are to demonstrate pointers and dynamic memory allocation, and to show that there can be multiple ways to do the same thing with different tradeoffs. But there are far more viable vehicles for both lessons, such as comparing tree maps with hash maps, since trees also involve pointer chasing.
Finally, the emphasis on time complexity instills a real allergy towards so-called "NP-hard" problems. In my time spent on r/adventofcode, one of the most common questions I see is "How do I solve this? It looks like an NP-hard problem!", to which the response is often "it will run reasonably fine for the input size of this problem". Indeed, while problems like TSP and ILP are NP-hard, heuristics and approximations often perform well in practice, and many special cases can be solved efficiently outright. Hence the following quote, often cited in discussions of SAT solvers:
NP is the new P!
This is tension between engineering and theory in the opposite direction: where theory dismisses certain problems as intractable or infeasible, engineering finds practical solutions for the real-world-relevant subsets of those problems. Nevertheless, the stigma around NP-hardness demoralizes the unprimed before they even make an attempt.
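Exact TSP, the canonical NP-hard example, illustrates the point: the Held-Karp dynamic program is exponential in theory yet finishes near-instantly at the input sizes typical of puzzle problems. A minimal sketch (the instance is my own illustration):

```python
import itertools

def held_karp(dist):
    """Exact TSP via Held-Karp dynamic programming.
    O(2^n * n^2) time: exponential, yet comfortable up to n ~ 18."""
    n = len(dist)
    # dp[(mask, j)]: cheapest path that starts at city 0, visits exactly
    # the cities in `mask`, and ends at city j.
    dp = {(1 | (1 << j), j): dist[0][j] for j in range(1, n)}
    for size in range(3, n + 1):
        for subset in itertools.combinations(range(1, n), size - 1):
            mask = 1 | sum(1 << c for c in subset)
            for j in subset:
                dp[(mask, j)] = min(
                    dp[(mask ^ (1 << j), k)] + dist[k][j]
                    for k in subset if k != j
                )
    full = (1 << n) - 1
    # Close the tour by returning to city 0.
    return min(dp[(full, j)] + dist[j][0] for j in range(1, n))

# Tiny symmetric instance: the optimal tour 0-1-2-3-0 costs 1+1+1+10 = 13.
dist = [
    [0, 1, 10, 10],
    [1, 0, 1, 10],
    [10, 1, 0, 1],
    [10, 10, 1, 0],
]
print(held_karp(dist))  # 13
```

"NP-hard" describes worst-case asymptotic growth, not a verdict on a concrete input: for n around 15, this runs in well under a second.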
In conclusion, I think our introductory CS classes are painting the wrong picture of basic data structures and algorithms. If we want CS to be practical and applicable and not just remain as a few diagrams and formulae, we need to take a step forward and focus on the engineering realities. As I've said before: a principled, holistic, and nuanced understanding of various technical tradeoffs is much more important than the knowledge of factoids and implementations.