Algorithms

Computer Science all the way down

They would take their software out and race it in the black desert of the electronic night.

Data Structures

Storing state in a domain specific manner can greatly increase performance.

There are 2 base structures that all (more abstract) data structures are implemented with: arrays and pointers. Contiguous vs. linked. A quick sketch of both follows the examples below.

Array

[1, 2, 3, 4]

Linked List

(1) -> (2) -> (3)
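A minimal sketch of the two representations in Python (the Node class is just an illustration, not a standard library type):

# Contiguous: a Python list is backed by a contiguous array (of references).
arr = [1, 2, 3, 4]

# Linked: each node holds a value plus a pointer to the next node.
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

head = Node(1, Node(2, Node(3)))  # (1) -> (2) -> (3)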

Abstractions

Data structures have a small interface of methods (e.g. insert, delete, search) which are usually pretty similar to each other. The differences are in the time complexities of each. Data structures fall into one of two categories:

  1. Containers

storage and retrieval of items independent of contents

  2. Dictionaries

access to items by contents (key/value)

Heap

A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.

A (min-)heap is a tree with the property that each node is the minimum-valued node in its subtree; a max-heap is the mirror image.
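A quick sketch using Python's heapq, which maintains a binary min-heap inside a plain list:

import heapq

h = []
for x in [5, 1, 3]:
    heapq.heappush(h, x)

print(heapq.heappop(h))  # 1: always the minimum
print(heapq.heappop(h))  # 3

For highest-priority-first, a common trick is to push negated values.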

KD-Trees

A k-dimensional binary search tree: each level of the tree partitions the points on a different dimension. Useful for spatial queries like nearest neighbor.
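A minimal construction sketch, assuming 2D points and a round-robin choice of splitting dimension (one common convention, not the only one):

from collections import namedtuple

KDNode = namedtuple("KDNode", ["point", "left", "right"])

def build_kdtree(points, depth=0, k=2):
    if not points:
        return None
    axis = depth % k                     # cycle through the k dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2               # median split keeps the tree balanced
    return KDNode(
        point=points[mid],
        left=build_kdtree(points[:mid], depth + 1, k),
        right=build_kdtree(points[mid + 1:], depth + 1, k),
    )

tree = build_kdtree([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)])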

Big O

Rough estimates of algorithms that let you treat them as “black box” tools. For example: I know I am going to be doing a lot of inserts; which algorithm is best for that?

Big O notation is used to classify algorithms according to how their run time (time complexity) or space requirements (space complexity) grow as the input size grows.

Amortized analysis is used for algorithms that have expensive operations that happen only rarely, so the cost averaged over a whole sequence of operations stays low.
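The classic example is a dynamic array that doubles its capacity when it fills up: a single append is occasionally O(n), but amortized over all appends it is O(1). A toy sketch (Python lists already do something like this internally):

class DynamicArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None]

    def append(self, item):
        if self.size == self.capacity:
            # Rare O(n) resize; doubling means each element is copied
            # O(1) times on average across the whole sequence of appends.
            self.capacity *= 2
            new_data = [None] * self.capacity
            new_data[:self.size] = self.data
            self.data = new_data
        self.data[self.size] = item
        self.size += 1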

General Hierarchy

From least to most dominant

O(1) Constant

O(log n) Logarithmic

O(n) Linear

O(n log n) Superlinear

O(n^2) Quadratic

O(n^3) Cubic

O(n^c) Polynomial

O(c^n) Exponential

O(n!) Factorial

Sorting

Sorting is the basis of a lot of algorithms which don’t obviously involve sorting up front. It’s a tool.

The lower bound for any comparison-based sort is Ω(n log n): the algorithm must distinguish between all n! possible orderings of the n elements using pairwise comparisons, and a binary decision tree with n! leaves has height at least log2(n!) = Θ(n log n).

Insertion Sort

Heapsort

Mergesort

Quicksort

Divide and Conquer

Master Theorem for recurrence relation T(n) = a * T(n / b) + f(n)

This makes a lot more sense when thinking about the tree of work created when you do something like mergesort. The weird part is that a doesn’t have to equal b.
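Mergesort as a concrete instance: a = 2 recursive calls, each on n / b = n / 2 elements, plus f(n) = O(n) merge work per level. A sketch:

def mergesort(A):
    if len(A) <= 1:
        return A
    mid = len(A) // 2
    left = mergesort(A[:mid])    # T(n/2)
    right = mergesort(A[mid:])   # T(n/2)
    return merge(left, right)    # f(n) = O(n)

def merge(left, right):
    # Merge two sorted lists into one sorted list.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

mergesort([5, 1, 3, 2])  # [1, 2, 3, 5]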

Without f(n), the Big Oh is O(n^(log_b(a))).

When we add f(n), there are 3 “shortcut” cases that can be used to quickly find the Big Oh. These do not cover all possible scenarios, just 3.

To find which case a scenario fits in, compare O(f(n)) with O(n^(log_b(a))).
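For reference, the standard statement of the three cases, all pivoting on n^(log_b(a)):

  1. f(n) = O(n^c) for some c < log_b(a)  =>  T(n) = Θ(n^(log_b(a)))
  2. f(n) = Θ(n^(log_b(a)))  =>  T(n) = Θ(n^(log_b(a)) * log(n))
  3. f(n) = Ω(n^c) for some c > log_b(a), plus a regularity condition  =>  T(n) = Θ(f(n))

Worked example with mergesort: a = 2, b = 2, f(n) = O(n). Since n^(log_2(2)) = n matches f(n), case 2 applies and T(n) = Θ(n log n).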

Binary Search

Possibilities for how the bounds treat the endpoints of the search range:

1. A[low] <  A[i] <  A[high]
2. A[low] <= A[i] <  A[high]
3. A[low] <  A[i] <= A[high]
4. A[low] <= A[i] <= A[high]

Implementation of possibility 2 (inclusive low, exclusive high, i.e. a half-open range):

def binarySearch(A, X):
    # Search sorted array A for X over the half-open range [low, high).
    low, high = 0, len(A)
    while low < high:
        # Midpoint; low + (high - low) // 2 avoids overflow in fixed-width
        # languages (a plain (low + high) // 2 is fine in Python).
        i = low + (high - low) // 2
        if X == A[i]:
            return i
        elif X > A[i]:
            low = i + 1   # X can only be in the upper half: [i + 1, high)
        else:  # X < A[i]
            high = i      # X can only be in the lower half: [low, i)
    return -1  # not found

When processing sequences (array, list, et cetera), always use half-open ranges, where the first index is the first element in the range, and the second index is one past the last element in the range. This has many advantages down the line and makes many operations very simple. Splitting one range in two means simply picking a single pivot and using that in both of the new ranges: [x, y) -> [x, pivot), [pivot, y). An empty range is one where both indices are the same: [x, x). The count of elements is a single subtraction: y - x.

Minwise Hashing

Neat hashing trick based on probability.

How can we test if 2 documents are similar? Hash all the words in doc 1 and select the one with the smallest hash (this is effectively a random selection, but a deterministic one). Now do the same for doc 2. The probability that the two minhashes match equals the Jaccard similarity of the two word sets, so matching minhashes suggest the documents are very similar. Store a handful of minhash values (from independent hash functions) for more confidence. Useful when comparing a lot of things (in this case, documents).
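A minimal sketch, using Python's built-in hash with a per-position salt as a stand-in for a family of independent hash functions (note: built-in hash is only stable within a single process):

def minhash_signature(words, num_hashes=8):
    # One min-hash per salted hash function; more hashes -> more confidence.
    return [min(hash((salt, w)) for w in words) for salt in range(num_hashes)]

def estimated_similarity(doc1, doc2, num_hashes=8):
    sig1 = minhash_signature(doc1.split(), num_hashes)
    sig2 = minhash_signature(doc2.split(), num_hashes)
    # The fraction of matching positions estimates the Jaccard similarity.
    matches = sum(a == b for a, b in zip(sig1, sig2))
    return matches / num_hashes

estimated_similarity("the quick brown fox", "the quick brown cat")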

Graphs

BFS

DFS

Weighted

Weighted graphs are more complex, but have some standard algorithms:

Minimum Spanning Tree (MST) // Prim’s

Shortest Path // Dijkstra’s
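A minimal sketch of Dijkstra's shortest-path algorithm, assuming the graph is a dict of node -> list of (neighbor, weight) pairs with non-negative weights:

import heapq

def dijkstra(graph, source):
    dist = {source: 0}
    pq = [(0, source)]  # (distance so far, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale entry; u was already settled via a shorter path
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

graph = {"a": [("b", 2), ("c", 5)], "b": [("c", 1)], "c": []}
dijkstra(graph, "a")  # {'a': 0, 'b': 2, 'c': 3}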

Backtracking

Dynamic Programming

memoization vs. tabulation

Both tactics can be used in dynamic programming: memoization caches the results of a top-down recursion, while tabulation builds the answer bottom-up by filling in a table of subproblems.
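A sketch of both tactics on Fibonacci:

from functools import lru_cache

# Memoization: top-down recursion, results cached on the way back up.
@lru_cache(maxsize=None)
def fib_memo(n):
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)

# Tabulation: bottom-up, filling in answers from the smallest subproblem.
def fib_tab(n):
    if n < 2:
        return n
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]

fib_memo(30) == fib_tab(30)  # True, both 832040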

P vs. NP

NP-Complete

If you can reduce one problem to another (e.g. show that problem A is at least as hard as problem B, and I know problem B is hard as shit), you can make a call on whether it’s worth exploring.

“NP-Complete” == the problem is in NP and is at least as hard as every problem in NP (every problem in NP reduces to it)

Satisfiability (SAT)

Met this in real life with dependency version resolution in Gradle.