# Algorithms

They would take their software out and race it in the black desert of the electronic night.

## Data Structures

Storing state in a domain specific manner can greatly increase performance.

There are 2 base structures that all (more abstract) data structures are implemented with: **arrays** and **pointers**. Contiguous vs. Linked.

### Array

`[1, 2, 3, 4]`

### Linked List

`(1) -> (2) -> (3)`

- cycle detection through Floyd’s (fast and slow pointers)
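
Floyd’s fast/slow pointer trick can be sketched with a hand-rolled node class (a minimal illustration, not a library type):

```
class Node:
    def __init__(self, val):
        self.val = val
        self.next = None

def has_cycle(head):
    # Fast pointer moves two steps per iteration, slow moves one.
    # If there is a cycle, fast eventually laps slow and they meet.
    slow = fast = head
    while fast and fast.next:
        slow = slow.next
        fast = fast.next.next
        if slow is fast:
            return True
    return False
```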

### Abstractions

Data Structures have a small interface of methods (e.g. insert, delete, search) which are usually pretty similar across structures. The differences are in the time complexities of each. Data Structures fall into one of two categories:

**Containers**

storage and retrieval of items independent of contents

*Stack/Queue* - wrappers around linked lists (could use arrays too, I guess)

- aim to have O(1) insert and read operations

**Dictionaries**

access to items by contents (key/value)

*Trees* (Tree, Binary Tree, Binary Search Tree (sorted), Red-Black Tree)

*Hash Tables*

- amortized constant-time
- reduce the lookup time from O(n) to O(1) by trading space for speed

#### Heap

A heap is a useful data structure when it is necessary to repeatedly remove the object with the highest (or lowest) priority.

- a specialized dictionary, restricted compared to a tree: it can only return the min or max, but is fast at insert/extract for those specific cases
- heap insert is O(1) on average and O(log n) worst case (vs. BST O(log n)), assuming resize latencies are amortized
- not used for total order, just the parent <=> children max/min relationship (a.k.a. dominance)
- common way to implement a *priority queue*

A min-heap is a tree with the property that each node is the minimum-valued node in its subtree (flip the comparison for a max-heap).
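
Python’s stdlib `heapq` module implements exactly this: a binary min-heap stored in a plain list. A quick sketch:

```
import heapq

heap = [5, 1, 4, 2, 3]
heapq.heapify(heap)             # O(n) bottom-up construction

heapq.heappush(heap, 0)         # O(log n) insert
smallest = heapq.heappop(heap)  # O(log n) extraction of the min
```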

#### KD-Trees

K Dimensional binary search tree

- each node contains k coordinates, one for each dimension
- each tree level splits on a different dimension (e.g. 2D alternates between the x and y axes)
- nice in low-dimension scenarios because it can quickly prune data

## Big O

Rough estimates of algorithms to turn them into “black box” tools. For example, I know I am going to be doing a lot of inserts, which algorithm is the best for that?

Big O notation is used to classify algorithms according to how their run time (*time complexity*) or space requirements (*space complexity*) grow as the input size grows.

- multiplication in Big O is for when an operation is run once per iteration of another operation (inner for loop)
- log is usually base 2 since it usually shows up in “divide and conquer” scenarios, but the log base *does not matter* as far as Big O is concerned

*Amortized* analysis is used for algorithms that have expensive operations that happen only rarely

- re-sizing a dynamic array doubles the size of the array
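
The doubling trick can be sketched with a toy class (`DynamicArray` here is made up for illustration; Python’s built-in `list` already does this internally):

```
class DynamicArray:
    """Toy dynamic array: append is O(1) amortized because the
    occasional O(n) resize doubles capacity, so resizes get rarer."""

    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity

    def append(self, x):
        if self.size == self.capacity:
            # Expensive but rare: copy everything into a buffer twice as big.
            self.capacity *= 2
            new_data = [None] * self.capacity
            new_data[:self.size] = self.data
            self.data = new_data
        self.data[self.size] = x
        self.size += 1
```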

### General Hierarchy

From least to most dominant

**O(1) Constant**

- point lookups, no dependency on `n`

**O(log n) Logarithmic**

- grows slowly with `n`

- shows up a lot in trees (binary search)

**O(n) Linear**

- cost of looking at each of the `n` elements a constant number of times

**O(n log n) Superlinear**

- grows a little faster than linear
- shows up in sort a lot: “for each element perform sort”

**O(n^2) Quadratic**

- looking at pairs of items in n elements
- nested loops
- special case polynomial of 2

**O(n^3) Cubic**

- looking at triples of items in n elements
- nested loops
- special case polynomial of 3

**O(n^c) Polynomial**

- the `P` of `NP`

**O(c^n) Exponential**

- when `n` is the exponent: exponentiation is repeated multiplication (similar to multiplication being repeated addition)
- Power Set: finding all the subsets of a set (`2^n`)
    - every bump in n causes x2 new subsets
- finding any combination length `C(n,k) = n Choose k = n! / ((n-k)! * k!)` has an upper bound of `2^n`
- recursive fibonacci `F(n) = F(n-1) + F(n-2)`
    - to solve for F(n), need 2 calculations, which each require 2 calculations…
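
The power-set doubling can be seen directly in code, a small sketch:

```
def power_set(items):
    # Each element is either in or out of a subset: 2 choices per
    # element gives 2**n subsets total.
    subsets = [[]]
    for x in items:
        # Every existing subset spawns a copy that also contains x,
        # doubling the count (the "x2 new subsets" per bump in n).
        subsets += [s + [x] for s in subsets]
    return subsets
```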

**O(n!) Factorial**

- `n * (n-1) * (n-2) * … * 1`
- find all permutations in an array
- traveling salesman brute force
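
A quick sketch via the stdlib: `itertools.permutations` enumerates all `n!` orderings:

```
from itertools import permutations

def all_orderings(items):
    # n! orderings: n choices for the first slot, n-1 for the next, ...
    return list(permutations(items))
```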

## Sorting

Sorting is the basis of a lot of algorithms which don’t obviously involve sorting upfront. It’s a tool.

The lower bound for any comparison-based sort is `O(n*log(n))`: there are `n!` possible orderings of the `n` elements, each pairwise comparison can at best cut the possibilities in half, and `log2(n!)` is on the order of `n*log(n)`.

### Insertion Sort

- very simple O(n^2) algorithm to compare to
- split array into sorted/unsorted (sorted with just the first element at the start)
- roll over every element in unsorted and shift it into the position where it belongs in sorted (n x n)
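
A minimal sketch of the steps above:

```
def insertion_sort(a):
    # a[:i] is the sorted prefix; shift a[i] left until it fits.
    for i in range(1, len(a)):
        x = a[i]
        j = i - 1
        while j >= 0 and a[j] > x:
            a[j + 1] = a[j]  # shift larger elements right
            j -= 1
        a[j + 1] = x
    return a
```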

### Heapsort

- Shove everything into a (min/max) heap and then pull them out one by one into a sorted list
- building the heap takes O(n) with bottom-up heapify (or O(n*logn) via repeated inserts); pulling out all n elements is O(n*logn) since each extraction is O(logn)
- doesn’t require an auxiliary buffer array… but still have to put things in the heap
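
A sketch using the stdlib `heapq` (this version pulls into an auxiliary list for clarity; the classic version sorts the array in place):

```
import heapq

def heapsort(items):
    heap = list(items)
    heapq.heapify(heap)  # O(n) bottom-up construction
    # n pops at O(log n) each -> O(n log n) total
    return [heapq.heappop(heap) for _ in range(len(heap))]
```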

### Mergesort

- recursive approach
- split array in two, sort each array and then merge together; base case is when array size is < 2
- great for sorting linked lists cause does not rely on random access
- if used on arrays, it needs a 3rd array buffer to store the merged arrays
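
A minimal recursive sketch (this array version allocates merge buffers rather than merging in place):

```
def merge_sort(a):
    if len(a) < 2:  # base case: 0 or 1 elements is already sorted
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge the two sorted halves into an auxiliary buffer.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]
```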

### Quicksort

- recursive approach
- pivot and partition the array elements instead of splitting and merging (mergesort)
- has an O(n^2) worst case if the elements are pre-ordered (and the pivot choice is naive), but that doesn’t happen much with random data and can be avoided by randomizing the input or the pivot
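
A minimal (non-in-place) sketch with a randomized pivot; production quicksorts partition the array in place instead of allocating sublists:

```
import random

def quicksort(a):
    if len(a) < 2:
        return a
    pivot = random.choice(a)  # random pivot dodges the sorted-input worst case
    less = [x for x in a if x < pivot]
    equal = [x for x in a if x == pivot]
    greater = [x for x in a if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```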

## Divide and Conquer

- mergesort and binary search are the best examples
- can turn `O(n^2)` situations into `O(n*log(n))` or `O(log(n))`
- often involves some sort of recursion which can usually be described with the abstract Master Theorem

**Master Theorem** for recurrence relation `T(n) = a * T(n / b) + f(n)`

This makes a lot more sense when thinking about the tree of work created when you do something like mergesort. The weird part is that `a` doesn’t have to equal `b`.

Without `f(n)`, the Big O is `O(n^logb(a))`

When we add `f(n)`, there are 3 “shortcut” cases that can be used to quickly find the Big O. These do not cover all possible scenarios, just 3.

- Case 1: `f(n)` is asymptotically less => `O(n^logb(a))`
- Case 2: `f(n)` is asymptotically the same => `O(n^logb(a) * log(n))`
    - the `f(n)` work is done on every level of the recursion tree, and there are about `log(n)` levels; the log base doesn’t matter in Big O
- Case 3: `f(n)` is asymptotically more => `O(f(n))`

To find which case a scenario fits in, compare O(f(n)) with O(n^logb(a))
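
For example, mergesort’s recurrence is `T(n) = 2 * T(n/2) + n`: here `a = b = 2`, so `n^logb(a) = n`, which is asymptotically the same as `f(n) = n`, landing in Case 2 and giving `O(n*log(n))`. Counting the work directly confirms it:

```
def T(n):
    # Mergesort-style recurrence: a=2, b=2, f(n)=n.
    if n <= 1:
        return 1
    return 2 * T(n // 2) + n
```

For `n = 2^k` this works out to exactly `n*k + n`, i.e. on the order of `n*log(n)`.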

### Binary Search

- the loop invariant around the split can get confusing with 2 pointers (low and high)

Possibilities:

```
1. A[low] < A[i] < A[high]
2. A[low] <= A[i] < A[high]
3. A[low] < A[i] <= A[high]
4. A[low] <= A[i] <= A[high]
```

- 2 and 4 are the most commonly implemented and 2 should be preferred for simplicity
- the invariant must remain true at beginning and end of loop

Implementation of invariant 2:

```
def binarySearch(A, X):
    low, high = 0, len(A)  # half-open range [low, high)
    while low < high:
        i = low + (high - low) // 2
        if X == A[i]:
            return i
        elif X > A[i]:
            low = i + 1
        else:  # X < A[i]
            high = i
    return -1
```

When processing sequences (array, list, et cetera), always use half-open ranges, where the first index is the first element in the range, and the second index is one past the last element in the range. This has many advantages down the line and makes many operations very simple. Splitting one range in two means simply picking a single pivot and using that in both of the new ranges: [x, y) -> [x, pivot), [pivot, y). An empty range is one where both indices are the same: [x, x). The count of elements is a single subtraction: y - x.

## Minwise Hashing

Neat hashing trick based on probability.

How can we test if 2 documents are similar? Hash all the words in doc 1 and select the one with the smallest hash (this is pretty much a random selection, but deterministic). Now hash all the words in doc 2. If they have the same *minhash*, chances are, they are very similar. Can store a handful of values for more confidence. Useful when comparing a lot of things (in this case, documents).
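
A toy sketch, salting Python’s built-in `hash` to fake a family of hash functions (real implementations use fixed hash families so signatures are stable across runs):

```
def minhash_signature(words, num_hashes=4):
    # One minhash per "hash function": the min over all salted word hashes
    # is a deterministic-but-pseudorandom pick from the set.
    return [min(hash((salt, w)) for w in words) for salt in range(num_hashes)]

def similarity_estimate(sig_a, sig_b):
    # Fraction of matching minhashes approximates the documents' overlap
    # (Jaccard similarity of their word sets).
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```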

## Graphs

- BFS and DFS create trees over a general graph
- BFS uses a queue (first in first out)
- DFS uses a stack (last in first out) which can be implemented with recursion

### BFS

- if you hang on to the parent as you BFS, you can easily find the shortest path from any vertex
- can “short circuit” on finding shortest path to a vertex

### DFS

- DFS is a little more complex than BFS, but has a nice side effect of organizing edges in a precise way
- *preorder*, *inorder*, *postorder* define when the node is visited relative to the recursion
    - root, left, right -> explore the roots before inspecting any leaves
    - left, root, right -> visits a tree “in order” (sorted, for a BST)
    - left, right, root -> explore all the leaves before any roots, e.g. deleting a tree

*topological* sorting (on DAGs) uses DFS
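
A topological sort is just DFS postorder, reversed; a minimal sketch over an adjacency-dict DAG:

```
def topological_sort(graph):
    # DFS postorder, reversed: a vertex is emitted only after
    # everything it points to has been emitted.
    order, visited = [], set()

    def dfs(v):
        visited.add(v)
        for w in graph.get(v, []):
            if w not in visited:
                dfs(w)
        order.append(v)  # postorder

    for v in graph:
        if v not in visited:
            dfs(v)
    return order[::-1]
```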

### Weighted

Weighted graphs are more complex, but have some standard algorithms

#### Minimum Spanning Tree (MST) // Prim’s

- MST is a tree that covers a graph’s vertices with the smallest sum of edge weights and no cycles
- You know the whole graph when you start Prim’s; it’s not “discovery”
- Start at the root node
- Choose the least costly edge that touches the current tree
- Must not create a cycle
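
A sketch of Prim’s with a heap of candidate edges (graph as a dict of `(weight, neighbor)` lists; returns just the MST cost for brevity):

```
import heapq

def prim_mst_cost(graph, root):
    # Repeatedly take the cheapest edge that touches the tree so far.
    in_tree = {root}
    edges = list(graph[root])
    heapq.heapify(edges)
    mst_cost = 0
    while edges and len(in_tree) < len(graph):
        weight, v = heapq.heappop(edges)
        if v in in_tree:  # would create a cycle, skip
            continue
        in_tree.add(v)
        mst_cost += weight
        for edge in graph[v]:
            heapq.heappush(edges, edge)
    return mst_cost
```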

#### Shortest Path // Dijkstra’s

- vs. Prim’s
    - the greedy choice is different: Prim optimizes for the overall sum, where Dijkstra optimizes for the path cost from the root, so they don’t always add the same edges to the tree

- keep track of parent so we can find path back to root
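
A sketch of Dijkstra’s with the same adjacency-dict shape as above, keeping the parent map for path reconstruction:

```
import heapq

def dijkstra(graph, root):
    # graph: dict mapping vertex -> list of (weight, neighbor) pairs.
    # Greedy choice: the unsettled vertex with the smallest known distance
    # *from the root* (Prim would use just the edge weight).
    dist, parent = {root: 0}, {root: None}
    heap = [(0, root)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:  # stale heap entry, already found a shorter way
            continue
        for weight, w in graph.get(v, []):
            new_d = d + weight
            if w not in dist or new_d < dist[w]:
                dist[w], parent[w] = new_d, v
                heapq.heappush(heap, (new_d, w))
    return dist, parent
```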

## Backtracking

- Use DFS to enumerate all possible solutions of a problem
- Use DFS over BFS for memory usage, BFS uses memory based on width of tree where DFS is based on height (height is almost always less)

- Builds solutions up like a tree
- enumerate all possible steps at each node instead of trying to visualize “path”

- Prune options when we know no more can be found down a branch
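
A small backtracking sketch: enumerate the subsets that sum to a target, pruning any branch whose running sum already exceeds it (assumes non-negative items):

```
def subsets_summing_to(items, target):
    # DFS over include/exclude decisions at each index.
    solutions = []

    def backtrack(i, chosen, total):
        if total == target:
            solutions.append(list(chosen))
            return
        if i == len(items) or total > target:  # prune: no solution deeper
            return
        chosen.append(items[i])
        backtrack(i + 1, chosen, total + items[i])  # include items[i]
        chosen.pop()
        backtrack(i + 1, chosen, total)             # exclude items[i]

    backtrack(0, [], 0)
    return solutions
```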

## Dynamic Programming

- A tradeoff of space for time
- Greedy algorithms make the best *local* decision at each step, so they are efficient, but not guaranteed to be right because they don’t look at all solutions
    - local decisions do not have to look at the entire set of data to make a call
- exhaustive search looks at all solutions, but can be slow

- Dynamic programming caches work which has been done and will probably be repeated to try and do best of both worlds
- Dynamic programming problems must obey the *principle of optimality*: partial solutions can be optimally extended given the *state* after the partial solution, instead of the specifics of the partial solution itself (can’t re-write a decision)

### memoization vs. tabulation

Both tactics can be used in dynamic programming.

- *memoization* caches calculations on the fly, usually used with recursion
- *tabulation* caches calculations up front, usually “bottom-up” iteratively from a base case
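
Fibonacci both ways, as a minimal sketch (`lru_cache` from the stdlib does the memoizing):

```
from functools import lru_cache

@lru_cache(maxsize=None)
def fib_memo(n):
    # Memoization: recurse top-down, cache results as they are computed.
    return n if n < 2 else fib_memo(n - 1) + fib_memo(n - 2)

def fib_tab(n):
    # Tabulation: fill a table bottom-up from the base cases.
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])
    return table[n]
```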

## P vs. NP

**NP-Complete**

If you can *reduce* one problem to another (e.g. problem A is at least as hard as problem B, and I know problem B is hard as shit) you can make a call on whether it’s worth exploring.

“NP-Complete” == problem is at least as hard as every problem in the NP set

- *discovery* vs. *verification* can have different levels of difficulty; NP discovery problems can have P verification
- the split of P vs NP is a kinda arbitrary level of complexity, similar to Turing complete
- **P** problems have polynomial-time algorithms
- **NP** problems (“nondeterministic polynomial time”) can be verified in polynomial time, but no polynomial algorithm is known; can be solved via brute force aided by some heuristics
- all NP-complete problems can be boiled down to the same “hardness” (boolean *satisfiability*)
    - so if any of them could be solved in poly time it means they all could, but we haven’t figured that out yet

### Satisfiable

Met this in real life with dependency version calculations in Gradle