🧠 AI Computer Institute
Content is AI-generated for educational purposes. Verify critical information independently. A bharath.ai initiative.

GATE Computer Science Comprehensive Exam Guide


Discrete Mathematics Essentials

GATE Critical: Logic, sets, graphs, recurrence relations, combinatorics. 8-10 questions typical.

// Propositional Logic
Statements: True or False
Operators: AND (∧), OR (∨), NOT (¬), → (implies), ↔ (iff)
Tautology: Always true
Contradiction: Always false
Contingency: Sometimes true, sometimes false

Truth table for (p → q) ∨ (¬p ∧ q):
p | q | p→q | ¬p | ¬p∧q | (p→q)∨(¬p∧q)
T | T |  T  | F  |  F   |      T
T | F |  F  | F  |  F   |      F
F | T |  T  | T  |  T   |      T
F | F |  T  | T  |  F   |      T

Logical equivalences:
p → q ≡ ¬p ∨ q
¬(p ∧ q) ≡ ¬p ∨ ¬q (De Morgan)
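These equivalences (and the truth table above) can be checked exhaustively; a minimal Python sketch, where `implies` is an illustrative helper name:

```python
from itertools import product

def implies(p, q):
    # p -> q is false only when p is true and q is false
    return (not p) or q

# Verify p -> q == !p or q, and De Morgan, over all truth assignments
for p, q in product([True, False], repeat=2):
    assert implies(p, q) == ((not p) or q)
    assert (not (p and q)) == ((not p) or (not q))

# The truth table above: (p -> q) or (!p and q) is false only at p=T, q=F
results = [implies(p, q) or ((not p) and q) for p, q in product([True, False], repeat=2)]
print(results)   # [True, False, True, True]
```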

// Set Theory
A ∪ B: Union (elements in A or B)
A ∩ B: Intersection (elements in both)
A - B: Difference (in A but not B)
A': Complement (not in A)
|A ∪ B| = |A| + |B| - |A ∩ B|

Power set P(A): All subsets
If |A| = n, |P(A)| = 2^n

Cartesian product A × B: All pairs (a,b)
|A × B| = |A| × |B|
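Python sets mirror these identities directly; a quick sanity check of inclusion-exclusion, the power-set count, and the Cartesian product size (the example sets are arbitrary):

```python
from itertools import chain, combinations, product

A = {1, 2, 3, 4}
B = {3, 4, 5}

union = A | B          # {1, 2, 3, 4, 5}
inter = A & B          # {3, 4}
diff  = A - B          # {1, 2}

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
assert len(union) == len(A) + len(B) - len(inter)

# Power set size: |P(A)| = 2^n
power_set = list(chain.from_iterable(combinations(A, r) for r in range(len(A) + 1)))
assert len(power_set) == 2 ** len(A)            # 16 subsets

# Cartesian product size: |A × B| = |A| · |B|
assert len(list(product(A, B))) == len(A) * len(B)   # 12 pairs
```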

// Permutations & Combinations
Permutations P(n,r) = n!/(n-r)!
  Ordered selections, order matters
  P(5,2) = 5×4 = 20

Combinations C(n,r) = n!/(r!(n-r)!)
  Unordered selections
  C(5,2) = 5×4/(2×1) = 10

Binomial theorem: (x+y)^n = Σ C(n,k)x^(n-k)y^k
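The standard library exposes these counts directly, which makes the formulas easy to verify numerically (the values of x, y, n below are arbitrary):

```python
from math import comb, perm

# P(5,2) = 20 ordered selections; C(5,2) = 10 unordered
assert perm(5, 2) == 20
assert comb(5, 2) == 10

# Binomial theorem: (x+y)^n = Σ C(n,k) x^(n-k) y^k
x, y, n = 3, 2, 5
lhs = (x + y) ** n
rhs = sum(comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))
assert lhs == rhs
print(lhs)   # 3125
```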

// Recurrence Relations
Linear homogeneous: a_n = c1·a_(n-1) + c2·a_(n-2)
Solution method: Characteristic equation
Example: a_n = 2·a_(n-1) + 3·a_(n-2)
r² - 2r - 3 = 0 → (r-3)(r+1) = 0 → r = 3 or r = -1
General solution: a_n = A·3^n + B·(-1)^n (A, B fixed by initial conditions)

Non-homogeneous: a_n = c1·a_(n-1) + f(n)
Particular solution for f(n) type
General solution = homogeneous + particular
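The homogeneous example above can be checked numerically: iterating the recurrence must agree with the closed form. A small Python sketch, taking a_0 = a_1 = 1 (an assumed initial condition, which gives A = B = 0.5):

```python
def recurrence(n, a0=1.0, a1=1.0):
    """Iterate a_n = 2*a_(n-1) + 3*a_(n-2) directly."""
    if n == 0:
        return a0
    prev2, prev1 = a0, a1
    for _ in range(n - 1):
        prev2, prev1 = prev1, 2 * prev1 + 3 * prev2
    return prev1

def closed_form(n):
    # a_n = A*3^n + B*(-1)^n; A + B = 1 and 3A - B = 1 give A = B = 0.5
    return 0.5 * 3 ** n + 0.5 * (-1) ** n

for n in range(10):
    assert recurrence(n) == closed_form(n)
print(recurrence(5))   # 122.0
```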

// Graph Theory
Vertices V, Edges E
Degree: Number of edges at vertex
Complete graph K_n: All vertices connected
n vertices, n(n-1)/2 edges

Bipartite graph: Vertices split into 2 sets
No edges within same set
Complete bipartite K_{m,n}: m·n edges

Tree: n vertices, n-1 edges, connected, acyclic
Spanning tree: Subgraph that is a tree containing every vertex
Every spanning tree of a graph on n vertices has exactly n-1 edges

// Shortest Path Algorithms
Dijkstra: Non-negative weights, O((V+E) log V) with a binary heap
Floyd-Warshall: All pairs, O(V³), handles negative weights (but not negative cycles)
Bellman-Ford: Handles negative weights, detects negative cycles, O(VE)
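The Dijkstra entry above maps naturally onto a binary heap; a minimal Python sketch (the adjacency-list layout and the tiny example graph are illustrative):

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths; requires non-negative edge weights.
    graph: {u: [(v, weight), ...]}"""
    dist = {source: 0}
    pq = [(0, source)]                       # (distance, vertex) min-heap
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                         # stale heap entry, skip
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))  # lazy decrease-key
    return dist

g = {'a': [('b', 1), ('c', 4)], 'b': [('c', 2)], 'c': []}
print(dijkstra(g, 'a'))   # {'a': 0, 'b': 1, 'c': 3}
```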

// Graph Coloring
Chromatic number χ(G): Minimum colors needed
Bipartite: χ = 2
Triangle: χ = 3
Complete graph K_n: χ = n
Planar graph: χ ≤ 4 (Four Color Theorem)

Data Structures & Complexity Analysis

GATE Essential: Advanced DS, amortized analysis, complexity proofs.

// Time Complexity Analysis

Asymptotic notations:
O(f(n)): Upper bound (grows no faster than f)
Ω(f(n)): Lower bound (grows at least as fast as f)
Θ(f(n)): Tight bound (both O and Ω)
o(f(n)): Strictly slower growth
ω(f(n)): Strictly faster growth
Note: These bound growth rates, not cases — best, average, and worst case can each be stated in any of these notations.

Definition: f(n) = O(g(n)) if ∃c,n₀: f(n) ≤ c·g(n) ∀n > n₀

// Algorithm Analysis Framework
Count dominant operations
Ignore constants and lower-order terms
T(n) = 2n² + 5n + 3 = O(n²)

Loop analysis:
for i=1 to n:
    for j=1 to n:
        op()
= n·n = O(n²)

Nested loop with dependency:
for i=1 to n:
    for j=i to n:
        op()
= 1+2+...+n = n(n+1)/2 = O(n²)

// Binary Search Tree Analysis
Balanced: O(log n) per operation
Worst case (skewed): O(n)
AVL tree: Guaranteed O(log n)
Red-Black tree: Guaranteed O(log n)

// Hashing
Hash function: Maps key → index
Collision handling:
  Chaining: List at each bucket
  Open addressing (e.g., linear probing): Probe successive slots
Load factor λ = n/m
Expected lookup: O(1+λ) with chaining; O(1/(1-λ)) with uniform open addressing
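A minimal chained hash table sketch in Python (class and method names are illustrative, not a standard API):

```python
class ChainedHashTable:
    """Separate chaining: a fixed array of buckets, each a list of (key, value)."""
    def __init__(self, m=8):
        self.m = m
        self.buckets = [[] for _ in range(m)]
        self.n = 0

    def put(self, key, value):
        b = self.buckets[hash(key) % self.m]
        for i, (k, _) in enumerate(b):
            if k == key:
                b[i] = (key, value)     # update existing key in place
                return
        b.append((key, value))
        self.n += 1

    def get(self, key):
        for k, v in self.buckets[hash(key) % self.m]:
            if k == key:
                return v
        raise KeyError(key)

    def load_factor(self):
        return self.n / self.m          # λ = expected chain length

t = ChainedHashTable()
t.put("x", 1); t.put("y", 2); t.put("x", 3)
print(t.get("x"), t.load_factor())     # 3 0.25
```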

// Heap Operations
Min-heap: Parent ≤ children
Max-heap: Parent ≥ children
Insert: O(log n)
Delete-min: O(log n)
Build heap: O(n)

Heap sort:
1. Build max-heap: O(n)
2. Extract max n times: O(n log n)
Total: O(n log n)
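The two phases above are exactly what Python's `heapq` provides: `heapify` builds the heap in O(n), and n pops cost O(log n) each. A sketch using a min-heap (so the output comes out ascending):

```python
import heapq

def heap_sort(items):
    """O(n) heapify, then n pops at O(log n) each -> O(n log n) total."""
    heap = list(items)
    heapq.heapify(heap)                 # build min-heap in O(n)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort([5, 1, 4, 2, 3]))      # [1, 2, 3, 4, 5]
```

Note this uses O(n) auxiliary space for clarity; the classic in-place version swaps the max to the end and sifts down.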

// Union-Find (Disjoint Set Union)
Operations:
- Make-set: O(1)
- Find: O(α(n)) with path compression
- Union: O(α(n))

α(n): Inverse Ackermann (practically constant)
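A compact Union-Find sketch with path compression (via path halving) and union by rank; the class name `DSU` is illustrative:

```python
class DSU:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path halving: each node on the path skips to its grandparent
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False                  # already in the same set
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx              # attach shorter tree under taller
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True

d = DSU(5)
d.union(0, 1); d.union(1, 2)
print(d.find(0) == d.find(2), d.find(0) == d.find(4))   # True False
```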

// B-Tree (Database indexing)
Order m: Each node has at most m children (m-1 keys); non-root nodes have at least ⌈m/2⌉ children
All leaves at same depth
Search, insert, delete: O(log_m n)
Used in databases for efficient I/O

// Trie (Prefix Tree)
Search: O(m) where m = length of key
Insert: O(m)
Delete: O(m)
Space: O(ALPHABET_SIZE × n)
Use: Autocomplete, IP routing
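The O(m) costs follow from walking one node per character; a minimal trie sketch (class names are illustrative):

```python
class TrieNode:
    def __init__(self):
        self.children = {}          # char -> TrieNode
        self.is_word = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word):         # O(m), m = len(word)
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def search(self, word):         # O(m): one node per character
        node = self.root
        for ch in word:
            if ch not in node.children:
                return False
            node = node.children[ch]
        return node.is_word

t = Trie()
t.insert("car"); t.insert("card")
print(t.search("car"), t.search("ca"))   # True False ("ca" is only a prefix)
```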

// Skip List (Probabilistic)
Search, insert, delete: O(log n) average
O(n) worst case but unlikely
Space: O(n)
Simpler than balanced trees

Sorting & Searching Algorithms

GATE asks: Algorithm comparison, stability, in-place property, tradeoffs.

// Sorting Algorithm Comparison
Algorithm      | Best       | Average    | Worst      | Space    | Stable | In-place
Bubble Sort    | O(n)       | O(n²)      | O(n²)      | O(1)     | Yes    | Yes
Selection      | O(n²)      | O(n²)      | O(n²)      | O(1)     | No     | Yes
Insertion      | O(n)       | O(n²)      | O(n²)      | O(1)     | Yes    | Yes
Merge Sort     | O(n log n) | O(n log n) | O(n log n) | O(n)     | Yes    | No
Quick Sort     | O(n log n) | O(n log n) | O(n²)      | O(log n) | No     | Yes
Heap Sort      | O(n log n) | O(n log n) | O(n log n) | O(1)     | No     | Yes
Counting       | O(n+k)     | O(n+k)     | O(n+k)     | O(n+k)   | Yes    | No
Radix Sort     | O(n·d)     | O(n·d)     | O(n·d)     | O(n+k)   | Yes    | No

Stable: Equal elements maintain relative order
In-place: O(1) extra space (not counting input)

// Quick Sort Analysis
Worst case: O(n²) if pivot always min/max
Average: O(n log n)
Randomized pivot: Avoids worst case with high probability

Partition using Hoare's scheme:
Start left, right pointers from ends
Swap if both violating partition property
Reduces comparisons vs Lomuto scheme

// Merge Sort
Divide: Split into halves
Merge: Combine two sorted arrays in O(n)
Recurrence: T(n) = 2T(n/2) + O(n)
By Master Theorem: T(n) = O(n log n)
Always O(n log n), stable, requires O(n) space
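The divide/merge structure above can be sketched directly; using `<=` in the merge is what keeps equal elements in their original order (stability):

```python
def merge_sort(a):
    """T(n) = 2T(n/2) + O(n) = O(n log n); stable, O(n) extra space."""
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    # Merge two sorted halves in O(n); <= keeps the sort stable
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

print(merge_sort([5, 2, 8, 1, 9, 3]))   # [1, 2, 3, 5, 8, 9]
```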

// Heap Sort
Build max-heap: Sift-down for each element
Extract max n times: Swap root with last, sift-down
Total comparisons: O(n log n)
Worst case also O(n log n)
In-place, not stable

// Lower bound for comparison-based sorting
Decision tree with n! leaves (one per permutation)
Minimum tree height: log₂(n!) ≥ n log₂(n) - O(n)
Therefore any comparison sort needs Ω(n log n) comparisons; O(n log n) algorithms are optimal

Non-comparison sorts: Counting, radix, bucket
Can be O(n) but restricted to certain keys

// Searching
Linear search: O(n), works on unsorted
Binary search: O(log n), needs sorted input
Interpolation search: O(log log n) average on uniformly distributed keys, O(n) worst
Exponential search: O(log n) for unbounded arrays
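The binary search entry above, as an iterative Python sketch (returns an index or -1; the convention is illustrative):

```python
def binary_search(a, target):
    """O(log n) on a sorted list; halves the search range each step."""
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1      # target must be in the right half
        else:
            hi = mid - 1      # target must be in the left half
    return -1                 # not present

a = [2, 5, 8, 12, 16]
print(binary_search(a, 12), binary_search(a, 7))   # 3 -1
```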

// Search in Special Structures
BST: O(log n) balanced, O(n) skewed
Hash table: O(1) average, O(n) worst
Trie: O(m) where m = key length
Skip list: O(log n) average
Segment tree: O(log n) range queries

// Insertion Position
In sorted array: Use binary search O(log n)
Then shift elements O(n)
Total: O(n)

In sorted linked list: Find position O(n)
Insert: O(1)
Total: O(n)

Operating Systems Fundamentals

GATE Critical Topic: Processes, threads, synchronization, deadlock, memory, scheduling.

// Process vs Thread
Process:
- Independent execution unit
- Separate memory space
- Heavier (slow context switch)
- Inter-process communication: pipes, sockets

Thread:
- Lightweight process
- Shared memory space
- Fast context switch
- Race conditions possible
- Shared resources: variables, files

// Process States
New → Ready: Created
Ready → Running: Scheduled by CPU
Running → Blocked: Wait for I/O
Blocked → Ready: I/O complete
Running → Terminated: Finished

// Context Switching
Save process state (registers, PC)
Load next process state
Time: 10-100+ microseconds
High context switches → Performance loss

// CPU Scheduling Algorithms
FCFS (First Come First Served):
- Non-preemptive
- Fair ordering but long average waits (convoy effect)
- Example (four jobs, burst 8 each): waits 0, 8, 16, 24 → average 12
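Under FCFS, each job waits for the total burst time of everything ahead of it; a one-function sketch, assuming four jobs with burst 8 each (which reproduces the averages quoted above):

```python
def fcfs_waits(bursts):
    """Wait of job i = sum of bursts of all jobs scheduled before it."""
    waits, elapsed = [], 0
    for b in bursts:
        waits.append(elapsed)
        elapsed += b
    return waits

w = fcfs_waits([8, 8, 8, 8])
print(w, sum(w) / len(w))     # [0, 8, 16, 24] 12.0
```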

Shortest Job First (SJF):
- Provably minimal average wait time
- Future burst times unknown in practice (must be estimated)
- Used as a lower bound for comparison

Priority Scheduling:
- High priority runs first
- Risk: Starvation of low priority
- Solution: Aging (increase priority over time)

Round Robin (RR):
- Time quantum q
- Each process gets q time
- Preemptive, fair
- q too large: FCFS, q too small: high overhead
- Good for interactive systems

// Synchronization Problems
Race condition: Concurrent access to shared data
Critical section: Code accessing shared data
Mutual exclusion: At most one process in CS

Semaphore:
- Integer s, initially s=N
- P(s): If s>0, s--; else wait
- V(s): s++, wake one waiting
- Binary semaphore: s ∈ {0,1}
- Counting semaphore: s can be any value
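The P/V discipline maps onto `threading.Semaphore`; a sketch where a binary semaphore guards a shared counter so concurrent increments are not lost (the worker structure is illustrative):

```python
import threading

counter = 0
sem = threading.Semaphore(1)      # binary semaphore guarding the critical section

def worker():
    global counter
    for _ in range(10000):
        sem.acquire()             # P(s): wait until s > 0, then decrement
        counter += 1              # critical section
        sem.release()             # V(s): increment s, wake one waiter

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                    # 40000 -- no lost updates
```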

Mutex (Mutual exclusion lock):
- Lock before CS, unlock after
- Has an owner: only the locking thread may unlock
- Effectively a binary semaphore with ownership

Monitor:
- Encapsulates shared data
- Methods: Mutual exclusion guaranteed
- Condition variables: wait(), signal()
- Higher level than semaphore

// Deadlock Conditions (all must hold)
1. Mutual exclusion: Resource not shared
2. Hold and wait: Hold resources while requesting more
3. No preemption: Cannot take held resources
4. Circular wait: Cycle in resource allocation

// Deadlock Prevention
Break one condition:
1. Allow resource sharing (impractical)
2. Request all resources at once
3. Allow preemption (complex recovery)
4. Impose resource ordering

// Deadlock Avoidance
Banker's algorithm: Grant only if safe state
Safe state: No deadlock possible
Check before allocation

// Virtual Memory
Paging: Fixed-size pages (4 KB typical)
Segmentation: Variable-size segments
Page fault: Access to a page not in RAM
Page replacement: OPT (needs future knowledge) is optimal; LRU is the standard practical approximation

TLB (Translation Lookaside Buffer):
- Cache for page table entries
- Hit: O(1), Miss: 100+ cycles slower

// Memory Management
Contiguous: Entire process in one block
Non-contiguous: Process scattered
Fragmentation:
  External: Holes between processes
  Internal: Unused space in allocated block

First fit, Best fit, Worst fit allocation strategies

// File System
Inode: Contains file metadata and block pointers
Directory: Map filename to inode
Hard link: Same inode, multiple names
Soft link (symlink): Link to another file path

Database Management Systems

GATE Essential: Normalization, transactions, concurrency, query optimization.

// Relational Model
Relation: Table with rows and columns
Tuple: Row (record)
Attribute: Column (field)
Domain: Valid values for attribute

// Keys
Super key: Uniquely identifies tuple
Candidate key: Minimal super key
Primary key: Chosen candidate key
Foreign key: References primary key in another table
Alternate key: Candidate key not chosen as primary

// Functional Dependencies (FD)
X → Y: If X same, then Y same
Trivial FD: Y ⊆ X (always true)
Non-trivial: Y ⊄ X
Completely non-trivial: Y ∩ X = ∅

// Normalization (Remove anomalies)
1NF: Atomic attributes (no multivalued)
2NF: 1NF + No partial dependency
     No non-prime attribute depends on part of candidate key

3NF: 2NF + No transitive dependency
     Non-prime → Non-prime (directly or indirectly)

BCNF: 3NF + Every determinant is candidate key
     Stricter than 3NF, prevents all anomalies

Example: Enrollment(StudentID, CourseID, StudentName, Grade)
Candidate key (StudentID, CourseID); StudentID → StudentName is a partial dependency, so not 2NF
Decompose: Student(StudentID, StudentName), Enrollment(StudentID, CourseID, Grade)

// Decomposition
Lossless join: Can reconstruct original
Dependency preserving: All FDs preserved
Goal: Find decomposition that's lossless and preserves

// SQL Transactions
ACID properties:
A - Atomicity: All or nothing
C - Consistency: Valid state to valid state
I - Isolation: Concurrent transactions don't interfere
D - Durability: Committed data persists

Transaction: BEGIN → Statements → COMMIT/ROLLBACK

// Concurrency Problems
Dirty read: Read uncommitted data
Non-repeatable read: Data changed between reads
Phantom read: New rows appear between reads
Lost update: Concurrent modifications overwrite

// Isolation Levels
Read Uncommitted: Allows dirty reads
Read Committed: Prevents dirty reads
Repeatable Read: Prevents non-repeatable reads
Serializable: Full isolation, slowest

// Locking
Shared lock (read lock): Multiple readers
Exclusive lock (write lock): Exclusive access
Deadlock: Circular lock dependency
Two-phase locking: Growing phase, shrinking phase

// Query Optimization
DBMS chooses execution plan
Consider: Table size, indexes, join order
Cost: Disk I/O, CPU, memory

Indexing:
- Hash index: O(1) exact match, no range
- B-tree index: O(log n) range queries
- Bitmap index: Many low-cardinality columns

// Join Algorithms
Nested loop: O(|R|×|S|) always works
Sort-merge join: O(|R|log|R| + |S|log|S|)
Hash join: O(|R|+|S|) average, good for large tables

// Query Plans
SELECT * FROM R, S WHERE R.id = S.id
Can join as: (R ⋈ S) or (S ⋈ R)
Choose smallest intermediate results

Computer Networks & Protocols

GATE covers: Layers, protocols, routing, congestion control, addressing, security.

// OSI Model (7 Layers)
7. Application: HTTP, FTP, SMTP, DNS, SSH
6. Presentation: Encryption, compression
5. Session: Session management
4. Transport: TCP, UDP
3. Network: IP, Routing, Logical addressing
2. Data Link: Ethernet, MAC, Frames
1. Physical: Cables, signals, bits

TCP/IP Model (4 layers):
4. Application: HTTP, SMTP, Telnet, etc.
3. Transport: TCP, UDP
2. Internet: IP (routing, logical addressing)
1. Link: Ethernet, PPP, WiFi

// TCP (Transmission Control Protocol)
Connection-oriented: Three-way handshake
SYN → SYN-ACK → ACK, then ESTABLISHED
Reliable delivery: Acknowledgments, retransmission
In-order delivery: Sequence numbers
Flow control: Window-based
Congestion control: AIMD (Additive Increase Multiplicative Decrease)

// UDP (User Datagram Protocol)
Connectionless: No handshake
Unreliable: No acknowledgment
Low overhead: No flow control
Speed: Faster than TCP
Use: Streaming, DNS, gaming

// IP Addressing
IPv4: 32 bits, 4 octets (192.168.1.1)
Classes: A (1-126), B (128-191), C (192-223)
Subnet mask: Identifies network portion
CIDR: Classless routing
192.168.1.0/24: the first 24 bits identify the network portion
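The standard library's `ipaddress` module handles CIDR arithmetic directly; this also previews the subnetting rule (borrowing one host bit doubles the subnets and halves the hosts per subnet):

```python
import ipaddress

net = ipaddress.ip_network("192.168.1.0/24")
print(net.netmask)        # 255.255.255.0
print(net.num_addresses)  # 256 (254 usable hosts + network + broadcast)

# Borrow 1 host bit: /24 -> two /25 subnets
subnets = list(net.subnets(prefixlen_diff=1))
print([str(s) for s in subnets])   # ['192.168.1.0/25', '192.168.1.128/25']
```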

IPv6: 128 bits, 16-bit groups (2001:0db8::1)
Larger address space
Simplified header

// Routing
Routing table: Destination → Next hop
Forwarding: Direct decision at each hop
Longest prefix match: Most specific route

Routing protocols:
RIP: Distance-vector, hop count metric
OSPF: Link-state, Dijkstra algorithm
BGP: Border Gateway Protocol, inter-domain

// TCP Congestion Control
Slow start: Exponential growth until threshold
Congestion avoidance: Linear growth
Threshold: Halved on loss
Retransmit timeout: Exponential backoff

Algorithms:
Tahoe: Slow start, drops back to slow start on loss
Reno: Adds fast retransmit and fast recovery
CUBIC: Cubic window growth function (modern default)

// Network Applications
DNS: Recursive resolver, caching
HTTP: Stateless, text-based
SMTP: Send email (push)
POP3/IMAP: Receive email (pull)
FTP: File transfer, separate control/data connections
SSH: Secure shell, encryption
Telnet: Insecure remote access

// Subnetting
Divide network into subnets
Borrow bits from host portion
255.255.255.0 → 255.255.255.128 (borrowing 1 bit)
Increases subnets, decreases hosts per subnet

// Network Address Translation (NAT)
Maps private → public addresses
Enables sharing single public IP
Port forwarding: Traffic redirection
Breaks end-to-end principle

// Wireless (WiFi)
802.11 standards: a, b, g, n, ac, ax
CSMA/CA: Collision avoidance
Frequency bands: 2.4 GHz, 5 GHz
Channel planning: Minimize interference

// Security
Encryption: Confidentiality
Authentication: Verify identity
Integrity: Detect tampering
TLS/SSL: Secure HTTP (HTTPS)

Theory of Computation & Compiler Design

GATE Topics: Automata, languages, parsing, optimization.

// Regular Languages & DFA
DFA (Deterministic Finite Automaton):
- Q: States
- Σ: Alphabet
- δ: Transition function
- q₀: Start state
- F: Accept states

String accepted if the run from q₀ ends in a state in F

Example: DFA for binary strings ending with 01
States: q0 (start), q1 (saw 0), q2 (saw 01)
Accept state: q2
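The example DFA can be simulated with a transition table; a Python sketch (the function name `dfa_accepts` is illustrative):

```python
def dfa_accepts(s):
    """DFA for binary strings ending in '01'.
    q0: start, q1: last symbol was 0, q2: last two symbols were 01 (accept)."""
    delta = {
        ('q0', '0'): 'q1', ('q0', '1'): 'q0',
        ('q1', '0'): 'q1', ('q1', '1'): 'q2',
        ('q2', '0'): 'q1', ('q2', '1'): 'q0',
    }
    state = 'q0'
    for ch in s:
        state = delta[(state, ch)]   # one deterministic step per symbol
    return state == 'q2'

print(dfa_accepts("1101"), dfa_accepts("0110"))   # True False
```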

// NFA (Nondeterministic Finite Automaton)
Multiple transitions allowed on the same input symbol
ε (epsilon) transitions possible
Equivalent in power to DFA (subset construction; up to exponential state blowup)
Easier to design, harder to simulate directly

// Regular Expressions
a|b: a or b
a*: Zero or more a's
a+: One or more a's
(ab)*: Zero or more "ab"
[0-9]: Digit
[a-z]+: One or more lowercase

Equivalence: Regex ↔ NFA ↔ DFA

// Context-Free Languages & Grammar
Productions: A → α, where A is a single non-terminal and α is any string of terminals and non-terminals
Parse tree: Derivation graphically
Ambiguity: Multiple parse trees for one string

Grammar properties:
Left-recursive: A → Aα (problematic for top-down)
Right-recursive: A → αA (OK for top-down)
LL(1): Single lookahead token disambiguates

// Parsing
Top-down (predictive):
- Recursive descent: Recursive functions for rules
- LL(1) parsing: Table-driven
- Builds tree root to leaves

Bottom-up (shift-reduce):
- LR parsing: Powerful, handles larger grammars
- Reduces substrings to non-terminals
- Builds tree leaves to root

Complexity: LL(1) and LR parsing run in O(n); general CFG parsing is O(n³)

// Lexical Analysis
Tokenize input: Keywords, identifiers, numbers
Regular expressions → NFA → DFA
States and transitions implemented as tables
Lookahead: Often 1 character

// Semantic Analysis
Type checking: Operands compatible?
Symbol table: Track declared variables
Scope: Global, local, nested
Forward references: Not yet declared

// Intermediate Code Generation
Quadruples: (operator, arg1, arg2, result)
Three-address code: At most 3 operands
Translation: Source → Intermediate → Target
Enables optimization and retargeting

// Optimization
Constant folding: 5 + 3 → 8 (compile time)
Dead code elimination: Unreachable code removal
Common subexpression: a+b computed once, reused
Loop optimization: Invariant code motion, strength reduction

// Code Generation
Register allocation: Minimize memory access
Instruction selection: Efficient opcodes
Scheduling: Hide latencies

// Turing Machines
Tape: Infinite sequence of cells
Head: Read/write, move left/right
States: Finite control
Church-Turing thesis: Anything effectively computable is Turing-computable
Computational power: Can compute every computable function

// Decidability
Halting problem: Undecidable
No algorithm determines if program halts
Rice's theorem: Non-trivial properties undecidable

Digital Logic & Computer Organization

GATE requires: Gates, circuits, instruction set architecture, performance metrics.

// Logic Gates & Circuits
Combinational: Output depends only on current input
Sequential: Output depends on history (state)

Gate implementations:
2-to-1 Mux: Y = S·A + S'·B
Decoder (n→2^n): One output active
Encoder (2^n→n): Binary code of active input
Multiplexer: Select one input

// Adder Design
Half adder: Σ = A⊕B, Cout = A·B
Full adder: Cout = A·B + Cin·(A⊕B)
Carry lookahead: Predict carry bits
Reduces propagation delay from O(n) to O(log n)

// Arithmetic Logic Unit (ALU)
Performs: Add, subtract, AND, OR, shift
Control signals: Select operation
Flags: Zero, carry, overflow, negative

// Processor Design
Fetch-Decode-Execute cycle
Instruction format: Opcode | Operands
Addressing modes: Immediate, register, memory
Pipeline: Overlap fetch-decode-execute

5-stage pipeline: IF, ID, EX, MEM, WB
Hazards:
- Structural: Same hardware needed
- Data: Depends on previous instruction result
- Control: Branch changes PC

Forwarding: Bypass register, use ALU result directly
Speculative execution: Predict branch, back off if wrong

// Memory Hierarchy
L1 cache: ~32 KB, ~4 cycles
L2 cache: ~256 KB, ~10 cycles
L3 cache: ~8 MB, ~40 cycles
RAM: ~8 GB, ~100 cycles
Disk: ~500 GB, ~10⁶ cycles

Cache miss: Fetch from next level
Cache line: Unit of transfer (typically 64 bytes)
Replacement policy: LRU, FIFO

Associativity:
Direct-mapped: Index directly
Set-associative: Search within set
Fully associative: Search all (slow, small)

// Virtual Memory
Paging: Fixed 4KB pages
Page fault: Trap to kernel, load from disk
Page table: Virtual → Physical address mapping
TLB: Cache for page table entries
TLB miss: Consult page table (~40 cycles)

// Performance Metrics
CPI: Clock cycles per instruction
IPC: Instructions per clock
MIPS: Million instructions per second
Throughput: Instructions per unit time
Latency: Time to complete operation

Performance equation:
Time = Instructions × CPI × Clock_period
Optimize: Reduce instructions (code), CPI (pipeline), or frequency

// Instruction Formats
3-address: Op dest, src1, src2
2-address: Op src1, src2 (src1 = result)
1-address: Op with implicit accumulator
0-address: Stack-based

RISC: Few, simple instructions
CISC: Many, complex instructions

// Control Flow
Branch prediction: Speculate direction
Static: Always not taken, taken, backward taken
Dynamic: History, target address cache
Misprediction: Flush pipeline, re-execute

// Parallelism
Instruction-level: Pipeline, multiple issue
Thread-level: Multiple cores
Data-level: SIMD (same instruction, multiple data)
Amdahl's law: Speedup = 1 / (S + (1-S)/P)
where S = serial fraction, P = processors
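Amdahl's law makes the serial bottleneck concrete: with 10% serial work, even 1000 processors cannot push the speedup past 10×. A one-line sketch:

```python
def amdahl_speedup(serial_fraction, processors):
    """Speedup = 1 / (S + (1-S)/P)"""
    return 1 / (serial_fraction + (1 - serial_fraction) / processors)

print(round(amdahl_speedup(0.1, 4), 2))     # 3.08
print(round(amdahl_speedup(0.1, 1000), 2))  # 9.91 -- capped near 1/S = 10
```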

AI, Machine Learning & Advanced Topics

GATE Modern Topics: AI search, learning, optimization, emerging techniques.

// Artificial Intelligence Fundamentals
Agent: Entity that perceives and acts
Environment: What agent interacts with
Goal: Desired outcome
Rational: Chooses best action

Search problems:
- State space: All possible configurations
- Initial state: Starting point
- Goal test: Reached objective?
- Transition model: Actions and results
- Path cost: Distance/resources

// Search Algorithms
Uninformed (Blind):
BFS: O(b^d) time and space, complete, optimal (for unit step costs)
DFS: O(bm) space (m = max depth), not complete, not optimal
Iterative deepening: O(bd) space, O(b^d) time, complete, optimal

Informed (Heuristic):
h(n): Estimated distance to goal
Admissible: Never overestimates
Consistent: h(n) ≤ cost(n,n') + h(n')

A* search: f(n) = g(n) + h(n)
Optimal with admissible heuristic
O(b^d) space, exponential worst case

Greedy: f(n) = h(n) (not optimal)
Hill climbing: Local optimum only

// Machine Learning Basics
Supervised: Learn from labeled data
Regression: Continuous output
Classification: Discrete classes
Unsupervised: Find patterns in unlabeled data

Training: Learn from training set
Validation: Tune hyperparameters
Testing: Evaluate on unseen data
Overfitting: Model memorizes, poor generalization
Underfitting: Model too simple

// Linear Regression
Model: y = w₀ + w₁x₁ + ... + wₙxₙ
Loss: Sum of squared errors
Update: w ← w - α∇L(w)
Gradient descent: Move opposite to gradient
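The update rule above, written out for one feature; this sketch fits y = w0 + w1·x by descending the squared-error gradient (the data is generated from y = 1 + 2x, so GD should recover those weights; learning rate and step count are assumed values):

```python
def gradient_descent(xs, ys, lr=0.01, steps=5000):
    """Fit y = w0 + w1*x by minimizing the mean squared error."""
    w0 = w1 = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of L = (1/n) * sum (w0 + w1*x - y)^2 w.r.t. w0 and w1
        g0 = sum(2 * (w0 + w1 * x - y) for x, y in zip(xs, ys)) / n
        g1 = sum(2 * (w0 + w1 * x - y) * x for x, y in zip(xs, ys)) / n
        w0 -= lr * g0      # move opposite to the gradient
        w1 -= lr * g1
    return w0, w1

w0, w1 = gradient_descent([0, 1, 2, 3], [1, 3, 5, 7])
print(round(w0, 3), round(w1, 3))   # 1.0 2.0
```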

// Classification
Binary: Two classes
Multiclass: More than two classes

Logistic regression:
P(y=1|x) = 1/(1+e^(-wx))
Decision boundary: w·x = 0

Naive Bayes:
P(y|x) ∝ P(y)∏P(xᵢ|y)
Independence assumption: Features independent given class

Decision trees:
Recursive partitioning
Information gain: Entropy reduction
Overfitting: Pruning needed

SVM (Support Vector Machine):
Maximize margin between classes
Kernel trick: Non-linear boundaries
Dual formulation: Solve in dual space

// Neural Networks
Perceptron: Single neuron
Activation: Sigmoid, ReLU, tanh
Backpropagation: Chain rule for gradients
Hidden layers: Feature learning
Overfitting: Regularization (L1, L2)

// Clustering (Unsupervised)
K-means: Partition into K clusters
Distance metric: Euclidean typical
Lloyd's algorithm: Iterative optimization
Silhouette score: Cluster quality

Hierarchical clustering:
Agglomerative: Bottom-up merge
Dendrogram: Visualization of merges

// Performance Evaluation
Accuracy: (TP+TN)/(TP+TN+FP+FN)
Precision: TP/(TP+FP) (positive correctness)
Recall: TP/(TP+FN) (positive capture)
F1: 2·Precision·Recall/(Precision+Recall)
ROC curve: TPR vs FPR tradeoff
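The four formulas above computed from a confusion matrix; a small sketch (the toy labels are illustrative):

```python
def classification_metrics(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy  = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]          # one false negative, one false positive
acc, prec, rec, f1 = classification_metrics(y_true, y_pred)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```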

// Cross-validation
k-fold: Split data into k parts
Train on k-1, test on 1, repeat k times
Reduces variance in evaluation
Standard: 5-fold, 10-fold

// Gradient Descent Variants
Batch: All data per update (stable, slow)
Stochastic: One sample per update (noisy, fast)
Mini-batch: Subset per update (compromise)
Adam: Adaptive learning rate (popular)

// Regularization
L1 (Lasso): |w| penalty, sparse solutions
L2 (Ridge): w² penalty, smooth solutions
Dropout: Random units deactivated during training
Early stopping: Stop before overfitting
