Databases Cheat Sheet

systemsGrades 10-126 sections

SQL vs NoSQL

Aspect	SQL (Relational)	NoSQL (Non-relational)
Structure	Tables with fixed schema	Flexible (documents, key-value, graphs)
Schema	Predefined	Dynamic
Scaling	Vertical (bigger server)	Horizontal (distributed)
ACID	Guaranteed	BASE (eventual consistency)
Joins	Multiple tables (powerful)	Denormalized data
Transactions	Complex, multi-table	Limited
Examples	PostgreSQL, MySQL, Oracle	MongoDB, Redis, Cassandra, DynamoDB
Best for	Complex queries, ACID needs	Big data, speed, flexibility

ACID Properties

// ACID: Guarantees for reliable transactions
A - Atomicity: All or nothing
  Either all operations complete OR all rollback
  Example: Transfer ₹100 between accounts
  - If withdrawal succeeds but deposit fails → ROLLBACK both

C - Consistency: Valid state transitions
  Database moves from valid state to valid state
  Constraints maintained (primary key unique, foreign keys valid)
  Example: Account balance never negative

I - Isolation: No interference between transactions
  Concurrent transactions don't see each other's intermediate states
  Levels (weakest → strongest):
  - READ UNCOMMITTED: Dirty reads allowed
  - READ COMMITTED: See committed data only (default)
  - REPEATABLE READ: Same data on re-read
  - SERIALIZABLE: Transactions are serialized

D - Durability: Persisted data survives failures
  Committed data isn't lost
  Written to disk, survives power loss
  Example: After COMMIT, ₹100 transfer is permanent

// Example transaction (SQL)
BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

// If error between statements:
ROLLBACK;  -- Reverts both updates

Normalization & Design

// Normalization: Eliminate redundancy
1st Normal Form (1NF):
- Atomic values only (no repeating groups)
- ❌ User: {name, tags: [java, python]}
- ✓ UserTag table: {user_id, tag}

2nd Normal Form (2NF):
- 1NF + Non-key attributes fully depend on primary key
- ❌ Order: {order_id, user_id, user_name, product_id}
  (user_name depends on user_id, not order_id)
- ✓ User & Order tables separately

3rd Normal Form (3NF):
- 2NF + Non-key attributes don't depend on each other
- ❌ User: {id, email, country, country_code}
  (country_code depends on country, not user)
- ✓ User & Country tables separately

Boyce-Codd Normal Form (BCNF):
- Stricter than 3NF, every determinant is a candidate key
- Rare in practice

// Denormalization: Intentional redundancy for speed
- Duplicate data to avoid expensive joins
- Trade: Update complexity vs query speed
- Example: Store user_name in Order table (avoid User join)

// Key concepts
Primary Key: Unique identifier
Foreign Key: References another table
Surrogate Key: Artificial (auto-increment)
Natural Key: Domain-meaningful (email)

Indexing & Query Optimization

// Indexes: Speed up queries
B-Tree: Default, range queries fast
Hash: Fast equality, not range
Full-text: Text search
Bitmap: Low cardinality (gender: M/F)

// When to index
✓ WHERE clause columns
✓ JOIN columns
✓ ORDER BY columns
✓ SELECT columns in cover query (avoid table lookup)
❌ Low cardinality (gender, boolean)
❌ Frequently updated columns

// Composite indexes
CREATE INDEX idx_name_email ON users(name, email);
Useful for:
- (name, email) queries
- name-only queries
- NOT useful for email-only (index skipped)

// Query analysis
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
Look for:
- Index used (good)
- Full table scan (bad)
- Sequential scan (expected sometimes)

// Slow queries
❌ Functions in WHERE: YEAR(date) = 2024
✓ Range: date >= '2024-01-01' AND date < '2025-01-01'

❌ SELECT *
✓ SELECT specific columns

❌ N+1 queries: Loop fetching individual records
✓ Batch query or join

// Caching strategies
Query cache: Cache results
Application cache: Redis, Memcached
CDN: Cache static assets

Sharding & Scaling

// Scaling: Growing data/traffic
Vertical: Bigger server (limited, expensive)
Horizontal: More servers (distributed)

// Sharding: Horizontal partitioning
Split data across multiple databases
Each shard holds subset of data

Sharding strategies:
Range-based: id 1-1M on shard1, 1M-2M on shard2
Hash-based: hash(user_id) % num_shards
Directory-based: Lookup table maps keys to shards
Geographic: By region

Challenges:
- Hot shards (uneven distribution)
- Cross-shard joins (complex)
- Rebalancing (moving data)
- Transactions across shards

// Replication
Master-Slave: Master writes, slaves read-only
Multi-Master: All can write (conflicts possible)
Trade-offs: Consistency vs availability

// CAP Theorem
Can achieve 2 of 3:
C - Consistency: All nodes see same data
A - Availability: Service always available
P - Partition tolerance: Network failures

Examples:
CP: PostgreSQL (prioritize consistency)
AP: MongoDB (prioritize availability)

NoSQL Types

Type	Storage	Examples	Use Cases
Document	JSON/BSON	MongoDB, CouchDB	Flexible schema, nested data
Key-Value	Hash table	Redis, Memcached	Cache, sessions, real-time
Column Family	Sparse columns	Cassandra, HBase	Time-series, wide tables
Graph	Nodes & edges	Neo4j, ArangoDB	Relationships, social networks
Search	Inverted index	Elasticsearch, Solr	Full-text search, logs

Databases Cheat Sheet

SQL vs NoSQL

ACID Properties

Normalization & Design

Indexing & Query Optimization

Sharding & Scaling

NoSQL Types

More Cheat Sheets

Networking Cheat Sheet

System Design Cheat Sheet

Cybersecurity Cheat Sheet