🧠 AI Computer Institute
Content is AI-generated for educational purposes. Verify critical information independently. A bharath.ai initiative.

Databases Cheat Sheet

systemsGrades 10-126 sections

SQL vs NoSQL

AspectSQL (Relational)NoSQL (Non-relational)
StructureTables with fixed schemaFlexible (documents, key-value, graphs)
SchemaPredefinedDynamic
ScalingVertical (bigger server)Horizontal (distributed)
ACIDGuaranteedBASE (eventual consistency)
JoinsMultiple tables (powerful)Denormalized data
TransactionsComplex, multi-tableLimited
ExamplesPostgreSQL, MySQL, OracleMongoDB, Redis, Cassandra, DynamoDB
Best forComplex queries, ACID needsBig data, speed, flexibility

ACID Properties

// ACID: Guarantees for reliable transactions
A - Atomicity: All or nothing
  Either all operations complete OR all rollback
  Example: Transfer ₹100 between accounts
  - If withdrawal succeeds but deposit fails → ROLLBACK both

C - Consistency: Valid state transitions
  Database moves from valid state to valid state
  Constraints maintained (primary key unique, foreign keys valid)
  Example: Account balance never negative

I - Isolation: No interference between transactions
  Concurrent transactions don't see each other's intermediate states
  Levels (weakest → strongest):
  - READ UNCOMMITTED: Dirty reads allowed
  - READ COMMITTED: See committed data only (default)
  - REPEATABLE READ: Same data on re-read
  - SERIALIZABLE: Transactions are serialized

D - Durability: Persisted data survives failures
  Committed data isn't lost
  Written to disk, survives power loss
  Example: After COMMIT, ₹100 transfer is permanent

// Example transaction (SQL)
BEGIN TRANSACTION;
  UPDATE accounts SET balance = balance - 100 WHERE id = 1;
  UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;

// If error between statements:
ROLLBACK;  -- Reverts both updates

Normalization & Design

// Normalization: Eliminate redundancy
1st Normal Form (1NF):
- Atomic values only (no repeating groups)
- ❌ User: {name, tags: [java, python]}
- ✓ UserTag table: {user_id, tag}

2nd Normal Form (2NF):
- 1NF + Non-key attributes fully depend on primary key
- ❌ Order: {order_id, user_id, user_name, product_id}
  (user_name depends on user_id, not order_id)
- ✓ User & Order tables separately

3rd Normal Form (3NF):
- 2NF + Non-key attributes don't depend on each other
- ❌ User: {id, email, country, country_code}
  (country_code depends on country, not user)
- ✓ User & Country tables separately

Boyce-Codd Normal Form (BCNF):
- Stricter than 3NF, every determinant is a candidate key
- Rare in practice

// Denormalization: Intentional redundancy for speed
- Duplicate data to avoid expensive joins
- Trade: Update complexity vs query speed
- Example: Store user_name in Order table (avoid User join)

// Key concepts
Primary Key: Unique identifier
Foreign Key: References another table
Surrogate Key: Artificial (auto-increment)
Natural Key: Domain-meaningful (email)

Indexing & Query Optimization

// Indexes: Speed up queries
B-Tree: Default, range queries fast
Hash: Fast equality, not range
Full-text: Text search
Bitmap: Low cardinality (gender: M/F)

// When to index
✓ WHERE clause columns
✓ JOIN columns
✓ ORDER BY columns
✓ SELECT columns in cover query (avoid table lookup)
❌ Low cardinality (gender, boolean)
❌ Frequently updated columns

// Composite indexes
CREATE INDEX idx_name_email ON users(name, email);
Useful for:
- (name, email) queries
- name-only queries
- NOT useful for email-only (index skipped)

// Query analysis
EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
Look for:
- Index used (good)
- Full table scan (bad)
- Sequential scan (expected sometimes)

// Slow queries
❌ Functions in WHERE: YEAR(date) = 2024
✓ Range: date >= '2024-01-01' AND date < '2025-01-01'

❌ SELECT *
✓ SELECT specific columns

❌ N+1 queries: Loop fetching individual records
✓ Batch query or join

// Caching strategies
Query cache: Cache results
Application cache: Redis, Memcached
CDN: Cache static assets

Sharding & Scaling

// Scaling: Growing data/traffic
Vertical: Bigger server (limited, expensive)
Horizontal: More servers (distributed)

// Sharding: Horizontal partitioning
Split data across multiple databases
Each shard holds subset of data

Sharding strategies:
Range-based: id 1-1M on shard1, 1M-2M on shard2
Hash-based: hash(user_id) % num_shards
Directory-based: Lookup table maps keys to shards
Geographic: By region

Challenges:
- Hot shards (uneven distribution)
- Cross-shard joins (complex)
- Rebalancing (moving data)
- Transactions across shards

// Replication
Master-Slave: Master writes, slaves read-only
Multi-Master: All can write (conflicts possible)
Trade-offs: Consistency vs availability

// CAP Theorem
Can achieve 2 of 3:
C - Consistency: All nodes see same data
A - Availability: Service always available
P - Partition tolerance: Network failures

Examples:
CP: PostgreSQL (prioritize consistency)
AP: MongoDB (prioritize availability)

NoSQL Types

TypeStorageExamplesUse Cases
DocumentJSON/BSONMongoDB, CouchDBFlexible schema, nested data
Key-ValueHash tableRedis, MemcachedCache, sessions, real-time
Column FamilySparse columnsCassandra, HBaseTime-series, wide tables
GraphNodes & edgesNeo4j, ArangoDBRelationships, social networks
SearchInverted indexElasticsearch, SolrFull-text search, logs

More Cheat Sheets