
Database Interview Preparation (Intermediate Level)


Overview

Database systems are the backbone of modern applications, handling everything from transactional to analytical workloads. At the intermediate level, interviews often probe deeper understanding beyond basic SQL syntax, focusing on design patterns, optimization strategies, scalability challenges, and theoretical principles. This topic is commonly tested in mid-to-senior roles at Amazon, Google, Microsoft, Netflix, and other FAANG+ companies, where robust data handling is critical.


Core Concepts

ACID Properties

  • Definition: Guarantees for reliable database transactions: Atomicity, Consistency, Isolation, Durability.
  • Key Points:
    • Atomicity: All-or-nothing execution of transactions
    • Consistency: Database remains in valid state before/after transaction
    • Isolation: Concurrent transactions don’t interfere
    • Durability: Committed data survives failures
  • Example:
    BEGIN TRANSACTION;
    UPDATE accounts SET balance = balance - 100 WHERE user_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE user_id = 2;
    COMMIT;  -- Either both updates happen or neither
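The all-or-nothing guarantee can be sketched in runnable form with Python's built-in sqlite3 module (the accounts table and balances here are invented for illustration): if any statement inside the transaction fails, both updates roll back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 500), (2, 100)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on exception
        conn.execute("UPDATE accounts SET balance = balance - 100 WHERE user_id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 100 WHERE user_id = 2")
        raise RuntimeError("simulated crash before commit")
except RuntimeError:
    pass

# Both updates rolled back together: balances are unchanged.
balances = dict(conn.execute("SELECT user_id, balance FROM accounts"))
print(balances)  # {1: 500, 2: 100}
```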
    

Normalization & Denormalization

  • Definition: Structuring tables to reduce redundancy (normalization) vs. intentionally duplicating data for performance (denormalization).
  • Key Points:
    • Normal forms: 1NF (atomic values) → 3NF (no transitive dependencies) → BCNF (stronger 3NF)
    • Denormalization used in OLAP systems or read-heavy workloads
  • Example:
    Normalized (3 tables): users, orders, order_items
    Denormalized: orders table includes user_name for faster reporting
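A minimal sketch of the trade-off, using Python's sqlite3 with made-up users/orders tables: the normalized layout needs a join to put a name on the report, while the denormalized copy answers the same question from one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(user_id),
                         total REAL);
    -- Denormalized variant: user_name duplicated onto the orders row.
    CREATE TABLE orders_denorm (order_id INTEGER PRIMARY KEY,
                                user_name TEXT, total REAL);
    INSERT INTO users VALUES (1, 'alice');
    INSERT INTO orders VALUES (10, 1, 25.0);
    INSERT INTO orders_denorm VALUES (10, 'alice', 25.0);
""")

# Normalized: a join is required to put the customer's name on the report.
normalized = conn.execute("""
    SELECT u.user_name, o.total
    FROM orders o JOIN users u ON u.user_id = o.user_id
""").fetchone()

# Denormalized: one table, no join, at the cost of duplicated data.
denorm = conn.execute("SELECT user_name, total FROM orders_denorm").fetchone()

print(normalized == denorm)  # True: same answer, different storage trade-off
```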

Indexing Strategies

  • Definition: Data structures (usually B-trees) to accelerate data retrieval.
  • Key Points:
    • Primary vs. Secondary indexes
    • Composite indexes: Column order matters (an index on (a,b) is not interchangeable with one on (b,a))
    • Covering indexes: Include all columns needed for a query
  • Example:
    CREATE INDEX idx_user_email ON users(email); -- Speed up WHERE email = '...'
    CREATE INDEX idx_orders_date_status ON orders(order_date, status); -- Good for range queries
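One way to watch an index take effect is SQLite's EXPLAIN QUERY PLAN, shown here through Python's sqlite3 module (table and index names follow the example above; exact plan wording varies by SQLite version).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, email TEXT)")

# Without an index, the planner must scan the whole table.
plan_before = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'a@b.c'"
).fetchall()

conn.execute("CREATE INDEX idx_user_email ON users(email)")

# With the index, the same lookup becomes an index search.
plan_after = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = 'a@b.c'"
).fetchall()

print(plan_before[0][3])  # e.g. 'SCAN users'
print(plan_after[0][3])   # e.g. 'SEARCH users USING INDEX idx_user_email (email=?)'
```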
    

Transaction Isolation Levels

  • Definition: Degrees to which transactions are isolated from each other.

  • Key Points:

    • Read Uncommitted: Dirty reads possible
    • Read Committed: No dirty reads (default in PostgreSQL)
    • Repeatable Read: No dirty or non-repeatable reads
    • Serializable: Full isolation (possible performance cost)
  • Comparison Table:

    Isolation Level   | Dirty Reads | Non-repeatable Reads | Phantom Reads
    ------------------|-------------|----------------------|--------------
    Read Uncommitted  | Possible    | Possible             | Possible
    Read Committed    | No          | Possible             | Possible
    Repeatable Read   | No          | No                   | Possible
    Serializable      | No          | No                   | No
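A small two-connection sketch with Python's sqlite3 (file path and table invented for illustration) demonstrates the "no dirty reads" behavior: a second session never observes another session's uncommitted update.

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file act as two concurrent sessions.
path = os.path.join(tempfile.mkdtemp(), "isolation_demo.db")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE accounts (user_id INTEGER PRIMARY KEY, balance INTEGER)")
writer.execute("INSERT INTO accounts VALUES (1, 500)")
writer.commit()

reader = sqlite3.connect(path)

# The writer updates the balance inside a still-open transaction (no commit).
writer.execute("UPDATE accounts SET balance = 0 WHERE user_id = 1")
assert writer.in_transaction

# The reader sees only the last committed value: no dirty read.
seen = reader.execute("SELECT balance FROM accounts WHERE user_id = 1").fetchone()[0]
print(seen)  # 500, not 0

writer.rollback()
```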

Common Interview Questions

  1. What are the differences between OLTP and OLAP systems? How would you design a schema for each?

    • Answer:
      • OLTP: Optimized for transactions (many small writes), normalization, high concurrency. Example schema: orders with foreign keys.
      • OLAP: Optimized for analysis (complex reads), denormalization (star/snowflake schemas), batch processing.
      • Design: Use normalized tables for OLTP; dimensional modeling (fact + dimension tables) for OLAP.
  2. Explain how to optimize this query: SELECT * FROM orders WHERE order_date BETWEEN '2023-01-01' AND '2023-12-31' ORDER BY total DESC LIMIT 10;

    • Answer:
      • Add composite index: CREATE INDEX idx_orders_date_total ON orders(order_date, total DESC);
      • Avoid SELECT *: Select only needed columns.
      • Use covering index to eliminate table scan.
  3. What is a deadlock? How would you resolve it?

    • Answer:
      • Deadlock: Two or more transactions each waiting on locks held by the other, so none can proceed.
      • Resolution:
        • Use lower isolation levels (e.g., Read Committed)
        • Set lock timeouts
        • Consistent lock ordering across transactions
        • Retrying failed transactions
  4. How do you handle data consistency in a distributed database?

    • Answer:
      • CAP Theorem: Under a network partition, a distributed system must trade off consistency against availability.
      • Solutions:
        • Two-phase commit (sacrifices availability)
        • Event sourcing + CQRS for eventual consistency
        • CRDTs for conflict-free replication
  5. What are window functions? Give an example use case.

    • Answer:
      • Window functions perform calculations across a set of rows related to the current row, without collapsing them into a single result row.
      • Example: Calculating running total:
        SELECT 
          order_date, 
          amount,
          SUM(amount) OVER (ORDER BY order_date) AS running_total
        FROM sales;
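The running-total query above can be executed as-is on SQLite 3.25+ (which added window functions), for example through Python's sqlite3 module with invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("2023-01-01", 100), ("2023-01-02", 50), ("2023-01-03", 25)])

# SUM(...) OVER (ORDER BY ...) accumulates across rows in date order.
rows = conn.execute("""
    SELECT order_date,
           amount,
           SUM(amount) OVER (ORDER BY order_date) AS running_total
    FROM sales
""").fetchall()

for row in rows:
    print(row)
# ('2023-01-01', 100, 100)
# ('2023-01-02', 50, 150)
# ('2023-01-03', 25, 175)
```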
        

Detailed Examples

Example 1: Designing a Social Media Feed Schema

Scenario: Design a database for a Twitter-like feed with 1B users.

Steps:

  1. Normalized Core:
    CREATE TABLE users (
      user_id BIGINT PRIMARY KEY,
      username TEXT UNIQUE,
      created_at TIMESTAMP
    );
    
    CREATE TABLE tweets (
      tweet_id BIGINT PRIMARY KEY,
      user_id BIGINT REFERENCES users(user_id),
      content TEXT,
      created_at TIMESTAMP
    );
    
  2. Denormalized Read Optimization:
    • Materialized view or separate feed table with pre-computed timelines
    • Use Redis for hot user feeds
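The read-optimization step can be sketched as a hypothetical fan-out-on-write in Python's sqlite3 (the follows and feed tables are assumptions, not part of the schema above): on each tweet, ids are copied into a precomputed per-follower feed table, so reading a timeline becomes a single indexed range scan instead of a join over the follow graph.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follows (follower_id INTEGER, followee_id INTEGER);
    CREATE TABLE tweets  (tweet_id INTEGER PRIMARY KEY, user_id INTEGER, content TEXT);
    CREATE TABLE feed    (follower_id INTEGER, tweet_id INTEGER);
    CREATE INDEX idx_feed_follower ON feed(follower_id, tweet_id);
    INSERT INTO follows VALUES (2, 1), (3, 1);  -- users 2 and 3 follow user 1
""")

def post_tweet(tweet_id, user_id, content):
    with conn:  # one transaction: tweet insert + fan-out succeed or fail together
        conn.execute("INSERT INTO tweets VALUES (?, ?, ?)",
                     (tweet_id, user_id, content))
        # Fan out: copy the tweet id into every follower's precomputed timeline.
        conn.execute("""
            INSERT INTO feed (follower_id, tweet_id)
            SELECT follower_id, ? FROM follows WHERE followee_id = ?
        """, (tweet_id, user_id))

post_tweet(100, 1, "hello")

# Reading user 2's feed is now a cheap indexed lookup.
feed = conn.execute("SELECT tweet_id FROM feed WHERE follower_id = 2").fetchall()
print(feed)  # [(100,)]
```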

Example 2: Diagnosing a Slow Report Query

Query:

EXPLAIN ANALYZE 
SELECT product_id, AVG(price) 
FROM sales 
WHERE sale_date >= '2023-01-01' 
GROUP BY product_id;

Optimization Steps:

  1. Ensure partitioning on sale_date (e.g., monthly partitions)
  2. Create index on (sale_date, product_id, price)
  3. Use approximate aggregates (e.g., HyperLogLog) if exact results aren’t critical
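Step 2 can be sanity-checked with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 (index name invented; plan wording varies by SQLite version): because every referenced column is in the index, the planner can answer from the index alone, a covering index, without touching the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sale_date TEXT, product_id INTEGER, price REAL)")
# Index from step 2: leading range column, then the grouping and aggregated columns.
conn.execute("CREATE INDEX idx_sales_date_prod_price "
             "ON sales(sale_date, product_id, price)")

plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT product_id, AVG(price)
    FROM sales
    WHERE sale_date >= '2023-01-01'
    GROUP BY product_id
""").fetchall()

# The plan's detail column should mention the covering index.
details = " | ".join(row[3] for row in plan)
print(details)
```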

Practice Problems

  1. Easy: Write a query to find the nth highest salary from the employees table.
    > Hint: Use Window Functions (RANK() or DENSE_RANK())

  2. Intermediate: Design a shopping cart database schema supporting concurrent updates and idempotent operations.
    > Hint: Consider optimistic locking with version numbers

  3. Intermediate+: Explain how to implement eventual consistency for a user profile service that syncs with an analytics database.
    > Hint: Use message queues (Kafka) and change data capture (CDC)


Key Takeaways

  • Master ACID and Isolation Levels: They’re foundational for any transaction-heavy system.
  • Indexing is an Art: Understand when composite, covering, and partial indexes shine.
  • Normalization vs. Denormalization: Balance data integrity with query performance needs.
  • Practice Real-World Scenarios: Focus on trade-offs in distributed systems and optimization.
  • Common Pitfalls:
    • Over-indexing leading to slow writes
    • Ignoring partitioning opportunities in large tables
    • Not testing concurrency behaviors in transactions
