SQL

SQL is the standard language used to define, query, modify, secure, and manage data in relational databases. It matters because real systems such as UPI payments, hospital records, SaaS dashboards, and e-commerce checkouts depend on accurate, fast, consistent data retrieval. After reading, you can design queries, reason about performance, and answer interview-style SQL questions confidently.

SQL sits between application code and structured data storage, so backend developers, data analysts, data engineers, DBAs, and product analytics teams rely on it daily. If you need a compact command reference while practising, keep the List of SQL Commands with Examples handy for revision.

You will be able to classify SQL commands, write joins and subqueries, use constraints and transactions correctly, apply indexes thoughtfully, and explain advanced features such as CTEs, window functions, stored procedures, triggers, and cursors.


Who This Guide Is For

This guide is specifically designed for:


Core Concepts

SQL is not one feature; it is a family of language categories, relational operations, integrity rules, and performance tools. A mature SQL user knows when to define schema, when to query data, when to change data, when to secure access, and when to control transactions. The table below maps the standard concepts you must know.

SQL Command Categories

SQL commands are grouped by intent. DDL changes structure, DML changes rows, DQL reads rows, DCL manages permissions, and TCL controls transactions. This classification is heavily tested because many learners remember commands by syntax but not by purpose.

A familiar example is a PAN verification app: DDL creates the applicant table, DML inserts verification status, DQL retrieves pending cases, DCL restricts access to compliance staff, and TCL commits the entire verification update only after all checks pass. An industry-specific example is a SaaS billing platform where schema changes, invoice inserts, revenue queries, role permissions, and payment transaction commits must be separated cleanly.

A standard exam question asks: classify CREATE, INSERT, SELECT, GRANT, and COMMIT. The answer is DDL, DML, DQL, DCL, and TCL respectively.

Code Example

Data Definition Language

DDL defines and changes database objects. Common DDL commands include CREATE, ALTER, DROP, TRUNCATE, and RENAME. These commands affect structure, so they need extra care in production because a wrong DROP or TRUNCATE can remove critical objects or data very quickly.

A familiar example is creating a table for metro smart-card top-ups with card number, recharge amount, and timestamp. An industry-specific example is an ed-tech platform adding a new nullable column for proctored exam status after launching online assessments. DDL is also where naming conventions, data types, keys, and storage-related choices begin.

DDL changes database structure. Many database engines auto-commit DDL statements, so rollback behavior can differ across systems. Check your DBMS before running destructive DDL.

Code Example

Data Manipulation Language

DML changes the data stored inside tables. INSERT adds rows, UPDATE modifies existing rows, DELETE removes rows, and MERGE performs conditional insert-or-update behavior in systems that support it. Good DML always targets the correct rows, especially when writing UPDATE and DELETE statements.

A familiar example is updating the delivery status for a grocery app order after the rider picks it up. An industry-specific example is a banking reconciliation job that marks settled NEFT transfers after matching them with a settlement file. In both cases, a missing WHERE clause can damage thousands of rows.

The most common DML mistake is running UPDATE or DELETE without a WHERE clause. In production, preview target rows with SELECT before modifying them.

Code Example

Querying and Filtering

DQL is mainly represented by SELECT. It retrieves data using projection, filtering, sorting, limiting, grouping, and expressions. Intermediate SQL skill begins when you stop writing broad SELECT * queries and start selecting only the columns and rows needed for the task.

A familiar example is filtering FASTag toll transactions for a specific vehicle and date range. An industry-specific example is a healthcare analytics team retrieving only high-risk lab reports from the last seven days for a clinical dashboard. Good filters reduce network transfer, improve privacy, and help the optimizer choose better plans.

Logical SQL processing order is commonly tested: FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY. The written order is different from the logical evaluation order.

Code Example

Constraints and Keys

Constraints are database-enforced rules that protect data integrity. Standard constraints include PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, CHECK, and DEFAULT. Keys describe identity and relationships: a primary key identifies a row, a foreign key references another table, candidate keys can uniquely identify rows, and composite keys use multiple columns together.

A familiar example is Aadhaar-based demographic storage where one resident record should have one unique Aadhaar value. An industry-specific example is a pharmacy inventory system where a prescription item must reference a valid prescription and quantity must be greater than zero. Constraints prevent invalid data even when multiple applications write to the same database.

Primary keys cannot be NULL and must be unique. Foreign keys may be NULL unless explicitly declared NOT NULL, but non-NULL foreign key values must match a referenced key.

Code Example

Joins and Relationships

Joins combine rows from related tables. INNER JOIN keeps matching rows, LEFT JOIN keeps all rows from the left side, RIGHT JOIN keeps all rows from the right side, FULL OUTER JOIN keeps all rows from both sides, CROSS JOIN creates all combinations, SELF JOIN joins a table to itself, and NATURAL JOIN automatically joins same-named columns where supported.

A familiar example is an IRCTC booking report that joins passengers with tickets to show train, berth, and payment details. An industry-specific example is a logistics control tower joining shipment, warehouse, and carrier tables to identify delayed consignments. Joins are where relational modeling becomes visible in query form.

Do not use NATURAL JOIN casually. If a new same-named column is added later, the join condition can silently change and produce wrong results.

Code Example

Subqueries and CTEs

Subqueries place one query inside another. They can be scalar, row, table, correlated, or non-correlated, and they appear in SELECT, FROM, WHERE, HAVING, INSERT, UPDATE, and DELETE statements. CTEs, written with WITH, name a temporary result set and make complex logic easier to read.

A familiar example is a wallet app finding users whose monthly spending is above their city average. An industry-specific example is a telecom fraud-detection query that first builds suspicious call patterns and then filters accounts using that intermediate result. For deeper syntax practice, the detailed guide on CTE in SQL is a natural companion to this topic.

A correlated subquery executes with reference to the outer query row. Interviewers often ask why it can be slower than a join or CTE-based rewrite on large datasets.

Code Example

Aggregation and Grouping

Aggregation converts detailed rows into summaries. COUNT, SUM, AVG, MIN, and MAX are the core aggregate functions, usually combined with GROUP BY. WHERE filters rows before grouping, while HAVING filters groups after aggregation.

A familiar example is calculating monthly electricity bill totals for each consumer category. An industry-specific example is a marketplace finance team computing gross merchandise value, refund value, and net revenue by seller. Aggregation is essential for dashboards, reconciliations, compliance summaries, and analytics interviews.

Use WHERE for row-level filtering before GROUP BY and HAVING for aggregate filtering after GROUP BY. This distinction is a frequent source of wrong answers.

Code Example

Window Functions

Window functions calculate across related rows without collapsing the result set. Common categories include ranking functions such as ROW_NUMBER, RANK, and DENSE_RANK; offset functions such as LAG and LEAD; aggregate windows such as running SUM; and distribution functions such as NTILE, CUME_DIST, and PERCENT_RANK where supported.

A familiar example is ranking exam candidates within each city while still showing every candidate row. An industry-specific example is a stock-broking platform comparing each trade price with the previous trade price for the same instrument. Window functions are a major step beyond GROUP BY because they preserve row-level detail.

RANK leaves gaps after ties, DENSE_RANK does not leave gaps, and ROW_NUMBER always assigns a unique sequence. This difference is widely tested in SQL interviews.

Code Example

Views and Indexes

A view is a saved query exposed like a table. It can hide complex joins, restrict columns, and standardize business logic. A materialized view, supported by several databases, stores the query result physically and refreshes it based on database-specific rules.

An index is a data structure that speeds up lookups, joins, sorting, and uniqueness checks, usually at the cost of extra storage and slower writes. A familiar example is a tax portal view that exposes only safe taxpayer fields to support agents. An industry-specific example is a food-delivery search index on restaurant city and cuisine to make discovery faster during peak traffic.

Indexes are not automatically good for every column. Low-selectivity columns, very small tables, and heavy-write workloads may not benefit from extra indexes.

Code Example

Transactions and ACID

A transaction is a logical unit of work that succeeds completely or fails safely. ACID stands for Atomicity, Consistency, Isolation, and Durability. Transaction isolation levels commonly include READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE, though exact behavior differs by database engine.

A familiar example is a UPI transfer where debit and credit must happen together or not at all. An industry-specific example is an insurance claim approval flow where claim status, ledger entry, and audit trail must stay consistent. Transactions protect correctness when many users and services operate at the same time.

The most tested transaction anomalies are dirty read, non-repeatable read, phantom read, and lost update. Higher isolation reduces anomalies but can reduce concurrency.

Code Example

Stored Logic

Stored logic moves reusable behavior into the database. Stored procedures perform actions, functions return values, triggers run automatically on table events, and cursors process result sets row by row. These tools are useful but should not replace clear application design or set-based SQL when a simpler query works.

A familiar example is a coupon system using a stored procedure to apply discount rules consistently. An industry-specific example is a hospital audit trigger that records every update to a patient admission status. Cursors are less common in modern set-based SQL, but they still appear in administrative scripts and legacy enterprise systems.

Prefer set-based SQL for bulk operations. Use cursors only when each row genuinely needs sequential handling, external calls, or procedural state that cannot be expressed cleanly as a set operation.

Code Example

Normalization and Design

Normalization organizes data to reduce duplication and update anomalies. First Normal Form removes repeating groups, Second Normal Form removes partial dependency on a composite key, Third Normal Form removes transitive dependency, and BCNF strengthens determinant rules. In production, teams sometimes denormalize carefully for performance, but they should understand the trade-off.

A familiar example is separating voter profile data from polling booth allocation so booth changes do not duplicate voter details. An industry-specific example is a subscription platform separating plan, customer, invoice, and payment tables so price changes and invoice history remain accurate. Good schema design makes queries simpler and constraints more meaningful.

Normalization questions often ask which normal form is violated. Check repeating groups for 1NF, partial dependency for 2NF, transitive dependency for 3NF, and non-candidate-key determinants for BCNF.

Code Example

Security and Injection

SQL security includes authentication, authorization, least privilege, auditing, data masking, encryption support, and safe query execution. DCL commands manage permissions, while application code must use parameterized queries to prevent SQL injection. Security belongs in both database design and application development.

A familiar example is a college results portal where students should read only their own marks, not the entire marks table. An industry-specific example is a lending platform where credit-risk analysts can read anonymized borrower features but cannot update disbursal records. Least privilege keeps accidental and malicious damage limited.

Never build SQL by concatenating raw user input. Parameterized queries are the standard defense against SQL injection in application code.

Code Example

Write SQL in this order mentally: model the data, enforce integrity, query only what you need, protect transactions, index based on real access patterns, and secure every permission path.

Learning Path

Use this path to move from correct syntax to production-ready reasoning. Practise every phase on a real database such as PostgreSQL, MySQL, SQL Server, SQLite, or Snowflake, because SQL behavior can vary across engines.


Frequently Asked Questions

What is SQL?

SQL is a language for defining, querying, modifying, controlling, and securing data in relational database systems. It is used in application backends, reporting systems, analytics platforms, data warehouses, and administrative scripts.

What is the difference between SQL and MySQL?

SQL is the language, while MySQL is a relational database management system that implements SQL with its own features and dialect. PostgreSQL, SQL Server, Oracle Database, SQLite, and Snowflake also support SQL, but syntax and behavior can differ.

What is the difference between WHERE and HAVING?

WHERE filters rows before grouping, while HAVING filters groups after aggregation. Use WHERE for conditions such as city equals Mumbai and HAVING for conditions such as total sales greater than one lakh.

When should I use a join instead of a subquery?

Use a join when you need columns from multiple related tables or when the join expresses the relationship clearly. Use a subquery or CTE when it improves readability, isolates logic, or represents a filtering condition naturally.

What are the main types of SQL joins?

The standard join types are INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN, CROSS JOIN, and SELF JOIN. NATURAL JOIN also exists in some databases, but it should be used carefully because it depends on same-named columns.

What is an index in SQL?

An index is a database structure that helps the engine find rows faster, similar to an index in a book. Indexes improve many read operations but add storage overhead and can slow down inserts, updates, and deletes.

What are ACID properties?

ACID stands for Atomicity, Consistency, Isolation, and Durability. These properties describe how transactions keep data correct during failures, concurrent access, and multi-step changes.

What is the biggest SQL mistake in interviews?

The biggest mistake is writing a query that looks syntactically correct but ignores duplicates, NULL values, join cardinality, or aggregation level. Interviewers often care more about correct reasoning than a clever one-line query.


Interview Preparation

SQL interview questions test both syntax and reasoning. Strong answers explain data relationships, row counts, duplicate handling, NULL behavior, constraints, and performance trade-offs before presenting the final query.

Conceptual Questions

  • Why is SELECT classified separately from DML in many textbooks? SELECT reads data but does not modify stored rows, so it is commonly classified as DQL. INSERT, UPDATE, DELETE, and MERGE change data and belong to DML.
  • What is the difference between PRIMARY KEY and UNIQUE? A primary key uniquely identifies each row and cannot contain NULL values. A UNIQUE constraint prevents duplicate values, but NULL handling depends on the database system.
  • Why can indexes make writes slower? Every INSERT, UPDATE, or DELETE may also update one or more index structures. More indexes can speed reads but increase write cost and storage usage.
  • What is the difference between RANK and DENSE_RANK? RANK assigns the same rank to ties and leaves gaps after ties. DENSE_RANK also assigns the same rank to ties but continues without gaps.

Applied / Problem-Solving Questions

  • How would you find customers with no orders? Use a LEFT JOIN from customers to orders and filter rows where the order key is NULL. This pattern preserves all customers and identifies missing matches.
  • How would you get the second-highest salary or score? Use DENSE_RANK over the value in descending order and filter rank equals two. This handles ties better than a simple MAX less than MAX pattern.
  • How would you prevent double spending in a wallet system? Use a transaction, check sufficient balance in the debit update, lock or isolate competing updates appropriately, and commit debit and credit together. Also enforce database constraints to prevent negative balances where possible.
  • How would you optimize a slow report query? Start with the execution plan, verify filters and join conditions, remove unnecessary columns, check indexes, and confirm table statistics. Avoid guessing; measure before and after each change.
The most tested SQL interview point is join behavior. Standard question: what does LEFT JOIN return? Standard answer: all rows from the left table plus matching right-table rows, with NULLs for missing right-side matches.

Key Takeaways

SQL mastery depends on five concrete skills: classifying commands correctly, modeling data with keys and constraints, writing joins and aggregations at the right grain, using transactions for consistency, and applying indexes only when access patterns justify them.

For GATE and interviews, the most tested points are command categories, join outputs, WHERE versus HAVING, primary key versus foreign key, normalization forms, ACID properties, isolation anomalies, and RANK versus DENSE_RANK. Practise explaining the result shape before writing syntax.

The natural next step is revising Rank() function in SQL, because ranking questions are common in analytics, product, and placement interviews.


Further Reading