PostgreSQL Advanced Queries

rohit sharma
Feb 19, 2022
11 min read

Hello Folks,

Hope you all are doing well!

After a long time, I m again back with one more interesting topic which is "Advanced Queries in PostgreSQL Database".

As we are already familiar with basic queries and their use cases like SELECT, INSERT, UPDATE and DELETE along with aggregate functions like MAX, MIN, SUM.

Let's explore some advanced queries and their use cases.

(Note:-We are using Postgresql 10.19 For this you must require basic knowledge of queries and understand how GROUP BY and other aggregate functions works. If not you can check this LINK)

1) bool_and and bool_or query:-

As the name suggests it returns a boolean value (i.e.; true or false) based on given conditions inside bool_and and bool_or.

bool_and:- It works the same as our legendary AND conditions if all the conditions are true in that case it will return true otherwise false.
bool_or:- It works the same as our legendary OR conditions if any of the conditions are true in that case it will return true otherwise false.

Let's quickly jump into an example and check out how its works.

Let's quickly create a database and its respective tables with our beautiful commands.

postgres=# create database advanced_queries;
CREATE DATABASE

Let's create our legendary table named "users".

advanced_queries=# create table users (id serial primary key,gender char(1) not null,voted boolean not null);
CREATE TABLE
advanced_queries=# insert into users(gender,voted) values ('M',true);
INSERT 0 1
advanced_queries=# insert into users(gender,voted) values ('F',true);
INSERT 0 1
advanced_queries=# insert into users(gender,voted) values ('M',false);
INSERT 0 1
advanced_queries=# insert into users(gender,voted) values ('F',true);
INSERT 0 1
advanced_queries=# insert into users(gender,voted) values ('M',false);
INSERT 0 1

We have successfully created our table and inserted some values. Let's say we have a use case where we need to check if all the genders have been given votes or not by any gender or not. It can be done easily with the help of bool_and let's see how

advanced_queries=# select gender,bool_and(voted=true) AS "is every one voted" from users group by gender;
 gender | is every one voted 
--------+--------------------
 M      | f
 F      | t
(2 rows)

Voila! That's all and we have our result. Here the bool_and function will check each row condition if it's satisfied or not if any of the conditions don't get satisfied it will return as false. Here we are checking if the "voted" column is having the value true for all genders or not.

Suppose now we have a use case where we need to check if any of the genders have given at least a single vote. It can be easily done with the help of bool_or and let's see how

advanced_queries=# select gender,bool_or(voted=true) AS "is any one voted" from users group by gender;
 gender | is any one voted 
--------+------------------
 M      | t
 F      | t
(2 rows)

That's it! You can pass any valid conditions and get a boolean value; For example, you can pass if everybody has an age greater than i.e. (bool_and(age>100))

2) GROUP BY ROLLUP:-

It's similar to our traditional GROUP BY query but by adding ROLLUP it's gonna add one more row and give complete statics of a particular set. Let's understand with a quick example.

Let's create our "items" table with the following attributes.

advanced_queries=# create table "items"(id serial primary key,item_name varchar(50) not null,quantity integer not null,price decimal(6,2) not null,category_id integer not null);
CREATE TABLE

Let's insert some data in a table with the respective values.

advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Hair Oil',5,10,1);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Soya Oil',90,100,1);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Fortune Oil',100,70,1);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Oil',67,70,1);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Face wash',45,78,2);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Mama Earth Face wash',34,67,2);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Mama Earth beard wash',48,69,2);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Himalaya scrub wash',56,89,2);
INSERT 0 1

Let's say our use case we need to find the minimum price, maximum price, minimum quantity, maximum quantity, and item name along with each category. Quickly write our beautiful query.

advanced_queries=# select item_name,category_id,max(price) AS "maximum price",min(price) AS "minimum price",max(quantity) AS "maximum quantity",min(quantity) AS "minimum quantity" FROM items group by category_id,item_name ORDER BY category_id,item_name;

Above is our result set.

Let me explain the above query functions in reverse order

ORDER BY category_id,item_name:- This will sort the result set in ascending order first by category_id and after that by item name.
group by category_id,item_name:- As in our table, there are items that are associated with the same category id so by using group by we are grouping the same items having the same category_id in individual groups then after grouping with category_id we are now grouping by item_name.
max(quantity):- After grouping, we are finding the maximum quantity from the group sets
min(quantity):- After grouping, we are finding the minimum quantity from the group sets
min(price):- After grouping, we are finding the minimum price from the group sets
max(price):- After grouping, we are finding the maximum price from the group sets

Suppose we want to find out what is the aggregated minimum price, maximum price, minimum quantity, maximum quantity along with each category. Here comes the group by rollup in the show. Let's see how we can achieve this.

advanced_queries=# select item_name,category_id,max(price) AS "maximum price",min(price) AS "minimum price",max(quantity) AS "maximum quantity",min(quantity) AS "minimum quantity" FROM items group by rollup (category_id,item_name) ORDER BY category_id,item_name;

That's it we just need to replace group by with group by rollup along with the parenthesis.

If you look closely after the "item_name" "Soya Oil" there is a result that has no item_name but has other attributes that are the row that shows the aggregated minimum price, maximum price, minimum quantity, maximum quantity of category id 1. and for category id 2 its just after "item name" "Mama Earth Face wash". And the last rows which don't have category_id, as well as item_name, is having aggregated minimum price, maximum price, minimum quantity, the maximum quantity of both category_id 1 and 2.

3) GROUP BY CUBE:-

It's the same as GROUP BY ROLLUP and but the returning result set is a bit different from GROUP BY ROLLUP. Let's take our previous table and see how it's different.

advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Mama Earth beard wash',48,69,1);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Mama Earth beard wash',48,70,2);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Himalaya scrub wash',70,89,1);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Mama Earth Face wash',34,67,1);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Mama Earth Face wash',39,80,2);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Himalaya scrub wash',70,89,1);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Himalaya scrub wash',74,89,1);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Himalaya scrub wash',74,879,1);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Mama Earth Face wash',56,80,2);
INSERT 0 1
advanced_queries=# insert into items(item_name,quantity,price,category_id) values ('Mama Earth Face wash',56,90,2);
INSERT 0 1

and now write the GROUP BY CUBE query

If you see the above result set it's the same as GROUP BY ROLLUP one the only difference is along with each category aggregate results at last along with each product name. So if you see after the 7th row in the result set it's showing the product_name and its aggregated minimum price, maximum price, minimum quantity, the maximum quantity regardless of category. If you have a use case where you need to check the aggregated results of all the entries in grouped sets you can use GROUP BY CUBE.

4) FILTER Function:-

Suppose we have a use case where you need to categorize in the price range along with category id after grouping we can do this easily by using the FILTER function.

Let's see how

advanced_queries=# select item_name,category_id,max(price) AS "maximum price",min(price) AS "minimum price",max(quantity) AS "maximum quantity",min(quantity) AS "minimum quantity",count(id) filter(where price between 0 and 50) AS "price range from 0 to 50",count(id) filter(where price between 51 and 100) as "price range from 51 to 100",count(id) filter (where quantity between 0 and 100) as "quantity from 0 to 100" FROM items group by (category_id,item_name) ORDER BY category_id,item_name;

Let's breakdown each query

ORDER BY category_id,item_name:- This will sort the result set in ascending order first by category_id and after that by item name.
group by category_id,item_name:- As in our table, there are items that are associated with the same category id so by using group by we are grouping the same items having the same category_id in individual groups then after grouping with category_id we are now grouping by item_name.
max(quantity):- After grouping, we are finding the maximum quantity from the group sets
min(quantity):- After grouping, we are finding the minimum quantity from the group sets
min(price):- After grouping, we are finding the minimum price from the group sets
max(price):- After grouping, we are finding the maximum price from the group sets
count(id) filter(where price between 0 and 50):- Here we are calculating the count of items whose price range lies between 0 to 50 from the grouping sets
count(id) filter(where price between 51 and 100):- Here we are calculating the count of items whose price range lies between 51 to 100 from the grouping sets
count(id) filter (where quantity between 0 and 100):- Here we are calculating the count of items whose quantity range lies between 0 to 100 from the grouping sets

Voila! this is our result set!

5) WINDOW FUNCTIONS:-

Windows functions in PostgreSQL is a very powerful function that helps us to perform complex operations easily.

a) Partition By Columns

Suppose we have a use case where we want the average price of each category that we have in our table. Let's see how we gonna do it via the window function.

Below is the following query

advanced_queries=# select item_name,category_id,avg(price) over (partition by category_id) from items;
      item_name       | category_id |         avg          
----------------------+-------------+----------------------
 Mama Earth Face wash |           1 | 245.2000000000000000
 Himalaya scrub wash  |           1 | 245.2000000000000000
 Himalaya scrub wash  |           1 | 245.2000000000000000
 Himalaya scrub wash  |           1 | 245.2000000000000000
 Himalaya scrub wash  |           1 | 245.2000000000000000
 Mama Earth Face wash |           2 |  80.0000000000000000
 Mama Earth Face wash |           2 |  80.0000000000000000
 Mama Earth Face wash |           2 |  80.0000000000000000
 Mama Earth Face wash |           2 |  80.0000000000000000
 Mama Earth Face wash |           2 |  80.0000000000000000
(10 rows)

That's it! Let's break down the query and understand what it's doing exactly. The avg(price) is an aggregate function that is helpful to find the average of column "price". After that, we have the user the over (partition by category_id). The "over" function binds the aggregrate function with the window function. And the partition defines on behalf of which column we want to partition and calculate the average price.

We can also calculate the min, max by creating aliases too. Let's see how

advanced_queries=# select item_name,category_id,avg(price) over (partition by category_id),min(price) over (partition by category_id),max(price) over (partition by category_id) from items;
      item_name       | category_id |         avg          |  min  |  max   
----------------------+-------------+----------------------+-------+--------
 Mama Earth Face wash |           1 | 245.2000000000000000 | 80.00 | 879.00
 Himalaya scrub wash  |           1 | 245.2000000000000000 | 80.00 | 879.00
 Himalaya scrub wash  |           1 | 245.2000000000000000 | 80.00 | 879.00
 Himalaya scrub wash  |           1 | 245.2000000000000000 | 80.00 | 879.00
 Himalaya scrub wash  |           1 | 245.2000000000000000 | 80.00 | 879.00
 Mama Earth Face wash |           2 |  80.0000000000000000 | 80.00 |  80.00
 Mama Earth Face wash |           2 |  80.0000000000000000 | 80.00 |  80.00
 Mama Earth Face wash |           2 |  80.0000000000000000 | 80.00 |  80.00
 Mama Earth Face wash |           2 |  80.0000000000000000 | 80.00 |  80.00
 Mama Earth Face wash |           2 |  80.0000000000000000 | 80.00 |  80.00
(10 rows)

As we can see we need to write partition by category_id multiple times we can also create an alias for the same instead of rewriting let's see how!

advanced_queries=# select item_name,category_id,avg(price) over (category_partition),min(price) over (category_partition),max(price) over (category_partition) from items window category_partition as (partition by category_id);
      item_name       | category_id |         avg          |  min  |  max   
----------------------+-------------+----------------------+-------+--------
 Mama Earth Face wash |           1 | 245.2000000000000000 | 80.00 | 879.00
 Himalaya scrub wash  |           1 | 245.2000000000000000 | 80.00 | 879.00
 Himalaya scrub wash  |           1 | 245.2000000000000000 | 80.00 | 879.00
 Himalaya scrub wash  |           1 | 245.2000000000000000 | 80.00 | 879.00
 Himalaya scrub wash  |           1 | 245.2000000000000000 | 80.00 | 879.00
 Mama Earth Face wash |           2 |  80.0000000000000000 | 80.00 |  80.00
 Mama Earth Face wash |           2 |  80.0000000000000000 | 80.00 |  80.00
 Mama Earth Face wash |           2 |  80.0000000000000000 | 80.00 |  80.00
 Mama Earth Face wash |           2 |  80.0000000000000000 | 80.00 |  80.00
 Mama Earth Face wash |           2 |  80.0000000000000000 | 80.00 |  80.00
(10 rows)

if you look closely we have created the alias of the statement "partition by category_id". by using the "window" keyword. "category_partition" is the alias name that you wanna create you can use anything you like "ABC", "XYZ", its totally fine. Once we have created the partition alias we are using along with our aggregate functions i.e.; avg(price) over (category_partition)

b) ORDER BY COLUMNS

Suppose we have a case where we need to do a consecutive summation of the given attributes. In this case, we can take the help of "ORDER BY" along with the aggregate function.

advanced_queries=# create table "numbers" (num integer not null);
CREATE TABLE
advanced_queries=# insert into "numbers"(num) select i from generate_series(1,10) as t(i);
INSERT 0 10

advanced_queries=# select num, sum(num) OVER (ORDER BY num) AS "ordered sum" from numbers;
 num | ordered sum 
-----+-------------
   1 |           1
   2 |           3
   3 |           6
   4 |          10
   5 |          15
   6 |          21
   7 |          28
   8 |          36
   9 |          45
  10 |          55
(10 rows)

If we look at the above query we have got the consecutive sums the row 4th is having the sum of the previous three rows and row 8 is having the sum of the previous 7 rows and so on.

We can also control how many rows we want to include in one go.


advanced_queries=# select num, sum(num) OVER (ORDER BY num rows between 0 preceding AND 2 following) AS "3 rows" from numbers;
 num | 3 rows 
-----+--------
   1 |      6
   2 |      9
   3 |     12
   4 |     15
   5 |     18
   6 |     21
   7 |     24
   8 |     27
   9 |     19
  10 |     10
(10 rows)

If we see the above-highlighted part we are defining the rows precedence in our case we saying "hey query buddy please include 0 previous rows and 2 forward rows including your position value from each position". So at row number 1, we have no previous row but we have 2 forward rows value that is "3+2" which gives us 5, and hence we are currently at position 1 having the value "1" so the overall value will be "6" and so on.

c) RANK AND DENSE_RANK FUNCTIONS

Suppose we need to find out the ranking from the table based on some criteria then, in that case, we can take the help of the RANK and DENSE_RANK functions both do the same job i.e.; assigning ranking based on column attributes but their ordering methodology is different.

Let's see how

advanced_queries=# select price,rank() over (order by price) AS "rank",dense_rank() OVER (ORDER BY price) AS "dense_rank" from items;
 price  | rank | dense_rank 
--------+------+------------
  80.00 |    1 |          1
  80.00 |    1 |          1
  80.00 |    1 |          1
  80.00 |    1 |          1
  80.00 |    1 |          1
  80.00 |    1 |          1
  89.00 |    7 |          2
  89.00 |    7 |          2
  89.00 |    7 |          2
 879.00 |   10 |          3
(10 rows)

RANK():- The rank function assigns the same order number if the same value is found and instead of giving the next ranking it assigns the next row number. If you see in the "rank" column from first to the sixth row its assigns the same value i.e.; 1 but in the 7th row the rank is assigned with row number i.e; 7th and then at 10th row so on.
DENSE_RANK():- The rank function assigns the same order number if the same value is found and it assigns the next rank number. If you see in the "dense_rank" column from first to the sixth row its assigns the same value i.e.; 1 but in the 7th row the rank is assigned with the next rank number i.e; 2nd and then at 10th row so on.

d) ROW_NUMBER:-

Suppose we need to group the items having the same category and assign an ordered row number. If we have a case like this then we can take the help of the ROW_NUMBER function. Let's see how it gonna work.

advanced_queries=# select price,category_id,item_name,row_number() OVER(PARTITION BY item_name ORDER BY category_id) from items;
 price  | category_id |      item_name       | row_number 
--------+-------------+----------------------+------------
  89.00 |           1 | Himalaya scrub wash  |          1
 879.00 |           1 | Himalaya scrub wash  |          2
  89.00 |           1 | Himalaya scrub wash  |          3
  89.00 |           1 | Himalaya scrub wash  |          4
  80.00 |           1 | Mama Earth Face wash |          1
  80.00 |           2 | Mama Earth Face wash |          2
  80.00 |           2 | Mama Earth Face wash |          3
  80.00 |           2 | Mama Earth Face wash |          4
  80.00 |           2 | Mama Earth Face wash |          5
  80.00 |           2 | Mama Earth Face wash |          6
(10 rows)

If we see the highlighted part we are assigning the row number with the help of the "row_number()" function and grouping or partitioning it by "item_name" so the same item name will be grouped into each individual group and then we are ordering by "category_id".

And we can see row numbers are assigned to groups of individuals having the same name.

That's all for this blog let me know in comment section if you wanna learn about any other query or explanation.

Happy Coding! keep Learning!

PostgreSQL Advanced Queries

Recent Posts

Comments