Tuesday, February 1, 2022

SELECT * in the cloud or not


I was in a recent Tweeter debate about the use of SELECT * within CTEs (Common Table Expressions) in particular.  But really this goes beyond just using them in CTEs. 

 

Sure the * (star or splat as some folks say) is a nice short cut to get all the columns from a table (or set of tables).  But it can be a problem.  

 

For example, take this simple query:

 

select * from emp, dept where emp.deptno = dept.deptno;

 

This works just fine in SQL*Plus (or SQLcl).  There is a duplicate column name in the two tables DEPTNO.  SQL*Plus handles this just fine showing the same column twice.  SQLcl names the second occurrence of the column DEPTNO_1, below output is from SQL*Dev.  Knowing which column is from which table isn’t too hard to figure out (the columns are in order as defined in the tables) but it isn’t necessarily obvious to the casual observer. 

 

 


 

But put that same query in a CURSOR in a PL/SQL block and you got trouble:

 

declare 

   CURSOR EMP_CUR is

   select * from emp, dept where emp.deptno = dept.deptno;

begin

   FOR EMP_REC in EMP_CUR

   LOOP

   DBMS_OUTPUT.PUTLINE('Name: '||EMP_REC.ENAME);

   END LOOP;

end;

/

 

Error report -

ORA-06550: line 5, column 19:

PLS-00402: alias required in SELECT list of cursor to avoid duplicate column names

ORA-06550: line 5, column 4:

PL/SQL: Statement ignored

 

PL/SQL doesn’t like duplicate column names and will not run.  Bummer.  This can be a huge problem when working on a query and you’re embedding it into a testing script that is calling the SQL within a PL/SQL block.  And imagine a query that is 100s of lines with many subqueries.  This can be a nightmare to find the duplicate columns.  

 

Yes, I’ve had this exact problem in the past.  

 

More than once.  

 

Not fun. 

 

Then there is the issue of CTEs.  Why not have CTEs with SELECT * FROM? Even if we don’t have the duplicate column issue, this can make the code harder to debug.  It can be difficult to figure out “where did this column come from?” Think of a set of CTEs that are, in effect, processed in a sequential fashion.  When each one has SELECT * FROM  as its main select list and has a prior CTE (or more than one) in the FROM.  It can be quite a trick to figure out where a column enters into the mix.

 

Also, this certainly can be wasteful, and might even be a problem.  The issue is memory.  Each CTE is often materialized, meaning it becomes a structure that is akin to a temporary table.  This table like structure takes up space in the PGA (Program Global Area, not Professional Golf Association). 

 

It may not seem like much but each column takes up space within the row, and that space is multiplied by the number of rows in the return set.  Sure, when I say do a SELECT * FROM the good old EMP table with only 9 columns and 14 rows, it doesn’t really make a difference if I were to leave out some columns.  

 

But we are rarely optimizing a query with tables that small, either in columns or rows.  Say you have a table with 80+ columns, some are the very popular VARCHAR2(4000), and many have at least some data in them, and to top it off, you don’t use any of them in your query.  With a CTE returning 100s of thousands into the millions of rows, that’s a lot of wasted space.  And that’s just one table. 

 

Also remember that it’s unlikely that many of our queries are just run by one person.  Take that query and have a couple 100 (or 1000) folks running it at more or less then same time.  This makes that extra (unnecessary) memory usage a problem. 

 

And just to add to the fun, there is the problem of someone altering the table to add a column (or columns) that your query wasn’t expecting which could break your code.  Typically, this will be an output issue even if the query still executes without error.  And how about they add a column that now causes a duplicate column name that wasn’t there before! Ouch! 

 

Of course dropping a column from a table that is used in your query is a problem whether naming the columns or using the star. 

 

But Ric! It’s just so easy to use SELECT * FROM!  Yes, I understand it many take a bit of work and time to get that column list tweaked just right. But it sure can make working on the query later easier and can save on memory which can be seriously helpful in the running of the SQL on the system over time.


I know you may be thinking something like "this code I'm writing is a short simple query, it's no big deal to use the  SELECT * FROM."  That might be true, but this is also about the habit, like using your blinkers on your car, best to get in the right habit of using the blinkers every time you make a turn.  It's better for everyone.  


2 comments:

  1. When i worked at a place that did code reviews, "Select * from" would get you a write up the first time and let go the second. You're not taking production down because you didn't want to list the columns in your query. Most IDEs will pop up or let you drag in the column list so those 2 seconds you save aren't going to fly

    ReplyDelete
  2. Ouch! Seems that might be a bit harsh, but it does get at the issue for sure. More often then not the query doesn't even need to get all the columns anyway. Sure get the entire list using the tool, but then cut out all the ones you don't need. Just copying all the columns and going with that doesn't solve the problem.

    ReplyDelete