Feuertip #23: Let's get weird with pipelined table functions

25Aug

Feuertip #23: Let’s get weird with pipelined table functions

By: Steven Feuerstein On: August 25, 2021 In: APEX Developer Solutions, Feuertips Comments: 0

In Feuertip #22, I introduced a really special and useful type of function: the table function. It’s a function you can query from as if it were a relational table. How cool is that?

In Feuertip #23 (this one right below), I move on to an even more special type of table function: the pipelined table function.

I’ll refer to it as PTF for the duration of this post.

A PTF addresses two significant drawbacks to the “regular” table function:

The SQL engine has to stop and wait for the table function to execute and return its collection before continuing. That’s to be expected, but what if you are executing a parallel query? It can wreak havoc by causing serialization.
You build and return a collection. If you have lots of data, you have a big collection….which is stored in your session-specific memory, a.k.a., Process Global Area or PGA. These have limits. You could blow the limit and make users unhappy.

The basic idea behind a PTF is that instead of building and returning the collection, which is then traded like a table, you “pipe” back each piece of data to the calling query instead of adding it to the collection.

Let’s take a look at the differences between the table function and the PTF. Here’s a very simple demonstration:

create or replace type list_of_names_t is table of varchar2(100)
/

create or replace function big_brained_species return list_of_names_t 
is
   l_list list_of_names_t := list_of_names_t ();
begin
   l_list.extend(3);
   l_list(1) := 'Octopus';
   l_list(2) := 'Raven';
   l_list(3) := 'Human';
   return l_list;
end;
/

And here’s that same silly function in its pipelined version:

create or replace function big_brained_species return list_of_names_t
   pipelined
is
begin
   pipe row ('Octopus');
   pipe row ('Raven');
   pipe row ('Human');
   return;
end;
/

select column_value species_name
from table (big_brained_species ())
order by species_name desc
/

SPECIES_NAME
------------
Raven
Octopus
Human

And from this very basic example, you see all the key features of a PTF:

The function header must include the pipelined keyword
No need to declare a local collection (and consume PGA)
Use pipe row to send that data back to the calling query
Execute a return statement that returns control but no data

Of course, you can pipe out more than a scalar value. In fact, pipelined table functions usually return multiple columns of information. This is often done by relying on a nested table of object types.

Here’s an example:

create type ticker_ot as object (
   ticker     varchar2(20),
   pricedate  date,
   pricetype  varchar2(1),
   price      number
);
/

create type tickers_nt as table of ticker_ot;
/

create or replace package stock_mgr authid definer 
is
   type stocks_rc is ref cursor return stocks%ROWTYPE;
end stock_mgr;
/

create or replace function doubled_pl (
   rows_in stock_mgr.stocks_rc
) return tickers_nt
   pipelined
   authid definer
is
   type stocks_aat is
      table of stocks%ROWTYPE index by pls_integer;
   l_stocks stocks_aat;
begin
   loop
      fetch rows_in
        bulk collect into l_stocks limit 100;
      exit when l_stocks.COUNT = 0;
      for l_row in 1..l_stocks.COUNT 
      loop
         pipe row ( ticker_ot(l_stocks(l_row).ticker, 
                              l_stocks(l_row).trade_date, 
                              'O', 
                              l_stocks(l_row).open_price) );

         pipe row ( ticker_ot(l_stocks(l_row).ticker, 
                              l_stocks(l_row).trade_date, 
                              'C', 
                              l_stocks(l_row).close_price) );
      end loop;
   end loop;

   return;
end;
/

Performance and memory impact of PTFs

I don’t have a parallel-enabled database handy, but I can still show you the difference a PTF makes over a regular, old table function.

I’m not going to include all the code in this post. It’s available in the LiveSQL script down below. The “executive summary” is that I am using a table function to move rows from the stocks table to the tickers table, doubling the count of rows along the way. In my test block, I will simply add a “rownum < 10” where clause:

begin
   utils.initialize('Pipelined');
   insert into tickers
      select *
        from table ( doubled_pl(cursor(
                      select *
                        from stocks
                   )) )
       where rownum < 10;

   utils.show_results('First 9 rows');
   utils.initialize('Not Pipelined');
   insert into tickers
      select *
        from table ( doubled_nopl(cursor(
                      select *
                        from stocks
                   )) )
       where rownum < 10;

   utils.show_results('First 9 rows');
end;
/

Here are the results:

Pipelined
Ticker row count: 9
"First 9 rows" completed in: 18 centisecs; pga at: 327680 bytes
Not Pipelined
Ticker row count: 9
"First 9 rows" completed in: 517 centisecs; pga at: 1287847936 bytes

It should be clear what’s going on here: with the pipelined table function, each row is piped immediately back. So as soon as the calling query gets 9 rows, it stops executing the PTF and proceeds. With the non-pipelined version, the calling query has to wait and wait and wait while all 1M rows in stocks are doubled to 2M rows, then they are passed back to the query.

At which point the SQL engine says: “Just want those first 9, thanks!” and ignores all the rest. But oh, the price that is paid along the way …

That’s rather convincing, is it not? Lots less memory and lots faster to get the first 9 rows!

Always go pipelined?

So why not always use a pipelined table function? I can think of these possible reasons why not:

You can only call it from inside the from clause of a select statement. You cannot call it directly in PL/SQL.
You don’t need the pipelining (no parallel query, PGA consumption not an issue)
The pipe row etc. syntax is weird, and you don’t want future developers maintaining your code feeling weird about the code that they have to maintain
If a pipelined table function is slower than a regular table function

OK, remember, I said “possible,” so first let’s examine #4 – could it be that a PTF pays a performance penalty?

I modified the previous performance test block to repeatedly query from the pipelined and non-pipelined table functions.

declare
   l_iterations integer := 10000;
   l_count integer := 0;
begin
   utils.initialize('Pipelined');
   for indx in 1..l_iterations loop
      for rec in (
         select *
           from table ( doubled_pl(cursor(
                         select *
                           from stocks
                          where rownum < 100
                      )) )
      ) loop
         l_count := l_count + 1;
      end loop;
   end loop;
      dbms_output.put_line(l_count);

   utils.show_results(l_iterations|| ' invocations');
   utils.initialize('Not Pipelined');
   for indx in 1..l_iterations loop
      for rec in (
         select *
           from table ( doubled_nopl(cursor(
                         select *
                           from stocks
                          where rownum < 100
                      )) )
      ) loop
         l_count := l_count + 1;
      end loop;
   end loop;
      dbms_output.put_line(l_count);

   utils.show_results(l_iterations|| ' invocations');
end;
/

Here are the results:

Pipelined
1980000
Ticker row count: 0
"10000 invocations" completed in: 753 centisecs; pga at: 65536 bytes
Not Pipelined
3960000
Ticker row count: 0
"10000 invocations" completed in: 1004 centisecs; pga at: 56098816 bytes

Now, I confess this is a rough and incomplete performance test, but still, I believe it gives us sufficient confidence to say that you will not sacrifice performance by switching to pipelining, even if you do not benefit directly from the asynchronous return of data to the calling query.

How about the pipelining syntax being weird and off-putting to future developers?

First, I doubt it will be that much of an issue. Second, professional software developers should know how to make the most of the language they work with. If they see unfamiliar syntax, they should take it in stride and learn what that syntax does for them. In short order, it will be more than familiar enough.

PTF resources

Oracle Magazine intro to PTFs

ORACLE-BASE on PTFs

LiveSQL tutorial demonstrating performance benefit

Note: the performance code will not actually compile and run in LiveSQL since it uses some v$ views to show memory allocation. Specifically: v$mystat and v$statname

Tom Kyte’s blog

Tom never fails to come through. Here’s a blog post from 2010 that concisely explains the special no_data_needed exception that can be raised (silently, usually) by a pipelined table function.

Feuertips

Every other Wednesday at 11 am ET, I go live on YouTube and Facebook with Michelle Skamene to provide you with a not-to-be-missed PL/SQL tip and fun conversation. Sure, we record it so you can always watch later, but live is so much more rewarding!

One participant will be selected at random to choose from three organizations dedicated to the preservation of our planet and its biodiversity.

Insum will then make a donation of $25 to that group in your name.

What could be better than levelling up your PL/SQL tips and helping one of these worthy organizations? Join us every other Wednesday at 11!

Feuertip #23: Let’s get weird with pipelined table functions