MySQL5: Gentle introduction to stored procedures of MySQL5

Z PostgreSQL
Skočit na navigaci Skočit na vyhledávání

Translated by Věra Peterová

Google AdSense

I started to write this article before Christmas when I was excited about the fact that MySQL supported the stored procedures (SP). This quality was some time called the most expected and useful benefit of MySQL 5.0. Therefore a lot of articles about this topic appeared. As a surprise, after the first euphoria, the interest of SP rapidly failed. Unfortunately, the first versions of MySQL comprised a lot of mistakes which weren´t took out as fast as they should have been. And the implementation of stored procedures didn´t avoid the mistakes as well. When I wanted to finish the article, I had to wait for the elimination of mistake which made the recursive calling of procedure impossible. Again, in the version 5.1, the development of stored procedures is still difficult regarding the miserable diagnostics of mistakes and the number of uncorrected mistakes or missing qualities of the language. In regard of the fact that MySQL is practically the last O.S. database where the stored procedures are implemented, the language of stored procedures comes from ANSI SQL and at the point of syntaction it´s the most similar O.S. database to SQL/PSM. I have already mentioned that the integrity and quality of SP is not comparable to e.g. Postgres of Firebird. I understand that the priorities of MySQL are somewhere else: clustering, partitioning and replication. The examples which I give in this article are debugged in MySQL 5.1.9 and don´t have to work in other versions.

Introduction

The language of S.P. respects ANSI SQL (as e.g. DB2). Two years ago I complained on a lack of documentation about SP. Now there´s relatively enough documentation on the Internet but sometimes there are problems with quality and recency. Even the oficial documentation sometimes contains quite important mistakes. Besides, two articles on linuxsoft appeared within the serial about MySQL. There are two basic reasons we all agree on, and then other two reasons we probably wouldn´t agree on, why to use SP:

  1. speed – in some cases considerably higher speed of task processing (it depends on the ratio between processed and final data volumes) results from the fact that we minimalize data transmission between client and server (serialization, communication...). It´s significant especially when there are brakes such as ODBC or BDE between database and your application. Of course the higher speed of task processing must positively project in general permeability of the server. Hardly anybody would make stored procedure for numeric solution of integral. It wouldn´t be so bad but such foolishment has completely different reason – point no.4. Sometimes there appears a thesis that stored procedures are translated, precompiled. It generally doesn´t mean that the code would be translated into processor code but that we use so called prepared statements. New versions of Oracle where SP are really translated are an exception. MySQL doesn´t execute the implicit transformation of SQL commands on prepared SQL commands.
  2. safety – by means of stored procedures we can distribute our rights to tables (or data). How? We must define procedure with author´s rights (SECURITY DEFINER). In the second mode (SECURITY INVOKER) the procedure gets the invoker´s rights.
  3. adaptability and portability – while nearly all interested people agree on first two points, here first opinion disagreements begin. Stored procedures create level between data and client application. Usually this level isn´t absolut – sometimes one SELECT stays in the application. But this forgotten SELECT can do problems when you change table structure or in application portation. Portation and actualization of stored procedures aren´t so problematic as they could seem at first sight. The most painful limitation of stored procedures is absolut restriction of interaction with user (it harms some time untill you get used to it) – you can´t go away and run message box and ask the user if he means it really seriously. But, surprisingly, it leads to better-arranged and more readable code which can be easily modified and maintained. Furthermore, the noninteractive SP can be well tested automatically. There are diferences in language syntax but they don´t cause bigger problems (when you think about them). Of course there´s always something which is ported with difficulties. E.g. multirecordset (stacked recordset) is a bit hard generated by cursors in PostgreSQL or Oracle, and very easily generated as a result of all available SELECTs in MySQL or T-SQL.
  4. architecture – we definitely can´t agree on this. Basically it´s about the ratio between client´s application code and stored procedures. I am an extremist, I see the optimum somewhere around zero. The opposite extreme is degradation database on mere data storage (the value goes to infinity). This issue is a bit wider, it relates to application (in)dependence on database. The more I want to exploit the opportunities of some system, the more dependent I become and the more I need specialists. Logic which we move to the database can be used by everyone who has the access to the database – such distributed objects. As the database code gets bigger, the application code gets smaller – we gain better-balanced application. On the other hand, it often goes to the fact that we must duplicate some parts of the code. E.g. web – form contains code for ergonomic values entering and data check, and the same check is in the database. But this is not insuperable problem as well. Furthermore, the majority of modern RDBMS provides at least one possibility of sharing libraries between SP and other server side applications (Yukon ~ dot NET, Oracle ~ Java, PostgreSQL ~ Perl, Python, Php, MsSQL2k – COM).

The support of stored procedures in MySQL5 means that everywhere where this database is installed, there is available some minimal environment for running our SQL/PSM scripts. So, we can solve many tasks right in MySQL without installation of another additional software – testing data generating, data filtration and transformation etc.

I suppose that reader knows what are stored procedures. If not – they are scripts, activated by database system in reaction on direct or indirect demand of database system user. Against common scripts they vary in source code storage in the database and in exploitation of specific functional interface which enables access to internal functions of database system. Nevertheless, not always the code has to be saved in database (PL/Java) and not always the API database has to be exploited (pl/sh). Each stored procedure has its name, list of arguments and a body containing SQL commands, local variable declaration, error handling, loops and conditions etc. There´s a difference between procedures and functions. Usually we must respect some restrictions when designing functions, but we can use these functions in the command SELECT.

Since the time of K&R, each manual has to begin with program helloworld.

1  DELIMITER //
2  SET sql_mode=ansi//
3  CREATE PROCEDURE hello(IN whom varchar(20)) SELECT 'hello '||whom; //
4  DELIMITER ;
5  SET sql_mode='';
6
7  CALL hello('world');

I skipped the zero variant of function without arguments. Notice: a) I enter the code of stored procedure by command into DDL database, b) I can run the stored procedure by any mysql client, no matter if it´s mysql support in php or in interactive console or in phpmysqladmin. In this case the output from the procedure is realized by a command SELECT (there can be more SELECTs, and then the result is multirecordset). Another possibility is using OUT parametres – an analogy of passing parametres by link in classical programming languages.

1  DELIMITER //
2  SET sql_mode=ansi//
3  CREATE PROCEDURE hello(IN whom varchar(20), OUT res varchar(40)) SET res='hello '||whom; //
4  DELIMITER ;
5  SET sql_mode='';
6
7  CALL hello('world', @x);
8  SELECT @x;
9
10 DROP PROCEDURE hello;

SQL commands in stored procedures must be ended by semi-colon. But this symbol is also used in console as symbol of the end of SQL command. If we want to write stored procedure in console, we have to define another symbol for the end of SQL command – for this there´s a command DELIMETER [1]. Ansi_mode MySQL will respect ANSI SQL script by setting a global variable. In initial mode we don´t have the operator ||. Even when I cancel the ansi_mode [5], the procedure runs correctly. Why? The actual MySQL configuration is saved together with the procedure code. Unfortunately here is a mistake again, or at least some inconsistence. We must set the system variable max_sp_recursion_depth separately during each log in the database. We remove the procedure by command DROP PROCEDURE [10].

We mark global (session) variables by prefix @. We don´t declare them, we just give a value to them [7]. They are really variables on server side so you can use them in each interface. We use the global variables not only for the display of results of stored procedures. Very often the SQL commands follows each other and are linked by a value which we can get by first SELECT. By global variables I can eliminate parametrization (assembling) of SQL on the client side (there´s a risk of wrong conversion between server and targer interface). In MySQL5 everything stays on server side, I send only SQL commands. Inter alia the global variables can be used for rows numbering. Like in T-SQL the prefix @@ is used for system variables.

1  SET @r = 0;
2  SELECT @r := @r + 1, tab.* FROM tab

PHP will be the most frequent client of MySQL which is the reason for illustration of calling the stored procedure from this interface (PHP5). The examples of Perl and C# will be introduced later. PHP5 contains new API for MySQL which accesses the qualities of version 4.11 and higher – among others bound input and output variables and prepared commands (for clarity I don´t show the errors check).

1  $mysqli = new mysqli($host, $user, $password, $db);
2  $stmt = $mysqli->prepare("CALL Hello(?, @x)");
3  $stmt->bind_param("s", $param); /* s as string */
4  $param = "world";
5  $stmt->execute();
6  $stmt = $mysqli->prepare("SELECT @x");
7  $stmt->execute();
8  $stmt->bind_result($col1_x);
9  if ($stmt->fetch())
10		printf("%s ", $col1_x);
11 $stmt->close();
12 $mysqli->close();

Command bind_param in the row [3] creates relation between first command parameter and the variable $param. The first argument of this method is format string – one symbol, one variable, determining the conversion. There are for posibilities available: i – integer, d – double, s – string, b – blob. Command bind_result in the row [8] creates relation between first column of the output and the variable $col1_x. Only for completeness I supply that the variable $param is not read until command execute [5], and fills $col1_x by command fetch [9]. Bound variables together with prepared commands are very effective against SQL injection.

Cross configuration generating

The language of stored procedures is procedural language as all others – C, Perl, Modula or ADA. We can find there declaration of variables, loops, conditions. A rarity is nonexistence of input/output functions which is a tax for transfer on server (also it´s not possible to wait for user interaction in stored procedures – the time on server is too expensive). Another rarity, surely surprising, is integrated SQL. It took some time till I started to like it. It´s my own opinion which I don´t push in anyone, but because SQL commands aren´t broken by commas and closed into function parametres, they are much more readable. And I´m not speaking about the fact that you don´t have to learn anything new: you´ve already known functions and data types and you´ve used them in SQL. Then there are only conditions, loops and variables are – a stick which keeps SQL commands together. On the internet you can find a lot of simple examples on which you can clearly see syntactic representation of some construction but no benefit against nonprocedural style. One of nice examples I found is a procedure generating cross configurations.

I take the example from Roland Bauman´s blog (I recommend you to read the blog). The bloggers were excited about MySQL5 – no wonder. Against MySQL4 it´s really big jump, generation gap, and moreover it´s supported by relatively good documentation. Basically all MySQL5 features are properly described including stored procedures. I must appreciate their support of users and how they keep and build a community. I modified the code a little bit by rewriting into ANSI SQL.

What´s going on? I show the procedure (besides, it´s really smart, old, general procedure how to make cross table without need of system support):

CREATE TABLE employees (
    id INT auto_increment PRIMARY KEY,
    shop_id INT,
    gender ENUM('m', 'f'),
    name VARCHAR(32),
    salary INT
);

CREATE TABLE shops (
    shop_id INT auto_increment PRIMARY KEY,
    shop VARCHAR(32)
);

INSERT INTO shops (shop)
VALUES ('Zurich'), ('New York'), ('London');

INSERT INTO employees (shop_id, gender, name, salary)
VALUES
(1, 'm', 'Jon Simpson', 4500),
(1, 'f', 'Barbara Breitenmoser', 4700),
(2, 'f', 'Kirsten Ruegg', 5600),
(3, 'm', 'Ralph Teller', 5100),
(3, 'm', 'Peter Jonson', 5200);

We get very easily a table of sale in London or Zurich. But it´s a bit more complicated to get the dependence about the amount of sale according to a city or dealer´s sex (columns: f, m, total; rows: cities).

SELECT shop, 
		SUM(CASE gender WHEN 'f' THEN salary ELSE 0 END) AS f, 
		SUM(CASE gender WHEN 'm' THEN salary ELSE 0 END) AS m, 
		SUM(salary) AS total 
	FROM employees INNER JOIN shops USING (shop_id)
	GROUP BY shop

The first half of the work is done by aggregation (GROUP BY), i.e. separation of data into groups according to cities. Other work we must do ourselves. In SQL we can´t manage that the „wrong“ values won´t appear in the column ‘f’. But we are able to secure that the values won´t be counted and will be ignored – CASE gender WHEN ‘f’ THEN salary ELSE 0 END. Roland´s procedure makes that SELECT is generated and executed in this form. Control question: how to modify this demand in order to display numbers (amount) instead of sums?

1  DELIMITER //
2  SET sql_mode=ansi //
3  CREATE PROCEDURE xtab3(dimx_name VARCHAR(32), dimx_source VARCHAR(32),
                                 dimy_name VARCHAR(32), dimy_source VARCHAR(256),
                                 expr VARCHAR(32))
4  BEGIN
5          SET @col_list_expr = 'SELECT  GROUP_CONCAT(DISTINCT '
                || '\'SUM(CASE ' || dimx_name || ' WHEN \'\'\'||'
                || dimx_name || '||\'\'\' THEN ' || expr || ' ELSE 0 END) '
                || ' AS  \'\'\'||'||dimx_name||'||\'\'\'   \')INTO @col_list ' || dimx_source;
                
6          PREPARE col_list_pc FROM @col_list_expr;
7          EXECUTE col_list_pc;
8          DEALLOCATE PREPARE col_list_pc;

9          SET @xtab_expr = 'SELECT '||dimy_name||','||@col_list
                || ', SUM('||expr||') AS Total '|| dimy_source ||' GROUP BY '||dimy_name;
                                
10        PREPARE xtab FROM @xtab_expr;
11        EXECUTE xtab;
12        DEALLOCATE PREPARE xtab;

13  END //
14  SET sql_mode=''//
15  DELIMITER ;

[5] Usage of global variable strikes the eye here. In this way we evade a mistake which makes using local variable in command PREPARE impossible. Generating of SELECT by another SELECT is the fastest but also the least readable way of solution of this problem, as you can see on your own. [7, 11] Execution of prepared command – if SQL command is generated during the procedure running, it´s called dynamic command. [7] It generates a list of parameters, [9] and adds a description and column Total to the list. Dynamic commands must not contain local variables which are out of sight. But they can contain global variables in places of parametres or in the part INTO. Dynamic SQL command must be compiled before initialization – command PREPARE [6, 10]. Generated SELECT will be saved in global variable @xtab_expr (SELECT @xtab_expr);

mysql> CALL xtab3('gender','FROM employees', 'shop','FROM employees INNER JOIN shops USING (shop_id)','salary');
+----------+-------+------+--------+
| shop     | m     | f    | Total  |
+----------+-------+------+--------+
| London   | 10300 |    0 |  10300 |
| New York |     0 | 5600 |   5600 |
| Zurich   |  4500 | 4700 |  10200 |
+----------+-------+------+--------+
3 rows in set (0,00 sec)

Query OK, 0 rows affected (0,00 sec)

Although this procedure does some work, it contains no condition and no loop. Similar functions can´t compensate specialized tools and neither they try it. In cases they are enough, they save to you the work with OLAP configuration. I searched for similar examples of clever stored procedures on the Internet but I didn´t find it. Neither I found commercial libraries - that surprised me. I hoped naively that the situation would get better with the coming of MySQL5. If you know about some useful and nice procedure or a pack, tell me. Note: There´s a very few good libraries of stored procedures (not only for SQL).

Index optimalization

I found one nice and simple procedure on Markus Popp´s blog. MySQL enables to set the length of index. It´s not practical to index strings in their whole length. Mostly we have sufficient ratio of selectivity after first n (n < 20) signs. But how to find out the optimal length? Markus comes with a ratio (he calls it uniqueness) which is computed as ratio between the number of unambiguously identifiable rows and total number of rows.

	
SELECT count(distinct left(field_name, indexed_length)) / 
   count(*) * 100 FROM table_name;

I modified Marcus´s code a bit. The code uses one temporary table which is created in procedure. In the end select from this table is executed. This technique is often used in Microsoft SQL Server. But there´s one little difference. In T-SQL the temporary table has limited lifetime on transaction in which it was created. In MySQL there´s nothing similar so we have to remove the temporary table explicitly before termination.

1  DELIMITER //
2  SET sql_mode=ansi //
3  DROP PROCEDURE IF EXISTS getUniqueness //

4  CREATE PROCEDURE getUniqueness(IN _table VARCHAR(255),
5  	IN _column VARCHAR(255), IN _noFrom INT, IN _noTo INT)
6  BEGIN
7  	DROP TEMPORARY TABLE IF EXISTS tt_uniqueness;
8	CREATE TEMPORARY TABLE tt_uniqueness
9		(noChar int unsigned not null, uniqueness decimal(10,2) not null);

10	SET @sql = 'insert into tt_uniqueness ' 
11		|| 'select ?, cast(count(distinct left(' || _column || ', ?)) / count(*) * 100 as decimal(10,2)) from '
12		||  _table;
	
13	PREPARE pSql FROM @sql;
14	SET @count = _noFrom;
	
15	WHILE @count < _noTo DO
16		EXECUTE pSql USING @count, @count;
17		SET @count = @count + 1;
18	END WHILE;
	
19	SELECT * FROM tt_uniqueness;
20	DROP TEMPORARY TABLE tt_uniqueness;
21  END//
22  DELIMITER ;

What´s going on here? Again we use dynamic SQL command because the name of the table can´t be programmed in static SQL command [10]. Generated dynamic command uses bound variables – symbol ?, and phrases USING in command EXECUTE [16]. Language PSM (Persistent Stored Module) in MySQL doesn´t contain loop FOR (ANSI SQL3/PSM contains it – for the iteration over the table) so we have to work with classical WHILE, END WHILE – actually it´s the first example of loop construction [15]. By free SELECT we get the result [19].

mysql> CALL getUniqueness('phpbb_posts_text', 'post_text', 4, 10);
+--------+------------+
| noChar | uniqueness |
+--------+------------+
|      4 |      51.18 |
|      5 |      59.41 |
|      6 |      67.94 |
|      7 |      71.47 |
|      8 |      78.82 |
|      9 |      84.12 |
|     10 |      85.00 |
+--------+------------+
10 rows in set (0.03 sec)

I add an example of calling this procedure in Perl (again it´s short cut of errors check) by the library DBI.

use strict; use DBI;
my $dbh->DBI->connect(' ', {RaiseError => 1, AutoCommit => 0});
my $sth = $dbh->prepare("call getUniqueness(?,?,?,?)");
$sth->bind_param(1, $tbname, $DBI::SQL_VARCHAR);
$sth->bind_param(2, $colname, $DBI::SQL_VARCHAR);
$sth->bind_param(3, $nofrom, $DBI::SQL_INTEGER);
$sth->bind_param(4, $noTo, $DBI::SQL_INTEGER);
$sth->execute();
$sth->finish();
$dbh->disconnect();

Recursion

When I first met with recursion, I didn´t understand at all what´s going on. Most of compilers and interpreters didn´t support it and people usually looked for nonrecursive solutions. Old times – MicroBases of Pascal, TurboPascal – I am glad I can think back on it. Although I avoided something – the times of punched cards, computers Ural. Era when computation of salaries for ten thousand people took whole night and it was a great adventure for operators. I heard that somewhere one older lady looked on the result and if something didn´t seem very well, it had to be computed again until she was satisfied. Now I´m sitting at super notebook, maybe 1000x faster than CP/M on my Didaktic, and I would change immeadiately. End of nostalgia. Recursion is here. It´s not abstract construction, you can have a look around you. One of database systems branches, living up to this day, but not so striking ( no matter what a few fanatics think), respects it and supports very effective work with trees. They are so called network databases. Relational databases are far not so well-equipped for recursion (there are two distributions – Oracle and ANSI which solve this problem by some means). Open Source databases are 100% unprepared. There´s a patch for PostgreSQL (supporting both syntaxes: Oracle, ANSI) but it was refused. Recursion should be supported by Firebird 3.0. What´s going on? Mostly on task of researching the tree in width or depth. Without the support on SQL level we must solve this problem on application level or by stored procedures. Remark: There are at least three solutions of storage of recursive data, when we can get a list of descendents without recursive calling of procedure.

Implicitly, MySQL doesn´t allow recursive calling of procedures. But this is not a problem. We can easily change the system variable max_sp_recursion_depth:

  
set @@max_sp_recursion_depth=32;

In SQL/PSM, the exceptions trapping plays an important role. It´s a specificity of this language and the way how the exceptions are handled is specific too. Unlike the most of other languages where constructions a la structured exceptions are used, SQL/PSM uses error handlers – subroutines which are activated during an error. It´s possible to catch both concrete and arbitrary errors. But I don´t recommend it because in MySQL there hasn´t been a method how to detect the reason of an error so you get to know nothing about the error. E.g. iteration over the result of demand is solved in SQL/PSM in the following way:

1   BEGIN
2     DECLARE done BOOLEAN DEFAULT false;
3     DECLARE v_a, v_b VARCHAR(20);
4     DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = true;
5     DECLARE c CURSOR FOR SELECT a,b FROM mytab;
6     OPEN c;
7     FETCH c INTO v_a, v_b;
8     WHILE NOT done DO
9       FETCH c INTO v_a, v_b;
10    END WHILE;
11    CLOSE c;
12  END;    

In the case the cursor encounters the end of the set in command FETCH [7, 9], the service of error NOT FOUND is activated. In this case it´s so called CONTINUE HANDLER, i.e. the error handling is run (set of variable done [4]) and it continues with following command [8, 10]. Command FETCH reads one row from input set of opened [6] cursor. Unloaded columns get stored into local variables v_a and v_b. I add, in MySQL the cursors are used for sequential access to data. Opened cursor is released by command CLOSE [1].

Functionally same code demonstrating the loop REPEAT UNTIL:

1   BEGIN
2     DECLARE done BOOLEAN DEFAULT false;
3     DECLARE v_a, v_b VARCHAR(20);
4     DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = true;
5     DECLARE c CURSOR FOR SELECT a,b FROM mytab;
6     OPEN c;
7     REPEAT
8       FETCH c INTO v_a, v_b;
9       IF not done THEN
          ...
10      END IF;
11    UNTIL done END REPEAT;
12    CLOSE c;
13  END; 

Lingual purists and experts might argue which option is more elegant and better. I think they both are the same. The best option is that one which SQL hasn´t supported yet – using of construction FOR. Following command is a modification of an example from company documentation of Petr Gultuzan. I removed only nonlogical errors trapping which is the style I´m really not about to promote in public. Following two procedures make a list of descendents of entered item. At first we create a temporary table [8] which fills with recursive calling of second procedure [12]. Then a free SELECT of this temporary table [14] is executed (i.e. by this way we get the result out of the procedure) and the temporary table is deleted [15]. Second procedure isn´t no robust. If the data didn´t represent the tree, it would come to wrong result or circular error. Second procedure iterates over descendents [28] – stores the record into temporary table (table with results [32]) and recursively initializes itself for search for descendents [33].

1   DELIMITER //
2   DROP PROCEDURE IF EXISTS hierarchy//
3   CREATE PROCEDURE hierarchy (start_with CHAR(10))
4   BEGIN
5     DECLARE v_person_id, v_father_id INT;
6     DECLARE v_person_name CHAR(20);
7     DROP TABLE IF EXISTS Temporary_Table;
8     CREATE TEMPORARY TABLE Temporary_Table (
        person_id INT,
        person_name CHAR(20),
        father_id INT,
        level INT
      );
9     SELECT person_id, person_name INTO v_person_id, v_person_name
        FROM Persons WHERE person_name = start_with LIMIT 1;
10    IF NOT v_person_id IS NULL THEN
11      INSERT INTO Temporary_Table VALUES
          (v_person_id, v_person_name, v_father_id, 0);
12      CALL hierarchy2(v_person_id, 1);
13    END IF;
14    SELECT person_id, person_name, father_id, level 
        FROM Temporary_Table ORDER BY level;
15    DROP TEMPORARY TABLE Temporary_Table;
16  END; //
17  DELIMITER ;

18  DELIMITER //
19  DROP PROCEDURE IF EXISTS hierarchy2//
20  CREATE PROCEDURE hierarchy2(start_with INT, level INT)
21  BEGIN
22    DECLARE v_person_id, v_father_id INT;
23    DECLARE v_person_name CHAR(20);
24    DECLARE done BOOLEAN DEFAULT FALSE;
25    DECLARE c CURSOR FOR
        SELECT person_id, person_name, father_id
        FROM Persons WHERE father_id = start_with;
26    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
27    OPEN c;
28    REPEAT
29      SET v_person_id=NULL;
30      FETCH c INTO v_person_id, v_person_name, v_father_id;
31      IF done=FALSE THEN
32        INSERT INTO Temporary_Table VALUES
            (v_person_id, v_person_name, v_father_id, level);
33        CALL hierarchy2(v_person_id,level+1);
34      END IF;
35    UNTIL done END REPEAT;
36    CLOSE c;
37  END; //
38  DELIMITER ;

To initialize the code we need test data:

CREATE TABLE Persons (
  person_id INT,
  person_name CHAR(20),
  father_id INT
);
INSERT INTO Persons VALUES (1,'Grandpa',NULL);
INSERT INTO Persons VALUES (2,'Pa-1',1),(3,'Pa-2',1);
INSERT INTO Persons VALUES (4,'Grandson-1',2),(5,'Grandson-2',2);

After initialization we get the table:

mysql> call hierarchy('Grandpa');
call hierarchy('Grandpa');
+-----------+-------------+-----------+-------+
| person_id | person_name | father_id | level |
+-----------+-------------+-----------+-------+
| 1         | Grandpa     | NULL      | 0     |
| 2         | Pa-1        | 1         | 1     |
| 3         | Pa-2        | 1         | 1     |
| 4         | Grandson-1  | 2         | 2     |
| 5         | Grandson-2  | 2         | 2     |
+-----------+-------------+-----------+-------+
5 rows in set (0,00 sec)

Query OK, 0 rows affected, 1 warning (0,00 sec)

Database growth monitoring

The inspiration for following example was Oracle XE. I must say that www interface surprised me pleasantly. At previous versions I felt like nobody - apart from a few students and desperados - could work with them and that others rather used command prompt. One of the functions is displaying of table size and growth statistics. I made a stored procedure which displays ten largest tables and ten fastest growing tables. Besides, it´s also an illustration of eventual using of multirecordset. The first task is gaining and archiving data about table size. Generally these values are gained from system tables. ANSI SQL brings so called information schema which MySQL implements. Basically they are standardized views into system tables. Thanks to them one problem disappears in application portation – searching through database structure. Command SELECT table_name FROM information_schema.tables will work on PostgreSQL, MySQL and all other databases which respect ANSI SQL at this point.

1   DELIMITER //
2   DROP PROCEDURE IF EXISTS set_stat //
3   CREATE PROCEDURE set_stat (schma VARCHAR(64))
4   BEGIN
5     DECLARE done, cursor_done BOOLEAN DEFAULT FALSE;
6     DECLARE v_table_name VARCHAR(64);
7     DECLARE v_table_rows, v_data_length INTEGER;
8     DECLARE v_hist_rows, v_hist_length INTEGER;
9     DECLARE c CURSOR FOR
10      SELECT table_name, table_rows, data_length
        FROM information_schema.tables WHERE table_schema = schma;
11    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
12    IF (SELECT table_name FROM information_schema.tables
        WHERE table_name = 'size_log') IS NULL THEN
13      CREATE TABLE size_log (
          table_schema VARCHAR(64),
          table_name VARCHAR(64),
          table_rows INTEGER,
          data_length INTEGER,
          inserted TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );
15    END IF;
16    OPEN c;
17    REPEAT
18      FETCH c INTO v_table_name, v_table_rows, v_data_length;
19      IF v_table_name <> 'size_log' THEN
13        SET cursor_done = done;
14        SET v_hist_rows = NULL;
15        SELECT table_rows, data_length INTO v_hist_rows, v_hist_length
            FROM size_log WHERE table_name = v_table_name AND table_schema = schma
            ORDER BY inserted DESC LIMIT 1;
16        SET done = false; -- it´s necessary to reset the variable
17        -- I add only new records or changes
18        IF (v_hist_rows IS NULL)
            OR (v_hist_rows IS NOT NULL
            AND v_table_rows <> v_hist_rows AND v_data_length <> v_hist_length) THEN
19          INSERT INTO size_log(table_schema, table_name, table_rows, data_length)
              VALUES(schma, v_table_name, v_table_rows, v_data_length);
20        END IF;
21     END IF;
22   UNTIL cursor_done END REPEAT;
23   CLOSE c;
24  END //
25  DELIMITER ;

Procedure set_stat stores tabel status of set schema. In case there´s no table size_log [12], [13] is created. Then the iteration over all tables [17] apart from table size_log. I compare actual data to archived values [18] and if something changed there, I save actual data about the table size [19]. Regarding that both command FETCH [18] and command SELECT INTO [15] can give me flag NOT FOUND (and indirectly content of variable done), I must back up the content of variable done [13] and reset it repeatedly [16]. Next, in the case that SELECT INTO ends unsuccessfully (NOT FOUND), the content of variables v_hist_rows and v_hist_length [15] is not rewritten. Because I use the variable v_hist_rows for detection if there’s already a record in table size_log, I have to set it on NULL [14].

1   DELIMITER //
2   DROP PROCEDURE IF EXISTS stat //
3   CREATE PROCEDURE stat (sort_by CHAR(1), schma VARCHAR(64))
4   BEGIN
5     DECLARE done, cursor_done BOOLEAN DEFAULT FALSE;
6     DECLARE v_table_name VARCHAR(64);
7     DECLARE v_table_rows, v_data_length INTEGER;
8     DECLARE v_hist_rows, v_hist_length INTEGER;
9     DECLARE c CURSOR FOR
10      SELECT table_name, table_rows, data_length
          FROM information_schema.tables WHERE table_schema = schma;
11    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
12    DROP TABLE IF EXISTS tmp_table;
13    CREATE TEMPORARY TABLE tmp_table (
        table_schema VARCHAR(64),
        table_name VARCHAR(64),
        table_rows INTEGER,
        data_length INTEGER
      ); 
14    IF (SELECT table_name FROM information_schema.tables
        WHERE table_name = 'size_log') IS NULL THEN
15      CREATE TABLE size_log (
          table_schema VARCHAR(64),
          table_name VARCHAR(64),
          table_rows INTEGER,
          data_length INTEGER,
          inserted TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    );
16    END IF;
17    OPEN c;
18    REPEAT
19      FETCH c INTO v_table_name, v_table_rows, v_data_length;
20      SET cursor_done = done;
21      SET v_hist_rows = NULL;
22      SELECT table_rows, data_length INTO v_hist_rows, v_hist_length
          FROM size_log WHERE table_name = v_table_name AND table_schema = schma
          ORDER BY inserted DESC LIMIT 1;
23      SET done = false; -- it´s necessary to reset the variable
24      IF v_hist_rows IS NOT NULL
          AND v_table_rows <> v_hist_rows AND v_data_length <> v_hist_length THEN
25        INSERT INTO tmp_table VALUES(schma, v_table_name,
            v_table_rows - v_hist_rows, v_data_length - v_hist_length);
26      END IF;
27    UNTIL cursor_done END REPEAT;
28    CLOSE c;
29    CASE sort_by
30      WHEN 'r' THEN
31        SELECT table_schema, table_name, table_rows, data_length
            FROM information_schema.tables WHERE table_schema = schma
            ORDER BY table_rows DESC LIMIT 10;
32        SELECT * FROM tmp_table ORDER BY table_rows DESC LIMIT 10;
33      WHEN 'l' THEN
34        SELECT table_schema, table_name, table_rows, data_length
35          FROM information_schema.tables WHERE table_schema = schma
            ORDER BY data_length DESC LIMIT 10;
36        SELECT * FROM tmp_table ORDER BY data_length DESC LIMIT 10;
37    END CASE;
38    DROP TABLE tmp_table;
39  END; //
40  DELIMITER ;

In procedure stat there’s almost nothing more to comment. It takes ten largest tables directly from information schema. I get ten fastest growing tables dynamically as a difference of values from state of actual table and state stored in table size_log. I would mention only the fact that construction CASE in SQL/PSM is not the same as SQL command CASE. You can see the result in following extract:

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1 to server version: 5.1.9-beta-log

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> call stat('l','mysql');
call stat('l','mysql');
+--------------+---------------+------------+-------------+
| table_schema | table_name    | table_rows | data_length |
+--------------+---------------+------------+-------------+
| mysql        | help_topic    | 464        | 253564      |
| mysql        | help_keyword  | 384        | 75648       |
| mysql        | help_category | 37         | 21497       |
| mysql        | help_relation | 740        | 6660        |
| mysql        | proc          | 8          | 5852        |
| mysql        | size_log      | 32         | 984         |
| mysql        | db            | 2          | 880         |
| mysql        | user          | 4          | 224         |
| mysql        | Persons       | 5          | 145         |
| mysql        | fx2           | 7          | 49          |
+--------------+---------------+------------+-------------+
10 rows in set (0,26 sec)

+--------------+------------+------------+-------------+
| table_schema | table_name | table_rows | data_length |
+--------------+------------+------------+-------------+
| mysql        | fx2        | 1          | 7           |
+--------------+------------+------------+-------------+
1 row in set (0,27 sec)

Query OK, 0 rows affected, 1 warning (0,28 sec)

If I wanted to make the last example even better, I would set the automatic initiation of procedure set_stat by Event Scheduler. Event Scheduler (from the version 5.1) is an analogy of Jobs in SQL server or crontab in Unix. Event is a database object containing one or more SQL commands which are executed in specified time or repeatedly in set period. By default the event planner is blocked and we must permit it explicitely: SET @@global.event_scheduler = ON. You need the superuser´s rights to run this command. To try, I give an example which adds a record to table test every minute. Nevertheless, at least in my installation there´s an error in this function. The events got activated but in certain moment the associated SQL command was done several times (when I used one-second interval). With a minute interval there were no problems.

CREATE TABLE testt(t TIMESTAMP);
CREATE EVENT into_testt
  ON SCHEDULE EVERY 1 MINUTE DO
    INSERT INTO testt VALUES(CURRENT_TIMESTAMP);

Sure, we won´t call the function set_stat every minute. Let us suppose once a week. Then we can define the event by command:

CREATE EVENT call_set_stat
  ON SCHEDULE EVERY 1 WEEK STARTS '2006-07-01 13:00:00' DO
    CALL set_stat();

In the end I give one pledged example of c# code (mono) for calling stored procedure. In Linux there are two drivers. Free available ByteFX is no more developed (and also doesn´t support stored procedures) but we can get it as RPM pack. Therefore I target the second driver. Driver MySQL connector/NET supports SP but we have to install it manually (only version 1.0.7 and more work even out of o.s. MS Windows). If we get the library MySQL.Data.DII (from mysql.com), we must register it by command:

gacutil -i MySql.Data.dll

When I tried to call stored procedures methodologically, I found a mistake again. It can be caused by using developmental version of MySQL. Therefore I call the procedure clasically by SQL command CALL [13]. One calling of Reader [16] gives me two recordsets. After reading one recordset I move to second recordset by calling method NextResult() [32]. Otherwise, this example represents exemplary reading of recordset by ADO.NET.

1   using System;
2   using System.Data;
3   using MySql.Data.MySqlClient;

4   public class Test
5   {
6      public static void Main(string[] args)
7      {
8        string connectionString = "Database=mysql; User ID=root;";
9        IDbConnection dbcon;
10       dbcon = new MySqlConnection(connectionString);
11       dbcon.Open();
12       IDbCommand dbcmd = dbcon.CreateCommand();
13       dbcmd.CommandText = "CALL stat(?type,?scheme)";
14       dbcmd.Parameters.Add( new MySqlParameter("?type","l"));
15       dbcmd.Parameters.Add( new MySqlParameter("?scheme","mysql"));

16       IDataReader reader = dbcmd.ExecuteReader();

17       Console.WriteLine("".PadRight(62,'-'));
18       Console.WriteLine(" {0,-15} | {1,-15} | {2,10} | {3,10}",
                           "table schema",
                           "table name", "table rows", "data length");
19       Console.WriteLine("".PadRight(62,'-'));
20       Console.WriteLine(" Largest tables ");
21       Console.WriteLine("".PadRight(62,'-'));

22       while(reader.Read()) {
23         string table_schema = (string) reader["table_schema"];
24         string table_name = (string) reader["table_name"];
25         long table_rows = (long) reader["table_rows"];
26         long data_length = (long) reader["data_length"];
27         Console.WriteLine(" {0,-15} | {1,-15} | {2,10:g} | {3,10:g}",
                     table_schema,
                     table_name, table_rows, data_length);
28       }

29       Console.WriteLine("".PadRight(62,'-'));
30       Console.WriteLine(" Fastest growing tables ");
31       Console.WriteLine("".PadRight(62,'-'));

32       reader.NextResult();
33       while(reader.Read()) {
34         string table_schema = (string) reader["table_schema"];
35         string table_name = (string) reader["table_name"];
36         int table_rows = (int) reader["table_rows"];
37         int data_length = (int) reader["data_length"];
38         Console.WriteLine(" {0,-15} | {1,-15} | {2,10:g} | {3,10:g}",
                     table_schema,
                     table_name, table_rows, data_length);
39       }

40       reader.Close();
41       reader = null;
42       dbcmd.Dispose();
43       dbcmd = null;
45       dbcon.Close();
46       dbcon = null;
47     }
48  }

We translate and execute the script by a couple of commands:

mcs mono-test.cs -r:System.Data.dll -r:MySql.Data.dll
mono mono-test.exe

In this article I didn´t describe the entire functionality of stored procedures. I missed out e.g. triggers, protection, characteristic of function (DETERMINISTIC|NON DETERMINISTIC|MODIFIES SQL DATA etc).... Neither I tried to describe it. I haven´t met any person yet who would use SP in MySQL5 in real applications. At present it´s so hot new and I guess that it will take at least one year until all mistakes are debugged and until programmers get used to it – see the coming of PHP5.


Google AdSense