menu
  Home  ==>  papers  ==>  db  ==>  sql_parser   

SQL Parser - John Felix COLIBRI.

  • abstract : This recursive descent parser built from an IEBNF SQL grammar will parse a subset of Interbase SQL requests.
  • key words : SQL - Parser - SQL grammar - BNF - EBNF - IEBNF - Interbase
  • software used : Windows XP, Delphi 6
  • hardware used : Pentium 1.400Mhz, 256 M memory, 140 G hard disc
  • scope : Delphi 1 to 2005 for Windows, Kylix
  • level : Delphi developer
  • plan :


1 - Introduction

For the analysis of the Database schema extracted from an application's .DFMs, we had to analyze the content of several SQL statements. Basically we wanted to get the tables (FROM), the columns (SELECT) and the Master Detail links if any.

So we build a small SQL parser which allowed a quick analysis of the statements.




2 - The SQL Grammar

2.1 - The Hopeless Grammar

To build our parser, we started from a grammar, as we always do (cf the .DFM parser).

SQL is quite a complex language. Not in terms of concepts, but in the way all kind of functionalities ("features") were piled up, layer upon layer, years after years, on top of the language. Since it was supposed to be used as the sole all purpose language in Banks and Insurance companies, it has been enriched with all kinds of warts and hacks.

A full blown SQL grammar cannot exist, since the next day all kind of SQL engine providers will promptly add two or three new features, making sure that they stay incompatible with each other.

Sure standards have been proposed (SQL 92 etc), but I believe that no single established SQL engine uses any standard (implementing all and only all the standard). In addition, the puslished BNF grammars will disgust you to look at any grammar for the rest of your life: 1000 lines of rules or more, for something which, after all, is just a simple non-procedural data manipulation language with a tiny bit of arithmetic.

To avoid being dragged into a lost battle, I only use portions of SQL grammars which serve my immediate needs. I use whatever part of SQL that my application requires, and generate the parser from this ad hoc grammar.



2.2 - The Interbase SQL grammar

In this presentation, I started from the Interbase 6 SQL Reference page. This page presents an overview of the SQL language used by Interbase, with fragments of EBNF grammar embedded in the text. Those portions were extracted, some typos removed, and the whole lot was reorganized in IEBNF (Indented Extended Backus Naur Formalism). The result is around 519 lines long (27 K).

For our application, only the data manipulation part was necessary: SELECT, INSERT and UPDATE. So we removed the other parts (CREATE, ALTER, GRANT etc), which yielded the grammar used in this paper, which is about 50 lines long.



2.3 - The Data Manipulation Grammar

Here is the grammar that we used:

 
sqlinsert | select | update  .
  value_litteralVALUE_LITTERAL | NAME .
  integer_litteralINTEGER .

  table_or_view_nameNAME .
  name_view_procedureNAME .

  column_nameNAME .
  collation_nameNAME .
  alias_nameNAME .

  selectF .
  select_expressionselect .

  insertINSERT INTO table_or_view_name
        [ '(column_name { ',column_name } ')' ]
        ( VALUES '(value_litteral { ',value_litteral } ')' | select_expression ) .

  functionsaverage | count | max | min | sum | upper .
    averageAVG '(' [ ALL | DISTINCT ] value_litteral ').
    countCOUNT '(' '*' | [ ALL | DISTINCT ] value_litteral ').
    maxMAX '(' [ ALL | DISTINCT ] value_litteral ').
    minMIN '(' [ ALL | DISTINCT ] value_litteral ').
    sumSUM '(' [ ALL | DISTINCT ] value_litteral ').
    upperUPPER '(value_litteral ').

  search_conditionsearch_value { ( OR | AND ) search_condition } .
    search_value=
          value_litteral
            ( [ NOT ] ( between | like | in | compare | containing | starting )
              | IS [ NOT ] NULL )
        | ( ALL | SOME | ANY ) '(select_column_list ')'
        | EXISTS '(select_expression ')'
        | SINGULAR '(select_expression ')'
        | '(search_condition ')'
        | NOT search_condition .
      select_one_columnselect .
      select_column_listselect .

      betweenBETWEEN value_litteral AND value_litteral .
      likeLIKE value_litteral [ ESCAPE value_litteral ].
      inIN '(value_litteral { ',value_litteral } | select_column_list ').
      compareoperator ( value_litteral | '(select_one_column ')' ) .
        operator= '=' | '<' | '>' | '<=' | '>=' | '<>.
      containingCONTAINING value_litteral .
      startingSTARTING [ WITH ] value_litteral .

  selectSELECT [ DISTINCT | ALL ]
      ( '*' | functions | value_litteral { ',value_litteral } )
      FROM from_table_reference { ',from_table_reference }
      [ WHERE search_condition ]
      [ GROUP BY column_name
        [ COLLATE collation_name ] { ',column_name [ COLLATE collation_name ] } ]
      [ HAVING search_condition ]
      [ UNION select_expression [ ALL ] ]
      [ ORDER BY order_list ] .
    // -- DUPS: NAME, co-recursive
    from_table_referenceNAME procedure_end | joined_table .
      procedure_end= [ '(value_litteral { ',value_litteral } ')' ] [ alias_name ] .
      joined_table= ( name_view_procedure join_on | '(joined_table ')' ) { join_on } .
        join_onjoin_type ( joined_table | name_view_procedure ) ON search_condition .
          join_type= ( [ INNER | { LEFT | RIGHT | FULL } [OUTER] ] ) JOIN .

    order_list= ( column_name | integer_litteral ) [ COLLATE collation_name ]
        [ ascending_or_descending ] { ',order_list } .
      ascending_or_descendingASC | ASCENDING | DESC | DESCENDING .

  updateUPDATE table_or_view_name
        SET column_name '=value_litteral
        { ',column_name '=value_litteral }
        [ WHERE search_condition ] .



Just a couple of warnings:

  • I did not try to sort the different table, view, column and procedure NAMEs. For instance, after SELECT we accept "VALUE_LITTERAL", which naturally includes column name, but this "VALUE_LITTERAL" is used in many other places. It would be useful to define specific non-terminals, which would indicate what kind of name or expression is allowed.
  • the grammar is not LL1 (one symbol look ahead). Most conspicuously is the FROM part, where the table name, the stored procedure call and the JOIN expression all begin with a NAME. We started to factor this NAME out, but the result was a mess. So I kept the grammar simple, and generated the parser from this ambiguous grammar. The automatically generated LL1 parser also contains this ambiguity (and possibly other) and I will try to sort this out if and when I will stumble upon a parsing error.



3 - The SQL Parser

3.1 - The Symbol definitions

We placed all the symbols in a UNIT with the following INTERFACE (partial):

type t_sql_symbol_type= (e_unknown_symbol,
                         e_integer_litteral_symbol,
                         e_double_litteral_symbol,
                         e_string_litteral_symbol,

                         e_start_punctuation,
                           e_multiply_symbol// *
                           e_semi_colon_symbol// ;
                           // ...
                           e_different_symbol// <>
                         e_end_punctuation,

                         e_identifier_start,
                           e_SELECT_symbol,
                           e_INSERT_symbol,
                           // ...
                           e_UPPER_symbol,
                         e_identifier_symbol,

                         e_end_of_parse_symbol
                     );

      t_sql_symbol_type_setset of t_sql_symbol_type;

const k_sql_litteral_symbol_set= [e_integer_litteral_symbol,
          e_double_litteral_symbole_string_litteral_symbol];
      k_sql_litteral_and_identifier_symbol_setk_sql_litteral_symbol_set
          + [e_identifier_symbole_colon_symbol];

function f_sql_symbol_type_name(p_sql_symbol_typet_sql_symbol_type): String;
function f_sql_symbol_type_set_name(
    p_sql_symbol_type_sett_sql_symbol_type_set): String;
function f_sql_identifier_type(p_symbol_stringString): t_sql_symbol_type;



3.2 - The SQL Scanner

Here is the Scanner definition:

 c_sql_scannerclass(c_text_file)
                  m_sql_symbol_typet_sql_symbol_type;
                  m_blank_stringm_symbol_stringString;

                  m_display_symbol_readBoolean;

                  constructor create_sql_scanner(p_namep_file_nameString);
                  function f_initializedBoolean;
                  function f_read_symbolBoolean;
                  procedure test_sql_scanner;
                end// c_sql_scanner

with the following main symbol fetching function (partial):

function c_sql_scanner.f_read_symbolBoolean;

  // ...

  begin // f_read_symbol
    m_symbol_string:= '';
    m_sql_symbol_type:= e_unknown_symbol;

    if f_end_of_text
      then begin
          Result:= False;
          m_sql_symbol_type:= e_end_of_parse_symbol;
        end
      else begin
          get_blanks;

          if m_buffer_indexm_buffer_size
            then begin
                case m_pt_buffer[m_buffer_indexof
                  'a'..'z''A'..'Z' : get_identifier;
                  '-''0'..'9' : get_number;
                  '''' : get_string_litteral;

                  '*' : get_one_operator(e_multiply_symbol);
                  ':' : get_one_operator(e_colon_symbol);
                  '.' : get_one_operator(e_point_symbol);
                  ';' : get_one_operator(e_semi_colon_symbol);
                  ',' : get_one_operator(e_comma_symbol);
                  '=' : get_one_operator(e_equal_symbol);

                  '<' : if (m_buffer_index+ 1< m_buffer_size)
                            and (m_pt_buffer[m_buffer_index+ 1]= '=')
                          then get_two_operator(e_lower_or_equal_symbol)
                          else
                            if (m_buffer_index+ 1< m_buffer_size)
                                and (m_pt_buffer[m_buffer_index+ 1]= '>')
                              then get_two_operator(e_different_symbol)
                              else  get_one_operator(e_lower_symbol);
                  '>' : if (m_buffer_index+ 1< m_buffer_size)
                            and (m_pt_buffer[m_buffer_index+ 1]= '=')
                          then get_two_operator(e_greater_or_equal_symbol)
                          else get_one_operator(e_greater_symbol);

                  '(' : get_one_operator(e_opening_parenthesis_symbol);
                  ')' : get_one_operator(e_closing_parenthesis_symbol);
                  else
                    display_bug_stop('unknown_char_in_sql 'm_pt_buffer[m_buffer_index]
                        + '< 'IntToStr(Ord(m_pt_buffer[m_buffer_index])));
                end// case
              end
            else m_sql_symbol_type:= e_end_of_parse_symbol;

          Result:= True;
        end// not eof
  end// f_read_symbol



3.3 - The SQL parser

The Parser is defined by:

 c_sql_parserClass(c_basic_object)
                 m_c_sql_scanner_refc_sql_scanner;

                 constructor create_sql_parser(p_nameString);

                   procedure display_error(p_textstring);
                   procedure start_parsing;
                   function f_symbol_typet_sql_symbol_type;
                   function f_symbol_stringString;
                   procedure read_symbol;
                   procedure check(p_sql_symbol_typet_sql_symbol_type);
                   procedure check_set(p_sql_symbol_type_sett_sql_symbol_type_set);
                 procedure parse_sql;
               end// c_sql_parser

As an example, here is the handling of INSERT INTO :

PROCEDURE parse_insert;
  BEGIN
    check(e_INSERT_symbol);
    read_symbol;
    check(e_INTO_symbol);
    read_symbol;

    parse_table_or_view_name;

    IF f_symbol_typee_opening_parenthesis_symbol
      THEN BEGIN
          read_symbol;
          parse_column_name;
          WHILE f_symbol_typee_comma_symbol DO
          BEGIN
            read_symbol;
            parse_column_name;
          END// WHILE {
          check(e_closing_parenthesis_symbol);
          read_symbol;
        END// IF [

    IF f_symbol_typee_VALUES_symbol
      THEN BEGIN
          read_symbol;
          check(e_opening_parenthesis_symbol);
          read_symbol;
          parse_value_litteral;
          WHILE f_symbol_typee_comma_symbol DO
          BEGIN
            read_symbol;
            parse_value_litteral;
          END// WHILE {
          check(e_closing_parenthesis_symbol);
          read_symbol;
        END 
      ELSE 
        IF f_symbol_typee_SELECT_symbol
          THEN
            parse_select_expression 
          ELSE display_error('VALUES, SELECT');
  END// parse_insert



3.4 - The Application Snapshot

Let us take the following SQL request (from IbMastApp):

 
SELECT *
  FROM parts
  WHERE (parts.OnOrder > parts.OnHand)

Here is a snapshot of the application while parsing the previous request:

sql parser snapshot



3.5 - Mini Manual

To use the parser with the actual grammar:
   eventually place your SQL request in the _data folder
   compile and execute
   navigate to the folder containing the SQL request (_data by default)
   click on the ASCII file to start the parsing
If you want to change the grammar (add other SQL statements):
  • modify the symbol definitions in u_sql_symbol_definitions
  • eventually modifiy the scanner if new litterals with special syntax are introduced ("?", "$", double quotes etc)
  • add the parsing procedures



4 - Download the Sources

Here are the source code files:

Those .ZIP files contain:
  • the main program (.DPR, .DOF, .RES), the main form (.PAS, .DFM), and any other auxiliary form
  • any .TXT for parameters
  • all units (.PAS) for units
Those .ZIP
  • are self-contained: you will not need any other product (unless expressly mentioned).
  • can be used from any folder (the pathes are RELATIVE)
  • will not modify your PC in any way beyond the path where you placed the .ZIP (no registry changes, no path creation etc).
To use the .ZIP:
  • create or select any folder of your choice
  • unzip the downloaded file
  • using Delphi, compile and execute
To remove the .ZIP simply delete the folder.



As usual:

  • please tell us at fcolibri@felix-colibri.com if you found some errors, mistakes, bugs, broken links or had some problem downloading the file. Resulting corrections will be helpful for other readers
  • we welcome any comment, criticism, enhancement, other sources or reference suggestion. Just send an e-mail to fcolibri@felix-colibri.com.
  • or more simply, enter your (anonymous or with your e-mail if you want an answer) comments below and clic the "send" button
    Name :
    E-mail :
    Comments * :
     

  • and if you liked this article, talk about this site to your fellow developpers, add a link to your links page ou mention our articles in your blog or newsgroup posts when relevant. That's the way we operate: the more traffic and Google references we get, the more articles we will write.



5 - Conclusion

We presented here a small SQL parser.




6 - References




7 - Other Papers with Source and Links

Database
database reverse engineering Extraction of the Database Schema by analyzing the content of the application's .DFMs
sql parser Parsing SQL requests in Delphi, starting from an EBNF grammar for SELECT, INSERT and UPDATE
ado net tutorial a complete Ado Net architectural presentation, and projects for creating the Database, creating Tables, adding, deleting and updating rows, displaying the data in controls and DataGrids, using in memory DataSets, handling Views, updating the Tables with a DataGrid
turbo delphi interbase tutorial develop database applications with Turbo Delphi and Interbase. Complete ADO Net architecture, and full projects to create the database, the Tables, fill the rows, display and update the values with DataGrids. Uses the BDP
bdp ado net blobs BDP and Blobs : reading and writing Blob fields using the BDP with Turbo Delphi
interbase stored procedure grammar Interbase Stored Procedure Grammar : The BNF Grammar of the Interbase Stored Procedure. This grammar can be used to build stored procedure utilities, like pretty printers, renaming tools, Sql Engine conversion or ports
using interbase system tables Using InterBase System Tables : The Interbase / FireBird System Tables: description of the main Tables, with their relationship and presents examples of how to extract information from the schema
eco tutorial Writing a simple ECO application: the UML model, the in memory objects and the GUI presentation. We also will show how to evaluate OCL expressions using the EcoHandles, and persist the data on disc
delphi dbx4 programming the new dbExpress 4 framework for RAD Studio 2007 : the configuration files, how to connect, read and write data, using tracing and pooling delegates and metadata handling
blackfishsql using the new BlackfishSql standalone database engine of RAD Studio 2007 (Win32 and .Net) : create the database, create / fill / read Tables, use Pascal User Defined Functions and Stored Procedures
rave pdf intraweb how to produce PDF reports using Rave, and have an Intraweb site generate and display .PDF pages, with multi-user access
embarcadero er studio Embarcadero ER Studio tutorial: how to use the Entity Relationship tool to create a new model, reverse engineer a database, create sub-models, generate reports, import metadata, switch to Dimensional Model
Web
sql to html converting SQL ascii request to HTML format
simple web server a simple HTTP web Server and the corresponding HTTP web Browser, using our Client Server Socket library
simple cgi web server a simple CGI Web Server which handles HTML <FORM> requests, mainly for debugging CGI Server extension purposes
cgi database browser a CGI extension in order to display and modify a Table using a Web Browser
whois a Whois Client who requests information about owners of IP adresses. Works in batch mode.
web downloader an HTTP tool enabling to save on a local folder an HTML page with its associated images (.GIF, .JPEG, .PNG or other) for archieving or later off-line reading
web spider a Web Spider allowing to download all pages from a site, with custom or GUI filtering and selection.
asp net log file a logging CLASS allowing to monitor the Asp.Net events, mainly used for undesrtanding, debugging and journaling Asp.Net Web applications
asp net viewstate viewer an ASP.NET utility displaying the content of the viewtate field which carries the request state between Internet Explorer and the IIS / CASSINI Servers
rss reader the RSS Reader lets you download and view the content of an .RSS feed (the entry point into somebody's blog) in a tMemo or a tTreeView. Comes complete with an .HTML downloader and an .XML parser
news message tree how to build a tree of the NNTP News Messages. The downloaded messages are displayed in tListBox by message thread (topic), and for each thread the messages are presented in a tTreeVi"ew
threaded indy news reader a NewsReader which presents the articles sorted by thread and in a logical hierarchical way. This is the basic Indy newsreader demo plus the tree organization of messages
delphi asp net portal programming presentation, architecture and programming of the Delphi Asp Net Portal. This is a Delphi version of the Microsoft ASP.NET Starter Kit Web Portal showcase. With detailed schemas and step by step presentation, the Sql scripts and binaries of the Database
delphi web designer a tiny Delphi "RAD Web Designer", which explains how the Delphi IDE can be used to generate .HTML pages using the Palette / Object Inspector / Form metaphor to layout the page content
intraweb architecture the architecture of the Intraweb web site building tool. Explains how Delphi "rad html generator" work, and presents the CLASS organization (UML Class diagrams)
ajax tutorial AJAX Tutorial : writing an AJAX web application. How AJAX works, using a JavaScript DOM parser, the Indy Web Server, requesting .XML data packets - Integrated development project
asp net master pages Asp.Net 2.0 Master Pages : the new Asp.Net 2.0 allow us to define the page structure in a hierarchical way using Master Pages and Content Pages, in a way similar to tForm inheritance
delphi asp net 20 databases Asp.Net 2.0 and Ado.Net 2.0 : displaying and writing InterBase and Blackfish Sql data using Dbx4, Ado.Net Db and AdoDbxClient. Handling of ListBox and GridView with DataSource components
asp net 20 users roles profiles Asp.Net 2.0 Security: Users, Roles and Profiles : Asp.Net 2.0 offers a vaslty improved support for handling security: new Login Controls, and services for managing Users, grouping Users in Roles, and storing User preferences in Profiles
bayesian spam filter Bayesian Spam Filter : presentation and implementation of a spam elimination tool which uses Bayesian Filtering techniques
TCP/IP
tcp ip sniffer project to capture and display the packets travelling on the Ethernet network of your PC.
sniffing interbase traffic capture and analysis of Interbase packets. Creation of a database and test table, and comparison of the BDE vs Interbase Express Delphi components
socket programming the simplest Client Server example of TCP / IP communication using Windows Sockets with Delphi
delphi socket architecture the organization of the ScktComp unit, with UML diagrams and a simple Client Server file transfer example using tClientSocket and tServerSocket
Object Oriented Programming Components
delphi virtual constructor VIRTUAL CONSTRUCTORS together with CLASS references and dynamic Packages allow the separation between a main project and modules compiled and linked in later. The starting point for Application Frameworks and Plugins
delphi generics tutorial Delphi Generics Tutorial : using Generics (parameterized types) in Delphi : the type parameter and the type argument, application of generics, constraints on INTERFACEs or CONSTRUCTORs
UML Patterns
the lexi editor delphi source code of the Gof Editor: Composite, Decorator, Iterator, Strategy, Visitor, Command, with UML diagrams
factory and bridge patterns presentation and Delphi sources for the Abstract Factory and Bridge patterns, used in the Lexi Document Editor case study from the GOF book
gof design patterns delphi source code of the 23 Gof (GAMMA and other) patterns: Composite, Decorator, Iterator, Strategy, Visitor, Command
Debug and Test
Graphic
delphi 3d designer build a 3d volume list, display it in perspective and move the camera, the screen or the volumes with the mouse.
writing a flash player build your own ShockWave Flash movie Player, with pause, custom back and forward steps, snapshots, resizing. Designed for analyzing .SWF demos.
Utilities
the coliget search engine a Full Text Search unit allowing to find the files in a directory satisfying a complex string request (UML AND Delphi OR Patters)
treeview html help viewer Treeview .HTML Help Viewer : the use of a Treeview along with a WebBrowser to display .HTML files alows both structuring and ordering of the help topics. This tool was used to browse the Delphi PRISM Wiki help.
Delphi utilities
delphi net bdsproj structure and analysis of the .BDSPROJ file with the help of a small Delphi .XML parser
dccil bat generator generation of the .BAT for the Delphi DCCIL command line compiler using the .BDSPROJ
dfm parser a Delphi Project analyzing the .DFM file and building a memory representation. This can be used for transformations of the form components
dfm binary to text a Delphi Project converting all .DFM file from a path from binary to ascii format
component to code generate the component creation and initialization code by analyzing the .DFM. Handy to avoid installing components on the Palette when examining new libraries
exe dll pe explorer presents and analyzes the content of .EXE and .DLL files. The starting point for extracting resources, spying .DLL function calls or injecting additional functionalities
dll and process viewer analyze and display the list of running processes, with their associated DLLs and Memory mapped files (Process Walker)
Controls
find memo a tMemo with "find first", "find next", "sort", "save" capabilities
Helper units
windows environment read and write Windows Environment strings
stdin stdout send and receive strings from a GUI application to a CONSOLE application




8 - The author

Felix John COLIBRI works at the Pascal Institute. Starting with Pascal in 1979, he then became involved with Object Oriented Programming, Delphi, Sql, Tcp/Ip, Html, UML. Currently, he is mainly active in the area of custom software development (new projects, maintenance, audits, BDE migration, Delphi Xe_n migrations, refactoring), Delphi Consulting and Delph training. His web site features tutorials, technical papers about programming with full downloadable source code, and the description and calendar of forthcoming Delphi, FireBird, Tcp/IP, Web Services, OOP  /  UML, Design Patterns, Unit Testing training sessions.
Created: nov-04. Last updated: dec-15 - 99 articles, 220 .ZIP sources, 1068 figures
Copyright © Felix J. Colibri   http://www.felix-colibri.com 2004 - 2015. All rigths reserved
Back:    Home  Papers  Training  Delphi developments  Links  Download
the Pascal Institute

Felix J COLIBRI

+ Home
  + articles_with_sources
    + database
      + interbase
      – firebird_trans_simulator
      + sql_server
      + bdp
      – db_refactoring
      – sql_parser
      – sql_to_html
      – sniffing_interbase
      – eco_tutorial
      – dbx4_programming
      – blackfishsql
      – rave_pdf_intraweb
      – rave_reports_tutorial
      – rave_reports_video
      – embarcadero_er/studio
      + firedac
      – bde_unidac_migration
    + web_internet_sockets
    + oop_components
    + uml_design_patterns
    + debug_and_test
    + graphic
    + controls
    + colibri_utilities
    + colibri_helpers
    + delphi
    + firemonkey
    + compilers
  + delphi_training
  + delphi_developments
  + sweet_home
  – download_zip_sources
  + links
Contacts
Site Map
– search :

RSS feed  
Blog