menu
  Home  ==>  papers  ==>  compilers  ==>  pascal_grammar   

Pascal Grammar - Felix John COLIBRI.

  • abstract : the Pascal ebnf grammar
  • key words : ebnf, iebnf, compiler, pascal
  • software used : Windows XP, Delphi 6
  • hardware used : Pentium 1.400Mhz, 256 M memory, 140 G hard disc
  • scope : Delphi 1 to 8 for Windows, Kylix
  • level : Delphi developer
  • plan :


1 - Introduction

Here are a couple of Pascal Grammars, of the level of the P4 compiler from Zurich.




2 - The grammars

2.1 - The base grammar

The base grammar seems to be the following:

 
actual_functionFUNCTION_NAME .
actual_parameter_list= '(actual_parameter { ',actual_parameter } ').
actual_parameteractual_value | actual_variable | actual_procedure
  | actual_function .
actual_procedurePROCEDURE_NAME .
actual_valueexpression .
actual_variablevariable .
addition_operator= '+' | '-' | OR .
array_typeARRAY '[index_type { ',index_type } ']OF element_type .
array_variablevariable .
assignment_statement= ( variable | FUNCTION_NAME ) ':=expression .
base_typetype .
blockdeclaration_part statement_part .
bound_specificationNAME '..NAME ':ordinal_type_identifier .
case_elementcase_label_list ':statement .
case_label_listconstant { ',constant } .
case_statementCASE expression OF case_element { ';case_element } [ ';' ] END .
component_variableindexed_variable | field_designator | file_buffer .
compound_statementBEGIN statement_sequence END .
conditional_statementif_statement | case_statement .
conformant_array_schemapacked_conformant_array_schema
  | unpacked_conformant_array_schema .
constant_definition_partCONST constant_definition ';' { constant_definition ';' } .
constant_definitionNAME '=constant .
constant= [ '+' | '-' ] ( CONSTANT_NAME | number ) | STRING .
declaration_part= [ label_declaration_part ] [ constant_definition_part ]
  [ type_definition_part ] [ variable_declaration_part ]
directiveFORWARD .
element_list= [ expression { ',expression } ] .
element_typetype .
entire_variableVARIABLE_NAME | FIELD_NAME .
enumerated_type= '(identifier_list ').
expression_listexpression { ',expression } .
expressionsimple_expression [ relational_operator simple_expression ] .
factorNUMBER | STRING | NIL | CONSTANT_NAME | set | variable
  | function_designator | '(expression ')' | NOT factor .
field_designatorrecord_variable '.FIELD_NAME .
field_list= [ ( fixed_part [ ';variant_part ] | variant_part ) [ ';' ] ] .
field_widthexpression .
file_bufferfile_variable '^.
file_component_typetype .
file_typeFILE OF file_component_type .
file_variablevariable .
final_expressionexpression .
fixed_partrecord_section { ';record_section } .
for_statementFOR VARIABLE_NAME ':=initial_expression ( TO | DOWNTO )
  final_expression DO statement .
formal_parameter_list= '(formal_parameter_section { ';formal_parameter_section } ').
formal_parameter_sectionvalue_parameter_section | variable_parameter_section
  | procedure_parameter_section | function_parameter_section .
fraction_lengthexpression .
function_declarationfunction_heading ';' ( block | directive ) .
function_designatorFUNCTION_NAME [ actual_parameter_list ] .
function_headingFUNCTION NAME [ formal_parameter_list ] ':result_type .
function_parameter_sectionfunction_heading .
goto_statementGOTO label .
identifier_listNAME { ',NAME } .
if_statementIF expression THEN statement [ ELSE statement ] .
index_typesimple_type .
indexed_variablearray_variable '[expression_list '].
initial_expressionexpression .
integer_numberNUMBER .
label_declaration_partLABEL label { ',label } ';.
labelNUMBER .
lower_boundconstant .
multiplication_operator= '*' | '/' | DIV | MOD | AND .
numberinteger_number | real_number .
ordinal_type_identifierTYPE_NAME .
output_listoutput_value { ',output_value } .
output_valueexpression [ ';field_width [ ':fraction_length ] ] .
packed_conformant_array_schemaPACKED ARRAY '[bound_specification ']OF TYPE_NAME .
parameter_typeTYPE_NAME | conformant_array_schema .
pointer_type= '^TYPE_NAME .
pointer_variablevariable .
procedure_and_function_declaration_part .
procedure_and_function_declaration_part= { ( procedure_declaration
  | function_declaration ) ';' } .
procedure_declarationprocedure_heading ';' ( block | directive ) .
procedure_headingPROCEDURE NAME [ formal_parameter_list ] .
procedure_parameter_sectionprocedure_heading .
procedure_statementPROCEDURE_NAME [ actual_parameter_list ] .
program_headingPROGRAM NAME '(identifier_list ')' ';.
programprogram_heading block '..
real_numberNUMBER .
record_sectionidentifier_list ':type .
record_typeRECORD field_list END .
record_variablevariable .
referenced_variablepointer_variable '^.
relational_operator= '=' | '<>' | '<' | '<=' | '>' | '>=' | IN .
repeat_statementREPEAT statement_sequence UNTIL expression .
repetitive_statementwhile_statement | repeat_statement | for_statement .
result_typeTYPE_NAME .
set_typeSET OF base_type .
set= '[element_list '].
simple_expression= [ '+' | '-' ] term { addition_operator term } .
simple_statement= [ assignment_statement | procedure_statement | goto_statement ] .
simple_typesubrange_type | enumerated_type .
statement_partBEGIN statement_sequence END .
statement_sequencestatement { ';statement } .
statement= [ LABEL ':' ] ( simple_statement | structured_statement ) .
structured_statementcompound_statement | repetitive_statement
  | conditional_statement | with_statement .
structured_type= [ PACKED ] unpacked_structured_type .
subrange_typelower_bound '..upper_bound .
tag_field= [ NAME ':' ] .
termfactor { multiplication_operator factor } .
type_definition_partTYPE type_definition ';' { type_definition ';' } .
type_definitionNAME '=type .
typesimple_type | structured_type | pointer_type | TYPE_NAME .
unpacked_conformant_array_schemaARRAY '[bound_specification
  { ';bound_specification } ']OF ( TYPE_NAME | conformant_array_schema ) .
unpacked_structured_typearray_type | record_type | set_type | file_type .
upper_boundconstant .
value_parameter_sectionidentifier_list ':parameter_type .
variable_declaration_partVAR variable_declaration ';' { variable_declaration ';' } .
variable_declarationidentifier_list ':type .
variable_parameter_sectionVAR identifier_list ':parameter_type .
variableentire_variable | component_variable | referenced_variable .
variant_partCASE tag_field TYPE_NAME OF variant { ';variant } .
variantcase_label_list ':' '(field_list ').
while_statementWHILE expression DO statement .
with_statementWITH record_variable { ',record_variable } DO statement .



The Wilson and Addyman book ordered the productions alphabetically, and we did the same above. Other authors (Fitzpatrick  ) presented the productions grouped by topic (types, statements etc), and other (Jobst  ) tried to isolate the LL1 part from the non-LL1 parts. Also the traditional Pascal Grammars also add the definition of identifier, integer, real etc. Since this is handled by the scanner we removed the corresponding productions.

In the above description, we found that

  • some productions are never called (output_xxx)
  • there remain left recursions (variable -> component_variable -> field_designator -> record_variable -> variable)


2.2 - Another P4 version

So we used our standard Indentation machine to remove unreachable productions and remove left recursion.

While we were at it, we tried to make small steps in the direction of Turbo:

  • we removed the conformant_array stuff. This was an english trial to add variable size arrays to Pascal, by pushing them on the stack instead of allocating them at compile time
  • we replaced the structure parameter compatibility of WIRTH with type name compatibility of Turbo.
    In UCSD, you could describe a type in a procedure declaration:

    VAR g_structureRECORD
                       nameString;
                       amountARRAY[1..3] OF Integer;
                     END;

    PROCEDURE compute(p_structureRECORD nameString;
       amountARRAY[1..3] OF Integer END);
     BEGIN
       // ...
     END;

    BEGIN
      compute(g_structure);
    END.

    To check compatibility with the caller, the compiler had to recursively check the parts of the declaration. Turbo forced the programmer to define a type name, and we can only use this name in the parameter types:

    TYPE t_strustureRECORD
                        nameString;
                        amountARRAY[1..3] OF Integer;
                      END;
    VAR g_structuret_structure;

    PROCEDURE compute(p_structuret_structure);
     BEGIN
       // ...
     END;

    BEGIN
      compute(g_structure);
    END.

  • in the same vein, we removed the procedural parameters, and added instead the procedural types, which are included in Turbo

  • WIRTH force a fixed LABEL, CONST, TYPE, VAR, PROCEDURE order. Turbo allowed to mix those "compiling" zones at will. So we replaced the fixed order declaration with a loop
  • the original FILE mechanism used a "file buffer" (defined in the file_buffer production above):

    VAR my_fileFILE OF Integer;

    BEGIN
      Put(my_file^);
      Get(my_file^);
    END.

    Turbo nicely replaced this with a generalized Read and Write, which in addition removed the tricky "first record pre fetching" issue at the semantic level. So we also removed the file_buffer production.



Here our modified grammar:

 
programprogram_heading block '..
  identifier_listNAME { ',NAME } .
  program_headingPROGRAM NAME '(identifier_list ')' ';.

  blockdeclaration_part statement_part .

    labelNUMBER .

    formal_parameter_list= '(formal_parameter_section
        { ';formal_parameter_section } ').
      formal_parameter_section= [ VAR ]identifier_list ':parameter_type .
        parameter_typeTYPE_NAME .

    constant= [ '+' | '-' ] ( CONSTANT_NAME | NUMBER ) | STRING .
    case_label_listconstant { ',constant } .

    typesimple_type | structured_type | pointer_type | procedural_type | TYPE_NAME .
      simple_typesubrange_type | enumerated_type .
        subrange_typeconstant '..constant .
        enumerated_type= '(identifier_list ').
      structured_type= [ PACKED ] unpacked_structured_type .
        unpacked_structured_typearray_type | record_type | set_type
            | file_type .
          array_typeARRAY '[index_type { ',index_type } ']OF
              element_type .
            index_typesimple_type .
            element_typetype .
          record_typeRECORD field_list END .
            field_list= [ ( fixed_part [ ';variant_part ] | variant_part ) [ ';' ] ] .
              fixed_partrecord_section { ';record_section } .
                record_sectionidentifier_list ':type .
              variant_partCASE tag_field TYPE_NAME OF variant { ';variant } .
                tag_field= [ NAME ':' ] .
                variantcase_label_list ':' '(field_list ').
          set_typeSET OF base_type .
            base_typetype .
          file_typeFILE [ OF file_component_type ] .
            file_component_typetype .
      pointer_type= '^TYPE_NAME .
      procedural_typeprocedure_type | function_type .
        procedure_typePROCEDURE [ formal_parameter_list ] .
        function_typeFUNCTION [ formal_parameter_list ] ':TYPE_NAME .

    declaration_part= { label_declaration_part | constant_definition_part |
        | type_definition_part | variable_declaration_part
        | procedure_and_function_declaration_part } .
      label_declaration_partLABEL label { ',label } ';.
      constant_definition_partCONST constant_definition ';'
          { constant_definition ';' } .
        constant_definitionNAME '=constant .
      type_definition_partTYPE type_definition ';' { type_definition ';' } .
        type_definitionNAME '=type .
      variable_declaration_partVAR variable_declaration ';'
          { variable_declaration ';' } .
        variable_declarationidentifier_list ':type .
      procedure_and_function_declaration_part=
          ( procedure_declaration | function_declaration ) ';'  .
        directiveFORWARD .
        procedure_declarationprocedure_heading ';' ( block | directive ) .
          procedure_headingPROCEDURE NAME [ formal_parameter_list ] .
        function_declarationfunction_heading ';' ( block | directive ) .
          function_headingFUNCTION NAME [ formal_parameter_list ] ':TYPE_NAME .

    statement_partBEGIN statement_sequence END .
      statement_sequencestatement { ';statement } .

        expressionF .
        expression_listexpression { ',expression } .
        variable_accessACCESS_NAME { end_access_ } .
          end_access_= { array_access_ | record_access_ | '^' | function_parameters_ } .
            array_access_= '[expression_list '].
            record_access_= '.variable_access .
            function_parameters_= '(' [ expression_list ] ').

        actual_parameter_list= '(actual_parameter { ',actual_parameter } ').
          actual_parameteractual_value | actual_variable | actual_procedure
              | actual_function .
            actual_procedurePROCEDURE_NAME .
            actual_functionFUNCTION_NAME .
            actual_valueexpression .
            actual_variablevariable_access .

        expressionsimple_expression [ relational_operator simple_expression ] .
          relational_operator= '=' | '<>' | '<' | '<=' | '>' | '>=' | IN .
          simple_expression= [ '+' | '-' ] term { addition_operator term } .
            addition_operator= '+' | '-' | OR .
            termfactor { multiplication_operator factor } .
              multiplication_operator= '*' | '/' | DIV | MOD | AND .
              factorNUMBER | STRING | NIL | CONSTANT_NAME | set
                  | variable_access | function_designator
                  | '(expression ')' | NOT factor .
                function_designatorFUNCTION_NAME [ actual_parameter_list ] .
                set= '[element_list '].
                  element_list= [ expression { ',expression } ] .

        statement= [ LABEL ':' ] ( simple_statement | structured_statement ) .
          simple_statement= [ assignment_statement | procedure_statement
              | goto_statement ] .
            assignment_statement= ( variable_access | FUNCTION_NAME ) ':=expression .
            procedure_statementPROCEDURE_NAME [ actual_parameter_list ] .
            goto_statementGOTO label .
          structured_statementcompound_statement | repetitive_statement
              | conditional_statement | with_statement .
            compound_statementBEGIN statement_sequence END .
            repetitive_statementwhile_statement | repeat_statement
                | for_statement .
              while_statementWHILE expression DO statement .
              repeat_statementREPEAT statement_sequence UNTIL expression .
              for_statementFOR VARIABLE_NAME ':=initial_expression
                  ( TO | DOWNTO ) final_expression DO statement .
                initial_expressionexpression .
                final_expressionexpression .
            conditional_statementif_statement | case_statement .
              if_statementIF expression THEN statement [ ELSE statement ] .
              case_statementCASE expression OF case_element { ';case_element }
                  [ ';' ] END .
                case_elementcase_label_list ':statement .
            with_statementWITH variable_access { ',variable_access } DO
                statement .
.



Please note that

  • we did not try to compact this grammar in any way. For instance, "parameter_type= TYPE_NAME ." could easily be removed. Such "useless" productions are often kept in order to generate a dedicated parsing procedure which will contain important handling (type checking, address computations etc) in later stages of the compilation. So the degree of redundancy depends on the later stages of the compilation (which were not detailed here)
  • the grammar is still not LL1 (mainly many NAME duplicate FIRSTs), but at least is is no longer left recursive.


2.3 - What's next

The next step could include
  • ELSE in the CASE, BREAK and CONTINUE
  • UNITs: this necessitates the extraction of the declaration part from the block, since the INTERFACE part breaks the nice block recursive structure
  • constant expression: to solve this, we have to extract the expression from the statement part to make it available at the declaration level
  • objects:
    • simple objects (like Turbo 5.5 to Turbo 7) are reasonably easy to handle. After all, they only are some kind of special RECORDs (from a syntactic point of view). Security levels (PRIVATE) are also easy.
    • Delphi kind of CLASSes require the PROPERTY mechanism, the CLASS references, the PROCEDURE OF OBJECT. And this level hides some nasty surprises (words like READ which are not keywords, but nevertheless have very special meaning, or the changing ";" rules in the attributes or directives parts)
    • the Delphi INTERFACE (in the COM sense) also forces another handful of productions



3 - References

Here are a couple of links:


4 - Comments



As usual:
  • please tell us at fcolibri@felix-colibri.com if you found some errors, mistakes, bugs, broken links or had some problem downloading the file. Resulting corrections will be helpful for other readers
  • we welcome any comment, criticism, enhancement, other sources or reference suggestion. Just send an e-mail to fcolibri@felix-colibri.com.
  • or more simply, enter your (anonymous or with your e-mail if you want an answer) comments below and clic the "send" button
    Name :
    E-mail :
    Comments * :
     

  • and if you liked this article, talk about this site to your fellow developpers, add a link to your links page ou mention our articles in your blog or newsgroup posts when relevant. That's the way we operate: the more traffic and Google references we get, the more articles we will write.



5 - The author

Felix John COLIBRI works at the Pascal Institute. Starting with Pascal in 1979, he then became involved with Object Oriented Programming, Delphi, Sql, Tcp/Ip, Html, UML. Currently, he is mainly active in the area of custom software development (new projects, maintenance, audits, BDE migration, Delphi Xe_n migrations, refactoring), Delphi Consulting and Delph training. His web site features tutorials, technical papers about programming with full downloadable source code, and the description and calendar of forthcoming Delphi, FireBird, Tcp/IP, Web Services, OOP  /  UML, Design Patterns, Unit Testing training sessions.
Created: nov-04. Last updated: jul-15 - 98 articles, 131 .ZIP sources, 1012 figures
Copyright © Felix J. Colibri   http://www.felix-colibri.com 2004 - 2015. All rigths reserved
Back:    Home  Papers  Training  Delphi developments  Links  Download
the Pascal Institute

Felix J COLIBRI

+ Home
  + articles_with_sources
    + database
    + web_internet_sockets
    + oop_components
    + uml_design_patterns
    + debug_and_test
    + graphic
    + controls
    + colibri_utilities
    + colibri_helpers
    + delphi
    + firemonkey
    + compilers
      – pl0_grammar
      – pascal_s_grammar
      – pascal_grammar
      – delphi_5_grammar
  + delphi_training
  + delphi_developments
  + sweet_home
  – download_zip_sources
  + links
Contacts
Site Map
– search :

RSS feed  
Blog