|
Pascal Grammar - Felix John COLIBRI.
|
- abstract : the Pascal ebnf grammar
- key words : ebnf, iebnf, compiler, pascal
- software used : Windows XP, Delphi 6
- hardware used : Pentium 1.400Mhz, 256 M memory, 140 G hard disc
- scope : Delphi 1 to 8 for Windows, Kylix
- level : Delphi developer
- plan :
1 - Introduction
Here are a couple of Pascal Grammars, of the level of the P4 compiler from
Zurich.
2 - The grammars
2.1 - The base grammar
The base grammar seems to be the following:
actual_function= FUNCTION_NAME .
actual_parameter_list= '(' actual_parameter { ',' actual_parameter } ')' .
actual_parameter= actual_value | actual_variable | actual_procedure
| actual_function .
actual_procedure= PROCEDURE_NAME .
actual_value= expression .
actual_variable= variable .
addition_operator= '+' | '-' | OR .
array_type= ARRAY '[' index_type { ',' index_type } ']' OF element_type .
array_variable= variable .
assignment_statement= ( variable | FUNCTION_NAME ) ':=' expression .
base_type= type .
block= declaration_part statement_part .
bound_specification= NAME '..' NAME ':' ordinal_type_identifier .
case_element= case_label_list ':' statement .
case_label_list= constant { ',' constant } .
case_statement= CASE expression OF case_element { ';' case_element } [ ';' ] END .
component_variable= indexed_variable | field_designator | file_buffer .
compound_statement= BEGIN statement_sequence END .
conditional_statement= if_statement | case_statement .
conformant_array_schema= packed_conformant_array_schema
| unpacked_conformant_array_schema .
constant_definition_part= CONST constant_definition ';' { constant_definition ';' } .
constant_definition= NAME '=' constant .
constant= [ '+' | '-' ] ( CONSTANT_NAME | number ) | STRING .
declaration_part= [ label_declaration_part ] [ constant_definition_part ]
[ type_definition_part ] [ variable_declaration_part ]
directive= FORWARD .
element_list= [ expression { ',' expression } ] .
element_type= type .
entire_variable= VARIABLE_NAME | FIELD_NAME .
enumerated_type= '(' identifier_list ')' .
expression_list= expression { ',' expression } .
expression= simple_expression [ relational_operator simple_expression ] .
factor= NUMBER | STRING | NIL | CONSTANT_NAME | set | variable
| function_designator | '(' expression ')' | NOT factor .
field_designator= record_variable '.' FIELD_NAME .
field_list= [ ( fixed_part [ ';' variant_part ] | variant_part ) [ ';' ] ] .
field_width= expression .
file_buffer= file_variable '^' .
file_component_type= type .
file_type= FILE OF file_component_type .
file_variable= variable .
final_expression= expression .
fixed_part= record_section { ';' record_section } .
for_statement= FOR VARIABLE_NAME ':=' initial_expression ( TO | DOWNTO )
final_expression DO statement .
formal_parameter_list= '(' formal_parameter_section { ';' formal_parameter_section } ')' .
formal_parameter_section= value_parameter_section | variable_parameter_section
| procedure_parameter_section | function_parameter_section .
fraction_length= expression .
function_declaration= function_heading ';' ( block | directive ) .
function_designator= FUNCTION_NAME [ actual_parameter_list ] .
function_heading= FUNCTION NAME [ formal_parameter_list ] ':' result_type .
function_parameter_section= function_heading .
goto_statement= GOTO label .
identifier_list= NAME { ',' NAME } .
if_statement= IF expression THEN statement [ ELSE statement ] .
index_type= simple_type .
indexed_variable= array_variable '[' expression_list ']' .
initial_expression= expression .
integer_number= NUMBER .
label_declaration_part= LABEL label { ',' label } ';' .
label= NUMBER .
lower_bound= constant .
multiplication_operator= '*' | '/' | DIV | MOD | AND .
number= integer_number | real_number .
ordinal_type_identifier= TYPE_NAME .
output_list= output_value { ',' output_value } .
output_value= expression [ ';' field_width [ ':' fraction_length ] ] .
packed_conformant_array_schema= PACKED ARRAY '[' bound_specification ']' OF TYPE_NAME .
parameter_type= TYPE_NAME | conformant_array_schema .
pointer_type= '^' TYPE_NAME .
pointer_variable= variable .
procedure_and_function_declaration_part .
procedure_and_function_declaration_part= { ( procedure_declaration
| function_declaration ) ';' } .
procedure_declaration= procedure_heading ';' ( block | directive ) .
procedure_heading= PROCEDURE NAME [ formal_parameter_list ] .
procedure_parameter_section= procedure_heading .
procedure_statement= PROCEDURE_NAME [ actual_parameter_list ] .
program_heading= PROGRAM NAME '(' identifier_list ')' ';' .
program= program_heading block '.' .
real_number= NUMBER .
record_section= identifier_list ':' type .
record_type= RECORD field_list END .
record_variable= variable .
referenced_variable= pointer_variable '^' .
relational_operator= '=' | '<>' | '<' | '<=' | '>' | '>=' | IN .
repeat_statement= REPEAT statement_sequence UNTIL expression .
repetitive_statement= while_statement | repeat_statement | for_statement .
result_type= TYPE_NAME .
set_type= SET OF base_type .
set= '[' element_list ']' .
simple_expression= [ '+' | '-' ] term { addition_operator term } .
simple_statement= [ assignment_statement | procedure_statement | goto_statement ] .
simple_type= subrange_type | enumerated_type .
statement_part= BEGIN statement_sequence END .
statement_sequence= statement { ';' statement } .
statement= [ LABEL ':' ] ( simple_statement | structured_statement ) .
structured_statement= compound_statement | repetitive_statement
| conditional_statement | with_statement .
structured_type= [ PACKED ] unpacked_structured_type .
subrange_type= lower_bound '..' upper_bound .
tag_field= [ NAME ':' ] .
term= factor { multiplication_operator factor } .
type_definition_part= TYPE type_definition ';' { type_definition ';' } .
type_definition= NAME '=' type .
type= simple_type | structured_type | pointer_type | TYPE_NAME .
unpacked_conformant_array_schema= ARRAY '[' bound_specification
{ ';' bound_specification } ']' OF ( TYPE_NAME | conformant_array_schema ) .
unpacked_structured_type= array_type | record_type | set_type | file_type .
upper_bound= constant .
value_parameter_section= identifier_list ':' parameter_type .
variable_declaration_part= VAR variable_declaration ';' { variable_declaration ';' } .
variable_declaration= identifier_list ':' type .
variable_parameter_section= VAR identifier_list ':' parameter_type .
variable= entire_variable | component_variable | referenced_variable .
variant_part= CASE tag_field TYPE_NAME OF variant { ';' variant } .
variant= case_label_list ':' '(' field_list ')' .
while_statement= WHILE expression DO statement .
with_statement= WITH record_variable { ',' record_variable } DO statement .
|
The Wilson and Addyman book ordered the productions
alphabetically, and we did the same above. Other authors
(Fitzpatrick ) presented the productions grouped by topic
(types, statements etc), and other (Jobst ) tried to isolate the
LL1 part from the non-LL1 parts. Also the traditional Pascal Grammars also add
the definition of identifier, integer, real etc. Since this is handled by
the scanner we removed the corresponding productions.
In the above description, we found that
- some productions are never called (output_xxx)
- there remain left recursions (variable -> component_variable ->
field_designator -> record_variable -> variable)
2.2 - Another P4 version
So we used our standard Indentation machine to remove unreachable productions
and remove left recursion.
While we were at it, we tried to make small steps in the direction of Turbo:
- we removed the conformant_array stuff. This was an english trial to add
variable size arrays to Pascal, by pushing them on the stack instead of
allocating them at compile time
- we replaced the structure parameter compatibility of WIRTH with type name
compatibility of Turbo.
In UCSD, you could describe a type in a procedure declaration:
VAR g_structure: RECORD
name: String;
amount: ARRAY[1..3] OF Integer;
END;
PROCEDURE compute(p_structure: RECORD name: String;
amount: ARRAY[1..3] OF Integer END);
BEGIN
// ...
END;
BEGIN
compute(g_structure);
END.
|
To check compatibility with the caller, the compiler had to recursively
check the parts of the declaration. Turbo forced the programmer to define a
type name, and we can only use this name in the parameter types:
TYPE t_strusture= RECORD
name: String;
amount: ARRAY[1..3] OF Integer;
END;
VAR g_structure: t_structure;
PROCEDURE compute(p_structure: t_structure);
BEGIN
// ...
END;
BEGIN
compute(g_structure);
END.
|
- in the same vein, we removed the procedural parameters, and added instead
the procedural types, which are included in Turbo
- WIRTH force a fixed LABEL, CONST, TYPE, VAR, PROCEDURE order.
Turbo allowed to mix those "compiling" zones at will. So we replaced the
fixed order declaration with a loop
- the original FILE mechanism used a "file buffer" (defined in the
file_buffer production above):
VAR my_file: FILE OF Integer;
BEGIN
Put(my_file^);
Get(my_file^);
END.
|
Turbo nicely replaced this with a generalized Read and Write, which in
addition removed the tricky "first record pre fetching" issue at the
semantic level. So we also removed the file_buffer production.
Here our modified grammar:
program= program_heading block '.' .
identifier_list= NAME { ',' NAME } .
program_heading= PROGRAM NAME '(' identifier_list ')' ';' .
block= declaration_part statement_part .
label= NUMBER .
formal_parameter_list= '(' formal_parameter_section
{ ';' formal_parameter_section } ')' .
formal_parameter_section= [ VAR ]identifier_list ':' parameter_type .
parameter_type= TYPE_NAME .
constant= [ '+' | '-' ] ( CONSTANT_NAME | NUMBER ) | STRING .
case_label_list= constant { ',' constant } .
type= simple_type | structured_type | pointer_type | procedural_type | TYPE_NAME .
simple_type= subrange_type | enumerated_type .
subrange_type= constant '..' constant .
enumerated_type= '(' identifier_list ')' .
structured_type= [ PACKED ] unpacked_structured_type .
unpacked_structured_type= array_type | record_type | set_type
| file_type .
array_type= ARRAY '[' index_type { ',' index_type } ']' OF
element_type .
index_type= simple_type .
element_type= type .
record_type= RECORD field_list END .
field_list= [ ( fixed_part [ ';' variant_part ] | variant_part ) [ ';' ] ] .
fixed_part= record_section { ';' record_section } .
record_section= identifier_list ':' type .
variant_part= CASE tag_field TYPE_NAME OF variant { ';' variant } .
tag_field= [ NAME ':' ] .
variant= case_label_list ':' '(' field_list ')' .
set_type= SET OF base_type .
base_type= type .
file_type= FILE [ OF file_component_type ] .
file_component_type= type .
pointer_type= '^' TYPE_NAME .
procedural_type= procedure_type | function_type .
procedure_type= PROCEDURE [ formal_parameter_list ] .
function_type= FUNCTION [ formal_parameter_list ] ':' TYPE_NAME .
declaration_part= { label_declaration_part | constant_definition_part |
| type_definition_part | variable_declaration_part
| procedure_and_function_declaration_part } .
label_declaration_part= LABEL label { ',' label } ';' .
constant_definition_part= CONST constant_definition ';'
{ constant_definition ';' } .
constant_definition= NAME '=' constant .
type_definition_part= TYPE type_definition ';' { type_definition ';' } .
type_definition= NAME '=' type .
variable_declaration_part= VAR variable_declaration ';'
{ variable_declaration ';' } .
variable_declaration= identifier_list ':' type .
procedure_and_function_declaration_part=
( procedure_declaration | function_declaration ) ';' .
directive= FORWARD .
procedure_declaration= procedure_heading ';' ( block | directive ) .
procedure_heading= PROCEDURE NAME [ formal_parameter_list ] .
function_declaration= function_heading ';' ( block | directive ) .
function_heading= FUNCTION NAME [ formal_parameter_list ] ':' TYPE_NAME .
statement_part= BEGIN statement_sequence END .
statement_sequence= statement { ';' statement } .
expression= F .
expression_list= expression { ',' expression } .
variable_access= ACCESS_NAME { end_access_ } .
end_access_= { array_access_ | record_access_ | '^' | function_parameters_ } .
array_access_= '[' expression_list ']' .
record_access_= '.' variable_access .
function_parameters_= '(' [ expression_list ] ')' .
actual_parameter_list= '(' actual_parameter { ',' actual_parameter } ')' .
actual_parameter= actual_value | actual_variable | actual_procedure
| actual_function .
actual_procedure= PROCEDURE_NAME .
actual_function= FUNCTION_NAME .
actual_value= expression .
actual_variable= variable_access .
expression= simple_expression [ relational_operator simple_expression ] .
relational_operator= '=' | '<>' | '<' | '<=' | '>' | '>=' | IN .
simple_expression= [ '+' | '-' ] term { addition_operator term } .
addition_operator= '+' | '-' | OR .
term= factor { multiplication_operator factor } .
multiplication_operator= '*' | '/' | DIV | MOD | AND .
factor= NUMBER | STRING | NIL | CONSTANT_NAME | set
| variable_access | function_designator
| '(' expression ')' | NOT factor .
function_designator= FUNCTION_NAME [ actual_parameter_list ] .
set= '[' element_list ']' .
element_list= [ expression { ',' expression } ] .
statement= [ LABEL ':' ] ( simple_statement | structured_statement ) .
simple_statement= [ assignment_statement | procedure_statement
| goto_statement ] .
assignment_statement= ( variable_access | FUNCTION_NAME ) ':=' expression .
procedure_statement= PROCEDURE_NAME [ actual_parameter_list ] .
goto_statement= GOTO label .
structured_statement= compound_statement | repetitive_statement
| conditional_statement | with_statement .
compound_statement= BEGIN statement_sequence END .
repetitive_statement= while_statement | repeat_statement
| for_statement .
while_statement= WHILE expression DO statement .
repeat_statement= REPEAT statement_sequence UNTIL expression .
for_statement= FOR VARIABLE_NAME ':=' initial_expression
( TO | DOWNTO ) final_expression DO statement .
initial_expression= expression .
final_expression= expression .
conditional_statement= if_statement | case_statement .
if_statement= IF expression THEN statement [ ELSE statement ] .
case_statement= CASE expression OF case_element { ';' case_element }
[ ';' ] END .
case_element= case_label_list ':' statement .
with_statement= WITH variable_access { ',' variable_access } DO
statement .
.
|
Please note that
- we did not try to compact this grammar in any way. For instance,
"parameter_type= TYPE_NAME ." could easily be removed. Such "useless"
productions are often kept in order to generate a dedicated parsing
procedure which will contain important handling (type checking, address
computations etc) in later stages of the compilation. So the degree of
redundancy depends on the later stages of the compilation (which were not
detailed here)
- the grammar is still not LL1 (mainly many NAME duplicate FIRSTs), but at
least is is no longer left recursive.
2.3 - What's next
The next step could include
- ELSE in the CASE, BREAK and CONTINUE
- UNITs: this necessitates the extraction of the declaration part from the
block, since the INTERFACE part breaks the nice block recursive
structure
- constant expression: to solve this, we have to extract the expression from
the statement part to make it available at the declaration level
- objects:
- simple objects (like Turbo 5.5 to Turbo 7) are reasonably easy to handle.
After all, they only are some kind of special RECORDs (from a syntactic
point of view). Security levels (PRIVATE) are also easy.
- Delphi kind of CLASSes require the PROPERTY mechanism, the CLASS
references, the PROCEDURE OF OBJECT. And this level hides some
nasty surprises (words like READ which are not keywords, but
nevertheless have very special meaning, or the changing ";" rules in the
attributes or directives parts)
- the Delphi INTERFACE (in the COM sense) also forces another handful of
productions
3 - References
Here are a couple of links:
4 - Comments
As usual:
- please tell us at fcolibri@felix-colibri.com if you found some errors, mistakes, bugs or had
some problem downloading the file. Resulting corrections will be helpful
for other readers
- we welcome any comment, criticism, enhancement, other sources or reference
suggestion. Just send an e-mail to fcolibri@felix-colibri.com.
- or more simply, enter your (anonymous or with your e-mail if you want an
answer) comments below and clic the "send" button
- and if you liked this article, talk about this site to your fellow
developpers, add a link to your links page ou mention our articles in
your newsgroup posts when relevant. That's the way we operate: the more
traffic and Google references we get, the more articles we will write.
5 - The author
Felix John COLIBRI works at the Pascal
Institute. He programs in Pascal since 1979, and is mainly active in the area
of custom software
development and training, and is a frequent speaker at Borland
Developer Conferences. His web site features
tutorials, technical papers about programming with full downloadable source
code, and the description and calendar of forthcoming Delphi,
Interbase, Asp.Net, Ado.Net and OOP / UML training sessions.
|