The Alsacian notation - Felix John COLIBRI.
- abstract : the Alsacian prefixing notation: k_constant, t_type, g_global, l_local, p_parameter. Presentation, rationale and benefits.
- key words : programming style - notation - coding conventions - Hungarian Notation
- scope : Pascal, Delphi in all its makes and shapes
- level : Pascal / Delphi developer
1 - Why use Coding notations
When we write a new piece of code, we have the whole project in our head: we named all the identifiers ourselves, and we remember what they stand for.
On the other hand, when we maintain a project, we have to read lines that we wrote weeks or months ago, or that were written by someone who has moved to another project or even left the company. In this case, each line should convey as much information as possible to help us understand the intent of the original developer.
The overall direction is to continuously move toward more abstract and complex concepts. In 1980, a program had a dozen Integers, Strings and ARRAYs. Today we handle sophisticated CLASSes with attributes which are themselves structured elements. And everybody would agree that a CLASS is "heavier" than a lonely Boolean.
Therefore, in order to reduce somewhat the complexity of understanding what we are manipulating, we adopted the Alsacian Notation.
2 - The Alsacian Notation
2.1 - The basic notation
In a nutshell, we prefix the identifier names according to their program area:
- all CONST are prefixed with k_
- TYPEs with t_
- global VAR with g_
- local VAR with l_
- parameters with p_
- FUNCTIONs with f_
Here is a quick example:
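PROGRAM p_first_example;
  CONST k_margin= 0.90;
  TYPE t_money= Double;
  VAR g_net_amount: t_money;

  FUNCTION f_quantity: Integer;
    BEGIN
      // -- placeholder value, for illustration only
      f_quantity:= 3;
    END; // f_quantity

  PROCEDURE compute_invoice(VAR pv_net_amount: t_money);
    VAR l_sales_price: t_money;
    BEGIN
      // -- placeholder price, for illustration only
      l_sales_price:= 100.0;
      pv_net_amount:= k_margin* l_sales_price* f_quantity ;
    END; // compute_invoice

  BEGIN // main
    compute_invoice(g_net_amount);
    WriteLn(g_net_amount: 0: 2);
  END. // main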
The rationale for using those prefixes is the following:
2.2 - The full Notation
Here is a more complete presentation of the Alsacian Notation
2.3 - quick example
Here is a typical example, from our .XML parser (used, for instance, to parse .RSS feeds). An .XML file is made of
- a starting tag <my_tag>
- some text
- an ending tag, or anti tag </my_tag>
and the text can itself contain free text, some <tag> text </tag> etc.
Here is the function which analyzes a tag (partial, nested function, see the RSS Reader paper for the full sources):
function f_c_parse_tag_content_recursive: c_xml_tag;
var l_c_xml_sub_tag: c_xml_tag;
IF l_symbol_type= e_closing_tag_symbol
// -- skip ">"
// -- await "</"
while l_symbol_type<> e_opening_anti_tag_symbol do
IF l_symbol_type= e_opening_tag_symbol
// -- a nested tag: recurse
// -- any content different from a tag
if not f_contains_only(l_symbol_string, [' ', k_tabulation, k_return, k_line_feed])
end; // while
// -- skip "</"
END // l_symbol_type= e_closing_tag_symbol
ELSE // l_symbol_type<> e_closing_tag_symbol
IF l_symbol_type= e_closing_anti_tag_symbol
ELSE display_parser_error('>, />');
END; // f_c_parse_tag_content_recursive
2.4 - Conventions Galore
We use other conventions, not directly related to program identifiers:
- for files, the following prefixes allow us to quickly identify the file type:
   - u_ for UNITs
   - p_ for PROGRAMs or Delphi projects
   - u_c_ for UNITs mainly containing CLASSes
   - d_ for DLLs
   - pk_ for Packages
- for identifier casing:
   - the keywords are in upper case: BEGIN
   - identifiers from Windows, Delphi UNITs or any other outside library have at least one uppercase letter: tForm1
   - the identifiers we create are all in lowercase: g_c_account
  This allows any reader to quickly find whether an identifier is part of an outside library, or was created by us
- for indentation:
   - everything IN a UNIT starts at least at column 3 (it is IN the UNIT)
   - INTERFACE, IMPLEMENTATION, END. are at column 3
   - everything within the INTERFACE and IMPLEMENTATION starts at column 5 (it is IN the INTERFACE or IMPLEMENTATION)
   - everything in a PROCEDURE is indented by 2 columns more than the header (it is IN the PROCEDURE)
- we use nested PROCEDUREs or FUNCTIONs, for the same reason that we use local variables. In his Pascal P4 compiler, Niklaus WIRTH went up to 6 levels of nesting. For some Delphi utilities (pretty printers, lexers etc) this depth was also sometimes reached
- our implementation methods are organized roughly in "usage order": first the CONSTRUCTOR, then the basic routines, then the more "heavy" routines, and at the end, the DESTRUCTOR
Alphabetical ordering would be the other common convention
- we never use abbreviations: all identifiers are full words, as found in the Webster dictionary
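The casing, indentation and nesting conventions listed above can be pictured with a small program. This is only a sketch under our own assumptions: the names (k_tax_rate, f_tax_amount, display_invoice, f_gross_amount) and the 20 % rate are invented for the illustration, and the 2-column steps are our reading of the indentation rule.

```pascal
PROGRAM p_convention_sketch;

  // -- hypothetical rate, for illustration only
  CONST k_tax_rate= 0.20;

  TYPE t_money= Double;

  FUNCTION f_tax_amount(p_net_amount: t_money): t_money;
    BEGIN
      f_tax_amount:= p_net_amount* k_tax_rate;
    END; // f_tax_amount

  PROCEDURE display_invoice(p_net_amount: t_money);

    // -- nested: only display_invoice needs this computation
    FUNCTION f_gross_amount: t_money;
      BEGIN
        f_gross_amount:= p_net_amount+ f_tax_amount(p_net_amount);
      END; // f_gross_amount

    BEGIN // display_invoice
      // -- WriteLn comes from the outside library: it keeps its uppercase letters
      WriteLn('net   ', p_net_amount: 8: 2);
      WriteLn('gross ', f_gross_amount: 8: 2);
    END; // display_invoice

  BEGIN // main
    display_invoice(100.0);
  END. // main
```

Keywords are in upper case, library identifiers (WriteLn, Double) keep an uppercase letter, and everything we created is lowercase, unabbreviated and prefixed, so the origin of each name can be read off the line itself.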
2.5 - Never ending Conventions ?
At some time, we also used prefixes for
- open arrays:
VAR g_oa_closing_price: ARRAY OF Double= Nil;
- initialized constants, or whatever name you give to those strange constructs:
CONST ki_limit: Integer= 15;
But at some stage, you have to stop. If the benefit of using the prefixes is outweighed by the time you spend finding a "good prefix", then use the normal notation.
It is also easy to spot some inconsistencies in all those conventions:
- for the main unit, should we not use U_F_MAIN.PAS, the F standing for tForm ?
- if we write a component, all our identifiers are in lowercase. Right ? But if you purchase our component, the same identifiers should have at least one uppercase letter. Someone's own identifier obviously can become someone else's outside identifier
- all our BEGINs are always aligned with the matching END. But not for the THEN BEGIN, and this was done to save 1 line on an Apple ][ 24 line screen (even less with the editor's header and footer). Sure, this could be corrected, but this is the way the IFs are written in all our current .ZIP sources
3 - Where is the Balance ?
3.1 - Does everybody agree ?
Of course, not all developers will find this Alsacian notation pleasant and useful:
- a German gentleman wrote in a newsgroup that our style was "beruhmt geruchtig" (meaning something like "well known to be stinking"). Well, at least it is "beruhmt" ! Whether it is "geruchtig" or not is your decision
- others told us that underscores make code unreadable: they claim that
export_sales_amount:= total_sales- domestic_sales
is much less easy to read than:
ExportSalesAmount:= TotalSales- DomesticSales
We are quite indifferent on this count. We use underscores because, at a distance, it_looks_like_a_written_line_in_a_book or newspaper, whereas reading the Washington Post or the Times WithTheCapitalConventionWouldBeUnacceptable to most of us.
- the f_ for function would also be considered harmful. For instance, Bertrand MEYER, in his Eiffel language, explained that the programmer should see no difference between a FUNCTION call and ARRAY indexing, because it is an implementation detail. With the following line:
total= amount* rate(value);
the reader can understand that rate takes some "parameter or index" to return a value used in this expression. At this stage, whether the coder used a FUNCTION or an ARRAY is indifferent to the reader.
Maybe so, but not to the person trying to fix some bug:
- one friend of mine, who is a math teacher, explained that he could not use our "no abbreviation" rule.
Well, this one is easy to answer: for a mathematician, the whole exercise is to climb the abstraction ladder. Just tell him:
"let ro be a Banach space"
and he will see loads of Axioms and Theorems jumping to his mind. The same goes for formulas like:
The code writer has a completely different problem on his hands: maintaining the code. Let's assume that your job is to fix a code line like:
It certainly is more painful to fix than the following line:
sales_price:= net_price / (1- margin)+ sales_tax;
However, for very mathematical programs, one has to agree that overly long names will produce expressions spread over many lines, and the reader might be fascinated by the tree and totally miss the forest. So for algorithmic code (matrix computation, wavelet filtering, option contract forecasting etc), we would have to adopt some shortened names. For invoicing, order processing or inventory recording, our identifiers are much more "short lived": they change from project to project, from developer to developer. Nothing comparable to universal constants like the Avogadro constant, or c, the speed of light. So, for maintenance purposes, nice long words are still best, in our opinion.
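Coming back to the f_ objection: in Pascal, a call to a parameterless FUNCTION is written exactly like reading a variable or a constant, so the prefix is the only local clue about where to look when a result is wrong. The following is a minimal sketch of that point, not code from our projects: k_rate, g_rate, f_rate and the threshold rule are hypothetical names and values chosen for the illustration.

```pascal
PROGRAM p_function_prefix_sketch;

  // -- hypothetical constant, for illustration only
  CONST k_rate= 0.05;

  VAR g_rate: Double;
      g_amount, g_total: Double;

  FUNCTION f_rate: Double;
    BEGIN
      // -- hypothetical rule: the rate depends on the current amount
      IF g_amount> 1000
        THEN f_rate:= 0.10
        ELSE f_rate:= 0.05;
    END; // f_rate

  BEGIN // main
    g_amount:= 2000;
    g_rate:= 0.05;

    // -- three lines with the same shape, three different places to debug:
    // --   the CONST declaration, the last assignment to the global,
    // --   or the body of the FUNCTION
    g_total:= g_amount* k_rate;
    WriteLn('constant ', g_total: 8: 2);

    g_total:= g_amount* g_rate;
    WriteLn('global   ', g_total: 8: 2);

    g_total:= g_amount* f_rate;
    WriteLn('function ', g_total: 8: 2);
  END. // main
```

Without the prefixes, all three statements would read "amount* rate", and the maintainer would have to jump to the declaration before knowing whether code is executed at that point.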
3.2 - Some other notation
- in the early days of Pascal, we used to prefix our variables with the first letter of the TYPE: ixxx for Integer, cxxx for Char, sxxx for String.
- this was later dropped, and we switched to the Alsacian notation, which was first explained in books we wrote and published around 1982. Not in the introductory Pascal books, but in more specific books, like those on B-Trees or 8086 disassembly programs
- when Windows 3.1 appeared, we all discovered the Hungarian Notation, invented by Charles SIMONYI, who was one of the Microsoft developers. This notation mainly prefixes the variables with the TYPE:
   - ulAccountNum : meaning that the account number is an unsigned long integer
   - szName, where sz stands for "zero-terminated string"
   - pszOwner for a pointer to a zero-terminated string
As you can see, this notation is about TYPEs only, whereas the Alsacian Notation is about "program area": CONST, TYPE, global VAR, etc.
And it is not surprising that SIMONYI only prefixed the TYPEs, since C, which was his favorite language, totally obfuscates the distinction between definitions, declarations, statements, constants etc. Just look at the .H horror, or the Struct mess, with mixed TYPEs and VARs. I am not criticizing SIMONYI here, but the C language.
- Delphi brought the following conventions:
   - Txxx for types
   - Pxxx for pointer types
   - Fxxx for PRIVATE fields
   - Axxx for parameters, like AOwner
   - prefixes, like cl for "color", as in clRed
The AParameter convention is quite strange, and I read somewhere that A does not stand for the indefinite article, but I cannot remember why they chose this one.
The main benefit of the Delphi notation is that it is used in the VCL source code, and that most Delphi developers use it.
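To make the difference concrete, the same small routine can be written in both styles. This is a sketch for comparison only: TMoney, AddTax, AAmount, k_tax_factor and f_add_tax are names we invented here, and the 1.20 factor is an arbitrary value.

```pascal
PROGRAM p_notation_comparison;

  // -- Delphi / VCL style: T for the type, A for the parameter,
  // --   nothing on the function or on the literal constant
  TYPE TMoney= Double;

  FUNCTION AddTax(AAmount: TMoney): TMoney;
    BEGIN
      AddTax:= AAmount* 1.20;
    END; // AddTax

  // -- Alsacian style: every identifier carries its program area
  // --   (k_ constant, t_ type, p_ parameter, f_ function)
  CONST k_tax_factor= 1.20;

  TYPE t_money= Double;

  FUNCTION f_add_tax(p_amount: t_money): t_money;
    BEGIN
      f_add_tax:= p_amount* k_tax_factor;
    END; // f_add_tax

  BEGIN // main
    // -- both calls compute the same value; only the notation differs
    WriteLn(AddTax(100.0): 8: 2);
    WriteLn(f_add_tax(100.0): 8: 2);
  END. // main
```

Both versions compile side by side because the two sets of names never collide, which is also why a project can mix VCL identifiers with Alsacian ones without ambiguity.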
3.3 - The Bottom Line
Writing code in a team, with some years of programming experience, will always foster some coding conventions. It quickly becomes irritating when the members of the team use different notations: some will write "msg", others "messg", "Message" or "wm_xxx" etc. And when you maintain some code and have to use a "message" in an assignment, you will never know which convention was used by the original author without looking at the definition or declaration. So bringing some order to this chaos is only natural. Which convention your team decides to adopt is a matter of common agreement.
I never explained the Alsacian notation on the web before. You will find tons of papers telling you which style you SHOULD use, some of them written by people with only one or two years of programming behind them. So writing a "style paper" immediately looks moralizing: YOU tell the universe how THEY SHOULD behave. Therefore, the defensive reaction of the reader automatically is: "and who are YOU anyway ?"
In addition, everyone has an opinion about notation and can jump into over-heated discussions about the benefits of this or that convention. This reminds me of Steve WOZNIAK (one of the creators of the Apple ][ ) who said that everybody had an opinion about the style of the Mac windows or the appearance of a button, but selecting the right electronic component and architecture was a very lonely decision.
Are we trying to convince you to use our notation ? Not at all. We have no vested interest whatsoever in doing so. Sadly enough, whether you adopt it or not will not make us any richer. However, should we some day subcontract some piece of code to you, then we would definitely include it in the contract, but this is another story. And if you don't like the prefixes in our .ZIP source code, feel free to remove all g_, l_, k_, f_ etc.
So, at the end of the day, is the Alsacian notation better, in some sense, than another one ? You have to decide for yourself.
For us, it comes down to Dollar amounts: which coding style will reduce the cost of programming most ? At the Pascal Institute, we found that the Alsacian notation had some logic, which makes it reasonably easy to learn and remember. And it conveys useful syntactic information, with an acceptable decrease in readability. Therefore it allows us to concentrate on much more important decisions, like architecture, design and testing.
5 - The author
Felix John COLIBRI works at the Pascal Institute. Starting with Pascal in 1979, he then became involved with Object Oriented Programming, Delphi, Sql, Tcp/Ip, Html, UML. Currently, he is mainly active in the area of custom software development (new projects, maintenance, audits, BDE migration, Delphi Xe_n migrations, refactoring), Delphi Consulting and Delphi training. His web site features tutorials, technical papers about programming with full downloadable source code, and the description and calendar of forthcoming Delphi, FireBird, Tcp/IP, Web Services, OOP / UML, Design Patterns, Unit Testing training sessions.