
The Alsacian notation - Felix John COLIBRI.

  • abstract : the Alsacian prefixing notation: k_constant, t_type, g_global, l_local, p_parameter. Presentation, rationale and benefits.
  • key words : programming style - notation - coding conventions - Hungarian Notation
  • scope : Pascal, Delphi in all its makes and shapes
  • level : Pascal / Delphi developer


1 - Why use Coding notations

When we write a new piece of code, we have the whole project in our head. We named all the identifiers ourselves, and we remember what they stand for.

On the other hand, when we maintain a project, we have to look at lines that we wrote weeks or months ago, or that were written by someone who has moved to another project or even left the company. In this case, each line should convey as much information as possible to help us understand the intent of the original developer.

The overall trend is a continuous move toward more abstract and complex concepts. In 1980, a program had a dozen Integers, Strings and ARRAYs. Today we handle sophisticated CLASSes with attributes which are themselves structured elements. And everybody would agree that a CLASS is "heavier" than a lonely Boolean.

Therefore, in order to somewhat reduce the complexity of understanding what we are manipulating, we adopted the Alsacian Notation.




2 - The Alsacian Notation

2.1 - The basic notation

In a nutshell we prefix the identifier names according to their program area:
  • all CONST are prefixed with k_
  • TYPEs with t_
  • global VAR with g_
  • local VAR with l_
  • parameters with p_
  • FUNCTIONs with f_


Here is a quick example:

program p_first_example;
  const k_margin= 0.90;
  type t_money= Double;
  var g_net_amount: t_money;

  function f_quantity: Integer;
    begin
      Result:= 100;
    end; // f_quantity

  procedure compute_invoice(var pv_net_amount: t_money);
    var l_sales_price: t_money;
    begin
      // -- illustrative value, so that the local is initialized
      l_sales_price:= 25.0;
      pv_net_amount:= k_margin* l_sales_price* f_quantity ;
    end; // compute_invoice

  begin // main
    compute_invoice(g_net_amount);
  end. // main



The rationale for using those prefixes is the following:

  • in a program, Niklaus WIRTH made a sharp distinction between CONST, TYPE, VAR and statements
  • when you maintain code, and try to fix some piece of code, it might be difficult to infer from the line you are reading what kind of identifier you are using
  • without prefixes, the code would look like:

    net_amount:= margin* sales_price* quantity ;

    which is syntactically correct. But

    • can I change the value of margin ?
    • does any identifier, on either side of the assignment, have side effects ?
    • can I locally rename any of those identifiers, without raising compiler errors ?

  • on the other hand:

    g_net_amount:= f_margin* pv_sales_price* l_quantity ;

    tells a completely different story from the original line:

    pv_net_amount:= k_margin* l_sales_price* f_quantity ;



2.2 - The full Notation

Here is a more complete presentation of the Alsacian Notation
  • constants:
    • all CONST are prefixed with k_

      CONST k_rate_count= 1000;


  • TYPEs all start with t_
    • ordinary TYPEs like:

      CONST k_count= 100;
      TYPE t_rates= ARRAY[0..k_count- 1] OF Double;

    • enumeration identifiers start with an e_

      TYPE t_payment= (e_unknown_payment,
                          e_check_payment, e_credit_card_payment, e_cash_payment);
      VAR g_payment: t_payment;

        IF g_payment= e_credit_card_payment
          THEN ...


    • For pointers, the prefix becomes t_pt_

      TYPE t_pt_customer= ^t_customer;
           t_customer= RECORD
                         m_name: String;
                         m_age: Integer;
                         m_pt_next_customer: t_pt_customer;
                       END; // t_customer

      VAR g_pt_customer: t_pt_customer= NIL;

        New(g_pt_customer);
        g_pt_customer^.m_name:= 'IBM';


    • procedural types would look like:

      TYPE t_pr_compute_area= PROCEDURE(VAR pv_side: Double);
           t_f_convert_currency= FUNCTION(p_amount: Double): Double;

      PROCEDURE compute_square(VAR pv_side: Double);
        BEGIN
        END; // compute_square

      FUNCTION f_euro_to_dollar(p_euro_to_dollar: Double): Double;
        BEGIN
        END; // f_euro_to_dollar

      VAR g_pr_compute_area: t_pr_compute_area;
          g_f_convert_currency: t_f_convert_currency;
          g_side, g_amount: Double;

        g_pr_compute_area:= compute_square;
        // -- a VAR parameter needs a variable, not a literal
        g_side:= 200.15;
        g_pr_compute_area(g_side);

        g_f_convert_currency:= f_euro_to_dollar;
        g_amount:= g_f_convert_currency(1.33);


    • and for PROCEDURE OF OBJECT and events:

      TYPE t_po_resize_shape= PROCEDURE(p_ratio: Double) OF OBJECT;
           t_po_socket_event= PROCEDURE(p_c_client_socket: c_client_socket) OF OBJECT;

           c_figure=
               CLASS
                 PROCEDURE resize_shape(p_ratio: Double);
               END; // c_figure

           c_ftp=
               CLASS
                 m_on_received_command: t_po_socket_event;
               END; // c_ftp



  • For CLASSes
    • the prefix is c_, to stress the importance of those elements.
    • the attributes of a CLASS use the m_ prefix (m like Member)
    TYPE
      c_shape=
         CLASS
           m_size: Integer;
         END; // c_shape


  • PROCEDUREs and FUNCTIONs
    • FUNCTIONs use f_
    • value parameters have a p_, and VAR parameters pv_, CONST parameters pk_

    PROCEDURE convert(p_amount, p_rate: Double; VAR pv_new_amount: Double);

    FUNCTION f_projection(p_length, p_angle: Double; CONST pk_unit: Double): Double;


    And if the parameter is a pointer or a CLASS, we combine the prefixes:

    PROCEDURE c_figure.resize_figure(p_c_shape: c_shape);

  • for variables:
    • global VAR have a starting g_
    • local VAR an l_
    And this is eventually combined with the previous notation about pointers and CLASSes

    VAR g_amount: Integer;
        g_c_sales: c_sales= Nil;
        g_pt_customer: t_pt_customer= Nil;

    PROCEDURE compute_total(p_rate: Integer);
      VAR l_total: Integer;
          l_c_statistics: c_statistics;
      BEGIN
        l_total:= p_rate* g_c_sales.m_export* l_c_statistics.m_month(7);
      END; // compute_total





2.3 - quick example

Here is a typical example, from our .XML parser (used, for instance, to parse .RSS feeds). An .XML file is made of
  • a starting tag <my_tag>
  • some text
  • an ending tag, or anti tag </my_tag>
and the text can itself contain free text, some <tag> text </tag> etc

Here is the function which analyzes a tag (partial, nested function, see the RSS Reader paper for the full sources):

function f_c_parse_tag_content_recursive: c_xml_tag;
  var l_c_xml_sub_tag: c_xml_tag;
      l_c_xml_string: c_xml_string;
  BEGIN
    check(e_opening_tag_symbol);
    read_symbol;

    Result:= f_c_parse_tag_name_and_attributes;

    IF l_symbol_type= e_closing_tag_symbol
      THEN BEGIN
          // -- skip ">"
          read_text_symbol;

          // -- await "</"
          while l_symbol_type<> e_opening_anti_tag_symbol do
          begin
            IF l_symbol_type= e_opening_tag_symbol
              THEN begin
                  // -- a nested tag: recurse
                  l_c_xml_sub_tag:= f_c_parse_tag_content_recursive;
                  Result.m_c_xml_tag_content_list.add_xml_string(l_c_xml_sub_tag);
                end
              else begin
                  // -- any content different from a tag
                  if not f_contains_only(l_symbol_string, [' ', k_tabulation, k_return, k_line_feed])
                    then begin
                        l_c_xml_string:= c_xml_string.create_xml_string(l_symbol_string);
                        Result.m_c_xml_tag_content_list.add_xml_string(l_c_xml_string);
                      end;

                  read_symbol;
                end;
          end; // while

          // -- skip "</"
          check(e_opening_anti_tag_symbol);
          read_symbol;

          f_parse_tag_name;
          check(e_closing_tag_symbol);

          read_symbol;
        END // l_symbol_type= e_closing_tag_symbol
      ELSE // l_symbol_type<> e_closing_tag_symbol
        IF l_symbol_type= e_closing_anti_tag_symbol
          THEN
            read_symbol
          ELSE display_parser_error('>, />');
  END; // f_c_parse_tag_content_recursive




2.4 - Conventions Galore

We use other conventions, not directly related to program identifiers
  • for files, the following prefixes allow us to quickly identify the file type
    • u_ for UNITs
    • p_ for PROGRAMs or Delphi projects
    • u_c_ for UNITs mainly containing CLASSes
    • d_ for DLLs
    • pk_ for Packages
  • for identifier casing
    • the keywords are in upper case: BEGIN
    • identifiers from Windows, Delphi UNITs or any other outside library have at least one uppercase letter: tForm1
    • the identifiers we create are all in lowercase : g_c_account.
      This allows any reader to quickly find whether an identifier is part of an outside library, or was created by us
  • indentation
    • everything IN a UNIT starts at column 3 or further (it is IN the UNIT)
    • INTERFACE, IMPLEMENTATION, END. are at column 3
    • everything within the INTERFACE and IMPLEMENTATION starts at column 5 (it is IN the INTERFACE or IMPLEMENTATION)
    • everything in a PROCEDURE is indented by 2 columns more than the header (they are IN the PROCEDURE)
  • we use nested PROCEDUREs or FUNCTIONs, for the same reason that we use local variables. In his Pascal P4 compiler, Niklaus WIRTH went up to 6 levels of nesting. For some Delphi utilities (pretty printers, lexers etc) this depth was also sometimes reached
  • our implementation methods are organized roughly in "usage order": first the CONSTRUCTOR, then the basic routines, then the "heavier" routines, and at the end, the DESTRUCTOR
    Alphabetical ordering would be the other common convention
  • we never use abbreviations: all identifiers are full words, as found in the Webster dictionary
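
The file, casing, indentation and ordering rules above can be sketched on a small UNIT. This is only an illustration: the unit u_c_account and the class c_account with its members are hypothetical names, not taken from our sources, and the FPC directive is an assumption added so that the sketch also compiles with Free Pascal.

```pascal
(* file u_c_account.pas: u_c_ marks a UNIT mainly containing a CLASS *)
(* c_account and its members are hypothetical, for illustration only *)
UNIT u_c_account;
  {$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
  INTERFACE
    TYPE c_account=
             CLASS
               m_name: String;
               m_balance: Double;
               CONSTRUCTOR create_account(p_name: String);
               PROCEDURE deposit(p_amount: Double);
               FUNCTION f_balance: Double;
               DESTRUCTOR Destroy; OVERRIDE;
             END; // c_account

  IMPLEMENTATION

    // -- "usage order": the CONSTRUCTOR first

    CONSTRUCTOR c_account.create_account(p_name: String);
      BEGIN
        // -- Create comes from the library: one uppercase letter
        INHERITED Create;
        m_name:= p_name;
        m_balance:= 0;
      END; // create_account

    // -- then the basic routines

    PROCEDURE c_account.deposit(p_amount: Double);
      BEGIN
        m_balance:= m_balance+ p_amount;
      END; // deposit

    FUNCTION c_account.f_balance: Double;
      BEGIN
        Result:= m_balance;
      END; // f_balance

    // -- and the DESTRUCTOR at the end

    DESTRUCTOR c_account.Destroy;
      BEGIN
        INHERITED;
      END; // Destroy

  END.
```

INTERFACE, IMPLEMENTATION and END. sit at column 3, their content starts at column 5, and each routine body is indented 2 columns more than its header.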


2.5 - Never ending Conventions ?

At some time, we also used prefixes for
  • open arrays:

    VAR g_oa_closing_price: ARRAY OF Double= Nil;

  • initialized constants (typed constants, in Delphi terms), or whatever name you give to those strange constructs:

    CONST ki_limit: Integer= 15;

      ki_limit:= 33;
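
For readers who have never met this construct: in Delphi, a typed constant can only be assigned to when the writeable constants switch {$J+} is on, and a local one keeps its value between calls. Here is a minimal sketch; f_next_ticket and ki_last_ticket are hypothetical names chosen for the illustration, and the FPC directive is an assumption for Free Pascal users.

```pascal
program p_typed_constant;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$J+} // writeable typed constants: required for the assignment below

// -- f_next_ticket and ki_last_ticket are hypothetical names
function f_next_ticket: Integer;
  const ki_last_ticket: Integer= 0; // initialized once, kept between calls
  begin
    ki_last_ticket:= ki_last_ticket+ 1;
    Result:= ki_last_ticket;
  end; // f_next_ticket

begin // main
  WriteLn(f_next_ticket); // first call
  WriteLn(f_next_ticket); // second call: the constant kept its value
end. // main
```

The ki_ prefix warns the reader that, despite the CONST keyword, the value may change at run time.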




But at some stage, you have to stop. If the benefit of using the prefixes is outweighed by the time you spend finding a "good prefix", then use normal notation.



It is also easy to spot some inconsistencies in all those conventions:

  • for the main unit, should we not use U_F_MAIN.PAS, the F standing for tForm ?
  • if we write a component, all our identifiers are in lowercase. Right ? But if you purchase our component, the same identifiers should have at least one uppercase letter. Someone's own identifier obviously can become someone else's outside identifier
  • all our BEGINs are always aligned with the matching END. But not for the THEN BEGIN: this was done to save 1 line on an Apple ][ 24 line screen (even less with the editor's header and footer). Sure, this could be corrected, but this is the way we currently write IFs in all our .ZIP sources


3 - Where is the Balance ?

3.1 - Does everybody agree ?

Of course, not all developers will find this Alsacian notation pleasant and useful:
  • some German gentleman said in a newsgroup that our style was "beruhmt geruchtig" (meaning something like "well known to be stinking"). Well, at least, it is "beruhmt" ! Whether it is "geruchtig" or not, is your decision

  • others told us that underscores make code unreadable: they claim that

    export_sales_amount:= total_sales- domestic_sales

    is much less easy to read than:

    ExportSalesAmount:= TotalSales- DomesticSales

    We are quite indifferent on this count. We use underscores because, at a distance, it_looks_like_a_written_line_in_a_book or newspaper, whereas reading the Washington Post or the Times WithTheCapitalConventionWouldBeUnacceptable to most of us.

  • the f_ for function would also be considered harmful. For instance, Bertrand MEYER, in his Eiffel language, explained that the programmer should see no difference between a FUNCTION call and ARRAY indexing, because it is an implementation detail. With the following line:

    total:= amount* rate(value);

    the reader can understand that rate takes some "parameter or index" to return a value used in this expression. At this stage, whether the coder used a FUNCTION or an ARRAY is indifferent to the reader.

    Maybe so, but not to the person trying to fix some bug:

    • here is the first version:

      total:= amount* rate[value];

      we immediately know that

      • the value index cannot be of any type: it must be an ordinal (Integer, enumeration etc), not a Double or a RECORD
      • we must check that we are using the $R+ compiler option
      • in a bullet proof project, we SHOULD check whether the index is between both limits, or at least lower than Length or Count or whatever

    • and the second version:

      total:= amount* f_rate(value);

      so the seasoned developer would remember to check

      • is this parameter a value or VAR parameter
      • does the FUNCTION have any side effects ?

  • one friend of mine who is a math teacher explained that he could not use our "no abbreviation" rule.

    Well, this one is easy to answer: for a mathematician, the whole exercise is to climb the abstraction ladder. Just tell him:

     
    "let ro be a Banach space"

    and he will see loads of Axioms and Theorems jumping to his mind. The same goes for formulas like:

     
    pv= RT
    V= RI
    e= mc^2

    The code writer has a completely different problem on his hands: maintaining the code. Let's assume that your job is to fix a code line like:

    mn / (1- m)+ t

    It certainly is more painful to fix than the following line:

    sales_price:= net_price / (1- margin)+ sales_tax;

    However, for very mathematical programs, one has to agree that overly long names will produce expressions spread over many lines, and the reader might be fascinated by the tree and totally miss the forest. So for algorithmic code (matrix computation, wavelet filtering, option contract forecasting etc), we would have to adopt some shortened names. For invoicing, order processing or inventory recording, our identifiers are much more "short lived". They change from project to project, from developer to developer. Nothing compared to universal constants like the Avogadro constant, or c, the speed of light. So, for maintenance purposes, nice long words are still best, in our opinion.
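
Coming back to the ARRAY versus FUNCTION argument of this section, here is a minimal sketch of the two checks the prefix reminds us of. All the names (g_rates, f_rate, g_call_count) and the values are hypothetical, and the FPC directive is an assumption for Free Pascal users.

```pascal
program p_rate_lookup;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$R+} // range checking: an out of range index is trapped at run time

const k_rate_count= 3;
type t_rates= array[0..k_rate_count- 1] of Double;

// -- hypothetical data, for illustration only
var g_rates: t_rates= (0.05, 0.10, 0.20);
    g_call_count: Integer= 0;

function f_rate(p_index: Integer): Double;
  begin
    // -- a side effect that a plain array access can never have
    Inc(g_call_count);

    if (p_index>= 0) and (p_index< k_rate_count)
      then Result:= g_rates[p_index]
      else Result:= 0;
  end; // f_rate

var g_total: Double;

begin // main
  // -- no prefix, brackets: an ARRAY, so think "index limits" and $R+
  g_total:= 100* g_rates[1];

  // -- f_ prefix: a FUNCTION, so think "parameters and side effects"
  g_total:= g_total+ 100* f_rate(2);

  WriteLn(g_total:0:2, ' ', g_call_count);
end. // main
```

Both lines compute a rate from an index, but only the second one changes g_call_count, and only the first one depends on the $R+ option.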



3.2 - Some other notation

  • in the early days of Pascal, we used to prefix our variables with the first letter of the TYPE: ixxx for Integer, cxxx for Char, sxxx for String.

  • this was later dropped, and we switched to the Alsacian notation which was first explained in books we wrote and published around 1982. Not in the introductory Pascal books, but in more specific books, like those on b-Trees or 8086 disassembly programs

  • when Windows 3.1 appeared, we all discovered the Hungarian Notation, invented by Charles SIMONYI, who was one of the Microsoft developers. This notation mainly prefixes the variables with the TYPE:
    • ulAccountNum : meaning that the account number is an unsigned long integer
    • szName which stands for "zero-terminated string"
    • pszOwner for a pointer to zero-terminated string
    As you can see, this notation is about TYPEs only, whereas the Alsacian Notation is about "program area": CONST, TYPE, global VAR, etc

    And it is not surprising that SIMONYI only prefixed the TYPEs, since C, which was his favorite language, totally obfuscates the distinction between definitions, declarations, statements, constants etc. Just look at the .H horror, or the Struct mess, with mixed TYPEs and VARs. I am not criticizing SIMONYI here, but the C language.

  • Delphi brought the following conventions:
    • Txxx for type
    • Pxxx for pointer types
    • Fxxx for PRIVATE fields
    • Axxx for parameters, like AOwner
    • prefixes, like cl for "color", like in clRed
    The AParameter convention is quite strange, and I read somewhere that A does not stand for the indefinite article, but I cannot remember why they chose this one.

    The main benefit of the Delphi notation is that it is used in the VCL source code, and that most Delphi developers use it.
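
To compare the three notations side by side, here is a minimal sketch which declares similar elements in each style. TInvoice, c_invoice and the variable names are hypothetical, chosen only for the comparison, and the FPC directive is an assumption for Free Pascal users.

```pascal
program p_notation_comparison;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  // -- Delphi convention: T for types, F for fields, A for parameters
  TInvoice= class
    private
      FAmount: Double;
    public
      constructor Create(AAmount: Double);
      property Amount: Double read FAmount;
  end; // TInvoice

  // -- Alsacian notation: c_ for CLASSes, m_ for members, p_ for parameters
  c_invoice= class
      m_amount: Double;
      constructor create_invoice(p_amount: Double);
  end; // c_invoice

constructor TInvoice.Create(AAmount: Double);
  begin
    inherited Create;
    FAmount:= AAmount;
  end; // Create

constructor c_invoice.create_invoice(p_amount: Double);
  begin
    inherited Create;
    m_amount:= p_amount;
  end; // create_invoice

var
  // -- Hungarian style: the prefix encodes the data TYPE (unsigned long)
  ulAccountNum: Cardinal;
  // -- Alsacian style: the prefix encodes the program area (global VAR)
  g_account_number: Cardinal;
  g_c_invoice: c_invoice;
  Invoice: TInvoice;

begin // main
  ulAccountNum:= 1001;
  g_account_number:= ulAccountNum;

  Invoice:= TInvoice.Create(120.0);
  g_c_invoice:= c_invoice.create_invoice(Invoice.Amount);
  WriteLn(g_account_number, ' ', g_c_invoice.m_amount:0:2);

  g_c_invoice.Free;
  Invoice.Free;
end. // main
```

The Hungarian name tells us how the account number is stored, whereas the Alsacian name tells us where it was declared.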



3.3 - The Bottom Line

Writing code in a team, with some years of programming experience, will always foster some coding conventions. It quickly becomes irritating when the members of the team use different notations: some will write "msg", other "messg", "Message" or "wm_xxx" etc. And when you have to maintain some code, and have to use a "message" in an assignment, you would never know which convention was used by this code's original author without looking at the definition or declaration. So bringing some order to this chaos is only natural. Which convention your team decides to adopt, is a matter of common agreement.

I never explained the Alsacian notation on the web before. You will find tons of papers telling you which style you SHOULD use. Some of those are written by people with only one or two years of programming on their hands. So writing a "style paper" immediately looks moralizing: YOU tell the universe how THEY SHOULD behave. Therefore, the defense reaction of the reader automatically is: "and who are YOU anyway ?"

In addition, anyone has an opinion about notation and can jump into over-heated discussions about the benefits of this or that convention. Reminds me of Steve WOZNIAK (one of the creators of the Apple ][ ) who said that everybody had an opinion about the style of the Mac Windows or the appearance of a button, but selecting the right electronic component and architecture was a very lonely decision.

Are we trying to convince you to use our notation ? Not at all. We have no vested interest whatsoever in doing so. Sadly enough, whether you adopt it or not will not make us any richer. However, should we sometime subcontract you some piece of code, then we would definitely include it in the contract, but this is another story. And if you don't like the prefixes in our .ZIP source codes, feel free to remove all g_, l_, k_, f_ etc.

So, at the end of the day, is the Alsacian notation better, in some sense, than another one ? You have to decide for yourself.

For us, it comes down to Dollar amounts: which coding style will reduce the cost of programming most ? At the Pascal Institute, we found that the Alsacian notation had some logic, which makes it reasonably easy to learn and remember. And it conveys useful syntactic information, with an acceptable decrease in readability. Therefore it allows us to concentrate on much more important decisions, like architecture, design and testing.




4 - Your Comments

As usual:
  • please tell us at fcolibri@felix-colibri.com if you found some errors, mistakes, bugs, broken links or had some problem downloading the file. Resulting corrections will be helpful for other readers
  • we welcome any comment, criticism, enhancement, other sources or reference suggestion. Just send an e-mail to fcolibri@felix-colibri.com.

  • and if you liked this article, talk about this site to your fellow developers, add a link to your links page or mention our articles in your blog or newsgroup posts when relevant. That's the way we operate: the more traffic and Google references we get, the more articles we will write.



5 - The author

Felix John COLIBRI works at the Pascal Institute. Starting with Pascal in 1979, he then became involved with Object Oriented Programming, Delphi, Sql, Tcp/Ip, Html, UML. Currently, he is mainly active in the area of custom software development (new projects, maintenance, audits, BDE migration, Delphi Xe_n migrations, refactoring), Delphi Consulting and Delphi training. His web site features tutorials, technical papers about programming with full downloadable source code, and the description and calendar of forthcoming Delphi, FireBird, Tcp/IP, Web Services, OOP  /  UML, Design Patterns, Unit Testing training sessions.
Created: nov-04. Last updated: jul-15 - 98 articles, 131 .ZIP sources, 1012 figures
Copyright © Felix J. Colibri   http://www.felix-colibri.com 2004 - 2015. All rights reserved