Expression Syntax

A C or C++ program is made up of statements of various kinds. This note explains the syntax of the expression statement.

Syntax means grammar. Grammar is the surface structure of a program: it says what you are allowed to write and, to some extent, how the compiler will interpret what you write. However, syntax does not say it all. There are additional rules that must be followed if the compiler is to understand your program. In this note, we will indicate some of the more important rules. Even if the syntax is correct and you follow the additional rules, there is no guarantee that the program will function correctly. Syntax is the first of many steps to a correctly functioning program.

Syntax is simple. It has a simple mathematical means of description whereby a complex statement is broken down by a fixed set of rules into its primitive components. If a statement can be broken down by the rules, then the syntax is correct. This breaking down is also called parsing.

Here is the rule for an expression statement:

      expression-statement => expression ;

The ; means that the semicolon appears literally. We have yet to describe expression, but the rule says that to form an expression-statement, take an expression and put a ; after it.

An expression is any of the following:

We have the additional definitions:

Important: When forming two-character symbols such as ==, there must be no spaces between the characters.


Now we show by an example how to use these definitions to parse an expression statement.

      i = 1 ;
is an expression statement. We use the above rules to parse the statement as follows:
  1. i is a variable name, hence it is an expression.
  2. 1 is a constant, hence it is an expression.
  3. i=1 is of the form variable-name assignment-operation expression, hence it is an expression.
  4. i=1; is an expression statement, since it is of the form expression followed by a semicolon.
This parsing is worked bottom-up, showing how the final text is built up from names, constants and operation symbols. Here's a parsing worked top-down:
expression-statement
=> expression ;
=> variable-name assignment-operation expression ;
=> i = constant ;
=> i = 1 ;

The famous

    cout << "Hello World" ;
line in the HelloWorld.C program is also an example of an expression statement. cout is a variable name, << is an operator, although not one that we listed above, and "Hello World" is a constant, although of type string rather than type integer. The point is that C++ code follows strict yet simple rules of construction. It pays to understand the pattern.
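
As a minimal sketch, the same kind of statement can be placed in a small function and checked. The name hello_text is hypothetical, and an in-memory stream stands in for cout so that the result can be inspected; neither comes from the HelloWorld.C program.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper for illustration: builds the greeting with <<.
std::string hello_text() {
    std::ostringstream out ;    // plays the role of cout here (an assumption)
    // expression-statement => expression ;
    // out is a variable name, << is an operator, "Hello World" is a constant.
    out << "Hello World" ;
    return out.str() ;
}
```

Writing cout << hello_text() ; in a program that includes iostream would send the same text to the screen.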

Here's another example:

    j = - ( i + 2 ) ;
is an expression-statement:
expression-statement
=> expression ;
=> variable-name assignment-operation expression ;
=> j = unary-operation expression ;
=> j = - ( expression ) ;
=> j = - ( expression arithmetic-operation expression ) ;
=> j = - ( variable-name + constant ) ;
=> j = - ( i + 2 ) ;
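
To see this parse at work, the sketch below wraps the statement in a function and, for contrast, drops the parentheses in a second one. Both function names are hypothetical and exist only for this illustration.

```cpp
// Hypothetical helpers for illustration only.

// The statement from the text: the parenthesized sum is formed first,
// then the unary minus is applied to the whole of it.
int negated_sum(int i) {
    int j ;
    j = - ( i + 2 ) ;
    return j ;
}

// Without the parentheses, the unary minus applies to i alone.
int negated_then_added(int i) {
    int j ;
    j = - i + 2 ;
    return j ;
}
```

With i equal to 1, negated_sum returns -3 while negated_then_added returns 1, so the parentheses in the parse are doing real work.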


Even though i = 1 ; has correct syntax, the compiler will still complain if it hasn't been previously introduced to the variable name, usually with a declaration such as:

   int i ;
Syntax is only one element of getting your program to compile. More insidious than using a variable before it is declared is the habit, popular among beginning programmers, of using the value of a variable before anything has ever been stored in it:
   int i ;
   int j ;
   j = i + 1 ;
The compiler will generally not complain about this situation, but the value read from i is unpredictable, so the program will malfunction.

Whenever you use a variable, mentally check that:

  1. The variable has been declared, so that the compiler is happy.
  2. The value of the variable has been set, so that you are happy.
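
The two checks can be sketched as follows. The function name next_after_one and the starting value 1 are assumptions chosen only for illustration.

```cpp
// Hypothetical helper: both variables are declared, and i is given a value
// before that value is used.
int next_after_one() {
    int i ;        // check 1: declared, so the compiler is happy
    int j ;
    i = 1 ;        // check 2: a value is stored in i before i is read
    j = i + 1 ;    // now the value of i is known
    return j ;
}
```

Here j is always 2, because nothing is read from a variable before something has been stored in it.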

The syntax of an expression is often ambiguous, so that the text can be parsed in two different ways. Be very careful that you and the compiler agree on the parsing. Use parentheses to make the intended meaning unambiguous. For instance,

    3 * 1 / 2 
is ambiguous. Is it (3*1)/2 or 3*(1/2)? The first evaluates to 1, the second evaluates to 0. It is very important to resolve this ambiguity.

These two parse sequences make clear the ambiguity:

Parse version 1:
expression
=> expression arithmetic-operation expression
=> constant * expression arithmetic-operation expression
=> 3 * constant / constant
=> 3 * 1 / 2
Parse version 2:
expression
=> expression arithmetic-operation expression
=> expression / constant
=> expression arithmetic-operation expression / 2
=> constant * constant / 2
=> 3 * 1 / 2
Which parse sequence do you mean? Which does the compiler mean? Do all compilers agree with your compiler on the parsing? These are all fascinating questions, and all have good answers. However, I suggest talking to your compiler like this:
    (3 * 1) / 2
This expression has correct syntax, and can be parsed in only one way, because of the parentheses.
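
A small sketch makes the two groupings concrete. The function names are hypothetical; the values follow from integer division, which discards the fractional part.

```cpp
// Hypothetical helpers, one per grouping of 3 * 1 / 2.

// Multiply first: 3 * 1 is 3, and 3 / 2 is 1 in integer arithmetic.
int multiply_first() {
    return (3 * 1) / 2 ;
}

// Divide first: 1 / 2 is 0 in integer arithmetic, and 3 * 0 is 0.
int divide_first() {
    return 3 * (1 / 2) ;
}
```

The two functions return 1 and 0 respectively, matching the values given in the text.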


Exercises

Parse the following (top-down, as illustrated above):
  1. i = 2 * j ;
  2. i = 2 * j + 3 ; // Ambiguous - is it obvious which to choose?
  3. i = 2 + j - 3 ; // Ambiguous - but does it matter?
  4. (2) ; // Useless, but OK syntax.
  5. i = i ; // Useless, but OK syntax.
  6. i = ( j = 3 ) ; // OK, but what does it mean?
  7. i = 2 - j + 3 ; // Ambiguous - is it obvious which to choose?
The following are not correct syntax. Convince yourself that they cannot be parsed by our rules:
  1. 3 = i ;
  2. 2 - ;
  3. 2 i + 1 ;
  4. 8-) ;
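
One way to check your answers is a small recognizer, a function that only reports whether a text can be parsed. The sketch below is a rough illustration under stated assumptions, not a definitive implementation: the rules listed in its first comment are an assumed subset reconstructed from the worked examples in this note, and the names Recognizer, parses and is_arithmetic_operation are hypothetical. It does not handle two-character symbols such as ==, and it does not choose between ambiguous groupings.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// ASSUMED subset of the rules (reconstructed from the examples in this note):
//   expression-statement => expression ;
//   expression => variable-name | constant | ( expression ) | - expression
//              |  expression arithmetic-operation expression    (+ - * /)
//              |  variable-name = expression
bool is_arithmetic_operation(char c) {
    return c == '+' || c == '-' || c == '*' || c == '/' ;
}

struct Recognizer {
    std::string text ;
    std::size_t pos ;

    // Skip blanks, then return the next character ('\0' at the end).
    char peek() {
        while (pos < text.size() && std::isspace(static_cast<unsigned char>(text[pos]))) {
            pos = pos + 1 ;
        }
        if (pos < text.size()) return text[pos] ;
        return '\0' ;
    }

    // Consume the next character if it is c.
    bool accept(char c) {
        if (peek() != c) return false ;
        pos = pos + 1 ;
        return true ;
    }

    // A name, a name = expression, a constant, ( expression ), or - operand.
    bool operand() {
        char c = peek() ;
        if (std::isalpha(static_cast<unsigned char>(c))) {
            while (pos < text.size() && std::isalnum(static_cast<unsigned char>(text[pos]))) {
                pos = pos + 1 ;
            }
            if (accept('=')) return expression() ;   // variable-name = expression
            return true ;
        }
        if (std::isdigit(static_cast<unsigned char>(c))) {
            while (pos < text.size() && std::isdigit(static_cast<unsigned char>(text[pos]))) {
                pos = pos + 1 ;
            }
            return true ;
        }
        if (accept('(')) return expression() && accept(')') ;
        if (accept('-')) return operand() ;
        return false ;
    }

    // One operand, then any number of "arithmetic-operation operand" pairs.
    bool expression() {
        if (!operand()) return false ;
        while (is_arithmetic_operation(peek())) {
            pos = pos + 1 ;
            if (!operand()) return false ;
        }
        return true ;
    }
};

// True if the whole text is an expression followed by a single ; and nothing else.
bool parses(const std::string& statement) {
    Recognizer r = { statement, 0 } ;
    return r.expression() && r.accept(';') && r.peek() == '\0' ;
}
```

For example, parses("i = ( j = 3 ) ;") is true, while parses("3 = i ;") and parses("8-) ;") are false. A true result says only that the syntax is acceptable under the assumed rules; it says nothing about declarations, values, or which grouping is intended.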