In this assignment you will implement two lexers and parsers for a small language called Fish (short for Fortran-ish). You will use parsing combinators to construct the first lexer and parser. You will use the tools Lex and Yacc to build the second front-end.
huids.txt
. This will help us perform blind grading.The Fish language is very simple and only includes integer expressions and statements. Fish is intended to have syntax similar to that of C and Java (in fact, we’ll soon scale Fish up to have more features of these languages).
Here is an example Fish program:
/* this is a comment */
x = 42;
y = 31;
for (s = 0; x > 0; x = x - 1) {
y = y + 1;
}
return y;
More example programs can be found in the test directory.
We have defined the abstract syntax of Fish in ast.ml. However, in the interest of not giving too much away, we have intentionally not specified the language’s concrete syntax thoroughly. In the course of this assignment you will design grammars defining the concrete syntax of Fish. Here are some guidelines that we expect you to follow:
/*
and end with */
. Comments do not nest.*
and /
bind tighter than +
and -
, which bind tighter than comparisons,
and comparisons bind tighter than &&
and ||
, and finally, assignment comes last in the precedence.
There are some tricky cases, and it might be helpful to look up C operator precedence.-x
.while(1) /* do nothing */;
.
You can use skip
(defined in ast.ml) to represent an empty statement.In general, we expect that the concrete syntax you develop for Fish will support “reasonable” programs, such as the ones in the test directory (which are not exhaustive). Feel free to post questions clarifying syntax on Piazza.
The file lcombinators.ml contains parsing combinator definitions similar to the ones that we saw in class and which you should use to define your first lexer and parser for Fish. The files comblexer.ml and combparser.ml contain basic definitions for the lexer and parser. You will modify these files to build a parser for Fish programs. Additionally, please write the BNF grammar that your parser implements in a comment in combparser.ml. This grammar must be unambiguous.
To help you get started, we have written a basic parser that can parse some arithmetic expressions and return statements. You can test the parser like so:
$ make
...
$ cat test/01cexpr_01add.fish
return 12 + 4;
$ ./proj2comb test/01cexpr_01add.fish
answer = 16
Feel free to change or remove any of the skeleton code as you see fit.
You can use dummy values for position information.
For this part of the assignment, all arithmetic and logical operations may be
parsed as right associative.
For example, return 1 - 2 - 3;
may be parsed as return 1 - (2 - 3);
.
Note that operator precedence should still be followed.
You may need to recursively refer to a parser you are composing. Do it lazily! Consider the following simple grammar:
exp := n | '(' exp ')'
Your program will immediately diverge if you attempt to do something like the following:
let rec make_exp_parser () =
...
let subexp_parser = seq (lparen_parser, (seq (make_exp_parser (), rparen_parser))) in
...
We have provided you a lazy_seq
combinator to avoid infinite recursion.
Your grammar cannot be left-recursive. Implementing the following grammar with combinators will result in a divergent program:
exp := n | e1:exp '+' e2:exp
Instead you need to manipulate it to something like the following:
exp := n exp_rest
exp_rest := ε | '+' e2:exp
Please test your parser! We have provided two ways to do so.
First, as mentioned above, make
will generate the executable proj2comb
, which expects a Fish source file to parse.
Second, you can run make testcomb
to test your implementation on a set of test cases.
The cases we have provided are identical to the ones in the test directory.
Since these are far from exhaustive, you will want to add more, which can be done by inlining them in the test.py script.
You might want to create negative examples (i.e., ill-formed programs that your parser should reject).
In summary, there are three deliverables for this part: 1. A lexer defined in comblexer.ml. 1. An unambiguous BNF grammar for the concrete syntax of Fish in combparser.ml. 1. A parser implementing this grammar in combparser.ml.
The files lex.mll, parse.mly, and fishyacc.ml contain the boilerplate code that you’ll need for your Lex and Yacc-based front-end (you should only have to make changes to the first two of these files). Tutorials for Ocamllex and Ocamlyacc can be found here and here, respectively.
parse.output
, which can be helpful for debugging conflicts.proj2yacc
executable generated by make
to test individual Fish source files or run make testyacc
to test your implementation on the cases in test.py.In summary, there are two deliverables for this part: 1. A lexer with rules defined in lex.mll and tokens defined in parse.mly. 1. A parser defined in parse.mly