Google Summer of Code: Week 2

Posted on June 7, 2019 by Ankit Pandey

See the previous post for Week 1.

For this week, I’ve continued working on adding support for LFortran to SymPy’s code generation capabilities. This week mostly involved getting the infrastructure for testing the functionality of the new code generator working. I also extended the number of expressions the generator can handle, in addition to adding to LFortran’s ability to parse numbers upstream.

More Expressions

I’ve added support for four more expression types that the generator can handle: Float, Rational, Pow and Function. Since our base translation class was already in place from last week, implementing these was relatively straightforward and involved just defining the node visitors for each expression type (the commit that implements this can be found here). Here’s a demonstration showing the abstract syntax tree generated from translating the expression $\left(\frac{4}{3}\right)^{x}$:
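To give a sense of what "defining the node visitors" involves, here is a minimal sketch of the pattern. It is not the code from the commit: the TupleTranslator class, the tuple-based tree and the node names ("Name", "Real", "BinOp", "FuncCall") are assumptions made for illustration, standing in for LFortran's real AST classes.

```python
from sympy import Rational, Symbol, sin


class TupleTranslator:
    """Hypothetical translator: emits nested tuples, not LFortran AST nodes."""

    def visit(self, expr):
        # Walk the class hierarchy so that, for example, sin falls back to
        # visit_Function and Integer is matched before Rational.
        for cls in type(expr).__mro__:
            method = getattr(self, "visit_" + cls.__name__, None)
            if method is not None:
                return method(expr)
        raise NotImplementedError(type(expr).__name__)

    def visit_Symbol(self, expr):
        return ("Name", expr.name)

    def visit_Integer(self, expr):
        return ("Num", str(expr.p))

    def visit_Float(self, expr):
        # Numbers are kept as strings, mirroring the parser-side behaviour
        # described in this post.
        return ("Real", str(expr))

    def visit_Rational(self, expr):
        # Assumption: a rational p/q becomes a division of two reals.
        return ("BinOp", "Div", ("Real", str(expr.p)), ("Real", str(expr.q)))

    def visit_Pow(self, expr):
        return ("BinOp", "Pow", self.visit(expr.base), self.visit(expr.exp))

    def visit_Function(self, expr):
        return ("FuncCall", expr.func.__name__, [self.visit(a) for a in expr.args])


x = Symbol("x")
tree = TupleTranslator().visit(Rational(4, 3) ** x)
print(tree)
```

Dispatching on the class hierarchy keeps each visitor small: supporting a new expression type means adding one visit_ method, which matches how the four new types were described above.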

However, the translator fails for expressions that should in theory work. Right now, we can’t add an integer to a symbol because symbols default to real numbers, resulting in a type mismatch:

Fortran allows the implicit conversion of an integer to a real, so the expression shouldn’t generate an error. This is functionality that will hopefully be implemented by the time I come back to this project close to the end of the summer.
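One way the missing behaviour could work is to widen the integer operand before building the addition. The sketch below is only an illustration under that assumption: the Node class, the "Convert" node and the promote and add helpers are hypothetical and do not correspond to LFortran's actual types or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical typed AST node (not an LFortran class)."""
    kind: str            # e.g. "Num", "Name", "BinOp", "Convert"
    type: str            # "integer" or "real"
    children: tuple = ()
    value: str = ""


def promote(left, right):
    """Return both operands with a common type, widening integer to real."""
    if left.type == right.type:
        return left, right
    if {left.type, right.type} == {"integer", "real"}:
        def widen(node):
            # Wrap only the integer operand in an explicit conversion node.
            return node if node.type == "real" else Node("Convert", "real", (node,))
        return widen(left), widen(right)
    raise TypeError(f"cannot combine {left.type} and {right.type}")


def add(left, right):
    """Build an addition node after reconciling the operand types."""
    left, right = promote(left, right)
    return Node("BinOp", left.type, (left, right), "+")


x = Node("Name", "real", value="x")      # symbols default to real
one = Node("Num", "integer", value="1")
total = add(x, one)                      # no type mismatch is raised
```

Making the conversion an explicit node keeps the tree well typed, so later stages never see a mixed integer and real operation.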

Testing the LFortran Converter

I also added the initial infrastructure for testing the new code generation functions, with the starting commit available here. As Aaron mentioned in one of our meetings, the plan right now is for code generated by the LFortran backend to be equivalent to the output generated by the existing fcode at the AST level. Each test should be in the form of an assertion that compares the (parsed) output of fcode applied to a SymPy expression against the AST generated for the same expression by our newly implemented sympy_to_lfortran. The LFortran project already has code to check generated ASTs against expected values, so I adapted this to the testing library of our code generator (I’m not yet sure how this works in terms of licensing, though both SymPy and LFortran use the BSD-3 license).
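The shape of such a test can be sketched as follows. Only fcode is a real SymPy function here. The assert_same_ast helper is hypothetical, and the parse and translate arguments are placeholders for LFortran's parser and for sympy_to_lfortran, whose exact interfaces are not shown in this post.

```python
from sympy import Symbol, fcode


def assert_same_ast(expr, parse, translate):
    """Check that parsing fcode's output gives the same tree as translating.

    parse and translate are injected callables (assumed interfaces): parse
    takes Fortran source text, translate takes a SymPy expression.
    """
    source = fcode(expr, source_format="free")
    expected = parse(source)
    actual = translate(expr)
    assert actual == expected, f"AST mismatch for {expr}: {actual!r} != {expected!r}"


# Stand-ins so the sketch runs without LFortran: the "AST" is just the
# stripped source text, so both sides agree by construction.
x = Symbol("x")
assert_same_ast(
    x + 1,
    parse=lambda src: src.strip(),
    translate=lambda e: fcode(e, source_format="free").strip(),
)
```

Passing the parser and translator in as arguments keeps the comparison logic independent of how either side is eventually implemented.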

One problem that immediately became apparent was the way that LFortran represents numbers. Looking at the expression tree above, the real numbers are actually stored as strings. On the parser side, LFortran stores a real number as the string used to represent that number. This means that the ASTs of two expressions that represent the same number in different ways are not identical (for example, 1.0_dp and 1.d0 both represent the same double precision floating point number, but the strings stored by LFortran will be different). It’s only at the “annotation” stage of evaluation that LFortran canonicalizes floating point representations. For now, the tests use the annotation function of this stage, and I filed a merge request on the LFortran project to add support for parsing numbers in the way that fcode generates them.
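To make the problem concrete, the sketch below canonicalizes real literals so that different spellings of the same number compare equal. It is an illustration only, not LFortran's annotation code: the canonical_real function is hypothetical, the regular expression covers only simple literals, and treating the kind suffix dp as double precision is an assumption (in real Fortran code dp is a user-defined kind parameter).

```python
import re

# Simple Fortran real literal: mantissa, optional e/d exponent, optional kind.
_REAL = re.compile(
    r"""^(?P<mantissa>[+-]?(?:\d+\.?\d*|\.\d+))
        (?:(?P<letter>[edED])(?P<exp>[+-]?\d+))?
        (?:_(?P<kind>\w+))?$""",
    re.VERBOSE,
)


def canonical_real(literal, double_kinds=("dp",)):
    """Map a literal string to a (value, precision) pair.

    double_kinds lists kind suffixes assumed to mean double precision.
    """
    match = _REAL.match(literal.strip())
    if match is None:
        raise ValueError(f"not a recognised real literal: {literal!r}")
    # Rebuild the number with an "e" exponent so Python's float can read it.
    value = float(match["mantissa"] + "e" + (match["exp"] or "0"))
    is_double = (match["letter"] or "").lower() == "d" or match["kind"] in double_kinds
    return (value, "double" if is_double else "single")


# The raw strings differ, but the canonical forms agree.
same = canonical_real("1.0_dp") == canonical_real("1.d0")
```

Comparing canonical pairs instead of raw strings is what lets two trees that spell a number differently still count as identical.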

While the initial infrastructure is in place, I haven’t added any tests yet. Since the LFortran project is still in early alpha, the functionality needed to compare the syntax tree made by the builder API against the syntax tree parsed from the output of fcode hasn’t been implemented yet. Again, this is something that will hopefully be implemented in LFortran near the end of the summer when I start on this portion of the project again.

Contributing Upstream

After I filed the merge request to add the functionality I needed to LFortran, Ondřej (the creator of LFortran and one of my mentors) mentioned that he was planning on eventually removing the module I contributed to. The merge request I filed actually wasn’t the one I had in mind at first. I thought about adding support for canonicalizing number nodes right after they’re created in the builder API, but I decided against this because I felt that any changes I made would have to be minimally invasive. In retrospect, this was probably a misplaced concern, since it’s important to consider the development stage of a project when deciding how much of it should be changed. Because of this, LFortran will probably end up with the very feature I chose not to implement at the time.

Next Steps

There’s still some work left to be done with LFortran, such as filing issues I encountered and preparing the pull request for a merge (though it’ll probably remain a work in progress for some time). After that, I’ll be finished with LFortran for the time being and move on to extending support for matrix expressions in the Python code generator. The Python code generator can already convert (most) matrix expressions through NumPy, though there are still some bugs owing to an incomplete implementation. For next week, I’ll have to figure out what this missing functionality is and how it can be implemented.