Thursday, August 18, 2016

It's alive! - The final days of GSOC 2016

I promised asyncio with async-await syntax, so here you have it :)
I fixed all the bugs I could find (quite a lot of them, which is normally a good sign), and as a result I can now run programs that use asyncio with async-await syntax without any errors and with the same results cpython 3.5 gives.
I implemented all of cpython's tests for the async-await syntax and added some tests of my own to check that everything also works together in a more complex program. It's all working now.
The GIL problem I described earlier was just an incompatibility of asyncio with the PyPy interpreter (pyinteractive). The error does not occur otherwise.
Lately I have been working on the py3.5-async branch in the PyPy repository, and that is also where I checked that everything works. I also merged all my work into the main py3.5 branch. That branch was badly broken though, because there are some major changes from py3k to py3.5. I fixed some things in there, and the PyPy team managed to fix a lot of it as well. My mentor sometimes took time to get important things working so that my workflow wouldn't get interrupted. While I was working on py3.5-async, a lot of fixes were made on py3.5, changing the behaviour a bit and (possibly) breaking some other things. I have not yet checked every part of my proposal on this branch, but with some help I think I managed to get everything working there as well. At least it all looks very promising for now.
Besides that, I also fixed some bugs I found in the unpacking feature of my proposal. Some special cases led to errors. For example, with a function defined as def f(*args, **kwds) and a call like f(5,b=2,*[6,7]), I got a TypeError saying “expected string, got list object”. The problem was that certain elements were in the wrong order on the stack, so popping them in the usual way did not work. I added a fix that checks for this case; there are probably better solutions, but it seems to work for now.
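For reference, this is how that call is expected to behave. It is only a minimal sketch: the body of f is my own assumption and simply returns what it received so the result can be inspected.

```python
def f(*args, **kwds):
    # Return what the function received so the result can be inspected.
    return args, kwds


# A keyword argument followed by a * unpacking: the unpacked values are
# still appended to the positional arguments.
args, kwds = f(5, b=2, *[6, 7])
print(args)  # (5, 6, 7)
print(kwds)  # {'b': 2}
```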


I will probably keep working on py3.5 for a little bit longer, because I would like to do some more things with the work of my proposal. It would be interesting to see a benchmark comparing the speed of my work with cpython. There is also a lot left to do before py3.5 works correctly, so I might look into that as well.



Here is a link to all of my work:
https://bitbucket.org/pypy/pypy/commits/all?search=Raffael+Tfirst



My experience with GSOC 2016
It's really been great to work with professional people on such a huge project. My mind was blown when I saw how much I was able to learn in this short time. The PyPy team was able to motivate me to achieve the goal of my proposal and to get even more interested in compiler construction than I already was.
I am glad I took the chance to participate in this year's GSOC. Of course my luck in having such a helpful and productive team on my side is one of the main reasons why I enjoyed it so much.

Thursday, August 4, 2016

Only bug fixes left!


All changes of cpython have been implemented in PyPy, so all that's left to do now is fixing some bugs. Some things had to be done a bit differently, because not everything in cpython has to be implemented in PyPy. For example, in cpython slots are used to check the existence of a function and then call it. The type of an object holds the information about valid functions, stored as elements inside structs. Here's an example of how the __await__ function gets called in cpython:
ot = Py_TYPE(o);
if (ot->tp_as_async != NULL) {
    getter = ot->tp_as_async->am_await;
}
if (getter != NULL) {
    PyObject *res = (*getter)(o);
    if (res != NULL) { … }
    return res;
}
PyErr_Format(PyExc_TypeError,
             "object %.100s can't be used in 'await' expression",
             ot->tp_name);
return NULL;

This 'getter' points to the am_await slot in typeobject.c. There, a lookup is done with '__await__' as parameter. If the method exists, it gets called; otherwise an error is raised.
In PyPy all of this is much simpler. In practice I just replace the getter with the lookup for __await__. All I want to do is call the method __await__ if it exists, so that's all there is to it. My code now looks like this:
w_await = space.lookup(self, "__await__")
if w_await is None: …
res = space.get_and_call_function(w_await, self)
if res is not None: …
return res
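The same idea can be tried out in plain Python. This is only an analogy and not the PyPy code: call_await and Ready are hypothetical names I made up for illustration, and getattr on the type stands in for the interpreter-level lookup.

```python
def call_await(obj):
    # Look up __await__ on the type, the way special methods are looked up.
    await_method = getattr(type(obj), "__await__", None)
    if await_method is None:
        raise TypeError(
            "object %.100s can't be used in 'await' expression"
            % type(obj).__name__
        )
    return await_method(obj)


class Ready:
    """Hypothetical awaitable that has nothing to wait for."""

    def __await__(self):
        # __await__ has to return an iterator; an empty one is enough here.
        return iter(())


print(list(call_await(Ready())))  # []
```

Calling call_await(42) raises the TypeError, because int has no __await__.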


I also fixed the _set_sentinel problem I wrote about in the last post. All dependency problems of other (not yet covered) Python features have been fixed. I can already execute simple programs, but as soon as it gets a little more complex and uses certain asyncio features, I get an error about the GIL (global interpreter lock):
“Fatal RPython error: a thread is trying to wait for the GIL, but the GIL was not initialized”
First I have to read some descriptions of the GIL, because I am not sure where this problem could come up in my case.
There are also many minor bugs at the moment. I already fixed one of the bigger ones which didn't allow async or await to be the name of variables. I also just learned that something I implemented does not work with RPython which I wasn't aware of. My mentor is helping me out with that.
I also have to write more tests, because they are the safest and fastest way to check for errors. There are a few things I didn't test enough, so I need to catch up on writing tests a bit.


Things are not moving forward as fast as I would like, because I often run into completely new things which I need to study first (like the GIL in this case, or the memoryview objects from the last blog entry). But there really shouldn't be much left to do until everything works, so I am pretty optimistic about the time I have left. If I manage to complete this task soon, I am positive my proposal will be successful.

Thursday, July 21, 2016

Progress async and await


It's been some time, but I made quite some progress on the new async feature of Python 3.5! There is still a bit to be done though, and the end of this year's Google Summer of Code is pretty close already. Whether I can finish in time will mostly be a matter of luck, since I don't know how much I will still have to do in order for asyncio to work. The module depends on many new features from Python 3.3 up to 3.5 that have not been implemented in PyPy yet.

Does async and await work already?
Not quite. PyPy now accepts async and await though, and checks pretty much all places where it is allowed and where it is not. In other words, the parser is complete and has been tested.
The code generator is complete as well, so the right opcodes get executed in all cases.
The new bytecode instructions I need to handle are: GET_YIELD_FROM_ITER, GET_AWAITABLE, GET_AITER, GET_ANEXT, BEFORE_ASYNC_WITH and SETUP_ASYNC_WITH.
These opcodes do not work with regular generators, but with coroutine objects. Those are based on generators, however they do not implement __iter__ and __next__ and can therefore not be iterated over. Also, plain generators cannot yield from coroutines; only generator-based coroutines (@asyncio.coroutine in asyncio) can. [1]
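The difference between coroutine objects and generators can be checked directly. The following is a small sketch with made-up names (native, plain_generator); it only shows the behaviour PEP 492 describes, not anything specific to PyPy.

```python
async def native():
    return 1


def plain_generator(coro):
    # A regular generator delegating to a native coroutine is rejected.
    yield from coro


coro = native()
has_iter = hasattr(coro, "__iter__")    # coroutine objects are not iterable
has_await = hasattr(coro, "__await__")  # but they are awaitable

try:
    next(plain_generator(coro))
    rejected = False
except TypeError:
    rejected = True
finally:
    coro.close()  # avoids a "never awaited" warning

print(has_iter, has_await, rejected)  # False True True
```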
I started implementing the opcodes, but I can only finish them once asyncio is working: I need to test them constantly and can only do that with asyncio, because I am unsure which values normally lie on the stack. The same goes for some functions of coroutine objects. Coroutine objects are working, but they are missing a few functions needed for the async-await syntax feature.
These two things are all that is left for me to do though; everything else is tested and should therefore work.

What else has been done?
Only implementing async and await would have been too easy I guess. With it comes a problem I already mentioned, and that is the missing dependencies from Python 3.3 up to 3.5.
The module sre (which offers support for regular expressions) was missing a macro named MAXGROUPS (from Python 3.3), and the magic number standing for the number of constants had to be updated as well. The memoryview objects also got an update in Python 3.3 that is needed for an import. They have a method called “cast” now, which converts a memoryview object to another predefined format.
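As a short illustration of what “cast” does at the Python level, here is a sketch using the standard memoryview API. The byte values are arbitrary sample data I chose for the example.

```python
data = memoryview(bytes(range(6)))

# Reinterpret the six unsigned bytes as a 2x3 array of unsigned bytes.
grid = data.cast("B", shape=[2, 3])
print(grid.tolist())  # [[0, 1, 2], [3, 4, 5]]

# Reinterpret a single unsigned byte as a signed char.
signed = memoryview(b"\xff").cast("b")
print(signed.tolist())  # [-1]
```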
I just finished implementing this as well. Now I am at the point where it says inside threading.py:
_set_sentinel = _thread._set_sentinel
AttributeError: 'module' object has no attribute '_set_sentinel'

What to do next?
My next goal is that asyncio works and the new opcodes are implemented. Hopefully I can write about success in my next blog post, because I am sure I will need some time to test everything afterwards.

A developer tip for execution of asyncio in pyinteractive (--withmod)
(I only write that as a hint because it gets easily skipped in the PyPy doc, or at least it happened to me. The PyPy team already thought about a solution for that though :) )
Asyncio needs some modules in order to work which are by default not loaded in pyinteractive. If someone stumbles across the problem where PyPy cannot find these modules, --withmod does the trick [2]. For now, --withmod-thread and --withmod-select are required.

[1] https://www.python.org/dev/peps/pep-0492/
[2] http://doc.pypy.org/en/latest/getting-started-dev.html#pyinteractive-py-options


Update (23.07.): asyncio can be imported and works! Well that went better than expected :)
For now only the @asyncio.coroutine way of creating coroutines is working, so for example the following code would work:

import asyncio
@asyncio.coroutine
def my_coroutine(seconds_to_sleep=3):
    print('my_coroutine sleeping for: {0} seconds'.format(seconds_to_sleep))
    yield from asyncio.sleep(seconds_to_sleep)
loop = asyncio.get_event_loop()
loop.run_until_complete(
    asyncio.gather(my_coroutine())
)
loop.close()

(from http://www.giantflyingsaucer.com/blog/?p=5557)

And to illustrate my goal of this project, here is an example of what I want to work properly:

import asyncio

async def coro(name, lock):
    print('coro {}: waiting for lock'.format(name))
    async with lock:
        print('coro {}: holding the lock'.format(name))
        await asyncio.sleep(1)
        print('coro {}: releasing the lock'.format(name))

loop = asyncio.get_event_loop()
lock = asyncio.Lock()
coros = asyncio.gather(coro(1, lock), coro(2, lock))
try:
    loop.run_until_complete(coros)
finally:
    loop.close()

(from https://docs.python.org/3/whatsnew/3.5.html#whatsnew-pep-492)

The async keyword replaces the @asyncio.coroutine decorator, and await is written instead of yield from. "async with" and "async for" are additional features: they allow execution to be suspended in the "enter" and "exit" methods (= asynchronous context manager) and iteration over asynchronous iterators, respectively.
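To make the asynchronous context manager and the asynchronous iterator a bit more concrete, here is a small sketch. Countdown and Resource are hypothetical classes I invented for illustration; they only implement the special methods PEP 492 defines (__aiter__, __anext__, __aenter__, __aexit__).

```python
import asyncio


class Countdown:
    """Hypothetical asynchronous iterator counting down to 1."""

    def __init__(self, start):
        self.current = start

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.current <= 0:
            raise StopAsyncIteration
        await asyncio.sleep(0)  # suspension point inside the iteration
        self.current -= 1
        return self.current + 1


class Resource:
    """Hypothetical asynchronous context manager recording its events."""

    def __init__(self, log):
        self.log = log

    async def __aenter__(self):
        await asyncio.sleep(0)  # execution may be suspended while entering
        self.log.append("enter")
        return self

    async def __aexit__(self, exc_type, exc, tb):
        await asyncio.sleep(0)  # and while exiting
        self.log.append("exit")


async def main():
    log = []
    async with Resource(log):
        async for number in Countdown(3):
            log.append(number)
    return log


loop = asyncio.new_event_loop()
try:
    result = loop.run_until_complete(main())
finally:
    loop.close()
print(result)  # ['enter', 3, 2, 1, 'exit']
```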

Sunday, July 3, 2016

Unpacking done! Starting with Coroutines


It took a bit longer than anticipated, but the additional unpacking generalizations are finally completed.

Another good thing: I am now finally ready to work full time and make up for the time I lost, as I don't have to invest time into studying anymore.

The unpackings are done slightly differently than in cpython, because in PyPy I get objects with a different structure for the maps. So I had to filter them for the keys and values and do a manual type check to see if each one really is a dict. For the map unpack with call I had to implement the intersection check. For that I just check if a key is already stored in the dict that gets returned.
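The intersection check can be sketched in plain Python as follows. This is not the PyPy code: merge_keyword_mappings and the error messages are made up for illustration, and the dict type check simply follows the description above.

```python
def merge_keyword_mappings(*mappings):
    """Merge several ** unpackings for one call, rejecting duplicate keys."""
    merged = {}
    for mapping in mappings:
        # Manual type check, as described above (an assumption of this sketch).
        if not isinstance(mapping, dict):
            raise TypeError("argument after ** must be a dict")
        for key, value in mapping.items():
            # Intersection check: the key must not be in the result yet.
            if key in merged:
                raise TypeError(
                    "got multiple values for keyword argument %r" % key
                )
            merged[key] = value
    return merged


print(merge_keyword_mappings({"a": 1}, {"b": 2}))  # {'a': 1, 'b': 2}
```

Passing {"a": 1} twice raises the TypeError, which matches what a call like f(**{"a": 1}, **{"a": 2}) does.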

Now it's time to implement coroutines with async and await syntax. There is still a problem with the translation of PyPy, which is connected to the missing coroutines syntax feature. I will need to get this working as well, starting with implementing async in the parser.

Thursday, June 23, 2016

Progress summary of additional unpacking generalizations

Currently there's only so much to tell about my progress. I fixed a lot of errors and progressed quite a bit on the unpacking task. The problems regarding the AST generator and handler are solved. There's still an error when trying to run PyPy though. Debugging and looking for errors is quite a tricky undertaking because there are many ways to check what's wrong. What I already did though is check and review the whole code for this task, and it is as good as ready as soon as that (hopefully) last error is fixed. I will probably find it by comparing the bytecode instructions of cpython and pypy, but I still need a bit more info.

As a short description of what I implemented: until now the order of parameters allowed in function calls was predefined. The order was: positional args, keyword args, * unpack, ** unpack. The reason for this was simplicity, because people unfamiliar with this concept might get confused otherwise. Now almost everything is allowed, dropping that concern about confusion (as described in PEP 448). So what I had to do was check the parameters for unpackings manually, first going through positional and then keyword arguments. Of course some sort of priority has to stay intact, so it is defined that "positional arguments precede keyword arguments and * unpacking; * unpacking precedes ** unpacking" (PEP 448). Pretty much all changes needed for this task are implemented; there's only one more fix and one bytecode instruction (map unpack with call, not that important compared to the others) left to be done.
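A call that mixes several unpackings in the newly allowed way could look like this. It is a minimal sketch; show and the sample values are my own and only serve to make the result visible.

```python
def show(*args, **kwargs):
    # Return the received arguments so they can be inspected.
    return args, kwargs


first = [1, 2]
second = (4, 5)
options = {"x": 10}
more = {"y": 20}

# Several * unpackings mixed with positional arguments, followed by a
# keyword argument and several ** unpackings.
args, kwargs = show(*first, 3, *second, z=0, **options, **more)
print(args)    # (1, 2, 3, 4, 5)
print(kwargs)  # {'z': 0, 'x': 10, 'y': 20}
```

A * unpacking after a ** unpacking, as in show(**options, *first), is still a syntax error, which is the priority rule quoted above.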

As soon as it works, I will write the next entry in this blog. Next in line are asyncio coroutines with async and await syntax.

Short Update (25.06.): Because of the changes I had to make in PyPy, pretty much all tests failed for some time, as function calls weren't handled properly. I managed to reduce the number of failing tests by about 1/3 by fixing a lot of errors, so there's just a bit missing for the whole thing to work again.

Update 2 (26.06.): And the errors are all fixed! As soon as all opcodes are implemented (that's gonna be soon), I will write the promised next blog entry.

Thursday, June 9, 2016

The first two weeks working on PyPy


I could easily summarize my first two official coding weeks by saying: some things worked out while others didn’t! But let’s get more into detail.


The first visible progress!
I am happy to say that matrix multiplication works! That was pretty much the warm-up task, as it is really just a new operator which no built-in Python type implements yet. It is tied to a magic method, __matmul__(), which lets a class implement it. For those who don't know, magic methods get invoked in the background while executing specific statements. In my case, using the @ operator invokes __matmul__() (or __rmatmul__() for the reflected operation), and @= invokes __imatmul__(). If “0 @ x” is evaluated, for example, PyPy will now try to invoke “0.__matmul__(x)”. If that doesn't exist for int, it automatically tries to invoke “x.__rmatmul__(0)”.
As an example the following code already works:
>>> class A:
...     def __init__(self, val):
...         self.value = val
...     def __matmul__(self, other):
...         return self.value + other
...
>>> x = A(5)
>>> x @ 2
7
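The reflected and in-place variants can be tried in the same way. Scale is a hypothetical toy class of mine; its @ just multiplies plain numbers, which is enough to see which magic method gets invoked.

```python
class Scale:
    """Hypothetical toy class; @ just multiplies plain numbers here."""

    def __init__(self, factor):
        self.factor = factor

    def __matmul__(self, other):
        # Invoked for: Scale(...) @ other
        return self.factor * other

    def __rmatmul__(self, other):
        # Invoked for: other @ Scale(...), if other has no usable __matmul__
        return other * self.factor

    def __imatmul__(self, other):
        # Invoked for: scale @= other, updates the object in place
        self.factor *= other
        return self


s = Scale(3)
left = s @ 2   # Scale.__matmul__
right = 2 @ s  # int has no __matmul__, so Scale.__rmatmul__ is used
s @= 5         # Scale.__imatmul__
print(left, right, s.factor)  # 6 6 15
```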


The next thing that would be really cool is a working numpy implementation for PyPy to make good use of this operator. But that is not part of my proposal, so it's probably better to focus on that later.


The extended unpacking feature is what I am currently working on. Several things have to be changed in order for this to work.
Parameters of a function are divided into formal (or positional) arguments, as well as non-keyworded (*args) and keyworded (**kwargs) variable-length arguments. I have to make sure that the last two can be used any number of times in function calls. Until now, they were only allowed once per call.
PyPy processes arguments as args, keywords, starargs (*) and kwargs (**) individually. The solution is to compute all arguments only as args and keywords, checking all types inside of them manually instead of relying on a fixed order. I'm currently in the middle of implementing that part while fixing a lot of dependencies I overlooked. My changes to one important method broke a lot of test cases, but that should work again soon. The task requires the bytecode instruction BUILD_SET_UNPACK to be implemented. That should already work, but it still needs a few tests.
I also need to get the AST generator working with the new syntax feature. I updated the grammar, but I still get errors when trying to create the AST. I already have some clues about that though.


Some thoughts to my progress so far
The cool thing is, whenever I get stuck on errors or on the giant structure of PyPy, the community helps out. That is very considerate as well as convenient, meaning the only thing I have to fight at the moment is time. Since I have to prepare for the final assignments and tests at university, I lose some time I really wanted to spend on developing for my GSoC proposal. That means that I will probably fall about a week behind what I had planned to accomplish by now. But I am still really calm regarding my progress, because I also planned for something like that to happen. That is why I will spend way more time on my proposal right after that stressful phase, to compensate for the time I lost.
With all that being said, I hope that the extended unpacking feature will be ready and usable in the coming one or two weeks.

Wednesday, May 18, 2016

GSoC 2016, let the project begin!


First of all I am really excited to be a part of this year's Google Summer of Code! From the first moment I heard of this event, I gave it my best to get accepted. I am happy it all worked out :)

About me

I am a 21-year-old student at the Technical University of Vienna (TU Wien) and am currently working on my BSc degree in software and information engineering. I learned about GSoC through a presentation of PyPy, explaining the project of a former participant. Since I currently attend compiler construction lectures, I thought this project would greatly increase my knowledge of developing compilers and interpreters. I was also looking for an interesting project for my bachelor thesis, and those are the things that pretty much led me here.

My Proposal

The project I am (and will be) working on is PyPy (which is an alternative implementation of the Python language [2]). You can check out a short description of my work here.

Here comes the long but interesting part!

So basically I work on implementing Python 3.5 features in PyPy. I already started and nearly completed matrix multiplication with the @ operator. It would have been cool to implement the matmul method just like numpy does (a Python package for N-dimensional arrays and matrices adding matrix multiplication support), but sadly the core of numpy is not yet functional in PyPy.
The main advantage of @ is that you can now write:

S = (H @ beta - r).T @ inv(H @ V @ H.T) @ (H @ beta - r)

instead of:
S = dot((dot(H, beta) - r).T, dot(inv(dot(dot(H, V), H.T)), dot(H, beta) - r))

making code much more readable. [1]

I will continue with the additional unpacking generalizations. The cool thing about this extension is that it allows multiple unpackings in a single function call.
Calls like

function(*args,3,4,*args,5)

are possible now. That feature can also be used in variable assignments.
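Outside of function calls, the same generalization applies to list, set, dict and tuple displays, which is what makes it usable in assignments. A short sketch with sample values of my own choosing:

```python
head = [1, 2]
tail = (3, 4)
defaults = {"color": "red", "size": 1}
overrides = {"size": 2}

combined_list = [*head, 0, *tail]          # several * unpackings in a list
combined_set = {*head, *tail, 2}           # duplicates collapse in a set
combined_dict = {**defaults, **overrides}  # later entries win in a display
first, *rest = *head, *tail                # tuple display on the right side

print(combined_list)  # [1, 2, 0, 3, 4]
print(combined_dict)  # {'color': 'red', 'size': 2}
print(first, rest)    # 1 [2, 3, 4]
```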

The third part of my proposal is also the main part, and that is asyncio and coroutines with async and await syntax. To keep this short and understandable: coroutines are functions that can hand control back to their caller (or 'return', to be precise) and still remember the state they are in at that moment. When the coroutine is called again at a later moment, it continues where it stopped, including the values of the local variables and the next instruction it has to execute.
Asyncio is a Python module that implements those coroutines. Because it is not yet working in PyPy, I will implement this module to make coroutines compatible with PyPy.
Python 3.5 also allows those coroutines to be controlled with “async” and “await” syntax. That is also a part of my proposal.
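The "remembering the state" part can be seen with an ordinary generator, which is the mechanism coroutines are built on. This is just a minimal sketch with a made-up running_total function; it does not use asyncio.

```python
def running_total():
    total = 0  # local state that survives between resumptions
    while True:
        # Execution pauses here and hands the current total to the caller.
        value = yield total
        total += value


coro = running_total()
next(coro)          # run up to the first yield
a = coro.send(5)    # resumes after the yield with value = 5
b = coro.send(10)   # total is still remembered from the last step
print(a, b)         # 5 15
```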

I will explain further details as soon as it becomes necessary in order to understand my progress.

Bonding Period

The Bonding Period has been a great experience so far. I have to admit, it was a bit quiet at first, because I had to finish lots of homework for my studies. But I already got to learn a bit about the community and my mentors before the acceptance date of the projects. So I got the green light to focus on the development of my tasks already, which is great! That is really important for me, because it is not easy to understand the complete structure of PyPy. Luckily there is documentation available (here http://pypy.readthedocs.io/en/latest/) and my mentors help me quite a bit.
My timeline has changed a little, but that comes with a huge advantage: I will finish my main part earlier than anticipated, allowing me to focus more on the implementation of further features.
Until the official start of the coding period I will polish the code of my first task, the matrix multiplication, and read all about the parts of PyPy that I am still a bit uneasy with.

My next blog post will already tell you about the work of my first official coding weeks, expect good progress!