During my meeting with cotto on Tuesday (my time), he suggested looking into replacing entries within the op_func_table on the instrumented interpreter (CHILD). So the past few days, I’ve been working on it. The initial plan was to create a copy of that table, replacing ops that have hooks attached with a stub function that will fire the hooks before executing the op itself. The operation itself went swimmingly, until the monster known as “dynops” popped up to say “OHAI!”.
Before I go into the problems I face with dynops my initial approach, I have to say that cotto was right in that this approach will have better performance characteristics that what I was doing earlier, which was to store the hooks in a ResizablePMCArray which holds the hooks in a Hash. That was a bit of a cop-out on my part, as that was the path of least resistance in terms of adding new and removing hooks. Doing it in that manner simplifies the code for adding and removing hooks, at a cost of going through more levels of VTABLEs. With that, I’ve changed the data structure to hold the probes. It is now an array of linked lists, with a 1-1 mapping between each linked list and op. With this major change, running the example code [0] against examples/pir/mandel.pir yields approximately 5% performance increase (user 16.077s vs user 17.001s previously).
Back on topic, so the plan is to create a copy of the op_func_table, instrument it and change CHILD’s to refer to it instead of the core op_func_table. This was rather easy. Then, dynops dropped by.
With dynops in the equation, it did not turn out easy. Problem is, if the instrumented table is not switched back to the core table, the core table will end up pointing to the instrumented table instead at the end of dynops_register (see src/runcore/main.c). Also, how is the supervising interpreter (PARENT) supposed to know when new dynops are loaded in the CHILD? The PARENT has to know when the core op_func_table is changed, so that it can update its own references accordingly. Currently, dynops_register gets around this problem by prohibiting loading dynops when there is more than 1 interpreter in play.
So, hold on. “more than 1 interpreter in play”. But there are two interpreters, PARENT and CHILD! Apparently, creating interpreters through Parrot_new will only register the first interpreter in the list of interpreters held by core. Thus,
Parrot_Interp PARENT = Parrot_new(NULL);
Parrot_Interp CHILD = Parrot_new(PARENT);
PARENT will be registered, as traced through Parrot_new -> Parrot_cx_init_scheduler -> pt_add_to_interpreters (I might have missed some steps in between).
However, CHILD is not, as in Parrot_cx_init_scheduler (see src/scheduler.c), since CHILD has a parent interpreter, pt_add_to_interpreters (see src/thread.c) is not called. So, as far as core is concerned, there is only 1 interpreter in play, and that interpreter is PARENT. Thus that is why the check “n_interpreters > 1″ in dynop_register fails, and how I inadvertently was able to run 2 interpreters and have dynops loading, although it segfaults later on when PARENT tries to access its outdated tables. If CHILD is manually registered by calling pt_add_to_intepreters, then that is a no go, as then both PARENT and CHILD cannot load any dynop libraries. Given that CHILD has to be able to load dynop libraries, I’m at an impasse.
To recap, this is the problem at this juncture, if CHILD loads a dynop library, the instrumented table must be swapped out and replaced with the core table. After loading is complete, the instrumented table is then updated and swapped back in. After all that, PARENT must be notified of this loading so that it itself can update its own table references.
In my previous post, I mentioned making use of the events system to detect when CHILD is going to execute a “loadlib” op and to raise an event that PARENT can handle and do the necessary stuff. But that won’t work for “.loadlib” directives (see [1]), as that is not an op. So scratch that idea for now (although, I think it is a good idea for normal “loadlib” ops). Digging into the events system, I noticed that it already does some of what I was doing, in that, to check for events, a copy of the core op_func_table is made (see include/parrot/interpreter.h , interp->evc_func_table), and all entries in that table points to the “check_events__” op. This table is also helpfully taken care of in dynop_register, in that it helpfully extends it whenever new dynops are added.
Then I realised something, I’m going to create a copy of the op_func_table, and replace each entry with a stub function. This table will then be used for op lookup and execution by the runcore. And I control CHILD’s runcore. Why don’t I move the call to the stub function in the runcore itself. So that’s where I’m at now, square one + a bit.
To get around the problem brought about by dynops temporarily, I simply add a check in the runcore to update PARENT’s op tables whenever it does not match CHILD’s op tables. I did some thinking on this, maybe make dynop loading a STOP THE WORLD event, just like GC. This I think can be done in dynops_register, broadcasting to all to halt and when all is halted, proceed to load the dynop library before broadcasting again to do the necessary updates before resuming itself. This I will revisit again later, seeing that I’m running out of time for this week’s tasks, which is, to recap, bug hunting and squashing of tracer.nqp, implementing the event notifications library and tests.
So bug hunting. Apparently I did something and now I can obtain and print out the STRING KEY constants. But the segfault with example/pir/io.pir remains when I try to trace it using tracer.nqp. Running it under a debugger shows that it segfaults when trying to access INTERP->vtables[SELF->vtable->base_type]->_namespace (in src/pmc/default.pmc line 549). Before this line, control is in find_method_direct_1 (in src/oo.c line 1051), where the variable _class is being queried for its namespace. Only problem is, the base_type of _class is 101, which is not a valid number given that at that point in time, there is only about 87 pmc types.
Running simple-tracer.pir [0] on it has no problems, of which I can only suspect that a GC run happened and the object got cleaned up, so bad data was being read. This would make sense given tracer.nqp is written in NQP and it does lots of string concatenations, so that should cause the GC to run more frequently. Now, if there’s a tool that I can use to confirm this… Oh wait, I’m supposed to be making those tools.. GAH!
[0] Simple-tracer.pir (Edited 13/06/2010)
.sub '' :load :init :anon
load_bytecode 'Instrument/Instrument.pbc'
load_bytecode 'Instrument/Probe.pbc'
.end
.sub 'main' :main
.param pmc args
$P0 = shift args
.local pmc pr, in, probe_class
probe_class = get_hll_global ['Instrument'], 'Probe'
pr = new probe_class
pr.'make_catchall'()
pr.'set_callback'('cb')
in = new ['Instrument']
in.'attach'(pr)
$S0 = args[0]
in.'run'($S0, args)
.end
.sub 'cb'
.param int pc
.param pmc op
.param pmc instr
print 'Op: '
$I0 = op[0]
say $I0
.end
[1] http://irclog.perlgeek.de/parrot/2010-06-01#i_2391934