News

Posted almost 14 years ago
In Parrot, opcodes are currently implemented as standalone pieces of code that are invoked whenever their corresponding opcode number is encountered in the bytecode. Each opcode is described internally by an op_info_t struct, which, taken from <parrot/op.h>, is the following:

    typedef struct op_info_t {
        const char     *name;
        const char     *full_name;
        const char     *func_name;
        unsigned short  jump;
        short           op_count;
        arg_type_t      types[PARROT_MAX_ARGS];
        arg_dir_t       dirs[PARROT_MAX_ARGS];
        char            labels[PARROT_MAX_ARGS];
    } op_info_t;

These descriptions are stored in an array, with the index of each entry being the opcode's own opcode number. This array is pointed to by the "op_info_table" field within the interpreter struct. That covers just the descriptions of the opcodes.

Each opcode itself is currently defined in a C-ish language that gets parsed into C source code, with each opcode becoming a function that satisfies the following prototype:

    typedef opcode_t *(*op_func_t)(opcode_t *, PARROT_INTERP);

Similar to the opcode descriptions, these opcode function pointers are stored in an array, with the opcode's number being the index of the opcode's function pointer. This table is pointed to by the "op_func_table" field within the interpreter struct.

So what happens during execution? A pointer, the program counter (PC), is passed to the runcore's runops function. runops looks up the function pointer for the op currently pointed to by the PC and calls that function. The code for an opcode will do a number of things. First, it grabs the current context of the interpreter. The context consists of a number of items, but chief among them are the current values of the registers (Parrot being a register-based VM).
With this context, it then grabs the required values from the source registers and, after performing the required logic, writes values to a destination register. After that, it advances the PC by a certain amount, which generally is PC + 1 + NO_OF_PARAMS.

So how would you know which registers the parameters are in? The PC actually points somewhere within the bytecode. Generally, the PC points to the start of an opcode. If an opcode has no parameters, advancing the PC by 1 gets us to the next opcode to execute. If the opcode has parameters, advancing the PC by 1 yields the register or INTVAL constant for the first parameter; similarly, advancing by 2 yields the register or INTVAL constant for the second parameter.

This is all well and good except that Parrot allows the loading of dynamic op libs, which extend the capabilities of the VM by providing additional ops, similar to how MMX/SSE/SSE2/etc. extend the capabilities of an x86 processor. Parrot also shares the two tables "op_info_table" and "op_func_table" between interpreters, which does make sense, as having duplicates of these tables can be expensive memory-wise, given that Parrot already has more than 1000 core opcodes.

Handling these two issues won't be trivial. With regards to dynamic op libs, there needs to be a way to detect when the library being loaded is a dynop library. One way is to trap loadlib opcodes and then peek to check whether the op tables change. It would also be good to investigate whether we can duplicate any shared tables so that the interpreter running the instruments has its own private tables, untouched by any changes to the tables of other interpreters.
Posted almost 14 years ago
Last Wednesday, I discussed a little bit of the rationale behind my GSoC project and summarized the most low-level portion of my project: PAST::Walker. Today, I want to describe another portion of my project: PAST::Pattern. PAST::Walker provides a very powerful and complete interface. Any possible transformation or other traversal of a PAST should be implementable using it. However, it will not be very convenient if you only want to turn return nodes containing only a call node into tail-call nodes.
Posted almost 14 years ago by [email protected] (Whiteknight)
Parrot now uses immutable strings internally for its string operations. In a lot of ways this was a real improvement in terms of better code and better performance for many benchmarks. However, many HLL compilers, specifically NQP and Rakudo, suffered significant performance decreases with immutable strings. Why would this be?

It turns out that immutable strings are great for many operations. Copy operations are cheap, maybe creating a new STRING header to point to an existing buffer. There's no sense in actually copying the buffer because nobody can change it. Substrings are likewise very cheap, consisting of only a new STRING header pointing into the middle of an existing immutable buffer.

Some operations, however, are more expensive. A great example is string appends. When we append two strings, we need to allocate a new buffer, copy both previous buffers into the new buffer (possibly translating charset and encoding to match), and create a new header to point to the new buffer. With the older system of COW strings, appends were far less expensive, and many pieces of code--especially NQP and PCT code generators--used them in large numbers. After the switch to immutable strings, any code that was optimized to use lots of cheap appends began to take huge amounts of time and waste lots and lots of memory.

The solution to these problems is not to use many bare append operations on native strings, but instead to create a new StringBuilder PMC type. StringBuilder stores multiple chunks of strings together in an array or tree structure, and only coalesces them into a single string buffer when requested. This allows StringBuilder to calculate the size of the allocated string buffer only once, perform only a single set of copies, not create lots of unnecessary temporary STRING headers, etc.

Several contributors have worked on a branch, "codestring", for this purpose, and some results I saw this morning are particularly telling about the performance improvements it brings. Here are numbers from the benchmark of building the NQP compiler:

    JimmyZ: trunk with old nqp-rx       real: 8m5.546s   user: 7m37.561s  sys: 0m10.281s
    JimmyZ: trunk with new nqp-rx       real: 7m48.292s  user: 7m11.795s  sys: 0m10.585s
    JimmyZ: codestring with new nqp-rx  real: 6m58.873s  user: 6m22.732s  sys: 0m6.356s

The "new nqp-rx" he's talking about there is nqp-rx modified to make better use of the new CodeString and StringBuilder PMCs in the branch. The numbers are quite telling: build time is down about 12%. I think the finishing touches are being put on, and the branch will be merged back to trunk soon.
Posted almost 14 years ago
The post from last week talked about what NFG was and tried to explain how it was a good feature for parrot to have. Today I'll be slightly more concrete and talk a bit about how NFG fits inside the parrot string structure. There are other parts of parrot that will need hacking on, but this time I want to limit myself to the two bottommost pointers in the STRING structure definition and the concepts behind them.
Posted almost 14 years ago
The Perl 6 design team met by phone on 05 May 2010. Larry, Allison, Patrick, Will, and chromatic attended.

Larry:
- various spec updates, some major
- removed p5=> description because it's not supported in core
- deleted self:sort construct because self isn't a real syntactic category
- explained Perl patterns in terms of PEGs, and spec'ed tiebreaking rules explicitly
- last but not least, finally purveyed the long-threatened revamp of proto to keep routine and method semantics similar
- they all now work much more like the multiple dispatch semantics currently used by STD, where we always call the proto first
- the proto is then always in charge of the actual multiple dispatch; it can of course delegate that
- and the default for a null body corresponds closely to current semantics
- in hacking news, the lexer generator mislaid any alternative that was a bare . pattern, so cursor_fate never called its alternative, oops
- took me a long time to run that one down, because it resulted in a horrendous backtrack causing mysterious misplaced errors
- revamped character class parsing to be more helpful and correct
- STD now checks a normal regex bracket's innards for old-school character classes, and warns if found
- added a .looks_like_cclass method to Cursor to detect most accidental uses of P5 ranges
- some valid P6 brackets will complain, but the workarounds are easy; just putting whitespace on both ends is one way
- removed a few of these old-school-ish character classes from STD
- changed :tr language to :cc language since character classes share it (translation pays more attention to ordering, but the language is the same)
- turned out parsing character classes discovered issues in STD; various character classes needed to backslash # that would otherwise be a comment
- to that end, we now allow \# in character classes instead of misparsing it as unspace
- if we find an invalid - in a regex, we now presume we're in an old-school character class and fail with a sorry instead of a panic to give the character class code a shot at it
- STD now uses ~ syntax for regex brackets to set $*GOAL correctly
- cleaned up recursive panic detection; it was possible to get both false positives and negatives before
- STD shouldn't use 'note' to emit a panic inside a suppose because that leaks the message that should be trapped
- STD now suppresses duplicate sorry messages more correctly
- sorry no longer uses panic in a supposition, but dies directly to throw the exception to the suppose's try block
- STD now allows subscripts on regex variables so $x[0] isn't taken as a character class; still needs speccing

Patrick: can we make them consistent?
Larry: historically S05 has allowed bare arrays to mean interpolation
Patrick: we've never had a working implementation of that
Larry: a bare @ would be illegal
Patrick: it's currently illegal
Larry:
- you'd have to backslash it to match part of an email address
- it's not like the @ alternations are a big deal one way or another
- that'd be a little more consistent
- I forced it to think of the sigil as $ rather than what it really is

Patrick: after seeing how Jonathan et al. did interpolation for quoted strings, I thought we should do the same thing in regexes

Larry:
- STD now has a partial fix to prevent leakage of ::T from role signatures
- unfortunately, the current fix will lose signatures of file-scoped generic roles
- this probably has to do with not knowing whether we're really going to want a new pad; unfortunately we'd have to look ahead to know that currently
- various other minor tweaks and bug fixes in STD and Cursor

Patrick:
- mostly responding to messages and reports
- should be able to get back to coding full-time and online for the next week
- plan to resolve the list and closure issues with NQP and Rakudo
- will answer other questions and try to keep other people productive
- planning for the Rakudo Star release in June

Allison:
- busy with the last week of classes
- spent most of it writing a little language with PCT
- it was easy to use and easy to swap the stages of PCT
- I remembered what Patrick did with LOLCODE
- also had a discussion of source code control systems
- next week should be more productive
- need to work more closely with Debian packagers to get packages into Debian

Will:
- cleaning out as many deprecations in Parrot as possible
- trying to improve the speed of CodeString after the immutable STRINGs merge
- bundling lots of little concats helps
- hope to merge in an optimization branch for that by the weekend
- want to make that faster or less memory intensive
- may require the use of a new StringBuilder for Parrot
- hopefully will result in a faster Rakudo build

Patrick: I've never seen CodeString take a long time, unless you run into memory problems

* discussion of the StringBuilder PMC *

c:
- still working on optimizations, particularly CodeString
- looking at more PBC and PBC-building optimizations
- PBC size went down dramatically and startup improved for Rakudo
- should have that much faster for the 2.4 release
- will poke at GC tasks starting next week
Posted almost 14 years ago
Threading systems let multiple code paths run at the same time. Why would anyone want that? Simple: impatience. It's no fun waiting for the computer to finish one thing when you want it to be doing something else. So what are "hybrid" threads and why does Parrot need them? Well, there are two common schools of thought in building threading systems for high-level language runtimes. The Java people call them "green" threads and "native" threads. As with any design tradeoff, the right answer is to cheat and just take all the good properties of both options.
Posted almost 14 years ago
Having an instrumentation framework opens the doors to having many different tools that can help to diagnose problems within a piece of code. One main example of this is Valgrind. Valgrind provides an interface for making many different tools that help to diagnose and identify certain specific problems, ranging from memory leaks to data races between threads. Furthermore, the framework is also used to provide profiling tools, such as Callgrind and Cachegrind, to determine useful information such as call graphs and execution times of functions. As such, given a good framework to work with, a whole new world of performance and error analysis tools is opened up.

In the case of Parrot, which aims to be the virtual machine for many different kinds of languages, having such a framework can assist in creating debugging tools that cater to each language's idiosyncrasies or are general enough to apply to all. With proper hooks in place, such a framework can also provide the ability to inspect the inner workings of the virtual machine itself, introspection that is more in-depth than what is currently offered by the various PMCs Parrot provides.

Thus, over the summer I will be working on creating such a framework for Parrot. In general terms, I will be looking to create an interface for various user-generated tools to hook into a Parrot program. Such tools can be written in PIR, and are run in a separate interpreter from the code to instrument against. There will be three layers that I hope to provide an instrumentation interface for: the opcode layer, the PMC layer and the GC layer. Explanations for these three layers are given below:

Opcode layer: allows tools to profile and inspect the code at the opcode level. Abstractions can be made such that tools can inspect at the subroutine, class or file level.

PMC layer: allows tools to observe the creation, deletion and accesses of the various PMCs.

GC layer: allows tools to observe the behavior of the GC, seeing when it is invoked, what gets freed, etcetera.

I intend to implement the interfaces in the order shown above and, as each layer gets implemented, create tools that serve specific purposes such as call graphing, I/O monitoring and more. Tentatively, the API for the tools should look like the following:

    $P0 = new ['Instrument']
    $P0.'attach'('ops', 'catchall', 'sub_callback')
    $P0.'finalize'('finalize_sub')
    $P0.'run'(args)

In the example shown above, the ':main' subroutine of the tool performs the required initialization, creating an Instrument instance that it then registers callbacks into before executing the code to instrument against.

During this Community Bonding period, I will be prototyping the instrumenting framework as a PMC and will be looking at ways to insert hooks into the various PMCs and the GC subsystem in a safe manner. I will be posting more information on the project based on my prototyping efforts here and at the following blog, http://parrot.mangkok.com, and would welcome any feedback regarding this project on #parrot or through email at [email protected].
Posted almost 14 years ago by [email protected] (Whiteknight)
Long-time Parrot contributor kid51 posted a nice comment on my previous post about Git and SVN. He had some issues with the proposed distributed workflow I suggested. As my reply started to grow far too long for the stupid little blogger comment edit box, I realized I should turn it into a full blog post.

I've been pretty conflicted myself over the idea of a release manager. On one hand, releases are boring, which means anybody can do them and they don't require a huge time commitment. On the other hand, there is hardly a lot of "management" that goes on in a release: the release manager has some authority to declare a feature freeze and has some ability to select which revision becomes the final release; but that's it. Sure, release managers also try to motivate people to perform the regular tasks like updating the NEWS and PLATFORM files, but my experience is that the release manager ends up doing the majority of those tasks herself. I really feel like we need more direction and vision than this most months, and the release manager is a good person (though not the only possible person) to provide it.

Our releases, especially supported releases, have had certain taglines stemming back to PDS in 2008. The 1.0 release was famously labeled "Stable API for developers" and the 2.0 release was labeled "ready for production use", when even a cursory review shows that these two releases hardly met their respective goals. A release manager with more authority to shape that release, and more authority to shape previous releases as well, might have done more to change that. A development focus, be it for weekly development between #ps meetings or for a monthly release, only matters if somebody is focusing on it and motivating the rest of the community to focus as well. That person either needs to be the architect or some other single authority (though playing cheerleader and chief motivator constantly would be quite a drain on that person) or the release manager. The benefit of using the release manager to motivate the team and shape the release is--even though it's more of a time commitment for the release manager--that we share the burden and no one person gets burnt out month after month.

A tiered development system has a number of benefits. Bleeding-edge development can occur unfettered (as it happens now in branches). From there we can pull features into integration branches where we assure all the assorted changes and new additions work well together. Development releases can be cherry-picked to represent the stable baseline features that we want people to play with and rely on, and long-term supported releases would represent only those features which are tested, documented, and worthy of being included under our deprecation policy. I don't think end-users of our supported releases should ever be exposed to features marked "experimental", for instance, or any feature at all that we don't want covered by our long-term deprecation policy. If any feature included in a supported release must be deprecated and cannot be removed or fundamentally altered for at least three more months, we should be particularly careful about including new things in a supported release, and there should be some level of gatekeeper who is able to say "this thing isn't ready for prime time yet". That person, I think, should be the release manager.

Compare our system for dealing with new experimental features (a feature goes into trunk, maybe with mention in #ps but with very little fanfare, and is then automatically included in the next release unless somebody removes it) to a system where features are added to a development branch, vetted, tested, documented, and then pulled into a release candidate branch only when they're known by the community to pass muster. I think that's a much better system and would lead us to releases with higher overall quality and stability.

All this sort of ignores your point, which is an extremely valid one, that switching to Git represents more than just a small change in the commands that a developer types into the console. Yes, it's a small difference to say "git commit -a" instead of saying "svn commit". Yes, it's a small personal change for me to commit locally and then push. The far bigger issues are the community workflow and even the community culture changes that will occur because of Git. These things don't need to change; we could lock Git down and use it exactly the way we use SVN now, but I don't think any of the Git proponents in the debate want it that way.

I would much rather plan for the changes in workflow and even build enthusiasm for them than sell the small changes in tooling and then be swept up in larger culture changes that nobody is prepared for. I think we should want these changes and embrace them, in which case Git is just the necessary vehicle, and not the end in itself.
Posted almost 14 years ago
One of the advantages of a common virtual machine for various languages is the ability to apply the same optimizations to all of those languages. For example, LLVM includes optimization passes to propagate constants; eliminate dead arguments, code, and globals; inline functions; and eliminate recursive tail calls, among others. Any language with a compiler to LLVM can easily take advantage of these optimizations without any additional work by the compiler writer.