
News

Posted over 11 years ago
So my 'vacation' was a visit to the hospital for the birth of my son. Now that this has happened, my schedule is going to be even more fun. I was in the hospital for most of a week and am now adjusting to life back home. I've been slowly turning my disassembler program into a "library" of sorts so I can call it repeatedly from tests. Now to write some tests that convert PIR to Packfiles and Packfiles to PACT.Packfiles...
Posted over 11 years ago
I made a round of encoding fixes in the whiteknight/io_cleanup1 branch a few days ago. Rakudo hacker Moritz was able to take a look at Rakudo’s spectests and verify that more tests were indeed passing because of it. The remaining test failures represent the changing semantics for the read method and what appear to be two genuine regressions or bugs. Hopefully I will be able to get all these things sorted out this week before I go away on a mini vacation next weekend. Otherwise I can’t imagine this branch gets merged before the 4.6 release this month.

A few days ago I wrote a post about readline and some of the intricacies involved in that, and some of the weird semantics that I was attempting to unify. It turns out that some of these semantics are a major cause of one of the last bugs in the branch. Let’s look at some code in master to see where the hangup is. First, readline on a Socket:

    METHOD readline(STRING *delimiter :optional, INTVAL has_delimiter :opt_flag) {
        INTVAL idx;
        STRING *result;
        STRING *buf;

        GET_ATTR_buf(INTERP, SELF, buf);

        if (!has_delimiter)
            delimiter = CONST_STRING(INTERP, "\n");

        if (Parrot_io_socket_is_closed(INTERP, SELF))
            RETURN(STRING * STRINGNULL);

        if (buf == STRINGNULL)
            buf = Parrot_io_reads(INTERP, SELF, CHUNK_SIZE);

        while ((idx = Parrot_str_find_index(INTERP, buf, delimiter, 0)) < 0) {
            STRING * const more = Parrot_io_reads(INTERP, SELF, CHUNK_SIZE);
            if (Parrot_str_length(INTERP, more) == 0) {
                SET_ATTR_buf(INTERP, SELF, STRINGNULL);
                RETURN(STRING *buf);
            }
            buf = Parrot_str_concat(INTERP, buf, more);
        }

        idx += Parrot_str_length(INTERP, delimiter);
        result = Parrot_str_substr(INTERP, buf, 0, idx);
        buf = Parrot_str_substr(INTERP, buf, idx, Parrot_str_length(INTERP, buf) - idx);
        SET_ATTR_buf(INTERP, SELF, buf);
        RETURN(STRING *result);
    }

We can ignore the fact that this implementation of readline doesn’t call Parrot_io_readline like every other PMC does.
Or that if we did call that function the program would throw an exception because Parrot_io_readline doesn’t support sockets anyway. Whatever. Moving on…

For comparison, let’s look at the version from the Handle PMC (which is inherited by FileHandle):

    METHOD readline() {
        STRING * const string_result = Parrot_io_readline(INTERP, SELF);
        RETURN(STRING *string_result);
    }

The Socket version takes a delimiter parameter which is a STRING. When doing readline on a Socket, you can pass in any arbitrary string to use as the end-of-line token. With FileHandle, you don’t seem to have that. Yet you can definitely use custom delimiters with FileHandle, even though we clearly don’t take a delimiter here and we aren’t passing one in as an argument to Parrot_io_readline like we do in the branch. Let’s see how it’s done instead. Here’s a snippet from the Handle PMC:

    ATTR INTVAL record_separator; /* Record separator (only single char supported) */

We don’t need to look at any other code. This is the smoking gun. Socket.readline() can take any arbitrary STRING to use as a record separator, but FileHandle.readline() can only use a single codepoint, which it doesn’t take as an argument. So that’s the problem right there. When I standardized the readline mechanics between types, I picked the FileHandle semantics. This was probably the wrong decision, because not only could Sockets use a more general mechanism but Rakudo relies on that behavior in its spectests.

This does raise a question about why nobody ever expected this same behavior from FileHandle, or why the difference was not considered some kind of bug. It really goes to show how immature our IO system has been for all these years, and how we had all just grown accustomed to the arbitrary, inconsistent, nonsensical behaviors. It just works for some basic usages, so nobody ever complains about it. That time is, thankfully, coming quickly to an end.

Fixing this issue is actually going to take some serious work.
Several function signatures are going to need updating to take a STRING delimiter instead of an INTVAL codepoint, and a major chunk of buffering logic is going to need to be rewritten to work on substrings instead of on individual codepoints. This, in turn, is going to require a heck of a lot more testing. Last night I started putting in some of the changes necessary to use a substring terminator instead of a single codepoint. Most of what I’ve already done has been modifying function signatures. The real changes need to occur deep within the buffering logic and will require a little bit more time.

I’m looking forward to getting this branch fixed up and merged back to master so I can get to work on my next project. I think 6model is going to be the next thing I dig into, before I find something else that annoys me enough to put in a huge amount of effort to rewrite it. I’ll post more updates about my future projects and plans as I go.
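To make the difference between a single-codepoint separator and a substring terminator concrete, here is a minimal sketch in plain C. It is not Parrot code: line_length and resume_offset are hypothetical helpers, and they work on raw bytes rather than on Parrot STRINGs or codepoints. The point it tries to show is that a multi-byte delimiter can straddle two reads, so the search has to back up a little after each new chunk is appended.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of Parrot.
 * Looks for the first occurrence of a multi-byte delimiter in buf.
 * Returns the length of the line including its terminator, or 0 if the
 * delimiter has not arrived yet and more data must be read. */
size_t
line_length(const char *buf, size_t buf_len,
            const char *delim, size_t delim_len)
{
    size_t i;

    assert(delim_len > 0);

    for (i = 0; i + delim_len <= buf_len; i++) {
        if (memcmp(buf + i, delim, delim_len) == 0)
            return i + delim_len;
    }
    return 0;
}

/* Hypothetical helper, not part of Parrot.
 * After appending a chunk to a buffer that held prev_len bytes with no
 * delimiter in it, the next search can skip the part already checked,
 * except for the last delim_len - 1 bytes: a delimiter may start there
 * and finish in the new chunk. With a one-byte delimiter this is simply
 * prev_len, which is why the single-codepoint case is so much easier. */
size_t
resume_offset(size_t prev_len, size_t delim_len)
{
    assert(delim_len > 0);
    return prev_len >= delim_len - 1 ? prev_len - (delim_len - 1) : 0;
}
```

With a "\r\n" terminator, a buffer that ends in "\r" reports no line yet; once the next chunk supplies the "\n", restarting the search one byte before the old end finds it without rescanning everything.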
Posted over 11 years ago
I worked on flags and permissions in the security API this past weekend. Slow and steady progress. In terms of the timeline I am behind, but I am making every effort to get back on track. Monday should be an interesting day to show this past weekend's progress.
Posted over 11 years ago
Or, the internal Parrot C API. It is open, now. At least, parts of it anyway, and hopefully somewhat limited in scope. When I set out to write mod_parrot it was my goal to use the 'new' embedding API - the one with all the Parrot_api_* calls. This is a limited API designed for loading and running the parrot interpreter and some scripts. It isn't perfect or even elegant but it works. Moreover, People have Promised it to be Stable. However, because it was designed to be used outside of the parrot runloop, these functions are not re-entrant in a rather subtle manner.
Posted over 11 years ago
So far I have been able to add the function to compute the eigenvalues. I also fixed the segmentation fault in the inverse function; to do this I had to edit the LU decomposition function to get the results of a couple of arrays. Next I will start working on the implementation of the function for eigenvectors, and also try to develop tests for the inverse and eigenvalues functions.
Posted over 11 years ago
HTML is a derivative of SGML, just like XML is. Sure, they look pretty much the same for the most part, but there are a few key differences that prevent HTML from being parsed exactly like XML. Part of the reason why I like XHTML so much is that it’s more usable with more parsers, including many of the simpler and full-featured XML parsers. Simplicity in parsing was one of the original motivations of the XML design, at least in comparison to a full SGML parser or even something like a full HTML parser. But that’s all beside the point.

I’ve been on something of a backyard gardening kick lately. We bought our house only a few short months ago, and are only halfway through the first summer growing season in my modest little garden. My plans for next year are much more expansive. I’ve finally talked my wife into letting me buy some cherry trees to plant. She was also pretty willing to get a few grape vines planted (especially when I sketched out the beautiful wooden arbor they would be growing on). She put her foot down when I started talking about blueberries, apples and pears, however. And another garden bed or two for more vegetables. For some reason she’s convinced that we need some measure of open space in our little plot so the kid has somewhere to run and play. Some people have weird priorities. This is all sort of beside the point too.

Getting the things I need for all this gardening work I’ve talked myself into is not cheap. Cherry tree seeds actually do grow on trees so that’s not a big deal, but other things like fertilizers, soil amendments, tools, materials for building a grape trellis and raised garden beds, not to mention a longer hose to reach all the new things that are going to require regular watering, all cost money. And maybe a sprinkler, like one of those fancy ones on an electronic timer. I can avoid some of that cost by getting things used and at a discount on sites like Craigslist. So I’ve been going there. Every day. And it’s tedious.
I have to sort through hundreds of listings for things I don’t want, in categories that seem far too coarse. Sometimes, because things often get incorrectly categorized, I have to look in other related categories too, sorting through things that are even less relevant on average to try and find the occasional gem. This is all on top of the hardware-related problem of being unable to use the trackpad on my laptop, so web navigation on sites without keyboard shortcuts is an extreme pain. I start to think to myself: I can do better, I’m a programmer! For some values of “better” and “programmer”.

Enter Rosella. Now with Parrot, Winxed and Rosella I can use the Net library to fetch the HTML source of the page. After some hacking in the last few days, I can parse that code with my Xml library (set in a new lenient mode) and start to work with it in a meaningful way:

    function main[main]()
    {
        var rosella = load_packfile("rosella/core.pbc");
        Rosella.initialize_rosella("xml", "net", "string");

        var ua = new Rosella.Net.UserAgent.SimpleHttp();
        var response = ua.get("http://philadelphia.craigslist.org/w4m/");
        var doc = Rosella.Xml.read_string(response.content, false);

        doc.get_document_root()
            .get_children_named("body")
            .get_children_named("blockquote")
            .get_children_named("p", "row":[named("class")])
            .map(function(node) {
                return {
                    "title": node.first_child("a").get_inner_xml(),
                    "link": node.first_child("a").attributes["href"],
                    "price": node.first_child("span", "itempp":[named("class")]).get_inner_xml(),
                    "has_pic": !Rosella.String.null_or_empty(
                        node.first_child("span", "itempx":[named("class")]).get_inner_xml()
                    )
                };
            })
            .filter(function(obj) { return indexof(obj["title"], "compost") >= 0; })
            .map(function(obj) {
                return Rosella.String.format_obj("<a href='{link}'>{title} for {price}</a>", obj);
            })
            .foreach(function(string s) { say(s); });
    }

That second argument to Rosella.Xml.read_string tells the parser to go into “non-strict” mode, which is basically my attempt to
fudge the XML parsing rules to allow for the SGML nonsense in HTML. Without that, the parser will blow up pretty early in the parse because of unbalanced tags. The XML parser by default does not handle tags which are not balanced and which do not have the trailing slash to indicate a standalone tag, and the Craigslist source is filled with those kinds of things.

All I need to do is set this scraper up on a timer, and have it send me results somehow. If I set up a small server with mod_parrot and some kind of tool for generating RSS feeds, I could have this output neatly delivered to me on a regular basis. Considering that mod_parrot is moving along so smoothly and RSS is just another XML format, I think this is a pretty reasonable idea.

So, I started working on that. As of last night, I’ve sketched out two small libraries, one for RSS feeds and one for the competing standard, Atom. These libraries are thin wrappers around the XML library to deal with the specifics of RSS and Atom. Here’s an example of consuming an RSS feed:

    var rss = Rosella.Rss.read_url("http://www.parrot.org/rss.xml");
    rss
        .channels()
        .first()
        .items()
        .foreach(function(i) {
            say(Rosella.String.format_obj("{title} (by {creator}) : {description}", i));
        });

You can do almost exactly the same thing with an Atom feed too, if you’ve got one of those instead. Right now RSS and Atom are implemented in two separate libraries, but I may combine them together for simplicity and to avoid unnecessary code duplication. I’m working on an interface to write and publish feeds as well, though that’s not quite ready yet. You can bet that when I’ve got that working, I’ll be setting up a copy of mod_parrot to use it with.

I’ve been sort of kicking around the idea of a specialized HTML parsing library, which would more or less be an SGML parser with some schema information.
I’m not sure I want to get into that hassle because HTML is a pretty messy thing and it will take a huge amount of effort to get something that works most of the time. But, if you’re willing to put up with a little bit of oddity, the Xml library works well enough for many cases.
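For anyone wondering why unbalanced tags are such a problem for a strict XML parser, here is a rough sketch in C. It is not how the Rosella Xml library implements its lenient mode (the source above doesn't show that); is_void_tag and tags_balanced are hypothetical names, and the approach shown is just one common trick: treat the HTML void elements, the ones that never take a closing tag, as if they had been written with a trailing slash. It assumes lowercase tag names, no comments, and no `>` inside attribute values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* HTML void elements: these never take a closing tag. */
static const char *const void_tags[] = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr"
};

/* Hypothetical helper: 1 if the name (not NUL-terminated, len bytes)
 * is an HTML void element, 0 otherwise. */
int
is_void_tag(const char *name, size_t len)
{
    size_t i;

    for (i = 0; i < sizeof void_tags / sizeof void_tags[0]; i++) {
        if (strlen(void_tags[i]) == len
        &&  strncmp(void_tags[i], name, len) == 0)
            return 1;
    }
    return 0;
}

/* Hypothetical helper: 1 if every opened tag is closed in order.
 * In strict mode (lenient == 0) only <x/> counts as standalone.
 * In lenient mode void tags such as <br> are treated like <br/>. */
int
tags_balanced(const char *src, int lenient)
{
    const char *stack[64];      /* names of currently open tags */
    size_t      stack_len[64];  /* length of each open tag name  */
    size_t      depth = 0;
    const char *p = src;

    assert(src != NULL);

    while ((p = strchr(p, '<')) != NULL) {
        const char *end = strchr(p, '>');
        const char *name;
        size_t      len;
        int         closing, self_closing;

        if (end == NULL)
            return 0;                       /* unterminated tag */

        closing      = (p[1] == '/');
        self_closing = (end > p + 1 && end[-1] == '/');
        name         = p + 1 + closing;
        len          = strcspn(name, " \t\r\n/>");

        if (closing) {
            /* A closing tag must match the innermost open tag. */
            if (depth == 0
            ||  stack_len[depth - 1] != len
            ||  strncmp(stack[depth - 1], name, len) != 0)
                return 0;
            depth--;
        }
        else if (self_closing || (lenient && is_void_tag(name, len))) {
            /* Standalone tag: nothing to push. */
        }
        else {
            if (depth == 64)
                return 0;                   /* nesting too deep for this sketch */
            stack[depth]     = name;
            stack_len[depth] = len;
            depth++;
        }
        p = end + 1;
    }
    return depth == 0;
}
```

A fragment like `<p>hi<br>there</p>` fails the strict check, because `<br>` is pushed and never popped, but passes the lenient one. Real HTML also allows optional end tags on elements like `p` and `li`, which this sketch does not try to handle.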
Posted over 11 years ago
How many days are in a week? Judging from my weekly blog posts, there are 14 days in a week. *sigh* Well, I knew my schedule was going to be a little erratic this summer but apparently underestimated slightly. Now, to be fair, I'm actually not all that far behind schedule. It might have looked that way over the last couple of weeks, but that's because I tend to hold onto code and continue to revise commits until I have a large chunk of functionality working.
Posted over 11 years ago
The io_cleanup1 branch is nearing completion, though as always the last few details are what hold everything up. In the past few days I got all the remaining tests in the parrot repo passing. The coding standards tests were, as usual, the last to be resolved. Then I started building and testing other things on the branch: Winxed builds and tests fine. So does Rosella. Then I looked at NQP and Rakudo. Both built fine, but Rakudo was failing two socket-related spectests.

That’s not entirely unexpected. Even though my intention was to make this branch as painless as possible there were still some unavoidable changes to interfaces and semantics. There are a few places where older semantics are surrounded by large /* HACK! */ comments, but for the most part I’ve tried to make everything sane. That’s why I wasn’t surprised to see Rakudo failing a few tests. I was much more surprised that Rakudo built without any problems the first time I tried it.

I figured the test failures represented some kind of semantic mismatch, and getting Rakudo passing again would be as easy as restoring the old semantics, with a note about a future update path. It turns out this wasn’t exactly the case. For one test it was the simple difference in the way we read on streams with multibyte encodings. This was expected and we can fix it to use the old behavior if that’s what Rakudo prefers. For the second failing test, it’s not that there’s a semantic difference per se, but instead there is a glaring and serious bug in master that was corrected in the new branch. Here, I’m going to explain what’s going on.
Look at this code:

    Parrot_io_recv_handle(PARROT_INTERP, ARGMOD(PMC *pmc), size_t len)
    {
        Parrot_Socket_attributes * const io = PARROT_SOCKET(pmc);

        /* This must stay ASCII to make Rakudo and UTF-8 work for now */
        STRING * res = Parrot_str_new_noinit(interp, len);
        INTVAL received = Parrot_io_recv(interp, io->os_handle, res->strstart, len);

        res->bufused = received;
        res->strlen = received;
        return res;
    }

This is a pared-down version of the code behind the recv method on Socket. It creates a new string with the specified length pre-allocated, then passes the buffer to the low-level recv C API (which has been abstracted a little to account for platform differences). Notice the comment there in the middle which says the string uses the ASCII encoding, for use by Rakudo. This is what I saw, and this is the semantic I followed in the new system: When you read from a socket by default in the new system, the string is encoded as ASCII unless you specify differently.

Just for my own verification, I had to look at the Parrot_str_new_noinit function to verify that the string was, in fact, being set to ASCII:

    Parrot_str_new_noinit(PARROT_INTERP, UINTVAL capacity)
    {
        STRING * const s = Parrot_gc_new_string_header(interp, 0);

        s->encoding = Parrot_default_encoding_ptr;
        Parrot_gc_allocate_string_storage(interp, s,
            (size_t)string_max_bytes(interp, s, capacity));
        return s;
    }

Elsewhere in the system, we have this:

    Parrot_default_encoding_ptr = Parrot_ascii_encoding_ptr;

So yes, the string returned by the Socket does indeed use the ASCII encoding in master. And, after double-checking, the version in the io_cleanup1 branch was using ASCII also. However, in the new branch Rakudo’s test fails because of an exception about a lossy conversion of non-ASCII data into the lower bit-width format. A quick check shows that both systems create an ASCII string buffer and both systems call the same recv function to fill it. So where’s the problem? What the hell?
For comparison, here’s the snippet of code from the new branch that reads data into a STRING, possibly using a buffer:

    bytes_read = Parrot_io_buffer_read_b(interp, buffer, handle, vtable,
                                         s->strstart + s->bufused, byte_length);
    s->bufused += bytes_read;
    STRING_scan(interp, s);

We’re reading out a number of bytes, appending them into the string’s pre-allocated storage and updating the number of bytes actually used. That’s all the same as in master. However, the last line, STRING_scan, does not appear in master. What is it?

STRING_scan() loops through the data in the string to verify that it correctly matches the string’s encoding. For instance, if the string is encoded as ASCII, STRING_scan will loop through to make sure all character values are lower than 128. If the string is UTF-16, STRING_scan verifies that we have an even number of bytes and that each value is an acceptable codepoint. master doesn’t do this, which means there is a bug.

In master, we don’t scan the string after recv and before we return it to the user, which means we can have non-ASCII data in a string marked with the ASCII encoding. The Rakudo test puts UTF-8 data into the socket on the server side, and then reads out a string and encodes that to UTF-8 to verify that it comes out correctly. However, in the new branch we actually check that the string is valid before giving it out to user code, and it isn’t, so we throw an exception. Combine that with the fact that the Socket PMC has no way to change the encoding it uses in master, and all Sockets used in Parrot master are potential sources of bugs.

Two nights ago I added methods to Socket to get/set the encoding to use, and everybody’s favorite Moritz created a branch for Rakudo to use it. Last night I did some playing with default encodings. Tonight and into the weekend I’m hoping to wrap up the last few details to get the Rakudo spectest passing like normal again.
Hopefully, if all goes well, we can start talking about a merger within the next week or two.
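To illustrate the ASCII case of the check described above, here is a minimal sketch in plain C. It is not the body of STRING_scan, which isn't shown in this post; is_valid_ascii is a hypothetical stand-in that works on a raw byte buffer instead of a Parrot STRING, and it only covers the "every value is lower than 128" rule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the ASCII case of an encoding scan.
 * Returns 1 if every byte in the buffer is below 128, 0 otherwise.
 * An empty buffer is trivially valid. */
int
is_valid_ascii(const unsigned char *bytes, size_t len)
{
    size_t i;

    assert(bytes != NULL || len == 0);

    for (i = 0; i < len; i++) {
        if (bytes[i] >= 128)
            return 0;   /* high bit set: not ASCII */
    }
    return 1;
}
```

UTF-8 encodes every non-ASCII character using only bytes with the high bit set (for example, "é" is the two bytes 0xC3 0xA9), so any UTF-8 payload containing such a character fails this check. That is the situation the Rakudo test creates: UTF-8 bytes arriving in a buffer that is labeled ASCII.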
Posted almost 12 years ago
I have been working on the api.c file that handles the security functions, revising and editing functions with the help of Whiteknight and Dukeleto. Right now I am allocating, initializing and freeing memory for the functions as well as integrating the api.c and utility.c files I am working on into root.in. I expect to get much done this week in terms of the API.
Posted almost 12 years ago
And by we, I mean myself, parrot, and mod_parrot. That is simple: cgi-style running LIVES AGAIN (almost, just need to fix headers ;-)). And with it, all the infrastructure to implement more and nicer loaders, such as those for PSGI and / or WSGI, and the famous inline loader-in-the-sky I will be writing. Pretty nice, no? For the technically interested, what has happened is that: Loaders now accept 3 arguments: the request (as a PtrBuf). This is an opaque handle that can be used to bind to the apache input / output handles.