
News

Posted 7 days ago by Air Mozilla
Once a month web developers across the Mozilla community get together (in person and virtually) to share what cool stuff we've been working on in...
Posted 7 days ago by Scott DeVaney
Stranded users feel sadness. If you’re the developer of a legacy add-on with an aim to migrate to WebExtensions, here are a few tips to consider so your users aren’t left behind.

Port your user data. If your legacy add-on stores user data, we encourage you to take advantage of Embedded WebExtensions to transfer the data to a format that can be used by WebExtensions. This is critical if you want to seamlessly migrate your users—without putting any actionable burden on them—to the new WebExtensions version when it’s ready. (Embedded WebExtensions is a framework that contains your WebExtension inside of a bootstrapped or SDK extension.)

Testing beta versions. If you want to test a WebExtensions version of your add-on with a smaller group of users, you can make use of the beta versions feature on addons.mozilla.org (AMO). This lets you test pre-release beta versions that are signed and available to Firefox users who want to give them a spin. You’ll benefit from real users interacting with your new version and providing valuable feedback—without sacrificing the good reputation and rating of your listed version. We don’t recommend creating a separate listing on release because this will fragment your user base and leave a large number of them behind when Firefox 57 is released.

Don’t leave your users behind. Updating your listing on AMO when your WebExtension is ready is the only way to ensure all of your users move over without any noticeable interruption. Need further assistance with your migration journey? You can find real-time help during office hours, or by emailing webextensions-support [@] mozilla [dot] org.

The post Migrating to WebExtensions? Don’t Forget Your Users appeared first on Mozilla Add-ons Blog.
Posted 7 days ago by Nicholas D. Matsakis
For some time now I’ve been interested in better ways to construct LR(1) parsers. LALRPOP currently allows users to choose between the full LR(1) algorithm or the LALR(1) subset. Neither of these choices is very satisfying:

- The full LR(1) algorithm gives pretty intuitive results but produces a lot of states. My hypothesis was that, with modern computers, this wouldn’t matter anymore. This is sort of true – e.g., I’m able to generate and process even the full Rust grammar – but it results in a ton of generated code.
- The LALR(1) subset often works but sometimes mysteriously fails with indecipherable errors. This is because it is basically a hack that conflates states in the parsing table according to a heuristic; when this heuristic fails, you get strange results.

The Lane Table algorithm published by Pager and Chen at APPLC ’12 offers an interesting alternative. It is an alternative to Pager’s earlier work: the “lane tracing” algorithm and the practical general method. In any case, the goal is to generate an LALR(1) state machine when possible and gracefully scale up to the full LR(1) state machine as needed. I found the approach appealing, as it seemed fairly simple, and also seemed to match what I would try to do intuitively.

I’ve been experimenting with the Lane Table algorithm in LALRPOP and I now have a simple prototype that seems to work. Implementing it required that I cover various cases that the paper left implicit, and the aim of this blog post is to describe what I’ve done so far. I do not claim that this description is what the authors originally intended; for all I know, it has some bugs, and I certainly think it can be made more efficient. My explanation is intended to be widely readable, though I do assume some familiarity with the basic workings of an LR parser (i.e., that we shift states onto a stack, execute reductions, etc.). But I’ll review the bits of table construction that you need.
First example grammar: G0

To explain the algorithm, I’m going to walk through two example grammars. The first I call G0 – it is a reduced version of what the paper calls G1. It is interesting because it does not require splitting any states, and so we wind up with the same number of states as in LR(0). Put another way, it is an LALR(1) grammar. I will be assuming a basic familiarity with the LR(0) and LR(1) state construction.

Grammar G0:

    G0 = X "c"
       | Y "d"
    X  = "e" X
       | "e"
    Y  = "e" Y
       | "e"

The key point here is that if you have "e" ..., you could build an X or a Y from that "e" (in fact, there can be any number of "e" tokens). You ultimately decide based on whether the "e" tokens are followed by a "c" (in which case you build an X) or a "d" (in which case you build a Y). LR(0), since it has no lookahead, can’t handle this case. LALR(1) can, since it augments LR(0) with a token of lookahead; using that, after we see the "e", we can peek at the next thing and figure out what to do.

Step 1: Construct an LR(0) state machine

We begin by constructing an LR(0) state machine. If you’re not familiar with the process, I’ll briefly outline it here, though you may want to read up separately. Basically, we will enumerate a number of different states indicating what kind of content we have seen so far. The first state S0 indicates that we are at the very beginning of our “goal item” G0:

    S0 = G0 = (*) X "c"
       | G0 = (*) Y "d"
       | ... // more items to be described later

The G0 = (*) X "c" indicates that we have started parsing a G0; the (*) is how far we have gotten (namely, nowhere). There are two items because there are two ways to make a G0. Now, in these two items, immediately to the right of the (*) we see the symbols that we expect to see next in the input: in this case, an X or a Y.
Since X and Y are nonterminals – i.e., symbols defined in the grammar rather than tokens in the input – this means we might also be looking at the beginning of an X or a Y, so we have to extend S0 to account for that possibility (these are sometimes called “epsilon moves”, since these new possibilities arise from consuming no input, which is denoted as “epsilon”):

    S0 = G0 = (*) X "c"
       | G0 = (*) Y "d"
       | X = (*) "e" X
       | X = (*) "e"
       | Y = (*) "e" Y
       | Y = (*) "e"

This completes the state S0. Looking at these various possibilities, we see that a number of things might come next in the input: an "e", an X, or a Y. (The nonterminals X and Y can “come next” once we have seen their entire contents.) Therefore we construct three successor states: one (S1) accounts for what happens when we see an "e". The other two (S3 and S5 below) account for what happens after we recognize an X or a Y, respectively.

S1 (what happens if we see an "e") is derived by advancing the (*) past the "e":

    S1 = X = "e" (*) X
       | X = "e" (*)
       | ... // to be added later
       | Y = "e" (*) Y
       | Y = "e" (*)
       | ... // to be added later

Here we dropped the G0 = ... possibilities, since those would have required consuming an X or a Y. But we have kept the X = "e" (*) X etc. Again we find that there are nonterminals that can come next (X and Y again) and hence we have to expand the state to account for the possibility that, in addition to being partway through the X and Y, we are at the beginning of another X or Y:

    S1 = X = "e" (*) X
       | X = "e" (*)
       | X = (*) "e"   // added this
       | X = (*) "e" X // added this
       | Y = "e" (*) Y
       | Y = "e" (*)
       | Y = (*) "e"   // added this
       | Y = (*) "e" Y // added this

Here again we expect either an "e", an X, or a Y. If we again check what happens when we consume an "e", we will find that we reach S1 again (i.e., moving past an "e" gets us to the same set of possibilities that we already saw).
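The epsilon-expansion step above can be sketched in a few lines of Rust. This is an illustrative toy, not LALRPOP's code: the `Item` tuple and the `closure` function are my own names, and items are kept in a `BTreeSet` purely so duplicates collapse automatically.

```rust
use std::collections::BTreeSet;

// An LR(0) item: production `lhs = rhs`, with the dot before `rhs[dot]`.
type Item = (&'static str, &'static [&'static str], usize);

// Grammar G0 from the post: G0 = X "c" | Y "d"; X = "e" X | "e"; Y = "e" Y | "e".
const G0: &[(&str, &[&str])] = &[
    ("G0", &["X", "c"]),
    ("G0", &["Y", "d"]),
    ("X", &["e", "X"]),
    ("X", &["e"]),
    ("Y", &["e", "Y"]),
    ("Y", &["e"]),
];

// Epsilon-closure: whenever the dot sits before a nonterminal N,
// add every item `N = (*) ...`, repeating until nothing new appears.
fn closure(mut items: BTreeSet<Item>) -> BTreeSet<Item> {
    loop {
        let mut added = Vec::new();
        for &(_, rhs, dot) in &items {
            if let Some(&next) = rhs.get(dot) {
                for &(lhs, prod) in G0 {
                    if lhs == next {
                        added.push((lhs, prod, 0));
                    }
                }
            }
        }
        let before = items.len();
        items.extend(added);
        if items.len() == before {
            return items;
        }
    }
}

fn main() {
    // State S0: the two goal items, closed over epsilon moves.
    let s0 = closure([("G0", G0[0].1, 0), ("G0", G0[1].1, 0)].into());
    // S0 ends up containing all six productions with the dot at the front.
    assert_eq!(s0.len(), 6);
}
```

Starting from only `G0 = (*) X "c"` the closure would add just the two X productions, giving three items; starting from both goal items gives the six-item S0 shown above.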
The remaining states all arise from consuming an X or a Y from S0 or S1:

    S2 = X = "e" X (*)
    S3 = G0 = X (*) "c"
    S4 = Y = "e" Y (*)
    S5 = G0 = Y (*) "d"
    S6 = G0 = X "c" (*)
    S7 = G0 = Y "d" (*)

We can represent the set of states as a graph, with edges representing the transitions between states. The edges are labeled with the symbol ("e", X, etc.) that gets consumed to move between states:

    S0 -"e"-> S1
    S1 -"e"-> S1
    S1 --X--> S2
    S0 --X--> S3
    S1 --Y--> S4
    S0 --Y--> S5
    S3 -"c"-> S6
    S5 -"d"-> S7

Reducing and inconsistent states

Let’s take another look at this state S1:

    S1 = X = "e" (*) X
       | X = "e" (*)   // reduce possible
       | X = (*) "e"
       | X = (*) "e" X
       | Y = "e" (*) Y
       | Y = "e" (*)   // reduce possible
       | Y = (*) "e"
       | Y = (*) "e" Y

There are two interesting things about this state. The first is that it contains some items where the (*) comes at the very end, like X = "e" (*). What this means is that we have seen an "e" in the input, which is enough to construct an X. If we chose to do so, that is called reducing. The effect would be to build up an X, which would then be supplied as an input to a prior state (e.g., S0).

However, the other interesting thing is that our state actually has three possible things it could do: it could reduce X = "e" (*) to construct an X, but it could also reduce Y = "e" (*) to construct a Y; finally, it can shift an "e". Shifting means that we do not execute any reductions, and instead we take the next input and move to the next state (which in this case would be S1 again).

A state that can do both shifts and reduces, or more than one reduce, is called an inconsistent state. Basically it means that there is an ambiguity, and the parser won’t be able to figure out what to do – or, at least, it can’t figure out what to do unless we take some amount of lookahead into account. This is where LR(1) and LALR(1) come into play. In an LALR(1) grammar, we keep the same set of states, but we augment the reductions with a bit of lookahead.
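The "inconsistent state" test above can be written down directly. A minimal sketch, with my own `Item` struct and `is_inconsistent` function (not LALRPOP's API), and the simplifying assumption that terminals are lowercase strings like "e" while nonterminals are uppercase:

```rust
// An item is a production right-hand side plus a dot position;
// the dot at the end of the rhs means a reduction is possible.
struct Item {
    rhs: Vec<&'static str>,
    dot: usize,
}

// A state is "inconsistent" if it can do more than one reduce, or can
// both shift a terminal and reduce. Terminals here are lowercase
// strings like "e"; nonterminals (X, Y) are uppercase, and moving
// past a nonterminal is a goto, not a shift.
fn is_inconsistent(state: &[Item]) -> bool {
    let reduces = state.iter().filter(|it| it.dot == it.rhs.len()).count();
    let shifts = state
        .iter()
        .filter(|it| {
            it.rhs
                .get(it.dot)
                .map_or(false, |s| s.chars().all(|c| c.is_lowercase()))
        })
        .count();
    reduces > 1 || (reduces >= 1 && shifts >= 1)
}

fn main() {
    // A slice of state S1 from the post: two reduces plus a shift on "e".
    let s1 = vec![
        Item { rhs: vec!["e", "X"], dot: 1 }, // X = "e" (*) X
        Item { rhs: vec!["e"], dot: 1 },      // X = "e" (*)   -- reduce
        Item { rhs: vec!["e"], dot: 0 },      // X = (*) "e"   -- shift
        Item { rhs: vec!["e"], dot: 1 },      // Y = "e" (*)   -- reduce
    ];
    assert!(is_inconsistent(&s1));
    // State S3 = G0 = X (*) "c" alone is consistent: one shift, no reduce.
    let s3 = vec![Item { rhs: vec!["X", "c"], dot: 1 }];
    assert!(!is_inconsistent(&s3));
}
```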
In this example, as we will see, that suffices – for example, if you look at the grammar, you will see that we only need to do the X = "e" (*) reduction if the next thing in the input is a "c". And similarly we only need to do the Y = "e" (*) reduction if the next thing is a "d". So we can transform the state to add some conditions, and then it is clear what to do:

    S1 = X = "e" (*) X   shift if next thing is an X
       | X = "e" (*)     reduce if next thing is a "c"
       | X = (*) "e"     shift if next thing is an "e"
       | X = (*) "e" X   shift if next thing is an "e"
       | Y = "e" (*) Y   shift if next thing is a Y
       | Y = "e" (*)     reduce if next thing is a "d"
       | Y = (*) "e"     shift if next thing is an "e"
       | Y = (*) "e" Y   shift if next thing is an "e"

Note that the shift vs reduce part is implied by where the (*) is: we always shift unless the (*) is at the end. So usually we just write the lookahead part. Moreover, the “lookahead” for a shift is pretty obvious: it’s whatever is to the right of the (*), so we’ll leave that out. That leaves us with this, where ["c"] (for example) means “only do this reduction if the lookahead is "c"”:

    S1 = X = "e" (*) X
       | X = "e" (*) ["c"]
       | X = (*) "e"
       | X = (*) "e" X
       | Y = "e" (*) Y
       | Y = "e" (*) ["d"]
       | Y = (*) "e"
       | Y = (*) "e" Y

We’ll call this augmented state an LR(0-1) state (it’s not quite how an LR(1) state is typically defined). Now that we’ve added the lookahead, this state is no longer inconsistent, as the parser always knows what to do. The next few sections will show how we can derive this lookahead automatically.

Step 2: Convert LR(0) states into LR(0-1) states

The first step in the process is to naively convert all of our LR(0) states into LR(0-1) states (with no additional lookahead). We will denote the “no extra lookahead” case by writing a special “wildcard” lookahead _.
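With the lookaheads attached, S1's behavior boils down to a match on the next token. This is a sketch of how the finished parse-table row for S1 behaves (gotos on the nonterminals X and Y are omitted); `s1_action` is my own illustrative name, not anything in LALRPOP:

```rust
// Decision procedure for state S1 once lookahead is known:
// shift on "e", reduce `X = "e"` on "c", reduce `Y = "e"` on "d".
fn s1_action(next: &str) -> &'static str {
    match next {
        "e" => "shift",            // X = (*) "e", Y = (*) "e", ...
        "c" => "reduce X = \"e\"", // X = "e" (*) ["c"]
        "d" => "reduce Y = \"e\"", // Y = "e" (*) ["d"]
        _ => "error",
    }
}

fn main() {
    assert_eq!(s1_action("e"), "shift");
    assert_eq!(s1_action("c"), "reduce X = \"e\"");
    assert_eq!(s1_action("d"), "reduce Y = \"e\"");
}
```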
We will thus denote the inconsistent state after transformation as follows, where each reduction has the “wildcard” lookahead:

    S1 = X = "e" (*) X
       | X = "e" (*) [_]
       | X = (*) "e"
       | X = (*) "e" X
       | Y = "e" (*) Y
       | Y = "e" (*) [_]
       | Y = (*) "e"
       | Y = (*) "e" Y

Naturally, the state is still inconsistent.

Step 3: Resolve inconsistencies

In the next step, we iterate over all of our LR(0-1) states. In this example, we will not need to create new states, but in future examples we will. The iteration thus consists of a queue and some code like this:

    let mut queue = Queue::new();
    queue.extend(/* all states */);
    while let Some(s) = queue.pop_front() {
        if /* s is an inconsistent state */ {
            resolve_inconsistencies(s, &mut queue);
        }
    }

Step 3a: Build the lane table

To resolve an inconsistent state, we first construct a lane table. This is done by the code in the lane module (the table module maintains the data structure). It works by starting at each conflict and tracing backwards. Let’s start with the final table we will get for the state S1 and then we will work our way back to how it is constructed. First, let’s identify the conflicting actions from S1 and give them indices:

    S1 = X = (*) "e"       // C0 -- shift "e"
       | X = "e" (*) [_]   // C1 -- reduce `X = "e" (*)`
       | X = (*) "e" X     // C0 -- shift "e"
       | X = "e" (*) X
       | Y = (*) "e"       // C0 -- shift "e"
       | Y = "e" (*) [_]   // C2 -- reduce `Y = "e" (*)`
       | Y = (*) "e" Y     // C0 -- shift "e"
       | Y = "e" (*) Y

Several of the items can cause “Conflicting Action 0” (C0), which is to shift an "e". These are all mutually compatible. However, there are also two incompatible actions: C1 and C2, both reductions. In fact, we’ll find that if we look back at state S0, these ‘conflicting’ actions all occur with distinct lookahead. The purpose of the lane table is to summarize that information.
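The queue-driven loop above is pseudocode (`Queue` is not a standard Rust type). A runnable sketch using the standard library's `VecDeque`, with a placeholder predicate standing in for the real inconsistency check and for `resolve_inconsistencies` (which, in the real algorithm, may push newly split states back onto the queue):

```rust
use std::collections::VecDeque;

// Worklist sketch: drain a queue of state indices, "resolving" the
// inconsistent ones. Here, purely for illustration, even-numbered
// states play the role of inconsistent states.
fn drain_inconsistent(states: Vec<usize>) -> Vec<usize> {
    let mut queue: VecDeque<usize> = VecDeque::new();
    queue.extend(states); // all states
    let mut resolved = Vec::new();
    while let Some(s) = queue.pop_front() {
        if s % 2 == 0 {
            // placeholder for "s is an inconsistent state";
            // resolve_inconsistencies(s, &mut queue) would go here
            resolved.push(s);
        }
    }
    resolved
}

fn main() {
    assert_eq!(drain_inconsistent((0..5).collect()), vec![0, 2, 4]);
}
```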
The lane table we will end up constructing for these conflicting actions is as follows:

    | State | C0    | C1    | C2    | Successors |
    | S0    |       | ["c"] | ["d"] | {S1}       |
    | S1    | ["e"] | []    | []    | {S1}       |

Here the idea is that the lane table summarizes the lookahead information contributed by each state. Note that for the shift, the state S1 already has enough lookahead information: we only shift when we see the terminal we need next ("e"). But for the conflicts C1 and C2, the lookahead actually came from S0, which is a predecessor state.

As I said earlier, the algorithm for constructing the table works by looking at the conflicting item and walking backwards. So let’s illustrate with conflict C1. We have the conflicting item X = "e" (*), and we are basically looking to find its lookahead. We know that somewhere in the distant past of our state machine there must be an item like Foo = ...a (*) X ...b that led us here. We want to find that item, so we can derive the lookahead from ...b (whatever symbols come after X).

To do this, we will walk the graph. Our state at any point in time will be the pair of a state and an item in that state. To start out, then, we have (S1, X = "e" (*)), which is the conflict C1. Because the (*) is not at the “front” of this item, we have to figure out where this "e" came from on our stack, so we look for predecessors of the state S1 which have an item like X = (*) "e". This leads us to S0 and also S1. So we can push two states in our search: (S0, X = (*) "e") and (S1, X = (*) "e"). Let’s consider each in turn.

The next state is then (S0, X = (*) "e"). Here the (*) lies at the front of the item, so we search the same state S0 for items that would have led to this state via an epsilon move. This basically means an item like Foo = ... (*) X ... – i.e., where the (*) appears directly before the nonterminal X. In our case, we will find G0 = (*) X "c". This is great, because it tells us some lookahead ("c", in particular), and hence we can stop our search.
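The key step of that trace — scanning a state's items for `Foo = ... (*) X t ...` and collecting the terminal `t` that follows the nonterminal — can be sketched as a small function. `lookahead_after` is my own name for this step, not LALRPOP's; items are given as (rhs, dot) pairs:

```rust
// One step of the lane-table trace: given the items of a state,
// find the terminals that can directly follow the nonterminal `nt`,
// i.e. for each item `A = ... (*) nt t ...`, collect `t`. This is
// the step where state S0 contributes lookahead "c" to conflict C1.
fn lookahead_after(
    items: &[&[&'static str]],
    dots: &[usize],
    nt: &str,
) -> Vec<&'static str> {
    let mut out = Vec::new();
    for (rhs, &dot) in items.iter().zip(dots) {
        if rhs.get(dot) == Some(&nt) {
            if let Some(&t) = rhs.get(dot + 1) {
                out.push(t);
            }
            // If nothing follows `nt`, the real search would instead
            // continue upward, looking for the lookahead of `A`
            // (not shown in this sketch).
        }
    }
    out
}

fn main() {
    // State S0 of G0 contains G0 = (*) X "c" and G0 = (*) Y "d".
    let items: Vec<&[&'static str]> = vec![&["X", "c"], &["Y", "d"]];
    let dots = [0, 0];
    assert_eq!(lookahead_after(&items, &dots, "X"), vec!["c"]); // S0 -> C1: "c"
    assert_eq!(lookahead_after(&items, &dots, "Y"), vec!["d"]); // S0 -> C2: "d"
}
```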
We add to the table the entry that the state S0 contributes lookahead "c" to the conflict C1. In some cases, we might find something like Foo = ... (*) X instead, where the X we are looking for appears at the end. In that case, we have to restart our search, but looking for the lookahead for Foo.

The next state in our case is (S1, X = (*) "e"). Again the (*) lies at the beginning and hence we search for things in the state S1 where X is the next symbol. We find X = "e" (*) X. This is not as good as last time, because there are no symbols appearing after X in this item, so it does not contribute any lookahead. We therefore can’t stop our search yet, but we push the state (S1, X = "e" (*) X) – this corresponds to the Foo state I mentioned at the end of the last paragraph, except that in this case Foo is the same nonterminal X we started with.

Looking at (S1, X = "e" (*) X), we again have the (*) in the middle of the item, so we move it left, searching for predecessors with the item X = (*) "e" X. We will (again) find that S0 and S1 have such items. In the case of S0, we will (again) find the context "c", which we dutifully add to the table (this has no effect, since it is already present). In the case of S1, we will (again) wind up at the state (S1, X = "e" (*) X). Since we’ve already visited this state, we stop our search; it will not lead to new context. At this point, our table column for C1 is complete. We can repeat the process for C2, which plays out in an analogous way.

Step 3b: Update the lookahead

Looking at the lane table we built, we can union the context sets in any particular column. We see that the context sets for each conflicting action are pairwise disjoint.
Therefore, we can simply update each reduce action in our state with those lookaheads in mind, and hence render it consistent:

    S1 = X = (*) "e"
       | X = "e" (*) ["c"] // lookahead from C1
       | X = (*) "e" X
       | X = "e" (*) X
       | Y = (*) "e"
       | Y = "e" (*) ["d"] // lookahead from C2
       | Y = (*) "e" Y
       | Y = "e" (*) Y

This is of course also what the LALR(1) state would look like (though the LALR(1) state would include context for the other items as well, which doesn’t play into the final machine execution). At this point we’ve covered enough to handle the grammar G0. Let’s turn to a more complex grammar, grammar G1, and then we’ll come back to cover the remaining steps.

Second example: the grammar G1

G1 is a (typo-corrected) version of the grammar from the paper. This grammar is not LALR(1) and hence it is more interesting, because it requires splitting states.

Grammar G1:

    G1 = "a" X "d"
       | "a" Y "c"
       | "b" X "c"
       | "b" Y "d"
    X  = "e" X
       | "e"
    Y  = "e" Y
       | "e"

The key point of this grammar is that when we see ... "e" "c" and we wish to know whether to reduce to X or Y, we don’t have enough information. We need to know what is in the ..., because "a" "e" "c" means we reduce "e" to Y and "b" "e" "c" means we reduce to X. In terms of our state machine, this corresponds to splitting the states responsible for X and Y based on earlier context.

Let’s look at a subset of the LR(0) states for G1:

    S0 = G1 = (*) "a" X "d"
       | G1 = (*) "a" Y "c"
       | G1 = (*) "b" X "c"
       | G1 = (*) "b" Y "d"

    S1 = G1 = "a" (*) X "d"
       | G1 = "a" (*) Y "c"
       | X = (*) "e" X
       | X = (*) "e"
       | Y = (*) "e" Y
       | Y = (*) "e"

    S2 = G1 = "b" (*) X "c"
       | G1 = "b" (*) Y "d"
       | X = (*) "e" X
       | X = (*) "e"
       | Y = (*) "e" Y
       | Y = (*) "e"

    S3 = X = "e" (*) X
       | X = "e" (*)     // C1 -- can reduce
       | X = (*) "e"     // C0 -- can shift "e"
       | X = (*) "e" X   // C0 -- can shift "e"
       | Y = "e" (*) Y
       | Y = "e" (*)     // C2 -- can reduce
       | Y = (*) "e"     // C0 -- can shift "e"
       | Y = (*) "e" Y   // C0 -- can shift "e"

Here we can see the problem. The state S3 is inconsistent.
But it is reachable from both S1 and S2. If we come from S1, then we can have (e.g.) X "d", but if we come from S2, we expect X "c". Let’s walk through our algorithm again. I’ll start with step 3a.

Step 3a: Build the lane table

The lane table for state S3 will look like this:

    | State | C0    | C1    | C2    | Successors |
    | S1    |       | ["d"] | ["c"] | {S3}       |
    | S2    |       | ["c"] | ["d"] | {S3}       |
    | S3    | ["e"] | []    | []    | {S3}       |

Now if we union each column, we see that both C1 and C2 wind up with lookahead {"c", "d"}. This is our problem. We have to isolate things better. Therefore, step 3b (“update lookahead”) does not apply. Instead we attempt step 3c.

Step 3c: Isolate lanes

This part of the algorithm is only loosely described in the paper, but I think it works as follows. We will employ a union-find data structure. With each set, we will record a “context set”, which records for each conflict the set of lookahead tokens (e.g., {C1:{"d"}}). A context set tells us how to map the lookahead to an action; therefore, to be self-consistent, the lookaheads for each conflict must be mutually disjoint. In other words, {C1:{"d"}, C2:{"c"}} is valid, and says to do C1 if we see a "d" and C2 if we see a "c". But {C1:{"d"}, C2:{"d"}} is not, because there are two actions for the lookahead "d".

Initially, each state in the lane table is mapped to itself, and its context set is derived from its row in the lane table:

    S1 = {C1:d, C2:c}
    S2 = {C1:c, C2:d}
    S3 = {C0:e}

We designate “beachhead” states as those states in the table that are not reachable from another state in the table (i.e., using the successors). In this case, those are the states S1 and S2. We will be doing a DFS through the table and we want to use those as the starting points. (Question: is there always at least one beachhead state? Seems like there must be.) So we begin by iterating over the beachhead states:

    for beachhead in beachheads { ... }

When we visit a state X, we will examine each of its successors Y.
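The merge-or-reject decision at the heart of step 3c can be sketched without the union-find machinery: just union two context sets and check that every lookahead token still selects a single action. `Context` and `try_merge` are my own illustrative names; the real implementation keys contexts off union-find roots rather than merging maps directly.

```rust
use std::collections::BTreeMap;

// A "context set" maps each conflict index to its lookahead tokens.
type Context = BTreeMap<u32, Vec<&'static str>>;

// Union two context sets, then verify self-consistency: no lookahead
// token may appear under two different conflicts. Returns None if the
// merged set would be ambiguous (e.g. {C1:{"d"}, C2:{"d"}}).
fn try_merge(a: &Context, b: &Context) -> Option<Context> {
    let mut merged = a.clone();
    for (&conflict, tokens) in b {
        let entry = merged.entry(conflict).or_default();
        for &t in tokens {
            if !entry.contains(&t) {
                entry.push(t);
            }
        }
    }
    // Pairwise-disjointness check across conflicts.
    let mut seen = Vec::new();
    for tokens in merged.values() {
        for &t in tokens {
            if seen.contains(&t) {
                return None; // token selects two actions
            }
            seen.push(t);
        }
    }
    Some(merged)
}

fn main() {
    // S1 = {C1:{"d"}, C2:{"c"}} merges fine with S3 = {C0:{"e"}}...
    let s1: Context = [(1, vec!["d"]), (2, vec!["c"])].into();
    let s3: Context = [(0, vec!["e"])].into();
    let s1s3 = try_merge(&s1, &s3).unwrap();
    // ...but S2 = {C1:{"c"}, C2:{"d"}} cannot merge with S1,S3:
    // "c" and "d" would each select two different reductions.
    let s2: Context = [(1, vec!["c"]), (2, vec!["d"])].into();
    assert!(try_merge(&s1s3, &s2).is_none());
}
```

This failed merge is exactly the signal that triggers cloning S3 into S3' in the walkthrough that follows.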
We consider whether the context set for Y can be merged with the context set for X. So, in our case, X will be S1 to start and Y will be S3. In this case, the context sets can be merged, and hence we union S1 and S3 and wind up with the following union-find state:

    S1,S3 = {C0:e, C1:d, C2:c}
    S2    = {C1:c, C2:d}

(Note that this union is just for the purpose of tracking context; it doesn’t imply that S1 and S3 are the ‘same states’ or anything like that.) Next we examine the edge S3 -> S3. Here the contexts are already merged and everything is happy, so we stop. (We already visited S3, after all.)

This finishes our first beachhead, so we proceed to the next edge, S2 -> S3. Here we find that we cannot union the context: it would produce an inconsistent state. So what we do is we clone S3 to make a new state, S3', with the initial setup corresponding to the row for S3 from the lane table:

    S1,S3 = {C0:e, C1:d, C2:c}
    S2    = {C1:c, C2:d}
    S3'   = {C0:e}

This also involves updating our LR(0-1) state set to have a new state S3'. All edges from S2 that led to S3 now lead to S3'; the outgoing edges from S3' remain unchanged. (At least to start.) Therefore, the edge S2 -> S3 is now S2 -> S3'. We can now merge the contexts:

    S1,S3  = {C0:e, C1:d, C2:c}
    S2,S3' = {C0:e, C1:c, C2:d}

Now we examine the outgoing edge S3' -> S3. We cannot merge these contexts, so we search (greedily, I guess) for a clone of S3 where we can merge the contexts. We find one in S3', and hence we redirect the S3 edge to S3' and we are done. (I think the actual search should first look for a clone of S3 that is using literally the same context as us (i.e., the same root node), as in this case. If that is not found, then we search for one with a mergeable context. If that fails, then we clone a new state.)

The final state machine thus has two copies of S3, one for the path from S1 and one for the path from S2, which gives us enough context to proceed.
Conclusion

As I wrote, I’ve been experimenting with the Lane Table algorithm in LALRPOP and I now have a simple prototype that seems to work. It is not by any means exhaustively tested – in fact, I’d call it minimally tested – but hopefully I’ll find some time to play around with it some more and take it through its paces. It at least handles the examples in the paper.

The implementation is also inefficient in various ways. Some of them are minor – it clones more than it needs to, for example – and easily corrected. But I also suspect that one can do a lot more caching and sharing of results. Right now, for example, I construct the lane table for each inconsistent state completely from scratch, but perhaps there are ways to preserve and share results (it seems naively as if this should be possible). On the other hand, constructing the lane table can probably be made pretty fast: it doesn’t have to traverse that much of the grammar. I’ll have to try it on some bigger examples and see how it scales.

Edits: The lane table I originally described had the wrong value for the successor column. Corrected.
Posted 7 days ago by Nicholas D. Matsakis
For some time now I’ve been interested in better ways to construct LR(1) parsers. LALRPOP currently allows users to choose between the full LR(1) algorithm or the LALR(1) subset. Neither of these choices is very satisfying: the full LR(1) ... [More] algorithm gives pretty intuitive results but produces a lot of states; my hypothesis was that, with modern computers, this wouldn’t matter anymore. This is sort of true – e.g., I’m able to generate and process even the full Rust grammar – but this results in a ton of generated code. the LALR(1) subset often works but sometimes mysteriously fails with indecipherable errors. This is because it is basically a hack that conflates states in the parsing table according to a heuristic; when this heuristic fails, you get strange results. The Lane Table algorithm published by Pager and Chen at APPLC ‘12 offers an interesting alternative. It is an alternative to earlier work by Pager, the “lane tracing” algorithm and practical general method. In any case, the goal is to generate an LALR(1) state machine when possible and gracefully scale up to the full LR(1) state machine as needed. I found the approach appealing, as it seemed fairly simple, and also seemed to match what I would try to do intuitively. I’ve been experimenting with the Lane Table algorithm in LALRPOP and I now have a simple prototype that seems to work. Implementing it required that I cover various cases that the paper left implicit, and the aim of this blog post is to describe what I’ve done so far. I do not claim that this description is what the authors originally intended; for all I know, it has some bugs, and I certainly think it can be made more efficient. My explanation is intended to be widely readable, though I do assume some familiarity with the basic workings of an LR-parser (i.e., that we shift states onto a stack, execute reductions, etc). But I’ll review the bits of table construction that you need. 
First example grammar: G0 To explain the algorithm, I’m going to walk through two example grammars. The first I call G0 – it is a reduced version of what the paper calls G1. It is interesting because it does not require splitting any states, and so we wind up with the same number of states as in LR(0). Put another way, it is an LALR(1) grammar. I will be assuming a basic familiarity with the LR(0) and LR(1) state construction. Grammar G0 G0 = X "c" | Y "d" X = "e" X | "e" Y = "e" Y | "e" The key point here is that if you have "e" ..., you could build an X or a Y from that "e" (in fact, there can be any number of "e" tokens). You ultimately decide based on whether the "e" tokens are followed by a "c" (in which case you build an X) or a "d" (in which case you build a Y). LR(0), since it has no lookahead, can’t handle this case. LALR(1) can, since it augments LR(0) with a token of lookahead; using that, after we see the "e", we can peek at the next thing and figure out what to do. Step 1: Construct an LR(0) state machine We begin by constructing an LR(0) state machine. If you’re not familiar with the process, I’ll briefly outline it here, though you may want to read up separately. Basically, we will enumerate a number of different states indicating what kind of content we have seen so far. The first state S0 indicates that we are at the very beginning out “goal item” G0: S0 = G0 = (*) X "c" | G0 = (*) Y "d" | ... // more items to be described later The G0 = (*) X "c" indicates that we have started parsing a G0; the (*) is how far we have gotten (namely, nowhere). There are two items because there are two ways to make a G0. Now, in these two items, immediately to the right of the (*) we see the symbols that we expect to see next in the input: in this case, an X or a Y. 
Since X and Y are nonterminals – i.e., symbols defined in the grammar rather than tokens in the input – this means we might also be looking at the beginning of an X or a Y, so we have to extend S0 to account for that possibility (these are sometimes called “epsilon moves”, since these new possibilities arise from consuming no input, which is denoted as “epsilon”): S0 = G0 = (*) X "c" | G0 = (*) Y "d" | X = (*) "e" X | X = (*) "e" | Y = (*) "e" Y | Y = (*) "e" This completes the state S0. Looking at these various possibilities, we see that a number of things might come next in the input: a "e", an X, or a Y. (The nonterminals X and Y can “come next” once we have seen their entire contents.) Therefore we construct three successors states: one (S1) accounts for what happens when see an "e". The other two (S3 and S5 below) account for what happens after we recognize an X or a Y, respectively. S1 (what happens if we see an "e") is derived by advancing the (*) past the "e": S1 = X = "e" (*) X | X = "e" (*) | ... // to be added later | Y = "e" (*) Y | Y = "e" (*) | ... // to be added later Here we dropped the G0 = ... possibilities, since those would have required consuming a X or Y. But we have kept the X = "e" (*) X etc. Again we find that there are nonterminals that can come next (X and Y again) and hence we have to expand the state to account for the possibility that, in addition to being partway through the X and Y, we are at the beginning of another X or Y: S1 = X = "e" (*) X | X = "e" (*) | X = (*) "e" // added this | X = (*) "e" "X" // added this | Y = "e" (*) Y | Y = "e" (*) | Y = (*) "e" // added this | Y = (*) "e" Y // added this Here again we expect either a "e", an "X", or a "Y". If we again check what happens when we consume an "e", we will find that we reach S1 again (i.e., moving past an "e" gets us to the same set of possibilities that we already saw). 
The remaining states all arise from consuming an "X" or a "Y" from S0 or S1: S2 = X = "e" X (*) S3 = G0 = X (*) "c" S4 = Y = "e" Y (*) S5 = G0 = Y (*) "d" S6 = G0 = X "c" (*) S7 = G0 = Y "d" (*) We can represent the set of states as a graph, with edges representing the transitions between states. The edges are labeled with the symbol ("e", X, etc) that gets consumed to move between states: S0 -"e"-> S1 S1 -"e"-> S1 S1 --X--> S2 S0 --X--> S3 S1 --Y--> S4 S0 --Y--> S5 S3 -"c"-> S6 S5 -"d"-> S7 Reducing and inconsistent states. Let’s take another look at this state S1: S1 = X = "e" (*) X | X = "e" (*) // reduce possible | X = (*) "e" | X = (*) "e" "X" | Y = "e" (*) Y | Y = "e" (*) // reduce possible | Y = (*) "e" | Y = (*) "e" Y There are two interesting things about this state. The first is that it contains some items where the (*) comes at the very end, like X = "e" (*). What this means is that we have seen an "e" in the input, which is enough to construct an X. If we chose to do so, that is called reducing. The effect would be to build up an X, which would then be supplied as an input to a prior state (e.g., S0). However, the other interesting thing is that our state actually has three possible things it could do: it could reduce X = "e" (*) to construct an X, but it could also reduce Y = "e" (*) to construct a Y; finally, it can shift an "e". Shifting means that we do not execute any reductions, and instead we take the next input and move to the next state (which in this case would be S1 again). A state that can do both shifts and reduces, or more than one reduce, is called an inconsistent state. Basically it means that there is an ambiguity, and the parser won’t be able to figure out what to do – or, at least, it can’t figure out what to do unless we take some amount of lookahead into account. This is where LR(1) and LALR(1) come into play. In an LALR(1) grammar, we keep the same set of states, but we augment the reductions with a bit of lookahead. 
In this example, as we will see, that suffices – for example, if you look at the grammar, you will see that we only need to do the X = "e" (*) reduction if the next thing in the input is a "c". And similarly we only need to do the Y = "e" (*) reduction if the next thing is a "d". So we can transform the state to add some conditions, and then it is clear what to do: S1 = X = "e" (*) X shift if next thing is a `X` | X = "e" (*) reduce if next thing is a "c" | X = (*) "e" shift if next thing is a "e" | X = (*) "e" "X" shift if next thing is a "e" | Y = "e" (*) Y shift if next thing is a `Y` | Y = "e" (*) reduce if next thing is a "d" | Y = (*) "e" shift if next thing is a "e" | Y = (*) "e" Y shift if next thing is a "e" Note that the shift vs reduce part is implied by where the (*) is: we always shift unless the (*) is at the end. So usually we just write the lookahead part. Moreover, the “lookahead” for a shift is pretty obvious: it’s whatever to the right of the (*), so we’ll leave that out. That leaves us with this, where ["c"] (for example) means “only do this reduction if the lookahead is "c"”: S1 = X = "e" (*) X | X = "e" (*) ["c"] | X = (*) "e" | X = (*) "e" "X" | Y = "e" (*) Y | Y = "e" (*) ["d"] | Y = (*) "e" | Y = (*) "e" Y We’ll call this augmented state a LR(0-1) state (it’s not quite how a LR(1) state is typically defined). Now that we’ve added the lookahead, this state is no longer inconsistent, as the parser always knows what to do. The next few sections will show how we can derive this lookahead automatically. Step 2: Convert LR(0) states into LR(0-1) states. The first step in the process is to naively convert all of our LR(0) states into LR(0-1) states (with no additional lookahead). We will denote the “no extra lookahead” case by writing a special “wildcard” lookahead _. 
We will thus denote the inconsistent state after transformation as follows, where each reduction has the "wildcard" lookahead:

S1 = X = "e" (*) X
   | X = "e" (*) [_]
   | X = (*) "e"
   | X = (*) "e" X
   | Y = "e" (*) Y
   | Y = "e" (*) [_]
   | Y = (*) "e"
   | Y = (*) "e" Y

Naturally, the state is still inconsistent. Step 3: Resolve inconsistencies. In the next step, we iterate over all of our LR(0-1) states. In this example, we will not need to create new states, but in future examples we will. The iteration thus consists of a queue and some code like this:

let mut queue = VecDeque::new();
queue.extend(/* all states */);
while let Some(s) = queue.pop_front() {
    if /* s is an inconsistent state */ {
        resolve_inconsistencies(s, &mut queue);
    }
}

Step 3a: Build the lane table. To resolve an inconsistent state, we first construct a lane table. This is done by the code in the lane module (the table module maintains the data structure). It works by looking at each conflict and tracing backwards. Let's start with the final table we will get for the state S1, and then we will work our way back to how it is constructed. First, let's identify the conflicting actions from S1 and give them indices:

S1 = X = (*) "e"        // C0 -- shift "e"
   | X = "e" (*) [_]    // C1 -- reduce `X = "e" (*)`
   | X = (*) "e" X      // C0 -- shift "e"
   | X = "e" (*) X
   | Y = (*) "e"        // C0 -- shift "e"
   | Y = "e" (*) [_]    // C2 -- reduce `Y = "e" (*)`
   | Y = (*) "e" Y      // C0 -- shift "e"
   | Y = "e" (*) Y

Several of the items can cause "Conflict Action 0" (C0), which is to shift an "e". These are all mutually compatible. However, there are also two incompatible actions: C1 and C2, both reductions. In fact, we'll find that when we look back at state S0, these 'conflicting' actions all occur with distinct lookahead. The purpose of the lane table is to summarize that information.
The lane table we will end up constructing for these conflicting actions is as follows:

| State | C0    | C1    | C2    | Successors |
| S0    |       | ["c"] | ["d"] | {S1}       |
| S1    | ["e"] | []    | []    | {S1}       |

Here the idea is that the lane table summarizes the lookahead information contributed by each state. Note that for the shift, the state S1 already has enough lookahead information: we only shift when we see the terminal we need next ("e"). But for conflicts C1 and C2, the lookahead actually comes from S0, which is a predecessor state. As I said earlier, the algorithm for constructing the table works by looking at the conflicting item and walking backwards. So let's illustrate with conflict C1. We have the conflicting item X = "e" (*), and we are basically looking to find its lookahead. We know that somewhere in the distant past of our state machine there must be an item like Foo = ...a (*) X ...b that led us here. We want to find that item, so we can derive the lookahead from ...b (whatever symbols come after X). To do this, we will walk the graph. Our position at any point in time will be the pair of a state and an item in that state. To start out, then, we have (S1, X = "e" (*)), which is the conflict C1. Because the (*) is not at the "front" of this item, we have to figure out where this "e" came from on our stack, so we look for predecessors of the state S1 which have an item like X = (*) "e". This leads us to S0 and also S1. So we can push two states in our search: (S0, X = (*) "e") and (S1, X = (*) "e"). Let's consider each in turn. The next state is then (S0, X = (*) "e"). Here the (*) lies at the front of the item, so we search the same state S0 for items that would have led to this state via an epsilon move. This basically means an item like Foo = ... (*) X ... – i.e., where the (*) appears directly before the nonterminal X. In our case, we will find G0 = (*) X "c". This is great, because it tells us some lookahead ("c", in particular), and hence we can stop our search.
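One plausible way to represent such a table is a map from state to a row of per-conflict lookahead sets; the column union used later in step 3b is then a small fold. (This layout is my own sketch, not the actual representation in the table module.)

```rust
use std::collections::{BTreeMap, BTreeSet};

// Rows are states, columns are conflicts; each cell holds the lookahead
// tokens that state contributes to that conflict.
type Row = BTreeMap<&'static str, BTreeSet<&'static str>>;
type LaneTable = BTreeMap<&'static str, Row>;

// The table built above for the inconsistent state S1 of G0.
fn table_for_s1() -> LaneTable {
    let mut t = LaneTable::new();
    t.entry("S0").or_default().entry("C1").or_default().insert("c");
    t.entry("S0").or_default().entry("C2").or_default().insert("d");
    t.entry("S1").or_default().entry("C0").or_default().insert("e");
    t
}

// Union one conflict's column across all rows.
fn column_union(t: &LaneTable, conflict: &str) -> BTreeSet<&'static str> {
    t.values()
        .flat_map(|row| row.get(conflict).into_iter().flatten())
        .copied()
        .collect()
}

fn main() {
    let t = table_for_s1();
    // Each conflict's unioned lookahead is disjoint from the others.
    assert_eq!(column_union(&t, "C1"), ["c"].into_iter().collect());
    assert_eq!(column_union(&t, "C2"), ["d"].into_iter().collect());
    assert_eq!(column_union(&t, "C0"), ["e"].into_iter().collect());
}
```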
We add to the table the entry that the state S0 contributes lookahead "c" to the conflict C1. In some cases, we might find something like Foo = ... (*) X instead, where the X we are looking for appears at the end. In that case, we have to restart our search, but looking for the lookahead for Foo. The next state in our case is (S1, X = (*) "e"). Again the (*) lies at the beginning, and hence we search for things in the state S1 where X is the next symbol. We find X = "e" (*) X. This is not as good as last time, because there are no symbols appearing after X in this item, so it does not contribute any lookahead. We therefore can't stop our search yet, but we push the state (S1, X = "e" (*) X) – this corresponds to the Foo state I mentioned at the end of the last paragraph, except that in this case Foo is the same nonterminal X we started with. Looking at (S1, X = "e" (*) X), we again have the (*) in the middle of the item, so we move it left, searching for predecessors with the item X = (*) "e" X. We will (again) find that S0 and S1 have such items. In the case of S0, we will (again) find the context "c", which we dutifully add to the table (this has no effect, since it is already present). In the case of S1, we will (again) wind up at the state (S1, X = "e" (*) X). Since we've already visited this state, we stop our search; it will not lead to new context. At this point, our table column for C1 is complete. We can repeat the process for C2, which plays out in an analogous way. Step 3b: Update the lookahead. Looking at the lane table we built, we can union the context sets in any particular column. We see that the context sets for each conflicting action are pairwise disjoint.
Therefore, we can simply update each reduce action in our state with those lookaheads in mind, and hence render it consistent:

S1 = X = (*) "e"
   | X = "e" (*) ["c"]   // lookahead from C1
   | X = (*) "e" X
   | X = "e" (*) X
   | Y = (*) "e"
   | Y = "e" (*) ["d"]   // lookahead from C2
   | Y = (*) "e" Y
   | Y = "e" (*) Y

This is of course also what the LALR(1) state would look like (though it would include context for the other items too; that doesn't play into the final machine execution). At this point we've covered enough to handle the grammar G0. Let's turn to a more complex grammar, grammar G1, and then we'll come back to cover the remaining steps. Second example: the grammar G1. G1 is a (typo corrected) version of the grammar from the paper. This grammar is not LALR(1), and hence it is more interesting, because it requires splitting states. Grammar G1:

G1 = "a" X "d"
   | "a" Y "c"
   | "b" X "c"
   | "b" Y "d"
X = "e" X
  | "e"
Y = "e" Y
  | "e"

The key point of this grammar is that when we see ... "e" "c" and we wish to know whether to reduce to X or Y, we don't have enough information. We need to know what is in the ..., because "a" "e" "c" means we reduce "e" to Y and "b" "e" "c" means we reduce to X. In terms of our state machine, this corresponds to splitting the states responsible for X and Y based on earlier context. Let's look at a subset of the LR(0) states for G1:

S0 = G1 = (*) "a" X "d"
   | G1 = (*) "a" Y "c"
   | G1 = (*) "b" X "c"
   | G1 = (*) "b" Y "d"

S1 = G1 = "a" (*) X "d"
   | G1 = "a" (*) Y "c"
   | X = (*) "e" X
   | X = (*) "e"
   | Y = (*) "e" Y
   | Y = (*) "e"

S2 = G1 = "b" (*) X "c"
   | G1 = "b" (*) Y "d"
   | X = (*) "e" X
   | X = (*) "e"
   | Y = (*) "e" Y
   | Y = (*) "e"

S3 = X = "e" (*) X
   | X = "e" (*)      // C1 -- can reduce
   | X = (*) "e"      // C0 -- can shift "e"
   | X = (*) "e" X    // C0 -- can shift "e"
   | Y = "e" (*) Y
   | Y = "e" (*)      // C2 -- can reduce
   | Y = (*) "e"      // C0 -- can shift "e"
   | Y = (*) "e" Y    // C0 -- can shift "e"

Here we can see the problem. The state S3 is inconsistent.
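The context dependence in G1 can be stated as a tiny decision table: given the first token of the input and the token that follows the run of "e"s, which nonterminal must the trailing "e" reduce to? A hypothetical sketch:

```rust
// Which nonterminal a trailing "e" must reduce to in G1, given the
// first token of the input and the token after the run of "e"s.
fn reduce_target(first: &str, after: &str) -> Option<&'static str> {
    match (first, after) {
        ("a", "d") | ("b", "c") => Some("X"), // G1 = "a" X "d" | "b" X "c"
        ("a", "c") | ("b", "d") => Some("Y"), // G1 = "a" Y "c" | "b" Y "d"
        _ => None,
    }
}

fn main() {
    // The same lookahead "c" demands different reductions depending on
    // the first token -- exactly why S3 must be split.
    assert_eq!(reduce_target("a", "c"), Some("Y"));
    assert_eq!(reduce_target("b", "c"), Some("X"));
}
```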
But it is reachable from both S1 and S2. If we come from S1, then we can have (e.g.) X "d", but if we come from S2, we expect X "c". Let's walk through our algorithm again. I'll start with step 3a. Step 3a: Build the lane table. The lane table for state S3 will look like this:

| State | C0    | C1    | C2    | Successors |
| S1    |       | ["d"] | ["c"] | {S3}       |
| S2    |       | ["c"] | ["d"] | {S3}       |
| S3    | ["e"] | []    | []    | {S3}       |

Now if we union each column, we see that both C1 and C2 wind up with lookahead {"c", "d"}. This is our problem: we have to isolate things better. Therefore, step 3b ("update lookahead") does not apply. Instead we attempt step 3c. Step 3c: Isolate lanes. This part of the algorithm is only loosely described in the paper, but I think it works as follows. We will employ a union-find data structure. With each set, we will record a "context set", which records for each conflict the set of lookahead tokens (e.g., {C1:{"d"}}). A context set tells us how to map the lookahead to an action; therefore, to be self-consistent, the lookaheads for each conflict must be mutually disjoint. In other words, {C1:{"d"}, C2:{"c"}} is valid, and says to do C1 if we see a "d" and C2 if we see a "c". But {C1:{"d"}, C2:{"d"}} is not, because the lookahead "d" would select two actions. Initially, each state in the lane table is mapped to itself, and the context set is derived from its row in the lane table:

S1 = {C1:d, C2:c}
S2 = {C1:c, C2:d}
S3 = {C0:e}

We designate "beachhead" states as those states in the table that are not reachable from another state in the table (i.e., using the successors). In this case, those are the states S1 and S2. We will be doing a DFS through the table, and we want to use those as the starting points. (Question: is there always at least one beachhead state? It seems like there must be.) So we begin by iterating over the beachhead states:

for beachhead in beachheads { ... }

When we visit a state X, we will examine each of its successors Y.
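A context set's self-consistency is just disjointness of the lookaheads, so the merge attempted during this DFS can be sketched as follows. (Here I invert the representation to map each lookahead token to its action, which makes the disjointness check direct; this encoding is my own, not the paper's.)

```rust
use std::collections::BTreeMap;

// A context set, encoded as lookahead token -> conflict action.
// {C1:{"d"}, C2:{"c"}} becomes {"d": "C1", "c": "C2"}.
type Context = BTreeMap<&'static str, &'static str>;

// Merge b into a copy of a; fail if any token would select two actions.
fn try_merge(a: &Context, b: &Context) -> Option<Context> {
    let mut merged = a.clone();
    for (&token, &action) in b {
        match merged.get(token) {
            Some(&existing) if existing != action => return None,
            _ => {
                merged.insert(token, action);
            }
        }
    }
    Some(merged)
}

fn main() {
    let s1: Context = [("d", "C1"), ("c", "C2")].into_iter().collect();
    let s2: Context = [("c", "C1"), ("d", "C2")].into_iter().collect();
    let s3: Context = [("e", "C0")].into_iter().collect();

    // S1 and S3 merge cleanly, as in the walkthrough...
    let s1s3 = try_merge(&s1, &s3).expect("S1+S3 should merge");
    // ...but S2 cannot join them: "c" would mean both C2 and C1.
    assert!(try_merge(&s1s3, &s2).is_none());
}
```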
We consider whether the context set for Y can be merged with the context set for X. So, in our case, X will be S1 to start and Y will be S3. In this case, the context sets can be merged, and hence we union S1 and S3 and wind up with the following union-find state:

S1,S3 = {C0:e, C1:d, C2:c}
S2    = {C1:c, C2:d}

(Note that this union is just for the purpose of tracking context; it doesn't imply that S1 and S3 are the 'same state' or anything like that.) Next we examine the edge S3 -> S3. Here the contexts are already merged and everything is happy, so we stop. (We already visited S3, after all.) This finishes our first beachhead, so we proceed to the next beachhead, S2, and examine the edge S2 -> S3. Here we find that we cannot union the contexts: it would produce an inconsistent state. So what we do is clone S3 to make a new state, S3', with the initial setup corresponding to the row for S3 from the lane table:

S1,S3 = {C0:e, C1:d, C2:c}
S2    = {C1:c, C2:d}
S3'   = {C0:e}

This also involves updating our LR(0-1) state set to have a new state S3'. All edges from S2 that led to S3 now lead to S3'; the outgoing edges from S3' remain unchanged. (At least to start.) Therefore, the edge S2 -> S3 is now S2 -> S3'. We can now merge the contexts:

S1,S3  = {C0:e, C1:d, C2:c}
S2,S3' = {C0:e, C1:c, C2:d}

Now we examine the outgoing edge S3' -> S3. We cannot merge these contexts, so we search (greedily, I guess) for a clone of S3 with which we can merge them. We find one in S3', and hence we redirect the S3 edge to S3' and we are done. (I think the actual search we want is to first look for a clone of S3 that is using literally the same context as us (i.e., the same root node), as in this case. If that is not found, then we search for one with a mergeable context. If that fails, we clone a new state.) The final state machine thus has two copies of S3, one for the path from S1 and one for the path from S2, which gives us enough context to proceed.
Conclusion. As I wrote, I've been experimenting with the Lane Table algorithm in LALRPOP and I now have a simple prototype that seems to work. It is not by any means exhaustively tested – in fact, I'd call it minimally tested – but hopefully I'll find some time to play around with it some more and put it through its paces. It at least handles the examples in the paper. The implementation is also inefficient in various ways. Some of them are minor – it clones more than it needs to, for example – and easily corrected. But I also suspect that one can do a lot more caching and sharing of results. Right now, for example, I construct the lane table for each inconsistent state completely from scratch, but perhaps there are ways to preserve and share results (it seems naively as if this should be possible). On the other hand, constructing the lane table can probably be made pretty fast: it doesn't have to traverse that much of the grammar. I'll have to try it on some bigger examples and see how it scales.
Posted 8 days ago by Air Mozilla
This is a weekly call with some of the Reps to discuss all matters about/affecting Reps and invite Reps to share their work with everyone.
Posted 8 days ago
Let’s define terms. Employable: suitable for paid work; able to be used. So being employable means that someone is useful and can fill a particular role, whether as an employee or freelancer, for paid work. Having written my doctoral thesis on digital literacy, and led Mozilla’s Web Literacy Map from inception to v1.5, one of my biggest frustrations has been how new literacies are developed. It’s all very well having a framework and a development methodology, but how on earth do you get to actually effect the change you want to see in the world? For the past few weeks, the germ of an idea has been growing in my mind. On the one hand there are digital literacy frameworks specifying what people should be able to know, do, and think. On the other there are various approaches to employability skills. The latter is a reaction to formal education institutions being required to track the destination of people who graduate (or leave) them. The third factor in play here is Open Badges. This is a metadata specification and ecosystem for verifiable digital credentials that can be used to provide evidence of anything. In addition, anybody can earn and issue them, the value coming in recognition of the issuing body, and/or whether the supplied evidence shows that the individual has done something worthwhile. The simplest way of representing these three areas of focus is using a Venn diagram. The reason it’s important that Open Badges are in there is that they provide evidence of digital literacies and employability skills. They provide a ‘route to market’ for the frameworks to actually make a difference in the world. I know there’s an appetite for this, as I present on a regular basis, and one of the slides I use is one from Sussex Downs College: people ask me to go back to the slide ‘so they can take a photo of it’.
Our co-op knows the team behind this project, as we helped with the kick-off of the project, and then ran a thinkathon for them as they looked to scale it. The Sussex Downs approach has a number of great elements to it: a three part badge system; competencies defined with the help of local businesses; a very visual design; and the homely metaphor of a ‘passport’. One thing that’s perhaps lacking is the unpacking of ‘Digital Literacy’ as more than a single element on the grid. To my mind, there’s a plethora of digital knowledge, skills, and behaviours that’s directly applicable to employability. In my experience, it’s important to come up with a framework for the work that you do. That’s why Sussex Downs’ Employability Passport is so popular. However, I think there’s also a need for people to be able to do the equivalent of pressing ‘view source’ on the framework to see why and how it was put together. What I’d like to do is to come up with a framework for digital employability that looks at the knowledge, skills, and understanding to thrive in the new digital economy. That will necessarily be a mix of old-school things like using Outlook, but also newer workflows such as GitHub. Once that’s defined, preferably with input from a diverse list of contributors and endorsement from various relevant organisations, it will be time to issue badges. Once the ‘rubber hits the road’ we’ll no doubt then need to update the framework. It’s an iterative process. Early days, but I wanted to put something out there. Get in touch if this sounds interesting! Comments? Questions? I’m @dajbelshaw on Twitter, or you can email me: hello@dynamicskillset.com [Less]
Posted 8 days ago by mihai.boldan
Hello Mozillians, This week, the Mozilla community from Tamil Nadu organized and held a Testday event in various campus clubs from their region. I just wanted to thank you all for taking part in this. With the community's help, Mozilla is improving every day. Several test cases were executed for the WebM Alpha, Reader Mode Displays Estimate Reading Time and Quantum – Compositor Process features. Many thanks to Prasanth P, Surentharan R A, Monesh, Subash, Rohit R, @varun1102, Akksaya, Roshini, Swathika, Suvetha Sri, Bhava, aiswarya.M, Aishvarya, Divya, Arpana, Nivetha, Vallikannu, Pavithra Roselin, Suryakala, prakathi, Bhargavi.G, Vignesh.R, Meganisha.B, Aishwarya.k, harshini.k, Rajesh, Krithika Sowbarnika, harini shilpa, Dhinesh kumar, KAVIPRIYA.S, HARITHA K SANKARI, Nagaraj V, abarna, Sankararaman, Harismitaa R K, Kavya, Monesh, Harini, Vignesh, Anushri, Vishnu Priya, Subash.M, Vinothini K, Pavithra R. Keep up the good work! Mihai Boldan, QA Community Mentor, Firefox for Desktop, Release QA Team
Posted 8 days ago
The Rust team is happy to announce the latest version of Rust, 1.16.0. Rust is a systems programming language focused on safety, speed, and concurrency. If you have a previous version of Rust installed, getting Rust 1.16 is as easy as:

$ rustup update stable

If you don’t have it already, you can get rustup from the appropriate page on our website, and check out the detailed release notes for 1.16.0 on GitHub. What’s in 1.16.0 stable. The largest addition to Rust 1.16 is cargo check. This new subcommand should speed up the development workflow in many cases. What does it do? Let’s take a step back and talk about how rustc compiles your code. Compilation has many “passes”, that is, there are many distinct steps that the compiler takes on the road from your source code to producing the final binary. You can see each of these steps (and how much time and memory they take) by passing -Z time-passes to a nightly compiler:

rustc .\hello.rs -Z time-passes
time: 0.003; rss: 16MB  parsing
time: 0.000; rss: 16MB  recursion limit
time: 0.000; rss: 16MB  crate injection
time: 0.000; rss: 16MB  plugin loading
time: 0.000; rss: 16MB  plugin registration
time: 0.049; rss: 34MB  expansion

There’s a lot of them. However, you can think of this process in two big steps: first, rustc does all of its safety checks, makes sure your syntax is correct, all that stuff. Second, once it’s satisfied that everything is in order, it produces the actual binary code that you end up executing. It turns out that that second step takes a lot of time. And most of the time, it’s not necessary. That is, when you’re working on some Rust code, many developers will get into a workflow like this:

1. Write some code.
2. Run cargo build to make sure it compiles.
3. Repeat 1-2 as needed.
4. Run cargo test to make sure your tests pass.
5. GOTO 1.

In step two, you never actually run your code. You’re looking for feedback from the compiler, not to actually run the binary.
cargo check supports exactly this use-case: it runs all of the compiler’s checks, but doesn’t produce the final binary. So how much speedup do you actually get? Like most performance related questions, the answer is “it depends.” Here are some very un-scientific benchmarks:

|                 | thanks  | cargo   | diesel |
| initial build   | 134.75s | 236.78s | 15.27s |
| initial check   | 50.88s  | 148.52s | 12.81s |
| speedup         | 2.648   | 1.594   | 1.192  |
| secondary build | 15.97s  | 64.34s  | 13.54s |
| secondary check | 2.9s    | 9.29s   | 12.3s  |
| speedup         | 5.506   | 6.925   | 1.100  |

The ‘initial’ categories are the first build after cloning down a project. The ‘secondary’ categories involved adding one blank line to the top of src\lib.rs and running the command again. That’s why the initial ones are more dramatic; they involve also doing this for all dependencies, as well as the crate itself. As you can see, larger projects with many dependencies see a big improvement, but smaller ones see much more modest gains. We are still working on improving compile-times generally as well, though we don’t have anything in particular to highlight at this time. Other improvements. To support cargo check, rustc has learned to emit a new kind of file: .rmeta. This file will contain only the metadata about a particular crate. cargo check needs this for your dependencies, to let the compiler check types and such from them. It’s also useful for the Rust Language Server, and possibly more tools in the future. Another large change is the removal of a long-standing diagnostic: consider using an explicit lifetime parameter. This diagnostic would kick in whenever you had an incorrect lifetime annotation, and the compiler thought that you might have meant something else.
Consider this code:

use std::str::FromStr;

pub struct Name<'a> {
    name: &'a str,
}

impl<'a> FromStr for Name<'a> {
    type Err = ();

    fn from_str(s: &str) -> Result<Name, ()> {
        Ok(Name { name: s })
    }
}

Here, Rust isn’t sure what to do with the lifetimes; as written, the code doesn’t guarantee that s will live as long as Name, which is required for Name to be valid. Let’s try to compile this code with Rust 1.15.1:

> rustc +1.15.1 foo.rs --crate-type=lib
error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in generic type due to conflicting requirements
  --> .\foo.rs:10:5
   |
10 |   fn from_str(s: &str) -> Result<Name, ()> {
   |  _____^ starting here...
11 | |     Ok(Name { name: s })
12 | |   }
   | |_____^ ...ending here
   |
help: consider using an explicit lifetime parameter as shown: fn from_str(s: &'a str) -> Result<Name, ()>
  --> .\foo.rs:10:5

The compiler explains the issue, and gives a helpful suggestion. So let’s try it: modify the code to add in the 'a, and compile again:

> rustc +1.15.1 .\foo.rs --crate-type=lib
error[E0308]: method not compatible with trait
  --> .\foo.rs:10:5
   |
10 |   fn from_str(s: &'a str) -> Result<Name, ()> {
   |  _____^ starting here...
11 | |     Ok(Name { name: s })
12 | |   }
   | |_____^ ...ending here: lifetime mismatch
   |
help: consider using an explicit lifetime parameter as shown: fn from_str(s: &'a str) -> Result<Name<'a>, ()>
  --> .\foo.rs:10:5

It still doesn’t work. That help message was not actually helpful. It does suggest adding another lifetime, this time on Name. If we do that…

> rustc +1.15.1 .\foo.rs --crate-type=lib
help: consider using an explicit lifetime parameter as shown: fn from_str(s: &'a str) -> Result<Name<'a>, ()>
  --> .\foo.rs:10:5

… that’s what we already have, compiler!
This diagnostic was well-intentioned, but when it was wrong, it was very wrong, as you can see here. Sometimes it wouldn’t even suggest valid Rust syntax! Furthermore, more advanced Rust users didn’t really need the suggestion, but new Rustaceans would take it to heart, and then be led down this bad path. As such, we decided that for now, we should remove the help message entirely. We may bring it back in the future, but only if we can limit false positives. An aside: the above implementation is not possible; Name would need to use String, not &str. In other diagnostic changes, previous versions of Rust would helpfully attempt to suggest fixes for typos:

let foo = 5;
println!("{}", ffo);

Would give this error:

error[E0425]: cannot find value `ffo` in this scope
 --> foo.rs:4:20
  |
4 |     println!("{}", ffo);
  |                    ^^^ did you mean `foo`?

However, this would only happen in certain circumstances: sometimes in local variables, and for fields in structs. This now happens nearly everywhere. When combined with some other related improvements, this results in a significant improvement in these sorts of diagnostics. See the detailed release notes for more. Library stabilizations. 21 new bits of API were stabilized this release:

VecDeque::truncate
VecDeque::resize
String::insert_str
Duration::checked_add
Duration::checked_sub
Duration::checked_div
Duration::checked_mul
str::replacen
str::repeat
SocketAddr::is_ipv4
SocketAddr::is_ipv6
IpAddr::is_ipv4
IpAddr::is_ipv6
Vec::dedup_by
Vec::dedup_by_key
Result::unwrap_or_default
<*const T>::wrapping_offset
<*mut T>::wrapping_offset
CommandExt::creation_flags
File::set_permissions
String::split_off

In addition, a number of small improvements to existing functions landed. For example, writeln! now has a single-argument form, just like println! has. This ends up writing only a newline, but is a nice bit of symmetry. All structs in the standard library now implement Debug. When slicing a &str, you’ll see better errors.
For example, this code:

&"abcαβγ"[..4]

Is incorrect. It generates this error:

thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`'

The part after the ; is new. See the detailed release notes for more. Cargo features. In addition to cargo check, Cargo and crates.io have some new polish added. For example, cargo build and cargo doc now take a --all flag for building and documenting every crate in your workspace with one command. Cargo now has a --version --verbose flag, mirroring rustc. Crates.io can now show off your TravisCI or AppVeyor badges on your crate’s page. In addition, both Cargo and crates.io understand categories. Unlike keywords, which are free-form, categories are curated. In addition, keywords are used for searching, but categories are not. In other words, categories are intended to assist browsing, and keywords are intended to assist searching. You can browse crates by category here. See the detailed release notes for more. Contributors to 1.16.0. Last release, we introduced thanks.rust-lang.org. We have been doing some behind-the-scenes refactoring work to allow for more projects than only Rust itself; we’re hoping to introduce that in the next release. We had 135 individuals contribute to Rust 1.16. Thanks!
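Circling back to the library stabilizations: a few of them can be exercised directly (assuming a Rust 1.16 or later toolchain), and the char-boundary rule behind the improved slice error can be checked up front with the long-stable is_char_boundary:

```rust
fn main() {
    // String::insert_str, str::repeat, str::replacen (new in 1.16).
    let mut s = String::from("Hello!");
    s.insert_str(5, ", world");
    assert_eq!(s, "Hello, world!");
    assert_eq!("ab".repeat(3), "ababab");
    assert_eq!("aaa".replacen("a", "b", 2), "bba");

    // Result::unwrap_or_default (new in 1.16): a failed parse yields 0.
    let n: i32 = "oops".parse().unwrap_or_default();
    assert_eq!(n, 0);

    // Byte 4 of "abcαβγ" falls inside 'α' (bytes 3..5), which is why
    // slicing ..4 panics with the boundary message quoted above.
    let t = "abcαβγ";
    assert!(t.is_char_boundary(3));
    assert!(!t.is_char_boundary(4));
}
```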
Posted 8 days ago
The Rust team is happy to announce the latest version of Rust, 1.16.0. Rust is a systems programming language focused on safety, speed, and concurrency. If you have a previous version of Rust installed, getting Rust 1.16 is as easy as: $ rustup ... [More] update stable If you don’t have it already, you can get rustup from the appropriate page on our website, and check out the detailed release notes for 1.16.0 on GitHub. What’s in 1.16.0 stable The largest addition to Rust 1.16 is cargo check. This new subcommand should speed up the development workflow in many cases. What does it do? Let’s take a step back and talk about how rustc compiles your code. Compilation has many “passes”, that is, there are many distinct steps that the compiler takes on the road from your source code to producing the final binary. You can see each of these steps (and how much time and memory they take) by passing -Z time-passes to a nightly compiler: rustc .\hello.rs -Z time-passes time: 0.003; rss: 16MB parsing time: 0.000; rss: 16MB recursion limit time: 0.000; rss: 16MB crate injection time: 0.000; rss: 16MB plugin loading time: 0.000; rss: 16MB plugin registration time: 0.049; rss: 34MB expansion There’s a lot of them. However, you can think of this process in two big steps: first, rustc does all of its safety checks, makes sure your syntax is correct, all that stuff. Second, once it’s satisfied that everything is in order, it produces the actual binary code that you end up executing. It turns out that that second step takes a lot of time. And most of the time, it’s not neccesary. That is, when you’re working on some Rust code, many developers will get into a workflow like this: Write some code. Run cargo build to make sure it compiles. Repeat 1-2 as needed. Run cargo test to make sure your tests pass. GOTO 1. In step two, you never actually run your code. You’re looking for feedback from the compiler, not to actually run the binary. 
cargo check supports exactly this use-case: it runs all of the compiler’s checks, but doesn’t produce the final binary. So how much speedup do you actually get? Like most performance related questions, the answer is “it depends.” Here are some very un-scientific benchmarks:   thanks cargo diesel initial build 134.75s 236.78s 15.27s initial check 50.88s 148.52s 12.81s speedup 2.648 1.594 1.192 secondary build 15.97s 64.34s 13.54s secondary check 2.9s 9.29s 12.3s speedup 5.506 6.925 1.100 The ‘initial’ categories are the first build after cloning down a project. The ‘secondary’ categories involved adding one blank line to the top of src\lib.rs and running the command again. That’s why the initial ones are more dramatic; they involve also doing this for all dependencies, as well as the crate itself. As you can see, larger projects with many dependencies see a big improvement, but smaller ones see much more modest gains. We are still working on improving compile-times generally as well, though we don’t have anything in particular to highlight at this time. Other improvements To support cargo check, rustc has learned to emit a new kind of file: .rmeta. This file will contain only the metadata about a particular crate. cargo check needs this for your dependencies, to let the compiler check types and such from them. It’s also useful for the Rust Language Server, and possibly more tools in the future. Another large change is the removal of a long-standing diagnostic: consider using an explicit lifetime parameter. This diagnostic would kick in whenever you had an incorrect lifetime annotation, and the compiler thought that you might have meant something else. 
Consider this code: use std::str::FromStr; pub struct Name<'a> { name: &'a str, } impl<'a> FromStr for Name<'a> { type Err = (); fn from_str(s: &str) -> Result<Name, ()> { Ok(Name { name: s }) } } Here, Rust isn’t sure what to do with the lifetimes; as written, the code doesn’t guarantee that s will live as long as Name, which is required for Name to be valid. Let’s try to compile this code with Rust 1.15.1: > rustc +1.15.1 foo.rs --crate-type=lib error[E0495]: cannot infer an appropriate lifetime for lifetime parameter in generic type due to conflicting requirements --> .\foo.rs:10:5 | 10 | fn from_str(s: &str) -> Result()> { | _____^ starting here... 11 | | Ok(Name { name: s }) 12 | | } | |_____^ ...ending here | help: consider using an explicit lifetime parameter as shown: fn from_str(s: &'a str) -> Result --> .\foo.rs:10:5 | 10 | fn from_str(s: &str) -> Result { | _____^ starting here... 11 | | Ok(Name { name: s }) 12 | | } | |_____^ ...ending here The compiler explains the issue, and gives a helpful suggestion. So let’s try it: modify the code to add in the 'a, and compile again: > rustc +1.15.1 .\foo.rs --crate-type=lib error[E0308]: method not compatible with trait --> .\foo.rs:10:5 | 10 | fn from_str(s: &'a str) -> Result { | _____^ starting here... 11 | | Ok(Name { name: s }) 12 | | } | |_____^ ...ending here: lifetime mismatch | help: consider using an explicit lifetime parameter as shown: fn from_str(s: &'a str) -> Result'a>, ()> --> .\foo.rs:10:5 | 10 | fn from_str(s: &'a str) -> Result()> { | _____^ starting here... 11 | | Ok(Name { name: s }) 12 | | } | |_____^ ...ending here It still doesn’t work. That help message was not actually helpful. It does suggest adding another lifetime, this time on Name. If we do that… > rustc +1.15.1 .\foo.rs --crate-type=lib help: consider using an explicit lifetime parameter as shown: fn from_str(s: &'a str) -> Resulta>, ()> --> .\foo.rs:10:5 … that’s what we already have, compiler! 
This diagnostic was well-intentioned, but when it’s wrong, it was very wrong, as you can see here. Sometimes it wouldn’t even suggest valid Rust syntax! Furthermore, more advanced Rust users didn’t really need the suggestion, but new Rustaceans would take them to heart, and then be led down this bad path. As such, we decided that for now, we should remove the help message entirely. We may bring it back in the future, but only if we can limit false positives. An aside: the above implementation is not possible; Name would need to use String, not &str. In other diagnostic changes, previous versions of Rust would helpfully attempt to suggest fixes for typos: let foo = 5; println!("{}", ffo); Would give this error: error[E0425]: cannot find value `ffo` in this scope --> foo.rs:4:20 | 4 | println!("{}", ffo); | ^^^ did you mean `foo`? However, this would only happen in certain circumstances: sometimes in local variables, and for fields in structs. This now happens nearly everywhere. When combined with some other related improvements, this results in a significant improvement in these sorts of diagnostics. See the detailed release notes for more. Library stabilizations 21 new bits of API were stabilized this release: VecDeque::truncate VecDeque::resize String::insert_str Duration::checked_add Duration::checked_sub Duration::checked_div Duration::checked_mul str::replacen str::repeat SocketAddr::is_ipv4 SocketAddr::is_ipv6 IpAddr::is_ipv4 IpAddr::is_ipv6 Vec::dedup_by Vec::dedup_by_key Result::unwrap_or_default ::wrapping_offset ::wrapping_offset CommandExt::creation_flags File::set_permissions String::split_off In addition, a number of small improvements to existing functions landed. For example writeln! now has a single-argument form, just like println! has. This ends up writing only a newline, but is a nice bit of symmetry. All structs in the standard library now implement Debug. When slicing a &str, you’ll see better errors. 
For example, this code:

```rust
&"abcαβγ"[..4]
```

is incorrect, and generates this error:

```
thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`'
```

The part after the `;` is new.

See the detailed release notes for more.

Cargo features

In addition to `cargo check`, Cargo and crates.io have some new polish added. For example, `cargo build` and `cargo doc` now take a `--all` flag for building and documenting every crate in your workspace with one command. Cargo now has a `--version --verbose` flag, mirroring `rustc`. Crates.io can now show off your TravisCI or AppVeyor badges on your crate's page.

In addition, both Cargo and crates.io understand categories. Unlike keywords, which are free-form, categories are curated. Furthermore, keywords are used for searching, but categories are not. In other words, categories are intended to assist browsing, and keywords are intended to assist searching. You can browse crates by category here.

See the detailed release notes for more.

Contributors to 1.16.0

Last release, we introduced thanks.rust-lang.org. We have been doing some behind-the-scenes refactoring work to allow for more projects than only Rust itself; we're hoping to introduce that in the next release. We had 137 individuals contribute to Rust 1.16. Thanks!
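A few of the stabilizations listed above can be seen in a quick sketch (the specific values here are illustrative; `str::get`, shown last, is a separate API that returns `None` instead of panicking on a bad char boundary):

```rust
fn main() {
    // String::insert_str: splice a &str into a String at a byte index.
    let mut s = String::from("Hello!");
    s.insert_str(5, ", world");
    assert_eq!(s, "Hello, world!");

    // str::repeat and str::replacen.
    assert_eq!("ab".repeat(3), "ababab");
    assert_eq!("aaa".replacen('a', "b", 2), "bba");

    // Vec::dedup_by_key: remove consecutive elements with equal keys.
    let mut v = vec![1, 1, 2, 3, 3, 3];
    v.dedup_by_key(|x| *x);
    assert_eq!(v, [1, 2, 3]);

    // Slicing "abcαβγ" at byte 4 lands inside the 2-byte 'α' (bytes 3..5),
    // which is exactly the case the improved panic message describes.
    let s = "abcαβγ";
    assert!(s.get(..4).is_none());
    assert_eq!(s.get(..3), Some("abc"));
}
```

Checking boundaries with `get` is a useful pattern when the slice index comes from untrusted input, since it avoids the panic entirely.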
Posted 9 days ago by Myk Melez
I recently blogged about discontinuing Positron. I'm trying a different tack with a new experiment, codenamed qbrt, that reuses an existing Gecko runtime (and its existing APIs) while simplifying the process of developing and packaging a desktop app using web technologies.

qbrt is a command-line interface written in Node.js and available via NPM:

```
npm install -g qbrt
```

Installing it also installs a Gecko runtime (currently a nightly build of Firefox, but in the future it could be a stable build of Firefox or a custom Gecko runtime). Its simplest use is then to invoke the 'run' command with a URL:

```
qbrt run https://eggtimer.org/
```

which will start a process and load the URL into a native window.

URLs loaded in this way don't have privileged access to the system. They're treated as web content, not application chrome. To load a desktop application with system privileges, point qbrt at a local directory containing a package.json file and main entry script:

```
qbrt run path/to/my/app/
```

For example, clone qbrt's repo and try its example/ app:

```
git clone https://github.com/mozilla/qbrt.git
qbrt run qbrt/example/
```

This will start a process and load the app into a privileged context, giving it access to Gecko's APIs for opening windows and loading web content, along with system integration APIs for file manipulation, networking, process management, etc. (Another good example is the "shell" app that qbrt uses to load URLs.)

To package an app for distribution, invoke the 'package' command, which creates a platform-specific package containing both the app's resources and the Gecko runtime:

```
qbrt package path/to/my/app/
```

Note that while qbrt is written in Node.js, it doesn't provide Node.js APIs to apps. It might be useful to do so, using SpiderNode, as we did with Positron, although Gecko's existing APIs expose equivalent functionality. Also, qbrt doesn't yet support runtime version management (i.e. being able to specify which version of Gecko to use, and to switch between them).
At the time you install it, it downloads the latest nightly build of Firefox. (You can update that nightly build by reinstalling qbrt.)

And the packaging support is primitive. qbrt creates a shell script (batch script on Windows) to launch your app, and it packages your app using a platform-specific format (ZIP on Windows, DMG on Mac, and tar/gzip on Linux). But it doesn't set icons or other package metadata, nor does it create auto-installers or support signing the package.

In general, qbrt is immature and unstable! It's appropriate for testing, but it isn't yet ready for you to ship apps with it. Nevertheless, I'm keen to hear how it works for you, and whether it supports your use cases. What would you want to do with it, and what additional features would you need from it?