
News

Posted 8 days ago
Check it out: the videos from the Mozilla room at FOSDEM are up, and here's me, talking about bug futures. All FOSDEM videos. And, yes, the video link Just Works. Bonus link to some background on that: The Fight For Patent-Unencumbered Media Codecs Is Nearly Won by Robert O'Callahan. Another bonus link: the FOSDEM video project, including what those custom boxes do.
Posted 8 days ago
This is a follow-up to the Defining the Web platform post from a year ago. It doesn't try to answer questions; instead, it attempts to figure out what the right questions to ask are.

Trade-offs

As the talks within the WebGPU community group progress, it becomes apparent that the disagreements lie in more domains than the purely technical. It's about what the Web is today, and what we want it to become tomorrow. The further we go down the rabbit hole of low-level features, the more we have to stretch the very definition of the Web:

- harder to guarantee security: if the descriptors can be purely GPU-driven (like Metal argument buffers), the browser can't guarantee that they point to owned and initialized GPU memory; and if there are many exposed flags for device capabilities, it's easy for a script to fingerprint users based on that profile.
- harder to achieve portability: any use of incoherent memory or of a rich spectrum of memory types (like Vulkan memory types/heaps) means that different platforms would have to take different code paths if implemented correctly, and that's difficult to test.
- makes the API increasingly more complex to use, which Web developers may not welcome: declaring a Vulkan-like render pass with multiple passes is a challenge.
- finally, all this goes into performance characteristics: potentially lower CPU cost, given the bulk of validation work moving from the driver onto the browser, and a more consistent framerate, given better control of scheduled GPU work.

Consistency

It is clear that the native APIs have reached decent consistency in their respective programming models. Both Metal and Vulkan appear to be positioned at their local maxima, while I can't confidently state the same about NXT, whose position is unclear:

[ASCII sketch omitted: Metal and Vulkan are each drawn as sharp peaks (local maxima); NXT's position is drawn as question marks.]

One can't just take a small step away from an existing API and remain consistent. Finding a whole new local maximum is hard, because it means that the previous API designers (Khronos, Microsoft, Apple) either haven't discovered it or simply discarded it for being inferior. And while we have Microsoft and Apple representatives in the group, we lack the opinion of Vulkan designers from Khronos.

Origins

My first prototype was based on Metal, after which I had a chance to talk a lot about the future of Web GPU access with fellow Mozillians. The feedback I got was very consistent:

- use the existing Vulkan API, making the necessary adjustments to provide security and portability;
- provide the maximum possible performance by building a platform, and let higher-level libraries build on it and expose simpler APIs to users.

A similar position was expressed by some of the Vulkan Portability members. And this is where we've been going so far with the development of the portability layer and experiments with a Vulkan-based WebIDL implementation in Servo. Check out the recent talk at FOSDEM 2018 as well as our 2017 report for more details. Unfortunately, this direction faces strong opposition from the rest of the W3C group.

Portability

Interestingly, there is an existing precedent of providing a less portable Web platform API (quoting an article by Lin Clark):

    SharedArrayBuffers could result in race conditions. This makes working with SharedArrayBuffers hard. We don't expect application developers to use SharedArrayBuffers directly. But library developers who have experience with multithreaded programming in other languages can use these new low-level APIs to create higher-level tools. Then application developers can use these tools without touching SharedArrayBuffers or Atomics directly.
So if the question is "Does a Web API have to be portable?", the answer is a definite "no". It's a nice property, but not a requirement, and it can be justified by other factors, such as performance. Speaking of which… the case for performance gains from SharedArrayBuffers appears to be strong, which convinced the browser vendors to agree on exposing this low-level API. Now, can we apply the same reasoning to bring Vulkan's level of explicitness and portability to the Web? Can we rely on user libraries to make it more accessible and portable? Having some performance metrics would be great, but obtaining them appears to be extremely difficult.

Note: SharedArrayBuffers are currently disabled in all browsers because the high-resolution timers they make possible were discovered to be exploitable. One could argue that this is related to them being low-level, and thus that we shouldn't take them as a good example for a Web API. This is countered by the fact that the disabling is temporary, and the usefulness of SABs is completely clear (see the ECMA security issue, and also note this comment).

Performance

What would be a good benchmark that shows the cost of memory barriers and memory types being implicit in the API?

- We need to have the same GPU-intensive workloads running on Vulkan and Metal, preferably utilizing multiple command queues and advanced features like multiple passes or secondary command buffers… on the same hardware, which means Windows installed on a MacBook. Even then, we may see differences in how the OS schedules hardware access, how the drivers are implemented, how application settings are configured on different OSes, etc.
- The split between the Vulkan and Metal code paths should happen at a fairly high level. If it's done via an API abstraction layer like Vulkan Portability, then the results are obviously going to be skewed towards this API. Splitting at a high level means taking the most advantage of the native API features, but it also means more work for the developer.

Until that benchmarking is done, we can't reasonably argue in favor of performance when there are concrete sacrifices in portability, API complexity (and security, to an extent) that come with it…

Defending the Vulkan-like approach goes something like this:

- A: pipeline barriers are confusing, error-prone, and not evidently fast; let's make them implicit. Look, our native API does that, and it's successful.
- M: but… tracking the actual layout and access flags on our side is hard (for the backends that require them), and it gets really difficult when multiple queues that share resources are involved.
- G: let's only share resources with immutable access flags then, and transition ownership explicitly between queues otherwise.
- A: manually synchronizing submissions between queues is hard to get right anyway, prone to portability issues, etc. Let's only have a single queue!

And so it goes… one aspect leading to another, proving that the existing APIs are consistent local maxima, and that Metal is technically the easier fit for the Web.

Result

Ok, this post is turning into a rant, rather inevitably… Sorry! The unfortunate part of the story is that the group will not agree on an existing API, no matter what it is, because it would be biased towards a specific platform. And while Mozilla and Apple are at least conceptually aligned to existing API concepts, Google has been actively trying to come up with a new one in NXT.
As a result, not only would multi-platform applications have to add support for yet another API when ported to the Web, but that API is also going to be targeted from native code via Emscripten and/or WebAssembly, and Google is keen on providing a standalone (no-browser) way of using it, effectively adding it to the list of native APIs… without any IHV backing, and promoted by… browser vendors? That is the path the group appears to be drifting down, instead of building on top of existing knowledge and research.

The current struggle of developing the Web GPU API comes down to many factors. One of the most important ones, if not the most important, is that the parties have different end results envisioned. Some would like the JavaScript use to be nice, idiomatic, and error-resistant. Some mostly care about WebAssembly being fast. Some can't afford a badly written application looking different on different mobile phones. Some can't expect developers to be smart enough to use a complicated API. Some leave WebGL as a backup choice for a simple and accessible but slower API. And the fact that we are discussing this within the W3C doesn't help either: we don't have immediate access to Khronos IHV experts and ISV advisory panels. We desperately need feedback from actual target users on what they want. Who knows, maybe the real answer is that we are better off without yet another API at all?
Posted 8 days ago by Mitchell Baker
Note: This open letter, penned by Mozilla executive chairwoman Mitchell Baker, appears as a full-page advertisement in the February 9 edition of The Hindustan Times. It is co-signed by 1,447 Mozilla India community members. To learn more about Mozilla's work regarding India's data protection law and Aadhaar, visit https://foundation.mozilla.org/campaigns/aadhaar/.

Dear Justice Srikrishna and the Honourable Members of the Ministry of Electronics and Information Technology Committee of Experts,

With the support of and in solidarity with members of Mozilla's community in India, I write today to urge you to stand up for the privacy and security of all Indians. Your recent consultation on the form of India's first comprehensive data protection law comes at an auspicious time. The Supreme Court of India has ruled unequivocally that privacy is a fundamental right guaranteed to all Indians by the Indian Constitution. We ask that you take that decision and ensure that right is made a reality in law.

Mozilla's work on upholding privacy is guided by the Mozilla Manifesto, which states: "Individual security and privacy is fundamental and must not be treated as optional online" (Principle 4). Our commitment to the principle can be seen both in the open source code of our products as well as in our policies such as Mozilla's Data Privacy Principles. The Mozilla India Community has run numerous campaigns to educate Indians on how to protect themselves online.

Data protection is a critical tool for guaranteeing the fundamental right of privacy. It is particularly important today as Aadhaar is being driven deeper into all aspects of life. Digital identity can bring many benefits, but it can also become a surveillance and privacy disaster. A strong data protection law is key to avoiding that disaster.

In the digital age, and especially in regard to Aadhaar, individual security and privacy are increasingly being put at risk. Recently, a private citizen was able to buy access to all of the demographic data in the Aadhaar database for just 500 rupees. There have been countless leaks, security incidents, and instances where private Aadhaar data has been published online. Private companies are increasingly requiring Aadhaar in order to use their services. In the vacuum created by India's lack of a comprehensive data protection law, the Government of India continues its relentless push to make Aadhaar mandatory for ever more government programs and private sector services, in contravention of the directives of the Supreme Court.

We commend you for the strong recommendations and overall framework proposed in your report. While this represents important progress in developing a strong data protection framework, we remain concerned about several missing protections:

The current proposal exempts biometric info from the definition of sensitive personal information that must be especially protected. This is backwards: biometric info is some of the most personal info, and can't be "reset" like a password.

The design of Aadhaar fails to provide meaningful consent to users. This is seen, for example, in the ever increasing number of public and private services that are linked to Aadhaar without users being given a meaningful choice in the matter. This can and should be remedied by stronger consent, data minimization, collection limitation, and purpose limitation obligations.
Instead of crafting narrow exemptions for the legitimate needs of law enforcement, you propose to exempt entire agencies from accountability and legal restrictions on how user data may be accessed and processed.

Your report also casts doubt on whether individuals should be allowed a right to object over how their data is processed. This is a core pillar of data protection; without a right to object, consent is not meaningful and individual liberty is curtailed.

There is resounding support for privacy in India, and the Supreme Court has made clear that the protection of individual privacy and security is an imperative for the Government of India. We hope you and your colleagues in the Government of India will take this opportunity to develop a data protection law that strongly protects the rights of users and makes India's framework a model for the world.

Sincerely,
Mitchell Baker, Executive Chairwoman, Mozilla
Posted 9 days ago
Recently, we introduced WebAssembly (compiled from Rust) into the source-map JavaScript library. We saw parsing and querying source maps get a whole lot faster. But keeping the compiled .wasm code size small was a challenge. There are a variety of tools available for removing dead code from WebAssembly and compressing .wasm sizes:

- wasm-gc constructs the call graph of all the functions in a .wasm file, determines which functions are never transitively called by an exported function, and removes them. [0]
- wasm-snip replaces a WebAssembly function's body with a single unreachable instruction. This is useful for manually removing functions that will never be called at runtime, but which the compiler and wasm-gc couldn't statically prove were dead. After snipping a function, other functions that were only called by the snipped function become unreachable, so running wasm-gc again after wasm-snip often yields further code size reductions.
- wasm-opt runs binaryen's sophisticated optimization passes over a .wasm file, both shrinking its size and improving its runtime performance.

After using these tools, we looked at what was left in our .wasm file, and broke the results down by crate: half of our code size was coming from dlmalloc, the allocator that Rust uses by default for the wasm32-unknown-unknown target. Source map parsing requires just a couple of large, long-lived allocations, and then does its heavy lifting without allocating any further. Querying a parsed source map doesn't require any allocation. So we are paying a lot, in code size, for an allocator that we aren't really using that much. Allocator implementations have trade-offs, but Rust's default is the wrong choice for the source map parsing and querying scenario.

Introducing wee_alloc

wee_alloc is a work-in-progress memory allocator designed for WebAssembly. It has a tiny code size footprint, compiling down to only a kilobyte of .wasm code. wee_alloc is designed for scenarios like those described above: where there are a handful of initial, long-lived allocations, after which the code does its heavy lifting without any further allocations. This scenario requires some allocator to exist, but we are more than happy to trade performance for small code size. In contrast, wee_alloc is not designed for, and would be a poor choice in, scenarios where allocation is a performance bottleneck. Although WebAssembly is the primary target, wee_alloc also has an mmap-based implementation for unix systems. This enables testing wee_alloc, and using wee_alloc in your own code, without a browser or WebAssembly engine.
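To make the intended usage concrete, here is a minimal sketch of dropping wee_alloc into a crate as its global allocator. The feature gate and the `WeeAlloc::INIT` initializer shown here are assumptions based on the nightly allocator machinery of the time, not quotes from the crate's documentation:

```rust
// Sketch only: opting a wasm32-unknown-unknown crate into wee_alloc.
// The exact feature gate and the `INIT` constant are assumptions.
#![feature(global_allocator)]

extern crate wee_alloc;

// Every heap allocation in the crate (Box, Vec, String, ...) now goes through
// wee_alloc instead of the default dlmalloc port.
#[global_allocator]
static ALLOC: wee_alloc::WeeAlloc = wee_alloc::WeeAlloc::INIT;

fn main() {
    // A couple of large, long-lived allocations up front: the workload
    // wee_alloc is designed around.
    let parsed: Vec<u32> = (0..1024).collect();
    assert_eq!(parsed.len(), 1024);
}
```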
How wee_alloc Works

Allocating WebAssembly Pages

WebAssembly module instances have a linear memory space [1], and use store and load instructions to access values within it via an index. If an instruction attempts to access the value at an index that is beyond the memory's bounds, a trap is raised.

There are two instructions for manipulating the linear memory space itself, rather than its contents: current_memory and grow_memory [2]. The current_memory instruction gives the current size of memory, in units of pages. The grow_memory instruction takes an operand n, grows the memory space by n pages, and gives back the old size of memory, in units of pages. Alternatively, if growing memory fails, -1 is returned. WebAssembly does not have any facilities for shrinking memory, at least right now.

To implement allocating n pages of memory in Rust, we need to use LLVM's intrinsics. Right now, the intrinsic for grow_memory doesn't expose the return value, but this should be fixed once Rust updates its LLVM. Therefore, our page allocation routine must use current_memory before grow_memory, when it should just use the result of grow_memory instead. It also means we can't check for failure yet.

extern "C" {
    #[link_name = "llvm.wasm.current.memory.i32"]
    fn current_memory() -> usize;
    #[link_name = "llvm.wasm.grow.memory.i32"]
    fn grow_memory(pages: usize);
}

unsafe fn get_base_pointer() -> *mut u8 {
    (current_memory() * PAGE_SIZE.0) as *mut u8
}

unsafe fn alloc_pages(n: Pages) -> *mut u8 {
    let ptr = get_base_pointer();
    grow_memory(n.0);
    ptr
}

Careful readers will have noticed the Pages type in alloc_pages's type signature. wee_alloc uses newtypes for units of bytes, words, and pages. Each of these is a thin wrapper around a usize with relevant operator overloads and inter-conversions. This has been very helpful in catching bugs at compile time, like attempts to offset a pointer by two words rather than two bytes, and compiles away to nothing in the emitted .wasm.
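As an illustration of the units trick (not wee_alloc's actual definitions), a pair of such newtypes might look like the sketch below; the names mirror the post, but the specific conversions shown are assumptions:

```rust
// Hypothetical sketch of unit newtypes like wee_alloc's Bytes/Words: thin
// usize wrappers, so mixing up "two words" and "two bytes" is a type error.
use std::ops::Add;

const WORD_SIZE: usize = std::mem::size_of::<usize>();

#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Bytes(usize);

#[derive(Copy, Clone, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Words(usize);

impl Add for Bytes {
    type Output = Bytes;
    fn add(self, rhs: Bytes) -> Bytes {
        Bytes(self.0 + rhs.0)
    }
}

impl From<Words> for Bytes {
    // Converting words to bytes multiplies by the machine word size.
    fn from(w: Words) -> Bytes {
        Bytes(w.0 * WORD_SIZE)
    }
}

fn main() {
    let offset = Bytes(8) + Bytes::from(Words(2));
    assert_eq!(offset, Bytes(8 + 2 * WORD_SIZE));
    // let bad = Bytes(8) + Words(2); // does not compile: mismatched units
}
```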
Free Lists

But we don't satisfy individual allocation requests by directly allocating pages. First, the WebAssembly page size is 64KiB, which is much larger than most allocations. Second, because there is no way to return unused pages to the WebAssembly engine, it would be incredibly wasteful if we didn't reuse pages. Instead, we maintain a free list of blocks of memory we've already allocated from the WebAssembly engine.

Free lists have low complexity and are easy to implement. These properties also lend themselves to a small implementation. The basic idea is to maintain an intrusive linked list of memory blocks that are available. Allocation removes a block from the free list. If the block is larger than the requested allocation size, we can split it in two. Or, if there is no suitable block available in the free list, we can fall back to alloc_pages to get a fresh block of memory. Deallocation reinserts the given block back into the free list, so that it can be reused for future allocations. Because a block is only in the free list if it is not allocated and is therefore unused, we can use a word from the data itself to store the free list links, so long as we ensure that the data is always at least a word in size.

Here is a diagram of what the free list looks like in memory, showing a free block, followed by an allocated block, followed by another free block:

[Diagram omitted: a free block, an allocated block, and another free block laid out contiguously; each free block's first data word stores the free-list link, which skips over the allocated block in between.]

Even after choosing to use free lists, we have more design choices to make. How should we choose which block to use from the free list? The first that can satisfy this allocation, aka first fit? The block that is closest to the requested allocation size, in an effort to cut down on fragmentation (more on this in a minute), aka best fit? Pick up the search where we left off last time, aka next fit? Regardless which of first fit, best fit, and next fit we choose, we are dealing with an O(n) search. Indeed, this is the downside to the trade off we made when choosing free lists for their simplicity of implementation. A common technique for speeding up free list allocation is to have separate free lists for allocations of different sizes, which is known as segregated fits or having size classes.

With this approach, we can guarantee the invariant that every block in the free list for a particular size can satisfy an allocation request of that size. All we need to do is avoid splitting any block into pieces smaller than that size. Maintaining this invariant gives us amortized O(1) allocation for these sizes with their own free lists. Using first fit within a size's free list is guaranteed to only look at the first block within the free list, because the invariant tells us that the first block can satisfy this request. It is amortized because we need to fall back to the O(n) allocation to refill this size's free list from the fallback, main free list whenever it is empty. If we reuse the same first fit allocation routine for both our size classes' free lists and our main free list, then we get the benefits of size classes without paying for them in extra code size. This is the approach that wee_alloc takes.

Fragmentation

Our other main concern is avoiding fragmentation. Fragmentation is the degree of wasted space between allocations in memory. High fragmentation can lead to situations where there exist many free blocks of memory, but none of which can fulfill some allocation request, because each individual free block's size is too small, even if the sum of their sizes is more than enough for the requested allocation. Therefore, a high degree of fragmentation can effectively break an allocator. It had one job — allocate memory — and it can't even do that anymore. So wee_alloc really should have some kind of story here; punting 100% on fragmentation is not a practical option.

Once again there are trade offs, and avoiding fragmentation is not a binary choice. On one end of the spectrum, compacting garbage collectors can re-arrange objects in memory and pack them tightly next to each other, effectively leading to zero fragmentation. The cost that you pay is the size and time overhead of a full tracing garbage collector that can enumerate all pointers in the system and patch them to point to moved objects' new locations. On the opposite end of the spectrum, if we never re-consolidate two blocks of adjacent memory that we previously split from what had originally been a single contiguous block, then we can expect a lot of fragmentation. As we split blocks into smaller blocks for smaller allocations, we get small bits of wasted space between them, and even after we free all these small allocations, we won't have any large block in the free list for a large allocation, even if we have multiple adjacent, small blocks in the free list that could be merged together to satisfy it.

One possibility is keeping the free list sorted by each block's address; deallocating a block would then re-insert it into the free list at the sorted location. If either of its neighbors in the free list are immediately adjacent in memory, we could consolidate them. But then deallocation is an O(n) search through the free list, instead of the O(1) push onto its front. We could lower that to O(log n) by representing the free list as a balanced search tree or B-tree. But the implementation complexity goes up, and I suspect code size will go up with it. Instead of a free list, we could use bitmaps to track which portions of our heap are free or allocated, and then the consolidation could happen automatically as bits next to each other are reset. But then we need to restrict our allocator to parceling out portions of a single, contiguous region of memory.
This implies that only a single, global allocator exists, since if there were multiple, each instance would want to "own" the end of the WebAssembly linear memory, and have the power to grow it to satisfy more and larger allocations. And maybe this is a fair constraint to impose in the context of WebAssembly, where memory is already linear and contiguous. But lifting this constraint, while still using bitmaps, implies a hybrid free list and bitmap implementation. The downside to that is more implementation complexity, and a larger code size footprint.

wee_alloc takes a third approach: trading some space overhead for easy and fast merging. We maintain a sorted, doubly-linked list of all blocks, whether allocated or free. This adds two words of space overhead to every heap allocation. When freeing a block, we check if either of its adjacent blocks are also free, and if so, merge them together with a handful of updates to the next and previous pointers. If neither of the neighbors are free, then we push this block onto the front of the free list. In this way, we keep both O(1) deallocation and our simple free list implementation.

Here is a diagram of what this sorted, doubly-linked list looks like in memory:

[Diagram omitted: three adjacent cells, each beginning with prev and next pointers followed by its data; the next pointers chain forward and the prev pointers chain backward through every cell, whether allocated or free.]

CellHeader, FreeCell, and AllocatedCell

The CellHeader contains the common data members found within both allocated and free memory blocks: the next and previous doubly-linked list pointers.

#[repr(C)]
struct CellHeader {
    next_cell_raw: ptr::NonNull<CellHeader>,
    prev_cell_raw: *mut CellHeader,
}

We use a low bit of the next_cell_raw pointer to distinguish whether the cell is allocated or free, and can consult its value to dynamically cast to an AllocatedCell or FreeCell.

impl CellHeader {
    const IS_ALLOCATED: usize = 0b01;

    fn is_allocated(&self) -> bool {
        self.next_cell_raw.as_ptr() as usize & Self::IS_ALLOCATED != 0
    }

    fn is_free(&self) -> bool {
        !self.is_allocated()
    }

    fn set_allocated(&mut self) {
        let next = self.next_cell_raw.as_ptr() as usize;
        let next = next | Self::IS_ALLOCATED;
        self.next_cell_raw = unsafe { ptr::NonNull::new_unchecked(next as *mut CellHeader) };
    }

    fn set_free(&mut self) {
        let next = self.next_cell_raw.as_ptr() as usize;
        let next = next & !Self::IS_ALLOCATED;
        self.next_cell_raw = unsafe { ptr::NonNull::new_unchecked(next as *mut CellHeader) };
    }

    fn as_free_cell_mut(&mut self) -> Option<&mut FreeCell> {
        if self.is_free() {
            Some(unsafe { mem::transmute(self) })
        } else {
            None
        }
    }
}
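The bit stealing above works because a CellHeader is always at least word-aligned, so the bottom two bits of any pointer to one are zero and can be borrowed for flags. A self-contained sketch of that idea, with made-up names rather than wee_alloc's:

```rust
// Hypothetical illustration of low-bit pointer tagging; not wee_alloc code.
const IS_ALLOCATED: usize = 0b01;
const MASK: usize = !0b11;

// Word-aligned data leaves the two low pointer bits clear, so they can carry
// flags as long as they are masked off before the pointer is used.
fn tag(ptr: *mut u64, flags: usize) -> usize {
    debug_assert_eq!(ptr as usize & !MASK, 0);
    ptr as usize | flags
}

fn untag(tagged: usize) -> (*mut u64, usize) {
    ((tagged & MASK) as *mut u64, tagged & !MASK)
}

fn main() {
    let mut value = 0u64;
    let tagged = tag(&mut value as *mut u64, IS_ALLOCATED);
    let (ptr, flags) = untag(tagged);
    assert_eq!(ptr, &mut value as *mut u64);
    assert_eq!(flags, IS_ALLOCATED);
}
```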
We use pointer arithmetic to calculate the size of a given cell's data to avoid another word of space overhead, so the next_cell_raw pointer must always point just after this cell's data. But, because of that restriction, we can't use a null pointer as the sentinel for the end of the doubly-linked list. Therefore, we use the second low bit of the next_cell_raw pointer to distinguish whether the data pointed to by next_cell_raw (after the appropriate masking) is a valid cell, or is garbage memory.

impl CellHeader {
    const NEXT_CELL_IS_INVALID: usize = 0b10;
    const MASK: usize = !0b11;

    fn next_cell_is_invalid(&self) -> bool {
        let next = self.next_cell_raw.as_ptr() as usize;
        next & Self::NEXT_CELL_IS_INVALID != 0
    }

    fn next_cell_unchecked(&self) -> *mut CellHeader {
        let ptr = self.next_cell_raw.as_ptr() as usize;
        let ptr = ptr & Self::MASK;
        let ptr = ptr as *mut CellHeader;
        ptr
    }

    fn next_cell(&self) -> Option<*mut CellHeader> {
        if self.next_cell_is_invalid() {
            None
        } else {
            Some(self.next_cell_unchecked())
        }
    }

    fn size(&self) -> Bytes {
        let data = unsafe { (self as *const CellHeader).offset(1) };
        let data = data as usize;
        let next = self.next_cell_unchecked();
        let next = next as usize;
        Bytes(next - data)
    }
}

An AllocatedCell is a CellHeader followed by data that is allocated.

#[repr(C)]
struct AllocatedCell {
    header: CellHeader,
}

A FreeCell is a CellHeader followed by data that is not in use, and from which we recycle a word for the next link in the free list.

#[repr(C)]
struct FreeCell {
    header: CellHeader,
    next_free_raw: *mut FreeCell,
}

Each of AllocatedCell and FreeCell have methods that make sense only when the cell is allocated or free, respectively, and maintain the invariants required for cells of their state. For example, the method for transforming a FreeCell into an AllocatedCell ensures that the IS_ALLOCATED bit gets set, and the method for transforming an AllocatedCell into a FreeCell unsets that bit.

impl FreeCell {
    unsafe fn into_allocated_cell(&mut self) -> &mut AllocatedCell {
        self.header.set_allocated();
        mem::transmute(self)
    }
}

impl AllocatedCell {
    unsafe fn into_free_cell(&mut self) -> &mut FreeCell {
        self.header.set_free();
        mem::transmute(self)
    }
}

Implementing Allocation

Let's begin by looking at first fit allocation without any refilling of the free list in the case where there are no available blocks of memory that can satisfy this allocation request. Given the head of a free list, we search for the first block that can fit the requested allocation. Upon finding a suitable block, we determine whether to split the block in two, or use it as is. If we don't find a suitable block we return an error.

unsafe fn walk_free_list<F, T>(
    head: &mut *mut FreeCell,
    mut callback: F,
) -> Result<T, ()>
where
    F: FnMut(&mut *mut FreeCell, &mut FreeCell) -> Option<T>,
{
    let mut previous_free = head;
    loop {
        let current_free = *previous_free;
        if current_free.is_null() {
            return Err(());
        }
        let mut current_free = &mut *current_free;
        if let Some(result) = callback(previous_free, current_free) {
            return Ok(result);
        }
        previous_free = &mut current_free.next_free_raw;
    }
}

unsafe fn alloc_first_fit(
    size: Words,
    head: &mut *mut FreeCell,
    policy: &AllocPolicy,
) -> Result<*mut u8, ()> {
    walk_free_list(head, |previous, current| {
        // Check whether this cell is large enough to satisfy this allocation.
        if current.header.size() < size.into() {
            return None;
        }

        // The cell is large enough for this allocation -- maybe *too*
        // large. Try splitting it.
        if let Some(allocated) = current.split_alloc(previous, size, policy) {
            return Some(allocated.data());
        }

        // This cell has crazy Goldilocks levels of "just right". Use it as is,
        // without any splitting.
        *previous = current.next_free();
        let allocated = current.into_allocated_cell(policy);
        Some(allocated.data())
    })
}
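Both helpers above take an &AllocPolicy trait object, but the post never shows the trait itself. Judging purely from the methods called on it later in the post, it plausibly looks something like this sketch (the real trait in wee_alloc may differ):

```rust
// A reconstruction of what `AllocPolicy` might look like, inferred from the
// calls made on it in this post; it is not copied from wee_alloc's source.
// It leans on the post's own `Words` and `FreeCell` types.
trait AllocPolicy {
    // The smallest cell worth leaving behind when splitting a larger cell,
    // given the size of the allocation being carved out of it.
    fn min_cell_size(&self, alloc_size: Words) -> Words;

    // Refill an empty free list: fresh pages from the WebAssembly engine for
    // the main free list, or a large block carved out of the main free list
    // for a size class's free list.
    unsafe fn new_cell_for_free_list(&self, size: Words) -> Result<*mut FreeCell, ()>;

    // Merging adjacent free cells is pointless within a single-size free
    // list, so size-class policies can opt out of it.
    fn should_merge_adjacent_free_cells(&self) -> bool;
}
```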
Splitting a cell in two occurs when a cell has room for both the requested allocation and for another adjacent cell afterwards that is no smaller than some minimum block size. We use the &AllocPolicy trait object to configure this minimum block size, among other things, for different size classes without the code duplication that monomorphization creates. If there is room to split, then we insert the newly split cell into the free list, remove the current cell, and fix up the doubly-linked list of adjacent cells in the headers.

impl FreeCell {
    fn should_split_for(&self, alloc_size: Words, policy: &AllocPolicy) -> bool {
        let self_size = self.header.size();
        let min_cell_size: Bytes = policy.min_cell_size(alloc_size).into();
        let alloc_size: Bytes = alloc_size.into();
        self_size - alloc_size >= min_cell_size + size_of::<CellHeader>()
    }

    unsafe fn split_alloc(
        &mut self,
        previous: &mut *mut FreeCell,
        alloc_size: Words,
        policy: &AllocPolicy,
    ) -> Option<&mut AllocatedCell> {
        if self.should_split_for(alloc_size, policy) {
            let alloc_size: Bytes = alloc_size.into();

            let remainder = {
                let data = (&mut self.header as *mut CellHeader).offset(1) as *mut u8;
                data.offset(alloc_size.0 as isize)
            };
            let remainder = &mut *FreeCell::from_uninitialized(
                remainder,
                self.header.next_cell_raw,
                Some(&mut self.header),
                Some(self.next_free()),
                policy,
            );

            if let Some(next) = self.header.next_cell() {
                (*next).prev_cell_raw = &mut remainder.header;
            }
            self.header.next_cell_raw = ptr::NonNull::new_unchecked(&mut remainder.header);

            *previous = remainder;
            Some(self.into_allocated_cell(policy))
        } else {
            None
        }
    }
}

Refilling a free list when there is not a suitable block already in it is easy. For the main free list, we allocate new pages directly from the WebAssembly engine with the alloc_pages function we defined earlier. For a size class's free list, we allocate a (relatively) large block from the main free list. This logic is encapsulated in the two different AllocPolicy implementations, and the AllocPolicy::new_cell_for_free_list method. To allocate with a fallback to refill the free list, we do just that: attempt a first fit allocation, and if that fails, refill the free list by pushing a new cell onto its front, and then try a first fit allocation once again.

unsafe fn alloc_with_refill(
    size: Words,
    head: &mut *mut FreeCell,
    policy: &AllocPolicy,
) -> Result<*mut u8, ()> {
    if let Ok(result) = alloc_first_fit(size, head, policy) {
        return Ok(result);
    }

    let cell = policy.new_cell_for_free_list(size)?;
    let head = (*cell).insert_into_free_list(head, policy);
    alloc_first_fit(size, head, policy)
}

But where do we get the free list heads from? The WeeAlloc structure holds the head of the main free list, and if size classes are enabled, the size classes' free list heads.

pub struct WeeAlloc {
    head: imp::Exclusive<*mut FreeCell>,

    #[cfg(feature = "size_classes")]
    size_classes: SizeClasses,
}

struct SizeClasses(
    [imp::Exclusive<*mut FreeCell>; SizeClasses::NUM_SIZE_CLASSES],
);

impl SizeClasses {
    pub const NUM_SIZE_CLASSES: usize = 256;

    pub fn get(&self, size: Words) -> Option<&imp::Exclusive<*mut FreeCell>> {
        self.0.get(size.0 - 1)
    }
}

As you can see, every free list head is wrapped in an imp::Exclusive. The imp module contains target-specific implementation code and comes in two flavors: imp_wasm32.rs and imp_unix.rs. The alloc_pages function we saw earlier is defined in imp_wasm32.rs. There is another alloc_pages function that uses mmap inside imp_unix.rs. The imp::Exclusive wrapper type guarantees exclusive access to its inner value. For WebAssembly, this is a no-op, since SharedArrayBuffers aren't shipping and there is no shared-data threading.
For unix systems, this protects the inner value in a pthread mutex, and is similar to std::sync::Mutex but provides a FnOnce interface rather than an RAII guard.

If size classes are not enabled, we always use the main free list head and the LargeAllocPolicy. If size classes are enabled, we try to get the appropriate size class's free list head, and if that works, then we use the SizeClassAllocPolicy with it. If there is no size class for the requested allocation size, then we fall back to the main free list and the LargeAllocPolicy.

impl WeeAlloc {
    #[cfg(feature = "size_classes")]
    unsafe fn with_free_list_and_policy_for_size<F, T>(&self, size: Words, f: F) -> T
    where
        F: for<'a> FnOnce(&'a mut *mut FreeCell, &'a AllocPolicy) -> T,
    {
        if let Some(head) = self.size_classes.get(size) {
            let policy = size_classes::SizeClassAllocPolicy(&self.head);
            let policy = &policy as &AllocPolicy;
            head.with_exclusive_access(|head| f(head, policy))
        } else {
            let policy = &LARGE_ALLOC_POLICY as &AllocPolicy;
            self.head.with_exclusive_access(|head| f(head, policy))
        }
    }

    #[cfg(not(feature = "size_classes"))]
    unsafe fn with_free_list_and_policy_for_size<F, T>(&self, size: Words, f: F) -> T
    where
        F: for<'a> FnOnce(&'a mut *mut FreeCell, &'a AllocPolicy) -> T,
    {
        let policy = &LARGE_ALLOC_POLICY as &AllocPolicy;
        self.head.with_exclusive_access(|head| f(head, policy))
    }
}

Finally, all that is left is to tie everything together to implement the alloc method for the Alloc trait:

unsafe impl<'a> Alloc for &'a WeeAlloc {
    unsafe fn alloc(&mut self, layout: Layout) -> Result<*mut u8, AllocErr> {
        if layout.align() > mem::size_of::<usize>() {
            return Err(AllocErr::Unsupported {
                details: "wee_alloc cannot align to more than word alignment",
            });
        }

        let size = Bytes(layout.size());
        if size.0 == 0 {
            return Ok(0x1 as *mut u8);
        }

        let size: Words = size.round_up_to();
        self.with_free_list_and_policy_for_size(size, |head, policy| {
            alloc_with_refill(size, head, policy)
                .map_err(|()| AllocErr::Exhausted { request: layout })
        })
    }

    ...
}

Implementing Deallocation

Deallocation either merges the just-freed block with one of its adjacent neighbors, if they are also free, or it pushes the block onto the front of the free list. If we are reinserting a block into a size class's free list, however, it doesn't make sense to merge blocks. Because these free lists are always servicing allocations of a single size, we would just end up re-splitting the merged block back exactly as it is split now. There is no benefit to splitting and merging and splitting again. Therefore, we have the AllocPolicy inform us whether merging is desirable or not.

First, let's examine deallocation without the details of merging. We get the appropriate free list and allocation policy, and conjure up a reference to the AllocatedCell that sits just before the data being freed. Then (assuming we didn't merge into another block that is already in the free list) we push the block onto the front of the free list.

unsafe impl<'a> Alloc for &'a WeeAlloc {
    ...

    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) {
        let size = Bytes(layout.size());
        if size.0 == 0 || ptr.is_null() {
            return;
        }

        let size: Words = size.round_up_to();
        self.with_free_list_and_policy_for_size(size, |head, policy| {
            let cell = (ptr as *mut AllocatedCell).offset(-1);
            let cell = &mut *cell;
            let free = cell.into_free_cell(policy);

            if policy.should_merge_adjacent_free_cells() {
                ...
            }

            free.insert_into_free_list(head, policy);
        });
    }
}

When merging cells, the adjacent neighbor(s) we are merging into are also free, and therefore are already inside the free list. Because our free list is singly-linked, rather than doubly-linked, we can't arbitrarily splice in new elements when we have a handle to an element that is already in the free list. This causes some hiccups.

Merging with the previous adjacent cell is still easy: it is already in the free list, and we aren't changing the location of the CellHeader, so folding this cell into it is all that needs to be done. The free list can be left alone.

if policy.should_merge_adjacent_free_cells() {
    if let Some(prev) = free.header
        .prev_cell()
        .and_then(|p| (*p).as_free_cell_mut())
    {
        prev.header.next_cell_raw = free.header.next_cell_raw;
        if let Some(next) = free.header.next_cell() {
            (*next).prev_cell_raw = &mut prev.header;
        }
        return;
    }

    ...
}

Merging with the next adjacent cell is a little harder. It is already in the free list, but we need to splice it out from the free list, since its header will become invalid after consolidation, and it is this cell's header that needs to be in the free list. But, because the free list is singly-linked, we don't have access to the pointer pointing to the soon-to-be-invalid header, and therefore can't update that pointer to point to the new cell header. So instead we have a delayed consolidation scheme. We insert this cell just after the next adjacent cell in the free list, and set the next adjacent cell's NEXT_FREE_CELL_CAN_MERGE bit.

impl FreeCell {
    const NEXT_FREE_CELL_CAN_MERGE: usize = 0b01;
    const _RESERVED: usize = 0b10;
    const MASK: usize = !0b11;

    fn next_free_can_merge(&self) -> bool {
        self.next_free_raw as usize & Self::NEXT_FREE_CELL_CAN_MERGE != 0
    }

    fn set_next_free_can_merge(&mut self) {
        let next_free = self.next_free_raw as usize;
        let next_free = next_free | Self::NEXT_FREE_CELL_CAN_MERGE;
        self.next_free_raw = next_free as *mut FreeCell;
    }

    fn next_free(&self) -> *mut FreeCell {
        let next_free = self.next_free_raw as usize & Self::MASK;
        next_free as *mut FreeCell
    }
}

unsafe impl<'a> Alloc for &'a WeeAlloc {
    ...

    unsafe fn dealloc(&mut self, ptr: *mut u8, layout: Layout) {
        ...

        if policy.should_merge_adjacent_free_cells() {
            ...

            if let Some(next) = free.header
                .next_cell()
                .and_then(|n| (*n).as_free_cell_mut())
            {
                free.next_free_raw = next.next_free();
                next.next_free_raw = free;
                next.set_next_free_can_merge();
                return;
            }
        }

        ...
    }
}

Then, the next time that we walk the free list for allocation, the bit will be checked and the consolidation will happen at that time. Which means that the walk_free_list definition we showed earlier was incomplete, since it didn't include the code for consolidation. Here is its complete definition:

unsafe fn walk_free_list<F, T>(
    head: &mut *mut FreeCell,
    policy: &AllocPolicy,
    mut f: F,
) -> Result<T, ()>
where
    F: FnMut(&mut *mut FreeCell, &mut FreeCell) -> Option<T>,
{
    let mut previous_free = head;

    loop {
        let current_free = *previous_free;
        if current_free.is_null() {
            return Err(());
        }

        let mut current_free = &mut *current_free;

        if policy.should_merge_adjacent_free_cells() {
            // Now check if this cell can merge with the next cell in the free
            // list. We do this after the initial allocation attempt so that we
            // don't merge, only to immediately split the cell again right
            // afterwards.
            while current_free.next_free_can_merge() {
                let prev_adjacent = current_free.header.prev_cell_raw as *mut FreeCell;
                let prev_adjacent = &mut *prev_adjacent;

                prev_adjacent.header.next_cell_raw = current_free.header.next_cell_raw;
                if let Some(next) = current_free.header.next_cell() {
                    (*next).prev_cell_raw = &mut prev_adjacent.header;
                }

                *previous_free = prev_adjacent;
                current_free = prev_adjacent;
            }
        }

        if let Some(result) = f(previous_free, current_free) {
            return Ok(result);
        }

        previous_free = &mut current_free.next_free_raw;
    }
}

On the other hand, if both the previous and next adjacent cells are free, we are faced with a dilemma. We cannot merge all the previous, current, and next cells together because our singly-linked free list doesn't allow for that kind of arbitrary appending and splicing in O(1) time. Instead, we use a heuristic to choose whether to merge with the previous or next adjacent cell. We could choose to merge with whichever neighbor cell is smaller or larger, but we don't. Right now, we prefer the previous adjacent cell because we can greedily consolidate with it immediately, whereas consolidating with the next adjacent cell must be delayed, as explained above.

If we made the minimum allocation size two words, then we would have room for a doubly-linked free list, and could support consolidating previous, current, and next free cell neighbors. We could also remove the delayed consolidation scheme, which would further simplify a bunch of code. But it would mean effectively three words of overhead for single word heap allocations. It isn't clear to me whether that trade off is worth it. To make an informed decision, we'd need a corpus of allocations and frees made by typical WebAssembly applications.

Conclusion

wee_alloc is a work-in-progress prototype, but it already meets its goal of being small. However, it is certainly lacking in other areas. Sergey Pepyakin tried bootstrapping rustc with wee_alloc enabled as the global allocator, and it is a couple orders of magnitude slower than bootstrapping rustc with its default jemalloc. On the one hand, bootstrapping rustc isn't exactly the scenario wee_alloc was designed for, and it isn't surprising that this new, unoptimized prototype is slower than the mature, production-grade jemalloc. Furthermore, I doubt that we can compete with jemalloc on speed without regressing code size. But even so, I think wee_alloc is slower than it should be, and I suspect that there is some low-hanging fruit waiting to be plucked.

I would love, love, love some help building wee_alloc — it's a lot of fun! Are you interested in code golfing the smallest .wasm binaries? Do you like nitty-gritty profiling and digging into performance? Is wee_alloc missing some obviously-better technique that you're familiar with? Want to help Rust be the number one choice for compiling to WebAssembly? Fork the wee_alloc repository on GitHub and then come join us in #rust-wasm on irc.mozilla.org and introduce yourself!

Thanks to Mike Cooper and David Keeler for reading early drafts and providing valuable feedback.

[0] This will be unnecessary soon, since LLVM's linker, lld, is gaining support for WebAssembly, and already supports this functionality. The Rust toolchain will also start using lld and then we won't need wasm-gc anymore.

[1] Right now there is a single linear memory, but it is expected that more address spaces will come. They would be useful for, for example, referencing garbage-collected JavaScript or DOM objects directly from WebAssembly code.
[2] The current_memory and grow_memory instructions will likely be renamed to mem.size and mem.grow.
Posted 9 days ago
For the longest time I've used vertical tabs in Firefox, and I still find it odd that people don't use them more. It's a simple fact that a horizontal tab strip doesn't scale too well when you get lots of tabs. Of course, for most users this isn't a problem; most users do not have a lot of tabs open, according to Firefox telemetry. But if you have more than a few, the title just gets squished and squished till it becomes a scroll bar. Vertical tabs look great for this, giving you lots of title space on a wide monitor. Firefox is way better at this than Chrome. I sat next to someone who had nothing but triangles as their tab bar in Chrome. How did they cope?

With the landing of the tab hiding API in WebExtensions in Firefox 59, I wanted to try and understand what the many people who were clamouring for this API wanted to do. So I wrote a quick extension that's pretty terrible. It provided a "Hide this tab" context menu item on the tab to hide the tab. I then added a quick management page to list all the hidden pages. That was ok, but clicking that menu item was tedious. So then I set it to just perform some actions for me. I've now got it set up to hide a tab if it hasn't been looked at for an hour. Then, five hours after that, if I haven't opened it again, the extension just closes the tab.

I tried that for a week and found it pretty useful. Tabs that are hidden still show up in the awesome bar, and as soon as I click on them, they come back instantly. Eventually they'll get closed. They'll still appear in the awesome bar and I can bring them back, just in a slower manner. If I find myself saying "where was that tab..." I just go to the management view and it's likely there. This extension isn't perfect, but it's enabled me to stop using vertical tabs most of the time, and now I'm torn over which workflow is better. Maybe some combination.
Posted 9 days ago by Ryan T. Harter
Will posted a great article a couple of weeks ago, Giving and Receiving Help at Mozilla. I have been meaning to write a similar article for a while now. His post finally pushed me over the edge. Be sure to read Will's post first. The rest of this article is an addendum to his post.

Avoid Context-Free Pings

Context-free pings should be considered harmful. These are pings like ping or hey. The problems with context-free pings are documented elsewhere (1, 2, 3), so I won't discuss them here.

Pings are Ephemeral

IRC and Slack are nice because they generate notifications. If you need a quick response, IRC or Slack are the way to go. I get Slack and IRC notifications on my phone, so I'm likely to respond quickly. On the other hand, these notifications disappear easily, which makes it easy for me to lose your message. If you don't hear from me immediately, it's a good idea to send an email. Otherwise, I don't mind pings at all. Some folks worry about creating interruptions, but this isn't a problem for me. I limit the notifications I get, so if I don't want to get your notification, I won't. If I'm looking at Slack, I'm already distracted.

In short, consider these rules of thumb:

- If it will take me less than 2m to respond to you and it's urgent, ping me.
- If it will take me more than 2m to respond to you and it's urgent, file a bug and ping me.
- If it's not urgent, just email me.

Prefer Open Channels

I've spent a lot of time on documentation at Mozilla. It's hard. Our tools are constantly under development and our needs are always changing, so our documentation needs constant work. Asking questions in the open reduces our documentation burden. Email is where information goes to die. If we discuss a problem in a bug, that conversation is open and discoverable. It's not always useful, but it's a huge win when it is. File a bug instead of writing an email. @mention me in #fx-metrics instead of PM-ing me. CC an open mailing list if you need to use email.
Posted 9 days ago by Nicholas D. Matsakis
So aturon wrote this beautiful post about what a good week it has been. In there, they wrote:

    Breakthrough #2: @nikomatsakis had a eureka moment and figured out a path to make specialization sound, while still supporting its most important use cases (blog post forthcoming!). Again, this suddenly puts specialization on the map for Rust Epoch 2018.

Sheesh I wish they hadn't written that! Now the pressure is on. Well, here goes nothing =).

Anyway, I've been thinking about the upcoming Rust Epoch. We've been iterating over the final list of features to be included and I think it seems pretty exciting. But there is one "fancy type system" feature that's been languishing for some time: specialization. Accepted to much fanfare as RFC 1210, we've been kind of stuck since then trying to figure out how to solve an underlying soundness challenge. As aturon wrote, I think (and emphasis on think!) I may have a solution. I call it the always applicable rule, but you might also call it maximally minimal specialization.

Let's be clear: this proposal does not support all the specialization use cases originally envisioned. As the phrase maximally minimal suggests, it works by focusing on a core set of impls and accepting those. But that's better than most of its competitors! =) Better still, it leaves a route for future expansion.

The soundness problem

I'll just cover the soundness problem very briefly; Aaron wrote an excellent blog post that covers the details. The crux of the problem is that code generation wants to erase regions, but the type checker doesn't. This means that we can write specialization impls that depend on details of lifetimes, but we have no way to test at code generation time if those more specialized impls apply. A very simple example would be something like this:

impl<T> Trait for T { }

impl Trait for &'static str { }

At code generation time, all we know is that we have a &str – for some lifetime. We don't know if it's a static lifetime or not. The type checker is supposed to have assured us that we don't have to know – that this lifetime is "big enough" to cover all the uses of the string. My proposal would reject the specializing impl above. I basically aim to solve this problem by guaranteeing that, just as today, code generation doesn't have to care about specific lifetimes, because it knows that – whatever they are – if there is a potentially specializing impl, it will be applicable.

The "always applicable" test

The core idea is to change the rule for when overlap is allowed. In RFC 1210 the rule is something like this:

    Distinct impls A and B are allowed to overlap if one of them specializes the other.

We have long intended to extend this via the idea of intersection impls, giving rise to a rule like:

    Two distinct impls A and B are allowed to overlap if, for all types in their intersection: there exists an applicable impl C, and C specializes both A and B.

My proposal is to extend that intersection rule with the always applicable test. I'm actually going to start with a simple version, and then I'll discuss an important extension that makes it much more expressive.

    Two distinct impls A and B are allowed to overlap if, for all types in their intersection: there exists an applicable impl C, and C specializes both A and B, and that impl C is always applicable.

(We will see, by the way, that the precise definition of the specializes predicate doesn't matter much for the purposes of my proposal here – any partial order will do.)
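To make the shape of the rule concrete, here is a small, hypothetical pair of overlapping impls of the kind the rule is meant to allow, written against today's unstable specialization feature (the Describe trait and its impls are made up for illustration, not taken from the post):

```rust
#![feature(specialization)]

trait Describe {
    fn describe(&self) -> &'static str;
}

// Impl A: a blanket impl. `default fn` marks the method as overridable.
impl<T> Describe for T {
    default fn describe(&self) -> &'static str {
        "something"
    }
}

// Impl B: overlaps with A for every Vec<T> and specializes it. It adds no
// bounds beyond what Vec<T> already implies, is fully generic in lifetimes,
// and repeats no type parameters, so it is "always applicable" in the sense
// defined below.
impl<T> Describe for Vec<T> {
    fn describe(&self) -> &'static str {
        "a vector"
    }
}

fn main() {
    assert_eq!(1u32.describe(), "something");
    assert_eq!(vec![1u32].describe(), "a vector");
}
```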
When is an impl always applicable?

Intuitively, an impl is always applicable if it does not impose any additional conditions on its input types beyond that they be well-formed – and in particular it doesn't impose any equality constraints between parts of its input types. It also has to be fully generic with respect to the lifetimes involved. Actually, I think the best way to explain it is in terms of the implied bounds proposal (RFC, blog post). The idea is roughly this: an impl is always applicable if it meets three conditions:

- it relies only on implied bounds,
- it is fully generic with respect to lifetimes,
- it doesn't repeat generic type parameters.

Let's look at those three conditions.

Condition 1: Relies only on implied bounds.

Here is an example of an always applicable impl (which could therefore be used to specialize another impl):

struct Foo<T: Clone> { }

impl<T> SomeTrait for Foo<T> {
    // code in here can assume that `T: Clone` because of implied bounds
}

Here the impl works fine, because it adds no additional bounds beyond the T: Clone that is implied by the struct declaration. If the impl adds new bounds that are not part of the struct, however, then it is not always applicable:

struct Foo<T: Clone> { }

impl<T: Copy> SomeTrait for Foo<T> {
    //   ^^^^ new bound not declared on `Foo`,
    //        hence *not* always applicable
}

Condition 2: Fully generic with respect to lifetimes.

Each lifetime used in the impl header must be a lifetime parameter, and each lifetime parameter can only be used once. So an impl like this is always applicable:

impl<'a, 'b> SomeTrait for &'a &'b u32 {
    // implied bounds let us assume that `'b: 'a`, as well
}

But the following impls are not always applicable:

impl<'a> SomeTrait for &'a &'a u32 {
    //                  ^^^^^^^^^^ same lifetime used twice
}

impl SomeTrait for &'static str {
    //              ^^^^^^^ not a lifetime parameter
}

Condition 3: Each type parameter can only be used once.

Using a type parameter more than once imposes "hidden" equality constraints between parts of the input types which in turn can lead to equality constraints between lifetimes. Therefore, an always applicable impl must use each type parameter only once, like this:

impl<T, U> SomeTrait for (T, U) { }

Repeating, as here, means the impl cannot be used to specialize:

impl<T> SomeTrait for (T, T) {
    //                 ^^^^
    // `T` used twice: not always applicable
}

How can we think about this formally?

For each impl, we can create a Chalk goal that is provable if it is always applicable. I'll define this here "by example". Let's consider a variant of the first example we saw:

struct Foo<T: Clone> { }

impl<T: Clone> SomeTrait for Foo<T> { }

As we saw before, this impl is always applicable, because the T: Clone where clause on the impl follows from the implied bounds of Foo. The recipe to transform this into a predicate is that we want to replace each use of a type/region parameter in the input types with a universally quantified type/region (note that two uses of the same type parameter would be replaced with two distinct types). This yields a "skolemized" set of input types T. We then check if the impl could be applied to T. In the case of our example, that means we would be trying to prove something like this:

// For each *use* of a type parameter or region in
// the input types, we add a 'forall' variable here.
// In this example, the only spot is `Foo<_>`, so we
// have one:
forall<A> {
    // We can assume that each of the input types (using those
    // forall variables) are well-formed:
    if (WellFormed(Foo<A>)) {
        // Now we have to see if the impl matches. To start,
        // we create existential variables for each of the
        // impl's generic parameters:
        exists<T> {
            // The types in the impl header must be equal...
            Foo<T> = Foo<A>,

            // ...and the where clauses on the impl must be provable.
            T: Clone,
        }
    }
}
Clearly, this is provable: we infer that T = A, and then we can prove that A: Clone because it follows from WellFormed(Foo<A>). Now if we look at the second example, which added T: Copy to the impl, we can see why we get an error. Here was the example:

struct Foo<T: Clone> { }

impl<T: Copy> SomeTrait for Foo<T> {
    //   ^^^^ new bound not declared on `Foo`,
    //        hence *not* always applicable
}

That example results in a query like:

forall<A> {
    if (WellFormed(Foo<A>)) {
        exists<T> {
            Foo<T> = Foo<A>,
            T: Copy,
        }
    }
}

In this case, we fail to prove T: Copy, because it does not follow from WellFormed(Foo<A>).

As one last example, let's look at the impl that repeats a type parameter:

impl<T> SomeTrait for (T, T) {
    // Not always applicable
}

The query that will result follows; what is interesting here is that the type (T, T) results in two forall variables, because it has two distinct uses of a type parameter (it just happens to be one parameter used twice):

forall<A, B> {
    if (WellFormed((A, B))) {
        exists<T> {
            (T, T) = (A, B)
        }
    }
}

This cannot be proven, because it would require showing that A = B for arbitrary A and B.

What is accepted?

What this rule primarily does is allow you to specialize blanket impls with concrete types. For example, we currently have a From impl that says any type T can be converted to itself:

impl<T> From<T> for T { .. }

It would be nice to be able to define an impl that allows a value of the never type ! to be converted into any type (since such a value cannot exist in practice):

impl<T> From<!> for T { .. }

However, this impl overlaps with the reflexive impl. Therefore, we'd like to be able to provide an intersection impl defining what happens when you convert ! to ! specifically:

impl From<!> for ! { .. }

All of these impls would be legal in this proposal.

Extension: Refining always applicable impls to consider the base impl

While it accepts some things, the always applicable rule can also be quite restrictive. For example, consider this pair of impls:

// Base impl:
impl<T> SomeTrait for T where T: 'static { }

// Specializing impl:
impl SomeTrait for &'static str { }

Here, the second impl wants to specialize the first, but it is not always applicable, because it specifies the 'static lifetime. And yet, it feels like this should be ok, since the base impl only applies to 'static things. We can make this notion more formal by expanding the property to say that the specializing impl C must be always applicable with respect to the base impls. In this extended version of the predicate, the impl C is allowed to rely not only on the implied bounds, but on the bounds that appear in the base impl(s). So, the impls above might result in a Chalk predicate like:

// One use of a lifetime in the specializing impl (`'static`),
// so we introduce one 'forall' lifetime:
forall<'a> {
    // Assuming the base impl applies:
    if (exists<T> { T = &'a str, T: 'static }) {
        // We have to prove that the
        // specialized impl's types can unify:
        &'a str = &'static str
    }
}

As it happens, the compiler today has logic that would let us deduce that, because we know that &'a str: 'static, then we know that 'a = 'static, and hence we could solve this clause successfully. This rule also allows us to accept some cases where type parameters are repeated, though we'd have to upgrade chalk's capability to let it prove those predicates fully.
Consider this pair of impls from RFC 1210: // Base impl: impl<E, T> Extend<E, T> for Vec<E> where T: IntoIterator<Item=E> {..} // Specializing impl: impl<'a, E> Extend<E, &'a [E]> for Vec<E> {..} // ^ ^ ^ E repeated three times! Here the specializing impl repeats the type parameter E three times! However, looking at the base impl, we can see that all of those repeats follow from the conditions on the base impl. The resulting chalk predicate would be: // The fully general form of specializing impl is // > impl Extend for Vec forall { // Assuming the base impl applies: if (exists { E = A, T = &'b [B], Vec = Vec, T: IntoIterator }) { // Can we prove the specializing impl unifications? exists { E = A, &'a [E] = &'b [C], Vec = Vec, } } } This predicate should be provable – but there is a definite catch. At the moment, these kinds of predicates fall outside the “Hereditary Harrop” (HH) predicates that Chalk can handle. HH predicates do not permit existential quantification and equality predicates as hypotheses (i.e., in an if (C) { ... }). I can however imagine some quick-n-dirty extensions that would cover these particular cases, and of course there are more powerful proving techniques out there that we could tinker with (though I might prefer to avoid that). Extension: Reverse implied bounds rules While the previous examples ought to be provable, there are some other cases that won’t work out without some further extension to Rust. Consider this pair of impls: impl<T> Foo for T where T: Clone { } impl<T> Foo for Vec<T> where T: Clone { } Can we consider this second impl to be always applicable relative to the first? Effectively this boils down to asking whether knowing Vec: Clone allows us to deduce that T: Clone – and right now, we can’t know that. The problem is that the impls we have only go one way. That is, given the following impl: impl<T> Clone for Vec<T> where T: Clone { .. } we get a program clause like forall { (Vec: Clone) :- (T: Clone) } but we need the reverse: forall { (T: Clone) :- (Vec: Clone) } This is basically an extension of implied bounds; but we’d have to be careful. If we just create those reverse rules for every impl, then it would mean that removing a bound from an impl is a breaking change, and that’d be a shame. We could address this in a few ways. The most obvious is that we might permit people to annotate impls indicating that they represent minimal conditions (i.e., that removing a bound is a breaking change). Alternatively, I feel like there is some sort of feature “waiting” out there that lets us make richer promises about what sorts of trait impls we might write in the future: this would be helpful also to coherence, since knowing what impls will not be written lets us permit more things in downstream crates. (For example, it’d be useful to know that Vec will never be Copy.) Extension: Designating traits as “specialization predicates” However, even when we consider the base impl, and even if we have some solution to reverse rules, we still can’t cover the use case of having “overlapping blanket impls”, like these two: impl<T> Skip for T where T: Read { .. } impl<T> Skip for T where T: Read + Seek { .. } Here we have a trait Skip that (presumably) lets us skip forward in a file. We can supply one default implementation that works for any reader, but it’s inefficient: it would just read and discard N bytes. It’d be nice if we could provide a more efficient version for those readers that implement Seek. 
Unfortunately, this second impl is not always applicable with respect to the first impl – it adds a new requirement, T: Seek, that does not follow from the bounds on the first impl nor the implied bounds. You might wonder why this is problematic in the first place. The danger is that some other crate might have an impl for Seek that places lifetime constraints, such as: impl Seek for &'static Foo { } Now at code generation time, we won’t be able to tell if that impl applies, since we’ll have erased the precise region. However, what we could do is allow the Seek trait to be designated as a specialization predicate (perhaps with an attribute like #[specialization_predicate]). Traits marked as specialization predicates would be limited so that every one of their impls must be always applicable (our original predicate). This basically means that, e.g., a “reader” cannot conditionally implement Seek – it has to be always seekable, or never. When determining whether an impl is always applicable, we can ignore where clauses that pertain to #[specialization_predicate] traits. Adding a #[specialization_predicate] attribute to an existing trait would be a breaking change; removing it would be one too. However, it would be possible to take existing traits and add “specialization predicate” subtraits. For example, if the Seek trait already existed, we might do this: impl<T> Skip for T where T: Read { .. } impl<T> Skip for T where T: Read + SeekPredicate { .. } #[specialization_predicate] trait UnconditionalSeek: Seek { fn seek_predicate(&self, n: usize) { self.seek(n); } } Now streams that implement seek unconditionally (probably all of them) can add impl UnconditionalSeek for MyStream { } and get the optimization. Not as automatic as we might like, but could be worse. Default impls need not be always applicable This last example illustrates an interesting point. RFC 1210 described not only specialization but also a more flexible form of defaults that go beyond default methods in trait definitions. The idea was that you can define lots of defaults using a default impl. So the UnconditionalSeek trait at the end of the last section might also have been expressed: #[specialization_predicate] trait UnconditionalSeek: Seek { } default impl<T: Seek> UnconditionalSeek for T { fn seek_predicate(&self, n: usize) { self.seek(n); } } The interesting thing about default impls is that they are not (yet) a full impl. They only represent default methods that real impls can draw upon, but users still have to write a real impl somewhere. This means that they can be exempt from the rules about being always applicable – those rules will be enforced at the real impl point. Note for example that the default impl above is not always available, as it depends on Seek, which is not an implied bound anywhere. Conclusion I’ve presented a refinement of specialization in which we impose one extra condition on the specializing impl: not only must it be a subset of the base impl(s) that it specializes, it must be always applicable, which means basically that if we are given a set of types T where we know: the base impl was proven by the type checker to apply to T the types T were proven by the type checker to be well-formed and the specialized impl unifies with the lifetime-erased versions of T then we know that the specialized impl applies. The beauty of this approach compared with past approaches is that it preserves the existing role of the type checker and the code generator. 
As today in Rust, the type checker always knows the full region details, but the code generator can just ignore them, and still be assured that all region data will be valid when it is accessed. This implies for example that we don’t need to impose the restrictions that aturon discussed in their blog post: we can allow specialized associated types to be resolved in full by the type checker as long as they are not marked default, because there is no danger that the type checker and trans will come to different conclusions. Thoughts? I’ve opened an internals thread on this post. I’d love to hear whether you see a problem with this approach. I’d also like to hear about use cases that you have for specialization that you think may not fit into this approach. Footnotes We don’t say it so much anymore, but in the olden days of Rust, the phrase “max min” was very “en vogue”; I think we picked it up from some ES6 proposals about the class syntax. ↩ Note: an impl is said to specialize itself. ↩ Let me give a shout out here to scalexm, who recently emerged with an elegant solution for how to model implied bounds in Chalk. ↩ [Less]
Posted 9 days ago by Nicholas D. Matsakis
So aturon wrote this beautiful post about what a good week it has been. In there, they wrote:

Breakthrough #2: @nikomatsakis had a eureka moment and figured out a path to make specialization sound, while still supporting its most important use ... [More] cases (blog post forthcoming!). Again, this suddenly puts specialization on the map for Rust Epoch 2018.

Sheesh I wish they hadn't written that! Now the pressure is on. Well, here goes nothing =).

Anyway, I've been thinking about the upcoming Rust Epoch. We've been iterating over the final list of features to be included and I think it seems pretty exciting. But there is one "fancy type system" feature that's been languishing for some time: specialization. It was accepted to much fanfare as RFC 1210, but we've been kind of stuck since then trying to figure out how to solve an underlying soundness challenge. As aturon wrote, I think (and emphasis on think!) I may have a solution. I call it the always applicable rule, but you might also call it maximally minimal specialization1. Let's be clear: this proposal does not support all the specialization use cases originally envisioned. As the phrase maximally minimal suggests, it works by focusing on a core set of impls and accepting those. But that's better than most of its competitors! =) Better still, it leaves a route for future expansion.

The soundness problem

I'll just cover the soundness problem very briefly; Aaron wrote an excellent blog post that covers the details. The crux of the problem is that code generation wants to erase regions, but the type checker doesn't. This means that we can write specializing impls that depend on details of lifetimes, but we have no way to test at code generation time whether those more specialized impls apply. A very simple example would be something like this:

impl<T> Trait for T { }
impl Trait for &'static str { }

At code generation time, all we know is that we have a &str – for some lifetime. We don't know if it's a static lifetime or not. The type checker is supposed to have assured us that we don't have to know – that this lifetime is "big enough" to cover all the uses of the string. My proposal would reject the specializing impl above. I basically aim to solve this problem by guaranteeing that, just as today, code generation doesn't have to care about specific lifetimes, because it knows that – whatever they are – if there is a potentially specializing impl, it will be applicable.

The "always applicable" test

The core idea is to change the rule for when overlap is allowed. In RFC 1210 the rule is something like this:

Distinct impls A and B are allowed to overlap if one of them specializes the other.

We have long intended to extend this via the idea of intersection impls, giving rise to a rule like:

Two distinct impls A and B are allowed to overlap if, for all types in their intersection: there exists an applicable impl C and C specializes both A and B.2

My proposal is to extend that intersection rule with the always applicable test. I'm actually going to start with a simple version, and then I'll discuss an important extension that makes it much more expressive.

Two distinct impls A and B are allowed to overlap if, for all types in their intersection: there exists an applicable impl C, C specializes both A and B, and that impl C is always applicable.

(We will see, by the way, that the precise definition of the specializes predicate doesn't matter much for the purposes of my proposal here – any partial order will do.)
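Before looking at when an impl qualifies, it may help to see the failure mode this rule guards against, spelled out as code. This is a hedged sketch: the Describe trait, its method, and the driver function are all my own illustration (not from the post), and whether the overlapping pair is accepted at all depends on the state of the unstable specialization feature on a given nightly. The point is only that once regions are erased, code generation cannot decide whether the lifetime-dependent impl applies.

#![feature(specialization)] // unstable, nightly-only

trait Describe {
    fn describe(&self) -> String;
}

// Base impl: applies to every type.
impl<T> Describe for T {
    default fn describe(&self) -> String {
        String::from("some value")
    }
}

// A lifetime-dependent specialization, exactly the kind of impl the
// "always applicable" rule rejects.
impl Describe for &'static str {
    fn describe(&self) -> String {
        String::from("a string that lives forever")
    }
}

fn print_description<T: Describe>(value: T) {
    // Once this function is monomorphized for `T = &str`, the region has
    // been erased: code generation cannot tell whether the `&'static str`
    // impl should have been selected, even though the type checker knew
    // the precise lifetime.
    println!("{}", value.describe());
}

fn main() {
    let owned = String::from("hello");
    print_description(owned.as_str()); // a non-'static &str
    print_description("hello");        // a literal, i.e. &'static str
}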
When is an impl always applicable?

Intuitively, an impl is always applicable if it does not impose any additional conditions on its input types beyond that they be well-formed – and in particular it doesn't impose any equality constraints between parts of its input types. It also has to be fully generic with respect to the lifetimes involved. Actually, I think the best way to explain it is in terms of the implied bounds proposal3 (RFC, blog post). The idea is roughly this: an impl is always applicable if it meets three conditions: it relies only on implied bounds, it is fully generic with respect to lifetimes, and it doesn't repeat generic type parameters. Let's look at those three conditions.

Condition 1: Relies only on implied bounds.

Here is an example of an always applicable impl (which could therefore be used to specialize another impl):

struct Foo<T: Clone> { }

impl<T> SomeTrait for Foo<T> {
    // code in here can assume that `T: Clone` because of implied bounds
}

Here the impl works fine, because it adds no additional bounds beyond the T: Clone that is implied by the struct declaration. If the impl adds new bounds that are not part of the struct, however, then it is not always applicable:

struct Foo<T: Clone> { }

impl<T: Copy> SomeTrait for Foo<T> {
    // ^^^^^^^ new bound not declared on `Foo`,
    // hence *not* always applicable
}

Condition 2: Fully generic with respect to lifetimes.

Each lifetime used in the impl header must be a lifetime parameter, and each lifetime parameter can only be used once. So an impl like this is always applicable:

impl<'a, 'b> SomeTrait for &'a &'b u32 {
    // implied bounds let us assume that `'b: 'a`, as well
}

But the following impls are not always applicable:

impl<'a> SomeTrait for &'a &'a u32 {
    // ^^^^^^^ same lifetime used twice
}

impl SomeTrait for &'static str {
    // ^^^^^^^ not a lifetime parameter
}

Condition 3: Each type parameter can only be used once.

Using a type parameter more than once imposes "hidden" equality constraints between parts of the input types, which in turn can lead to equality constraints between lifetimes. Therefore, an always applicable impl must use each type parameter only once, like this:

impl<T, U> SomeTrait for (T, U) { }

Repeating a type parameter, as here, means the impl cannot be used to specialize:

impl<T> SomeTrait for (T, T) {
    // ^^^^
    // `T` used twice: not always applicable
}
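Putting the three conditions together: here is a hedged sketch (the trait, struct, and field names are placeholders of mine, not from the post) of an impl that would count as always applicable, along with commented-out variants that each violate one condition.

trait SomeTrait { }

struct Pair<T: Clone, U: Clone> {
    first: T,
    second: U,
}

// This impl satisfies all three conditions:
//   1. its `T: Clone, U: Clone` bounds only repeat what `Pair`'s
//      declaration already implies;
//   2. its only lifetime is a parameter (`'a`) used exactly once;
//   3. each type parameter (`T`, `U`) is used exactly once.
impl<'a, T: Clone, U: Clone> SomeTrait for &'a Pair<T, U> { }

// Each of these variants breaks exactly one condition:
// impl<T: Default + Clone, U: Clone> SomeTrait for Pair<T, U> { }   // extra bound (condition 1)
// impl<T: Clone, U: Clone> SomeTrait for &'static Pair<T, U> { }    // fixed lifetime (condition 2)
// impl<T: Clone> SomeTrait for Pair<T, T> { }                       // repeated parameter (condition 3)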
How can we think about this formally? For each impl, we can create a Chalk goal that is provable if it is always applicable. I'll define this here "by example". Let's consider a variant of the first example we saw:

struct Foo<T: Clone> { }
impl<T: Clone> SomeTrait for Foo<T> { }

As we saw before, this impl is always applicable, because the T: Clone where clause on the impl follows from the implied bounds of Foo<T>. The recipe to transform this into a predicate is that we want to replace each use of a type/region parameter in the input types with a universally quantified type/region (note that two uses of the same type parameter would be replaced with two distinct types). This yields a "skolemized" set of input types T. We then check whether the impl could be applied to T. In the case of our example, that means we would be trying to prove something like this:

// For each *use* of a type parameter or region in
// the input types, we add a 'forall' variable here.
// In this example, the only spot is `Foo<_>`, so we
// have one:
forall<A> {
    // We can assume that each of the input types (using those
    // forall variables) are well-formed:
    if (WellFormed(Foo<A>)) {
        // Now we have to see if the impl matches. To start,
        // we create existential variables for each of the
        // impl's generic parameters:
        exists<T> {
            // The types in the impl header must be equal...
            Foo<T> = Foo<A>,
            // ...and the where clauses on the impl must be provable.
            T: Clone,
        }
    }
}

Clearly, this is provable: we infer that T = A, and then we can prove that A: Clone because it follows from WellFormed(Foo<A>). Now if we look at the second example, which added T: Copy to the impl, we can see why we get an error. Here was the example:

struct Foo<T: Clone> { }

impl<T: Copy> SomeTrait for Foo<T> {
    // ^^^^^^^ new bound not declared on `Foo`,
    // hence *not* always applicable
}

That example results in a query like:

forall<A> {
    if (WellFormed(Foo<A>)) {
        exists<T> {
            Foo<T> = Foo<A>,
            T: Copy,
        }
    }
}

In this case, we fail to prove T: Copy, because it does not follow from WellFormed(Foo<A>).

As one last example, let's look at the impl that repeats a type parameter:

impl<T> SomeTrait for (T, T) {
    // Not always applicable
}

The query that will result follows; what is interesting here is that the type (T, T) results in two forall variables, because it has two distinct uses of a type parameter (it just happens to be one parameter used twice):

forall<A, B> {
    if (WellFormed((A, B))) {
        exists<T> {
            (T, T) = (A, B)
        }
    }
}

This cannot be proven: there is no single T that can be equal to both A and B.

What is accepted?

What this rule primarily does is allow you to specialize blanket impls with concrete types. For example, we currently have a From impl that says any type T can be converted to itself:

impl<T> From<T> for T { .. }

It would be nice to be able to define an impl that allows a value of the never type ! to be converted into any type (since such a value cannot exist in practice):

impl<T> From<!> for T { .. }

However, this impl overlaps with the reflexive impl. Therefore, we'd like to be able to provide an intersection impl defining what happens when you convert ! to ! specifically:

impl From<!> for ! { .. }

All of these impls would be legal in this proposal.
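For intuition on why a blanket `From<!>` impl is even possible to write, here is a tiny sketch (nightly-only because of the unstable never_type feature; the `absurd` helper is my own name, not a std API). Converting `!` into any type is vacuous, because no value of type `!` can ever be produced.

#![feature(never_type)]

// The empty match is exhaustive for `!`: there are no cases to cover.
// This is essentially the body that `impl<T> From<!> for T` would have;
// the obstacle is purely coherence with the reflexive impl, which is
// what the intersection impl resolves.
fn absurd<T>(never: !) -> T {
    match never {}
}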
Extension: Refining always applicable impls to consider the base impl

While it accepts some things, the always applicable rule can also be quite restrictive. For example, consider this pair of impls:

// Base impl:
impl<T> SomeTrait for T where T: 'static { }

// Specializing impl:
impl SomeTrait for &'static str { }

Here, the second impl wants to specialize the first, but it is not always applicable, because it specifies the 'static lifetime. And yet, it feels like this should be ok, since the base impl only applies to 'static things. We can make this notion more formal by expanding the property to say that the specializing impl C must be always applicable with respect to the base impls. In this extended version of the predicate, the impl C is allowed to rely not only on the implied bounds, but on the bounds that appear in the base impl(s). So, the impls above might result in a Chalk predicate like:

// One use of a lifetime in the specializing impl (`'static`),
// so we introduce one 'forall' lifetime:
forall<'a> {
    // Assuming the base impl applies:
    if (exists<T> { T = &'a str, T: 'static }) {
        // We have to prove that the
        // specialized impl's types can unify:
        &'a str = &'static str
    }
}

As it happens, the compiler today has logic that would let us deduce that, because we know that &'a str: 'static, then we know that 'a = 'static, and hence we could solve this clause successfully. This rule also allows us to accept some cases where type parameters are repeated, though we'd have to upgrade chalk's capability to let it prove those predicates fully. Consider this pair of impls from RFC 1210:

// Base impl:
impl<E, T> Extend<E, T> for Vec<E> where T: IntoIterator<Item=E> { .. }

// Specializing impl:
impl<'a, E> Extend<E, &'a [E]> for Vec<E> { .. }
//          ^      ^        ^ E repeated three times!

Here the specializing impl repeats the type parameter E three times! However, looking at the base impl, we can see that all of those repeats follow from the conditions on the base impl. The resulting chalk predicate would be:

// The fully general form of the specializing impl is
// > impl<'b, A, B, C> Extend<A, &'b [B]> for Vec<C>
forall<'b, A, B, C> {
    // Assuming the base impl applies:
    if (exists<E, T> { E = A, T = &'b [B], Vec<E> = Vec<C>, T: IntoIterator<Item = E> }) {
        // Can we prove the specializing impl unifications?
        exists<'a, E> {
            E = A,
            &'a [E] = &'b [B],
            Vec<E> = Vec<C>,
        }
    }
}

This predicate should be provable – but there is a definite catch. At the moment, these kinds of predicates fall outside the "Hereditary Harrop" (HH) predicates that Chalk can handle. HH predicates do not permit existential quantification and equality predicates as hypotheses (i.e., in an if (C) { ... }). I can however imagine some quick-n-dirty extensions that would cover these particular cases, and of course there are more powerful proving techniques out there that we could tinker with (though I might prefer to avoid that).

Extension: Reverse implied bounds rules

While the previous examples ought to be provable, there are some other cases that won't work out without some further extension to Rust. Consider this pair of impls:

impl<T> Foo for T where T: Clone { }
impl<T> Foo for Vec<T> where T: Clone { }

Can we consider this second impl to be always applicable relative to the first? Effectively this boils down to asking whether knowing Vec<T>: Clone allows us to deduce that T: Clone – and right now, we can't know that. The problem is that the impls we have only go one way. That is, given the following impl:

impl<T> Clone for Vec<T> where T: Clone { .. }

we get a program clause like

forall<T> { (Vec<T>: Clone) :- (T: Clone) }

but we need the reverse:

forall<T> { (T: Clone) :- (Vec<T>: Clone) }

This is basically an extension of implied bounds; but we'd have to be careful. If we just create those reverse rules for every impl, then it would mean that removing a bound from an impl is a breaking change, and that'd be a shame. We could address this in a few ways. The most obvious is that we might permit people to annotate impls indicating that they represent minimal conditions (i.e., that removing a bound is a breaking change). Alternatively, I feel like there is some sort of feature "waiting" out there that lets us make richer promises about what sorts of trait impls we might write in the future: this would be helpful also to coherence, since knowing what impls will not be written lets us permit more things in downstream crates. (For example, it'd be useful to know that Vec<T> will never be Copy.)

Extension: Designating traits as "specialization predicates"

However, even when we consider the base impl, and even if we have some solution to reverse rules, we still can't cover the use case of having "overlapping blanket impls", like these two:

impl<T> Skip for T where T: Read { .. }
impl<T> Skip for T where T: Read + Seek { .. }

Here we have a trait Skip that (presumably) lets us skip forward in a file. We can supply one default implementation that works for any reader, but it's inefficient: it would just read and discard N bytes. It'd be nice if we could provide a more efficient version for those readers that implement Seek.
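As a concrete, if hedged, sketch of what those two overlapping blanket impls could look like: the Skip method name and both bodies below are my own invention, and the pair assumes the unstable specialization feature with a default method. This is the shape of impl pair that the rest of this section is trying to make legal.

#![feature(specialization)] // unstable; its soundness caveats are the topic of this post

use std::io::{self, Read, Seek, SeekFrom};

trait Skip {
    fn skip(&mut self, n: u64) -> io::Result<()>;
}

// Base impl: any reader can skip by reading and discarding `n` bytes.
impl<T: Read> Skip for T {
    default fn skip(&mut self, n: u64) -> io::Result<()> {
        io::copy(&mut self.by_ref().take(n), &mut io::sink())?;
        Ok(())
    }
}

// Overlapping blanket impl: seekable readers can jump forward directly.
// On its own this impl is not always applicable, which is what motivates
// the #[specialization_predicate] idea discussed next.
impl<T: Read + Seek> Skip for T {
    fn skip(&mut self, n: u64) -> io::Result<()> {
        self.seek(SeekFrom::Current(n as i64))?;
        Ok(())
    }
}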
Unfortunately, this second impl is not always applicable with respect to the first impl – it adds a new requirement, T: Seek, that does not follow from the bounds on the first impl nor from the implied bounds. You might wonder why this is problematic in the first place. The danger is that some other crate might have an impl for Seek that places lifetime constraints, such as:

impl Seek for &'static Foo { }

Now at code generation time, we won't be able to tell if that impl applies, since we'll have erased the precise region. However, what we could do is allow the Seek trait to be designated as a specialization predicate (perhaps with an attribute like #[specialization_predicate]). Traits marked as specialization predicates would be limited so that every one of their impls must be always applicable (our original predicate). This basically means that, e.g., a "reader" cannot conditionally implement Seek – it has to be always seekable, or never. When determining whether an impl is always applicable, we can ignore where clauses that pertain to #[specialization_predicate] traits. Adding a #[specialization_predicate] attribute to an existing trait would be a breaking change; removing it would be one too. However, it would be possible to take existing traits and add "specialization predicate" subtraits. For example, if the Seek trait already existed, we might do this:

impl<T> Skip for T where T: Read { .. }
impl<T> Skip for T where T: Read + UnconditionalSeek { .. }

#[specialization_predicate]
trait UnconditionalSeek: Seek {
    fn seek_predicate(&self, n: usize) {
        self.seek(n);
    }
}

Now streams that implement seek unconditionally (probably all of them) can add impl UnconditionalSeek for MyStream { } and get the optimization. Not as automatic as we might like, but it could be worse.

Default impls need not be always applicable

This last example illustrates an interesting point. RFC 1210 described not only specialization but also a more flexible form of defaults that goes beyond default methods in trait definitions. The idea was that you can define lots of defaults using a default impl. So the UnconditionalSeek trait at the end of the last section might also have been expressed:

#[specialization_predicate]
trait UnconditionalSeek: Seek { }

default impl<T: Seek> UnconditionalSeek for T {
    fn seek_predicate(&self, n: usize) {
        self.seek(n);
    }
}

The interesting thing about default impls is that they are not (yet) a full impl. They only represent default methods that real impls can draw upon, but users still have to write a real impl somewhere. This means that they can be exempt from the rules about being always applicable – those rules will be enforced at the real impl point. Note for example that the default impl above is not always applicable, as it depends on Seek, which is not an implied bound anywhere.

Conclusion

I've presented a refinement of specialization in which we impose one extra condition on the specializing impl: not only must it be a subset of the base impl(s) that it specializes, it must be always applicable, which means basically that if we are given a set of types T where we know:
- the base impl was proven by the type checker to apply to T,
- the types T were proven by the type checker to be well-formed, and
- the specialized impl unifies with the lifetime-erased versions of T,
then we know that the specialized impl applies. The beauty of this approach compared with past approaches is that it preserves the existing role of the type checker and the code generator.
As today in Rust, the type checker always knows the full region details, but the code generator can just ignore them, and still be assured that all region data will be valid when it is accessed. This implies for example that we don't need to impose the restrictions that aturon discussed in their blog post: we can allow specialized associated types to be resolved in full by the type checker as long as they are not marked default, because there is no danger that the type checker and trans will come to different conclusions.

Thoughts?

I've opened an internals thread on this post. I'd love to hear whether you see a problem with this approach. I'd also like to hear about use cases that you have for specialization that you think may not fit into this approach.

Footnotes

1. We don't say it so much anymore, but in the olden days of Rust, the phrase "max min" was very "en vogue"; I think we picked it up from some ES6 proposals about the class syntax. ↩
2. Note: an impl is said to specialize itself. ↩
3. Let me give a shout out here to scalexm, who recently emerged with an elegant solution for how to model implied bounds in Chalk. ↩
[Less]
Posted 9 days ago by Air Mozilla
- Matthew Fornaciari from Gremlin talking about a version of Chaos Monkey in Rust
- George Morgan from Flipper talking about their embedded Rust...
Posted 9 days ago by Karl Dubost
Yet another webcompat issue with characters being cut off at the bottom; it joins the others, such as cross characters not centered in a rounded box, and many more cases. What about it?

The sans-serif issue

All of these have the same ... [More] pattern. They rely on intrinsic font features to get the right design. So… this morning brought another case like this. Take this very simple CSS rule:

.gsc-control-cse, .gsc-control-cse .gsc-table-result {
    width: 100%;
    font-family: Arial, sans-serif;
    font-size: 13px;
}

Nothing fancy about it. It includes Arial, a widely used font, and it gives a sans-serif fallback. It seems to be a sound and fail-safe choice. Well… meet the land of mobile, where your font declaration isn't so reliable anymore. Mobile browsers have different default fonts on Android, so sans-serif doesn't mean the same thing in all browsers on the same OS. For example, for sans-serif and Western languages, Chrome uses Roboto and Firefox uses Clear Sans. If you use Chinese or Japanese characters, the defaults will be different again.

Fix The Users' Woes On Mobile

Why does this happen so often? Same story: the web developers didn't have the time or budget to test on all browsers. They probably tested on Chrome and Safari (iOS) and decided to take a pass on Firefox Android. And because fonts have different features, they do not behave the same with respect to line-height, box sizes and so on. Clear Sans and Roboto are different enough that it creates breakage on some sites. If you test only on Chrome Android (you should not), but let's say we have reached the shores of Friday… and it's time to deploy at 5pm, this is your fix:

.gsc-control-cse, .gsc-control-cse .gsc-table-result {
    width: 100%;
    font-family: Arial, Roboto, sans-serif;
    font-size: 13px;
}

Name the fonts available on the mobile OSes you expect the design to work on. It's still not universal and will not make the layout reliable in every case, but it will cover a lot of them. It will also make your Firefox Android users less grumpy, and your Mondays will be brighter. Otsukare! [Less]