
News

Posted almost 5 years ago
Using a computer is mostly about executing apps, reading, writing, and doing. But it can also be about not doing. Confusing? Bear with me.

Imagine for a second that you are in an elementary school. The leadership is optimistic about exposing students to technology. They have set up big rooms with rows and rows of computers ready for their students to use. Would you give complete permissions to these teenagers using the computers? Would you allow them to install and uninstall programs as they wish, access any website they feel like, and use the machines for as much time as they want?

An elementary question

This is an intriguing situation to think about. At the same time that we want to restrict what the user can do — in this case, the student — we still want them to be able to get stuff done.

Thanks to Cassidy Blaede from the elementary OS team, this school-wants-to-limit-students situation was brought to the roundtable during the Parental Controls & Metered Data hackfest that happened during the second half of April. The event itself was already covered extensively by Cassidy, and by Philip’s and Iain’s daily reports, so that is not going to be the focus of this article. I’m personally more interested in the design and technological aspects of what was discussed.

Apps or not apps

Naturally, when talking about user restriction, apps are probably the very first aspect that comes to our minds, followed by browsers. There is not a lot of controversy in assuming that administrators are likely to be the “supervisors”, and users without administrator privileges are the “supervised”.

At first, the use case we thought we were dealing with was mostly about guardians or parents restricting what kind of content, applications, or websites their kids would be able to access. This is natural on systems that are heavily app-based. Smartphones have interesting implementations that are not only sane, but also technically feasible. But the traditional way of distributing apps via distro packages, and executing them without restrictions, makes that task incredibly hard; impossible, dare I say. Unfortunately, the Linux desktop in general is not quite there yet.

Flatpak to the rescue!

Implementing app restrictions generically may be close to impossible on the traditional desktop, but it is very much feasible with newer technologies such as Flatpak. If you have an OSTree-based immutable filesystem where all user-managed apps are installed via Flatpak, such as Endless OS or Silverblue, app restrictions suddenly become much more realistic. In fact, we at Endless have already implemented something like that, even if a bit rudimentary.

Endless OS implementation of app restrictions

For the purpose of continuing the hackfest discussions, Allan managed to do a quick sketch about it (disclaimer: this is nowhere near being final). An interesting aspect of the mockups is the “Supervisor Password”: a password that does not give full root permissions, and yet allows whoever is managing the system to assume the role of supervisor. Think of a teacher, without the administrator password, but still able to supervise students’ accounts.

But browsers…

Restricting websites, however, is remarkably tricky to accomplish. It is essentially impossible to whitelist websites, at least when implementing it outside the browsers themselves. In general, blacklisting is easier(ish) than whitelisting because we don’t have to deal with the case of websites accessing content outside their domains.
I’m pretty sure this is a reasonably well-studied field, though, and I simply lack the pre-existing knowledge.

What about me?

While discussing those aspects of restricting users, it quickly became clear that applying restrictions to yourself is perfectly valid. It is actually branded as “Digital Wellbeing” and similar on various mobile OSes. Those who work in front of computers and have more control over their schedule will appreciate being able to restrict themselves and improve their health, focus, and productivity.

More mockups (and again, quick sketches, not final)

It is also interesting to see per-app usage. It is not clear yet how a potential new Focus mode would interact with notifications.

Metrics

You might notice that those mockups require some sort of metrics system that would monitor which app the user is using. This could range from a simple event logger to a sophisticated daemon collecting much more. Nothing is settled on this front; no decision was made yet. Of course, privacy concerns were taken into account. There is absolutely no room for sharing this data; all data would be stored locally.

My attendance at the Parental Controls & Metered Data hackfest was proudly sponsored by the GNOME Foundation.
Posted almost 5 years ago
I spent a lot of time making Ducktype into a lightweight syntax that I would really enjoy using. I had a list of design goals, and I feel like I hit them pretty well. In this post, I want to outline some of the things I hope you’ll love about the syntax.

1. Ducktype has a spec

I know not everybody nerds out over reading a spec the way I do. But whether or not you like reading them, specs are important for setting expectations and ensuring interoperable implementations. Probably my biggest gripe with Markdown is that there are just so many different flavors. Couple that with the multiple wiki flavors I regularly deal with, and my days become a game of guess-and-check. Even in languages that haven’t seen flavor proliferation, you can still run into issues with new versions of the language. In a sense, every version of a parser that adds features creates a new flavor. Without a way to specify the version or do any sort of introspection or conditionals, you can run into situations where what you type is silently formatted very differently on different systems. In Ducktype, you can declare the language version and any extensions (more on those later) right at the top of the file.

@ducktype/1.0

= Header

This is a paragraph.

2. Ducktype keeps semantics

I’ve been a strong proponent of semantic markup for a long time, since I started working with LaTeX over 20 years ago. Presentational markup has short-term gains in ease of use, but the long-term benefits of semantic markup are clear. Not only can you more easily adapt presentation over time, but you can do a lot with semantic markup other than display it. Semantic markup is real data.

Different lightweight languages support semantic markup to different degrees. Markdown is entirely presentational, and if you want to do anything even remotely sophisticated, you have to break out to HTML+CSS. AsciiDoc and reStructuredText both give you some semantic markup at the block level, and give you the tools to create inline semantic markup, but out of the box they still encourage doing semantic-free bold, italic, and monospace. My goal was to completely capture the semantics of Mallard with less markup, not to just create yet another syntax. Ducktype does that. What I found along the way is that it can fairly nicely capture the semantics of other formats like DocBook too, but that’s a story for another day.

Ducktype uses a square bracket syntax to introduce semantic block elements, inspired by the group syntax in key files. So if you want a note, you type:

[note]
This text is an implicit paragraph in the note.

If you want a plain old bullet list, you type it just like in any other lightweight syntax:

* First item
* Second item
* Third item

But if you want a semantic steps list, you add the block declaration:

[steps]
* First step
* Second step
* Third step

Ducktype keeps the semantics in inline markup too. Rather than try to introduce special characters for each semantic inline element, Ducktype just lets you use the element name, but with a syntax that’s much less verbose than XML. For example:

Click $gui(File).
Call the $code(frobnicate) function.
Name the file $file(index.duck).

Both AsciiDoc and reStructuredText let you do something similar, although sometimes you have to cook up the elements yourself. With Ducktype, you just automatically get everything Mallard can do. You can even include attributes:

Press the $key[xref=superkey](Super) key.
Call the $code[style=function](frobnicate) function.
3. Ducktype keeps metadata

In designing Ducktype, it was absolutely critical that we have a consistent way to provide all metadata. Not only is metadata important for the maintenance of large document sets, but linking metadata is how documents get structure in Mallard. Without metadata, you can’t really write more than a single page. You can do very good page-level metadata in both AsciiDoc and reStructuredText, but for Mallard we need to do metadata at the section and even block levels as well. Markdown, once again, is the poor format here with no support for any metadata at all. Some tools support a YAML header in Markdown, which would be pretty great if it were any sort of standard.

Ducktype uses a single, consistent syntax for metadata, always referencing the element name, and coming after the page header, section header, or block declaration that it’s providing info for.

= My First Topic
@link[type=guide xref=index]
@desc This is the first topic page I ever wrote.

== My First Section
@keywords first section, initial section

Do you like my first topic page with a section?

We can even do this with block elements:

[listing]
  @credit
    @name Shaun McCance
  . A code listing by Shaun
  [code]
    here_is_some_code()

There’s a lot going on in that little example, from nesting to shorthand block titles. The important part is that we can provide metadata for the code listing. (Side note: I thought a lot about a shorthand for credits, and I have ideas on an extension to provide one. But in the end, I decided that the core syntax should have less magic. Explicit is better than implicit.)

4. Ducktype allows nesting

I do quite a bit of semi-manual format conversion fairly frequently. It’s not something I enjoy, and it’s not the best use of my time, but it just keeps having to be done, and I’m kind of good at it. If you do that a lot, you’ll often run into cases where something just can’t be represented in another format, and that usually comes down to nesting. Can you put lists inside table cells? Can you put tables inside list items? Can lists nest? Can list items have multiple paragraphs?

Both reStructuredText and (shockingly) most flavors of Markdown allow some amount of nesting using indentation. This sometimes breaks down when tables or code blocks are involved, but it’s not terrible. This is the one place where AsciiDoc is my least favorite. In fact, this is my single least favorite thing about AsciiDoc. It uses a plus on its own line to indicate a continuation. It’s hard to visualize, and it has severely limited functionality.

Ducktype very explicitly allows nesting with indentation, whether you’re nesting lists, tables, code blocks, notes, or anything else. It allows you to skip some indentation in some common cases, but you can always add indentation to stay in a block. Ducktype specifies the exact nesting behavior.

[note style=important]
  This is a very important note about these three things:
  * The first thing is hard to explain.

    It takes two paragraphs.
  * The second thing can really be broken in two:
    * First subthing
    * Second subthing
  * The third thing involves some code:
    [code]
    here_is_some_code()

You can use any number of spaces you like. I prefer two, in part because it fits in with the shorthand syntax for things like lists. Using indentation and block declarations, there is absolutely no limit on what or how deeply you can nest.

5. Ducktype tables are sane

I have never met a lightweight table syntax I liked.
Trying to represent real tables with some sort of pseudo-ascii-art works ok for very simple tables, but it falls apart real fast with anything remotely non-trivial. Add to this the extra syntax required for all the non-trivial stuff, and a substantial portion of your parser is devoted to tables. As a user, I have a hard time keeping all those workarounds in my head. As I said above, the ability to nest things was very important, so most of these table syntaxes were just a non-starter. How do you add extra blocks in a table cell when the whole table row is expected to fit on one line?

In the end, I decided not to have a table syntax. Or more accurately, to treat tables as a sort of list. Without any special treatment, tables, rows, and columns can already be written in Ducktype using the block declarations you’ve already seen:

[table]
  [tr]
    [td]
      One
    [td]
      Two
  [tr]
    [td]
      Three
    [td]
      Four

There’s no magic happening here. This is just standard block nesting. But it’s pretty verbose; not as verbose as XML, but still. Can’t we make it a bit easier, like we do for lists? Well yes, we make it easier exactly like we do for lists.

[table]
[tr]
* One
* Two
[tr]
* Three
* Four

There are a few things going on here. Most importantly, the asterisk is shorthand for a table cell, not a list item. What that shorthand does depends on context. But also, we’ve removed quite a lot of indentation. Earlier I wrote that there are some special rules that allow you to trim indentation in certain common cases. This is one of them.

Using a vertical table syntax means that you can do the same kind of nesting you can do with other elements. Here’s a list in a table cell:

[table]
[tr]
* Here's a list:
  * First
  * Second
* Another table cell

Ducktype isn’t the first format to allow a vertical table syntax. MediaWiki allows tables to be written vertically, and it works really well. But Ducktype is the only one, to my knowledge, not to introduce any new syntactical constructs at all. I have a hard time keeping all those dots and dashes in my head. Using a single, consistent syntax makes it easier to remember, and it reduces the number of special rules you need.

6. Ducktype supports extensions

I’ve mentioned extensions a couple of times already in this post, but I haven’t elaborated on them. In addition to being able to use Mallard extensions (which you can do without any extra syntax), I wanted a way to add and experiment with syntax without shoving everything in the core. And, importantly, I wanted pages to have to declare what extensions they use, so we don’t have a guess-and-check mess of incompatible flavors. I live in a world where files are frequently cherry-picked and pushed through different tool chains. Being explicit is important to me. So in Ducktype, you declare your extensions at the top, just like you do with the version of Ducktype you’re using:

@ducktype/1.0 if/experimental

= Page with Experimental Extension

What kinds of things can an extension do? Well, according to the spec, extensions can do literally anything. Realistically, what extensions can do depends on the extension points in the implementation. The reference implementation has two rough classes of extensions. There are the methods in the ParserExtension class, which let you plug in at the line level to affect how things are parsed. And there’s the NodeFactory, which lets you affect how parsed things are interpreted. All of the ParserExtension methods are implemented and pretty well commented in the _test extension.
One interesting use of this is the experimental csv extension. Remember how tables are always written vertically with a list-like syntax? That’s very powerful, but sometimes you really do just want something super simple. The csv extension lets you do simple comma-separated tables, like this:

@ducktype/1.0 csv/experimental

= CSV Table Example

[csv:table]
one,two,three
four,five,six
seven,eight,nine

That’s in about 40 lines of extension code. Another example of line parser extensions is a special syntax for conditionals, which I blogged about before. The NodeFactory extension point, on the other hand, lets you control what the various bits of special syntax do. If you want to use Ducktype to write DocBook, you would use DocBook element names in explicit block declarations and inline notation, but what about the elements that get implicitly created for paragraphs, headers, list items, etc.? That’s what NodeFactory does.

Et cetera

That’s six things to love about Ducktype. I’ve been really happy with Mallard’s unique approach to structuring docs. I hope Ducktype can make the power of Mallard available without all the extra typing that comes along with XML. If you want to learn more, get in touch.
Posted almost 5 years ago
This past Monday, Carlos J. Vives and I gave a talk about Creative Commons and open content at a local education center:

“At 13:00 today, a little talk at @iesalbaida about free licenses and @creativecommons, one more activity of the @ccalmfestival 2019 pic.twitter.com/30rDjpvxIX” — Ismael Olea (@olea) April 30, 2019

The talk is part of the ccALM Almería Creative Commons Festival. The only goal of this entry is to collect some links to IP registration services useful for any digital creator on the Internet, particularly for open culture works. As far as I remember:

Free Copyright Registration Online: https://www.freecopyrightregistration.com/
Safe Creative: https://www.safecreative.org/
The Internet Archive Wayback Machine: https://web.archive.org/
Archive Today: http://archive.today/

The Creative Commons wiki is supposed to host a list of registries, but for some reason it’s empty: https://wiki.creativecommons.org/wiki/Content_Registries

In the past there was the Digital Media Rights service, but it seems broken now: http://dmrights.com/

Limited to Spain, there are two publicly managed services: the Registro de propiedad intelectual, managed by the ministry responsible for culture (which usually changes with each government), and the Depósito legal, managed by the Biblioteca Nacional de España as far as I know.

Some of these services are meant to be used as a legal resource in case of litigation. Others are just a historical record for websites. If you want to use any of them, study carefully their features and the advantages relevant to your interest.
Posted almost 5 years ago
Today is a very special day for me. On my very first try, I cracked Google Summer of Code. I am delighted to have been given the opportunity to work for the GNOME Foundation. My task is to rebuild the GTK website.

For those interested in the technicalities of the project: the current website is made in PHP, which is a great web language but not so useful for creating static websites. So my job is to build a new website from scratch that uses the concept of a content management system. I will be using Jekyll for this purpose, and the website will be deployed using GitLab’s Continuous Integration. It’s going to be a challenging summer, but I am really happy to have been handed this opportunity. This summer is going to enhance my knowledge of web design and its future.

A huge thanks to all those people from GNOME who selected me for this job. Emmanuele Bassi will be my mentor for this summer, and I am very happy as I will learn a lot of new things from him. For future GSoCers, here is my proposal for the project: Rework the GTK website.

If you need any kind of help or want me to handle your next project, you can reach me at my email or DM me on Instagram. See you guys in the next post.
Posted almost 5 years ago
Recently I've been working on the EOS update. The change is really big because EOS carries some modifications on top of the upstream code, so there are a lot of commits applied after the last stable upstream commit. EOS 3.5 was based on GNOME 3.26 and we are updating to GNOME 3.32, so all the downstream changes done since the last release have to be rebased on top of the new code, and during this process we refactor the code and commits using new tools and drop what is now in GNOME upstream.

I've been working on the Hack computer's custom functionality that sits on top of the EOS desktop: basically, I've been rebasing the code in the shell that implements Flip to Hack, the Clubhouse (a side component and notification override to propose Hack quests), and the wobbly windows effect. I've been working mainly with GNOME Shell, updating JavaScript code to the new gjs version, but this was a big change and it came with some rare bugs.

Here comes the blindfolded debugging. I have a development system with a lot of changes and functionality in GNOME Shell that depends on different projects, applications, and technologies. I don't know the code very well, because I'm not the one who wrote all of this; I've been working here since February, so I know a bit about how things work, but I'm not the one who knows everything, and there are a lot of rare cases that I don't know about. During this process I've found and fixed several bugs in different projects that I don't know much about. How can I do that?

If you are a developer who has been writing code for a few years, maybe you've experienced something similar. I'm calling this the blindfolded debugging technique: start changing code without knowing exactly what you're doing, but with a small feeling that maybe that line is the problem. This technique is only for experienced programmers; like a kung fu master who puts on a blindfold to fight an opponent, the developer brave enough to try this should have a lot of experience or they will fail.

You're an experienced developer, but you don't know the software that you're debugging. It doesn't matter. The same way that in a kung fu fight you don't know your opponent, but almost every fighter has two arms and two legs, so more or less you know that he'll try to punch you or kick you; you have a clue. As a programmer, every piece of software has a structure, functions, loops... No matter who wrote it or how old that code is, if you're an experienced programmer you'll feel the code, and without knowing exactly why or how, you will be able to look at one line and say: here you are, little buggy friend.

Maybe I'm only trying to justify my luck with a story, to feel better and say that I'm not a dummy changing random lines to try to find a bug. But I'm sure that other developers have this feeling too, that feeling that guides you to the exact problem but that you're not able to rationalize. I think that this is experience: the expertise is your body and your inner brain doing the work while you don't need to think about it or know exactly what you are doing.
Posted almost 5 years ago
Evaluating Survival Models

The most frequently used evaluation metric of survival models is the concordance index (c index, c statistic). It is a measure of rank correlation between predicted risk scores $\hat{f}$ and observed time points $y$ that is closely related to Kendall’s τ. It is defined as the ratio of correctly ordered (concordant) pairs to comparable pairs. Two samples $i$ and $j$ are comparable if the sample with lower observed time $y$ experienced an event, i.e., if $y_j > y_i$ and $\delta_i = 1$, where $\delta_i$ is a binary event indicator. A comparable pair $(i, j)$ is concordant if the estimated risk $\hat{f}$ by a survival model is higher for subjects with lower survival time, i.e., $\hat{f}_i > \hat{f}_j \land y_j > y_i$; otherwise the pair is discordant.

Harrell's estimator of the c index is implemented in concordance_index_censored. While Harrell's concordance index is easy to interpret and compute, it has some shortcomings:

1. it has been shown that it is too optimistic with increasing amount of censoring [1],
2. it is not a useful measure of performance if a specific time range is of primary interest (e.g. predicting death within 2 years).

Since version 0.8, scikit-survival supports an alternative estimator of the concordance index from right-censored survival data, implemented in concordance_index_ipcw, which addresses the first issue. The second point can be addressed by extending the well-known receiver operating characteristic curve (ROC curve) to possibly censored survival times. Given a time point $t$, we can estimate how well a predictive model can distinguish subjects who will experience an event by time $t$ (sensitivity) from those who will not (specificity). The function cumulative_dynamic_auc implements an estimator of the cumulative/dynamic area under the ROC for a given list of time points.

The first part of this post will illustrate the first issue with simulated survival data, while the second part will focus on the time-dependent area under the ROC applied to data from a real study. To see the full source code for producing the figures in this post, please see this notebook.

Bias of Harrell's Concordance Index

Harrell's concordance index is known to be biased upwards if the amount of censoring in the test data is high [1]. Uno et al. proposed an alternative estimator of the concordance index that behaves better in such situations. In this section, we are going to apply concordance_index_censored and concordance_index_ipcw to synthetic survival data and compare their results.

Simulation Study

We are generating a synthetic biomarker by sampling from a standard normal distribution. For a given hazard ratio, we compute the associated (actual) survival time by drawing from an exponential distribution. The censoring times were generated from a uniform independent distribution $\textrm{Uniform}(0,\gamma)$, where we choose $\gamma$ to produce different amounts of censoring. Since Uno's estimator is based on inverse probability of censoring weighting, we need to estimate the probability of being censored at a given time point. This probability needs to be non-zero for all observed time points. Therefore, we restrict the test data to all samples with observed time lower than the maximum event time $\tau$. Usually, one would use the tau argument of concordance_index_ipcw for this, but we apply the selection beforehand to pass identical inputs to concordance_index_censored and concordance_index_ipcw. The estimates of the concordance index are therefore restricted to the interval $[0, \tau]$.
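For reference, the call pattern of the two estimators looks roughly like this. This is only a minimal sketch with made-up numbers, not the actual simulation code (which lives in the linked notebook):

import numpy as np
from sksurv.util import Surv
from sksurv.metrics import concordance_index_censored, concordance_index_ipcw

# Made-up toy data: event indicator, observed time, and a predicted risk score.
event = np.array([True, False, True, True, False])
time = np.array([12.0, 20.0, 7.0, 15.0, 30.0])
risk_score = np.array([0.8, 0.3, 0.9, 0.5, 0.1])

# Harrell's estimator needs only the test data.
cindex_harrell = concordance_index_censored(event, time, risk_score)[0]

# Uno's estimator also needs training data to estimate the censoring
# distribution, plus an upper time limit tau; here train and test are
# the same toy arrays purely for illustration.
y = Surv.from_arrays(event=event, time=time)
cindex_uno = concordance_index_ipcw(y, y, risk_score, tau=25.0)[0]

print(cindex_harrell, cindex_uno)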
Let us assume a moderate hazard ratio of 2 and generate a small synthetic dataset of 100 samples from which we estimate the concordance index. We repeat this experiment 200 times and plot the mean and standard deviation of the difference between the actual (in the absence of censoring) and estimated concordance index. Since the hazard ratio remains constant and only the amount of censoring changes, we would want an estimator for which the difference between the actual and estimated c remains approximately constant across simulations.

We can observe that estimates are on average below the actual value, except for the highest amount of censoring, where Harrell's c begins overestimating the performance (on average). With such a small dataset, the variance of the differences is quite big, so let us increase the amount of data to 1000 and repeat the simulation. Now we can observe that Harrell's c begins to overestimate performance starting with approximately 49% censoring, while Uno's c is still underestimating the performance but is on average very close to the actual performance for large amounts of censoring. For the final experiment, we double the size of the dataset to 2000 and repeat the analysis. The trend we observed in the previous simulation is now even more pronounced. Harrell's c becomes more and more overconfident in the performance of the synthetic marker with increasing amounts of censoring, while Uno's c remains stable.

In summary, while the difference between concordance_index_ipcw and concordance_index_censored is negligible for small amounts of censoring, when analyzing survival data with moderate to high amounts of censoring you might want to consider estimating the performance using concordance_index_ipcw instead of concordance_index_censored.

Time-dependent Area under the ROC

The area under the receiver operating characteristics curve (ROC curve) is a popular performance measure for binary classification tasks. In the medical domain, it is often used to determine how well estimated risk scores can separate diseased patients (cases) from healthy patients (controls). Given a predicted risk score $\hat{f}$, the ROC curve compares the false positive rate (1 - specificity) against the true positive rate (sensitivity) for each possible value of $\hat{f}$.

When extending the ROC curve to continuous outcomes, in particular survival time, a patient’s disease status is typically not fixed and changes over time: at enrollment a subject is usually healthy, but may be diseased at some later time point. Consequently, sensitivity and specificity become time-dependent measures. Here, we consider cumulative cases and dynamic controls at a given time point $t$, which gives rise to the time-dependent cumulative/dynamic ROC at time $t$. Cumulative cases are all individuals that experienced an event prior to or at time $t$ ($t_i \leq t$), whereas dynamic controls are those with $t_i > t$. By computing the area under the cumulative/dynamic ROC at time $t$, we can determine how well a model can distinguish subjects who fail by a given time ($t_i \leq t$) from subjects who fail after this time ($t_i > t$). Hence, it is most relevant if one wants to predict the occurrence of an event in a period up to time $t$ rather than at a specific time point $t$. The cumulative_dynamic_auc function implements an estimator of the cumulative/dynamic area under the ROC at a given list of time points.
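Its call shape mirrors that of concordance_index_ipcw: training data for the censoring estimate, test data, a risk score, and a grid of time points. A minimal sketch with made-up numbers (the real analysis follows below):

import numpy as np
from sksurv.util import Surv
from sksurv.metrics import cumulative_dynamic_auc

# Made-up toy data; the structured arrays carry the event indicator and time.
y_train = Surv.from_arrays(event=[True, False, True, True, False],
                           time=[12.0, 20.0, 7.0, 15.0, 30.0])
y_test = Surv.from_arrays(event=[True, True, False, True],
                          time=[6.0, 13.0, 18.0, 22.0])
risk_score = np.array([0.9, 0.6, 0.2, 0.4])

# Evaluate the cumulative/dynamic AUC at a few time points within follow-up.
times = np.array([10.0, 15.0, 20.0])
auc, mean_auc = cumulative_dynamic_auc(y_train, y_test, risk_score, times)
# auc holds one value per entry in times; mean_auc is the average over them.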
To illustrate its use, we are going to use data from a study that investigated to what extent the serum immunoglobulin free light chain (FLC) assay can be used to predict overall survival. The dataset has 7874 subjects and 9 features; the endpoint is death, which occurred for 2169 subjects (27.5%). First, we load the data and split it into train and test sets to evaluate how well markers generalize.

x, y = load_flchain()

(x_train, x_test,
 y_train, y_test) = train_test_split(x, y, test_size=0.2, random_state=0)

Serum creatinine measurements are missing for some patients, therefore we are just going to impute these values with the mean using scikit-learn's SimpleImputer.

num_columns = ['age', 'creatinine', 'kappa', 'lambda']

imputer = SimpleImputer().fit(x_train.loc[:, num_columns])
x_train = imputer.transform(x_train.loc[:, num_columns])
x_test = imputer.transform(x_test.loc[:, num_columns])

Similar to Uno's estimator of the concordance index described above, we need to be a little bit careful when selecting the test data and the time points we want to evaluate the ROC at, due to the estimator's dependence on inverse probability of censoring weighting. First, we are going to check whether the observed time of the test data lies within the observed time range of the training data.

y_events = y_train[y_train['death']]
train_min, train_max = y_events["futime"].min(), y_events["futime"].max()

y_events = y_test[y_test['death']]
test_min, test_max = y_events["futime"].min(), y_events["futime"].max()

assert train_min <= test_min < test_max < train_max, \
    "time range or test data is not within time range of training data."

When choosing the time points to evaluate the ROC at, it is important to remember to choose the last time point such that the probability of being censored after the last time point is non-zero. In the simulation study above, we set the upper bound to the maximum event time; here we use a more conservative approach by setting the upper bound to the 80th percentile of observed time points, because the censoring rate is quite large at 72.5%. Note that this approach would be appropriate for choosing the tau argument of concordance_index_ipcw too.

times = np.percentile(y["futime"], np.linspace(5, 81, 15))
print(times)

[ 470.3        1259.         1998.         2464.82428571 2979.
 3401.         3787.99857143 4051.         4249.         4410.17285714
 4543.         4631.         4695.         4781.         4844.        ]

We begin by considering individual real-valued features as risk scores without actually fitting a survival model. Hence, we obtain an estimate of how well age, creatinine, kappa FLC, and lambda FLC are able to distinguish cases from controls at each time point. The plot shows the estimated area under the time-dependent ROC at each time point and the average across all time points as a dashed line. We can see that age is overall the most discriminative feature, followed by $\kappa$ and $\lambda$ FLC. The fact that age is the strongest predictor of overall survival in the general population is hardly surprising (we have to die at some point, after all). More differences become evident when considering time: the discriminative power of FLC decreases at later time points, while that of age increases. The observation for age again follows common sense. In contrast, FLC seems to be a good predictor of death in the near future, but not so much if it occurs decades later.
Next, we will fit an actual survival model to predict the risk of death from the Veterans' Administration Lung Cancer Trial. After fitting a Cox proportional hazards model, we want to assess how well the model can distinguish survivors from deceased in weekly intervals, up to 6 months after enrollment. The plot shows that the model is doing quite well on average, with an AUC of ~0.82 (dashed line). However, there is a clear difference in performance between the first and second half of the time range. Performance increases up to about 100 days from enrollment, but quickly drops thereafter. Thus, we can conclude that the model is less effective in predicting death past 100 days.

Conclusion

I hope this post helped you to understand some of the pitfalls of estimating the performance of markers and models from right-censored survival data. We illustrated that Harrell's estimator of the concordance index is biased when the amount of censoring is high, and that Uno's estimator is more appropriate in this situation. Finally, we demonstrated that the time-dependent area under the ROC is a very useful tool when we want to predict the occurrence of an event in a period up to time $t$ rather than at a specific time point $t$.
Posted almost 5 years ago
Last month I was in Berlin for the “Design Tools Hackfest 2019”. This whole thing started during last year’s GUADEC in Almería, when I started playing around with cairo and librsvg. Because of that I got pulled into some discussions around improving tooling for GNOME designers, especially around icons.

During this hackfest we worked on three main issues:

– How to extract multiple icons contained in a single SVG file as separate SVGs (e.g. the stencil file in the Adwaita icon theme)
– How to generate Nightly hicolor app icons automatically
– How to export optimized icons directly from Icon Preview

The third point comes more or less for free once the first is done, so we focused on the former two.

Gnome Stencils

Splitting the stencils into different files is currently done with a script which uses Inkscape. This is very, very slow because it has to open Inkscape for every single icon. As far as I could find, there are no libraries which would allow us to manipulate SVG files in the way we need. Therefore I’m using cairo and librsvg to generate the files. This approach may sound crazy, but it works quite well. The basic idea is to render an SVG file into a cairo surface with librsvg and then export the surface via cairo as a new SVG which contains only the part we’re interested in (a rough sketch of this idea follows at the end of this post).

Nightly Icons, generated with svago-export

This way we can even use cairo masks to automatically render Nightly icons, and I started integrating this into Zander’s Icon Preview.

Sadly I’m currently quite busy with university, so I didn’t get around to finishing and cleaning up svago-export so far, but if you want to have a look at the experiments I did, feel free to do so. Luckily, the semester will soon be over and I will have more free time \o/

Special thanks to Tobias for hosting the event and thanks to the GNOME Foundation for sponsoring my travel.
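As promised above, here is a rough Python/PyGObject sketch of that render-and-re-export idea. This is not svago-export itself, just a minimal illustration under some assumptions: the stencil file name and the icon id are made up, and it uses the older librsvg render_cairo_sub/get_position_sub API for rendering a single element.

import gi
gi.require_version("Rsvg", "2.0")
from gi.repository import Rsvg
import cairo

ICON_ID = "#org.example.App"   # hypothetical id of one icon in the sheet
SHEET = "icon-stencils.svg"    # hypothetical stencil file with many icons

handle = Rsvg.Handle.new_from_file(SHEET)

# Draw into a fresh SVG surface, which cairo writes out as a new file
# containing only the element we ask librsvg to render.
surface = cairo.SVGSurface("extracted-icon.svg", 128, 128)
cr = cairo.Context(surface)

# Shift the drawing so the extracted element lands at the surface origin.
ok, pos = handle.get_position_sub(ICON_ID)
if ok:
    cr.translate(-pos.x, -pos.y)

handle.render_cairo_sub(cr, ICON_ID)
surface.finish()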
Posted almost 5 years ago
I have recently created a C API for a library dealing with Boot Loader Spec files as well as the GRUB environment file. In the process I have learnt a few things that, coming from a C background, were not obvious to me at all.

Box to control unsafe ownership

Say we have this simple Rust API:

pub struct A {
    counter: u8
}

impl A {
    pub fn new(count: u8) -> A {
        A { counter: count }
    }
}

Let’s start with the new method wrapper:

#[no_mangle]
pub extern "C" fn a_new(count: u8) -> *mut A {
    let boxed_a = Box::new(A { counter: count });
    Box::into_raw(boxed_a)
}

A Box is basically a smart pointer; it allows us to control the lifetime of the data outside of the boundaries of Rust’s borrow checker. Box::into_raw returns a pointer to the allocated A instance. Let’s see how to access that data again:

#[no_mangle]
pub extern "C" fn a_get_counter(a: *mut A) -> u8 {
    let a = unsafe { Box::from_raw(a) };
    let count = a.counter;
    Box::into_raw(a);
    count
}

Box::from_raw is an unsafe method that turns a pointer into an owned Box, which allows us to access the pointer data safely from Rust. Note that the Box is automatically dereferenced. Now we need to give the C user a deallocator for instances of A. This is relatively straightforward: we wrap the object in a Box, and since we don’t call into_raw again, as soon as the Box goes out of scope the inner contents are dropped too:

#[no_mangle]
pub extern "C" fn a_drop(a: *mut A) {
    let a = unsafe { Box::from_raw(a) };
}

Strings

In Rust there are two standard ways to interact with strings: the String type, a dynamic utf-8 string that can be modified and resized, and &str, which basically is a bare pointer to an existing String. It took me a while to realize that internally a String is not null terminated and can contain many null characters. This means that the internal representation of String is not compatible with C strings. To address this, Rust provides another two types and their referenced counterparts:

- OsString and &OsStr: a native string tied to the runtime platform encoding and sizing
- CString and &CStr: a null terminated string

My main grudge with this model is that it creates friction at the C boundary for a couple of reasons:

- Internal Rust APIs often expect String or &str, meaning that at the C API boundary you need to allocate a CString if you use String as the internal representation, or the other way around if you use CString as the internal representation.
- You can’t “transparently” pass ownership of a CString to C without exposing a deallocator specific to CString; more on this in the next section.

This means that, compared to a C implementation of the API, your code will inevitably use more allocations, which might or might not be critical depending on the use case, but this is something that struck me as a drawback for Rustification.

Allocator mismatch

Something else I stumbled upon was that Rust does not use malloc/free, and that mismatch has annoying side effects when you are trying to rustify an API. Say you have this C code:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char* get_name() {
    const char* STATIC_NAME = "John";
    char* name = (char*)malloc(strlen(STATIC_NAME) + 1);
    memcpy(name, STATIC_NAME, strlen(STATIC_NAME) + 1);
    return name;
}

int main () {
    char* name = get_name();
    printf("%s\n", name);
    free(name);
    return 0;
}

Now if you want to rustify that C function, the naive way (taking into account the String vs.
CString stuff I mentioned before) would be to do this:

#[no_mangle]
pub extern "C" fn get_name() -> *mut std::os::raw::c_char {
    const STATIC_NAME: &str = "John";
    let name = std::ffi::CString::new(STATIC_NAME)
        .expect("Multiple null characters in string");
    name.into_raw()
}

But this is not exactly the same as before; note that in the C example we call free() in order to drop the memory. In this case we would have to create a new method that calls CString::from_raw(), but that won’t be compatible with the original C API. This is the best I was able to come up with:

/* You can use the libc crate as well */
extern {
    fn malloc(size: usize) -> *mut u8;
    fn memcpy(dest: *mut u8, src: *const u8, size: usize) -> *mut u8;
}

#[no_mangle]
pub extern "C" fn get_name() -> *mut u8 {
    const STATIC_NAME: &str = "John";
    let name = std::ffi::CString::new(STATIC_NAME)
        .expect("Multiple null characters in string");
    let length = name.as_bytes_with_nul().len();
    let cname = unsafe { malloc(length) };
    unsafe { memcpy(cname, name.as_bytes_with_nul().as_ptr(), length) };
    cname
}

Note that the const &str is just an example; usually the data comes from a String in your Rust API. The problem here is that we allocate an extra CString, then copy its contents using malloc/memcpy, and then drop it immediately. However, later, while working on creating UEFI binaries from Rust, I learned that Rust allows you to override its own allocator and use a custom one or the native system one. This would be another way to achieve the same and save the malloc/memcpy step, but don’t trust me 100% here as I am not sure whether this is entirely safe (if you know, let me know in the comments):

use std::alloc::System;

#[global_allocator]
static GLOBAL: System = System;

#[no_mangle]
pub extern "C" fn get_name() -> *mut u8 {
    const STATIC_NAME: &str = "John";
    let name = std::ffi::CString::new(STATIC_NAME)
        .expect("Multiple null characters in string");
    name.into_raw() as *mut u8
}

Traits as fat pointers

Let’s say we have the following API with two types and a trait implemented by both:

pub struct A {}
pub struct B {}

impl A {
    pub fn new() -> A {
        A {}
    }
}

impl B {
    pub fn new() -> B {
        B {}
    }
}

pub trait T {
    fn get_name(&self) -> std::ffi::CString;
}

impl T for A {
    fn get_name(&self) -> std::ffi::CString {
        std::ffi::CString::new("I am A").expect("CString error")
    }
}

impl T for B {
    fn get_name(&self) -> std::ffi::CString {
        std::ffi::CString::new("I am B").expect("CString error")
    }
}

Now the problem is: if we want a single wrapper for T::get_name(), to avoid having to wrap each trait implementation’s family of functions, what do we do? I banged my head on this, trying to Box a reference to a trait and other things, until I read about this in more detail. Basically, the internal representation of a trait object is a fat pointer (or rather, a struct of two pointers, one to the data and another to the trait vtable).
So we can transmute a reference to a trait object into a C struct of two pointers. The end result for type A would be like this (for B you just need another constructor and cast function):

#[repr(C)]
pub struct CTrait {
    data: *mut std::os::raw::c_void,
    vtable: *mut std::os::raw::c_void
}

#[no_mangle]
pub extern "C" fn a_new() -> *mut A {
    Box::into_raw(Box::new(A::new()))
}

#[no_mangle]
pub extern "C" fn a_drop(a: *mut A) {
    unsafe { Box::from_raw(a) };
}

#[no_mangle]
pub extern "C" fn a_as_t(a: *mut A) -> CTrait {
    let mut boxed_a = unsafe { Box::from_raw(a) };
    let ret: CTrait = {
        let t: &mut dyn T = &mut *boxed_a;
        unsafe { std::mem::transmute::<&mut dyn T, CTrait>(t) }
    };
    Box::into_raw(boxed_a);
    ret
}

#[no_mangle]
pub extern "C" fn t_get_name(t: CTrait) -> *mut u8 {
    let t = unsafe { std::mem::transmute::<CTrait, &mut dyn T>(t) };
    t.get_name().into_raw() as *mut u8
}

The C code to consume this API would look like this:

#include <stdio.h>
#include <stdlib.h>

typedef struct {} A;

typedef struct {
    void* _d;
    void* _v;
} CTrait;

A* a_new();
void a_drop(A* a);
CTrait a_as_t(A* a);
char* t_get_name(CTrait);

int main () {
    A* a = a_new();
    CTrait t = a_as_t(a);
    char* name = t_get_name(t);
    printf("%s\n", name);
    free(name);
    a_drop(a);
    return 0;
}

Error reporting

Another hurdle has been dealing with Result<> in general; however, this is more of a shortcoming of C’s lack of a standard error reporting mechanism. In general I tend to return NULL to C API calls that expect a pointer and let C handle it, but of course data is lost along the way, as the C end has no way to know what exactly went wrong since there is no error type to query. I am tempted to mimic GLib’s error handling. I think that if I was trying to replace an existing C library with its own error reporting, mapping things would become easier.

Conclusions

I am in love with Rust, and its ability to impersonate C is very powerful. However, it is not entirely zero cost: for me, the mismatch between string formats is the biggest hurdle, as it imposes extra allocations, something that could become really expensive when rustifying C code that passes strings back and forth from/to the API caller. The other things I mentioned took me quite some time to realize, and by writing them down here I hope to help other people who are writing Rust code to expose it as a C API. Any feedback on my examples is welcome.