
News

Posted about 3 years ago
Today is Data Privacy Day, which is a good reminder that data privacy is a thing, and you’re in charge of it. The simple truth: your personal data is very … Read more in “Four ways to protect your data privacy and still be online,” which appeared first on The Firefox Frontier.
Posted about 3 years ago
With a new year comes change, and one change we’re glad to see in 2021 is new leadership at the Federal Communications Commission (FCC). On Thursday, Jan. 21, Jessica Rosenworcel, … Read more in “Jessica Rosenworcel’s appointment is good for the internet,” which appeared first on The Firefox Frontier.
Posted over 3 years ago
As more of daily life takes place across internet connections, privacy and security issues become even more important. A VPN — Virtual Private Network — can help anyone create a … Read more in “Think you don’t need a VPN? Here are five times you just might,” which appeared first on The Firefox Frontier.
Posted over 3 years ago
This is part 3 of a deep-dive into the implementation details of Taskcluster’s backend data stores. If you missed the first two, see part 1 and part 2 for the background, as we’ll jump right in here!

Big Data

A few of the tables holding data for Taskcluster contain tens or hundreds of millions of rows. That’s not what the cool kids mean when they say “Big Data”, but it’s big enough that migrations take a long time. Most changes to Postgres tables take a full lock on that table, preventing other operations from occurring while the change takes place. The duration of the operation depends on lots of factors: not just the data already in the table, but also the kind of other operations going on at the same time.

The usual approach is to schedule a system downtime to perform time-consuming database migrations, and that’s just what we did in July. By running it against a clone of the production database, we determined that we could perform the migration completely in six hours. It turned out to take a lot longer than that. Partly, this was because we missed some things when we shut the system down, and left some concurrent operations running on the database. But by the time we realized that things were moving too slowly, we were near the end of our migration window and had to roll back. The time-consuming migration was version 20 - migrate queue_tasks, and it had been estimated to take about 4.5 hours.

When we rolled back, the DB was at version 19, but the code running the Taskcluster services corresponded to version 12. Happily, we had planned for this situation, and the redefined stored functions described in part 2 bridged the gap with no issues.

Patch-Fix

Our options were limited: scheduling another extended outage would have been difficult. We didn’t solve all of the mysteries of the poor performance, either, so we weren’t confident in our prediction of the time required. The path we chose was to perform an “online migration”. I wrote a custom migration script to accomplish this. Let’s look at how that worked.

The goal of the migration was to rewrite the queue_task_entities table into a tasks table, with a few hundred million rows. The idea with the online migration was to create an empty tasks table (a very quick operation), then rewrite the stored functions to write to tasks while reading from both tables. Then a background task can move rows from the queue_task_entities table to the tasks table without blocking concurrent operations. Once the old table is empty, it can be removed and the stored functions rewritten to address only the tasks table.

A few things made this easier than it might have been. Taskcluster’s tasks have a deadline after which they become immutable, typically within one week of the task’s creation. That means that the task mutation functions can change the task in place in whichever table they find it. The background task only moves tasks with deadlines in the past. This eliminates any concerns about data corruption if a row is migrated while it is being modified. A look at the script linked above shows that there were some complicating factors, too – notably, two more tables to manage – but those factors didn’t change the structure of the migration.

With this in place, we ran the replacement migration script, creating the new tables and updating the stored functions. Then a one-off JS script drove migration of post-deadline tasks with a rough ETA calculation. We figured this script would run for about a week, but in fact it was done in just a few days.
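The shape of that background move is worth sketching. The following is a minimal illustration of moving post-deadline rows in batches, not the actual Taskcluster script; the table and column layout (identical columns in queue_task_entities and tasks, keyed by task_id with a deadline column) and the use of psycopg2 are assumptions for the sake of the example.

```python
# Minimal sketch of a batched "move post-deadline rows" loop, under the
# assumptions stated above. Column mapping between the two tables is elided.
import psycopg2

BATCH_SIZE = 1000

def move_one_batch(conn):
    """Move up to BATCH_SIZE post-deadline tasks from the old table to the
    new one in a single transaction, returning the number of rows moved."""
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            WITH moved AS (
                DELETE FROM queue_task_entities
                WHERE task_id IN (
                    SELECT task_id FROM queue_task_entities
                    WHERE deadline < now()
                    LIMIT %s
                )
                RETURNING *
            )
            INSERT INTO tasks SELECT * FROM moved
            """,
            (BATCH_SIZE,),
        )
        return cur.rowcount

def migrate(dsn):
    conn = psycopg2.connect(dsn)
    # Loop until no post-deadline rows remain in the old table.
    while move_one_batch(conn) > 0:
        pass
    conn.close()
```

Because each batch commits independently and only touches immutable (post-deadline) rows, a loop like this can run alongside normal traffic and be stopped and restarted at any time.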
Finally, we cleaned up the temporary functions, leaving the DB in precisely the state that the original migration script would have generated.

Supported Online Migrations

After this experience, we knew we would run into future situations where a “regular” migration would be too slow. Apart from that, we want users to be able to deploy Taskcluster without scheduling downtimes: requiring downtimes will encourage users to stay at old versions, missing features and bugfixes and increasing our maintenance burden. We devised a system to support online migrations in any migration. Its structure is pretty simple: after each migration script is complete, the harness that handles migrations calls a _batch stored function repeatedly until it signals that it is complete. This process can be interrupted and restarted as necessary. The “cleanup” portion (dropping unnecessary tables or columns and updating stored functions) must be performed in a subsequent DB version. The harness is careful to call the previous version’s online-migration function before it starts a version’s upgrade, to ensure it is complete. As with the old “quick” migrations, all of this is also supported in reverse to perform a downgrade.

The _batch functions are passed a state parameter that they can use as a bookmark. For example, a migration of the tasks table might store the last taskId that it migrated in its state. Then each batch can begin with select .. where task_id > last_task_id, allowing Postgres to use the index to quickly find the next task to be migrated. When the _batch function indicates that it processed zero rows, the harness calls an _is_completed function. If this function returns false, then the whole process starts over with an empty state. This is useful for tables where rows were skipped during the migration, such as tasks with deadlines in the future.

Testing

An experienced engineer is, at this point, boggling at the number of ways this could go wrong! There are lots of points at which a migration might fail or be interrupted, and the operators might then begin a downgrade. Perhaps that downgrade is then interrupted, and the migration re-started! A stressful moment like this is the last time anyone wants surprises, but these are precisely the circumstances that are easily forgotten in testing. To address this, and to make such testing easier, we developed a test framework that defines a suite of tests for all manner of circumstances. In each case, it uses callbacks to verify proper functionality at every step of the way. It tests both the “happy path” of a successful migration and the “unhappy paths” involving failed migrations and downgrades.

In Practice

The impetus to actually implement support for online migrations came from some work that Alex Lopez has been doing to change the representation of worker pools in the queue. This requires rewriting the tasks table to transform the provisioner_id and worker_type columns into a single, slash-separated task_queue_id column. The pull request is still in progress as I write this, but already serves as a great practical example of an online migration (and online downgrade, and tests).
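To make the _batch and _is_completed pattern concrete before summing up, here is a hedged sketch of the harness loop; only the pattern itself comes from the post, while the stored-function names, their signatures, and the shape of the bookmark state are assumptions for illustration.

```python
# Hedged sketch of the online-migration harness loop described above.
# online_migration_batch / online_migration_is_completed are hypothetical
# names standing in for a version's _batch and _is_completed functions.
import psycopg2

def run_online_migration(conn):
    state = None  # bookmark, e.g. the last task_id migrated so far
    while True:
        with conn, conn.cursor() as cur:
            # Hypothetical signature: process one chunk, return the number
            # of rows handled plus an updated bookmark.
            cur.execute(
                "SELECT rows_processed, new_state"
                " FROM online_migration_batch(%s)",
                (state,),
            )
            rows_processed, state = cur.fetchone()
        if rows_processed == 0:
            with conn, conn.cursor() as cur:
                cur.execute("SELECT online_migration_is_completed()")
                (done,) = cur.fetchone()
            if done:
                return        # nothing left to migrate
            state = None      # rows were skipped earlier; start over
```

Because the loop carries nothing but the bookmark between calls, it can be interrupted and restarted at any point, which is exactly the property the harness relies on.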
Summary

As we’ve seen in this three-part series, Taskcluster’s data backend has undergone a radical transformation this year, from a relatively simple NoSQL service to a full Postgres database with sophisticated support for ongoing changes to the structure of that DB.

In some respects, Taskcluster is no different from countless other web services abstracting over a data-storage backend. Indeed, Django provides robust support for database migrations, as do many other application frameworks. One factor that sets Taskcluster apart is that it is a “shipped” product, with semantically-versioned releases which users can deploy on their own schedule. Unlike for a typical web application, we – the software engineers – are not “around” for the deployment process, aside from the Mozilla deployments. So we must make sure that the migrations are well-tested and will work properly in a variety of circumstances.

We did all of this with minimal downtime and no data loss or corruption. This involved thousands of lines of new code written, tested, and reviewed; a new language (SQL) for most of us; and lots of close work with the Cloud Operations team to perform dry runs, evaluate performance, and debug issues. It couldn’t have happened without the hard work and close collaboration of the whole Taskcluster team. Thanks to the team, and thanks to you for reading this short series!
Posted over 3 years ago by TWiR Contributors
Hello and welcome to another issue of This Week in Rust! Rust is a systems language pursuing the trifecta: safety, concurrency, and speed. This is a weekly summary of its progress and community. Want something mentioned? Tweet us at @ThisWeekInRust or send us a pull request. Want to get involved? We love contributions. This Week in Rust is openly developed on GitHub. If you find any errors in this week's issue, please submit a PR.

Updates from Rust Community

No project updates this week.

Official

- Announcing Rustup 1.23.0

Newsletters

- This Month in Rust Dimforge #3

Tooling

- IntelliJ Rust Changelog #136
- Rust Analyzer Changelog #53
- Knurling-rs Changelog #8

Observations/Thoughts

- Rust Continuous Delivery
- Why doesn't Rust's BTreeMap have a with_capacity() method?
- Why using WebAssembly and Rust together improves Node.js performance
- lib-ruby-parser
- Understanding Partial Moves in Rust
- Error Handling is Hard
- Scalable Benchmarking with Rust Streams
- I rewrote 10k lines of JS into Rust over the last month. Here's a write up about it

Rust Walkthroughs

- References in Rust
- OS in Rust: Building kernel for custom target: Part-4
- Writing Rust the Elixir way
- Risp (in (Rust) (Lisp))
- Props and Nested Components with Yew
- Using Selenium with Rust
- Rocket Tutorial 04: Data Persistency and Rocket (with MongoDB)
- The Little Book of Rust Macros
- [series] Futures Explained in 200 Lines of Rust
- [video] Demo: 🦀️ Building a runtime reflection system for Rust
- [video] Sapling livestream 5 - Deleting Code

Miscellaneous

- Why scientists are turning to Rust
- Pijul - The Mathematically Sound Version Control System Written In Rust
- Amazon: We're hiring software engineers who know programming language Rust

Crate of the Week

This week's crate is kira, a library for expressive game audio with many bells and whistles (pardon the pun). Thanks to Alexis Bourget for the suggestion! Submit your suggestions and votes for next week!

Call for Participation

Always wanted to contribute to open-source projects but didn't know where to start? Every week we highlight some tasks from the Rust community for you to pick and get started! Some of these tasks may also have mentors available; visit the task page for more information. If you are a Rust project owner and are looking for contributors, please submit tasks here.

Updates from Rust Core

289 pull requests were merged in the last week:

- upgrade the coverage map to Version 4
- allow using generic trait methods in const fn
- allow Trait inheritance with cycles on associated types
- do not visit ForeignItemRef for HIR indexing and validation
- only create OnDiskCache in incremental compilation mode
- cache pretty-print/retokenize result to avoid compile time blowup
- stabilize const_int_pow
- compiler-builtins: fix division on SPARC
- libtest: print the total time taken to execute a test suite
- accept '!' in intra-doc links
- cleanup more of rustdoc
- bindgen: struct_layout: fix field offset computation for packed(n) structs
- miri: add simple data-race detector
- clippy: add suspicious_operation_groupings lint

Rust Compiler Performance Triage

2020-11-24: 1 Regression, 2 Improvements, 2 mixed. This week saw the landing of #79237, which by itself provides no wins but opens the door to support for split debuginfo on macOS. This'll eventually show huge wins as we can likely avoid re-collecting debuginfo while retaining support for lldb and Rust backtraces. #79361 tracks the stabilization of the rustc flag, but the precise rollout to stable users is not yet 100% clear. Triage done by @jyn514 and @simulacrum.
4 regressions, 4 improvements, 2 mixed results. 5 of them in rollups. See the full report for more.

Approved RFCs

Changes to Rust follow the Rust RFC (request for comments) process. These are the RFCs that were approved for implementation this week: No RFCs were approved this week.

Final Comment Period

Every week the team announces the 'final comment period' for RFCs and key PRs which are reaching a decision. Express your opinions now.

RFCs
- RFC: Plan to make core and std's panic identical
- Stabilize Cargo's new feature resolver

Tracking Issues & PRs
- Use true previous lint level when detecting overriden forbids
- Apply unused_doc_comments lint to inner items
- remove this weird special case from promotion

New RFCs
- Allow "artifact dependencies" on bin, cdylib, and staticlib crates
- Infallible promotion

Upcoming Events

Online
- December 2, Johannesburg, ZA - Monthly Joburg Rust Chat - Johannesburg Rust Meetup
- December 2, Indianapolis, IN, US - Indy.rs - with Social Distancing - Indy Rust
- December 8, Saarbrücken, Saarland, DE - Meetup: 6u16 (virtual) - Rust Saar
- December 8, Stuttgart, DE - TALK: Running Multi-Module Heterogenous WASM Assemblies - Rust Community Stuttgart
- December 8, Seattle, WA, US - Monthly meetup - Seattle Rust Meetup
- December 10, Stuttgart, DE - Hack & Learn - Directions for 2021 - Rust Community Stuttgart
- December 10, San Diego, CA, US - San Diego Rust December 2020 Tele-Meetup - San Diego Rust

North America
- December 9, Atlanta, GA, US - Grab a beer with fellow Rustaceans - Rust Atlanta
- December 10, Provo, UT, US - Mob Programming: Add --tree -d to lsd

Asia Pacific
- December 7, Auckland, NZ - Rust AKL - Show and Tell + Introduction to Rust II

If you are running a Rust event please add it to the calendar to get it mentioned here. Please remember to add a link to the event too. Email the Rust Community Team for access.

Rust Jobs
- Several Engineering Positions - Dfinity - (San Francisco, Palo Alto, Zurich)

Tweet us at @ThisWeekInRust to get your job offers listed here!

Quote of the Week

Let’s be clear: We understand that we are net beneficiaries of the exceptional work that others have done to make Rust thrive. AWS didn’t start Rust or make it the success that it is today, but we’d like to contribute to its future success. – Matt Asay on the AWS Open Source blog

Thanks to Alice Ryhl for the suggestion. Please submit quotes and vote for next week!

This Week in Rust is edited by: nellshamrell, llogiq, and cdmistman. Discuss on r/rust.
Posted over 3 years ago by chutten
(“This Week in Glean” is a series of blog posts that the Glean Team at Mozilla is using to try to communicate better about our work. They could be release notes, documentation, hopes, dreams, or whatever: so long as it is inspired by Glean. You can find an index of all TWiG posts online.)

So you want to collect data in your project? Okay, it’s pretty straightforward.

1. API: You need a way to combine the name of your data with the value that data has. Ideally you want it to be ergonomic to your developers to encourage them to instrument things without asking you for help, so it should include as many compile-time checks as you can and should be friendly to the IDEs and languages in use. Note the plurals.
2. Persistent Storage: Keyed by the name of your data, you need some place to put the value. Ideally this will be common regardless of the instrumentation’s language or thread of execution. And since you really don’t want crashes or sudden application shutdowns or power outages to cause you to lose everything, you need to persist this storage. You can write it to a file on disk (if your platforms have such access), but be sure to write the serialization and deserialization functions with backwards-compatibility in mind because you’ll eventually need to change the format. (A purely illustrative sketch of these first two points follows at the end of this post.)
3. Networking: Data stored with the product has its uses, but chances are you want this data to be combined with more data from other installations. You don’t need to write the network code yourself, there are libraries for HTTPS after all, but you’ll need to write a protocol on top of it to serialize your data for transmission.
4. Scheduling: Sending data each time a new piece of instrumentation comes in might be acceptable for some products whose nature is only-online. Messaging apps and MMOs send so much low-latency data all the time that you might as well send your data as it comes in. But chances are you aren’t writing something like that, or you respect the bandwidth of your users too much to waste it, so you’ll only want to be sending data occasionally. Maybe daily. Maybe when the user isn’t in the middle of something. Maybe regularly. Maybe when the stored data reaches a certain size. This could get complicated, so spend some time here and don’t be afraid to change it as you find new corners.
5. Errors: Things will go wrong. Instrumentation will, despite your ergonomic API, do something wrong and write the wrong value or call stop() before start(). Your networking code will encounter the weirdness of the full Internet. Your storage will get full. You need some way to communicate the health of your data collection system to yourself (the owner who needs to adjust scheduling and persistence and other stuff to decrease errors) and to others (devs who need to fix their instrumentation, analysts who should be told if there’s a problem with the data, QA so they can write tests for these corner cases).
6. Ingestion: You’ll need something on the Internet listening for your data coming in. It’ll need to scale to the size of your product’s base and be resilient to Internet Attacks. It should speak the protocol you defined in #4, so you should probably have some sort of machine-readable definition of that protocol that product and ingestion can share. And you should spend some time thinking about what to do when an old product with an old version of the protocol wants to send data to your latest ingestion endpoint.
7. Pipeline: Not all data will go to the same place. Some is from a different product. Some adheres to a different schema.
Some is wrong but ingestion (because it needs to scale) couldn’t do the verification of it, so now you need to discard it more expensively. Thus you’ll be wanting some sort of routing infrastructure to take ingested data and do some processing on it.
8. Warehousing: Once you receive all these raw payloads you’ll need a place to put them. You’ll want this place to be scalable, high-performance, and highly-available.
9. Datasets: Performing analysis to gain insight from raw payloads is possible (even I have done it), but it is far more pleasant to consolidate like payloads with like, perhaps ordered or partitioned by time and by some dimensions within the payload that’ll make analyses quicker. Maybe you’ll want to split payloads into multiple rows of a tabular dataset, or combine multiple payloads into single rows. Talk to the people doing the analyses and ask them what would make their lives easier.
10. Tooling: Democratizing data analysis is a good way to scale up the number of insights your organization can find at once, and it’s a good way to build data intuition. You might want to consider low-barrier data analysis tooling to encourage exploration. You might also want to consider some high-barrier data tooling for operational analyses and monitoring (good to know that the update is rolling out properly and isn’t bricking users’ devices). And some things for the middle ground of folks that know data and have questions, but don’t know SQL or Python or R.
11. Tests: Don’t forget that every piece of this should be testable and tested in isolation and in integration. If you can manage it, a suite of end-to-end tests does wonders for making you feel good that the whole system will continue to work as you develop it.
12. Documentation: You’ll need two types of documentation: User and Developer. The former is for the “user” of the piece (developers who wish to instrument back in #1, analysts who have questions that need answering in #10). The latter is for anyone going in trying to understand the “Why” and “How” of the pieces’ architecture and design choices.

You get all that? Thread safety. File formats. Networking protocols. Scheduling using real wall-clock time. Schema validation. Open ports on the Internet. At scale. User-facing tools and documentation. All tested and verified. Look, I said it’d be straightforward, not that it’d be easy. I’m sure it’ll only take you a few years and a couple tries to get it right.

Or, y’know, if you’re a Mozilla project you could just use Glean which already has all of these things…

1. API: The Glean SDK API aims to be ergonomic and idiomatic in each supported language.
2. Persistent Storage: The Glean SDK uses rkv as a persistent store for unsubmitted data, and a documented flat file format for submitted but not yet sent data.
3. Networking: The Glean SDK provides an API for embedding applications to provide their own networking stack (useful when we’re embedded in a browser), and some default implementations if you don’t care to provide one. The payload protocol is built on Structured Ingestion and has a schema that generates and deploys new versions daily.
4. Scheduling: Each Glean SDK payload has its own schedule to respect the character of the data it contains, from as frequently as the user foregrounds the app to, at most, once a day.
5. Errors: The Glean SDK builds user metric and internal health metrics into the SDK itself.
6. Ingestion: The edge servers and schema validation are all documented and tested. We autoscale quite well and have a process for handling incidents.
7. Pipeline: We have a pubsub system on GCP that handles a variety of different types of data.
8. Warehousing: I can’t remember if we still call this the Data Lake or not.
9. Datasets: We have a few. They are monitored. Our workflow software for deriving the datasets is monitored as well.
10. Tooling: Quite a few of them are linked from the Telemetry Index.
11. Tests: Each piece is tested individually. Adjacent pieces sometimes have integration suites. And Raphael recently spun up end-to-end tests that we’re very appreciative of. And if you’re just a dev wondering if your new instrumentation is working? We have the debug ping viewer.
12. Documentation: Each piece has developer documentation. Some pieces, like the SDK, also have user documentation. And the system at large? Even more documentation.

Glean takes this incredibly complex problem, breaks it into pieces, solves each piece individually, then puts the solution together in a way that makes it greater than the sum of its parts. All you need is to follow the six steps to integrate the Glean SDK and notify the Ecosystem that your project exists, and then your responsibilities shrink to just instrumentation and analysis. If that isn’t frictionless data collection, I don’t know what is.

:chutten

(( If you’re not a Mozilla project, and thus don’t by default get to use the Data Platform (numbers 6-10) for your project, come find us on the #glean channel on Matrix and we’ll see what help we can get you. ))

(( This post was syndicated from its original location. ))
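As a purely illustrative addendum, the sketch below shows roughly what points 1 and 2 of the checklist above mean in practice: an ergonomic, named metric API backed by persistent keyed storage. This is not the Glean SDK’s API; every name in it is made up.

```python
# Illustrative only: a named metric type over keyed, persisted storage.
# All class, file, and metric names here are invented for this sketch.
import json
from pathlib import Path

class MetricStore:
    """Keyed storage persisted to disk so unsent data survives restarts."""
    def __init__(self, path):
        self._path = Path(path)
        self._data = (
            json.loads(self._path.read_text()) if self._path.exists() else {}
        )

    def update(self, name, fn):
        self._data[name] = fn(self._data.get(name))
        # Naive persistence; a real system needs atomic writes and a
        # backwards-compatible format, as the post points out.
        self._path.write_text(json.dumps(self._data))

class CounterMetric:
    """A named metric object handed to developers for instrumentation."""
    def __init__(self, store, name):
        self._store = store
        self._name = name

    def add(self, amount=1):
        self._store.update(self._name, lambda v: (v or 0) + amount)

# Usage: instrumentation code only ever touches the typed metric object.
store = MetricStore("pending_metrics.json")
pages_loaded = CounterMetric(store, "browser.pages_loaded")
pages_loaded.add()
```

A real implementation would also need the thread safety, scheduling, error reporting, and upload protocol the checklist describes; this sketch only covers the naming-plus-persistence core.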
Posted over 3 years ago by J.C. Jones
Firefox is the only major browser that still checks, for every website it connects to, whether the certificate used has been reported as revoked. Firefox users are notified of all connections involving untrustworthy certificates, regardless of the popularity of the site. Inconveniently, checking certificate status sometimes slows down the connection to websites. Worse, the check reveals cleartext information about the website you’re visiting to network observers. We’re now testing a technology named CRLite which provides Firefox users with the confidence that the revocations in the Web PKI are enforced by the browser without this privacy compromise. This is a part of our goal to use encryption everywhere. (See also: Encrypted SNI and DNS-over-HTTPS.)

The first three posts in this series are about the newly-added CRLite technology and provide background that will be useful for following along with this post: Introducing CRLite: All of the Web PKI’s revocations, compressed; CRLite: Speeding Up Secure Browsing; and, specifically useful, The End-to-End Design of CRLite. This blog post discusses the back-end infrastructure that produces the data which Firefox uses for CRLite. To begin with, we’ll trace that data in reverse, starting from what Firefox needs to use for CRLite’s algorithms, back to the inputs derived from monitoring the whole Web PKI via Certificate Transparency.

Tracing the Flow of Data

Individual copies of Firefox maintain in their profiles a CRLite database which is periodically updated via Firefox’s Remote Settings. Those updates come in the form of CRLite filters and “stashes”.

Filters and Stashes

The general mechanism for how the filters work is explained in Figure 3 of The End-to-End Design of CRLite. Introduced in this post is the concept of CRLite stashes. These are lists of certificate issuers and the certificate serial numbers that those issuers revoked, which the CRLite infrastructure distributes to Firefox users in lieu of a whole new filter. If a certificate’s identity is contained within any of the issued stashes, then that certificate is invalid. Combining stashes with the CRLite filters produces an algorithm which, in simplified terms, proceeds like this:

Figure 1: Simplified CRLite Decision Tree

Every time the CRLite infrastructure updates its dataset, it produces both a new filter and a stash containing all of the new revocations (compared with the previous run). Firefox’s CRLite is up-to-date if it has a filter and all issued stashes for that filter.

Enrolled, Valid and Revoked

To produce the filters and stashes, CRLite needs as input:
- The list of trusted certificate authority issuers which are enrolled in CRLite,
- The list of all currently-valid certificates issued by each of those enrolled certificate authorities, e.g. information from Certificate Transparency,
- The list of all unexpired-but-revoked certificates issued by each of those enrolled certificate authorities, e.g. from Certificate Revocation Lists.

These bits of data are the basis of the CRLite decision-making. The enrolled issuers are communicated to Firefox clients as updates within the existing Intermediate Preloading feature, while the certificate sets are compressed into the CRLite filters and stashes. Whether a certificate issuer is enrolled or not is directly related to obtaining the list of their revoked certificates.
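Before looking at how those inputs are gathered, here is a minimal sketch of the simplified decision procedure that Figure 1 describes. The data structures (a set of issuer/serial pairs for the stashes, a membership-query object for the filter) are assumptions for illustration, not Firefox’s actual implementation.

```python
# Illustrative sketch of the simplified CRLite decision tree. The types
# used for "stashes" and "crlite_filter" are assumed, not Firefox's own.

def crlite_status(issuer, serial, enrolled_issuers, stashes, crlite_filter):
    """Return 'revoked', 'valid', or 'unknown' for a certificate."""
    if issuer not in enrolled_issuers:
        return "unknown"      # not covered by CRLite; fall back to other checks
    if (issuer, serial) in stashes:
        return "revoked"      # revoked since the current filter was built
    if crlite_filter.contains(issuer, serial):
        return "revoked"      # revoked according to the filter itself
    return "valid"
```

The ordering mirrors the post: stashes cover revocations newer than the filter, so they are consulted alongside the filter, and an issuer that is not enrolled cannot be judged by CRLite at all.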
Collecting Revocations

To obtain all the revoked certificates for a given issuer, the CRLite infrastructure reads the Certificate Revocation List (CRL) Distribution Point extension out of all that issuer’s unexpired certificates and filters the list down to those CRLs which are available over HTTP/HTTPS. Then, every URL in that list is downloaded and verified: Does it have a valid, trusted signature? Is it up-to-date? If any could not be downloaded, do we have a cached copy which is still both valid and up-to-date? For issuers which are considered enrolled, all of the entries in the CRLs are collected and saved as a complete list of all revoked certificates for that issuer.

Lists of Unexpired Certificates

The lists of currently-valid certificates and unexpired-but-revoked certificates have to be calculated, as the data sources that CRLite uses consist of:
- Certificate Transparency’s list of all certificates in the WebPKI, and
- All the published certificate revocations from the previous step.

By policy now, Certificate Transparency (CT) Logs, in aggregate, are assumed to provide a complete list of all certificates in the public Web PKI. CRLite then filters the complete CT dataset down to certificates which haven’t yet reached their expiration date, but which have been issued by certificate authorities trusted by Firefox. Filtering CT data down to a list of unexpired certificates allows CRLite to derive the needed data sets using set math:
- The currently-valid certificates are those which are unexpired and not included in any revocation list,
- The unexpired-but-revoked certificates are those which are unexpired and are included in a revocation list.

The CT data simply comes from a continual monitoring of the Certificate Transparency ecosystem. Every known CT log is monitored by Mozilla’s infrastructure, and every certificate added to the ecosystem is processed.

The Kubernetes Pods

All these functions are orchestrated as four Kubernetes pods with the descriptive names Fetch, Generate, Publish, and Sign-off.

Fetch

Fetch is a Kubernetes deployment, or always-on task, which constantly monitors Certificate Transparency data from all Certificate Transparency logs. Certificates that aren’t expired are inserted into a Redis database, configured so that certificates are expunged automatically when they reach their expiration time. This way, whenever the CRLite infrastructure requires a list of all unexpired certificates known to Certificate Transparency, it can iterate through all of the certificates in the Redis database. The actual data stored in Redis is described in our FAQ.

Figure 2: The Fetch task reads from Certificate Transparency and stores data in a Redis database

Generate

The Generate pod is a periodic task, which currently runs four times a day. This task reads all known unexpired certificates from the Redis database, downloads and validates all CRLs from the issuing certificate authorities, and synthesizes a filter and a stash from those data sources. The resulting filters and stashes are uploaded into a Google Cloud Storage bucket, along with all the source input data, for both public audit and distribution.

Figure 3: The Generate task reads from a Redis database and the Internet, and writes its results to Google Cloud Storage

Publish

The Publish task is also a periodic task, running often. It looks for new filters and stashes in the Google Cloud Storage bucket, and stages either a new filter or a stash to Firefox’s Remote Settings when the Generate task finishes producing one.
Figure 4: The Publish job reads from Google Cloud Storage and writes to Remote Settings

Sign-Off

Finally, a separate Sign-Off task runs periodically, also often. When there is an updated filter or stash staged at Firefox’s Remote Settings, the Sign-Off task downloads the staged data and tests it, checking for coherency and making sure that CRLite does not accidentally include revocations that could break Firefox. If all the tests pass, the Sign-Off task approves the new CRLite data for distribution, which triggers Megaphone to push the update to Firefox users that are online.

Figure 5: The Sign-Off task interacts with both Remote Settings and the public Internet

Using CRLite

We recently announced in the mozilla.dev.platform mailing list that Firefox Nightly users on Desktop are relying on CRLite, after collecting encouraging performance measurements for most of 2020. We’re working on plans to begin tests for Firefox Beta users soon. If you want to try using CRLite, you can use Firefox Nightly, or, for the more adventurous reader, interact with the CRLite data directly.

Our final blog post in this series, Part 5, will reflect on the collaboration between Mozilla Security Engineering and the several research teams that designed and have analyzed CRLite to produce this impressive system.

The post Design of the CRLite Infrastructure appeared first on Mozilla Security Blog.
Posted over 3 years ago by Patrick Cloke
Earlier today I released version 0.4 of celery-batches with support for Celery 5.0. As part of this release, support for Python < 3.6 was dropped and support for Celery < 4.4 was dropped. celery-batches is a small library that allows you to process multiple calls to a Celery …