News
Posted about 6 years ago
Overview

Since Oracle Solaris 11.4, the state of Zones on the system is kept in a shared database in /var/share/zones/, meaning a single database is accessed from all boot environments (BEs). However, up until Oracle Solaris 11.3, each BE kept its own local copy in /etc/zones/index, and individual copies were never synced across BEs. This article provides some history, explains why we moved to the shared zones state database, and describes what it means for administrators when updating from 11.3 to 11.4.

Keeping Zone State in 11.3

In Oracle Solaris 11.3, the state of zones is associated separately with every global zone BE in a local text database, /etc/zones/index. The zoneadm and zonecfg commands then operate on a specific copy based on which BE is booted.

In a world of systems being migrated between hosts, having a local zone state database in every BE constitutes a problem if we, for example, update to a new BE and then migrate a zone before booting into the newly updated BE. When we eventually boot the new BE, the zone ends up in the unavailable state (the system recognizes that the shared storage is already in use and puts the zone into that state), suggesting that it could possibly be attached. However, as the zone was already migrated, an admin expects the zone to be in the configured state instead.

The 11.3 implementation may also lead to a situation where all BEs on a system represent the same Solaris instance (see below for the definition of a Solaris instance), and yet every BE can be linked to a non-global Zone BE (ZBE) for a zone of the same name, with each ZBE containing an unrelated Solaris instance. Such a situation happens on 11.3 if we reinstall a chosen non-global zone in each BE.

Solaris Instance

A Solaris instance represents a group of related IPS images. Such a group is created when a system is installed. One installs a system from media or an install server, via "zoneadm install" or "zoneadm clone", or from a clone archive.
Subsequent system updates add new IPS images to the same image group that represents the same Solaris instance.

Uninstalling a Zone in 11.3

In 11.3, uninstalling a non-global zone (i.e. the native solaris(5) branded zone) means deleting the ZBEs linked to the presently booted BE and updating the state of the zone in /etc/zones/index. Often there is only one ZBE to delete. ZBEs linked to other BEs are not affected, and destroying a BE only destroys the ZBE(s) linked to that BE. Presently the only supported way to completely uninstall a non-global zone from a system is to boot each BE and uninstall the zone from there.

For Kernel Zones (KZs) installed on ZVOLs, each BE that has the zone in its index file is linked to the ZVOL via a .gzbe: attribute. Uninstalling a Kernel Zone on a ZVOL removes that BE-specific attribute, and only if no other BE is linked to the ZVOL is the ZVOL deleted during the KZ uninstall or BE destroy. In 11.3, the only supported way to completely uninstall a KZ from a system was to boot every BE and uninstall the KZ from there.

What can happen when one reinstalls zones during the normal system life cycle is depicted in the following pictures; each color represents a unique Solaris instance. The first picture shows a usual situation with solaris(5) branded zones: after the system is installed, two zones are installed into BE-1, and on every update the zones are updated as part of the normal pkg update process. The second picture shows what happens if zones are reinstalled during the normal life cycle of the system. In BE-3, Zone X was reinstalled, while Zone Y was reinstalled in BE-2, BE-3, and BE-4 (but not in BE-5). That leads to a situation where there are two different instances of Zone X on the system, and four different Zone Y instances. Which zone instance is used depends on which BE the system is booted into. Note that the system itself and any zone always represent different Solaris instances.
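The per-BE uninstall semantics described above can be modeled with a small illustrative data structure. This is not Solaris code; the BE and ZBE names are made up, and the point is only to show why uninstalling from one booted BE leaves the same zone installed in every other BE:

```python
# Illustrative model (not Solaris code) of the 11.3 per-BE uninstall semantics:
# uninstalling a zone only deletes the ZBEs linked to the currently booted BE,
# so other BEs keep their own ZBEs for the same zone name.
zbe_links = {
    "BE-1": {"zoneX": {"zbe-1"}},
    "BE-2": {"zoneX": {"zbe-2"}},
}

def uninstall_zone_113(booted_be, zone):
    """Delete only the ZBEs linked to the booted BE (11.3 behavior)."""
    zbe_links[booted_be].pop(zone, None)

uninstall_zone_113("BE-1", "zoneX")
# zoneX is gone from BE-1's view, but BE-2 still links a ZBE for it:
print("zoneX" in zbe_links["BE-1"], "zoneX" in zbe_links["BE-2"])  # False True
```

A complete uninstall in this model requires calling the function once per BE, which mirrors the "boot every BE and uninstall the zone from there" procedure the article describes.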
Undesirable Impact of the 11.3 Behavior

The described behavior could lead to undesired situations in 11.3:

- With multiple ZBEs present, if a non-global zone is reinstalled, we end up with ZBEs under the zone's rpool/VARSHARE dataset representing Solaris instances that are unrelated and yet share the same zone name. That leads to the possible migration problems mentioned in the first section.

- If a Kernel Zone is used in multiple BEs and the KZ is uninstalled and then re-installed, the installation fails with an error message that the ZVOL is in use in other BEs.

- The only supported way to completely uninstall a zone is to boot into every BE on the system and uninstall the zone from there. With multiple BEs, that is definitely a suboptimal solution.

Sharing a Zone State across BEs in 11.4

As already stated in the Overview section, in Oracle Solaris 11.4 the system shares the zone state across all BEs. The shared zone state resolves the issues mentioned in the previous sections. In most situations, nothing changes for users and administrators, as there were no changes in the existing interfaces (some were extended, though). The major implementation change was to move the local, line-oriented textual database /etc/zones/index to a shared directory, /var/share/zones/, and store it in a JSON format. However, as before, the location and format of the database are just an implementation detail and are not part of any supported interface. To be precise, we did EOF the "zoneadm -R mark" interface as part of the Shared Zone State project; that interface was of very little use already, and then only in some rare maintenance situations. Also, let us be clear that all 11.3 systems use and will continue to use the local zone state index /etc/zones/index. We have no plans to update 11.3 to use the shared zone state database.
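The article stresses that both index formats are private implementation details, so the following is purely an illustrative sketch: it assumes a hypothetical line-oriented index of colon-separated name:state:path records (not the real 11.3 field layout) and shows what a conversion to a JSON document could look like in principle:

```python
import json

# Purely illustrative: both formats are private implementation details.
# Assume (hypothetically) that index lines look like "zonename:state:zonepath"
# and convert them into a JSON document with a top-level "zones" list.
def index_to_json(lines):
    zones = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        name, state, path = line.split(":")[:3]
        zones.append({"name": name, "state": state, "zonepath": path})
    return json.dumps({"zones": zones}, indent=2)

old_index = ["# comment", "zone1:installed:/system/zones/zone1"]
print(index_to_json(old_index))
```

The one-way nature of such a conversion is what makes the 11.3-to-11.4 update path below interesting: the JSON side can be regenerated from the text side at any time, which is exactly the overwrite rule the article describes.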
Changes between 11.3 and 11.4 with regard to keeping the Zones State

With the introduction of the shared zone state, you can run into situations that previously were either simply not possible or could only have been created via some unsupported behavior.

Creating a new zone on top of an existing configuration

When deleting a zone, the zone record is removed from the shared index database and the zone configuration is deleted from the present BE. Mounting all other non-active BEs and removing the configuration files from there would be quite time consuming, so those files are left behind. That means that if one later boots into one of those previously non-active BEs and tries to create the zone there (which does not exist, as we removed it from the shared database before), zonecfg may hit an already existing configuration file. We extended the zonecfg interface so that you have a choice to overwrite the configuration or reuse it:

root# zonecfg -z zone1 create
Zone zone1 does not exist but its configuration file does. To reuse it, use -r; create anyway to overwrite it (y/[n])? n
root# zonecfg -z zone1 create -r
Zone zone1 does not exist but its configuration file does; do you want to reuse it (y/[n])? y

Existing Zone without a configuration

If an admin creates a zone, the zone record is put into the shared index database. If the system is later booted into a BE that existed before the zone was created, that BE will not have the zone configuration file (unless one was left behind before, obviously), but the zone will be known, as the zone state database is shared across all 11.4 BEs. In that case, the zone state in such a BE is reported as incomplete (note that not even the zone brand is known, as that is also part of the zone configuration).
root# zoneadm list -cv
  ID NAME     STATUS      PATH   BRAND    IP
   0 global   running     /      solaris  shared
   - zone1    incomplete  -      -        -

When listing the auxiliary state, you will see that the zone has no configuration:

root# zoneadm list -sc
NAME     STATUS      AUXILIARY STATE
global   running
zone1    incomplete  no-config

If you want to remove such a zone, the -F option is needed. As removing the zone makes it invisible from all BEs, possibly including those with a usable zone configuration, we introduced this new option to make sure an administrator does not accidentally remove such zones.

root# zonecfg -z newzone delete
Use -F to delete an existing zone without a configuration file.
root# zonecfg -z newzone delete -F
root# zoneadm -z newzone list
zoneadm: zone 'newzone': No such zone exists

Updating to 11.4 while a shared zone state database already exists

When the system is updated from 11.3 to 11.4 for the first time, the shared zone database is created from the local /etc/zones/index on the first boot into the 11.4 BE. A new svc:/system/zones-upgrade:default service instance takes care of that. If the system is then brought back to 11.3 and updated to 11.4 again, the system (i.e. the service instance mentioned above) will find an existing shared index when first booting into this new 11.4 BE, and if the two indexes differ, there is a conflict that must be taken care of. In order not to add more complexity to the update process, the rule is that on every update from 11.3 to 11.4, any existing shared zone index database /var/share/zones/index.json is overwritten with data converted from the /etc/zones/index file of the 11.3 BE the system was updated from, if the data differs.

If there were no changes to Zones in between the 11.4 updates, either in any older 11.4 BEs or in the 11.3 BE we are again updating to 11.4 from, there are no changes in the shared zone index database, and the existing shared database needs no overwrite. However, if there are changes to the Zones, e.g. a zone was created, installed, uninstalled, or detached, the old shared index database is saved on boot, the new index is installed, and the service instance svc:/system/zones-upgrade:default is put into a degraded state. As the output of svcs -xv in 11.4 now also reports services in a degraded state, that serves as a hint to a system administrator to go and check the service log:

root# beadm list
BE          Flags Mountpoint Space  Policy Created
--          ----- ---------- -----  ------ -------
s11_4_01    -     -          2.93G  static 2018-02-28 08:59
s11u3_sru24 -     -          12.24M static 2017-10-06 13:37
s11u3_sru31 NR    /          23.65G static 2018-04-24 02:45
root# zonecfg -z newzone create
root# pkg update entire@latest
root# reboot -f
root#
root# beadm list
BE Name     Flags Mountpoint Space  Policy Created
----------- ----- ---------- ------ ------ ----------------
s11_4_01    -     -          2.25G  static 2018-02-28 08:59
s11u3_sru24 -     -          12.24M static 2017-10-06 13:37
s11u3_sru31 -     -          2.36G  static 2018-04-24 02:45
s11_4_03    NR    /          12.93G static 2018-04-24 04:24
root# svcs -xv
svc:/system/zones-upgrade:default (Zone config upgrade after first boot)
 State: degraded since April 24, 2018 at 04:32:58 AM PDT
Reason: Degraded by service method: "Unexpected situation during zone index conversion to JSON."
   See: http://support.oracle.com/msg/SMF-8000-VE
   See: /var/svc/log/system-zones-upgrade:default.log
Impact: Some functionality provided by the service may be unavailable.
root# tail /var/svc/log/system-zones-upgrade:default.log
[ 2018 Apr 24 04:32:50 Enabled. ]
[ 2018 Apr 24 04:32:56 Executing start method ("/lib/svc/method/svc-zones-upgrade"). ]
Converting /etc/zones/index to /var/share/zones/index.json.
Newly generated /var/share/zones/index.json differs from the previously existing one.
Forcing the degraded state.
Please compare current /var/share/zones/index.json with the original one saved as /var/share/zones/index.json.backup.2018-04-24--04-32-57, then clear the service.
Moving /etc/zones/index to /etc/zones/index.old-format.
Creating old format skeleton /etc/zones/index.
[ 2018 Apr 24 04:32:58 Method "start" exited with status 103. ]
[ 2018 Apr 24 04:32:58 "start" method requested degraded state: "Unexpected situation during zone index conversion to JSON" ]
root# svcadm clear svc:/system/zones-upgrade:default
root# svcs svc:/system/zones-upgrade:default
STATE   STIME   FMRI
online  4:45:58 svc:/system/zones-upgrade:default

If you diff(1)'ed the backed-up JSON index and the present one, you would see that the zone newzone was added. The new index could also be missing some zones that were created before. Based on the index diff output, the administrator can create or remove Zones on the system as necessary, using the standard zonecfg(8) command, possibly reusing existing configurations as shown above. Also note that the degraded state here did not mean any degradation of functionality; its sole purpose was to notify the admin about the situation.

Do not use BEs as a Backup Technology for Zones

The previous implementation in 11.3 allowed for using BEs with linked ZBEs as a backup solution for Zones. That means that if a zone was uninstalled in the current BE, one could usually boot into an older BE or, in the case of non-global zones, try to attach and update another existing ZBE from the current BE using the attach -z -u/-U subcommand. With the current implementation, uninstalling a zone means uninstalling the zone from the system, and that means uninstalling all ZBEs (or the ZVOL in the case of Kernel Zones) as well. If you uninstall a zone in 11.4, it is gone. If an admin used the previous implementation also as a convenient backup solution, we recommend using archiveadm instead, whose functionality also provides for backing up zones.

Future Enhancements

An obvious future enhancement would be shared zone configurations across BEs. However, it is not on our short term plan at this point, and neither can we guarantee this functionality will ever be implemented.
One thing is clear - it would be a more challenging task than the shared zone state.
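The index comparison the degraded zones-upgrade service asks for boils down to a set difference on zone names. A minimal sketch, assuming a hypothetical JSON layout with a top-level "zones" list of records (the real on-disk format is a private implementation detail):

```python
import json

# Sketch of the comparison the degraded zones-upgrade service asks for,
# assuming a hypothetical JSON layout {"zones": [{"name": ...}, ...]}.
def zone_names(index_json):
    return {z["name"] for z in json.loads(index_json)["zones"]}

backup = json.dumps({"zones": [{"name": "zone1"}]})
current = json.dumps({"zones": [{"name": "zone1"}, {"name": "newzone"}]})

added = zone_names(current) - zone_names(backup)
removed = zone_names(backup) - zone_names(current)
print(added, removed)  # {'newzone'} set()
```

The "added" set corresponds to zones the administrator may want to remove again, and the "removed" set to zones that may need to be re-created with zonecfg, possibly reusing leftover configuration files.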
Posted about 6 years ago
Been some time now since my last update on what is happening in Fedora Workstation, and with current plans to release Fedora Workstation 28 in early May, I thought this would be a good time to write something. As usual, this is just a small subset of what the team has been doing, and I always end up feeling a bit bad for not talking about the avalanche of general fixes and improvements the team adds to each release.

Thunderbolt

Christian Kellner has done a tremendous job keeping everyone informed of his work making sure we have proper Thunderbolt support in Fedora Workstation 28. One important aspect of this improved Thunderbolt support is that a lot of upcoming docking stations will require it, and thus without this work you would not be able to use a wide range of docking stations. For a lot of screenshots and more details about how the Thunderbolt support is done, I recommend reading this article on Christian's blog.

3rd party applications

It has taken us quite some time to get here, as getting this feature right involved a lot of internal discussion about policies around it as well as implementation details. But starting from Fedora Workstation 28 you will be able to find more 3rd party software listed in GNOME Software if you enable it. The way it works is that, as part of the initial setup, you will be asked if you want to have 3rd party software show up in GNOME Software. If you are upgrading, you will be asked inside GNOME Software if you want to enable 3rd party software. You can also disable 3rd party software after enabling it from the GNOME Software settings, as seen below:

[Screenshot: GNOME Software settings]

In Fedora Workstation 27 we did have PyCharm available, but we have now added the NVidia driver and Steam to the list for Fedora Workstation 28.
We have also been working with Google to try to get Chrome included here, and we are almost there; they merged, for instance, the needed AppStream metadata some time ago, but the last steps require some tweaking of how Google generates their package repository (basically adding the AppStream metadata to their yum repository). We don't have a clear timeline for when that will happen, but as soon as it does, Chrome will also appear in GNOME Software if you have 3rd party software enabled. As of now all 3rd party packages are RPMs, but we expect that going forward we will be adding applications packaged as Flatpaks too. Finally, if you want to propose 3rd party applications for inclusion, you can find some instructions for how to do it here.

Virtualbox guest

Another major feature that got some attention and that we worked on for this release was Hans de Goede's work to ensure Fedora Workstation can run as a Virtualbox guest out of the box. We know there are many people whose first experience with Linux is running it under Virtualbox on Windows or Mac OS X, and we wanted to make that first experience as good as possible. Hans worked with the Virtualbox team to clean up their kernel drivers and agree on a stable ABI so that they could be merged into the kernel and maintained there from now on.

Firmware updates

The Spectre/Meltdown situation hammered home to a lot of people the need to have firmware updates easily available and easy to apply. We created the Linux Vendor Firmware Service for Fedora Workstation users with that in mind, and it was great to see the service paying off for many Linux users, not only on Fedora but also on other distributions that started using the service we provided. I would like to call out Dell, who was a critical partner for the Linux Vendor Firmware effort from day 1, and thus their users got the most benefit from it when Spectre and Meltdown hit.
Spectre and Meltdown also helped get a lot of other vendors off the fence or accelerated their efforts to support LVFS, and Richard Hughes and Peter Jones have been working closely with a lot of new vendors during this cycle to get support for their hardware and devices into LVFS. In fact, Peter even flew down to the offices of one of the biggest laptop vendors recently to help them resolve the last issues before their hardware starts showing up in the firmware service. Thanks to the work of Richard Hughes and Peter Jones you will see both a wider range of brands supported in the Linux Vendor Firmware Service in Fedora Workstation 28 and a wider range of device classes.

Server side GL Vendor Neutral Dispatch

This is a bit of a technical detail, but Adam Jackson and Lyude Paul have been working hard this cycle on getting what we call server-side GLVND ready for Fedora Workstation 28. Currently we are looking at enabling it either as a zero-day update or shortly afterwards. So what is server-side GLVND? Well, it is basically the last missing piece we need to enable the use of the NVidia binary driver through XWayland. Currently the NVidia driver works with Wayland-native OpenGL applications, but if you are trying to run an OpenGL application requiring X, we need this to support it. And to be clear, once we ship this in Fedora Workstation 28 it will also require a driver update from NVidia to use it, so us shipping it is just step 1 here. We also expect there to be some need for further tuning once all the pieces are released, to get top-notch performance. Of course, over time we hope and expect all applications to become Wayland-native, but this is a crucial transition technology for many of our users. And if you are using Intel or AMD graphics with the Mesa drivers, things already work great and this change will not affect you in any way.
Flatpak

Flatpaks basically already work, but we have kept our focus this time around on fleshing out the story in terms of the so-called Portals. Portals are essentially how applications are meant to interact with things outside of the container on your desktop. Jan Grulich has put in a lot of great effort making sure we get portal support for Qt and KDE applications, most recently by adding support for the screen capture portal on top of PipeWire. You can read more about that on Jan Grulich's blog. He is now focusing on getting the printing portal working with Qt. Wim Taymans has also kept going full steam ahead on PipeWire, which is critical for us to enable applications dealing with cameras and similar devices on your system to be containerized. More details on that are in my previous blog entry talking specifically about PipeWire. It is also worth noting that we are working with Canonical engineers to ensure Portals also work with Snappy, as we want to ensure that developers have a single set of APIs to target in order to allow their applications to be sandboxed on Linux. Alexander Larsson has already reviewed quite a bit of code from the Snappy developers to that effect.

Performance work

Our engineers have spent significant time looking at various performance and memory improvements since the last release. The main credit for the recently talked-about ‘memory leak’ fix goes to Georges Basile Stavracas Neto from Endless, but many from our engineering team helped with diagnosing it and also fixed many other smaller issues along the way. More details about the ‘memory leak’ fix are on Georges' blog. We are not done here, though, and Alberto Ruiz is organizing a big performance-focused hackfest in Cambridge, England in May. We hope to bring together many of our core engineers to work with other members of the community to look at possible improvements.
The Raspberry Pi will be the main target, but of course most improvements we make so that GNOME Shell runs better on a Raspberry Pi also mean improvements for normal x86 systems.

Laptop battery life

In our efforts to make Linux even better on laptops, Hans de Goede spent a lot of time figuring out things we could do to make Fedora Workstation 28 have better battery life. How valuable these changes are will of course depend on your exact hardware, but I expect more or less everyone to have somewhat better battery life on Fedora Workstation 28, and for some it could be a lot better. You can read a bit more about these changes on Hans de Goede's blog.
Posted about 6 years ago
For VC5, I renamed the kernel driver to “v3d” and submitted it to the kernel. Daniel Vetter came back right away with a bunch of useful feedback, and next week I’m resolving that feedback and continuing to work on the GMP support.

On the vc4 front, I did the investigation of the HDL to determine that the OLED matrix applies before the gamma tables, so we can expose it in the DRM for Android’s color correction. Stefan was also interested in reworking his fencing patches to use syncobjs, so hopefully we can merge those and get DRM HWC support in mainline soon. I also pushed Gustavo’s patch for using the new core DRM infrastructure for async cursor updates. This doesn’t simplify our code much yet, but Boris has a series he’s working on that gets rid of a lot of custom vc4 display code by switching more code over to the new async support.

Unfortunately, the vc4 subsystem node removal patch from last week caused the DRM platform device to not be on the SOC’s bus. This caused bus address translations to be subtly wrong and broke caching (so eventually the GPU would hang). I’ve shelved the patches for now.

I also rebased my user QPU submission code for the Raspberry Pi folks. They keep expressing interest in it, but we’ll see if it goes anywhere this time around. Unfortunately, I don’t see any way to expose this for general distributions: vc4 isn’t capable enough for OpenCL or GL compute shaders, and custom user QPU submissions would break the security model (just like GL shaders would have without my shader validator, and I think validating user QPU submissions would be even harder than GL shaders).
Posted about 6 years ago
As part of preparing my last two talks at LCA on the kernel community, “Burning Down the Castle” and “Maintainers Don’t Scale”, I have looked into how the kernel’s maintainer structure can be measured. One very interesting approach is looking at the pull request flows, as done for example in the LWN article “How 4.4’s patches got to the mainline”. Note that in the Linux kernel process, pull requests are only used to submit development from entire subsystems, not individual contributions. What I’m trying to work out here isn’t so much the overall patch flow, but how maintainers work, and how that differs between subsystems.

Methodology

In my presentations I claimed that the kernel community is suffering from too steep hierarchies. And worse, the people in power don’t bother to apply the same rules to themselves as to anyone else, especially around purported quality enforcement tools like code reviews.

For our purposes a contributor is someone who submits a patch to a mailing list but needs a maintainer to apply it for them to get the patch merged. A maintainer, on the other hand, can directly apply a patch to a subsystem tree and will then send pull requests up the maintainer hierarchy until the patch lands in Linus’ tree. This is relatively easy to measure accurately in git: if the recorded patch author and committer match, it’s a maintainer self-commit; if they don’t match, it’s a contributor commit.

There are a few annoying special cases to handle: some people use different email addresses or spellings, and sometimes MTAs, patchwork, and other tools used in the patch flow chain mangle things further. This could be fixed up with the mail mapping database that LWN, for example, uses to generate its contributor statistics. Since most maintainers have reasonable setups, it doesn’t seem to matter much, hence I decided not to bother. There are also subsystems not maintained in git, but in the quilt patch management system.
Andrew Morton’s tree is the only one I’m aware of, and I hacked up my scripts to handle this case. After that I realized it doesn’t matter, since Andrew merged exceedingly few of his own patches himself; most have been fixups that landed through other trees.

Also note that this is a property of each commit: the same person can be both a maintainer and a contributor, depending upon how each of their patches gets merged. The ratio of maintainer self-commits compared to overall commits then gives us a crude, but fairly useful, metric of how steeply the kernel community overall is organized.

Measuring review is much harder. For contributor commits, review is not recorded consistently. Many maintainers forgo adding an explicit Reviewed-by tag since they’re adding their own Signed-off-by tag anyway. And since that’s required for all contributor commits, it’s impossible to tell whether a patch has seen formal review before merging. A reasonable assumption, though, is that maintainers actually look at stuff before applying it. For a minimal definition of review, “a second person looked at the patch before merging and deemed the patch a good idea”, we can assume that merged contributor patches have a review ratio of 100%. Whether that was a full formal review or not can unfortunately not be measured with the available data.

A different story is maintainer self-commits: if there is no tag indicating review by someone else, then either it didn’t happen, or the maintainer felt the work wasn’t important enough to justify the minimal effort of recording it. Either way, a patch where the git author and committer match, and which sports no review tags in the commit message, strongly suggests it has indeed seen none. An objection would be that these patches get reviewed by the next maintainer up, when the pull request gets merged.
But there are well over a thousand such patches each kernel release, and most of the pull requests containing them go directly to Linus in the two-week merge window, when the over 10k feature patches of each kernel release land in the mainline branch. It is unrealistic to assume that Linus carefully reviews hundreds of patches himself in just those two weeks while getting hammered by pull requests all around. Similar considerations apply at the subsystem level.

For counting reviews I looked at anything that indicates some kind of patch review, even very informal ones, to stay consistent with the implied oversight the maintainer’s Signed-off-by line provides for merged contributor patches. I therefore included both Reviewed-by and Acked-by tags, including a plethora of misspelled and combined versions of the same. The scripts also keep track of how pull requests percolate up the hierarchy, which allows filtering on a per-subsystem level. Commits in topic branches are accounted to the subsystem that first lands in Linus’ tree. That’s fairly arbitrary, but simplest to implement.

Last few years of GPU subsystem history

Since I’ve pitched the GPU subsystem against the kernel at large in my recent talks, let’s first look at what things look like in graphics:

Fig. 1: GPU total commits, maintainer self-commits and reviewed maintainer self-commits
Fig. 2: GPU percentage maintainer self-commits and reviewed maintainer self-commits

In absolute numbers it’s clear that graphics has grown tremendously over the past few years, much faster than the kernel at large. Depending upon the metric you pick, the GPU subsystem has grown from being 3% of the kernel to about 10%, and now trades spots for 2nd largest subsystem with arm-soc and staging (depending on who’s got a big pull for that release).

Maintainer commits keep up with GPU subsystem growth

The relative numbers tell a different story.
First, commit rights and the fairly big roll-out of group maintainership we’ve done in the past two years aren’t extreme by historical graphics subsystem standards: we’ve always had around 30-40% maintainer self-commits. There’s a bit of a downward trend in the years leading towards v4.4, due to the massive growth of the i915 driver and our failure to add more maintainers and committers for a few releases. Adding lots more committers and creating bigger maintainer groups from v4.5 onward, first for the i915 driver, then to cope with the influx of new small drivers, brought us back to the historical trend line.

There’s another dip happening in the last few kernels, due to AMD bringing in a big new team of contributors to upstream. v4.15 was even more pronounced; in that release the entirely rewritten DC display driver for AMD GPUs landed. The AMD team is already using a committer model for their staging and internal trees, but not (yet) committing directly to their upstream branch. There are a few process holdups, mostly around the CI flow, that need to be fixed first. As soon as that’s done, I expect this recent dip will be over. In short, even when facing big growth like the GPU subsystem has, it’s very much doable to keep training new maintainers to keep up with the increased demand.

Review of maintainer self-commits established in the GPU subsystem

Looking at relative changes in how consistently maintainer self-commits are reviewed, there’s clear growth from mostly no review to 80+% of all maintainer self-commits having seen some formal oversight. We didn’t just keep up with the growth, we scaled faster and managed to make review a standard practice. Most of the drivers, and all the core code, are now consistently reviewed. Even for tiny drivers with small or single-person teams we’ve managed to pull this off, by combining them into larger teams run with a group maintainership model.

Last few years of kernel w/o GPU history

Fig. 3: kernel w/o GPU maintainer self-commits and reviewed maintainer self-commits
Fig. 4: kernel w/o GPU percentage maintainer self-commits and reviewed maintainer self-commits

The kernel without graphics is an entirely different story. Overall, review is much less of a thing: only about 30% of all maintainer self-commits show any indication of oversight. The low ratio of maintainer self-commits is why I removed the total commit number from the absolute graph; it would have dwarfed the much more interesting data on self-commits and reviewed self-commits. The positive thing is that there’s at least a consistent, if very small, upward trend in maintainer self-commit reviews, both in absolute and relative numbers. But it’s very slow, and it will likely take decades until there’s no longer a double standard on review between contributors and maintainers.

Maintainers are not keeping up with the kernel growth overall

Much more worrying is the trend on maintainer self-commits. Both in absolute and, much more, in relative numbers, there’s a clear downward trend, going from around 25% to below 15%. This indicates that the kernel community fails to mentor and train new maintainers at a pace sufficient to keep up with growth. Current maintainers are ever more overloaded, leaving ever less time for them to write patches of their own and get them merged.

Naively extrapolating the relative trend predicts that around the year 2025, large numbers of kernel maintainers will do nothing other than be the bottleneck, preventing everyone else from getting their work merged and contributing nothing of their own. The kernel community imploding under its own bureaucratic weight is the likely outcome of that. This is a huge contrast to the “everything is getting better, bigger, and the kernel community is very healthy” fanfare touted at keynotes and in the yearly kernel report.
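The naive extrapolation mentioned above is just a linear fit on the yearly self-commit ratio. A sketch with made-up illustrative data points (not the actual dataset, which is not reproduced here):

```python
# Illustrative only: fit a line through hypothetical (year, self-commit ratio)
# points and extrapolate it forward, mirroring the naive trend argument above.
def linear_fit(points):
    """Least-squares slope and intercept for a list of (x, y) pairs."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Hypothetical ratios falling from ~25% to ~15%, as the text describes.
ratios = [(2012, 25.0), (2014, 22.0), (2016, 18.0), (2018, 15.0)]
slope, intercept = linear_fit(ratios)
print(round(slope, 2))                        # -1.7 (percentage points per year)
print(round(slope * 2025 + intercept, 1))     # 3.0 (% predicted for 2025)
```

With these invented numbers the fitted line hits roughly 3% by 2025, which is the flavor of "maintainers become pure bottlenecks" prediction the paragraph above is making; the real post bases the claim on the measured kernel data.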
In my opinion, the kernel community is very much not looking like it is coping with its growth well and an overall healthy community. Even when ignoring all the issues around conduct that I’ve raised. It is also a huge contrast to what we’ve experienced in the GPU subsystem since aggressively rolling out group maintainership starting with the v4.5 release; by spreading the bureaucratic side of applying patches over many more people, maintainers have much more time to create their own patches and get them merged. More crucially, experienced maintainers can focus their limited review bandwidth on the big architectural design questions since they won’t get bogged down in the minutiae of every single simple patch. 4.16 by subsystem Let’s zoom into how this all looks at a subsystem level, looking at just the recently released 4.16 kernel. Most subsystems have unsustainable maintainer ratios Trying to come up with a reasonable list of subsystems that have high maintainer commit ratios is tricky; some rather substantial pull requests are essentially just maintainers submitting their own work, giving them an easy 100% score. But of course that’s just an outlier in the larger scope of the kernel overall having a maintainer self-commit ratio of just 15%. To get a more interesting list of subsystems we need to look at only those with a group of regular contributors and more than just 1 maintainer. A fairly arbitrary cut-off of 200 commits or more in total seems to get us there, yielding the following top ten list: subsystem total commits maintainer self-commits maintainer ratio GPU 1683 614 36% KVM 257 91 35% arm-soc 885 259 29% linux-media 422 111 26% tip (x86, core, …) 792 125 16% linux-pm 201 31 15% staging 650 61 9% linux-block 249 20 8% sound 351 26 7% powerpc 235 16 7% In short there’s very few places where it’s easier to become a maintainer than in the already rather low, roughly 15%, the kernel scores overall. 
Outside of these few subsystems, the only realistic way is to create a new subsystem, somehow get it merged, and become its maintainer. In most subsystems being a maintainer is an elite status, and the historical trends suggest it will only become more so. If this trend isn’t reversed, then maintainer overload will get a lot worse in the coming years. Of course subsystem maintainers are expected to spend more time reviewing and managing other people’s contribution. When looking at individual maintainers it would be natural to expect a slow decline in their own contributions in patch form, and hence a decline in self-commits. But below them a new set of maintainers should grow and receive mentoring, and those more junior maintainers would focus more on their own work. That sustainable maintainer pipeline seems to not be present in many kernel subsystems, drawing a bleak future for them. Much more interesting is the review statistics, split up by subsystem. Again we need a cut-off for noise and outliers. The big outliers here are all the pull requests and trees that have seen zero review, not even any Acked-by tags. As long as we only look at positive examples we don’t need to worry about those. A rather low cut-off of at least 10 maintainer self-commits takes care of other random noise: subsystem total commits maintainer self-commits maintainer review ratio f2fs 72 12 100% XFS 105 78 100% arm64 166 23 91% GPU 1683 614 83% linux-mtd 99 12 75% KVM 257 91 74% linux-pm 201 31 71% pci 145 37 65% remoteproc 19 14 64% clk 139 14 64% dma-mapping 63 60 60% Yes, XFS and f2fs have their shit together. More interesting is how wide the spread in the filesystem code is; there’s a bunch of substantial fs pulls with a review ratio of flat out zero. Not even a single Acked-by. XFS on the other hand insists on full formal review of everything - I spot checked the history a bit. f2fs is a bit of an outlier with 4.16, barely getting above the cut-off. 
Usually it has fewer patches and would have been excluded. Everyone not in the top ten taken together has a review ratio of 27%. Review double standards in many big subsystems Looking at the big subsystems with multiple maintainers and huge groups of contributors - I picked 500 patches as the cut-off - there’s some really low review ratios: Staging has 7%, networking 9% and tip scores 10%. Only arm-soc is close to the top ten, with 50%, at the 14th position. Staging having no standard is kinda the point, but the other core subsystems eschewing review is rather worrisome. More than 9 out of 10 maintainer self-commits merged into these core subsystem do not carry any indication that anyone else ever looked at the patch and deemed it a good idea. The only other subsystem with more than 500 commits is the GPU subsystem, at 4th position with a 83% review ratio. Compared to maintainers overall the review situation is looking a lot less bleak. There’s a sizeable group of subsystems who at least try to make this work, by having similar review criteria for maintainer self-commits than normal contributors. This is also supported by the rather slow, but steady overall increase of reviews when looking at historical trend. But there’s clearly other subsystems where review only seems to be a gauntlet inflicted on normal contributors, entirely optional for maintainers themselves. Contributors cannot avoid review, because they can’t commit their own patches. When maintainers outright ignore review for most of their patches this creates a clear double standard between maintainers and mere contributors. One year ago I wrote “Review, not Rocket Science” on how to roll out review in your subsystem. Looking at this data here I can close with an even shorter version: What would Dave Chinner do? Thanks a lot to Daniel Stone, Dave Chinner, Eric Anholt, Geoffrey Huntley, Luce Carter and Sean Paul for reading and commenting on drafts of this article. [Less]
Posted about 6 years ago
As part of preparing my last two talks at LCA on the kernel community, “Burning Down the Castle” and “Maintainers Don’t Scale”, I have looked into how the kernel’s maintainer structure can be measured. One very interesting approach is looking at the pull request flows, as done for example in the LWN article “How 4.4’s patches got to the mainline”. Note that in the Linux kernel process, pull requests are only used to submit development from entire subsystems, not individual contributions. What I’m trying to work out here isn’t so much the overall patch flow as how maintainers work, and how that differs between subsystems.

Methodology

In my presentations I claimed that the kernel community is suffering from too steep hierarchies. And worse, the people in power don’t bother to apply the same rules to themselves as anyone else, especially around purported quality enforcement tools like code reviews. For our purposes a contributor is someone who submits a patch to a mailing list, but needs a maintainer to apply it for them to get the patch merged. A maintainer on the other hand can directly apply a patch to a subsystem tree, and will then send pull requests up the maintainer hierarchy until the patch lands in Linus’ tree. This is relatively easy to measure accurately in git: if the recorded patch author and committer match, it’s a maintainer self-commit; if they don’t match, it’s a contributor commit. There are a few annoying special cases to handle: some people use different email addresses or spellings, and sometimes MTAs, patchwork and other tools used in the patch flow chain mangle things further. This could be fixed up with the mail mapping database that LWN for example uses to generate its contributor statistics. Since most maintainers have reasonable setups it doesn’t seem to matter much, hence I decided not to bother. There are subsystems not maintained in git, but in the quilt patch management system.
Andrew Morton’s tree is the only one I’m aware of, and I hacked up my scripts to handle this case. After that I realized it doesn’t matter, since Andrew merged exceedingly few of his own patches himself, most have been fixups that landed through other trees. Also note that this is a property of each commit - the same person can be both a maintainer and a contributor, depending upon how each of their patches gets merged. The ratio of maintainer self-commits compared to overall commits then gives us a crude, but fairly useful metric to measure how steep the kernel community overall is organized. Measuring review is much harder. For contributor commits review is not recorded consistently. Many maintainers forgo adding an explicit Reviewed-by tag since they’re adding their own Signed-off-by tag anyway. And since that’s required for all contributor commits, it’s impossible to tell whether a patch has seen formal review before merging. A reasonable assumption though is that maintainers actually look at stuff before applying. For a minimal definition of review, “a second person looked at the patch before merging and deemed the patch a good idea” we can assume that merged contributor patches have a review ratio of 100%. Whether that’s a full formal review or not can unfortunately not be measured with the available data. A different story is maintainer self-commits - if there is no tag indicating review by someone else, then either it didn’t happen, or the maintainer felt it’s not important enough work to justify the minimal effort to record it. Either way, a patch where the git author and committer match, and which sports no review tags in the commit message, strongly suggests it has indeed seen none. An objection would be that these patches get reviewed by the next maintainer up, when the pull request gets merged. 
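The author/committer comparison and the review-tag heuristic described above can be sketched in a few lines. This is my own illustrative reconstruction, not the author’s actual scripts; the commit dictionary fields and the tag regex are assumptions:

```python
import re

# Anything that looks like a Reviewed-by or Acked-by tag counts as
# review, consistent with the loose definition used in the article.
REVIEW_TAG = re.compile(r"^\s*(Reviewed-by|Acked-by)\s*:", re.I | re.M)

def classify(commit):
    """commit: dict with 'author', 'committer', 'message' strings.
    Returns (is_maintainer_self_commit, has_review_tag)."""
    self_commit = commit["author"] == commit["committer"]
    reviewed = bool(REVIEW_TAG.search(commit["message"]))
    return self_commit, reviewed

def ratios(commits):
    """Maintainer self-commit ratio, and review ratio among self-commits."""
    selfs = [c for c in commits if classify(c)[0]]
    reviewed_selfs = [c for c in selfs if classify(c)[1]]
    self_ratio = len(selfs) / len(commits)
    review_ratio = len(reviewed_selfs) / len(selfs) if selfs else 0.0
    return self_ratio, review_ratio
```

Real data could be fed in from something like `git log --format='%ae%x09%ce%x09%B'`; the mail-mapping caveats mentioned above apply.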
But there’s well over a thousand such patches each kernel release, and most of the pull requests containing them go directly to Linus in the 2 week long merge window, when the over 10k feature patches of each kernel release land in the mainline branch. It is unrealistic to assume that Linus carefully reviews hundreds of patches himself in just those 2 weeks, while getting hammered by pull requests all around. Similar considerations apply at a subsystem level. For counting reviews I looked at anything that indicates some kind of patch review, even very informal ones, to stay consistent with the implied oversight the maintainer’s Signed-off-by line provides for merged contributor patches. I therefore included both Reviewed-by and Acked-by tags, including a plethora of misspelled and combined versions of the same. The scripts also keep track of how pull requests percolate up the hierarchy, which allows filtering on a per-subsystem level. Commits in topic branches are accounted to the subsystem that first lands in Linus’ tree. That’s fairly arbitrary, but simplest to implement.

Last few years of GPU subsystem history

Since I’ve pitched the GPU subsystem against the kernel at large in my recent talks, let’s first look at what things look like in graphics:

Fig. 1 GPU total commits, maintainer self-commits and reviewed maintainer self-commits
Fig. 2 GPU percentage maintainer self-commits and reviewed maintainer self-commits

In absolute numbers it’s clear that graphics has grown tremendously over the past few years, much faster than the kernel at large. Depending upon the metric you pick, the GPU subsystem has grown from being 3% of the kernel to about 10%, and is now trading spots for 2nd largest subsystem with arm-soc and staging (depending on who’s got a big pull for that release).

Maintainer commits keep up with GPU subsystem growth

The relative numbers have a different story.
First, commit rights and the fairly big roll out of group maintainership we’ve done in the past 2 years aren’t extreme by historical graphics subsystem standards. We’ve always had around 30-40% maintainer self-commits. There’s a bit of a downward trend in the years leading towards v4.4, due to the massive growth of the i915 driver, and our failure to add more maintainers and committers for a few releases. Adding lots more committers and creating bigger maintainer groups from v4.5 onward, first for the i915 driver, then to cope with the influx of new small drivers, brought us back to the historical trend line. There’s another dip happening in the last few kernels, due to AMD bringing in a big new team of contributors to upstream. v4.15 was even more pronounced; in that release the entirely rewritten DC display driver for AMD GPUs landed. The AMD team is already using a committer model for their staging and internal trees, but not (yet) committing directly to their upstream branch. There are a few process holdups, mostly around the CI flow, that need to be fixed first. As soon as that’s done I expect this recent dip will be over again. In short, even when facing big growth like the GPU subsystem has, it’s very much doable to keep training new maintainers to keep up with the increased demand.

Review of maintainer self-commits established in the GPU subsystem

Looking at relative changes in how consistently maintainer self-commits are reviewed, there’s a clear growth from mostly no review to 80+% of all maintainer self-commits having seen some formal oversight. We didn’t just keep up with the growth, but scaled faster and managed to make review a standard practice. Most of the drivers, and all the core code, are now consistently reviewed. Even for tiny drivers with small to single person teams we’ve managed to pull this off, through combining them into larger teams run with a group maintainership model.

Last few years of kernel w/o GPU history

Fig. 3 kernel w/o GPU maintainer self-commits and reviewed maintainer self-commits
Fig. 4 kernel w/o GPU percentage maintainer self-commits and reviewed maintainer self-commits

Kernel w/o graphics is an entirely different story. Overall, review is much less of a thing that happens, with only about 30% of all maintainer self-commits having any indication of oversight. The low ratio of maintainer self-commits is why I removed the total commit number from the absolute graph - it would have dwarfed the much more interesting data on self-commits and reviewed self-commits. The positive thing is that there’s at least a consistent, if very small, upward trend in maintainer self-commit reviews, both in absolute and relative numbers. But it’s very slow, and will likely take decades until there’s no longer a double standard on review between contributors and maintainers.

Maintainers are not keeping up with the kernel growth overall

Much more worrying is the trend in maintainer self-commits. Both in absolute, and much more so in relative numbers, there’s a clear downward trend, going from around 25% to below 15%. This indicates that the kernel community fails to mentor and train new maintainers at a pace sufficient to keep up with growth. Current maintainers are ever more overloaded, leaving ever less time for them to write patches of their own and get them merged. Naively extrapolating the relative trend predicts that around the year 2025 large numbers of kernel maintainers will do nothing but act as bottlenecks, preventing everyone else from getting their work merged while not contributing anything of their own. The likely outcome of that would be the kernel community imploding under its own bureaucratic weight. This is a huge contrast to the “everything is getting better, bigger, and the kernel community is very healthy” fanfare touted at keynotes and the yearly kernel report.
In my opinion, the kernel community is very much not looking like it is coping with its growth well, nor like an overall healthy community - even when ignoring all the issues around conduct that I’ve raised. It is also a huge contrast to what we’ve experienced in the GPU subsystem since aggressively rolling out group maintainership starting with the v4.5 release; by spreading the bureaucratic side of applying patches over many more people, maintainers have much more time to create their own patches and get them merged. More crucially, experienced maintainers can focus their limited review bandwidth on the big architectural design questions, since they won’t get bogged down in the minutiae of every single simple patch.

4.16 by subsystem

Let’s zoom into how this all looks at a subsystem level, looking at just the recently released 4.16 kernel.

Most subsystems have unsustainable maintainer ratios

Trying to come up with a reasonable list of subsystems that have high maintainer commit ratios is tricky; some rather substantial pull requests are essentially just maintainers submitting their own work, giving them an easy 100% score. But of course that’s just an outlier in the larger scope of the kernel overall having a maintainer self-commit ratio of just 15%. To get a more interesting list of subsystems we need to look at only those with a group of regular contributors and more than just 1 maintainer. A fairly arbitrary cut-off of 200 commits or more in total seems to get us there, yielding the following top ten list:

subsystem            total commits   maintainer self-commits   maintainer ratio
GPU                  1683            614                       36%
KVM                  257             91                        35%
arm-soc              885             259                       29%
linux-media          422             111                       26%
tip (x86, core, …)   792             125                       16%
linux-pm             201             31                        15%
staging              650             61                        9%
linux-block          249             20                        8%
sound                351             26                        7%
powerpc              235             16                        7%

In short, there are very few places where it’s easier to become a maintainer than the already rather low roughly 15% the kernel scores overall.
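As a sanity check, the maintainer ratios above follow directly from the raw commit counts. A small sketch, using the 4.16 data as listed (the dictionary layout is just for illustration):

```python
# Total commits and maintainer self-commits per subsystem for 4.16,
# copied from the table; the ratio column is just their quotient.
counts = {
    "GPU": (1683, 614),
    "KVM": (257, 91),
    "arm-soc": (885, 259),
    "linux-media": (422, 111),
    "tip": (792, 125),
    "linux-pm": (201, 31),
    "staging": (650, 61),
    "linux-block": (249, 20),
    "sound": (351, 26),
    "powerpc": (235, 16),
}

def maintainer_ratio(total, self_commits):
    """Maintainer self-commit ratio, as a whole percentage."""
    return round(100 * self_commits / total)

# Print the list sorted by ratio, highest first.
for name, (total, selfs) in sorted(counts.items(),
                                   key=lambda kv: -maintainer_ratio(*kv[1])):
    print(f"{name:12s} {maintainer_ratio(total, selfs):3d}%")
```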
Outside of these few subsystems, the only realistic way is to create a new subsystem, somehow get it merged, and become its maintainer. In most subsystems being a maintainer is an elite status, and the historical trends suggest it will only become more so. If this trend isn’t reversed, then maintainer overload will get a lot worse in the coming years. Of course subsystem maintainers are expected to spend more time reviewing and managing other people’s contributions. When looking at individual maintainers it would be natural to expect a slow decline in their own contributions in patch form, and hence a decline in self-commits. But below them a new set of maintainers should grow and receive mentoring, and those more junior maintainers would focus more on their own work. That sustainable maintainer pipeline seems to not be present in many kernel subsystems, painting a bleak future for them. Much more interesting are the review statistics, split up by subsystem. Again we need a cut-off for noise and outliers. The big outliers here are all the pull requests and trees that have seen zero review, not even any Acked-by tags. As long as we only look at positive examples we don’t need to worry about those. A rather low cut-off of at least 10 maintainer self-commits takes care of other random noise:

subsystem      total commits   maintainer self-commits   maintainer review ratio
f2fs           72              12                        100%
XFS            105             78                        100%
arm64          166             23                        91%
GPU            1683            614                       83%
linux-mtd      99              12                        75%
KVM            257             91                        74%
linux-pm       201             31                        71%
pci            145             37                        65%
remoteproc     19              14                        64%
clk            139             14                        64%
dma-mapping    63              60                        60%

Yes, XFS and f2fs have their shit together. More interesting is how wide the spread in the filesystem code is; there’s a bunch of substantial fs pulls with a review ratio of flat out zero. Not even a single Acked-by. XFS on the other hand insists on full formal review of everything - I spot checked the history a bit. f2fs is a bit of an outlier with 4.16, barely getting above the cut-off.
Usually it has fewer patches and would have been excluded. Everyone not in the top ten taken together has a review ratio of 27%.

Review double standards in many big subsystems

Looking at the big subsystems with multiple maintainers and huge groups of contributors - I picked 500 patches as the cut-off - there are some really low review ratios: staging has 7%, networking 9% and tip scores 10%. Only arm-soc is close to the top ten, with 50%, at the 14th position. Staging having no standard is kinda the point, but the other core subsystems eschewing review is rather worrisome. More than 9 out of 10 maintainer self-commits merged into these core subsystems do not carry any indication that anyone else ever looked at the patch and deemed it a good idea. The only other subsystem with more than 500 commits is the GPU subsystem, at 4th position with an 83% review ratio. Compared to maintainers overall the review situation is looking a lot less bleak. There’s a sizeable group of subsystems who at least try to make this work, by having similar review criteria for maintainer self-commits as for normal contributors. This is also supported by the rather slow, but steady overall increase of reviews when looking at the historical trend. But there are clearly other subsystems where review only seems to be a gauntlet inflicted on normal contributors, entirely optional for maintainers themselves. Contributors cannot avoid review, because they can’t commit their own patches. When maintainers outright ignore review for most of their patches this creates a clear double standard between maintainers and mere contributors. One year ago I wrote “Review, not Rocket Science” on how to roll out review in your subsystem. Looking at this data here I can close with an even shorter version: What would Dave Chinner do? Thanks a lot to Daniel Stone, Dave Chinner, Eric Anholt, Geoffrey Huntley, Luce Carter and Sean Paul for reading and commenting on drafts of this article.
Posted about 6 years ago
Downloads

If you're curious about the slides, you can download the PDF or the OTP.

Thanks

This post has been a part of work undertaken by my employer Collabora. I would like to thank the wonderful organizers of FossNorth, specifically @e8johan for hosting a great event.
Posted about 6 years ago
Mmm, a Moving Mesa Midgard Cube

In the last Panfrost status update, a transitory “half-way” driver was presented, with the purpose of easing the transition from a standalone library abstracting the hardware to a full-fledged OpenGL ES driver using the Mesa and Gallium3D infrastructure. Since then, I’ve completed the transition, creating such a driver, but retaining support for out-of-tree testing. Almost everything that was exposed with the custom half-way interface is now available through Gallium3D. Attributes, varyings, and uniforms all work. A bit of rasterisation state is supported. Multiframe programs work, as do programs with multiple non-indexed, direct draws per frame. The result? The GLES test-cube demo from Freedreno runs using the Mali T760 GPU present in my RK3288 laptop, going through the Mesa/Gallium3D stack. Of course, there’s no need to rely on the vendor’s proprietary compilers for shaders – the demo is using shaders from the free, NIR-based Midgard compiler. Look ma, no blobs!

In the past three weeks since the previous update, all aspects of the project have seen fervent progress, culminating in the above demo. The change list for the core Gallium driver is lengthy but largely routine: abstracting features about the hardware which were already understood and integrating it with Gallium, resolving bugs which are discovered in the process, and repeating until the next GLES test passes normally. Enthusiastic readers can read the code of the driver core on GitLab. Although numerous bugs were solved in this process, one in particular is worthy of mention: the “tile flicker bug”, notorious to lurkers of our Freenode IRC channel, #panfrost. Present since the first render, this bug resulted in non-deterministic rendering glitches, where particular tiles would display the background colour in lieu of the render itself.
The non-deterministic nature had long suggested it was either the result of improper memory management or a race condition, but the precise cause was unknown. Finally, the cause was narrowed down to a race condition between the vertex/tiler jobs responsible for draws, and the fragment job responsible for screen painting. With this cause in mind, a simple fix squashed the bug, hopefully for good; renders are now deterministic and correct. Huge thanks to Rob Clark for letting me use him as a sounding board to solve this. In terms of decoding the command stream, some miscellaneous GL state has been determined, like some details about tiler memory management, texture descriptors, and shader linkage (attribute and varying metadata). By far, however, the most significant discovery was the operation of blending on Midgard. It’s… well, unique. If I had known how nuanced the encoding was – and how much code it takes to generate from Gallium blend state – I would have postponed decoding like originally planned. In any event, blending is now understood. Under Midgard, there are two paths in the hardware for blending: the fixed-function fast path, and the programmable slow path, using “blend shaders”. This distinction has been discussed sparsely in Mali documentation, but the conditions for the fast path were not known until now. Without further ado, the fixed-function blending hardware works when:

- The blend equation is either ADD, SUBTRACT, or REVERSE_SUBTRACT (but not MIN or MAX).
- The “dominant” blend function is either the source/destination colour/alpha, or the special case of a constant ONE or ZERO (but not a constant colour or anything fancier), or the additive complement thereof.
- The non-dominant blend function is either identical to the dominant blend function, or one of the constant special cases.

If these conditions are not met, a blend shader is used instead, incurring a presently unknown performance hit.
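One possible reading of those conditions can be sketched as a predicate. This is purely illustrative: the equation and factor names are hypothetical, loosely modelled on GL enums rather than Panfrost’s actual code, and I am assuming a factor and its additive complement count as the same underlying “function”:

```python
# Base factors the fixed-function path can handle (source/destination
# colour/alpha, and the constant special cases ONE and ZERO).
SIMPLE_BASES = {"SRC_COLOR", "DST_COLOR", "SRC_ALPHA", "DST_ALPHA",
                "ONE", "ZERO"}
CONSTANTS = {"ONE", "ZERO"}

def base(factor):
    """Strip the additive-complement prefix, i.e. treat a factor and
    its complement (ONE_MINUS_*) as the same underlying function."""
    prefix = "ONE_MINUS_"
    return factor[len(prefix):] if factor.startswith(prefix) else factor

def can_use_fixed_function(equation, src_factor, dst_factor):
    """True if this blend state could take the fast path under the
    conditions listed above; otherwise a blend shader is needed."""
    if equation not in {"ADD", "SUBTRACT", "REVERSE_SUBTRACT"}:
        return False  # MIN/MAX need a blend shader
    sb, db = base(src_factor), base(dst_factor)
    if sb not in SIMPLE_BASES or db not in SIMPLE_BASES:
        return False  # e.g. CONSTANT_COLOR is too fancy for the fast path
    # The non-dominant function must match the dominant one, or be a
    # constant special case.
    return sb == db or sb in CONSTANTS or db in CONSTANTS
```

Under this reading, classic alpha blending (SRC_ALPHA, ONE_MINUS_SRC_ALPHA) stays on the fast path, while anything involving a constant blend colour falls back to a blend shader.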
By dominant and non-dominant modes, I’m essentially referring to the more complex and less complex blend functions respectively, comparing between the functions for the source and the destination. The exact details of the encoding are a little hairy and beyond the scope of this post, but are included in the corresponding Panfrost headers and the corresponding code in the driver. In any event, this separation between fixed-function and programmable blending is now more or less understood. Additionally, blend shaders themselves are now intelligible with Connor Abbott’s Midgard disassembler; blend shaders are just normal Midgard shaders, with an identical ISA to vertex and fragment shaders, and will eventually be generated with the existing NIR compiler. With luck, we should be able to reuse code from the NIR compiler for the vc4, an embedded GPU lacking fixed-function hardware for any blending whatsoever. Additionally, blend shaders open up some interesting possibilities; we may be able to enable developers to write blend shaders themselves in GLSL through a vendored GL extension. More practically, blend shaders should enable implementation of all blend modes, as this is ES 3.2 class hardware, as well as presumably logic operations. Command-stream work aside, the Midgard compiler also saw some miscellaneous improvements. In particular, the mystery surrounding varyings in vertex shaders has finally been cracked. Recall that gl_Position stores are accomplished by writing the screen-space coordinate to the special register r27, and then including a st_vary instruction with the mysterious input register r1 to the appropriate address. At the time, I had (erroneously) assumed that the r27 store was responsible for the write, and the subsequent instruction was a peculiar errata workaround. New findings show it is quite the opposite: it is the store instruction that does the store, but it uses the value of r27, not r1, for its input. What does the r1 signify, then?
It turns out that two different registers can be used for varying writes, r26 and r27. The register in the store instruction selects between these registers: a value of zero uses r26 whereas a value of one uses r27. Why, then, are there two varying source registers? Midgard is a VLIW architecture, in this case meaning that it can execute two store instructions simultaneously for improved performance. To achieve this parallelism, it needs two source registers, to be able to write two different values to the two varyings. This new understanding clarifies some previously peculiar disassemblies, as the purpose of writes to r26 is now understood. This discovery would have been easier had r26 not also represented a reference to an embedded constant! More importantly, it enables us to implement varying stores in the vertex shader, allowing smoothed demos, like the shading on test-cube, to work. As a bonus, it cleans up the code relating to gl_Position writes, as we now know they can use the same compiler code path as writes to normal varyings. Besides varyings, the Midgard compiler also saw various improvements, notably including a basic register allocator, crucial for compiling even slightly nontrivial shaders, such as that of the cube. Beyond Midgard, my personal focus, Bifrost has continued to see sustained progress. Connor Abbott has continued decoding the new shader ISA, uncovering and adding disassembler support for a few miscellaneous new instructions, and in particular branching. Branching under Bifrost is somewhat involved – the relevant disassembler commit added over two hundred lines of code – with semantics differing noticeably from Midgard. He has also begun work porting the panwrap infrastructure for capturing, decoding, and replaying command streams from Midgard to Bifrost, to pave the way for a full port of the driver to Bifrost down the line.
While Connor continues work on his disassembler, Lyude Paul has been working on a Bifrost assembler compatible with the disassembler’s output, a milestone necessary to demonstrate understanding of the instruction set and a useful prerequisite to writing a Bifrost compiler. Going forward, I plan on cleaning up technical debt accumulated in the driver to improve maintainability, flexibility, and perhaps performance. Additionally, it is perhaps finally time to address the elephant in the command stream room: textures. Prior to this post, there were two major bugs in the driver: the missing tile bug and the texture reading bug. Seeing as the former was finally solved with a bit of persistence, there’s hope for the latter as well. May the pans frost on.
Posted about 6 years ago
Mmm, a Moving Mesa Midgard Cube In the last Panfrost status update, a transitory “half-way” driver was presented, with the purpose of easing the transition from a standalone library abstracting the hardware to a full-fledged OpenGL ES driver using ... [More] the Mesa and Gallium3D infrastructure. Since then, I’ve completed the transition, creating such a driver, but retaining support for out-of-tree testing. Almost everything that was exposed with the custom half-way interface is now available through Gallium3D. Attributes, varyings, and uniforms all work. A bit of rasterisation state is supported. Multiframe programs work, as do programs with multiple non-indexed, direct draws per frame. The result? The GLES test-cube demo from freedreno runs using the Mali T760 GPU present in my RK3288 laptop, going through the Mesa/Gallium3D stack. Of course, there’s no need to rely on the vendor’s proprietary compilers for shaders – the demo is using shaders from the free, NIR-based Midgard compiler. Look ma, no blobs! In the past three weeks since the previous update, all aspects of the project have seen fervent progress, culminating in the above demo. The change list for the core Gallium driver is lengthy but largely routine: abstracting features about the hardware which were already understood and integrating it with Gallium, resolving bugs which are discovered in the process, and repeating until the next GLES test passes normally. Enthusiastic readers can read the code of the driver core on GitLab. Although numerous bugs were solved in this process, one in particular is worthy of mention: the “tile flicker bug”, notorious to lurkers of our Freenode IRC channel, #panfrost. Present since the first render, this bug resulted in non-deterministic rendering glitches, where particular tiles would display the background colour in lieu of the render itself. 
The non-deterministic nature had long suggested it was either the result of improper memory management or a race condition, but the precise cause was unknown. Finally, the cause was narrowed down to a race between the vertex/tiler jobs responsible for draws and the fragment job responsible for screen painting. With this cause in mind, a simple fix squashed the bug, hopefully for good; renders are now deterministic and correct. Huge thanks to Rob Clark for letting me use him as a sounding board to solve this.

In terms of decoding the command stream, some miscellaneous GL state has been determined, like some details about tiler memory management, texture descriptors, and shader linkage (attribute and varying metadata). By far, however, the most significant discovery was the operation of blending on Midgard. It’s… well, unique. If I had known how nuanced the encoding was – and how much code it takes to generate from Gallium blend state – I would have postponed decoding it, as originally planned. In any event, blending is now understood.

Under Midgard, there are two paths in the hardware for blending: the fixed-function fast path, and the programmable slow path, using “blend shaders”. This distinction has been discussed sparsely in Mali documentation, but the conditions for the fast path were not known until now. Without further ado, the fixed-function blending hardware works when:

- The blend equation is either ADD, SUBTRACT, or REVERSE_SUBTRACT (but not MIN or MAX).
- The “dominant” blend function is either the source/destination colour/alpha, or the special case of a constant ONE or ZERO (but not a constant colour or anything fancier), or the additive complement thereof.
- The non-dominant blend function is either identical to the dominant blend function, or one of the constant special cases.

If these conditions are not met, a blend shader is used instead, incurring a presently unknown performance hit.
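As a rough sketch of the fast-path test these conditions describe, consider the predicate below. The enums and struct are illustrative stand-ins for Gallium's real PIPE_BLEND/PIPE_BLENDFACTOR values, and "identical to the dominant function" is simplified to comparing the two factors' base functions while allowing the additive complement; the real driver logic is more involved.

```c
#include <stdbool.h>

/* Illustrative enums; not the real Gallium PIPE_* values. */
enum blend_equation { EQ_ADD, EQ_SUBTRACT, EQ_REVERSE_SUBTRACT, EQ_MIN, EQ_MAX };
enum blend_func { FUNC_ZERO, FUNC_ONE, FUNC_SRC_COLOR, FUNC_DST_COLOR,
                  FUNC_SRC_ALPHA, FUNC_DST_ALPHA, FUNC_CONST_COLOR };

struct blend_factor {
    enum blend_func func;
    bool invert; /* additive complement, e.g. ONE_MINUS_SRC_ALPHA */
};

/* ZERO and ONE are the constant special cases named above. */
bool factor_is_constant(struct blend_factor f)
{
    return f.func == FUNC_ZERO || f.func == FUNC_ONE;
}

/* Sketch of the fast-path check: ADD/SUBTRACT/REVERSE_SUBTRACT only,
 * no constant colours, and the non-dominant factor must either share the
 * dominant factor's base function (complements allowed) or be constant. */
bool can_use_fixed_function(enum blend_equation eq,
                            struct blend_factor src,
                            struct blend_factor dst)
{
    if (eq == EQ_MIN || eq == EQ_MAX)
        return false;
    if (src.func == FUNC_CONST_COLOR || dst.func == FUNC_CONST_COLOR)
        return false;
    return src.func == dst.func ||
           factor_is_constant(src) || factor_is_constant(dst);
}
```

Under this reading, classic alpha blending (SRC_ALPHA, ONE_MINUS_SRC_ALPHA) and additive blending (ONE, ONE) stay on the fast path, while MIN/MAX equations or a constant blend colour fall back to a blend shader.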
By dominant and non-dominant modes, I’m essentially referring to the more complex and less complex blend functions respectively, comparing between the functions for the source and the destination. The exact details of the encoding are a little hairy and beyond the scope of this post, but they are included in the corresponding Panfrost headers and the corresponding code in the driver. In any event, this separation between fixed-function and programmable blending is now more or less understood.

Additionally, blend shaders themselves are now intelligible with Connor Abbott’s Midgard disassembler; blend shaders are just normal Midgard shaders, with an ISA identical to that of vertex and fragment shaders, and will eventually be generated with the existing NIR compiler. With luck, we should be able to reuse code from the NIR compiler for vc4, an embedded GPU lacking fixed-function hardware for any blending whatsoever. Blend shaders also open up some interesting possibilities; we may be able to let developers write blend shaders themselves in GLSL through a vendored GL extension. More practically, blend shaders should enable implementation of all blend modes, as this is ES 3.2-class hardware, as well as presumably logic operations.

Command-stream work aside, the Midgard compiler also saw some miscellaneous improvements. In particular, the mystery surrounding varyings in vertex shaders has finally been cracked. Recall that gl_Position stores are accomplished by writing the screen-space coordinate to the special register r27, and then including a st_vary instruction with the mysterious input register r1 to the appropriate address. At the time, I had (erroneously) assumed that the r27 store was responsible for the write, and that the subsequent instruction was a peculiar errata workaround. New findings show it is quite the opposite: it is the store instruction that does the store, but it uses the value of r27, not r1, as its input. What does r1 signify, then?
It turns out that two different registers can be used for varying writes, r26 and r27. The register field in the store instruction selects between them: a value of zero uses r26, whereas a value of one uses r27. Why, then, are there two varying source registers? Midgard is a VLIW architecture, in this case meaning that it can execute two store instructions simultaneously for improved performance. To achieve this parallelism, it needs two source registers, so that two different values can be written to two different varyings. This new understanding clarifies some previously peculiar disassemblies, as the purpose of writes to r26 is now understood. This discovery would have been easier had r26 not also represented a reference to an embedded constant! More importantly, it enables us to implement varying stores in the vertex shader, allowing smoothed demos, like the shading on test-cube, to work. As a bonus, it cleans up the code relating to gl_Position writes, as we now know they can use the same compiler code path as writes to normal varyings.

Besides varyings, the Midgard compiler also saw various improvements, notably a basic register allocator, crucial for compiling even slightly nontrivial shaders, such as that of the cube.

Beyond Midgard, my personal focus, Bifrost has continued to see sustained progress. Connor Abbott has continued decoding the new shader ISA, uncovering and adding disassembler support for a few miscellaneous new instructions and, in particular, branching. Branching under Bifrost is somewhat involved – the relevant disassembler commit added over two hundred lines of code – with semantics differing noticeably from Midgard. He has also begun porting the panwrap infrastructure for capturing, decoding, and replaying command streams from Midgard to Bifrost, paving the way for a full port of the driver to Bifrost down the line.
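The select-bit-to-register mapping and the VLIW pairing described above can be sketched as follows. The register numbers come from the findings in this post, but the helper names and the idea of a simple per-bundle assignment are illustrative only, not the real Midgard encoding or compiler code.

```c
/* The one-bit select in a varying store chooses its staging register:
 * 0 -> r26, 1 -> r27. */
unsigned varying_source_register(unsigned select_bit)
{
    return select_bit ? 27u : 26u;
}

/* Because Midgard is VLIW, one bundle can issue two varying stores at
 * once; two staging registers let each store carry a different value.
 * Assign the first write in a bundle r26 and the second r27. */
void assign_varying_sources(unsigned num_writes, unsigned *selects)
{
    for (unsigned i = 0; i < num_writes && i < 2; i++)
        selects[i] = i; /* write 0 -> select 0 (r26), write 1 -> select 1 (r27) */
}
```

This also matches the gl_Position case described earlier: the coordinate is staged in r27, and the st_vary instruction's select bit of one tells the store to read from r27.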
While Connor continues work on his disassembler, Lyude Paul has been working on a Bifrost assembler compatible with the disassembler’s output, a milestone necessary to demonstrate understanding of the instruction set and a useful prerequisite to writing a Bifrost compiler.

Going forward, I plan on cleaning up technical debt accumulated in the driver to improve maintainability, flexibility, and perhaps performance. Additionally, it is perhaps finally time to address the elephant in the command stream room: textures. Prior to this post, there were two major bugs in the driver: the missing tile bug and the texture reading bug. Seeing as the former was finally solved with a bit of persistence, there’s hope for the latter as well. May the pans frost on.