linux kernel monkey log

Ah, so I read the comments that Eric Schrock had about why Sun doesn't want to help with Linux kernel development, and as a Linux kernel developer I thought I would attempt to address his major points, and explain why Linux developers feel this way. It's still early in the morning for me, so this might seem a little disorganized, but what the hell:

Ok, his first statement:

The main reason we can't just jump into Linux is because Linux doesn't align with our engineering principles, and no amount of patches will ever change that. In the Solaris kernel group, we have strong beliefs in reliability, observability, serviceability, resource management, and binary compatibility. Linus has shown time and time again that these just aren't part of his core principles, and in the end he is in sole control of Linux's future

To state that the Linux kernel developers don't believe in those good, sound engineering values is pretty disingenuous. Hey, one of the first things that people think about when Linux is mentioned is "reliability" (hm, I used to have a link showing that, can't find it anymore...) This is pretty much a baseless argument just waiting to happen, and as he doesn't point out anything specific, I'll ignore it for now.

He continues on with specific issues:

Projects such as crash dumps, kernel debuggers, and tracing frameworks have been repeatedly rejected by Linus, often because they are perceived as vendor added features.

Ah, ok, specifics. Let's get into these:
Crash dumps. The main reason this option has been rejected is the lack of a real, working implementation. But this is being fixed. Look at the kexec based crashdump patch that is now in the latest -mm kernel tree. That is the way to go with regards to crash dumps, and is showing real promise. Eventually that feature will make it into the main kernel tree. But to state that Linux kernel developers just reject this feature is to ignore the fact that it really wasn't ready to be merged into the tree.

Kernel debuggers. Ah, a fun one. I'll leave this one alone only to state that I have never needed to use one, in all the years of my kernel development. But I know other developers who swear by them. So, to each their own. For hardware bringup, they are essential. But for the rest of the kernel community, they would be extra baggage. I know Andrew Morton has been keeping the kernel debugger patches in his tree, for those people who want this feature. And since Linux is about choice, and this choice is offered to anyone who wants to use it, to say it is not available is, again, not a valid argument.

Tracing frameworks. Hm, then what do you call the kprobes code that is in the mainline kernel tree right now? :) This suffered from the same issue as the crash dump code: it wasn't in a good enough state to merge, so it took a long time to get there. But it's there now, so no one can make that argument again.

Which brings me to the very valid point about how Linux kernel development differs from any other OS development. We (kernel developers) do not have to accept any feature that we deem is not properly implemented, just because some customer or manager tells us we have to have it. In order to get your stuff into the kernel, you must first tell the community why it is necessary, and so many people often forget this. Tell us why we really need to add this new feature to the kernel, and assure us that you will stick around to maintain it over time. Only then, if the implementation seems sane, and it works, will it be added. A lot of these "enterprise" kernel patches didn't have sane implementations, and no one successfully explained to the kernel community why they were necessary (and no, the argument, "but we did it in INSERT_YOUR_DYING_OS_NAME_HERE, so Linux needs it" is not a valid excuse.)

Then Eric continues:

Not to mention the complete lack of commitment to binary compatibility (outside of the system call interface). Kernel developers make it nearly impossible to maintain a driver outside the Linux source tree (nVidia being the rare exception), whereas the same apps (and drivers) that you wrote for Solaris 2.5.1 will continue to run on Solaris 10.

First off, this is an argument that no user cares about. Applications written for Linux 1.0 work just fine on the 2.6 kernel. Applications written for older versions of Linux probably work just fine too, I don't know if anyone has tried it in a long time. The Linux kernel developers care a lot about userspace binary compatibility. If that breaks, let us know and we will fix it. So to say that old apps written for old versions of Solaris will work on the latest Solaris is not a valid argument at all.

But then there's the issue of driver compatibility. For all of the people that seem to get upset about this, I really don't see anyone who understands why Linux works this way. Here's why the Linux kernel does not have binary driver compatibility, and why it never will:

We want to fix the bugs that we find. If we find a bug in a kernel interface, we fix it, fix up all drivers that use that api call, and everyone is happy.
We learn over time how to write better interfaces. Take a look at the USB driver interface in Windows (as an example). They have rewritten the USB interface in Windows at least 3 times, and changed the driver interface a bit every time. But every time they still have to support that first interface, as there are drivers out there somewhere that use it. So they can never drop old driver apis, no matter how buggy or broken they are. So that code remains living inside that kernel forever. In Linux we have had at least 3 major changes in the USB driver interface (and it looks like we are due for another one...) Each time this happened, we fixed up all of the drivers in the kernel tree, and the api, and got rid of the old one. Now we don't have to support an old, broken api, and the kernel is smaller and cleaner. This saves time and memory for everyone in the long run.
Compiler versions and kernel options. If you select something as simple as CONFIG_SMP, that means that core kernel structures will be different sizes, and locks will either be enabled, or compiled away into nothing. So, if you wanted to ship a binary driver, you would have to build your driver with that option both enabled and disabled. Now combine that with the zillion different kernel options that are around that change the way structures are sized and built, and you have a huge number of binary drivers that you need to ship. Combine that with the different versions of gcc which align things differently (and turn on some kernel options themselves, based on different features available in the compiler) and there's no way you can successfully ship a binary kernel driver that will work for all users. It's just an impossible dream of people who do not understand the technology.
Drivers outside the kernel tree and binary drivers take away from Linux, they give nothing back. This was one of the main statements from Andrew Morton's 2004 OLS keynote, and I agree. Out of the box, Linux supports more hardware devices than any other operating system. That is very important, and is something that we could not have done without the drivers being in our tree.
If a kernel api is not being used by anyone in the tree, we delete it. We have no way of knowing if there is some user of this api in a driver living outside on some sf.net site somewhere. I have been yelled at for removing apis like this, when there was no physical way I could have possibly known that someone was using this interface. To prevent this, get your driver into the kernel tree. By doing this, it will force you to write good kernel code (up to the standard of the rest of the tree), and it will let other people fix your bugs more easily, and if any kernel api changes, the person who changes it will fix your driver for you. This is the power of open source development, why try to subvert it?
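
To make the CONFIG_SMP point above a bit more concrete, here is a rough userspace sketch in C. None of this is real kernel code: sim_spinlock, sim_device_smp, sim_device_up, and up_driver_read_users are all made-up names, and the two structs only stand in for what a single kernel structure might look like with the option turned on and off. The assumption is simply that the lock is a real field on SMP builds and vanishes on uniprocessor builds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a lock that only exists on SMP builds. */
struct sim_spinlock {
    unsigned long slock;
};

/* What the compiler would see with the option enabled: the lock is a real field. */
struct sim_device_smp {
    struct sim_spinlock lock;
    unsigned long users;
    unsigned long flags;
};

/* Same structure with the option disabled: the lock is compiled away. */
struct sim_device_up {
    unsigned long users;
    unsigned long flags;
};

/*
 * A pretend "binary driver" that was built against the uniprocessor layout.
 * The offset of `users` was baked in at compile time, so the driver reads
 * from that offset no matter which layout the running kernel really uses.
 */
unsigned long up_driver_read_users(const void *dev)
{
    unsigned long users;

    memcpy(&users,
           (const unsigned char *)dev + offsetof(struct sim_device_up, users),
           sizeof(users));
    return users;
}
```

Hand up_driver_read_users() a sim_device_smp whose lock word is 0xdead and whose users count is 3, and it comes back with 0xdead: the driver read the lock instead of the counter. Multiply that by every option that changes a structure and you get the pile of binaries described above.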

Ok, the final point I want to address:

Large projects like Zones, DTrace, and Predictive Self Healing could never be integrated into Linux simply because they are too large and touch too many parts of the code. Kernel maintainers have rejected patches simply because of the amount of change (SMF, for example, modified over 1,000 files).

This is just wrong. Look at the LSM kernel change that I helped get into the kernel tree. It touched pretty much every part of the kernel. It touched almost every core kernel structure. And yet, the LSM developers worked with the kernel community, addressed their concerns, and got the changes accepted into the tree. And Linux now benefits from having an extremely flexible security model because of this effort. So to say it is impossible to get such a change into Linux is false.

It's ok that Sun doesn't want to use Linux for their systems, no one is forcing them to, and Solaris is quite nice in places. And when Sun realizes the error of their ways, we'll still be here making Linux into one of the most stable, feature rich, and widely used operating systems in the world.

posted Thu, 23 Sep 2004 in [/diary]