Wow, this is huge. Having an OS that can run both ZFS and KVM enables fantastic things. For example, you can store each virtual machine disk template on a dedicated ZFS filesystem and use "zfs clone" to rapidly deploy VMs (instead of using the KVM-level support for base disk images, "qemu-img create -b ..."), as well as "zfs snapshot", "zfs rollback", etc. The main advantage is that these clones and snapshots are possible while using the simple and fast "raw" KVM disk image format instead of the notoriously slower "qcow2" format that was, until today, the best format supporting base images and snapshots.
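A rough sketch of the clone-per-VM workflow described above. The dataset names (`tank/templates/debian`, `tank/vms/web01`) and the snapshot name are made up for illustration. The code only builds and prints the `zfs snapshot` and `zfs clone` command lines; nothing is executed.

```python
def clone_commands(template, snapshot, vm_dataset):
    """Return the zfs commands that deploy one VM from a template dataset.

    All dataset and snapshot names are hypothetical; adapt to your pool layout.
    """
    snap = f"{template}@{snapshot}"
    return [
        # Freeze the template so every clone has a stable origin.
        ["zfs", "snapshot", snap],
        # A clone is a writable dataset that initially shares all blocks
        # with the snapshot, so it is created almost instantly.
        ["zfs", "clone", snap, vm_dataset],
    ]


if __name__ == "__main__":
    for cmd in clone_commands("tank/templates/debian", "golden", "tank/vms/web01"):
        print(" ".join(cmd))
```

In practice you would snapshot the template once and then run only the clone step for each new VM.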
I, for one, have been wanting to use ZFS specifically like that for a while. This does not compare at all to running ZFS on, say, an NFS server or iSCSI SAN serving data to a server running KVM (ZFS data integrity is only verified remotely on the storage server, it is slower, etc).
Who ported KVM to Illumos? I know some old version of QEMU was running on Solaris at some point in the past, but I had no idea they had a full blown KVM port.
I really like the use of DTrace in #if 0'd out code to identify the highest priorities for porting. Having faced a similar task that seemed impenetrable at the start, I would have really liked to have had this technique available.
Uhm, isn't that a violation of the GPL given that you're mixing the complete work with non-compatible (by design) CDDL code? Did you ask for KVM to be relicensed for the port or something?
My (somewhat casual) understanding is that the Solaris kernel has always, by necessity, supported modules under a different license much more deeply than Linux. Commercially significant software like the Veritas filesystem and device drivers have always worked this way. So, e.g., the Solaris kernel has a real, no-foolin' ABI.
That doesn't mean that linking in a GPL module doesn't make the rest of the kernel GPL. Having a real ABI doesn't necessarily mean the module isn't linked. Allowing commercial modules and linking with GPL ones are not the same thing.
I don't know, though; the definition of linking is pretty opaque to me.
> You can write GPL'ed device drivers for Windows - doesn't make Windows GPL.
The public interface exposed to Windows drivers is _much_ more shallow than the public interface exposed to Linux and Solaris/Illumos modules. Not coincidentally, the Windows driver exports overlap a lot with the interface that Linux allows proprietary drivers to use.
Instead, KVM uses a lot of hooks into the kernel innards (into the scheduler, into the MMU, etc.) that are marked as EXPORT_SYMBOL_GPL in the Linux kernel, and the Illumos port does use some of the same hooks.
I am not a lawyer, so I don't have any clear answer. I also work for Red Hat, so even if I had one I would not want to say it (should not say it). But the licensing question is _not_ an idle question.
Also, the licensing question does not detract anything from the awesome work of these guys.
You do not get to choose whether it matters that the hooks (such as context ops) were preexisting. Perhaps it does, perhaps it doesn't. But it's the same as trying to define what a derived work is, and neither you nor the KVM copyright holders can do that.
In the meantime, just thank whoever said that GPL-incompatibility was an explicit design goal for the CDDL.
The definition of a derived work is actually much more crisp than you are making it out to be -- and it is simple fact (and not opinion) that illumos is not a derived work of our KVM module. Indeed, this issue is so clear-cut and your objections so unfounded that this is beginning to sound like you are deliberately trying to place doubt over the legality of our work. So as long as we're playing Pretend Lawyer, consider your employer and look this one up: tortious interference.
> this is beginning to sound like you are deliberately trying to place doubt over the legality of our work
Oh, come on. Again: if anyone has placed doubt on it, it's whoever chose the CDDL because it was GPL-incompatible. All I'm saying is that the question is not idle. If it were so clear-cut, it would not have popped up in 3 out of 3 places where I read about KVM/Illumos (your blog, LWN, HN).
Could you be more specific? It's there, it works, it's in shipping illumos-based products (from Joyent and others) and it's supported by those who ship it or stand it up. In that regard, it's like any other aspect of the system...
While ZFS adds some advantages, if you were going for snapshots and quick deploys with KVM, your best bet was to use LVM as a raw device. A quick LVM snapshot + virt-clone + virsh start will let me have a cloned VM up within seconds.
That's not to say ZFS doesn't add a lot (a checksum-backed store and a heap of other features). But I would actually expect LVM to be faster, precisely because it is "crappier" (no checksumming, very basic address remapping, etc).
The real question will be support: whether they can convince people that this will have enough life to build a user base large enough to support it moving forward, especially if Oracle stops publishing code under open source licenses.
The problem with LVM snapshots is that you have to manually reserve a portion of the logical volume to store them. And if you calculate your needs incorrectly, LVM runs out of space to preserve them, and they become corrupted (they are reported as "INVALID" in lvdisplay IIRC).
That's why I prefer KVM's built-in base image support (qemu-img create -b ...) to quickly provision VMs, instead of LVM. That's how I architected KVM hosts for my employer, running 100+ lightweight VMs each, with 1000+ disk images available on disk.
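To make the two points above concrete, here is a minimal sketch. The first function is a toy model, not LVM's real accounting: a classic LVM snapshot keeps a copy of each origin chunk the first time it is overwritten, so it stays valid only while the changed data fits in the space reserved for it. The second builds a `qemu-img create -b` command line for a qcow2 overlay on a shared base image. The sizes and image paths are hypothetical, and nothing is executed.

```python
def lvm_snapshot_valid(reserved_mib, changed_mib):
    """Toy model: the snapshot survives while changed data fits the reserve.

    Ignores LVM's own metadata overhead, so a real snapshot fills up
    slightly sooner than this suggests.
    """
    return changed_mib <= reserved_mib


def overlay_command(base_image, overlay_image, base_format="raw"):
    """Build a qemu-img command creating a qcow2 overlay on a base image.

    Paths are hypothetical. -b names the backing file and -F its format.
    """
    return ["qemu-img", "create", "-f", "qcow2",
            "-b", base_image, "-F", base_format, overlay_image]


if __name__ == "__main__":
    # 1 GiB reserved, 2 GiB overwritten on the origin: the snapshot is lost.
    print(lvm_snapshot_valid(reserved_mib=1024, changed_mib=2048))
    print(" ".join(overlay_command("/images/base.raw", "/images/vm01.qcow2")))
```

The overlay approach needs no up-front reservation: the qcow2 file simply grows as the guest writes.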
This is really great. Joyent has been running this internally for years. The combination is just great: ZFS, DTrace and KVM speak for themselves, but another great ingredient is their use of the NetBSD userland (in place of the legacy Solaris one).
Linux is great, but a monoculture benefits no one.
I'm having a difficult time wrapping my head around this. If Joyent wanted to stick with a UNIX then why not FreeBSD and help implement KVM there? Wasn't XEN already available for Dom0? I just don't see much mindshare in Illumos/SmartOS for keeping up with current server hardware drivers. I really would have liked to see one of the existing BSDs benefit from the time they've put into this project.
Obviously Linux already has KVM, and there are new implementations of DTrace and ZFS. Btrfs is not too many more kernel releases away from being considered stable (fsck being a glaring problem), and if you want container-based virtualization you've got LXC (some prefer OpenVZ).
The Linux implementations of ZFS are dog slow, and for the last few weeks I've been evaluating btrfs as a possibility for my home storage server, and I've ruled it out completely.
btrfs has been "close to a stable release" for years now, but if you follow the mailing list, people are still reporting total loss of large filesystems once or twice a week. This week I saw filesystem corruption from an unexpected power outage, and last week there was a data loss/corruption bug caused by a pool of drives with different sector sizes. I desperately want btrfs to be mature, but it's more than a few releases away from being stable.
You're suggesting Joyent should have finished porting DTrace, ZFS, and Zones to FreeBSD, and then port KVM to FreeBSD, instead of just porting KVM to Illumos?
FreeBSD has already had ZFS and DTrace for years. They have Jails which would have benefited from Solaris' stronger Zones/Containers.
There are already a lot of BSD users. Where are the Illumos users? Are you suggesting that all BSD users go to SmartOS? I used HP-UX and AIX for many years... the one-off UNIX systems just end up recreating each other due to fragmentation. They are not the way forward. The smaller mindshare/usage of OpenSolaris/Illumos isn't helping. Wouldn't contributing to a BSD have strengthened the community?
Not at all. The beauty of KVM on SmartOS is that Joyent's customers can run BSD, or Linux, or Windows, or whatever they want without even knowing it's running on top of Illumos. That's why the user base isn't as much an issue. But the underlying foundation is critical. I'm actually really glad BSD is adopting DTrace and I'm sure it will be successful, but the BSD documentation says that it's experimental and not yet production ready (http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/dt...). Solaris DTrace has been in production since at least 2005.
Disclaimer: I work for Joyent, but I do not speak for them.
DTrace and ZFS are both great and lack competitive alternatives on Linux. For years, the horrid state of Solaris' userland made it hard to justify order-of-magnitude support increases but this lets you use the best parts without having to suffer through things like a system-destroying package updater[1] simply to get a reliable filesystem.
1. At a previous employer, we bought Sun compute nodes. Nice hardware, arrived with Solaris 10 preinstalled. Each time we got a new shipment I'd get them racked and see if the updater wouldn't render the system unbootable after the first run. Second reboot was always to the Debian installer.
Well, pretty much the same here. We ran linux on xfires until recently because we loved the hardware, but wouldn't touch solaris with a 10ft pole.
I'm skeptical DTrace and ZFS can justify the risk to invest in a niche platform. Personally I'm holding out for the linux alternatives (btrfs, ceph) to mature.
I agree the foundation is critical. So would you say that the extensive community use of KVM on SmartOS has decided it is production quality compared to use on Linux?
They have had a lot invested in Solaris (let's not argue about Solaris vs. OpenSolaris vs. Illumos) for a few years now. They brought in a lot of ex-Sun engineers, hired open source activists from around OpenSolaris, and probably built their infrastructure around it as well.
Last I looked, Xen was culled from OpenIndiana (then the only functional Illumos-based distro). This was back in the days when we thought Xen was dying, before it was pushed upstream in Linux.
"SmartOS is comprised of the Illumos kernel (with ZFS, DTrace, OS-level virtualization and next-generation KVM) with __BSD package management and a GNU toolchain__."
Oh god, this makes me happy. I gave Solaris a couple of tries, but I've never felt at home there. These two facts mean that I'll try again.
Solaris 11 Express has the GNU toolchain and a modern package management system.
For those that want something like SmartOS but free, and with a modern package management system, I would suggest OpenIndiana. OpenIndiana will be integrating many of the SmartOS changes.
They're really going out of their way to not use Solaris.
SmartOS is comprised of the Illumos kernel (with ZFS, DTrace, OS-level virtualization and next-generation KVM) with BSD package management and a GNU toolchain.
I used Solaris for almost 5 years and when Sun was bought I thought for sure the cool parts of it were doomed. Illumos popped up and I thought that they didn't have a chance, it would wither and die. It's very nice to see that is not the case.
> They're really going out of their way to not use Solaris.
Considering that the Illumos kernel is basically the OpenSolaris kernel, I don't understand what you mean. They are using Solaris, but for some reason they're trying to hide it.
This project is certainly not using Solaris which is now an Oracle product (and trademark) and not open source. It is based on Illumos (which originally came from an OpenSolaris kernel as you said) but my understanding is that OpenSolaris as a project is dead. I don't think it's hiding something to name-check Illumos instead of OpenSolaris.
Illumos != Solaris, unless you'd like Oracle to beat you with a stick for trademark infringement.
I suppose they could refer to it in technical documents as "the OpenSolaris kernel", but by that same logic, an OS using the DragonFly BSD kernel should say it's using "the FreeBSD kernel".
Well since they have a lot of former Sun engineers working for them I don't really think they have to SAY Solaris. As others have mentioned there are trademarks involved here too. Sun/Oracle have been hemorrhaging talent, particularly the former Fishworks team, for a while now and a lot of those people are ending up in places like Joyent.
Who will add support for new hardware to the basically abandoned OpenSolaris kernel? Oracle? No. That small company? No. Leveraging the FreeBSD community to grow the list of supported hardware would be much, much wiser.
This is simply amazing. Tracing KVM using DTrace is going to throw up some pretty useful and surprising results.
And add ZFS on top of that as well, with all its features (esp. dedup + COW) that are useful for virtualized hosting.
Anyone know if linux has any variant/clone of dtrace yet? (and no, not systemtap)
Paul Fox seems to have been working on a Linux DTrace port since 2008 (based on the directory listing). He blogs about his progress on DTrace (and other things) at http://crtags.blogspot.com/
Many, many companies do want this, yes. Back in 2005-2006, I worked at MP3tunes and among other things I helped design the storage system. After testing a bunch of storage solutions (many of them high-end systems), we ended up rolling our own using Linux + MogileFS. This was scaled up to a couple hundred terabytes before I left, and even higher afterwards. Sometimes the off-the-shelf solutions simply don't work, especially at scale.
Pretty much. The number of times we've had issues with new storage arrays from a three-letter vendor that starts with an E and ends with a C is too high to count.
Firmware bugs that affect anything on a fabric, that affect how it distributes its cache slot locks on writes, issues with the entire array acting funny, which are blamed on either the server hardware or os itself until proven to be an issue with the array (this is far too common to be honest, yay for "support" contracts), etc....
Yes, you get "support" (and I use the term lightly) with big vendors, but you also have to take a machete through their support organization to get to someone who can help you with a problem. At times, rolling your own solution will end up being both cheaper and less problematic. The old adage "Nobody ever got fired for choosing IBM" may soothe managers' minds, but wait until you buy those fancy-pants high-end arrays and find out how much of the snake oil turns out not to work on them.
If it can be an order of magnitude cheaper, why not? Enterprise storage has commanded a high premium for a long time, and of course enterprises are storing more data than ever, not all of which needs the same performance characteristics. I welcome the shakeup in this space. The old days where you fretted endlessly over which data to preserve on expensive SANs are fading behind us.
I think in general the premium goes to pay for support contracts and service as much as for the atoms and bits. Of course, it can always be argued whether those contracts and services are worth the premium, but many enterprise customers (Google being the obvious high-profile exception) seem to think so.
I'd also argue that you don't always need gold-plated storage. Think about user docs that are infrequently accessed, numerous, and must be retained for long periods. Deploying NetApp or EMC for that use case doesn't seem to make sense with options like this or OpenStack object storage.
If you mess up your data retention, though, you can get in big legal trouble. Sometimes you have to ask yourself whether you want to be the one who's liable for that.
Can anyone either explain or link to material that explains the relationships between the different projects and distributions that have sprung up since Oracle discontinued OpenSolaris?
After OpenSolaris OS/Net kernel development was closed, 3 forks were made: Illumos, SchilliX-ON and Stormix (sadly, they all seem quite inactive).
There are 3 distributions based on the first one: OpenIndiana, Nexenta and SmartOS. There is also SchilliX (based on SchilliX-ON) and StormOS (based on Stormix).
This is just based on my existing knowledge, so there might be more forks and/or distributions out there.
Is it, really? According to GitHub's commit history, there were 297 commits made since onnv_147, so ~1 commit/day. Prior to the fork, OpenSolaris was getting ~10 commits/day.
To put this into perspective, there were over 8000 commits made to OpenBSD in the same time-frame. I won't even try to compare this with FreeBSD or Linux...
Don't get me wrong, I love the effort, but at the same time I feel like there wasn't any real progress made since the project's inception.
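Working through the numbers quoted above, as a back-of-the-envelope sketch: if 297 commits at roughly 1 commit/day implies a window of about 297 days, then "over 8000" OpenBSD commits in the same window is a lower bound of roughly 27 commits/day. The window length is inferred from the comment's own figures, not measured.

```python
def commits_per_day(commits, days):
    """Average commit rate over a window of the given length."""
    return commits / days


# Figures taken from the comment above; the window is inferred from them.
illumos_commits = 297
illumos_rate = 1.0          # "~1 commit/day"
opensolaris_rate = 10.0     # "~10 commits/day" before the fork
openbsd_commits = 8000      # "over 8000", so this is a lower bound

window_days = illumos_commits / illumos_rate
openbsd_rate = commits_per_day(openbsd_commits, window_days)

if __name__ == "__main__":
    print(f"window: ~{window_days:.0f} days")
    print(f"OpenBSD: at least ~{openbsd_rate:.0f} commits/day")
    print(f"pre-fork OpenSolaris vs Illumos: {opensolaris_rate / illumos_rate:.0f}x")
```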
Uh, no, it is definitely an "actively developed operating system".
As for user visible, actually, yes, changes made to dtrace recently for example.
Sparse zones as another example, and so on.
Remember that Solaris (and its derivatives, unlike Linux) is userland + kernel -- not just the kernel. So changes in userland count in my opinion when considering "actively developed operating system".
> As for user visible, actually, yes, changes made to dtrace recently for example.
> Sparse zones as another example, and so on.
Those changes came from SmartOS/Joyent and were not made by the IllumOS developers.
Don't get me wrong, I really appreciate their work, but I've already lost hope in Illumos/OpenIndiana... Fortunately Joyent stepped in, so I hope that things will speed up a bit now :)
Cool. I see this really helping me when testing my application prior to deployment. Would like to see some more movement with No.de. Have been waiting for a bit to get access.
I'm struggling to understand why ZFS makes enterprise storage obsolete. One of the benefits of the SAN is that it is shared amongst a number of servers.
How does ZFS help with that? Assuming you install SmartOS as a virtualisation host, you'd still need some kind of shared storage.
[Edit:] The reason I point this out is that if you lose a host, you lose all VMs running on that host. And then you won't easily be able to get them back online on another host if you've lost your storage as well.
But you were able to do this before SmartOS... I'm curious about this too, and how Joyent have architected their cloud arrangement. I feel like perhaps hypervisors have local ZFS storage, as opposed to large SAN-backed virtual machines, but I can't really find much useful information on how Joyent/SmartOS implement/architect storage...
Fantastic news. I've been excited about Illumos for a long time, and now we're finally getting a decent distro with commercial backing. My servers will be very happy when they hear about this.
I'm happy about all of Joyent's involvement in open source, but how about also dropping the prices of their accelerators? I've been using the same servers at the same prices for 3 years with zero upgrades.