Eric Raymond explained why he hates proprietary software. The reason lies not with the software itself, but with how it is written:
In that world, the working programmer’s normal experience includes being forced to use broken tools for political reasons, insane specifications, and impossible deadlines. It means living in Dilbert-land, only without the irony. It means sweating blood through the forehead to do sound work only to have it trashed, mangled, or buried by people who couldn’t write a line of code to save their lives.
If you love programming, trying to do work you can be proud of in this situation is heartbreaking. You know you could do better work if they’d just goddamn give you room to breathe. But there’s never time to do it right, and always another brain-dead idea for a feature nobody will ever actually use coming at you from a marketing department who thinks it will look good on the checklist. Long days, long nights and at the end of it all some guy in a suit owns all that work, owns the children of your mind, owns a piece of you.
Raymond concludes with:
I will have no part of helping it do to the young, malleable, innocent programmers of today and tomorrow what was done to me and my peers.
Because two decades later, my scars still ache.
Unfortunately, based on my own experience, I must admit that he is right on the mark. Writing software in a corporate environment is often painful: artificial deadlines, artificial constraints, artificial priorities, artificial pressure. But it has gotten a lot worse in the past decade, at least in the company I work for.
Why it’s worse today than Raymond remembers
In today’s cost-saving business environment, things go beyond what Raymond identified from an experience that is now over two decades old. These days, the tools are not just broken for political reasons; more and more, they are broken for cost reasons.
My team and I work at several locations, in the US and in Europe. Three of these locations host our servers. At these three locations, in the past month alone, we had no fewer than six electrical outages (!), five of which were unplanned, and no location was immune. You’d think that having backup power is really basic. But in one case, the backup generator itself was so old and so broken that it prevented power from being restored when the grid came back up! We also lost networking or primary infrastructure tools (e.g. bug tracking system or e-mail) on at least five occasions. These are not accessories; these are all critical tools that directly impact our work. If electricity or network or mail is down, there isn’t much you can do to develop software.
Meanwhile, the release schedule itself tightens: for us, it’s now two releases a year. The team shrinks despite the increasing amount of work: we lost key contributors and we don’t even try to replace them. Servers break down because they are too old: my most recent machine was obsoleted in 2004, and it’s a leftover from some other team. Travel is restricted and inconvenient: I once had to wait 12 hours in New York on my way back from California, which practically doubled the length of an already exhausting trip; the corporate travel booking system chose that flight to save $28, which was not very smart when you know how much two meals in New York cost…
And to top it off, we have all the small daily frustrations. Coffee machines are not repaired, or worse yet are removed, with explanations that a colleague called “soviet style communication”, like “this coffee pot is not being used” (talk to the guy who’s been filling it three to five times a day). Work PCs are replaced with the lowest-end model, and are already obsolete when you get them: you give employees what customers wouldn’t buy anymore. Signs are posted in the restrooms asking us to be “green” and limit water usage, but the nearby toilets have been leaking for months, not to mention that they are clogged too. Raises are harder and harder to get, bonuses are recomputed with new formulas that make them smaller each year…
This is why I wrote recently to some of my colleagues that “developing software for this company is like being asked to win a marathon while wearing ski boots and carrying three people”… which my colleagues apparently did not think was even strong enough, since the first response was “… on one leg”.
Incompetent bosses were replaced by powerless bosses
Raymond talks about incompetent bosses, and this is still the picture given by Dilbert. I was lucky enough that I did not have that many incompetent bosses. Sure, they all made a mistake here or there, but who doesn’t? On the other hand, my bosses appeared to be less and less empowered, and more and more trapped in a system that dictates what they can and cannot do.
For example, “standard” applications and processes have been made mandatory in the name of cost savings. It has become more and more difficult to not be punished for maintaining local applications that do what you need, something now called “shadow IT”. And too bad if the standard, centralized applications lack the capacity or features, if they don’t scale, if they are hosted on servers that are too small, if there is practically no redundancy. In the name of cost savings, you accept that there will be several outages per month.
Again, what is really frightening these days is that you can talk to your manager about that problem, and he will talk to his manager, and so on, yet nothing will ever change. You never get any feedback like “we heard you, we will make this and that change”. Instead, what you get is a top-down self-congratulatory message explaining that our IT is now so good we could practically run another company with it! In short, whatever you ask your boss, chances are he can’t do a thing about it, and chances are you won’t get it.
How can it be bad to reduce costs? Running a company is, after all, all about competing: competing for customers, competing for best costs, competing for highest revenues, competing for employees. So saving costs seems like a good way to get a competitive advantage.
But the key asset of a company is its employees. Everything else is really just support, tools to do the job. If you consider employees as an asset, you ultimately win, because employees work together to deliver great products. If you start considering them as a liability, as a cost center, as something that you need to eliminate rather than as something you want to optimize, you might get some short-term gains out of it. But I believe that in the long run, you can only lose. The only reason this strategy is so popular today is because most CxOs get more benefit from boosting short-term profits than they could from building a sustainable business model. It is more profitable to “cash out” on the accumulated assets of an existing big company than to solidify these assets for the future.
This is why right now, what I just described is too often the norm in the good old corporations, but not in companies still driven by their original founders. The founders of a company tend to have the same pride for their creation as Raymond has for his software. It is not a matter of scale: even large companies like Google still get it. Some corporations used to get it until their founders were replaced. But I think that we have enough evidence to know that companies can deliver a lot of good products and shareholder value while treating their employees really well.
Open-source vs. Corporate?
There is still one point where I differ slightly from Raymond’s point of view. I am not sure that there is anything mandatory about corporations crushing software developers. There is enough difference between one corporation and the next, enough difference even over the lifetime of a single corporation to believe that treating employees well has little relationship with how you distribute software.
On the contrary, I think that corporations who nurture their employees can provide the best possible environment to develop software, including FOSS. They can provide money, resources, financial safety that makes it easier to concentrate on the work, a sense of purpose or direction (like Mark Shuttleworth is trying to do with Canonical and Ubuntu). Linus Torvalds, the icon of free software developers, has worked for various corporations. An alternative is foundations. In all cases, you can only work on free software if you have enough money to buy a computer.
Where Raymond is right is that open-source software gives developers a whole lot more freedom and control over what will happen with the software. All the work that I put in a product once called “TS/5470 – ECUTEST”, a world-class real-time measurement and control software, was lost when Agilent (HP at the time) decided to shut it down. Nowadays, you can barely find this on Google. It’s too bad, it was really useful. Before even being released, it found bugs in every single piece of car electronics we tested with it, including production ones that had been running car engines for years. Even today, I think there is still nothing like it. But as far as I know, it’s lost.
Where FOSS falls short
Still, FOSS is not the ultimate solution. It is generally very good at replicating infrastructure and commodity software, where cost becomes marginal. It is not so good at innovating. I can’t think of any FOSS innovation similar in scope and impact to the iPhone, Google or Mosaic (which was a proprietary program, even if the source was available).
Even Linux, the poster child of FOSS software, a very innovative platform these days, started as a copycat of a proprietary product (Unix), and started making real progress only with the help of corporations. And as I wrote several years ago, the OS itself is a commodity:
The OS itself will probably fade into the background where it belongs. You don’t care much about the OS of a Palm Pilot or a network appliance or an ATM, and you shouldn’t. The OS would probably have disappeared from the public consciousness five years ago, weren’t it for Microsoft’s insistence on making it its main source of profit
This is exactly where Linux stands today: it is most successful in applications where you don’t see it (e.g. cell phones or appliances.)
What I’d like to see happen is genuine open-source innovation. But I’m afraid this cannot happen, because real innovation requires a lot of money, and corporations remain the best way to fund such innovation, in general with high hopes to make even more money in return.
We all need to eat
This is personal experience too. In the past year, I have been contacted by three companies to develop open-source software. But it was always to work on their own agenda. None of them offered to let me work on my personal open-source project. And that’s the real problem. If, as Raymond points out, the pride you may have about the children of your mind matters that much (and clearly, it does matter to me), do I really win by leaving a product I invented for one I did not invent, even if it’s an open-source one?
Eric Raymond may have a second income that allows him to do what he wants. I personally don’t have this luxury. I would like nothing better than to work full time on XL, on creating the most advanced programming language there is, but this is not going to happen. Unless someone, maybe me, suddenly realizes how much money they could make out of it, and decides to fund a company or to add this to an existing company’s projects.
My point is that, in the end, corporations fund innovation based on their own objectives. And in the end, we all need to eat, we all need someone to pay us. It’s not that different in the open-source world, except maybe for a few lucky stars that are about as representative of the open-source community as Bill Gates is representative of the corporate programmer.
What is scalability?
Simply put, scalability is the ability to take advantage of having more CPUs, more memory, more disk, more bandwidth. If I put two CPUs to the task, do I get twice the power? Generally not. As Fred Brooks said, no matter how many women you put to the task, it still takes nine months to make a baby. On the other hand, with nine women, you can make nine babies in nine months. In computer terminology, we would say that making babies is a task where throughput (the total amount of work per time unit) scales well with the number of baby-carrying women, whereas latency (the time it takes to complete a request) does not.
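To make the throughput versus latency distinction concrete, here is a minimal Python sketch. It assumes independent, indivisible jobs of equal duration handed out in batches; the function names are made up for this illustration.

```python
import math

def completion_time(jobs: int, workers: int, job_duration: float) -> float:
    """Wall-clock time to finish `jobs` independent, indivisible jobs.

    Assumption for this sketch: every job takes `job_duration` and a
    worker handles one job at a time, so the jobs run in batches.
    """
    batches = math.ceil(jobs / workers)
    return batches * job_duration

def throughput(jobs: int, workers: int, job_duration: float) -> float:
    """Jobs completed per unit of time."""
    return jobs / completion_time(jobs, workers, job_duration)

NINE_MONTHS = 9.0
for workers in (1, 3, 9):
    # Latency of a single job never improves: it is always job_duration.
    latency = completion_time(1, workers, NINE_MONTHS)
    rate = throughput(9, workers, NINE_MONTHS)
    print(f"{workers} worker(s): latency {latency} months, "
          f"throughput {rate:.3f} jobs/month")
```

Adding workers raises the number of jobs finished per month, but the time to finish any one job stays at nine months.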
Computer scalability is very similar. Different applications will care about throughput or latency in different ways. For example, if you connect to Google Maps, latency is the time it takes for Google to show the map; in that case (unlike for pregnant women), latency presumably does improve with parallelism, because Google Maps sends many small chunks of the map in parallel.
I have already written in an earlier post why I believe HP has a good track record with respect to partitioning and scalability.
Scalability of virtual machines
However, IBM has very harsh words against HP Integrity Virtual Machines (aka HPVM), and describes HPVM scalability as a “downside” of the product:
The downside here is scalability. With HP’s virtual machines, there is a 4 CPU limitation and RAM limitation of 64GB. Reboots are also required to add processors or memory. There is no support for features such as uncapped partitions or shared processor pools. Finally, it’s important to note that HP PA RISC servers are not supported; only Integrity servers are supported. Virtual storage adapters also cannot be moved, unless the virtual machines are shut down. You also cannot dedicate processing resources to single partitions.
I already pointed out factual errors in every sentence of this paragraph. But scalability is a more subtle problem, and it takes more explaining just to describe what the problems are, not to mention possible solutions… What matters is not just the performance of a single virtual machine when nothing else is running on the system. You also care about performance under competition, about fairness and balance between workloads, about response time to changes in demand.
The problem is that these are all contradictory goals. You cannot increase the performance of one virtual machine without taking something away from the others. Obviously, the CPU time that you give to one VM cannot be given to another one at the same time. Similarly, increasing the reactivity to fast-changing workloads also increases the risk of instability, as for any feedback loop. Finally, in a server, there is generally no privileged workload, which makes the selection of the “correct” answers harder to make than for workstation virtualization products.
Checkmark features vs. usefulness
Delivering good VM performance is a complex problem. It is not just a matter of lining up virtual CPUs. HPVM implements various safeguards to help ensure that a VM configuration will not just run, but run well. I don’t have as much experience with IBM micro-partitions, but it seems much easier to create configurations that are inefficient by construction. What IBM calls a “downside” of HPVM is, I believe, a feature.
Here is a very simple example. On a system with, say, 4 physical CPUs, HPVM will warn you if you try to configure more than 4 virtual CPUs:
bash-2.05b# hpvmmodify -P vm7 -c 8
HPVM guest vm7 configuration problems:
Warning 1: The number of guest VCPUS exceeds server's physical cpus.
Warning 2: Insufficient cpu resource for guest.
These problems may prevent HPVM guest vm7 from starting.
hpvmmodify: The modification process is continuing.
It seems like a sensible thing to do. After all, if you only have 4 physical CPUs, you will not get more power by adding more virtual CPUs. There are, however, good chances that you will get less performance, in any case where one virtual CPU waits on another. Why? Because you increased the chances that the virtual CPU you are waiting on is not actually running at the time you request its help, independently of the synchronization mechanism that you are using. So instead of getting a response in a matter of microseconds (the typical wait time for, say, spinlocks), you will get it in a matter of milliseconds (the typical time slice on most modern systems).
Now, the virtual machine monitor might be able to do something smart about some of the synchronization mechanisms (notably kernel-level ones). But there are just too many ways to synchronize threads in user-space. In other words, by configuring more virtual CPUs than physical CPUs, you simply increased the chances of performing sub-optimally. How is that a good idea?
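A crude back-of-the-envelope model shows the order of magnitude involved. This is only a sketch: it assumes that all virtual CPUs are runnable and share the physical CPUs evenly, and the 5 microsecond spin time and 10 millisecond time slice are placeholder values chosen for illustration, not measurements from any hypervisor.

```python
def expected_wait_us(virtual_cpus: int,
                     physical_cpus: int,
                     spin_us: float = 5.0,           # assumed spinlock wait
                     timeslice_us: float = 10_000.0  # assumed time slice
                     ) -> float:
    """Expected wait when one virtual CPU needs an answer from another.

    Toy model: the other virtual CPU is running with probability
    physical/virtual (capped at 1). If it is running, we wait the spin
    time; otherwise we wait half a time slice on average.
    """
    p_running = min(1.0, physical_cpus / virtual_cpus)
    descheduled_wait = timeslice_us / 2
    return p_running * spin_us + (1.0 - p_running) * descheduled_wait

for vcpus in (4, 8, 16):
    wait = expected_wait_us(vcpus, 4)
    print(f"{vcpus} virtual CPUs on 4 physical: ~{wait:.1f} us")
```

Even in this simplified model, the rare millisecond-scale waits dominate the average as soon as there are more virtual CPUs than physical ones.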
IBM doesn’t seem to agree with me. First, in their article, they complain about HP vPars implementing a similar restriction: “The scalability is also restricted to the nPartition that the vPar is created on.” Also, the IBM user interface lets you create micro-partitions that have too many virtual CPUs with nary a peep. You can create a micro-partition with 16 virtual CPUs on a 2-way host, as illustrated below. Actually, 16 virtual CPUs is almost the maximum on a two-way host for another reason: there is a minimum of 0.1 physical CPU per virtual CPU in the IBM technology, and 16 * 0.1 is 1.6, which only leaves a meager 0.4 CPU for the virtual I/O server.
The problem is that no matter how I look at it, I can’t imagine how it would be a good thing to run 16 virtual CPUs on a 2-CPU system. To me, this approach sounds a lot like the Fantasia school of scalability. If you remember, in that movie, Mickey Mouse plays a sorcerer’s apprentice who casts a spell so that his broom will do his chores in his stead. But things rapidly get wildly out of control. When Mickey tries to ax the brooms to stop the whole thing, each fragment rapidly grows back into a full-grown broom, and things go from bad to worse. CPUs, unfortunately, are not magic brooms: cutting a CPU in half will not magically make two full-size CPUs.
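The arithmetic behind that limit is easy to check. The sketch below uses exact fractions to avoid floating-point rounding; the function name is invented for this example, and the only figure taken from the text is the 0.1 physical CPU minimum per virtual CPU.

```python
from fractions import Fraction

def remaining_capacity(physical_cpus: int,
                       virtual_cpus: int,
                       min_per_vcpu: Fraction = Fraction(1, 10)) -> Fraction:
    """Physical CPU capacity left once every virtual CPU has its minimum."""
    committed = virtual_cpus * min_per_vcpu
    if committed > physical_cpus:
        raise ValueError("minimum entitlements exceed the physical CPUs")
    return physical_cpus - committed

left = remaining_capacity(physical_cpus=2, virtual_cpus=16)
print(f"16 virtual CPUs on a 2-way host leave {float(left)} CPU")
```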
Performing well in degraded configurations
Now, I don’t want you to believe that I went all defensive because IBM found a clever way to do something that HPVM can’t do. Actually, even if HPVM warns you by default, you can still force it to start a guest in such a “stupid” configuration, using the -F switch of hpvmstart. And it’s not like HPVM systematically performs really badly in this case either.
For example, below are the build times for a Linux 2.6.25 kernel in a variety of configurations.
4-way guest running on a 4-way host, 5 jobs
[linux-18.104.22.168]# time gmake -j5
real 5m25.544s
user 18m46.979s
sys 1m41.009s

8-way guest running on a 4-way host, 9 jobs
[linux-22.214.171.124]# time gmake -j9
real 5m38.680s
user 36m23.662s
sys 3m52.764s

8-way guest running on a 4-way host, 5 jobs
[linux-126.96.36.199]# time gmake -j5
real 5m35.500s
user 22m25.501s
sys 2m6.003s
As you can verify, the build time is almost exactly the same, whether the guest has 4 or 8 virtual CPUs. As expected, the additional virtual CPUs do not bring any benefit. In that case, the degradation exists, but it is minor. It is however relatively easy to build cases where the degradation would be much larger. Another observation is that running only enough jobs to keep 4 virtual CPUs busy actually improves performance: the virtual CPUs spend less time waiting on one another.
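To put numbers on "almost exactly the same", the following Python sketch converts the figures reported above into seconds and compares each run to the 4-way guest. The labels and the helper name are mine; the timings are copied from the build output.

```python
import re

def to_seconds(stamp: str) -> float:
    """Convert a `time`-style value such as '5m25.544s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError(f"unexpected time format: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

# (real, user) times copied from the three builds above
runs = {
    "4-way guest, -j5": ("5m25.544s", "18m46.979s"),
    "8-way guest, -j9": ("5m38.680s", "36m23.662s"),
    "8-way guest, -j5": ("5m35.500s", "22m25.501s"),
}

base_real, base_user = (to_seconds(t) for t in runs["4-way guest, -j5"])
for label, (real, user) in runs.items():
    r, u = to_seconds(real), to_seconds(user)
    print(f"{label}: real {r:.1f}s ({r / base_real - 1:+.1%}), "
          f"user {u:.1f}s ({u / base_user:.2f}x)")
```

The elapsed time of the 8-way runs stays within a few percent of the 4-way run, while the user CPU time with 9 jobs roughly doubles: the extra virtual CPUs mostly burn cycles.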
So, why do we even test such configurations or allow them to run, then? Well, there is always the possibility that a CPU goes bad, in which case the host operating system is most likely to disable it. When that happens, we may end up running with an insufficient number of CPUs. Even so, this is no reason to kill the guest. We still want to perform as well as we can, until the failed CPU is replaced with a good one.
In short, I think that HPVM is doing the right thing by telling you if you are about to do something that will not be efficient. However, in case you found yourself in that situation due to some unplanned event, such as a hardware failure, it still does the hard work to keep you up and running with the best possible performance.
Remaining balanced and fair
There is another important point to consider regarding the performance of virtual machines. You don’t want virtual machines to just perform well, you also care a lot about maintaining balance between the various workloads, both inside the virtual machine itself, and between virtual machines. This is actually very relevant to scalability, because multi-threaded or multi-processor software often scales worse when some CPUs run markedly slower than others.
Consider for example that you have 4 CPUs, and divide a task into four approximately equal-sized chunks. The task will only complete when all 4 sub-tasks are done. If one CPU is significantly slower, all other CPUs will have to wait for it. In some cases, such as ray-tracing, it may be easy enough for another CPU to pick up some of the extra work. For other more complicated algorithms, however, the cost of partitioning may be significant, and it may not pay off to re-partition the task in flight. And even when re-partitioning on the fly is possible, software is unlikely to have implemented it if it did not bring any benefit on non-virtual hardware.
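The effect of one slow CPU can be sketched in a few lines of Python. The work units and speeds are arbitrary illustration values, and `rebalanced_time` assumes a perfectly divisible task with free re-partitioning, which is the best case described above.

```python
def makespan(chunks, speeds):
    """Time until the last chunk finishes with a fixed chunk-to-CPU assignment."""
    return max(work / speed for work, speed in zip(chunks, speeds))

def rebalanced_time(chunks, speeds):
    """Lower bound if work could be re-partitioned freely and at no cost."""
    return sum(chunks) / sum(speeds)

chunks = [100, 100, 100, 100]      # four equal-sized sub-tasks
even = [1.0, 1.0, 1.0, 1.0]        # all CPUs at full speed
one_slow = [1.0, 1.0, 1.0, 0.5]    # one CPU runs at half speed

print("all CPUs equal:         ", makespan(chunks, even))
print("one CPU at half speed:  ", makespan(chunks, one_slow))
print("with ideal rebalancing: ", round(rebalanced_time(chunks, one_slow), 1))
```

With a static partition, a single CPU at half speed doubles the completion time, even though only one eighth of the total CPU capacity was lost.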
Loading virtual machines little by little…
In order to get a better feeling for all these points, readers are invited to do the following very simple experiment with their favorite virtual machine technology. To maximize chances that you can run the experiment yourself, I will not base my experiment on some 128-way machine with 1TB of memory running 200 16-way virtual machines or anything über-expensive like that. Instead, I will consider the simplest of all cases involving SMP guests: two virtual machines VM1 and VM2, each with two virtual CPUs, running concurrently on a 2-CPU machine. What could possibly go wrong with that? Nowadays, this is something you can try on most laptops…
The experiment is equally simple. We will simply incrementally load the virtual machines with more and more jobs, and see what happens. When I ran the experiment, I used a simple CPU spinner program written in C that counts how many loops per second it can perform. The baseline, which I will refer to as “100%”, is the number of iterations that the program makes on a virtual machine, with another virtual machine sitting idle. This is illustrated below, with the process Process 1 running in VM1, colored in orange.
| CPU 1 | CPU 2 |
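For readers who want to reproduce the experiment, here is a minimal Python sketch of such a spinner. It is not the C program I used; absolute numbers will differ from those of a C version, but only the ratio to the baseline matters. The function names are chosen for this example.

```python
import time

def loops_per_second(duration: float = 1.0) -> float:
    """Spin for about `duration` seconds and return iterations per second."""
    start = time.perf_counter()
    deadline = start + duration
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed

def score(measured: float, baseline: float) -> float:
    """Express a measurement as a percentage of the baseline."""
    return 100.0 * measured / baseline

if __name__ == "__main__":
    # Take the baseline while the other virtual machine is idle,
    # then run the measurement again under load and compare.
    baseline = loops_per_second()
    measured = loops_per_second()
    print(f"score: {score(measured, baseline):.0f}%")
```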
Now, let’s say that you start another identical process in VM2. The ideal situation is that one virtual CPU for each virtual machine gets loaded at 100%, so that each process gets a 100% score. In other words, each physical CPU is dedicated to running a virtual CPU, but the virtual CPUs belong to different virtual machines. The sum of the scores is 200%, which is the maximum you can get on the machine, and the average is 100%. This is both optimal and fair. As far as I can tell, both HPVM and IBM micro-partitions implement this behavior. This is illustrated below, with VM1 in orange and VM2 in green.
| CPU 1 | CPU 2 |
| Process 1 | Process 2 |
However, this behavior is not the only choice. Versions of VMware up to version 3 used a mechanism called co-scheduling, where all virtual CPUs must run together. As the document linked above shows, VMware was boasting about that technique, but the result was that as soon as one virtual CPU was busy, the other physical CPU had to be reserved as well. As a result, in our little experiment, each process would get 50% of its baseline, not 100%. This approach is fair, but hardly optimal since you waste half of the available CPU power. VMware apparently chose that approach to avoid dealing with the more complicated cases where one virtual CPU would wait for another virtual CPU that was not running at the time.
| CPU 1 | CPU 2 |
Now, let’s fire a second process in VM1. This is where things get interesting. In that situation, VM1 has both its virtual CPUs busy, but VM2 has only one virtual CPU busy, the other being idle. There are many choices here. One is to schedule the two CPUs of VM1, then the two CPUs of VM2 (even if one is idle). This method is fair between the two virtual machines, but it reserves a physical CPU for an idle virtual CPU half of the time. As a result, all processes will get a score of 50%. This is fair, but suboptimal, since you get a total score of 150% when you could get 200%.
| CPU 1 | CPU 2 |
| Process 1 | Process 3 |
In order to optimize things, you have to take advantage of that ‘idle’ spot, but that creates imbalance. For example, you may want to allocate CPU resources as follows:
| CPU 1 | CPU 2 |
| Process 1 | Process 2 |
| Process 1 | Process 3 |
This scenario is optimal, since the total CPU bandwidth available is 200%, but it is not fair: process 1 now gets twice as much CPU bandwidth as processes 2 and 3. In the worst case, the guest operating system may end up being confused by what is going on. So one solution is to balance things out over longer periods of time:
| CPU 1 | CPU 2 |
| Process 1 | Process 2 |
| Process 1 | Process 3 |
| Process 3 | Process 2 |
This solution is both optimal and fair: processes 1, 2 and 3 each get about 66% of a CPU, for a total of 200%. But other important performance considerations come into play. One is that we can no longer keep each process on a single CPU. Keeping processes bound to a given CPU improves cache and TLB usage. But in this example, at least one of the processes will have to jump from one CPU to the other, even if the guest operating system thinks that it’s bound to a single CPU.
Another big downside as far as scalability is concerned is with respect to inter-process communication. If processes 1 and 3 want to talk to one another in VM1, they can do so without waiting only half of the time, since during the other half, the other CPU is actually running a process that belongs to another virtual machine. A consequence is that the latency of this inter-process communication increases very significantly. As far as I can tell, this particular problem is the primary issue with the scalability of virtual machines. VMware tried to address it with co-scheduling, but we have seen why it is not a perfect solution. Statistically speaking, adding virtual CPUs increases the chances that the CPU you need will not be running, in particular when other virtual machines are competing for CPU time.
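The three scheduling choices discussed above can be compared by counting time slots. In this Python sketch, each schedule is a list of time slots and each slot names what runs on CPU 1 and CPU 2. The schedule names are mine, and the idle slot in the first schedule is inferred from the description (VM2 has only one busy virtual CPU).

```python
from collections import Counter
from fractions import Fraction

def cpu_shares(schedule):
    """Fraction of one CPU that each process receives.

    `schedule` is a list of time slots; each slot is a tuple giving the
    process running on each physical CPU, or None when that CPU idles.
    """
    slots = len(schedule)
    counts = Counter(p for slot in schedule for p in slot if p is not None)
    return {process: Fraction(n, slots) for process, n in counts.items()}

# Processes 1 and 3 live in VM1, process 2 lives in VM2.
schedules = {
    "whole VM at a time": [("P1", "P3"), ("P2", None)],
    "fill the idle spot": [("P1", "P2"), ("P1", "P3")],
    "balanced over time": [("P1", "P2"), ("P1", "P3"), ("P3", "P2")],
}

for name, schedule in schedules.items():
    shares = cpu_shares(schedule)
    total = sum(shares.values())
    detail = ", ".join(f"{p}={float(s):.0%}" for p, s in sorted(shares.items()))
    print(f"{name}: {detail} (total {float(total):.0%})")
```

Only the last schedule reaches the 200% total while giving every process the same share, and it is also the one that forces a process to migrate between CPUs.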
Actual scalability vs. simple metrics
This class of problems is the primary reason why HPVM limits the size of virtual machines. It is not that it doesn’t work. There are even workloads that scale really well, Linux kernel builds or ray-tracing being good examples. But even workloads that scale OK with a single virtual machine will no longer scale as well under competition. Again, virtual machine scalability is nowhere near as simple as “add more virtual CPUs and it works”.
This is not just theory. We have tested the theory. Consider the graph below, which shows the results of the same benchmark run in a variety of configurations. The top blue line, which is almost straight, is perfect scalability, which you practically get on native hardware. The red line is HPVM under light competition, i.e. with other virtual machines running but mostly idle. In that case, HPVM scales well up to 16-way. The blue line below is HPVM under heavy competition. If memory serves me right, the purple line is fully-virtualized Xen… under no competition.
In other words, if HPVM supports 8 virtual CPUs today, it is because we believe that we can maintain useful scalability on demanding workloads and even under competition. We know, having tested and measured it, that going beyond 8-way will usually not result in increased performance, only in increased CPU churn.
One picture is worth 2^10 words
As we have shown, making the right decisions for virtual machines is not simple. Interestingly, even the very simple experiment I just described highlights important differences between various virtual machine technologies. After launching 10 processes on each guest, here is the performance of the various processes when running under HPVM. In that case, the guest is a Linux RedHat 4 server competing against an HP-UX partition running the same kind of workload. You can see the Linux scheduler granting time to all processes almost exactly fairly, although there is some noise. I suspect that this noise is the result of the feedback loop that Linux puts in place to ensure fairness between processes.
By contrast, here is how AIX 6.1 performs when running the same workload. As far as I can tell, IBM implements what looks like a much simpler algorithm, probably without any feedback loop. It’s possible that there is an option to enable fair share scheduling on AIX (I am much less familiar with that system, obviously). The clear benefit is that it is very stable over time. The downside is that it seems quite a bit unfair compared to what we see in Linux. Some processes are getting a lot more CPU time than others, and this over an extended period of time (the graph spans about 5 minutes).
The result shown in the graphs is actually a combination of the effect of the operating system and virtual machine schedulers. In the case of IBM servers, I must say that I’m not entirely sure about how the partition and process schedulers interact. I’m not even sure that they are distinct: partitions seem to behave much like processes in AIX. In the case of HPVM, you can see the effect of the host HP-UX scheduler on the total amount allocated to the virtual machine.
Naturally, this is a very simple test, not even a realistic benchmark. It is not intended to demonstrate the superiority of one approach over the other. Instead, it demonstrates that virtual machine scalability and performance is not as simple as counting how many CPUs your virtual machine software can allocate. There are large numbers of complicated trade-offs, and what works for one class of workloads might not work so well for others.
I would not be surprised if someone shows benchmarks where IBM scales better than HPVM. Actually, it should be expected: after all, micro-partitions are almost built into the processor to start with; the operating system has been rewritten to delegate a number of tasks to the firmware; AIX really runs paravirtualized. HPVM, on the other hand, is full virtualization, which implies higher virtualization costs. It doesn’t get any help from Linux or Windows, and only very limited help from HP-UX. So if anything, I expect IBM micro-partitions to scale better than HPVM.
Yet I must say that my experience has not confirmed that expectation. In the few tests I made, differences, if any, were in HPVM’s favor. Therefore, my recommendation is to benchmark your application and see which solution performs best for you. Don’t let others tell you what the result should be, not even me…
IBM just posted a comparison of virtual machines that I find annoyingly flawed. Fortunately, discussion on OSnews showed that technical people don’t buy this kind of “data”. Still, I thought that there was some value in pointing out some of the problems with IBM’s paper.
nPartitions are tougher than logical partitions
What HP calls ‘nPartitions’ are electrically isolated partitions. IBM correctly indicates that nPartitions offers true electrical isolation. The key benefit, using IBM’s own words, is that nPartitions allow you to service one partition while others are online. You can for example extract a cell, replace or upgrade CPUs or memory, and re-insert the cell without downtime.
The problem is when IBM states that this is similar to IBM logical partitioning. In reality, electrically-isolated partitions are tougher. Like on IBM systems, CPUs can be replaced in case of failure. But you can replace entire cells without shutting down the system, contrary to IBM’s claim that systems require a reboot when moving cells from one partition to another.
Finally, IBM remarks that “Another downside is that entry-level servers do not support this technology, only HP9000 and Integrity High End and Midrange servers.” Highly redundant, electrically isolated partitions have a cost, so they don’t make so much sense on entry level systems. But it’s not a downside of HP that they only have partitioning technologies similar to IBM’s on entry level systems, it’s a downside of IBM not to have anything similar to nPartitions on their high-end servers.
Integrity servers support more operating systems
IBM writes: “It’s important to note that while nPartitions support HP-UX, Windows®, VMS, and Linux, they only do so on the Itanium processor, not on the HP9000 PA Risc architecture.” It is true that HP (Microsoft, really) doesn’t support Windows on PA-RISC, a long-obsolete architecture, but what is more relevant is that IBM doesn’t support Windows on POWER even today. This is the same kind of spin as for electrically-isolated partitions: IBM tries to divert attention from a missing feature in their product line by pointing out that there was a time when HP didn’t have the feature either.
IBM writes that “Partition scalability also depends on the operating system running in the nPartition.” Again, this is true… on any platform. But the obvious innuendo here is that the scalability of Integrity servers is inconsistent across operating systems. A quick visit to the TPC-H and TPC-C result pages will demonstrate that this is false. HP posts world-class results with HP-UX and Windows, and other Itanium vendors demonstrate the scalability of Linux on Itanium.
By contrast, the Power systems from IBM or Bull all run AIX. And if you want to talk about scalability, IBM as of today has no 30TB TPC-H results, and it often takes several IBM machines to compete with a single HP system.
Regarding scalability of other operating systems, HP engineers have put a lot of effort early into Linux scalability. I would go as far as saying that 64-way scalability of Linux is probably something that HP engineers tested seriously before anybody else, and they got some recognition for this work.
HP offers both isolation and flexibility
IBM also tries to minimize the flexibility of hard partitions. If this is electrically isolated, it can’t be flexible, right? That’s exactly what IBM wants you to believe: “They also do not support moving resources to and from other partitions without a reboot.”
However, HP has found a clever way to provide both flexibility and electrical isolation. With HP’s Instant Capacity (iCap), you can move “right to use” CPUs around. On such a high-end machine, it is a good idea to have a few spare CPUs, either in case of CPU failure or in case of a sudden peak in demand. HP recognized that, so you can buy machines with CPUs installed, but disabled. You pay less for these CPUs, you do not pay software licenses, and so on.
The neat trick that iCap enables is to transfer a CPU from one nPartition to another without breaking electrical isolation: you simply shut it down in one partition, which gives you the right to use it in another partition without paying anything for it. All this can be done without any downtime, and again, it preserves electrical isolation between the partitions.
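To make the bookkeeping concrete, here is a minimal sketch of the idea in Python. It is a toy model, not HP's tooling: the `Partition` class, the `transfer_right_to_use` function and the partition names are all hypothetical, invented for illustration. The point it shows is that no hardware crosses a partition boundary; only the count of active cores on each side changes, and the total number of usage rights consumed stays constant.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    """Hypothetical model of one nPartition: installed cores vs. active cores."""
    name: str
    installed: int  # cores physically present in this partition
    active: int     # cores currently enabled, each consuming one usage right


def transfer_right_to_use(src: Partition, dst: Partition) -> None:
    """Deactivate one core in src and activate an installed, idle core in dst.

    Nothing physical moves between partitions. Only the active counts change,
    so the sum of active cores (the usage rights consumed) is unchanged.
    """
    if src.active <= 1:
        raise ValueError(f"{src.name} cannot give up its last active core")
    if dst.active >= dst.installed:
        raise ValueError(f"{dst.name} has no inactive core left to activate")
    src.active -= 1
    dst.active += 1


# Example: two partitions with spare (installed but inactive) cores.
batch = Partition("batch", installed=8, active=6)
web = Partition("web", installed=8, active=4)

rights_before = batch.active + web.active
transfer_right_to_use(batch, web)
rights_after = batch.active + web.active
```

After the call, `batch` runs on 5 cores and `web` on 5, and `rights_before` equals `rights_after`. The destination needs an installed but inactive core for this to work, which is why buying machines with spare disabled CPUs matters.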
Shared vs. dedicated
Regarding virtual partitions, IBM is quick to point out the “drawbacks”: “What you can’t do with vPars is share resources, because there is no virtualized layer in which to manage the interface between the hardware and the operating systems.” Of course, you cannot do that with dedicated resources on IBM systems either: that’s the point of having a partitioning system with dedicated resources! If you want shared resources, HP offers virtual machines, see below. So when IBM claims that “This is one reason why performance overhead is limited, a feature that HP will market without discussing its clear limitations”, the same also holds for dedicated resources on IBM systems.
What is true is that IBM has attempted to deliver in a single product what HP offers in two different products, virtual partitions (vPars) and virtual machines (HPVM). The apparent benefit is that you can configure systems that have a mix of dedicated and shared resources. For example, you can have in the same partition some disks directly attached to a partition like they would be in HP’s vPars, and other disks served through the VIO in a way similar to HPVM. In reality, though, the difference between the way IBM partitions and HPVM virtual machines work is not that big, and if anything, is going to diminish over time.
A more puzzling statement is that “The scalability is also restricted to the nPartition that the vPar is created on.” Is IBM really presenting as a limitation the fact that you can’t put more than 16 CPUs in a partition on a 16-CPU system? So I tested that idea. And to my surprise, IBM does indeed allow you to create for example a system with 4 virtual CPUs on a host with 2 physical CPUs. Interestingly, I know how to create such configurations with HPVM, but I also know that they invariably run like crap when put under heavy stress, so HPVM will refuse to run them if you don’t twist its arm with some special “force” option. I believe that it’s the correct thing to do. My quick experiments with IBM systems in such configurations confirm my opinion.
IBM also comments that “There is also limited workload support; resources cannot be added or removed.” Resources can be added to or removed from a vPar, or moved between vPars, so I’m not sure what IBM refers to. Similarly, when IBM writes: “Finally, vPars don’t allow you to share resources between partitions, nor can you dynamically allocate processing resources between partitions”, I invite readers to check the transcript of a 3-year-old demo to see for how long (at least) that statement has been false…
HPVM now supports 8 virtual CPUs, not 4, and it has always been able to take advantage of the much larger number of physical CPUs HP-UX supports (128 today). Please note that 8 virtual CPUs is what is supported, not what works (the red line in the graph shows 16-way scalability of an HPVM virtual machine under moderate competition using a Linux scalability benchmark; the top blue line is the scalability on native hardware; the bottom curves are HPVM under maximal competition and Xen under no competition.)
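The policy I describe here (refuse to start a guest with more virtual CPUs than the host has physical CPUs, unless explicitly forced) can be sketched in a few lines. This is only an illustration of the rule, not HPVM's actual code; the function name `may_start_guest` and its parameters are hypothetical.

```python
def may_start_guest(virtual_cpus: int, physical_cpus: int, force: bool = False) -> bool:
    """Return True if a guest with this many virtual CPUs may start.

    Hypothetical policy sketch: overcommitting virtual CPUs within a single
    guest is refused by default and allowed only when the caller forces it.
    """
    if virtual_cpus < 1 or physical_cpus < 1:
        raise ValueError("CPU counts must be at least 1")
    if virtual_cpus <= physical_cpus:
        return True   # every virtual CPU can be backed by its own physical CPU
    return force      # overcommitted: only start if the administrator insists


# The configuration from the text: 4 virtual CPUs on a 2-CPU host.
default_decision = may_start_guest(virtual_cpus=4, physical_cpus=2)
forced_decision = may_start_guest(virtual_cpus=4, physical_cpus=2, force=True)
```

With the default, the 4-on-2 configuration is rejected; with `force=True`, it is accepted and the performance consequences are on you.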
IBM also states that “Reboots are also required to add processors or memory”, but this is actually not very different from their own micropartitions. In a Linux partition with a kernel supporting CPU hotplug, for example, you can disable CPU3 under HPVM by typing echo 0 > /sys/devices/system/cpu/cpu3/online, just like you would on a real system (on HP-UX, you would use hpvmmgmt -c 3 to keep only 3 CPUs). HPVM also features dynamic memory in a virtual machine, meaning that you can add or remove memory while the guest is running. You need to reboot a virtual machine only to change the maximum number of CPUs or the maximum amount of memory, but then that’s also the case on IBM systems: to change the maximum number of CPUs, you need to modify a “profile”, but to apply that profile, you need to reboot. However, IBM micro-partitions have a real advantage (probably not for long) which is that you can boot with less than the maximum, whereas HPVM requires that the maximum resources be available at boot time.
IBM wants you to believe that “There is no support for features such as uncapped partitions or shared processor pools.” In reality, HPVM has always worked in uncapped mode by default. Until recently, you needed HP’s Global Workload Manager to enable capping, because we thought that it was a little too cumbersome to configure manually. Our customers told us otherwise, so you can now directly create virtual machines with capped CPU allocation. As for shared processor pools, I think that this is an IBM configuration concept that is entirely unnecessary on HP systems, as HPVM computes CPU placement dynamically for you. Let me know if you think otherwise…
The statement IBM made that “Virtual storage adapters also cannot be moved, unless the virtual machines are shut down” is also untrue. All you need is to use hpvmmodify to add or remove virtual I/O devices. There are limits. For example, HPVM only provisions a limited number of spare PCI slots and busses, based on how many devices you started with. So if you start with one disk controller, you might have an issue going to the maximum supported configuration of 158 I/O devices without rebooting the partition. But if you know ahead of time that you will need that kind of growth, we even provide “null” backing stores that can be used for that purpose.
But the core argument IBM has against HP Virtual Machines is: “The downside here is scalability.” Scalability is a complex enough topic that it deserves a separate post, with some actual data…
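For readers who prefer a script to a one-line echo, here is a small Python sketch of the same Linux CPU hotplug operation. It only wraps the sysfs file mentioned above; the function names and the `sysfs_root` parameter are my own additions (the latter so the code can be tried against a scratch directory instead of a live system). On a real machine it needs root and a kernel with CPU hotplug support, and on many systems cpu0 has no `online` file at all.

```python
from pathlib import Path


def _online_file(cpu: int, sysfs_root: str = "/sys") -> Path:
    """Path of the hotplug control file for a CPU (standard Linux sysfs layout)."""
    return Path(sysfs_root) / "devices" / "system" / "cpu" / f"cpu{cpu}" / "online"


def set_cpu_online(cpu: int, online: bool, sysfs_root: str = "/sys") -> None:
    """Bring a CPU online or take it offline, like `echo 0 > .../cpuN/online`."""
    control = _online_file(cpu, sysfs_root)
    if not control.exists():
        # No control file: the CPU does not exist or cannot be hotplugged.
        raise FileNotFoundError(f"CPU {cpu} has no hotplug control file: {control}")
    control.write_text("1" if online else "0")


def is_cpu_online(cpu: int, sysfs_root: str = "/sys") -> bool:
    """Read back the current state of a CPU from the same sysfs file."""
    return _online_file(cpu, sysfs_root).read_text().strip() == "1"
```

Calling `set_cpu_online(3, False)` as root inside the guest is then equivalent to the echo command above, and `set_cpu_online(3, True)` brings the CPU back.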
Preserving the investment matters
Let me conclude by pointing out an important fact: HP has a much better track record at preserving the investment of their customers. For example, on cell-based systems, HP allowed mixed PA-RISC and Itanium cells to facilitate the transition.
This is particularly true for virtual machines. HPVM is software that HP supports on any Integrity server running the required HP-UX (including systems by other vendors). By contrast, IBM micropartitions are firmware, with a lot of hardware dependency. Why does it matter? Because firmware only applies to recent hardware, whereas software can be designed to run on older hardware. As an illustration, my development and test environments consist of severely outdated zx2000 and rx5670 (the old 900MHz version). These machines were introduced in 2002 and discontinued in 2003 or 2004. In other words, I develop HPVM on hardware that was discontinued before HPVM was even introduced…
By contrast, you cannot run IBM micro-partitions on any POWER-4 system, and you need new POWER-6 systems in order to take advantage of new features such as Live Partition Mobility (the ability to transfer virtual machines from one host to another). HP customers who purchased hardware five years ago can use it to evaluate the equivalent HPVM feature today, without having to purchase brand new hardware.
Update: About history
The IBM paper boasts about IBM’s 40-plus year history of virtualization, trying to suggest that they have been in the virtualization market longer than anybody else. In reality, the PowerVM solution is really recent (less than 5 years old). Earlier partitioning solutions (LPARs) were comparable to HP’s vPars, and both technologies were introduced in 2001.
But a friend and colleague at HP pointed out that a lot of these more modern partitioning technologies actually originated at Digital Equipment, under the name OpenVMS Galaxy. And the Galaxy implementation actually offered features that have not entirely been matched yet, like APIs to share memory between partitions.