Archive for June, 2008

Josh Simons, CEO Sun Microsystems

June 24, 2008

I received a phone call yesterday morning from a firm claiming to be preparing a plaque recognizing Sun’s selection as one of the 20 best large companies to work for in Massachusetts by the Boston Business Journal. They wanted to send a mock-up of the plaque to me for my approval. A weird request to make of a random engineer in a 30K+ person company. Figuring this was some sort of headhunter scam to extract additional information from me, I asked how they had found my name and phone number. Whereupon she asked, “Aren’t you the CEO?” I responded by giving her Sun’s main phone number in Burlington and suggested she call there for help.

I had a good laugh about this with Eric, the engineer in the office next to mine. As it happens, Eric’s office is across from our mailstop. While talking with him, I was idly sorting through my mail when I came across a piece with this mailing label:

The sender of this letter has nothing whatsoever to do with the firm that had just called me about the plaque. Oh boy. I have no idea how this happened and I can’t imagine what kinds of mailings, phone calls, and invitations I’ll now receive as a result. Jonathan, see you in Davos? 🙂

ClusterTools 8: Early Access 2 Now Available

June 23, 2008

The latest early access version of Sun HPC ClusterTools — Sun’s MPI library — has just been made available for download here. As an active member of the Open MPI community, we continue to build our MPI offering on the Open MPI code base, making pre-compiled libraries freely available and offering a paid support option for interested customers. Wondering why we would base our MPI implementation on Open MPI? Read this.

What is particularly cool about CT 8 is that in addition to supporting Solaris, we’ve added Sun support for Linux (RHEL 4 & 5 and SLES 9 & 10), including use of both the Sun Studio compilers and tools and GNU C. We’ve also included a DTrace provider for enhanced MPI observability under Solaris as well as additional performance analysis capabilities and a number of other enhancements that are all detailed on the Early Access webpage.
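For those who want to kick the tires on the early access bits, here is a minimal sketch of the sort of MPI program that should build and run against ClusterTools 8 on either Solaris or Linux. The program itself is generic MPI; nothing in it is specific to CT 8, and the process count in the usage note below is just an example.

/* hello_mpi.c: a minimal MPI smoke test. Any MPI implementation,
 * including Sun HPC ClusterTools 8 (built on Open MPI), should
 * compile and run this unchanged. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                  /* start up the MPI runtime  */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's rank       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */
    MPI_Get_processor_name(name, &len);      /* node this rank landed on  */

    printf("Hello from rank %d of %d on %s\n", rank, size, name);

    MPI_Finalize();                          /* shut the runtime down     */
    return 0;
}

With the ClusterTools wrappers on your path, something like "mpicc hello_mpi.c -o hello_mpi" followed by "mpirun -np 8 ./hello_mpi" should do it, and the same source compiles whether the wrapper sits on top of Sun Studio or the GNU compilers.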

Open MPI on the Biggest Supercomputer in the World

June 23, 2008

Los Alamos National Laboratory and IBM recently announced they had broken the PetaFLOP barrier with a LINPACK run on the Roadrunner supercomputer. The Open MPI community, including Sun Microsystems, was proud to have played a role in this HPC milestone. As described by Brad Benton, member of the Roadrunner team, the 1.026 PetaFLOP/s LINPACK run was achieved using an early, unmodified snapshot of Open MPI v1.3 as the messaging layer that tied together Roadrunner’s 3000+ AMD-powered nodes. For more details on specific MPI tunables used, read this subsequent message from Brad and this follow-up message from Jeff Squyres, Open MPI contributor from Cisco.

About two years ago, we decided to change Sun’s MPI strategy from one of continuing to develop our own proprietary implementation of MPI to instead joining a community-based effort to create a scalable, high-performance, and portable implementation of MPI. We joined the Open MPI community because we felt (and still feel) strongly that combining forces with other vendors and other organizations is the most effective path to creating the middleware infrastructure needed to support the needs of the HPC community into the future.

Sun was the 2nd commercial member to join the Open MPI effort, which at the time consisted of a small handful of research and academic organizations. Two years later, the community looks like this:


This mix of academic/research members and commercial members brings together into one community a focus on quality, stability and customer requirements on the one hand, with a passion for research and innovation on the other. Of course, it does also create some challenges as the community works to achieve an appropriate balance between these sometimes opposing forces, but the results to date have been impressive, as witnessed by the use of Open MPI to set a new LINPACK world record on the biggest supercomputer in the world.

Lufthansa Schadenfreude

June 22, 2008

[WARNING: This blog entry is primarily about vomit.]

I flew home from Dresden via Frankfurt on Friday, boarding LH 422 for the eight-hour flight to Boston. Sitting just forward of me was a coed group of boisterous college-aged kids who mostly quieted down once they had stowed their gear and found their seats. With the exception of Oscar, who was sitting two seats in front of me across the aisle.

Oscar’s behavior went beyond boisterous, well into the realm of obnoxious. He was loud, he was rude, he was up out of his seat repeatedly, unable to sit still. I’m a seasoned air traveler and very used to ignoring the various misbehaviors of my fellow passengers, but there was something about Oscar I found particularly grating. He persisted in this behavior up until the first meal service, at which point some god out of some pantheon smote him but good and he threw up all over himself in spectacular fashion. When he stood up after being directed aft by an attendant, I saw that he was literally covered in vomit–all over his shirt and down his pant legs and in considerable quantity. And there was apparently enough left over to have covered the seat as well, since the attendant later placed a pillow on it to make the seat usable again.

Peace reigned for most of an hour while Oscar presumably cleaned himself up. He then reappeared–shirtless. Perhaps a little quieter, but still with plenty of swagger. Which I must say I viewed with some amusement since his cool demeanor did not jibe with the fact that from the rear one could see that the entire crotch of his pants was completely packed with now-drying vomit which he had apparently missed in the clean-up effort. Ah, I thought to myself. This was schadenfreude.

Eventually one of his traveling companions gave Oscar a t-shirt and he fell asleep sitting on his vomit-laden pillow for most of the remainder of the flight. Later, mention of a $120 bar bill led me to conclude that Oscar had had far too much to drink prior to boarding the flight.

Inside NanoMagnum, the Sun Datacenter Switch 3×24

June 19, 2008

Here is a quick look under the covers of the Sun Datacenter Switch 3×24, the new InfiniBand switch just announced by Sun at ISC 2008 in Dresden. First some photos, and then an explanation of how this switch is used as a Sun Constellation System component to build clusters with up to 288 nodes.

First, the photos:

Nano Magnum’s three heat sinks sit atop Mellanox 24-port InfiniBand 4x switch chips. The purple object is an air plenum that guides air past the sinks from the rear of the unit.
Looking down on the Nano, you can see the three heat sinks that cover the switch chips and the InfiniBand connectors along the bottom of the photo. The unit has two rows of twelve connectors with the bottom row somewhat visible under the top row in this photo.
The Nano Magnum is in the foreground. The unit sitting on top of Nano’s rear deck for display purposes is an InfiniBand NEM. See text for more information.

You might assume NanoMagnum is either a simple 24-port InfiniBand switch or, if you know that each connector actually carries three separate InfiniBand 4X connections, a simple 72-port switch. In fact, it is neither. NanoMagnum is a core switch and none of the three InfiniBand switch chips is connected to the others. Since it isn’t intuitive how a box containing three unconnected switch chips can be used to create single, fully-connected clusters, let’s look in detail at how this is done. I’ve created two diagrams that I hope will make the wiring configurations clear.

Before getting into cluster details, I should explain that a NEM, or Network Express Module, is an assembly that plugs into the back of each of the four shelves in a Sun Blade 6048 chassis. In the case of an InfiniBand NEM, it contains the InfiniBand HCA logic needed for each blade as well as two InfiniBand leaf switch elements that are used to tie the shelves into cluster configurations. You can see a photo of a NEM above.

The first diagram (below) illustrates how any blade in a shelf can reach any blade in any other shelf connected to a NanoMagnum switch. There are a few important points to note. First, all three switch chips in the NanoMagnum are connected to every switch port, which means that regardless of which switch chip your signal enters, it can be routed to any other port in the switch. Second, you will notice that only one switch chip in the NEM is being used. The second is used only for creating redundant configurations, and the cool thing about that is that, from an incremental cost perspective, one need only buy additional cables and additional switches; the leaf switching elements are already included in the configuration.

If the above convinced you that any blade can reach any other blade connected to the same switch, the rest is easy. The diagram below shows the components and connections needed to build a 288-node Sun Constellation System using four NanoMagnums.

Clusters of smaller size can be built in a similar way, as can clusters that are over-subscribed (i.e. not non-blocking.)
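To make the arithmetic behind that 288-node configuration concrete, here is a back-of-the-envelope sketch that works only from the numbers in this post (a Sun Blade 6048 chassis holds 48 blades across four shelves, one NEM per shelf, a NanoMagnum has 24 connectors, and each 12X cable carries three 4X links) plus one assumption of mine: that each NEM runs one cable to each of the four core switches. Treat it as an illustration of the port budget, not a wiring guide.

/* Back-of-the-envelope port math for a 288-node Sun Constellation
 * cluster built with NanoMagnum (Sun Datacenter Switch 3x24) core
 * switches. The figures come from the post; the one-cable-per-NEM-
 * per-switch cabling pattern is an assumption. */
#include <stdio.h>

int main(void)
{
    const int nodes               = 288; /* target cluster size               */
    const int blades_per_chassis  = 48;  /* Sun Blade 6048                    */
    const int shelves_per_chassis = 4;   /* one InfiniBand NEM per shelf      */
    const int links_per_cable     = 3;   /* three 4X links in one 12X cable   */
    const int connectors_per_nano = 24;  /* connectors on one NanoMagnum      */
    const int core_switches       = 4;   /* NanoMagnums in this configuration */

    int chassis          = nodes / blades_per_chassis;               /*  6 */
    int nems             = chassis * shelves_per_chassis;            /* 24 */
    int blades_per_shelf = blades_per_chassis / shelves_per_chassis; /* 12 */

    /* Assumed pattern: one 12X cable from every NEM to every core switch. */
    int cables_per_nano  = nems;                                     /* 24 */
    int uplinks_per_nem  = core_switches * links_per_cable;          /* 12 */

    printf("%d chassis, %d NEMs, %d blades per shelf\n",
           chassis, nems, blades_per_shelf);
    printf("cables into each NanoMagnum: %d of %d connectors used\n",
           cables_per_nano, connectors_per_nano);
    printf("4X uplinks per NEM: %d, matching %d blades (non-blocking)\n",
           uplinks_per_nem, blades_per_shelf);
    return 0;
}

Under that assumption the numbers fall out neatly: 24 NEMs each running one cable to each of the four switches exactly fills the 24 connectors on every NanoMagnum, and the 12 resulting 4X uplinks per shelf match the 12 blades behind each NEM, which is what makes a non-blocking configuration possible.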

Sun Announces Hercules at ISC 2008 in Dresden

June 18, 2008

Last night in Dresden at the International Supercomputing Conference (ISC 2008), Sun unveiled Hercules, our newest Sun Constellation System blade module. Officially named the Sun Blade X6450 Server Module, Hercules is a four-socket, quad-core blade with Xeon 7000 series processors (Tigerton) that fits into the Sun Blade 6048 Chassis, the computational heart of Sun’s Constellation System architecture for HPC. According to Lisa Robinson Schoeller, Blade Product Line Manager, the most notable features of Hercules are its 50% increase in DIMM slots per socket (six instead of the usual four), the achievable compute density at the chassis level (71% increase over IBM and 50% increase over HP), and the fact that Hercules is diskless (though it does also support a 16 GB on-board CF card that could be used for local booting.) A single Constellation chassis full of these puppies delivers over 7 TeraFLOPs of peak floating-point performance.

Lisa Schoeller and Bjorn Andersson, Director for HPC, showing off Hercules, Sun’s latest Intel-based Constellation blade system
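As a sanity check on the "over 7 TeraFLOPs" per chassis figure, here is a rough peak floating-point sketch. The blade, socket, and core counts come from the post; the clock rate and the four double-precision floating-point operations per core per cycle are my assumptions about the Xeon 7000 series parts, not numbers from the announcement.

/* Rough peak floating-point arithmetic for a Sun Blade 6048 chassis
 * fully populated with Hercules (Sun Blade X6450) blades. The clock
 * rate and flops-per-cycle figures are assumptions, not numbers from
 * the announcement. */
#include <stdio.h>

int main(void)
{
    const int    blades_per_chassis = 48;
    const int    sockets_per_blade  = 4;
    const int    cores_per_socket   = 4;    /* quad-core Xeon 7000 series */
    const double clock_ghz          = 2.33; /* assumed clock rate         */
    const double flops_per_cycle    = 4.0;  /* assumed DP flops per core  */

    double cores = (double)blades_per_chassis * sockets_per_blade * cores_per_socket;
    double peak_teraflops = cores * clock_ghz * flops_per_cycle / 1000.0;

    printf("%.0f cores per chassis, roughly %.1f peak TeraFLOPs\n",
           cores, peak_teraflops);
    return 0;
}

With those assumptions the math lands at roughly 7.2 TeraFLOPs for 768 cores, which is consistent with the figure quoted above.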

Sun Announces NanoMagnum at ISC 2008 in Dresden

June 18, 2008

Last night in Dresden at ISC 2008 Sun announced NanoMagnum, the latest addition to the Sun Constellation System architecture. Nano, more properly called the Sun Datacenter Switch 3×24, is the world’s densest DDR InfiniBand core switch, with three 24-port IB switches encased within a 1 rack-unit form factor. Nano complements its big brother Magnum, the 3456-port InfiniBand switch used at TACC and elsewhere, and allows the Constellation architecture to scale down by supporting smaller clusters of C48 (Sun Blade 6048) chassis. Nano also uses the same ultra-dense cabling developed for the rest of the Constellation family components, with three 4X DDR connections carried in one 12X cable for reduced cabling complexity.

Here are some photos I took of the unveiling on the show floor at ISC.


IDC HPC Briefing at ISC 2008 in Dresden

June 18, 2008

My notes from IDC’s HPC briefing at the ISC 2008 conference here in Dresden. The IDC folks are fast slide flippers, so interesting details are missing in some cases; best I could do.

Agenda: review of new HPC tracking methods; HPC market update (2007, forecasts, cluster update); new IDC research findings on HPC management software.

HPC means “all technical servers” — HPC, HPTC, technical servers, highly computational servers are all synonyms. Anything bigger than a desktop that is computationally or data intensive.

HPC growth stronger than expected. 19% CAGR over the last four years. $11.5B in 2007. x86 and Linux dominate. Blades making inroads into all segments. Clusters continue to gain share. Major challenges are power, cooling, real estate, storage and data management, and system management. Software continues to be a major hurdle.

Supercomputer now equals anything over $500K. Divisional is $250K-$499K. Departmental is $100K-$249K. Workgroup is $0-$99K. Biggest growth is in the departmental space, at a 45% CAGR.

Total server market is about $52B. Of that business servers are $42B and HPC is about $10B.

Look at all server processors, technical versus commercial: almost one third are in HPC at this point. Virtualization has reduced processor counts on the commercial side. And HPC customers buy lots of processors because the processors (x86 called out specifically) do a poor job of running the required workloads, so customers need more of them. Many-core processors amplify this.

Clusters still hard to use and manage. System management and growing cluster complexity. Power, cooling, floor space. Third party cost issues — up to 60-65% of budget in some cases.

Software becoming #1 bottleneck. Better management software needed–new buyers require high ease of use. Parallel software issues related to multicore. Application rewrites required.

HPC vendor market share: HP 32.9%; IBM 32.9%; Dell 17.8%; Sun 4.6%. x86 now at 72% share in 2007. Linux 74% share in 2007. Watching Windows closely — has a stable 5% share, roughly.

HPC processors are now shipping at a rate of over 3.3M/year. HPC clusters represent over 2.6M/year.

Forecast results based on these assumptions: Worldwide economic downturn will negatively impact overall IT spending, but HPC will be somewhat insulated due to its R&D nature. Commoditization will continue to rule. Global petascale initiatives will push technical innovation. Major growth areas will be energy, defense, security, manufacturing, and entertainment. Many-core will ignite a new growth curve as customers buy more processors, BUT this has not been factored into the forecast.

The 5-year forecast says that major growth will be in the low to mid parts of the market, with double-digit growth in those segments. 9.2% overall growth for the market. Storage five-year CAGR predicted to be 11.4%.

Blades optimize on environmental factors. One and two RU rack servers will also continue to be important for their flexibility.

Overall guidance on areas of focus for vendors: work on getting more memory bandwidth to sockets, specialized processors, speed of interconnect, better data management, and power and cooling. Pay attention to the mid and low end of the market, where most growth will occur.

Introduced a new term, “hpc management software,” in recognition of the fact that this area is becoming a large problem as clusters get larger and more complex and customers take an a la carte approach to systems, putting a heavy burden on system administrators and user support personnel.

There then followed a presentation of the joint work done by IDC and the Council on Competitiveness: the Reflect and Reveal studies, which are well worth reading to understand the perspective of new entrants into HPC, a high-growth area for the market. The reports are available for download here. This is the same material that was presented at the IDC HPC User Forum meeting in Norfolk, Virginia this past April.

New areas of research for IDC. Extreme computing — any kind of technology (hw/sw/business models) that may define the datacenter of the future. Adding performance and price performance to quarterly views. HPC storage market sizing and forecasts, including data management research. HPC end-user quarterly view–what is actually going on at sites, including info on what benchmarks customers are using to make selection decisions. Multi-core issues, specifically application impacts. New low-end workgroup computers–expect people moving from a desktop to get their apps running in parallel and then jump in at workgroup level or perhaps higher–not clear at this point. Tracking petascale and exascale initiatives. Country-level HPC tracking — 17 countries. Market share by application/industry by vendor. Datacenter assessment and benchmarking

HPC User Forum meetings will include Oil/Gas meetings and Finance meetings to do drilldowns on technical requirements in these areas.

HPC Storage growth charts. Total $3.7B in 2006. Growth projection showed good growth and absolute market size below compute and above service.

User Forum meetings: Oct 13-14 Stuttgart; Oct 16 London; Sept 8-10 Tucson, AZ.

Burton Smith: The Killer Micros II: The Software Strikes Back

June 17, 2008

My notes from Burton Smith’s talk in the Cluster session at ISC 2008 in Dresden.

The Killer Micros II: The Software Strikes Back
Dr. Burton Smith, Microsoft Corporation, USA

He has spent 30 years in High Performance Computing, but HPC is not his day job at Microsoft, though he does some night work on it. His day job is parallel computing on clients: general-purpose parallel computing, multicore.

Cluster Software is Primitive. Programming is at too low a level (C++, OpenMP, MPI). Tools are too few and too thinly supported; the cluster market is not big enough. Applications are too cluster-specific because the infrastructure is different on various clusters: islands of cluster specificity.

Big Changes are Coming. First, the many-core inflection point: parallel computing comes to the mainstream. Second, cloud and corporate computing: better data searches and access, and service-based application software. Is this just SOAP, AJAX, and XML? Yes, partly. But it is really about breaking applications into communicating pieces and distributing them.

At SC ’89, Eugene Brooks predicted mainstream hardware would dominate the HPC area, “None will survive the attack of the killer micros!” Now it is software’s turn. Meaning that HPC will soon be dominated by software from outside of HPC.

Client Many-Core Parallelism. Client computing will soon be parallel, even on mobile devices and phones. Why? More absolute performance, and more performance per watt. Parallel languages and tools are underway; they are needed to use the new hardware well. And this does not mean adding parallel for-loops to C++.

Cluster versus mainstream software: SPMD parallelism with OpenMP versus mixed task/data parallelism; fixed processor counts versus variable processor counts; MPI versus Internet and MPI; C++ versus C#, F#, Excel, SQL, C++; file system versus databases, cloud, file system.

Emerging mainstream software has richer capabilities.

Software as a Service. Software services will run everywhere: in clients, on servers, in the cloud. Which brings distributed computing to the forefront. Service-based apps are distributed, with the data-intensive pieces placed where the data are and the compute-intensive pieces placed where the processors are.

Conclusions. Cluster software is still primitive because the market is so small. Two major revolutions now underway in computing that will change the landscape. Clusters will be affected by these changes.

HPC Consortium: Sun Recognizes First Constellation System Customers

June 17, 2008

At the Sun HPC Consortium gala dinner last night here in Dresden, Marc Hamilton, Sun’s Vice President of Systems Practice Americas, recognized Sun’s first four Sun Constellation System customers and presented each with a plaque. The text of the plaques is shown below along with some photos of the awards.

Bjorn Andersson, Director of HPC, and Marc Hamilton, VP System Practices Americas, prepare to unveil NanoMagnum at ISC 2008
Ta da. The Sun Datacenter Switch 3×24, sitting in front of a Sun Blade 6048 chassis and on top of a scale model of TACC Ranger, the world’s largest Sun Constellation system
A closeup of the new switch
Bjorn uncorks a magnum of champagne to celebrate
Lisa Robinson Schoeller (Blade Product Line Manager) and Bjorn prepare to spread good cheer
The official toast
NanoCork