Archive for June, 2009

HPC in Hamburg: Sun Customers Speak at the HPC Consortium

June 24, 2009

It’s crazy time again. I’m in Hamburg for two HPC events: Sun’s HPC Consortium customer event, and ISC ’09, the International Supercomputing Conference. The Consortium ran all day Sunday and Monday and then ISC started on Tuesday. It is now Wednesday and this is the first break I’ve had to post a summary of the talks given at the Consortium. Due to the sheer number of presentations, including a wide range of Sun and partner talks, I only summarize those given by our customers. The full agenda is here.

Our first customer talk on Sunday was given by Dr. James Leylek, Executive Director of CU-CCMS, the Clemson University Computational Center for Mobility Systems, which focuses on problems in the automotive, aviation/aerospace, and energy industries.

The mission of CU-CCMS is not unique — there are numerous university-based centers that work closely with industry by bringing resources and expertise to bear in a variety of problem domains. What sets CU-CCMS apart is its focus on addressing the mismatch between typical university time-scales and those of their industrial partners. Businesses need results quickly; universities move more slowly.

CU-CCMS has addressed this need in a few ways. They’ve staffed the center with full-time MS and PhD level engineers who have no teaching responsibilities. And they have provided a significant amount of computing gear to enable those engineers to work effectively with their industrial partners and generate results in a timely way.

Heterogeneity is another key part of the CU-CCMS strategy. By offering a range of computing platforms from clusters to very large shared-memory machines (from Sun) they are able to map problems to appropriate resources to deliver the fast turnaround times required by their industrial partners.

Dr. Leylek also briefly discussed the challenge of introducing HPC to industry as detailed in the Council on Competitiveness study, Reveal. As he noted, many companies are “sitting on the sidelines” of HPC, not engaging even though they could increase their competitiveness by using HPC techniques. He believes CU-CCMS offers a model for how such engagements can be run successfully: assemble a team of expert, dedicated technical staff with appropriate domain knowledge and algorithmic expertise; combine that with ample high performance computing infrastructure and an understanding that turnaround time is critical for successful industrial engagements; and then generate valuable results. Lather, rinse, repeat.

Thomas Nau from the University of Ulm gave the next talk, a quick tour through several OpenSolaris technologies. He talked about COMSTAR, gave a quick demo of the new OpenSolaris Time Slider, and spent most of his time on ZFS, specifically the benefits of solid state disks for increasing ZFS performance. Thomas identified the ZIL — the ZFS Intent Log — as the component that most often limits performance. In his experiments, moving the ZIL from a standard hard disk to a ramdisk showed significant ZFS performance improvements. In addition, only a small amount of solid state storage is needed to achieve good performance, e.g. perhaps 1-4 GB even for multi-terabyte pools. Thomas noted that while one could theoretically increase ZFS performance by disabling the ZIL, DO NOT DO THIS. He then ended with the following statement, with which I can only agree: “Hardware RAID is dead, dead, dead. Just use ZFS.” 🙂
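On current systems the usual way to get the ZIL onto fast storage is a dedicated log device (a "slog"). A minimal sketch of the commands involved, assuming a pool named tank and a spare SSD at c2t0d0 (both names are placeholders):

```shell
# Attach a dedicated log device so that synchronous writes land on the
# SSD rather than on the main pool disks (pool/device names are examples):
zpool add tank log c2t0d0

# Verify that the log device now appears under its own "logs" section:
zpool status tank
```

Unlike disabling the ZIL, a separate log device speeds up synchronous writes without sacrificing correctness.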

Our first customer talk on Monday was given by Prof. Dr. Thomas Lippert, head of the Jülich Supercomputing Centre (JSC), site of Sun’s largest European deployment to date of our Sun Constellation System architecture. He first gave a brief history of the Jülich Research Center, which is one of the largest civilian research centers in Europe with over 4000 researchers in nine departments, one of which is the new Institute for Advanced Simulation of which JSC is a part. The site has a very long history of computer acquisitions, starting in 1957. This year JSC purchased three systems: a Sun system (JuRoPA), a Bull system (HPC-FF), and an IBM system (Jugene). These systems have, respectively, 200 TFLOPs, 100 TFLOPs, and 1 PFLOPs of peak performance. Since the Sun and Bull systems are interconnected at the highest level of their switch hierarchies, the two machines can be run as a single system. This combined system delivered 274.8 TFLOPs on LINPACK, which earned it the #10 entry on the latest edition of the TOP500 list. Collectively, JSC serves about 250 projects across Europe, including 20-30 highly scalable projects that are chosen by international referees for their potential for producing breakthrough science.

Dr. Lippert also spoke briefly about PRACE, the Partnership for Advanced Computing in Europe, which is radically changing the supercomputing landscape across Europe. Due to earlier studies, computing is now considered to be a crucial pillar of research infrastructure and, as such, it is now receiving considerable attention from funding agencies.

In closing, Dr. Lippert presented specific details of the JuRoPA system (2208 nodes, 17664 cores, 207 TFLOPs, 48 GB/node, and Sun’s new M9 QDR switch). He also described some of the specific issues that will be explored with these systems, including control of jitter through the use of gang scheduling, daemon reduction, a SLERT kernel, etc. And perhaps some additional secret sauce from Sun. 🙂

Prof. Satoshi Matsuoka from the Tokyo Institute of Technology spoke next. While he did mention Tsubame, Tokyo Tech’s Sun-based supercomputer, he primarily spoke about the return of vector machines to HPC. They have, he believes, been reincarnated as GPGPU-based machines. Dinosaurs are once again walking the earth. 🙂 In particular, the GPGPU’s high compute density, high memory bandwidth, and low memory latency echo some of the fundamental capabilities of vector machines that make them interesting both for tightly coupled codes like N-body and for sparse codes like CFD. In his view, the GPGPU essentially becomes the main processor while the CPU becomes an ancillary processor.

Computers, however, are not useful unless they can be used to solve problems. To show that GPGPU-based clusters can be effective HPC platforms, Prof. Matsuoka presented results from several new algorithms developed at Tokyo Tech to take advantage of GPGPU-based systems. He showed impressive results for 3D FFTs used for protein folding, and results for CFD with speedups of up to 70X over CPU-based algorithms.
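To make the vector-machine analogy concrete, here is a minimal CPU-side sketch of the all-pairs N-body force calculation, the kind of tightly coupled kernel he mentioned. This is purely illustrative (the function name and softening parameter are mine, and production GPU versions are far more elaborate):

```python
import numpy as np

def nbody_accel(pos, mass, eps=1e-3):
    """All-pairs gravitational accelerations with G = 1.

    pos:  (N, 3) array of particle positions
    mass: (N,)   array of particle masses
    eps:  softening length that keeps close encounters finite
    """
    # r[i, j] = vector from particle i to particle j
    r = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]
    d2 = (r ** 2).sum(axis=-1) + eps ** 2      # softened squared distances
    inv_d3 = d2 ** -1.5
    np.fill_diagonal(inv_d3, 0.0)              # drop self-interaction
    # a_i = sum_j m_j * r_ij / (|r_ij|^2 + eps^2)^(3/2)
    return (r * (mass * inv_d3)[..., np.newaxis]).sum(axis=1)
```

The kernel is dominated by the O(N^2) pairwise interactions, which is exactly where a GPGPU’s compute density and memory bandwidth pay off.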

Our next customer speaker was Henry Tufo of the University of Colorado at Boulder (UCB) and the National Center for Atmospheric Research (NCAR). He gave an update on UCB’s upcoming Constellation-based HPC system and also spoke about some of the challenges related to climate modeling. It seems clear at this point that accurate climate modeling is going to be critical for understanding our future and our planet’s future. It was a bit daunting to hear that climate modelers would like to scale up many dimensions of their simulations: spatial resolution by 10^3 or 10^5, the completeness of their models by 100x, the length of their simulation runs by 100x, and the number of modeled parameters by 100x. All told, their desires would increase computational needs by 10^10 or 10^12 over current requirements. It was sobering to hear that current technology trajectories predict that a 10^7 improvement will take about 30 years. Not good.

Their new Sun-based system will consist of 12 Constellation racks, Nehalem blades, QDR InfiniBand, and about 500 TB of storage, with about 10% of the cluster’s nodes accelerated with GPUs. The system will be located next to an existing physics building in three containers — one for the IT components, one for electrical, and one for cooling.

Stephane Thiell from the Commissariat à l’Énergie Atomique (CEA) gave an overview of CEA, talked a bit about CEA’s TERA-100 project, and then detailed CEA’s planned use of Lustre for TERA-100. The CEA computing complex currently has two computing centers, one classified (TERA) and one open (CCRT). TERA-100 will be a follow-on to TERA-10, a 60 TFLOPs, Linux-based system built by Bull in 2005. It includes an impressive 1 PetaByte Lustre file system and uses HPSS to archive to Sun StorageTek tape libraries with a 15 PetaByte capacity.

TERA-100 aims to increase CEA’s classified computing capacity by about 20x, with a final size of one PFLOPs or perhaps a little larger. CEA plans to continue with their COTS-based, general-purpose approach rather than move off the main sequence to something more exotic. It will be x86-based with more than 500 GFLOPs per node using 4-socket nodes. There will be 2-4 GB per core, and two Lustre file systems will be supported, one with a 300 GB/s transfer requirement and the other with a 200 GB/s requirement. The system will consume less than 5 MW. A 40 TFLOPs demonstrator system will be built first, and it will include scaled-down versions of the Lustre file systems as well. In the final system, the Lustre servers will be built with four-socket nodes, and a four-node HA architecture will be used to guard against failure and to avoid long failover times.

CEA is involved in some interesting Lustre-based development, including joint work with Sun on a binding between Lustre and external HSM systems with the goal of supporting Lustre levels of performance with transparent access to hierarchical storage management. CEA is also working on Shine, a management tool for Lustre.

Dieter an Mey gave some general information about computing at RWTH Aachen University and then gave an update on their latest acquisition, a Sun-based supercomputer. He ended with a discussion about the pleasures and perils of workload placement on current generation systems. Along the way he shared some feedback on Sun products — one of those habits that makes customers like Dieter such valuable partners for Sun.

Aachen provides both Linux and Windows-based HPC resources for their users. On Linux they record about 40,000 batch jobs per month and perhaps 150 interactive sessions per day. The Windows cluster is used primarily for interactive jobs. It was interesting to hear that Windows is gaining ground on Linux at Aachen: a previous study there had shown Windows lagging Linux in performance by about 24%, but a recent re-run of the study now puts the gap at about 7%.

Aachen’s new system will support both Linux and Windows equally with a flexible dividing line between them. The facility is designed to be general purpose with a mix of thin and fat nodes and with the required high-speed interconnect for those who use MPI. A new building is being erected to house this machine which will come fully online over the course of 2009-2010. When complete, the system will have a peak floating point rate in excess of 200 TFLOPs and it will include a 1 PetaByte Lustre file system. Speaking of Lustre, Dieter rated its configuration as “complex”, something Sun is working on. The system will also include two of Sun’s latest InfiniBand switches, the new 648-port QDR M9 switch.

Dieter’s final topic was the correct placement of workload on non-uniform system architectures. In particular, he described the difference between compact and scatter placement on multicore NUMA systems. Compact placement uses threads on the same core first, then cores in the same socket — a strategy that is used to minimize latency and to maximize cache sharing. Scatter placement uses threads on different sockets first, and then threads on different cores — a strategy that maximizes memory bandwidth. Which strategy is best depends on the details of an application’s underlying algorithms. (Dieter noted that currently Sun Grid Engine is not aware of these issues — it treats nodes as flat arrays of threads or cores.) Placement decisions are further complicated when attempting to schedule more than one application onto a fat node. For example, different strategies would be used depending on whether single job turnaround is more important than overall throughput of jobs.
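As an illustration of the two strategies (names and topology here are mine; real pinning is done with tools such as taskset or an OpenMP runtime’s affinity settings), the sketch below enumerates the order in which hardware slots would be filled on a NUMA node:

```python
from itertools import product

def placement_order(sockets, cores, threads, strategy):
    """Enumerate (socket, core, thread) slots for pinning successive
    software threads, per the two strategies Dieter described.

    compact: fill hardware threads on the same core first, then the next
             core in the same socket (minimize latency, share caches).
    scatter: spread across sockets first, then across cores
             (maximize aggregate memory bandwidth).
    """
    if strategy == "compact":
        # product() varies the last index fastest: thread, then core, then socket
        return list(product(range(sockets), range(cores), range(threads)))
    elif strategy == "scatter":
        # innermost loop varies the socket fastest, then the core, then the thread
        return [(s, c, t)
                for t in range(threads)
                for c in range(cores)
                for s in range(sockets)]
    raise ValueError(f"unknown strategy: {strategy}")
```

For a 2-socket, 2-core, 2-thread node, compact places the first two software threads on both hardware threads of socket 0, core 0, while scatter places them on different sockets.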

Our last customer talk at the Consortium was given by the tag team of Arnie Miles from Georgetown University and Tim Bornholtz of the Bornholtz Group. Their topic was the Thebes Consortium, for which they presented current status, did a short demo, and announced that the source code would be available by the end of June on

The Thebes Consortium aims to foster the widespread adoption of distributed computing technologies by creating an enabling infrastructure that focuses on scalability, security, and simplicity.

Arnie described (and Tim demo’ed) the first Thebes use case, which assumes 1) that users have usernames and passwords in their home domain, 2) that one or more local resources have a trust relationship with a local STS (secure token service), 3) that these resources are known to users, and 4) that all resources are able to consume SAML.

The use case itself consists of the following actions: 1) users create job submission files using the client application or a command line, 2) users use their institutional usernames and passwords to acquire a signed SAML token, 3) users perform no other logins and do not have to go to a resource command line interface, 4) users manually choose their resources, and 5) job scheduling is handled by the resources. Note that in this instance “resource” refers to a DRM-managed cluster, which will accept the incoming request and then schedule the job appropriately on its managed cluster. In the current prototype, a service is a compute service, though there is some support for a file system service as well.
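To make the trust flow easier to follow, here is a toy sketch of the use case in which a shared secret stands in for the real STS/resource trust relationship. All class and field names are mine; Thebes itself uses real SAML tokens and services, not these stand-ins:

```python
from dataclasses import dataclass

@dataclass
class SamlToken:
    subject: str
    issuer: str
    signature: str              # stands in for a real XML signature

class TokenService:
    """Stand-in for the local STS: trades a username/password for a
    signed token (step 2 of the use case)."""
    def __init__(self, domain, secret):
        self.domain, self.secret = domain, secret
    def issue(self, user, password):
        if not password:        # placeholder credential check
            raise PermissionError("bad credentials")
        return SamlToken(user, self.domain, f"sig:{self.secret}:{user}")

class ComputeResource:
    """Stand-in for a DRM-managed cluster that trusts the STS and
    consumes SAML (steps 4-5): it accepts the request and schedules
    it internally; the user never logs in to the resource itself."""
    def __init__(self, name, trusted_issuer, secret):
        self.name, self.trusted_issuer, self.secret = name, trusted_issuer, secret
        self.queue = []
    def submit(self, token, job_file):
        if token.issuer != self.trusted_issuer:
            raise PermissionError("untrusted issuer")
        if token.signature != f"sig:{self.secret}:{token.subject}":
            raise PermissionError("bad signature")
        self.queue.append((token.subject, job_file))
        return len(self.queue) - 1      # job id within this resource
```

In a run of the flow, the user authenticates once against the STS, receives a token, and then submits jobs to any resource that trusts that STS, with scheduling left entirely to the resource.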


FORTRAN: Calling All Dinosaurs!

June 18, 2009


Please ASSIGN some time to RECORD your opinions about current and future FORTRAN needs in our non-COMPLEX online survey. It is in your INTRINSIC self-interest to PAUSE and DO so.

It is IMPLICIT and LOGICAL that you also CALL on your colleagues (those CHARACTERs) to READ this, get REAL, and make an ENTRY as well.

You can OPEN the survey IF you GOTO here.

(Something we share in COMMON: I am a FORTRAN TYPE as well and am eligible to join the Dinosaur UNION.)

unConference: The Future of Software & the Internet

June 8, 2009

The Massachusetts Technology Leadership Council held an unconference on Sun’s Burlington campus last Friday, titled The Future of Software & the Internet. I attended because I was both interested in the topic and also curious about the logistics and effectiveness of unconferences.

I was surprised when the moderator asked everyone in the room to introduce themselves by stating their name and either their company or their location. C’mon! There were well over 200 people in the room and we were not sitting in neat rows. And yet it worked somehow. I of course didn’t remember any names, but I got a good sense of the companies represented–the usual suspects (Sun, IBM, HP, Microsoft, Google, CISCO, etc) as well as many (MANY) small companies, venture capitalists, and several attendees with undisclosed affiliations. In addition, there was probably some benefit in having everyone actually make a vocalization at the outset — something about participating rather than just observing. In any case, it didn’t take long and it was a good ice breaker. And it perhaps helped everyone feel the next step was achievable as well: creating an agenda for the rest of the day, based on everyone’s input. And doing so in a finite time. 🙂

An unconference is an unconference at least in part because the agenda is not defined beforehand by a conference committee–it is created on the fly by participants at the start of the event with the help of a skilled moderator. At the start of the day, our agenda had four hour-long discussion sessions blocked out, but no content at all. Content was identified this way:

  1. Anyone who was interested in hosting a discussion wrote their name and discussion title on a sheet of paper. Proposers would be responsible for running their session, but not for having any answers necessarily.
  2. Proposers then lined up and each gave a short (SHORT) description of their discussion idea. We had two mikes and therefore two lines that alternated, which helped this part run a little faster.
  3. After announcing their discussion idea, each proposer placed their paper onto a matrix posted on the wall. Our matrix had four rows — one for each of the day’s four one-hour sessions. The matrix had 15 columns, one for each conference room or area designated for discussion, each of which was labeled with a letter from A to O. Proposers could place their discussion in any cell, though there was some encouragement from the moderator to ensure that the last session of the day had a good number of discussions scheduled. Each column was also labeled with the approximate size of each discussion area as input to the heuristic placement procedure. Cloud discussions went into big rooms while the “making parallel programming easier” discussion area had just a couch and a few chairs.

I’m guessing there is some rule of thumb that helps organizers decide how many concurrent sessions will be needed based on the number of attendees. However it was done, the number of proposed topics mapped nicely to the 4 x 15 = 60 available discussion slots.

Once the agenda was complete, the moderator helped everyone get to their first discussion by reading aloud the titles and locations of the first set of concurrent topics. The entire agenda matrix was then moved to a wall in a central location so attendees could easily visit it between sessions to pick their next discussion topic.

All of the above — from opening, through introductions and agenda forming — took less than an hour. The resulting agenda cast a wide net over the theme of the unconference. There were discussions on business models, specific technical issues, models of innovation, development and testing processes, open source, cloud computing, etc. I participated in the following four discussions:

  • Simplified Parallel Computing
  • How to Start Your Idea [with Almost No Money]
  • The Future of Software Testing
  • From Data to Answers

I learned something in each discussion, though in the parallel computing case it was merely that talk of SIMD, MIMD, OpenMP, parallel spreadsheets, M language processing, and streaming parallelism is a sure way to keep your discussion group small. They were dropping like flies. 🙂

Kidding aside, I was interested to talk to testing practitioners about the 2nd class role played by QA in the engineering hierarchy and how Agile methods might perhaps mitigate that problem by making quality an explicitly shared goal of all team members.

I approached the “data to answers” session wondering whether HPC techniques for turning large amounts of data into insights would be applicable in a broader business context, and learned that many businesses lack experience with even the simplest analytical methods, including an understanding of relatively simple data displays.

The “ideas” discussion presented a model for thinking about “intention” as distinct from “invention” in the innovation process. Intention is a statement about who you want to help or what you want to improve, while invention is how one chooses to satisfy the intention. Bill Warner, who led the session, used the Wildfire voice system as an example of how confusing these two concepts can lead to problems in start-up situations.

The Innovation unConference, MassTLC’s next such event, will be held on the Sun Burlington campus on October 1, 2009. I plan to attend.

Building Packages for OpenSolaris: Easier than Ever

June 2, 2009

In a previous entry I documented in detail how I contributed an open-source package (Ploticus) to OpenSolaris using SourceJuicer, starting with how to write a spec file and ending with the inclusion of the package in the contrib repository. In truth, at the time I published the information I had not actually taken the last step to promote the package from the pending repository to the contrib repository due to a Ploticus bug I discovered during testing. Ploticus ran okay, but it was not configured as I had wanted. It took me some time to create appropriate patch files, rebuild the package, re-test it, etc.
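For readers who have not seen one, the spec files SourceJuicer consumes follow a familiar RPM-style layout. The skeleton below is purely illustrative (the package name, version, license, and URL are placeholders, not the actual Ploticus spec):

```text
Name:         SFEploticus
Summary:      ploticus - a non-interactive data display engine
Version:      2.41
License:      GPLv2
Source:       http://example.org/ploticus/ploticus-src.tar.gz

%description
Produces plots and charts from data, from the command line
or via a library API.

%prep
%build
%install
%files
```

The patch files mentioned above are listed alongside Source and applied during %prep, which is why a configuration fix like mine means rebuilding and re-testing the whole package.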

In retrospect, I’m glad I was delayed because in the meantime OpenSolaris 2009.06 and SourceJuicer 1.2.0 were both released, which gave me a chance to see if any improvements had been made in the contribution process. I am happy to report that improvements were definitely made. Read on for details.

Most important, SourceJuicer documentation has been much improved. See, for example, How to Use OpenSolaris SourceJuicer for a good overview of the submission process. In addition, the short (9 min) video below, which walks through the mechanics of submitting files using SourceJuicer, is also an excellent resource:

SourceJuicer itself has also been improved significantly with this latest release. For example, it is now possible to delete a submitted file if it is no longer needed—I was able to use SourceJuicer 1.2.0 to remove an incorrect copyright file I had created when I first submitted Ploticus. While I appreciated that improvement, I found the following much more intriguing:

The screendump above shows the results of recent SourceJuicer builds, including Ploticus. I was happy to see Ploticus built successfully with the patches I had created on my first try. I was also curious about the implied promise of the new Install column. Since I next wanted to install and test this latest package on my 2009.06 system, I clicked on the Install link. And saw this:

Hey, cool. Firefox knows it should invoke the Package Manager to handle my request. How? With OpenSolaris 2009.06 we’ve enhanced the Package Manager to support a web installer mode and created a new mime type (application/…) to pass package installation requests from a web page to Package Manager. This works from any web browser so long as the web server is configured to handle .p5i files correctly. See John Rice’s blog entry on 2009.06 Package Manager enhancements for more details.

I clicked OK and then saw:

Package Manager promises to not only install the requested package, but to automatically add the required repository to my configuration as well. Surely it can’t be this simple. I clicked on Proceed:

Apparently, it can be that simple. 🙂

I’ve now tested my patched version of Ploticus on 2009.06 and requested that the package be promoted to contrib by sending a note to …. I’m hopeful Ploticus will soon be available to the entire OpenSolaris community.