Archive for July, 2007

There’s Something About Mary

July 31, 2007

I had one of those “There’s Something About Mary” moments at the coffee bar in Sun’s Menlo Park cafeteria this morning. I ordered a large chai, which was delivered with a wonderful head of foam on top. As I popped a top onto the cup, a dollop of foam went shooting straight up out of the drinking hole at high velocity. And never came down.

At least I couldn’t find it…

HazMat: A Driving Game for Adults

July 31, 2007

I remember my parents keeping us kids occupied on long vacation drives with car-related games. Our favorite was spotting license plates from as many different US states and Canadian provinces as we could find.

If you commute to work, why not pass the time and learn a little about the many chemicals that underpin our society by collecting hazardous materials codes from nearby trucks? As an added plus, you will join me in being horrified at how some automobile drivers seem totally oblivious to the ramifications of playing chicken with these big rigs.

To play HazMat, you’ll need to keep track of the codes you see on truck placards. The placards look like this:

[hazmat placard example]

In this example, the “3” represents the hazard class of the material being transported. There are nine such categories. They are:

  1. Explosives
  2. Gases – Compressed, Dissolved or Refrigerated
  3. Flammable Liquid
  4. Flammable Solids – Combustible, Water Reactive
  5. Oxidizing Substances – Organic Peroxides
  6. Poisonous (Toxic) and Infectious Substances
  7. Radioactive Material
  8. Corrosives
  9. Miscellaneous Dangerous Goods

You will also need a copy of the US government’s hazardous materials table to decipher the four-digit codes. For example, the “1203” on the sample placard means gasoline or gasohol. I found a PDF copy of the table here.

Here are a few of the more unusual placard numbers I’ve seen commuting on Rt 128 in the Boston area.

  • 1824 — sodium hydroxide solution
  • 1966 — refrigerated liquid hydrogen
  • 3264 — acidic, inorganic, corrosive liquid
  • 3257 — elevated temperature liquid at or above 100 degrees C and below its flash point (including molten metals, molten salts, etc.)
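For anyone who would rather keep score in software than on a napkin, here is a minimal Python sketch of a HazMat scorecard. It only knows the nine hazard classes and the handful of four-digit codes mentioned in this post; the names `describe_placard` and `HazMatGame` are my own inventions for illustration, and a real version would load the full government table rather than a hand-typed dictionary.

```python
# Hazard classes from the list above (class digit -> short name).
HAZARD_CLASSES = {
    1: "Explosives",
    2: "Gases",
    3: "Flammable Liquid",
    4: "Flammable Solids",
    5: "Oxidizing Substances",
    6: "Poisonous (Toxic) and Infectious Substances",
    7: "Radioactive Material",
    8: "Corrosives",
    9: "Miscellaneous Dangerous Goods",
}

# Only the four-digit codes mentioned in this post. This is a stand-in
# for the full hazardous materials table, not a replacement for it.
KNOWN_CODES = {
    "1203": "gasoline or gasohol",
    "1824": "sodium hydroxide solution",
    "1966": "refrigerated liquid hydrogen",
    "3264": "acidic, inorganic, corrosive liquid",
    "3257": "elevated temperature, at or above 100 degrees C",
}


def describe_placard(hazard_class, code):
    """Turn the two numbers read off a placard into a readable description."""
    class_name = HAZARD_CLASSES.get(hazard_class, "unknown class")
    material = KNOWN_CODES.get(code, "not in my table yet")
    return f"{code}: {material} (class {hazard_class}, {class_name})"


class HazMatGame:
    """Keeps track of which four-digit codes have been spotted so far."""

    def __init__(self):
        self.seen = set()

    def spot(self, code):
        """Record a sighting; return True only the first time a code is seen."""
        is_new = code not in self.seen
        self.seen.add(code)
        return is_new

    def score(self):
        """One point per distinct code collected."""
        return len(self.seen)


game = HazMatGame()
game.spot("1203")
game.spot("1824")
game.spot("1203")  # a repeat sighting does not score again
print(describe_placard(3, "1203"))
print("codes collected:", game.score())
```

Scoring one point per distinct code is just my guess at a sensible rule; weight the rarer classes more heavily if your commute is heavy on gasoline tankers. And have a passenger do the typing.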

So, give HazMat a try. Have fun, collect them all…and be careful out there.

Have a Compiler, Tool, or HPC question? Operators are standing by!

July 24, 2007

Actually, not operators. Even better, members of the Sun Studio engineering team are standing by to answer your questions on a live chat session every day this week from 9am-11am Pacific time, starting today.

Questions about our C, Fortran, or C++ compilers? Questions about performance analysis? High Performance Computing? Solaris Express Developer Edition? Whatever. Everything is fair game so have at it and give them your toughest questions.

Go directly to the live chat session here. Read Roman’s blog intro here.

Diamond Hunting in New York

July 19, 2007

I went hunting for Herkimer Diamonds this weekend on my way to and from visiting a friend in western New York. Herkimer Diamonds aren’t real diamonds; they are doubly-terminated quartz crystals sought after by mineral collectors.

There is something very elemental and fun about sitting in a rock pit smashing open rocks with a hammer to reveal well-formed, free-floating crystals nestled in little vugs within the rock matrix. Here are two photos showing crystals I found in this manner.

[herkimer diamond #1]

[herkimer diamond #2]

I visited both the Herkimer Diamond Mines and the Ace of Diamonds Mine, which are located next door to each other on Rt 28 in Herkimer, NY. Herkimer Diamond Mines is the more upscale of the two businesses, featuring a nice mineral museum and an extensive mineral shop. Ace of Diamonds was more rustic and I think caters to a more serious crowd of “miners”. It has a basic general store and offers a wider selection of tools for rent.

Regardless of which establishment you choose, you will eventually find yourself either pounding on a pile of rocks or, if you have more time and are more serious, trying to hammer, chisel and pry your way to discovering a mineral pocket in the ledge visible at both sites.

Bring your own tools if you like, and definitely bring safety glasses.

Linux on SPARC: Dave Miller Dissects Logical Domains (LDOMS)

July 16, 2007

Dave Miller (yes, that Dave Miller) is working to bring up Linux as a guest operating system on Sun’s new SPARC virtualization technology, Logical Domains (LDOMS). He has posted a short technical walkthrough of LDOMS on his blog.

Brian Eno’s 77 Million Paintings

July 2, 2007

I attended the North American premiere of Brian Eno’s 77 Million Paintings last night. This was the third night of the premiere, which was held as a private event for Long Now members. It was held simultaneously in San Francisco and Second Life. I attended the SL event, my first virtual party.

Entrance to the venue.

Eno dropped in for a short while to say hello.

And strolled in to visit the installation itself in the next room. did a fantastic job creating the SL venue. The installation was wonderfully done with subdued lighting, rich wooden flooring, and an excellent Eno mix playing in the background. Bravo!

HPC Consortium: Summary Blog Entry with Pointers

July 2, 2007

I’ve completed my series of blog entries about Sun’s HPC Consortium meeting in Dresden last week. All customer talks and a selection of Sun talks were included. I wasn’t able to cover all of the Sun talks or any of the partner talks at the event due to time constraints. You might be amazed at how long it took to create the entries referenced below.

Here are pointers to the blog entries about customer talks:

And here are pointers to blog entries covering a selection of the Sun talks:

And here are a few entries with details of last week’s announcement of the Sun Constellation System:

The next HPC Consortium meeting will be held in Reno in November just prior to SC07.

HPC Consortium: Big Science Means Big Compute and Big Data at CERN

July 2, 2007

Our final two customer talks at the Sun HPC Consortium meeting in Dresden last week both focused on aspects of CERN’s Large Hadron Collider (LHC) Project. LHC is Big Science answering Big Questions.

Helge Meinhard of CERN IT spoke first, giving an overview of CERN and LHC followed by a discussion of the IT infrastructure and requirements underlying the science of LHC. Martin Gasthuber from DESY then spoke further about storage and compute-related issues for LHC.

[cern accelerator]
The CERN accelerator ring is 27km in circumference and at a depth of 50-150 meters

CERN was founded in 1954 as the European laboratory for particle physics. CERN has 20 participating member countries, 3000 staff members, and about 6500 visiting scientists (from 500 institutions and 80 countries). The visiting scientists constitute the user base at CERN.

The Large Hadron Collider is a proton-against-proton accelerator capable of 14 TeV collision energies. This is by far the world’s most powerful accelerator, with 2nd place held by the 2 TeV accelerator at Fermilab. The LHC tunnel has four experiments positioned around its circumference, each represented by a mass of human-dwarfing gear positioned in the accelerator’s beam. The accelerator can fire 300 bunches of 100 billion protons each, with the same number fired in the opposite direction, which will cause up to 40 million collisions per second at each of the four interaction sites. The entire accelerator is lined with superconducting magnets at two degrees Kelvin to keep the heavy protons moving in the correct track. The four experiments are called ATLAS, CMS, ALICE, and LHCb. The first beams are expected in 2008.

The computational requirements at CERN, while large, are also embarrassingly parallel (or pleasingly parallel–no need to be embarrassed), meaning the data can be processed independently and in parallel, obviating the need for complex problem decompositions or for high-speed, low-latency interconnects. In addition, the problems are very integer intensive with little or no floating point requirements. As compute and storage requirements continue to grow, power and cooling have become huge issues as well: CERN now predicts its 2.5 MW datacenter will run out of power in 2009-2010. [I suggested to Helge at lunch before his talk that Sun’s Niagara (N1, N2) processor may well be ideally suited for this massive throughput problem, a thought that was echoed by a customer during Helge’s presentation to the Consortium.]

The 40 million collisions per second at each of the four experiment sites will generate a huge data volume. This will be filtered and reduced within the collector itself down to a few hundred megabytes per second or about 15 Petabytes per year for four experiments. Each event corresponds to a few megabytes of filtered data and it is these events that can be processed in parallel at CERN and its partner sites.
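As a quick sanity check on those numbers, here is a back-of-the-envelope Python sketch that converts 15 Petabytes per year into an average rate in megabytes per second. The assumptions are mine and not CERN's: decimal units (1 Petabyte = 10^15 bytes), data taking spread evenly over a full calendar year, and a placeholder event size of 2 megabytes standing in for "a few megabytes".

```python
# Back-of-the-envelope check of the quoted data volume.
# Assumptions (mine, not from the talk): decimal units and data taking
# spread evenly across an entire calendar year.

SECONDS_PER_YEAR = 365 * 24 * 3600
BYTES_PER_PETABYTE = 10**15
BYTES_PER_MEGABYTE = 10**6


def average_rate_mb_per_s(petabytes_per_year):
    """Average rate in MB/s needed to accumulate the given volume in one year."""
    bytes_per_second = petabytes_per_year * BYTES_PER_PETABYTE / SECONDS_PER_YEAR
    return bytes_per_second / BYTES_PER_MEGABYTE


def events_per_second(rate_mb_per_s, event_size_mb):
    """Number of filtered events per second at a given event size."""
    return rate_mb_per_s / event_size_mb


total_rate = average_rate_mb_per_s(15)  # all four experiments combined
print(f"all four experiments: {total_rate:.0f} MB/s")
print(f"per experiment:       {total_rate / 4:.0f} MB/s")

# 2 MB per event is a placeholder for "a few megabytes".
print(f"events per second at 2 MB each: {events_per_second(total_rate, 2):.0f}")
```

Under these assumptions the combined average works out to roughly 475 MB/s, which sits comfortably within "a few hundred megabytes per second". The accelerator will not run every second of the year, so the rate during actual data taking would be higher than this average.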

CERN has calculated it will need about 142 Mega SPECint2000’s worth of processing, 57 Petabytes of disk storage, and 43 Petabytes of tape to process and store the data from these experiments. They equate this to on the order of 30,000 CPUs and 100,000 disks. Because this is too much for CERN to handle alone, a multi-tiered consortium of organizations has been established to distribute the processing and analysis of LHC data around the world.
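To see what those totals imply per component, here is a small Python sketch that simply divides the quoted totals by the quoted unit counts. It assumes decimal units (1 Petabyte = 10^15 bytes) and treats the "on the order of" counts as exact, so the results are only rough indications.

```python
# Per-unit sizing implied by the totals quoted above.
# Assumptions (mine): decimal units, and the round unit counts taken as exact.

TOTAL_SPECINT2000 = 142e6        # 142 Mega SPECint2000
TOTAL_DISK_BYTES = 57 * 10**15   # 57 Petabytes of disk
CPU_COUNT = 30_000
DISK_COUNT = 100_000
BYTES_PER_GIGABYTE = 10**9


def per_unit(total, count):
    """Evenly divide a total requirement across a number of units."""
    return total / count


specint_per_cpu = per_unit(TOTAL_SPECINT2000, CPU_COUNT)
gb_per_disk = per_unit(TOTAL_DISK_BYTES, DISK_COUNT) / BYTES_PER_GIGABYTE

print(f"SPECint2000 per CPU: {specint_per_cpu:.0f}")
print(f"capacity per disk:   {gb_per_disk:.0f} GB")
```

This gives a little over 4,700 SPECint2000 per CPU and 570 GB per disk. Whether "CPU" here means a core or a multi-core socket is not something the talk spelled out, so read the first figure with that in mind.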

Data will flow from CERN to a set of Tier1 sites that will perform initial processing and long-term data curation while also distributing the data to a large set of Tier2 sites for final processing and analysis. The intent is essentially to make this entire distributed infrastructure look like one huge compute facility in spite of the fact that these Tier1 and Tier2 centers are autonomous, cooperating organizations rather than parts of a single, large entity. Thus, while there are many commonalities across the sites, there is no mandated standard hardware or software. It is true, however, that they have settled on commodity processors and all use Scientific Linux, a lightly-modified and recompiled version of Red Hat Linux.

With respect to particular requirements in the infrastructure, Ethernet is adequate for the workload. Servers are stripped-down HPC boxes with commodity processors and 2 Gbytes of memory per core. Processors are chosen for their ability to score well on SPECint2000.

[DESY logo]

Martin Gasthuber from DESY then spoke about computing and storage at the LHC Tier1 and Tier2 sites. He also gave some brief background on DESY, which is the largest national HEP (high energy physics) lab in Europe. Its accelerator, HERA, is due to be replaced by two new ones which will be used to concentrate on proton physics. HERA was scheduled to be turned off last week in preparation for construction on the new units. DESY is a Tier2 site for two LHC experiments: ATLAS and CMS.

Martin explained that Tier1 sites concentrate on reconstruction of data sent back from Tier2 sites, which is a CPU-bound activity with sequential data access patterns, while Tier2 sites perform simulation work, which requires CPU cycles and sequential writes with generally low bandwidth needs. User analysis is also performed at Tier2 sites, which involves chaotic data access and IO-bound jobs.

There are large differences in capabilities between large and small sites in the LHC processing hierarchy. Some have much experience running experiments on-site and with the technology generally and others do not. In addition, the number of people available per site can vary widely.

As these experiments will run for several years, sites will be upgrading their computational infrastructure over time. They will not, however, upgrade their entire site with each procurement. Instead, sites will continuously grow their resources to meet the demands of experiments and will recycle the oldest components of their infrastructure. Thus, it is expected that a site may flip 10-30% of its compute resources at a time rather than upgrade all resources simultaneously.

For the LHC experiments, there has been some degree of basic standardization of the compute infrastructure, as indicated by Helge Meinhard in his talk. In particular, x86, Gigabit Ethernet, TCP/IP, and Linux have all become part of the standard approach to processing LHC data.

Currently, most computing is done with 1U dual-core systems, though quad-core systems are starting to move in. Blades are still very much in the minority according to Martin and he isn’t sure why.

While the compute infrastructure has settled into what is viewed as a reasonably optimal place, it is harder to do the same for disk systems. Disk storage is more complex, more prone to surprises. And there are more consequences of disk failures. Commodity components are important, but operational costs become more important over time.

ZFS and Solaris 10 have been looked at as a way of providing a stable lower level of storage infrastructure on which higher-level storage objects are layered (e.g., dCache). Simultaneously, more LHC sites are beginning to use Sun’s Thumper (Sun Fire X4500) ultra-dense disk storage systems.

With respect to file systems, a few centers use GPFS, but it is rare. To date, Lustre is not deployed by a HEP site, though there is interest at some Tier2 sites to support user data analysis.

Testing and analysis of ZFS have led to the view that the combination of ZFS and Solaris solves the critical data integrity issues that have been seen with other approaches. The feeling is that the problem has been solved completely with the use of this technology. There is currently about one Petabyte of Thumper storage deployed across Tier1 and Tier2 sites. That number is expected to rise to approximately four Petabytes by the end of this summer.

HPC Consortium: Sun HPC at Clemson University (Why Big SMPs Matter)

July 2, 2007

[dr. james leylek, clemson]

James Leylek, Director of the Computational Center for Mobility Systems at Clemson University (CU-CCMS), spoke at Sun’s HPC Consortium meeting in Dresden this past week. He presented a brief overview of the Center and its mission and gave a status update on the Center’s computational infrastructure, including an explanation of why CU-CCMS believes strongly in both large SMPs and small-node clusters for HPC.

Since his last update in November, the Center’s computational infrastructure has now been put in place. It includes a Sun Fire E25K with 72 UltraSPARC IV+ processors and 680 Gbytes of memory; two Sun Fire E6900 systems, each with 24 UltraSPARC IV+ processors and 384 Gbytes of memory; 1600 cores worth of Sun Fire V20z systems connected with Voltaire InfiniBand; and a variety of workstations. All of the Big Iron is running Solaris 10, while the V20z cluster runs SUSE Linux. The infrastructure has a peak performance rating of about 11 TFLOPs.

As Dr. Leylek explained, the mission of the Center is to provide a balanced computational approach to satisfy a diverse set of requirements from the ten major technical groups (e.g., fluid dynamics, acoustics, mechanics, vehicle design, human modelling, etc.) served by the Center. Also, because CU-CCMS is not a research organization and must deliver results on time and within budget, they have a focus on supplying stable, reliable infrastructure for their customers.

The wide range of systems at CU-CCMS reflects an understanding that one size does not fit all for HPC applications: not everything parallelizes onto clusters. As an example, adaptive multi-grid computations are considered to be memory monsters that benefit from the immense capabilities found within the single Solaris image of an E25K or E6900. At the Center, they view billion-element finite element simulations as a starting point for full vehicle simulations. They are dealing with big problems.

As the Center prepared to bring its computational capabilities on line, what scared the CU-CCMS staff the most was the actual act of setting up and deploying this infrastructure. With Clemson, Sun, Cisco, and Voltaire all responsible for key aspects of the infrastructure, they were worried that coordinating all of these efforts successfully was going to be an absolute nightmare.

In response to this, Sun assigned a program manager to run the entire integration process. As Dr. Leylek said, the Sun program manager put together the most detailed integration plan he had ever seen in his life. In addition, as work progressed, all status was reported on a site accessible to all participating parties, which aided in maintaining coordination and promoting problem solving throughout the process.

In the end, Dr. Leylek said that what they had feared would be a nightmare turned out to be as seamless and painlessly smooth as it could have been.

Totally flipped out…

July 2, 2007

.ǝɔɐ1d ɹǝ11np ɐ ǝq p1noʍ p1ɹoʍ ǝɥʇ uǝɥʇ ‘sıɥʇ ǝʞı1 sbuıɥʇ op ʇ,upıp ʎǝɥʇ ɟı ‘ǝsɹnoɔ ɟo .sʎɐp ǝsǝɥʇ uo ǝɯıʇ ɹıǝɥʇ puǝds ǝ1doǝd ʇɐɥʍ ǝɯ oʇ buızɐɯɐ s,ʇı