Archive for March, 2006

Mac Book Pro, Part Deux

March 23, 2006

As I mentioned a few weeks ago, I had to return my brand-new Mac Book Pro for replacement due to a DOA superdrive. I had put a DVD into the drive only to find that the system did not see the drive and there was no way to remove the disc. I chose to have the unit replaced rather than repaired. This is a status update.

Apple had predicted I would receive the new unit on March 31st. In fact, I had it in hand this Tuesday, March 21st. I enjoyed using FedEx.com to watch it move from Shanghai, to Anchorage, to Indianapolis, and then to Massachusetts. The new unit is fine. The superdrive was the first thing I tried and I was blown away by the quality of the screen while watching the opening scenes of The Fifth Element.

The odd little plinking noise is still there in the background so I guess this is just a fact of life for Mac Book users. Or perhaps only users with the 7200 RPM drive option. As I reported earlier, the thinking is that the disk drive’s read head parking itself occasionally is causing this sound. I think it is slightly quieter on this unit than on the last one–and the plink on the last one was already pretty quiet.

No notification yet from Netflix that they’ve received the DVD that was stuck in the drive. When I returned the laptop to Apple, I also included the Netflix return mailer and a note asking the Apple engineers to drop the DVD in the mail after watching the movie. I’m still hopeful that someone will open the unit and find the mailer and the note, though I occasionally have visions of the last scene in Raiders of the Lost Ark…a large warehouse stacked to the ceiling with unopened FedEx boxes.


Dtrace, Part Deux

March 23, 2006

A few days ago, I took Dtrace for a test drive and reported initial results here. This entry finishes the story–or at least takes it as far as I’m willing to go for the sake of a slow-loading spreadsheet. In any case, the spreadsheet was never the issue: I’m using this to get a feel for Dtrace and its capabilities, and I have a few more features to illustrate.

As Dan Price pointed out in a Navel comment, the long times spent in lwp_park() are likely a symptom of the problem rather than the problem itself. A thread is likely being parked over and over again, waiting for something to complete. I took a quick look at the return codes from all invocations of lwp_park using this little Dtrace script:

syscall::lwp_park:return
/pid==6152/
{
    /* tally lwp_park returns by errno value */
    @foo[errno] = count();
}

And, sure enough, found that the vast majority of the calls to lwp_park were returning after timing out. It did seem the thread was waiting for some as yet unknown event to complete.
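
The timed-out case can also be singled out right in the predicate. Here’s a minimal sketch, assuming those returns show up as ETIME (which I believe is errno 62 on Solaris):

syscall::lwp_park:return
/pid==6152 && errno == 62/
{
    /* count only the lwp_park returns that timed out (ETIME) */
    @timedout["timed-out lwp_park returns"] = count();
}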

In addition to his observation, Dan also suggested using Dtrace’s profile provider to shed more light on what StarOffice is doing as it loads the problematic spreadsheet. Off to the Dtrace manual again. Nice: Dtrace provides the ability to periodically interrupt a running process and sample what it’s doing, a bit like what gprof offers.

The profiling command I used was essentially what Dan had suggested:

dtrace -n profile:::profile-997'/execname=="soffice.bin"/{@a[ustack()]=count();}' \
-n END'{trunc(@a,10); printa(@a);}'

Using execname was more convenient than specifying a PID, and I learned about the END probe from this example as well. The command periodically collects the state of the user stack, aggregates that information, and prints the ten most popular stack traces, which should give a statistical indication of where StarOffice is spending its time while loading the spreadsheet.

Rather than include the raw output here, I’ve looked at the stacks and see that most of the time is being spent in routines with names like XMLStylesContext, XMLNumFormatContext, NumberFormatter, etc. From this I deduce that StarOffice is for some reason taking a very long time to parse the XML representation of this particular spreadsheet. Somewhat anti-climactic, I guess, but it always was about the journey, not the destination.
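
If I were to push on this further, the pid provider could confirm the XML-parsing theory by counting entries into those routines directly. This is only a sketch, not something I ran for this post: the wildcard is a guess at how the C++ symbol names would match, and instrumenting that many functions can be expensive.

# dtrace -n 'pid6152::*XMLStylesContext*:entry { @calls[probefunc] = count(); }'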

Action At A Distance

March 21, 2006

Be forewarned that this is filed under Absurdities.

As I’ve verified through experiment, you can at least double the range of your car’s remote unlock feature by touching the dongle to your jaw as you operate it. The increased range has nothing to do with the increased height of the device–holding it over your head will not have the same effect. I’m told it relates more to using your head as an antenna.

This might be useful for finding your car in a crowded parking lot. Or impressing your geek friends. It probably won’t score any points with your spouse or date.

Around the Block with Dtrace

March 20, 2006

Late last week someone pointed out that one of our departmental spreadsheets took a very long time to load into StarOffice. Several minutes versus about 15 seconds for Excel.

I decided to use this as an opportunity to take Dtrace for a spin. What I show here is based on about 15 minutes of looking through the Dtrace documentation on docs.sun.com and a quick conversation with one of our performance guys here in Burlington. This is a work in progress–I don’t yet know why this particular spreadsheet is so slow. If further exploration yields additional interesting Dtrace examples, I’ll do another post.

So, here’s the setup. I have Solaris on my Tecra laptop with a StarOffice process up and running (pid 6152). To start, I wanted to see what system calls were being made while reading in the spreadsheet, so I issued the following command (cribbed from the Dtrace manual) as root:

# dtrace -n syscall:::entry'/pid==6152/ { @syscalls[probefunc] = count(); }'

This command traces the entry to all system calls and keeps track of the number of times each system call is invoked within process 6152. With Dtrace running, I loaded the problem spreadsheet into the StarOffice process and terminated the Dtrace run with CTRL/C once the spreadsheet was loaded. Here’s the tail of the output:

mmap                                                            107
close                                                           112
open                                                            118
gtime                                                           493
pollsys                                                         638
xstat                                                           662
lseek                                                           740
write                                                          1029
ioctl                                                          1516
read                                                           1931
lwp_park                                                       3369

I was interested in all those lwp_park calls. How much of the load time was actually spent in lwp_park? For this next Dtrace run, I put the Dtrace script in a .d file rather than on the command line for convenience. The syntax is dtrace -s scriptname. Here’s the script:

# cat park.d
syscall::lwp_park:entry
/pid==6152/
{
    /* note when this thread enters lwp_park */
    self->ts = timestamp;
}

syscall::lwp_park:return
/pid==6152/
{
    /* histogram the time spent parked, converted to milliseconds */
    @time["time in lwp_park"] = quantize((timestamp - self->ts)/1000000);
    self->ts = 0;
}
#
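
Running it is just the -s form mentioned above; the PID is hard-wired into the predicates, so no other arguments are needed:

# dtrace -s park.d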

And here’s the output, showing a histogram of the number of milliseconds spent in each invocation of lwp_park():

time in lwp_park
value  ------------- Distribution ------------- count
-1 |                                         0
0 |@                                        56
1 |                                         2
2 |                                         0
4 |                                         0
8 |                                         8
16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   3148
32 |@                                        74
64 |                                         8
128 |                                         11
256 |                                         0

I started adding up those times manually, which was dumb. I had Dtrace do it instead with this small addition to the script:

syscall::lwp_park:entry
/pid==6152/
{
    self->ts = timestamp;
}

syscall::lwp_park:return
/pid==6152/
{
    @time["time in lwp_park"] = quantize((timestamp - self->ts)/1000000);
    @total["total time in lwp_park"] = sum((timestamp - self->ts)/1000000);
    self->ts = 0;
}

And here’s the new output, with the total appended at the end:

time in lwp_park
value  ------------- Distribution ------------- count
-1 |                                         0
0 |@                                        64
1 |                                         0
2 |                                         0
4 |                                         0
8 |                                         9
16 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   3212
32 |@                                        63
64 |                                         5
128 |                                         6
256 |                                         0
total time in lwp_park                                  98073

Over 98 seconds spent in lwp_park during the load of this spreadsheet! Next, I wondered where the lwp_park calls were coming from. This version of the script looks at the stack traces for all lwp_park invocations and prints each of the unique stack traces with a count of how many times this stack trace occurred:

syscall::lwp_park:entry
/pid==6152/
{
    @stacks[ustack(20)] = count();
    self->ts = timestamp;
}

syscall::lwp_park:return
/pid==6152/
{
    @time["time in lwp_park"] = quantize((timestamp - self->ts)/1000000);
    @total["total time in lwp_park"] = sum((timestamp - self->ts)/1000000);
    self->ts = 0;
}

Using ustack() as the aggregation key groups the lwp_park calls by user stack trace, recording up to 20 frames in the case used here. The output contained several distinct stack traces, with the most common one (which occurred 3302 times out of 3369 invocations) shown below:

libc.so.1`__lwp_park+0x19
libc.so.1`cond_wait_queue+0x3e
libc.so.1`cond_wait_common+0x1e9
libc.so.1`_cond_timedwait+0x4a
libc.so.1`cond_timedwait+0x27
libc.so.1`pthread_cond_timedwait+0x21
libuno_sal.so.3`osl_waitCondition+0xa1
libfwk680si.so`0xc94f0395
libfwk680si.so`0xc9358d0a
libuno_sal.so.3`0xcf8d7f8c
libc.so.1`_thr_setup+0x51
libc.so.1`_lwp_start
3302

That’s as far as I’ve gone for now. Next I will look at the arguments to lwp_park to see if I find anything interesting. I will also look at what some of the libraries mentioned above are doing to get a better sense of what’s going on.
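
As a first cut at that next step, something along these lines should at least show whether the parked threads are passing a timeout. This is only a sketch; I’m assuming the first argument to lwp_park is the (possibly NULL) timeout pointer:

syscall::lwp_park:entry
/pid==6152/
{
    /* arg0 is assumed to be the timeout pointer; NULL would mean an indefinite park */
    @waits[arg0 == 0 ? "indefinite park" : "timed park"] = count();
}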

Shock of the Familiar

March 15, 2006

As I passed through the checkout line in our Burlington cafeteria on Monday, I turned and came face-to-face with Geoff, my mentor-by-example when I was a new Sun DE and my partner-in-crime on one notable internal Sun project. I always enjoy interacting with Geoff and my brain automatically started to do whatever brains do when confronted by a pleasant environmental stimulus. A smile, anticipation of an interesting, engaging encounter, etc. But the neurological equivalent of cold water splashed over me when I realized that Geoff wasn’t really HERE HERE. He was just “here”–to pick up his belongings after being RIFed. 😦

A RIF is a physical, orgchart-oriented exercise–it eliminates jobs, slots in the orgchart. If you are in the wrong slot at the wrong time, off you go. That’s probably okay–or at least it has the intended effect–when the physical and logical job descriptions align, as they usually do. But for many Distinguished Engineers (DEs) at Sun, their position in the corporate orgchart often does not convey the scope of their involvement or contributions.

In Geoff’s case, he was off working on what I thought were some pretty important issues related to Sun’s global engineering community. In the end, perhaps logic will prevail…

Sun Joins Open MPI!

March 8, 2006

Sun has decided to join forces with the Open MPI community by becoming a core, contributing community member. As such, we will contribute to the Open MPI code base and use it to deliver future supported versions of MPI for Solaris. This is big news!

Sun’s history with MPI really began in late 1996 when a group of us came to Sun from Thinking Machines where we’d been working on a multi-platform parallel programming toolkit called GlobalWorks. GlobalWorks was based in large part on technology we had developed for TMC’s Connection Machine massively parallel supercomputers through the 1980s into the mid-1990s.

Over the intervening years at Sun, we put a huge effort into our MPI library. Early on we abandoned our original MPICH base and did a clean-sheet design. Our implementation was one of the first to support the full MPI-2 standard, including a nicely designed, fine-grained approach to thread safety and a robust client-server capability. In addition, because our HPC efforts at the time were centered around clusters of very fat SMP nodes (up to 100 CPUs per node), we put significant effort into our shared-memory optimizations for fast data transfer and collectives.

By joining the Open MPI community we will bring our expertise, experience, and resources to that effort while also benefiting from the work of other members of the community. Los Alamos National Laboratory (LANL), Indiana University (IU), the University of Tennessee (UTK), the University of Houston, and the University of Stuttgart (HLRS) all bring excellent MPI experience and expertise to the community. In addition, with Cisco Systems also joining Open MPI, there will now be two vendors bringing an intense, product-oriented focus to the effort, which should be of great benefit to the community and its user base, including our customers.

This shift towards open source is directly aligned with our belief that because the creation of ultra-scale software tools and components for HPC is so expensive, such development should be done in the commons rather than in separate, duplicative efforts. It also aligns with our belief that innovation does happen elsewhere and that we should leverage it where that makes sense, as it does here.

Mac Book Pro, Back You Go–For Now

March 7, 2006

A few days ago I popped a DVD into my new Mac Book Pro to take that awesome 15″ display through its paces. Alas, it was not to be.

To make a not very long, but also not very interesting, story shorter: The system doesn’t seem to have any knowledge of the superdrive at all. DVD Player can’t see it. System Profiler says it isn’t there. The eject key doesn’t work. Rebooting with the trackpad button depressed doesn’t work. And so on. Which means no playing and no burning–and no extracting that Netflix disc either, since there is apparently no mechanical release mechanism for this drive. I’ll be enclosing my Netflix return envelope with the machine when I ship it back to the DOA engineering team and expect they’ll drop the disc in the mail for me.

The folks at Apple have been absolutely wonderful to deal with–high marks to each of them. They gave me the option of sending the laptop in for repair or getting a replacement. I opted for the replacement since the machine is so new, and because I haven’t actually transferred significant state from my iMac yet anyway. And also because there is a quiet, but annoying plinking sound coming from the area of the disk drive that I’m told by people on the Apple Forum is probably made by the drive parking its read head occasionally. It’s not a loud noise, but once you’ve noticed it, you keep noticing it. Perhaps the next unit won’t do this.

Of course, I’m still a huge fan of the Mac Book and can’t wait to get my replacement unit. Problems like this happen, as we know. It’s a minor inconvenience that, honestly, was made quite pleasant by the professional and helpful Apple employees I interacted with.

Kick-ass products. Kick-ass employees. Could be time to buy more AAPL…

Mac Book Pro Power Brick

March 2, 2006

I was asked about the size of the Mac Book Pro’s power brick. It measures a hefty 3″ X 4″ X 1″. Here’s a shot of it next to my 2nd gen iPod’s brickette.

Mac Book Pro Packaging

March 2, 2006

I’ve always admired Apple’s packaging aesthetic. It certainly helps to elevate their offerings from mere products to genuine objects of desire–an enviable skill.

Here are some shots I took of the 15″ Mac Book Pro packaging. It doesn’t rise to the level, say, of the origami-like cube used for the 2nd generation iPod (I must admit to having just recently thrown that packaging away after holding onto it for, what, 3-4 years?) but it’s still quite appealing.