HORK Enterprises
3221 Quick Road
Holly, MI 48442, USA
 +1 (248) 328-0231
 consult@hork.com

High Performance Computing

[ Cray YMP ] For several decades, high performance computing meant spending millions on specialized hardware. Not to mention taking out a wall of the building to get the hardware in, and rewiring the electrical systems to provide the power and cooling it needed. So, who could have predicted that the benign PC of the '80s would one day become the centerpiece of some of the world's most powerful computers? Well, if you were knowledgeable about disruptive technologies, you might have...
Linux PCs
[ Linux Workstation ] When the Linux Operating System was mated with the ever more powerful Intel-based processors and nVidia graphics, a serious engineering workstation was born. With prices far below those of the specialized UNIX™ workstations, the writing was on the wall.
In high performance computing, hardware typically has a life span of only three years; after that, performance demands are such that it must be replaced with something more powerful. The specialized UNIX™ engineering workstations never found a use outside of engineering, so they had to be written off. Linux workstations, however, can live on for a few more years after their high performance computing duties are over, in less demanding positions elsewhere in the company, where they may even run Microsoft Windows™. This allows CAE engineers and accountants to become friends 😉.
Cluster Computing
Common, off-the-shelf PC workstations can be combined into a high performance analysis cluster. Each node may have multiple processors and each processor may have multiple cores. [ Computer Cluster ] The nodes are interconnected with a high speed interface, such as the Infiniband™ interconnect. CPU speed, memory size, and bandwidth must be carefully balanced to get the maximum performance. Analysis software, modified to make use of Distributed Memory Parallel (DMP) processing, is used to let each node take on part of the problem. The advantage of DMP processing over a single node with as many cores is that the number of memory buses also multiplies. That means the CPUs can actually be kept busy rather than waiting for data.
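The DMP idea described above can be sketched in a few lines: decompose the domain, let each worker handle only its own partition, then gather and combine the partial results. Real clusters do this with MPI across nodes; this stand-in uses threads on one machine purely for illustration, and all names are hypothetical.

```python
# Minimal sketch of Distributed Memory Parallel (DMP) style decomposition.
# Threads stand in for cluster nodes; a real solver would use MPI.
from concurrent.futures import ThreadPoolExecutor

def partial_sum_of_squares(chunk):
    # Each "node" only ever touches its own partition of the data.
    return sum(x * x for x in chunk)

def dmp_sum_of_squares(data, n_nodes=4):
    # Domain decomposition: one contiguous chunk per node.
    size = (len(data) + n_nodes - 1) // n_nodes
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = pool.map(partial_sum_of_squares, chunks)
    # Gather step: combine the partial results into the final answer.
    return sum(partials)

print(dmp_sum_of_squares(list(range(1000))))  # same answer as a serial sum
```

The pattern (decompose, compute locally, gather) is the same whether the workers are threads, processes, or Infiniband-connected nodes; only the communication cost changes.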
I built my first cluster with 4 dual 2.2 GHz Xeon HP systems and a gigabit switch. The second one comprised 8 IBM systems with dual 2.4 GHz AMD Opteron processors and an Infiniband™ switch. The latest one has 16 nodes and they all have two multi-core CPUs.
Job scheduling applications allow the 8-node cluster to be subdivided into jobs of 4, 8, 16, 32, or even 64 CPU cores. This gives the flexibility to either push a single big job through as fast as possible or run multiple jobs in parallel. Efficiency diminishes as more nodes are added, so subdividing the cluster may be the most effective way to run multiple jobs in a shared-resource environment. The performance of the fastest compute clusters is measured with a benchmark and posted on the Top500 web site for bragging rights.
CPU Configuration*) | Speed-up Factor
1 x 1 x 1 =  1     |  1.0
1 x 2 x 1 =  2     |  1.5
1 x 2 x 2 =  4     |  2.5
4 x 1 x 1 =  4     |  3.7
4 x 2 x 1 =  8     |  6.0
8 x 2 x 1 = 16     |  9.0
8 x 2 x 2 = 32     | 12.2
*) 4 x 2 x 1 denotes 4 nodes, with 2 CPUs each, with 1 core each.
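The diminishing returns become explicit when the measured speed-up is divided by the total core count, giving parallel efficiency. A small sketch, using the configurations and factors from the table above:

```python
# Parallel efficiency = measured speed-up / total core count.
# Configurations (nodes x CPUs x cores) and speed-up factors from the table.
configs = {
    (1, 1, 1): 1.0,
    (1, 2, 1): 1.5,
    (1, 2, 2): 2.5,
    (4, 1, 1): 3.7,
    (4, 2, 1): 6.0,
    (8, 2, 1): 9.0,
    (8, 2, 2): 12.2,
}

def efficiency(config, speedup):
    nodes, cpus, cores = config
    # Ideal speed-up equals the total number of cores.
    return speedup / (nodes * cpus * cores)

for cfg, s in sorted(configs.items()):
    print(cfg, f"efficiency {efficiency(cfg, s):.0%}")
```

The largest configuration (8 x 2 x 2 = 32 cores) runs at roughly 38% efficiency, which is why two 16-core jobs may finish sooner in total than two 32-core jobs run back to back.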
An analysis cluster is typically kept in an air-conditioned server room with other dedicated computers, such as the application, web, and database servers, to dissipate the heat generated. The system administrator can access the computers there or, with the right equipment, from halfway across the world. My setup, with a networked KVM switch, is such that an analysis cluster in Australia can be administered from Michigan and monitored through a web browser. The status web page gives a quick overview of the systems, with colors indicating the status of the servers:
  - Black for systems that are off line;
  - Blue for systems that are available but idle;
  - Green for systems that are in use but have reserve capacity (typically running at about 50%);
  - Orange for systems that are running at full capacity, using all CPUs and cores;
  - Red for systems that are overloaded which, if it becomes structural, indicates a capacity problem.
One click on the monitor icon gives me detailed information about the system.
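The color scheme described above amounts to a simple mapping from a server's state to a color. A hypothetical sketch follows; the threshold values are illustrative assumptions, not the actual monitoring code:

```python
# Hypothetical sketch of the status-page color logic described above.
# Threshold values are illustrative assumptions only.
def status_color(online, utilization):
    """Map a server's state to a status color. Utilization is a fraction
    of total core capacity; values above 1.0 mean more runnable work
    than cores, i.e. an overloaded system."""
    if not online:
        return "black"   # off line
    if utilization == 0.0:
        return "blue"    # available but idle
    if utilization < 0.9:
        return "green"   # in use, with reserve capacity
    if utilization <= 1.0:
        return "orange"  # running at full capacity
    return "red"         # overloaded; structural red flags a capacity problem

print(status_color(True, 0.5))  # a node at ~50% load shows "green"
```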
[node #1] Cluster/DMP server | [node #5] Cluster/DMP server | [web srv] Web server
2x Xeon E5 2.6 GHz           | 2x Xeon E5 3.5 GHz           | 2x Opteron 3.2 GHz
[node #2] Cluster/DMP server | [node #6] Cluster/DMP server | [dbs srv] Database server
2x Xeon E5 2.6 GHz           | 2x Xeon E5 3.5 GHz           | 2x Opteron 3.2 GHz
[node #3] Cluster/DMP server | [node #7] Cluster/DMP server | [app srv] Application server
2x Xeon E5 2.6 GHz           | 2x Xeon E5 3.5 GHz           | 2x Opteron 2.8 GHz
[node #4] Cluster/DMP server | [node #8] Cluster/DMP server | [tps srv] Backup server
2x Xeon E5 2.6 GHz           | 2x Xeon E5 3.5 GHz           | 2x Opteron 2.6 GHz
Likewise, the user workstations can be anywhere within the local area network or, like mine, in the wide area network. Their performance is less critical, although good graphics performance helps. My systems have quad-, hexa-, or octa-core Intel or AMD CPUs running at clock speeds from 2.6 GHz to 3.5 GHz, and all sport nVIDIA graphics.

Just because it seems that every workstation these days is based on the x86_64 architecture and the Linux operating system doesn't mean that I haven't enjoyed the previous generation of UNIX™ based computer systems with RISC* architecture CPUs, or the exotics from Cray, Alliant, Gould, and others before that. I did. From TNO in Delft, the Netherlands, we connected to the Cray Y-MP of the SARA center of the University of Amsterdam. One late night in 1990 I was connected from my home, through my Acorn R260 and a 1200 baud modem, to TNO, and via the "Internet" to the Y-MP. I typed "who" to see who else was working on the computer that night. There was no-one. I had the Cray all to myself... That made my neck hairs stand up! 😎
*) RISC stands for Reduced Instruction Set Computer

Throughout the '90s I loved the Silicon Graphics workstations with their MIPS CPUs and performance graphics. I even had some myself.

[ Purple Beauty ]

Silicon Graphics Indigo R4000/Elan
Having enjoyed Silicon Graphics computers professionally since the first Iris 4D20 in 1989, it was only a matter of time before one would occupy space in my own office. The supply and demand lines finally crossed in January 1997. My primary use of the Indigo was software development. However, it was also used for Internet access and DNS. This "Purple Beauty" looked out into the world through the desirable 20” Sony Trinitron, making the most of the ELAN graphics with 24 bitplanes and a hardware Z-buffer. It ran Irix 6.2 and boasted 64 MB RAM and 5 GB disc space.
Silicon Graphics Indy R5000
[ Half Blue ] And then there were two... (11/98). The Indy provided faster processing than the Indigo, with its MIPS R5000 processor running at 180 MHz. The primary use of the Indy was engineering analysis, where the extra processing power was most useful. It had the same colorful 24-bit desktop, displayed on a 20" Sony Trinitron. It also ran Irix 6.2 on 64 MB of RAM and had a 16 GB hard disk. It hosted a 3.5" diskette drive and a DAT drive. It transparently shared all its resources with its older sister.

[ Dell PC ]

Dell Dimension XPS T500
The Accountant wanted to run "Peachtree" and the Program Manager needed access to industry standard presentation and program management software for compatibility with the rest of the business world. So on the brink of the new millennium an Intel/Microsoft Windows 98™ box was acquired.
For several years it ran the first-line office duties. It was even upgraded to Windows XP™. Then it served as my test bed for various flavors of Linux and for internet access. With its 500 MHz Intel Pentium III processor it was just about powerful enough to do that. It had 640 MB of RAM, a 13.5 GB hard disk, a DVD drive, a CD-RW drive, a camera, and a microphone. It has long since been replaced by a sleek little laptop that can do all that so much better, and stream live TV on top of that.
Acorn Archimedes R260
[ Acorn R260 ] Up to the arrival of the Indigo (1/97), the Acorn R260 ran RISCiX, a (quite capable) BSD UNIX. As such it was a very pleasant platform for software engineering, though lack of development eventually made it obsolete. Subsequently it ran its native RISC OS (3.10) operating system. It ran well-designed applications for all the usual office duties until 1999. It served a few more years beating me at chess, until friends and family made me turn it off, because “nobody needs that many computers”.
The R260 faced the world through a 15” Sony Trinitron monitor. It had 8 MB RAM and 600 MB disc space. Ten years is a long life span for a computer, which shows how far ahead the system was at its release. Today the ARM architecture that was the heart of the Acorn R260 lives on (for example in the "Snapdragon") in many PDAs, cell phones, and calculators.

[ BBC model B ]

Acorn BBC model B
Ah, the venerable BBC-B computer. The one that taught me the love for computers, software, and everything associated with it. With analog-to-digital converters, serial, parallel, and 8-bit I/O ports it had all the interfacing capabilities for exciting hardware projects. With its 8-bit 2 MHz MOS 6502 processor, linear memory, assembler, BASIC, Pascal, Forth, BCPL, sound and graphics, it had everything you needed to understand and enjoy computing and learn how to program. It came with good games too. The most famous was Elite!, but there were many others that allowed stress relief during software bug hunts. The original system came with just 32 kB RAM. I had dual 320 kB floppy drives and (eventually) a 10 MB hard drive. It came with an expansion bus that allowed a co-processor (with its own memory) to be attached to the base unit. These co-processors (I had the Intel 80186 as well as the NS 32016) were much more powerful and extended the life of the BBC-B all the way into the nineties. There are still enthusiasts out there who keep them alive after some forty years. Rightfully so! I pity the kid who has to get the love of computing from a modern-day PC or tablet. (Fortunately, there is the Raspberry Pi.)
Casio FX-700P
The Casio FX-700P was my first computer, as it was programmable in BASIC. I have had it since 1983. It once boasted a sixth-order (plus square root) polynomial RMS approximation program that could calculate a NACA wing profile.
I remember it fondly for three reasons. [ Casio FX-700P ]
  1. It was still holding together despite numerous dings and despite losing all original screws.
  2. I have never found a worthy successor.
  3. It had the sympathetic message "READY P0" written on the display.
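The approximation program itself is long gone, but the underlying idea can be sketched: a least-squares (RMS) fit of a NACA thickness profile with a sixth-order polynomial plus a square-root term. Everything below is a hypothetical reconstruction using only the Python standard library; the NACA 0012 profile and all function names are my assumptions, not the original Casio code.

```python
# Hypothetical reconstruction: least-squares fit of a NACA 0012 thickness
# profile with basis sqrt(x), 1, x, ..., x^6 (sixth order + square root).
import math

def naca0012_thickness(x, t=0.12):
    # Classic NACA 4-digit thickness distribution, chordwise x in [0, 1].
    return 5 * t * (0.2969 * math.sqrt(x) - 0.1260 * x - 0.3516 * x**2
                    + 0.2843 * x**3 - 0.1015 * x**4)

def fit(xs, ys):
    # Basis functions: sqrt(x), then 1, x, ..., x^6.
    basis = [math.sqrt] + [(lambda k: lambda x: x**k)(k) for k in range(7)]
    n = len(basis)
    A = [[b(x) for b in basis] for x in xs]
    # Normal equations: (A^T A) c = A^T y.
    ata = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    aty = [sum(row[i] * y for row, y in zip(A, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(ata[r][i]))
        ata[i], ata[p] = ata[p], ata[i]
        aty[i], aty[p] = aty[p], aty[i]
        for r in range(i + 1, n):
            f = ata[r][i] / ata[i][i]
            for c in range(i, n):
                ata[r][c] -= f * ata[i][c]
            aty[r] -= f * aty[i]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        coeffs[i] = (aty[i] - sum(ata[i][j] * coeffs[j]
                                  for j in range(i + 1, n))) / ata[i][i]
    return coeffs

def evaluate(coeffs, x):
    # First coefficient belongs to sqrt(x); the rest to 1, x, ..., x^6.
    return coeffs[0] * math.sqrt(x) + sum(c * x**k
                                          for k, c in enumerate(coeffs[1:]))
```

Since the NACA formula itself lies in the span of this basis, the fit reproduces the profile almost exactly; on a pocket calculator the same normal-equations approach works, just much more slowly.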

If all else fails: The Slide Rule!
[ slide rule ]
No computer will ever enslave me!
[ Roark's ]

The table below lists some other computers I’ve met or owned and their relative CPU performance. They are all indexed against the DEC MicroVAX II using Digital Research Labs' benchmarking routines. For multi-processor systems the single CPU performance is listed, unless otherwise noted.
System Architecture and CPU | Operating System | MVUP¹ | Year²
DEC MicroVAX II with KA630-AA (78032/78132) @ 5 MHz | VMS | 1.0 | 1986
Acorn R260 ARM3/FPA10 @ 26 MHz | RISCiX 1.21c | 4.6 | 1989
SGI 4D20 Mips R2000A/R2010A @ 12 MHz (IP6) | IRIX 4.0.5 | 8.9 | 1989
SGI 4D25 Mips R2000A/R2010A @ 20 MHz (IP6) | IRIX 4.0.5 | 15.8 | 1989
Cray Y-MP | Unicos 7.0.5 | 194.1 | 1989
SGI 4D440 Mips R3000/R3010 @ 40 MHz (4 CPUs) (IP7) | IRIX 4.0.5 | 37.0 | 1991
SGI 4D35 Mips R3000/R3010 @ 36 MHz (IP12) | IRIX 4.0.5 | 31.7 | 1991
SGI Indigo Mips R3000/R3010 @ 33 MHz (IP12) | IRIX 4.0.5 | 28.2 | 1992
IBM RS6000 / 34H POWER Arch. @ 42 MHz | AIX 3.2.5 | 65.1 | 1993
Compaq / Intel 80386 DX 33 MHz, IIT 80C387 | Linux 0.99.11 | 2.6 | 1993
Cray C98/4256 | Unicos 7.C.3 | 243.4 | 1993
Sun 4c SPARC CPU + TI FPU @ 40 MHz | SunOS 4.1.1 | 23.0 | 1993
SGI Indigo XZ Mips R4000/R4010 @ 100 MHz (IP20) | IRIX 6.2 | 62.0 | 1993
SGI Indigo Extreme Mips R4400/R4010 @ 150 MHz (IP22) | IRIX 5.1.1.3 | 97.2 | 1994
DEC 3000_500 Alpha @ 100 MHz | OSF/1 1.2.10 | 97.2 | 1994
HP PA 9000/715 @ 50 MHz (PA-RISC 1.1) | HPUX A.09.06 | 9.8 | 1994
SGI Indy Mips R4600/R4610 @ 100 MHz (IP22) | IRIX 5.2 | 59.3 | 1994
SGI Indigo2 XZ Mips R8000/R8010 @ 75 MHz (IP26) | IRIX64 6.0.1 | 125.2 | 1994
Cray C90 | Unicos 8.0.3 | 341.9 | 1994
Cray J90 (4 CPUs) | Unicos 8.0.3 | 263.9 | 1995
SGI Indy Mips R5000/R5000 @ 180 MHz (IP22) | IRIX 6.2 | 204.1 | 1996
SGI Indigo2 Mips R10000 @ 195 MHz (IP28) | IRIX 6.2 | 624.5 | 1996
HP C200 PA 9000/782 @ 200 MHz (PA-RISC 1.1) | HPUX B.10.20 | 269.8 | 1998
HP C240 PA 9000/800 @ 240 MHz (PA-RISC 1.1) (4 CPUs) | HPUX B.11.00 | 317.7 | 1999
Sun Ultra-Enterprise 500/6500 (4 CPUs) | SunOS 5.6 | 238.4 | 1999
Dell XPS T500 Intel Pentium III @ 500 MHz | Linux 2.2.14 | 412.9 | 2000
Compaq AP500 Intel Pentium III @ 550 MHz | Linux 2.2.14 | 455.2 | 2000
HP C550 PA 9000/785 @ 550 MHz (PA-RISC 2.0) | HPUX B.11.00 | 1304 | 2001
Compaq EVO W6000 Intel Xeon @ 2.2 GHz (2 CPUs) | Linux 2.4.7 | 1398 | 2002
Compaq EVO W6000 Intel Xeon @ 2.4 GHz (2 CPUs) | Linux 2.4.18 | 1456 | 2002
HP xw6000 Intel Xeon @ 2.8 GHz (2 CPUs) | Linux 2.4.18 | 1870 | 2003
Dell Inspiron 5150 Pentium 4 HT @ 3.06 GHz | Linux 2.4.21-4 | 2370 | 2004
HP xw6000 Intel Xeon @ 3.2 GHz (2 CPUs) | Linux 2.4.21-9 | 2578 | 2004
IBM Intellistation AMD Opteron 250 @ 2.4 GHz (2 CPUs) | Linux 2.4.21-9 | 3996 | 2005
4 x 1 x 1 cluster AMD Opteron 250 @ 2.4 GHz (Infiniband)³ | Linux 2.6.9-45 | 14800 | 2006
4 x 2 x 1 cluster AMD Opteron 250 @ 2.4 GHz (Infiniband)³ | Linux 2.6.9-45 | 24000 | 2007
8 x 2 x 1 cluster AMD Opteron 250 @ 2.4 GHz (Infiniband)³ | Linux 2.6.9-45 | 36000 | 2007
8 x 2 x 2 cluster Intel Xeon 5160 @ 3.0 GHz (Infiniband)³ | Linux 2.6.18-53 | 49000 | 2008
HP xw9400 AMD Opteron 2380 @ 2.5 GHz (1 x 2 x 4)³ | Linux 2.6.38-63 | 18641 | 2011
HP z800 Intel Xeon X5570 @ 2.93 GHz (1 x 2 x 4)³ | Linux 2.6.38-63 | 29253 | 2014
HP z800 Intel Xeon X5650 @ 2.67 GHz (1 x 2 x 6)³ | Linux 2.6.38-63 | 35024 | 2017
HP z800 Intel Xeon X5687 @ 3.60 GHz (1 x 2 x 4)³ | Linux 2.6.38-63 | 43880 | 2019
HP z840 Intel Xeon E5-2687W v3 @ 3.10 GHz (1 x 2 x 10)³ | Linux 5.6.13-100 | 54640 | 2022
HP z840 Intel Xeon E5-2690 v4 @ 2.60 GHz (1 x 2 x 14)³ | Linux 6.8.9-100 | 73906 | 2024
  1. MVUP is MicroVAX Units of Processing
  2. The year listed signifies the year the system was benchmarked rather than the year it was first released.
  3. The latest computers and clusters are too fast for the DRL benchmark to be used any longer. We have switched to using explicit finite element solver based benchmarks and scaled the numbers back to MVUPs.