Many people think that cluster computing originated with Thomas Sterling and Donald Becker's work on the Beowulf Project in 1994. This project is certainly one of the most important events in the history of cluster computing. Its use of the Linux operating system on inexpensive PCs has revolutionized the high performance computing community and created a whole new class of systems (known as Beowulf clusters). However, it was not the first time that a cluster, or what had often previously been referred to as a "multicomputer", had been built. Many others, including Mississippi State University, had been working on the subject for several years before that. This is a brief description of the history of cluster computing research at Mississippi State University.
Mississippi State University has been involved in what is now called cluster computing since at least 1987. In that year, DARPA funded an MSU project called MADEM (Mapped Array Differential Equation Machine). MADEM was a distributed memory MIMD system based on the Sun 4/110 workstation.
By 1992, research had moved to an 8-node system based on SPARCstation 2
workstations interconnected with communications cards developed by MSU
researchers. The system included custom-built performance monitoring
capabilities as well as a midplane with wormhole router chips. It was known
as the MSPARC/8, a name indicating that it was the second generation of the
MADEM system, that it was based on the new Sun SPARC architecture, and that
it had 8 processors. As with the MADEM system, the MSPARC/8's motherboards
were removed from their original chassis and mounted in a custom chassis
with direct interconnects to the midplane.
In June of 1993, the first components were purchased for what would become
the SuperMSPARC, the third generation of this project. The SuperMSPARC
consists of 8 Sun SPARCstation 10 workstations. Each node has four 90 MHz
HyperSPARC processor modules and 288 MB of RAM. Sun had originally intended
to release a quad-processor SuperSPARC-based SPARCstation 10 but, because of
heat issues, eventually shipped the quad-processor configuration with
HyperSPARC modules instead. Unfortunately, the project had already been
named SuperMSPARC by that time. The nodes have been interconnected via the
built-in 10 Mb/s Ethernet, 155 Mb/s (OC3) ATM, and Myrinet. The system also
has a custom-built midplane and SBUS cards used for monitoring interprocess
communications. Unlike the nodes of its predecessors, the SuperMSPARC
workstations were left in their original chassis and connected to the custom
midplane via cables from their SBUS ports. This project has been so
successful that, as of June 2002, nine years after its construction began,
it is still in service as a tool for teaching parallel computing techniques.
In December 1999, the fourth generation of this project began. The
UltraMSPARC is a 16-node system. Each node has four 400 MHz UltraSPARC II
processors and 2 GB of RAM. The nodes are connected via Myrinet as well as
100 Mb/s Ethernet. Research continues on this system, which uses custom-built
Global Positioning System (GPS) cards in the nodes to synchronize their
system clocks accurately with those of similar systems at a remote location.
MSU is now experimenting with clustering techniques in which the physical
location of the nodes no longer matters. Unlike previous generations, the
UltraMSPARC was designed from the outset to be primarily a production-level
system. The clustering research on this system is secondary to its main
function as a center-wide computational resource.
The experience gained through more than a decade of cluster computing led
the MSU Engineering Research Center (ERC) to embark on the large-scale
production system that became known as EMPIRE (ERC's Massively Parallel
Initiative for Research and Engineering). EMPIRE is currently a
1038-processor (519-node) cluster based on Intel Pentium III processors
running the Linux operating system. Each node contains two Pentium III
processors, running at either 1 GHz or 1.266 GHz, and 1 GB of RAM. It is the
first cluster built at the ERC based on the Intel/Linux architecture instead
of the Sun/SunOS/Solaris architecture. EMPIRE is built primarily with IBM
eSeries x330 rack-mountable systems connected via 100 Mb/s Ethernet, with
interswitch communications via Gigabit Ethernet.