Mastering concurrency
3 October 2002
Correctly implementing concurrency in computer software is vital for accurate computational modelling, and for the safety and reliability of embedded control systems. By Jim Moores
Concurrency in computer software is required when systems need to perform several, possibly interacting, tasks at the same time. Traditionally, explicitly and deliberately introducing parallelism into software systems has been seen as rather foolhardy - a rich potential source of extra errors. Most multithreaded software is written without any real formalism or methodology. This has given rise to a whole new class of errors, the most problematic of which are race-hazards, deadlock and livelock.
Race-hazards arise when different parts (or "threads") of a concurrent program simultaneously modify data that they share. Two threads may both read the same piece of data, both change its value, and both store it back in memory. If these operations happen to overlap, one thread's update overwrites the other's and is lost. This kind of problem can lead to unreliable results and strange errors in simulations.
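The lost update described above can be sketched in a few lines of Java. This is a minimal illustration using the standard library rather than xCSP or JCSP; the class and method names (LostUpdateDemo, runUnsafe, runSafe) are hypothetical, and the atomic counter is just one conventional remedy, not the approach advocated later in this article.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LostUpdateDemo {
    static int plainCounter;                                   // shared and unprotected
    static final AtomicInteger atomicCounter = new AtomicInteger();

    // Runs the same body on two threads and waits for both to finish.
    static void runTwoThreads(Runnable body) throws InterruptedException {
        Thread a = new Thread(body);
        Thread b = new Thread(body);
        a.start();
        b.start();
        a.join();
        b.join();
    }

    // Each increment is a separate read, add and write, so overlapping
    // threads can overwrite each other's result.
    static int runUnsafe(int perThread) throws InterruptedException {
        plainCounter = 0;
        runTwoThreads(() -> {
            for (int i = 0; i < perThread; i++) {
                plainCounter++;
            }
        });
        return plainCounter;
    }

    // incrementAndGet performs the read, add and write as one indivisible step.
    static int runSafe(int perThread) throws InterruptedException {
        atomicCounter.set(0);
        runTwoThreads(() -> {
            for (int i = 0; i < perThread; i++) {
                atomicCounter.incrementAndGet();
            }
        });
        return atomicCounter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("unsafe: " + runUnsafe(100_000));
        System.out.println("safe:   " + runSafe(100_000));
    }
}
```

The safe version always reports twice the per-thread count. The unsafe version may report the same figure on one run and a smaller one on the next, which is exactly what makes this class of error hard to catch in testing.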
Deadlock and livelock, on the other hand, cause the complete failure of a system. Deadlock occurs when two or more separate threads are indefinitely waiting for each other to perform an action. Livelock is similar, except that the threads remain busy, repeatedly responding to one another without making any progress. Deadlock often causes a program to fail every time, but it can also be probabilistic, occurring only under particular timings. Unfortunately, naïve solutions to deadlock often merely reduce the probability of it occurring rather than eliminating it. Probabilistic deadlock is extremely difficult to identify, as the time between failures can sometimes be years! In embedded systems, particularly safety critical systems such as those used in the nuclear industry, the consequences could be catastrophic - potentially freezing a system at a critical moment. Months of extensive testing can miss an error such as this.
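As a concrete case, consider two threads that each need the same two locks but ask for them in opposite orders: each can end up holding one lock while waiting forever for the other. The sketch below shows one conventional remedy, taking locks in a fixed global order, using the standard Java library. It is an illustration only, not xCSP's proof-based approach, and the names (OrderedLocking, Cell, move) are hypothetical.

```java
import java.util.concurrent.locks.ReentrantLock;

class OrderedLocking {
    static final class Cell {
        final int id;                                  // fixed rank that decides locking order
        final ReentrantLock lock = new ReentrantLock();
        int value;

        Cell(int id, int value) {
            this.id = id;
            this.value = value;
        }
    }

    // Moves amount between two cells. Whatever the direction of the move,
    // the lock of the lower-ranked cell is always taken first, so two
    // threads can never each hold one lock while waiting for the other.
    static void move(Cell from, Cell to, int amount) {
        Cell first = from.id < to.id ? from : to;
        Cell second = (first == from) ? to : from;
        first.lock.lock();
        try {
            second.lock.lock();
            try {
                from.value -= amount;
                to.value += amount;
            } finally {
                second.lock.unlock();
            }
        } finally {
            first.lock.unlock();
        }
    }

    // Two threads move values in opposite directions at the same time.
    static int[] runOpposingTransfers(int rounds) throws InterruptedException {
        Cell a = new Cell(1, 1000);
        Cell b = new Cell(2, 1000);
        Thread t1 = new Thread(() -> { for (int i = 0; i < rounds; i++) move(a, b, 1); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < rounds; i++) move(b, a, 1); });
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        return new int[] { a.value, b.value };
    }
}
```

Had move simply locked from and then to, the two threads above would request the locks in opposite orders and could freeze, but only on the rare runs where their timing happened to interleave badly.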
Computational modelling and simulation are essential to the development of new reactor designs, fuels and operational procedures. The most accurate models require many hours of processing on even the most powerful workstation to provide just a few seconds of real-time data. Tackling larger computational models has often meant resorting to expensive supercomputer time and accepting the limited performance scalability of so-called parallelizing Fortran compilers. Such compilers rely on analysing serial array operations and transforming them into parallel operations. Because the programs are written from a serial viewpoint, they often introduce unnecessary ordering into calculations. By facing up to the necessity of parallel processing at the programming level, better efficiencies can be achieved.
Making life easier
Quickstone Technologies (QT) believes that, properly used, threads can make life easier for the programmer by allowing the application design to more closely reflect reality. By using the appropriate techniques, errors can be avoided and, ultimately, programs become more reliable. xCSP is a new way of writing software that avoids many of the problems that conventional techniques face, yet is intuitive and easy to learn. With xCSP, scalability to multiprocessors and clusters comes for free.
xCSP is a practical refinement of the powerful CSP (communicating sequential processes) mathematical notation that has been used for modelling concurrency for many years. Because xCSP is so closely based on mathematics, it is possible to guarantee the elimination of race-hazards and deadlock/livelock using a combination of methodology and mathematical proof. This is of particular importance in safety critical and high reliability systems, but also important to other fields, such as the accuracy of simulations that utilise parallel hardware.
The fundamental method of communication between threads in xCSP is via channels. Channels are carefully synchronised connections between threads down which data can be passed. There is an "Alternative" operation that allows a single thread to listen to several channels and/or timers and returns the first channel or timer to be ready.
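The idea of a synchronised channel can be approximated with the standard Java library. The sketch below is not JCSP code: it uses java.util.concurrent.SynchronousQueue as a stand-in, since each put on such a queue blocks until another thread performs the matching take. The class name ChannelSketch and the use of -1 as an end-of-stream marker are assumptions made for illustration, and the "Alternative" operation is not shown because the standard library has no direct equivalent.

```java
import java.util.concurrent.SynchronousQueue;

class ChannelSketch {
    // A producer thread sends 1..n down the channel; the calling thread
    // receives and sums them. No data is shared except through the channel.
    static int sumOverChannel(int n) throws InterruptedException {
        SynchronousQueue<Integer> channel = new SynchronousQueue<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) {
                    channel.put(i);          // blocks until the receiver takes the value
                }
                channel.put(-1);             // hypothetical end-of-stream marker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        int sum = 0;
        for (int v = channel.take(); v != -1; v = channel.take()) {
            sum += v;
        }
        producer.join();
        return sum;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sumOverChannel(10));
    }
}
```

Because the two threads only ever meet at the channel, there is no shared variable for them to race on; the sender cannot run ahead of the receiver, and the receiver cannot read a value that has not been sent.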
To complement channels, xCSP also provides some higher level synchronisation primitives, like dynamic concurrent read/exclusive write (CREW) locks (which only allow a single writer to access a shared variable or resource, but allow concurrent readers between writes) and barrier synchronisations (a lock which blocks threads synchronising on it until a certain number have joined, at which point the lock is released).
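Both primitives have rough analogues in the standard Java library, which makes their behaviour easy to sketch. The example below uses ReentrantReadWriteLock in place of a CREW lock and CyclicBarrier in place of a barrier synchronisation; it is an approximation under those assumptions rather than xCSP's own API, and the class name CrewAndBarrierSketch is hypothetical.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class CrewAndBarrierSketch {
    // Each worker adds (id + 1) to a shared total under the write lock,
    // waits at the barrier until every worker has done so, then reads the
    // total under the read lock. Returns what each worker saw.
    static int[] run(int workers) throws InterruptedException {
        ReentrantReadWriteLock crew = new ReentrantReadWriteLock();
        CyclicBarrier barrier = new CyclicBarrier(workers);
        int[] shared = new int[1];
        int[] seen = new int[workers];
        Thread[] threads = new Thread[workers];

        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                try {
                    crew.writeLock().lock();          // exclusive write
                    try {
                        shared[0] += id + 1;
                    } finally {
                        crew.writeLock().unlock();
                    }

                    barrier.await();                  // released once all workers arrive

                    crew.readLock().lock();           // readers may overlap
                    try {
                        seen[id] = shared[0];
                    } finally {
                        crew.readLock().unlock();
                    }
                } catch (InterruptedException | BrokenBarrierException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[w].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return seen;
    }
}
```

With four workers, every thread sees the same completed total of 10, because no thread can pass the barrier until all four writes have finished.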
QT's flagship product, JCSP Network Edition, is a library and toolset for developing multi-threaded and distributed xCSP programs in Java. Using JCSP means that simulations, embedded systems and many other kinds of software can be more reliable and be written more quickly. Simulations written in JCSP can be run on cheap clusters of commodity workstations rather than expensive supercomputers. Embedded systems can run using multiple processors and/or multithreading, communicating over internal networks in the same way as workstation clusters. xCSP doesn't differentiate between threads on different machines and threads on the same machine, so complex distributed applications can be developed on a single machine and then easily distributed over a network once development is complete.
As hardware demands more and more parallelism from software, QT claims that xCSP will become the standard model for programming concurrent systems over the next decade.