Lecture: Parallel and Distributed Embedded Systems


Course ID (VAK): 03-ME-712.06
Category: Lecture + Lesson, 4 SWS (weekly contact hours)
Master's Course
ECTS: 6, Summer Semester
University of Bremen
Lecturer: PD Dr. Stefan Bosse

This lecture illustrates the increasing importance of parallel data processing in computer science and hardware chip design. Parallel processing concepts have been well known in algorithms and software engineering for several decades: an algorithm is partitioned into subprocesses that run in parallel and concurrently on several processors. So far, however, applications of parallel and distributed algorithms have mostly been in the area of compute-intensive numerics. The general trend has been toward ever more powerful microprocessors, resulting in increasing complexity and, more importantly, increasing electrical power consumption. It can be shown that decomposing a complex system into a system of cooperating, less complex subsystems with the same overall computing performance has significant advantages.

Under certain constraints, the structures and algorithms known from classical multiprocessing techniques can be transferred to the design of digital logic systems, so that hardware can be designed with software engineering models, e.g. multi-process models with interprocess communication primitives such as semaphores or queues. The trend in hardware design is therefore moving toward sea-of-processors concepts with up to 1000 (simple) processor cores on a single chip. In this kind of hardware design, system partitioning and communication play a central role, and combined hardware/software co-design is essential.
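As a rough software-side illustration of such a multi-process model, the following minimal sketch (in Python, not the OCCAM or ConPro notation used in the course) connects one producer process to two consumer processes through a bounded queue and protects a shared result list with a binary semaphore. The function name `run_pipeline` and its `capacity` parameter are hypothetical and chosen only for this example.

```python
import threading
import queue

def run_pipeline(items, capacity=2, n_consumers=2):
    """Hypothetical example: producer/consumer processes linked by a queue.

    The bounded queue acts as the communication channel; a binary
    semaphore enforces mutual exclusion on the shared result list.
    """
    channel = queue.Queue(maxsize=capacity)  # bounded FIFO: put() blocks when full
    mutex = threading.Semaphore(1)           # binary semaphore used as a lock
    results = []
    STOP = object()                          # sentinel marking end of the stream

    def producer():
        for x in items:
            channel.put(x)                   # blocks while the queue is full
        for _ in range(n_consumers):
            channel.put(STOP)                # one stop token per consumer

    def consumer():
        while True:
            x = channel.get()                # blocks while the queue is empty
            if x is STOP:
                break
            with mutex:                      # critical section on shared state
                results.append(x * x)

    processes = [threading.Thread(target=producer)]
    processes += [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()                             # wait for all processes to finish
    return results

# The consumers interleave nondeterministically, so sort before printing.
print(sorted(run_pipeline([1, 2, 3, 4])))   # [1, 4, 9, 16]
```

The bounded queue gives the producer back-pressure (it blocks when the consumers fall behind), which loosely mirrors how a FIFO between two hardware processes would behave.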

However, the design of parallel systems involves several pitfalls and difficulties, and, unlike sequential data processing, there is no optimal generic computer architecture for parallel systems. Scaling, synchronization, deadlocks, and the handling of competition for shared resources in multi-process-based parallel and distributed data processing place high demands on the understanding of parallel systems and their design, which is acquired both theoretically, through examples, and practically in the lessons. The problems of parallel systems and their synchronization are introduced using state-space diagrams and a simulator (CPV). Subsequently, the CSP-based parallel programming language OCCAM is used together with a compiler and virtual machine (recently replaced by a CSP implementation in JavaScript using jxcore+ and threads.js). The implementation of simple parallel data processing systems on an FPGA is demonstrated in practice using the ConPro high-level synthesis framework and the Xilinx ISE FPGA design suite.

The course content is structured as follows: the classical multi-process model with communicating sequential processes (CSP), including process algebra; a discussion of synchronization; the extension of the classical CSP model with competition and global shared resources; the mapping of this extended CCSP model onto the register-transfer level (RTL); and finally a discussion of the properties of parallel and distributed systems in general. The course is assessed by an oral final examination.


  1. Motivation and introduction
    • Use and limits of single-processor systems
    • Architecture of a single-processor system
    • Programmed execution
  2. Multi-process model
    • With generic processors
    • Scaling to application-specific digital logic systems
  3. Multiprocessor architectures
    • With generic processors
    • Scaling to application-specific digital logic systems
  4. Inter-process Communication {Mutex, Semaphore, Event, Queue, Barrier, Monitor}
    • Software
    • Hardware
  5. Parallel algorithms
    • Software
    • Hardware
  6. Parallel architectures
    • System-on-chip architecture
    • Using FPGAs & ASICs
  7. Logic synthesis and high-level synthesis for behavioral modeling of the system
  8. Pipelined architectures
    • In functional systems
    • In reactive systems
  9. Petri Nets


[Lecture Script]
PARSYS 2017, Revision 20.6.2017, PDF