Single-core performance gains have stalled. To keep increasing the number of available cycles, microprocessor designers have shifted to chip-multiprocessor (CMP) designs. Unfortunately, the additional processors that CMPs provide may sit idle: most applications lack data parallelism, and task parallelism alone is unlikely to saturate future CMP designs. The systems community needs to rethink how systems are structured to fully utilize CMPs.
We propose adapting operating systems to harness CMP resources by leveraging recent results in Concurrent Threaded Pipeline (CTP) organizations, which exploit pipeline parallelism. This paper discusses the potential performance improvements offered by CTPs and the OS support they require.