The Concurrency with Python Series:
- Concurrency with Python: Why?
- Concurrency with Python: Threads and Locks
- Concurrency with Python: Functional Programming
- Concurrency with Python: Separating Identity From State
- Concurrency with Python: Actor Models
- Concurrency with Python: CSP and Coroutines
- Concurrency with Python: Hardware-Based Parallelism
- Concurrency with Python: Data-Intensive Architectures
- Concurrency with Python: Conclusion
The concept of communicating sequential processes, or CSP, is similar to the notion of actor models, but brings added utility to contemporary concurrency challenges. Both of these concurrency models leverage message passing, but whereas actors pass messages between containers of state, the CSP model passes messages between channels, a form of synchronization and communication between coroutines or coroutine-like processes.
Russ Cox has a great writeup about the history of CSP.
This mode of concurrency addresses a number of difficulties with implementing concurrency in production:
Efficiently abstracting away dependencies on underlying hardware: Blocking operations on physical hardware, like CPU or I/O, are very expensive. Designing an entire virtual machine to efficiently abstract away the hardware reality (e.g. Erlang processes on the BEAM VM) is even more expensive. Finding a happy medium where the code sticks close to the hardware, but also abstracts away blocking actions away from hardware primitives with low implementation costs, is important in order to maintain performance and simplicity. CSP provides this by multiplexing and scheduling coroutines on top of a CPU thread pool, establishing a many to many relationship between language threads/processes and hardware threads/processes. Since the scheduler is just a state machine, it can be compiled into a binary and loaded anywhere without the need for complex configuration.
For example, A
golangcoroutine, or "goroutine", can be as cheap as a few kilobytes of memory to spin up. The
golangFAQ mentions an average overhead of three CPU instructions per goroutine. This improves the cost/benefit analysis for organizations considering CSP over threading and locking or another concurrency model where and when performance matters.
Increasing user adoption: The focus on channels, rather than application code, not only provides an easier way for new programmers to wrap their head around the language and the concurrency model in a two-step process, but also reduces the pain of transitioning legacy code written in the language to use the favored concurrency model during the "make it work, make it right, make it fast" phases. The actor model forces language users to containerize and break up their state to fit the concurrency model from the very beginning, which is a jarring shift from, say, object-oriented programming. The ability of a scheduler to simply pause a coroutine if it blocks, and resume executing the coroutine after it unblocks and a machine thread is made available, gives back control of application state in source code to the user. This allows for unbounded state mutation within a particular coroutine (as long as state is scoped within the coroutine), which is a more familiar concept to many programmers than packaging state in messages, and better models stateful, sequential problems.
You may not want to use CSP if:
Coroutines need to share lots of data across channels: Ravelin shared some information on
golangchannels that indicated sharing data between goroutines across channels was expensive in comparison to keeping data within a set goroutine. CSP channels are also inherently synchronous. Throughput is likely much better if CPU-bound tasks were kept within the goroutine, and goroutines communicated to resources through I/O.
Sylvain Wallez also wrote a great blog post on
golang's failings that described, among other things, the difficulty of sharing data structures and memory across channels when the data structures
golangprovides are not concurrency-first (e.g. mutable, non-atomic), which may result in non-trivial debugging overhead.
The most popular example of CSP in a programming language at the moment is
golang, whose concurrency model is
designed around a derivative of Hoare's CSP process calculus and implements
channels as first-class citizens.
You can see the CSP scheduler implemented in
Python CSP Libraries
Python does not natively support CSP or channels as first-class citizens. However, a number of small academic projects have provided a base layer from which a CSP framework could arise:
mpi4py: Python bindings for the
C/C++-based Message Passing Interface, or MPI. A number of other frameworks supporting Python bindings for MPI exist as well, including native bindings from Boost. Research has been done in constructing a CSP abstraction layer on top of MPI, but it doesn't look like it is production-ready, at least from reading one reference.
For example, in order to parallelize multiple
python-cspoverrides floor division for
CSPProcessobjects with the
def __floordiv__(self, proclist): """ Run this process in parallel with a list of others. """ par = Par(self, *list(proclist)) par.start()
This gives rise to the example listed in the documentation about parallel processes:
>>> @process ... def foo(n): ... # Function ... >>> foo(100) // (foo(200), foo(300)) # Output <Par(Par-5, initial)> >>>
Not recommended in production due to a lack of continued development, a lack of feature stability, incomplete documentation, and no description of common ways
python-cspmay fail (no description of the error model).
pycsp: A CSP framework with some extensions based on π-calculus. From a cursory inspection, this framework appears to be well-documented and built out, with plenty of examples to inspect and run. One major downside, shared with other concurrency libraries built on top of Python, is the lack of user adoption shown in the number of outstanding issues and average time for issue resolution.
Python CSP Primitives
Unlike other concurrency models, which suffer from a lack of Pythonic foundations, implementing a CSP process calculus could be done pythonically through the use of Python's native and various coroutines and coroutine libraries.
Python's development of coroutines began with realizing how generator
expressions, combined with the
yield from and
.send() keywords, results in
the same inversion of control that allows for separate tasks to be concurrently
scheduled. Abu Ashraf Masnun wrote a great blog post on this evolutionary
Truly native coroutines, separate from generator coroutines, with specifically
async/await syntax, asynchronous context management, and standard
library support in
sys (among other features), came with
implementation of PEP 492 in Python
Many libraries are racing to take advantage of this native support, a demonstration of how acceptance of a model of programming in the Python standard library drastically and effectively increases user adoption.
Stackless Python is a fork of CPython that supports additional concurrency primitives. The ones relevant to the discusson on CSP include tasklets and channels. Both of these map well to CSP's ideas of coroutines and channels.
greenlet is an evolution of
Stackless Python's idea of tasklets, except that scheduling and other control
flow primitives are handled within user code. Tasklets are a form of
microthread, whereas greenlets are
a form of coroutine. Guido discusses the difference between the two in a
Microthreads are "almost real" threads, with round-robin scheduling. What makes them attractive to some is the fact that they don't use operating system resources: there's no OS-level stack or kernel process table entry associated with them. This also accounts for their biggest weakness: when a microthread is suspended for I/O, all microthreads are suspended. In limited contexts, such as a network server, this can be solved by an approach similar to that in Gordon's SelectDispatcher. (It would be a titanic effort to do this for every potentially blocking I/O operation, and it probably wouldn't work well with C extensions.)
gevent combines the
greenlet library with the
libev event loop library, and
provides Python bindings. Some interesting Python libraries, such as
locust (an HTTP load testing
gunicorn (a web server
framework), are built on top of
Experimental / In-Progress CSP Initiatives
A handful of experimental, Python-specific concurrency initiatives are in the works. If they make it to a stable release point, they may one day form the foundations for CSP in Python (among other types of concurrency models).
Eric Snow opened PEP 525 describing an initiative to utilize subinterpreters within a main Python interpreter, where the global scope of all of Python, including the C extensions API, would be moved one level down in the Python process space. This may help Python natively support an orchestration layer within the language, instead of using a third-party orchestration tool to coordinate multiple distinct Python services. Eric explicitly mentions CSP as an inspiration for how subinterpreters may communicate with one another:
Concurrency is a challenging area of software development. Decades of research and practice have led to a wide variety of concurrency models, each with different goals. Most center on correctness and usability.
One class of concurrency models focuses on isolated threads of execution that interoperate through some message passing scheme. A notable example is Communicating Sequential Processes (CSP) (upon which Go's concurrency is roughly based). The isolation inherent to subinterpreters makes them well-suited to this approach.
pypy is an evolution of Stackless Python, that
combines pluggable garbage
compatibility with most existing CPython code, along with greenlets, in order to
make native Python more performant.
pypy implements greenlets on top of a
is further implemented on top of a
"stacklet". As far as
I can tell, this effort is to make context switching composable and
A Concurrent Sieve of Erathostenes with
The Sieve of Erathostenes is a popular computer science problem that calculates the prime numbers in a bounded sequence by labeling all multiples of an existing prime.
golang implements a neat algorithm to compute sequential primes using
channels; this example is runnable within the
golang playground. The general gist of the algorithm is to have one channel
create a stream of sequentially increasing values, and daisy chain other
channels to the stream to drop values that are multiples. The outputs of the
daisy chain are the prime numbers in the sequence. However, as Stian Eikeland
notes in his blog post on the same topic using Clojure's
core.async, and as this
Hacker News discussion posits,
this concurrency demonstration is mostly academic, as it is not very performant
due to every prime being touched by every channel before the maximum of its
prime factorization is reached.
pycsp has a version of this sieve implementation hosted on its website, but
testing the provided algorithm on a development version of
pycsp checked out
HEAD/fb9e32fd8aa88a33acce40d31aafcfe6693f0fff fails to run. After
manually patching socket handling such that type
bytes was explicitly set:
s = socket.socket(
ip = socket.inet_ntoa(fcntl.ioctl(
0x8915, # SIOCGIFADDR
# NOTE: Line below was originally
The example, as posted here, successfully logs primes between 2 and 2000 to stdout.
Python has non-trivial limitations when it comes to natively implementing CSP.
golang Google Groups
rather eye-opening in terms of exposing how differently
golang and Python
prioritize inter-process communication in syntax and typing. At the same time,
the flexible nature of CSP lends itself to easier implementation on a language
that does not natively support it, as compared to other concurrency models.
While CSP in Python is still an academic discussion, a stable release of Python
sub-interpreters or a production-ready CSP on
async Python library may make
discussions about CSP-like concurrency in production Python environments