Concurrency with Python: Separating Identity From State

The Concurrency with Python Series:

δὶς ἐς τὸν αὐτὸν ποταμὸν οὐκ ἂν ἐμβαίης.

You could not step twice into the same river, for other waters are ever flowing onto you.

Heraclitus, in Plato's "Cratylus"

Overview

In "Seven Concurrency Models in Seven Weeks", Paul Butcher forks off the previous chapter's focus on functional programming to discuss the benefits of Clojure, a Lisp implemented on the JVM, in detail. In doing so, he bases an entire concurrency model on a key design aspect of Clojure: the ability for Clojure's types to separate identity from state.

Clojure has an informative and enlightening section within its design philosophy about how it approaches state. In particular, Rich Hickey, the author of Clojure, states (emphasis added):

Identities are mental tools we use to superimpose continuity on a world which is constantly, functionally, creating new values of itself.

Please refer to the community Clojure documentation on concurrency and parallelism for information on how to take advantage of concurrency constructs in Clojure.

Properties of Concurrency-aware State

One way programming languages implement proper concurrency control is by adhering to a set of integrity guarantees. As Edsger Dijkstra once said, testing only shows the presence, not the absence of bugs. Guarantees are much more useful than testing for reliably reducing error spaces. The ones that matter to separating identity from state include, but are not limited to:

Atomicity / Linearizability

Atomicity is the ability to restrict the possibility of different state mutations to the set that are both valid and serializable. This is useful when objects may be referenced and manipulated by multiple concurrently running threads, as for any given atomic object, every thread sees every other thread execute exactly one operation, and obtains the context of when it is safe to interrupt a thread (i.e. not when an atomic operation is currently taking place).

Atomic operations are oftentimes natively supported in modern hardware, such as the compare-and-swap operation implemented as CMPXCHG in x86 assembly. This means given good interop between hardware and software, you may be able to propagate hardware-level atomic guarantees to the software level, given an intelligent enough compiler.

Atomicity helps separate identity and state because it guarantees state will always be valid. Either any stateful updates to the identity will succeed and all threads will eventually see it, or the update will fail, any intermediate state will be rolled back by the language itself, and the original identity will remain unaltered. This transaction-like semantic enables you to maintain multiple references to state within the identity at different points in time, without worrying that state will become invalid and your reads/writes become torn and your data corrupted.

Without atomicity, concurrency models are much harder to work with, because without specification guarantees, the source belies and highly depends upon the underlying implementation of the language. For example, in Java running on 32-bit JVMs, long value assignment is not an atomic operation, as this Stack Overflow answer explains. It also mentions how 64-bit JVMs do not suffer this issue as larger register sizes allow for atomic assignments, which indicates debugging such an issue tightly couples workflows to specific deployment environments, which partially defeats the purpose of the virtual machine. In Python, A. Jesse Jiryu Davis explains how the Python swap operation explodes into multiple bytecode instructions, resulting in race condition failures when testing a new build of pymongo. From these examples, we can see how while each bytecode instruction is atomic, the many to many relationships between source and bytecode necessitates explicit atomic guarantees at the source level.

In Python, you may use the dis module to see how source breaks down into bytecode. For example, there is no notable difference between int and long types, as the LOAD_GLOBAL opcode loads both types in a single bytecode operation:

username@hostid:~$ ipython
Python 3.7.1 (default, Dec 14 2018, 13:28:58)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.2.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: def myfunc1():
   ...:     x = int(1)
   ...:

In [2]: import dis

In [3]: dis.dis(myfunc1)
  2           0 LOAD_GLOBAL              0 (int)
              2 LOAD_CONST               1 (1)
              4 CALL_FUNCTION            1
              6 STORE_FAST               0 (x)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

In [4]: def myfunc2():
   ...:     x = long(1)
   ...:

In [5]: dis.dis(myfunc2)
  2           0 LOAD_GLOBAL              0 (long)
              2 LOAD_CONST               1 (1)
              4 CALL_FUNCTION            1
              6 STORE_FAST               0 (x)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE

This is because Python int types are object definitions, as detailed in this Stack Overflow answer, and not strictly primitives. In general though, do not rely on the atomicity of Python bytecode sequences. As Russell Smith detailed in a great blog post, bytecode atomicity from source may be highly dependent on implementation details of the specific Python distribution used, rather than at the specification level.

(As an aside, the Python virtual machine may render incorrect results for whether the process is running as 32-bit or 64-bit, necessitating special checking of sys.maxsize instead of platform.architecture() as this Stack Overflow answer mentions.)

This issue is also why immutable types are not retrofitted to newer versions of Python. As Guido mentioned when rejecting PEP-416 to add an immutable dict type definition to the set of Python native types, there is no guarantee that the bytecode can be made to be atomic, and hence object immutability with regards to thread safety is not guaranteed by the interpreter.

In addition, Python makes atomicity difficult by defining operator overloading behavior, which brings atomic control of object definitions to the application level, where atomicity may be harder to correctly implement and verify.

As an alternative to Python's extensive suite of atomic accessors to mutable state, like locks and semaphores, Google's Python style guide recommends the usage of the queue class to pass state across threads and guarantee thread safety. Python's documentation explicitly assures the developer that the queue module automatically takes care of thread synchronization. Hence, you could treat method accesses to Queue objects to be atomic, as other threads cannot interrupt those operations due to underlying synchronization primitives. Eli Bendersky wrote a great blog post on how to apply this pattern of queues and worker threads, and define atomic behavior in worker threads at the application level.

Persistence / Durability

Persistence, at least with respect to data structures, refers to the ability to preserve prior states upon mutation. As this Stack Overflow answer explains, persistence is an interface, while immutability is an implementation; all persistent data structures appear immutable to the application developer, but not all immutable data structures trace state mutations over time.

To see what this means visually, we can open up a Leiningen REPL session and see how Clojure implements persistent data structures.

username@hostid:~$ lein repl
nREPL server started on port 56980 on host 127.0.0.1 - nrepl://127.0.0.1:56980
REPL-y 0.3.7, nREPL 0.2.12
Clojure 1.8.0
Java HotSpot(TM) 64-Bit Server VM 1.8.0_131-b11
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
    Exit: Control+D or (exit) or (quit)
 Results: Stored in vars *1, *2, *3, an exception in *e

user=>

(All code and visualization examples in this specific section come from Butcher's book, and all credit for these examples go to him. I don't think I can explain it better.)

If you create a list called listv1, with elements [1, 2, 3]:

user=> (def listv1 (list 1 2 3))
#'user/listv1
user=> listv1
(1 2 3)

This is how it's referred to in memory:

digraph foo {
        rankdir=LR;
        node [shape=record];
        listv1 [label="{ <data> listv1 }", fontname="courier", style=rounded]
        a [label="{ <data> 1 | <ref>  }", width=1]
        b [label="{ <data> 2 | <ref>  }", width=1];
        c [label="{ <data> 3 }", width=0.5];
        a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
        b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
        listv1:ref:listv1 -> a:data [arrowhead=vee]
}

If you modified this list using the cons function:

user=> (def listv2 (cons 4 listv1))
#'user/listv2
user=> listv2
(4 1 2 3)

The underlying memory representation of the data structure would become:

digraph foo {
        rankdir=LR;
        node [shape=record];
        listv1 [label="{ <data> listv1 }", fontname="courier", style=rounded]
        listv2 [label="{ <data> listv2 }", fontname="courier", style="rounded,bold"]
        a [label="{ <data> 1 | <ref>  }", width=1]
        b [label="{ <data> 2 | <ref>  }", width=1];
        c [label="{ <data> 3 }", width=0.5];
        d [label="{ <data> 4 | <ref> }", width=1, style=bold]
        a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
        b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
        d:ref:d -> a:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, headport=nsw]
        listv1:ref:listv1 -> a:data [arrowhead=vee, headport=nnw]
        listv2:ref:listv2 -> d:data [arrowhead=vee]
}

If you modified it further by using yet another variable assignment and the rest function:

user=> (def listv3 (cons 5 (rest listv1)))
#'user/listv3
user=> listv3
(5 2 3)

The underlying memory representation updates to:

digraph foo {
        rankdir=LR;
        node [shape=record];
        listv1 [label="{ <data> listv1 }", fontname="courier", style=rounded]
        listv2 [label="{ <data> listv2 }", fontname="courier", style=rounded]
        listv3 [label="{ <data> listv3 }", fontname="courier", style="rounded,bold"]
        a [label="{ <data> 1 | <ref>  }", width=1]
        b [label="{ <data> 2 | <ref>  }", width=1];
        c [label="{ <data> 3 }", width=0.5];
        d [label="{ <data> 4 | <ref> }", width=1]
        e [label="{ <data> 5 | <ref> }", width=1, style="bold"]
        a:ref:c -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, headport=nnw];
        b:ref:c -> c:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false];
        d:ref:d -> a:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, headport=nsw]
        e:ref:e -> b:data [arrowhead=vee, arrowtail=dot, dir=both, tailclip=false, headport=nnw]
        listv1:ref:listv1 -> a:data [arrowhead=vee, headport=nnw]
        listv2:ref:listv2 -> d:data [arrowhead=vee]
        listv3:ref:listv3 -> e:data [arrowhead=vee]
}

Note how Clojure data structures are not creating a deep copy in memory for every list, but rather appends nodes to a tree and preserves a functional interface by maintaining multiple references. Clojure only creates a full copy of a list if the tail of the list is not common. This property relaxation allows Clojure to save some time and memory.

You can also see the beauty of separating identity and state. The variables referencing the individual lists are the states of the data structure identity, and the state can be mutated without changing the identity. If you have an outdated reference to the identity (e.g. reassignment of variable name listv1), you don't encounter a race condition, where if you time it wrong, the result is completely invalid. You just get imperfect state (e.g. an outdated reference to listv1 while listv1 is undergoing reassignment), which can be refreshed by retrying the request. This makes concurrency with data structures much safer to work with and reason about, and can only take place if your programming framework does not complect identity and state.

There are academic implementations of persistent data structures in Python, such as pyrsistent or pyrthon, but after inspecting the source code for each, it does not appear the libraries take care to ensure thread safety through synchronization or atomicity (e.g. using compound operations like += or indexing directly into mutable dict types without locking), and use immutable constructs like list comprehensions. This implies the persistent interface of the library may break when multiple threads are used to access shared persistent state, and cannot complect persistence and immutability, meaning you cannot take advantage of these properties with regards to concurrency, as you may in Clojure. The difficulty again resides in using language primitives that do not support persistence as top-level guarantees.

Transactional References and Software Transactional Memory

The Clojure documentation on transactional references lists another way to safely and concurrently modify mutable state. Transactional references leverage software transactional memory, where writes are staged and verified before committing. This approach removes the need for developers to implement fine-grained locking, resulting in simpler and more intuitive concurrency control at the application layer. Clojure implements STM using multi-version concurrency control and snapshot isolation, which keep multiple copies of the same identity at different points in time, in order for requests at a particular point in time to see the database at the same point in time, in order to prevent write skew errors.

While promising, STM hasn't taken off for a number of reasons. Cașcaval et al. discuss the limitations of STM in the ACM after implementing transactional memory for IBM AIX; they mention the lack of determinism in debugging due to lack of idempotency during transaction abortion/retry cycles, the need for transactional memory paradigms to access non-transactional data and break strong atomicity, and performance penalties at low degrees of parallelism due to indexing and validating multiple snapshots, all of which prevent serious adoption of STM.

The most serious effort I've seen in Python relating to STM is pypy-stm, a branch of pypy that removes the global interpreter lock entirely. While the spec is not completely implemented (as noted in the documentation), one of the most interesting sections is transaction.atomic, which when used within a context manager, allows you to define long-running transactions (sections of Python code that will execute without releasing the GIL).

This distribution of pypy is far from production-ready. In particular, there are no consistent stable/nightly builds, only one distribution/architecture combination (64-bit Linux) is supported, the documentation is incomplete and outdated, returns on investment are fairly low as TM is still an open research topic, and building the source yourself requires, among other things, a special build of gcc. Given these difficulties, I can say transactional memory in Python is not currently practical (which is absolutely not a reflection of the core dev's efforts).

Conclusion

Unlike other concurrency paradigms, there is really no way to retrofit property adherence onto a language specification. It has to be baked into the language design from the outset. In this sense, Clojure, a spec-first and concurrency-first language, has several advantages that Python does not and likely will never have. Clojure also has the benefit of the JVM ecosystem, which some may underestimate in size and significance; one only needs to look at the OpenJDK census, recognize that many if not most of these hundreds of maintainers are paid six figures to work on OpenJDK during their day jobs, and compare it with pypy-stm's <5 primary maintainers, to realize the difference in scale of efforts and intellectual network and compounding abilities.

Perhaps the best way in order to use Python in a properties-conscious manner may be to write your own plugins and checkers for pylint or another static analysis tool. For example, if you write a pylint checker to avoid the use of compound operations like += and validated them in your CI/CD pipeline, you could restrict yourself to a subset of Python's functionality that respects the higher-level principles important to your application's fundamental design assumptions. You could always explicitly opt-in to the feature by commenting #pylint: disable=specific-rule and having the opt-in documented during code search. Of course, this assumes a deep understanding of Python, a stable deployment environment with low dimensionality deltas (e.g. not many new dependencies and C extensions), and a willingness to bear the additional operational expenses.

(Correction on 05/20/2019): The original version of this blog post misspelled Rich Hickey's name as "Rich Hickley"; this has been updated.