Order of subprocess execution and its impact on the atomicity of operations

Question

I'm learning the Python multiprocessing module and I found this example (this is a slightly modified version):

#!/usr/bin/env python
import multiprocessing as mp
import random
import string
import time

# Define an output queue
output = mp.Queue()

# Define an example function
def rand_string(length, output):
    """ Generates a random string of numbers, lower- and uppercase chars. """
    time.sleep(1)
    rand_str = ''.join(random.choice(
                    string.ascii_lowercase
                    + string.ascii_uppercase
                    + string.digits)
               for i in range(length))
    result = (len(rand_str), rand_str)
    print(result)
    time.sleep(1)
    output.put(result)


def queue_size(queue):
    size = int(queue.qsize())
    print(size)


# Set up a list of processes that we want to run
processes = [mp.Process(target=rand_string, args=(x, output)) for x in range(1,10)]


# Run processes
for p in processes:
    p.start()


# Exit the completed processes
for p in processes:
    p.join()


# Get process results from the output queue
results = [output.get() for p in processes]
print(results)

The output is as follows:

(3, 'amF')
(1, 'c')
(6, '714CUg')
(4, '10Qg')
(5, 'Yns6h')
(7, 'wsSXj3Z')
(9, 'KRcDTtVZA')
(2, 'Qy')
(8, '50LpMzG9')
[(3, 'amF'), (1, 'c'), (6, '714CUg'), (4, '10Qg'), (5, 'Yns6h'), (9, 'KRcDTtVZA'), (2, 'Qy'), (7, 'wsSXj3Z'), (8, '50LpMzG9')]

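I know that the processes are not invoked in the order in which they were created (using processes = [mp.Process(target=rand_string, args=(x, output)) for x in range(1,10)]); this is mentioned in the referenced article. What I don't understand (or am not sure I understand correctly) is why the order of results does not match the order in which print writes each result to STDOUT. My understanding is that these three operations are not atomic (by which I mean they can be separated by a process switch):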

    print(result)
    time.sleep(1)
    output.put(result)

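Basically, what happens here is that right after a process prints its result to STDOUT, the CPU is switched to another process, which then writes its own result. Something like this: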

Time 
------------------------------------------------------------------------------------------------------------------>
Process1: print results |               |                                    | time.sleep(1) | output.put(result) |
Process2:               | print results | time.sleep(1) | output.put(result) |               |                    |

In this case the output on STDOUT will be:

(1, 'c')
(2, 's5')

But the actual content of results will be:

[(2, 's5'), (1, 'c')]

For the same reason, the processes are not started in the order in which they were created.

Am I right?

Answer 1

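Yes, you are right: processes do not execute in lockstep. Modern OSes use sophisticated algorithms to decide when to switch from one process to another, but these algorithms give no process any kind of guarantee about when it runs relative to another process of the same priority (and usually only limited guarantees for processes of different priorities).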

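Typically, a process is switched out either when it blocks waiting on the OS or when its current time slice (based on a hardware tick interrupt) expires. These ticks occur periodically, but the amount of time a foreground task receives during a tick varies widely, depending on what is going on in the background and on when the process was switched in (possibly because another process was switched out when it blocked on I/O).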

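If you rerun your test with a different load on the system, you are likely to get different results. (And the more work each process has to do, the more likely you are to see different results.)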
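Since the scheduler gives no ordering guarantee, one option is to stop depending on arrival order and impose the order explicitly after collecting the results. The following is only a minimal sketch under that assumption: collect_sorted is a hypothetical helper name, not part of the original example, and it sorts on the length field that each result already carries.

```python
import multiprocessing as mp
import random
import string


def rand_string(length, output):
    """Put (length, random string) on the queue; arrival order is not guaranteed."""
    alphabet = string.ascii_letters + string.digits
    rand_str = ''.join(random.choice(alphabet) for _ in range(length))
    output.put((length, rand_str))


def collect_sorted(n):
    """Hypothetical helper: run n workers and return their results sorted by length."""
    output = mp.Queue()
    processes = [mp.Process(target=rand_string, args=(x, output))
                 for x in range(1, n + 1)]
    for p in processes:
        p.start()
    # Take exactly one result per worker, in whatever order they arrive.
    results = [output.get() for _ in processes]
    for p in processes:
        p.join()
    # Impose a deterministic order that does not depend on scheduling.
    return sorted(results, key=lambda r: r[0])


if __name__ == "__main__":
    print(collect_sorted(9))
```

The random strings differ on every run, but the lengths in the returned list should always come back as 1 through n, however the processes were interleaved.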

Answer 2

You are right. The operating system kernel can and will perform context switches however and whenever it pleases. The Python interpreter (or Just-In-Time compiler or whatever) is a userspace program and is therefore completely under the kernel's control.

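This "kernel/user slavery" is thus passed down "from father to child"; in other words, the Python program runs under the control of the interpreter, which in turn runs under the control of the kernel.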

Therefore, the only way for a userspace program (such as a Python application) to ensure synchronization is to use locking primitives, such as mutexes or other synchronization primitives.
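As a rough illustration of the locking idea, the sketch below wraps the "print" and "put" steps in one mp.Lock so that no other worker can run between them. The names worker and run_locked and the shared sequence counter are assumptions made for this example, not part of the question's code. It also uses mp.SimpleQueue on the assumption that its put() writes to the underlying pipe before returning. mp.Queue hands items to a background feeder thread, and its documentation notes that items enqueued by different processes may arrive out of order.

```python
import multiprocessing as mp


def worker(ident, lock, counter, output):
    """Print and enqueue a result as a single critical section."""
    with lock:
        # Only one process at a time can be between these lines.
        seq = counter.value
        counter.value = seq + 1
        print((seq, ident), flush=True)   # stands in for the original print step
        output.put((seq, ident))          # written to the pipe while the lock is held


def run_locked(n):
    """Hypothetical helper: start n workers and return results in arrival order."""
    lock = mp.Lock()
    counter = mp.Value('i', 0, lock=False)   # guarded by `lock`, not by its own lock
    output = mp.SimpleQueue()
    processes = [mp.Process(target=worker, args=(i, lock, counter, output))
                 for i in range(n)]
    for p in processes:
        p.start()
    # Read while the workers run so that a full pipe cannot stall them.
    results = [output.get() for _ in processes]
    for p in processes:
        p.join()
    return results


if __name__ == "__main__":
    print(run_locked(9))
```

Which worker acquires the lock first is still up to the scheduler, so the ident values can appear in any order. What the lock is meant to ensure is only that the printed order and the order in the returned list agree with each other.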

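Now, in the real world, what usually causes a context switch when writing to a file (such as stdout, which is where print writes by default) is that a lot of expensive operations have to be carried out, such as system calls, complex memory remappings and other black magic, and loopback mechanisms (such as when stdout refers to a pseudo-terminal, which is the most common case nowadays).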
