Python for loop: implementing multiprocessing correctly

Last posted: 2019-12-03 10:22:43


Question

The following for loop is part of an iterative simulation and is the main bottleneck in terms of computation time:

import numpy as np

n_int = 10

class Simulation(object):

    def loop(self):

        for itr in range(n_int):

            cols_red_list = []
            rows_list = list(range(2500))
            diff = np.random.uniform(-1, 1, (2500, 300))

            for row in rows_list:
                # index of the first negative value in this row
                col = next(idx for idx, val in enumerate(diff[row, :]) if val < 0)
                cols_red_list.append(col)
            print(len(cols_red_list))

sim1 = Simulation()
sim1.loop() 

I therefore tried to parallelize it with the multiprocessing package, hoping to reduce the computation time:

import numpy as np
from multiprocessing import Pool, cpu_count
from functools import partial

n_int = 10

def crossings(row, diff):
    # index of the first negative value in the given row
    return next(idx for idx, val in enumerate(diff[row, :]) if val < 0)

class Simulation(object):

    def loop(self):

        for itr in range(n_int):

            rows_list = list(range(2500))
            diff = np.random.uniform(-1, 1, (2500, 300))

            if __name__ == '__main__':
                num_of_workers = cpu_count()
                print('number of CPUs : ', num_of_workers)
                # a new pool is created on every iteration of the outer loop
                pool = Pool(num_of_workers)
                # diff is bound into the partial that is sent to the workers
                cols_red_list = pool.map(partial(crossings, diff=diff), rows_list)
                pool.close()
                print(len(cols_red_list))
            # some code.....

sim1 = Simulation()
sim1.loop()

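Unfortunately, the parallel version is much slower than the sequential code. So my questions are: am I using the multiprocessing package correctly in this particular example? And is there another way to parallelize the for loop above?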
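For comparison, here is a minimal sketch of one alternative that avoids worker processes entirely: the per-row search for the first negative value can be written as a single NumPy operation. A likely (unmeasured) contributor to the slowdown above is that a new pool is started on every iteration and the whole `diff` array travels to the workers along with the partial, while each task does very little work. The function name `first_negative_per_row` and the `-1` marker for rows without a negative value are assumptions made for this illustration and are not part of the original code, which would raise `StopIteration` for such a row.

```python
import numpy as np


def first_negative_per_row(diff):
    """Return, for each row of diff, the index of the first value below zero.

    Hypothetical helper for illustration. Rows that contain no negative
    value are marked with -1 (the generator-based code would raise
    StopIteration for such a row instead).
    """
    negative = diff < 0
    # argmax on a boolean array gives the position of the first True.
    first_idx = negative.argmax(axis=1)
    # argmax returns 0 for an all-False row, so mark those rows explicitly.
    first_idx[~negative.any(axis=1)] = -1
    return first_idx


if __name__ == "__main__":
    diff = np.random.uniform(-1, 1, (2500, 300))
    cols_red = first_negative_per_row(diff)

    # Cross-check against the row-by-row search from the question,
    # restricted to rows that actually contain a negative value.
    for row, col in zip(diff, cols_red):
        if col >= 0:
            assert col == next(i for i, v in enumerate(row) if v < 0)
    print(len(cols_red))
```

Whether this is faster than either version in the question would need to be timed on the real simulation data; the sketch only shows that the inner loop can be expressed without an explicit Python loop over rows.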

python multithreading loops multiprocessing