Vectorizing a for loop with a pandas dataframe

Question

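I'm trying to do a project for my physics class where we're supposed to simulate the motion of charged particles. We're supposed to generate their positions and charges randomly, but the positively charged particles have to be in one region and the negatively charged particles everywhere else. Right now, as a proof of concept, I'm only trying 10 particles, but the final project will have at least 1000.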

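My idea was to create a dataframe whose first column holds the randomly generated charges, then run a loop that checks each charge value and writes the generated position into the next three columns of the same dataframe.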

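I tried a simple for loop that iterates over the rows and fills in the data as it goes, but I ran into `IndexingError: Too many indexers`. I also want this to run as efficiently as possible, so that it doesn't slow down too much when I scale up the number of particles.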

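I also want to vectorize the calculation of each particle's motion, since it depends on the positions of every other particle, and doing it with plain loops will take a lot of compute time.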

Any vectorization optimizations or offloading to the GPU would be very helpful, thanks.
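For the all-pairs interaction, one common vectorization is NumPy broadcasting: build an (N, N, 3) array of separation vectors r_i - r_j and reduce over the second axis. The sketch below assumes Coulomb's law, F_i = k q_i sum_j q_j (r_i - r_j) / |r_i - r_j|^3; the function name `pairwise_coulomb_forces`, the constant `k` and the `softening` term (which keeps nearly coincident particles from producing infinite forces) are illustrative assumptions, not part of the question.

```python
import numpy as np

def pairwise_coulomb_forces(positions, charges, k=1.0, softening=1e-3):
    """Net Coulomb force on every particle from every other particle.

    positions: (N, 3) array, charges: (N,) array.
    k and softening are illustrative constants (assumptions), not physical values.
    """
    # diff[i, j] = r_i - r_j, shape (N, N, 3), built by broadcasting
    diff = positions[:, np.newaxis, :] - positions[np.newaxis, :, :]
    # squared distances plus a small softening term to avoid division by zero
    dist_sq = np.sum(diff**2, axis=-1) + softening**2
    inv_dist3 = dist_sq ** -1.5
    # a particle exerts no force on itself
    np.fill_diagonal(inv_dist3, 0.0)
    # F_i = k * q_i * sum_j q_j * (r_i - r_j) / |r_i - r_j|^3
    qq = charges[:, np.newaxis] * charges[np.newaxis, :]
    return k * np.sum((qq * inv_dist3)[:, :, np.newaxis] * diff, axis=1)
```

Memory grows as N²: for 1,000 particles the separation array alone holds 1000 × 1000 × 3 float64 values, about 24 MB, which is still manageable. Because the function uses only array operations, it may port to a GPU array library such as CuPy, which mirrors much of the NumPy API, but that has not been tested here.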

# In[1]:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d


# In[2]:


num_points=10
df_position = pd.DataFrame(np.empty((num_points,4)),columns=['Charge','X','Y','Z'])


# In[3]:


charge = np.array([np.random.choice(2,num_points)])

df_position.iloc[:,0]=np.where(df_position["Charge"]==0,-1,1)


# In[4]:


def positive():
    return np.random.uniform(low=0, high=5)
def negative():
    return np.random.uniform(low=5, high=10)


# In[5]:


for row in df_position.itertuples(index=True,name='Charge'):
    if(getattr(row,"Charge")==-1):
        df_position.iloc[row,1]=positive()
        df_position.iloc[row,2]=positive()
        df_position.iloc[row,3]=positive()
    else:
        df_position.iloc[row,1]=negative()
        #this is where I would get the IndexingError and would like to optimize this portion
        df_position.iloc[row,2]=negative()
        df_position.iloc[row,3]=negative()



df_position.iloc[:,0]=np.where(df_position["Charge"]==0,-1,1)


# In[6]:


ax=plt.axes(projection='3d')
ax.set_xlim(0, 10); ax.set_ylim(0, 10); ax.set_zlim(0,10);
xdata=df_position.iloc[:,1]
ydata=df_position.iloc[:,2]
zdata=df_position.iloc[:,3]
chargedata=df_position.iloc[:11,0]
colors = np.where(df_position["Charge"]==1,'r','b')
ax.scatter3D(xdata,ydata,zdata,c=colors,alpha=1)
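The `IndexingError` in cell 5 most likely comes from `df_position.iloc[row, 1]`: `itertuples` yields a namedtuple, so `row` is the whole record rather than an integer position. Inside the loop the row label is `row.Index`, so `df_position.at[row.Index, 'X'] = ...` is the label-based form. The loop can also be dropped entirely by choosing each particle's sampling range with `np.where`. In the sketch below, the function name `make_particles` and its `seed` parameter are hypothetical; the ranges follow the answer's convention (positive charges in [0, 5), negative charges in [5, 10)), whereas the question's loop pairs `positive()` with `Charge == -1`, so swap the two bounds if that pairing was intended.

```python
import numpy as np
import pandas as pd

def make_particles(num_points, seed=None):
    """Random charges (+1/-1) with positions drawn from a charge-dependent cube.

    Assumption: positive charges in [0, 5), negative charges in [5, 10).
    """
    rng = np.random.default_rng(seed)
    charge = rng.choice([-1, 1], size=num_points)
    # per-particle lower bound: 0 for positive, 5 for negative, shape (N, 1)
    low = np.where(charge == 1, 0.0, 5.0)[:, np.newaxis]
    # draw all N x 3 coordinates in one call, then shift each row into its region
    xyz = low + rng.uniform(0.0, 5.0, size=(num_points, 3))
    df = pd.DataFrame(xyz, columns=['X', 'Y', 'Z'])
    df.insert(0, 'Charge', charge)
    return df
```

Since the per-particle work happens inside NumPy rather than in a Python loop, the same call should scale to thousands of particles without the row-by-row overhead.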

Edit: this is what I want the resulting dataframe to look like:

Charge  X   Y   Z
-1
 1
-1
-1
 1

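with each charge's initial coordinates in the respective columns. This will be a 3D dataframe, because I need to track all of their new positions after every time step so that I can animate the motion. Every layer will have exactly the same format.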
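pandas no longer has a 3D `Panel` type, so a "3D dataframe" is usually represented either as a NumPy array of shape (n_steps, N, 3) or as a long-format DataFrame indexed by (step, particle). The sketch below keeps the time loop in Python (each step depends on the previous one) but vectorizes every step over all particles. It repeats a compact version of the broadcasting force calculation so the block runs on its own; the function names, the semi-implicit Euler integrator, and the `mass`, `dt` and `softening` values are all assumptions for illustration.

```python
import numpy as np
import pandas as pd

def coulomb_forces(pos, q, k=1.0, softening=1e-2):
    """Net Coulomb force on each particle (broadcasting over all pairs)."""
    diff = pos[:, np.newaxis, :] - pos[np.newaxis, :, :]        # (N, N, 3)
    inv_d3 = (np.sum(diff**2, axis=-1) + softening**2) ** -1.5  # (N, N)
    np.fill_diagonal(inv_d3, 0.0)                               # no self-force
    weights = q[:, np.newaxis] * q[np.newaxis, :] * inv_d3      # (N, N)
    return k * np.sum(weights[:, :, np.newaxis] * diff, axis=1)  # (N, 3)

def simulate(pos0, vel0, q, mass=1.0, dt=1e-3, n_steps=100):
    """Semi-implicit Euler integration; returns an (n_steps + 1, N, 3) array."""
    traj = np.empty((n_steps + 1,) + pos0.shape)
    traj[0] = pos0
    pos = pos0.astype(float)  # astype copies, so pos0 itself is not modified
    vel = vel0.astype(float)
    for step in range(1, n_steps + 1):
        # the time loop is sequential, but each step is vectorized over particles
        vel += coulomb_forces(pos, q) / mass * dt
        pos += vel * dt
        traj[step] = pos
    return traj

def trajectory_to_frame(traj, q):
    """Long-format DataFrame indexed by (step, particle)."""
    n_frames, n_particles, _ = traj.shape
    index = pd.MultiIndex.from_product(
        [range(n_frames), range(n_particles)], names=['step', 'particle']
    )
    # C-order reshape lists all particles of step 0, then step 1, and so on
    df = pd.DataFrame(traj.reshape(-1, 3), index=index, columns=['X', 'Y', 'Z'])
    df['Charge'] = np.tile(q, n_frames)
    return df
```

`df.loc[step]` then returns one frame with the same `X`, `Y`, `Z` layout as the initial dataframe, and `traj[step, :, 0]`, `traj[step, :, 1]`, `traj[step, :, 2]` can be passed to the question's `ax.scatter3D` call to draw each frame of an animation.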

Answer 1

Some code for creating the dataframe:

import numpy as np
import pandas as pd

num_points = 1_000

# uniform distribution of int, not sure it is the best one for your problem
# positive_point = np.random.randint(0, num_points)
positive_point = int(num_points / 100 * np.random.randn() + num_points / 2)
negative_point = num_points - positive_point

# the charge (+1 / -1) is stored as the index of each frame
positive_df = pd.DataFrame(
    np.random.uniform(0.0, 5.0, size=[positive_point, 3]), index=[1] * positive_point, columns=['X', 'Y', 'Z']
)
negative_df = pd.DataFrame(
    np.random.uniform(5.0, 10.0, size=[negative_point, 3]), index=[-1] * negative_point, columns=['X', 'Y', 'Z']
)

df = pd.concat([positive_df, negative_df])

This is already quite fast for 1,000 or even 1,000,000 points.
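To get the exact layout from the question's edit (a `Charge` column followed by `X`, `Y`, `Z`), the charge that this answer stores in the index can be moved into a column. The sketch reuses the answer's construction; the final shuffle is an optional assumption, only useful if positive and negative particles should not be grouped by row order.

```python
import numpy as np
import pandas as pd

num_points = 1_000
positive_point = int(num_points / 100 * np.random.randn() + num_points / 2)
negative_point = num_points - positive_point

positive_df = pd.DataFrame(
    np.random.uniform(0.0, 5.0, size=[positive_point, 3]), index=[1] * positive_point, columns=['X', 'Y', 'Z']
)
negative_df = pd.DataFrame(
    np.random.uniform(5.0, 10.0, size=[negative_point, 3]), index=[-1] * negative_point, columns=['X', 'Y', 'Z']
)
df = pd.concat([positive_df, negative_df])

# move the charge from the index into a 'Charge' column (the question's layout)
df = df.rename_axis('Charge').reset_index()
# optional: shuffle the rows so positive and negative particles are interleaved
df = df.sample(frac=1).reset_index(drop=True)
```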

Edit: in my first answer I completely missed a large part of the question. This new one should fit better.

Second edit: for the number of positive points I now use a better distribution than a uniform integer distribution.
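If each particle is meant to be positive independently with probability 1/2, the number of positive particles follows a binomial distribution, which NumPy can sample directly. This is an alternative under that assumption, not what the answer uses. Unlike the normal approximation, it always returns a count between 0 and `num_points`; for 1,000 particles its standard deviation is sqrt(1000 × 0.25) ≈ 15.8, versus `num_points / 100` = 10 in the answer's code.

```python
import numpy as np

rng = np.random.default_rng()
num_points = 1_000
# assumption: each particle is independently positive with probability 0.5
positive_point = int(rng.binomial(num_points, 0.5))
negative_point = num_points - positive_point
```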


Tags: numpy, vectorization, dataframe, python-3.x, pandas