Python: How to separate array into chunks

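I'm new to Python programming.

I have an array that I'm trying to break into chunks. My array seems to contain multiple arrays (I think).

The output looks like this: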

[array([None, '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
       '0', '0', '0', '0', '0', '0', '0', '0', None, None, None],
      dtype=object)
 array([None, None, '0', '0', '0', '1', '0', '0', '0', '0', None, None,
       None, None, None, None, None, None, None, None, None, None, None,
       None], dtype=object)
 array([None, None, '0', '0', '0', '0', '0', '0', None, None, None, None,
       None, None, None, None, None, None, None, None, None, None, None,
       None], dtype=object)

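This is a snippet of the printed output. Is there a way to display this output as a single array with 24 columns?

I created my array from a data frame I built that contains 24 columns. I want to fill those columns using a for loop. The loop works, but it only fills the array.

Here is some sample output from my data frame. I have 24 "status" columns and a column named "Account Opened Date".

This is the output of one of the status columns: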

0       1
1       0
2       P
3       0
4    None
Name: status6, dtype: object 

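The idea is to take the output of all 24 status columns and put it into new columns named "stat", which also number 24. So the output of status 24 would be filled into stat 1, and status 23 would be filled into stat 2.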
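For illustration, the reversed mapping described above (status 24 into stat 1, status 23 into stat 2, and so on) could be sketched as follows. This is a minimal sketch, not the code from the question: it uses a hypothetical miniature frame with 3 status columns instead of 24, and it ignores the month offset that the real loop applies.

```python
import pandas as pd

# Hypothetical miniature frame: 3 status columns instead of 24.
n = 3
df = pd.DataFrame({
    'status1': ['1', '0'],
    'status2': ['0', 'P'],
    'status3': ['P', None],
})

status_cols = [f'status{k}' for k in range(1, n + 1)]
stat_cols = [f'stat{k}' for k in range(1, n + 1)]

# Reverse the status columns so that status{n} feeds stat1,
# status{n-1} feeds stat2, and so on.
reversed_values = df[status_cols[::-1]].to_numpy()
stat_df = pd.DataFrame(reversed_values, columns=stat_cols, index=df.index)

# Attach the new stat columns to the original frame.
df = df.join(stat_df)
print(df)
```

Building the stat columns in one step and joining them avoids having to write values back from a separate NumPy array into the data frame afterwards.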

I saw this example of how to break an array into chunks, but I couldn't get the output I wanted. https://www.geeksforgeeks.org/break-list-chunks-size-n-python/
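For reference, the general idea of breaking a flat sequence into chunks of size n can be sketched like this. The helper name `chunk` and the sample data are assumptions made for illustration; they are not taken from the linked page or from the code in this question.

```python
import numpy as np


def chunk(seq, n):
    """Yield successive slices of length n from seq (the last may be shorter)."""
    for start in range(0, len(seq), n):
        yield seq[start:start + n]


# Plain Python list, chunk size 4.
flat = list(range(10))
chunks = list(chunk(flat, 4))
print(chunks)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]

# With NumPy, when the length is an exact multiple of the chunk size,
# reshape gives one row per chunk.
grid = np.arange(48).reshape(-1, 24)
print(grid.shape)  # (2, 24)
```

Note that `reshape` raises an error if the length is not an exact multiple of the chunk size, whereas the generator simply yields a shorter final chunk.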

from datetime import date
import pandas as pd

df = pd.read_sql(sql,cnxn)

#add stat1-24 into the data frame
df = df.join(pd.DataFrame({
        'stat1':'','stat2':'','stat3':'','stat4':'',
        'stat5':'','stat6':'','stat7':'','stat8':'',
        'stat9':'','stat10':'','stat11':'','stat12':'',
        'stat13':'','stat14':'','stat15':'','stat16':'',
        'stat17':'','stat18':'','stat19':'','stat20':'',
        'stat21':'','stat22':'','stat23':'','stat24':'',},index=df.index))

#call status1-24 from the data frame and store the columns in an array
status = df.as_matrix(columns=df.columns[6:30])

#call stat1-24 from the data frame and store the columns in an array
stat = df.as_matrix(columns=df.columns[31:55])

l = len(df)

#calculate difference in months between startDate and AccountOpenedDate
def monthly_diff(d2,startDate):
    return(d2.year - startDate.year) * 12 + d2.month - startDate.month

startDate = date(year=2017, month = 7, day = 1)

df['Difference_IN_Months'] = df['AccountOpenedDate']


for x in range(l):
    d2_1=df['AccountOpenedDate'][x]
    d2=d2_1.date()
    df['Difference_IN_Months'][x]= monthly_diff(d2,startDate)
    for i in range(0,23):
        if 3 <= 24 - monthly_diff(d2,startDate) - i + 1 <=24:    
            stat[x,i] = status[24 - monthly_diff(d2,startDate) - i + 1] 
        else: stat[x,i]=''


print(stat[1,:])

I hope my code isn't too confusing. Everything works except the part where my array "stat" should fill my data frame columns (stat1-stat24) with the relevant data.

This is the best I could make of your code and question.

import pandas as pd
import numpy as np

# Sample data: a list of object arrays, each with 24 entries
l=[np.array([None, '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
       '0', '0', '0', '0', '0', '0', '0', '0', None, None, None],
      dtype=object),
 np.array([None, None, '0', '0', '0', '1', '0', '0', '0', '0', None, None,
       None, None, None, None, None, None, None, None, None, None, None,
       None], dtype=object),
 np.array([None, None, '0', '0', '0', '0', '0', '0', None, None, None, None,
       None, None, None, None, None, None, None, None, None, None, None,
       None], dtype=object)]

# Start with a single row of empty strings in columns stat1..stat24
d = {f'stat{n}': '' for n in range(1, 25)}
df = pd.DataFrame(d, index=[0])

print(df)
# Append each nested array as a new row of the data frame
for i in l:
    df.loc[len(df)] = i
print(df)

Output:

  stat1 stat2 stat3 stat4 stat5 stat6 stat7 stat8 stat9  ... stat16 stat17 stat18 stat19 stat20 stat21 stat22 stat23 stat24
0                                                        ...

[1 rows x 24 columns]


  stat1 stat2 stat3 stat4 stat5 stat6 stat7 stat8 stat9  ... stat16 stat17 stat18 stat19 stat20 stat21 stat22 stat23 stat24
0                                                        ...
1  None     0     0     0     0     0     0     0     0  ...      0      0      0      0      0      0   None   None   None
2  None  None     0     0     0     1     0     0     0  ...   None   None   None   None   None   None   None   None   None
3  None  None     0     0     0     0     0     0  None  ...   None   None   None   None   None   None   None   None   None

[4 rows x 24 columns]
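As a possible alternative to appending row by row, the nested arrays can be stacked into one 2-D array and passed to the DataFrame constructor in a single call. This is only a sketch under assumptions: the names `rows`, `stat_columns`, `stacked` and `stat_df` are made up for illustration, the sample data is shortened to two arrays, and the frame has no initial empty row.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two object arrays with 24 entries each.
rows = [
    np.array([None] + ['0'] * 20 + [None] * 3, dtype=object),
    np.array([None, None, '0', '0', '0', '1'] + ['0'] * 4 + [None] * 14,
             dtype=object),
]

# Column names stat1 .. stat24.
stat_columns = [f'stat{n}' for n in range(1, 25)]

# np.vstack stacks the 1-D arrays into one 2-D array of shape (2, 24).
stacked = np.vstack(rows)

# One row per nested array, one column per stat.
stat_df = pd.DataFrame(stacked, columns=stat_columns)
print(stat_df.shape)
```

Growing a data frame with `df.loc[len(df)] = ...` copies data on every step, so building it once from a stacked array tends to scale better when there are many rows.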

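As I understand from your sample data, your array contains "array rows", and you want to convert these "nested rows" into columns. If that is the case, you can do the following (I assume the original array is stored in array_to_split):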

import numpy as np

# Create a float array of shape (24, number_of_nested_arrays)
array_split_to_columns = np.zeros((len(array_to_split[0]), len(array_to_split)))

# Then fill it column by column with the data of the nested arrays.
# Assigning into a float array converts None to nan and '0' to 0.0.
for column in range(len(array_to_split)):
    array_split_to_columns[:, column] = array_to_split[column]

In this case, the array_split_to_columns variable looks like this:

[[nan nan nan]
 [ 0. nan nan]
 [ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  1.  0.]
 [ 0.  0.  0.]
 [ 0.  0.  0.]
 [ 0.  0. nan]
 [ 0.  0. nan]
 [ 0. nan nan]
 [ 0. nan nan]
 [ 0. nan nan]
 [ 0. nan nan]
 [ 0. nan nan]
 [ 0. nan nan]
 [ 0. nan nan]
 [ 0. nan nan]
 [ 0. nan nan]
 [ 0. nan nan]
 [ 0. nan nan]
 [nan nan nan]
 [nan nan nan]
 [nan nan nan]]
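The same column layout can probably also be produced without an explicit loop. The sketch below uses a shortened, hypothetical stand-in for array_to_split (two nested arrays of three entries each) and relies on `np.column_stack` followed by a cast to float.

```python
import numpy as np

# Hypothetical stand-in for array_to_split: equal-length object arrays.
array_to_split = [
    np.array([None, '0', '1'], dtype=object),
    np.array([None, None, '0'], dtype=object),
]

# Stack the nested arrays as columns, then convert to float;
# None becomes nan and numeric strings become floats.
array_split_to_columns = np.column_stack(array_to_split).astype(float)

print(array_split_to_columns.shape)
```

The cast to float only works while every non-None entry is a numeric string. A status value such as 'P' would raise an error, in which case the `.astype(float)` step can be dropped to keep the object dtype.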

I hope this helps you populate the Pandas data frame. If you have any questions, feel free to ask :)