Python: How to separate array into chunks
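I'm quite new to Python programming.
I have an array that I'm trying to break into chunks. There seem to be multiple arrays inside my array (I think).
The output looks like this: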
[array([None, '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
'0', '0', '0', '0', '0', '0', '0', '0', None, None, None],
dtype=object)
array([None, None, '0', '0', '0', '1', '0', '0', '0', '0', None, None,
None, None, None, None, None, None, None, None, None, None, None,
None], dtype=object)
array([None, None, '0', '0', '0', '0', '0', '0', None, None, None, None,
None, None, None, None, None, None, None, None, None, None, None,
None], dtype=object)
This is a snippet of the printed output. Is there a way to display this output as a single array with 24 columns?
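If the printed object is a Python list of equal-length 1-D NumPy arrays, which is what the `[array(...) array(...) ...]` printout suggests, `np.vstack` stacks them into one 2-D array with one row per inner array and 24 columns. A minimal sketch, assuming that structure; the three rows are copied from the printout, and the names `rows` and `table` are hypothetical:

```python
import numpy as np

# Three sample rows copied from the printout; each has 24 entries.
rows = [
    np.array([None] + ['0'] * 20 + [None] * 3, dtype=object),
    np.array([None, None, '0', '0', '0', '1', '0', '0', '0', '0'] + [None] * 14, dtype=object),
    np.array([None, None] + ['0'] * 6 + [None] * 16, dtype=object),
]

# Stack the 1-D arrays on top of each other: shape (number_of_rows, 24).
table = np.vstack(rows)
print(table.shape)  # (3, 24)
```

`vstack` only works when every inner array has the same length; with ragged rows it raises an error instead of padding.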
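I created my array from a data frame that I built with 24 columns. I want to fill those columns with a for loop. The loop works, but it only fills the array.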
Here is some sample output from my data frame. I have 24 "status" columns and a column named "Account Opened Date".
This is the output of one of the status columns:
0 1
1 0
2 P
3 0
4 None
Name: status6, dtype: object
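The idea is to take the output of all 24 status columns and put it into new columns named "stat", which also run from 1 to 24. So the output of status 24 would go into stat 1, status 23 into stat 2, and so on.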
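The plain "status 24 to stat 1, status 23 to stat 2" mapping is a reversal of the column order, which pandas can do without a loop. A sketch, assuming the columns are literally named `status1` to `status24` (only `status6` is confirmed above) and covering only this reversal, not the month offset the loop in the question also applies; the data is made up:

```python
import pandas as pd

# Hypothetical column names: status1..status24 and stat1..stat24.
status_cols = [f'status{n}' for n in range(1, 25)]
stat_cols = [f'stat{n}' for n in range(1, 25)]

# Made-up frame: two accounts, each status cell just labels its row and column.
df = pd.DataFrame(
    [[f'r{r}s{n}' for n in range(1, 25)] for r in range(2)],
    columns=status_cols,
)

# Reverse the column order (status24 first), then relabel as stat1..stat24.
# Building a new frame from the raw NumPy values stops pandas aligning on names.
stat = pd.DataFrame(df[status_cols[::-1]].to_numpy(), columns=stat_cols, index=df.index)
df = df.join(stat)

print(df.loc[0, 'stat1'], df.loc[0, 'stat24'])  # r0s24 r0s1
```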
I saw this example of how to break a list into chunks, but I couldn't get the output I want: https://www.geeksforgeeks.org/break-list-chunks-size-n-python/
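For reference, the idiom that article is about splits one flat sequence into consecutive pieces of size n with slicing. A sketch with a hypothetical helper `chunks`; for the data here the rows already arrive as separate arrays, so chunking may not be the step that is missing:

```python
def chunks(seq, n):
    """Split seq into consecutive pieces of length n (the last may be shorter)."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]

flat = list(range(48))       # e.g. two rows of 24 values laid end to end
pieces = chunks(flat, 24)
print(len(pieces), pieces[1][0])  # 2 24
```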
from datetime import date
import pandas as pd
df = pd.read_sql(sql, cnxn)
# add stat1-24 into the data frame (empty strings for now)
df = df.join(pd.DataFrame({f'stat{n}': '' for n in range(1, 25)}, index=df.index))
# call status1-24 from the data frame and store the columns in an array
# (DataFrame.as_matrix was removed in pandas 1.0; to_numpy() replaces it)
status = df[df.columns[6:30]].to_numpy(copy=True)
# call stat1-24 from the data frame and store the columns in an array
stat = df[df.columns[31:55]].to_numpy(copy=True)
l = len(df)
# calculate difference in months between startDate and AccountOpenedDate
def monthly_diff(d2, startDate):
    return (d2.year - startDate.year) * 12 + d2.month - startDate.month
startDate = date(year=2017, month=7, day=1)
df['Difference_IN_Months'] = 0  # filled in row by row below
for x in range(l):
    d2 = df.loc[x, 'AccountOpenedDate'].date()
    # write with .loc; df['col'][x] = ... (chained assignment) may not update df
    df.loc[x, 'Difference_IN_Months'] = monthly_diff(d2, startDate)
    for i in range(0, 23):
        if 3 <= 24 - monthly_diff(d2, startDate) - i + 1 <= 24:
            stat[x, i] = status[24 - monthly_diff(d2, startDate) - i + 1]
        else:
            stat[x, i] = ''
print(stat[1, :])
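The month difference that the loop computes one row at a time can also be computed for the whole column at once through the `.dt` accessor. A sketch, assuming `AccountOpenedDate` is a datetime64 column (the loop calls `.date()` on its values, which suggests it is); the three dates are made up:

```python
import pandas as pd

start_year, start_month = 2017, 7  # startDate = 2017-07-01 in the question

df = pd.DataFrame({'AccountOpenedDate': pd.to_datetime(['2017-07-15', '2018-01-03', '2016-12-31'])})

# Same formula as monthly_diff: (year difference) * 12 + (month difference).
opened = df['AccountOpenedDate']
df['Difference_IN_Months'] = (opened.dt.year - start_year) * 12 + (opened.dt.month - start_month)

print(df['Difference_IN_Months'].tolist())  # [0, 6, -7]
```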
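I hope my code isn't too messy. Everything works except the part where my array "stat" is supposed to fill my data frame columns (stat1-stat24) with the relevant data.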
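In the code above, `stat` is a NumPy array pulled out of the data frame once, so assigning to `stat[x, i]` changes that array and not the `stat1` to `stat24` columns; the array has to be written back. Separately, `status[...]` with a single index selects a whole row of `status`, so each `stat[x, i]` cell would end up holding an entire 24-element array, which would explain the "arrays inside my array" printout. An element lookup such as `status[x, j]` is probably what was meant, though the right `j` depends on the intended mapping. A minimal sketch of the write-back step with a made-up two-row frame (only the `stat1` to `stat24` names come from the question):

```python
import pandas as pd

stat_cols = [f'stat{n}' for n in range(1, 25)]
df = pd.DataFrame('', index=range(2), columns=stat_cols)  # empty stat columns

# Pull the columns out as an array: an independent copy, not linked to df.
stat = df[stat_cols].to_numpy(copy=True)
stat[0, 0] = '1'                 # changes the array only
print(df.loc[0, 'stat1'] == '')  # True: df is unchanged

# After the loop, write the whole array back into the same columns.
df[stat_cols] = stat
print(df.loc[0, 'stat1'])        # 1
```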
This is the best I could make of your code and question.
import pandas as pd
import numpy as np
l = [np.array([None, '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0', '0',
               '0', '0', '0', '0', '0', '0', '0', '0', None, None, None],
              dtype=object),
     np.array([None, None, '0', '0', '0', '1', '0', '0', '0', '0', None, None,
               None, None, None, None, None, None, None, None, None, None, None,
               None], dtype=object),
     np.array([None, None, '0', '0', '0', '0', '0', '0', None, None, None, None,
               None, None, None, None, None, None, None, None, None, None, None,
               None], dtype=object)]
# one row of empty strings with columns stat1..stat24
d = {f'stat{n}': '' for n in range(1, 25)}
df = pd.DataFrame(d, index=[0])
print(df)
# append each nested array as a new row at the end of the frame
for i in l:
    df.loc[len(df)] = i
print(df)
Output:
stat1 stat2 stat3 stat4 stat5 stat6 stat7 stat8 stat9 ... stat16 stat17 stat18 stat19 stat20 stat21 stat22 stat23 stat24
0 ...
[1 rows x 24 columns]
stat1 stat2 stat3 stat4 stat5 stat6 stat7 stat8 stat9 ... stat16 stat17 stat18 stat19 stat20 stat21 stat22 stat23 stat24
0 ...
1 None 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 None None None
2 None None 0 0 0 1 0 0 0 ... None None None None None None None None None
3 None None 0 0 0 0 0 0 None ... None None None None None None None None None
[4 rows x 24 columns]
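Appending with `df.loc[len(df)] = ...` keeps the placeholder row 0 of empty strings, as the output shows. If the list of arrays is all there is, passing it to the `DataFrame` constructor gives one row per array without that extra row. A sketch reusing the three sample rows; the variable name `rows` is hypothetical:

```python
import numpy as np
import pandas as pd

rows = [
    np.array([None] + ['0'] * 20 + [None] * 3, dtype=object),
    np.array([None, None, '0', '0', '0', '1', '0', '0', '0', '0'] + [None] * 14, dtype=object),
    np.array([None, None] + ['0'] * 6 + [None] * 16, dtype=object),
]

# One row per inner array, columns named stat1..stat24.
df = pd.DataFrame(np.vstack(rows), columns=[f'stat{n}' for n in range(1, 25)])
print(df.shape)  # (3, 24)
```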
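As I understand your sample data, your array contains "array rows", and you want to turn these "nested rows" into columns. If that is the case, you can do the following (I assume the original array is stored in array_to_split):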
import numpy as np
# Create a (24 x number_of_nested_arrays) float array
array_split_to_columns = np.zeros((len(array_to_split[0]), len(array_to_split)))
# Fill each column with one nested array; None becomes nan and '0' becomes 0.0
for column in range(len(array_to_split)):
    array_split_to_columns[:, column] = array_to_split[column]
In this case, the array_split_to_columns variable looks like this:
[[nan nan nan]
[ 0. nan nan]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. nan]
[ 0. 0. nan]
[ 0. nan nan]
[ 0. nan nan]
[ 0. nan nan]
[ 0. nan nan]
[ 0. nan nan]
[ 0. nan nan]
[ 0. nan nan]
[ 0. nan nan]
[ 0. nan nan]
[ 0. nan nan]
[ 0. nan nan]
[nan nan nan]
[nan nan nan]
[nan nan nan]]
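The column-by-column loop can also be written as stack-then-transpose. A sketch, assuming `array_to_split` is the list of equal-length object arrays from the question; `astype(float)` turns `None` into `nan` and the digit strings into floats, which matches the output above. It would raise `ValueError` on non-numeric text such as the `'P'` in the `status6` sample, so such values would need handling first.

```python
import numpy as np

array_to_split = [
    np.array([None] + ['0'] * 20 + [None] * 3, dtype=object),
    np.array([None, None, '0', '0', '0', '1', '0', '0', '0', '0'] + [None] * 14, dtype=object),
    np.array([None, None] + ['0'] * 6 + [None] * 16, dtype=object),
]

# Stack into shape (3, 24), convert to float (None -> nan), then transpose to (24, 3).
array_split_to_columns = np.vstack(array_to_split).astype(float).T
print(array_split_to_columns.shape)  # (24, 3)
```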
I hope this helps you fill your Pandas data frame. Feel free to ask if you have any questions :)