Python List Comprehension - numpy array
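When I use a number greater than 9, the NumPy array created from my list comprehension has the wrong shape. Please help me correct it and explain why this happens. The code is below.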
import pandas as pd
import numpy as np
sep_payment = pd.DataFrame({"Creditor":['Axis','RBL_CC','KOTAK_PL','KOTAK_CC','Cashe','SBI','HDFC_Jumbo','HDFC_CC','SCB','Tata Capital','Flex_Salary'],"Priority":[1,2,3,4,5,6,7,8,9,10,11],"Payment_Status":['Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending'],"Credit_Status":['Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending','Pending'],"Payment_Date":['-','-','-','-','-','-','-','-','-','-','-'],"Time Taken in Days":[2,5,5,2,5,2,5,5,5,5,2]})
# List comprehension Looped with range 9 NO ERRORS | Output (9, 6)
subb= sep_payment.iloc[1].to_string(index=False).split()
subb
subb2 = [sep_payment.iloc[i].to_string(index=False).split() for i in range(9)]
subb2
data= np.array(subb2)
print(data.shape)
# List comprehension Looped with range 10 ERROR in THE SHAPE printed | Output (10,)
subb= sep_payment.iloc[1].to_string(index=False).split()
subb
subb2 = [sep_payment.iloc[i].to_string(index=False).split() for i in range(10)]
subb2
data= np.array(subb2)
print(data.shape)
The issue you are facing is caused by the space in the creditor name Tata Capital, which appears in the data for that row.
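As a minimal sketch of the splitting behaviour, the strings below are simplified stand-ins (an assumption, not the exact `to_string()` output, which also pads values with spaces) for one row rendered as text. Calling `str.split()` with no arguments splits on any run of whitespace, so a value that contains a space yields one extra token.

```python
# Simplified stand-ins for a row rendered as text, one value per line.
# These strings are illustrative assumptions, not real to_string() output.
single_word = "SCB\n9\nPending"
two_words = "Tata Capital\n10\nPending"

# split() with no arguments breaks on spaces and newlines alike.
print(single_word.split())  # ['SCB', '9', 'Pending']
print(two_words.split())    # ['Tata', 'Capital', '10', 'Pending']
```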
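Part 1:
Your first snippet splits the string for each row into 6 parts, because no space appears inside any of the values in the 6 columns. This produces a NumPy array of shape (9, 6), with 9 rows and 6 columns as expected.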
subb2 = [sep_payment.iloc[i].to_string(index=False).split() for i in range(9)]
subb2
[['Axis', '1', 'Pending', 'Pending', '-', '2'],
['RBL_CC', '2', 'Pending', 'Pending', '-', '5'],
['KOTAK_PL', '3', 'Pending', 'Pending', '-', '5'],
['KOTAK_CC', '4', 'Pending', 'Pending', '-', '2'],
['Cashe', '5', 'Pending', 'Pending', '-', '5'],
['SBI', '6', 'Pending', 'Pending', '-', '2'],
['HDFC_Jumbo', '7', 'Pending', 'Pending', '-', '5'],
['HDFC_CC', '8', 'Pending', 'Pending', '-', '5'],
['SCB', '9', 'Pending', 'Pending', '-', '5']]
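Part 2:
In the second snippet, however, the space in Tata Capital means that every other row splits into 6 parts while that one row splits into 7. When you convert this to a NumPy array, you get a one-dimensional array of shape (10,): 10 elements and no column axis, because each element is a list object and counts as a single item.
This happens because an ndarray in NumPy must be rectangular: every row along an axis needs the same number of elements. Note that NumPy 1.24 and later raise a ValueError for ragged input like this unless dtype=object is passed.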
subb2 = [sep_payment.iloc[i].to_string(index=False).split() for i in range(10)]
subb2
[['Axis', '1', 'Pending', 'Pending', '-', '2'],
['RBL_CC', '2', 'Pending', 'Pending', '-', '5'],
['KOTAK_PL', '3', 'Pending', 'Pending', '-', '5'],
['KOTAK_CC', '4', 'Pending', 'Pending', '-', '2'],
['Cashe', '5', 'Pending', 'Pending', '-', '5'],
['SBI', '6', 'Pending', 'Pending', '-', '2'],
['HDFC_Jumbo', '7', 'Pending', 'Pending', '-', '5'],
['HDFC_CC', '8', 'Pending', 'Pending', '-', '5'],
['SCB', '9', 'Pending', 'Pending', '-', '5'],
['Tata', 'Capital', '10', 'Pending', 'Pending', '-', '5']] #<-- CHECK THIS ROW
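The effect can be reproduced without pandas. The sketch below uses a small hypothetical `rows` list (shortened to three columns for illustration) and passes `dtype=object` explicitly so that it runs on both older and newer NumPy releases. Counting the tokens per row is one simple way to spot the ragged row.

```python
import numpy as np

# Hypothetical shortened rows; the second one has an extra token.
rows = [
    ["SCB", "9", "Pending"],
    ["Tata", "Capital", "10", "Pending"],
]

# Token count per row reveals which row is ragged.
lengths = [len(r) for r in rows]
print(lengths)  # [3, 4]

# Ragged input cannot form a 2-D array; with dtype=object NumPy
# stores each inner list as a single element instead.
ragged = np.array(rows, dtype=object)
print(ragged.shape)  # (2,)

# Rows of equal length produce the expected 2-D shape.
uniform = np.array([["SCB", "9", "Pending"], ["Axis", "1", "Pending"]])
print(uniform.shape)  # (2, 3)
```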
Solution:
Just use df.to_numpy() directly to get the NumPy array, instead of building it from split strings.
data = sep_payment.to_numpy()
data
# array([['Axis', 1, 'Pending', 'Pending', '-', 2],
# ['RBL_CC', 2, 'Pending', 'Pending', '-', 5],
# ['KOTAK_PL', 3, 'Pending', 'Pending', '-', 5],
# ['KOTAK_CC', 4, 'Pending', 'Pending', '-', 2],
# ['Cashe', 5, 'Pending', 'Pending', '-', 5],
# ['SBI', 6, 'Pending', 'Pending', '-', 2],
# ['HDFC_Jumbo', 7, 'Pending', 'Pending', '-', 5],
# ['HDFC_CC', 8, 'Pending', 'Pending', '-', 5],
# ['SCB', 9, 'Pending', 'Pending', '-', 5],
# ['Tata Capital', 10, 'Pending', 'Pending', '-', 5],
# ['Flex_Salary', 11, 'Pending', 'Pending', '-', 2]], dtype=object)
data.shape
#(11, 6)
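If only the first few rows are needed, or if every value should be a string as in the original list comprehension, the same call can be combined with `iloc` and `astype(str)`. This is a sketch on a small hypothetical frame `df` (a shortened stand-in for `sep_payment`), not the only way to do it.

```python
import pandas as pd

# Hypothetical shortened stand-in for sep_payment.
df = pd.DataFrame({
    "Creditor": ["SCB", "Tata Capital"],
    "Priority": [9, 10],
    "Payment_Status": ["Pending", "Pending"],
})

# Slice rows first, then convert; values keep their original types.
first_n = df.iloc[:2].to_numpy()
print(first_n.shape)  # (2, 3)

# Convert every value to str before converting to an array.
as_text = df.astype(str).to_numpy()
print(as_text[1, 0])  # Tata Capital
print(as_text[1, 1])  # 10
```

The creditor name stays in one cell because no whitespace splitting is involved.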