Creating an ndarray from ragged nested sequences is deprecated ... you must specify 'dtype=object' when creating the ndarray

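I am trying to read an Excel spreadsheet and create charts from the data in some of its columns. I am using Python 3.6.2 because some libraries used in my script do not work well with newer versions.

Here is my code: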

import os
import plotly
import plotly.graph_objects as go
import pandas as pd
import plotly.express as px

def create_graph(directoryPath):
    for root, subFolders, files in os.walk(directoryPath):
        for f in files:
            if "screenlog" in f.lower():
                print("Creating graphs from the Screen logs")
                absolutePathOfFile = os.path.join(root, f)
                pd.options.mode.chained_assignment = None  # default="warn"
                #PROCESSING THE EXCEL FILE
                df_init = pd.read_excel(absolutePathOfFile,engine="openpyxl")
                df_init["Screen&Action"]= df_init["SCREEN"].str.cat(df_init["Action_Name"], sep=" - ")
                ##### PLOTTING Scatter chart######
                print("Plotting the scatter graphs")
                #using plotly to create a scatter graph
                fig_scatter= px.scatter(df_init, x="INSTANT", y="DURATION",color="Screen&Action",height=500,title="Traditional Screen Analysis")
                fig_scatter.update_layout(showlegend=False,title={       
                        "y":0.9,
                        "x":0.5,
                        "xanchor": "center",
                        "yanchor": "top"})
        fig_scatter.show()

Error message:

> Creating graphs from the Screen logs
> ...\AppData\Local\Programs\Python\Python36\lib\site-packages\numpy\core\fromnumeric.py:87:
> VisibleDeprecationWarning:
> 
> Creating an ndarray from ragged nested sequences (which is a
> list-or-tuple of lists-or-tuples-or ndarrays with different lengths or
> shapes) is deprecated. If you meant to do this, you must specify
> 'dtype=object' when creating the ndarray
> 
> Plotting the scatter graphs Traceback (most recent call last):   File
> "...\Documents\log_app\log_parser.py", line 3812, in <module>
>     scripts.screen.create_graph(directoryPath)   File "...\Documents\log_app\scripts\screen.py", line 20, in create_graph
>     fig_scatter= px.scatter(df_init, x="INSTANT", y="DURATION",color="Screen&Action",height=500,title="Traditional
> Screen Analysis")   File
> "...\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_chart_types.py",
> line 66, in scatter
>     return make_figure(args=locals(), constructor=go.Scatter)   File "...\AppData\Local\Programs\Python\Python36\lib\site-packages\plotly\express\_core.py",
> line 1988, in make_figure
>     group = grouped.get_group(group_name if len(group_name) > 1 else group_name[0])   File
> "...\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\groupby\groupby.py",
> line 680, in get_group
>     raise KeyError(name) KeyError: (nan, '', '', '', '')

I spent some time looking for a fix and found several sites saying that adding 'dtype=object' as an extra argument should solve the problem.

But how do I add 'dtype=object' to this line?

fig_scatter= px.scatter(df_init, x="INSTANT", y="DURATION",color="Screen&Action",height=500,title="Traditional Screen Analysis")

I always get a syntax error.

Dataframe:

print(df_init)

       Tenant_Id  ...                                    Screen&Action
0              1  ...      Screen_Logs - TelemetryClickEvent.SendEvent
1              1  ...                      Screen_Logs - FilterOrReset
2              1  ...  Cyclic_Job_Logs - TelemetryClickEvent.SendEvent
3              1  ...                        Screen_Logs - Preparation
4             20  ...                 EventoDetalhe_New - Load_Atletas
...          ...  ...                                              ...
31893         20  ...                              Login - Preparation
31894          1  ...                              Login - Preparation
31895         20  ...                              Login - Preparation
31896          1  ...                              Login - Preparation
31897         20  ...                              Login - Preparation

[31898 rows x 20 columns]

print(df_init.head(10).to_dict("records"))

[{'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:44:15.345000'), 
'DURATION': 14, 'SCREEN': 'Screen_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 
'User_Id': 1263, 'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 
'Executed_By': 'E3Q3J-PR4U18', 'Session_Bytes': 8207, 'Viewstate_Bytes': 4204, 
'Session_Requests': 1, 'Access_Mode': 'Ajax', 'Request_Key': 'ac581f0c-ae8a-4f1d-8eea-39c0f89171b7', 
'Action_Name': 'TelemetryClickEvent.SendEvent', 'Espace_Name': 'ServiceCenter', 
'Application_Name': 'Service Center', 'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 
'Screen&Action': 'Screen_Logs - TelemetryClickEvent.SendEvent'}, 
{'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:44:15.202000'), 
'DURATION': 31, 'SCREEN': 'Screen_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 
'User_Id': 1263, 'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U18', 
'Session_Bytes': 8207, 'Viewstate_Bytes': 4184, 'Session_Requests': 1, 'Access_Mode': 'Ajax', 
'Request_Key': 'd4f00718-3fbe-416d-8540-aeed02f371b3', 'Action_Name': 'FilterOrReset', 
'Espace_Name': 'ServiceCenter', 'Application_Name': 'Service Center', 
'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 'Screen&Action': 
'Screen_Logs - FilterOrReset'}, {'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:44:11.733000'), 
'DURATION': 15, 'SCREEN': 'Cyclic_Job_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 
'User_Id': 1263, 'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 
'Executed_By': 'E3Q3J-PR4U18', 'Session_Bytes': 8093, 'Viewstate_Bytes': 4160, 
'Session_Requests': 1, 'Access_Mode': 'Ajax', 'Request_Key': 'a35aa5c7-81e9-487a-8df3-474b12df2a1a', 
'Action_Name': 'TelemetryClickEvent.SendEvent', 'Espace_Name': 'ServiceCenter', 
'Application_Name': 'Service Center', 'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 
'Screen&Action': 'Cyclic_Job_Logs - TelemetryClickEvent.SendEvent'}, 
{'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:44:11.733000'), 'DURATION': 797, 
'SCREEN': 'Screen_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 'User_Id': 1263, 
'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U18', 
'Session_Bytes': 8093, 'Viewstate_Bytes': 0, 'Session_Requests': 1, 
'Access_Mode': 'Screen', 'Request_Key': '4ac69574-86e2-4b85-86eb-e8cda703193c', 
'Action_Name': 'Preparation', 'Espace_Name': 'ServiceCenter', 'Application_Name': 'Service Center', 
'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 'Screen&Action': 'Screen_Logs - Preparation'}, 
{'Tenant_Id': 20, 'INSTANT': Timestamp('2021-11-25 21:44:05.108000'), 'DURATION': 1359, 
'SCREEN': 'EventoDetalhe_New', 'Session_Id': 'jXC8xqSk1Eu0q4C+VQNZww==', 'User_Id': 151, 
'Espace_Id': 80, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U18', 
'Session_Bytes': 17444, 'Viewstate_Bytes': 45784, 'Session_Requests': 4, 'Access_Mode': 'Ajax', 
'Request_Key': '9f62097f-fe6a-468a-a4d4-5b2c39e295ce', 'Action_Name': 'Load_Atletas', 
'Espace_Name': 'TalentWeb', 'Application_Name': 'TalentWeb', 
'Application_Key': '448b995f-e07e-4a21-a442-12204cfffea6', 
'Screen&Action': 'EventoDetalhe_New - Load_Atletas'}, 
{'Tenant_Id': 20, 'INSTANT': Timestamp('2021-11-25 21:44:04.573000'), 'DURATION': 33, 
'SCREEN': 'EventoDetalhe_New', 'Session_Id': 'jXC8xqSk1Eu0q4C+VQNZww==', 'User_Id': 151, 
'Espace_Id': 80, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U1S', 
'Session_Bytes': 12431, 'Viewstate_Bytes': 45824, 'Session_Requests': 1, 'Access_Mode': 'Ajax', 
'Request_Key': '61d1d2a1-2c56-4e1d-a752-8d1719aa77a8', 'Action_Name': 'Tabs.OnChange', 
'Espace_Name': 'TalentWeb', 'Application_Name': 'TalentWeb', 
'Application_Key': '448b995f-e07e-4a21-a442-12204cfffea6', 'Screen&Action': 'EventoDetalhe_New - Tabs.OnChange'},
 {'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:44:02.658000'), 'DURATION': 31, 
'SCREEN': 'Login', 'Session_Id': 'am9Hoz4suEifYkaRsh8usQ==', 'User_Id': 0, 'Espace_Id': 1, 
'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U2H', 
'Session_Bytes': 0, 'Viewstate_Bytes': 0, 'Session_Requests': 1, 'Access_Mode': 'Screen', 
'Request_Key': '1b4f4776-f622-46da-ac0f-88a897c5b8a7', 'Action_Name': 'Preparation', 
'Espace_Name': 'ServiceCenter', 'Application_Name': 'Service Center', 
'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 'Screen&Action': 'Login - Preparation'}, 
{'Tenant_Id': 20, 'INSTANT': Timestamp('2021-11-25 21:43:59.729000'), 'DURATION': 314, 
'SCREEN': 'EventoDetalhe_New', 'Session_Id': 'jXC8xqSk1Eu0q4C+VQNZww==', 'User_Id': 151, 
'Espace_Id': 80, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U1S', 
'Session_Bytes': 14891, 'Viewstate_Bytes': 0, 'Session_Requests': 3, 
'Access_Mode': 'Screen', 'Request_Key': '866f27bb-f86a-4add-b084-f433e1cadc73', 
'Action_Name': 'Preparation', 'Espace_Name': 'TalentWeb', 'Application_Name': 'TalentWeb', 
'Application_Key': '448b995f-e07e-4a21-a442-12204cfffea6', 'Screen&Action': 'EventoDetalhe_New - Preparation'}, 
{'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:43:53.594000'), 'DURATION': 1904, 
'SCREEN': 'Cyclic_Job_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 'User_Id': 1263, 
'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U18', 
'Session_Bytes': 8093, 'Viewstate_Bytes': 4160, 'Session_Requests': 1, 
'Access_Mode': 'Screen', 'Request_Key': '26f2d266-b2fc-4a2f-8551-046ba9604477', 
'Action_Name': 'ExportToExcel', 'Espace_Name': 'ServiceCenter', 'Application_Name': 'Service Center', 
'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 
'Screen&Action': 'Cyclic_Job_Logs - ExportToExcel'}, 
{'Tenant_Id': 1, 'INSTANT': Timestamp('2021-11-25 21:43:53.594000'), 'DURATION': 2060, 
'SCREEN': 'Cyclic_Job_Logs', 'Session_Id': 'N30TTfz+v0G6OuDKYznMeA==', 'User_Id': 1263, 
'Espace_Id': 1, 'MSISDN': nan, 'Screen_Type': 'WEB', 'Executed_By': 'E3Q3J-PR4U18', 
'Session_Bytes': 8093, 'Viewstate_Bytes': 4160, 'Session_Requests': 1, 
'Access_Mode': 'Ajax', 'Request_Key': '66d521dd-905d-4f66-ac99-a982bcccb99b', 
'Action_Name': 'TelemetryClickEvent.SendEvent', 'Espace_Name': 'ServiceCenter', 
'Application_Name': 'Service Center', 'Application_Key': '463836d2-9aea-42ff-9f58-a1e78e163c11', 
'Screen&Action': 'Cyclic_Job_Logs - TelemetryClickEvent.SendEvent'}]

print(df_init[df_init['Screen&Action'].isnull()])

Tenant_Id                 INSTANT  DURATION  ... Application_Name                       Application_Key  Screen&Action
25708         20 2021-11-24 13:45:47.022        93  ...   Statvue Emails  f256c73c-722e-4adb-b2c1-c80cc20c2745            NaN
25763         20 2021-11-24 13:39:20.798         6  ...   Statvue Emails  f256c73c-722e-4adb-b2c1-c80cc20c2745            NaN
25782         20 2021-11-24 13:38:06.664       171  ...   Statvue Emails  f256c73c-722e-4adb-b2c1-c80cc20c2745            NaN
25805         20 2021-11-24 13:36:15.512      2295  ...   Statvue Emails  f256c73c-722e-4adb-b2c1-c80cc20c2745            NaN

The Excel file I am trying to read

Any help is appreciated.

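  • Used the 10 sample rows you provided and reproduced the error by setting SCREEN to nan in two rows
  • This is a data quality issue (I am using Python 3.9 and plotly 5.5.0)
  • It can occur when you concatenate the two columns, since a missing value in either one leaves nan in the result
  • Resolved it by replacing nan in the concatenated target column with fillna()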
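The concatenation point can be checked in isolation. The sketch below uses a hypothetical three-row frame (the values are made up, only the column names come from the question) to show that `Series.str.cat` returns NaN for any row where either input is missing, which is how NaN ends up in the `Screen&Action` column that plotly groups on.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame; only the column names follow the question.
df = pd.DataFrame(
    {
        "SCREEN": ["Login", np.nan, "Screen_Logs"],
        "Action_Name": ["Preparation", "Preparation", np.nan],
    }
)

# Without na_rep, a missing value on either side makes the whole result NaN.
combined = df["SCREEN"].str.cat(df["Action_Name"], sep=" - ")
missing = int(combined.isna().sum())

print(combined.tolist())
print("rows with NaN after concatenation:", missing)
```

Running the same `isna().sum()` check on the real `df_init["Screen&Action"]` column should show how many rows are affected before any plotting is attempted.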
import numpy as np
import pandas as pd
import plotly.express as px

# df_init = pd.read_excel(absolutePathOfFile, engine="openpyxl")

# simulate a couple of missing values
df_init.loc[df_init.sample(2).index,"SCREEN"] = np.nan
df_init.loc[df_init.sample(2).index,"Action_Name"] = ""

df_init["Screen&Action"] = df_init["SCREEN"].str.cat(df_init["Action_Name"], sep=" - ")
##### PLOTTING Scatter chart######
print("Plotting the scatter graphs")
# using plotly to create a scatter graph
try:
    fig_scatter = px.scatter(
        df_init,
        x="INSTANT",
        y="DURATION",
        color="Screen&Action",
        height=500,
        title="Traditional Screen Analysis",
    )
except KeyError:
    print("failed, defaulting")
    df_init["Screen&Action"] = df_init["Screen&Action"].fillna("unknown")
    fig_scatter = px.scatter(
        df_init,
        x="INSTANT",
        y="DURATION",
        color="Screen&Action",
        height=500,
        title="Traditional Screen Analysis",
    )

    
fig_scatter.update_layout(
    showlegend=False, title={"y": 0.9, "x": 0.5, "xanchor": "center", "yanchor": "top"}
)
fig_scatter.show()
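As an alternative to catching the KeyError, the missing values can be filled at concatenation time. Note that the traceback ends in the KeyError, while the 'dtype=object' message is only a warning, and `px.scatter` has no `dtype` argument to pass it to. The following is a minimal sketch, assuming the same column names as the question; the helper name `add_screen_action` and the placeholder text "unknown" are hypothetical choices, not part of the original code.

```python
import numpy as np
import pandas as pd


def add_screen_action(df, placeholder="unknown"):
    """Return a copy of df with a 'Screen&Action' column that contains no NaN.

    na_rep substitutes the placeholder for missing values in either input
    column before the strings are joined.
    """
    out = df.copy()
    out["Screen&Action"] = out["SCREEN"].str.cat(
        out["Action_Name"], sep=" - ", na_rep=placeholder
    )
    return out


# Hypothetical sample data with one gap in each column.
sample = pd.DataFrame(
    {
        "SCREEN": ["Login", np.nan, "Screen_Logs"],
        "Action_Name": ["Preparation", "Preparation", np.nan],
        "DURATION": [31, 93, 14],
    }
)

result = add_screen_action(sample)
print(result["Screen&Action"].tolist())
```

The returned frame can then be passed to `px.scatter(..., color="Screen&Action")` as in the code above. With no NaN left in the color column, the grouping step that raised the KeyError should no longer hit a missing key, although this has not been verified against the Python 3.6 environment from the question.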