Widgets with non-duplicate values in PySpark

Question

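I am trying to build a widget from a list with non-duplicate values, based on the scenario below. Can someone help me find the right approach?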

fileInfoList = list(filter(lambda f: f.name.endswith("") , dbutils.fs.ls(srcPath)))
for fileNames in fileInfoList:
  print(fileNames.name)

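This prints: Employee EmployeeHistory Contractor ContractorHistory

I only want the values without History. I tried the following, but it returns an error: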

dbutils.widgets.dropdown("FileName", "Employee", [str(fileNames.name) for fileNames in fileInfoList])

Answer 1

Why not simply filter the list before passing it to the dropdown function?

>> fileList = ['Employee', 'Contractor', 'EmployeeHistory', 'ContractorHistory']
>> print(fileList)
   ['Employee', 'Contractor', 'EmployeeHistory', 'ContractorHistory']

>> filteredFileList = [item for item in fileList if 'History' not in item]
>> print(filteredFileList)
   ['Employee', 'Contractor']
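
The same filter can be applied to the objects returned by dbutils.fs.ls before building the widget. Below is a minimal sketch, not a definitive implementation: FileInfo is a hypothetical stand-in for the entries dbutils.fs.ls returns, widget_choices is an illustrative helper name, and the sample paths are made up. It also assumes that directory names come back with a trailing "/", which would make the default "Employee" fail to match any choice; that is one plausible cause of the error in the question, but the question does not show the actual message.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries returned by dbutils.fs.ls(srcPath).
FileInfo = namedtuple("FileInfo", ["path", "name"])


def widget_choices(file_infos, exclude="History"):
    """Return unique names that do not contain `exclude`, keeping input order."""
    seen = set()
    choices = []
    for info in file_infos:
        # Assumption: directory names may carry a trailing "/", so strip it.
        name = str(info.name).rstrip("/")
        if exclude in name or name in seen:
            continue
        seen.add(name)
        choices.append(name)
    return choices


# Made-up sample data mirroring the names in the question.
sample = [
    FileInfo("/mnt/src/Employee/", "Employee/"),
    FileInfo("/mnt/src/EmployeeHistory/", "EmployeeHistory/"),
    FileInfo("/mnt/src/Contractor/", "Contractor/"),
    FileInfo("/mnt/src/ContractorHistory/", "ContractorHistory/"),
]

choices = widget_choices(sample)
print(choices)

# In a Databricks notebook, where dbutils is available, the call would be:
# choices = widget_choices(dbutils.fs.ls(srcPath))
# dbutils.widgets.dropdown("FileName", choices[0], choices)
```

Using choices[0] as the default keeps the default value inside the list of choices, so the dropdown does not depend on a hard-coded name being present.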


pyspark

azure-databricks