Function for adding rows if they aren't already present from a unique list/csv?

Question

I downloaded a .csv file that contains the following columns: Family, Genus, Species, Region, and Distribution (image below).

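Each Genus + Species combination has a different set of regions, none exactly the same, and every row shows 'Present' in the Distribution column because the file only lists regions where the species occurs.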

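I created a list of unique regions in a data frame named unique_regions (and also a .csv file with a single column holding every region in the dataset). This file also has the corresponding latitude and longitude for each unique region.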

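My goal is to use this unique_regions variable (or the .csv file) to go through each Genus + Species systematically, add the regions that are not included (in other words, where the species is not present) to the Region column, and then add 'Absent' to the Distribution column for those rows.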

For example, here is a species that is present in only 20 regions of the world (out of the 324 unique regions in my list):

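I need 304 new rows (for this species alone) with the same Family, Genus, and Species entries. The regions that are not yet included should be added together with their corresponding latitude and longitude from the unique_regions list or .csv file, and 'Absent' should appear next to each of them.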
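The arithmetic behind the 304 rows is a set difference: all unique regions minus the regions where the species is already listed. A minimal sketch, using made-up region names (`Region_0` to `Region_323` are placeholders, not values from the real file):

```python
# Hypothetical stand-ins: 324 unique region names, 20 of which host the species.
all_regions = {f"Region_{i}" for i in range(324)}
present_regions = {f"Region_{i}" for i in range(20)}

# Regions that still need an 'Absent' row are those not already listed.
absent_regions = all_regions - present_regions

print(len(absent_regions))  # 304
```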

Answer 1

Assuming you first turn the file into a Python list, you could do something like this:


# This list holds all of the information from your file. Each entry is a
# dictionary, so you can access a column by key:
#
#   item = rows[index_number]
#   data_you_need = item["type of data you need"]
#
# You could also use a plain list per row and remember which list index
# corresponds to which kind of data.
rows = [{"Family": "family name here", "Genus": "genus name here",
         "Species": "species name here"}]

item = rows[0]

# Use the keys defined above to read each value from the existing entry and
# copy it into 304 new entries.
for i in range(304):
    rows.append({"Family": item["Family"], "Genus": item["Genus"],
                 "Species": item["Species"]})
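The loop above copies the Family, Genus, and Species values but does not yet fill in Region, coordinates, or Distribution. One possible way to extend the same list-of-dictionaries idea is sketched below. The function name `add_absent_rows` and the column names `Latitude` and `Longitude` are assumptions for illustration; the real file may label them differently.

```python
def add_absent_rows(rows, unique_regions):
    """Return rows plus one 'Absent' row for every region a species lacks.

    rows: list of dicts with Family, Genus, Species, Region, Distribution keys.
    unique_regions: list of dicts with Region, Latitude, Longitude keys
                    (assumed column names).
    """
    # Collect the regions already recorded for each (Family, Genus, Species).
    seen = {}
    for row in rows:
        key = (row["Family"], row["Genus"], row["Species"])
        seen.setdefault(key, set()).add(row["Region"])

    result = list(rows)
    for (family, genus, species), regions in seen.items():
        for region in unique_regions:
            if region["Region"] not in regions:
                result.append({
                    "Family": family, "Genus": genus, "Species": species,
                    "Region": region["Region"],
                    "Latitude": region["Latitude"],
                    "Longitude": region["Longitude"],
                    "Distribution": "Absent",
                })
    return result


# Tiny made-up example: three regions, one species present in one of them.
unique_regions = [
    {"Region": "A", "Latitude": 1.0, "Longitude": 10.0},
    {"Region": "B", "Latitude": 2.0, "Longitude": 20.0},
    {"Region": "C", "Latitude": 3.0, "Longitude": 30.0},
]
rows = [{"Family": "F", "Genus": "G", "Species": "s",
         "Region": "B", "Distribution": "Present"}]

completed = add_absent_rows(rows, unique_regions)
```

Lists of dictionaries in this shape can be read from a file with `csv.DictReader` from the standard library, so the sketch does not depend on pandas.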

Answer 2

Figured it out.

import numpy as np
import pandas as pd

# load csv with the records for one species
df = pd.read_csv('./Erysiphaceae_combined/Thekopsora_areolata.csv')

# load csv with unique regions (and their latitude and longitude)
unique_regions = pd.read_csv('./unique_regions.csv')

# stack all unique regions on top of the species rows, then replace the NA
# values in the rows that came from unique_regions with the desired values
combined = pd.concat([unique_regions, df], sort=False)
combined['Family'] = combined['Family'].replace(np.nan, 'Erysiphaceae')
combined['Genus'] = combined['Genus'].replace(np.nan, 'Thekopsora')  # HERE: edit per species
combined['Species'] = combined['Species'].replace(np.nan, 'areolata')  # HERE: edit per species
combined['Distribution'] = combined['Distribution'].replace(np.nan, 'Absent')

# where a region appears twice, keep the last copy, which is the original
# 'Present' row from df
combined = combined.drop_duplicates(subset='Region', keep='last')

# write new csv with desired outputs
combined.to_csv("./Kriging/Thekopsora_areolata.csv")
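The script above handles one species file at a time, with the genus and species typed in by hand. If all species sit in a single data frame, a cross join followed by a left merge might avoid the manual edits. This is only a sketch under assumptions: the helper `add_absent_rows` is hypothetical, the column names `Latitude` and `Longitude` are guesses, `how="cross"` needs pandas 1.2 or newer, and each species is assumed to have at most one row per region.

```python
import pandas as pd

KEYS = ["Family", "Genus", "Species"]


def add_absent_rows(df, unique_regions):
    """Return one row per (species, region) pair, marked Present or Absent."""
    # One row per distinct species.
    species = df[KEYS].drop_duplicates()
    # Pair every species with every unique region.
    grid = species.merge(unique_regions, how="cross")
    # Attach the known records; pairs without a record get NaN in Distribution.
    merged = grid.merge(df[KEYS + ["Region", "Distribution"]],
                        on=KEYS + ["Region"], how="left")
    merged["Distribution"] = merged["Distribution"].fillna("Absent")
    return merged


# Tiny made-up example: three regions and two species.
unique_regions = pd.DataFrame({
    "Region": ["A", "B", "C"],
    "Latitude": [1.0, 2.0, 3.0],
    "Longitude": [10.0, 20.0, 30.0],
})
df = pd.DataFrame({
    "Family": ["F", "F", "F"],
    "Genus": ["G", "G", "H"],
    "Species": ["s", "s", "t"],
    "Region": ["A", "B", "C"],
    "Distribution": ["Present", "Present", "Present"],
})

completed = add_absent_rows(df, unique_regions)
```

Because the coordinates come from unique_regions in the cross join, every row in the result carries a latitude and longitude, including the original 'Present' rows.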

python

parsing

rows

multiple-columns