Issues with row and column output when using csv.writer and a series of strings
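I have a set of PDFs that I am trying to extract data from for analysis. As part of this process, I want to modify the data and export it to a .csv file. So far I have been able to extract my data from the PDFs successfully using pdfplumber.
Part of this data is a series of strings that look like this: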
Deer W Pre 4-3F
Deer W Post 2-1F
DG Post 7F
S Pre 2-12F
Staff Post 3-1F
Staff Pre 2-10F
Staff Post 2-11F
Tut Post 2-1F
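I am trying to use csv.writer to write this series of strings to a .csv file so that all of the strings end up in the same column, each in its own row. I have done a lot of digging here but have not found a solution to my problem. The code I am using is: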
with open("output.csv", mode="a+") as fp:
    wr = csv.writer(fp, dialect="excel")
    for item in site_tree_info:  # site_tree_info is the variable that stores the strings
        wr.writerow([str(item)])
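This gives me a rather odd output. Does anyone have suggestions for how to get my expected output, with one string per row in a single column?
I really don't understand why [str(string)] isn't working for me here, since it has worked for many other people with similar problems.
Here is the code I am using to create the strings listed above: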
# Get list of output pdf files in our directory
meta_sample = re.compile(r'^[A-Z].*')  # this is to pull text from page 1
for root, dirs, files in os.walk('/Users/myname/tree'):
    for filename in files:
        p = os.path.join(root, filename)
        # print(p)
        with pdfplumber.open(p) as pdf:
            # Pull text from the first page of each pdf, which includes information
            # about the samples and the conditions they were analyzed under
            sample_info = pdf.pages[0]
            sample_info_text = sample_info.extract_text()
            sample_info_text_split = sample_info_text.split('\n')
            for lines in sample_info_text_split:
                if meta_sample.match(lines):
                    column_name, *column_info = lines.split(':')
                    column_info = ' '.join(column_info)
                    # print(column_info)  # we have accurately captured both left and right sides of the table from page 1
            # This prints Sample ID and sample site/tree info, which is item [2] in the sample_info_text_split list
            # We then strip the string and split it in two at the first ":". I then grab the 2nd item of this split [1], which is the site and tree info
            site_tree_info = sample_info_text_split[2].strip().split(":", 1)[1]
            print(site_tree_info)  # this prints as above
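The simple explanation is that your site_tree_info variable is a str, so when you loop over it, each character is written as a new row. I suggest using a list instead of a string for site_tree_info, like this (I am assuming the data looks like this):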
site_tree_info = ['Deer W Pre 4-3F','Deer W Post 2-1F']
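To make that concrete, here is a minimal sketch of one way to do it, under some assumptions: the extraction code in the question reassigns site_tree_info for every PDF, so the sketch collects each value into a list first. The names site_tree_infos and write_single_column are made up for illustration, and the hard-coded strings stand in for whatever pdfplumber actually returns.

```python
import csv
import io


def write_single_column(rows, fp):
    """Write each string in `rows` as its own one-cell row."""
    wr = csv.writer(fp, dialect="excel")
    for item in rows:
        wr.writerow([item])


# Looping over a str yields single characters, which is why the
# original loop produced one row per character.
first_chars = list("DG Post 7F")[:2]

# Hypothetical stand-in for the per-PDF loop: append each extracted
# string to a list instead of overwriting a single variable.
site_tree_infos = []
for extracted in [" Deer W Pre 4-3F", " Deer W Post 2-1F", " DG Post 7F"]:
    site_tree_infos.append(extracted.strip())

# An in-memory buffer keeps the sketch self-contained; a real file
# object works the same way.
buffer = io.StringIO()
write_single_column(site_tree_infos, buffer)
csv_text = buffer.getvalue()
```

When writing to a real file, the csv module documentation recommends opening it with newline="", for example open("output.csv", mode="a", newline=""), so that no extra blank rows appear on Windows.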