Modify CSV file [Can't use pandas or numpy]
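I need to add a column to a CSV file that I created by importing data from the web. The new column must be the concatenation of two columns, for example 06_2018.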
import urllib.request

New_Format_Data = ''
Output_File = open('Desktop/HW3/' + state_names[counter] + '.txt','w')
for counter in range(0 , len(urls)):  # Will go through all the states.
    print (urls[counter])
    html = urllib.request.urlopen(urls[counter]).read().decode('utf-8')  # opening url
    rows = html.splitlines(1)  # Split the data in rows. The number 1 is very important
    if counter ==0:
        New_Format_Data = "Test" + rows[0]  # Header
    for row in range(1, len(rows)):  # First row...
        New_Format_Data += 'Test' + '\t' + rows[row]  # Adding that state column.
Output_File.write(New_Format_Data)  # Once the for loops have finished, write everything and close.
Output_File.close()
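I don't know what you have in the rows or which columns you want to concatenate, so I use columns 1 and 2 as an example.
You have to split the row (a string) into a list of values, replace/add values in this list, then join all values back into a single string, which you can then write to the new file.
The string must not end with \n, because the new value has to be added on the same line, so the 1 in splitlines(1) is useless here (it keeps the \n).
Something like this.
I take each string directly from the list instead of using an index with range(len(..)).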
for row in rows[1:]:  # take each string directly instead of using an index
    # convert to list
    row = row.split(',')
    # create new value using column 1 and 2
    new_value = row[1] + '_' + row[2]
    # append to list
    row.append(new_value)
    # convert back to string
    row = ','.join(row)
    # add new row and `\n` at the end
    New_Format_Data += 'Test' + '\t' + row + '\n'
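Why the `1` (that is, `keepends=True`) has to go can be checked on a made-up string: with it every line still ends in `\n`, so a value appended to the line lands after the line break instead of on the same line.

```python
# made-up tab-separated text, only for illustration
text = "State\tYear\tMonth\nAL\t2018\t06\n"

# keepends=True (what splitlines(1) means) keeps the trailing "\n" on each line
with_ends = text.splitlines(keepends=True)
# the default (keepends=False) strips the line breaks
without_ends = text.splitlines()

print(with_ends)     # ['State\tYear\tMonth\n', 'AL\t2018\t06\n']
print(without_ends)  # ['State\tYear\tMonth', 'AL\t2018\t06']

# appending to a line that still ends in "\n" pushes the new value onto the next line
print(repr(with_ends[1] + "\t06_2018"))     # 'AL\t2018\t06\n\t06_2018'
print(repr(without_ends[1] + "\t06_2018"))  # 'AL\t2018\t06\t06_2018'
```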
The full code could look like this:
import urllib.request

# PEP8: at least two spaces before `#` and one space after `#`
new_format_data = ''  # PEP8: `lower_case_names` for variables
# assumes `urls`, `state_names` and `counter` are defined earlier in the script (as in the question)
output_file = open('Desktop/HW3/' + state_names[counter] + '.txt', 'w')
for counter, url in enumerate(urls):
    print('url:', url)
    html = urllib.request.urlopen(url).read().decode('utf-8')  # opening url
    rows = html.splitlines()  # split the data in rows. DON'T NEED `1` because I don't need `\n`
    if counter == 0:
        new_format_data = "Test" + rows[0] + ',new_columns' + '\n'  # header with new column
    for row in rows[1:]:  # take each string directly instead of using an index
        # convert to list
        row = row.split(',')
        # create new value using column 1 and 2
        new_value = row[1] + '_' + row[2]
        # append to list
        row.append(new_value)
        # convert back to string
        row = ','.join(row)
        new_format_data += 'Test' + '\t' + row + '\n'  # adding that state column.
# --- after loop ---
output_file.write(new_format_data)  # once the for loops have finished, write everything and close
output_file.close()
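To try the loop without downloading anything, the same split / append / join steps can be wrapped in a function that takes the already-downloaded texts, one per URL. This is a minimal sketch: the function name `add_joined_column`, its parameters and the sample data are hypothetical, and it leaves out the `"Test"` prefix so the new column is the only change.

```python
def add_joined_column(texts, sep=",", first=1, second=2, new_header="new_column"):
    """Combine several downloaded texts into one, adding a column that joins
    columns `first` and `second` with "_". The header is taken from the first text only."""
    lines = []
    for counter, text in enumerate(texts):
        rows = text.splitlines()  # no keepends: the new value must stay on the same line
        if counter == 0:
            lines.append(rows[0] + sep + new_header)  # header once, with the new column name
        for row in rows[1:]:  # skip each text's own header row
            values = row.split(sep)  # string -> list
            values.append(values[first] + "_" + values[second])  # add the joined value
            lines.append(sep.join(values))  # list -> string
    return "\n".join(lines) + "\n"


# hypothetical data standing in for two downloaded files
state_a = "state,month,year\nAL,06,2018\nAL,07,2018\n"
state_b = "state,month,year\nAK,06,2018\n"
result = add_joined_column([state_a, state_b])
print(result)
```

Writing the result is then a single `output_file.write(result)`; opening the file with `with open(...) as output_file:` would also close it automatically.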
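But if some column contains a `,` inside its value, this can be a problem, because split will treat it as a separator. So it is better to use the standard module csv, which handles all of this.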
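A small made-up line shows the difference: `str.split(",")` breaks a quoted value apart, while `csv.reader` keeps it whole, and `csv.writer` adds the quotes back when writing.

```python
import csv
import io

# a made-up line where the second value contains a comma inside quotes
line = 'AL,"Montgomery, AL",2018'

print(line.split(","))           # ['AL', '"Montgomery', ' AL"', '2018']
print(next(csv.reader([line])))  # ['AL', 'Montgomery, AL', '2018']

# csv.writer quotes the value again because it contains the delimiter
buffer = io.StringIO()
csv.writer(buffer).writerow(["AL", "Montgomery, AL", "2018"])
print(repr(buffer.getvalue()))   # 'AL,"Montgomery, AL",2018\r\n'
```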
Something like:
import csv
import urllib.request

# newline='' is what the csv docs require, otherwise csv.writer adds blank lines on Windows
output_file = open('Desktop/HW3/' + state_names[counter] + '.txt', 'w', newline='')
# create csv writer
output_csv = csv.writer(output_file)
for counter, url in enumerate(urls):
    print('url:', url)
    html = urllib.request.urlopen(url).read().decode('utf-8')  # opening url
    # read all rows from csv
    rows = list(csv.reader(html.splitlines()))
    if counter == 0:
        headers = rows[0]
        headers[0] = "Test" + headers[0]
        headers.append('new_column')
        # write headers
        output_csv.writerow(headers)
    for row in rows[1:]:  # take each row directly instead of using an index
        # create new value using column 1 and 2
        new_value = row[1] + '_' + row[2]
        # append to row
        row.append(new_value)
        # write row
        output_csv.writerow(row)
# --- after loop ---
output_file.close()
In the end it worked like this:
import urllib.request

new_format_data = ''  # PEP8: `lower_case_names` for variables
output_file = open('Desktop/HW3_2/' + state_names[counter] + '.txt', 'w')
for counter, url in enumerate(urls):
    print('url:', url)
    html = urllib.request.urlopen(url).read().decode('utf-8')  # opening url
    rows = html.splitlines()  # split the data in rows. DON'T NEED `1` because I don't need `\n`
    if counter == 0:
        # new_format_data = "Month_Year" + '\t' + rows[0] + '\n'  # header with new column
        new_format_data = rows[0] + '\t' + "Month_Year" + '\n'  # header with new column (tab before it)
    for row in rows[1:]:  # take each string directly instead of using an index
        # convert to list
        row = row.split('\t')
        # create new value using columns 2 and 1
        new_value = row[2] + '_' + row[1]
        # append to list
        row.append(new_value)
        # convert back to string
        row = '\t'.join(row)
        new_format_data += row + '\n'  # add the modified row
output_file.write(new_format_data)  # once the for loops have finished, write everything and close
output_file.close()
Correction: the data turned out to be in txt format, not CSV.
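Since the files are tab-separated text, the `csv` module can still be used by passing `delimiter="\t"` to both `csv.reader` and `csv.writer`. A minimal sketch, assuming (as in the working code) that the month is in column 2 and the year in column 1; the sample data is made up and `io.StringIO` stands in for the output file.

```python
import csv
import io

# made-up tab-separated text standing in for one downloaded file
text = "State\tYear\tMonth\nAL\t2018\t06\nAL\t2019\t07\n"

output = io.StringIO()  # stands in for the real output file
writer = csv.writer(output, delimiter="\t", lineterminator="\n")

rows = list(csv.reader(text.splitlines(), delimiter="\t"))
writer.writerow(rows[0] + ["Month_Year"])  # header with the new column
for row in rows[1:]:
    writer.writerow(row + [row[2] + "_" + row[1]])  # month_year, e.g. 06_2018

print(output.getvalue())
```

With a real file, open it with `newline=''` and pass it to `csv.writer` instead of `output`.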
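Now I'm trying to remove one column and filter the information. One of the columns is "Year": the original data runs from 1976 to 2022, and I only need the information from 2015 to 2020.
I tried a few things, but I broke the rest of the code :(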
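One way to do both without breaking the rest is to look the columns up by header name, filter first, and remove the column last, so the indexes used for filtering never shift. A minimal sketch under assumptions: the data is tab-separated, the year column is named `Year` as in the question, the column to drop is passed by name because the question does not say which one, and `filter_and_drop` and the sample data are hypothetical.

```python
def filter_and_drop(text, drop, start=2015, end=2020, year_col="Year", sep="\t"):
    """Keep rows whose year is between start and end (inclusive) and remove column `drop`."""
    rows = [line.split(sep) for line in text.splitlines()]
    header = rows[0]
    year_idx = header.index(year_col)  # find columns by name, before anything is removed
    drop_idx = header.index(drop)

    kept = [header]
    for row in rows[1:]:
        if start <= int(row[year_idx]) <= end:  # compare as numbers, not strings
            kept.append(row)

    # remove the unwanted column last, so year_idx stayed valid during filtering
    out = [[v for i, v in enumerate(row) if i != drop_idx] for row in kept]
    return "\n".join(sep.join(row) for row in out) + "\n"


# made-up sample: only the 2016 and 2020 rows fall inside 2015-2020
sample = "State\tYear\tMonth\tCode\nAL\t1976\t01\tx\nAL\t2016\t06\ty\nAL\t2020\t12\tz\nAL\t2022\t03\tw\n"
print(filter_and_drop(sample, drop="Code"))
```

In a loop that already does `row.split('\t')`, the same `start <= int(year) <= end` check can be applied right after the split, with `continue` to skip rows outside the range.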