Python: Read .dat file rows and write them to columns in Excel (csv/numpy/openpyxl)
I ran into some problems while using csv/numpy/openpyxl.
I have a .dat file containing:
a,a,a,a
b,b,b,b
c,c,c,c
I want to take each row of the .dat file and put it into its own column in Excel, meaning:
Excel file:
a b c
a b c
a b c
This is what I have so far:
import csv
import openpyxl
import numpy as np
wb = openpyxl.Workbook()
ws = wb.active
with open('Shari10.dat') as f:
    dat_reader = csv.reader(f, delimiter = ",")
    for header in csv.reader(f):
        break
    for dat_line in f:
        line = dat_line.split(",")
        data = np.vstack(line[1:8])
        for row in data:
            ws.append(row)
            print(row)
#wb.save("coffee.xlsx")
Here is the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-17-a07e6ac6842f> in <module>
20 print(data)
21 for row in data:
---> 22 ws.append(row)
23 #wb.save("coffee.xlsx")
~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in append(self, iterable)
665
666 else:
--> 667 self._invalid_row(iterable)
668
669 self._current_row = row_idx
~\AppData\Local\Continuum\anaconda3\lib\site-packages\openpyxl\worksheet\worksheet.py in _invalid_row(self, iterable)
792 def _invalid_row(self, iterable):
793 raise TypeError('Value must be a list, tuple, range or generator, or a dict. Supplied value is {0}'.format(
--> 794 type(iterable))
795 )
796
TypeError: Value must be a list, tuple, range or generator, or a dict. Supplied value is <class 'str'>
For reference, this is what I am trying to do:
data = [
    ['A', 100, 1.0],
    ['B', 200, 2.0],
    ['C', 300, 3.0],
    ['D', 400, 4.0],
]
for row in data:
    ws.append(row)
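Also, I have just started learning Python, so please excuse my messy code structure. As for the syntax, I tried to write it out explicitly rather than shorten the code.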
It looks like you are running into the problem that a numpy array is not a list. You can fix this with numpy's tolist() method by changing this
for row in data:
    ws.append(row)
    print(row)
to this
for row in data:
    ws.append(row.tolist())
    print(row.tolist())
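To see why tolist() helps, here is a minimal sketch of what np.vstack produces from a list of strings. The `fields` list is a made-up sample standing in for one split line, not data from the question's file.

```python
import numpy as np

# Hypothetical sample: the fields of one line after split(",")
fields = ["b", "b", "b\n"]

data = np.vstack(fields)   # each string becomes its own one-cell row
print(data.shape)          # 3 rows, 1 column

first_row = data[0]
print(type(first_row))     # a numpy array, not a list or tuple
print(first_row.tolist())  # the same row as a plain Python list
```

Each row of the stacked array is itself a one-element numpy array, so converting it with tolist() hands ws.append one of the plain types it asks for in the error message.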
Changing just these lines makes the code run, but it will not give you the output you want. Running the code on the input file
a,a,a,a
b,b,b,b
c,c,c,c
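produces a spreadsheet like the one below, because you are turning each row into a column array and then stacking those columns on top of each other (ws.append adds rows to the bottom of the worksheet)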
b
b
b
b\n
c
c
c
c\n
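The stacking effect can be reproduced without openpyxl. The sketch below is a simplified simulation that assumes the whole line is used (not the line[1:8] slice from the question); `sheet_rows` is a stand-in list for the worksheet, with one entry per appended row.

```python
import io

# In-memory stand-in for the .dat file
sample = io.StringIO("a,a,a,a\nb,b,b,b\nc,c,c,c\n")

next(sample)     # skip the first line, as the question's code does
sheet_rows = []  # stands in for the worksheet

for dat_line in sample:
    for value in dat_line.split(","):
        sheet_rows.append([value])  # one single-cell row per value

print(sheet_rows)
# [['b'], ['b'], ['b'], ['b\n'], ['c'], ['c'], ['c'], ['c\n']]
```

Every value ends up in column A, and the last value of each line keeps its trailing newline because the line was never stripped before splitting.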
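If you want to transpose the whole csv (including the header), a simple way is numpy's transpose method. It swaps the rows and columns of the entire array, after which you can loop over each row and write it to the worksheet. It also simplifies how you read the csv file, as shown below. Keep in mind that transpose only works as intended on rectangular arrays (every row the same length), so I added some code to pad any jagged rows.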
import csv
import openpyxl
import numpy as np
# Create the workbook and select the active worksheet
wb = openpyxl.Workbook()
ws = wb.active
with open('input.dat') as f:
    # Read in all the data
    data = list(csv.reader(f))
## If your CSV isn't rectangular, you need to pad it first
# Get the length of the longest row
longest = len(max(data, key=len))
# Pad every row to the longest row length
for row in data:
    row.extend((longest - len(row)) * [''])
## Once every row has the same length, continue as normal
# Transpose the array
data = np.transpose(data)
# Write all rows to the worksheet
for row in data:
    ws.append(row.tolist())
# Save the workbook
wb.save('test.xlsx')
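The same pad-and-transpose step can also be done without numpy. This is only a sketch of an alternative, not part of the answer above: it swaps in itertools.zip_longest from the standard library, and the helper names `transpose_rows` and `dat_to_xlsx` are made up for illustration.

```python
import csv
from itertools import zip_longest


def transpose_rows(rows, fill=""):
    """Turn rows into columns, padding short rows with `fill`."""
    return [list(column) for column in zip_longest(*rows, fillvalue=fill)]


def dat_to_xlsx(dat_path, xlsx_path):
    """Hypothetical helper: write each line of dat_path as a column."""
    import openpyxl  # imported here so transpose_rows works without it

    with open(dat_path, newline="") as f:
        rows = list(csv.reader(f))

    wb = openpyxl.Workbook()
    ws = wb.active
    for row in transpose_rows(rows):
        ws.append(row)  # row is already a plain list
    wb.save(xlsx_path)


print(transpose_rows([["a", "a", "a"], ["b", "b", "b"], ["c", "c"]]))
# [['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', '']]
```

Calling dat_to_xlsx('input.dat', 'test.xlsx') would then write the transposed data. Because zip_longest pads short rows itself, no separate padding loop is needed, and the rows are plain lists, so no tolist() call is required either.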
Suppose we have a file example.dat with the following contents:
a1,a2,a3,a4
b1,b2,b3,b4
c1,c2,c3,c4
It is best to use pandas. First load the data as a dataframe, then take the transpose and save the resulting dataframe to an Excel file, as follows:
import pandas as pd
df_in = pd.read_csv("example.dat", header=None)  # header=None since the data has no header row.
data_out = df_in.transpose()
data_out.to_excel("example.xlsx", index=False, header=False)  # index and header are False since you don't want row or column labels written to the Excel file.
Output:
a1 b1 c1
a2 b2 c2
a3 b3 c3
a4 b4 c4
Pros: simple and clean. Cons: this implementation requires openpyxl, which can be installed with: pip install openpyxl