python3: how to print groupby.last()?
$ cat n2.txt
apn,date
3704-156,11/04/2019
3704-156,11/22/2019
5515-004,10/23/2019
3732-231,10/07/2019
3732-231,11/15/2019
$ python3
Python 3.7.5 (default, Oct 25 2019, 10:52:18)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> df = pd.read_csv("n2.txt")
>>> df
        apn        date
0  3704-156  11/04/2019
1  3704-156  11/22/2019
2  5515-004  10/23/2019
3  3732-231  10/07/2019
4  3732-231  11/15/2019
>>> g = df.groupby('apn')
>>> g.last()
                date
apn
3704-156  11/22/2019
3732-231  11/15/2019
5515-004  10/23/2019
>>> f = g.last()
>>> for r in f.itertuples(index=True, name='Pandas'):
...     print(getattr(r,'apn'), getattr(r,'date'))
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: 'Pandas' object has no attribute 'apn'
>>> for r in f.itertuples(index=True, name='Pandas'):
...     print(getattr(r,"apn"), getattr(r,"date"))
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
AttributeError: 'Pandas' object has no attribute 'apn'
What is the correct way to print this to a file?
For example:
apn, date
3704-156,11/22/2019
3732-231,11/15/2019
5515-004,10/23/2019
Your code should be changed as follows:
df = pd.read_csv("n2.txt")
g = df.groupby('apn')
f = g.last()
Use DataFrame.to_csv. Here f is a one-column DataFrame indexed by apn, and the index is written as the first column by default:
f.to_csv(file)
Or convert the index to a column with reset_index, which gives a 2-column DataFrame, and write it with DataFrame.to_csv without the index:
f.reset_index().to_csv(file, index=False)
Or a solution with DataFrame.drop_duplicates:
df = pd.read_csv("n2.txt")
df = df.drop_duplicates('apn', keep='last')
df.to_csv(file, index=False)
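The groupby and drop_duplicates approaches keep the same rows here but order them differently, which matters if the file should match the expected output exactly. A minimal sketch, assuming the data from n2.txt is supplied as an in-memory string (the names raw, by_group, by_dedup and csv_text are illustrative only):

```python
import io

import pandas as pd

# Same rows as n2.txt, read from a string so the sketch is self-contained.
raw = """apn,date
3704-156,11/04/2019
3704-156,11/22/2019
5515-004,10/23/2019
3732-231,10/07/2019
3732-231,11/15/2019
"""
df = pd.read_csv(io.StringIO(raw))

# groupby(...).last() sorts the group keys; reset_index() turns apn back into a column.
by_group = df.groupby("apn").last().reset_index()

# drop_duplicates keeps the last row per apn but preserves the original row order.
by_dedup = df.drop_duplicates(subset="apn", keep="last")

print(by_group["apn"].tolist())  # keys in sorted order
print(by_dedup["apn"].tolist())  # keys in order of their last occurrence

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = by_group.to_csv(index=False)
print(csv_text, end="")
```

If the sorted order is wanted with drop_duplicates, a sort_values("apn") call before writing should give the same rows in the same order as the groupby version.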
In your loop, apn is the index rather than a column, so itertuples exposes it as the Index attribute:
for r in f.itertuples(index=True, name='Pandas'):
    print(getattr(r,'Index'), getattr(r,'date'))
3704-156 11/22/2019
3732-231 11/15/2019
5515-004 10/23/2019
df = pd.read_csv("n2.txt")
g = df.groupby('apn').last()
print(g.to_csv())
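This should do what you want.
If you type g.to_csv() in the console without a path, it returns a string that starts with 'apn,date\r\n...' on Windows. The line terminator defaults to os.linesep, so it is '\n' on Linux and macOS. The print function starts a new line at each line terminator, which gives the output you want.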
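Since the question asks for a file rather than console output, the same call can also be given a path. A minimal sketch, assuming the n2.txt rows are supplied as a string and using a hypothetical output file name (last_by_apn.csv) in a temporary directory:

```python
import io
import os
import tempfile

import pandas as pd

raw = """apn,date
3704-156,11/04/2019
3704-156,11/22/2019
5515-004,10/23/2019
3732-231,10/07/2019
3732-231,11/15/2019
"""
g = pd.read_csv(io.StringIO(raw)).groupby("apn").last()

# No path given, so to_csv returns the CSV text as a str.
text = g.to_csv()
print(repr(text))     # repr makes the line terminators visible
print(text, end="")   # end="" avoids a blank line after the final terminator

# Hypothetical output location; any writable path works the same way.
out_path = os.path.join(tempfile.mkdtemp(), "last_by_apn.csv")
g.to_csv(out_path)

# newline="" reads the file back without translating line endings.
with open(out_path, newline="") as fh:
    written = fh.read()
```

The file holds the same lines as the returned string, with apn written as the first column because it is the index.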