Reading an online .tbl data file in python

Question

As the title says, I am trying to read an online data file in .tbl format. Here is the link to the data: https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/cosmos_morph_cassata_1.1.tbl

I tried the following code:

import pandas as pd

cosmos = pd.read_table('https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/cosmos_morph_cassata_1.1.tbl')

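Running this gives me no errors, but when I run print(cosmos.columns), instead of a list of the individual columns, python lumps everything together into a single column name, and the output looks like this: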

Index(['|            ID|            RA|           DEC|  MAG_AUTO_ACS|       R_PETRO|        R_HALF|    CONC_PETRO|     ASYMMETRY|          GINI|           M20|   Axial Ratio|     AUTOCLASS|   CLASSWEIGHT|'], dtype='object')

My main goal is to print the columns of the table individually, and then print cosmos['RA']. Does anyone know how to do this?

Answer 1

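Your file has four header rows, and it uses different delimiters in the header (|) and in the data (whitespace). You can read the data with read_table by skipping the header rows with the skiprows argument and splitting on whitespace, then build the column names from the first header line yourself.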
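Both points can be checked offline on a tiny sample laid out like the .tbl file. This is a minimal sketch: the sample rows and the contents of the three lower header lines are made up for illustration. It also shows why the original call produced a single column: read_table splits on tab characters by default, and the file contains none, so each whole line becomes one field.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as the .tbl file: four '|' header
# lines (contents of lines 2-4 assumed), then whitespace-separated data.
SAMPLE = """\
|   ID|        RA|      DEC|
| long|    double|   double|
|     |       deg|      deg|
| null|      null|     null|
     1  150.10000   2.20000
     2  150.20000   2.30000
"""

# The call from the question: the default separator is a tab, so every
# line ends up as a single field and there is only one column.
naive = pd.read_table(io.StringIO(SAMPLE))
print(len(naive.columns))  # 1

# Skip the four header lines and split the data on runs of whitespace.
data = pd.read_table(io.StringIO(SAMPLE), skiprows=4, header=None, sep=r"\s+")
print(data.shape)  # (2, 3)
```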

import requests
import pandas as pd

filename = 'cosmos_morph_cassata_1.1.tbl'
url = 'https://irsa.ipac.caltech.edu/data/COSMOS/tables/morphology/' + filename
n_header = 4

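## Download the (large) file to disk once, so it can be reused
response = requests.get(url)
response.raise_for_status()  # stop early on HTTP errors
with open(filename, 'wb') as f:
    f.write(response.content)

## Skip the 4 header rows and split the data on runs of whitespace
## (sep=r'\s+' replaces delim_whitespace=True, which is deprecated since pandas 2.2)
cosmos = pd.read_table(filename, skiprows=n_header, header=None, sep=r'\s+')

## Create the column names from the first line of the file
with open(filename) as f:
    header_line = f.readline()
## Split by '|', drop the empty pieces at both ends, and trim the padding
## (this keeps names with inner spaces, such as 'Axial Ratio', intact)
header_columns = [name.strip() for name in header_line.split('|')[1:-1]]

cosmos.columns = header_columns
print(cosmos['RA'])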
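The code above hard-codes n_header = 4 and takes the names from the very first line. If a .tbl file also starts with backslash-prefixed keyword lines (an assumption about other files from the same archive, not something this particular file was checked for), both would break. A sketch that filters lines by their first character instead of counting them might look like this; the function name read_ipac_like_table and the sample data are hypothetical.

```python
import io

import pandas as pd


def read_ipac_like_table(text):
    """Parse text laid out like the .tbl file (hypothetical helper).

    Assumes: lines starting with a backslash are keyword/comment lines,
    lines starting with '|' are header lines (the first holds the names),
    and every other non-blank line is whitespace-separated data.
    """
    lines = [ln for ln in text.splitlines() if not ln.startswith("\\")]
    header_lines = [ln for ln in lines if ln.startswith("|")]
    data_lines = [ln for ln in lines if ln.strip() and not ln.startswith("|")]
    if not header_lines:
        raise ValueError("no '|' header line found")
    # Split the first header line on '|' and strip the padding around names.
    names = [name.strip() for name in header_lines[0].split("|")[1:-1]]
    return pd.read_csv(io.StringIO("\n".join(data_lines)),
                       sep=r"\s+", header=None, names=names)


# Hypothetical sample with one keyword line before the four header lines.
SAMPLE = """\
\\fixlen = T
|   ID|        RA|      DEC| Axial Ratio|
| long|    double|   double|      double|
|     |       deg|      deg|            |
| null|      null|     null|        null|
     1  150.10000   2.20000      0.50000
     2  150.20000   2.30000      0.75000
"""

cosmos = read_ipac_like_table(SAMPLE)
print(list(cosmos.columns))  # ['ID', 'RA', 'DEC', 'Axial Ratio']
print(cosmos["RA"])
```

For the downloaded file, the same helper would be called as read_ipac_like_table(open(filename).read()). If astropy is available, its astropy.io.ascii.read(..., format='ipac') reader targets this table format directly and may be the sturdier choice.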
