使用 sqlalchemy 查询和 pandas 数据帧时如何最小化代码

Question

我正在尝试通过现代化来升级我的查询代码。

我的旧代码（如下）。第一个查询连接两个 table 并选择每个歌曲标题的评级以及标题的艺术家，第二个查询获取每个标题的流派（使用关联 table）：

items = []
query = db.session.query(Rating, Song).filter(Rating.id==Song.id).all()
for x in query:
    dic = {
        "rating": x[0],
        "title": x[1].title,
        "artist": x[1].artist,
        "genre": Genre.query.filter(Genre.songs.any(title=x[1].title)).all(),
        }
    items.append(dic)

我的代码更简洁。我现在使用 pandas 数据框而不是字典。这给了我错误 ArgumentError: SQL expression element or literal value expected, got somethingsomething

query = db.session.query(Rating, Song).filter(Rating.id==Song.id).all()
df = pd.DataFrame(query, columns=["rating", "title"])
for item in df.title:
    df['genre'] = (Genre.query.filter(Genre.songs.any(title=item)).all())

如何让它工作？
是否有更有效的编码方式？

产生了完整的错误

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
~\AppData\Local\Programs\Python\Python37\lib\site-packages\sqlalchemy\engine\base.py in execute(self, statement, *multiparams, **params)
   1194         try:
-> 1195             meth = statement._execute_on_connection
   1196         except AttributeError as err:

AttributeError: 'BaseQuery' object has no attribute '_execute_on_connection'

The above exception was the direct cause of the following exception:

ObjectNotExecutableError                  Traceback (most recent call last)
<ipython-input-4-99ffacdf2d91> in <module>
      1 query = db.session.query(Rating, Song).filter(Rating.id==Song.id)
----> 2 df = pd.read_sql_query(query, db.engine)
      3 df

~\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\sql.py in read_sql_query(sql, con, index_col, coerce_float, params, parse_dates, chunksize)
    381         coerce_float=coerce_float,
    382         parse_dates=parse_dates,
--> 383         chunksize=chunksize,
    384     )
    385 

~\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\sql.py in read_query(self, sql, index_col, coerce_float, parse_dates, params, chunksize)
   1292         args = _convert_params(sql, params)
   1293 
-> 1294         result = self.execute(*args)
   1295         columns = result.keys()
   1296 

~\AppData\Local\Programs\Python\Python37\lib\site-packages\pandas\io\sql.py in execute(self, *args, **kwargs)
   1160     def execute(self, *args, **kwargs):
   1161         """Simple passthrough to SQLAlchemy connectable"""
-> 1162         return self.connectable.execution_options().execute(*args, **kwargs)
   1163 
   1164     def read_table(

<string> in execute(self, statement, *multiparams, **params)

~\AppData\Local\Programs\Python\Python37\lib\site-packages\sqlalchemy\util\deprecations.py in warned(fn, *args, **kwargs)
    388         if not skip_warning:
    389             _warn_with_version(message, version, wtype, stacklevel=3)
--> 390         return fn(*args, **kwargs)
    391 
    392     doc = func.__doc__ is not None and func.__doc__ or ""

~\AppData\Local\Programs\Python\Python37\lib\site-packages\sqlalchemy\engine\base.py in execute(self, statement, *multiparams, **params)
   3036         """
   3037         connection = self.connect(close_with_result=True)
-> 3038         return connection.execute(statement, *multiparams, **params)
   3039 
   3040     @util.deprecated_20(

~\AppData\Local\Programs\Python\Python37\lib\site-packages\sqlalchemy\engine\base.py in execute(self, statement, *multiparams, **params)
   1196         except AttributeError as err:
   1197             util.raise_(
-> 1198                 exc.ObjectNotExecutableError(statement), replace_context=err
   1199             )
   1200         else:

~\AppData\Local\Programs\Python\Python37\lib\site-packages\sqlalchemy\util\compat.py in raise_(***failed resolving arguments***)
    209 
    210         try:
--> 211             raise exception
    212         finally:
    213             # credit to

ObjectNotExecutableError: Not an executable object: <flask_sqlalchemy.BaseQuery object at 0x000001CBC5F14A48>

Answer 1

首先，只创建一个查询 return 您需要的所有数据，其中流派分组是使用 GROUP_CONCAT 函数完成的：

query = (
    db.session
    .query(
        Rating.rating,
        Song.title,
        Song.artist,
        db.func.GROUP_CONCAT(Genre.category, ", ").label("genres")
    )
    .select_from(Song)
    .where(Rating.id == Song.id)
    .join(Genre, Song.genres)
    .group_by(
        Rating.rating,
        Song.title,
        Song.artist,
    )
)

然后使用pandas方法将其放入dataframe中：

df = pd.read_sql_query(query.statement, db.engine)

print(df) 应该产生这样的东西：

   rating    title artist            genres
0     2.0  title 2  art-2               pop
1     3.0  title 3  art-3         rock, pop
2     4.0  title 4  art-3             other
3     5.0  title 5  art-4  rock, pop, other

使用 sqlalchemy 查询和 pandas 数据帧时如何最小化代码

How to minimize code when using sqlalchemy query and pandas dataframes

python

sqlalchemy

pandas