Pandas

Question

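I have a dataset of milk production in India. Using Pandas, I'm trying to get a list of the 5 states (if they exist) whose total milk production increased over the past 3 years.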

State              Total10-11  Total11-12  Total13-14  Total14-15  Total15-16
Andhra Pradesh          11204       12088       13007        9656       10817
Arunachal Pradesh          28          22          43          46          50
Assam                     790         797         814         829         844
Bihar                    6517        6643        7197        7775        8289
Chhattisgarh             1030        1118        1208        1232        1278
Goa                        60          60          68          66          54
Gujarat                  9322        4089       11112       11690       12262
Haryana                  6268        6661        7442        7902        8381
Himachal Pradesh         1102        1120        1151        1173        1283

Expected output:

State
Assam
Bihar
Chhattisgarh
Haryana
Himachal Pradesh

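I want to find the states whose milk production increases every year: production in each later year should never drop compared with the year before. The states in the expected output have production in increasing order, and their production did not drop even once. I'm a bit stuck on this; I've tried several approaches, but none came close to the right answer. What is the solution? Thanks in advance.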
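To reproduce the answers below on the sample data, the table can be typed in directly as a DataFrame. This is a minimal sketch that assumes the column names shown in the table above (note that the sample has no Total12-13 column); the real dataset may name the state column differently.

```python
import pandas as pd

# Sample data copied from the question's table, one row per state
df = pd.DataFrame(
    {
        "State": ["Andhra Pradesh", "Arunachal Pradesh", "Assam", "Bihar", "Chhattisgarh",
                  "Goa", "Gujarat", "Haryana", "Himachal Pradesh"],
        "Total10-11": [11204, 28, 790, 6517, 1030, 60, 9322, 6268, 1102],
        "Total11-12": [12088, 22, 797, 6643, 1118, 60, 4089, 6661, 1120],
        "Total13-14": [13007, 43, 814, 7197, 1208, 68, 11112, 7442, 1151],
        "Total14-15": [9656, 46, 829, 7775, 1232, 66, 11690, 7902, 1173],
        "Total15-16": [10817, 50, 844, 8289, 1278, 54, 12262, 8381, 1283],
    }
)
print(df)
```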

Answer 1

If you are only looking for the trend, then I think visualization is the answer.

You can do it like this:

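import matplotlib.pyplot as plt
import pandas as pd

# df holds the table from the question; make each state the row label
df = df.set_index('State')
# Transpose so the years run along the x-axis and each state is drawn as one line
df.T.plot(figsize=(10, 15))
plt.show()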

Or look at each state separately:

df.T.plot(figsize=(15, 20), subplots=True, layout=(3, 3))  # one subplot per state, 3x3 grid for nine states
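A self-contained version of the plotting approach, run against the sample table, might look like the sketch below. It assumes the state column is named State (as in the question's sample) and selects matplotlib's non-interactive Agg backend so it also runs without a display; the axis labels and title are illustrative choices, not part of the answer.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; remove this line and call matplotlib.pyplot.show() to open a window
import pandas as pd

df = pd.DataFrame(
    {
        "State": ["Andhra Pradesh", "Arunachal Pradesh", "Assam", "Bihar", "Chhattisgarh",
                  "Goa", "Gujarat", "Haryana", "Himachal Pradesh"],
        "Total10-11": [11204, 28, 790, 6517, 1030, 60, 9322, 6268, 1102],
        "Total11-12": [12088, 22, 797, 6643, 1118, 60, 4089, 6661, 1120],
        "Total13-14": [13007, 43, 814, 7197, 1208, 68, 11112, 7442, 1151],
        "Total14-15": [9656, 46, 829, 7775, 1232, 66, 11690, 7902, 1173],
        "Total15-16": [10817, 50, 844, 8289, 1278, 54, 12262, 8381, 1283],
    }
)

# After transposing, rows are year columns and columns are states,
# so DataFrame.plot draws one line per state across the years
trend = df.set_index("State").T
ax = trend.plot(figsize=(10, 6), marker="o")
ax.set_xlabel("Year column")
ax.set_ylabel("Total milk production")
ax.set_title("Milk production per state")
```

Because the states differ by orders of magnitude (Arunachal Pradesh in the tens, Andhra Pradesh above ten thousand), the single-axes plot flattens the small states, which is why the per-state subplots can be easier to read.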

Answer 2

If you are looking for states where the year-over-year difference is always positive, you can use diff() > 0 together with cumsum(), i.e.

df = df.set_index("State/UT Name")  # in the question's sample this column is called "State"

temp = (df.T.diff() > 0).cumsum()
# Values will increment if the difference between past and present is positive 
State/UT Name  Andhra Pradesh  Arunachal Pradesh  Assam  Bihar  Chhattisgarh  \
Total10-11                  0                  0      0      0             0   
Total11-12                  1                  0      1      1             1   
Total13-14                  2                  1      2      2             2   
Total14-15                  2                  2      3      3             3   
Total15-16                  3                  3      4      4             4   

State/UT Name  Goa  Gujarat  Haryana  Himachal Pradesh  
Total10-11       0        0        0                 0  
Total11-12       0        0        1                 1  
Total13-14       1        1        2                 2  
Total14-15       1        2        3                 3  
Total15-16       1        3        4                 4  
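To see what each number in temp means, here is the same calculation applied to a single row, Andhra Pradesh's values from the sample table. It is only an illustration of diff() > 0 followed by cumsum(), not part of the original answer.

```python
import pandas as pd

# One state's row from the sample table (Andhra Pradesh)
row = pd.Series(
    [11204, 12088, 13007, 9656, 10817],
    index=["Total10-11", "Total11-12", "Total13-14", "Total14-15", "Total15-16"],
)

steps = row.diff()            # NaN, 884, 919, -3351, 1161
increased = steps > 0         # NaN > 0 is False, so the first year never counts
running = increased.cumsum()  # counts increases seen so far: 0, 1, 2, 2, 3

print(running.tolist())  # [0, 1, 2, 2, 3], the Andhra Pradesh column of temp
print(running.sum())     # 8, the score this state gets from temp.sum()
```

Because the counts accumulate, a dip early in the series lowers the final score more than a dip near the end, so the sum works as a ranking rather than a yes/no test. With five year columns, only a state that increased at every step reaches the maximum score 0 + 1 + 2 + 3 + 4 = 10.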

# The states with the maximum sum are the ones that kept increasing every year
temp.sum().nlargest(10)

State/UT Name
Assam                10
Bihar                10
Chhattisgarh         10
Haryana              10
Himachal Pradesh     10
Andhra Pradesh        8
Arunachal Pradesh     6
Gujarat               6
Goa                   3

If you want the state names, then:

states = temp.sum().nlargest(5).index.tolist()

['Assam', 'Bihar', 'Chhattisgarh', 'Haryana', 'Himachal Pradesh']
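Taking the top five by score always returns five names, even when fewer states increased every year (or cuts some off when more did). A more direct check, sketched here as an alternative to the cumsum ranking rather than as the original answer, keeps exactly the states whose every year-over-year change is positive. It assumes the state column is named State, as in the question's sample.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "State": ["Andhra Pradesh", "Arunachal Pradesh", "Assam", "Bihar", "Chhattisgarh",
                  "Goa", "Gujarat", "Haryana", "Himachal Pradesh"],
        "Total10-11": [11204, 28, 790, 6517, 1030, 60, 9322, 6268, 1102],
        "Total11-12": [12088, 22, 797, 6643, 1118, 60, 4089, 6661, 1120],
        "Total13-14": [13007, 43, 814, 7197, 1208, 68, 11112, 7442, 1151],
        "Total14-15": [9656, 46, 829, 7775, 1232, 66, 11690, 7902, 1173],
        "Total15-16": [10817, 50, 844, 8289, 1278, 54, 12262, 8381, 1283],
    }
)

years = df.set_index("State")
# Difference between each year column and the one before it (first column is NaN)
changes = years.diff(axis=1).iloc[:, 1:]
# True only when every year-over-year change for that state is strictly positive
strictly_increasing = changes.gt(0).all(axis=1)
states = strictly_increasing[strictly_increasing].index.tolist()
print(states)
```

If a flat year (equal to the previous one) should also count as "not decreasing", replace .gt(0) with .ge(0).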


Pandas - Find increasing trend in a row of different columns

python

data-analysis