Calculate frequency of a list of values in descending order and its associated percentage of values higher
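I am trying to write Python code that, for a given list of values (y) taken in descending order, computes the frequency of each y value and the associated percentage of samples (yi) with a larger y value, taking the frequencies into account.
Thank you very much!
Here is the Python code I wrote using NumPy, but I get errors when calculating the percentage, and I would like the frequencies to line up with the new array of y values without repetitions (arr).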
# Permeability values (mD)
y = [27.10, 23.02, 18.26, 17.46, 16.88, 15.75, 15.21, 12.65, 12.65, 12.65, 12.65, 14.93, 13.88, 13.53, 13.31, 13.27, 12.65, 12.41, 11.97, 11.93, 11.84, 11.82, 27.10, 27.10, 27.10, 11.12, 11.10, 10.65, 10.54, 10.29, 9.98, 9.19, 9.03, 8.56, 8.28, 8.21, 9.98, 9.98, 11.97, 11.97, 11.97, 4.68, 4.37, 3.82, 3.44, 3.38, 3.33, 3.27, 3.22, 2.52, 2.38, 1.91, 1.89, 1.87, 1.81, 1.00, 13.27, 13.27, 9.98, 13.27, 9.98, 13.27, 9.98, 13.27]
# Permeability values in descending order (y, mD)
y_sorted = sorted(y, reverse=True)
# Calculate frequency for the permeability values in descending order
y_new_sorted = np.array(y_sorted)
arr,count = np.unique(y_new_sorted,return_counts=True)
arr_sorted = sorted(arr, reverse=True)
print('Frequency= ', count)
print('Permeability values in descending order without repititions= ', arr_sorted)
# Percentage of samples with larger permeability (x, %)
vec_percent = np.vectorize(percent)
np.unique(vec_percent(y_new_sorted))
print('Percentage of samples with larger permeability= ', vec_percent)
**OUTPUTS**
Frequency= [1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 6 1 1 1 1 1 1 1 1 4 1 5 6 1 1 1 1
1 1 1 1 1 1 4]
Permeability values in descending order without repititions= [27.1, 23.02, 18.26, 17.46, 16.88, 15.75, 15.21, 14.93, 13.88, 13.53, 13.31, 13.27, 12.65, 12.41, 11.97, 11.93, 11.84, 11.82, 11.12, 11.1, 10.65, 10.54, 10.29, 9.98, 9.19, 9.03, 8.56, 8.28, 8.21, 4.68, 4.37, 3.82, 3.44, 3.38, 3.33, 3.27, 3.22, 2.52, 2.38, 1.91, 1.89, 1.87, 1.81, 1.0]
Traceback (most recent call last):
File line 22, in <module>
vec_percent = np.vectorize(percent)
NameError: name 'percent' is not defined
Process finished with exit code 1
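Basic
The list.count(item) function returns the number of times item can be found in the list.
The list.index(item) function returns the position of the first occurrence of item in the list, which is exactly the number of elements before it (since Python indexes lists from 0). Because the list is sorted in decreasing order, this is exactly the number of higher values.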
y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330, 310, 310, 310, 310, 290]
def freq(item, lst):
    return lst.count(item)
def higher_perc(item, lst):
    return lst.index(item) / len(lst)
print(freq(370, y)) # 2
print(higher_perc(370, y)) # 0.2631578947368421
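Note that higher_perc only gives the right answer when the list is already sorted in descending order. As a minimal sketch, assuming unsorted input (the helper name higher_perc_unsorted and the sample list are hypothetical, not part of the original answer), the list can be sorted first:

```python
def higher_perc_unsorted(item, values):
    """Fraction of samples strictly greater than item (hypothetical helper)."""
    # Sort a copy in descending order so that the first index of item
    # equals the number of strictly larger samples.
    ordered = sorted(values, reverse=True)
    return ordered.index(item) / len(ordered)

y_unsorted = [330, 390, 370, 390, 290, 370, 350]
# Two of the seven samples (both 390s) are larger than 370.
print(higher_perc_unsorted(370, y_unsorted))
```

Sorting a copy leaves the caller's list untouched, at the cost of an extra sort per call.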
If we want to apply this to several values, we can create a function that returns a function applying the operation, and then use map:
y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330, 310, 310, 310, 310, 290]
items = sorted(set(y), reverse=True)
def create_freq_function(lst):
    def freq(item):
        return lst.count(item)
    return freq
def create_higher_perc_function(lst):
    def higher_perc(item):
        return lst.index(item) / len(lst)
    return higher_perc
print(items)
# [390, 370, 350, 330, 310, 290]
print(list(map(create_freq_function(y), items)))
# [5, 2, 1, 6, 4, 1]
print(list(map(create_higher_perc_function(y), items)))
# [0.0, 0.2631578947368421, 0.3684210526315789, 0.42105263157894735, 0.7368421052631579, 0.9473684210526315]
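Calling count and index once per unique value rescans the list each time. A possible alternative, sketched here with collections.Counter from the standard library (the function name freq_and_higher_perc is an assumption made for illustration), counts everything in one pass and keeps a running total of the larger samples:

```python
from collections import Counter

def freq_and_higher_perc(values):
    """Return (value, frequency, fraction_higher) tuples, largest value first."""
    counts = Counter(values)
    total = len(values)
    rows = []
    seen = 0  # number of samples strictly greater than the current value
    for value in sorted(counts, reverse=True):
        rows.append((value, counts[value], seen / total))
        seen += counts[value]
    return rows

y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330,
     310, 310, 310, 310, 290]
for value, frequency, fraction_higher in freq_and_higher_perc(y):
    print(value, frequency, fraction_higher)
```

This version does not require the input to be sorted, because the ordering is applied to the unique values only.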
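NumPy
If the dataset is very large, the numpy package can help. numpy.unique returns both the list of unique items and the number of times each occurs, while numpy.cumsum accumulates the percentages of the individual elements.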
import numpy as np
y = np.array([390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330, 310, 310, 310, 310, 290])
items, freqs = np.unique(y, return_counts=True)
items, freqs = items[::-1], freqs[::-1]
perc_freqs = freqs/len(y)
higher_percs = np.cumsum(perc_freqs) - perc_freqs
print(items)
# [390 370 350 330 310 290]
print(freqs)
# [5 2 1 6 4 1]
print(higher_percs)
# [0. 0.26315789 0.36842105 0.42105263 0.73684211 0.94736842]
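The same steps can be wrapped in a function and applied to unsorted floating-point data such as the permeability values in the question. This is only a sketch: the function name descending_frequency_table and the shortened sample list are assumptions, and the result is expressed in percent rather than as a fraction.

```python
import numpy as np

def descending_frequency_table(values):
    """Unique values (largest first), their counts, and % of larger samples."""
    values = np.asarray(values)
    # np.unique returns the unique values in ascending order with their counts.
    items, freqs = np.unique(values, return_counts=True)
    # Reverse both arrays together so that values and counts stay aligned.
    items, freqs = items[::-1], freqs[::-1]
    # Samples strictly larger than each value: running total minus own count.
    higher_counts = np.cumsum(freqs) - freqs
    higher_percent = 100.0 * higher_counts / values.size
    return items, freqs, higher_percent

perm = [27.10, 12.65, 27.10, 9.98, 12.65, 13.27, 9.98, 12.65]
items, freqs, higher = descending_frequency_table(perm)
print(items)
print(freqs)
print(higher)
```

Because np.unique sorts internally, the input does not need to be sorted beforehand.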
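You can use this function to compute the frequencies:
def frequencies(values, display_flag=True):
    # Count occurrences, keyed by the string form of each value
    freq = {}
    for val in values:
        if str(val) in freq:
            freq[str(val)] += 1
        else:
            freq[str(val)] = 1
    # Displaying frequencies
    if display_flag:
        # key=float sorts the string keys numerically, not lexicographically
        for i in sorted(freq.keys(), key=float):
            print("Frequency of " + i + " is : " + str(freq[i]))
    return freq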
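You can use this function to compute the percentages:
def percentages(values):
    freq = frequencies(values, False)
    total = len(values)
    current = 0
    # Running (cumulative) fraction, in ascending numeric order of the values
    for i in sorted(freq.keys(), key=float):
        temp = freq[i] / total
        print("Percentage of " + i + " is : " + str(current + temp))
        current += temp
Note that the percentages function calls the frequencies function, so both must be defined.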
There are two ways: using a traditional list, or using the more efficient numpy:
Using a list
>>> y = [390, 390, 390, 390, 390, 370, 370, 350, 330, 330, 330, 330, 330, 330, 310, 310, 310, 310, 290]
#declare a lambda function to calculate percentage and frequency
>>> freq = lambda x: y.count(x)
>>> percent = lambda z: y.index(z)/len(y)
#after this using map() and mapping over only unique values rather than all
>>> print(list(map(freq,set(y))))
[1, 5, 6, 2, 4, 1]
>>> print(list(map(percent,set(y))))
[0.9473684210526315, 0.0, 0.42105263157894735, 0.2631578947368421, 0.7368421052631579, 0.3684210526315789]
>>> set(y)
{290, 390, 330, 370, 310, 350}
#frequency and percent corresponds here to respective values
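Using NumPy
I recommend this approach because it is fast and efficient, although you will only see a real benefit when you have a relatively large dataset to process.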
>>> import numpy as np
>>> y_new = np.array(y)
>>> arr,count = np.unique(y_new,return_counts=True) #very simple approach to get output
>>> count
array([1, 4, 6, 1, 2, 5])
>>> arr
array([290, 310, 330, 350, 370, 390])
#defining a vectorized percentage function based on the percent lambda defined previously
>>> vec_percent = np.vectorize(percent)
>>> np.unique(vec_percent(y_new))
array([0. , 0.26315789, 0.36842105, 0.42105263, 0.73684211,
0.94736842])
#you get your percentages
Now it is up to you to decide which to use.
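For reference, the NameError in the question occurs because percent is passed to np.vectorize without ever being defined. The printed frequencies also do not line up with arr_sorted, since np.unique returns the counts in ascending order of the values while arr_sorted is descending. A minimal sketch of one possible fix, assuming a shortened version of the permeability list (the names arr_desc, count_desc and percent_desc are introduced here for illustration):

```python
import numpy as np

y = [27.10, 23.02, 12.65, 12.65, 27.10, 9.98, 9.98, 9.98]
y_sorted = sorted(y, reverse=True)

# Define percent before vectorizing it; it looks values up in the sorted list,
# so the first index equals the number of strictly larger samples.
def percent(value):
    return 100.0 * y_sorted.index(value) / len(y_sorted)

vec_percent = np.vectorize(percent)

arr, count = np.unique(np.array(y_sorted), return_counts=True)
# Reverse values and counts together so they stay aligned in descending order.
arr_desc, count_desc = arr[::-1], count[::-1]
percent_desc = vec_percent(arr_desc)

print('Permeability values in descending order without repetitions= ', arr_desc)
print('Frequency= ', count_desc)
print('Percentage of samples with larger permeability= ', percent_desc)
```

np.vectorize is a convenience wrapper around a Python-level loop, so for large arrays the cumulative-sum approach is likely to be faster.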