Python - reduce complexity using sets
I am working with the url_analysis tools from the Spotify API (through the spotipy wrapper, as sp.) to process tracks, using the following code:
def loudness_drops(track_ids):
    names = set()
    tids = set()
    tracks_with_drop_name = set()
    tracks_with_drop_id = set()
    for id_ in track_ids:
        track_id = sp.track(id_)['uri']
        tids.add(track_id)
        track_name = sp.track(id_)['name']
        names.add(track_name)
        # get audio features
        features = sp.audio_features(tids)
        # and then audio analysis id
        urls = {x['analysis_url'] for x in features if x}
        print len(urls)
        # fetch analysis data
        for url in urls:
            analysis = sp._get(url)
            # extract loudness sections from analysis
            x = [_['start'] for _ in analysis['segments']]
            print len(x)
            l = [_['loudness_max'] for _ in analysis['segments']]
            print len(l)
            # get max and min values
            min_l = min(l)
            max_l = max(l)
            # normalize stream
            norm_l = [(_ - min_l)/(max_l - min_l) for _ in l]
            # define silence as a value below 0.1
            silence = [l[i] for i in range(len(l)) if norm_l[i] < .1]
            # more than one silence means one of them happens in the middle of the track
            if len(silence) > 1:
                tracks_with_drop_name.add(track_name)
                tracks_with_drop_id.add(track_id)
    return tracks_with_drop_id
The code works, but if the number of songs I search for is set to limit=20, the time it takes to process all of the audio segments x and l makes the process too expensive, e.g.:

time.time() prints 452.175742149
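For context, that figure was presumably obtained by timing the call roughly like this (the sp.search query and result handling below are my assumptions based on the description; only limit=20 comes from the post):

import time

# hypothetical search; replace the query with whatever was actually searched
results = sp.search(q='some query', type='track', limit=20)
track_ids = [t['id'] for t in results['tracks']['items']]

start = time.time()
tracks = loudness_drops(track_ids)
print time.time() - start   # e.g. 452.175742149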
QUESTION:

How can I drastically reduce complexity here? I've tried using sets instead of lists, but working with set objects prohibits indexing.
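To illustrate that limitation (this snippet is mine, not from the original post):

l = [-20.0, -3.5, -60.0]
s = set(l)
print l[0]        # lists support positional access
print list(s)[0]  # a set must be converted first, and its order is arbitrary
# s[0] would raise TypeError: 'set' object does not support indexing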
EDIT: 10 urls:
[u'https://api.spotify.com/v1/audio-analysis/5H40slc7OnTLMbXV6E780Z', u'https://api.spotify.com/v1/audio-analysis/72G49GsqYeWV6QVAqp4vl0', u'https://api.spotify.com/v1/audio-analysis/6jvFK4v3oLMPfm6g030H0g', u'https://api.spotify.com/v1/audio-analysis/351LyEn9dxRxgkl28GwQtl', u'https://api.spotify.com/v1/audio-analysis/4cRnjBH13wSYMOfOF17Ddn', u'https://api.spotify.com/v1/audio-analysis/2To3PTOTGJUtRsK3nQemP4', u'https://api.spotify.com/v1/audio-analysis/4xPRxqV9qCVeKLQ31NxhYz', u'https://api.spotify.com/v1/audio-analysis/1G1MtHxrVngvGWSQ7Fj4Oj', u'https://api.spotify.com/v1/audio-analysis/3du9aoP5vPGW1h70mIoicK', u'https://api.spotify.com/v1/audio-analysis/6VIIBKYJAKMBNQreG33lBF']
Here is what I see, not knowing much about Spotify:
for id_ in track_ids:
    # this runs N times, where N = len(track_ids)
    ...
    tids.add(track_id)  # tids contains all track_ids processed until now
    # in the end: len(tids) == N
    ...
    features = sp.audio_features(tids)
    # features contains features of all tracks processed until now
    # in the end, I guess: len(features) == N * num_features_per_track
    urls = {x['analysis_url'] for x in features if x}
    # very probably: len(urls) == len(features)
    for url in urls:
        # for the first track, this processes features of the first track only
        # for the second track, this processes features of 1st and 2nd
        # etc.
        # in the end, this loop repeats N * N * num_features_per_track times
You should not process the same url twice. You do, however, because you keep all tracks in tids and then, for each track, you process everything in tids, which turns the complexity of this into O(n²).

In general, always look for loops inside loops when trying to reduce complexity.
I believe this should work in this case, if audio_features expects a set of IDs:
# replace this: features = sp.audio_features(tids)
# with:
features = sp.audio_features({track_id})
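Putting it together, here is a minimal sketch of the restructured function (my interpretation of the advice above, not the answerer's code). Each track's features and analysis are fetched exactly once, so the nested re-processing disappears and the number of API calls grows linearly with the number of tracks; it assumes sp is an authenticated spotipy client and uses the 'id' field that audio feature objects carry:

def loudness_drops(track_ids):
    tracks_with_drop_id = set()
    # one batched call for all tracks; audio_features accepts up to 100 ids
    features = sp.audio_features(track_ids)
    for feat in features:
        if not feat:
            continue
        # fetch this track's analysis exactly once
        analysis = sp._get(feat['analysis_url'])
        l = [seg['loudness_max'] for seg in analysis['segments']]
        min_l = min(l)
        max_l = max(l)
        norm_l = [(v - min_l)/(max_l - min_l) for v in l]
        # define silence as a normalized value below 0.1
        silence = [l[i] for i in range(len(l)) if norm_l[i] < .1]
        # more than one silent segment means one occurs mid-track
        if len(silence) > 1:
            tracks_with_drop_id.add(feat['id'])
    return tracks_with_drop_id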