Python - reduce complexity using sets

I am using the url_analysis tool from the Spotify API (through the spotipy wrapper, with sp.) to process tracks, using the following code:

def loudness_drops(track_ids):

    names = set()
    tids = set()
    tracks_with_drop_name = set()
    tracks_with_drop_id = set()

    for id_ in track_ids:
        track_id = sp.track(id_)['uri']
        tids.add(track_id)
        track_name = sp.track(id_)['name']
        names.add(track_name)
        #get audio features
        features = sp.audio_features(tids)
        #and then audio analysis id
        urls = {x['analysis_url'] for x in features if x}
        print len(urls)
        #fetch analysis data
        for url in urls:
            # print len(urls)
            analysis = sp._get(url)
            #extract loudness sections from analysis
            x = [_['start'] for _ in analysis['segments']]
            print len(x)
            l = [_['loudness_max'] for _ in analysis['segments']]
            print len(l)
            #get max and min values
            min_l = min(l)
            max_l = max(l)
            #normalize stream
            norm_l = [(_ - min_l)/(max_l - min_l) for _ in l]
            #define silence as a value below 0.1
            silence = [l[i] for i in range(len(l)) if norm_l[i] < .1]
        #more than one silence means one of them happens in the middle of the track
        if len(silence) > 1:
            tracks_with_drop_name.add(track_name)
            tracks_with_drop_id.add(track_id)
    return tracks_with_drop_id

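The code works, but if the number of songs I search for is set to, say, limit=20, the time it takes to process all the audio segments x and l makes the process too expensive, e.g.:

time.time() prints 452.175742149

QUESTION:

How can I drastically reduce the complexity here?

I have tried using sets instead of lists, but set objects do not support indexing.

EDIT: 10 urls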

[u'https://api.spotify.com/v1/audio-analysis/5H40slc7OnTLMbXV6E780Z', u'https://api.spotify.com/v1/audio-analysis/72G49GsqYeWV6QVAqp4vl0', u'https://api.spotify.com/v1/audio-analysis/6jvFK4v3oLMPfm6g030H0g', u'https://api.spotify.com/v1/audio-analysis/351LyEn9dxRxgkl28GwQtl', u'https://api.spotify.com/v1/audio-analysis/4cRnjBH13wSYMOfOF17Ddn', u'https://api.spotify.com/v1/audio-analysis/2To3PTOTGJUtRsK3nQemP4', u'https://api.spotify.com/v1/audio-analysis/4xPRxqV9qCVeKLQ31NxhYz', u'https://api.spotify.com/v1/audio-analysis/1G1MtHxrVngvGWSQ7Fj4Oj', u'https://api.spotify.com/v1/audio-analysis/3du9aoP5vPGW1h70mIoicK', u'https://api.spotify.com/v1/audio-analysis/6VIIBKYJAKMBNQreG33lBF']

This is what I see, without knowing much about Spotify:

for id_ in track_ids:
    # this runs N times, where N = len(track_ids)
    ...
    tids.add(track_id)  # tids contains all track_ids processed until now
    # in the end: len(tids) == N
    ...
    features = sp.audio_features(tids)
    # features contains features of all tracks processed until now
    # in the end, I guess: len(features) == N * num_features_per_track

    urls = {x['analysis_url'] for x in features if x}
    # very probably: len(urls) == len(features)

    for url in urls:
        # for the first track, this processes features of the first track only
        # for the second track, this processes features of 1st and 2nd
        # etc.
        # in the end, this loop repeats N * N * num_features_per_track times

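You should not process a url twice. But you do, because you keep all tracks in tids and then, for each track, you process everything in tids, which turns the complexity of this into O(n^2).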

In general, always look for loops inside loops when trying to reduce complexity.
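To make the cost of the nested loop concrete, here is a small simulation that makes no Spotify calls at all. It only counts how often the inner loop would fetch an analysis URL. The function name and the accumulate flag are made up for this illustration, and it is written in Python 3 syntax.

```python
def count_analysis_fetches(n_tracks, accumulate):
    """Count how many analysis URLs the inner loop visits for n_tracks tracks."""
    seen = set()
    fetches = 0
    for track_id in range(n_tracks):  # stand-in ids, not real Spotify ids
        if accumulate:
            seen.add(track_id)        # pattern from the question: the set keeps growing
            batch = seen
        else:
            batch = {track_id}        # only the track currently being processed
        for _ in batch:               # one simulated fetch per URL in the batch
            fetches += 1
    return fetches


print(count_analysis_fetches(20, accumulate=True))   # 210 = 1 + 2 + ... + 20
print(count_analysis_fetches(20, accumulate=False))  # 20
```

With 20 tracks the growing set leads to 210 fetches instead of 20, and the gap widens quadratically as the number of tracks increases.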

I believe this should work in this case, if audio_features expects a set of ids:

# replace this: features = sp.audio_features(tids)
# with:
features = sp.audio_features({track_id})
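Putting that together, a possible restructuring of the whole function could look like the sketch below. It is untested against the real API and makes several assumptions: sp is passed in as a parameter, audio_features is assumed to accept a one-element list, and the loudness check is moved into a helper, has_loudness_drop, which is my own name. FakeSpotify is a hypothetical stand-in used only so the sketch runs offline; it is not part of spotipy. Python 3 syntax is used.

```python
def has_loudness_drop(segments, threshold=0.1):
    """True if more than one segment is 'silent' after min-max normalisation."""
    loudness = [seg['loudness_max'] for seg in segments]
    if not loudness:
        return False
    lo, hi = min(loudness), max(loudness)
    if hi == lo:  # flat loudness: avoid dividing by zero
        return False
    quiet = sum(1 for v in loudness if (v - lo) / (hi - lo) < threshold)
    return quiet > 1


def loudness_drops(sp, track_ids):
    """Return URIs of tracks with a loudness drop, fetching each analysis once."""
    tracks_with_drop_id = set()
    for id_ in track_ids:
        track_uri = sp.track(id_)['uri']            # single track lookup per id
        features = sp.audio_features([track_uri])   # current track only (assumed to accept a list)
        for feat in features:
            if not feat:
                continue
            analysis = sp._get(feat['analysis_url'])
            if has_loudness_drop(analysis['segments']):
                tracks_with_drop_id.add(track_uri)
    return tracks_with_drop_id


class FakeSpotify:
    """Hypothetical offline stand-in for the client; NOT the real spotipy API."""

    def __init__(self, analyses):
        self.analyses = analyses  # maps track id -> list of segment dicts
        self.fetches = 0

    def track(self, id_):
        return {'uri': 'spotify:track:' + id_, 'name': id_}

    def audio_features(self, tracks):
        return [{'analysis_url': t.split(':')[-1]} for t in tracks]

    def _get(self, url):
        self.fetches += 1
        return {'segments': self.analyses[url]}


fake = FakeSpotify({
    'a': [{'loudness_max': v} for v in (-5, -60, -59, -6)],  # two quiet segments
    'b': [{'loudness_max': v} for v in (-5, -60, -6)],       # only the minimum is quiet
})
print(loudness_drops(fake, ['a', 'b']))  # {'spotify:track:a'}
print(fake.fetches)                      # 2
```

The lists stay lists here, since the check only iterates over the values and never needs indexing. Each analysis is fetched exactly once per track, so the remaining cost should be dominated by the network requests themselves.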