Cross dataframe with dictionary
I have the following dictionaries stored in variables:

    sk_channel_types = {"facebooknotification": 2,
                        "facebookmessenger": 9,
                        "onsitenotification": 3,
                        "pushnotification": 6,
                        "pushnotificationmessage": 6,
                        "lightbox": 4,
                        "onsitemessage": 7,
                        "mailmessage": 1}
    sk_story_types = {"welcome": 7,
                      "rescue": 13,
                      "frequency": 4,
                      "abandoncart": 6,
                      "pricedrop": 16,
                      "manual": 5,
                      "searchbykeyword": 30,
                      "sazonality": 31,
                      "bestdayforpurchase": 28,
                      "pricechange": 32,
                      "availability": 33,
                      "toptrending": 1,
                      "toptrendingbycluster": 2,
                      "toptrendingwithpricelimit": 3,
                      "frequencyview": 4,
                      "manualnotification": 5,
                      "trending": 9,
                      "toptrendingbykeyword": 9}

This is my current Spark dataframe:
ID | StoryType | Type | StoryId |
---|---|---|---|
abcdefghijklmnopqrst | AbandonCart | MailMessage | 56465465456456456465 |
lçdkçlskdçlsdkçlskdç | ManualNotification | MailMessage | 60983099380938390833 |
uahuahuahauhauahuaha | ManualNotification | MailMessage | 49438093890484984949 |
sklçskçlskdkcnopeieo | ManualNotification | MailMessage | 93084098409840984098 |
2d5fe941380938098948 | ManualNotification | MailMessage | 49809380398094894844 |
9883jkjd3eu0dj0j3930 | ManualNotification | MailMessage | 636f50c9380938093893 |
I need to replace the values in the StoryType and Type columns with their respective numbers from the dictionaries, like this:
ID | StoryType | Type | StoryId |
---|---|---|---|
abcdefghijklmnopqrst | 6 | 1 | 56465465456456456465 |
lçdkçlskdçlsdkçlskdç | 5 | 1 | 60983099380938390833 |
uahuahuahauhauahuaha | 5 | 1 | 49438093890484984949 |
sklçskçlskdkcnopeieo | 5 | 1 | 93084098409840984098 |
2d5fe941380938098948 | 5 | 1 | 49809380398094894844 |
9883jkjd3eu0dj0j3930 | 5 | 1 | 636f50c9380938093893 |
How can I do this? Can I lower-case the column values so that they match the dictionary keys? I am new to PySpark.
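Since the dictionaries are small, an efficient approach is to turn each one into a small lookup dataset, broadcast it, and join it to the main dataframe.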
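A minimal sketch of that approach, assuming an active `SparkSession` is available. The helper names `lookup_rows`, `encode_column` and `demo`, and the temporary `__..._key` / `__..._code` column names, are made up for illustration and are not part of PySpark. The column values are lower-cased with `F.lower` before the join so that `AbandonCart` matches the key `abandoncart`.

```python
def lookup_rows(mapping):
    """Turn a {name: code} dict into (lower-cased name, code) rows."""
    return [(name.lower(), code) for name, code in mapping.items()]


def encode_column(df, column, mapping, spark):
    """Replace the strings in `column` with their codes via a broadcast join."""
    # pyspark is imported here so that lookup_rows can be used without Spark.
    from pyspark.sql import functions as F

    # Hypothetical temporary column names, chosen to avoid clashing with df.
    key, code = f"__{column}_key", f"__{column}_code"
    lookup = spark.createDataFrame(lookup_rows(mapping), [key, code])

    return (
        df.join(F.broadcast(lookup), F.lower(F.col(column)) == F.col(key), "left")
        .withColumn(column, F.col(code))  # overwrite the text with its code
        .drop(key, code)                  # remove the helper columns
    )


def demo():
    """Run the example on one row from the question (requires pyspark)."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    # Only the entries needed for this row; the full dictionaries work the same way.
    sk_story_types = {"abandoncart": 6, "manualnotification": 5}
    sk_channel_types = {"mailmessage": 1}

    df = spark.createDataFrame(
        [("abcdefghijklmnopqrst", "AbandonCart", "MailMessage", "56465465456456456465")],
        ["ID", "StoryType", "Type", "StoryId"],
    )
    df = encode_column(df, "StoryType", sk_story_types, spark)
    df = encode_column(df, "Type", sk_channel_types, spark)
    df.show()
```

Because the join is a left join, a value with no entry in the dictionary ends up as null rather than dropping the row. If you would rather lose unmatched rows, an inner join should do that instead.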