Cross dataframe with dictionary

Question

I have the following dictionaries stored in variables:

sk_channel_types = {"facebooknotification": 2,
                    "facebookmessenger": 9,
                    "onsitenotification": 3,
                    "pushnotification": 6,
                    "pushnotificationmessage": 6,
                    "lightbox": 4,
                    "onsitemessage": 7,
                    "mailmessage": 1}

sk_story_types = {"welcome": 7,
                  "rescue": 13,
                  "frequency": 4,
                  "abandoncart": 6,
                  "pricedrop": 16,
                  "manual": 5,
                  "searchbykeyword": 30,
                  "sazonality": 31,
                  "bestdayforpurchase": 28,
                  "pricechange": 32,
                  "availability": 33,
                  "toptrending": 1,
                  "toptrendingbycluster": 2,
                  "toptrendingwithpricelimit": 3,
                  "frequencyview": 4,
                  "manualnotification": 5,
                  "trending": 9,
                  "toptrendingbykeyword": 9}

This is my current Spark dataframe:

ID	StoryType	Type	StoryId
abcdefghijklmnopqrst	AbandonCart	MailMessage	56465465456456456465
lçdkçlskdçlsdkçlskdç	ManualNotification	MailMessage	60983099380938390833
uahuahuahauhauahuaha	ManualNotification	MailMessage	49438093890484984949
sklçskçlskdkcnopeieo	ManualNotification	MailMessage	93084098409840984098
2d5fe941380938098948	ManualNotification	MailMessage	49809380398094894844
9883jkjd3eu0dj0j3930	ManualNotification	MailMessage	636f50c9380938093893

I need to replace the values in the StoryType and Type columns with their respective numbers from the dictionaries, like this:

ID	StoryType	Type	StoryId
abcdefghijklmnopqrst	6	1	56465465456456456465
lçdkçlskdçlsdkçlskdç	5	1	60983099380938390833
uahuahuahauhauahuaha	5	1	49438093890484984949
sklçskçlskdkcnopeieo	5	1	93084098409840984098
2d5fe941380938098948	5	1	49809380398094894844
9883jkjd3eu0dj0j3930	5	1	636f50c9380938093893

How can I do this? Can I convert the values to lowercase so they match the dictionary keys? I am new to PySpark.

Answer 1

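Since the dictionaries are small, an efficient approach is to turn each one into a small DataFrame, broadcast it, and join it to the dataset.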
