Marketing algorithm contest ends: code simplification and transfer learning emerge as the biggest winners

Edit | Debra
AI Frontline Guide: On June 5 (Beijing time), the final of the IJCAI 2018 Alimama International Advertising Algorithm Competition concluded in Hangzhou. After presentations, defenses, and deliberation by a panel of five judges, the DOG team emerged as champion among the eight finalist teams. The Blue Whale Incense Burning team and the Lying team placed second and third, while the "No Internships, Why Look for Jobs" team and the Qiangdong team won the Innovation Award. These five teams earned the right to attend the IJCAI 2018 main conference in Stockholm in July.

For more quality content, please follow the WeChat public account "AI Frontline" (ID: ai-front).

The goal of the competition is to "discover more technologies and talents, and empower the entire marketing ecosystem." It was held in three stages: a preliminary round, semi-finals, and finals. As the competition progressed, more data was released and the difficulty increased.

The competition data comes from real business scenarios. As Alibaba's big-data marketing platform, Alimama holds the group's core business data and has applied deep learning, online learning, reinforcement learning, and other AI techniques to it to predict users' purchase intent efficiently and accurately. An e-commerce platform, however, is a complex ecosystem: user behavior preferences, the long-tail distribution of products, and event-driven marketing all pose major challenges to conversion rate estimation. How to make better use of massive transaction data to predict purchase intent efficiently and accurately remains an open technical problem for AI and big data in e-commerce.

This competition takes Alibaba's e-commerce advertising as its research object and provides massive real transaction data from the platform. Contestants use artificial intelligence techniques to build a predictive model of the user's purchase intent: given that a user clicked an advertisement, predict the probability of a purchase (pCVR) conditioned on the advertised product (ad), search term (query), context, store (shop), and other information. Formally:

pCVR = P(conversion = 1 | query, user, ad, context, shop)

Combining the business scenarios and traffic characteristics of the Taobao platform, the competition defines two challenges: "daily conversion rate estimation" and "conversion rate estimation on special dates."
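In its simplest form, a conditional probability like this can be estimated as a smoothed empirical conversion rate per feature value. The following sketch is illustrative only (the function and field names are not from the competition); it smooths low-frequency values toward a global prior so a shop with three clicks does not get an extreme estimate:

```python
from collections import defaultdict

def smoothed_cvr(samples, key, alpha=5.0, prior=0.02):
    """Estimate pCVR per value of `key`, smoothing low-frequency
    values toward a global prior conversion rate."""
    clicks = defaultdict(int)
    conversions = defaultdict(int)
    for s in samples:
        clicks[s[key]] += 1
        conversions[s[key]] += s["conversion"]
    return {v: (conversions[v] + alpha * prior) / (clicks[v] + alpha)
            for v in clicks}

# Toy click log: each record is one ad click with its conversion label.
log = [
    {"shop": "A", "conversion": 1},
    {"shop": "A", "conversion": 0},
    {"shop": "A", "conversion": 0},
    {"shop": "B", "conversion": 0},
]
cvr = smoothed_cvr(log, "shop")
```

With `alpha=5` and `prior=0.02`, shop A's estimate is (1 + 0.1) / (3 + 5) = 0.1375 rather than the raw 1/3, which is exactly the stabilizing effect needed for long-tail data.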

The contestants' presentations made the difficulty of the problem clear. In the preliminary round, data from the first seven days was provided to predict the eighth day; in the semi-finals, data from the morning of the eighth day was provided to predict the afternoon. The data volume also grew: the preliminary round used a training set of 480,000 samples and a test set of 60,000, while the semi-finals used a training set of 10 million samples and a test set of 1.73 million.

Problem-solving approaches of the finalists

After rounds of screening, 8 teams entered the finals. Their members come from universities, research institutions, and technology companies, combining strength and experience.

Champion DOG team: simplified code and transfer learning

The final round was extremely fierce, and the DOG team, a one-man team formed by Hua Zhixiang from industry, ultimately took the crown.

Hua Zhixiang first explained his approach in the preliminary and semi-final rounds. The data for the first seven days was relatively stable, while the eighth day (a promotion day) showed large fluctuations. He therefore used the data of days 1 to 7 to predict both the morning and afternoon of the eighth day; this is in effect transfer learning, carrying a model from the normal-traffic scenario over to the promotion scenario. He then combined this with a model trained on the morning of the promotion day to produce the final predictions for that afternoon. The entire solution was built with LightGBM alone.
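The exact pipeline was not published; the two-stage scheme can be sketched as a base model trained on the stable days blended with a model trained on the promotion-day morning. In this illustrative sketch a trivial mean-rate predictor stands in for LightGBM, and the blend weight `w` is an assumption that would be tuned on held-out morning data:

```python
def fit_mean_model(rows):
    """Stand-in 'model': predicts the training conversion rate.
    (The DOG team used LightGBM; this stub only shows the scheme.)"""
    rate = sum(r["conversion"] for r in rows) / len(rows)
    return lambda row: rate

# Stage 1: train on the stable days 1-7 (normal traffic).
days_1_to_7 = [{"conversion": c} for c in [0, 0, 1, 0, 0, 0, 1, 0]]
base = fit_mean_model(days_1_to_7)

# Stage 2: train on the day-8 morning (promotion traffic).
day8_morning = [{"conversion": c} for c in [1, 0, 1, 0]]
promo = fit_mean_model(day8_morning)

# Blend both models to predict the day-8 afternoon.
w = 0.7  # hypothetical weight, tuned on held-out morning data
def predict_afternoon(row):
    return w * promo(row) + (1 - w) * base(row)
```

The base model contributes knowledge transferred from normal traffic, while the morning model captures the promotion-day shift.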

Four types of model features were used. Statistical features include the number of times a user clicked an item, the time of the last search, the maximum number of pages viewed, the average search hour, and interaction time. Time-difference features mainly capture the gap between two interactions, for example between a user and an item, an item category (item_category), or a brand (item_brand_id). Ranking features express factors such as the number of interactions between a user and an item.
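A time-difference feature of this kind can be computed by grouping interactions per (user, item) pair, sorting by timestamp, and taking consecutive gaps. A minimal sketch (function and variable names are illustrative):

```python
from collections import defaultdict

def time_diff_features(events):
    """For each (user, item) pair, compute the gaps in seconds
    between consecutive interactions (a time-difference feature)."""
    by_pair = defaultdict(list)
    for user, item, ts in events:
        by_pair[(user, item)].append(ts)
    gaps = {}
    for pair, times in by_pair.items():
        times.sort()
        gaps[pair] = [b - a for a, b in zip(times, times[1:])]
    return gaps

# (user, item, unix timestamp) click events
clicks = [("u1", "i9", 100), ("u1", "i9", 160), ("u1", "i9", 400),
          ("u2", "i9", 50)]
gaps = time_diff_features(clicks)
# gaps[("u1", "i9")] -> [60, 240]; a single click yields no gap.
```

The same grouping works for category- or brand-level gaps by swapping the key.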

For characterization features, a bag-of-words representation was used to encode which properties an item has, the proportion of all users who viewed each property, and the average of these proportions over an item's properties; these features were fed into the model to achieve accurate prediction of user behavior. The core code fit on a single slide, and this conciseness was one of the reasons for the team's victory.

The judges' comment on the DOG team: "The use of transfer learning is eye-catching; the whole method is simple and effective, and the thinking is clear."

Runner-up Blue Whale Incense Burning team: complete and comprehensive modeling

The runner-up was the Blue Whale Incense Burning team, composed of BRYAN, Sang Ju, and Li Kunkun from industry.

The speaker first analyzed the problem, identifying the business scenario and the search-to-conversion estimate as the key points. For data analysis, the team examined the overall trends of daily sample and transaction counts, daily conversion rates, and hourly conversion rates; they categorized the data types and filled missing values with means and modes. For user analysis, they used click counts to find low-frequency demand and the long-tail distribution of purchases; combining the two reveals both momentary interest and targeted users. Deeper analysis then uncovered hidden information in the data and yielded the trend of daily clicks.

To improve the efficiency of algorithm tuning, reduce the role of luck in online results, and avoid over-reliance on the online data set, the team adopted offline validation; improvements verified offline translated into significant online gains. For model design, they built three models — a main model, a global-data model, and a time-information model — to achieve accurate prediction.

For features, Blue Whale Incense Burning divided them into three groups. The first group contains primitive (basic) features; the second contains simple features such as conversion-rate, ranking, proportion, and trend features; the third contains complex features such as query-interaction, user-interaction, competitive, and business features. After offline testing with multiple features, different feature groups were combined to find the important ones and improve prediction accuracy. For model fusion, a simple weighted average was used to fuse LightGBM models.
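Weighted fusion of this kind reduces to a per-sample weighted average of the models' predicted probabilities. A minimal sketch (the weights here are made up; in practice they would be chosen on offline validation data):

```python
def weighted_fusion(pred_lists, weights):
    """Weighted average of several models' pCVR predictions;
    weights are assumed to sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    n = len(pred_lists[0])
    return [sum(w * preds[i] for w, preds in zip(weights, pred_lists))
            for i in range(n)]

model_a = [0.10, 0.40]   # e.g. main model's pCVR predictions
model_b = [0.20, 0.20]   # e.g. global-data model's predictions
fused = weighted_fusion([model_a, model_b], [0.6, 0.4])
```

Even this trivial blend often beats its best member because the models' errors are partly uncorrelated.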

The judges' comment on the Blue Whale Incense Burning team: "The presentation is impressive; the model, data, and other aspects are comprehensive and complete, and very good results were achieved."

Third place, the Lying team: deep understanding of the business

Third place went to the Lying team, composed of Chen Bocheng from Zhejiang University of Technology, Robin Li from Central South University, and Wu Hao from Tianjin University.

The Lying team first analyzed the problem. They saw two difficulties: how to find features that express promotion-driven or sudden changes within otherwise normal traffic data, and how to choose a model and a framework lightweight enough for industrial use. Their analysis showed that the last day was a big promotion day, so modeling could proceed in two directions: modeling the user and various interactions in the conventional way, and modeling the changes brought by the promotion.

The Lying team therefore proposed four training schemes — Only-7 (only the changed day 7), All-day (full data), Sample-All (sampled full data), and All-to-7 (statistical features extracted from all days applied to day-7 training) — and validated each against the problem separately.

For feature engineering, the Lying team first categorized the basic features, then removed columns with little variation in value and columns with too many missing values. For user features, basic data was used to determine users' behavior preferences, and time differences were introduced to capture users' recent behavior. They then profiled the crowds attracted by each store and by each advertisement.
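The column-cleaning step above is a standard filter on variation and missing-value ratio. A minimal sketch on rows represented as dicts (the thresholds are illustrative assumptions):

```python
def filter_columns(rows, min_distinct=2, max_missing=0.5):
    """Drop columns with (a) too little variation in value or
    (b) too many missing values (None), as in a typical
    feature-cleaning pass."""
    keep = []
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        missing_ratio = 1 - len(present) / len(values)
        if len(set(present)) >= min_distinct and missing_ratio <= max_missing:
            keep.append(col)
    return keep

data = [
    {"age": 20, "const": 1, "sparse": None},
    {"age": 30, "const": 1, "sparse": None},
    {"age": 25, "const": 1, "sparse": 7},
]
kept = filter_columns(data)
# "const" is dropped (one distinct value); "sparse" is dropped
# (2/3 missing); only "age" survives.
```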

With these features, the data of the first 7 days can be used to predict the probability for the eighth day, and the degree of match between item_property_list and predict_category_property can be computed. In the actual scenario, when the predicted category-properties of a user's query match the item's properties, the user is more likely to buy.
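A match-degree feature of this kind can be computed as set overlap between the item's properties and the query's predicted properties. The field formats below are simplified placeholders (the competition's actual fields encode category:property pairs), so this is only a sketch of the idea:

```python
def property_match(item_property_list, predict_category_property):
    """Fraction of the query's predicted properties that the item
    actually has; fields are simplified to ';'-separated tokens."""
    item_props = set(item_property_list.split(";"))
    query_props = set(predict_category_property.split(";"))
    if not query_props:
        return 0.0
    return len(item_props & query_props) / len(query_props)

score = property_match("red;cotton;long", "red;cotton;slim;cheap")
# 2 of the 4 predicted properties match -> 0.5
```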

For model selection, a neural network was chosen so that ID features could be embedded and crossed with continuous features. Their summary: first, changed characteristics must be considered for the promotion period; second, a reasonable feature-extraction framework is the way to win; and third, fusing multiple models further improves accuracy.

The judges' comment on the Lying team: "Very complete thinking about the entire system, deep business understanding, and good business analysis."

Innovation Award: the "No Internships, Why Look for Jobs" team and the Qiangdong team

The original plan was to give two special awards in the finals, but the performances of the "No Internships, Why Look for Jobs" team and the Qiangdong team led the judges to decide on the spot to convert the award into an Innovation Award, encouraging the innovative ideas both teams showed during the competition.

The "No Internships, Why Look for Jobs" team is composed of Zhuang Xiaomin from the Chinese Academy of Sciences, Zhang Weimin from the Institute of Computing Technology, Chinese Academy of Sciences, and Li Haoyang from the Hong Kong University of Science and Technology. They first divided the data into time intervals, making effective use of historical data with different characteristics, and analyzed user behavior with statistical features. In doing so they discovered two characteristics of user behavior: first, user data is sparse, and most users appear on only one day; second, users with less data have a higher conversion rate.

Therefore, users with little data are distinguished by constructed features, making it easier for the model to judge them as a whole, while for users with more data, constructed features directly express their behavior. Time features — hour hotspots, trend features, windows, and so on — are mostly strong features. The more distinctive idea is the embedding feature: the items clicked by the same user are sorted chronologically and treated as a document, so the document in effect represents the user's click sequence. The context of each word (item) in the document then represents items the user also pays attention to, i.e., items similar to it; shop and user embeddings can be computed the same way. Tested on several models, the embedding features improved results by 3+ to 6+ per million. In addition, by the same logic that makes a web page referenced by more high-quality pages more likely to be high-quality, the PageRank value of the items a user clicks is an equally important signal.
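The "click sequence as document" step can be sketched as follows; the resulting per-user token lists are what a word2vec-style trainer (e.g. gensim's Word2Vec, not shown here) would consume to produce item embeddings. Names are illustrative:

```python
from collections import defaultdict

def build_click_docs(clicks):
    """Turn each user's chronologically sorted item clicks into a
    'document' of item tokens, ready for word2vec-style training."""
    by_user = defaultdict(list)
    for user, item, ts in clicks:
        by_user[user].append((ts, item))
    return {u: [item for _, item in sorted(seq)]
            for u, seq in by_user.items()}

# (user, item, timestamp) click events, out of order on purpose
clicks = [("u1", "i3", 30), ("u1", "i1", 10), ("u1", "i2", 20),
          ("u2", "i1", 5)]
docs = build_click_docs(clicks)
# docs["u1"] -> ["i1", "i2", "i3"]
```

Items that co-occur within a context window of such documents end up with similar embeddings, which is exactly the "similar item" signal described above.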

For the model algorithm, the combined-feature model requires screening combinations of different single-model features. For the K-fold averaging model, the single model is trained with 10-fold splits, each time on 9 folds; the test set is predicted by each of the 10 models, and the predictions are averaged. This effectively reduces variance, making the result better and more stable. The final model selection is shown in the figure above.
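The K-fold averaging trick generalizes to any learner. In this sketch a trivial mean-label learner stands in for the team's actual single model, and `k` is reduced from 10 for brevity; everything else follows the scheme described above:

```python
def kfold_average(train_rows, test_rows, fit, k=10):
    """Train k models, each on k-1 folds, predict the test set with
    each, and average the k prediction vectors (variance reduction)."""
    folds = [train_rows[i::k] for i in range(k)]
    test_preds = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i
                 for r in fold]
        model = fit(train)
        test_preds.append([model(r) for r in test_rows])
    n = len(test_rows)
    return [sum(p[i] for p in test_preds) / k for i in range(n)]

def fit_mean(rows):
    """Stand-in learner: always predicts the training mean label."""
    m = sum(rows) / len(rows)
    return lambda r: m

preds = kfold_average(list(range(10)), [None, None], fit_mean, k=5)
```

Because each model sees a different 9/10 of the data, their individual errors partially cancel in the average.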

The judges' comment on the "No Internships, Why Look for Jobs" team: "The team is distinctive, fully mining users' sequential behavior information for user representation and improving the results."

The Qiangdong team is composed of Li Qiang from Jilin University, Dongdong Shen from Shandong University, and Jiang Haoran from Central South University. They first analyzed the problem and found that 98% of users clicked fewer than 10 times in shopping interactions, and built features accordingly, such as the first click, the total number of clicks, and favorited products. What proved really useful for this competition, however, were deep-learning features of three main types: encodings of single-valued categorical features, encodings of continuous features after binning, and attention-weighted representations of padded multi-valued features.

Multi-valued features are padded and fed to an embedding layer; borrowing the idea of the DIN network, the team built an attention layer to weight them. Most CTR models in deep learning optimize second-order feature combinations: LR and FM layers handle the first and second orders, with the FM layer optimized to linear complexity. Higher-order features can use a CIN layer or an MVM layer; since the CIN layer's complexity is too high, the team used a simple combination of MVM layers.
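Attention weighting over a multi-valued feature can be sketched in a few lines: score each value's embedding against a query vector, softmax-normalize, and take the weighted sum. This is a simplification — real DIN learns a small MLP for the scores rather than a plain dot product — and all vectors here are made-up toy embeddings:

```python
import math

def attention_pool(query, value_vectors):
    """Pool a multi-valued feature's embeddings into one vector,
    weighting each by its softmax-normalized dot product with a
    query vector (simplified DIN-style attention)."""
    scores = [sum(q * v for q, v in zip(query, vec))
              for vec in value_vectors]
    m = max(scores)                      # stabilize the softmax
    exp = [math.exp(s - m) for s in scores]
    z = sum(exp)
    weights = [e / z for e in exp]
    dim = len(value_vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, value_vectors))
            for d in range(dim)]

# Candidate-ad embedding as query; two clicked-item embeddings.
pooled = attention_pool([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The item aligned with the query receives the larger weight, so the pooled vector emphasizes behavior relevant to the current ad instead of averaging all clicks equally.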

The deep layers capture the nonlinear relationships among features. Their input is the embeddings of discrete features, the embeddings of bucketized continuous features, and the attention-weighted vectors of multi-valued features. Encoding LightGBM leaf indices into the deep layers exposes feature-combination information more explicitly.

Several practical notes are worth mentioning: when debugging a DL model, use matrix operations as much as possible; one-dimensional dropout on the embedding layer reduces the risk of overfitting; NN features are somewhat random, as each trained model differs, so average over several runs; and the hashing trick greatly reduces resource consumption. The judges' comment on the Qiangdong team: "End-to-end learning with deep-learning methods, touching on industrial-scale models — very eye-catching among all the contestants."