GROWING GRADIENT BOOSTING TREES FOR THE NEXT 10 BILLION TRIPS
Felix Cheung and Nan Zhu
XGBoost is one of the most popular gradient boosting tree systems in both industry and academia. It grew out of a tool for data science competitions and has since graduated to production environments at top companies across the industry.
Uber is one of the earliest and now one of the largest XGBoost users in the industry. At Uber, XGBoost models improve driver safety on trips, recommend foods and restaurants, and estimate ride arrival times, among other uses.
Over the past 4 years, we have run multiple versions of XGBoost and built success stories in making our business more intelligent with it. We have also struggled, and eventually succeeded, to scale XGBoost to keep pace with our rapidly growing business while leveraging our open-source-based data platform, including Apache Spark. In this talk, we will share:
(1) the challenges we have faced in running multiple versions of XGBoost;
(2) how we improved XGBoost along the way to scale it to datasets exceeding tens of TBs and tens of billions of records;
(3) how we collaborate across teams and projects to maintain, release, and resolve issues for an open source project, and how we collect business requirements, develop solutions, and work with the open source community to contribute and build a roadmap.