About two years ago, I started researching and reading about applications of machine learning and deep learning beyond the textbooks. It began as enthusiasm, but soon became a habit! I started with blog posts and engineering blogs published by big tech companies to better understand what actually happens behind the scenes and how the basic ideas fit into software engineering design components such as web servers, databases, and load balancers.

In this post, I have gathered a growing list of such materials. A great deal of this effort had already been done in other posts, which I cite at the end. I have tried to organize these materials in a logical order that is easy to follow for anyone who already has some knowledge of ML/DL and wants to learn more about how these concepts are implemented and deployed.

I have also prepared a similar categorization of state-of-the-art (SOTA) papers for popular applications. Hopefully, I can keep it updated.

Last update: January 2022


Table of Contents


Part-1: Systems and Concepts by Companies


Part-2: Concepts Reviews and Surveys


Part-3: Concepts in Practice

  1. Data Quality
  2. Data Engineering
  3. Data Discovery
  4. Feature Stores
  5. Classification
  6. Regression
  7. Forecasting
  8. Recommendation
  9. Search & Ranking
  10. Embeddings
  11. Natural Language Processing
  12. Sequence Modelling
  13. Computer Vision
  14. Reinforcement Learning
  15. Anomaly Detection
  16. Graph
  17. Optimization
  18. Information Extraction
  19. Weak Supervision
  20. Generation
  21. Audio
  22. Validation and A/B Testing
  23. Model Management
  24. Efficiency
  25. Ethics
  26. Infra
  27. MLOps Platforms
  28. Practices
  29. Team Structure
  30. Fails

Part-1: Machine Learning System Design Patterns in Tech Companies


Google Machine Learning Courses 🔝

  1. Google’s fast-paced, practical introduction to machine learning
  2. Recommendation systems
    General information on candidate generation (content-based and collaborative filtering), retrieval, scoring, and re-ranking
  3. Google Research Publications | Data Mining and Modeling
  4. Google Research Publications | Machine Intelligence
  5. TensorFlow embedding projector
  6. Jeff Dean On Large-Scale Deep Learning At Google

Facebook (Meta) Machine Learning Contents 🔝

  1. Field Guide to Machine Learning video series
  2. Fighting Abuse @Scale
  3. Preventing abuse using unsupervised learning
  4. Community Standards report
  5. Scalable data classification for security and privacy
  6. Unicorn: A System for Searching the Social Graph
  7. Hive – A Petabyte Scale Data Warehouse using Hadoop
  8. Nemo: Data discovery at Facebook
  9. Embedding-based Retrieval in Facebook Search (Paper)
  10. How machine learning powers Facebook’s News Feed ranking algorithm
  11. Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective (Paper)
  12. Neural Code Search: ML-based code search using natural language queries
  13. AI advances to better detect hate speech
  14. Leveraging online social interactions for enhancing integrity at Facebook
  15. Powered by AI: Advancing product understanding and building new shopping experiences
  16. Here’s how we’re using AI to help detect misinformation
  17. GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce

Instagram 🔝

  1. Powered by AI: Instagram’s Explore recommender system
  2. Lessons Learned at Instagram Stories and Feed Machine Learning
  3. Instagram Engineering

Twitter Engineering 🔝

  1. Embeddings@Twitter
  2. Using Deep Learning at Scale in Twitter’s Timelines
  3. Improving engagement on digital ads with delayed feedback

Uber 🔝

  1. Forecasting at Uber: An Introduction
  2. Applying Customer Feedback: How NLP & Deep Learning Improve Uber’s Maps
  3. Food Discovery with Uber Eats: Building a Query Understanding Engine
  4. Food Discovery with Uber Eats: Recommending for the Marketplace
  5. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations

Netflix 🔝

  1. A Multi-Armed Bandit Framework for Recommendations at Netflix (Video)

LinkedIn 🔝

  1. An Introduction to AI at LinkedIn
  2. Communities AI: Building Communities Around Interests on LinkedIn
  3. Fairness, Privacy, and Transparency by Design in AI/ML Systems

Airbnb 🔝

  1. Using Machine Learning to Predict Value of Homes On Airbnb
  2. Listing Embeddings in Search Ranking
  3. Learning Market Dynamics for Optimal Pricing
  4. Categorizing Listing Photos at Airbnb
  5. Applying Deep Learning To Airbnb Search
  6. Discovering and Classifying In-app Message Intent at Airbnb
  7. Machine Learning-Powered Search Ranking of Airbnb Experiences

Spotify 🔝

  1. Personalized music recommendations
  2. For Your Ears Only: Personalizing Spotify Home with Machine Learning

Part-2: Machine Learning Design Concepts | Reviews and Surveys


Recommendation 🔝

Deep Learning 🔝

Natural Language Processing 🔝

Computer Vision 🔝

Vision and Language 🔝

Reinforcement Learning 🔝

Graph 🔝

Embeddings 🔝

Meta-learning and Few-shot Learning 🔝

Others 🔝


Part-3: Machine Learning Design Concepts | By Topics


Data Quality 🔝

  1. Reliable and Scalable Data Ingestion at Airbnb ➖ Airbnb 2016
  2. Monitoring Data Quality at Scale with Statistical Modeling ➖ Uber 2017
  3. Data Management Challenges in Production Machine Learning (Paper) ➖ Google 2017
  4. Automating Large-Scale Data Quality Verification (Paper) ➖ Amazon 2018
  5. Meet Hodor — Gojek’s Upstream Data Quality Tool ➖ Gojek 2019
  6. An Approach to Data Quality for Netflix Personalization Systems ➖ Netflix 2020
  7. Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper) ➖ Facebook 2020

Data Engineering 🔝

  1. Zipline: Airbnb’s Machine Learning Data Management Platform ➖ Airbnb 2018
  2. Sputnik: Airbnb’s Apache Spark Framework for Data Engineering ➖ Airbnb 2020
  3. Unbundling Data Science Workflows with Metaflow and AWS Step Functions ➖ Netflix 2020
  4. How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand ➖ DoorDash 2020
  5. Revolutionizing Money Movements at Scale with Strong Data Consistency ➖ Uber 2020
  6. Zipline - A Declarative Feature Engineering Framework ➖ Airbnb 2020
  7. Automating Data Protection at Scale, Part 1 (Part 2) ➖ Airbnb 2021
  8. Real-time Data Infrastructure at Uber ➖ Uber 2021
  9. Introducing Fabricator: A Declarative Feature Engineering Framework ➖ DoorDash 2022
  10. Functions & DAGs: introducing Hamilton, a microframework for dataframe generation ➖ Stitch Fix 2021

Data Discovery 🔝

  1. Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code) ➖ Apache
  2. Collect, Aggregate, and Visualize a Data Ecosystem’s Metadata (Code) ➖ WeWork
  3. Discovery and Consumption of Analytics Data at Twitter ➖ Twitter 2016
  4. Democratizing Data at Airbnb ➖ Airbnb 2017
  5. Databook: Turning Big Data into Knowledge with Metadata at Uber ➖ Uber 2018
  6. Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code) ➖ Netflix 2018
  7. Amundsen — Lyft’s Data Discovery & Metadata Engine ➖ Lyft 2019
  8. Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code) ➖ Lyft 2019
  9. DataHub: A Generalized Metadata Search & Discovery Tool (Code) ➖ LinkedIn 2019
  10. Amundsen: One Year Later ➖ Lyft 2020
  11. Using Amundsen to Support User Privacy via Metadata Collection at Square ➖ Square 2020
  12. Turning Metadata Into Insights with Databook ➖ Uber 2020
  13. DataHub: Popular Metadata Architectures Explained ➖ LinkedIn 2020
  14. How We Improved Data Discovery for Data Scientists at Spotify ➖ Spotify 2020
  15. How We’re Solving Data Discovery Challenges at Shopify ➖ Shopify 2020
  16. Nemo: Data discovery at Facebook ➖ Facebook 2020
  17. Exploring Data @ Netflix (Code) ➖ Netflix 2021

Feature Stores 🔝

  1. Distributed Time Travel for Feature Generation ➖ Netflix 2016
  2. Building the Activity Graph, Part 2 (Feature Storage Section) ➖ LinkedIn 2017
  3. Fact Store at Scale for Netflix Recommendations ➖ Netflix 2018
  4. Zipline: Airbnb’s Machine Learning Data Management Platform ➖ Airbnb 2018
  5. Introducing Feast: An Open Source Feature Store for Machine Learning (Code) ➖ Gojek 2019
  6. Michelangelo Palette: A Feature Engineering Platform at Uber ➖ Uber 2019
  7. The Architecture That Powers Twitter’s Feature Store ➖ Twitter 2019
  8. Accelerating Machine Learning with the Feature Store Service ➖ Condé Nast 2019
  9. Feast: Bridging ML Models and Data ➖ Gojek 2020
  10. Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression ➖ DoorDash 2020
  11. Rapid Experimentation Through Standardization: Typed AI features for LinkedIn’s Feed ➖ LinkedIn 2020
  12. Building a Feature Store ➖ Monzo Bank 2020
  13. Butterfree: A Spark-based Framework for Feature Store Building (Code) ➖ QuintoAndar 2020
  14. Building Riviera: A Declarative Real-Time Feature Engineering Framework ➖ DoorDash 2021
  15. Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory ➖ Uber 2021
  16. ML Feature Serving Infrastructure at Lyft ➖ Lyft 2021

Classification 🔝

  1. Prediction of Advertiser Churn for Google AdWords (Paper) ➖ Google 2010
  2. High-Precision Phrase-Based Document Classification on a Modern Scale (Paper) ➖ LinkedIn 2011
  3. Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper) ➖ Walmart 2014
  4. Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper) ➖ NAVER 2016
  5. Learning to Diagnose with LSTM Recurrent Neural Networks (Paper) ➖ Google 2017
  6. Discovering and Classifying In-app Message Intent at Airbnb ➖ Airbnb 2019
  7. Teaching Machines to Triage Firefox Bugs ➖ Mozilla 2019
  8. Categorizing Products at Scale ➖ Shopify 2020
  9. How We Built the Good First Issues Feature ➖ GitHub 2020
  10. Testing Firefox More Efficiently with Machine Learning ➖ Mozilla 2020
  11. Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper) ➖ Microsoft 2020
  12. Scalable Data Classification for Security and Privacy (Paper) ➖ Facebook 2020
  13. Uncovering Online Delivery Menu Best Practices with Machine Learning ➖ DoorDash 2020
  14. Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging ➖ DoorDash 2020
  15. Deep Learning: Product Categorization and Shelving ➖ Walmart 2021
  16. Large-scale Item Categorization for e-Commerce (Paper) ➖ DianPing, eBay 2021

Regression 🔝

  1. Using Machine Learning to Predict Value of Homes On Airbnb ➖ Airbnb 2017
  2. Using Machine Learning to Predict the Value of Ad Requests ➖ Twitter 2020
  3. Open-Sourcing Riskquant, a Library for Quantifying Risk (Code) ➖ Netflix 2020
  4. Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment ➖ DoorDash 2020

Forecasting 🔝

  1. Engineering Extreme Event Forecasting at Uber with RNN ➖ Uber 2017
  2. Forecasting at Uber: An Introduction ➖ Uber 2018
  3. Transforming Financial Forecasting with Data Science and Machine Learning at Uber ➖ Uber 2018
  4. Under the Hood of Gojek’s Automated Forecasting Tool ➖ Gojek 2019
  5. BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video) ➖ Google 2020
  6. Retraining Machine Learning Models in the Wake of COVID-19 ➖ DoorDash 2020
  7. Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code) ➖ Atlassian 2020
  8. Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting (Paper, Video, Code) ➖ Uber 2021
  9. Managing Supply and Demand Balance Through Machine Learning ➖ DoorDash 2021
  10. Greykite: A flexible, intuitive, and fast forecasting library ➖ LinkedIn 2021

Recommendation 🔝

  1. Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper) ➖ Amazon 2003
  2. Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2) ➖ Netflix 2012
  3. How Music Recommendation Works — And Doesn’t Work ➖ Spotify 2012
  4. Learning to Rank Recommendations with the k-Order Statistic Loss (Paper) ➖ Google 2013
  5. Recommending Music on Spotify with Deep Learning ➖ Spotify 2014
  6. Learning a Personalized Homepage ➖ Netflix 2015
  7. Session-based Recommendations with Recurrent Neural Networks (Paper) ➖ Telefonica 2016
  8. Deep Neural Networks for YouTube Recommendations ➖ YouTube 2016
  9. E-commerce in Your Inbox: Product Recommendations at Scale (Paper) ➖ Yahoo 2016
  10. To Be Continued: Helping you find shows to continue watching on Netflix ➖ Netflix 2016
  11. Personalized Recommendations in LinkedIn Learning ➖ LinkedIn 2016
  12. Personalized Channel Recommendations in Slack ➖ Slack 2016
  13. Recommending Complementary Products in E-Commerce Push Notifications (Paper) ➖ Alibaba 2017
  14. Artwork Personalization at Netflix ➖ Netflix 2017
  15. A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper) ➖ Twitter 2017
  16. Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper) ➖ Pinterest 2017
  17. How 20th Century Fox uses ML to predict a movie audience (Paper) ➖ 20th Century Fox 2018
  18. Calibrated Recommendations (Paper) ➖ Netflix 2018
  19. Food Discovery with Uber Eats: Recommending for the Marketplace ➖ Uber 2018
  20. Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper) ➖ Spotify 2018
  21. Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper) ➖ Alibaba 2019
  22. SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper) ➖ Alibaba 2019
  23. Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper) ➖ Alibaba 2019
  24. Personalized Recommendations for Experiences Using Deep Learning ➖ TripAdvisor 2019
  25. Powered by AI: Instagram’s Explore recommender system ➖ Facebook 2019
  26. Marginal Posterior Sampling for Slate Bandits (Paper) ➖ Netflix 2019
  27. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations ➖ Uber 2019
  28. Music recommendation at Spotify ➖ Spotify 2019
  29. Using Machine Learning to Predict what File you Need Next (Part 1) ➖ Dropbox 2019
  30. Using Machine Learning to Predict what File you Need Next (Part 2) ➖ Dropbox 2019
  31. Learning to be Relevant: Evolution of a Course Recommendation System ➖ LinkedIn 2019
  32. Temporal-Contextual Recommendation in Real-Time (Paper) ➖ Amazon 2020
  33. P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper) ➖ Amazon 2020
  34. Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper) ➖ Alibaba 2020
  35. TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper) ➖ Alibaba 2020
  36. PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper) ➖ Alibaba 2020
  37. Controllable Multi-Interest Framework for Recommendation (Paper) ➖ Alibaba 2020
  38. MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper) ➖ Alibaba 2020
  39. ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper) ➖ Alibaba 2020
  40. For Your Ears Only: Personalizing Spotify Home with Machine Learning ➖ Spotify 2020
  41. Reach for the Top: How Spotify Built Shortcuts in Just Six Months ➖ Spotify 2020
  42. Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper) ➖ Spotify 2020
  43. The Evolution of Kit: Automating Marketing Using Machine Learning ➖ Shopify 2020
  44. A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1) ➖ LinkedIn 2020
  45. A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2) ➖ LinkedIn 2020
  46. Building a Heterogeneous Social Network Recommendation System ➖ LinkedIn 2020
  47. How TikTok recommends videos #ForYou ➖ ByteDance 2020
  48. Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper) ➖ Google 2020
  49. Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper) ➖ Google 2020
  50. Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper) ➖ Google 2020
  51. Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper) ➖ Tencent 2020
  52. A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper) ➖ Home Depot 2020
  53. Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper) ➖ IKEA 2020
  54. How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads ➖ Pinterest 2020
  55. Multi-task Learning for Related Products Recommendations at Pinterest ➖ Pinterest 2020
  56. Improving the Quality of Recommended Pins with Lightweight Ranking ➖ Pinterest 2020
  57. Personalized Cuisine Filter Based on Customer Preference and Local Popularity ➖ DoorDash 2020
  58. How We Built a Matchmaking Algorithm to Cross-Sell Products ➖ Gojek 2020
  59. Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper) ➖ Twitter 2021
  60. Self-supervised Learning for Large-scale Item Recommendations (Paper) ➖ Google 2021
  61. Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper) ➖ ByteDance 2021
  62. Using AI to Help Health Experts Address the COVID-19 Pandemic ➖ Facebook 2021
  63. Advertiser Recommendation Systems at Pinterest ➖ Pinterest 2021
  64. On YouTube’s Recommendation System ➖ YouTube 2021

Search & Ranking 🔝

  1. Amazon Search: The Joy of Ranking Products (Paper, Video, Code) ➖ Amazon 2016
  2. How Lazada Ranks Products to Improve Customer Experience and Conversion ➖ Lazada 2016
  3. Ranking Relevance in Yahoo Search (Paper) ➖ Yahoo 2016
  4. Learning to Rank Personalized Search Results in Professional Networks (Paper) ➖ LinkedIn 2016
  5. Using Deep Learning at Scale in Twitter’s Timelines ➖ Twitter 2017
  6. An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper) ➖ Etsy 2017
  7. Powering Search & Recommendations at DoorDash ➖ DoorDash 2017
  8. Applying Deep Learning To Airbnb Search (Paper) ➖ Airbnb 2018
  9. In-session Personalization for Talent Search (Paper) ➖ LinkedIn 2018
  10. Talent Search and Recommendation Systems at LinkedIn (Paper) ➖ LinkedIn 2018
  11. Food Discovery with Uber Eats: Building a Query Understanding Engine ➖ Uber 2018
  12. Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper) ➖ Alibaba 2018
  13. Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) ➖ Alibaba 2018
  14. Semantic Product Search (Paper) ➖ Amazon 2019
  15. Machine Learning-Powered Search Ranking of Airbnb Experiences ➖ Airbnb 2019
  16. Entity Personalized Talent Search Models with Tree Interaction Features (Paper) ➖ LinkedIn 2019
  17. The AI Behind LinkedIn Recruiter Search and recommendation systems ➖ LinkedIn 2019
  18. Learning Hiring Preferences: The AI Behind LinkedIn Jobs ➖ LinkedIn 2019
  19. The Secret Sauce Behind Search Personalisation ➖ Gojek 2019
  20. Neural Code Search: ML-based Code Search Using Natural Language Queries ➖ Facebook 2019
  21. Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper) ➖ Alibaba 2019
  22. Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search ➖ Alibaba 2019
  23. Understanding Searches Better Than Ever Before (Paper) ➖ Google 2019
  24. How We Used Semantic Search to Make Our Search 10x Smarter ➖ Tokopedia 2019
  25. Query2vec: Search query expansion with query embeddings ➖ GrubHub 2019
  26. MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu’s Sponsored Search ➖ Baidu 2019
  27. Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper) ➖ Amazon 2020
  28. Managing Diversity in Airbnb Search (Paper) ➖ Airbnb 2020
  29. Improving Deep Learning for Airbnb Search (Paper) ➖ Airbnb 2020
  30. Quality Matches Via Personalized AI for Hirer and Seeker Preferences ➖ LinkedIn 2020
  31. Understanding Dwell Time to Improve LinkedIn Feed Ranking ➖ LinkedIn 2020
  32. Ads Allocation in Feed via Constrained Optimization (Paper, Video) ➖ LinkedIn 2020
  33. AI at Scale in Bing ➖ Microsoft 2020
  34. Query Understanding Engine in Traveloka Universal Search ➖ Traveloka 2020
  35. Bayesian Product Ranking at Wayfair ➖ Wayfair 2020
  36. COLD: Towards the Next Generation of Pre-Ranking System (Paper) ➖ Alibaba 2020
  37. Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) ➖ Pinterest 2020
  38. Driving Shopping Upsells from Pinterest Search ➖ Pinterest 2020
  39. GDMix: A Deep Ranking Personalization Framework (Code) ➖ LinkedIn 2020
  40. Bringing Personalized Search to Etsy ➖ Etsy 2020
  41. Building a Better Search Engine for Semantic Scholar ➖ Allen Institute for AI 2020
  42. Query Understanding for Natural Language Enterprise Search (Paper) ➖ Salesforce 2020
  43. Things Not Strings: Understanding Search Intent with Better Recall ➖ DoorDash 2020
  44. Query Understanding for Surfacing Under-served Music Content (Paper) ➖ Spotify 2020
  45. Embedding-based Retrieval in Facebook Search (Paper) ➖ Facebook 2020
  46. Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper) ➖ JD 2020
  47. QUEEN: Neural query rewriting in e-commerce (Paper) ➖ Amazon 2021
  48. Using Learning-to-rank to Precisely Locate Where to Deliver Packages (Paper) ➖ Amazon 2021
  49. Seasonal relevance in e-commerce search (Paper) ➖ Amazon 2021
  50. Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) ➖ Alibaba 2021
  51. How We Built A Context-Specific Bidding System for Etsy Ads ➖ Etsy 2021
  52. Pre-trained Language Model based Ranking in Baidu Search (Paper) ➖ Baidu 2021
  53. Stitching together spaces for query-based recommendations ➖ Stitch Fix 2021
  54. Deep Natural Language Processing for LinkedIn Search Systems (Paper) ➖ LinkedIn 2021
  55. Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset (Paper, Code) ➖ Seznam 2021

Embeddings 🔝

  1. Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper) ➖ Sears 2017
  2. Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper) ➖ Alibaba 2018
  3. Embeddings@Twitter ➖ Twitter 2018
  4. Listing Embeddings in Search Ranking (Paper) ➖ Airbnb 2018
  5. Understanding Latent Style ➖ Stitch Fix 2018
  6. Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper) ➖ LinkedIn 2018
  7. Personalized Store Feed with Vector Embeddings ➖ DoorDash 2018
  8. Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper) ➖ Moshbit 2019
  9. Machine Learning for a Better Developer Experience ➖ Netflix 2020
  10. Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code) ➖ Google 2020
  11. Embedding-based Retrieval at Scribd ➖ Scribd 2021

Natural Language Processing 🔝

  1. Abusive Language Detection in Online User Content (Paper) ➖ Yahoo 2016
  2. Smart Reply: Automated Response Suggestion for Email (Paper) ➖ Google 2016
  3. Building Smart Replies for Member Messages ➖ LinkedIn 2017
  4. How Natural Language Processing Helps LinkedIn Members Get Support Easily ➖ LinkedIn 2019
  5. Gmail Smart Compose: Real-Time Assisted Writing (Paper) ➖ Google 2019
  6. Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper) ➖ Amazon 2019
  7. Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want ➖ Stitch Fix 2019
  8. DeText: A deep NLP Framework for Intelligent Text Understanding (Code) ➖ LinkedIn 2020
  9. SmartReply for YouTube Creators ➖ Google 2020
  10. Using Neural Networks to Find Answers in Tables (Paper) ➖ Google 2020
  11. A Scalable Approach to Reducing Gender Bias in Google Translate ➖ Google 2020
  12. Assistive AI Makes Replying Easier ➖ Microsoft 2020
  13. AI Advances to Better Detect Hate Speech ➖ Facebook 2020
  14. A State-of-the-Art Open Source Chatbot (Paper) ➖ Facebook 2020
  15. A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs ➖ Facebook 2020
  16. Deep Learning to Translate Between Programming Languages (Paper, Code) ➖ Facebook 2020
  17. Deploying Lifelong Open-Domain Dialogue Learning (Paper) ➖ Facebook 2020
  18. Introducing Dynabench: Rethinking the way we benchmark AI ➖ Facebook 2020
  19. How Gojek Uses NLP to Name Pickup Locations at Scale ➖ Gojek 2020
  20. The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper) ➖ Baidu 2020
  21. PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code) ➖ Google 2020
  22. Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo) ➖ Salesforce 2020
  23. GeDi: A Powerful New Method for Controlling Language Models (Paper, Code) ➖ Salesforce 2020
  24. Applying Topic Modeling to Improve Call Center Operations ➖ RICOH 2020
  25. WIDeText: A Multimodal Deep Learning Framework ➖ Airbnb 2020
  26. Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code) ➖ Facebook 2021
  27. How we reduced our text similarity runtime by 99.96% ➖ Microsoft 2021
  28. Textless NLP: Generating expressive speech from raw audio (Part 1) (Part 2) (Part 3) (Code and Pretrained Models) ➖ Facebook 2021

Sequence Modelling 🔝

  1. Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper) ➖ Sutter Health 2015
  2. Deep Learning for Understanding Consumer Histories (Paper) ➖ Zalando 2016
  3. Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper) ➖ Sutter Health 2016
  4. Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper) ➖ Telefonica 2017
  5. Deep Learning for Electronic Health Records (Paper) ➖ Google 2018
  6. Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper) ➖ Alibaba 2019
  7. Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper) ➖ Alibaba 2020
  8. How Duolingo uses AI in every part of its app ➖ Duolingo 2020
  9. Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video) ➖ Facebook 2020
  10. Using deep learning to detect abusive sequences of member activity (Video) ➖ LinkedIn 2021

Computer Vision 🔝

  1. Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning ➖ Dropbox 2017
  2. Categorizing Listing Photos at Airbnb ➖ Airbnb 2018
  3. Amenity Detection and Beyond — New Frontiers of Computer Vision at Airbnb ➖ Airbnb 2019
  4. How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors ➖ Deepomatic
  5. Making machines recognize and transcribe conversations in meetings using audio and video ➖ Microsoft 2019
  6. Powered by AI: Advancing product understanding and building new shopping experiences ➖ Facebook 2020
  7. A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper) ➖ Google 2020
  8. Machine Learning-based Damage Assessment for Disaster Relief (Paper) ➖ Google 2020
  9. RepNet: Counting Repetitions in Videos (Paper) ➖ Google 2020
  10. Converting Text to Images for Product Discovery (Paper) ➖ Amazon 2020
  11. How Disney Uses PyTorch for Animated Character Recognition ➖ Disney 2020
  12. Image Captioning as an Assistive Technology (Video) ➖ IBM 2020
  13. AI for AG: Production machine learning for agriculture ➖ Blue River 2020
  14. AI for Full-Self Driving at Tesla ➖ Tesla 2020
  15. On-device Supermarket Product Recognition ➖ Google 2020
  16. Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper) ➖ Google 2020
  17. Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) ➖ Pinterest 2020
  18. Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper) ➖ Google 2020
  19. Vision-based Price Suggestion for Online Second-hand Items (Paper) ➖ Alibaba 2020
  20. New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model) ➖ Facebook 2021
  21. An Efficient Training Approach for Very Large Scale Face Recognition (Paper) ➖ Alibaba 2021
  22. Identifying Document Types at Scribd ➖ Scribd 2021
  23. Semi-Supervised Visual Representation Learning for Fashion Compatibility (Paper) ➖ Walmart 2021

Reinforcement Learning 🔝

  1. Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper) ➖ Alibaba 2018
  2. Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper) ➖ Alibaba 2018
  3. Reinforcement Learning for On-Demand Logistics ➖ DoorDash 2018
  4. Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) ➖ Alibaba 2018
  5. Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper) ➖ Alibaba 2019
  6. Productionizing Deep Reinforcement Learning with Spark and MLflow ➖ Zynga 2020
  7. Deep Reinforcement Learning in Production (Part 1) (Part 2) ➖ Zynga 2020
  8. Building AI Trading Systems ➖ Denny Britz 2020

Anomaly Detection 🔝

  1. Detecting Performance Anomalies in External Firmware Deployments ➖ Netflix 2019
  2. Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code) ➖ LinkedIn 2019
  3. Deep Anomaly Detection with Spark and Tensorflow (Hopsworks Video) ➖ Swedbank, Hopsworks 2019
  4. Preventing Abuse Using Unsupervised Learning ➖ LinkedIn 2020
  5. The Technology Behind Fighting Harassment on LinkedIn ➖ LinkedIn 2020
  6. Uncovering Insurance Fraud Conspiracy with Network Learning (Paper) ➖ Ant Financial 2020
  7. How Does Spam Protection Work on Stack Exchange? ➖ Stack Exchange 2020
  8. Auto Content Moderation in C2C e-Commerce ➖ Mercari 2020
  9. Blocking Slack Invite Spam With Machine Learning ➖ Slack 2020
  10. Cloudflare Bot Management: Machine Learning and More ➖ Cloudflare 2020
  11. Anomalies in Oil Temperature Variations in a Tunnel Boring Machine ➖ SENER 2020
  12. Using Anomaly Detection to Monitor Low-Risk Bank Customers ➖ Rabobank 2020
  13. Fighting fraud with Triplet Loss ➖ OLX Group 2020
  14. Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative) ➖ Facebook 2020
  15. How AI is getting better at detecting hate speech (Part 1) (Part 2) (Part 3) (Part 4) ➖ Facebook 2020

Graph 🔝

  1. Building The LinkedIn Knowledge Graph ➖ LinkedIn 2016
  2. Scaling Knowledge Access and Retrieval at Airbnb ➖ Airbnb 2018
  3. Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper) ➖ Pinterest 2018
  4. Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations ➖ Uber 2019
  5. AliGraph: A Comprehensive Graph Neural Network Platform (Paper) ➖ Alibaba 2019
  6. Contextualizing Airbnb by Building Knowledge Graph ➖ Airbnb 2019
  7. Retail Graph — Walmart’s Product Knowledge Graph ➖ Walmart 2020
  8. Traffic Prediction with Advanced Graph Neural Networks ➖ DeepMind 2020
  9. SimClusters: Community-Based Representations for Recommendations (Paper, Video) ➖ Twitter 2020
  10. Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper) ➖ Alibaba 2021
  11. Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) ➖ Alibaba 2021
  12. JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper) ➖ JPMorgan Chase 2021

Optimization 🔝

  1. Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3) ➖ Lyft 2016
  2. The Data and Science behind GrabShare Carpooling (Part 1) ➖ Grab 2017
  3. How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats ➖ Uber 2018
  4. Next-Generation Optimization for Dasher Dispatch at DoorDash ➖ DoorDash 2020
  5. Optimization of Passengers Waiting Time in Elevators Using Machine Learning ➖ Thyssen Krupp AG 2020
  6. Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper) ➖ Amazon 2020
  7. Optimizing DoorDash’s Marketing Spend with Machine Learning ➖ DoorDash 2020

Information Extraction 🔝

  1. Unsupervised Extraction of Attributes and Their Values from Product Description (Paper) ➖ Rakuten 2013
  2. Using Machine Learning to Index Text from Billions of Images ➖ Dropbox 2018
  3. Extracting Structured Data from Templatic Documents (Paper) ➖ Google 2020
  4. AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video) ➖ Amazon 2020
  5. One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper) ➖ Alibaba 2020
  6. Information Extraction from Receipts with Graph Convolutional Networks ➖ Nanonets 2021

Weak Supervision 🔝

  1. Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper) ➖ Google 2019
  2. Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper) ➖ Intel 2019
  3. Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper) ➖ Apple 2019
  4. Bootstrapping Conversational Agents with Weak Supervision (Paper) ➖ IBM 2019

Generation 🔝

  1. Better Language Models and Their Implications (Paper) ➖ OpenAI 2019
  2. Image GPT (Paper, Code) ➖ OpenAI 2019
  3. Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post) ➖ OpenAI 2020
  4. Deep Learned Super Resolution for Feature Film Production (Paper) ➖ Pixar 2020
  5. Unit Test Case Generation with Transformers ➖ Microsoft 2021

Audio 🔝

  1. Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper) ➖ Google 2020
  2. The Machine Learning Behind Hum to Search ➖ Google 2020

Validation and A/B Testing 🔝

  1. Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper) ➖ Google 2010
  2. The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper) ➖ Google 2015
  3. Twitter Experimentation: Technical Overview ➖ Twitter 2015
  4. It’s All A/Bout Testing: The Netflix Experimentation Platform ➖ Netflix 2016
  5. Building Pinterest’s A/B Testing Platform ➖ Pinterest 2016
  6. Experimenting to Solve Cramming ➖ Twitter 2017
  7. Building an Intelligent Experimentation Platform with Uber Engineering ➖ Uber 2017
  8. Scaling Airbnb’s Experimentation Platform ➖ Airbnb 2017
  9. Meet Wasabi, an Open Source A/B Testing Platform (Code) ➖ Intuit 2017
  10. Analyzing Experiment Outcomes: Beyond Average Treatment Effects ➖ Uber 2018
  11. Under the Hood of Uber’s Experimentation Platform ➖ Uber 2018
  12. Constrained Bayesian Optimization with Noisy Experiments (Paper) ➖ Facebook 2018
  13. Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab ➖ Grab 2018
  14. Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code) ➖ Better 2019
  15. Detecting Interference: An A/B Test of A/B Tests ➖ LinkedIn 2019
  16. Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper) ➖ Uber 2020
  17. Enabling 10x More Experiments with Traveloka Experiment Platform ➖ Traveloka 2020
  18. Large Scale Experimentation at Stitch Fix (Paper) ➖ Stitch Fix 2020
  19. Multi-Armed Bandits and the Stitch Fix Experimentation Platform ➖ Stitch Fix 2020
  20. Experimentation with Resource Constraints ➖ Stitch Fix 2020
  21. Computational Causal Inference at Netflix (Paper) ➖ Netflix 2020
  22. Key Challenges with Quasi Experiments at Netflix ➖ Netflix 2020
  23. Making the LinkedIn experimentation engine 20x faster ➖ LinkedIn 2020
  24. Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn ➖ LinkedIn 2020
  25. How to Use Quasi-experiments and Counterfactuals to Build Great Products ➖ Shopify 2020
  26. Improving Experimental Power through Control Using Predictions as Covariate ➖ DoorDash 2020
  27. Supporting Rapid Product Iteration with an Experimentation Analysis Platform ➖ DoorDash 2020
  28. Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity ➖ DoorDash 2020
  29. Leveraging Causal Modeling to Get More Value from Flat Experiment Results ➖ DoorDash 2020
  30. Iterating Real-time Assignment Algorithms Through Experimentation ➖ DoorDash 2020
  31. Spotify’s New Experimentation Platform (Part 1) (Part 2) ➖ Spotify 2020
  32. Interpreting A/B Test Results: False Positives and Statistical Significance ➖ Netflix 2021
  33. Interpreting A/B Test Results: False Negatives and Power ➖ Netflix 2021
  34. Running Experiments with Google Adwords for Campaign Optimization ➖ DoorDash 2021
  35. The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000% ➖ DoorDash 2021
  36. Experimentation Platform at Zalando: Part 1 - Evolution ➖ Zalando 2021
  37. Designing Experimentation Guardrails ➖ Airbnb 2021
  38. Network Experimentation at Scale (Paper) ➖ Facebook 2021
  39. Universal Holdout Groups at Disney Streaming ➖ Disney 2021

Model Management 🔝

  1. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions ➖ Comcast 2018
  2. Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper) ➖ Apple 2019
  3. Runway - Model Lifecycle Management at Netflix ➖ Netflix 2020
  4. Managing ML Models @ Scale - Intuit’s ML Platform ➖ Intuit 2020
  5. ML Model Monitoring - 9 Tips From the Trenches ➖ Nubank 2021

Efficiency 🔝

  1. GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper) ➖ Facebook 2020
  2. How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs ➖ Roblox 2020
  3. Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper) ➖ Uber 2021

Ethics 🔝

  1. Building Inclusive Products Through A/B Testing (Paper) ➖ LinkedIn 2020
  2. LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper) ➖ LinkedIn 2020

Infra 🔝

  1. Reengineering Facebook AI’s Deep Learning Platforms for Interoperability ➖ Facebook 2020
  2. Elastic Distributed Training with XGBoost on Ray ➖ Uber 2021

MLOps Platforms 🔝

  1. Meet Michelangelo: Uber’s Machine Learning Platform ➖ Uber 2017
  2. Operationalizing Machine Learning—Managing Provenance from Raw Data to Predictions ➖ Comcast 2018
  3. Big Data Machine Learning Platform at Pinterest ➖ Pinterest 2019
  4. Core Modeling at Instagram ➖ Instagram 2019
  5. Open-Sourcing Metaflow - a Human-Centric Framework for Data Science ➖ Netflix 2019
  6. Managing ML Models @ Scale - Intuit’s ML Platform ➖ Intuit 2020
  7. Real-time Machine Learning Inference Platform at Zomato ➖ Zomato 2020
  8. Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform ➖ Lyft 2020
  9. Building Flexible Ensemble ML Models with a Computational Graph ➖ DoorDash 2021
  10. LyftLearn: ML Model Training Infrastructure built on Kubernetes ➖ Lyft 2021
  11. “You Don’t Need a Bigger Boat”: A Full Data Pipeline Built with Open-Source Tools (Paper) ➖ Coveo 2021
  12. MLOps at GreenSteam: Shipping Machine Learning ➖ GreenSteam 2021
  13. Evolving Reddit’s ML Model Deployment and Serving Architecture ➖ Reddit 2021

Practices 🔝

  1. Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper) ➖ Yoshua Bengio 2012
  2. Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper) ➖ Google 2014
  3. Rules of Machine Learning: Best Practices for ML Engineering ➖ Google 2018
  4. On Challenges in Machine Learning Model Management ➖ Amazon 2018
  5. Machine Learning in Production: The Booking.com Approach ➖ Booking 2019
  6. 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper) ➖ Booking 2019
  7. Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank ➖ Rabobank 2019
  8. Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper) ➖ Cambridge 2020
  9. Reengineering Facebook AI’s Deep Learning Platforms for Interoperability ➖ Facebook 2020
  10. The problem with AI developer tools for enterprises ➖ Databricks 2020
  11. Continuous Integration and Deployment for Machine Learning Online Serving and Models ➖ Uber 2021
  12. Tuning Model Performance ➖ Uber 2021
  13. Maintaining Machine Learning Model Accuracy Through Monitoring ➖ DoorDash 2021
  14. Building Scalable and Performant Marketing ML Systems at Wayfair ➖ Wayfair 2021
  15. Our approach to building transparent and explainable AI systems ➖ LinkedIn 2021
  16. 5 Steps for Building Machine Learning Models for Business ➖ Shopify 2021

Team Structure 🔝

  1. Engineers Shouldn’t Write ETL: A Guide to Building a High Functioning Data Science Department ➖ Stitch Fix 2016
  2. Building The Analytics Team At Wish ➖ Wish 2018
  3. Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist ➖ Stitch Fix 2019
  4. Cultivating Algorithms: How We Grow Data Science at Stitch Fix ➖ Stitch Fix
  5. Analytics at Netflix: Who We Are and What We Do ➖ Netflix 2020
  6. Building a Data Team at a Mid-stage Startup: A Short Story ➖ Erikbern 2021

Fails 🔝

  1. When It Comes to Gorillas, Google Photos Remains Blind ➖ Google 2018
  2. 160k+ High School Students Will Graduate Only If a Model Allows Them to ➖ International Baccalaureate 2020
  3. An Algorithm That ‘Predicts’ Criminality Based on a Face Sparks a Furor ➖ Harrisburg University 2020
  4. It’s Hard to Generate Neural Text From GPT-3 About Muslims ➖ OpenAI 2020
  5. A British AI Tool to Predict Violent Crime Is Too Flawed to Use ➖ United Kingdom 2020
  6. More in awful-ai

References

[1] Machine Learning System Design

[2] Machine Learning Surveys (forked from eugeneyan/ml-surveys)

[3] Applied Machine Learning (forked from eugeneyan/applied-ml)