Machine Learning System Design
About two years ago, I started to research and read more about applications of machine learning and deep learning beyond the textbooks. It began as simple enthusiasm but soon became a habit. I started with the engineering blogs published by big tech companies, to better understand what actually happens behind the scenes and how the basic ideas fit into software engineering design components such as web servers, databases, and load balancers.
In this post, I have gathered a growing list of such materials. A great deal of this effort had already been done in other posts, which I cite at the end. I have tried to organize these materials in a logical order that is easy to follow for somebody who already has some knowledge of the ML/DL field and wants to know more about how these concepts are implemented and deployed.
I have also prepared a similar categorization of the state-of-the-art (SOTA) papers for popular applications. Hopefully, I can keep it updated.
Last update: January 2022
Table of Contents
Part-1: Systems and Concepts by Companies
Part-2: Concepts Reviews and Surveys
- Recommendation
- Deep Learning
- Natural Language Processing
- Computer Vision
- Vision and Language
- Reinforcement Learning
- Graph
- Embeddings
- Meta-learning and Few-shot Learning
- Others
Part-3: Concepts In Practice
- Data Quality
- Data Engineering
- Data Discovery
- Feature Stores
- Classification
- Regression
- Forecasting
- Recommendation
- Search & Ranking
- Embeddings
- Natural Language Processing
- Sequence Modelling
- Computer Vision
- Reinforcement Learning
- Anomaly Detection
- Graph
- Optimization
- Information Extraction
- Weak Supervision
- Generation
- Audio
- Validation and A/B Testing
- Model Management
- Efficiency
- Ethics
- Infra
- MLOps Platforms
- Practices
- Team Structure
- Fails
Part-1: Machine Learning System Design Patterns in Tech Companies
Google Machine Learning Courses
- Google's fast-paced, practical introduction to machine learning
- Recommendation systems: general information regarding candidate generation (content-based and collaborative filtering), retrieval, scoring, and re-ranking
- Google Research Publications | Data Mining and Modeling
- Google Research Publications | Machine Intelligence
- TensorFlow embedding projector
- Jeff Dean On Large-Scale Deep Learning At Google
Facebook (Meta) Machine Learning Contents
- Field Guide to Machine Learning video series
- Fighting Abuse @Scale
- Preventing abuse using unsupervised learning
- Community Standards report
- Scalable data classification for security and privacy
- Unicorn: A System for Searching the Social Graph
- Hive - A Petabyte Scale Data Warehouse using Hadoop
- Nemo: Data discovery at Facebook
- Embedding-based Retrieval in Facebook Search (Paper)
- How machine learning powers Facebook's News Feed ranking algorithm
- Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective (Paper)
- Neural Code Search: ML-based code search using natural language queries
- AI advances to better detect hate speech
- Leveraging online social interactions for enhancing integrity at Facebook
- Powered by AI: Advancing product understanding and building new shopping experiences
- Here's how we're using AI to help detect misinformation
- GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce
Instagram
Twitter Engineering
Uber
- Forecasting at Uber: An Introduction
- Applying Customer Feedback: How NLP & Deep Learning Improve Uber's Maps
- Food Discovery with Uber Eats: Building a Query Understanding Engine
- Food Discovery with Uber Eats: Recommending for the Marketplace
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations
Netflix
LinkedIn
Airbnb
- Using Machine Learning to Predict Value of Homes On Airbnb
- Listing Embeddings in Search Ranking
- Learning Market Dynamics for Optimal Pricing
- Categorizing Listing Photos at Airbnb
- Applying Deep Learning To Airbnb Search
- Discovering and Classifying In-app Message Intent at Airbnb
- Machine Learning-Powered Search Ranking of Airbnb Experiences
Spotify
Part-2: Machine Learning Design Concept | Reviews and Surveys
Recommendation
- Algorithms: Recommender systems survey (2013)
- Algorithms: Deep Learning based Recommender System: A Survey and New Perspectives (2019)
- Algorithms: Are We Really Making Progress? An Analysis of Neural Recommendation Approaches (2019)
- Serendipity: A Survey of Serendipity in Recommender Systems (2016)
- Diversity: Diversity in Recommender Systems - A survey (2017)
- Explanations: A Survey of Explanations in Recommender Systems (2007)
Deep Learning
- Architecture: A State-of-the-Art Survey on Deep Learning Theory and Architectures (2019)
- Knowledge distillation: Knowledge Distillation: A Survey (2021)
- Model compression: Compression of Deep Learning Models for Text: A Survey (2020)
- Transfer learning: A Survey on Deep Transfer Learning (2018)
- Neural architecture search: A Comprehensive Survey of Neural Architecture Search (2021)
- Neural architecture search: Neural Architecture Search: A Survey (2019)
Natural Language Processing
- Deep Learning: Recent Trends in Deep Learning Based Natural Language Processing (2018)
- Classification: Deep Learning Based Text Classification: A Comprehensive Review (2021)
- Generation: Survey of the SOTA in Natural Language Generation: Core tasks, applications and evaluation (2018)
- Generation: Neural Language Generation: Formulation, Methods, and Evaluation (2020)
- Transfer learning: Exploring Transfer Learning with T5: the Text-To-Text Transfer Transformer (2020)
- Transformers: Efficient Transformers: A Survey (2020)
- Metrics: Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (2020)
- Metrics: Evaluation of Text Generation: A Survey (2020)
Computer Vision
- Object detection: Object Detection in 20 Years (2019)
- Adversarial attacks: Threat of Adversarial Attacks on Deep Learning in Computer Vision (2018)
- Autonomous vehicles: Computer Vision for Autonomous Vehicles: Problems, Datasets and SOTA (2021)
- Image Captioning: A Comprehensive Survey of Deep Learning for Image Captioning (2018)
Vision and Language
- Trends: Trends in Integration of Vision and Language Research: Tasks, Datasets, and Methods (2021)
- Trends: Multimodal Research in Vision and Language: Current and Emerging Trends (2020)
Reinforcement Learning
- Algorithms: A Brief Survey of Deep Reinforcement Learning (2017)
- Transfer learning: Transfer Learning for Reinforcement Learning Domains (2009)
- Economics: Review of Deep Reinforcement Learning Methods and Applications in Economics (2020)
- Discovery: Deep Reinforcement Learning for Search, Recommendation, and Online Advertising (2018)
Graph
- Survey: A Comprehensive Survey on Graph Neural Networks (2019)
- Survey: A Practical Guide to Graph Neural Networks (2020)
- Fraud detection: A systematic literature review of graph-based anomaly detection approaches (2020)
- Knowledge graphs: A Comprehensive Introduction to Knowledge Graphs (2021)
Embeddings
- Text: From Word to Sense Embeddings: A Survey on Vector Representations of Meaning (2018)
- Text: Diachronic Word Embeddings and Semantic Shifts (2018)
- Text: Word Embeddings: A Survey (2019)
- Text: A Reproducible Survey on Word Embeddings and Ontology-based Methods for Word Similarity (2019)
- Graph: A Comprehensive Survey of Graph Embedding: Problems, Techniques and Applications (2017)
Meta-learning and Few-shot Learning
- NLP: Meta-learning for Few-shot Natural Language Processing: A Survey (2020)
- Domain Agnostic: Learning from Few Samples: A Survey (2020)
- Neural Networks: Meta-Learning in Neural Networks: A Survey (2020)
- Domain Agnostic: A Comprehensive Overview and Survey of Recent Advances in Meta-Learning (2020)
- Domain Agnostic: Baby steps towards few-shot learning with multiple semantics (2020)
- Domain Agnostic: Meta-Learning: A Survey (2018)
- Domain Agnostic: A Perspective View And Survey Of Meta-learning (2002)
Others
- Transfer learning: A Survey on Transfer Learning (2009)
Part-3: Machine Learning Design Concepts | By Topics
Data Quality
- Reliable and Scalable Data Ingestion at Airbnb (Airbnb, 2016)
- Monitoring Data Quality at Scale with Statistical Modeling (Uber, 2017)
- Data Management Challenges in Production Machine Learning (Paper) (Google, 2017)
- Automating Large-Scale Data Quality Verification (Paper) (Amazon, 2018)
- Meet Hodor - Gojek's Upstream Data Quality Tool (Gojek, 2019)
- An Approach to Data Quality for Netflix Personalization Systems (Netflix, 2020)
- Improving Accuracy By Certainty Estimation of Human Decisions, Labels, and Raters (Paper) (Facebook, 2020)
Data Engineering
- Zipline: Airbnb's Machine Learning Data Management Platform (Airbnb, 2018)
- Sputnik: Airbnb's Apache Spark Framework for Data Engineering (Airbnb, 2020)
- Unbundling Data Science Workflows with Metaflow and AWS Step Functions (Netflix, 2020)
- How DoorDash is Scaling its Data Platform to Delight Customers and Meet Growing Demand (DoorDash, 2020)
- Revolutionizing Money Movements at Scale with Strong Data Consistency (Uber, 2020)
- Zipline - A Declarative Feature Engineering Framework (Airbnb, 2020)
- Automating Data Protection at Scale, Part 1 (Part 2) (Airbnb, 2021)
- Real-time Data Infrastructure at Uber (Uber, 2021)
- Introducing Fabricator: A Declarative Feature Engineering Framework (DoorDash, 2022)
- Functions & DAGs: introducing Hamilton, a microframework for dataframe generation (Stitch Fix, 2021)
Data Discovery
- Apache Atlas: Data Governance and Metadata Framework for Hadoop (Code) (Apache)
- Collect, Aggregate, and Visualize a Data Ecosystem's Metadata (Code) (WeWork)
- Discovery and Consumption of Analytics Data at Twitter (Twitter, 2016)
- Democratizing Data at Airbnb (Airbnb, 2017)
- Databook: Turning Big Data into Knowledge with Metadata at Uber (Uber, 2018)
- Metacat: Making Big Data Discoverable and Meaningful at Netflix (Code) (Netflix, 2018)
- Amundsen - Lyft's Data Discovery & Metadata Engine (Lyft, 2019)
- Open Sourcing Amundsen: A Data Discovery And Metadata Platform (Code) (Lyft, 2019)
- DataHub: A Generalized Metadata Search & Discovery Tool (Code) (LinkedIn, 2019)
- Amundsen: One Year Later (Lyft, 2020)
- Using Amundsen to Support User Privacy via Metadata Collection at Square (Square, 2020)
- Turning Metadata Into Insights with Databook (Uber, 2020)
- DataHub: Popular Metadata Architectures Explained (LinkedIn, 2020)
- How We Improved Data Discovery for Data Scientists at Spotify (Spotify, 2020)
- How We're Solving Data Discovery Challenges at Shopify (Shopify, 2020)
- Nemo: Data discovery at Facebook (Facebook, 2020)
- Exploring Data @ Netflix (Code) (Netflix, 2021)
Feature Stores
- Distributed Time Travel for Feature Generation (Netflix, 2016)
- Building the Activity Graph, Part 2 (Feature Storage Section) (LinkedIn, 2017)
- Fact Store at Scale for Netflix Recommendations (Netflix, 2018)
- Zipline: Airbnb's Machine Learning Data Management Platform (Airbnb, 2018)
- Introducing Feast: An Open Source Feature Store for Machine Learning (Code) (Gojek, 2019)
- Michelangelo Palette: A Feature Engineering Platform at Uber (Uber, 2019)
- The Architecture That Powers Twitter's Feature Store (Twitter, 2019)
- Accelerating Machine Learning with the Feature Store Service (Condé Nast, 2019)
- Feast: Bridging ML Models and Data (Gojek, 2020)
- Building a Scalable ML Feature Store with Redis, Binary Serialization, and Compression (DoorDash, 2020)
- Rapid Experimentation Through Standardization: Typed AI features for LinkedIn's Feed (LinkedIn, 2020)
- Building a Feature Store (Monzo Bank, 2020)
- Butterfree: A Spark-based Framework for Feature Store Building (Code) (QuintoAndar, 2020)
- Building Riviera: A Declarative Real-Time Feature Engineering Framework (DoorDash, 2021)
- Optimal Feature Discovery: Better, Leaner Machine Learning Models Through Information Theory (Uber, 2021)
- ML Feature Serving Infrastructure at Lyft (Lyft, 2021)
Classification
- Prediction of Advertiser Churn for Google AdWords (Paper) (Google, 2010)
- High-Precision Phrase-Based Document Classification on a Modern Scale (Paper) (LinkedIn, 2011)
- Chimera: Large-scale Classification using Machine Learning, Rules, and Crowdsourcing (Paper) (Walmart, 2014)
- Large-scale Item Categorization in e-Commerce Using Multiple Recurrent Neural Networks (Paper) (NAVER, 2016)
- Learning to Diagnose with LSTM Recurrent Neural Networks (Paper) (Google, 2017)
- Discovering and Classifying In-app Message Intent at Airbnb (Airbnb, 2019)
- Teaching Machines to Triage Firefox Bugs (Mozilla, 2019)
- Categorizing Products at Scale (Shopify, 2020)
- How We Built the Good First Issues Feature (GitHub, 2020)
- Testing Firefox More Efficiently with Machine Learning (Mozilla, 2020)
- Using ML to Subtype Patients Receiving Digital Mental Health Interventions (Paper) (Microsoft, 2020)
- Scalable Data Classification for Security and Privacy (Paper) (Facebook, 2020)
- Uncovering Online Delivery Menu Best Practices with Machine Learning (DoorDash, 2020)
- Using a Human-in-the-Loop to Overcome the Cold Start Problem in Menu Item Tagging (DoorDash, 2020)
- Deep Learning: Product Categorization and Shelving (Walmart, 2021)
- Large-scale Item Categorization for e-Commerce (Paper) (DianPing, eBay, 2021)
Regression
- Using Machine Learning to Predict Value of Homes On Airbnb (Airbnb, 2017)
- Using Machine Learning to Predict the Value of Ad Requests (Twitter, 2020)
- Open-Sourcing Riskquant, a Library for Quantifying Risk (Code) (Netflix, 2020)
- Solving for Unobserved Data in a Regression Model Using a Simple Data Adjustment (DoorDash, 2020)
Forecasting
- Engineering Extreme Event Forecasting at Uber with RNN (Uber, 2017)
- Forecasting at Uber: An Introduction (Uber, 2018)
- Transforming Financial Forecasting with Data Science and Machine Learning at Uber (Uber, 2018)
- Under the Hood of Gojek's Automated Forecasting Tool (Gojek, 2019)
- BusTr: Predicting Bus Travel Times from Real-Time Traffic (Paper, Video) (Google, 2020)
- Retraining Machine Learning Models in the Wake of COVID-19 (DoorDash, 2020)
- Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow (Paper, Code) (Atlassian, 2020)
- Introducing Orbit, An Open Source Package for Time Series Inference and Forecasting (Paper, Video, Code) (Uber, 2021)
- Managing Supply and Demand Balance Through Machine Learning (DoorDash, 2021)
- Greykite: A flexible, intuitive, and fast forecasting library (LinkedIn, 2021)
Recommendation
- Amazon.com Recommendations: Item-to-Item Collaborative Filtering (Paper) (Amazon, 2003)
- Netflix Recommendations: Beyond the 5 stars (Part 1) (Part 2) (Netflix, 2012)
- How Music Recommendation Works - And Doesn't Work (Spotify, 2012)
- Learning to Rank Recommendations with the k-Order Statistic Loss (Paper) (Google, 2013)
- Recommending Music on Spotify with Deep Learning (Spotify, 2014)
- Learning a Personalized Homepage (Netflix, 2015)
- Session-based Recommendations with Recurrent Neural Networks (Paper) (Telefonica, 2016)
- Deep Neural Networks for YouTube Recommendations (YouTube, 2016)
- E-commerce in Your Inbox: Product Recommendations at Scale (Paper) (Yahoo, 2016)
- To Be Continued: Helping you find shows to continue watching on Netflix (Netflix, 2016)
- Personalized Recommendations in LinkedIn Learning (LinkedIn, 2016)
- Personalized Channel Recommendations in Slack (Slack, 2016)
- Recommending Complementary Products in E-Commerce Push Notifications (Paper) (Alibaba, 2017)
- Artwork Personalization at Netflix (Netflix, 2017)
- A Meta-Learning Perspective on Cold-Start Recommendations for Items (Paper) (Twitter, 2017)
- Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time (Paper) (Pinterest, 2017)
- How 20th Century Fox uses ML to predict a movie audience (Paper) (20th Century Fox, 2018)
- Calibrated Recommendations (Paper) (Netflix, 2018)
- Food Discovery with Uber Eats: Recommending for the Marketplace (Uber, 2018)
- Explore, Exploit, and Explain: Personalizing Explainable Recommendations with Bandits (Paper) (Spotify, 2018)
- Behavior Sequence Transformer for E-commerce Recommendation in Alibaba (Paper) (Alibaba, 2019)
- SDM: Sequential Deep Matching Model for Online Large-scale Recommender System (Paper) (Alibaba, 2019)
- Multi-Interest Network with Dynamic Routing for Recommendation at Tmall (Paper) (Alibaba, 2019)
- Personalized Recommendations for Experiences Using Deep Learning (TripAdvisor, 2019)
- Powered by AI: Instagram's Explore recommender system (Facebook, 2019)
- Marginal Posterior Sampling for Slate Bandits (Paper) (Netflix, 2019)
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations (Uber, 2019)
- Music recommendation at Spotify (Spotify, 2019)
- Using Machine Learning to Predict what File you Need Next (Part 1) (Dropbox, 2019)
- Using Machine Learning to Predict what File you Need Next (Part 2) (Dropbox, 2019)
- Learning to be Relevant: Evolution of a Course Recommendation System (LinkedIn, 2019)
- Temporal-Contextual Recommendation in Real-Time (Paper) (Amazon, 2020)
- P-Companion: A Framework for Diversified Complementary Product Recommendation (Paper) (Amazon, 2020)
- Deep Interest with Hierarchical Attention Network for Click-Through Rate Prediction (Paper) (Alibaba, 2020)
- TPG-DNN: A Method for User Intent Prediction with Multi-task Learning (Paper) (Alibaba, 2020)
- PURS: Personalized Unexpected Recommender System for Improving User Satisfaction (Paper) (Alibaba, 2020)
- Controllable Multi-Interest Framework for Recommendation (Paper) (Alibaba, 2020)
- MiNet: Mixed Interest Network for Cross-Domain Click-Through Rate Prediction (Paper) (Alibaba, 2020)
- ATBRG: Adaptive Target-Behavior Relational Graph Network for Effective Recommendation (Paper) (Alibaba, 2020)
- For Your Ears Only: Personalizing Spotify Home with Machine Learning (Spotify, 2020)
- Reach for the Top: How Spotify Built Shortcuts in Just Six Months (Spotify, 2020)
- Contextual and Sequential User Embeddings for Large-Scale Music Recommendation (Paper) (Spotify, 2020)
- The Evolution of Kit: Automating Marketing Using Machine Learning (Shopify, 2020)
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 1) (LinkedIn, 2020)
- A Closer Look at the AI Behind Course Recommendations on LinkedIn Learning (Part 2) (LinkedIn, 2020)
- Building a Heterogeneous Social Network Recommendation System (LinkedIn, 2020)
- How TikTok recommends videos #ForYou (ByteDance, 2020)
- Zero-Shot Heterogeneous Transfer Learning from RecSys to Cold-Start Search Retrieval (Paper) (Google, 2020)
- Improved Deep & Cross Network for Feature Cross Learning in Web-scale LTR Systems (Paper) (Google, 2020)
- Mixed Negative Sampling for Learning Two-tower Neural Networks in Recommendations (Paper) (Google, 2020)
- Future Data Helps Training: Modeling Future Contexts for Session-based Recommendation (Paper) (Tencent, 2020)
- A Case Study of Session-based Recommendations in the Home-improvement Domain (Paper) (Home Depot, 2020)
- Balancing Relevance and Discovery to Inspire Customers in the IKEA App (Paper) (Ikea, 2020)
- How we use AutoML, Multi-task learning and Multi-tower models for Pinterest Ads (Pinterest, 2020)
- Multi-task Learning for Related Products Recommendations at Pinterest (Pinterest, 2020)
- Improving the Quality of Recommended Pins with Lightweight Ranking (Pinterest, 2020)
- Personalized Cuisine Filter Based on Customer Preference and Local Popularity (DoorDash, 2020)
- How We Built a Matchmaking Algorithm to Cross-Sell Products (Gojek, 2020)
- Lessons Learned Addressing Dataset Bias in Model-Based Candidate Generation (Paper) (Twitter, 2021)
- Self-supervised Learning for Large-scale Item Recommendations (Paper) (Google, 2021)
- Deep Retrieval: End-to-End Learnable Structure Model for Large-Scale Recommendations (Paper) (ByteDance, 2021)
- Using AI to Help Health Experts Address the COVID-19 Pandemic (Facebook, 2021)
- Advertiser Recommendation Systems at Pinterest (Pinterest, 2021)
- On YouTube's Recommendation System (YouTube, 2021)
Search & Ranking
- Amazon Search: The Joy of Ranking Products (Paper, Video, Code) (Amazon, 2016)
- How Lazada Ranks Products to Improve Customer Experience and Conversion (Lazada, 2016)
- Ranking Relevance in Yahoo Search (Paper) (Yahoo, 2016)
- Learning to Rank Personalized Search Results in Professional Networks (Paper) (LinkedIn, 2016)
- Using Deep Learning at Scale in Twitter's Timelines (Twitter, 2017)
- An Ensemble-based Approach to Click-Through Rate Prediction for Promoted Listings at Etsy (Paper) (Etsy, 2017)
- Powering Search & Recommendations at DoorDash (DoorDash, 2017)
- Applying Deep Learning To Airbnb Search (Paper) (Airbnb, 2018)
- In-session Personalization for Talent Search (Paper) (LinkedIn, 2018)
- Talent Search and Recommendation Systems at LinkedIn (Paper) (LinkedIn, 2018)
- Food Discovery with Uber Eats: Building a Query Understanding Engine (Uber, 2018)
- Globally Optimized Mutual Influence Aware Ranking in E-Commerce Search (Paper) (Alibaba, 2018)
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) (Alibaba, 2018)
- Semantic Product Search (Paper) (Amazon, 2019)
- Machine Learning-Powered Search Ranking of Airbnb Experiences (Airbnb, 2019)
- Entity Personalized Talent Search Models with Tree Interaction Features (Paper) (LinkedIn, 2019)
- The AI Behind LinkedIn Recruiter Search and recommendation systems (LinkedIn, 2019)
- Learning Hiring Preferences: The AI Behind LinkedIn Jobs (LinkedIn, 2019)
- The Secret Sauce Behind Search Personalisation (Gojek, 2019)
- Neural Code Search: ML-based Code Search Using Natural Language Queries (Facebook, 2019)
- Aggregating Search Results from Heterogeneous Sources via Reinforcement Learning (Paper) (Alibaba, 2019)
- Cross-domain Attention Network with Wasserstein Regularizers for E-commerce Search (Alibaba, 2019)
- Understanding Searches Better Than Ever Before (Paper) (Google, 2019)
- How We Used Semantic Search to Make Our Search 10x Smarter (Tokopedia, 2019)
- Query2vec: Search query expansion with query embeddings (GrubHub, 2019)
- MOBIUS: Towards the Next Generation of Query-Ad Matching in Baidu's Sponsored Search (Baidu, 2019)
- Why Do People Buy Seemingly Irrelevant Items in Voice Product Search? (Paper) (Amazon, 2020)
- Managing Diversity in Airbnb Search (Paper) (Airbnb, 2020)
- Improving Deep Learning for Airbnb Search (Paper) (Airbnb, 2020)
- Quality Matches Via Personalized AI for Hirer and Seeker Preferences (LinkedIn, 2020)
- Understanding Dwell Time to Improve LinkedIn Feed Ranking (LinkedIn, 2020)
- Ads Allocation in Feed via Constrained Optimization (Paper, Video) (LinkedIn, 2020)
- AI at Scale in Bing (Microsoft, 2020)
- Query Understanding Engine in Traveloka Universal Search (Traveloka, 2020)
- Bayesian Product Ranking at Wayfair (Wayfair, 2020)
- COLD: Towards the Next Generation of Pre-Ranking System (Paper) (Alibaba, 2020)
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) (Pinterest, 2020)
- Driving Shopping Upsells from Pinterest Search (Pinterest, 2020)
- GDMix: A Deep Ranking Personalization Framework (Code) (LinkedIn, 2020)
- Bringing Personalized Search to Etsy (Etsy, 2020)
- Building a Better Search Engine for Semantic Scholar (Allen Institute for AI, 2020)
- Query Understanding for Natural Language Enterprise Search (Paper) (Salesforce, 2020)
- Things Not Strings: Understanding Search Intent with Better Recall (DoorDash, 2020)
- Query Understanding for Surfacing Under-served Music Content (Paper) (Spotify, 2020)
- Embedding-based Retrieval in Facebook Search (Paper) (Facebook, 2020)
- Towards Personalized and Semantic Retrieval for E-commerce Search via Embedding Learning (Paper) (JD, 2020)
- QUEEN: Neural query rewriting in e-commerce (Paper) (Amazon, 2021)
- Using Learning-to-rank to Precisely Locate Where to Deliver Packages (Paper) (Amazon, 2021)
- Seasonal relevance in e-commerce search (Paper) (Amazon, 2021)
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) (Alibaba, 2021)
- How We Built A Context-Specific Bidding System for Etsy Ads (Etsy, 2021)
- Pre-trained Language Model based Ranking in Baidu Search (Paper) (Baidu, 2021)
- Stitching together spaces for query-based recommendations (Stitch Fix, 2021)
- Deep Natural Language Processing for LinkedIn Search Systems (Paper) (LinkedIn, 2021)
- Siamese BERT-based Model for Web Search Relevance Ranking Evaluated on a New Czech Dataset (Paper, Code) (Seznam, 2021)
Embeddings
- Vector Representation Of Items, Customer And Cart To Build A Recommendation System (Paper) (Sears, 2017)
- Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (Paper) (Alibaba, 2018)
- Embeddings@Twitter (Twitter, 2018)
- Listing Embeddings in Search Ranking (Paper) (Airbnb, 2018)
- Understanding Latent Style (Stitch Fix, 2018)
- Towards Deep and Representation Learning for Talent Search at LinkedIn (Paper) (LinkedIn, 2018)
- Personalized Store Feed with Vector Embeddings (DoorDash, 2018)
- Should we Embed? A Study on Performance of Embeddings for Real-Time Recommendations (Paper) (Moshbit, 2019)
- Machine Learning for a Better Developer Experience (Netflix, 2020)
- Announcing ScaNN: Efficient Vector Similarity Search (Paper, Code) (Google, 2020)
- Embedding-based Retrieval at Scribd (Scribd, 2021)
Natural Language Processing
- Abusive Language Detection in Online User Content (Paper) (Yahoo, 2016)
- Smart Reply: Automated Response Suggestion for Email (Paper) (Google, 2016)
- Building Smart Replies for Member Messages (LinkedIn, 2017)
- How Natural Language Processing Helps LinkedIn Members Get Support Easily (LinkedIn, 2019)
- Gmail Smart Compose: Real-Time Assisted Writing (Paper) (Google, 2019)
- Goal-Oriented End-to-End Conversational Models with Profile Features in a Real-World Setting (Paper) (Amazon, 2019)
- Give Me Jeans not Shoes: How BERT Helps Us Deliver What Clients Want (Stitch Fix, 2019)
- DeText: A deep NLP Framework for Intelligent Text Understanding (Code) (LinkedIn, 2020)
- SmartReply for YouTube Creators (Google, 2020)
- Using Neural Networks to Find Answers in Tables (Paper) (Google, 2020)
- A Scalable Approach to Reducing Gender Bias in Google Translate (Google, 2020)
- Assistive AI Makes Replying Easier (Microsoft, 2020)
- AI Advances to Better Detect Hate Speech (Facebook, 2020)
- A State-of-the-Art Open Source Chatbot (Paper) (Facebook, 2020)
- A Highly Efficient, Real-Time Text-to-Speech System Deployed on CPUs (Facebook, 2020)
- Deep Learning to Translate Between Programming Languages (Paper, Code) (Facebook, 2020)
- Deploying Lifelong Open-Domain Dialogue Learning (Paper) (Facebook, 2020)
- Introducing Dynabench: Rethinking the way we benchmark AI (Facebook, 2020)
- How Gojek Uses NLP to Name Pickup Locations at Scale (Gojek, 2020)
- The State-of-the-art Open-Domain Chatbot in Chinese and English (Paper) (Baidu, 2020)
- PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization (Paper, Code) (Google, 2020)
- Photon: A Robust Cross-Domain Text-to-SQL System (Paper) (Demo) (Salesforce, 2020)
- GeDi: A Powerful New Method for Controlling Language Models (Paper, Code) (Salesforce, 2020)
- Applying Topic Modeling to Improve Call Center Operations (RICOH, 2020)
- WIDeText: A Multimodal Deep Learning Framework (Airbnb, 2020)
- Dynaboard: Moving Beyond Accuracy to Holistic Model Evaluation in NLP (Code) (Facebook, 2021)
- How we reduced our text similarity runtime by 99.96% (Microsoft, 2021)
- Textless NLP: Generating expressive speech from raw audio (Part 1) (Part 2) (Part 3) (Code and Pretrained Models) (Facebook, 2021)
Sequence Modelling
- Doctor AI: Predicting Clinical Events via Recurrent Neural Networks (Paper) (Sutter Health, 2015)
- Deep Learning for Understanding Consumer Histories (Paper) (Zalando, 2016)
- Using Recurrent Neural Network Models for Early Detection of Heart Failure Onset (Paper) (Sutter Health, 2016)
- Continual Prediction of Notification Attendance with Classical and Deep Networks (Paper) (Telefonica, 2017)
- Deep Learning for Electronic Health Records (Paper) (Google, 2018)
- Practice on Long Sequential User Behavior Modeling for Click-Through Rate Prediction (Paper) (Alibaba, 2019)
- Search-based User Interest Modeling with Sequential Behavior Data for CTR Prediction (Paper) (Alibaba, 2020)
- How Duolingo uses AI in every part of its app (Duolingo, 2020)
- Leveraging Online Social Interactions For Enhancing Integrity at Facebook (Paper, Video) (Facebook, 2020)
- Using deep learning to detect abusive sequences of member activity (Video) (LinkedIn, 2021)
Computer Vision
- Creating a Modern OCR Pipeline Using Computer Vision and Deep Learning (Dropbox, 2017)
- Categorizing Listing Photos at Airbnb (Airbnb, 2018)
- Amenity Detection and Beyond - New Frontiers of Computer Vision at Airbnb (Airbnb, 2019)
- How we Improved Computer Vision Metrics by More Than 5% Only by Cleaning Labelling Errors (Deepomatic)
- Making machines recognize and transcribe conversations in meetings using audio and video (Microsoft, 2019)
- Powered by AI: Advancing product understanding and building new shopping experiences (Facebook, 2020)
- A Neural Weather Model for Eight-Hour Precipitation Forecasting (Paper) (Google, 2020)
- Machine Learning-based Damage Assessment for Disaster Relief (Paper) (Google, 2020)
- RepNet: Counting Repetitions in Videos (Paper) (Google, 2020)
- Converting Text to Images for Product Discovery (Paper) (Amazon, 2020)
- How Disney Uses PyTorch for Animated Character Recognition (Disney, 2020)
- Image Captioning as an Assistive Technology (Video) (IBM, 2020)
- AI for AG: Production machine learning for agriculture (Blue River, 2020)
- AI for Full-Self Driving at Tesla (Tesla, 2020)
- On-device Supermarket Product Recognition (Google, 2020)
- Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings (Paper) (Google, 2020)
- Shop The Look: Building a Large Scale Visual Shopping System at Pinterest (Paper, Video) (Pinterest, 2020)
- Developing Real-Time, Automatic Sign Language Detection for Video Conferencing (Paper) (Google, 2020)
- Vision-based Price Suggestion for Online Second-hand Items (Paper) (Alibaba, 2020)
- New AI Research to Help Predict COVID-19 Resource Needs From X-rays (Paper, Model) (Facebook, 2021)
- An Efficient Training Approach for Very Large Scale Face Recognition (Paper) (Alibaba, 2021)
- Identifying Document Types at Scribd (Scribd, 2021)
- Semi-Supervised Visual Representation Learning for Fashion Compatibility (Paper) (Walmart, 2021)
Reinforcement Learning
- Deep Reinforcement Learning for Sponsored Search Real-time Bidding (Paper) (Alibaba, 2018)
- Budget Constrained Bidding by Model-free Reinforcement Learning in Display Advertising (Paper) (Alibaba, 2018)
- Reinforcement Learning for On-Demand Logistics (DoorDash, 2018)
- Reinforcement Learning to Rank in E-Commerce Search Engine (Paper) (Alibaba, 2018)
- Dynamic Pricing on E-commerce Platform with Deep Reinforcement Learning (Paper) (Alibaba, 2019)
- Productionizing Deep Reinforcement Learning with Spark and MLflow (Zynga, 2020)
- Deep Reinforcement Learning in Production (Part 1) (Part 2) (Zynga, 2020)
- Building AI Trading Systems (Denny Britz, 2020)
Anomaly Detection
- Detecting Performance Anomalies in External Firmware Deployments (Netflix, 2019)
- Detecting and Preventing Abuse on LinkedIn using Isolation Forests (Code) (LinkedIn, 2019)
- Deep Anomaly Detection with Spark and Tensorflow (Hopsworks Video) (Swedbank, Hopsworks, 2019)
- Preventing Abuse Using Unsupervised Learning (LinkedIn, 2020)
- The Technology Behind Fighting Harassment on LinkedIn (LinkedIn, 2020)
- Uncovering Insurance Fraud Conspiracy with Network Learning (Paper) (Ant Financial, 2020)
- How Does Spam Protection Work on Stack Exchange? (Stack Exchange, 2020)
- Auto Content Moderation in C2C e-Commerce (Mercari, 2020)
- Blocking Slack Invite Spam With Machine Learning (Slack, 2020)
- Cloudflare Bot Management: Machine Learning and More (Cloudflare, 2020)
- Anomalies in Oil Temperature Variations in a Tunnel Boring Machine (SENER, 2020)
- Using Anomaly Detection to Monitor Low-Risk Bank Customers (Rabobank, 2020)
- Fighting fraud with Triplet Loss (OLX Group, 2020)
- Facebook is Now Using AI to Sort Content for Quicker Moderation (Alternative) (Facebook, 2020)
- How AI is getting better at detecting hate speech Part 1, Part 2, Part 3, Part 4 (Facebook, 2020)
Graph
- Building The LinkedIn Knowledge Graph - LinkedIn 2016
- Scaling Knowledge Access and Retrieval at Airbnb - Airbnb 2018
- Graph Convolutional Neural Networks for Web-Scale Recommender Systems (Paper) - Pinterest 2018
- Food Discovery with Uber Eats: Using Graph Learning to Power Recommendations - Uber 2019
- AliGraph: A Comprehensive Graph Neural Network Platform (Paper) - Alibaba 2019
- Contextualizing Airbnb by Building Knowledge Graph - Airbnb 2019
- Retail Graph - Walmart's Product Knowledge Graph - Walmart 2020
- Traffic Prediction with Advanced Graph Neural Networks - DeepMind 2020
- SimClusters: Community-Based Representations for Recommendations (Paper, Video) - Twitter 2020
- Metapaths guided Neighbors aggregated Network for Heterogeneous Graph Reasoning (Paper) - Alibaba 2021
- Graph Intention Network for Click-through Rate Prediction in Sponsored Search (Paper) - Alibaba 2021
- JEL: Applying End-to-End Neural Entity Linking in JPMorgan Chase (Paper) - JPMorgan Chase 2021
Optimization
- Matchmaking in Lyft Line (Part 1) (Part 2) (Part 3) - Lyft 2016
- The Data and Science behind GrabShare Carpooling (Part 1) - Grab 2017
- How Trip Inferences and Machine Learning Optimize Delivery Times on Uber Eats - Uber 2018
- Next-Generation Optimization for Dasher Dispatch at DoorDash - DoorDash 2020
- Optimization of Passengers Waiting Time in Elevators Using Machine Learning - Thyssen Krupp AG 2020
- Think Out of The Package: Recommending Package Types for E-commerce Shipments (Paper) - Amazon 2020
- Optimizing DoorDash's Marketing Spend with Machine Learning - DoorDash 2020
Information Extraction
- Unsupervised Extraction of Attributes and Their Values from Product Description (Paper) - Rakuten 2013
- Using Machine Learning to Index Text from Billions of Images - Dropbox 2018
- Extracting Structured Data from Templatic Documents (Paper) - Google 2020
- AutoKnow: self-driving knowledge collection for products of thousands of types (Paper, Video) - Amazon 2020
- One-shot Text Labeling using Attention and Belief Propagation for Information Extraction (Paper) - Alibaba 2020
- Information Extraction from Receipts with Graph Convolutional Networks - Nanonets 2021
Weak Supervision
- Snorkel DryBell: A Case Study in Deploying Weak Supervision at Industrial Scale (Paper) - Google 2019
- Osprey: Weak Supervision of Imbalanced Extraction Problems without Code (Paper) - Intel 2019
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper) - Apple 2019
- Bootstrapping Conversational Agents with Weak Supervision (Paper) - IBM 2019
Generation
- Better Language Models and Their Implications (Paper) - OpenAI 2019
- Image GPT (Paper, Code) - OpenAI 2019
- Language Models are Few-Shot Learners (Paper) (GPT-3 Blog post) - OpenAI 2020
- Deep Learned Super Resolution for Feature Film Production (Paper) - Pixar 2020
- Unit Test Case Generation with Transformers - Microsoft 2021
Audio
- Improving On-Device Speech Recognition with VoiceFilter-Lite (Paper) - Google 2020
- The Machine Learning Behind Hum to Search - Google 2020
Validation and A/B Testing
- Overlapping Experiment Infrastructure: More, Better, Faster Experimentation (Paper) - Google 2010
- The Reusable Holdout: Preserving Validity in Adaptive Data Analysis (Paper) - Google 2015
- Twitter Experimentation: Technical Overview - Twitter 2015
- It's All A/Bout Testing: The Netflix Experimentation Platform - Netflix 2016
- Building Pinterest's A/B Testing Platform - Pinterest 2016
- Experimenting to Solve Cramming - Twitter 2017
- Building an Intelligent Experimentation Platform with Uber Engineering - Uber 2017
- Scaling Airbnb's Experimentation Platform - Airbnb 2017
- Meet Wasabi, an Open Source A/B Testing Platform (Code) - Intuit 2017
- Analyzing Experiment Outcomes: Beyond Average Treatment Effects - Uber 2018
- Under the Hood of Uber's Experimentation Platform - Uber 2018
- Constrained Bayesian Optimization with Noisy Experiments (Paper) - Facebook 2018
- Reliable and Scalable Feature Toggles and A/B Testing SDK at Grab - Grab 2018
- Modeling Conversion Rates and Saving Millions Using Kaplan-Meier and Gamma Distributions (Code) - Better 2019
- Detecting Interference: An A/B Test of A/B Tests - LinkedIn 2019
- Announcing a New Framework for Designing Optimal Experiments with Pyro (Paper) (Paper) - Uber 2020
- Enabling 10x More Experiments with Traveloka Experiment Platform - Traveloka 2020
- Large Scale Experimentation at Stitch Fix (Paper) - Stitch Fix 2020
- Multi-Armed Bandits and the Stitch Fix Experimentation Platform - Stitch Fix 2020
- Experimentation with Resource Constraints - Stitch Fix 2020
- Computational Causal Inference at Netflix (Paper) - Netflix 2020
- Key Challenges with Quasi Experiments at Netflix - Netflix 2020
- Making the LinkedIn experimentation engine 20x faster - LinkedIn 2020
- Our Evolution Towards T-REX: The Prehistory of Experimentation Infrastructure at LinkedIn - LinkedIn 2020
- How to Use Quasi-experiments and Counterfactuals to Build Great Products - Shopify 2020
- Improving Experimental Power through Control Using Predictions as Covariate - DoorDash 2020
- Supporting Rapid Product Iteration with an Experimentation Analysis Platform - DoorDash 2020
- Improving Online Experiment Capacity by 4X with Parallelization and Increased Sensitivity - DoorDash 2020
- Leveraging Causal Modeling to Get More Value from Flat Experiment Results - DoorDash 2020
- Iterating Real-time Assignment Algorithms Through Experimentation - DoorDash 2020
- Spotify's New Experimentation Platform (Part 1) (Part 2) - Spotify 2020
- Interpreting A/B Test Results: False Positives and Statistical Significance - Netflix 2021
- Interpreting A/B Test Results: False Negatives and Power - Netflix 2021
- Running Experiments with Google Adwords for Campaign Optimization - DoorDash 2021
- The 4 Principles DoorDash Used to Increase Its Logistics Experiment Capacity by 1000% - DoorDash 2021
- Experimentation Platform at Zalando: Part 1 - Evolution - Zalando 2021
- Designing Experimentation Guardrails - Airbnb 2021
- Network Experimentation at Scale (Paper) - Facebook 2021
- Universal Holdout Groups at Disney Streaming - Disney 2021
Model Management
- Operationalizing Machine Learning - Managing Provenance from Raw Data to Predictions - Comcast 2018
- Overton: A Data System for Monitoring and Improving Machine-Learned Products (Paper) - Apple 2019
- Runway - Model Lifecycle Management at Netflix - Netflix 2020
- Managing ML Models @ Scale - Intuit's ML Platform - Intuit 2020
- ML Model Monitoring - 9 Tips From the Trenches - Nubank 2021
Efficiency
- GrokNet: Unified Computer Vision Model Trunk and Embeddings For Commerce (Paper) - Facebook 2020
- How We Scaled Bert To Serve 1+ Billion Daily Requests on CPUs - Roblox 2020
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks (Paper) - Uber 2021
Ethics
- Building Inclusive Products Through A/B Testing (Paper) - LinkedIn 2020
- LiFT: A Scalable Framework for Measuring Fairness in ML Applications (Paper) - LinkedIn 2020
Infra
- Reengineering Facebook AI's Deep Learning Platforms for Interoperability - Facebook 2020
- Elastic Distributed Training with XGBoost on Ray - Uber 2021
MLOps Platforms
- Meet Michelangelo: Uber's Machine Learning Platform - Uber 2017
- Operationalizing Machine Learning - Managing Provenance from Raw Data to Predictions - Comcast 2018
- Big Data Machine Learning Platform at Pinterest - Pinterest 2019
- Core Modeling at Instagram - Instagram 2019
- Open-Sourcing Metaflow - a Human-Centric Framework for Data Science - Netflix 2019
- Managing ML Models @ Scale - Intuit's ML Platform - Intuit 2020
- Real-time Machine Learning Inference Platform at Zomato - Zomato 2020
- Introducing Flyte: Cloud Native Machine Learning and Data Processing Platform - Lyft 2020
- Building Flexible Ensemble ML Models with a Computational Graph - DoorDash 2021
- LyftLearn: ML Model Training Infrastructure built on Kubernetes - Lyft 2021
- "You Don't Need a Bigger Boat": A Full Data Pipeline Built with Open-Source Tools (Paper) - Coveo 2021
- MLOps at GreenSteam: Shipping Machine Learning - GreenSteam 2021
- Evolving Reddit's ML Model Deployment and Serving Architecture - Reddit 2021
Practices
- Practical Recommendations for Gradient-Based Training of Deep Architectures (Paper) - Yoshua Bengio 2012
- Machine Learning: The High Interest Credit Card of Technical Debt (Paper) (Paper) - Google 2014
- Rules of Machine Learning: Best Practices for ML Engineering - Google 2018
- On Challenges in Machine Learning Model Management - Amazon 2018
- Machine Learning in Production: The Booking.com Approach - Booking 2019
- 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com (Paper) - Booking 2019
- Successes and Challenges in Adopting Machine Learning at Scale at a Global Bank - Rabobank 2019
- Challenges in Deploying Machine Learning: a Survey of Case Studies (Paper) - Cambridge 2020
- Reengineering Facebook AI's Deep Learning Platforms for Interoperability - Facebook 2020
- The problem with AI developer tools for enterprises - Databricks 2020
- Continuous Integration and Deployment for Machine Learning Online Serving and Models - Uber 2021
- Tuning Model Performance - Uber 2021
- Maintaining Machine Learning Model Accuracy Through Monitoring - DoorDash 2021
- Building Scalable and Performant Marketing ML Systems at Wayfair - Wayfair 2021
- Our approach to building transparent and explainable AI systems - LinkedIn 2021
- 5 Steps for Building Machine Learning Models for Business - Shopify 2021
Team Structure
- Engineers Shouldn't Write ETL: A Guide to Building a High Functioning Data Science Department - Stitch Fix 2016
- Building The Analytics Team At Wish - Wish 2018
- Beware the Data Science Pin Factory: The Power of the Full-Stack Data Science Generalist - Stitch Fix 2019
- Cultivating Algorithms: How We Grow Data Science at Stitch Fix - Stitch Fix
- Analytics at Netflix: Who We Are and What We Do - Netflix 2020
- Building a Data Team at a Mid-stage Startup: A Short Story - Erikbern 2021
Fails
- When It Comes to Gorillas, Google Photos Remains Blind - Google 2018
- 160k+ High School Students Will Graduate Only If a Model Allows Them to - International Baccalaureate 2020
- An Algorithm That 'Predicts' Criminality Based on a Face Sparks a Furor - Harrisburg University 2020
- It's Hard to Generate Neural Text From GPT-3 About Muslims - OpenAI 2020
- A British AI Tool to Predict Violent Crime Is Too Flawed to Use - United Kingdom 2020
- More in awful-ai
References
[1] Machine Learning System Design
[2] Machine Learning Surveys (forked from eugeneyan/ml-surveys)
[3] Applied Machine Learning (forked from eugeneyan/applied-ml)