I have over 14 years of experience shaping and delivering large-scale AI solutions for various products, with a track record of developing and leading multi-year technology strategies. I also have over 7 years of experience managing research and engineering teams across multiple sites.
Compressed Sensing and Network Coding
Peer-to-peer networks
I have formed and currently lead a team of over 30 AI scientists and engineers to pre-train, post-train, and deploy a 100B+ parameter foundational model for LinkedIn’s personalization tasks at scale.
Leading a medium-sized team with diverse profiles, including research scientists and software engineers. Our mission is to advance AI technologies to keep users safe online. My team builds multimodal content understanding services used across many Meta integrity products.
Leading LinkedIn Ads Sponsored Update relevance (five engineers, one analyst, one PM). The team is responsible for modeling and ranking advertising content on the LinkedIn news feed, which shows ads from millions of advertisers to hundreds of millions of LinkedIn daily active users.
Leading a four-engineer forecasting group. We are responsible for (a) predicting sales attributes (dollar amount, close date, and closing probability) for the Sales team and (b) predicting the likelihood of churn for the Customer Success (CSM) team.
Building an early warning system based on a Bayesian network that provides diagnosis and prognosis for large industrial machines.
In this report, we demonstrate that our 150B parameter foundation model trained on 1T tokens can solve over 30 personalization tasks on the LinkedIn platform without task-specific fine-tuning or complex feature engineering. The model can generalize to out-of-domain tasks and surfaces, and achieves performance similar to or better than the production model.
In this work, we demonstrate that LLM performance is affected by the relative distance between pieces of information in the context. The further apart the information is within a long context, the more the model’s performance deteriorates.
Liger Kernel is a collection of Triton kernels designed specifically for LLM training. It can effectively increase multi-GPU training throughput by 20% and reduce memory usage by 60%.
This system-model co-design work focuses on leveraging synchronization in hierarchical partitioning for data parallelism to avoid race conditions in gradient updates during LLM training.
We formulate CoT as a reasoning graph and propose a prompt strategy for multi-step reasoning that can capture complex processes in tasks such as mathematics and commonsense reasoning.
The race condition between AllGather and device-to-device copy for the 2nd partition causes instability in training large models such as Llama-7B and Falcon-40B on a moderately large number of GPUs. After discovering the algorithmic issue, we landed the fix in the DeepSpeed repository.
We propose a framework for understanding how data augmentation interacts with class-level learning dynamics. We show that simple class-conditional augmentation strategies informed by our framework improve performance on the negatively affected classes.