When it comes to putting data to work, no one does it like the titans of technology: the tech giants. Facebook, Google, LinkedIn and Twitter dominate and enable our online communication and search, and are committed to advancing the way people connect and communicate and to providing them with the most relevant information in a fraction of a second. These tech giants are joining us at the 5th edition of the Online Data Innovation Summit to share their state-of-the-art methods of using data, AI and ML to power their massive tech platforms and make decisions that impact millions of people daily.
Facebook’s Misspelling Oblivious Embeddings model
We are thrilled to have Fabrizio Silvestri, Research Scientist at Facebook AI. Facebook AI is the research division of Facebook, the most popular social media platform among the tech giants, and is dedicated to advancing artificial intelligence and machine learning to create new technologies and give people better ways to communicate. Fabrizio is going to present Facebook’s breakthrough model in the field of Natural Language Processing (NLP), called Misspelling Oblivious Embeddings (MOE).
During his session, Misspelling Oblivious Embeddings, on the Machine and Deep Learning stage on August 20th, Fabrizio will present this novel embedding that is resilient to misspellings and show experimental evidence that the method works in practice. MOE combines Facebook’s open-source library fastText with a supervised task that embeds misspellings close to their correct variants. The loss function of fastText aims to more closely embed words that occur in the same context, which Facebook calls a semantic loss. MOE adds a supervised loss, called the spell correction loss, which aims to embed misspellings close to their correct versions. The model is trained by minimizing the weighted sum of the semantic loss and the spell correction loss.
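To make the idea of a weighted sum of two losses concrete, here is a minimal Python sketch. It is not Facebook's implementation: the real fastText objective works over subword n-grams with negative sampling, which is omitted here. The function names, the toy two-dimensional vectors and the weighting parameter `alpha` are all assumptions made for illustration.

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def pair_loss(vec_a, vec_b):
    """Logistic loss log(1 + exp(-u.v)): small when the two vectors point the same way."""
    return math.log1p(math.exp(-dot(vec_a, vec_b)))

def moe_style_loss(semantic_loss, spell_loss, alpha=0.5):
    """Weighted sum of the two losses; alpha is a hypothetical weighting parameter."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return (1.0 - alpha) * semantic_loss + alpha * spell_loss

# Toy embeddings (invented values, not learned from data).
word = [1.0, 0.5]               # a correctly spelled word
context = [0.8, 0.4]            # a word that appears in the same context
misspelling_far = [-1.0, -0.5]  # misspelling embedded far from the word
misspelling_near = [0.9, 0.6]   # misspelling embedded close to the word

semantic = pair_loss(word, context)
far_total = moe_style_loss(semantic, pair_loss(misspelling_far, word))
near_total = moe_style_loss(semantic, pair_loss(misspelling_near, word))

print(f"total loss, misspelling far:  {far_total:.3f}")
print(f"total loss, misspelling near: {near_total:.3f}")
```

With the semantic term held fixed, the total is lower when the misspelling sits near its correct spelling, which is the behaviour that minimizing the combined loss encourages.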
How Google uses AI and ML in the enterprise and draws insights from customer feedback using NLP
Representing the search engine tech giant and leader in ML, AI and NLP, Rich Dutton, Head of Machine Learning for Corporate Engineering | Google Corporate Engineering, will tell us more about How Google Uses AI and ML in the Enterprise at the IGNITE stage on August 20th.
In his session, Rich will outline how Google’s Corporate Engineering team is using AI and machine learning to spur innovation within Google. He will also describe the work his team does (its structure, example use cases, etc.), the research that drives that work, and the democratization of AI (work in ML Fairness, Privacy, Interpretability and AutoML technologies).
Rich Dutton will also reveal the types of challenges that Google solves by using AI, how Google has built an enterprise AI team, as well as some considerations to keep in mind as you think about employing AI in your enterprise.
Also coming from Google and representing Austin, Texas, Peter Grabowski, Austin Site Lead of Enterprise Machine Learning | Google Corporate Engineering, will be talking about Drawing Insights from Customer Feedback Using NLP at the ACCELERATE stage on August 20th.
Companies are frequently faced with large amounts of unstructured text data, like forum comments or product reviews. Important trends can emerge in these datasets, but it can be time-consuming to read through comments, and keyword matching frequently misses critical nuances.
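As a small illustration of why keyword matching can miss nuance, the Python sketch below flags comments that contain a complaint keyword. The keyword list and the example comments are invented for this illustration and do not come from Google's work.

```python
def keyword_flag(comment, keywords=("slow", "crash", "broken")):
    """Return True if any complaint keyword appears as a whole word in the comment."""
    words = comment.lower().replace(",", " ").replace(".", " ").split()
    return any(keyword in words for keyword in keywords)

comments = [
    "The app is slow and crashes often.",  # a real complaint, flagged via "slow"
    "Not slow at all, great update.",      # praise, but still contains "slow"
    "Takes forever to load my feed.",      # a complaint with no keyword in it
]

flags = [keyword_flag(comment) for comment in comments]
for comment, flagged in zip(comments, flags):
    print(f"{flagged!s:5}  {comment}")
```

The second comment is flagged even though it is positive, and the third is missed even though it is a complaint. These are the kinds of nuance that motivate a more context-aware NLP approach.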
Peter will discuss how they have approached this problem at Google using Natural Language Processing, with examples of the approach applied to open datasets. He’ll also explore how this fits into the ML project lifecycle, with examples of common pitfalls. And finally, Peter will highlight how to use this technology as part of a “human in the loop” approach to supercharge your existing team members.
Fueling Machine Learning with Feature Engineering at Twitter
From the most popular search engine among the tech giants, we go to Twitter, the social networking site that is among the most popular platforms for getting news, following celebrities or expressing your opinion. Jigyasa Grover, Machine Learning Engineer at Twitter, will be discussing Fueling Machine Learning with Feature Engineering at the Data Engineering stage on August 20th.
As Jigyasa states, in the contemporary world of learning algorithms – “data is the new oil”. Data demands efficient refinement to expose valuable information. To lay a strong foundation for the state-of-the-art machine learning algorithms to work their magic, the crude oil-like data needs to be infused with domain knowledge and extracted into “features”.
In her talk, Jigyasa will introduce the topic of Feature Engineering and talk about the power of this most creative aspect of data science, which often does not get its due limelight. She will walk the audience through the process of feature engineering as done in formal settings, using a simple hands-on Pythonic example on publicly available data, and put forward some popular techniques like hashing, encoding, and embedding, which help in pulling the most out of the data after giving it a proper structure for predictive modelling. Jigyasa will also cast light on terms from the realm of feature engineering, such as relevance, selection, combination, and explosion. The goal is to establish the importance of data, especially in a well-structured format, and the spell it casts on building smart learning algorithms.
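For readers new to these terms, here is a minimal Python sketch of two of the techniques named above, hashing and encoding. It is not taken from Jigyasa's talk: the function names, the bucket count and the sample values are assumptions for illustration, and embeddings are left out because they require a trained model.

```python
import hashlib

def hash_feature(value, num_buckets=8):
    """Map a string to a bucket index using a stable hash (the hashing trick)."""
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets

def hashed_vector(tokens, num_buckets=8):
    """Count tokens per bucket, giving a fixed-length vector for any vocabulary size."""
    vector = [0] * num_buckets
    for token in tokens:
        vector[hash_feature(token, num_buckets)] += 1
    return vector

def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a known category list."""
    return [1 if value == category else 0 for category in categories]

# Hypothetical raw inputs.
tokens = ["data", "is", "the", "new", "oil", "data"]
print(hashed_vector(tokens))
print(one_hot("red", ["blue", "red", "green"]))
```

Hashing keeps the feature vector at a fixed length no matter how many distinct tokens appear, at the cost of occasional collisions between tokens. One-hot encoding avoids collisions but needs the full list of categories up front.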
Understanding customers and making profitable decisions: a lesson by LinkedIn
Rounding out this series of tech giants, we are going to look at LinkedIn, the business and employment-oriented social networking service, and Zheng Shao’s Data Innovation Summit session.
Zheng Shao, Staff Data Scientist at LinkedIn, will share insights in his session, Leverage Testing to Understand Customers and Make Profitable Decisions. Having worked as a Data Scientist at Facebook/Instagram, and currently leading the data science team at LinkedIn, Zheng will explore why testing, although it is in the blood of many Internet/online organizations, is not widely adopted in retail/offline companies.
In his talk, Zheng will outline some challenges that these organizations face regarding testing and propose how to overcome them. Other points he will cover include how the lack of good customer data makes it hard to test in an offline environment, how offline companies are becoming more rigorous in testing and more digitally relevant, and how online companies are developing trade-off frameworks for cases where users are not the same as customers.