AI+DB: The Dynamic Duo

June 16, 2016, 11AM

302-308




The integration of AI and DB is an inevitable trend in the era of big data. Traditionally, DB research focused on scaling to large amounts of data, while AI focused on intelligent processing but not on large data. Recently, practical systems with real-life applications have come to need integrated technology from both AI and DB, which introduces new challenges that can change the way we think about traditional AI and DB issues. To this end, I will describe my research, from the data analytics work I did at Stanford to AI at Google Research.

As massive amounts of information become available for analysis, scalable integration techniques that provide a unified view of heterogeneous information from various sources are becoming important for data analytics. In this talk, within information integration, I will focus on the problem of entity resolution (ER), which identifies objects that refer to the same real-world entity. In practice, ER is not a one-time process, but is constantly improved as the information schema and application are better understood. I will address the problem of keeping the ER result up to date when the ER logic "evolves" frequently by using evolving rules.
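To make the idea of an evolving ER rule concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the method presented in the talk: the record layout, the two match rules, and the function names `resolve` and `resolve_incremental` are all hypothetical. It assumes that ER clusters records by the transitive closure of a pairwise match rule, and that the new rule is stricter than the old one (any pair that matches under the new rule also matched under the old one). Under those assumptions, new matches can only occur inside old clusters, so the old result can be refined instead of recomputed from scratch.

```python
from itertools import combinations


def resolve(records, match):
    """Cluster records by the transitive closure of a pairwise match rule."""
    parent = {r: r for r in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if match(a, b):
            parent[find(a)] = find(b)

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), set()).add(r)
    return {frozenset(c) for c in clusters.values()}


def resolve_incremental(old_clusters, new_match):
    """Refine an old ER result under a stricter rule.

    Assumption: new_match(a, b) implies the old rule also matched (a, b),
    so only pairs inside the same old cluster need to be compared.
    """
    result = set()
    for cluster in old_clusters:
        result |= resolve(sorted(cluster), new_match)
    return result


# Hypothetical records: (id, name, zip code).
records = [
    (1, "J. Doe", "94305"),
    (2, "J. Doe", "94305"),
    (3, "J. Doe", "10001"),
    (4, "A. Kim", "94043"),
]


def old_rule(a, b):
    # Original ER logic: same name.
    return a[1] == b[1]


def new_rule(a, b):
    # Evolved ER logic: same name and same zip code (stricter).
    return a[1] == b[1] and a[2] == b[2]


old_result = resolve(records, old_rule)
new_result = resolve_incremental(old_result, new_rule)
```

In this sketch the incremental pass compares only pairs within each old cluster, which can be far fewer than all pairs when the clusters are small. If the new rule were not stricter than the old one, this shortcut would not be valid and a different strategy would be needed.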

Biperpedia is a state-of-the-art knowledge base for search applications developed at Google Research (for which I was the technical lead). While the attributes of existing knowledge bases like Freebase are manually curated, Biperpedia automatically extracts long-tail attributes (thousands per class) from search queries and Web text using machine learning and natural language processing techniques. As an application, I will show how Biperpedia attributes can be used to find latent subsumption relationships among concepts on the Web. I will also briefly describe an open information extraction system (called ReNoun) that extracts facts for nominal (noun-based) attributes, and a framework for training rule-based grammars for attributes.

Finally, I will briefly introduce my ongoing research on a large-scale machine learning systems team at Google.