Sunday, October 22 • 4:20pm - 4:50pm
Best Practices for Aggregating, Cleaning and Normalizing Data

Disparate data sources are a constant problem in text analytics. Different sources vary in document format, date conventions, and the way they present the same information. Collecting and normalizing such data sets, and building models on them, is a challenge that requires artful human involvement.

In this talk, I discuss my experiences with, and popular conventions for, building human-in-the-loop machine learning systems from vast and varying data sources.

Daniel Dandurand

Data Engineer, Compliance.ai
Dan was a physics Ph.D. student at UC Berkeley before making the switch to data science. He was a Data Engineering Fellow at Insight Data Science before joining Compliance.ai as a Data Engineer. Also, last year Dan spent time in China as an organizer of the global summit, Cre8, a...

Sunday October 22, 2017 4:20pm - 4:50pm PDT
Room E