Data Analytics
Sunday, October 22

9:45am PDT

Everything You Wish You Knew About Search
Whether we browse the web, shop the latest trends or retrieve relevant documents, search engines make our lives easier, and we use them daily in both our personal and professional lives. However, no matter how effortless they appear, their creation is anything but easy. In reality, large corporations employ tens and sometimes even hundreds of engineers, product managers and data scientists to support the development, improvement and maintenance of the entire system. In this talk, I will explain how search differs across industries and how enterprise search contrasts with retail search or web search. I will describe some of the most popular ranking algorithms and give some tips about how to choose the one that is the most appropriate for your application. I will also dig into the different peripheral algorithms and pieces that come into play, emphasizing the main reasons why creating a comprehensive, high-performance engine is ultimately a very challenging task.

Jennifer Prendki

Head of Data Science, Atlassian
Jennifer Prendki is the Head of Data Science at Atlassian, where she uses Big Data and Machine Learning to create products that help other companies change the world. Her original area of expertise is particle physics, a field aiming at measuring subtle signals through the analysis...

Sunday October 22, 2017 9:45am - 10:15am PDT
Room E

10:20am PDT

Machine Learning & Data Science in the Age of the GPU: Smarter, Faster, Better
Companies are exploring data in ways we once only associated with science fiction films. Data scientists and analysts live in a world with access to a plethora of tools to analyze and visualize this data - but considering the vast amount of data businesses collect and the machine learning limitations of CPU compute capacity, end users are forced to design their structures and systems with limitations. 

Until now. Graphics Processing Units (GPUs) have stepped in to massively advance and parallelize machine learning, data science and analytics for companies both small and large. Equipped with the ability to render graphics instantly, GPUs are computing, exploring and visualizing billions of rows of data in milliseconds - all on one chip.

As a result, the ability to analyze data and run queries in real-time is giving machine learning algorithms the tools to become even smarter and faster, and is giving companies in industries like financial services, government, retail, adtech and telecommunications the types of tools to compete more effectively, respond more rapidly and tackle challenges they previously considered too hard for their legacy compute platforms. 
In this talk, Todd Mostak will explore the capabilities and applications of GPUs to advance and accelerate machine learning and data visualization. From design to interactivity to user experience, Todd will reveal how this technology is producing faster, measurable and more significant outcomes for businesses today.

Aaron Williams

VP, Global Community at MapD
Aaron is responsible for MapD’s developer, user and open source communities. He comes to MapD with more than two decades of previous success building ecosystems around some of software’s most familiar platforms. Most recently he ran the global community for Mesosphere, including...

Sunday October 22, 2017 10:20am - 10:50am PDT
Room E

10:55am PDT

Algorithmic Content Augmentation: Applying Data Science to Internet Media
With increasing competition for internet traffic, content creators and publishers are looking for new ways to enhance user experiences. With Career Trend (careertrend.com), we wanted to provide unique, data-infused tools, and present data in a way that is new and useful to users.
We algorithmically created 340 occupation landing pages infused with data spanning several government agencies and different classification systems. We present occupation data as a time series, comparing past predictions with actual data collected, a way these data have not been presented before. An example of one of these pages is here (careertrend.com/software-developers.html), with a full listing here (careertrend.com/nav-occupations.html). We surface links to related occupations on each article page using document similarity techniques.
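
As a rough illustration of the "document similarity techniques" mentioned above, the sketch below ranks related pages by cosine similarity over simple word counts. It is not Career Trend's actual implementation: the function names, the whitespace tokenization and the toy page texts are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related_pages(pages: dict, title: str, top_n: int = 2) -> list:
    """Return the top_n page titles most similar to the page named title."""
    # Assumption: lowercase whitespace tokenization is good enough for a sketch.
    vectors = {name: Counter(text.lower().split()) for name, text in pages.items()}
    target = vectors[title]
    scored = [
        (cosine_similarity(target, vec), name)
        for name, vec in vectors.items()
        if name != title
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_n]]


# Hypothetical toy data, not taken from careertrend.com.
PAGES = {
    "software developers": "design write test software applications code",
    "web developers": "design build test web sites code",
    "registered nurses": "provide patient care in hospitals",
}

print(related_pages(PAGES, "software developers", top_n=1))
```

A production system would more likely use TF-IDF weighting or learned embeddings instead of raw counts, but the ranking step would look much the same.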

In addition, we created unique tools like the career word cloud quiz (careertrend.com/quiz.html), using a recursive clustering algorithm that enforces (roughly) equal cluster sizes to improve the usability of the quiz.

Career Trend is powered by looking for new ways to extract value and insight from data, from presenting disparate data in a new way to inventive use of machine learning tools like clustering.

Matthew Theisen

Software Engineer, Leaf Group
I'm currently a software engineer at Leaf Group working on web content and analytics. I look for ways to algorithmically understand and enhance user experiences on our content channels. This includes article recommendation, finding related images, and infusing data into content pages. In...

Sunday October 22, 2017 10:55am - 11:25am PDT
Room E

11:30am PDT

Best Practices in Data Partnerships Between Mayor's Office and Academia
In this session, we will share 3 to 5 best practices anchored around stories from the partnership between the Mayor's Office of Budget & Innovation, top-tier universities, community groups and professional organizations.

Juan Vasquez

Data Programs Manager, Business Experience Unit, Los Angeles Office of Finance
I am a communications professional currently working at the LA Mayor's Office with the Operations Innovation Team. My role merges data analysis and storytelling, a healthy dose of partnership building, and a wealth of public speaking and project management. In past lives, I've been...

Sunday October 22, 2017 11:30am - 12:00pm PDT
Room E

12:00pm PDT

Lunch Break
Sunday October 22, 2017 12:00pm - 1:00pm PDT
Room E

1:00pm PDT

State of AI/ML in Real Estate
The real estate industry is generating terabytes of data, but to this day only a tiny percentage is being utilized or processed. Currently, we are seeing a huge spike in interest in the real estate analytics field, from multiple startups with multi-million funding to top-prize competitions on Kaggle.

In this talk, we are going to discuss popular approaches to modeling real estate markets and also explore the most promising AI/ML techniques for the field.

Anton Polishko

Anton is a published author of predictive modeling algorithms with application in computational biology. Anton received his Ph.D. in Computer Science from UC Riverside, Master of Science in Finance and honored Master of Science in Applied Mathematics from Taras Shevchenko National...

Sunday October 22, 2017 1:00pm - 1:30pm PDT
Room E

1:40pm PDT

Interpretable Machine Learning for Human Behavioral Data
Understanding the mechanisms and drivers of human behavior is a difficult problem, accentuated by the heterogeneous nature of human behavioral data. This poses major issues for our ability to model and understand social systems, with important implications for design, testing and interventions in such systems. In this talk, I will present a statistical methodology to understand human behavior that quantifies feature importance, feature correlations and the level of predictability in human behavior data.

This methodology is highly interpretable, utilizing the R² coefficient of determination to measure both the predictability of a system and the cumulative contribution of each feature towards this overall predictability. Our approach is non-parametric and free of any functional form, thus allowing it to capture the non-linear and heterogeneous data that regularly occur in human behavioral dynamics.
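
To make the idea of a cumulative contribution to the coefficient of determination concrete, here is a minimal sketch. It is an assumption-laden stand-in, not the speaker's method: it treats features as categorical, predicts each outcome by the mean of its feature-value group (so no functional form is imposed), and reports the coefficient of determination as features are added one at a time. The function names and toy rows are hypothetical.

```python
from collections import defaultdict


def r_squared(rows, outcome, features):
    """R^2 of a predictor that uses the mean outcome of each feature-value group."""
    ys = [row[outcome] for row in rows]
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if ss_tot == 0:
        return 0.0
    # Group outcomes by the combination of values of the chosen features.
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[f] for f in features)].append(row[outcome])
    # Residual sum of squares around each group's own mean.
    ss_res = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        ss_res += sum((y - group_mean) ** 2 for y in values)
    return 1.0 - ss_res / ss_tot


def cumulative_r_squared(rows, outcome, features):
    """R^2 after adding each feature in the given order."""
    return [
        (feature, r_squared(rows, outcome, features[: i + 1]))
        for i, feature in enumerate(features)
    ]


# Hypothetical toy data in which y = 2 * a + b.
ROWS = [
    {"a": 0, "b": 0, "y": 0.0},
    {"a": 0, "b": 1, "y": 1.0},
    {"a": 1, "b": 0, "y": 2.0},
    {"a": 1, "b": 1, "y": 3.0},
]

for feature, score in cumulative_r_squared(ROWS, "y", ["a", "b"]):
    print(feature, round(score, 3))
```

In this toy case feature `a` alone explains most of the variance and adding `b` explains the rest. With real data the groups would need binning and enough samples per group to avoid overfitting.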

To illustrate, I will show our applications of this approach to various domains including the analysis of human performance in the Stack Exchange online forum and information sharing in the Twitter and Digg online social networks. In the case of information sharing on Twitter, for example, we show how our method effectively uncovers correlations among both individual-based features and information-based features, presenting a hierarchy of features that cumulatively explain human behavior in this social system.

Peter Fennell

Postdoctoral research fellow, USC Information Sciences Institute
Dr. Peter Fennell is a James S. McDonnell Postdoctoral Fellow at the Information Sciences Institute, University of Southern California. His research examines human behavior and social networks and develops statistical and machine learning methods to understand and model such systems...

Sunday October 22, 2017 1:40pm - 2:10pm PDT
Room E

2:20pm PDT

Fake News Detection Using Machine Learning and Blockchain Technology
After the 2016 US presidential election, many people, shocked by the outcome, looked for the cause of the “surprising” result. Some blamed the outcome on the fake news phenomenon during the election: it painted false images of candidates for better or for worse. A majority of the most viral news articles covering the election turned out to be fake stories.

We created Geppetto, a platform that identifies valid content in a news article. Geppetto can accurately identify and screen fake news in real time, using tamper-proof and cost-effective methods that preserve trust in the process. Users can interact with the Geppetto platform in three ways: (1) as a publisher who publishes their own articles on the platform, (2) as a voter who validates or invalidates a news item, or (3) as a consumer who searches for news on the platform.

The platform requires a database to store newly published news and news verified by the voters. A veracity engine, which consists of several natural language processing and machine learning models, scores the legitimacy of the articles and the reliability of validators (voters) and publishers. Because fake news evolves rapidly, the performance of any model built on historical data can decay rapidly. To address this, we use a continuous learning (CL) algorithm to update the model continuously. The CL component requires a knowledge base to store historical data and re-evaluate the models. Geppetto uses blockchain technology to ensure the system’s data is tamper-proof and decentralized.

Farshad Kheiri

Lead Data Scientist, BCG Digital Ventures
Farshad is a Lead Data Scientist at BCG DV; he has worked on several projects that use machine learning, including deep learning, natural language processing, and other data science techniques. On one of his last projects, he has used data science and block-chain...

Sunday October 22, 2017 2:20pm - 2:50pm PDT
Room E

3:00pm PDT

Ensuring Data-centric Systems are Useful and Usable
The data model you have accurately describes and predicts the available data, but your stakeholders are refusing to accept its accuracy. This talk is aimed at helping data scientists understand how to maximize the usefulness and usability of their work.

Mike Oren

Consultant, Independent
Mike Oren has a Ph.D. in human-computer interaction and sociology with a BA in computer science. He currently consults with Uptake and Pogorelc.

Sunday October 22, 2017 3:00pm - 3:30pm PDT
Room E

3:40pm PDT

How Machine Learning Can Eliminate Hiring Biases and Identify the Right Candidate For a Job
Hiring the right developer for a job has been a challenge for a long time and is getting worse because of the demand. It’s a time-consuming process. According to Amazon's CTO Werner Vogels, engineers spend 30% of their time on evaluating talent; time that could be better spent on building products.

Not only that, traditional recruiting methods, like resumes and face-to-face interviews, are susceptible to unconscious biases against candidates. These are social stereotypes about certain groups of people that become ingrained even in people without bad intentions. However, these biases can cost someone their job. Machine Learning can help eliminate some of the biases that are inherently found in traditional recruiting tactics.

This talk is about how we are working to extract purely objective features from a typical technical interview and automate them through machine learning models. Ultimately, companies can eliminate bias and reduce effort in their hiring process.

Shiv Muddada

Engineering Manager, HackerRank
Shiv Muddada is one of the founding engineers of HackerRank and currently heads engineering at its headquarters in Palo Alto. HackerRank creates opportunities for developers by helping companies find great developers based on their skills instead of pedigree. In the past five years...

Sunday October 22, 2017 3:40pm - 4:10pm PDT
Room E

4:20pm PDT

Best Practices for Aggregating, Cleaning and Normalizing Data
Disparate data sources are a constant problem in text analytics. Different sources have varied document formats, dating conventions, and ways of presenting the same information. Collecting, normalizing, and building models on such data sets is a challenge that requires artful human involvement.

In this talk, I discuss my experiences, as well as popular conventions, for building human-in-the-loop machine learning systems from vast and varying data sources.

Daniel Dandurand

Data Engineer, Compliance.ai
Dan was a physics Ph.D. student at UC Berkeley before making the switch to data science. He was a Data Engineering Fellow at Insight Data Science before joining Compliance.ai as a Data Engineer. Also, last year Dan spent time in China as an organizer of the global summit, Cre8, a...

Sunday October 22, 2017 4:20pm - 4:50pm PDT
Room E