A Two-Day Hackathon with Credit Suisse
Disclaimer: Views and opinions expressed in this post are solely my own and do not express the views or opinions of Credit Suisse or other participants.

On my way back to Boston for the summer, I took a two-day layover at the Raleigh Credit Suisse offices for Code Suisse 2018. Overall, the event was engaging and fun. Students came from across the country, and there was a good selection of projects to work on for the hackathon. One project proposal in particular caught my attention, and I worked on it with Prabhath Kotha, a student from UNC. Every group worked hard and produced impressive results; more importantly, everyone seemed to learn a lot in the process. Despite stiff competition, we managed to place 1st in the event! Our time management, clear goals, and understanding of the project and its purpose contributed most to our success.
The Project
The goal of our project was to streamline the user experience on a Credit Suisse website. By using machine learning techniques to analyze usage patterns, we aimed to reduce the number of clicks needed to find particular pages of interest. Our dataset included over a million data points describing users' access patterns, including time of day accessed, user ID, region, and more. We were ultimately tasked with building a mockup website that showcased our results.
Project Results
The repository for our final project can be found here. It includes anonymized user data, code for the data analysis, code for the final mockup website, and our final presentation. Instructions for replicating our results and running the website can be found in the README.
Our basic recommendation algorithm, a multi-class (3,000+ classes) classification with k-nearest neighbors (KNN), was quite successful. On held-out test data, a user's page of interest was linked on their generated home page about 70 percent of the time. Based on popular sites, we also recommend sites the user may not yet have visited in the 'explore' section.
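To give a rough idea of the approach, here is a minimal sketch of a KNN page recommender using scikit-learn. This is not the hackathon code: the feature columns and the page counts are illustrative stand-ins for the real usage data.

```python
# Sketch of a KNN-based page recommender (illustrative, not the real code).
# Features are placeholder stand-ins (e.g. hour of day, region, session
# length); labels are page IDs. The real task had 3,000+ pages.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n_samples, n_pages = 500, 20
X = rng.random((n_samples, 3))            # placeholder usage features
y = rng.integers(0, n_pages, n_samples)   # placeholder page-ID labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Rank pages by predicted probability and surface the top 5 as home-page links.
probs = clf.predict_proba(X_test[:1])[0]
top_idx = np.argsort(probs)[::-1][:5]
top_pages = clf.classes_[top_idx]
print("recommended page IDs:", top_pages)
```

With real access logs, the features would come from the logged fields (time of day, region, and so on), and the top-ranked pages would populate the generated home page.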
Our secondary recommendation algorithm analyzed time-series usage patterns across all users. It recommends pages for users to visit next based on their current browsing data, and these recommendations are shown when the user visits a 'recommended' or 'explore' site. We used a basic recurrent neural network with LSTM (long short-term memory) cells to model the time-series data. Training on all the data was prohibitively expensive, even when renting a GPU from Google, so we limited the dataset to 10K entries. This approach was harder to evaluate, especially under the time constraint, but I'm curious to explore it further in the future.
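The LSTM idea above can be sketched as a small next-page model in PyTorch. Again, this is an assumed architecture for illustration, not our hackathon code: page counts, embedding sizes, and the sequence format are all hypothetical.

```python
# Sketch of an LSTM next-page predictor (illustrative, not the real code).
# Each user's click history is a sequence of page IDs; the model scores
# which page the user is likely to visit next.
import torch
import torch.nn as nn

class NextPageLSTM(nn.Module):
    def __init__(self, n_pages=50, embed_dim=16, hidden_dim=32):
        super().__init__()
        self.embed = nn.Embedding(n_pages, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_pages)

    def forward(self, page_ids):            # page_ids: (batch, seq_len)
        out, _ = self.lstm(self.embed(page_ids))
        return self.head(out[:, -1])        # logits over next-page candidates

torch.manual_seed(0)
model = NextPageLSTM()
batch = torch.randint(0, 50, (4, 10))       # 4 users, 10 clicks each
logits = model(batch)                       # shape: (batch, n_pages)
print(logits.shape)
```

In a real pipeline, the highest-scoring pages from the final time step would feed the 'recommended' section, with the model trained by cross-entropy on the observed next click.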
Final Thoughts
Overall, I really enjoyed the event. I learned a lot, and it was great to apply my new skills from COMP135 (Intro to Machine Learning and Data Mining) to a real-world task. I'm excited to pursue similar projects, and hopefully I'll be able to participate in more hackathons in the future.