Portfolio

Scraping and Analyzing One Million Medium Articles

Medium is a blogging platform where writers and readers share their ideas. The purpose of this project was to give Medium writers a benchmark for measuring their own performance, as well as a goal that might raise the rankings of their stories in Medium's recommendation engine.

With more than two hundred thousand writers in my dataset, this project has the potential to ease the creative process for thousands, and increase the quality of Medium's stories for its readers.


How To Learn Data Science If You're Broke (Article)

This article conveys my advice and guidelines for building a self-driven data science education, no matter what your situation is. I hope to give others the resources to start a passionate career in data science. This post has been published and featured in Towards Data Science, with 3K reads on Medium in the first 24 hours.


Don't Use Dropout in Convolutional Networks (Article)

This blog post has been published and featured in Towards Data Science, with 3K reads on Medium in 2 weeks. It has also been reposted as a guest blog on KDnuggets, a leading site on Analytics, Big Data, Data Science, and Machine Learning that reaches over 500K unique visitors per month and over 230K subscribers/followers via email and social media.


Don't Use Dropout in Convolutional Networks (Experiment)

This experiment tests whether convolutional neural networks perform better on image recognition tasks with dropout or with batch normalization. The notebook in this repository provides experimental evidence for the Medium post I wrote on building convolutional neural networks more effectively.


Batch Normalization: Intuition and Implementation (Article)

A Towards Data Science article explaining the mathematics behind batch normalization (BN) in convolutional neural networks. After a brief description of BN's regularizing effects, I explain how to implement it in Keras. I then build an image recognition classifier using the CIFAR-100 dataset.
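As a rough illustration of the math the article covers, here is a minimal sketch of the batch normalization transform in plain Python. It is not the Keras code from the article: the function name `batch_norm`, the default values for `gamma`, `beta`, and `eps`, and the sample activations are all assumptions chosen for illustration. Each activation is normalized with the mini-batch mean and variance, then scaled and shifted by the learnable parameters.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch of scalar activations, then scale and shift.

    gamma and beta stand in for the learnable scale and shift parameters;
    eps is a small constant that keeps the division numerically stable.
    """
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance over the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


# Hypothetical activations for one feature across a mini-batch of four samples.
activations = [1.0, 2.0, 3.0, 4.0]
normalized = batch_norm(activations)
```

With the default `gamma` and `beta`, the output has a mean of roughly zero and a variance of roughly one, which is the property the rest of the network relies on.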


Global Average Pooling: Object Localization

In this project, I implemented the deep learning method for object localization (finding objects in an image) proposed in this research paper. I improved code written by Alexis Cook to handle multi-class localization of images.
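For readers unfamiliar with the operation in the project's title, the sketch below shows what global average pooling does: it collapses each 2D feature map to a single number, its mean. This is only an illustration in plain Python, not the project's code; the function name `global_average_pool` and the toy feature maps are made up for the example.

```python
def global_average_pool(feature_maps):
    """Collapse each 2D feature map (a list of rows) to its mean value."""
    pooled = []
    for fmap in feature_maps:
        values = [v for row in fmap for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Two hypothetical 2x2 feature maps, as a convolutional layer might emit.
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
pooled = global_average_pool(feature_maps)
```

The output has one value per feature map, so a layer placed after the pooling step sees a fixed-length vector regardless of the spatial size of the maps.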

Computer vision has innumerable real-world applications. This project was my introduction to the world of computer vision research. Since the conclusion of this project, I have focused heavily on researching recent advances in convolutional neural network architectures. Furthermore, I have emphasized learning how to apply these concepts using TensorFlow and Keras.


Apple Sentiment Analysis

This project was motivated by my drive to learn the best practices of predictive modeling on text data. In the write-up, I cleaned and vectorized Twitter data, visualized and examined patterns, and created a linear classifier to predict document sentiment with 89% accuracy on a validation set.
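To make the "vectorized" step concrete, here is a minimal bag-of-words sketch in plain Python. It is one simple way to turn text into numeric vectors and is not necessarily the method used in the write-up; the helper names (`tokenize`, `build_vocabulary`, `vectorize`) and the sample tweets are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token to a column index, in sorted order."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def vectorize(text, vocabulary):
    """Return token counts for the text, one entry per vocabulary column."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, count in counts.items():
        if tok in vocabulary:  # tokens unseen at fit time are ignored
            vector[vocabulary[tok]] = count
    return vector


# Made-up tweets, for illustration only.
tweets = ["Love the new phone", "Hate the new update, hate it"]
vocabulary = build_vocabulary(tweets)
vectors = [vectorize(t, vocabulary) for t in tweets]
```

Each tweet becomes a fixed-length count vector, which is the kind of input a linear classifier can be trained on.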


Toxic Comments

In this project, I used unsupervised learning to cluster forum discussions. Specifically, I performed LDA clustering on Wikipedia forum comments to see if I could isolate clusters of toxic comments (insults, slurs, and so on).

I succeeded in isolating toxic comments into one group. Furthermore, I gained valuable knowledge about the discussions held within the forum dataset, sorting forum posts into nine distinct categories. These nine categories could be further grouped as relevant discussion, side conversations, or outright toxic comments.


Clustering Mental Health

In this write-up, I sought to answer whether a survey of mental health benefits of tech industry employees could be used to cluster employees into groups with good and bad mental health coverage.

In completing this project, I learned how to encode categorical data and carry out an exploratory data analysis (EDA) with clear visualizations. I also learned how to apply clustering methods to data and assess how appropriate a clustering method is using several techniques.
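As an illustration of encoding categorical data, the sketch below one-hot encodes a single survey column in plain Python. One-hot encoding is only one common option and may not be the encoding used in the write-up; the function name `one_hot_encode`, the column names, and the survey answers are all invented for the example.

```python
def one_hot_encode(rows, column):
    """Replace a categorical column with one 0/1 indicator per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other field unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical survey answers, for illustration only.
survey = [
    {"employee": 1, "benefits": "yes"},
    {"employee": 2, "benefits": "no"},
    {"employee": 3, "benefits": "unsure"},
]
encoded = one_hot_encode(survey, "benefits")
```

Clustering methods that rely on distances need numeric inputs, so a step like this has to happen before survey answers can be clustered.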


Data Science Lifecycle

In this write-up, I sought to practice the entire data science lifecycle. This includes defining project end goals, data cleaning, exploratory data analysis, model comparisons, and model tuning.

After a brief EDA, I visualized the Titanic dataset via a 2D projection. I then compared several machine learning algorithms and found the most accurate model to be a gradient boosting machine. After a model tuning phase, I increased model accuracy from 77% to 79%.