H. Jansma

Automating AWS EC2 Shutdown with Bash Scripts

A tutorial I wrote on setting up a Bash script that uses the AWS CLI to start, log into, and then shut down an EC2 instance. (I didn't want to forget the instance was running and lose money.)

Harrison Jansma on January 20, 2020

RNNs, LSTMs, and Attention Mechanisms for Language Modelling (PyTorch)

Tested Word2Vec embeddings with a variety of sequence-input deep learning models on the task of language modeling (predicting the next word in a sentence).

Harrison Jansma on December 10, 2019

DavisBase: A Custom Designed Database (Python)

A fully functional, SQL-compliant database implemented from scratch in Python. DavisBase stores data in a custom-designed bit-level encoding for maximal compression. By using a file size of 512Kb, DavisBase performs well in low-memory environments while also keeping query time low.

Harrison Jansma on December 02, 2019

Reinforcement Learning: Dynamic Policy Gradients (NumPy)

My implementation of dynamic policy gradients in Python. This reinforcement learning algorithm was then used to train an agent to traverse a dangerous environment.

Harrison Jansma on November 06, 2019

Hidden Markov Models for Parts of Speech Tagging (Python)

Custom implementation of Hidden Markov Models to assign part-of-speech labels to a free-text dataset. The model was coded from scratch in base Python and uses the Viterbi algorithm to decode the most probable tag sequence for a given sentence.

Harrison Jansma on October 22, 2019

Named Entity Recognition with Scikit-Learn and PyTorch

Identified named entities (like "Google" or "Harrison") within the CoNLL-2003 dataset. A logistic regression model and an LSTM were trained on the data and achieved F1 scores of 0.926 and 0.899, respectively.

Harrison Jansma on September 20, 2019

XGBoost for Text Classification

Used Python's XGBoost package to apply gradient boosting to a text dataset. Similar code was later used during my work at Sprint to build machine learning models for text classification.

Harrison Jansma on June 01, 2019

PySpark and Hive for Data Analysis

Used an AWS EMR cluster to learn the basics of PySpark and Hive while working at Sprint. These tools allowed me to collect massive amounts of data from Sprint's production data lake.

Harrison Jansma on May 10, 2019

Implementing Common Algorithms in C++

The only way to become a not-garbage-coder is to code a lot. This repository contains some of the coursework I've completed over the last few months. As the semester winds down, I hope to start a new passion project soon!

Harrison Jansma on March 19, 2019

Scraping and Analyzing One Million Medium Articles

Medium is a blogging platform where writers and readers share their ideas. The purpose of this project was to give Medium writers a benchmark to measure their own performance, as well as a goal that might increase the rankings of their stories in Medium's recommendation engine.

With more than two hundred thousand writers in my dataset, this project has the potential to ease the creative process for thousands, and increase the quality of Medium's stories for its readers.

Harrison Jansma on October 10, 2018

How To Learn Data Science If You're Broke (Article)

This article conveys my advice and guidelines for building a self-driven data science education, no matter what your situation is. I hope to give others the resources to start a passionate career in data science.

Harrison Jansma on September 16, 2018

Experimenting with Dropout in Conv Nets

This experiment tests whether convolutional neural networks perform better on image recognition tasks with dropout or with batch normalization.

Harrison Jansma on August 14, 2018

Global Average Pooling: Object Localization

In this project, I implemented the deep learning method for object localization (finding objects in an image) proposed in this research paper. I improved code written by Alexis Cook to handle multi-class localization of images.

Computer vision has innumerable real-world applications. This project was my introduction to the world of computer vision research. Since the conclusion of this project, I have focused heavily on researching recent advances in convolutional neural network architectures. Furthermore, I have emphasized learning how to apply these concepts using TensorFlow and Keras.

Harrison Jansma on July 16, 2018

Apple Sentiment Analysis

This project was motivated by my drive to learn the best practices of predictive modeling on text data. In the write-up, I cleaned and vectorized Twitter data, visualized and examined patterns, and created a linear classifier to predict document sentiment with 89% accuracy on a validation set.

Harrison Jansma on June 20, 2018

Toxic Comments

In this project, I used unsupervised learning to cluster forum discussions. Specifically, I performed LDA clustering on Wikipedia forum comments to see if I could isolate clusters of toxic comments (insults, slurs, ...).

I was successful in isolating toxic comments into one group. Furthermore, I gained valuable knowledge about the discussions held within the forum dataset, labeling forum posts into nine distinct categories. These nine categories could be further grouped as either relevant discussion, side conversations, or outright toxic comments.

Harrison Jansma on June 13, 2018

Clustering Mental Health

In this write-up, I sought to answer whether a survey of tech industry employees about their mental health benefits could be used to cluster employees into groups with good and bad mental health coverage.

In completing this project, I learned how to encode categorical data and create an insightful EDA with great visualizations. I also learned how to implement clustering methods on data, and analyze the appropriateness of the clustering method with various techniques.

Harrison Jansma on May 23, 2018

Portfolio