Anthony Huynh

Portfolio

Level 10 Maintenance Email

Live, automated emails that have been setup using HTML, CSS, SQL stored procedures and DBMail to streamline administrative tasks. Live email template developed for a client (Level 10) in my previous role at ClickHome.

Burbank RFC Email

Live, automated emails that have been setup using HTML, CSS, SQL stored procedures and DBMail to streamline administrative tasks. Live email template developed for a client (Burbank) in my previous role at ClickHome.

Guess My Number

A simple game developed using HTML/CSS and Javascript. Included coursework from Udemy

Pig Game

A 2-player game developed using HTML/CSS and Javascript. Included coursework from Udemy

ABC Analysis:

BTi Logistics: Increasing Warehouse Efficiency

Increasing warehouse efficiency through effectively managing its stock by conducting an ABC (Always Better Control) analysis.

Executive Summary

An Always Better Control (ABC) analysis is a common practice among logistics companies to give an indication of the importance of each product. Each product is classified into three buckets, where an ‘A’ item is a fast moving and high impact product, while a ‘C’ item is the item that contributes the least in movement volume and revenue. An ABC analysis was conducted on the BTi logistics warehouse where each item’s classification was determined by its picking rate. Another ABC analysis was conducted on regions of the warehouse where the classification was determined by the travel time in respect to the picking area. A combination of these two practices has produced a table where each row indicates where each item is currently located (in an A, B or C region), and where it should be located (whether it’s an A, B or C product) - resulting in a ‘quick win’ swap. Data visualisation has been achieved through Dash, a Python application for an interactive dashboard. Time series analysis for each category has been conducted so that BTi has a strong prediction of the demands of each product in the future - helping them prepare for the next quarters’ demand. Time series analysis was done using the Long Term Short Memory (LSTM) and Seasonal Autoregressive Integrated Moving Average (SARIMA) models. The report thoroughly explains the findings and methodology used to go about these approaches - ultimately increasing the efficiency of the warehouse and providing suggestions for better stock management.

Github repository for this project has been made private as directed by management.

Web Visit and Item Sold

DiGiCOR: A Predictive Correlation Model

Developed a model to find a correlation between web visit data and item sold and made it into a predictive model using Python.

Executive Summary

Predicting customer behaviour based on collected data would serve as a valuable tool in helping companies efficiently direct resources (Martínez et al. 2020). Each product sold by DiGiCOR is associated with a web page where the user can purchase the item. DiGiCOR has collected data for each of the products they sell. The data includes the item’s quantity sold as well as its website data. To help DiGiCOR with their marketing strategies, the relationship between website data and quantity sold for each of DiGiCOR’s products were investigated. Two different datasets were prepared from combining the raw data. One has the sum of each feature aggregated by item (referred to as ‘aggregated dataset’), and another has the sum of each feature aggregated by month (referred to as ‘monthly dataset’). Using both datasets, quantity sold was plotted against the following features on Python using Matplotlib: page views, organic searches, new users, number of sessions per user, users and average time on page. Each plot did not display any discernible trend and there seemed to be no significant correlation. The approaches used for modelling were:

Linear Regression
Poisson Regression
Zero-Inflated Poisson Regression

Each model was fitted onto both the aggregated dataset and monthly dataset and failed to be used as any kind of tool to predict the quantity sold of a product. These results have suggested that people may not be using the website as a point of sale. Instead, they may have been using it as a place to do such things like compare prices or inspect product specifications. It has been concluded that the website has been underperforming and DiGiCOR will be putting more focus into increasing website visits.

Applying Neural Networks to Recognize Traffic Signs

A machine learning project that involves classifying a dataset of approximately 3500 traffic sign images using multi-layer neural network and convolutional neural network

Executive Summary

Two machine learning tasks were undertaken. Classifying an image dataset of traffic signs in accordance to their shape, and classifying the images according to their sign type. The supervised machine learning techniques that were used for these tasks were the implementation of neural networks. This project explores how well a neural network can classify images and investigate its limitations and the implications of some hyper-parameter tuning. The development of the model used a typical 80/20 split for training and testing. Using an MLP with dropout to classify traffic signs into shapes was shown to be an acceptable model. However, changing this to a CNN suggested that the predecessor was a less practical approach to classifying images. The complexity of the task did not require much fine tuning of the hyper-parameters, therefore the model that was produced in the end was a simple CNN model. Most hyper-parameters such as number of convolutions and kernel size were values that were very common to use in machine learning. Nonetheless, the model was shown to perform well in predicting traffic signs into their sign type and shape.

Online News Popularity

Evaluating what specific attribute contributes to the popularity of online news articles and to observe how well specific classification algorithms can predict how popular a news article is from the website Mashable.

Executive Summary

The purpose of this study was to investigate and evaluate what specific attribute contributes to the popularity of online news articles and to observe how well specific classification algorithms can predict how popular a news article is from the website Mashable. “Online News Popularity” was selected from an array of other interesting datasets, from the UCI repository . The data had been collected from the website Mashable, in which 1 the public has access to the contents of which makes up the data and does not share any of the original content. From the study, one is able to identify which attributes are of more importance and those that are used as an aid to help enhance the quality of the article. Analysing specific trends that may occur between the number of images and videos with the number of shares, or the topic in which an article is about. It can be concluded that every attribute within an article does play a significant role. For instance, the recommended day in which one should publish an article is between Tuesday to Thursday and should be about the topic world. It is also recommended that the article should not be cluttered with images or videos, as it there to strengthen the article. The choice of lexemes should lean more towards the positive, with the title suggested to be the same. The models that were used to predict the popularity of each article were somewhat accurate. Accuracy was increased as more samples were reserved for training data.

Predictive Modelling Using Linear Discriminant, Decision Trees and Random Forest

Three different statistical learning approaches - linear discriminant analysis (LDA), decision tree and random forest classifier were used to classify mushrooms from the Agarcicus and Lepiota family as either edible or poisonous.

Executive Summary

In this study, three different statistical learning approaches - linear discriminant analysis (LDA), decision tree and random forest classifier were used to classify mushrooms from the Agarcicus and Lepiota family as either edible or poisonous. The statistical software package ‘R’ was used for random forest classification and LDA, and SAS was used to create the decision tree. All three approaches yielded the same results of having the model predict on the testing set and returning a misclassification rate of 0. This suggests that the dataset had classes that were well separated. Learning approaches were assessed in regards to its simplicity, interpretability and robustness. It was concluded that LDA was the simplest model and decision trees were the easiest to interpret. Furthermore the random forest classifier was deemed to be the most robust model since LDA can be sensitive to outliers and decision trees are known to not generalize as well. However, the LDA approach was concluded to be the recommended approach when dealing with these sort of classification tasks due to it having a relatively good balance of simplicity, interpretability and robustness out of the three approaches.

Time Series Forecast on AirMiles

A time series forecast conducted on an R dataset using the quadratic trend model and ARIMA.

Executive Summary

A passenger mile is a unit of measurement that represents one mile travelled by one passenger on an aeroplane. Monitoring passenger miles can serve as good indication of overall plane activity with passengers. Furthermore, it would be useful for airlines, if one could predict if the number passenger miles will increase or decrease in the future years. A dataset obtained from the R dataset repository called ‘AirMiles’ represents a time series of the number of passenger miles done on US commercial airlines from 1937 to 1960. Out of the quadratic trend model, ARIMA (0,1,0), ARIMA (1,1,0) and ARIMA (2,1,0) – the ARIMA (1,1,0) best fits the data. The diagnostics from ARIMA (1,1,0) show that the residuals are white noise, and its parameters are significant in comparison to the other models. The ARIMA (1,1,0) model was then used to forecast passenger miles from the year 1961 to 1970.

Anthony Huynh

Tech Stack

Programming

Frameworks

Statistical Software

Project Management and Version Control tools

Currently Learning

Portfolio

Level 10 Maintenance Email

Burbank RFC Email

Guess My Number

Pig Game

ABC Analysis:

BTi Logistics: Increasing Warehouse Efficiency

Web Visit and Item Sold

DiGiCOR: A Predictive Correlation Model

Applying Neural Networks to Recognize Traffic Signs

Online News Popularity

Predictive Modelling Using Linear Discriminant, Decision Trees and Random Forest

Time Series Forecast on AirMiles