This is where crazy ideas get experimented with. Lots of them didn't work, but some turned out to be interesting!
About me:
- Machine learning and deep learning with Python and MATLAB
- Data extraction with SQL
- Cloud computing in the Azure ecosystem
- Presenting results with Power BI
- Data engineering with PySpark
- Version control with Git
Currently part of the Data and Analytics team at Mott MacDonald in London. Previously worked at PwC and KPMG in Vietnam as a consultant. MSc in Data Science from City, University of London.
Play Dota 2 and basketball in my free time. Worship dogs.
- Processed group and individual photos, then extracted features for three ML algorithms.
- Performed transfer learning (VGG-Net) to achieve 99.25% accuracy - a sketch follows below.
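
A minimal sketch of how VGG-based transfer learning might look in Keras. The input shape, head layers, and class count here are illustrative assumptions, not the original setup:

```python
# Transfer-learning sketch: frozen VGG16 backbone + small trainable head.
# Layer sizes and the number of classes are illustrative.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained convolutional backbone

model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation="relu"),
    layers.Dense(10, activation="softmax"),  # assumed number of classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=5, validation_split=0.1)
```
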
- Wrote a program to detect and recognise faces in pictures and video.
- It also detects and recognises any numbers shown in the picture.
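
The write-up doesn't say which detector was used; as one possible approach, here is a sketch using OpenCV's bundled Haar cascade (the image paths are placeholders):

```python
# Face-detection sketch with OpenCV's pretrained frontal-face Haar cascade.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("photo.jpg")  # placeholder input path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:  # draw a box around each detected face
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detected.jpg", img)
```
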
- Highlighted the main differences between CNNs and MLPs and compared their performance on an OCR task.
- Messed up the training data labels to see whether the networks can still learn - THEY CAN!!! This turned out to be an active research area (memorisation in deep networks)!
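
A sketch of that label-shuffling experiment, assuming MNIST and an illustrative MLP (cf. Zhang et al., "Understanding deep learning requires rethinking generalization"):

```python
# Shuffle the training labels so they carry no signal, then train anyway.
import numpy as np
import tensorflow as tf

(x, y), _ = tf.keras.datasets.mnist.load_data()
x = x.reshape(-1, 784).astype("float32") / 255.0
y_shuffled = np.random.permutation(y)  # destroy the image/label link

model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# Training accuracy still climbs - pure memorisation - while accuracy on
# held-out data stays at chance level.
model.fit(x, y_shuffled, epochs=5, batch_size=128)
```
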
- Trained an agent to solve the Cliff Walking problem using Q-learning and SARSA (see the sketch after this list).
- Experimented with different learning parameters such as the exploration factor (epsilon), decay factor (lambda), learning policy, learning rate, and discount factor.
- Concluded that the agent trained with Q-learning alone is quite dumb ...
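
A minimal tabular Q-learning sketch for Cliff Walking, using gymnasium's CliffWalking-v0 environment; the hyperparameters are illustrative, not the ones from the experiments:

```python
# Tabular Q-learning with an epsilon-greedy behaviour policy.
import numpy as np
import gymnasium as gym

env = gym.make("CliffWalking-v0")
Q = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.99, 0.1  # learning rate, discount, exploration

for episode in range(500):
    state, _ = env.reset()
    done = False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        # Q-learning update: bootstrap from the greedy next action
        Q[state, action] += alpha * (
            reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
        done = terminated or truncated
```

SARSA differs only in the update: it bootstraps from the action actually taken next (chosen epsilon-greedily) instead of `np.max(Q[next_state])`, which is why the two agents learn noticeably different paths along the cliff.
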
- Used PCA to identify important predictors.
- My first machine learning project at City, University of London.
- As the main goal was to understand the data better, I selected logistic regression and a decision tree (sketch below). Obviously, more sophisticated models could achieve better performance.
- Earned me a final-round interview at Goldman Sachs.
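
A minimal scikit-learn sketch of the PCA-then-classifier idea; the dataset here is a stand-in, not the project's actual data:

```python
# Pipeline: standardise, reduce with PCA, then fit a simple classifier.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_breast_cancer  # stand-in dataset

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(
    StandardScaler(),            # PCA is scale-sensitive
    PCA(n_components=0.95),      # keep components explaining 95% of variance
    LogisticRegression(max_iter=1000),
)
pipe.fit(X, y)
print(pipe.score(X, y))
```
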
- Experimented with three ML algorithms to classify spam messages.
- The bigger the hash vector, the better the predictions, since different words are less likely to be assigned to the same position. However, there was no improvement once the vector size exceeded 3,000: the topics may not be diverse enough.
- Normalising samples to unit L1 or L2 norm limited the SVM's accuracy to ~84%, while skipping normalisation boosted it to 90%. This can be regarded as an alternative to tuning the SVM's kernel function.
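
A sketch of that normalisation experiment with scikit-learn's HashingVectorizer and a linear SVM; the two messages are placeholders for the real corpus:

```python
# Compare per-sample norm settings of the hashing vectorizer with an SVM.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.svm import LinearSVC

messages = ["free prize, click now!!!", "see you at lunch?"]  # placeholder
labels = [1, 0]  # 1 = spam, 0 = ham

# norm=None skips per-sample normalisation (the setting that scored 90%);
# "l1" / "l2" scale each sample to unit norm (capped at ~84%).
for norm in (None, "l1", "l2"):
    X = HashingVectorizer(n_features=3000, norm=norm).transform(messages)
    LinearSVC().fit(X, labels)
```
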
- Used Logistic Regression and Naive Bayes to classify 4 million Amazon reviews.
- Set up a PySpark pipeline to process the data (sketch below).
- TF-IDF gave similar results to word2vec but took less time to run.
- A TF-IDF vector length of 10,000 gave an AUC score of 0.92 - how about 100,000? No better. It turns out that 10,000 features are enough for a single topic (product reviews).
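
A sketch of what such a PySpark TF-IDF + logistic regression pipeline could look like; the two inline rows stand in for the real review DataFrame:

```python
# PySpark ML pipeline: tokenise -> hashing TF -> IDF -> logistic regression.
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import Tokenizer, HashingTF, IDF
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.getOrCreate()
reviews = spark.createDataFrame(
    [("great product, works well", 1.0), ("broke after a day", 0.0)],
    ["text", "label"])  # placeholder rows

pipeline = Pipeline(stages=[
    Tokenizer(inputCol="text", outputCol="words"),
    HashingTF(inputCol="words", outputCol="tf", numFeatures=10_000),
    IDF(inputCol="tf", outputCol="features"),
    LogisticRegression(featuresCol="features", labelCol="label"),
])
model = pipeline.fit(reviews)
```
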
Under construction ...
Email: quy.vu@city.ac.uk