Hi, I'm John Ruoro

Data Scientist | Machine Learning Engineer

I am a results-oriented Junior Data Scientist with strong critical-thinking and problem-solving skills. I have the technical skills needed to extract data, build predictive models, and find solutions to business problems.

Contact Me

About Me

My Introduction

I have experience working with large volumes of data to build and deploy machine learning models. I am proficient in data extraction, data mining, data visualization, and predictive modeling using scripting languages such as Python and SQL. I have a strong background in statistics with extensive skills in mathematics and statistical algorithms. I enjoy challenges, and I am always looking for opportunities to learn something new.


My Skills

I am skilled in:

Machine Learning

Scikit-learn

MS Excel

Microsoft Azure

My personal journey

2018 - 2022


Jomo Kenyatta University

Grade: Second Class Honors Lower Division

- Pursued a Bachelor's degree in Microbiology. Through the Epidemiology and Biostatistics coursework, I acquired skills in statistical areas such as data analysis, probability, hypothesis testing, and time series analysis.

2013 - 2016

Kenya Certificate of Secondary Education

The Kenya High School

Grade: B-

- I excelled in Mathematics and Physics, attaining a grade A in each in my KCSE examination. I also headed the Junior Achievement (JA) club, whose main goal was to prepare students to succeed in the global economy.

2022 - Present

Junior Data Scientist

Data Glacier


Role Description
- Extracting data, building predictive models, and finding solutions to business problems as part of the data science framework.
- Working with data across various ICL projects to build and deploy machine learning models.
- Performing data extraction, data mining, data visualization, and predictive modeling in Python and SQL, as required by those projects.



Credit Card Fraud Detection Model

This project treats credit card fraud detection as a classification problem. The data I used was already preprocessed and PCA-transformed. I conducted data mining and created visualizations with libraries such as matplotlib and seaborn to understand patterns in the data. I balanced the classes using simple oversampling and undersampling techniques and trained several classification models on the balanced data. I then used an Isolation Forest to identify anomalous data points. The best-performing model was the Random Forest, with an F1 score of 0.74.
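
The workflow above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual code: the dataset generated with `make_classification`, the split size, the contamination rate, and the hyperparameters are all assumptions chosen to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic, heavily imbalanced stand-in for the PCA-transformed card data (assumption).
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.97, 0.03], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)

# Simple random oversampling of the minority class, applied to the training split only
# so that duplicated rows never leak into the test set.
X_maj, X_min = X_train[y_train == 0], X_train[y_train == 1]
X_min_up = resample(X_min, replace=True, n_samples=len(X_maj), random_state=0)
X_bal = np.vstack([X_maj, X_min_up])
y_bal = np.concatenate(
    [np.zeros(len(X_maj), dtype=int), np.ones(len(X_min_up), dtype=int)]
)

# Fit a Random Forest on the balanced data and score it on the untouched test split.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_bal, y_bal)
f1 = f1_score(y_test, clf.predict(X_test))

# Isolation Forest is unsupervised: predict() returns -1 for anomalies and 1 for inliers.
iso = IsolationForest(contamination=0.03, random_state=0).fit(X_train)
anomaly_flags = iso.predict(X_test)

print(f"F1 on test split: {f1:.2f}")
print(f"Points flagged as anomalous: {(anomaly_flags == -1).sum()}")
```

Scoring on a test split that was never resampled matters here, because F1 measured on oversampled data would overstate how well the model finds real fraud cases.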

View on GitHub

Bank Churn Model

In this project I built a model to predict whether a bank customer will churn. I used visualizations such as countplots, pie charts, and scatterplots to analyze the data, and I conducted data mining by checking correlations within it. I encoded the categorical features and standardized the data to put the variables on the same scale. I then balanced the data and fit a logistic regression model to it. After balancing, the ROC AUC score improved by 0.11 over the base model.
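
A minimal sketch of the encode, standardize, balance, and fit steps is shown below. It is illustrative only: the column names (`age`, `balance`, `country`) and the synthetic churn label are hypothetical, and `class_weight="balanced"` stands in for the balancing step, which may differ from the resampling used in the project.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical customer table; real churn data would be loaded from a file instead.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "balance": rng.normal(50_000, 20_000, n),
    "country": rng.choice(["A", "B", "C"], n),
})
# Synthetic label: older customers churn more often, so churners are the minority class.
churn_prob = 1 / (1 + np.exp(-(df["age"] - 65) / 8))
y = (rng.random(n) < churn_prob).astype(int)

# Standardize numeric columns and one-hot encode the categorical column.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "balance"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["country"]),
])

# class_weight="balanced" reweights classes inversely to their frequency (assumed
# substitute for explicit over- or undersampling).
model = make_pipeline(
    preprocess,
    LogisticRegression(class_weight="balanced", max_iter=1000),
)

X_train, X_test, y_train, y_test = train_test_split(
    df, y, stratify=y, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"ROC AUC on test split: {auc:.2f}")
```

Wrapping the preprocessing and the classifier in one pipeline means the scaler and encoder are fit on the training split only, which avoids leaking test-set statistics into the model.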

View on GitHub

Disaster Tweets Model

This is a Natural Language Processing (NLP) project where the goal was to build a model that predicts whether a tweet is about a real disaster or not. I started with exploratory analysis, checking the distribution of the text data and plotting visualizations to understand its underlying structure. I then applied text preprocessing techniques such as lower-casing and the removal of punctuation and tags. I used TfidfVectorizer to transform the text data and fit a logistic regression model on it, which attained a ROC AUC score of 0.64.
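
The preprocessing and modeling steps can be sketched as below. This is a hedged illustration rather than the project's code: the `clean` helper is hypothetical, "tags" is assumed to mean HTML-style tags, and the handful of example tweets are made up purely so the pipeline has something to fit.

```python
import re
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def clean(text: str) -> str:
    """Lower-case the text, then strip HTML-style tags and punctuation."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)  # remove tags such as <b> or </a>
    text = text.translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()  # collapse leftover whitespace


# Made-up toy examples (1 = real disaster, 0 = not); far too few for a real model.
tweets = [
    "Forest fire spreading near the town, residents evacuated",
    "Earthquake damages buildings in the city centre",
    "Flood waters rising, roads closed across the county",
    "This new album is fire, can't stop listening!",
    "My exam results were a total disaster lol",
    "Traffic was a nightmare this morning",
]
labels = [1, 1, 1, 0, 0, 0]

# Passing clean as the preprocessor makes TfidfVectorizer apply it before tokenizing.
model = make_pipeline(
    TfidfVectorizer(preprocessor=clean),
    LogisticRegression(max_iter=1000),
)
model.fit(tweets, labels)

# Probability that an unseen tweet is about a real disaster.
proba = model.predict_proba(["Wildfire forces evacuation of the village"])[:, 1]
print(clean("<b>Fire!</b> Near the SCHOOL..."))
print(proba)
```

Keeping the cleaning function inside the vectorizer means the same preprocessing is applied automatically at prediction time, so new tweets never reach the model in a different format from the training text.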

View on GitHub

Contact Me

Get in touch

Call Me





Nairobi - Kenya
Send Message