Data Scientist | Python | SQL | Machine Learning | Excel
I am a motivated and detail-oriented Data Scientist with a background in physics and natural sciences. I transitioned into data science to combine my analytical thinking with technology to solve real-world problems. I enjoy uncovering insights from data, building predictive models, and continuously learning new tools and techniques.
09/2025 to 09/2025
Completed a certified course in Microsoft Excel, developing strong skills in creating and managing spreadsheets, using formulas and functions, organising and visualising data, and improving productivity with Microsoft 365 tools.
07/2024 to 03/2025
Successfully completed a professional certification program in data science, gaining skills in data analysis, visualization, machine learning, and working with tools such as Python, SQL, and data analysis libraries.
02/2024 to 06/2024
Successfully completed a professional certification program in AI development, gaining hands-on experience in machine learning, neural networks, and AI model deployment.
03/2024 to 06/2024
Successfully completed a course covering Python programming, including variables, data structures, loops, functions, and control flow. Gained foundational skills in writing clean and efficient Python code.
03/2020 to 06/2021
Completed the first year of a Natural Sciences program, focusing on interdisciplinary studies in technology, mathematics, and science.
09/2002 to 03/2005
Studied physics at the undergraduate level before discontinuing to move to England. Gained a strong foundation in theoretical and applied physics.
This project analyzes company sales performance across multiple countries, products, and customer segments using Microsoft Excel.
The goal was to clean, analyze, and visualize sales data to identify revenue trends, profit margins, and top-performing products while comparing actual performance against predefined targets.
In this project, I built a machine learning model to predict whether a customer is likely to default on a loan, using the German Credit dataset from the UCI repository. I preprocessed the data with one-hot encoding and scaling, then trained a balanced Random Forest classifier. The model achieved an accuracy of ~73% and an AUC of ~0.79. I also analyzed feature importance and plotted the ROC curve to evaluate the model's performance. Tools used: Python, pandas, scikit-learn, seaborn, matplotlib.
Analysed Spotify user reviews by cleaning the text, separating positive and negative sentiment, and identifying the most frequent words in each group. Utilised Python (pandas, NLTK, WordCloud) for the sentiment analysis and created custom visualisations using a Spotify logo mask.
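The word-frequency step can be illustrated with a short standard-library sketch. It is not the project's code: the reviews, their sentiment labels, and the tiny stopword set are made up for illustration (the project itself used NLTK), and it assumes each review already carries a sentiment label.

```python
import re
from collections import Counter

# Hypothetical stand-in for a proper stopword list such as NLTK's.
STOPWORDS = {"the", "a", "and", "is", "it", "to", "of", "i", "my", "this"}

# Made-up example reviews with assumed sentiment labels.
reviews = [
    ("Love the playlists and the sound quality", "positive"),
    ("Great app, love the recommendations", "positive"),
    ("Too many ads and the app keeps crashing", "negative"),
    ("Ads everywhere, crashing after the update", "negative"),
]


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def top_words(labelled_reviews, sentiment, n=3):
    """Return the n most frequent words among reviews with the given label."""
    counts = Counter()
    for text, label in labelled_reviews:
        if label == sentiment:
            counts.update(tokenize(text))
    return counts.most_common(n)


positive_top = top_words(reviews, "positive")
negative_top = top_words(reviews, "negative")
print(positive_top)
print(negative_top)
```

Frequency counts of this shape are the kind of input a word cloud is drawn from; in the `wordcloud` package, a dictionary of counts can be passed to `WordCloud.generate_from_frequencies`, with an image array supplied through the `mask` argument.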