I’m a data analyst with 3+ years of experience transforming raw numbers into clear, impactful business stories. My work ranges from building scalable ETL pipelines to designing intuitive dashboards that guide high-stakes decisions.
I’ve worked extensively with SQL, Python, Power BI, and cloud platforms like AWS, Azure, and GCP. Whether it’s uncovering retention patterns, automating reports, or designing predictive models, I love the challenge of connecting the dots between data and business impact.
Outside of work, I’m a curious learner, an occasional open-source contributor, and someone who enjoys exploring how AI and generative models can shape the future of analytics.
Here are a few technologies I’ve been working with recently:
Aug 2025 – Present
Aug 2025 – Present
Jul 2021 – Aug 2024
Jan 2021 – Jul 2021
Apr 2020 – Jul 2020
May 2019 – Jul 2019
Apr 2018 – Jul 2018
A collection of classic and educational games built with Python and Pygame. Designed for beginners and hobbyists to explore game development fundamentals while coding fun, interactive experiences.
This repository includes games such as Breakout, Pong, and Tetris, along with text-based utilities such as Compare Documents and Markov text generation. Each game is implemented in Python with Pygame, which keeps the code easy to understand, modify, and extend.
A showcase of multiple classic games recreated in Python — from arcade-style challenges to text-based generators.
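As an illustration of the Markov text generation utility mentioned above, here is a minimal word-level sketch. It is not the repository's actual code: the function names `build_chain` and `generate`, the `order` parameter, and the sample sentence are all assumptions made for this example.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the words observed right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)


def generate(chain, length=10, seed=None):
    """Walk the chain from a random starting state, emitting up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state never had a following word
            break
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


# Made-up sample text, used only to exercise the sketch.
sample = "the cat sat on the mat and the cat ran off"
chain = build_chain(sample)
print(generate(chain, length=8, seed=42))
```

Repeated followers are kept in the lists on purpose, so a word that follows a state more often is proportionally more likely to be chosen.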
A full-stack social media application built to replicate the core features of modern platforms. Includes user authentication, post creation, likes, comments, and real-time updates for an engaging interactive experience.
Developed with React, Node.js, Express, and MongoDB, it features JWT-based authentication, protected routes, WebSocket-powered real-time interactions, and a responsive UI. Designed as a learning project to practice web development.
Full-stack social media platform featuring posts, comments, likes, and real-time updates.
A Python-based application for generating concise summaries from long texts using state-of-the-art NLP models. Designed for both experimentation and production use, it supports flexible configuration and deployment options.
The project offers notebook workflows for rapid prototyping, modular Python code for integration, and Dockerized environments for consistent deployments. Users can adjust summarization parameters via a simple YAML configuration file and run the tool from notebooks, scripts, or containers.
Generates concise summaries of long documents using configurable NLP pipelines.
A complete end-to-end data engineering solution that extracts, processes, and analyzes YouTube data using Python and Apache Airflow. The pipeline automates data extraction via the YouTube Data API v3, cleans and transforms it, stores it locally, and orchestrates the entire workflow with Airflow — all running locally and free of cloud costs.
Designed for both experimentation and production, the project is configured through environment variables and includes robust logging and comprehensive testing. Jupyter notebooks provide the analysis, revealing insights into channel performance, engagement metrics, and content trends over time.
Automated ETL pipeline for extracting and analyzing YouTube data, orchestrated with Apache Airflow.
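The extract, transform, and load stages described above can be sketched in plain Python. This is only an illustration under stated assumptions: it makes no real API call, the records are made-up sample data, the environment variable `YT_MIN_VIEWS` and the `like_rate` field are hypothetical, and Airflow itself is left out. In a pipeline like the one described, each function would typically run as its own orchestrated task.

```python
import csv
import io
import os

# Hypothetical setting; the project's real variable names are not given here.
MIN_VIEWS = int(os.environ.get("YT_MIN_VIEWS", "100"))

FIELDS = ["video_id", "title", "views", "likes", "like_rate"]


def extract():
    """Stand-in for the API call: returns made-up raw records as strings."""
    return [
        {"video_id": "a1", "title": " Intro ", "views": "1500", "likes": "120"},
        {"video_id": "b2", "title": "Update", "views": "40", "likes": "3"},
        {"video_id": "c3", "title": "Deep dive", "views": "900", "likes": ""},
    ]


def transform(rows, min_views=MIN_VIEWS):
    """Trim titles, cast counts to int, drop low-view rows, derive a like rate."""
    cleaned = []
    for row in rows:
        views = int(row["views"] or 0)
        likes = int(row["likes"] or 0)  # a missing count is treated as zero
        if views < min_views:
            continue
        cleaned.append({
            "video_id": row["video_id"],
            "title": row["title"].strip(),
            "views": views,
            "likes": likes,
            "like_rate": round(likes / views, 4) if views else 0.0,
        })
    return cleaned


def load(rows, handle):
    """Write cleaned rows as CSV to any file-like object (a local file, for example)."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
load(transform(extract()), buffer)
print(buffer.getvalue())
```

Keeping each stage a small function with plain inputs and outputs makes the stages easy to test on their own before wiring them into a scheduler.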
This project builds a machine learning pipeline to predict students' Maths scores based on demographic and academic data. It covers data ingestion, preprocessing, model training, and deployment via a Flask web app, offering real-time predictions through a user-friendly interface.
Features include automated data processing, scaling and encoding of features, model evaluation, and an interactive frontend where users input student data to get instant Maths score predictions. The project demonstrates end-to-end ML workflows combining data science with web deployment.
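To make "scaling and encoding of features" concrete, here is a small standard-library sketch of z-score scaling for a numeric column and one-hot encoding for a categorical one. The column names and values are invented for illustration, and the project itself may well rely on a library for these steps rather than hand-written helpers like these.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale numbers to zero mean and unit variance (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Turn a categorical column into 0/1 indicator columns, one per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical columns, not taken from the project's dataset.
reading_scores = [60, 70, 80]
lunch = ["standard", "reduced", "standard"]

scaled = standardize(reading_scores)
categories, encoded = one_hot(lunch)

# One feature row per student: the scaled score followed by the indicator columns.
features = [[s] + e for s, e in zip(scaled, encoded)]
print(categories)
print(features)
```

In a real pipeline the mean, standard deviation, and category list would be learned from the training data only and then reused at prediction time, so that the web app transforms user input the same way the model saw it during training.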
This project demonstrates an end-to-end data warehousing approach using the medallion architecture (Bronze, Silver, Gold layers) to ingest, cleanse, transform, and present business-ready data for analytics and reporting. It emphasizes scalable, structured data pipelines with ETL/ELT processes and data cataloging.
The solution supports building dimension and fact tables optimized for BI tools, delivering actionable insights and enabling robust data-driven decision-making across enterprises.
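The Bronze, Silver, and Gold flow can be sketched with an in-memory SQLite database. This is a toy illustration, not the project's implementation: SQLite stands in for whatever warehouse engine the project uses, and the table names, columns, and rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Bronze: raw records landed as-is, with every field stored as text.
cur.execute("CREATE TABLE bronze_orders (order_id TEXT, customer TEXT, amount TEXT)")
cur.executemany(
    "INSERT INTO bronze_orders VALUES (?, ?, ?)",
    [
        ("1", " Alice ", "20.50"),
        ("2", "bob", "15"),
        ("2", "bob", "15"),   # duplicate row
        ("3", "ALICE", ""),   # missing amount
    ],
)

# Silver: cleansed and typed; duplicates and rows without an amount are dropped.
cur.execute(
    """
    CREATE TABLE silver_orders AS
    SELECT DISTINCT
        CAST(order_id AS INTEGER) AS order_id,
        LOWER(TRIM(customer)) AS customer,
        CAST(amount AS REAL) AS amount
    FROM bronze_orders
    WHERE TRIM(amount) <> ''
    """
)

# Gold: a small customer dimension and a fact table keyed to it.
cur.execute(
    "CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, customer TEXT UNIQUE)"
)
cur.execute(
    "INSERT INTO dim_customer (customer) "
    "SELECT DISTINCT customer FROM silver_orders ORDER BY customer"
)
cur.execute(
    """
    CREATE TABLE fact_sales AS
    SELECT d.customer_key, COUNT(*) AS order_count, SUM(s.amount) AS total_amount
    FROM silver_orders AS s
    JOIN dim_customer AS d ON d.customer = s.customer
    GROUP BY d.customer_key
    """
)

# A BI-style query over the Gold layer.
gold_rows = cur.execute(
    "SELECT d.customer, f.order_count, f.total_amount "
    "FROM fact_sales AS f JOIN dim_customer AS d USING (customer_key) "
    "ORDER BY d.customer"
).fetchall()
print(gold_rows)
```

Keeping the Bronze table untouched means the Silver and Gold layers can be rebuilt from raw data whenever a cleansing rule changes.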
A modern, responsive recipe book application built with Angular. Users can create, read, update, and delete recipes, browse a clean list view, and search or filter recipes by title or ingredient. The design adapts seamlessly between desktop and mobile.
Features include routing between pages, reactive form handling, localStorage persistence, and a sleek, menu-driven interface powered by Angular Material/Bootstrap.
This repository contains a series of personal web development projects designed to practice and improve front-end development skills. The projects range from layout exercises to interactive applications, focusing on HTML, CSS, and JavaScript, with occasional use of front-end frameworks or libraries.
Each project explores a different aspect of front-end development, from responsive design and UI components to DOM manipulation and API integration, building a practical foundation for creating modern, user-friendly websites.
This project presents a comprehensive analytics suite of four interconnected Tableau dashboards. It delves into the vast world of entertainment by analyzing data from Netflix, IMDb's Top 1000 Movies, IMDb TV Shows, and the Oscars. The primary goal is to provide actionable insights for content strategy, production decisions, and audience targeting, offering a multi-dimensional view of the industry's trends, successes, and audience preferences.
This project analyzes and predicts crime in Los Angeles using the LAPD Crime Dataset (2020–2025). It focuses on two main goals: Crime Type Classification into categories like Assault, Burglary, and Other; and Crime Count Forecasting to predict monthly crime volumes for better law enforcement resource allocation.
Built in Python and deployed on AWS SageMaker, the project uses advanced feature engineering, dimensionality reduction (TruncatedSVD), class balancing (SMOTE), and machine learning models such as Random Forest, LightGBM, and XGBoost for classification. For forecasting, Linear Regression, Random Forest Regressor, and SARIMAX were applied to monthly crime counts.
XGBoost achieved ~83% accuracy for classification, outperforming LightGBM and Random Forest, while Linear Regression delivered the best forecasting performance, suggesting that the engineered temporal and seasonal features captured the underlying patterns effectively.
Classification accuracy comparison between XGBoost, LightGBM, and Random Forest models.
Linear Regression outperforming other models in monthly crime count forecasting.