Calantha Mohanraj

Welcome to my professional portfolio, where data meets curiosity and creativity. I specialize in transforming raw data into meaningful insights, building data-driven solutions, and developing end-to-end analytics and machine learning projects. This site showcases my academic work, technical projects, and research initiatives, all of which reflect my passion for solving real-world problems through data science.

Projects

Netflix Movie Recommender System

Built a content-based recommender system that uses TF-IDF vectorization of movie genres and cosine similarity to suggest the top-N most similar movies. The system was tested on the MovieLens dataset and returns recommendations such as movies similar to Jumanji.

Tools Used

TfidfVectorizer

scikit-learn

cosine_similarity

Preprocessing

Pandas

XGBoost

Skills Gained

Scoring

Vectorization

Similarity

Recommendation

View on GitHub
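The genre-based approach described for the recommender can be sketched without any dependencies. This is a minimal illustration, not the project's code: it reimplements what scikit-learn's TfidfVectorizer (default smoothed IDF plus L2 normalization) and cosine_similarity compute, assuming genres arrive as pipe-separated strings as in the MovieLens movies.csv file. The helper names (tfidf_vectors, cosine, recommend) and the four sample rows are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, L2-normalized.

    Uses the smoothed IDF that scikit-learn's TfidfVectorizer applies by default:
    idf(t) = ln((1 + n) / (1 + df(t))) + 1.
    """
    tokenized = [doc.split("|") for doc in docs]  # genres are pipe-separated
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vec = {t: count * idf[t] for t, count in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Vectors are already unit length, so the dot product is the cosine similarity."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def recommend(titles, genres, query, top_n=3):
    """Rank every other title by genre similarity to `query` (hypothetical helper)."""
    vectors = tfidf_vectors(genres)
    idx = titles.index(query)
    scores = [
        (cosine(vectors[idx], v), t)
        for i, (v, t) in enumerate(zip(vectors, titles))
        if i != idx
    ]
    scores.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep input order
    return [t for _, t in scores[:top_n]]


titles = ["Toy Story (1995)", "Jumanji (1995)", "Grumpier Old Men (1995)", "Heat (1995)"]
genres = [
    "Adventure|Animation|Children|Comedy|Fantasy",
    "Adventure|Children|Fantasy",
    "Comedy|Romance",
    "Action|Crime|Thriller",
]
# Toy Story (1995) ranks first: it shares Adventure, Children, and Fantasy with Jumanji.
print(recommend(titles, genres, "Jumanji (1995)", top_n=2))
```

In a full project, fitting TfidfVectorizer on the genre column and passing the resulting matrix to cosine_similarity produces the same kind of similarity matrix; the hand-rolled version simply makes the weighting arithmetic visible.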

Linear Regression: Boston Housing Case Study

Predicting Boston housing prices using simple and multiple linear regression. The project includes correlation analysis, feature selection, model evaluation (R², RMSE, MAE), and visualizations for interpretability.

Tools Used

scikit-learn

Multicollinearity

Statsmodels

VIF Analysis

Skills Gained

Matplotlib

Regression

Interpretation

Visualization

View on GitHub
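The evaluation metrics listed for the Boston housing study follow standard definitions. The sketch below computes them for the simple (one-feature) case in plain Python, plus the two-predictor special case of the variance inflation factor, VIF = 1 / (1 - r^2). It is an illustrative sketch on toy numbers, not the project's notebook, and the function names are hypothetical; for multiple regression, the libraries the project lists (scikit-learn's LinearRegression and statsmodels' variance_inflation_factor) handle the general case.

```python
import math


def fit_simple_ols(x, y):
    """Closed-form least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r2(y, y_hat):
    """Share of the variance in y explained by the predictions."""
    my = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - my) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def rmse(y, y_hat):
    """Root mean squared error, in the units of y; penalizes large misses."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error, less sensitive to outliers than RMSE."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2), r = Pearson corr."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    r = s12 / math.sqrt(s11 * s22)
    return 1 / (1 - r ** 2)


# Toy data (not the Boston dataset): a perfect linear relationship y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_ols(x, y)
y_hat = [b0 + b1 * xi for xi in x]
print(b0, b1, r2(y, y_hat), rmse(y, y_hat), mae(y, y_hat))
```

A VIF well above 1 signals that a predictor is largely explained by the others, which inflates the variance of its coefficient estimate and makes individual coefficients harder to interpret.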

Automated ETL Pipeline with Airflow for Mouse USV Analysis

Built a fully automated ETL pipeline that processes 300+ mouse ultrasonic vocalization (USV) recordings with VocalMat (MATLAB), orchestrates each step with Apache Airflow, and stores the results in a queryable SQL Server Express database, reducing manual work by 80%.

Tools Used

Airflow

Docker

SQL

Orchestration

Integration

SSMS

Skills Gained

Automation

Scalability

View on GitHub
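The extract, transform, and load stages of the USV pipeline can be sketched as three plain functions. This is a hedged illustration under several assumptions: CSV text stands in for VocalMat's Excel output, Python's built-in sqlite3 stands in for SQL Server Express, and the column names (frequency_khz, duration_ms) and table name (usv_calls) are hypothetical. In an Airflow DAG, each function would typically be wrapped as its own task (for example with PythonOperator) so that a failed stage can be retried independently.

```python
import csv
import io
import sqlite3


def extract(csv_text):
    """Parse one recording's exported call features (CSV stands in for VocalMat's Excel output)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows, recording_id):
    """Cast numeric columns and tag each call with its recording (hypothetical columns)."""
    clean = []
    for row in rows:
        try:
            clean.append(
                (recording_id, float(row["frequency_khz"]), float(row["duration_ms"]))
            )
        except (KeyError, ValueError):
            continue  # skip malformed or incomplete rows instead of failing the batch
    return clean


def load(conn, records):
    """Append records to a central table (sqlite3 stands in for SQL Server Express)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usv_calls ("
        "recording_id TEXT, frequency_khz REAL, duration_ms REAL)"
    )
    conn.executemany("INSERT INTO usv_calls VALUES (?, ?, ?)", records)
    conn.commit()
    return len(records)


conn = sqlite3.connect(":memory:")
rows = extract("frequency_khz,duration_ms\n65.2,30\n,12\n70.1,45\n")
inserted = load(conn, transform(rows, "rec_001"))
print(inserted, conn.execute("SELECT COUNT(*) FROM usv_calls").fetchone()[0])
```

Keeping each stage a pure function of its inputs makes the steps easy to test in isolation and lets an orchestrator rerun any single stage without repeating the others.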

Sephora Product Value Score

Designed a metric that identifies top-performing categories and products by combining three signals: customer engagement (reviews), price impact (revenue per unit), and customer sentiment (ratings). The resulting score guides product, pricing, and marketing decisions.

Tools Used

SHAP

Plotly

Explainability

Engineering

Valuation

View on GitHub
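One way to combine the three signals named in the Sephora value score is to rescale each to the range [0, 1] and take a weighted sum. The sketch assumes min-max scaling, a log transform on review counts (which tend to be heavily skewed), and equal weights; these choices, the field names, and the sample products are all hypothetical and not the project's actual formula.

```python
import math


def min_max(values):
    """Rescale values to [0, 1]; a constant column carries no signal, so it maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def value_scores(products, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of engagement, price impact, and sentiment (hypothetical weights)."""
    engagement = min_max([math.log1p(p["reviews"]) for p in products])  # dampen skew
    price_impact = min_max([p["revenue_per_unit"] for p in products])
    sentiment = min_max([p["rating"] for p in products])
    w_e, w_p, w_s = weights
    return {
        p["name"]: w_e * e + w_p * r + w_s * s
        for p, e, r, s in zip(products, engagement, price_impact, sentiment)
    }


products = [
    {"name": "Product A", "reviews": 100, "revenue_per_unit": 20.0, "rating": 4.5},
    {"name": "Product B", "reviews": 10, "revenue_per_unit": 50.0, "rating": 3.0},
    {"name": "Product C", "reviews": 1000, "revenue_per_unit": 35.0, "rating": 4.8},
]
scores = value_scores(products)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Because min-max scaling is relative to the products being compared, scores are only meaningful within one batch (for example, one category); comparing across categories would need a shared reference range.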

A/B Testing Case Study

Improving Click-Through Rate via Design Optimization

This project showcases a full A/B testing workflow in Python that evaluates the impact of a UI design change on click-through rate (CTR). It includes statistical testing, power analysis, simulation of user behavior, and business interpretation of the results.

Tools Used

Inference

Hypothesis

Experimentation

Optimization

Analysis

Skills Gained

Experimentation

Z-Test

Modeling

Evaluation

View on GitHub
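The statistical core of the A/B workflow is a two-proportion z-test on click-through rates plus a power calculation for sample size. Below is a minimal standard-library sketch, assuming a two-sided test, a pooled standard error for the test statistic, and the common normal-approximation sample-size formula; the click counts in the usage lines are made up for illustration.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in CTR, using the pooled proportion under H0."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


def sample_size_per_group(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate users needed per variant to detect a CTR shift from p_base to p_target."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p_target - p_base) ** 2)


# Illustrative numbers: control CTR 10%, new design CTR 15%, 1,000 users each.
z, p = two_proportion_ztest(clicks_a=100, n_a=1000, clicks_b=150, n_b=1000)
n_needed = sample_size_per_group(0.10, 0.12)
print(f"z = {z:.2f}, p = {p:.4f}, users per group for 10% -> 12%: {n_needed}")
```

Running the power calculation before collecting data fixes the sample size in advance, which avoids the inflated false-positive rate that comes from repeatedly peeking at the p-value and stopping early.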

Inventory Management System

This is a simple console-based inventory management system built in Python using object-oriented programming (OOP) principles. It lets users add, remove, update, search, and export product inventory, and it persists data between sessions in a JSON file.

Tools Used

Python

Inventory

OOP

Tkinter

Skills Gained

CSV

Management

Reporting

Automation

Learn more
View on GitHub
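The object-oriented design described for the inventory system can be sketched as a small Product record and an Inventory class that persists to JSON and exports CSV. This is a hypothetical illustration: the class names, fields (sku, name, quantity, price), and method names are assumptions, not the project's actual code.

```python
import csv
import json
from dataclasses import asdict, dataclass


@dataclass
class Product:
    sku: str
    name: str
    quantity: int
    price: float


class Inventory:
    def __init__(self):
        self.products = {}  # sku -> Product

    def add(self, product):
        if product.sku in self.products:
            raise ValueError(f"duplicate SKU: {product.sku}")
        self.products[product.sku] = product

    def remove(self, sku):
        return self.products.pop(sku)  # raises KeyError for an unknown SKU

    def update_quantity(self, sku, quantity):
        self.products[sku].quantity = quantity

    def search(self, text):
        """Case-insensitive substring match on product names."""
        text = text.lower()
        return [p for p in self.products.values() if text in p.name.lower()]

    def to_json(self):
        return json.dumps([asdict(p) for p in self.products.values()], indent=2)

    @classmethod
    def from_json(cls, text):
        inventory = cls()
        for item in json.loads(text):
            inventory.add(Product(**item))
        return inventory

    def save(self, path):
        """Persist the inventory so it survives between sessions."""
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.to_json())

    @classmethod
    def load(cls, path):
        with open(path, encoding="utf-8") as f:
            return cls.from_json(f.read())

    def export_csv(self, path):
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=["sku", "name", "quantity", "price"])
            writer.writeheader()
            writer.writerows(asdict(p) for p in self.products.values())


inventory = Inventory()
inventory.add(Product("A1", "Notebook", 10, 4.5))
inventory.add(Product("B2", "Desk Lamp", 3, 24.0))
inventory.update_quantity("A1", 8)
print([p.name for p in inventory.search("lamp")])
```

Keying products by SKU in a dictionary makes lookup, update, and removal constant-time, while the name search stays a simple linear scan, which is adequate at console-application scale.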

Work & Research Experience

Data Engineering Intern

Gangliagaurdian Lab, Richardson, TX

May 2025 - August 2025

  • Built scalable ETL pipelines with Python and Apache Airflow to automate the processing of 300+ ultrasonic vocalization (.wav) recordings, reducing manual workflow time by 80%.

  • Designed and maintained a relational database in SQL Server Express, integrating VocalMat Excel outputs containing 10+ acoustic features (e.g., frequency, duration, harmonic structure, noise) into a centralized research repository.

  • Developed and optimized 5+ reusable data ingestion and transformation scripts, ensuring reproducibility, consistency, and high-quality data across large-scale neuroscience experiments.

Research Assistant

Gangliagaurdian Lab, Richardson, TX

August 2025 - Present

  • Processed and curated 300+ mouse ultrasonic vocalization (.wav) recordings into structured datasets with 10+ acoustic features using Python, VocalMat, and MATLAB.

  • Built a Golden USV Reference Dataset of manually validated calls to benchmark classifiers and improve detection accuracy.

  • Developed ML models (SVM, Random Forest, neural networks) to classify USV types and predict social behaviors, uncovering potential autism biomarkers.

  • Engineered features and applied PCA and t-SNE to surface call-pattern insights, integrating FFT-based pipelines for large-scale behavioral analysis.


Contact

If you're interested in collaborating, please share your contact information and I will get back to you soon. I look forward to connecting with you.