
Calantha Mohanraj
Welcome to my professional portfolio, where data meets curiosity and creativity. I specialize in transforming raw data into meaningful insights, building data-driven solutions, and developing end-to-end analytics and machine learning projects. This site showcases academic work, technical projects, and research initiatives that reflect my passion for solving real-world problems through data science.

Projects
Netflix Movie-Recommender-System
Built a content-based recommender system that applies TF-IDF vectorization to movie genres and uses cosine similarity to suggest the top-N most similar movies. The system was tested on the MovieLens dataset and returns title-based recommendations, for example movies similar to Jumanji.
Tools Used
TfidfVectorizer
Scikit-learn
cosine_similarity
Preprocessing
Pandas
XGBoost
Skills Gained
Scoring
Vectorization
Similarity
Recommendation
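The approach described for this recommender (TF-IDF weights over genre tokens, then cosine similarity between movies) can be sketched as follows. This is a minimal illustration that uses only the Python standard library rather than scikit-learn's TfidfVectorizer, and the catalogue, the function names, and the smoothed IDF formula are assumptions made for the example, not the project's actual code or data.

```python
import math
from collections import Counter

# Hypothetical toy catalogue (title -> pipe-separated genres), for illustration only.
MOVIES = {
    "Jumanji": "Adventure|Children|Fantasy",
    "Toy Story": "Adventure|Animation|Children|Comedy|Fantasy",
    "Heat": "Action|Crime|Thriller",
    "GoldenEye": "Action|Adventure|Thriller",
}


def tfidf_vectors(docs):
    """Return one {genre: weight} dict per title, using a smoothed IDF."""
    tokenized = {title: genres.lower().split("|") for title, genres in docs.items()}
    n_docs = len(tokenized)
    # Document frequency: in how many titles does each genre appear?
    doc_freq = Counter(g for tokens in tokenized.values() for g in set(tokens))
    idf = {g: math.log((1 + n_docs) / (1 + df)) + 1 for g, df in doc_freq.items()}
    return {
        title: {g: tf * idf[g] for g, tf in Counter(tokens).items()}
        for title, tokens in tokenized.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(title, docs, top_n=2):
    """Rank every other title by genre similarity to the given title."""
    vectors = tfidf_vectors(docs)
    target = vectors[title]
    scores = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != title]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top_n]
```

On this toy catalogue, recommend("Jumanji", MOVIES) ranks "Toy Story" first because the two titles share three genres, while "Heat" shares none and scores zero.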
Linear Regression Boston Housing Case Study
Predicted Boston housing prices using simple and multiple linear regression. The case study includes correlation analysis, feature selection, model evaluation (R², RMSE, MAE), and visualizations for interpretability.
Tools Used
scikit-learn
Multicollinearity
Statsmodels
VIF Analysis
Skills Gained
Matplotlib
Regression
Interpretation
Visualization
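The evaluation metrics named in this case study (R², RMSE, MAE) can be illustrated with a small simple-regression sketch. It uses only the standard library instead of scikit-learn or Statsmodels, and the function names are hypothetical choices for the example rather than anything taken from the project.

```python
import math


def fit_simple_ols(x, y):
    """Least-squares slope and intercept for a single predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for a set of predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)          # unexplained variation
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)  # total variation
    return {
        "r2": 1 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }
```

A typical use is to fit on one predictor, compute predictions as slope * x + intercept, and pass them to regression_metrics. RMSE penalizes large errors more heavily than MAE, which is why reporting both alongside R² gives a fuller picture of fit quality.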
ETL-Automated-Airflow Pipeline for Mouse USV Analysis
Built a fully automated ETL pipeline that processes 300+ mouse ultrasonic vocalization (USV) recordings with VocalMat (MATLAB), is orchestrated by Apache Airflow, and stores its output in SQL Server Express. The pipeline reduced manual work by 80% and centralized results in a queryable database.
Tools Used
Airflow
Docker
SQL
Orchestration
Integration
SSMS
Skills Gained
Automation
Scalability
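The extract, transform, and load stages of a pipeline like this one can be sketched in plain Python. This is only an illustration under several assumptions: it reads CSV text instead of VocalMat's Excel output, it uses the standard library's sqlite3 in place of SQL Server Express, it leaves out Airflow entirely (each function would correspond to one orchestrated task), and the column names and table name are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical per-call feature export; the real VocalMat columns may differ.
SAMPLE_CSV = """recording,call_id,duration_ms,mean_freq_khz
mouse01.wav,1,42.5,61.2
mouse01.wav,2,,58.9
mouse02.wav,1,30.0,70.4
"""


def extract(text):
    """Parse the raw export into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows with a missing duration and convert fields to typed tuples."""
    records = []
    for row in rows:
        if not row["duration_ms"]:
            continue  # incomplete measurement, skip it
        records.append(
            (row["recording"], int(row["call_id"]),
             float(row["duration_ms"]), float(row["mean_freq_khz"]))
        )
    return records


def load(conn, records):
    """Upsert records so that re-running the pipeline does not duplicate rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usv_calls ("
        "recording TEXT, call_id INTEGER, duration_ms REAL, mean_freq_khz REAL, "
        "PRIMARY KEY (recording, call_id))"
    )
    conn.executemany("INSERT OR REPLACE INTO usv_calls VALUES (?, ?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn, text):
    """Run all three stages and return the number of stored calls."""
    load(conn, transform(extract(text)))
    return conn.execute("SELECT COUNT(*) FROM usv_calls").fetchone()[0]
```

Keying the table on (recording, call_id) makes the load step idempotent, which matters for scheduled pipelines where a task may be retried.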
Sephora Product Value Score
Designed a metric that identifies top-performing categories and products by combining three signals: customer engagement (reviews), price impact (revenue per unit), and customer sentiment (ratings). The score is intended to guide product, pricing, and marketing decisions.
Tools Used
SHAP
Plotly
Explainability
Engineering
Valuation
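One way a composite score of this kind could be built is to rescale each signal to the 0-1 range and take a weighted sum. The sketch below is an assumption for illustration only: the source does not state the actual formula, so the min-max scaling, the weights, and the field names (reviews, price, rating) are all hypothetical.

```python
def min_max(values):
    """Rescale a list of numbers to the 0-1 range (all zeros if constant)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def value_scores(products, weights=(0.4, 0.3, 0.3)):
    """Combine engagement, price impact, and sentiment into one score per product.

    The weights are illustrative placeholders, not the project's values.
    """
    w_engagement, w_price, w_sentiment = weights
    engagement = min_max([p["reviews"] for p in products])
    price = min_max([p["price"] for p in products])
    sentiment = min_max([p["rating"] for p in products])
    return {
        p["name"]: w_engagement * e + w_price * r + w_sentiment * s
        for p, e, r, s in zip(products, engagement, price, sentiment)
    }
```

Rescaling first keeps a signal with a large numeric range, such as review counts, from dominating signals measured on a small scale, such as star ratings.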
A/B Testing Case Study
Improving Click-Through Rate via Design Optimization
This project showcases a full A/B testing workflow in Python that evaluates the impact of a UI design change on click-through rate (CTR). It includes statistical testing, power analysis, simulation of user behavior, and business interpretation of the results.
Tools Used
Inference
Hypothesis
Experimentation
Optimization
Analysis
Skills Gained
Experimentation
Z Test
Modeling
Evaluation
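The statistical testing step of an A/B workflow like this one is commonly a two-proportion z-test on click-through rates. The sketch below shows that test using only the standard library; the function name and argument names are hypothetical, and it is not presented as the project's own implementation.

```python
import math


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in CTR between control (A) and variant (B)."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    # Under the null hypothesis both groups share one click probability.
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For example, 100 clicks out of 1,000 impressions in control against 130 out of 1,000 in the variant gives a z-statistic of roughly 2.1 and a p-value a little under 0.04, which would be significant at the conventional 0.05 level.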
Inventory Management System
A simple console-based inventory management system built with Python and object-oriented programming principles. It lets users add, remove, update, search, and export product inventory, and it saves data between sessions in a JSON file.
Tools Used
Python
Inventory
OOP
Tkinter
Skills Gained
CSV
Management
Reporting
Automation
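The core of a design like this one, a class that wraps the product data and persists it to JSON after every change, can be sketched as follows. The class name, method names, and record fields are assumptions for illustration and are not taken from the project's code.

```python
import json
from pathlib import Path


class Inventory:
    """Minimal product store that reloads its state from a JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        # Load existing data if the file is present, otherwise start empty.
        self.items = json.loads(self.path.read_text()) if self.path.exists() else {}

    def add(self, name, quantity, price):
        self.items[name] = {"quantity": quantity, "price": price}
        self._save()

    def update(self, name, **fields):
        # Raises KeyError if the product does not exist.
        self.items[name].update(fields)
        self._save()

    def remove(self, name):
        self.items.pop(name, None)
        self._save()

    def search(self, term):
        """Case-insensitive substring match on product names."""
        return [name for name in self.items if term.lower() in name.lower()]

    def _save(self):
        self.path.write_text(json.dumps(self.items, indent=2))
```

Because every mutating method calls _save, creating a new Inventory object on the same path in a later session picks up where the previous one left off.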
Work & Research Experience
Data Engineering Intern
Gangliagaurdian Lab, Richardson, TX
May 2025 - August 2025
Built scalable ETL pipelines with Python and Apache Airflow to automate the processing of 300+ ultrasonic vocalization (.wav) recordings, reducing manual workflow time by 80%.
Designed and maintained a relational database in SQL Server Express, integrating VocalMat Excel outputs containing 10+ acoustic features (e.g., frequency, duration, harmonic structure, noise) into a centralized research repository.
Developed and optimized 5+ reusable data ingestion and transformation scripts, ensuring reproducibility, consistency, and high-quality data across large-scale neuroscience experiments.
Research Assistant
Gangliagaurdian Lab, Richardson, TX
August 2025 - Present
Processed and curated 300+ mouse ultrasonic vocalization (.wav) recordings into structured datasets with 10+ acoustic features using Python, VocalMat, and MATLAB.
Built a Golden USV Reference Dataset with manually validated calls to benchmark classifiers and improve detection accuracy.
Developed ML models (SVM, Random Forest, neural networks) to classify USV types and predict social behaviors, uncovering potential autism biomarkers.
Engineered features and applied PCA/t-SNE for call pattern insights, integrating FFT-based pipelines for large-scale behavioral analysis.
SOCIALS
Contact
If you're interested in collaborating, please provide your information and I will contact you soon. I look forward to connecting with you.