I am a detail-oriented developer with expertise in Python, R, SQL, and JavaScript, specializing in Machine/Deep Learning, full-stack development, and cloud technologies. Proven track record of building scalable, high-performance solutions using AWS, Azure, and GCP, and leveraging AI-driven tools for actionable insights.
Experienced in designing and implementing scalable web applications with strong proficiency in JavaScript, React, Node.js, Java, and SQL. Skilled at streamlining deployment processes through CI/CD pipelines, enhancing user experiences through robust testing, and optimizing cloud-based applications for performance and reliability.
Toronto, ON, Canada
- Developed ETL pipelines in Python & SQL to clean large textual survey datasets, improving operational efficiency by 60%.
- Utilized Python and GPT-4 embedding models to encode textual responses, enabling deeper investigation into human goal-setting behaviors.
- Enhanced data quality through imputation using KNN, linear regression, and logistic regression models.
- Created data visualizations with ggplot2 and Matplotlib to present complex analysis results.
- Fine-tuned transformer models (BERT) with PyTorch to label qualitative data with 82% accuracy.
Toronto, ON, Canada
- Developed interactive web-based experiment paradigms with JavaScript & NeuroBS for 3 language studies.
- Analyzed neuroimaging datasets via pandas and scikit-learn, identifying trends between structural damage and language impairments.
- Created data preprocessing pipelines using Python, Bash, and C++, reducing manual workload by 50%.
- Streamlined healthcare administration processes by managing MySQL databases and automating data processing tasks, reducing admin work by 30%.
MS, Data Science, University of British Columbia
Expected Graduation: June 2025
HBSc, Computer Science & Neuroscience, University of Toronto
Graduation: June 2024
GOGO | MongoDB, Express, ReactJS, NodeJS (May 2023 - Aug 2023)
- Built a full-stack web app (in a team of 5) that allows strangers to connect and securely chat together.
- Built the frontend with ReactJS and the backend with Express (REST APIs), using MongoDB for data storage.
- Used Socket.IO to implement a real-time encrypted chat feature.
- Set up a CI/CD pipeline via GitHub Actions for backend deployment on AWS EC2.
- Developed a PM2.5 smoke particle visualization dashboard, enabling users to monitor nearby air quality in real time.
- Visualized real-time and longitudinal air quality trends with React, Leaflet, and D3.js.
- Migrated backend to ExpressJS and database to SQLite for improved flexibility and scalability.
- Deployed the lightweight backend to Google Cloud Functions, ensuring cost efficiency and scalability.
SOFOS | Python, NodeJS, NextJS (Nov 2024 - Dec 2024)
- Developed Sofos, a full-stack web app enhancing discussions on Canvas by providing personalized recommendations and complexity assessments for user replies, fostering deeper peer engagement.
- Implemented a microservices architecture using Node.js for discussion services and Python FastAPI for recommendation and complexity analysis, deployed on Render.com and Vercel.
- Used TF-IDF for content recommendation and a custom complexity assessment model, integrating with the Canvas API for real-time data retrieval and feedback.
- Built a machine learning pipeline predicting wine quality using a Random Forest classifier with hyperparameter tuning on 11 physicochemical features.
- Achieved an average precision (AP) of 0.99 on the test set, with minimal misclassifications in precision-recall analysis.
- Automated ETL pipelines using Makefiles, Docker, and Conda; evaluated model performance with F1 score, ROC and precision-recall (AP) curves, and cross-validation for robustness.
NYC Airbnb Regression Project (Dec 2024 - Jan 2025)
- Constructed an end-to-end regression pipeline for the NYC Airbnb 2019 dataset with Quarto, Makefiles, and Conda-lock, ensuring reproducible data analysis.
- Engineered sentiment features and optimized multiple models (Ridge, Random Forest, LightGBM, Elastic Net) to predict monthly reviews, reaching 0.693 R² on the test set.
- Leveraged SHAP and permutation importance for interpretability, revealing key factors driving Airbnb listing popularity.
- Developed a predictive model for the Jane Street Kaggle competition, utilizing 88 anonymized features to forecast a continuous target variable.
- Trained LightGBM with DART boosting, using parallelized training to improve computational efficiency and model performance.
- Employed Altair for data visualization and Recursive Feature Elimination with Cross-Validation (RFECV) for feature selection.
MEDLIFE App | Swift, Firebase (Dec 2022 - Apr 2023)
- Built an iOS app for student organizations to track member task progress, publish events, and monitor ticket sales.
- Developed the user interface using SwiftUI and integrated backend functionality with Firebase Realtime Database and Firebase Storage.
- Followed the MVVM design pattern and Agile methodologies to ensure scalable and maintainable code.
- Email: [email protected]
- LinkedIn: www.linkedin.com/in/farhanbinfaisa
- GitHub: https://github.com/Farhan-Faisal
- Portfolio: https://dev-portfolio-farhan.vercel.app