🎯 Project Objective:
This project aims to develop a robust statistical model to predict neonatal weight using a dataset from three hospitals. The focus is on analyzing various maternal and neonatal variables to assess their influence on newborn weight, with particular attention to the impact of maternal smoking.
🔍 Data Mastery:
- Used a dataset of 2,500 newborns from three hospitals to study the key factors influencing neonatal weight.
- Cleaned and verified the data carefully to ensure the accuracy of the analysis.
🌟 Challenging Task Conquered:
- Developed predictive models incorporating multiple variables that significantly affect neonatal outcomes.
- Addressed complex statistical challenges, such as nonlinear relationships and interactions between variables.
💡 Innovative Approaches:
- Applied inferential statistical methods, including hypothesis tests and regression, to draw conclusions about the factors influencing neonatal weight.
- Implemented a multiple linear regression model and went beyond purely linear, additive terms by testing for potential nonlinear effects and interactions.
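A minimal base-R sketch of how nonlinear effects and interactions can be checked: nested linear models are compared with F-tests via anova(). The data are simulated and the column names (weight, gest_weeks, length_mm, sex) are hypothetical placeholders, not the project's actual variables.

```r
# Sketch: do a quadratic term and an interaction improve a linear model?
# SIMULATED data; column names are hypothetical placeholders.
set.seed(1)
n <- 500
sim <- data.frame(
  gest_weeks = round(runif(n, 30, 42)),   # gestational age in weeks
  length_mm  = rnorm(n, 495, 25),         # newborn length in mm
  sex        = factor(sample(c("F", "M"), n, replace = TRUE))
)
# Weight in grams, generated with built-in curvature in gestational age
sim$weight <- 3300 + 100 * (sim$gest_weeks - 39) - 8 * (sim$gest_weeks - 39)^2 +
  12 * (sim$length_mm - 495) + 80 * (sim$sex == "M") + rnorm(n, 0, 250)

m_lin  <- lm(weight ~ gest_weeks + length_mm + sex, data = sim)
m_quad <- update(m_lin, . ~ . + I(gest_weeks^2))   # adds a nonlinear term
m_int  <- update(m_quad, . ~ . + gest_weeks:sex)   # adds an interaction

# Each row tests the extra term(s) against the model in the previous row
cmp <- anova(m_lin, m_quad, m_int)
print(cmp)
```

A small p-value in a row suggests the added term improves the fit beyond the simpler model.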
📊 Key Dataset Properties:
- The dataset includes variables such as mother's age, number of pregnancies, gestational age, and neonatal physical measurements.
- Captures key categorical data like maternal smoking, type of delivery, hospital ID, and sex of the newborn, providing a rich basis for multivariate analysis.
🔮 Impact:
- Identified the maternal and neonatal factors that predict weight at birth, findings relevant to neonatal care.
- Produced a model that can support healthcare professionals' decision-making and neonatal health strategies.
🔗 GitHub Repository: Dive into the codebase to follow the journey of crafting a sophisticated statistical model in R. Discover how exploratory data analysis, hypothesis testing, and regression modeling come together to predict neonatal weight effectively. See how each analytical step contributes to a comprehensive understanding of the factors impacting newborn health.
Data Import:
- Ensured accurate import and handling of the neonati.csv dataset into the R environment.
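A minimal base-R sketch of a checked import. The file name neonati.csv comes from the project, but the column names listed here are hypothetical placeholders; the demo writes two made-up rows to a temporary file so the snippet runs on its own.

```r
# Sketch of a checked CSV import; column names are HYPOTHETICAL.
load_neonati <- function(path) {
  df <- read.csv(path, stringsAsFactors = FALSE)
  expected <- c("mother_age", "n_pregnancies", "smoker", "gest_weeks", "weight",
                "length_mm", "head_mm", "delivery", "hospital", "sex")
  missing_cols <- setdiff(expected, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Report missing values per column so gaps are noticed before analysis
  na_counts <- colSums(is.na(df))
  if (any(na_counts > 0)) print(na_counts[na_counts > 0])
  # Store categorical variables as factors so models treat them as categories
  for (col in c("smoker", "delivery", "hospital", "sex")) {
    df[[col]] <- factor(df[[col]])
  }
  df
}

# Demo with two MADE-UP rows (not project data)
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "mother_age,n_pregnancies,smoker,gest_weeks,weight,length_mm,head_mm,delivery,hospital,sex",
  "30,1,0,39,3200,500,340,Nat,H1,F",
  "25,0,1,38,2900,480,330,Ces,H2,M"
), tmp)
dat <- load_neonati(tmp)
str(dat)
```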
Descriptive Analysis:
- Thoroughly described dataset properties, focusing on variables critical to neonatal weight prediction.
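A short base-R sketch of descriptive statistics of this kind: a summary of a numeric variable and absolute and relative frequencies for categorical ones. The values are simulated and the variable names are hypothetical.

```r
# Descriptive statistics on SIMULATED data; names are hypothetical.
set.seed(2)
n <- 300
sim <- data.frame(
  weight   = rnorm(n, 3300, 500),   # grams
  smoker   = factor(rbinom(n, 1, 0.05), levels = 0:1, labels = c("no", "yes")),
  hospital = factor(sample(c("H1", "H2", "H3"), n, replace = TRUE))
)

print(summary(sim$weight))   # min, quartiles, mean, max

# Absolute and relative frequencies of categorical variables
freq_hosp <- table(sim$hospital)
rel_hosp  <- prop.table(freq_hosp)
print(freq_hosp)
print(round(rel_hosp, 3))
print(table(sim$smoker))
```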
Exploratory Data Analysis (EDA):
- Utilized statistical indices and visual tools to uncover patterns and insights within the data.
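Base R has no built-in skewness or kurtosis function, so this minimal sketch defines the moment-based versions (kurtosis reported as excess kurtosis) alongside the coefficient of variation; which indices the project actually reported is not stated. The weights are simulated.

```r
# Moment-based shape indices; kurtosis is EXCESS kurtosis (normal = 0).
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}
cv <- function(x) sd(x) / mean(x)   # coefficient of variation

set.seed(3)
weight <- rnorm(2500, 3300, 500)   # SIMULATED weights in grams
indices <- c(mean = mean(weight), sd = sd(weight), cv = cv(weight),
             skew = skewness(weight), ex_kurt = kurtosis(weight))
print(round(indices, 3))

# Histogram counts without drawing (set plot = TRUE to see the figure)
h <- hist(weight, breaks = 30, plot = FALSE)
```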
Hypothesis Testing:
- Tested hypotheses about differences in neonatal weight and length across subgroups, including newborn sex and hospital.
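A minimal sketch of subgroup comparisons of this kind: a Welch two-sample t-test for weight by sex, plus a Welch-type one-way test and a Kruskal-Wallis test across three hospitals. The specific tests are an assumption, since the project's choices are not listed here; the data are simulated with a sex difference and no hospital difference.

```r
# Subgroup comparisons on SIMULATED data; names are hypothetical.
set.seed(4)
n <- 1000
sim <- data.frame(
  sex      = factor(sample(c("F", "M"), n, replace = TRUE)),
  hospital = factor(sample(c("H1", "H2", "H3"), n, replace = TRUE))
)
# Males simulated 200 g heavier on average; hospitals have no effect
sim$weight <- 3250 + 200 * (sim$sex == "M") + rnorm(n, 0, 450)

# Welch two-sample t-test (t.test default: unequal variances)
tt <- t.test(weight ~ sex, data = sim)
# One-way comparison across hospitals without assuming equal variances
ow <- oneway.test(weight ~ hospital, data = sim)
# Nonparametric alternative if normality is doubtful
kw <- kruskal.test(weight ~ hospital, data = sim)

print(c(sex_p = tt$p.value, hospital_p = ow$p.value, hospital_kw_p = kw$p.value))
```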
Multivariate Analysis:
- Developed and refined a multiple linear regression model, using rigorous criteria to select the most effective model.
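The selection criteria are not named here; as one common option, this is a minimal sketch of bidirectional stepwise selection with step() using a BIC penalty (k = log(n)). The data are simulated, with mother_age and smoker generated to have no effect, and all names are hypothetical.

```r
# Stepwise selection with a BIC penalty on SIMULATED data; names are hypothetical.
set.seed(6)
n <- 800
sim <- data.frame(
  gest_weeks = round(runif(n, 32, 42)),
  length_mm  = rnorm(n, 495, 25),
  mother_age = round(runif(n, 18, 45)),     # simulated with NO effect
  smoker     = factor(rbinom(n, 1, 0.05)),  # simulated with NO effect
  sex        = factor(sample(c("F", "M"), n, replace = TRUE))
)
sim$weight <- 3300 + 100 * (sim$gest_weeks - 39) + 12 * (sim$length_mm - 495) +
  80 * (sim$sex == "M") + rnorm(n, 0, 250)

full <- lm(weight ~ gest_weeks + length_mm + mother_age + smoker + sex, data = sim)
# k = log(n) turns step()'s default AIC penalty (k = 2) into BIC
best <- step(full, direction = "both", k = log(n), trace = 0)

print(formula(best))
print(c(BIC_full = BIC(full), BIC_best = BIC(best),
        adj_R2 = summary(best)$adj.r.squared))
```

BIC penalizes extra terms more heavily than AIC, so it tends to favor smaller models.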
Residual Analysis:
- Performed detailed diagnostics to ensure the model’s reliability, identifying influential cases that could impact predictive performance.
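A minimal base-R sketch of residual diagnostics: a Shapiro-Wilk test on the residuals, leverage from hatvalues(), and Cook's distance with the common 4/n rule of thumb (an assumed cut-off; a threshold of 1 is also used). One extreme case is planted in the simulated data so the flagging is visible.

```r
# Residual diagnostics on SIMULATED data with one planted influential case.
set.seed(5)
n <- 200
sim <- data.frame(gest_weeks = round(runif(n, 32, 42)))
sim$weight <- 3300 + 100 * (sim$gest_weeks - 39) + rnorm(n, 0, 250)
sim$weight[1] <- 6000   # planted outlier, for illustration only

fit <- lm(weight ~ gest_weeks, data = sim)
p   <- length(coef(fit))   # number of estimated coefficients

sw  <- shapiro.test(residuals(fit))   # normality of residuals (needs 3 <= n <= 5000)
lev <- hatvalues(fit)
cd  <- cooks.distance(fit)

high_leverage <- which(lev > 2 * p / n)   # rule of thumb: twice the mean leverage
influential   <- which(cd > 4 / n)        # rule of thumb for Cook's distance
print(sw$p.value)
print(head(sort(cd, decreasing = TRUE), 3))
```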
Predictive Performance:
- Assessed the model's predictive use on concrete scenarios, such as estimating the birth weight for a mother's third pregnancy at week 39 of gestation when no ultrasound measurements are available.
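A minimal sketch of this kind of prediction, under two stated assumptions: "without ultrasound data" is read as excluding ultrasound-derived measurements (such as length and head circumference) from the predictors, and the hypothetical n_pregnancies variable counts previous pregnancies, so a third pregnancy is coded as 2. The data are simulated, so the numbers are illustrative only.

```r
# Prediction with a reduced model on SIMULATED data; names are hypothetical.
set.seed(8)
n <- 1000
sim <- data.frame(
  n_pregnancies = rpois(n, 1),              # previous pregnancies (assumed coding)
  gest_weeks    = round(runif(n, 32, 42))
)
sim$weight <- 3300 + 100 * (sim$gest_weeks - 39) + 15 * sim$n_pregnancies +
  rnorm(n, 0, 300)

# Reduced model: only predictors available without an ultrasound scan
fit <- lm(weight ~ n_pregnancies + gest_weeks, data = sim)

# ASSUMPTION: third pregnancy -> 2 previous pregnancies
new_case <- data.frame(n_pregnancies = 2, gest_weeks = 39)
pred <- predict(fit, newdata = new_case, interval = "prediction", level = 0.95)
print(round(pred))   # point estimate with a 95% prediction interval
```

A prediction interval, rather than a confidence interval for the mean, is the relevant one here because it describes the uncertainty for a single newborn.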
Model Visualization:
- Created detailed graphical representations to make the statistical model's results accessible and understandable.
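A minimal base-graphics sketch of two common model plots, observed versus predicted weight and residuals versus fitted values, written to a PDF in a temporary directory. The data, variable names and output file name are hypothetical; the project's actual figures are not reproduced here.

```r
# Observed-vs-predicted and residual plots for a model fitted to SIMULATED data.
set.seed(7)
n <- 300
sim <- data.frame(
  gest_weeks = round(runif(n, 32, 42)),
  sex        = factor(sample(c("F", "M"), n, replace = TRUE))
)
sim$weight <- 3300 + 100 * (sim$gest_weeks - 39) + 80 * (sim$sex == "M") +
  rnorm(n, 0, 250)
fit <- lm(weight ~ gest_weeks + sex, data = sim)

out <- file.path(tempdir(), "neonatal_model.pdf")   # hypothetical output file
pdf(out, width = 9, height = 4.5)
op <- par(mfrow = c(1, 2))
plot(fitted(fit), sim$weight, pch = 16,
     col = ifelse(sim$sex == "M", "steelblue", "tomato"),
     xlab = "Predicted weight (g)", ylab = "Observed weight (g)",
     main = "Observed vs predicted")
abline(0, 1, lty = 2)   # perfect-prediction line
legend("topleft", legend = levels(sim$sex), col = c("tomato", "steelblue"), pch = 16)
plot(fitted(fit), residuals(fit), pch = 16, xlab = "Predicted weight (g)",
     ylab = "Residual (g)", main = "Residuals vs fitted")
abline(h = 0, lty = 2)
par(op)
invisible(dev.off())
```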