Arfa Khalid
Data Scientist
"Where there is Uncertainty, there is Statistics."
Arfa Khalid is a data professional with over four years of experience in business analytics, specializing in transforming data into strategies that drive business growth. She is adept at predictive analytics, reporting, and problem-solving in high-precision environments. Her career spans the Research, Technology, Finance, and Retail sectors, where she has used machine learning and predictive modeling to enhance decision-making, improving project efficiency by 20% and reducing turnaround time by 15%.
With expertise in Python, R, and SQL, Arfa excels in data analysis, model development, and effectively communicating insights to bridge technical and non-technical audiences. She is passionate about continuous learning and stays updated with the latest trends and tools in her field. Her journey in statistics began unexpectedly during her undergraduate studies, leading her to pursue a master's degree abroad, which has enriched both her personal and professional development.
Beyond her career, Arfa enjoys painting, playing the piano, hiking, and learning new languages, including French. She is open to connecting with like-minded professionals who share her commitment to data excellence and problem-solving.
Welcome to my data science portfolio, a collection of projects that show how I draw insights from intricate datasets. Beyond mere exercises, these projects are tales of real-world problem-solving, driven by a passion for deciphering the narratives hidden within the numbers with machine-learning techniques.
I derive joy from transforming data into actionable outcomes, be it forecasting future trends, unveiling hidden patterns, or implementing innovative algorithms for task automation.
Let my work be the voice: from predicting the future to uncovering hidden patterns, witness the power of pairing data science with practical applications. Explore my portfolio, and let the magic of data unfold before you.
Explored NASA's 30-year climate datasets for climate change analysis, employing the Seaborn library in Python to visualize and forecast temperature trends in the coming years.
In this project, I used Python, particularly the Seaborn library, to conduct an in-depth analysis of NASA's 30-year climate datasets. By applying data visualization and predictive modeling techniques, I aimed to understand and predict temperature trends in the coming years.
Implemented a deep learning model using Keras for Natural Language Processing. The model achieved an accuracy of 90% on custom text inputs for sentiment analysis on IMDB movie reviews.
Leveraging deep learning with TensorFlow and Keras, I built a model that predicts the sentiment (positive or negative) of movie reviews from the IMDB dataset, achieving 86.97% accuracy on unseen data.
Designed and developed a multinomial logistic regression model using pandas, scikit-learn, and Matplotlib. The model works with vaccination status, age groups, and Covid-19 case counts.
This project developed a multinomial logistic regression model that predicts an individual's vaccination status (Fully Vaccinated, Partially Vaccinated, Unvaccinated) based on age group and the number of Covid-19 cases. This allows for a data-driven assessment of vaccine effectiveness against case rates across various age demographics. The project demonstrates my proficiency in building and evaluating machine learning models for real-world impact.
In this project, I worked with a major U.S. airline dataset. I applied a classification model to predict flight arrival delays, providing insights into model performance and areas for improvement.
This project tackled the challenge of predicting on-time flight arrivals using a classification model. Using a major U.S. airline dataset, I built a pipeline that cleans, manipulates, and prepares data for model training. Employing a Random Forest classifier from scikit-learn, I achieved 86.43% accuracy in predicting on-time arrivals.
Analyzed New York City's collision data to assess hospital response, revealing patterns through interactive maps. Proposed strategic locations for new hospitals to reduce the number of collisions that occur far from a hospital, emphasizing the proactive role of spatial analysis in optimizing emergency medical services.
Leveraged spatial data analysis and visualization to pinpoint high-collision areas, assess hospital coverage, and propose strategic hospital locations for improved emergency response.
This project implemented database triggers in PostgreSQL to automatically track and record changes in an employee database.
This project showcases my expertise in database management and trigger design, delivering a robust and automated solution for employee data auditing with significant benefits for data integrity, transparency, and efficiency.
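The audit-trigger pattern can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it uses SQLite from Python's standard library as a stand-in for PostgreSQL so that it runs without a database server, and the table, column, and trigger names (`employees`, `employee_audit`, `log_salary_change`) are hypothetical. In PostgreSQL the same idea is usually written as a PL/pgSQL trigger function attached with `CREATE TRIGGER ... EXECUTE FUNCTION`, so the syntax differs from what is shown here.

```python
import sqlite3


def build_audit_demo():
    """Create an in-memory database with an employee table and an audit trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE employees (
            id INTEGER PRIMARY KEY,
            name TEXT,
            salary REAL
        );
        CREATE TABLE employee_audit (
            audit_id INTEGER PRIMARY KEY AUTOINCREMENT,
            employee_id INTEGER,
            old_salary REAL,
            new_salary REAL,
            changed_at TEXT DEFAULT CURRENT_TIMESTAMP
        );
        -- Record every real salary change; unchanged values are skipped.
        CREATE TRIGGER log_salary_change
        AFTER UPDATE OF salary ON employees
        FOR EACH ROW
        WHEN OLD.salary <> NEW.salary
        BEGIN
            INSERT INTO employee_audit (employee_id, old_salary, new_salary)
            VALUES (OLD.id, OLD.salary, NEW.salary);
        END;
        """
    )
    return conn


conn = build_audit_demo()
conn.execute(
    "INSERT INTO employees (id, name, salary) VALUES (1, 'Sample Employee', 50000)"
)
# This update changes the salary, so the trigger writes one audit row.
conn.execute("UPDATE employees SET salary = 55000 WHERE id = 1")
# This update leaves the salary unchanged, so the WHEN clause skips it.
conn.execute("UPDATE employees SET salary = 55000 WHERE id = 1")

audit_rows = conn.execute(
    "SELECT employee_id, old_salary, new_salary FROM employee_audit"
).fetchall()
print(audit_rows)
```

Because the logging lives in the database rather than in application code, every update path is audited the same way, which is the main appeal of the trigger-based design.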