Data Scientist, Machine Learning Enthusiast, Data Visualization Engineer
Hello! I am Naisargi Dave. I am a Data Engineer at Amazon with an MS in Data Science. I am passionate about using data science techniques and algorithms to explore, analyze and extract insights from data, and about visualizing those insights in informative, interactive and creative dashboards that support decision making.
I have 3+ years of experience working with data engineering, data science, machine learning and data visualization teams in multinational corporations.

Jun 2021 – Present
Tools : AWS: DataNet, DataCraft, Cradle, Redshift, Andes, S3, Glue
- Worked with economists to build a cascading multi-stage Fisher Index pipeline using Cradle and SQL to analyze month-over-month, glance-view-weighted price inflation trends at Amazon.
- Created Redshift tables to store the staging data and the final result. Received the Pathfinder award for finding solutions that impact and change business processes.
- Built a pipeline that tracks Amazon’s price competitiveness by incorporating newly launched promotions using DataNet Extract and Load Jobs.
- Created Andes and Redshift tables to store the resultant data and backfilled it.
- Generated automated weekly reports using SQL Metric Jobs to highlight the issues in our pricing systems and aid decision making.
- To ensure similar products (e.g. two identical T-shirts differing only in color) are priced the same, built a DataNet and Redshift based metric pipeline to report the price consistency of such products at Amazon.
- Analyzed data stored in Andes tables and EDX files to identify redundant, extraneous data to be deprecated to reduce storage costs.
- Projected expense saved: 30%.
- Read nested JSON data from DynamoDB stream using DataCraft and converted it to TSV format by defining complex SDL schema in Cradle to replace unscalable and expensive DynamoDB scans.
- Created a pipeline to nudge vendors to update the price of their products on Amazon marketplace.
- Implemented the logic to identify the products that need price update using SQL in Cradle with output data written to S3 buckets.
- Created a Glue database to store and query the output data and attached schema using Crawler.
- Performed validations and data analysis using Athena.
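
For readers unfamiliar with the Fisher Index named in the first bullet of this role, here is a minimal sketch of the textbook single-period formula: the geometric mean of the Laspeyres and Paasche price indices. Using glance views as the weights in place of quantities is an assumption based on the phrase "glance view weighted". The function name, arguments and sample numbers are hypothetical, and this is not the production pipeline.

```python
from math import sqrt


def fisher_index(p0, p1, w0, w1):
    """Fisher price index between a base period (0) and a current period (1).

    p0, p1: per-product prices in each period.
    w0, w1: per-product weights in each period (assumed here to be glance views).
    """
    if not (len(p0) == len(p1) == len(w0) == len(w1)):
        raise ValueError("all inputs must have the same length")

    # Laspeyres: price change valued at base-period weights.
    laspeyres = sum(p * w for p, w in zip(p1, w0)) / sum(p * w for p, w in zip(p0, w0))
    # Paasche: price change valued at current-period weights.
    paasche = sum(p * w for p, w in zip(p1, w1)) / sum(p * w for p, w in zip(p0, w1))
    # Fisher: geometric mean of the two.
    return sqrt(laspeyres * paasche)


# Hypothetical data: two products whose prices both rise by 10%.
index = fisher_index(p0=[10, 20], p1=[11, 22], w0=[100, 50], w1=[120, 40])
print(round(index, 4))
```

A value above 1 indicates inflation between the two periods; chaining such month-over-month values gives a trend line.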

May 2020 – Nov 2020
Tools : QlikSense, PowerApps, SharePoint, Python
- Designed a database and developed an application using SharePoint and PowerApps with the Quality Assurance team for capturing the changeover details of equipment used in drug manufacturing processes.
- Visualized employee details and the acquisition and attrition rates in a Qlik Sense dashboard for the Strategic Operations team to monitor.
- Implemented data masking script using Python to handle sensitive data.
- Created a dashboard for supervisors and employees across multiple departments to track and highlight upcoming and past due requirements to ensure compliance.
- Conducted several training sessions on the dashboard for a large user base within the organization.
- Prepared and delivered presentations on several Office365 applications such as Planner, PowerApps, Power Automate, SharePoint, Forms and advanced Teams features.

Aug 2017 – Jun 2019
Tools : Python - Numpy, Pandas, Sklearn, NLTK, Matplotlib, Tableau, Hive, Hadoop
- Created a dashboard for the CIO of Reliance to monitor performance of the teams.
- Integrated data from flat files, SQL tables and SAP-HANA, modelled it and used Tableau to visualize the Key Performance Indicators (KPIs).
- Automated the classification of user queries entered in the Grievance Redressal Portal of Reliance by clustering and classifying them using natural language processing techniques, Naive Bayes and Support Vector Machine classifiers.
- Improved the efficiency of the team by approximately 30% by automating the old manual process.
- Developed a scorecard to track the performance of several teams and rank them using 17 KPIs.
- Performed statistical modelling in Python and developed visualizations using Tableau.
- Performed complex data modelling for procurement spend analysis of the organization using Hive on the Hadoop MapReduce framework. The results of the analysis were visualized using Zoomdata.

Feb 2021
Tools : Python - PyTorch, Keras, Flask, Tensorflow, SQLite, CSS, Bootstrap
- Built an application to extract transcripts from video lectures and provide short summaries, and deployed it as a web application with Flask.
- Used a Transformer model with a pretrained BERT tokenizer to summarize text transcripts, PyTorch Silero models to extract text from audio files and ffmpeg to extract audio content from videos.
- Extended the web application to serve as a platform for professors/teachers to upload videos for students with automatic summarization to aid remote education.
Code Demo Report

Feb 2021
Tools : Python - PyTorch, Keras, Flask, Google Cloud Platform - Cloud Run, Cloud Build, JavaScript, CSS, Bootstrap
- Built a BERT based model with PyTorch and Keras preprocessing libraries to detect the probability that a given news article is fake.
- Deployed the model to an application server built with Flask to serve requests from frontend applications. Also deployed the application server to Google Cloud Run by setting up automatic CI/CD pipelines from the GitHub repository.
- Created a Chrome Extension with JavaScript and Bootstrap to serve as a frontend that lets users check online articles for fake news.
- Used available APIs to suggest alternative sources of information when articles with a high probability of being fake are accessed.
Code Code (Chrome Extension) Demo Report

Aug 2020 - Dec 2020
Tools : JavaScript - React, d3.js
- Analyzed the details of space missions launched worldwide since 1957.
- Designed and developed interactive visualizations implementing techniques such as highlighting, brushing and filtering using d3.js and React to showcase the analysis.
Report Visualization

Jun 2020 – Aug 2020
Tools : Python - PyTorch, OpenCV, Matplotlib, Pandas
- Detected melanoma among images of benign and malignant skin lesions from the SIIM ISIC Melanoma Challenge Dataset.
- Used EfficientNet for feature extraction, incorporated metadata, applied data augmentation, and preprocessed the images to crop out regions of interest. Achieved 80% accuracy.
Code

Jun 2020 – Aug 2020
Tools : Python - PyTorch, OpenCV, Matplotlib, Pandas
- Performed multi-class classification to identify all the types of proteins present in the cell images from the Human Protein Classification dataset.
- Performed data pre-processing and data augmentation, and used transfer learning with a pre-trained ResNet50 model to make predictions.
Code

Jan 2020 – May 2020
Tools : Python - NLTK, Pandas, Numpy
- Created a search engine that uses BM25 to fetch the reviews most relevant to a given user query.
- Performed topic modelling using LDA.
- Generated more accurate user ratings based on Vader sentiment analysis of the user reviews.
Code
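
As an illustration of the BM25 ranking named in this project, here is a minimal from-scratch sketch of the standard Okapi BM25 scoring formula. The whitespace tokenizer, the `k1` and `b` defaults, the function name and the sample reviews are all assumptions made for the example; the project's actual implementation may differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in a non-empty list `docs` against `query` with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokenizer
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed IDF variant that stays positive for very common terms.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            # Term-frequency saturation with document-length normalization.
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical reviews, ranked for the query "battery".
reviews = ["great battery life", "terrible screen", "battery died quickly"]
scores = bm25_scores("battery", reviews)
ranked = sorted(zip(scores, reviews), reverse=True)
print(ranked)
```

Reviews that never mention a query term score zero, so they fall to the bottom of the ranking.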

Jan 2020 – May 2020
Tools : Python - NLTK, Pandas, Numpy
- Mined Twitter data using the Twitter API and the Tweepy library for tweets containing the keywords “Donald Trump” and “Joe Biden”.
- Performed exploratory and sentiment analysis to gauge the general sentiment towards each candidate and predict the winner of the 2020 presidential elections.
Code Report

Jan 2020 – May 2020
Tools : Android Studio, Java, SQLite
- Developed an Entity Relationship Model and a Database using SQLite to capture the customer and travel details such as preferred transport, hotel, restaurant and tourist attraction.
- Created an Android application that allows users to plan a trip and view their itinerary.
Code Report

Jan 2020 – May 2020
Tools : Python - NLTK, BeautifulSoup
- Crawled webpages using BeautifulSoup and performed text preprocessing using natural language processing (NLP) techniques.
- Implemented the PageRank algorithm to rank the webpages.
Code
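
For context on the PageRank step in this project, here is a minimal power-iteration sketch. The damping factor of 0.85, the fixed iteration count, the dictionary-based link graph and the page names are assumptions made for illustration, not details taken from the project.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    """
    # Collect every page, including ones that only appear as link targets.
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Rank held by pages without outlinks is spread evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        # Each page passes its rank in equal shares to the pages it links to.
        for src, targets in links.items():
            for target in targets:
                new_rank[target] += damping * rank[src] / len(targets)
        rank = new_rank
    return rank


# Hypothetical three-page web: "c" is linked from both "a" and "b".
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))
```

The ranks always sum to 1, so they can be read as the share of time a random surfer spends on each page.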

Aug 2019 – Dec 2019
Tools : Python - Numpy
- Performed feature extraction and developed a human physical activity recognition model for predicting the activity performed based on the person’s movement data.
- Evaluated and compared several statistical learning methods including Logistic Regression, Random Forest and Support Vector Machine.

December 2020 - Present
Tools : Python - Keras, TensorFlow, Numpy
- Implemented feedforward neural networks, CNNs and RNNs with backpropagation, L2 regularization and dropout regularization using only Numpy, without deep learning frameworks.
- Implemented image classification, object detection, face detection and neural style transfer.
- Built neural machine translation, trigger word detection and text generation models.

Aug 2019 - May 2021

Aug 2013 - May 2017