I am an engineering professional with 12 years of experience in data analysis and visualization and in managing the development of data-centric applications. I have knowledge of statistical analysis, machine learning, and data retrieval and processing techniques. I also have experience in product management, user research, prototyping and wireframing, and UX design.
Below are some of the projects I have worked on, with links to their reports. Thank you for your interest.
In this project I used machine learning algorithms to develop a model that identifies fraud from Enron e-mail and financial data. Throughout the project I performed data exploration, outlier analysis, feature selection and engineering, training and validation. To develop the model I used the Python scikit-learn package to select the algorithm and features that gave the best performance. The final model exceeded the target of 0.3 for both precision and recall, reaching a precision of 0.49 and a recall of 0.44.
Technology Used: Python (scikit-learn, GridSearchCV, tf-idf), Machine Learning, NLP
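Precision and recall, the two metrics quoted above, both come from the confusion counts of a binary classifier: precision is the share of flagged cases that are truly positive, and recall is the share of truly positive cases that get flagged. The minimal sketch below computes them with plain Python on invented toy labels. It is not the project's evaluation code, which would more likely use scikit-learn's own metrics (for example `sklearn.metrics.precision_score` and `recall_score`).

```python
# Minimal sketch: precision and recall for a binary classifier.
# The labels below are invented toy values (1 = positive class), not project data.

def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]
p, r = precision_recall(y_true, y_pred)

# A target like "at least 0.3 for precision and recall" requires both metrics to clear it.
meets_target = p > 0.3 and r > 0.3
print(f"precision={p:.2f} recall={r:.2f} meets_target={meets_target}")
```

Requiring both metrics to clear the threshold matters for imbalanced data such as fraud labels. A model that flags nobody can still score high accuracy, but its recall is zero.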
In this project I performed data exploration of red wine data using the R programming language. The objective of the analysis was to use data exploration and charting techniques in R to identify the chemical properties that influence wine quality. During the project I performed univariate, bivariate and multivariate analyses using data visualization, regression and correlation. The report was generated using the knitr package.
Technology Used: R (ggplot2, dplyr, tidyr, gridExtra, GGally, knitr)
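The correlation analysis mentioned above rests on the Pearson correlation coefficient, which measures how strongly two variables move together linearly. The project itself was done in R, where `cor()` computes this. To keep all examples on this page in one language, the sketch below shows the same calculation in Python. The alcohol and quality values are invented for illustration and do not reproduce the report's findings.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance numerator and the two root sums of squared deviations.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical values: alcohol content (% by volume) and quality score for five wines.
alcohol = [9.4, 9.8, 10.5, 11.2, 12.8]
quality = [5, 5, 6, 6, 7]
r = pearson(alcohol, quality)
print(f"r = {r:.3f}")
```

A coefficient near +1 or -1 suggests a strong linear relationship worth charting further. A value near 0 rules out only a linear relationship, not every kind of dependence.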
In this project I analyzed OpenStreetMap data for the Los Angeles area using Python and MongoDB. The map data was downloaded in XML format, and Python was used to parse the entries, check them for validity, accuracy, completeness, consistency and uniformity, and clean them. The data was then converted to JSON format and uploaded to MongoDB. Finally, I queried the data to produce summary analytics on the distribution of basic points of interest and popular amenities.
Technology Used: Python, MongoDB, XML, JSON
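The text does not give the project's actual parsing and cleaning rules. The following is a minimal sketch of the XML-to-JSON step described above, built on the standard library's `xml.etree.ElementTree`. The XML snippet, the street-abbreviation rule and the document shape (an `address` sub-document and a `pos` field) are hypothetical choices for illustration. For a full city extract, `ET.iterparse` would avoid loading the whole file into memory.

```python
import json
import xml.etree.ElementTree as ET

# A tiny, hypothetical OSM-style XML snippet (real extracts are far larger).
OSM_XML = """<osm>
  <node id="1" lat="34.05" lon="-118.25">
    <tag k="amenity" v="cafe"/>
    <tag k="addr:street" v="Main St."/>
  </node>
  <node id="2" lat="34.06" lon="-118.24">
    <tag k="amenity" v="restaurant"/>
  </node>
</osm>"""

# Hypothetical cleaning rule: expand a few abbreviated street types.
STREET_FIXES = {"St.": "Street", "St": "Street", "Ave": "Avenue"}


def clean_street(name):
    """Replace a trailing abbreviated street type with its full form."""
    parts = name.split()
    if parts and parts[-1] in STREET_FIXES:
        parts[-1] = STREET_FIXES[parts[-1]]
    return " ".join(parts)


def node_to_doc(node):
    """Convert one <node> element into a JSON-ready dict."""
    doc = {
        "id": node.get("id"),
        "type": node.tag,
        "pos": [float(node.get("lat")), float(node.get("lon"))],
    }
    for tag in node.iter("tag"):
        key, value = tag.get("k"), tag.get("v")
        if key.startswith("addr:"):
            # Nest address fields in one sub-document: addr:street -> address.street
            sub = key.split(":", 1)[1]
            if sub == "street":
                value = clean_street(value)
            doc.setdefault("address", {})[sub] = value
        else:
            doc[key] = value
    return doc


root = ET.fromstring(OSM_XML)
docs = [node_to_doc(n) for n in root.iter("node")]
json_text = json.dumps(docs, indent=2)
print(json_text)
# With pymongo, documents like these could then be loaded via collection.insert_many(docs).
```

Once documents are in this shape, an amenity summary becomes a short MongoDB aggregation that groups on the `amenity` field and counts the documents in each group.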
In this project I analyzed factors influencing New York City subway ridership, using the NumPy, Pandas and Matplotlib packages in Python to perform data exploration and regression analysis. The project involved combining weather data with subway turnstile data and running a regression to determine which factors influence ridership.
Technology Used: Python (NumPy, Pandas and Matplotlib)
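The two steps described above, joining weather records to turnstile records and then fitting a regression, can be sketched as follows. The project used pandas and NumPy, where the join would typically be `pd.merge` and a multi-feature fit something like `np.linalg.lstsq`. This stdlib-only version instead does an inner join on a hypothetical (date, hour) key and fits ordinary least squares with a single predictor. The records and the choice of temperature as the predictor are invented for illustration.

```python
# Hypothetical hourly entry counts keyed by (date, hour); values are invented.
turnstile = {
    ("2011-05-01", 8): 1000,
    ("2011-05-01", 12): 1120,
    ("2011-05-02", 8): 1180,
    ("2011-05-02", 12): 1300,
    ("2011-05-03", 8): 1250,  # no matching weather record, so the join drops it
}
# Hypothetical temperatures (degrees F) for the same keys.
weather = {
    ("2011-05-01", 8): 60.0,
    ("2011-05-01", 12): 65.0,
    ("2011-05-02", 8): 70.0,
    ("2011-05-02", 12): 75.0,
}

# Inner join: keep only hours present in both sources.
rows = [(weather[k], turnstile[k]) for k in turnstile if k in weather]
xs = [x for x, _ in rows]
ys = [y for _, y in rows]


def ols_fit(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


intercept, slope = ols_fit(xs, ys)
print(f"entries ~= {intercept:.1f} + {slope:.2f} * temperature (n={len(rows)})")
```

The inner join is the key design choice. Hours without a weather reading are silently dropped, so checking how many rows survive the join is worth doing before interpreting any coefficient.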
In this project I used descriptive statistics and a statistical test to analyze the statistical significance of the Stroop effect, a classic result of experimental psychology.
Technology Used: Descriptive and Inferential Statistics
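The text does not name the statistical test used. In a within-subjects design like the Stroop task, each participant is timed under both the congruent and incongruent conditions, and a paired t-test on the per-participant differences is a common choice. The sketch below assumes that test. The completion times are invented, and the critical value 2.776 is the standard two-tailed t-table value for alpha = 0.05 with 4 degrees of freedom.

```python
import math
import statistics

# Hypothetical completion times in seconds for the same five participants.
congruent = [12.0, 15.0, 11.0, 14.0, 13.0]
incongruent = [19.0, 22.0, 18.0, 24.0, 17.0]

# Paired design: work with each participant's difference (incongruent - congruent).
diffs = [i - c for c, i in zip(congruent, incongruent)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation (divides by n - 1)

# t statistic: mean difference over its standard error.
t_stat = mean_diff / (sd_diff / math.sqrt(n))
df = n - 1

T_CRIT = 2.776  # two-tailed critical value, alpha = 0.05, df = 4 (t table)
significant = abs(t_stat) > T_CRIT
print(f"mean diff={mean_diff:.2f}s, t({df})={t_stat:.2f}, significant={significant}")
```

Pairing removes between-person variation in overall reading speed, which is why this test is typically more sensitive here than an independent two-sample t-test.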
© 2016 Ajay Das | Connect with me on LinkedIn