I was given a list of over 2,000 company names and asked whether it was possible to find which of them we hold as accounts in our Dynamics CRM. There were no unique identifiers with which to match the companies directly, only their names, which could also be spelt differently in the two sources.
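One way to approach matching names that may be spelt differently is approximate string matching. The snippet below is a minimal sketch only, assuming base R's adist() (edit distance) is an acceptable similarity measure. The sample names, the normalise() helper and the 0.5 cut-off are all made up for illustration and are not taken from the actual company list or CRM.

```r
# Hypothetical sample data; the real lists are not shown here.
input_names <- c("Acme Ltd", "Globex Corporation", "Initech")
crm_names   <- c("ACME Limited", "Globex Corp", "Umbrella Group")

# Illustrative helper: lower-case, replace punctuation with spaces,
# collapse repeated whitespace and trim the ends.
normalise <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]", " ", x)
  x <- gsub("[[:space:]]+", " ", x)
  trimws(x)
}

a <- normalise(input_names)
b <- normalise(crm_names)

# Edit-distance matrix: rows are input names, columns are CRM names.
d <- adist(a, b)

# Divide by the length of the longer string so that scores are
# comparable across short and long names (0 = identical).
score <- d / outer(nchar(a), nchar(b), pmax)

# For each input name, pick the CRM name with the lowest score.
best <- apply(score, 1, which.min)
matches <- data.frame(
  input     = input_names,
  crm_match = crm_names[best],
  score     = score[cbind(seq_along(best), best)],
  stringsAsFactors = FALSE
)

# Assumed cut-off: treat anything above 0.5 as "no match found".
matches$crm_match[matches$score > 0.5] <- NA

print(matches)
```

With a real list, the cut-off would need tuning against a hand-checked sample, and borderline scores are probably best reviewed manually rather than accepted automatically.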
Data Programming Tools
For this project R will be used as the core programming tool, with RStudio as the IDE. R is a statistical programming language based on the S programming language. It was initially released in 1995 by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. According to Piatetsky (2015), R was the most widely used tool for predictive modelling in 2014, with a 38% share of users compared with 31% for its nearest competitor, RapidMiner. R is used in a variety of areas, including data mining and data analysis, and it has a wide range of tools available that can be used to create this solution.
The objective of this assignment is to analyse a dataset concerning bike rentals. The dataset is based on real data from Capital Bikeshare, a company that maintains a bike rental network in Washington DC. It has one row for each hour of each day in 2011 and 2012, for a total of 17,379 rows. It contains features of the day (workday, holiday) as well as weather parameters such as temperature and humidity. The number of bike rentals per hour is stored in the field ‘cnt’ and ranges from 1 to 977. Our task is to develop a prediction model for the number of bike rentals so that Capital Bikeshare can predict bike usage in advance.