Learning Try R


R is a tool for statistics and data modeling. It is an elegant and versatile programming language and has a highly expressive syntax designed around working with data. R also includes extremely powerful graphics capabilities. If you want to easily manipulate your data and present it in compelling ways, R is the tool for you.

The seven chapters of Try R:

  1. Try R
  2. Vectors
  3. Matrices
  4. Summary Statistics
  5. Factors
  6. Data Frames
  7. Real-World Data

Chapter One (Try R) covers basic R expressions with numbers, strings, and TRUE/FALSE values, how to store those values in variables and pass them to functions, and how to get help on functions. With “Expressions”, you can type anything at the prompt and R will evaluate it and print the answer, e.g. > 1 + 1 gives [1] 2. Likewise, typing the string "Arr, matey!" gives [1] "Arr, matey!".

“Logical Values” are TRUE or FALSE, also referred to as “Boolean” values, e.g. > 3 < 4 gives [1] TRUE. A double-equals sign checks whether two values are equal, e.g. > 2 + 2 == 5 gives [1] FALSE.
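As a minimal sketch of the expressions and logical values described above (the variable names less_than and is_equal are only illustrative and are not taken from the course):

```r
# Arithmetic and string expressions are evaluated and printed at the prompt
1 + 1
"Arr, matey!"

# Comparison operators return logical (Boolean) values
less_than <- 3 < 4        # TRUE, because 3 is less than 4
is_equal  <- 2 + 2 == 5   # FALSE, because 4 is not equal to 5
print(less_than)
print(is_equal)
```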

“Variables” let you store a value to access it later, e.g. > x <- 42 stores the value 42 in x.

“Functions” are called by typing the function name followed by one or more arguments in parentheses, e.g. > sum(1, 3, 5) gives [1] 9.

“Help”: help(functionname) brings up help for the given function.

“Files”: R commands can also be written in a plain text file, by convention with a .R extension, to be executed later. The list.files function lists the files in the current directory, and the source function runs a script.
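The following short sketch pulls these four ideas together. The script name in the last comment is hypothetical and is shown only to illustrate how source would be called:

```r
# Store a value in a variable and reuse it later
x <- 42
half <- x / 2

# Call a function by name with its arguments in parentheses
total <- sum(1, 3, 5)
chant <- rep("Yo ho!", times = 3)

# help(sum) would open the documentation page for sum (interactive use)

# list.files() returns the names of the files in the working directory
files <- list.files()

# source("myscript.R") would run the commands saved in that file
# (myscript.R is a hypothetical file name)
print(total)
```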

Chapter Two (Vectors) covers vectors, the lists of values that R relies on for many of its operations. A vector is created with the c function, e.g. c(4, 7, 9). “Sequence Vectors” can be created with the start:end notation, e.g. 5:9, or with the seq function, e.g. seq(5, 9); both give [1] 5 6 7 8 9.

“Vector Access” retrieves a value within a vector by providing its numeric index in square brackets. For example, after > sentence <- c('walk', 'the', 'plank'), typing sentence[3] gives [1] "plank".
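A small sketch of sequences and vector access, assuming nothing beyond base R (the names s1, s2, s3 and third are illustrative):

```r
# Two ways to build the same sequence
s1 <- 5:9
s2 <- seq(5, 9)

# seq also accepts a step size as a third argument
s3 <- seq(5, 9, 0.5)

# Indexing starts at 1 in R
sentence <- c('walk', 'the', 'plank')
third <- sentence[3]

# Assigning to an index replaces that element
sentence[3] <- "dog"

# A vector of indices retrieves several elements at once
first_and_third <- sentence[c(1, 3)]
print(first_and_third)
```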

“Vector Names”: you can assign names to a vector's elements by passing a second vector filled with names to the names assignment function, e.g. > ranks <- 1:3 followed by > names(ranks) <- c("first", "second", "third").
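Once names are assigned, they can be used in place of numeric indices. A brief sketch (the variable first_rank is illustrative):

```r
ranks <- 1:3
names(ranks) <- c("first", "second", "third")

# Look up a value by name; [[ ]] returns the value without its name
first_rank <- ranks[["first"]]

# Printing a named vector shows the names above the values
print(ranks)
```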

“Plotting One Vector”: the barplot function draws a bar chart of a vector's values, e.g. > vesselsSunk <- c(4, 5, 1) followed by > barplot(vesselsSunk).

“Vector Math”: if you add a single value to a vector, the scalar is added to each value in the vector, returning a new vector with the results, e.g. after > a <- c(1, 2, 3), typing a + 1 gives [1] 2 3 4. The same goes for division, multiplication, or any other basic arithmetic.
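A short sketch of vector arithmetic. The second vector b is an illustrative addition, used to show that two vectors of equal length are combined element by element:

```r
a <- c(1, 2, 3)

# A scalar is applied to every element
plus_one <- a + 1
doubled  <- a * 2
halved   <- a / 2

# Two vectors of the same length are combined element by element
b <- c(4, 5, 6)
summed <- a + b

print(plus_one)
print(summed)
```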

“Scatter Plots”: the plot function takes two vectors, one for X values and one for Y values, and draws a graph of them.

“NA Values”: when working with sample data, a given value is sometimes not available, but it is not a good idea to just throw those values out. R has a value, NA, that explicitly indicates a sample was not available.
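To illustrate how NA behaves, here is a minimal sketch with made-up sample values. Many functions return NA when any input is missing, unless told to skip missing values with na.rm = TRUE:

```r
# Illustrative sample with one missing value
a <- c(1, 3, NA, 7, 9)

# The sum is unknown when one of the values is unknown
with_na <- sum(a)

# na.rm = TRUE drops the missing value before summing
without_na <- sum(a, na.rm = TRUE)

# is.na reports which elements are missing
missing <- is.na(a)
print(without_na)
```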

Chapter Three (MATRICES) covers matrices, which is just a fancy term for 2-dimensional arrays; they help when you need data in rows and columns. For instance, to make a matrix 3 rows high by 4 columns wide with all its fields set to 0, type matrix(0, 3, 4) into the console; the result is a 3-by-4 matrix of zeros.

The chapter also covers other matrix operations such as MATRIX ACCESS and MATRIX PLOTTING.
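A minimal sketch of creating and accessing matrices. The matrix named plank and its values 1 to 8 are illustrative; note that R fills a matrix column by column:

```r
# A 3-by-4 matrix with every field set to 0
m <- matrix(0, 3, 4)
print(m)

# A 2-by-4 matrix filled column by column with the values 1 to 8
plank <- matrix(1:8, 2, 4)

# Access one field with [row, column]
one_field <- plank[2, 3]

# Leave the column blank to get a whole row, or the row blank for a whole column
second_row    <- plank[2, ]
fourth_column <- plank[, 4]

# For matrix plotting, base R offers contour(), persp() and image()
```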

Chapter Four (SUMMARY STATISTICS) explains how to summarise data. It covers the mean, median, and standard deviation functions and how their results can be displayed on a graph.
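The three functions can be sketched as follows. The limbs vector holds made-up values for illustration only:

```r
# Illustrative sample values
limbs <- c(4, 3, 4, 3, 2, 4, 4, 4)

average <- mean(limbs)     # arithmetic mean
middle  <- median(limbs)   # middle value of the sorted data
spread  <- sd(limbs)       # sample standard deviation

print(c(mean = average, median = middle, sd = spread))

# On a bar chart, abline(h = mean(limbs)) would draw the mean as a horizontal line
```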

Chapter Five (FACTORS) helps to group data by category. R uses the factor function to track categorized values.
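A small sketch of how a factor tracks categories, using illustrative values. The levels are the distinct categories in sorted order, and each element is stored as an integer code pointing at its level:

```r
chests <- c('gold', 'silver', 'gems', 'gold', 'gems')
types <- factor(chests)

# The distinct categories, sorted alphabetically
categories <- levels(types)

# The integer code behind each element
codes <- as.integer(types)

# Count how many values fall in each category
counts <- table(types)
print(counts)
```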

Chapter Six (DATA FRAMES): R uses data frames to structure data much like an Excel spreadsheet or a database table. In this chapter, Data Frame Access shows how to access individual portions of a data frame, Loading Data Frames shows how to load data from external files, and Merging Data Frames shows how to join two data frames together using the contents of one or more columns.
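Access and merging can be sketched like this. All of the values, the column names, and the file name mentioned in the comment are hypothetical:

```r
# Two small illustrative data frames that share a "chest" column
treasure <- data.frame(
  chest  = c("gold", "silver", "gems"),
  weight = c(300, 200, 100),
  stringsAsFactors = FALSE
)
ports <- data.frame(
  chest = c("gems", "gold", "silver"),
  port  = c("Tortuga", "Nassau", "Havana"),
  stringsAsFactors = FALSE
)

# Access a whole column with $ or a single row with [row, ]
weights    <- treasure$weight
second_row <- treasure[2, ]

# read.csv("treasure.csv") would load a data frame from a file
# (treasure.csv is a hypothetical file name)

# merge joins on the column(s) the two data frames have in common
combined <- merge(x = treasure, y = ports)
print(combined)
```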

Chapter Seven (REAL-WORLD DATA) is about correlation between data sets and charting it, and about adding some of R's extra functionality, such as ggplot2, which improves the appearance of the charts.

Bar and Line Graph ggplot2

R GRAPH

 

To produce the ggplot2 graph above:

I found the Cookbook for R page at http://www.cookbook-r.com/Graphs/Bar_and_line_graphs_(ggplot2)/

I opened my R console and opened a new script in the R editor by clicking File in the top-left corner of the R console. Then I went back to the Cookbook for R.

Firstly, I copied some sample data (derived from the tips dataset in the reshape2 package) from under the “Bar graphs of values” heading:
dat <- data.frame(
    time = factor(c("Lunch", "Dinner"), levels = c("Lunch", "Dinner")),
    total_bill = c(14.89, 17.23)
)
dat
#>     time total_bill
#> 1  Lunch      14.89
#> 2 Dinner      17.23

# Load the ggplot2 package
library(ggplot2)
I pasted this into my R editor, right-clicked, and ran it; the output appeared automatically in my R console.

 

Secondly, I went back to the Cookbook for R and copied the code containing the variable mappings (time: x-axis and sometimes colour fill; total_bill: y-axis) into my R editor, right-clicked, and ran it, which automatically popped up the graph above in my R graphics device.

 


To produce all the graphs above, I copied and pasted the code below into my R console and ran each block one after the other.
# Very basic bar graph
ggplot(data=dat, aes(x=time, y=total_bill)) +
    geom_bar(stat="identity")

# Map the time of day to different fill colors
ggplot(data=dat, aes(x=time, y=total_bill, fill=time)) +
    geom_bar(stat="identity")

## This would have the same result as above
# ggplot(data=dat, aes(x=time, y=total_bill)) +
#     geom_bar(aes(fill=time), stat="identity")

# Add a black outline
ggplot(data=dat, aes(x=time, y=total_bill, fill=time)) +
    geom_bar(colour="black", stat="identity")

# No legend, since the information is redundant
# (recent ggplot2 releases deprecate guides(fill=FALSE) in favour of "none")
ggplot(data=dat, aes(x=time, y=total_bill, fill=time)) +
    geom_bar(colour="black", stat="identity") +
    guides(fill="none")

In conclusion, R is a great tool to work with if you can use it properly.

Big Data Analytics


“You can’t manage what you don’t measure” (attributed to W. Edwards Deming and Peter Drucker)

To my understanding, since big data analytics was embraced, the world has become a better place to live and business has become more interesting. Organizations now have a better understanding of their businesses and can react quickly to any situation that might set them back. They can also draw more insight from their large and complex datasets to predict future customer behaviours, trends, and outcomes.

Looking at the definition of Big Data:

“Big Data is high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation.” (Gartner, 2014)

Big data is data that exceeds the processing capacity of conventional database systems. This high-velocity, high-volume, and high-variety data, whether structured, semi-structured, or unstructured, is generated from sources such as social networks, web logs, traffic flow sensors, satellite imagery, broadcast audio streams, MP3s of rock music, web page content, scans of government documents, GPS trails, telemetry from automobiles, banking transactions, and financial market data.

Volume: the scale of data, i.e. the mass quantities of data that organizations are trying to harness to improve decision-making across the enterprise. Data volumes continue to increase at an unprecedented rate.

Velocity: the analysis of streaming data, or data in motion. The speed at which data is generated, processed, and analyzed continues to skyrocket, driven by the real-time nature of how the data is generated.

Variety: the multiple complex data types and forms, structured, semi-structured, and unstructured, that come from a wide array of both traditional and non-traditional information sources within and outside the enterprise.

The value of big data falls into two categories: analytics and enabling new products.

Big data analytics is the application of advanced analytic techniques to very large, diverse data sets that often include varied data types and streaming data. (TDWI)

These analytics explore the granular details of business operations and customer interactions that seldom find their way into a data warehouse or standard report, including unstructured data coming from sensors, devices, Web applications, and social media, much of it sourced in real time on a large scale. Using advanced analytics techniques such as predictive analytics, data mining, statistics, and natural language processing, together with tools like Hadoop and MapReduce, businesses can study big data to understand the current state of the business and track evolving aspects such as customer behavior.

Business analytics enables businesses to identify and visualize trends and patterns in areas such as customer analysis that can have a profound effect on business performance. They can compare scenarios, anticipate potential threats and opportunities, better plan, budget, and forecast resources, balance risks against expected returns, and work to meet regulatory requirements.
Big data analytics can reveal hidden patterns in data such as peer influence among customers, shoppers' transactions, and social and geographical data.

 

As an enabler of new products and services: by combining a large number of signals from users' actions and those of their friends, Facebook was able to craft a highly personalised user experience and create a new advertising business.

Consider the story of Amazon, which began as a startup and, with the help of big data, grew into a massive business.
Amazon achieved this with the help of big data analytics: by tracking what customers bought, what they showed interest in, and how they navigated the site, it was able to predict what a customer is likely to buy next.
“The traditional bookstore had no chance to access this evidently valuable information” (McAfee et al. 2012).

With the help of big data analytics provided to Dublin City by IBM, the city was able to identify and solve the root causes of traffic congestion in its public bus network, which improved traffic flow and gave commuters better mobility.
Integrating data from a network of sensors with geospatial data made it easier for city officials to monitor and manage the fleet of buses in real time.

The road and traffic department was able to combine Big Data streaming in from an array of sources – bus timetables, inductive-loop traffic detectors, rain gauges and closed-circuit television cameras, GPS updates that each of the city’s 1,000 buses transmits every 20 seconds – and build a digital map of the city overlaid with the real-time positions of Dublin’s buses using stream computing and geospatial data.

With big data analytics, traffic controllers can now see the current status of the entire bus network at a glance and rapidly spot and drill down into a detailed visualization of areas of the network that are experiencing delays. These insights, and the interface that visualizes the data, gave them an opportunity to identify the cause of a delay as it emerges and before it moves further downstream. Hence, they were able to accelerate the decision-making process and clear congestion more swiftly. Using advanced analytics on data collected in real time on each bus's journey, they were able to home in on network issues quickly and respond faster.

In conclusion,

Applying this to my own situation as a college student, getting to college has become much easier: I leave my house at the appropriate time instead of standing in the cold waiting for a bus whose arrival time I am not sure of. This saves me time and energy and makes it possible for me to be punctual, and having less to worry about is good for one's health. The real-time bus timetable provided by Dublin Bus has made life easier for commuters like me.

Small business owners can now enjoy the benefits of big data that were formerly available only to statisticians and multinational enterprises. SMEs now have access to cost-efficient, useful data-driven tools and analytical systems to gain meaningful insight into markets and competitors, and are thereby able to measure their business performance and discover new products and services. Organizations can now gain greater insight into their business and predict consumer preferences. Big data can play a significant economic role to the benefit not only of private commerce but also of national economies and their citizens.

References:

Is Big Data too Big for SMEs? MS&E 238: Leading Trends in Information Technology, Stanford University, Summer 2014. http://web.stanford.edu/class/msande238/projects/2014/GainIT.pdf. Accessed August 2015, 9:13 pm.

Stephen P. Robbins, David A. DeCenzo, and Mary Coulter. Fundamentals of Management, 8th ed., p. 154.

Big data: The next frontier for innovation, competition, and productivity. Accessed 20/8/2015, 9:25 pm.

IBM News Room, 2013. Big data helps city of Dublin improve its public bus transportation network and reduce congestion. http://m.ibm.com/http/www-03.ibm.com/press/us/en/pressrelease/41068.wss

 

Business Intelligence (BI)


DATA + INFORMATION + KNOWLEDGE  + DECISION MAKING

Business Intelligence is the collection of information about your customers, your competitors, your business partners, your competitive environment and your own internal operations that gives you the ability to make effective, important and strategic business decisions.

“Business Intelligence” is a term used by hardware and software vendors and information technology consultants to describe the infrastructure for warehousing, integrating, reporting, and analyzing data that comes from the business environment. The foundation infrastructure collects, stores, cleans, and makes relevant information available to managers, e.g. databases, data warehouses, and data marts. “Business Analytics” is also a vendor-defined term that focuses more on tools and techniques for analyzing and understanding data, e.g. online analytical processing (OLAP), statistics, models, and data mining.

The underlying foundation is a powerful database system that captures all the relevant data to operate the business; the data is stored in transactional databases or combined and integrated into an enterprise data warehouse or a series of interrelated data marts. The results from BI are delivered to managers through MIS, DSS, and ESS platforms. SAP, Oracle, IBM, SAS Institute, and Microsoft are the five major vendors of these software and hardware suites.

According to MIS Quarterly, as a data-driven approach, BI&A has its roots in the long-standing database management field, which relies heavily on various data collection, extraction, and analytics technologies (Chaudhuri et al. 2011; Turban et al. 2008; Watson and Wixom 2007).

These technologies and applications are considered BI&A 1.0, where data are mostly structured, collected by companies through various legacy systems, and stored in commercial relational database management systems (RDBMS). Data management and warehousing are considered the foundation of BI&A.

The goal of Business Intelligence & Analytics is to deliver accurate, nearly real-time information to decision makers.

Main functionalities of BI systems are:

  1. Production reports
  2. Parameterized reports
  3. Dashboards/scorecards
  4. Ad hoc query search/report creation
  5. Drill down
  6. Forecasts, scenarios, models

The users of these functionalities are:

  • Casual users (80% of users), who rely on production reports
  • Senior executives, who use monitoring functionalities for firm activities through visual interfaces like dashboards and scorecards
  • Middle managers and analysts, who carry out ad hoc analysis
  • Operational employees, who use prepackaged reports, e.g. sales forecasts, customer satisfaction, loyalty and attrition, supply chain backlog, etc.

Given the information now available through big data, business intelligence (BI) as a concept provides a means to obtain crucial information to improve strategic decisions, and therefore plays an important role in current decision support systems (Inmon 2005).

According to Kimball et al. (2008), the data warehouse industry – as the technological basis of BI – has reached full maturity and acceptance in the business world.

Many enterprises are making use of BI to gain competitive advantage. For instance, a case study describes how a fashion company (Desigual) used a business intelligence tool from a vendor called Board to analyse, monitor, and improve its business performance across varied channels comprising 200 single-brand stores (retail), 1,700 department store concessions, 30 franchises (franchising), and over 7,000 multi-brand clients (wholesale), with distribution in over 70 countries on 5 continents.

According to Board, Desigual used a standard fashion data model based on items (colour/size), sub-families, families, and collections. The data are analysed with special attention paid to organic growth, e.g. the evaluation of increased sales excluding the contribution made by the opening of new stores in the case of the retail channel and by the acquisition of new client stores in the case of the wholesale channel.

With the BI tool, the company was able to carry out an hourly analysis of the footfall-to-sales conversion ratio and accurately evaluate the level of service provided to customers and the performance of sales staff in each individual store.
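As a rough sketch of what an hourly conversion analysis might look like, the R code below assumes that the conversion ratio is the number of sales divided by the number of visitors in each hour. The figures are made up for illustration and are not Desigual's data:

```r
# Hypothetical hourly counts for one store
hourly <- data.frame(
  hour     = 10:13,
  footfall = c(40, 80, 100, 50),   # visitors entering the store
  sales    = c(4, 12, 25, 5)       # completed sales
)

# Conversion ratio: share of visitors who bought something (assumed definition)
hourly$conversion <- hourly$sales / hourly$footfall

# Hour with the highest conversion ratio
best_hour <- hourly$hour[which.max(hourly$conversion)]
print(hourly)
```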

With the BI tool in use, planning and distribution to sales outlets were analysed; the tool provided the company with a precise, up-to-the-minute overview of the entire logistics flow at any given time.

Customer base segmentation was analysed through loyalty cards. Here the company divided its customer base along two dimensions, average spend and frequency of purchase; from this analysis four macro-segments were identified: High frequency, High spend; High frequency, Low spend; Low frequency, High spend; and Low frequency, Low spend.

Tableau software is another example of a business intelligence tool that shows and tells. It shows you the story that’s hidden in your data, then helps you tell it to others in a clear and compelling way.
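A two-dimensional segmentation of this kind can be sketched in R as follows. The customer figures are invented, and splitting each dimension at its median is only an assumption; the case study does not say which thresholds were used:

```r
# Hypothetical loyalty-card summary, one row per customer
customers <- data.frame(
  id                 = 1:6,
  avg_spend          = c(120, 35, 90, 20, 150, 40),
  purchases_per_year = c(10, 12, 2, 1, 8, 9)
)

# Assumed thresholds: the median of each dimension
spend_cut <- median(customers$avg_spend)
freq_cut  <- median(customers$purchases_per_year)

# Label each customer on both dimensions and combine the labels
customers$segment <- paste(
  ifelse(customers$purchases_per_year > freq_cut, "High frequency", "Low frequency"),
  ifelse(customers$avg_spend > spend_cut, "High spend", "Low spend"),
  sep = ", "
)

# Number of customers in each macro-segment
print(table(customers$segment))
```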

 

In conclusion,

In today’s competitive corporate world, an organization that wants to stay ahead of its rivals needs to recognise that the data and information it collects is a key asset. Data visualization tools help users see patterns and relationships in large amounts of data that would be difficult to discern if the data were presented as traditional lists of text. The ability to utilise and analyse this data provides the business with the intelligence needed to make both strategic and operational decisions. With BI, an organisation can effectively measure its business strategy and leverage its data to make quicker and better decisions.

References:

Kenneth C. Laudon and Jane P. Laudon. Management Information Systems: Managing the Digital Firm, 12th ed.

www.board.com/us/case-studies-12/2013-01 – 10- 17-42-54/item/227-Desigual

Business Intelligence and Analytics: From Big Data to Big Impact. http://hmchen.shidler.hawaii.edu/Chen_big_data_MISQ_2012.pdf. Accessed August 2015, 02:56 am.

INFORMATION SYSTEMS

In the 21st century, I believe information systems are indispensable for any business that wants a well-structured and well-coordinated business model in order to improve operational excellence, gain competitive advantage, and grow market share.

Information systems improve business processes by automating many steps that were formerly performed manually, for instance generating an invoice or a shipping order.

So, what is an Information System?

An information system can be defined technically as a set of interrelated components that collect (or retrieve), process, store, and distribute information to support decision making and control in an organization (Laudon & Laudon, 12th ed., p. 47).

In addition, information systems also help managers and workers to analyse problems, visualise complex subjects, and create new products.

Information systems contain information about significant people, places, and things within the organisation.

Input, processing, output, and feedback are the activities of an information system.

Input captures or collects raw data from within the organisation or from its external environment.

Processing converts this raw input into a meaningful form.

Output transfers the processed information to the people who will use it or to the activities for which it will be used.

Feedback is output that is returned to appropriate members of the organisation to help them evaluate or correct the input stage.

To achieve the benefits of information systems, they must be built with a clear understanding of the organization in which they will be used. Factors to consider when building information systems are:

  • The environment in which the organisation must function
  • The structure of the organization; hierarchy, specialization, routines, and business processes
  • The organization’s culture and politics
  • The type of organization and its style of leadership
  • The principal interest groups affected by the system and the attitudes of workers who will be using the system
  • The kinds of tasks, decisions, and business processes that the information system is designed to assist.

TYPES OF INFORMATION SYSTEMS

There are different kinds of information systems because there are different interests, specialties, and levels in organizations. As the saying goes, “No single system can provide all the information an organization needs.”

  • Transaction Processing Systems (TPS)
  • Management Information Systems (MIS)
  • Decision Support Systems (DSS)
  • Executive Support Systems (ESS)

Transaction Processing Systems (TPS) are computerised systems that perform and record the daily routine transactions necessary to conduct business, such as sales order entry, hotel reservations, payroll, employee record keeping, and shipping. They are designed to serve operational-level managers, who need to track the activities and transactions of the organisation, such as sales, receipts, cash deposits, credit decisions, payroll, and the flow of materials into the factory. For instance, the TPS for payroll processing keeps track of money paid to employees, using time sheets with each employee’s name, social security number, and hours worked per week.

TPS also provide information for the other systems and business functions. Middle managers use them to monitor the status of internal operations and the firm’s relations with the external environment, and their data feeds reports of interest to management and government agencies.

Management Information Systems (MIS) provide middle managers with reports on the organization’s current performance; this information is used to monitor and control the business and to predict future performance.

Decision Support Systems (DSS) support non-routine decision making. They focus on problems that are unique and rapidly changing, for which the procedure for arriving at a solution may not be fully predefined. They rely heavily on modelling, using mathematical or analytical models to perform what-if or other kinds of analysis, e.g. what happens if we raise product prices by 5%? They are designed for middle managers and executives.
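The price question above can be sketched as a very simple what-if model in R. The function name, the baseline figures, and the price elasticity of -1.5 are all assumptions chosen for illustration; a real DSS would use models fitted to the firm's own data:

```r
# Revenue after a price change, assuming demand responds linearly
# according to a given price elasticity (hypothetical model)
what_if_revenue <- function(price, units, price_change, elasticity) {
  new_price <- price * (1 + price_change)
  new_units <- units * (1 + elasticity * price_change)
  new_price * new_units
}

# Hypothetical baseline: 1000 units sold at a price of 20
baseline <- 20 * 1000

# What happens if we raise prices by 5% and elasticity is -1.5?
scenario <- what_if_revenue(price = 20, units = 1000,
                            price_change = 0.05, elasticity = -1.5)

print(c(baseline = baseline, scenario = scenario))
```

Under these assumed numbers the extra revenue per unit does not make up for the units lost, which is exactly the kind of trade-off a what-if analysis is meant to expose.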

Executive Support Systems (ESS) help senior management (C-level managers) address non-routine decisions requiring judgement, evaluation, and insight, and focus on the really important performance information that affects the overall profitability and success of the firm. ESS present graphs and data from many sources through an interface that is easy for senior managers to use. ESS are designed to incorporate data about external events, such as tax laws or competitors. They also draw summarized information from internal MIS and DSS.

In conclusion,

From a business perspective, information systems are part of a series of value-adding activities for acquiring, transforming, and distributing information that managers need to improve decision making, enhance organisational performance, and ultimately increase firm profitability.

Being able to analyse business processes gives me an understanding of how business actually works in a formal environment where an information system is adopted. An information system makes it possible to manage all the information about customers, suppliers, employees, invoices, payments, and even products and services. It is most helpful to managers in their roles of disseminating information, providing liaison between organisational levels, allocating resources, and making better decisions.

References

Kenneth C. Laudon and Jane P. Laudon. Management Information Systems, 12th ed.

 

GOOGLE FUSION TABLES ON IRISH POPULATION

Population of Ireland 2011: Heat map above

Above is the image of a merged heat map of the population of the Republic of Ireland based on the 2011 Census data collected from the Irish Central Statistics Office (CSO). The counties are represented in five different colours based on their population sizes. The purple and blue colours represent the most populated counties, and the light blue, yellow, and orange represent the less populated counties.

TO ACHIEVE THIS GOOGLE FUSION HEAT MAP:

To get this visual map of the Irish population, I used Google Fusion Tables to glean some mapping information from the Irish Population Census 2011 data published on the CSO website.

Step One :

Step Two:

I copied and pasted the 2011 Irish population data into an Excel 2013 spreadsheet. The data had to be scrubbed, as province data was included and some counties were divided into sets. Laois was also wrongly spelt, which I did not realise initially; when I found out, I had to open a new spreadsheet and do some sorting so that the data would match the data in the KML file. After sorting, the data was stored in my Documents folder. The Irish KML mapping data was also downloaded and stored in my folder.
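I did this scrubbing by hand in Excel, but the same steps could be sketched in R roughly as follows. The area names, the misspelling "Laios", and the population figures (in thousands) are placeholders rather than the actual CSO rows:

```r
# Hypothetical raw rows: a province total, split counties, and a misspelt county
raw <- data.frame(
  area       = c("Leinster", "Dublin City", "South Dublin", "Laios",
                 "Cork City", "Cork County", "Kerry"),
  population = c(2500, 500, 300, 80, 120, 400, 145),
  stringsAsFactors = FALSE
)

# 1. Drop the province totals
provinces <- c("Leinster", "Munster", "Connacht", "Ulster")
counties <- raw[!raw$area %in% provinces, ]

# 2. Map sub-areas and misspellings to the county names used in the KML file
lookup <- c("Dublin City" = "Dublin", "South Dublin" = "Dublin",
            "Cork City" = "Cork", "Cork County" = "Cork",
            "Laios" = "Laois")
counties$county <- ifelse(counties$area %in% names(lookup),
                          lookup[counties$area], counties$area)

# 3. Add up the parts so that there is one row per county
clean <- aggregate(population ~ county, data = counties, FUN = sum)
print(clean)
```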

Step Three:

Then I logged in to my Gmail account and opened Google Drive, where I selected Google Fusion Tables. The population data was loaded into Google Fusion Tables, and the KML data was uploaded as well. I then clicked on the map of geometry to view what the map looked like, but there were no distinguishing features, and colour features were needed for the purpose of this exercise. In the Configure Map area, I selected “Change feature styles”; under the polygon features, I selected the fill colour and clicked the Buckets tab. Five different colours were assigned to differentiate the sizes of the counties, the ranges were set and saved, and the automatic legend option was selected and saved to show the polygon fill legend.
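The bucket idea, where each county falls into one of five ranges and each range has its own colour, can be sketched in R with the cut function. The break points and the rounded population values are illustrative assumptions, not the ranges I set in Fusion Tables or official CSO counts:

```r
# Rounded, illustrative population values
population <- c(Dublin = 1270000, Cork = 520000, Galway = 250000,
                Kildare = 210000, Leitrim = 32000)

# Five buckets, from least to most populated (assumed break points)
breaks  <- c(0, 50000, 100000, 200000, 500000, Inf)
colours <- c("orange", "yellow", "light blue", "blue", "purple")

# cut assigns each value to the bucket whose range contains it
bucket <- cut(population, breaks = breaks, labels = colours)

print(data.frame(county = names(population), population, colour = bucket))
```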

Step Four:

MERGING THE GOOGLE FUSION TABLES

Now that I had the geometry and the data, I selected “Merge” from the “File” drop-down menu, which brought up the “Merge: Select a table” dialogue. I clicked the map KML table, and another dialogue popped up where I had to choose which columns were relevant to my project. Once this was done, the merged table was created, and the view above was achieved. To share it with the public on my blog, I changed the default sharing setting to public, then copied the embed code, pasted it into the text editor of my page, and updated the page for preview.
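The merge in Fusion Tables joins the two tables on a matching column, which is why the county names had to agree exactly. The same idea can be sketched with R's merge function; the column names, the placeholder geometry strings, and the population values are hypothetical:

```r
# Hypothetical population table (one row per county)
population <- data.frame(
  county     = c("Dublin", "Cork", "Laois"),
  population = c(1270000, 520000, 80000),
  stringsAsFactors = FALSE
)

# Hypothetical geometry table, as it might look after importing the KML file
geometry <- data.frame(
  county   = c("Cork", "Dublin", "Laois", "Kerry"),
  geometry = c("<Polygon>...</Polygon>", "<Polygon>...</Polygon>",
               "<Polygon>...</Polygon>", "<Polygon>...</Polygon>"),
  stringsAsFactors = FALSE
)

# Join on the shared county column; only counties present in both tables survive
merged <- merge(geometry, population, by = "county")

# Counties with geometry but no population row point to a name mismatch
unmatched <- setdiff(geometry$county, population$county)

print(merged[, c("county", "population")])
print(unmatched)
```

Checking the unmatched names before merging is a quick way to catch spelling problems like the one I had with Laois.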

INFORMATION GLEANED FROM THE HEAT MAP

After critically examining the heat map, I began to see what one could glean from it. Taking the motorways into consideration, cities like Limerick, Galway, and Cork are all connected to Dublin by large motorways.

It could be suggested that there should be a large motorway from Cork to Athlone, another from Donegal to Sligo linking on to Athlone, and one from Sligo to Galway, thereby allowing people from Donegal, Athlone, and Sligo to have a smooth ride all the way to Dublin.

The heat map could be used for many kinds of strategic business analysis. For instance, I was viewing the Dublin area of the heat map, wondering whether I could suggest another road network for the city centre, but what appeared were fork-and-knife symbols, glass symbols, and graduation-hat symbols, which indicate restaurants, inns, bars, and schools, along with many other signs that are useful for knowing what is happening in a particular area. From these, one can tell what kind of business fits a particular area, where certain businesses are missing, and where a business might best be located in terms of competition. The government could even use it to trace criminals or to identify which areas need better architectural design, and so on.