What is Data?
Factual information (such as measurements or statistics) used as a basis for reasoning, discussion, or calculation.
Information output by a sensing device or organ that includes both useful and irrelevant or redundant information and must be processed to be meaningful.
Information in numerical form that can be digitally transmitted or processed.
Simply put, data is factual information, or raw facts before they have been processed. Data is part of our everyday life; we deal with it on a daily basis.
What is Big Data?
Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making (Gartner).
Big data refers to large data sets that are challenging to store, search, share, visualize and analyze.
Big data is data that gives us contextual information for making a decision, but that is not trivial in size.
Big data technology is a new generation of technologies and architectures designed to extract value economically from very large volumes of a wide variety of data by enabling high-velocity capture, discovery and analysis.
Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or does not fit the structures of existing database architectures.
It is an enormous amount of data.
When we deal with an enormous amount of data, there are two problems that we need to solve first.
We will take them one problem at a time here.
Data Storage- storage capacity needs to increase
Data Processing- the data needs to be processed to get more information out of it
So big data is a problem domain. More data usually beats better algorithms.
The good news is that big data is here.
The bad news is that we are struggling to store and analyse it.
Data Sources of Big Data
There are four major sources of data for big data:
1) Social Media- social media is one of the key platforms that pull information from us, by providing an environment in which users can post information.
We all know the popular social media websites: Facebook, Twitter, Instagram, Pinterest, etc. They are all there, and we push information to them.
The Model of Generating/Consuming Data has Changed
Old Model: Few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming data
Attributes of Big Data
Volume (Size)- Size is one of the criteria for calling data big data, but size itself is a subjective criterion that depends on the infrastructure you choose to handle, process and compute on the data.
Variety (Complexity)- The variety of data types, in other words formats, is important.
Relational Data (Tables/Transaction/Legacy Data)
Text Data (Web)
Semi-structured Data (XML)
Social Network, Semantic Web (RDF), …
You can only scan the data once
A single application can be generating/collecting many types of data
Big Public Data (online, weather, finance, etc)
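The point that some data can only be scanned once can be illustrated with a small Python sketch. It computes the count, mean and variance of a stream in a single pass using Welford's online algorithm, without storing the values. The function name `running_stats`, the generator `readings` and the sample numbers are assumptions made up for this illustration, not part of any particular big data tool.

```python
from typing import Iterable, Iterator, Tuple


def running_stats(stream: Iterable[float]) -> Tuple[int, float, float]:
    """Return (count, mean, population variance) after one pass over the stream."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in stream:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)
    variance = m2 / count if count else 0.0
    return count, mean, variance


def readings() -> Iterator[float]:
    # Hypothetical data source: a generator can be consumed only once,
    # much like a live feed that is never stored.
    yield from [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


count, mean, variance = running_stats(readings())
print(count, mean, variance)
```

Each value is seen once and then discarded, so the memory used stays constant no matter how long the stream is.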
Velocity (Speed)- Data is being generated fast and needs to be processed fast
Online Data Analytics
Late decisions → missing opportunities
E-Promotions: Based on your current location, your purchase history, what you like → send promotions right now for the store next to you
Healthcare monitoring: sensors monitoring your activities and body → any abnormal measurements require immediate reaction
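The healthcare monitoring case can be sketched in Python as follows: each reading is checked the moment it arrives, and abnormal ones are reported immediately instead of after a later batch job. The normal range, the function name `abnormal_readings` and the sample data are hypothetical values chosen only for illustration.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical normal range for a heart-rate sensor, in beats per minute.
LOW_BPM = 50
HIGH_BPM = 120


def abnormal_readings(stream: Iterable[Tuple[int, int]]) -> Iterator[Tuple[int, int]]:
    """Yield (timestamp, bpm) pairs outside the normal range as soon as they arrive."""
    for timestamp, bpm in stream:
        if bpm < LOW_BPM or bpm > HIGH_BPM:
            yield timestamp, bpm


# Made-up sample feed of (timestamp in seconds, beats per minute).
sample = [(0, 72), (1, 75), (2, 131), (3, 74), (4, 45)]
alerts = list(abnormal_readings(sample))
print(alerts)  # prints [(2, 131), (4, 45)]
```

Because the function is a generator, a caller can react to each alert as it is produced instead of waiting for the whole feed to end.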
Trustworthiness of the data
Data in Doubt