If this internet buzzword is still blurry in your mind, or you've only heard it murmured around, or you're not yet clear on the differences, you're in the right place. No need to worry: we're all on the same learning journey.
As the name suggests, big data simply describes large data sets, structured or unstructured, arriving with high Volume, Velocity, and Variety, known as the three Vs. However, it's not just about how large the data is.
The question is what to do with the mountainous information locked inside those colossal data sets. An obvious answer: analyze it for better insights, smarter decision-making, or strategic business moves. That sounds great, but HOW? Let's first understand the three Vs:
Volume: the sheer size of the data, hundreds of terabytes or even petabytes, arriving mostly in unstructured and indistinct form from different sources such as your customers' historical transactions on POS machines, beacons, near field communication (NFC) equipment, radio frequency identification (RFID) tags, your social media posts, streaming feeds, and so forth.
Velocity: how fast data is received, collected, and processed. This is a combination of infrastructure and data management systems. In plain terms, without frills: how quickly your website or mobile app responds to data access in real time, for instance, transaction execution time or the necessary data updates and analyses.
Variety: the aforementioned structured, unstructured, or semi-structured data types (text, audio, video, images, email) that must be processed much like raw material. The complexity starts here, since multiple sources feed in immense amounts of sizeable data at inconsistent paces. Balancing that data properly is necessary, yet doing so is unreasonably challenging for our human brains.
To address those challenges, researchers and technology providers have added more Vs, or factors, to consider in this approach: Value, Veracity, Visualization, Viscosity, and Virality.
Value asks whether your required data is accessible whenever and wherever it is needed.
Veracity asks whether you are dealing with information or disinformation; in other words, true versus false data.
Visualization asks not only how artfully your data is presented, but whether it actually makes sense.
Viscosity asks whether your data calls for action. Does it help you make a decision?
Virality goes even further: does your information create a WOW moment that prompts others to spread it across social media?
Theory always appears rigorous and sometimes illogical, so let's look at it practically, from an industry angle. Big data can solve a lot of problems, from different facets, for a panoply of industries: healthcare, automotive, banking, mortgage, the public sector, utilities, and more. It can be applied to product development, customer experience, machine learning, predictive maintenance, security, and so on.
For instance, Netflix, Amazon, and P&G can not only estimate your next purchase but anticipate it, simply by building predictive models based on the key attributes of products or services you bought in the past and matching them against current ones.
To put it simply, Amazon gathers a mammoth amount of data whenever you browse its website. The more frequently you visit, the more it learns about your personality as an individual, building up a 360-degree view of who you are.
Hence, it can recommend highly tailored products using collaborative filtering technology backed by a series of sophisticated algorithms. That's the trick behind what is called big data-driven marketing.
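To make the idea concrete, here is a minimal sketch of user-based collaborative filtering, the technique mentioned above. The users, items, and ratings are made up for illustration; real recommenders work at a vastly larger scale with far more sophisticated models.

```python
# A toy user-based collaborative filter: recommend items a user has not
# rated, weighted by how similar other users' tastes are.
from math import sqrt

# Hypothetical ratings (1-5) from three shoppers.
ratings = {
    "alice": {"camera": 5, "tripod": 4, "lens": 5},
    "bob":   {"camera": 4, "tripod": 5, "kettle": 2},
    "carol": {"kettle": 5, "toaster": 4},
}

def cosine_similarity(a, b):
    """Cosine similarity over the items two users have both rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def recommend(user, ratings):
    """Score unseen items by similarity-weighted ratings, best first."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice", ratings))  # items alice hasn't rated, best first
```

Because Alice's tastes overlap with Bob's but not Carol's, Bob's items float to the top of her list. Amazon's production systems add many more signals, but the "people like you bought X" core is the same.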
Similarly, if we look at how Google reaps billions of dollars by juicing an unfathomable amount of data from all of its empire's platforms, namely its colossal search engine, YouTube, Gmail, Chrome, and the Google Home smart speaker, we understand how serious big data is.
Along the same line of thought, this raises a lot of privacy issues and contradictions over how consumers' and people's data is used and manipulated by tech companies, even with the introduction of tougher privacy laws such as the European Union's General Data Protection Regulation (GDPR).
Back on topic, Google uses a deep learning approach to build its predictive models, not only in its search engine but across dissimilar disciplines and industries. Its astonishing healthcare predictive tool, for instance, analyzed 46 billion individual data points from 216,000 patients across two hospitals, with the aim of reducing mortality, unexpected readmissions, and long inpatient stays.
Nevertheless, big data is the hard core of Google's business model, as it is for the other jumbo tech companies ruling the data sphere globally.
Now, let's inspect another horizon with Terra Seismic, the earthquake forecasting company that uses its satellite big data technology to forecast tsunamis and powerful aftershocks anywhere in the world, with a claimed accuracy of 90%. HOW? It simply monitors live-streaming data from satellite images and atmospheric sensors, blended with historical data from previously recorded shocks.
Then voilà! That amalgam of data, fed into predictive models trained on past patterns of atmospheric energy and cloud movement, can enable Terra Seismic's technology to prevent a lot of catastrophes and calamities.
Sit tight for more examples and cases, to see how governments such as the United Kingdom's have made good use of the big data goldmine, closely linked to the Internet of Things (IoT), to improve public transport in London. Everyone knows the Big Smoke is one of the busiest cities in the world, with more than 8.6 million inhabitants and no shortage of public transport complaints from British citizens.
However, Transport for London (TfL), which runs the whole public network of roads, cycle paths, ferries, buses, and taxis, succeeded in making sharp use of all the big data generated by road sensors, ticketing systems, social media listening, surveys, and much more. TfL implemented the velocity concept, which requires a sophisticated infrastructure and an efficient data management system, a long time ago, and has kept improving it ever since.
The gains were enormous from various standpoints. In disrupted-schedule management, that meant fixing a bus breakdown in the city centre or handling an unexpected delay on certain routes; in communications, it meant personalized announcements and breaking news targeted at specific people on particular routes or at particular stations. All of this helped not only the residents of London but also tourists and foreign visitors enjoy a unique travel experience.
Against this background, this fuzzy internet buzzword, big data, is not something new at all!
In the mining industry, for example, Japanese engineers at Hitachi have helped US mining companies for more than two decades with sensor arrays that gather tremendous amounts of information from gigantic trucks, large excavators, big shovels, and other heavy machinery. That seems very logical in the remote field environments that characterize the mining business. What's new, then, is the connectivity speed and the data processing systems and technologies.
From the examples above, we can conclude that analytics, with their astute algorithms, are the backbone of the big data game. But what kind of analytics? Analytics has always been, and still is, a moot point whenever it comes up: are we talking descriptive, predictive, or prescriptive analytics? I still remember the day our statistics professor raised this question in class, two years back in our business school.
Descriptive analytics is the analysis of historical data to understand what happened, by reading averages, deviations, correlations, and regressions. At this stage no coding background is required: the work can be done on a simple Excel worksheet or in more sophisticated software such as Tableau or QlikView.
In the end, you get a tabular or graphical illustration with all the required descriptive metrics analyzing past trends. From a small business standpoint, a sales revenue report covering the last three quarters carries out exactly this kind of analytics. Google Analytics, for instance, provides a lot of descriptive data about your website from which to gain insights.
Going a step ahead, predictive analytics forecasts future trends, although never with perfect accuracy. Again, statistical algorithms and machine learning techniques are the backbone of this method and all those futuristic analytics. Unlike descriptive analytics, predictive analytics can use techniques such as sentiment analysis to study, for example, the emotional tendency of a text, with metrics at the end showing negative, positive, or neutral sentiment.
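The simplest form of the sentiment analysis mentioned above counts positive and negative cue words. A minimal sketch, with tiny illustrative word lists; production systems use large lexicons or trained machine learning models instead:

```python
# Toy lexicon-based sentiment analysis: label text by counting cue words.
# The word lists below are illustrative only.
POSITIVE = {"great", "love", "excellent", "happy", "good"}
NEGATIVE = {"bad", "hate", "terrible", "awful", "poor"}

def sentiment(text):
    """Return 'positive', 'negative', or 'neutral' for a piece of text."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this excellent camera"))     # positive
print(sentiment("terrible battery and bad screen"))  # negative
```

Crude as it is, this is the same input-to-label shape that a neural sentiment model produces, just with hand-written rules in place of learned ones.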
Similarly, these analytics cannot be performed on a plain Excel sheet; they require highly specialized software using neural networks and decision trees to anticipate behaviours or events. Complex functions come into play, such as graph analysis, market basket analysis, and, in the case of data security, tokenization. Common uses abound: fraud detection, risk reduction, operations improvement, behavioural customer experience, and more.
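Of the functions just listed, market basket analysis is the easiest to sketch: count how often pairs of items appear in the same basket (their "support"). The transactions below are invented for illustration; real retailers run this over millions of baskets with algorithms like Apriori:

```python
# Toy market basket analysis: which item pairs are bought together most?
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "cereal"},
    {"bread", "butter", "cereal"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of all baskets containing the pair.
for pair, count in pair_counts.most_common(3):
    print(pair, f"support={count / len(transactions):.2f}")
```

Here bread and butter co-occur in three of four baskets, exactly the kind of pattern a retailer turns into shelf placement or a "frequently bought together" prompt.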
Last but not least, prescriptive analytics is the next level, far beyond what descriptive and predictive analytics can do. It is a new field in the data science discipline, with the potential to show a corporate decision-maker the viable solutions available for a particular problem, along with each one's future impact. Autonomous vehicles, notably Tesla cars, are using and developing such technologies with extremely accurate prescriptive analytics.