Big Data refers to the very large data sets collected by companies in every industry and analyzed in order to extract valuable information.
Before defining Big Data, it is important to understand what data is. The term covers the quantities, characters or symbols on which a computer carries out operations. Data can be stored or transmitted as electrical signals and recorded on mechanical, optical or magnetic media.
The term Big Data refers to large sets of data collected by companies, which can be explored and analyzed to generate actionable information or used in Machine Learning projects.
Big Data is often defined by the “3 Vs” that characterize it: the volume and variety of data, and the velocity with which it is generated, collected and processed.
These three characteristics were first identified in 2001 by Doug Laney, then an analyst at META Group Inc. They were later popularized by Gartner after it acquired META Group in 2005. Today, other characteristics such as veracity, value and variability are sometimes added to the list.
In companies of all industries, systems for processing and storing Big Data have become indispensable, and for good reason: traditional data management tools cannot store or process such massive data sets.
What is Big Data used for?
In all sectors, companies use the Big Data stored in their systems for different purposes: improving operations, providing better customer service, creating personalized marketing campaigns based on consumer preferences, or simply increasing revenue.
Thanks to Big Data, companies can gain a competitive advantage over rivals that do not exploit their data. They can make faster and more accurate decisions based directly on that information.
For example, a company can analyze big data to uncover valuable insights into the needs and expectations of its customers. This information can then be used to create new products or targeted marketing campaigns that increase customer loyalty or the conversion rate. A company that relies on data to steer its decisions is called “data-driven”.
Big Data is also used in medical research. In particular, it makes it possible to identify disease risk factors and to make more reliable and precise diagnoses. Medical data also helps to anticipate and monitor possible epidemics.
Big data is used in almost every industry. The energy industry uses it to discover potential drilling areas and to monitor its operations or the power grid. Financial services use it to manage risk and analyze real-time market data.
Manufacturers and transportation companies, meanwhile, use data to manage their supply chains and optimize their delivery routes. Similarly, governments are harnessing Big Data for crime prevention and for Smart City initiatives.
What are its sources?
Big data can come from a wide variety of sources. Common examples include transaction systems, customer databases, and medical records.
The activity of Internet users also generates a myriad of data. Click logs, mobile applications and social networks all capture a lot of information. The Internet of Things is another source of data thanks to its sensors, whether on industrial machines or on consumer connected objects such as fitness wristbands.
A few concrete examples help to illustrate the scale of Big Data sources. The New York Stock Exchange alone generates about one terabyte of data per day.
That is huge, but it is nothing compared to social networks. Facebook ingests more than 500 terabytes of new data into its databases every day. This data is mainly generated by photo and video uploads, message exchanges and comments left under posts.
In just 30 minutes of flight, a single aircraft engine can generate more than 10 terabytes of data. Big Data now flows in from multiple sources, and the volumes keep growing as technology progresses.
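To put these figures on a common scale, the short Python sketch below converts each quoted volume into an average sustained rate. It is only an illustration: it assumes decimal units (1 terabyte = 10^12 bytes) and a perfectly even flow of data, and the helper name rate_mb_per_s is introduced here for the example.

```python
# Convert the volumes quoted in the article into average sustained rates.
# Assumption: decimal units, 1 TB = 10**12 bytes, and an even flow of data.
TB = 10**12
DAY = 24 * 60 * 60  # seconds in one day


def rate_mb_per_s(terabytes: float, seconds: float) -> float:
    """Average throughput in megabytes per second (hypothetical helper)."""
    return terabytes * TB / seconds / 10**6


rates = {
    "New York Stock Exchange (1 TB per day)": rate_mb_per_s(1, DAY),
    "Facebook (500 TB per day)": rate_mb_per_s(500, DAY),
    "Aircraft engine (10 TB per 30 min)": rate_mb_per_s(10, 30 * 60),
}

for source, rate in rates.items():
    print(f"{source}: {rate:,.1f} MB/s")
```

Under these assumptions, the stock exchange averages roughly 11.6 MB per second, while Facebook and a single aircraft engine in flight both work out to more than 5 GB per second.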
The different types of Big Data
Big Data comes from a variety of sources, and can therefore take many forms. There are three main categories.
When data can be stored and processed in a fixed, well-defined format, it is called “structured” data. Thanks to the many advances made in computing, techniques now exist to work effectively with this kind of data and to extract its full value.
However, even structured data can be problematic because of its massive volume. With the total amount of data produced worldwide now measured in zettabytes, storage and processing present real challenges.
Data whose format or structure is unknown, on the other hand, is considered “unstructured” data. Beyond its massive volume, this type of data presents many challenges in terms of processing and exploitation.
A typical example is a heterogeneous data source containing a combination of text, image, and video files. In the digital and multimedia era, this type of data is increasingly common. Companies therefore have vast amounts of data at their fingertips, but struggle to take advantage of it because unstructured information is so difficult to process.
Finally, “semi-structured” data sits halfway between these two categories. It carries some organizing structure, such as tags or named fields, but does not follow the fixed schema of a database table. JSON and XML files are common examples.
Before unstructured or semi-structured data can be processed and analyzed, it needs to be prepared and transformed using different types of data mining or data preparation tools.
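As a rough illustration of this preparation step, the Python sketch below turns a few semi-structured JSON records into rows with a fixed set of columns. The records, the field names and the to_row helper are all invented for the example; real data preparation tools handle far more cases.

```python
import json

# Hypothetical semi-structured input: every record is valid JSON,
# but the fields present differ from one record to the next.
raw_records = [
    '{"user": "alice", "event": "click", "meta": {"page": "/home"}}',
    '{"user": "bob", "event": "purchase", "amount": 19.99}',
    '{"user": "carol", "event": "click"}',
]

# Target structured format: a fixed list of columns (an assumption of this sketch).
COLUMNS = ["user", "event", "page", "amount"]


def to_row(raw: str) -> dict:
    """Flatten one JSON record into a row with exactly the columns in COLUMNS."""
    record = json.loads(raw)
    # Lift the nested "meta" fields to the top level, then add the other fields.
    flat = dict(record.get("meta", {}))
    flat.update({key: value for key, value in record.items() if key != "meta"})
    # Missing fields become None so that every row has the same shape.
    return {column: flat.get(column) for column in COLUMNS}


rows = [to_row(raw) for raw in raw_records]
for row in rows:
    print(row)
```

Once every row has the same columns, the data can be loaded into a table and processed like any other structured data set.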