Friday, April 12, 2013

The Technical Architecture of Big Data


In the time it took you to read this sentence, NASA gathered approximately 1.73 gigabytes of data from nearly 100 currently active missions. It does this every hour, every day, every year – and the collection rate is growing exponentially. Handling, storing, and managing this data is a massive challenge.



The whole idea of big data is still relatively new. In earlier posts we have already seen what Big Data actually is; today we will take a holistic view of a Big Data system and some of the technical terminology involved.


Below are the major Big Data terms, which will help in understanding the overview above -

Real Time Streams - These are the various sources of information on the Internet through which raw data is made available.
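To make this concrete, here is a minimal Python sketch of what consuming such a stream might look like. The sensor_stream generator and its fields are made up for illustration; a real source would be a message queue, a social feed, or instrument telemetry.

```python
import time
import random

def sensor_stream():
    """Simulate a real-time stream: an endless sequence of raw readings.
    The sensor names and values here are invented for illustration."""
    while True:
        yield {"sensor": random.choice(["A", "B", "C"]),
               "value": round(random.uniform(0.0, 100.0), 2),
               "timestamp": time.time()}
        time.sleep(0.1)  # a new reading every 100 ms

# consume a few raw records as they arrive
for i, record in enumerate(sensor_stream()):
    print(record)
    if i >= 4:
        break
```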

Real Time Processing - Real-time processing lets the user sort through massive amounts of data as it arrives and produce information for analysis. During processing, data can be sorted and grouped by algorithms, but it is important to understand the limitations of doing this without human evaluation of the results.
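As a rough sketch of the sort/group step described here, the snippet below groups raw records by key and computes per-group averages. The function name and the sample readings are illustrative only, not part of any particular framework.

```python
from collections import defaultdict

def group_and_average(records):
    """Group raw records by sensor and compute a per-sensor average -
    a simple stand-in for the sort/group step of real-time processing."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in records:
        sums[r["sensor"]] += r["value"]
        counts[r["sensor"]] += 1
    return {k: sums[k] / counts[k] for k in sums}

readings = [{"sensor": "A", "value": 21.5},
            {"sensor": "B", "value": 64.0},
            {"sensor": "A", "value": 24.5}]
print(group_and_average(readings))   # {'A': 23.0, 'B': 64.0}
```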

Data Visualization - As data is collected, stored, and analyzed, it needs to be presented in a way that can be understood and digested. Data visualization programs take the results of big data analysis and render them as visual displays for easier consumption and/or to show results.
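A small matplotlib sketch of this idea follows, assuming matplotlib is installed; the region names and totals are invented for the example.

```python
import matplotlib.pyplot as plt

# Turn an analysis result (per-region totals) into a simple chart.
regions = ["North", "South", "East", "West"]
totals = [120, 87, 64, 142]   # illustrative numbers only

plt.bar(regions, totals)
plt.title("Records processed per region")
plt.ylabel("Record count (thousands)")
plt.savefig("regions.png")    # or plt.show() in an interactive session
```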

Real Time Structured Databases - Real-time structured databases were created to manage large volumes of data that do not have a fixed schema. NoSQL gained popularity as major companies adopted it for huge volumes of data that traditional RDBMS solutions could not handle.
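To illustrate the "no fixed schema" point, here is a hedged sketch using pymongo, assuming the library is installed and a MongoDB server is running locally; the database, collection, and field names are placeholders.

```python
from pymongo import MongoClient

# Assumes pymongo is installed and MongoDB is running on localhost.
client = MongoClient("localhost", 27017)
events = client["demo_db"]["events"]

# No fixed schema: each document can carry a different set of fields.
events.insert_one({"type": "click", "page": "/home", "user": "u42"})
events.insert_one({"type": "purchase", "amount": 19.99, "items": ["book"]})

print(events.count_documents({"type": "click"}))
```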

Batch Processing - This is the execution of a series of programs ("jobs") on a computer without manual intervention. Hadoop was developed to enable applications to work across thousands of independent computers and petabytes of data.
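A classic batch job is word counting in the map/reduce style that Hadoop popularized. The sketch below runs the same map and reduce logic locally on standard input rather than on a cluster, just to show the shape of such a job.

```python
import sys
from collections import Counter

def map_words(lines):
    # Map step: emit (word, 1) pairs for every word in the input.
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1

def reduce_counts(pairs):
    # Reduce step: sum the counts for each word.
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals

if __name__ == "__main__":
    # e.g.  cat somefile.txt | python wordcount.py
    for word, total in reduce_counts(map_words(sys.stdin)).most_common(10):
        print(word, total)
```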

Interactive Analysis - Interactive analysis tools dramatically reduce the time data analysts and scientists need to discover, visualize, and explore large volumes of diverse data.
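A small example of interactive exploration using pandas (assuming it is installed) is shown below; the data frame is toy data meant only to show the ad-hoc, question-and-answer style of work.

```python
import pandas as pd

# Load a small slice of data, then ask quick ad-hoc questions of it
# from a notebook or interactive shell.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East"],
    "sales":  [120, 87, 95, 64],
})

print(df.describe())                         # quick summary statistics
print(df.groupby("region")["sales"].sum())   # ad-hoc aggregation
```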

Serialization (Structured & Unstructured Data) - Serialization is the process of converting a data structure or object state into a format that can be stored or transmitted. This stage occurs after the data is collected and while it is being processed.
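A minimal example using Python's json module illustrates the round trip; the record contents are made up.

```python
import json

record = {"mission": "Kepler", "observations": 34012, "active": True}

# Serialize: convert the in-memory structure into a storable form.
encoded = json.dumps(record)
# Deserialize: rebuild the structure when the data is processed later.
decoded = json.loads(encoded)

assert decoded == record
print(encoded)   # {"mission": "Kepler", "observations": 34012, "active": true}
```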

Cloud Infrastructure - Cloud infrastructure is the infrastructure required to support the storage and processing of the big data that has been gathered. The cloud resources (hardware and software) are usually delivered as a service over a network (typically the Internet).
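As a sketch of "storage delivered as a service over a network", the snippet below pushes a file to cloud object storage with boto3, assuming the library is installed and credentials are configured; the bucket, key, and file names are placeholders.

```python
import boto3

# Assumes boto3 is installed and AWS credentials are configured.
s3 = boto3.client("s3")

# Upload a local file to an object-storage bucket over the network.
with open("sensor_readings.json", "rb") as f:
    s3.put_object(Bucket="my-bigdata-bucket",              # placeholder name
                  Key="raw/2013-04-12/sensor_readings.json",
                  Body=f)
```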


References 

http://www.pentahobigdata.com/ecosystem/capabilities/instaview

http://www.greenplum.com/

http://open.nasa.gov/

http://www.wikipedia.org/

http://bigdataarchitecture.com/
