Readiness to deal with growing data volumes, deep insights, high availability, scalability, and transactional integrity are the primary things any enterprise expects from the data management infrastructure it has invested in. When it comes to data management and processing today, a "one size fits all" approach simply does not work. Organizations need to evaluate applicability and relevance to pick the right kind of data storage and management technologies. The world of data management and analytics today revolves primarily around RDBMS, NewSQL, Big Data, NoSQL, and Machine Learning.
RDBMS like Microsoft SQL Server, Oracle, MySQL, PostgreSQL, and IBM DB2 have a track record of over three decades and have practically ruled the data management world so far. With support for the ACID properties (Atomicity, Consistency, Isolation, and Durability) and a mature stack of development and administration tools, RDBMS are still widely used for transactional use cases. Meanwhile, with everything moving to the cloud, the introduction of autonomous databases like Oracle 18c, which tune, provision, encrypt, back up, update, and patch themselves without DBAs, has reduced the maintenance effort and shifted the focus to getting maximum value from data.
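The transactional guarantees mentioned above can be illustrated with a minimal sketch using Python's built-in sqlite3 module; the table, account names, and amounts are made up for the example, not taken from the article.

```python
# Sketch of an ACID transaction: two updates either both commit or both roll back.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # the 'with' block runs as one atomic transaction
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        conn.execute("UPDATE accounts SET balance = balance + 30 WHERE name = 'bob'")
except sqlite3.Error:
    pass  # on any error, the whole transfer is rolled back automatically

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # both updates applied together, or neither
```

The `with conn:` context manager commits on success and rolls back on an exception, which is what keeps partial transfers from ever becoming visible.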
The biggest concerns are to stay always available, store more, and respond fast. The need of the hour is to scale up and scale out quickly and to distribute the load across multiple servers.
NewSQL databases are modern RDBMS that combine the scalability of NoSQL systems for read-write workloads with the ACID guarantees of a traditional relational database. NewSQL was designed to preserve SQL while addressing the scalability and performance issues of traditional online transaction processing systems. Examples include VoltDB, NuoDB, MemSQL, SAP HANA, Splice Machine, Clustrix, and Altibase. However, these are still not the best options for databases beyond a few terabytes.
Every enterprise wants the best possible insight into its data to gain a competitive edge. This requires gathering more and more structured and unstructured data and processing every possible byte of it.
The big data ecosystem and NoSQL databases have been the best fit for these use cases; doing the same with traditional RDBMS or NewSQL is complex.
NoSQL databases are schema-less and can easily hold unstructured and semi-structured data such as XML or JSON, which serves as a catalyst for agile methodologies and iterative development.
Features like auto-sharding, which spreads query and data load across the cluster's machines, usually cheap commodity servers, make scaling the cluster convenient and cost-effective for storing and processing huge volumes of data.
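The idea behind auto-sharding can be sketched in a few lines: keys are routed to nodes by a stable hash, so data and load spread across the cluster and capacity grows by adding nodes. The node names and key format here are illustrative only.

```python
# Minimal sketch of hash-based sharding across commodity nodes.
import hashlib

NODES = ["node-a", "node-b", "node-c"]
cluster = {n: {} for n in NODES}  # each node's local key-value storage

def shard_for(key: str) -> str:
    # Stable hash so the same key always routes to the same node.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[h % len(NODES)]

def put(key, value):
    cluster[shard_for(key)][key] = value

def get(key):
    return cluster[shard_for(key)].get(key)

for i in range(1000):
    put(f"user:{i}", {"id": i})

# The 1000 records end up distributed across the three nodes.
print({n: len(s) for n, s in cluster.items()})
```

Real systems add rebalancing and replication on top (often via consistent hashing, so that adding a node moves only a fraction of the keys), but the routing principle is the same.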
Categories of NoSQL Databases
- Key-Value Stores - Riak, Amazon DynamoDB, Redis, MemcacheDB
- Graph Stores - Neo4j, FlockDB, InfiniteGraph, MarkLogic, Virtuoso
- Column Stores - HBase, Cassandra, Vertica (HP)
- Document Stores - MongoDB, CouchDB, DocumentDB
Handling petabytes of data with low latency, 360-degree views, real-time analytics, real-time recommendations, and risk modeling are some of the avenues where we can say convincingly that big data platforms like Apache Spark, Storm, Pig, Tez, HDFS/MapReduce, Kafka, and Flume have left their mark on the data management and processing world.
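The programming model behind HDFS/MapReduce can be shown in miniature, run in-process rather than on a cluster: map emits (key, value) pairs, a shuffle groups them by key, and reduce aggregates each group. Word count is the classic example; the input documents are made up.

```python
# In-process sketch of the MapReduce pattern: map -> shuffle -> reduce.
from collections import defaultdict

def map_phase(docs):
    # Map: emit (word, 1) for every word in every document.
    for doc in docs:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework would between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's values independently (hence parallelizable).
    return {key: sum(values) for key, values in groups.items()}

docs = ["big data big insights", "data at scale"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 2, 'insights': 1, 'at': 1, 'scale': 1}
```

Because each map call and each reduce group is independent, a framework can fan the same logic out over thousands of machines, which is what lets these platforms handle petabyte-scale inputs.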
A similar paradigm shift is happening in the data science world. Traditional data science tools are still considered great for working with traditional data to analyze patterns, give recommendations, forecast future outcomes, and do predictive analysis. However, if you want to dig deeper or tackle big data, machine learning has proved to be the hero.
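At its core, "learning from data" means fitting a model's parameters to observations and then predicting unseen values. A toy sketch, fitting y = slope*x + intercept by ordinary least squares in pure Python; the data points are invented for the example.

```python
# Toy machine-learning sketch: fit a line to data, then predict a new point.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]  # roughly y = 2x, with noise

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Closed-form ordinary-least-squares estimates for slope and intercept.
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x

def predict(x):
    return slope * x + intercept

print(round(slope, 2), round(predict(6.0), 1))
```

Production systems trade this closed form for iterative training over far more parameters and far more data, but the workflow, fit on observed data and predict on new data, is the same.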
Enterprises have to evaluate their use cases and determine what mix of available data management and processing platforms can derive the best value for them.
Written by: Rishab Mehra
Oracle Database Administrator