Posts

Showing posts from July, 2017

NoSql Tutorial 9 : HBase Vs Cassandra Vs MongoDB

Cassandra
When to use Cassandra
Cassandra is a good fit in the following cases:
· You want simple setup, maintenance, and code.
· You need to scale up, scale down, or add nodes quickly, since most administration tasks are automated.
· You have very high-velocity random reads and writes, which its column-family architecture handles well.
· You need flexible sparse or wide columns.
· You do not need multiple secondary indexes, because they affect Cassandra's overall performance. It is mainly suitable for systems that do not rely on GROUP BY.
· You do not need grouping or joins.
When not to use Cassandra
These are some scenarios when not to use the Cassandra database. Cassandra is query modeled concept, whi...

NoSql Tutorial 8 : Document databases

A document database pairs each key with a complex data structure known as a document. Documents can contain many different key-value pairs, key-array pairs, or even nested documents.
· A document is a key-value collection where each key gives access to its value.
· Documents are typically not forced to have a schema, so they are flexible and easy to change.
· Documents are stored in collections in order to group different kinds of data.
Here is a comparison between the classic relational model and the document model:
Relational model    Document model
Tables              Collections
Rows                Documents
Columns             Key/value pairs
Joins               Not available
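As a rough illustration of the document model described above, the sketch below uses plain Python dictionaries in place of a real document database. The collection name `students`, the field names, and the helpers `find_by_id` and `city_of` are all hypothetical and are not taken from any particular product's API.

```python
# A hypothetical in-memory "collection": a list of schema-free documents.
# Each document is a dict of key-value pairs; no two documents need the same keys.
students = [
    {
        "_id": 1,
        "name": "Asha",
        "courses": ["Hive", "HBase"],   # key-array pair
        "address": {"city": "Pune"},    # nested document
    },
    {"_id": 2, "name": "Ravi"},         # same collection, fewer keys
]


def find_by_id(collection, doc_id):
    """Return the first document whose _id matches, or None if absent."""
    for doc in collection:
        if doc.get("_id") == doc_id:
            return doc
    return None


def city_of(doc):
    """Read a nested field, tolerating documents that do not have it."""
    return doc.get("address", {}).get("city")
```

Because there is no enforced schema, code that reads a document has to tolerate missing keys, which is why `city_of` falls back to an empty dict instead of assuming that `address` exists.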

NoSql Tutorial 7 : Graph Stores

Graph stores are used to store information about networks, such as social connections. Graph stores include Neo4J and HyperGraphDB.
· A graph database stores data in a graph.
· It can represent many kinds of data elegantly and in a highly accessible way.
· A graph database is a collection of nodes and edges.
· Each node represents an entity (such as a student or business) and each edge represents a connection or relationship between two nodes.
· Every node and edge is defined by a unique identifier.
· Each node knows its adjacent nodes.
· As the number of nodes increases, the cost of a local step (or hop) remains the same.
· ...

NoSql Tutorial 6 : Column-oriented

Column-oriented databases are optimized for queries over large datasets, and store the columns of data together instead of the rows.
· Column-oriented databases primarily work on columns, and every column is treated individually.
· Values of a single column are stored contiguously.
· A column store keeps its data in column-specific files.
· In column stores, query processors work on columns too.
· All data within a column data file has the same type, which makes it ideal for compression.
· Column stores can improve query performance because a query reads only the columns it needs.
· High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).
· ...
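To make the row-versus-column layout concrete, here is a minimal sketch in plain Python. It is only an in-memory analogy, not how any specific column store is implemented; the sample data and the names `rows`, `to_columns`, `column_sum`, and `column_avg` are made up for illustration.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Pune", "amount": 120},
    {"id": 2, "city": "Delhi", "amount": 80},
    {"id": 3, "city": "Pune", "amount": 200},
]


def to_columns(records):
    """Pivot row records into one contiguous list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-oriented layout: the values of a single column sit next to each other.
columns = to_columns(rows)


def column_sum(cols, name):
    """Aggregate one column without touching any of the others."""
    return sum(cols[name])


def column_avg(cols, name):
    """Average one column; assumes the column is numeric and non-empty."""
    values = cols[name]
    return sum(values) / len(values)
```

An aggregation such as `column_sum(columns, "amount")` reads only the `amount` list, whereas the row layout would have to walk every full record to get the same answer.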

NoSql Tutorial 5 : NoSql: Key-value stores

Key-value stores are the simplest NoSQL databases. Every single item in the database is stored as an attribute name (or "key"), together with its value. Examples of key-value stores are Riak and Voldemort. Some key-value stores, such as Redis, allow each value to have a type, such as "integer", which adds functionality.
· Designed to handle huge amounts of data.
· Several, such as Riak and Voldemort, are based on Amazon's Dynamo paper.
· Key-value stores allow developers to store schema-less data.
· In key-value storage, the database stores data as a hash table where each key is unique and the value can be a string, JSON, a BLOB (binary large object), etc.
· Key-value stores follow the 'Availability' and 'Partition tolerance' aspects of the CAP theorem.
· Key-...

NoSql Tutorial 4 : NoSQL Categories

There are currently around 150 NoSQL databases. These are some NoSQL categories:
· Wide column store / column families: HBase, Cassandra, Accumulo, Cloudata, etc.
· Document store: MongoDB, CouchDB, RethinkDB, etc.
· Key-value / tuple store: Riak, Redis, DynamoDB, etc.
· Graph databases: Neo4j, Sparksee, Titan, InfoGrid, etc.
· Multi-model databases: ArangoDB, OrientDB, Datomic, etc.
· Object databases: Versant, db4o, Objectivity, etc.
· Grid and cloud database solutions: Oracle Coherence, GigaSpaces, Queplix, etc.
· XML databases: eXist, Sedna, BaseX, etc.
· Multidimensional databases: DaggarDB, Globals, GT.M, etc.
· Multivalue databases: U2, OpenInsight, Reality, etc.
· Event sourcing: Event Store
· Network model: VyhoDB

NoSql Tutorial 3 : CAP Theorem

In theoretical computer science, the CAP theorem is also known as Brewer's theorem. It comes up whenever we talk about NoSQL databases or design any distributed system. The CAP theorem states that there are three basic requirements which exist in a special relation when designing applications for a distributed architecture.
· Consistency - The data in the database remains consistent after the execution of an operation. For example, after an update operation all clients see the same data.
· Availability - The system is always on (the service guarantees availability), with no downtime.
· Partition Tolerance - The system continues to function even if the communication among the servers is unreliable, i.e. the servers may be partitioned into multiple groups that cannot communicate with one another.
In theory, it is impossible to fulfill all 3 requirements at the same time. CAP provides the basic requirements for a distributed ...

NoSql Tutorial 2 : Nosql vs RDBMS

RDBMS vs NoSQL
RDBMS
- Structured and organized data
- Structured Query Language (SQL)
- Data and its relationships are stored in separate tables
- Data Manipulation Language, Data Definition Language
- Tight consistency
- ACID transactions
NoSQL
- Stands for Not Only SQL
- No declarative query language
- No predefined schema
- Key-value pair storage, column store, document store, graph databases
- Eventual consistency (BASE) rather than ACID properties
- Unstructured and unpredictable data
- CAP theorem
- Prioritizes high performance, high availability and scalability
Advantages of NoSQL Databases over Relational Databases
The Growth of Big Data: Big Data is one of the key forces driving the growth and popularity of NoSQL for business. The almost limitless array of data collection technologies ranging from simple online actions to point of sale systems to GPS tools to smartphones and...

NoSql Tutorial 1 : NoSql Databases Overview

What is NoSQL
NoSQL is a non-relational database management system. It is designed for distributed data stores that need to hold data at very large scale. It encompasses a wide variety of database technologies that were developed in response to a rise in the volume of data stored about users, objects and products, the frequency with which this data is accessed, and performance and processing needs.
Why NoSQL
Today, data is becoming easier to access and capture through third parties such as Facebook, Google+ and others. Personal user information, social graphs, geolocation data, user-generated content and machine logging data are just a few examples where the data has been increasing exponentially. Providing these services properly requires processing a huge amount of data, which SQL databases were never designed for. NoSQL databases evolved to handle this volume of data properly.

Hive Tutorial 7 : Hive configuration precedence order

There is a precedence hierarchy to setting properties. In the following list, lower numbers take precedence over higher numbers:
1. The Hive SET command
2. The command line -hiveconf option
3. hive-site.xml
4. hive-default.xml
5. hadoop-site.xml (or, equivalently, core-site.xml, hdfs-site.xml, and mapred-site.xml)
6. hadoop-default.xml (or, equivalently, core-default.xml, hdfs-default.xml, and mapred-default.xml)
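The precedence order above can be sketched as a simple first-match lookup. This is only an illustration of the ordering, not Hive's actual implementation; the source labels in `PRECEDENCE`, the `resolve` helper, and the sample values in `example_sources` are all hypothetical.

```python
# Sources listed from highest to lowest precedence, mirroring the list above.
# The labels are illustrative names for this sketch, not identifiers used by Hive.
PRECEDENCE = [
    "SET command",
    "-hiveconf option",
    "hive-site.xml",
    "hive-default.xml",
    "hadoop-site.xml",
    "hadoop-default.xml",
]


def resolve(prop, sources):
    """Return (value, source) from the highest-precedence source defining prop.

    `sources` maps a source label to a dict of property values.
    Returns (None, None) when no source defines the property.
    """
    for source in PRECEDENCE:
        values = sources.get(source, {})
        if prop in values:
            return values[prop], source
    return None, None


# Made-up values for one property, set in three different places.
example_sources = {
    "hive-default.xml": {"hive.exec.parallel": "false"},
    "hive-site.xml": {"hive.exec.parallel": "true"},
    "-hiveconf option": {"hive.exec.parallel": "false"},
}
```

With these sample values, `resolve("hive.exec.parallel", example_sources)` picks the `-hiveconf option` entry, because it outranks both XML files and no SET command was issued.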