MongoDB and OpenStack — OSI Days 2014, India

By Selvakumar Arumugam
November 19, 2014

The 11th edition of Open Source India (OSI 2014) was held in Bengaluru, India. The two-day conference ran three parallel tracks of tech talks and workshops covering a range of open source technologies.

In-depth look at Architecting and Building solutions using MongoDB

Aveekshith Bushan & Ranga Sarvabhouman from MongoDB started off the session by comparing the cost of storage hardware in the past and today. Storage hardware used to be very expensive, so the usual approach was to filter the data to reduce its size before storing it in the database. As a result, we could only generate results from the filtered data and had no way to go back and process the source data. Now that storage has become cheap, we can store the raw data first, then do all our filtering and processing, and then distribute the results.

Earlier:

Filter -> Store -> Distribute

Present:

Store -> Filter -> Distribute
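The shift from Filter -> Store -> Distribute to Store -> Filter -> Distribute can be illustrated with a minimal Python sketch. The in-memory list and the helper names (store, filter_events, distribute) are hypothetical stand-ins, not MongoDB APIs; the point is only that keeping the raw data lets a new filter be applied later.

```python
# Hypothetical raw events; in the "Present" model they are stored unfiltered.
raw_store = []


def store(event):
    """Store step: keep the raw event exactly as it arrived."""
    raw_store.append(dict(event))


def filter_events(predicate):
    """Filter step, done at read time; the raw data is never modified."""
    return [e for e in raw_store if predicate(e)]


def distribute(events):
    """Stand-in for the Distribute step: summarise the filtered result."""
    return {"count": len(events), "ids": [e["id"] for e in events]}


for i, (kind, amount) in enumerate([("sale", 120), ("refund", -30), ("sale", 15)]):
    store({"id": i, "kind": kind, "amount": amount})

# Today's question: large sales only.
print(distribute(filter_events(lambda e: e["kind"] == "sale" and e["amount"] > 100)))

# A question asked later can still be answered, because nothing was discarded up front.
print(distribute(filter_events(lambda e: e["kind"] == "refund")))
```

Under the older Filter -> Store model, only the large sale would have been kept, and the refund question could not have been answered afterwards.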

Since we are now storing huge amounts of raw data, we need a processing system that can handle and analyse it efficiently. Data is growing rapidly today, and the growth of (Big) Data is usually described by the 3Vs: we need to handle a huge Volume of data, in a wide Variety of forms, arriving at high Velocity. MongoDB makes a few design choices to meet these requirements.

MongoDB simply stores data as documents without enforcing data type constraints, which helps it store huge amounts of data quickly. It leaves constraint checks to the application, which speeds up writes on the database end. It does, however, recognise the data type of each field once the document is stored. In simple words, the philosophy is: why check the same things (data types or other constraints) in two places (the application and the database)?
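A minimal sketch of this philosophy, assuming the application owns validation: a plain Python list stands in for a schema-free collection and accepts any dict, while a hypothetical validate_order function performs the type checks once, before the write. This is an illustration only, not the MongoDB driver API.

```python
from datetime import datetime

# An in-memory list stands in for a schema-free collection:
# it accepts any dict and enforces no type constraints on insert.
orders_collection = []


def validate_order(doc):
    """Hypothetical application-level check, run once before the write."""
    errors = []
    if not isinstance(doc.get("customer"), str):
        errors.append("customer must be a string")
    total = doc.get("total")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(total, (int, float)) or isinstance(total, bool):
        errors.append("total must be a number")
    return errors


def insert_order(doc):
    errors = validate_order(doc)
    if errors:
        raise ValueError("; ".join(errors))
    orders_collection.append(doc)  # the "database" side does no further checking


insert_order({"customer": "alice", "total": 42.5, "placed_at": datetime(2014, 11, 19)})
try:
    insert_order({"customer": "bob", "total": "forty"})
except ValueError as exc:
    print("rejected:", exc)

# Stored values keep their types, loosely like the per-field types MongoDB records.
print([type(v).__name__ for v in orders_collection[0].values()])
```

The checks live in exactly one place; the stand-in collection trusts whatever the application hands it.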

MongoDB can store an entity together with its related data as a single document and fetch it with a single disk seek; avoiding multiple disk seeks makes retrieval very fast. In a relational database, by contrast, related data is stored in separate tables, which can require multiple disk seeks to retrieve the complete data of an entity. MongoDB doesn't support joins, but it does let a document reference a document in another collection (the counterpart of a table) without imposing foreign key constraints.

According to the DB-Engines ranking, MongoDB sits at the top of the NoSQL database world. Here are some of its key features that I noted from the session:

Sub-documents duplicate data but improve performance (since storage is cheap, the duplication doesn't matter much)
Auto-sharding (Scalability)
Sharding enables parallel access to the system
Range Based Sharding
Replica Sets (High availability)
Secondary indexes available
Indexes are the main tunable part of a MongoDB system
Partition across systems
Rolling upgrades
Schema free
Rich document based queries
Read from secondary
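Range-based sharding can be illustrated with a small routing sketch: each chunk owns a contiguous range of shard-key values, so routing a key is a binary search over the chunk boundaries, and a range query only needs the shards whose chunks overlap it. The boundaries and shard names are hypothetical, and real MongoDB routing (mongos consulting metadata from the config servers) involves much more than this.

```python
import bisect

# Hypothetical chunk boundaries on an integer shard key (e.g. user_id).
# Chunk i covers [boundaries[i-1], boundaries[i]); the first chunk is open
# at the bottom and the last is open at the top.
boundaries = [1000, 5000, 20000]
chunk_to_shard = ["shard-a", "shard-b", "shard-c", "shard-a"]


def shard_for(key):
    """Route a single shard-key value to the shard owning its range."""
    chunk = bisect.bisect_right(boundaries, key)
    return chunk_to_shard[chunk]


def shards_for_range(low, high):
    """Shards touched by a query on [low, high); assumes integer keys and low < high."""
    first = bisect.bisect_right(boundaries, low)
    last = bisect.bisect_right(boundaries, high - 1)
    return sorted({chunk_to_shard[c] for c in range(first, last + 1)})


print(shard_for(42), shard_for(1000), shard_for(25000))
print(shards_for_range(1500, 6000))
```

Because neighbouring keys land in the same chunk, range queries on the shard key stay targeted to a few shards; the trade-off is that monotonically increasing keys tend to send all new writes to the last chunk.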

When do you need MongoDB?

When your data grows beyond what a relational database system can handle
When online requests need high performance

Finally, the speakers emphasised understanding the use case clearly and choosing the right MongoDB features to get good performance.

OpenStack Mini Conf

A special half-day OpenStack mini conference was organised in the second half of the first day. The talks ranged from the basics to in-depth aspects of the OpenStack project. I have summarised them here to give an idea of the OpenStack software platform.

OpenStack is an open source cloud computing platform for provisioning Infrastructure as a Service (IaaS). DevStack is a wonderful project for setting up OpenStack in a development environment in the easiest and fastest way. The OpenStack project's well-written documentation explains everything clearly. Anyone can contribute to OpenStack with the help of the How to Contribute guide; the project uses the Gerrit review system and Launchpad for bug tracking.

OpenStack has multiple components, each providing a different Infrastructure as a Service feature. Here is the list of OpenStack components and the purpose of each one:

Nova (Compute): manages the pool of compute resources
Cinder (Block Storage): provides block storage volumes to machines
Neutron (Network): manages networks and IP addresses
Swift (Object Storage): provides distributed, highly available (replicated) object storage
Glance (Image): provides a repository to store disk and server images
Keystone (Identity): provides a common authentication system across all components
Horizon (Dashboard): provides a GUI for users to interact with OpenStack components
Ceilometer (Telemetry): collects service usage data for usage and billing reports
Ironic (Bare Metal): provisions bare metal machines instead of virtual machines
Sahara (Map Reduce): provisions Hadoop clusters for big data processing
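How an object store like Swift spreads replicas can be made concrete with a simplified ring sketch. It loosely follows Swift's ring idea (hash the object path with MD5 and use the top bits as a partition number), but the device list, partition power and round-robin replica placement below are hypothetical simplifications; the real ring also mixes in a configured hash prefix/suffix and balances replicas across zones and device weights.

```python
import hashlib

PART_POWER = 8   # 2**8 = 256 partitions (hypothetical; real rings are usually larger)
REPLICAS = 3
DEVICES = ["dev0", "dev1", "dev2", "dev3", "dev4"]  # hypothetical storage devices


def partition_for(path):
    """Map an object path to a partition using the top PART_POWER bits of its MD5 hash."""
    digest = hashlib.md5(path.encode("utf-8")).digest()
    top32 = int.from_bytes(digest[:4], "big")
    return top32 >> (32 - PART_POWER)


def devices_for(path):
    """Place REPLICAS copies on distinct devices (simple round-robin, not Swift's balancing)."""
    part = partition_for(path)
    return [DEVICES[(part + r) % len(DEVICES)] for r in range(REPLICAS)]


path = "/AUTH_demo/photos/osi2014.jpg"
print(partition_for(path), devices_for(path))
```

Any node can compute the same placement from the path alone, and losing one device still leaves two copies of every object it held.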

OpenStack services are often mapped to AWS services to make the purpose of each component easier to understand. The following table shows the mapping between similar services in OpenStack and AWS:

OpenStack	AWS
Nova	EC2
Cinder	EBS
Neutron	VPC
Swift	S3
Glance	AMI
Keystone	IAM
Horizon	AWS Console
Ceilometer	CloudWatch
Sahara	Elastic MapReduce

Along with the overview of the OpenStack architecture, there were several in-depth talks:

Neependra Khare from Red Hat gave a presentation on using Docker with OpenStack Nova.
Pushpesh Sharma presented a comparison between the OpenStack Swift storage component and Ceph.
Sridhar Rao presented OpenStack and its role in Network Function Virtualisation (NFV).

That was a wonderful Day One of OSI 2014, which helped me get a better understanding of MongoDB and OpenStack.

