Monday, April 13, 2015

Mongo DB Sharding Setup

Mongo DB: (from "humongous") is an open-source document database, and the leading NoSQL database. Written in C++
Sharding : Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling. 
Here are the steps to do sharding setup on MongoDB. For sharding, now, we are going to have :
  1. 3 config servers
  2. 3 mongo servers 
  3. 1 sharded server
Config Server
  • Config servers are special mongod instances that store the metadata for a sharded cluster. Config servers use a two-phase commit to ensure immediate consistency and reliability. Config servers do not run as replica sets. All config servers must be available to deploy a sharded cluster or to make any changes to cluster metadata.
    A production sharded cluster has exactly three config servers. For testing purposes you may deploy a cluster with a single config server. But to ensure redundancy and safety in production, you should always use three.
  • These are the machines that do not store the data rather they just be used to manage the metadata of the sharded environment
Mongo Server
  • These are the machines that actually have the data, so we need these machines to be powerful enough to hold the data and do the work heavily and that too continuously
  • Also we can have the replica of these machines, the way we configure replica sets in normal scenarios
Sharded Server
  • This machine is going to do pretty heavy work and manipulation.

The environment what we will deploy is going to look like this:
For demo let us consider the IP of the 7 machines are as follows:
EC2 Instance(Size)
Machine A192.168.100.101ReplicaBig Machine
Machine B192.168.100.102                             ReplicaBig Machine
Machine C192.168.100.103ReplicaBig Machine
Machine D192.168.100.104ConfigMedium Size
Machine E192.168.100.105ConfigMedium Size
Machine F          Size
Shard Machine192.168.100.100ShardBig Size
  • Now while installing we have to make sure about the permissions of the  folder to the user of the mongod
  • Also for config server we have to make a different directory other than the data directory say /data/configdb (default path what config server of mongo db searches for)
    • make sure to change the ownership of this directory as well
Now the commands to be followed:
Machine D, E, F :
  • mongod     --configsvr
    • This starts the config server in these machines which is going to be used by the sharded server to connect to
    • by default it starts on 27019, we can specify the port via  script or via command,please see the docs provided by mongo db for that (doc here)
Shard Machine:
  • We have to start a shard server by connecting to these config server(Machine D, E, F), for that we will have to start a command :
    • mongos  --configdb,,  --port  27020
        • this will start a sharded server at port 27020
Now we have to attach clusters to this shard server but before that we have to setup the shard cluster as an individual mongod  instance with shard server parameter, so for that use the following step:
Machine A, B, C :
  • mongod  --shardsvr
    • this will start the shard enabled cluster on the default port 27018
Adding the shard enable cluster in the shard machine:
  • For this connect via mongo client to the sharded machine in the admin mode as
    • mongo
      • This will connect you to the shard  server
After this we have to add the shard enabled clusters in the shard server (Machine A, B, C)using the:
  • sh.addShard(""); 
  • sh.addShard(""); 
  • sh.addShard(""); 
    • These commands will add the shard enabled cluster to our shard machine
If our shard enabled servers are replica sets in that case we just have to add one machine out of that replica set and that will going to take care of other replica set machines:
  • sh.addShard("<replica_set_name>/");
  • Before version 2.0.3, it is required to specify all the replica set members to be added to the sharded cluster like
    • sh.addShard("<replica_set_name>/,,"); info here
If everything goes well you can see the status of the shard environment using:
  •  sh.status()
After the setup we have to decide which database to shard under which, which collection to shard under which, which is going to be the shard key and what type of shard key it will be like normal/sharded:
  • db.runCommand({enableSharding:"testDB"})
  • sh.shardCollection("testDB.testCollection",{"testKeyname":1});
    • 1 denotes the normal
    • this is for making the collection named testCollection under testDB as a sharded collection with testKeyname as the key
    • if we want to  make a group of keys as sharded key then we can't have as hashed key(here)
      • Use the form {field: "hashed"} to create a hashed shard key. Hashed shard keys may not be compound indexes.
  • You have to configure the sharding again for your databases in case you deleted the databases.
  • Also, when you do that be sure you are connected to admin database.(use admin)
  • db.runCommand({enableSharding:"<database_name>"})
  • sh.shardCollection("<database_name>.<collection_name>",{"token":"hashed"});
  • Also for tuning the mongodb shard setup some of the points to take care of:
    • sudo vi /etc/security/limits.conf
    • add this at the end of the file in the format given:
      • root - nofile 64000
        * - nofile 64000
        root - nproc  64000
        * - nproc 64000
        root - stack 16384
        * - stack 16384
      • save and close the vi
      • then reset the limits for the current session
      • ulimit -n 64000
        ulimit -u 64000
        ulimit -s 16384

No comments:

Post a Comment