Fast way to find duplicate data in MongoDB

I need to find the duplicate values among my 40 million records so that I can then create a unique index on my name field.

[code lang=”shell”]
> db.collection.aggregate([
... { $group : { _id : "$field_name", total : { $sum : 1 } } },
... { $match : { total : { $gte : 2 } } },
... { $sort : { total : -1 } },
... { $limit : 5 } ],
... { allowDiskUse : true }
... );

{ "_id" : "data001", "total" : 2 }
{ "_id" : "data004231", "total" : 2 }
{ "_id" : "data00751", "total" : 2 }
{ "_id" : "data0021", "total" : 2 }
{ "_id" : "data001543", "total" : 2 }
>
[/code]

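{ allowDiskUse: true } is optional if your data set is small; it lets the aggregation stages spill to temporary files on disk when they exceed the memory limit.

{ $limit : 5 } controls how many results are displayed; raise it to see more duplicates.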
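
To sanity-check the pipeline on a small sample outside the database, the same group, filter, sort and limit steps can be sketched with standard shell tools. This is only an illustration: it assumes the field values were exported one per line to a text file and contain no whitespace, and the function name `top_duplicates` and the sample values are made up.

```shell
#!/bin/sh
# top_duplicates FILE [LIMIT]
# Mirrors the aggregation stages on a text file with one value per line:
#   sort | uniq -c   ~ $group with $sum
#   awk '$1 >= 2'    ~ $match total >= 2
#   sort -k2,2nr     ~ $sort total descending
#   head -n LIMIT    ~ $limit (defaults to 5)
top_duplicates() {
    sort "$1" | uniq -c | awk '$1 >= 2 { print $2, $1 }' | sort -k2,2nr | head -n "${2:-5}"
}

# Hypothetical sample data, not taken from a real collection.
sample=$(mktemp)
printf '%s\n' data001 data002 data001 data003 data003 data003 > "$sample"
top_duplicates "$sample" 5
rm -f "$sample"
```

Each output line is a value followed by its count, so any value listed here would block the unique index until the extra copies are removed.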

How to Install MongoDB 3.2 on CentOS 7

[code lang=”shell”]
vim /etc/yum.repos.d/mongodb.repo
[/code]
Paste this into the file and save with :wq
[code lang=”shell”]

[MongoDB]
name=MongoDB Repository
baseurl=http://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.2/x86_64/
gpgcheck=0
enabled=1
[/code]
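
If you would rather not edit the file by hand, the same repository definition can be written from a script. A minimal sketch, assuming a POSIX shell; the function name `write_mongodb_repo` is hypothetical, and writing to the default path requires root.

```shell
#!/bin/sh
# write_mongodb_repo [PATH]
# Writes the repository definition shown above to PATH.
# Defaults to the path used in this post.
write_mongodb_repo() {
    repo_file="${1:-/etc/yum.repos.d/mongodb.repo}"
    # The quoted 'EOF' keeps $releasever literal so that yum expands it, not the shell.
    cat > "$repo_file" <<'EOF'
[MongoDB]
name=MongoDB Repository
baseurl=http://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/3.2/x86_64/
gpgcheck=0
enabled=1
EOF
}
```

Passing a different path as the first argument is handy for checking the output before touching /etc/yum.repos.d.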
Download and install mongodb using yum
[code lang=”shell”]
yum install mongodb-org -y
[/code]
Start mongod and enable it to start automatically at boot
[code lang=”shell”]
/etc/init.d/mongod restart
chkconfig mongod on
[/code]
Check all the versions
[code lang=”shell”]
# mongo --version
MongoDB shell version: 3.2.3
# mongod --version
db version v3.2.3
git version: b326ba837cf6f49d65c2f85e1b70f6f31ece7937
OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
allocator: tcmalloc
modules: none
build environment:
distmod: rhel70
distarch: x86_64
target_arch: x86_64
[/code]
Test the connection
[code lang=”shell”]
# mongo
MongoDB shell version: 3.2.3
> use test
switched to db test
> db.test.save( { juzhax: 1 } )
WriteResult({ "nInserted" : 1 })
> db.test.find()
{ "_id" : ObjectId("56d4ac48b376b143e4749229"), "juzhax" : 1 }
[/code]

WARNING: /sys/kernel/mm/transparent_hugepage/enabled is ‘always’.

After installing MongoDB 3.2.3 on CentOS 7, I got these warnings when starting the mongo shell.

[code lang=”shell”]
# mongo
MongoDB shell version: 3.2.3
connecting to: test
Server has startup warnings:
2016-02-29T14:11:49.308-0500 I CONTROL [initandlisten]
2016-02-29T14:11:49.308-0500 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2016-02-29T14:11:49.308-0500 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2016-02-29T14:11:49.308-0500 I CONTROL [initandlisten]
2016-02-29T14:11:49.308-0500 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2016-02-29T14:11:49.308-0500 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2016-02-29T14:11:49.308-0500 I CONTROL [initandlisten]
2016-02-29T14:11:49.308-0500 I CONTROL [initandlisten] ** WARNING: soft rlimits too low. rlimits set to 4096 processes, 64000 files. Number of processes should be at least 32000 : 0.5 times number of files.
[/code]

Solution

Create the init.d script.
Create the following file at /etc/init.d/disable-transparent-hugepages:

[code lang=”shell”]
#!/bin/sh
### BEGIN INIT INFO
# Provides: disable-transparent-hugepages
# Required-Start: $local_fs
# Required-Stop:
# X-Start-Before: mongod mongodb-mms-automation-agent
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Disable Linux transparent huge pages
# Description: Disable Linux transparent huge pages, to improve
# database performance.
### END INIT INFO

case $1 in
  start)
    if [ -d /sys/kernel/mm/transparent_hugepage ]; then
      thp_path=/sys/kernel/mm/transparent_hugepage
    elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
      thp_path=/sys/kernel/mm/redhat_transparent_hugepage
    else
      # No THP directory on this kernel, nothing to do.
      exit 0
    fi

    # Disable transparent huge pages and their defragmentation.
    echo 'never' > ${thp_path}/enabled
    echo 'never' > ${thp_path}/defrag

    unset thp_path
    ;;
esac
[/code]

Make it executable.
Run the following command to ensure that the init script can be used:

[code lang=”shell”]
sudo chmod 755 /etc/init.d/disable-transparent-hugepages
[/code]

Then register the script so that it runs at boot:

[code lang=”shell”]
sudo chkconfig --add disable-transparent-hugepages
[/code]
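
After a reboot it is worth confirming that the setting actually took effect. The kernel marks the active value in these files with square brackets, for example `always madvise [never]`. The following is a small sketch for reading that value; the helper name `thp_active_value` is hypothetical.

```shell
#!/bin/sh
# thp_active_value FILE
# Print the bracketed (active) word from a THP sysfs file,
# e.g. "always madvise [never]" -> "never".
thp_active_value() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Report both settings if the files exist and are readable on this machine.
for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/defrag; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_active_value "$f")"
    fi
done
```

Both lines should report never once the init script has run.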

WARNING: Cannot detect if NUMA interleaving is enabled. Failed to probe “/sys/devices/system/node/node1”: Permission denied

[code lang=”shell”]
# mongo
MongoDB shell version: 3.2.3
connecting to: test
Server has startup warnings:
2016-02-29T23:11:36.666+0700 I CONTROL [initandlisten]
2016-02-29T23:11:36.667+0700 I CONTROL [initandlisten] ** WARNING: Cannot detect if NUMA interleaving is enabled. Failed to probe "/sys/devices/system/node/node1": Permission denied
2016-02-29T23:11:36.667+0700 W CONTROL [initandlisten]
2016-02-29T23:11:36.667+0700 W CONTROL [initandlisten] Failed to probe "/sys/kernel/mm/transparent_hugepage": Permission denied
2016-02-29T23:11:36.667+0700 W CONTROL [initandlisten]
2016-02-29T23:11:36.667+0700 W CONTROL [initandlisten] Failed to probe "/sys/kernel/mm/transparent_hugepage": Permission denied
2016-02-29T23:11:36.667+0700 I CONTROL [initandlisten]
2016-02-29T23:11:36.667+0700 I CONTROL [initandlisten] ** WARNING: soft rlimits too low. rlimits set to 4096 processes, 262144 files. Number of processes should be at least 131072 : 0.5 times number of files.
[/code]

Solution

I was running the custom OVH kernel, which does not work properly with MongoDB. To solve this I had to reinstall the distribution's stock Linux kernel; after that, these warnings were gone.

WARNING: soft rlimits too low. rlimits set to 4096 processes, 64000 files. Number of processes should be at least 32000 : 0.5 times number of files.

I received this warning when starting the mongo shell on the following setup:
[code lang=”shell”]
# mongod --version
db version v3.2.3
git version: b326ba837cf6f49d65c2f85e1b70f6f31ece7937
OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
allocator: tcmalloc
modules: none
build environment:
distmod: rhel70
distarch: x86_64
target_arch: x86_64
# mongo --version
MongoDB shell version: 3.2.3
CentOS Linux release 7.2.1511 (Core)
[/code]

[code lang=”shell”]
WARNING: soft rlimits too low. rlimits set to 4096 processes, 64000 files. Number of processes should be at least 32000 : 0.5 times number of files.
[/code]

Solution

[code lang=”shell”]
vim /etc/security/limits.d/90-nproc.conf
[/code]
Then add this line:
[code lang=”shell”]
mongod soft nproc 64000
[/code]
and reboot so that the new limit takes effect:
[code lang=”shell”]
reboot
[/code]
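
The rule quoted in the warning is simple arithmetic: the process limit should be at least half of the open-files limit. A minimal sketch of that check follows; the function name `nproc_is_sufficient` is made up for illustration.

```shell
#!/bin/sh
# nproc_is_sufficient NPROC NOFILE
# Succeeds when NPROC is at least half of NOFILE, the rule quoted in the warning.
nproc_is_sufficient() {
    [ "$1" -ge $(( $2 / 2 )) ]
}

# Values from the warning above: 4096 processes, 64000 files.
if nproc_is_sufficient 4096 64000; then
    echo "ok"
else
    echo "too low: need at least $(( 64000 / 2 )) processes"
fi
```

With the values from the warning this prints the "too low" branch with 32000, which is why raising nproc to 64000 clears it. To see the limits a running mongod actually has, you can read /proc/&lt;pid&gt;/limits for its process ID.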

MongoDB Error: about to fork child process, waiting until server is ready for connections.

I tried to install MongoDB on CentOS 6.4 64-bit and got an error when launching mongod like this:
[code lang=”bash”]
numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf
[/code]
Here is the error output:
[code lang=”bash”]
about to fork child process, waiting until server is ready for connections.
forked process: 9713
Wed Oct 16 02:00:00.640 terminate() called, printing stack (if implemented for platform):
0xdddd81 0x6cfbae 0x35d60203be6 0x35d60203c13 0x35d60203d0e 0x35d601a8ce7 0x35d60201a04 0x35d601ad3bc 0x35d601ae226 0xdfb5df 0xdfbf2b 0xdf8bd0 0x9ed4df 0x6dde80 0x6dfc29 0x35d5f938cdd 0x6cf999
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81]
/usr/bin/mongod(_ZN5mongo11myterminateEv+0x3e) [0x6cfbae]
/usr/lib64/libstdc++.so.6(+0xbcbe6) [0x35d60203be6]
/usr/lib64/libstdc++.so.6(+0xbcc13) [0x35d60203c13]
/usr/lib64/libstdc++.so.6(+0xbcd0e) [0x35d60203d0e]
/usr/lib64/libstdc++.so.6(_ZSt21__throw_runtime_errorPKc+0x67) [0x35d601a8ce7]
/usr/lib64/libstdc++.so.6(+0xbaa04) [0x35d60201a04]
/usr/lib64/libstdc++.so.6(_ZNSt6locale5_ImplC2EPKcm+0x4c) [0x35d601ad3bc]
/usr/lib64/libstdc++.so.6(_ZNSt6localeC2EPKc+0x5f6) [0x35d601ae226]
/usr/bin/mongod(_ZN5boost11filesystem34path21wchar_t_codecvt_facetEv+0x4f) [0xdfb5df]
/usr/bin/mongod(_ZNK5boost11filesystem34path14root_directoryEv+0xbb) [0xdfbf2b]
/usr/bin/mongod(_ZN5boost11filesystem38absoluteERKNS0_4pathES3_+0x40) [0xdf8bd0]
/usr/bin/mongod(_ZN5mongo27initializeServerGlobalStateEb+0x15f) [0x9ed4df]
/usr/bin/mongod() [0x6dde80]
/usr/bin/mongod(main+0x9) [0x6dfc29]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x35d5f938cdd]
/usr/bin/mongod() [0x6cf999]
Wed Oct 16 02:00:00.645 Got signal: 6 (Aborted).

Wed Oct 16 02:00:00.649 Backtrace:
0xdddd81 0x6d0d29 0x35d5f94c960 0x35d5f94c8e5 0x35d5f94e0c5 0x6cfbb3 0x35d60203be6 0x35d60203c13 0x35d60203d0e 0x35d601a8ce7 0x35d60201a04 0x35d601ad3bc 0x35d601ae226 0xdfb5df 0xdfbf2b 0xdf8bd0 0x9ed4df 0x6dde80 0x6dfc29 0x35d5f938cdd
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdddd81]
/usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6d0d29]
/lib64/libc.so.6(+0x32960) [0x35d5f94c960]
/lib64/libc.so.6(gsignal+0x35) [0x35d5f94c8e5]
/lib64/libc.so.6(abort+0x175) [0x35d5f94e0c5]
/usr/bin/mongod(_ZN5mongo11myterminateEv+0x43) [0x6cfbb3]
/usr/lib64/libstdc++.so.6(+0xbcbe6) [0x35d60203be6]
/usr/lib64/libstdc++.so.6(+0xbcc13) [0x35d60203c13]
/usr/lib64/libstdc++.so.6(+0xbcd0e) [0x35d60203d0e]
/usr/lib64/libstdc++.so.6(_ZSt21__throw_runtime_errorPKc+0x67) [0x35d601a8ce7]
/usr/lib64/libstdc++.so.6(+0xbaa04) [0x35d60201a04]
/usr/lib64/libstdc++.so.6(_ZNSt6locale5_ImplC2EPKcm+0x4c) [0x35d601ad3bc]
/usr/lib64/libstdc++.so.6(_ZNSt6localeC2EPKc+0x5f6) [0x35d601ae226]
/usr/bin/mongod(_ZN5boost11filesystem34path21wchar_t_codecvt_facetEv+0x4f) [0xdfb5df]
/usr/bin/mongod(_ZNK5boost11filesystem34path14root_directoryEv+0xbb) [0xdfbf2b]
/usr/bin/mongod(_ZN5boost11filesystem38absoluteERKNS0_4pathES3_+0x40) [0xdf8bd0]
/usr/bin/mongod(_ZN5mongo27initializeServerGlobalStateEb+0x15f) [0x9ed4df]
/usr/bin/mongod() [0x6dde80]
/usr/bin/mongod(main+0x9) [0x6dfc29]
/lib64/libc.so.6(__libc_start_main+0xfd) [0x35d5f938cdd]

ERROR: child process failed, exited with error number 14
[/code]

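Solved

The stack trace shows the crash inside the std::locale constructor, which points to a missing or invalid locale setting. I solved it by exporting the locale variables before starting mongod: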
[code lang=”bash”]
# export LANGUAGE=en_US.UTF-8
# export LANG=en_US.UTF-8
# export LC_ALL=en_US.UTF-8
# numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf
about to fork child process, waiting until server is ready for connections.
forked process: 9821
all output going to: /var/log/mongo/mongod.log
child process started successfully, parent exiting
[/code]
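
The exports above only last for the current shell session. If mongod is started from a wrapper script, one option is to set the variables there, but only when they are missing. This is a sketch under that assumption, not the only way to do it; the helper name `set_default_locale` is hypothetical.

```shell
#!/bin/sh
# set_default_locale [LOCALE]
# Give LANGUAGE, LANG and LC_ALL a default value only when they are
# unset or empty, then export them. LOCALE defaults to en_US.UTF-8.
set_default_locale() {
    default_locale="${1:-en_US.UTF-8}"
    : "${LANGUAGE:=$default_locale}"
    : "${LANG:=$default_locale}"
    : "${LC_ALL:=$default_locale}"
    export LANGUAGE LANG LC_ALL
}

# Example use in a wrapper script (left commented out here):
# set_default_locale
# numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf
```

Note that this leaves an existing value untouched, so it will not repair a variable that is already set to a locale the system does not have.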