Recently I kept getting advertisement on migrating your SQL database to NoSQL. It's been 5 years since I last deal with database related work, excluding the CRUD on database task. NoSQL was quite new to me then, our platform main database backbone was Oracle DB. I am still not sure how or why we were convinced to have a new subsystem to use MongoDB, but that's the strength of developers, creating a good looking resume profile, and new task opportunities in the company. (I don't say new job opportunities, as we don't hire new DBA to support NoSQL. I forgot, and could be I was not informed, if the DBAs were sent for training to support this. :D)
Based on my past experience and study, NoSQL was for a totally different use case. It is for non-structured database, simple design, and ...??
I just did a quick check on my profile, the my M101P: MongoDB for developers certificate is no longer exist. The course in MongoDB University is already replaced by M220P: MongoDB for Python Developers. I am not sure when this happened. But, there must be a lot of events happening in NoSQL world since 2014. :D I am not sure why specifically for Python developer, it could be due to the development in Big Data are mostly in Python. But, we were on Perl back then. If you are interested to pursue in this course, please visit this free 3-week online course by MongoDB University.
OK, I write the above paragraph about 2 months ago? And I thought this comparison that I wanted to do, is easy. I forgot the fact thatI have been not doing any coding for a loooooooooong while. I completed the basic MongoDB course. Then I searched for free dataset that I could use. One thing useful but scary about Google is, it brings you to a bigger unknown world to you. Anyway, I chose a YouTube dataset provided at YouTube 2 data set. Alright, you might see page not reachable result. I did copied the dataset from this page, and copied down the necessary information in my OneNote. I am creating a github repository for this project, I'll put all the necessary data there. This is the simplest dataset that I could found, and I think something not right with the data. For the edges data, it says it's representing the user's friend, and the network is symmetric. Is this the same YouTube that I know?
And the new world that I was introduced to, was Graph database and neo4j. When I started this topic, I only think of MySQL (or MariaDB) and MongoDB. Based on what I read, seems like neo4j is a better choice for Graph database. But, I want to get my initial idea done, so I'll skip neo4j for later.
So, basically this NoSQL v SQL is MongoDB v MySQL (MariaDB). :D Let's get to some statistic.
I am using Perl programming to do the insert data into the database. The scripts parse the data file, and insert the record one by one into the database. The database design is simple, as the raw data is in table format, so there's nothing to redesign for MySQL. I did think of having an array for the edges data, since it's symmetric, considering if I am doing a graphlookup or aggregation would be easier. But I would have to tweak a bit on the Perl scripts. So, I put this thought aside. At this point, I haven't derive the query that I'd like to perform to compare these 2 databases, but the general idea is there already.
This is the result of insert the data with time command. I triggered MongoDB first, and triggered the MySQL scripts later in another terminal. Both hosted on the same machine.
# time perl setup-mongodb-data.pl Setting up nodes table Setting up groups table Setting up edges table Setting up group-edges table real 38m34.230s user 28m45.195s sys 2m1.537s
# time perl setup-mysql-data.pl Setting up nodes table Setting up groups table Setting up edges table Setting up group-edges table real 28m55.591s user 2m13.600s sys 0m38.488s
It is obvious that MongoDB took longer time to complete this. It could be the Perl module not optimized... I don't know. But I use the standard Perl library. When I saw MySQL scripts finished first, I drop my idea of run the MySQL scripts first, then MongoDB later.
Next, is the CPU usage. I monitor these 2 processes using the following command.
# top -b -d 10 -p 2823,2836 | tee DB-projects/top.log
This is the pid details:
# ps -ax | grep perl 2823 pts/1 R+ 9:43 perl setup-mongodb-data.pl 2836 pts/0 S+ 1:12 perl setup-mysql-data.pl 3123 pts/2 S+ 0:00 grep --color=auto perl
Here's some part of what I logged.
top - 21:44:18 up 87 days, 17:43, 4 users, load average: 2.29, 2.20, 1.73 Tasks: 2 total, 2 running, 0 sleeping, 0 stopped, 0 zombie %Cpu(s): 55.9 us, 9.7 sy, 0.0 ni, 1.4 id, 31.7 wa, 0.0 hi, 1.2 si, 0.0 st KiB Mem : 3880640 total, 400852 free, 2030452 used, 1449336 buff/cache KiB Swap: 8601596 total, 8598268 free, 3328 used. 1312860 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2823 root 20 0 490684 311020 3156 R 79.1 8.0 15:28.08 perl 2836 root 20 0 252720 7744 3968 R 9.7 0.2 1:54.53 perl top - 21:44:28 up 87 days, 17:43, 4 users, load average: 2.32, 2.21, 1.74 Tasks: 2 total, 2 running, 0 sleeping, 0 stopped, 0 zombie %Cpu(s): 55.7 us, 9.7 sy, 0.0 ni, 1.0 id, 32.5 wa, 0.0 hi, 1.1 si, 0.0 st KiB Mem : 3880640 total, 396388 free, 2032612 used, 1451640 buff/cache KiB Swap: 8601596 total, 8598268 free, 3328 used. 1310680 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2823 root 20 0 492860 313196 3156 R 79.9 8.1 15:36.08 perl 2836 root 20 0 252720 7744 3968 R 10.0 0.2 1:55.53 perl top - 21:44:39 up 87 days, 17:43, 4 users, load average: 2.27, 2.20, 1.74 Tasks: 2 total, 1 running, 1 sleeping, 0 stopped, 0 zombie %Cpu(s): 55.5 us, 9.2 sy, 0.0 ni, 0.9 id, 33.2 wa, 0.0 hi, 1.2 si, 0.0 st KiB Mem : 3880640 total, 392064 free, 2034764 used, 1453812 buff/cache KiB Swap: 8601596 total, 8598268 free, 3328 used. 1308548 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2823 root 20 0 495164 315500 3156 S 80.3 8.1 15:44.12 perl 2836 root 20 0 252720 7744 3968 R 9.6 0.2 1:56.49 perl top - 21:44:49 up 87 days, 17:43, 4 users, load average: 2.23, 2.20, 1.75 Tasks: 2 total, 1 running, 1 sleeping, 0 stopped, 0 zombie %Cpu(s): 55.3 us, 9.6 sy, 0.0 ni, 1.7 id, 32.4 wa, 0.0 hi, 1.0 si, 0.0 st KiB Mem : 3880640 total, 387444 free, 2037096 used, 1456100 buff/cache KiB Swap: 8601596 total, 8598268 free, 3328 used. 1306184 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2823 root 20 0 497340 317676 3156 S 79.8 8.2 15:52.11 perl 2836 root 20 0 252720 7744 3968 R 9.8 0.2 1:57.47 perl
A few records before the MySQL scripts ends.
top - 21:53:19 up 87 days, 17:52, 4 users, load average: 2.29, 2.19, 1.93 Tasks: 2 total, 0 running, 2 sleeping, 0 stopped, 0 zombie %Cpu(s): 56.0 us, 9.6 sy, 0.0 ni, 1.8 id, 31.6 wa, 0.0 hi, 1.0 si, 0.0 st KiB Mem : 3880640 total, 236036 free, 2149652 used, 1494952 buff/cache KiB Swap: 8601596 total, 8598268 free, 3328 used. 1193556 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2823 root 20 0 613096 433304 3156 S 79.4 11.2 22:39.56 perl 2836 root 20 0 252720 7744 3968 S 10.1 0.2 2:48.32 perl top - 21:53:29 up 87 days, 17:52, 4 users, load average: 2.25, 2.19, 1.93 Tasks: 2 total, 0 running, 2 sleeping, 0 stopped, 0 zombie %Cpu(s): 56.2 us, 9.5 sy, 0.0 ni, 1.2 id, 32.0 wa, 0.0 hi, 1.1 si, 0.0 st KiB Mem : 3880640 total, 231448 free, 2152032 used, 1497160 buff/cache KiB Swap: 8601596 total, 8598268 free, 3328 used. 1191292 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2823 root 20 0 615404 435740 3156 S 79.8 11.2 22:47.55 perl 2836 root 20 0 252720 7744 3968 S 10.4 0.2 2:49.36 perl top - 21:53:39 up 87 days, 17:52, 4 users, load average: 2.28, 2.20, 1.93 Tasks: 2 total, 1 running, 1 sleeping, 0 stopped, 0 zombie %Cpu(s): 56.2 us, 9.6 sy, 0.0 ni, 1.3 id, 32.0 wa, 0.0 hi, 1.0 si, 0.0 st KiB Mem : 3880640 total, 226952 free, 2154380 used, 1499308 buff/cache KiB Swap: 8601596 total, 8598268 free, 3328 used. 1188976 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2823 root 20 0 617708 438044 3156 R 79.6 11.3 22:55.52 perl 2836 root 20 0 252720 7744 3968 S 10.1 0.2 2:50.37 perl top - 21:53:49 up 87 days, 17:52, 4 users, load average: 2.31, 2.21, 1.94 Tasks: 2 total, 1 running, 1 sleeping, 0 stopped, 0 zombie %Cpu(s): 56.1 us, 9.8 sy, 0.0 ni, 1.4 id, 31.6 wa, 0.0 hi, 1.1 si, 0.0 st KiB Mem : 3880640 total, 222396 free, 2156508 used, 1501736 buff/cache KiB Swap: 8601596 total, 8598268 free, 3328 used. 1186700 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2823 root 20 0 619884 440220 3156 S 79.8 11.3 23:03.51 perl 2836 root 20 0 252720 7744 3968 R 10.2 0.2 2:51.39 perl top - 21:53:59 up 87 days, 17:52, 4 users, load average: 2.11, 2.17, 1.93 Tasks: 1 total, 1 running, 0 sleeping, 0 stopped, 0 zombie %Cpu(s): 52.5 us, 7.8 sy, 0.0 ni, 17.1 id, 21.6 wa, 0.0 hi, 1.0 si, 0.0 st KiB Mem : 3880640 total, 221692 free, 2155004 used, 1503944 buff/cache KiB Swap: 8601596 total, 8598268 free, 3328 used. 1188212 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2823 root 20 0 622188 442524 3156 R 79.7 11.4 23:11.49 perl
The raw database file size, if I got this correct. I just installed MongoDB, and just started to use both MySQL and MongoDB in this VM. All I was done is keep running my scripts, drop the tables, recreate the tables. I could have use the stats() command or check the information_schema.tables, but I found out some discrepancies. So, I hope this statistic is "clean".
# ls -lah total 52K drwx------ 2 mysql mysql 95 Mar 12 21:53 . drwxr-xr-x. 6 mysql mysql 192 Mar 9 19:17 .. -rw-rw---- 1 mysql mysql 65 Mar 9 19:17 db.opt -rw-rw---- 1 mysql mysql 8.4K Mar 12 21:32 edges.frm -rw-rw---- 1 mysql mysql 8.5K Mar 12 21:53 group_edges.frm -rw-rw---- 1 mysql mysql 8.4K Mar 12 21:32 groups.frm -rw-rw---- 1 mysql mysql 8.4K Mar 12 21:25 nodes.frm
# ls -lah total 110M drwxr-xr-x 4 mongod mongod 4.0K Mar 12 23:40 . drwxr-xr-x. 69 root root 4.0K Mar 5 16:07 .. -rw------- 1 mongod mongod 36K Mar 11 22:54 collection-0--1442641377275254385.wt -rw------- 1 mongod mongod 8.0K Mar 5 16:09 collection-184--1442641377275254385.wt -rw------- 1 mongod mongod 20K Mar 11 22:54 collection-194--1442641377275254385.wt -rw------- 1 mongod mongod 20K Mar 5 16:09 collection-2--1442641377275254385.wt -rw------- 1 mongod mongod 17M Mar 12 21:36 collection-221--1442641377275254385.wt -rw------- 1 mongod mongod 20K Mar 12 21:36 collection-223--1442641377275254385.wt -rw------- 1 mongod mongod 51M Mar 12 22:03 collection-225--1442641377275254385.wt -rw------- 1 mongod mongod 864K Mar 12 22:04 collection-227--1442641377275254385.wt -rw------- 1 mongod mongod 36K Mar 12 23:38 collection-4--1442641377275254385.wt drwx------ 2 mongod mongod 155 Mar 12 23:41 diagnostic.data -rw------- 1 mongod mongod 36K Mar 11 22:54 index-1--1442641377275254385.wt -rw------- 1 mongod mongod 8.0K Mar 5 16:09 index-185--1442641377275254385.wt -rw------- 1 mongod mongod 20K Mar 11 22:54 index-195--1442641377275254385.wt -rw------- 1 mongod mongod 20K Mar 11 22:54 index-196--1442641377275254385.wt -rw------- 1 mongod mongod 11M Mar 12 21:36 index-222--1442641377275254385.wt -rw------- 1 mongod mongod 20K Mar 12 21:36 index-224--1442641377275254385.wt -rw------- 1 mongod mongod 29M Mar 12 22:03 index-226--1442641377275254385.wt -rw------- 1 mongod mongod 480K Mar 12 22:04 index-228--1442641377275254385.wt -rw------- 1 mongod mongod 20K Mar 5 16:09 index-3--1442641377275254385.wt -rw------- 1 mongod mongod 36K Mar 12 23:38 index-5--1442641377275254385.wt -rw------- 1 mongod mongod 12K Mar 12 23:39 index-6--1442641377275254385.wt drwx------ 2 mongod mongod 110 Mar 12 21:58 journal -rw------- 1 mongod mongod 36K Mar 12 22:03 _mdb_catalog.wt -rw------- 1 mongod mongod 5 Mar 5 16:08 mongod.lock -rw------- 1 mongod mongod 44K Mar 12 23:38 sizeStorer.wt -rw------- 1 mongod mongod 114 Mar 5 16:08 storage.bson -rw------- 1 mongod mongod 47 Mar 5 16:08 WiredTiger -rw------- 1 mongod mongod 4.0K Mar 5 16:08 WiredTigerLAS.wt -rw------- 1 mongod mongod 21 Mar 5 16:08 WiredTiger.lock -rw------- 1 mongod mongod 1.2K Mar 12 23:40 WiredTiger.turtle -rw------- 1 mongod mongod 92K Mar 12 23:40 WiredTiger.wt
I'll just make a rough figure for the comparison. Since the differences are too obvious. :D
Summary.
| MongoDB | MySQL (MariaDB) ------------------------------+--------------+------------------ User time to insert data | 28m45.195s | 2m13.600s Average CPU usage | 79% | 10% Average memory usage | 11% | 0.2% Disk space consumption | 110M | 52K
Let's see how's the query performance in the next episode. Stay tuned! :D
The source code and data can be found at jcrys26/DB-projects
No comments:
Post a Comment