Monday, March 30, 2020

Basic MySQL/MariaDB

This post is about setting up MariaDB in Linux. I am using CentOS 7.

1. Install MariaDB

# sudo yum install mariadb-server

2. Enable and start MariaDB

# sudo systemctl enable mariadb
# sudo systemctl start mariadb

3. Use MariaDB

# mysql


For my project, I am preparing a YouTube database, based on a dataseet that I got from Internet. Somehow, this page is no longer available, but I have made a copy of it in my github repository.

At the MariaDB CLI, I created a database called "yt_data" with a DB user that has all privilege to this database.

MariaDB [(none)]> create database yt_data;
MariaDB [(none)]> grant all privileges on yt_data.* to 'db_user'@'localhost' identified by 'db_pass';
MariaDB [(none)]> flush privileges;

In order to login to MariaDB with the user account I created to access to the database that I created, here's the command line.

# mysql -u db_user -p yt_data


Note, by default, MariaDB can be accessed without any credential. You'll need to do necessary configuration/setup for security measurement.

Thursday, March 12, 2020

NoSQL v SQL Part 1 - resource usage comparison

To skip my ranting, go to the actual topic, click here. Or you would like to jump to the summary directly.

Recently I kept getting advertisement on migrating your SQL database to NoSQL. It's been 5 years since I last deal with database related work, excluding the CRUD on database task. NoSQL was quite new to me then, our platform main database backbone was Oracle DB. I am still not sure how or why we were convinced to have a new subsystem to use MongoDB, but that's the strength of developers, creating a good looking resume profile, and new task opportunities in the company. (I don't say new job opportunities, as we don't hire new DBA to support NoSQL. I forgot, and could be I was not informed, if the DBAs were sent for training to support this. :D)

Based on my past experience and study, NoSQL was for a totally different use case. It is for non-structured database, simple design, and ...??

I just did a quick check on my profile, the my M101P: MongoDB for developers certificate is no longer exist. The course in MongoDB University is already replaced by M220P: MongoDB for Python Developers. I am not sure when this happened. But, there must be a lot of events happening in NoSQL world since 2014. :D I am not sure why specifically for Python developer, it could be due to the development in Big Data are mostly in Python. But, we were on Perl back then. If you are interested to pursue in this course, please visit this free 3-week online course by MongoDB University.

OK, I write the above paragraph about 2 months ago? And I thought this comparison that I wanted to do, is easy. I forgot the fact thatI have been not doing any coding for a loooooooooong while. I completed the basic MongoDB course. Then I searched for free dataset that I could use. One thing useful but scary about Google is, it brings you to a bigger unknown world to you. Anyway, I chose a YouTube dataset provided at YouTube 2 data set. Alright, you might see page not reachable result. I did copied the dataset from this page, and copied down the necessary information in my OneNote. I am creating a github repository for this project, I'll put all the necessary data there. This is the simplest dataset that I could found, and I think something not right with the data. For the edges data, it says it's representing the user's friend, and the network is symmetric. Is this the same YouTube that I know?

And the new world that I was introduced to, was Graph database and neo4j. When I started this topic, I only think of MySQL (or MariaDB) and MongoDB. Based on what I read, seems like neo4j is a better choice for Graph database. But, I want to get my initial idea done, so I'll skip neo4j for later.

So, basically this NoSQL v SQL is MongoDB v MySQL (MariaDB). :D Let's get to some statistic.



I am using Perl programming to do the insert data into the database. The scripts parse the data file, and insert the record one by one into the database. The database design is simple, as the raw data is in table format, so there's nothing to redesign for MySQL. I did think of having an array for the edges data, since it's symmetric, considering if I am doing a graphlookup or aggregation would be easier. But I would have to tweak a bit on the Perl scripts. So, I put this thought aside. At this point, I haven't derive the query that I'd like to perform to compare these 2 databases, but the general idea is there already.

This is the result of insert the data with time command. I triggered MongoDB first, and triggered the MySQL scripts later in another terminal. Both hosted on the same machine.

# time perl setup-mongodb-data.pl
Setting up nodes table
Setting up groups table
Setting up edges table
Setting up group-edges table

real    38m34.230s
user    28m45.195s
sys     2m1.537s

# time perl setup-mysql-data.pl
Setting up nodes table
Setting up groups table
Setting up edges table
Setting up group-edges table

real    28m55.591s
user    2m13.600s
sys     0m38.488s

It is obvious that MongoDB took longer time to complete this. It could be the Perl module not optimized... I don't know. But I use the standard Perl library. When I saw MySQL scripts finished first, I drop my idea of run the MySQL scripts first, then MongoDB later.

Next, is the CPU usage. I monitor these 2 processes using the following command.

# top -b -d 10 -p 2823,2836  | tee DB-projects/top.log

This is the pid details:

# ps -ax | grep perl
 2823 pts/1    R+     9:43 perl setup-mongodb-data.pl
 2836 pts/0    S+     1:12 perl setup-mysql-data.pl
 3123 pts/2    S+     0:00 grep --color=auto perl

Here's some part of what I logged.

top - 21:44:18 up 87 days, 17:43,  4 users,  load average: 2.29, 2.20, 1.73
Tasks:   2 total,   2 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s): 55.9 us,  9.7 sy,  0.0 ni,  1.4 id, 31.7 wa,  0.0 hi,  1.2 si,  0.0 st
KiB Mem :  3880640 total,   400852 free,  2030452 used,  1449336 buff/cache
KiB Swap:  8601596 total,  8598268 free,     3328 used.  1312860 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2823 root      20   0  490684 311020   3156 R  79.1  8.0  15:28.08 perl
 2836 root      20   0  252720   7744   3968 R   9.7  0.2   1:54.53 perl

top - 21:44:28 up 87 days, 17:43,  4 users,  load average: 2.32, 2.21, 1.74
Tasks:   2 total,   2 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s): 55.7 us,  9.7 sy,  0.0 ni,  1.0 id, 32.5 wa,  0.0 hi,  1.1 si,  0.0 st
KiB Mem :  3880640 total,   396388 free,  2032612 used,  1451640 buff/cache
KiB Swap:  8601596 total,  8598268 free,     3328 used.  1310680 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2823 root      20   0  492860 313196   3156 R  79.9  8.1  15:36.08 perl
 2836 root      20   0  252720   7744   3968 R  10.0  0.2   1:55.53 perl

top - 21:44:39 up 87 days, 17:43,  4 users,  load average: 2.27, 2.20, 1.74
Tasks:   2 total,   1 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s): 55.5 us,  9.2 sy,  0.0 ni,  0.9 id, 33.2 wa,  0.0 hi,  1.2 si,  0.0 st
KiB Mem :  3880640 total,   392064 free,  2034764 used,  1453812 buff/cache
KiB Swap:  8601596 total,  8598268 free,     3328 used.  1308548 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2823 root      20   0  495164 315500   3156 S  80.3  8.1  15:44.12 perl
 2836 root      20   0  252720   7744   3968 R   9.6  0.2   1:56.49 perl

top - 21:44:49 up 87 days, 17:43,  4 users,  load average: 2.23, 2.20, 1.75
Tasks:   2 total,   1 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s): 55.3 us,  9.6 sy,  0.0 ni,  1.7 id, 32.4 wa,  0.0 hi,  1.0 si,  0.0 st
KiB Mem :  3880640 total,   387444 free,  2037096 used,  1456100 buff/cache
KiB Swap:  8601596 total,  8598268 free,     3328 used.  1306184 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2823 root      20   0  497340 317676   3156 S  79.8  8.2  15:52.11 perl
 2836 root      20   0  252720   7744   3968 R   9.8  0.2   1:57.47 perl

A few records before the MySQL scripts ends.

top - 21:53:19 up 87 days, 17:52,  4 users,  load average: 2.29, 2.19, 1.93
Tasks:   2 total,   0 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s): 56.0 us,  9.6 sy,  0.0 ni,  1.8 id, 31.6 wa,  0.0 hi,  1.0 si,  0.0 st
KiB Mem :  3880640 total,   236036 free,  2149652 used,  1494952 buff/cache
KiB Swap:  8601596 total,  8598268 free,     3328 used.  1193556 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2823 root      20   0  613096 433304   3156 S  79.4 11.2  22:39.56 perl
 2836 root      20   0  252720   7744   3968 S  10.1  0.2   2:48.32 perl

top - 21:53:29 up 87 days, 17:52,  4 users,  load average: 2.25, 2.19, 1.93
Tasks:   2 total,   0 running,   2 sleeping,   0 stopped,   0 zombie
%Cpu(s): 56.2 us,  9.5 sy,  0.0 ni,  1.2 id, 32.0 wa,  0.0 hi,  1.1 si,  0.0 st
KiB Mem :  3880640 total,   231448 free,  2152032 used,  1497160 buff/cache
KiB Swap:  8601596 total,  8598268 free,     3328 used.  1191292 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2823 root      20   0  615404 435740   3156 S  79.8 11.2  22:47.55 perl
 2836 root      20   0  252720   7744   3968 S  10.4  0.2   2:49.36 perl

top - 21:53:39 up 87 days, 17:52,  4 users,  load average: 2.28, 2.20, 1.93
Tasks:   2 total,   1 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s): 56.2 us,  9.6 sy,  0.0 ni,  1.3 id, 32.0 wa,  0.0 hi,  1.0 si,  0.0 st
KiB Mem :  3880640 total,   226952 free,  2154380 used,  1499308 buff/cache
KiB Swap:  8601596 total,  8598268 free,     3328 used.  1188976 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2823 root      20   0  617708 438044   3156 R  79.6 11.3  22:55.52 perl
 2836 root      20   0  252720   7744   3968 S  10.1  0.2   2:50.37 perl

top - 21:53:49 up 87 days, 17:52,  4 users,  load average: 2.31, 2.21, 1.94
Tasks:   2 total,   1 running,   1 sleeping,   0 stopped,   0 zombie
%Cpu(s): 56.1 us,  9.8 sy,  0.0 ni,  1.4 id, 31.6 wa,  0.0 hi,  1.1 si,  0.0 st
KiB Mem :  3880640 total,   222396 free,  2156508 used,  1501736 buff/cache
KiB Swap:  8601596 total,  8598268 free,     3328 used.  1186700 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2823 root      20   0  619884 440220   3156 S  79.8 11.3  23:03.51 perl
 2836 root      20   0  252720   7744   3968 R  10.2  0.2   2:51.39 perl

top - 21:53:59 up 87 days, 17:52,  4 users,  load average: 2.11, 2.17, 1.93
Tasks:   1 total,   1 running,   0 sleeping,   0 stopped,   0 zombie
%Cpu(s): 52.5 us,  7.8 sy,  0.0 ni, 17.1 id, 21.6 wa,  0.0 hi,  1.0 si,  0.0 st
KiB Mem :  3880640 total,   221692 free,  2155004 used,  1503944 buff/cache
KiB Swap:  8601596 total,  8598268 free,     3328 used.  1188212 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2823 root      20   0  622188 442524   3156 R  79.7 11.4  23:11.49 perl

The raw database file size, if I got this correct. I just installed MongoDB, and just started to use both MySQL and MongoDB in this VM. All I was done is keep running my scripts, drop the tables, recreate the tables. I could have use the stats() command or check the information_schema.tables, but I found out some discrepancies. So, I hope this statistic is "clean".

# ls -lah
total 52K
drwx------  2 mysql mysql   95 Mar 12 21:53 .
drwxr-xr-x. 6 mysql mysql  192 Mar  9 19:17 ..
-rw-rw----  1 mysql mysql   65 Mar  9 19:17 db.opt
-rw-rw----  1 mysql mysql 8.4K Mar 12 21:32 edges.frm
-rw-rw----  1 mysql mysql 8.5K Mar 12 21:53 group_edges.frm
-rw-rw----  1 mysql mysql 8.4K Mar 12 21:32 groups.frm
-rw-rw----  1 mysql mysql 8.4K Mar 12 21:25 nodes.frm

# ls -lah
total 110M
drwxr-xr-x   4 mongod mongod 4.0K Mar 12 23:40 .
drwxr-xr-x. 69 root   root   4.0K Mar  5 16:07 ..
-rw-------   1 mongod mongod  36K Mar 11 22:54 collection-0--1442641377275254385.wt
-rw-------   1 mongod mongod 8.0K Mar  5 16:09 collection-184--1442641377275254385.wt
-rw-------   1 mongod mongod  20K Mar 11 22:54 collection-194--1442641377275254385.wt
-rw-------   1 mongod mongod  20K Mar  5 16:09 collection-2--1442641377275254385.wt
-rw-------   1 mongod mongod  17M Mar 12 21:36 collection-221--1442641377275254385.wt
-rw-------   1 mongod mongod  20K Mar 12 21:36 collection-223--1442641377275254385.wt
-rw-------   1 mongod mongod  51M Mar 12 22:03 collection-225--1442641377275254385.wt
-rw-------   1 mongod mongod 864K Mar 12 22:04 collection-227--1442641377275254385.wt
-rw-------   1 mongod mongod  36K Mar 12 23:38 collection-4--1442641377275254385.wt
drwx------   2 mongod mongod  155 Mar 12 23:41 diagnostic.data
-rw-------   1 mongod mongod  36K Mar 11 22:54 index-1--1442641377275254385.wt
-rw-------   1 mongod mongod 8.0K Mar  5 16:09 index-185--1442641377275254385.wt
-rw-------   1 mongod mongod  20K Mar 11 22:54 index-195--1442641377275254385.wt
-rw-------   1 mongod mongod  20K Mar 11 22:54 index-196--1442641377275254385.wt
-rw-------   1 mongod mongod  11M Mar 12 21:36 index-222--1442641377275254385.wt
-rw-------   1 mongod mongod  20K Mar 12 21:36 index-224--1442641377275254385.wt
-rw-------   1 mongod mongod  29M Mar 12 22:03 index-226--1442641377275254385.wt
-rw-------   1 mongod mongod 480K Mar 12 22:04 index-228--1442641377275254385.wt
-rw-------   1 mongod mongod  20K Mar  5 16:09 index-3--1442641377275254385.wt
-rw-------   1 mongod mongod  36K Mar 12 23:38 index-5--1442641377275254385.wt
-rw-------   1 mongod mongod  12K Mar 12 23:39 index-6--1442641377275254385.wt
drwx------   2 mongod mongod  110 Mar 12 21:58 journal
-rw-------   1 mongod mongod  36K Mar 12 22:03 _mdb_catalog.wt
-rw-------   1 mongod mongod    5 Mar  5 16:08 mongod.lock
-rw-------   1 mongod mongod  44K Mar 12 23:38 sizeStorer.wt
-rw-------   1 mongod mongod  114 Mar  5 16:08 storage.bson
-rw-------   1 mongod mongod   47 Mar  5 16:08 WiredTiger
-rw-------   1 mongod mongod 4.0K Mar  5 16:08 WiredTigerLAS.wt
-rw-------   1 mongod mongod   21 Mar  5 16:08 WiredTiger.lock
-rw-------   1 mongod mongod 1.2K Mar 12 23:40 WiredTiger.turtle
-rw-------   1 mongod mongod  92K Mar 12 23:40 WiredTiger.wt

I'll just make a rough figure for the comparison. Since the differences are too obvious. :D



Summary.

                              |  MongoDB     |  MySQL (MariaDB)
------------------------------+--------------+------------------
User time to insert data      |  28m45.195s  |  2m13.600s
Average CPU usage             |  79%         |  10%
Average memory usage          |  11%         |  0.2%
Disk space consumption        |  110M        |  52K

Let's see how's the query performance in the next episode. Stay tuned! :D

The source code and data can be found at jcrys26/DB-projects

Basic MongoDB

This post is about setting up MongoDB in Linux. I am using CentOS 7.

1. Adding the mongoDB repository

> vi /etc/yum.repos.d/mongodb.repo

[MongoDB]
name=MongoDB Repository
baseurl=http://repo.mongodb.org/yum/redhat/$releasever/mongodb-org/4.2/$basearch/
gpgcheck=1
enabled=1
gpgkey=https://www.mongodb.org/static/pgp/server-4.2.asc

2. Install MongoDB

> yum install mongodb-org

3. Starting MongoDB service

> systemctl start mongod.service

4. Use the MongoDB

> mongo

Voila, you can start to use MongoDB right away.

For my project, I am preparing a YouTube database, based on a dataset that I got from YouTube 2 data set.

At the mongo shell, I am creating a database called "yt_data" with a DB user that is granted with readwrite access only to this database.

> use yt_data
> db.createUser( { user: "db_user", pwd: "db_pass", roles: [{ role: "readWrite", db: "yt_data"}] });
Successfully added user: {
"user" : "db_user",
"roles" : [
{
"role" : "readWrite",
"db" : "yt_data"
}
]
}

So, in order for me to login to mongo shell with this user account, I'll need to provide the credential, and also the database name.

> mongo --username db_user --password db_pass yt_data


Note, by default, mongoDB can be accessed without any credential. You'll need to do necessary configuration/setup for security measurement.


Reference : Install MongoDB Community Edition on Red Hat or CentOS

Wednesday, March 11, 2020

Working with github repository

It's been more than 5 years, I don't really work on any programming project. All my git/perl knowledge are gone. I was stunt a bit when looking back my documentation at Basic commands of git.

I have cleaned up my github account, and start new repository to host the files and scripts on a database related research that I am working on. Spent some time to read some more git commands online, and the instruction on github, finally I manage to link my local repository with the github repository that I created. It took me quite sometime to get things right. :(

Here's the step.

1. Create a github repository.

2. Create a local git repository.
> git init

3. Add the github repository as our origin
> git remote add origin https://github.com/<your github repository>

4. Clone the github repository to local (If you have files in the github repository already)
> git clone https://github.com/<your github repository>

5. Create a branch in local
> git checkout -b <branch name> origin/master

6. Make commits

7. Push to github repository
> git push origin <branch name>


Then... I found an easier way... And, work on the trunk only.

1. Create a github repository.

2. Clone your github repository
> git clone https://github.com/<your github repository>

3. Make commits

4. Push to github repository
> git push origin master


Some other git commands
To set user name and email only for this current local git repository

> git config user.email "<your email>"
> git config user.name "<your name>"

Again, please refer to git cheat-sheats for more command. :)