Data Abstraction: End-to-End Series (Part — 1)

Accredian Publication
5 min read · Mar 12, 2021

--

By Hemanka Sarmah

Contents

  1. What is a database?
  2. MongoDB
    i. Installing PyMongo
    ii. Importing Libraries
    iii. Connecting to the database
    iv. Checking all the collections present in the database
    v. Connecting to the collection
    vi. Counting the number of documents in the collection
    vii. Checking all the documents
    viii. Importing data from MongoDB to Notebook
    ix. Converting data to CSV
  3. MySQL
    i. Installing PyMySQL
    ii. Importing Libraries
    iii. Connecting to the database
    iv. Getting the list of all the tables in our DB
    v. Fetching and calling the data in our notebook
    vi. Saving it as a CSV

Also, Check out our Article on:

  1. Data Preprocessing and EDA: End-to-End Series (Part — 2)
  2. Model Building and Experimentation: End-to-End Series (Part — 3)
  3. Creating a WebApp using Flask+Gunicorn on Ubuntu Server: End-to-End Series (Part — 4)
  4. Containerizing the WebApp using Docker: End-to-End Series (Part — 5)
  5. Scaling our Docker Container using Kubernetes: End-to-End Series (Part — 6)
  6. Automating building and deployment using Jenkins: End-to-End Series (Part — 7)

What is a database?

  • A database is an organized collection of information (data) that can be easily accessed and managed.
  • The main purpose of a database is to handle large amounts of data efficiently: storing, retrieving, and managing it.
  • Popular examples of databases include MySQL, MariaDB, MongoDB, and PostgreSQL.

In this article, we will look at ways to pull data from MongoDB and MySQL into a notebook using Python.

MongoDB

  • MongoDB is a document database: it stores data in JSON-like documents.
  • It supports modern use cases such as:
    - geo-based search
    - graph search
    - text search
  • Queries are themselves JSON, and thus easily composable.
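
To illustrate the last point, both documents and queries are just JSON-like structures, so filters compose by merging dicts. A minimal sketch (the field names below are illustrative, not taken from the actual Telecom Churn collection):

```python
# A MongoDB document is a JSON-like structure (a dict in Python).
# Field names here are illustrative only.
customer = {"customerID": "0001-ABCD", "tenure": 1, "Churn": "No"}

# Queries are JSON too, so two filters compose by merging their dicts.
base_filter = {"Churn": "No"}
extra_filter = {"tenure": {"$lt": 12}}      # tenure under 12 months
combined = {**base_filter, **extra_filter}  # composed query

# With a live collection, this would be passed as: db_c.find(combined)
print(combined)
```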

We will pull the Telecom Churn data from MongoDB into our notebook and convert it to CSV.

→ Installing PyMongo

!pip install "pymongo[srv]"

→ Importing Libraries

import pymongo
from pymongo import MongoClient
import urllib
import pandas as pd
import json
import numpy as np
from bson import ObjectId

→ Connecting to the MongoDB server

string_mongo = "Connection String"
client = MongoClient(string_mongo)

→ Connecting to a Database

db = client['DataBase_Name']

→ Checking all the collections present in the database

db.list_collection_names()

→ Connecting to the collection

db_c = db['Collection_Name']

→ Counting the number of documents in the collection

db_c.count_documents({})

→ Checking all the documents

cursor = db_c.find({})
for document in cursor:
    print(document)

→ Importing data from MongoDB to Notebook

mongo_docs = db_c.find({'_id': ObjectId('ID String')})
mongo_docs

→ Converting data to CSV

fields = {}
for doc in mongo_docs:
    for key, val in doc.items():
        try:
            fields[key] = np.append(fields[key], val)
        except KeyError:
            fields[key] = np.array([val])
print(fields)
series_list = []
columns = []
for key, val in fields.items():
    if key != "_id":
        columns.append(key)
        print("\n\n--------------------------")
        print(key)
        print(fields[key])
        series_list.append(fields[key])
columns
df_series = {}
for num, series in enumerate(series_list):
    df_series[columns[num]] = series
mongo_df = pd.DataFrame(df_series)
mongo_df.head()
mongo_df.dtypes
mongo_df.to_csv("Churn_data.csv", index=False)
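
As an aside, when every document shares the same fields, pandas can build the DataFrame directly from a list of documents, skipping the manual field-by-field assembly. A minimal sketch, with sample dicts standing in for a live cursor:

```python
import pandas as pd

# Stand-in for list(db_c.find({})) -- sample documents, illustrative only
docs = [
    {"_id": 1, "customerID": "0001", "Churn": "No"},
    {"_id": 2, "customerID": "0002", "Churn": "Yes"},
]

# pandas expands each document's keys into columns
df = pd.DataFrame(docs).drop(columns=["_id"])
print(df.shape)  # → (2, 2)
```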

MySQL

  • MySQL is a Relational Database Management System (RDBMS) that lets you create and manage relational databases.
  • It is open-source software backed by Oracle.
  • MySQL runs on various platforms: UNIX, Linux, Windows, etc.

→ Installing PyMySQL

!pip install pymysql

→ Importing Libraries

import pymysql
import pandas
pymysql.__version__

→ Connecting to our database

conn = pymysql.connect(host='localhost',
                       user='root',
                       password='password',
                       db='Database_Name')
cursor = conn.cursor()

→ Getting the list of all the tables in our DB

cursor.execute("""SELECT table_name FROM information_schema.tables
                  WHERE table_schema = 'Database_Name'""")
for table in [tables[0] for tables in cursor.fetchall()]:
    print(table)

→ Fetching and calling the data in our notebook

query = 'select * from churn'
results = pandas.read_sql_query(query, conn)
results.head()
type(results)
results.dtypes
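
If you prefer not to go through pandas.read_sql_query, the same frame can be assembled from the DB-API cursor itself: cursor.description holds the column names and cursor.fetchall() the rows. A sketch with stand-in data in place of a live connection (the column names are illustrative):

```python
import pandas as pd

# Stand-ins for what a pymysql cursor would return:
# cursor.description yields 7-tuples whose first element is the column name,
# cursor.fetchall() yields the row tuples.
description = (("customerID", None, None, None, None, None, None),
               ("Churn", None, None, None, None, None, None))
rows = [("0001", "No"), ("0002", "Yes")]

columns = [col[0] for col in description]
results_manual = pd.DataFrame(rows, columns=columns)
print(results_manual.head())
```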

→ Saving it as a CSV

results.to_csv("output.csv", index=False)

Follow us for more upcoming articles on Data Science, Machine Learning, and Artificial Intelligence.

Also, do give us a Clap👏 if you find this article useful; your encouragement helps us create more content like this.

Visit us on https://www.insaid.co/


Accredian Publication

One of India’s leading institutions providing world-class Data Science & AI programs for working professionals with a mission to groom Data leaders of tomorrow!