Data Abstraction: End-to-End Series (Part — 1)

Accredian Publication
5 min read · Mar 12, 2021

--

By Hemanka Sarmah

Contents

  1. What is a database?
  2. MongoDB
    i. Installing PyMongo
    ii. Importing Libraries
    iii. Connecting to the database
    iv. Checking all the collections present in the database
    v. Connecting to the collection
    vi. Counting the number of documents in the collection
    vii. Checking all the documents
    viii. Importing data from MongoDB to Notebook
    ix. Converting data to CSV
  3. MySQL
    i. Installing PyMySQL
    ii. Importing Libraries
    iii. Connecting to the database
    iv. Getting the list of all the tables in our DB
    v. Fetching and calling the data in our notebook
    vi. Saving it as a CSV

Also, Check out our Article on:

  1. Data Preprocessing and EDA: End-to-End Series (Part — 2)
  2. Model Building and Experimentation: End-to-End Series (Part — 3)
  3. Creating a WebApp using Flask+Gunicorn on Ubuntu Server: End-to-End Series (Part — 4)
  4. Containerizing the WebApp using Docker: End-to-End Series (Part — 5)
  5. Scaling our Docker Container using Kubernetes: End-to-End Series (Part — 6)
  6. Automating building and deployment using Jenkins: End-to-End Series (Part — 7)

What is a database?

  • A database is an organized collection of information (data) that can be easily accessed and managed.
  • The main purpose of a database is to handle large amounts of data efficiently: storing, retrieving, and managing it.
  • Popular examples of databases include MySQL, MariaDB, MongoDB, and PostgreSQL.

In this article, we will look at ways to pull data from MongoDB and MySQL into a notebook using Python.

MongoDB

  • MongoDB is a document database: it stores data in JSON-like documents.
  • It supports modern use cases such as:
    - geo-based search
    - graph search
    - text search
  • Queries are themselves JSON, and thus easily composable.
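
To illustrate the last point, both documents and queries are just JSON-like structures, so filters compose by merging dicts. A minimal sketch (the field names below are illustrative, not taken from the actual Telecom Churn collection):

```python
# A MongoDB document is a JSON-like structure (a dict in Python).
# Field names here are illustrative only.
customer = {"customerID": "0001-ABCD", "tenure": 1, "Churn": "No"}

# Queries are JSON too, so two filters compose by merging their dicts.
base_filter = {"Churn": "No"}
extra_filter = {"tenure": {"$lt": 12}}      # tenure under 12 months
combined = {**base_filter, **extra_filter}  # composed query

# With a live collection, this would be passed as: db_c.find(combined)
print(combined)
```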

We will pull the Telecom Churn data from MongoDB into our notebook and convert it to CSV.

→ Installing PyMongo

!pip install "pymongo[srv]"

→ Importing Libraries

import pymongo
from pymongo import MongoClient
import urllib
import pandas as pd
import json
import numpy as np
from bson import ObjectId

→ Connecting to the MongoDB server

string_mongo = "Connection String"
client = MongoClient(string_mongo)

→ Connecting to a Database

db = client['DataBase_Name']

→ Checking all the collections present in the database

db.list_collection_names()

→ Connecting to the collection

db_c = db['Collection_Name']

→ Counting the number of documents in the collection

db_c.count_documents({})

→ Checking all the documents

cursor = db_c.find({})
for document in cursor:
    print(document)

→ Importing data from MongoDB to Notebook

mongo_docs = db_c.find({'_id': ObjectId('ID String')})
mongo_docs

→ Converting data to CSV

fields = {}
for doc in mongo_docs:
    for key, val in doc.items():
        try:
            fields[key] = np.append(fields[key], val)
        except KeyError:
            fields[key] = np.array([val])
print(fields)
series_list = []
columns = []
for key, val in fields.items():
    if key != "_id":
        columns.append(key)
        print("\n\n--------------------------")
        print(key)
        print(fields[key])
        series_list.append(fields[key])
columns
df_series = {}
for num, series in enumerate(series_list):
    df_series[columns[num]] = series
mongo_df = pd.DataFrame(df_series)
mongo_df.head()
mongo_df.dtypes
mongo_df.to_csv("Churn_data.csv", index=False)
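
As an aside, when every document shares the same fields, pandas can build the DataFrame directly from a list of documents, skipping the manual field-by-field assembly. A minimal sketch, with sample dicts standing in for a live cursor:

```python
import pandas as pd

# Stand-in for list(db_c.find({})) -- sample documents, illustrative only
docs = [
    {"_id": 1, "customerID": "0001", "Churn": "No"},
    {"_id": 2, "customerID": "0002", "Churn": "Yes"},
]

# pandas expands each document's keys into columns
df = pd.DataFrame(docs).drop(columns=["_id"])
print(df.shape)  # → (2, 2)
```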

MySQL

  • MySQL is a Relational Database Management System (RDBMS) that lets you create and manage relational databases.
  • It is open-source software backed by Oracle.
  • MySQL runs on various platforms: UNIX, Linux, Windows, etc.

→ Installing PyMySQL

!pip install pymysql

→ Importing Libraries

import pymysql
import pandas
pymysql.__version__

→ Connecting to our database

conn = pymysql.connect(host='localhost',
                       user='root',
                       password='password',
                       db='Database_Name')
cursor = conn.cursor()

→ Getting the list of all the tables in our DB

cursor.execute("""SELECT table_name FROM information_schema.tables
                  WHERE table_schema = 'Database_Name'""")
for table in [tables[0] for tables in cursor.fetchall()]:
    print(table)

→ Fetching and calling the data in our notebook

query = 'select * from churn'
results = pandas.read_sql_query(query, conn)
results.head()
type(results)
results.dtypes
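
If you prefer not to go through pandas.read_sql_query, the same frame can be assembled from the DB-API cursor itself: cursor.description holds the column names and cursor.fetchall() the rows. A sketch with stand-in data in place of a live connection (the column names are illustrative):

```python
import pandas as pd

# Stand-ins for what a pymysql cursor would return:
# cursor.description yields 7-tuples whose first element is the column name,
# cursor.fetchall() yields the row tuples.
description = (("customerID", None, None, None, None, None, None),
               ("Churn", None, None, None, None, None, None))
rows = [("0001", "No"), ("0002", "Yes")]

columns = [col[0] for col in description]
results_manual = pd.DataFrame(rows, columns=columns)
print(results_manual.head())
```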

→ Saving it as a CSV

results.to_csv("output.csv", index=False)

Follow us for more upcoming articles on Data Science, Machine Learning, and Artificial Intelligence.

Also, do give us a Clap👏 if you find this article useful; your encouragement helps us create more content like this.

Visit us on https://www.insaid.co/


Accredian Publication

One of India’s leading institutions providing world-class Data Science & AI programs for working professionals with a mission to groom Data leaders of tomorrow!