Data Abstraction: End-to-End Series (Part — 1)
5 min read · Mar 12, 2021
Contents
- What is a database?
- MongoDB
i. Installing PyMongo
ii. Importing Libraries
iii. Connecting to the database
iv. Checking all the collections present in the database
v. Connecting to the collection
vi. Counting the number of documents in the collection
vii. Checking all the documents
viii. Importing data from MongoDB to Notebook
ix. Converting data to CSV
- MySQL
i. Installing PyMySQL
ii. Importing Libraries
iii. Connecting to the database
iv. Getting the list of all the tables in our DB
v. Fetching and calling the data in our notebook
vi. Saving it as a CSV
Also, check out our articles on:
- Data Preprocessing and EDA: End-to-End Series (Part — 2)
- Model Building and Experimentation: End-to-End Series (Part — 3)
- Creating a WebApp using Flask+Gunicorn on Ubuntu Server: End-to-End Series (Part — 4)
- Containerizing the WebApp using Docker: End-to-End Series (Part — 5)
- Scaling our Docker Container using Kubernetes: End-to-End Series (Part — 6)
- Automating building and deployment using Jenkins: End-to-End Series (Part — 7)
What is a database?
- A database is an organized collection of information (data) that can be easily accessed and managed.
- The main purpose of a database is to handle huge amounts of data by storing, retrieving, and managing them efficiently.
- Popular examples of databases include MySQL, MariaDB, MongoDB, and PostgreSQL.
In this article, we will look at ways to push and pull data with MongoDB and MySQL using Python.
MongoDB
- MongoDB is a document database: it stores data in JSON-like documents.
- It supports modern use cases such as:
- geo-based search
- graph search
- text search
- Queries are themselves JSON, and thus easily composable.
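Because queries are plain Python dicts, combining filters is just merging dicts. Here is a minimal sketch; the field names `plan` and `tenure_months` are hypothetical and only for illustration:

```python
# Queries in PyMongo are plain Python dicts, so they compose naturally.
# "plan" and "tenure_months" are hypothetical field names.
base_filter = {"plan": "prepaid"}
extra_filter = {"tenure_months": {"$gte": 12}}

# Merging the two dicts yields a combined query that could be passed
# straight to collection.find(combined).
combined = {**base_filter, **extra_filter}
print(combined)
# {'plan': 'prepaid', 'tenure_months': {'$gte': 12}}
```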
We will pull the Telecom Churn data from MongoDB into our notebook and convert it to CSV.
→ Installing PyMongo
!pip install "pymongo[srv]"
→ Importing Libraries
import pymongo
from pymongo import MongoClient
import urllib
import pandas as pd
import json
import numpy as np
from bson import ObjectId
→ Creating a MongoDB client
string_mongo = "Connection String"
client = MongoClient(string_mongo)
→ Connecting to a Database
db = client['DataBase_Name']
→ Checking all the collections present in the database
db.list_collection_names()
→ Connecting to the collection
db_c = db['Collection_Name']
→ Counting the number of documents in the collection
db_c.count_documents({})
→ Checking all the documents
cursor = db_c.find({})
for document in cursor:
    print(document)
→ Importing data from MongoDB to Notebook
mongo_docs = db_c.find({'_id': ObjectId('ID String')})
mongo_docs
→ Converting data to CSV
fields = {}
for doc in mongo_docs:
    for key, val in doc.items():
        try:
            fields[key] = np.append(fields[key], val)
        except KeyError:
            fields[key] = np.array([val])
print(fields)
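To see what this loop produces, here is a minimal sketch run on two hypothetical documents (the field names `customer` and `churn` are made up for illustration):

```python
import numpy as np

# Two sample documents standing in for mongo_docs, to show how the
# loop accumulates each field into a NumPy array keyed by field name.
docs = [{"customer": "A", "churn": 0}, {"customer": "B", "churn": 1}]

fields = {}
for doc in docs:
    for key, val in doc.items():
        try:
            # Field seen before: append this document's value.
            fields[key] = np.append(fields[key], val)
        except KeyError:
            # First time we see this field: start a new array.
            fields[key] = np.array([val])

print(fields["customer"])  # ['A' 'B']
print(fields["churn"])     # [0 1]
```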
series_list = []
columns = []
for key, val in fields.items():
    if key != "_id":
        columns.append(key)
        print("\n\n--------------------------")
        print(key)
        print(fields[key])
        series_list.append(fields[key])
columns
df_series = {}
for num, series in enumerate(series_list):
    df_series[columns[num]] = series
mongo_df = pd.DataFrame(df_series)
mongo_df.head()
mongo_df.dtypes
mongo_df.to_csv("Churn_data.csv")
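As an aside, since each MongoDB document is a dict, pandas can usually build the DataFrame in one step from the cursor, e.g. `pd.DataFrame(list(db_c.find({})))`. A minimal sketch with a hypothetical list of documents standing in for the cursor:

```python
import pandas as pd

# Stand-in for list(db_c.find({})): each document is a plain dict.
# The field names here are hypothetical.
docs = [
    {"_id": "obj1", "customer": "A", "churn": 0},
    {"_id": "obj2", "customer": "B", "churn": 1},
]

# pd.DataFrame accepts a list of dicts directly; drop the ObjectId
# column before saving to CSV.
df = pd.DataFrame(docs).drop(columns=["_id"])
print(df)
```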
MySQL
- MySQL is a relational database management system (RDBMS) that lets you manage relational databases.
- It is open-source software backed by Oracle.
- MySQL runs on various platforms: UNIX, Linux, Windows, etc.
→ Installing Library
!pip install pymysql
→ Importing Libraries
import pymysql
import pandas
pymysql.__version__
→ Connecting to our database
conn = pymysql.connect(host='localhost',
                       user='root',
                       password='password',
                       db='Database_Name')
cursor = conn.cursor()
→ Getting the list of all the tables in our DB
cursor.execute("""SELECT table_name FROM information_schema.tables
                  WHERE table_schema = 'Database_Name'""")
for table in [row[0] for row in cursor.fetchall()]:
    print(table)
→ Fetching and calling the data in our notebook
query = 'select * from churn'
results = pandas.read_sql_query(query, conn)
results.head()
type(results)
results.dtypes
→ Saving it as a CSV
results.to_csv("output.csv", index=False)
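When the query needs user-supplied values, `pandas.read_sql_query` accepts a `params` argument, which is safer than string formatting. A minimal sketch using an in-memory SQLite database as a stand-in for the pymysql connection above (with pymysql the placeholder would be `%s` instead of `?`); the table and column names here are hypothetical:

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the pymysql connection;
# pandas.read_sql_query works the same way with either.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE churn (customer TEXT, churned INTEGER)")
conn.executemany("INSERT INTO churn VALUES (?, ?)",
                 [("A", 0), ("B", 1), ("C", 1)])

# Passing params instead of formatting values into the SQL string
# guards against SQL injection.
df = pd.read_sql_query("SELECT * FROM churn WHERE churned = ?",
                       conn, params=(1,))
print(df)
conn.close()
```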
Follow us for more upcoming articles on Data Science, Machine Learning, and Artificial Intelligence.
Also, do give us a Clap👏 if you found this article useful; your encouragement helps us create more cool stuff like this.