CSV Files in Python

CSV Files in Python

Introduction

The term "comma separated values" (or "csv") refers to a straightforward file format that uses certain formatting to arrange tabular data. It provides a standard format for data interchange and stores tabular data, such as from a spreadsheet or database, in plain text. The rows and columns of data in a csv file that opens in an excel sheet determine the format that should be used.

Reading CSV files

Python has many routines for reading csv files. We are outlining a few reading functions.

Using the csv.reader() function

The csv file is read using Python's csv.reader() function. It creates a list of all the columns for each row in the file.

The following information is in the text file python.txt, which by default uses the comma (,) as a delimiter:

name,department,birthday month   
Parker,Accounting,November   
Smith,IT,October    

Example

import csv   
with open('python.csv') as csv_file:   
    csv_reader = csv.reader(csv_file, delimiter=',')   
    line_count = 0   
    for row in csv_reader:   
        if line_count == 0:   
            print(f'Column names are {", ".join(row)}')   
            line_count += 1    

Output:

Column names are name, department, birthday month
  Parker works in the Accounting department, and was born in November.
  Smith works in the IT department, and was born in October.
Processed 3 lines.

Using the open() function, the file "python.csv" was opened in the code above. To read the file, we used the csv.reader() function, which produces an iterable reader object. The data were contained in the reader object, and we used a for loop to iteratively print the contents of each row.

write your code here: Coding Playground

Read a CSV into a Dictionary

Instead than dealing with a list of distinct text items, we can use the DictReader() function to read the data from the CSV file directly into a dictionary.

A reminder of the contents of our input file, python.txt

name,department,birthday month   
Parker,Accounting,November   
Smith,IT,October    

Example

import csv     
with open('python.txt', mode='r') as csv_file:   
    csv_reader = csv.DictReader(csv_file)   
    line_count = 0   
    for row in csv_reader:   
        if line_count == 0:   
            print(f'The Column names are as follows {", ".join(row)}')   
            line_count += 1   
        print(f'\t{row["name"]} works in the {row["department"]} department, and was born in {row["birthday month"]}.')   
        line_count += 1   
    print(f'Processed {line_count} lines.')    

Output:

The Column names are as follows name, department, birthday month
  Parker works in the Accounting department, and was born in November.
  Smith works in the IT department, and was born in October.
Processed 3 lines.

write your code here: Coding Playground

Pandas reading csv files

The NumPy library serves as the foundation for the open-source Pandas library. It offers quick data preparation, data cleaning, and analysis for the user.

It takes only a few seconds and is simple to read the csv file into a pandas DataFrame. To open, examine, and read the csv file in pandas and have it save the data in a DataFrame, we don't need to write many lines of code.

Here, we're going to read a slightly more challenging file named hrdata.csv, which contains information about the company's employees.

Name,Hire Date,Salary,Leaves Remaining   
John Idle,08/15/14,50000.00,10   
Smith Gilliam,04/07/15,65000.00,8   
Parker Chapman,02/21/14,45000.00,10   
Jones Palin,10/14/13,70000.00,3   
Terry Gilliam,07/22/14,48000.00,7   
Michael Palin,06/28/13,66000.00,8    

Example

import pandas   
df = pandas.read_csv('hrdata.csv')   
print(df)    

Output:

  Name                Hire Date     Salary      Leaves Remaining
0     John Idle              03/15/14      50000.0       10
1     Smith Gilliam          06/01/15      65000.0       8
2     Parker Chapman         05/12/14      45000.0       10
3     Jones Palin            11/01/13      70000.0       3
4     Terry Gilliam          08/12/14      48000.0       7
5     Michael Palin          05/23/13      66000.0       8

write your code here: Coding Playground

Writing CSV Files

The csv.writer() function in Python can be used to write any new or existing CSV files. It contains two methods, the writer function or the Dict Writer class, and is similar to the csv.reader() module.

It provides the writerow() and writerows methods (). The writerows() function writes many rows, whereas the writerow() function only writes one row.

Dialects

It is described as a framework that enables the creation, storage, and reuse of different formatting settings. It supports a variety of characteristics; the most popular ones are:

  • Dialect.delimiter: This attribute is used to denote a field's boundary. The comma is the default value (,).
  • Dialect.quotechar: Fields with special characters are quoted using this property.
  • Dialect.lineterminator : whose default value is "rn," is used to start new lines.

Let's create a CSV file with the data below.

data = [{'Rank': 'B', 'first_name': 'Parker', 'last_name': 'Brian'},    
{'Rank': 'A', 'first_name': 'Smith', 'last_name': 'Rodriguez'},   
{'Rank': 'C', 'first_name': 'Tom', 'last_name': 'smith'},   
{'Rank': 'B', 'first_name': 'Jane', 'last_name': 'Oscar'},    
{'Rank': 'A', 'first_name': 'Alex', 'last_name': 'Tim'}]    

Example:

import csv   
   
with open('Python.csv', 'w') as csvfile:   
    fieldnames = ['first_name', 'last_name', 'Rank']   
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)   
   
    writer.writeheader()   
    writer.writerow({'Rank': 'B', 'first_name': 'Parker', 'last_name': 'Brian'})   
    writer.writerow({'Rank': 'A', 'first_name': 'Smith',   
                    'last_name': 'Rodriguez'})   
    writer.writerow({'Rank': 'B', 'first_name': 'Jane', 'last_name': 'Oscar'})   
    writer.writerow({'Rank': 'B', 'first_name': 'Jane', 'last_name': 'Loive'})   
   
print("Writing complete")    

Output:

Writing complete

write your code here: Coding Playground

It returns a file called "Python.csv" that has the following information in it:

first_name,last_name,Rank   
Parker,Brian,B   
Smith,Rodriguez,A   
Jane,Oscar,B   
Jane,Loive,B     

Create a dictionary from a CSV file

To write the CSV file directly into a dictionary, we may also utilize the class DictWriter.

The following information is located in a file called "python.csv":

Parker, Accounting, November

Smith, IT, October

Example:

import csv   
with open('python.csv', mode='w') as csv_file:   
    fieldnames = ['emp_name', 'dept', 'birth_month']   
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)   
    writer.writeheader()   
    writer.writerow({'emp_name': 'Parker', 'dept': 'Accounting', 'birth_month': 'November'})   
    writer.writerow({'emp_name': 'Smith', 'dept': 'IT', 'birth_month': 'October'})    

Output:

import csv   
with open('python.csv', mode='w') as csv_file:   
    fieldnames = ['emp_name', 'dept', 'birth_month']   
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames)   
    writer.writeheader()   
    writer.writerow({'emp_name': 'Parker', 'dept': 'Accounting', 'birth_month': 'November'})   
    writer.writerow({'emp_name': 'Smith', 'dept': 'IT', 'birth_month': 'October'})    

Create CSV Files Utilizing Pandas

As an open source library built on top of the Numpy library, Pandas is described as such. It offers quick data preparation, data cleaning, and analysis for the user.

It is as simple as using pandas to read the CSV file. The DataFrame, a two-dimensional, heterogeneous tabular data structure with three primary parts (data, columns, and rows), must be created. Here, we'll read a slightly more challenging file named hrdata.csv that provides information about the company's personnel.

Name,Hire Date,Salary,Leaves Remaining   
John Idle,08/15/14,50000.00,10   
Smith Gilliam,04/07/15,65000.00,8   
Parker Chapman,02/21/14,45000.00,10   
Jones Palin,10/14/13,70000.00,3   
Terry Gilliam,07/22/14,48000.00,7   
Michael Palin,06/28/13,66000.00,8  

Example:

import pandas   
df = pandas.read_csv('hrdata.csv',    
            index_col='Employee',    
            parse_dates=['Hired'],   
            header=0,    
            names=['Employee', 'Hired', 'Salary', 'Sick Days'])   
df.to_csv('hrdata_modified.csv')    

Output:

Employee, Hired, Salary, Sick Days
John Idle, 2014-03-15, 50000.0,10
Smith Gilliam, 2015-06-01, 65000.0,8
Parker Chapman, 2014-05-12, 45000.0,10
Jones Palin, 2013-11-01, 70000.0,3
Terry Gilliam, 2014-08-12 , 48000.0,7
Michael Palin, 2013-05-23, 66000.0,8

write your code here: Coding Playground