Convert All Numbers to Integers While Reading Csv Python
This article was published as a part of the Data Science Blogathon
Introduction
CSV is a common file format that is often used in domains like Monetary Services, etc. Most applications let you import and export data in CSV format.
Thus, it is necessary to have a good understanding of the CSV format to better handle the data you work with every day.
So, throughout this article, we'll go through various instances of working with CSV files and provide examples to tie everything together.
Table of Contents
1. What's CSV?
2. Basic Operations with CSV Files
- Working with CSV Files
- Opening a CSV File
- Saving a CSV File
3. Why CSV Files?
4. Basics of read_csv() function of Pandas
- Importing Pandas
- Opening a Local CSV File
- Opening a CSV File from a URL
5. Understanding parameters of read_csv() function
- sep parameter
- index_col parameter
- header parameter
- usecols parameter
- squeeze parameter
- skiprows parameter
- nrows parameter
- encoding parameter
- error_bad_lines parameter
- dtype parameter
- parse_dates parameter
- converters parameter
- na_values parameter
Let's get started,
What is a CSV?
CSV (Comma Separated Values) is a simple file format used to store tabular data, such as a spreadsheet or database. A CSV file stores tabular data (numbers and text) in plain text. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. The use of the comma as a field separator is the source of the name for this file format.
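To make the record-and-field structure concrete, here is a minimal sketch using Python's built-in csv module. The sample text and its column names are hypothetical and chosen only for illustration.

```python
import csv
import io

# Hypothetical sample: a header line followed by two records.
csv_text = "name,age,city\nAsha,31,Jodhpur\nRavi,27,Delhi\n"

# csv.reader yields each line (record) as a list of string fields.
rows = list(csv.reader(io.StringIO(csv_text)))

header, records = rows[0], rows[1:]
print(header)
print(records)
```

Note that every field comes back as a string; nothing in the file itself says that "31" is a number.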
Basic Operations with CSV Files
In basic operations, we are going to understand the following three things:
- How to work with CSV files
- How to open a CSV file
- How to Save a CSV file
Working with CSV Files
Working with CSV files isn't a tedious chore; it's pretty straightforward. Yet, depending on your workflow, there can be caveats that you might want to watch out for.
Opening a CSV File
If you've got a CSV file, you can open it in Excel without much trouble. Just open Excel, choose Open and find the CSV file to work with (or right-click on the CSV file and choose Open in Excel). After you open the file, you'll notice that the data is just plain text put into different cells.
Saving a CSV File
If you wish to save your current workbook into a CSV file, you have to use the following commands:
File -> Save As… and choose CSV file.
More often than not, you'll get this alert:
Image Source: Google Images
Let's understand what this warning is telling us.
Here, what Excel is trying to say is that CSV files don't save any kind of formatting at all.
For example, column widths, font styles, colors, etc. won't be saved.
Just your plain old data is saved in a comma-separated file.
Note that even after you save it, Excel will still show the formats that you had, so don't be fooled by this and think that after you open the workbook again your formats will still be there. They won't be.
Even after you open a CSV file in Excel, if you apply any kind of formatting at all, like adjusting the column widths to see the data, Excel will still warn you that you can't save the formats that you added. You'll get a warning like this one:
Image Source: Google Images
So, the point to note is that your formats can never be saved in CSV files.
Why CSV Files?
CSV files are used as the simplest way to exchange data between different applications. Say you had a database application and you wanted to export the data to a file. If you wished to export it to an Excel file, the database application would need to support exporting to XLS* files.
However, since the CSV file format is extremely simple and lightweight (much more so than XLS* files), it's easier for various applications to support it. In its basic usage, you have a line of text, with each column of data separated by a comma. That's it. And because of this simplicity, it's simple for developers to build Export / Import functionality with CSV files to transfer data between applications, instead of more sophisticated file formats.
For example,
Let's take tabular data in the form given below:
If we convert this data into CSV format, it looks like this:
Now we are done with all the basics of CSV files. So, in the later part of the article, we will discuss working with CSV files in detail.
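The article's original table is not reproduced here. As a stand-in, the following minimal sketch writes a small hypothetical table (the names and values are assumptions) to CSV text with Python's built-in csv module, so the one-line-per-row layout is visible.

```python
import csv
import io

# Hypothetical table: first row is the header, the rest are data rows.
table = [
    ["name", "age", "city"],
    ["Asha", 31, "Jodhpur"],
    ["Ravi", 27, "Delhi"],
]

# Write the rows into an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
csv.writer(buffer).writerows(table)

csv_text = buffer.getvalue()
print(csv_text)
```

Each row of the table becomes one line of text, and each cell becomes one comma-separated field.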
Importing Pandas
First, we import the necessary dependencies, such as the Pandas library of Python.
import pandas as pd
So, the dependency is imported; now we can load and read the dataset easily.
read_csv function
- It is an important pandas function to read CSV files and do operations on them.
- This function helps us to load the file either from your local machine or from any URL.
Opening a local CSV file
If the file is present in the same location as our Python file, then just give the file name to load that file; otherwise, you have to give the relative path to it.
df = pd.read_csv('aug_train.csv')
df
Output:
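As a self-contained sketch of reading a local file, the example below first writes a tiny CSV into a temporary folder and then loads it. The file name sample_train.csv and its contents are hypothetical stand-ins for aug_train.csv.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical stand-in for aug_train.csv, written to a temporary folder
# so the sketch does not depend on any file already being present.
with tempfile.TemporaryDirectory() as folder:
    csv_path = Path(folder) / "sample_train.csv"
    csv_path.write_text("enrollee_id,gender,target\n101,Male,1\n102,Female,0\n")

    # read_csv accepts a plain string path or a pathlib.Path.
    df = pd.read_csv(csv_path)

print(df.shape)
```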
Opening a CSV file from a URL
If the file is not present directly on our local machine and we have to fetch the data from a given URL, then we take the help of the requests module to load that data.
import requests
from io import StringIO

url = "https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
# Some servers reject requests that lack a browser-like User-Agent header
headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:66.0) Gecko/20100101 Firefox/66.0"}
req = requests.get(url, headers=headers)
# Wrap the downloaded text in a file-like object that read_csv can consume
data = StringIO(req.text)
pd.read_csv(data)
Output:
sep parameter
If we have a dataset in which entities in a particular row are not separated by a comma, then we have to use the sep parameter to specify the separator or delimiter.
For example, if we have a TSV file, i.e., entities are tab-separated, and we try to load this data directly, then all the entities are loaded combined.
import pandas as pd
pd.read_csv('movie_titles_metadata.tsv')
Output:
To solve the above problem for the TSV file, we have to override the sep parameter with '\t' instead of ',', which is the default separator.
import pandas as pd
pd.read_csv('movie_titles_metadata.tsv', sep='\t')
Output:
In the above example, we observe that the first row is treated as the column names. To solve this problem and set custom names for the columns, we have to pass a list of names through the names parameter.
pd.read_csv('movie_titles_metadata.tsv', sep='\t', names=['sno','name','release_year','rating','votes','genres'])
Output:
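A minimal sketch of the sep and names parameters on in-memory data. The two sample lines and the three column names are hypothetical and only mimic a tab-separated file without a header row.

```python
import io

import pandas as pd

# Hypothetical tab-separated data with no header row.
tsv_text = (
    "m0\t10 things i hate about you\t1999\n"
    "m1\t1492: conquest of paradise\t1992\n"
)

# With the default comma separator, each line lands in a single column.
wrong = pd.read_csv(io.StringIO(tsv_text), header=None)

# With sep="\t" and explicit names, the fields split into three columns.
right = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    names=["sno", "name", "release_year"],
)

print(wrong.shape, right.shape)
```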
index_col parameter
This parameter allows us to set which columns are used as the index of the data frame. The default value for this parameter is None, and pandas will automatically add a new column starting from 0 to serve as the index.
So, it allows us to use a column as the row labels for a given DataFrame. This comes in handy when, say, we have an ID column in our dataset that does not affect our predictions; we then make that column our index for rows instead of the default.
pd.read_csv('aug_train.csv',index_col='enrollee_id')
Output:
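A minimal sketch of index_col, assuming a small hypothetical extract with an enrollee_id column (the values are made up for illustration).

```python
import io

import pandas as pd

# Hypothetical sample with an ID column that should become the row labels.
csv_text = "enrollee_id,city,target\n8949,city_103,1\n29725,city_40,0\n"

df = pd.read_csv(io.StringIO(csv_text), index_col="enrollee_id")

# Rows can now be looked up by ID instead of by position.
print(df.loc[8949, "city"])
```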
header parameter
This allows us to specify which row will be used as column names for your data frame. It expects input as an int value or a list of int values.
The default value for this parameter is header=0, which implies that the first row of the CSV file will be considered as column names.
pd.read_csv('test.csv',header=1)
Output:
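A minimal sketch of header=1, assuming a hypothetical file whose first line is a title row rather than the column names.

```python
import io

import pandas as pd

# Hypothetical sample: line 0 is a title row, line 1 holds the real column names.
csv_text = "monthly report,,\nid,name,score\n1,Asha,90\n2,Ravi,85\n"

# header=1 takes the column names from the second line;
# the line before it is not part of the data.
df = pd.read_csv(io.StringIO(csv_text), header=1)

print(list(df.columns))
```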
usecols parameter
Specify which columns to import from the complete dataset into the data frame. It can take as input either a list of int values or the column names directly.
This parameter comes in handy when we have to do our analysis on just some columns, not on all the columns of our dataset.
So, this parameter returns a subset of the columns from your dataset.
pd.read_csv('aug_train.csv',usecols=['enrollee_id','gender','education_level'])
Output:
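A minimal sketch showing both input forms of usecols, column names and integer positions, on hypothetical sample data.

```python
import io

import pandas as pd

# Hypothetical sample with four columns.
csv_text = (
    "enrollee_id,city,gender,education_level\n"
    "1,city_1,Male,Graduate\n"
    "2,city_2,Female,Masters\n"
)

# Select columns by name ...
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["enrollee_id", "gender"])

# ... or by zero-based position (0 = enrollee_id, 2 = gender).
by_position = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

print(list(by_name.columns), list(by_position.columns))
```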
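squeeze parameter
If True and only one column is passed, returns a pandas Series instead of a DataFrame. Note that this parameter was deprecated in pandas 1.4 and removed in pandas 2.0, so the call below only works on older versions.
pd.read_csv('aug_train.csv',usecols=['gender'],squeeze=True)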
Output:
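For pandas versions without the squeeze parameter, one way to get the same result is to read the single column and then call squeeze("columns") on the DataFrame. This is a sketch on hypothetical sample data, not the article's original method.

```python
import io

import pandas as pd

# Hypothetical sample data.
csv_text = "enrollee_id,gender\n1,Male\n2,Female\n"

# Read a single column as a one-column DataFrame ...
df = pd.read_csv(io.StringIO(csv_text), usecols=["gender"])

# ... then collapse that one column into a Series.
gender = df.squeeze("columns")

print(type(gender).__name__)
```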
skiprows parameter
This parameter is used to skip the specified rows when building the new data frame.
pd.read_csv('aug_train.csv',skiprows=[0,1])
Output:
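A minimal sketch of skiprows, assuming a hypothetical file that starts with two note lines before the real header. A list skips those exact line numbers, while an int skips that many lines from the top.

```python
import io

import pandas as pd

# Hypothetical sample: two note lines precede the header row.
csv_text = "generated by export tool\ndo not edit\nid,name\n1,Asha\n2,Ravi\n"

# Skip lines 0 and 1 explicitly ...
df = pd.read_csv(io.StringIO(csv_text), skiprows=[0, 1])

# ... or skip the first two lines by count.
same = pd.read_csv(io.StringIO(csv_text), skiprows=2)

print(list(df.columns))
```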
nrows parameter
This parameter reads only a fixed number (decided by the user) of the first rows from the file. It needs an int value.
This parameter comes in handy when we have a huge dataset and we want to load our dataset in chunks instead of directly loading the complete dataset.
pd.read_csv('aug_train.csv',nrows=100)
Output:
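A minimal sketch contrasting nrows with the separate chunksize parameter of read_csv, which is not covered in the article. nrows reads only the first rows once, while chunksize returns an iterator that walks through the whole file piece by piece. The data is generated on the fly and is hypothetical.

```python
import io

import pandas as pd

# Hypothetical data: ten rows with id 0..9 and value = id * 10.
csv_text = "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(10))

# nrows: read only the first three data rows.
first_three = pd.read_csv(io.StringIO(csv_text), nrows=3)

# chunksize: iterate over the file four rows at a time.
chunk_sizes = []
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(chunk))
    total += chunk["value"].sum()

print(len(first_three), chunk_sizes, total)
```

With chunksize, only one chunk is held in memory at a time, which is what makes it suitable for files too large to load in one go.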
encoding parameter
This parameter specifies which encoding to use when reading or writing the files.
Sometimes our files are not encoded in the default form, i.e., UTF-8. In that case, saving the file with a text editor or adding the parameter encoding='utf-8' doesn't work. In both cases, it returns an error.
So, to resolve this issue, we call our read_csv function with encoding='latin1', encoding='iso-8859-1' or encoding='cp1252' (these are some of the various encodings found on Windows).
pd.read_csv('zomato.csv',encoding='latin-1')
Output:
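A minimal sketch of the encoding parameter on in-memory bytes. The sample row is hypothetical; it is encoded as Latin-1 so that it contains a byte that is not valid UTF-8.

```python
import io

import pandas as pd

# Hypothetical sample, encoded as Latin-1 (the "é" becomes the single byte 0xE9).
raw_bytes = "name,city\nCafé Zomato,Bengaluru\n".encode("latin-1")

# The same bytes are not valid UTF-8, which is why the default decoding fails.
try:
    raw_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

# Telling read_csv the actual encoding makes the file readable.
df = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")

print(decodes_as_utf8, df.loc[0, "name"])
```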
error_bad_lines parameter
If we have a dataset in which some lines have too many fields (for example, a CSV line with too many commas), then by default an exception is raised and no DataFrame is returned.
So, to resolve these types of problems, we set this parameter to False, and these "bad lines" are then dropped from the DataFrame that is returned. (Only valid with the C parser.) Note that error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0; newer versions use on_bad_lines='skip' for the same effect.
pd.read_csv('BX-Books.csv', sep=';', encoding="latin-1",error_bad_lines=False)
Output:
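A minimal sketch of dropping malformed lines with on_bad_lines="skip", the replacement for error_bad_lines=False in pandas 1.3 and later. The semicolon-separated sample is hypothetical; its second data line has one field too many.

```python
import io

import pandas as pd

# Hypothetical sample: the line for id 2 has an unexpected third field.
csv_text = "id;title\n1;Book A\n2;Book B;extra\n3;Book C\n"

# on_bad_lines="skip" drops lines with too many fields instead of raising.
df = pd.read_csv(io.StringIO(csv_text), sep=";", on_bad_lines="skip")

print(df["id"].tolist())
```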
dtype parameter
Data type for data or columns. For example, {'a': np.float64, 'b': np.int32}
Sometimes, to convert our columns from float data type to int data type, this parameter comes in handy.
pd.read_csv('aug_train.csv',dtype={'target':int}).info()
Output:
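A minimal sketch of getting integer columns while reading, on hypothetical sample data. It assumes the column passed to dtype holds integer-looking text; depending on the data, pandas can refuse to cast float-looking text such as 1.0 straight to int, so that column is cast after reading with astype instead.

```python
import io

import pandas as pd

# Hypothetical sample: training_hours is written as integers, target as 1.0 / 0.0.
csv_text = "enrollee_id,training_hours,target\n1,36,1.0\n2,47,0.0\n3,83,0.0\n"

# dtype sets the integer width directly for integer-looking text.
df = pd.read_csv(io.StringIO(csv_text), dtype={"training_hours": "int32"})

# target was inferred as float64; cast it once the data is loaded.
df["target"] = df["target"].astype("int64")

print(df.dtypes)
```

The astype cast only works when the column has no missing values, since NaN cannot be stored in a plain integer column.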
parse_dates parameter
If we set this parameter to True, then it tries to parse the index.
For example, [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column; if we have to combine columns 1 and 3 and parse them as a single date column, then use [[1, 3]].
pd.read_csv('IPL Matches 2008-2020.csv',parse_dates=['date']).info()
Output:
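A minimal sketch of parse_dates with a list of column names, on hypothetical sample data with ISO-formatted dates.

```python
import io

import pandas as pd

# Hypothetical sample with an ISO-formatted date column.
csv_text = "id,date,city\n1,2008-04-18,Bangalore\n2,2008-04-19,Chandigarh\n"

# Without parse_dates the column would stay as text;
# with it, pandas converts the column to datetime values.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# Datetime columns expose date parts through the .dt accessor.
print(df["date"].dt.day.tolist())
```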
converters parameter
This parameter helps us to convert values in the columns based on a custom function given by the user.
def rename(name):
    # Shorten one specific team name; leave every other value unchanged
    if name == "Royal Challengers Bangalore":
        return "RCB"
    else:
        return name

rename("Royal Challengers Bangalore")
Output:
'RCB'
pd.read_csv('IPL Matches 2008-2020.csv',converters={'team1':rename})
Output:
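Converters can also turn numeric text into integers while reading. The sketch below assumes a hypothetical runs column and a hypothetical helper named to_int; the converter receives each cell as a raw string.

```python
import io

import pandas as pd


def to_int(text):
    """Turn numeric text such as '82.0' or '240' into an int (truncating)."""
    return int(float(text))


# Hypothetical sample: runs is written inconsistently as float and int text.
csv_text = (
    "match_id,team1,runs\n"
    "1,Royal Challengers Bangalore,82.0\n"
    "2,Chennai Super Kings,240\n"
)

# The function is applied to every value in the runs column during parsing.
df = pd.read_csv(io.StringIO(csv_text), converters={"runs": to_int})

print(df["runs"].tolist())
```

A converter like this would raise an error on empty cells, so it suits columns without missing values.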
na_values parameter
As we know, the default missing values will be NaN. If we want other strings to be considered as NaN, then we have to use this parameter. It expects a list of strings as the input.
Sometimes in our dataset, another type of symbol is used to mark missing values; to have those values understood as missing, we use this parameter.
pd.read_csv('aug_train.csv',na_values=['Male',])
Output:
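A minimal sketch of na_values with placeholder symbols, on hypothetical sample data where "?" and "unknown" stand for missing entries.

```python
import io

import pandas as pd

# Hypothetical sample: "?" and "unknown" are used as missing-value markers.
csv_text = "enrollee_id,gender,experience\n1,Male,5\n2,?,3\n3,Female,unknown\n"

# Treat both markers as NaN in addition to the defaults pandas already recognizes.
df = pd.read_csv(io.StringIO(csv_text), na_values=["?", "unknown"])

print(df.isna().sum())
```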
This completes our discussion!
NOTE: In this article, we discussed only those parameters that are very useful while working with CSV files daily. If you are interested in learning about more parameters, refer to the official website of Pandas here.
Or you can refer to this link also.
End Notes
Thank you for reading!
If you liked this and want to know more, go visit my other articles on Data Science and Machine Learning by clicking on the Link
Please feel free to contact me on Linkedin, Email.
Something not mentioned or want to share your thoughts? Feel free to comment below and I'll get back to you.
About the Author
Chirag Goyal
Currently, I am pursuing my Bachelor of Technology (B.Tech) in Computer Science and Engineering from the Indian Institute of Technology Jodhpur (IITJ). I am very enthusiastic about Machine Learning, Deep Learning, and Artificial Intelligence.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author's discretion.
Source: https://www.analyticsvidhya.com/blog/2021/06/complete-guide-to-working-with-csv-files-in-python-with-pandas/