A Data Science Central Community
When you're cleaning up data, you usually end up using 5-8 functions a ton of times, and then a few more once or twice. Here are a few of the ones I find myself using again and again.
Here is a quick overview:
names() - returns the column names of a dataset
str() - gives a compact overview of a dataset's structure (column types and the first few values)
data.table package - includes functions for creating new columns, among other things
%in% operator - checks if a value is in a vector
Below are some examples. The dataset 'rock' is built into R.
> names(rock) # returns the column names
[1] "area" "peri" "shape" "perm"
> str(rock) # gives the format of the dataframe
'data.frame': 48 obs. of 4 variables:
$ area : int 4990 7002 7558 7352 7943 7979 9333 8209 8393 6425 ...
$ peri : num 2792 3893 3931 3869 3949 ...
$ shape: num 0.0903 0.1486 0.1833 0.1171 0.1224 ...
$ perm : num 6.3 6.3 6.3 6.3 17.1 17.1 17.1 17.1 119 119 ...
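Since this is about cleaning data, it may help to note that names() can also be assigned to, which is one way to rename a column. The sketch below works on a copy of 'rock'; the new name "perimeter" is only an illustration and does not come from the dataset.

```r
# Work on a copy so the built-in dataset stays untouched
df <- rock

# Rename a single column by matching on its current name
# ("perimeter" is a hypothetical replacement name)
names(df)[names(df) == "peri"] <- "perimeter"

names(df)   # the second column is now called "perimeter"
str(df)     # same structure as before, only the name changed
```

Matching on the name rather than the position means the rename still hits the right column if the column order changes.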
# import the data.table package
> install.packages("data.table") # don't forget these 3 steps!
> library(data.table)
> dtRock <- data.table(rock)
> dtRock[1:5] # returns the first 5 rows
area peri shape perm
1: 4990 2791.90 0.0903296 6.3
2: 7002 3892.60 0.1486220 6.3
3: 7558 3930.66 0.1833120 6.3
4: 7352 3869.32 0.1170630 6.3
5: 7943 3948.54 0.1224170 17.1
# and my favorite way to create a new column
# area is measured in pixels; dividing by 1000 gives thousands of pixels
# (the column is named areaMP here, though true megapixels would be area / 1e6)
> dtRock[, areaMP := area / 1000]
> dtRock[1, ] # returns the first row, all columns
area peri shape perm areaMP
1: 4990 2791.9 0.0903296 6.3 4.99
> dtRock[, 'areaMP'] # returns the entire 'areaMP' column
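As a rough sketch of the "among other things" part, the block below adds several columns at once and filters rows with data.table. It assumes the data.table package is installed. The column names areaK, periK and highPerm, and the threshold of 100, are made up for illustration.

```r
library(data.table)

dtRock <- data.table(rock)

# := adds the column in place, so no reassignment is needed
dtRock[, areaK := area / 1000]

# The functional form of := adds several columns in one call
dtRock[, `:=`(
  periK    = peri / 1000,   # perimeter in thousands of pixels
  highPerm = perm > 100     # hypothetical flag; threshold chosen for illustration
)]

# Filter rows in the first argument and pick columns in the second
highRows <- dtRock[highPerm == TRUE, .(area, perm, areaK)]

names(dtRock)
```

Because := modifies the table by reference, other variables pointing at the same data.table see the new columns too; use copy() first if you need an independent table.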
# The %in% operator is one of the most useful functions in R, I think.
> a <- c(1,2,3,4)
> 4 %in% a # it's asking, is the value 4 in the vector a?
[1] TRUE
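Where %in% tends to earn its keep in data cleaning is row filtering, because it is vectorised over its left-hand side. Here is a minimal sketch on 'rock' in base R; the set of perm values in keep is an arbitrary choice for illustration.

```r
# Values of interest (an arbitrary choice for this example)
keep <- c(6.3, 17.1)

# Keep only the rows whose perm value is in the set
lowPerm <- rock[rock$perm %in% keep, ]

# Negate with ! to drop those rows instead
others <- rock[!(rock$perm %in% keep), ]

# Every row lands in exactly one of the two subsets
nrow(lowPerm) + nrow(others) == nrow(rock)
```

Unlike ==, %in% never returns NA, so rows with missing values are excluded from the match rather than turning into NA rows in the result.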
There are many other functions and packages, such as the 'dplyr' package by Hadley Wickham, but I am just showing the ones I use most frequently.
View the original post, and others from the author here.