When you're cleaning up data, you usually end up using a handful of functions (five to eight) over and over, and a few more only once or twice. Here are the ones I find myself using again and again.

Here is a quick overview:

names() - returns the column names of a dataset

str() - gives an overview of a dataset's structure: dimensions, column types, and the first few values

data.table package - includes functions for creating new columns, among other things

%in% operator - checks if a value is in a vector

Below are some examples. The dataset 'rock' is built into R. 

>  names(rock) # returns the column names
[1] "area" "peri" "shape" "perm"

> str(rock)                         # gives the format of the dataframe
'data.frame': 48 obs. of 4 variables:
$ area : int 4990 7002 7558 7352 7943 7979 9333 8209 8393 6425 ...
$ peri : num 2792 3893 3931 3869 3949 ...
$ shape: num 0.0903 0.1486 0.1833 0.1171 0.1224 ...
$ perm : num 6.3 6.3 6.3 6.3 17.1 17.1 17.1 17.1 119 119 ...
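
names() is also useful on the left side of an assignment, for renaming a column during cleanup. Here is a minimal sketch in base R, working on a copy so the built-in dataset stays untouched. The variable rockClean and the new name 'perimeter' are only illustrative.

```r
# Work on a copy so the built-in 'rock' data frame is left unchanged
rockClean <- rock

# Rename the 'peri' column; 'perimeter' is an illustrative name
names(rockClean)[names(rockClean) == "peri"] <- "perimeter"

names(rockClean)   # column names after the rename
str(rockClean)     # same structure as before, with the new column name
```

Selecting the column by name rather than by position keeps the rename correct even if the column order changes later.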

# install and load the data.table package, then convert the data frame
> install.packages("data.table")             # don't forget these 3 steps! (installing is only needed once)
> library(data.table)

> dtRock <- data.table(rock)

> dtRock[1:5]                    # returns the first 5 rows
area peri shape perm
1: 4990 2791.90 0.0903296 6.3
2: 7002 3892.60 0.1486220 6.3
3: 7558 3930.66 0.1833120 6.3
4: 7352 3869.32 0.1170630 6.3
5: 7943 3948.54 0.1224170 17.1

# and my favorite way to create a new column

# area is measured in pixels, so areaMP is in thousands of pixels (true megapixels would be area / 1e6)

> dtRock[, areaMP := area / 1000]    

> dtRock[1, ]                        # returns the first row, all columns
area peri shape perm areaMP
1: 4990 2791.9 0.0903296 6.3 4.99

> dtRock[, 'areaMP']                 # returns the entire 'areaMP' column (as a one-column data.table in current versions; dtRock$areaMP gives a plain vector)
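
One thing worth knowing about := is that it changes the table in place instead of returning a modified copy. The sketch below shows that, along with adding several columns at once and removing one. It assumes data.table is already installed. The names dtSketch, dtCopy, areaK, periK and highPerm are made up for illustration, and the cutoff of 100 is arbitrary.

```r
library(data.table)

# as.data.table() builds a new data.table; 'rock' itself is not modified
dtSketch <- as.data.table(rock)

# := adds the column in place, so no reassignment is needed
dtSketch[, areaK := area / 1000]

# Several columns can be added in one call with the functional form of :=
dtSketch[, `:=`(periK = peri / 1000, highPerm = perm > 100)]

# copy() gives an independent table; a plain <- would point at the same data
dtCopy <- copy(dtSketch)
dtCopy[, areaK := NULL]      # assigning NULL deletes the column, from dtCopy only

names(dtSketch)   # still includes areaK
names(dtCopy)     # areaK is gone
```

If you want to keep the original table around while experimenting, take a copy() first, because an update with := cannot be undone.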

# The %in% operator is one of the most useful functions in R, I think.
> a <- c(1,2,3,4)

> 4 %in% a                  # it's asking, is the value 4 in the vector a?
[1] TRUE
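
For data cleaning, %in% gets most of its use in filtering rows and checking column names. The following is a small sketch in base R. The values in wanted are taken from the str() output above, and 'depth' is a deliberately made-up column name that rock does not have.

```r
# %in% is vectorized over its left-hand side: one TRUE/FALSE per element
c(2, 5) %in% c(1, 2, 3, 4)             # TRUE FALSE

# Keep only the rows whose 'perm' value is in a set of wanted values
wanted <- c(6.3, 17.1)
rockSubset <- rock[rock$perm %in% wanted, ]

# Check that the columns you expect are present before using them
needed <- c("area", "perm", "depth")   # 'depth' is hypothetical
missingCols <- needed[!(needed %in% names(rock))]
missingCols                            # "depth"
```

Unlike ==, %in% never returns NA, which makes it safer inside row filters when the data has missing values.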

There are many other functions and packages, such as the 'dplyr' package by Hadley Wickham, but I am just showing the ones I use most frequently.

View the original post, and others from the author here.
