Data wrangling is the process of transforming data into its desired form. It is often said that around 80% of the time spent on data analysis goes into cleaning and preparing data.
This is the second part of our Top Packages in R series.
In the first part, we covered the most useful packages for data visualization in R. If you haven’t read it yet, see Top Packages in R – Data Visualization.
In this post, we cover the top R packages for data wrangling (manipulation).
Data Wrangling (Manipulation)
dplyr is a popular R package for data munging when working with data frames.
With dplyr you can subset, summarize, rearrange, and join data sets.
dplyr is our favorite package for fast data manipulation.
Some of its core data manipulation functions are:
- mutate() – adds new variables that are functions of existing variables, while preserving the existing ones.
- select() – picks variables based on their names.
- filter() – picks cases based on their values.
- summarise() – reduces multiple values down to a single summary.
- arrange() – changes the ordering of the rows.
- group_by() – groups the data so that any of the functions above can be applied “by group”.
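To see how these verbs chain together, here is a minimal sketch. It assumes dplyr is installed and uses R’s built-in mtcars data set; the 100 horsepower cutoff and the wt_kg column are arbitrary choices for illustration.

```r
library(dplyr)

cars_summary <- mtcars %>%
  filter(hp > 100) %>%                        # keep cars with more than 100 horsepower
  mutate(wt_kg = wt * 453.592) %>%            # wt is in 1000 lbs; convert to kilograms
  select(cyl, mpg, wt_kg) %>%                 # keep only the columns we need
  group_by(cyl) %>%                           # the summary below is computed per cylinder count
  summarise(mean_mpg = mean(mpg),
            mean_wt_kg = mean(wt_kg),
            n = n()) %>%                      # one row per group
  arrange(desc(mean_mpg))                     # highest average mpg first

print(cars_summary)
```

Each verb takes a data frame and returns a new one, so the pipe (%>%) reads top to bottom like a recipe. Joins follow the same style, for example left_join(x, y, by = "key").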
tidyr is used for tidying and reshaping data. It helps change a data set’s layout to create tidy data.
Tidy data is data where:
- each column is a variable.
- each row is an observation.
- each cell is a value.
Its fundamental functions are:
- gather() – takes multiple columns and gathers them into key-value pairs; it makes “wide” data longer.
- spread() – takes two columns (key and value) and spreads them into multiple columns; it makes “long” data wider.
- separate() – turns a single character column into multiple columns.
- extract() – given a regular expression with capturing groups, turns each group into a new column.
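A minimal sketch of these four functions on small, made-up data frames (the student scores, dates, and IDs below are hypothetical). It assumes tidyr is installed. Since tidyr 1.0.0, gather() and spread() are superseded by pivot_longer() and pivot_wider(); they still work, but newer code usually uses the pivot functions.

```r
library(tidyr)

# Hypothetical "wide" data: one score column per year
scores_wide <- data.frame(
  student = c("Ana", "Ben"),
  `2022` = c(81, 74),
  `2023` = c(88, 79),
  check.names = FALSE  # keep the year column names as-is
)

# gather(): stack every column except student into year/score pairs (wide -> long)
scores_long <- gather(scores_wide, key = "year", value = "score", -student)

# spread(): the inverse, one column per distinct year (long -> wide)
scores_back <- spread(scores_long, key = year, value = score)

# separate(): split one character column into several on a separator
dates <- data.frame(date = c("2023-01-15", "2023-02-20"))
dates_split <- separate(dates, date, into = c("year", "month", "day"), sep = "-")

# extract(): each regex capturing group becomes a new column
ids <- data.frame(id = c("A-101", "B-202"))
ids_parts <- extract(ids, id, into = c("letter", "number"),
                     regex = "([A-Z])-([0-9]+)")

print(scores_long)
print(dates_split)
print(ids_parts)
```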
stringr contains numerous functions for text manipulation, including tools for regular expressions and character string manipulation.
It provides simple, consistent wrappers for common string operations.
Some functions in stringr are:
- str_detect() – Detect the presence of a pattern match in a string.
- str_count() – Count the number of matches in a string.
- str_subset() – Return only the strings that contain a pattern match.
- str_length() – Returns the number of characters in each string.
- str_c() – Join multiple strings into a single string.
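A quick sketch of these functions on a small, hypothetical character vector (assumes stringr is installed):

```r
library(stringr)

fruits <- c("apple", "banana", "cherry")

has_an   <- str_detect(fruits, "an")          # FALSE  TRUE FALSE
a_counts <- str_count(fruits, "a")            # 1 3 0
b_or_c   <- str_subset(fruits, "^[bc]")       # "banana" "cherry"
n_chars  <- str_length(fruits)                # 5 6 6
joined   <- str_c(fruits, collapse = ", ")    # "apple, banana, cherry"
```

All of these take the string vector as the first argument, which is what makes stringr easy to use inside a %>% pipeline.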
lubridate makes working with dates and times easy. It provides user-friendly parsing of date-time data, as well as extraction and updating of date-time components.
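A minimal sketch of parsing, extracting, and updating with lubridate. It assumes the package is installed; the dates themselves are arbitrary examples.

```r
library(lubridate)

# Parse the same date written in two different orders;
# the function name gives the order of the components
d1 <- ymd("2023-03-15")
d2 <- dmy("15/03/2023")

# Extract components
yr <- year(d1)    # 2023
mo <- month(d1)   # 3

# Parse a date-time (UTC by default) and pull out the hour
t1 <- ymd_hms("2023-03-15 14:30:00")
hr <- hour(t1)    # 14

# Update a component, then do arithmetic with a period
d3 <- d1
month(d3) <- 4          # d3 is now 2023-04-15
d4 <- d3 + days(30)     # 2023-05-15
```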
data.table is an extension of data.frame that is popular for heavy-duty data wrangling. Analysts use it because of its speed with large data sets.
It provides fast aggregation of large data sets and fast addition, modification, and deletion of columns by reference.
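A sketch of data.table’s DT[i, j, by] syntax, assuming data.table is installed and using R’s built-in mtcars data. The new column names wt_kg and power_class are illustrative only.

```r
library(data.table)

# Convert the built-in mtcars data frame to a data.table (row names are dropped)
dt <- as.data.table(mtcars)

# Aggregation with DT[i, j, by]: mean mpg and row count (.N) per cylinder group
by_cyl <- dt[, .(mean_mpg = mean(mpg), n = .N), by = cyl]

# Add columns by reference with := (dt is modified in place, no copy is made)
dt[, wt_kg := wt * 453.592]        # wt is in 1000 lbs
dt[, power_class := "standard"]

# Modify a column only for rows matching a condition (the i part)
dt[hp > 200, power_class := "high"]

# Delete a column by reference
dt[, wt_kg := NULL]

print(by_cyl)
```

Because := updates the table in place, these column operations avoid the full copy that base R assignment to a data.frame column can trigger, which is where much of data.table’s speed on large data comes from.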
Thanks for reading Top Packages in R – Data Wrangling (Manipulation).
Follow our website to learn about the latest technologies and concepts. Xpertup with us.
If you have any questions, feel free to comment below.