
Let's quickly redo an analysis we developed when learning about data aggregation: linear regression of life expectancy on year, for each country separately. We store only the estimated slope and intercept.

library(plyr)
# yearMin is not defined in this excerpt; assumed here to be the earliest year in the data
yearMin <- min(gDat$year)
# fit the regression for one country and return the named coefficients
jFun <- function(x) {
  estCoefs <- coef(lm(lifeExp ~ I(year - yearMin), x))
  names(estCoefs) <- c("intercept", "slope")
  return(estCoefs)
}
jCoefs <- ddply(gDat, ~ country + continent, jFun)
str(jCoefs)
# $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 4 1 1 2 5 4 3 ...
tail(jCoefs)
# country continent intercept slope

The jCoefs data.frame is an example of an intermediate result that we want to store for the future and for downstream analyses or visualizations.

Recall that read.table() is the main workhorse for importing rectangular, spreadsheet-y data. As you might guess, write.table() is the main workhorse for exporting rectangular, spreadsheet-y data. Note: I am not necessarily modelling good filenaming practice here.
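For readers without the Gapminder file or plyr at hand, here is a minimal sketch of the same idea: fit one regression per country, keep only intercept and slope, and export the result with write.table(). The toy data frame, its made-up country names and numbers, and the names toy, fitOne, toyCoefs and outFile are all assumptions for illustration. Base R's split() and lapply() stand in for plyr's ddply(), and the continent column is left out to keep the sketch short.

```r
# Toy stand-in for the Gapminder data (made-up values, illustration only)
toy <- data.frame(
  country = rep(c("Aland", "Borduria"), each = 3),
  year    = rep(c(1952, 1957, 1962), times = 2),
  lifeExp = c(60, 62, 64, 40, 43, 46)
)
yearMin <- min(toy$year)

# Fit lifeExp on (year - yearMin) for one country's rows;
# return only the named intercept and slope
fitOne <- function(x) {
  est <- coef(lm(lifeExp ~ I(year - yearMin), data = x))
  names(est) <- c("intercept", "slope")
  est
}

# Base-R split/apply/combine, used here in place of plyr::ddply()
pieces   <- split(toy, toy$country)
coefs    <- do.call(rbind, lapply(pieces, fitOne))
toyCoefs <- data.frame(country = rownames(coefs), coefs, row.names = NULL)

# Export as tab-delimited plain text, without quotes or row names
outFile <- file.path(tempdir(), "toyCoefs.txt")
write.table(toyCoefs, outFile, quote = FALSE, sep = "\t", row.names = FALSE)

# Look at the file exactly as a text editor would show it
cat(readLines(outFile), sep = "\n")
```

Writing to tempdir() keeps the sketch from cluttering the working directory; in a real project you would more likely write next to your script, under a name you will recognize later.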
Excerpt of the str(gDat) output:

# $ country : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
# $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 ...

There will be many occasions when you need to write data from R, such as:

- a clean ready-to-analyze dataset that you heroically created from messy data
- a numerical result from data aggregation or modelling or statistical inference

First tip: today's outputs are tomorrow's inputs. Think back on all the pain you have suffered importing data and try not to self-inflict such pain in the future.

A plain text file that is readable by a human being in a text editor should be your default until you have actual proof that this will not work. Reading and writing to exotic or proprietary formats will be the first thing to break in the future or on a different computer. It also creates barriers for anyone who has a different toolkit than you do.

How does this fit with the current push towards dynamic reports? There is a time and place for everything. There are projects and documents where the scope and personnel will allow you to geek out with knitr and R Markdown. But there are lots of good reasons why (parts of) an analysis should not (only) be embedded in a dynamic report. Maybe you are just doing data cleaning to produce a valid input dataset. Maybe you are making a small but crucial contribution to a giant multi-author paper. Also remember there are other tools and workflows for making something reproducible. I'm looking at you, make.

Finally, certain data "shapes" or formats are simply more amenable to modelling, data aggregation, and visualization. Try to get data into this form early on in an analysis and proactively keep it that way. See the references below for links to two excellent articles on this topic:

- Nine simple ways to make it easier to (re)use your data by White et al., primarily section 4 "Use Standard Data Formats".
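One way to act on "today's outputs are tomorrow's inputs" is to read a file back in right after writing it and confirm that nothing changed. The sketch below assumes a small made-up data frame; the names cleaned, tsvFile and reread are hypothetical, and tab-delimited text is only one reasonable plain-text choice.

```r
# Made-up result table standing in for a cleaned dataset
cleaned <- data.frame(
  country = c("Aland", "Borduria", "Carpania"),
  slope   = c(0.4, 0.6, 0.25),
  stringsAsFactors = FALSE
)

# Write it as human-readable, tab-delimited plain text
tsvFile <- tempfile(fileext = ".tsv")
write.table(cleaned, tsvFile, quote = FALSE, sep = "\t", row.names = FALSE)

# Re-import with the matching reader and compare with the original
reread <- read.delim(tsvFile, stringsAsFactors = FALSE)
roundTripOK <- isTRUE(all.equal(cleaned, reread))
roundTripOK
```

If the comparison fails, that is a cheap early warning that the export settings lose information (quoting, separators, or column types, for instance), well before a collaborator or your future self trips over it.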
In due course, save this script with a name ending in .R, containing no spaces or other funny stuff, and evoking "data export" or "writing data to file".

Load the Gapminder data

Assuming the data can be found in the current working directory, this works:

gDat <- read.delim("gapminderDataFiveYear.txt")

Plan B (I use here, because of where the source of this tutorial lives):

# data import from URL

Basic sanity check that the import has gone well:

str(gDat)
# 'data.frame': 1704 obs.
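The import-then-check pattern can be tried without the real file. The sketch below writes a tiny, made-up tab-delimited file first, then reads it with read.delim() and inspects it with str(). The file name toyGapminder.txt, the object toyDat, and the four columns (taken from the variables mentioned in this tutorial) are assumptions for illustration, not the actual Gapminder file layout.

```r
# Create a hypothetical stand-in file with three made-up rows
inFile <- file.path(tempdir(), "toyGapminder.txt")
writeLines(c(
  "country\tcontinent\tyear\tlifeExp",
  "Aland\tEurope\t1952\t60.1",
  "Aland\tEurope\t1957\t62.3",
  "Borduria\tAsia\t1952\t40.7"
), inFile)

# Import: read.delim() expects a header row and tab-separated fields by default
toyDat <- read.delim(inFile)

# Sanity checks: structure at a glance, then explicit expectations
str(toyDat)
stopifnot(
  nrow(toyDat) == 3,
  identical(names(toyDat), c("country", "continent", "year", "lifeExp")),
  !anyNA(toyDat)
)
```

Turning the expectations into stopifnot() calls means a bad import stops the script immediately instead of surfacing later as a confusing downstream error.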
Now is the time to make sure you are working in the appropriate directory on your computer, perhaps through the use of an RStudio project. To ensure a clean slate, you may wish to clean out your workspace and restart R (both available from the RStudio Session menu, among other methods). Confirm that the new R process has the desired working directory, for example, with the getwd() command or by glancing at the top of RStudio's Console pane.

Open a new R script (in RStudio, File > New > R Script). Develop and run your code from there (recommended) or periodically copy "good" commands from the history.
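The checks described above can also be run from the console. This is a small sketch, not a required step; the names wd, leftovers and dataFile are hypothetical, and the data file name is the one used elsewhere in this tutorial.

```r
# Where is this R process working?
wd <- getwd()
cat("Working directory:", wd, "\n")

# Anything left over in the workspace? An empty result means a clean slate.
leftovers <- ls(envir = globalenv())
if (length(leftovers) > 0) {
  cat("Objects still in the workspace:", paste(leftovers, collapse = ", "), "\n")
}

# Is the data file where the script will look for it?
dataFile <- "gapminderDataFiveYear.txt"
if (!file.exists(dataFile)) {
  message("Did not find ", dataFile, " in ", wd)
}
```

The sketch only reports what it finds. Clearing the workspace and restarting R are left to the RStudio Session menu, as described above.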
Getting data out of R

- Reordering the levels of the country factor
- Other types of objects to use dput() or saveRDS() on

Optional getting started advice

Ignore if you don't need this bit of support.

This is one in a series of tutorials in which we explore basic data import, exploration and much more using data from the Gapminder project.