R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:

  library(ggplot2)

  summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
  cars_local <- cars
  cars_means <- apply(cars,2,mean)
  
  mtcars$cyl = factor(mtcars$cyl)
  ggplot(data=mtcars, aes(x=disp, y=mpg, color=cyl)) + geom_point() +
    xlab("Displacement") + ylab("Miles per Gallon")

Python

We check to see if we can call Python.

print("Hello World. If you can read this, Python is working.")
## Hello World. If you can read this, Python is working.

Next we do a little computation. Watch your indentation, since Python uses it to delimit blocks, and note that r is reserved for the R/Python interface object. If you overwrite r, things will break.

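G = 6.67 * (10 ** -11) # Gravitational constant (N m^2 / kg^2)
M = 2.0 * (10 ** 30) # Mass of the Sun (kg)
m = 6.0 * (10 ** 24) # Mass of the Earth (kg)
d = 3.0 * (10 ** 11) # Twice the Earth-Sun distance (m); halved below
F = G*M*m/((d/2) ** 2) # Newton's law of gravitation with separation d/2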

print("Force of gravity = ", F)
## Force of gravity =  3.5573333333333336e+22
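
The same calculation can be wrapped in a small function. This is only a sketch: the name `gravitational_force` and its `distance` argument are illustrative, and the constants are the rounded values used above. The chunk above halves `d` before squaring, so the separation it actually uses is 1.5e11 m, roughly the Earth-Sun distance.

```python
# Rounded constants, matching the chunk above.
G = 6.67e-11          # gravitational constant, N m^2 / kg^2
SUN_MASS = 2.0e30     # kg
EARTH_MASS = 6.0e24   # kg


def gravitational_force(m1, m2, distance):
    """Newton's law of gravitation: F = G * m1 * m2 / distance**2."""
    return G * m1 * m2 / distance ** 2


# The chunk above uses d / 2 = 1.5e11 m as the separation.
force = gravitational_force(SUN_MASS, EARTH_MASS, 3.0e11 / 2)
print(f"Force of gravity = {force:.4e} N")
```

Passing the separation as an argument makes the halving explicit at the call site instead of hiding it inside the formula.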

And we mess with conditionals.

price = 257

if (price >= 300):
  price *= 0.7
elif (price >= 200):
  price *= 0.8
elif (price >= 100):
  price *= 0.9
elif (price >= 50):
  price *= 0.95
else:
  pass # No discount below 50
  
print(price)
## 205.60000000000002
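
When the tiers grow, the same logic can be driven by a table rather than a chain of `elif` branches. The sketch below assumes the thresholds and multipliers shown above; `DISCOUNT_TIERS` and `discounted_price` are names made up for illustration.

```python
# (minimum price, multiplier) pairs, highest threshold first.
# These mirror the if/elif chain above.
DISCOUNT_TIERS = [
    (300, 0.70),
    (200, 0.80),
    (100, 0.90),
    (50, 0.95),
]


def discounted_price(price):
    """Return the price after applying the first tier it qualifies for."""
    for minimum, multiplier in DISCOUNT_TIERS:
        if price >= minimum:
            return price * multiplier
    return price  # below every threshold: no discount


print(discounted_price(257))
```

Because the list is ordered from the highest threshold down, the first match is the most generous tier the price qualifies for, which is the same behavior as the chain above.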

Now we mess with strings.

x = 3
y = 1

def rep_cat(x, y):
  return str(x) * 8 + str(y) * 5

z = rep_cat(x, y)  
print(z)
## 3333333311111
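
Multiplying a string by an integer repeats it, which is all `rep_cat` relies on. As a small sketch, the repeat counts can be turned into parameters; the argument names `x_count` and `y_count` are illustrative additions, with defaults that reproduce the function above.

```python
def rep_cat(x, y, x_count=8, y_count=5):
    """Repeat str(x) x_count times, then append str(y) repeated y_count times."""
    return str(x) * x_count + str(y) * y_count


print(rep_cat(3, 1))           # same call as above, using the default counts
print(rep_cat("ab", 0, 2, 3))  # any value that str() accepts works
```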

Fibonacci is always fun.

def fib(n):
  first = 0
  second = 1
  
  if (n < 1):
    return -1
    
  if (n == 1):
    return first
    
  if (n == 2):
    return second
  
  i = 3
  while (i <= n):
    fib_n = first + second
    first = second
    second = fib_n
    i += 1
  return fib_n
  
n = 10
print(fib(n))
## 34
def Fibonacci(n):
    # A negative index is invalid: report it
    # (the function then returns None)
    if (n < 0):
        print("Incorrect input")
    elif (n == 0):
        return 0
    elif (n == 1 or n == 2):
        return 1
    else:
        return Fibonacci(n-1) + Fibonacci(n-2)
        
n = 10
print(Fibonacci(n))
## 55
i = 0
while (n > 0):
  print(Fibonacci(i))
  n -= 1
  i += 1
## 0
## 1
## 1
## 2
## 3
## 5
## 8
## 13
## 21
## 34
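
The two functions above number the sequence differently: `fib` treats n = 1 as the first term (0), so `fib(10)` is 34, while `Fibonacci` is 0-based, so `Fibonacci(10)` is 55. The recursive version also recomputes the same subproblems many times. As a sketch of an alternative, a generator can produce the terms in a single pass; the name `fib_sequence` is made up for this example.

```python
def fib_sequence(count):
    """Yield the first `count` Fibonacci numbers, starting from 0."""
    first, second = 0, 1
    for _ in range(count):
        yield first
        first, second = second, first + second


terms = list(fib_sequence(11))
print(terms[:10])  # the ten values printed by the loop above
print(terms[10])   # 0-based index 10, matching Fibonacci(10)
```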

Using Python Data in R

We can import Python data into R. First, read a CSV file into a pandas data frame.

import pandas as pd

htwt = pd.read_csv("Data/HtWt.csv")
htwt.describe()
##           Height      Weight      Group
## count  20.000000   20.000000  20.000000
## mean   62.100000  139.600000   1.550000
## std     8.441127   43.122103   0.510418
## min    51.000000   82.000000   1.000000
## 25%    56.000000  108.250000   1.000000
## 50%    59.500000  123.500000   2.000000
## 75%    68.000000  166.750000   2.000000
## max    79.000000  228.000000   2.000000

Now we use the Python data frame in an R ggplot call.

  library(ggplot2)
  summary(py$htwt)
##      Height         Weight          Group     
##  Min.   :51.0   Min.   : 82.0   Min.   :1.00  
##  1st Qu.:56.0   1st Qu.:108.2   1st Qu.:1.00  
##  Median :59.5   Median :123.5   Median :2.00  
##  Mean   :62.1   Mean   :139.6   Mean   :1.55  
##  3rd Qu.:68.0   3rd Qu.:166.8   3rd Qu.:2.00  
##  Max.   :79.0   Max.   :228.0   Max.   :2.00
  ggplot(py$htwt, aes(x=Height, y=Weight)) + geom_point() +
    geom_smooth(se=FALSE) + geom_smooth(method="lm", color="orange", se=FALSE) +
    xlab("Height (in)") + ylab("Weight (lbs)")
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
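
The orange line comes from `method="lm"`, which with the formula `y ~ x` fits an ordinary least-squares straight line. The sketch below computes such a line by hand in Python to show what is being fitted. The function name `least_squares_line` and the data values are hypothetical and are not taken from HtWt.csv.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical heights (in) and weights (lbs), for illustration only.
heights = [52, 58, 61, 66, 72, 78]
weights = [85, 110, 120, 150, 185, 220]

slope, intercept = least_squares_line(heights, weights)
print(f"weight ~ {intercept:.1f} + {slope:.2f} * height")
```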

Using R Data in Python

The cars data frame is built into R. We can copy it into a new variable, cars_local, in our R session.

  summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00
  cars_local = cars
  summary(cars_local)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Now we import the R data frames cars_local and mtcars into Python and play with them.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

cars_loc = r.cars_local
cars_loc.describe()
##            speed        dist
## count  50.000000   50.000000
## mean   15.400000   42.980000
## std     5.287644   25.769377
## min     4.000000    2.000000
## 25%    12.000000   26.000000
## 50%    15.000000   36.000000
## 75%    19.000000   56.000000
## max    25.000000  120.000000
mtcars = r["mtcars"]
mtcars.describe()
##              mpg        disp          hp  ...         am       gear     carb
## count  32.000000   32.000000   32.000000  ...  32.000000  32.000000  32.0000
## mean   20.090625  230.721875  146.687500  ...   0.406250   3.687500   2.8125
## std     6.026948  123.938694   68.562868  ...   0.498991   0.737804   1.6152
## min    10.400000   71.100000   52.000000  ...   0.000000   3.000000   1.0000
## 25%    15.425000  120.825000   96.500000  ...   0.000000   3.000000   2.0000
## 50%    19.200000  196.300000  123.000000  ...   0.000000   4.000000   2.0000
## 75%    22.800000  326.000000  180.000000  ...   1.000000   4.000000   4.0000
## max    33.900000  472.000000  335.000000  ...   1.000000   5.000000   8.0000
## 
## [8 rows x 10 columns]
fig = plt.figure()
sns.scatterplot(data=mtcars, x='disp', y='mpg', hue='cyl')
plt.xlabel("Displacement")
plt.ylabel("Miles per Gallon")
#plt.gca().legend_.remove()
plt.show()