Matrices, factors, and data frames

2.1 Matrices

  • Run the following commands.
matrix(data=5, nr=2, nc=2)
##      [,1] [,2]
## [1,]    5    5
## [2,]    5    5
matrix(1:6, 2, 3)
##      [,1] [,2] [,3]
## [1,]    1    3    5
## [2,]    2    4    6
matrix(1:6, 2, 3, byrow=TRUE)
##      [,1] [,2] [,3]
## [1,]    1    2    3
## [2,]    4    5    6
  • Create a vector z with the first 30 integers and use it to build a matrix m with 3 rows and 10 columns.
z<-c(1:30)

m<-matrix(data=z, nr=3, nc=10, byrow = TRUE)
  • Store the third column in a vector.
z<-m[,3]

  • Create the following matrices in R:

x =   3  21
     -1   1

y =   1   4   0
      0   1  -1

x<-c(3,21,-1,1)
xmatrix<-matrix(data=x,ncol = 2, nrow = 2, byrow = TRUE)
xmatrix
##      [,1] [,2]
## [1,]    3   21
## [2,]   -1    1
y<-c(1,4,0,0,1,-1)
ymatrix<-matrix(data=y, ncol=3, nrow=2, byrow = TRUE)
ymatrix
##      [,1] [,2] [,3]
## [1,]    1    4    0
## [2,]    0    1   -1
  • Then work out what each of the following commands returns:
    1. x[1,]
    2. x[2,]
    3. x[,2]
    4. y[1,2]
    5. y[,2:3]
xmatrix[1,]
## [1]  3 21
xmatrix[2,]
## [1] -1  1
xmatrix[,2]
## [1] 21  1
ymatrix[1,2]
## [1] 4
ymatrix[,2:3]
##      [,1] [,2]
## [1,]    4    0
## [2,]    1   -1
  • Turn the matrix m you created in the earlier exercise into a multidimensional array.
z<-c(1:30)
m<-matrix(data=z, nr=3, nc=10, byrow = TRUE)

marray = array(m, dim=c(dim(m),2))
marray
## , , 1
## 
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7    8    9    10
## [2,]   11   12   13   14   15   16   17   18   19    20
## [3,]   21   22   23   24   25   26   27   28   29    30
## 
## , , 2
## 
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,]    1    2    3    4    5    6    7    8    9    10
## [2,]   11   12   13   14   15   16   17   18   19    20
## [3,]   21   22   23   24   25   26   27   28   29    30
  • Create a 5 x 5 array and fill it with the values 1 to 25. Look up the array() function. Name the array x.
x <- array(1:25, dim = c(5, 5))
x
##      [,1] [,2] [,3] [,4] [,5]
## [1,]    1    6   11   16   21
## [2,]    2    7   12   17   22
## [3,]    3    8   13   18   23
## [4,]    4    9   14   19   24
## [5,]    5   10   15   20   25
  • Store the array x in a vector y.
y<-x[1:25]
  • Given the matrices m1 and m2, use rbind() and cbind() to build new matrices named M1 and M2. How do the resulting matrices differ?
m1 <- matrix(1, nr = 2, nc = 2)
m2 <- matrix(2, nr = 2, nc = 2)
M1 <-cbind(m1, m2)
M2 <-rbind(m1, m2)
M1
##      [,1] [,2] [,3] [,4]
## [1,]    1    1    2    2
## [2,]    1    1    2    2
M2
##      [,1] [,2]
## [1,]    1    1
## [2,]    1    1
## [3,]    2    2
## [4,]    2    2

The difference is that cbind() joins the matrices side by side, column-wise (here giving a 2 x 4 matrix), while rbind() stacks them row-wise (giving a 4 x 2 matrix).
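As a quick check of that difference, the sketch below compares the dimensions of both results. The names a, b, side_by_side and stacked are made up for this illustration and are not part of the exercise.

```r
# Two small constant matrices so the layout is easy to follow
a <- matrix(1, nrow = 2, ncol = 2)
b <- matrix(2, nrow = 2, ncol = 2)

side_by_side <- cbind(a, b)  # columns of b are appended to the right of a
stacked      <- rbind(a, b)  # rows of b are appended below a

dim(side_by_side)  # 2 4
dim(stacked)       # 4 2
```

Both calls need compatible shapes: cbind() expects the same number of rows, rbind() the same number of columns.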

  • The operator for the product of two matrices is ‘%*%’. Apply it to the two matrices created in the previous exercise.
M1%*%M2
##      [,1] [,2]
## [1,]   10   10
## [2,]   10   10
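It is easy to confuse %*% with the plain * operator, which multiplies entry by entry. A minimal sketch of the contrast, using two small matrices A and B invented for this example:

```r
A <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
B <- matrix(5:8, nrow = 2)   # columns are (5, 6) and (7, 8)

elementwise <- A * B     # multiplies matching entries
product     <- A %*% B   # rows of A combined with columns of B

elementwise
#      [,1] [,2]
# [1,]    5   21
# [2,]   12   32
product
#      [,1] [,2]
# [1,]   23   31
# [2,]   34   46
```

For %*% the number of columns of the left matrix must equal the number of rows of the right one, which is why the 2 x 4 matrix M1 can be multiplied by the 4 x 2 matrix M2.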
  • Take the matrix M1 from the previous exercise and apply the function t(). What does this function do?
help(t)
t(M1)
##      [,1] [,2]
## [1,]    1    1
## [2,]    1    1
## [3,]    2    2
## [4,]    2    2

The function t() returns the transpose of the matrix passed as its argument. Given a data frame, it converts it to a matrix first and returns the transposed matrix.
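A short sketch of what transposing means in practice, on a small matrix A chosen only for illustration: rows and columns swap, and transposing twice gives back the original.

```r
A  <- matrix(1:6, nrow = 2)  # 2 x 3
At <- t(A)                   # 3 x 2

dim(At)                 # 3 2
At[3, 1] == A[1, 3]     # TRUE: entry (i, j) of A becomes entry (j, i) of t(A)
identical(t(At), A)     # TRUE
```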

  • Run the following commands, based on the function diag(), on the matrices m1 and m2 created earlier. What kinds of operations can you perform with it?
diag(m1)
## [1] 1 1
diag(rbind(m1, m2) %*% cbind(m1, m2))
## [1] 2 2 8 8
diag(m1) <- 10

diag(3)
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1
v <- c(10, 20, 30)

diag(v)
##      [,1] [,2] [,3]
## [1,]   10    0    0
## [2,]    0   20    0
## [3,]    0    0   30
diag(2.1, nr = 3, nc = 5)
##      [,1] [,2] [,3] [,4] [,5]
## [1,]  2.1  0.0  0.0    0    0
## [2,]  0.0  2.1  0.0    0    0
## [3,]  0.0  0.0  2.1    0    0

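The function diag() extracts the diagonal of a given matrix, replaces it through the assignment form diag(m) <- value, or builds a matrix with a specified diagonal (an identity matrix when given a single whole number).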

  • Sort the matrix x <- matrix(1:100, ncol=10):
  1. In descending order by its second column, assigning the result to a new matrix x1. Hint: the order() function
  2. In descending order by its second row, assigning the result to a new matrix x2
  3. Sort only the first column of x in descending order
x <- matrix(1:100, ncol=10)

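x1 <- x[order(x[,2], decreasing=TRUE), ]   # reorder the rows by column 2
x2 <- x[, order(x[2,], decreasing=TRUE)]   # reorder the columns by row 2
x[,1] <- sort(x[,1], decreasing = TRUE)    # sort only the first column, in place
x3 <- x                                    # keep this result separate so x1 is not overwritten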
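Where the order() result is placed inside the brackets decides whether rows or columns are rearranged. The sketch below shows both cases on a tiny matrix m, which is made up for this example.

```r
m <- matrix(c(3, 1, 2,
              9, 7, 8), nrow = 3)   # column 1 is 3, 1, 2; column 2 is 9, 7, 8

# Index before the comma: whole rows move, ordered by the values in column 2
by_col2 <- m[order(m[, 2], decreasing = TRUE), ]

# Index after the comma: whole columns move, ordered by the values in row 2
by_row2 <- m[, order(m[2, ], decreasing = TRUE)]

by_col2[, 2]   # 9 8 7
by_row2[2, ]   # 7 1
```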
  • Load the dataset “women”.
  1. First confirm, without looking at the data, that they are sorted in increasing order by height and by weight.
  2. Create a new variable “bmi”, defined by the following formula:

BMI = (weight in pounds / (height in inches)^2) x 703

  3. Sort the data frame by bmi and, alphabetically, by the variable name
str(women)
## 'data.frame':    15 obs. of  3 variables:
##  $ height: num  58 59 60 61 62 63 64 65 66 67 ...
##  $ weight: num  115 117 120 123 126 129 132 135 139 142 ...
##  $ bmi   : num  24 23.6 23.4 23.2 23 ...
is.unsorted(women$height)
## [1] FALSE
is.unsorted(women$weight)
## [1] FALSE
women$bmi<-(women$weight/women$height^2)*703
order(women$bmi)
##  [1] 13 12 14 11 10 15  9  8  7  6  5  4  3  2  1
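order() on its own only returns the permutation of row numbers; to obtain the sorted data frame, that permutation has to be used as a row index. The sketch below does this for women and then illustrates a second sort key. The women dataset has no name column, so the people data frame and its values are hypothetical, included only to show how ties are broken alphabetically.

```r
# Use the permutation from order() as a row index to get the sorted data frame
women$bmi <- women$weight / women$height^2 * 703
women_sorted <- women[order(women$bmi), ]

# Hypothetical data with a name column: bmi is the first key, name breaks ties
people <- data.frame(
  name = c("Dana", "Alex", "Cleo", "Blair"),
  bmi  = c(23.1, 21.4, 23.1, 21.4),
  stringsAsFactors = FALSE
)
people_sorted <- people[order(people$bmi, people$name), ]
people_sorted$name   # "Alex" "Blair" "Cleo" "Dana"
```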
  • Create the following vectors:

Star Wars box office, in millions (!). First element: US; second element: non-US.

new_hope = c(460.998007, 314.4)
empire_strikes = c(290.475067, 247.9)
return_jedi = c(309.306177, 165.8)
  • Build the matrix star_wars_matrix from these vectors.
star_wars_matrix<-rbind(new_hope,empire_strikes,return_jedi )
star_wars_matrix
##                    [,1]  [,2]
## new_hope       460.9980 314.4
## empire_strikes 290.4751 247.9
## return_jedi    309.3062 165.8
  • Add names to the columns and rows of the matrix, following the descriptions of the data given above.
colnames(star_wars_matrix)<-c("US","Non-US")
star_wars_matrix
##                      US Non-US
## new_hope       460.9980  314.4
## empire_strikes 290.4751  247.9
## return_jedi    309.3062  165.8
  • Compute the worldwide revenue of each film and store it in a vector named worldwide_vector.
worldwide_vector<-rowSums(star_wars_matrix)
  • Add this vector as a new column of star_wars_matrix and assign the result to all_wars_matrix. Use the cbind() function.
all_wars_matrix<-cbind(star_wars_matrix,worldwide_vector)
all_wars_matrix
##                      US Non-US worldwide_vector
## new_hope       460.9980  314.4         775.3980
## empire_strikes 290.4751  247.9         538.3751
## return_jedi    309.3062  165.8         475.1062
  • Compute the total US and non-US revenue across the three films. You can use the colSums() function.
colSums(all_wars_matrix[,1:2])
##       US   Non-US 
## 1060.779  728.100
  • Compute the mean non-US revenue over all the films. Assign it to the variable non_us_all.
non_us_all<-mean(all_wars_matrix[,2])
  • Do the same for the first two films only. Assign the result to the variable non_us_some.
non_us_some<-mean(all_wars_matrix[1:2,2])
  • Work out how many visitors each film had in each region. The total revenue is already in star_wars_matrix. Assume a ticket costs five euros/dollars.
star_wars_matrix/5
##                      US Non-US
## new_hope       92.19960  62.88
## empire_strikes 58.09501  49.58
## return_jedi    61.86124  33.16
  • Compute the mean number of visitors in US and non-US territory.
mean(star_wars_matrix[,1]/5)
## [1] 70.71862
mean(star_wars_matrix[,2]/5)
## [1] 48.54

2.2 Subsetting matrices and arrays

  • Create an array i <- array(c(1:10),dim=c(5,2)). What information do the following commands give you?
i <- array(c(1:10),dim=c(5,2))

dim(i)
## [1] 5 2
# Gives the number of rows and columns

nrow(i) 
## [1] 5
# Gives the number of rows

ncol(i) 
## [1] 2
# Gives the number of columns
  • Create an array with 5 rows and 2 columns and fill it with the values 1 to 5 followed by 5 to 1.
x <- array(c(c(1:5),c(5:1)), dim = c(5,2))
  • What does the command x[i] do? And the command x[i] <- 0? Check what x contains first.
x
##      [,1] [,2]
## [1,]    1    5
## [2,]    2    4
## [3,]    3    3
## [4,]    4    2
## [5,]    5    1
x[6]
## [1] 5

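With a single number i, x[i] returns the i-th element of the array, counting down the columns (column-major order), and x[i] <- 0 assigns the value on the right of <-, here 0, to that element. When i is a two-column matrix, R instead reads each row as a (row, column) pair; the i defined above asks for columns 6 to 10, which x does not have, so x[i] stops with a "subscript out of bounds" error.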
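The two indexing modes can be compared side by side. In the sketch below, idx is an index matrix invented for this example, chosen so that every (row, column) pair falls inside x.

```r
x <- array(c(1:5, 5:1), dim = c(5, 2))

# Linear index: the 6th element counting down the columns is the top of column 2
sixth <- x[6]
sixth      # 5

# Matrix index: each row of idx is one (row, column) position
idx <- cbind(c(1, 5), c(2, 1))   # positions (1, 2) and (5, 1)
picked <- x[idx]
picked     # 5 5

# Assignment through the same index matrix only touches those two cells
x[idx] <- 0
```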

  • Download the file array_datos.txt from PRADO (Datos/) and import it into your R workspace, bearing in mind that it is tab-delimited text. Then create a file with the same data in CSV format instead of tab-separated.
array_datos<-read.table("./datasets/array_datos.txt")
array_datos
##   edad peso altura
## 1   20   65    174
## 2   22   70    180
## 3   19   68    170
write.csv(array_datos, file="./datasets/array_datos2.csv")
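The round trip from tab-separated text to CSV can be tried without the PRADO file. The sketch below rebuilds the three rows shown above and writes them to temporary files; the names datos, tab_file and csv_file are assumptions made for this example.

```r
# Recreate the data shown above and save it as tab-separated text
datos <- data.frame(edad = c(20, 22, 19),
                    peso = c(65, 70, 68),
                    altura = c(174, 180, 170))
tab_file <- tempfile(fileext = ".txt")
csv_file <- tempfile(fileext = ".csv")
write.table(datos, tab_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Import the tab-separated file, stating the separator and header explicitly
tabbed <- read.table(tab_file, header = TRUE, sep = "\t")

# Export as CSV; row.names = FALSE avoids an extra unnamed first column
write.csv(tabbed, csv_file, row.names = FALSE)
back <- read.csv(csv_file)
```

Leaving row.names at its default, as in the solution above, also works, but the row numbers are then written as an additional column.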

2.3 Factors

  • Given x = c(1, 2, 3, 3, 5, 3, 2, 4, NA), what are the levels of factor(x)?
x = c(1, 2, 3, 3, 5, 3, 2, 4, NA) 

factor(x)
## [1] 1    2    3    3    5    3    2    4    <NA>
## Levels: 1 2 3 4 5

NA is not counted among the possible values of the factor, so its levels are 1, 2, 3, 4 and 5.
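Dropping NA is only the default behaviour. As a sketch, the exclude argument of factor() can be set to NULL to keep NA as a level of its own; the names f_default and f_with_na are chosen just for this comparison.

```r
x <- c(1, 2, 3, 3, 5, 3, 2, 4, NA)

f_default <- factor(x)                  # NA is excluded from the levels
f_with_na <- factor(x, exclude = NULL)  # NA is kept as an extra level

levels(f_default)    # "1" "2" "3" "4" "5"
nlevels(f_with_na)   # 6
```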

  • Given x <- c(11, 22, 47, 47, 11, 47, 11) and the statement factor(x, levels=c(11, 22, 47), ordered=TRUE), what is the fourth element of the output?
x <- c(11, 22, 47, 47, 11, 47, 11)
factor(x, levels=c(11, 22, 47), ordered=TRUE)
## [1] 11 22 47 47 11 47 11
## Levels: 11 < 22 < 47

The fourth element is 47. The factor has seven elements; three is the number of levels, not the length.
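The distinction between elements and levels can be checked directly. A minimal sketch, storing the factor in a variable f (a name chosen here for convenience):

```r
x <- c(11, 22, 47, 47, 11, 47, 11)
f <- factor(x, levels = c(11, 22, 47), ordered = TRUE)

as.character(f[4])   # "47": the fourth element
length(f)            # 7 elements
nlevels(f)           # 3 levels

# Because the factor is ordered, its elements can be compared
f[4] > f[1]          # TRUE, since 47 comes after 11 in the level order
```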

  • For the vector z <- c("p", "a", "g", "t", "b"), replace the third element of z with "b".
z <- c("p", "a" , "g", "t", "b")
z[3] <- "b"
z
## [1] "p" "a" "b" "t" "b"
  • Given z <- factor(c("p", "q", "p", "r", "q")), write an R expression that changes the level "p" to "w".
z <- factor(c("p", "q", "p", "r", "q"))
z
## [1] p q p r q
## Levels: p q r
levels(z)[1] <- "w"
z
## [1] w q w r q
## Levels: w q r
  • Use the dataset “iris”. Write the expression needed to convert the variable “Sepal.Length” into a factor with five levels.
irisfactor <- cut(iris$Sepal.Length, breaks = 5)  # cut() already returns a factor
irisfactor
##   [1] (5.02,5.74] (4.3,5.02]  (4.3,5.02]  (4.3,5.02]  (4.3,5.02] 
##   [6] (5.02,5.74] (4.3,5.02]  (4.3,5.02]  (4.3,5.02]  (4.3,5.02] 
##  [11] (5.02,5.74] (4.3,5.02]  (4.3,5.02]  (4.3,5.02]  (5.74,6.46]
##  [16] (5.02,5.74] (5.02,5.74] (5.02,5.74] (5.02,5.74] (5.02,5.74]
##  [21] (5.02,5.74] (5.02,5.74] (4.3,5.02]  (5.02,5.74] (4.3,5.02] 
##  [26] (4.3,5.02]  (4.3,5.02]  (5.02,5.74] (5.02,5.74] (4.3,5.02] 
##  [31] (4.3,5.02]  (5.02,5.74] (5.02,5.74] (5.02,5.74] (4.3,5.02] 
##  [36] (4.3,5.02]  (5.02,5.74] (4.3,5.02]  (4.3,5.02]  (5.02,5.74]
##  [41] (4.3,5.02]  (4.3,5.02]  (4.3,5.02]  (4.3,5.02]  (5.02,5.74]
##  [46] (4.3,5.02]  (5.02,5.74] (4.3,5.02]  (5.02,5.74] (4.3,5.02] 
##  [51] (6.46,7.18] (5.74,6.46] (6.46,7.18] (5.02,5.74] (6.46,7.18]
##  [56] (5.02,5.74] (5.74,6.46] (4.3,5.02]  (6.46,7.18] (5.02,5.74]
##  [61] (4.3,5.02]  (5.74,6.46] (5.74,6.46] (5.74,6.46] (5.02,5.74]
##  [66] (6.46,7.18] (5.02,5.74] (5.74,6.46] (5.74,6.46] (5.02,5.74]
##  [71] (5.74,6.46] (5.74,6.46] (5.74,6.46] (5.74,6.46] (5.74,6.46]
##  [76] (6.46,7.18] (6.46,7.18] (6.46,7.18] (5.74,6.46] (5.02,5.74]
##  [81] (5.02,5.74] (5.02,5.74] (5.74,6.46] (5.74,6.46] (5.02,5.74]
##  [86] (5.74,6.46] (6.46,7.18] (5.74,6.46] (5.02,5.74] (5.02,5.74]
##  [91] (5.02,5.74] (5.74,6.46] (5.74,6.46] (4.3,5.02]  (5.02,5.74]
##  [96] (5.02,5.74] (5.02,5.74] (5.74,6.46] (5.02,5.74] (5.02,5.74]
## [101] (5.74,6.46] (5.74,6.46] (6.46,7.18] (5.74,6.46] (6.46,7.18]
## [106] (7.18,7.9]  (4.3,5.02]  (7.18,7.9]  (6.46,7.18] (7.18,7.9] 
## [111] (6.46,7.18] (5.74,6.46] (6.46,7.18] (5.02,5.74] (5.74,6.46]
## [116] (5.74,6.46] (6.46,7.18] (7.18,7.9]  (7.18,7.9]  (5.74,6.46]
## [121] (6.46,7.18] (5.02,5.74] (7.18,7.9]  (5.74,6.46] (6.46,7.18]
## [126] (7.18,7.9]  (5.74,6.46] (5.74,6.46] (5.74,6.46] (7.18,7.9] 
## [131] (7.18,7.9]  (7.18,7.9]  (5.74,6.46] (5.74,6.46] (5.74,6.46]
## [136] (7.18,7.9]  (5.74,6.46] (5.74,6.46] (5.74,6.46] (6.46,7.18]
## [141] (6.46,7.18] (6.46,7.18] (5.74,6.46] (6.46,7.18] (6.46,7.18]
## [146] (6.46,7.18] (5.74,6.46] (6.46,7.18] (5.74,6.46] (5.74,6.46]
## Levels: (4.3,5.02] (5.02,5.74] (5.74,6.46] (6.46,7.18] (7.18,7.9]
  • The factor responses is defined as:

responses <- factor(c("Agree", "Agree", "Strongly Agree", "Disagree", "Agree"))

However, we realize it needs a new level, “Strongly Disagree”, which was not present when the factor was created. Add the new level to the factor and convert it into an ordered factor as follows:

Levels: Strongly Agree < Agree < Disagree < Strongly Disagree

responses <- factor(c("Agree", "Agree", "Strongly Agree", "Disagree", "Agree"))
levels(responses) = c(levels(responses), "Strongly Disagree")
responses = factor(responses, levels(responses)[c(3,1:2,4)], ordered = TRUE)
responses
## [1] Agree          Agree          Strongly Agree Disagree      
## [5] Agree         
## Levels: Strongly Agree < Agree < Disagree < Strongly Disagree
  • Given the factor:

x <- factor(c("high", "low", "medium", "high", "high", "low", "medium"))

Write the R expression that assigns unique numeric values to the levels of x according to the following scheme:

level high => value 1
level low => value 2
level medium => value 3

x <- factor(c("high", "low", "medium", "high", "high", "low", "medium"), levels=c("high","low","medium"),labels=c(1,2,3))
unique(x)
## [1] 1 2 3
## Levels: 1 2 3
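Relabelling the levels as 1, 2, 3 still leaves a factor whose labels are text. One possible alternative, sketched here, is to keep the original labels and read the underlying integer codes with as.integer(); the names codes and scheme are chosen for this example.

```r
x <- factor(c("high", "low", "medium", "high", "high", "low", "medium"),
            levels = c("high", "low", "medium"))

# Each element is stored as the position of its level: high = 1, low = 2, medium = 3
codes <- as.integer(x)
codes    # 1 2 3 1 1 2 3

# Named vector showing the level-to-value scheme
scheme <- setNames(seq_along(levels(x)), levels(x))
```

The codes follow the order given in levels, so stating that order explicitly keeps the mapping independent of alphabetical sorting.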

2.4 Accessing and selecting parts of a data frame

We will work with an example that ships with the default R installation, USArrests. This data frame holds, for each US state, arrest rates per 100,000 residents. The columns cover murders, assaults, and rapes, plus the percentage of the population living in urban areas. The data are from 1973. Answer the following questions about the data:

  1. The dimensions of the data frame
  2. The length of the data frame (rows or columns?)
  3. The number of columns
USArrests
##                Murder Assault UrbanPop Rape
## Alabama          13.2     236       58 21.2
## Alaska           10.0     263       48 44.5
## Arizona           8.1     294       80 31.0
## Arkansas          8.8     190       50 19.5
## California        9.0     276       91 40.6
## Colorado          7.9     204       78 38.7
## Connecticut       3.3     110       77 11.1
## Delaware          5.9     238       72 15.8
## Florida          15.4     335       80 31.9
## Georgia          17.4     211       60 25.8
## Hawaii            5.3      46       83 20.2
## Idaho             2.6     120       54 14.2
## Illinois         10.4     249       83 24.0
## Indiana           7.2     113       65 21.0
## Iowa              2.2      56       57 11.3
## Kansas            6.0     115       66 18.0
## Kentucky          9.7     109       52 16.3
## Louisiana        15.4     249       66 22.2
## Maine             2.1      83       51  7.8
## Maryland         11.3     300       67 27.8
## Massachusetts     4.4     149       85 16.3
## Michigan         12.1     255       74 35.1
## Minnesota         2.7      72       66 14.9
## Mississippi      16.1     259       44 17.1
## Missouri          9.0     178       70 28.2
## Montana           6.0     109       53 16.4
## Nebraska          4.3     102       62 16.5
## Nevada           12.2     252       81 46.0
## New Hampshire     2.1      57       56  9.5
## New Jersey        7.4     159       89 18.8
## New Mexico       11.4     285       70 32.1
## New York         11.1     254       86 26.1
## North Carolina   13.0     337       45 16.1
## North Dakota      0.8      45       44  7.3
## Ohio              7.3     120       75 21.4
## Oklahoma          6.6     151       68 20.0
## Oregon            4.9     159       67 29.3
## Pennsylvania      6.3     106       72 14.9
## Rhode Island      3.4     174       87  8.3
## South Carolina   14.4     279       48 22.5
## South Dakota      3.8      86       45 12.8
## Tennessee        13.2     188       59 26.9
## Texas            12.7     201       80 25.5
## Utah              3.2     120       80 22.9
## Vermont           2.2      48       32 11.2
## Virginia          8.5     156       63 20.7
## Washington        4.0     145       73 26.2
## West Virginia     5.7      81       39  9.3
## Wisconsin         2.6      53       66 10.8
## Wyoming           6.8     161       60 15.6
dim(USArrests) # 50 x 4
## [1] 50  4
length(USArrests) # 4: the length of a data frame is its number of columns
## [1] 4
ncol(USArrests) # 4
## [1] 4
nrow(USArrests)  
## [1] 50
row.names(USArrests)
##  [1] "Alabama"        "Alaska"         "Arizona"        "Arkansas"      
##  [5] "California"     "Colorado"       "Connecticut"    "Delaware"      
##  [9] "Florida"        "Georgia"        "Hawaii"         "Idaho"         
## [13] "Illinois"       "Indiana"        "Iowa"           "Kansas"        
## [17] "Kentucky"       "Louisiana"      "Maine"          "Maryland"      
## [21] "Massachusetts"  "Michigan"       "Minnesota"      "Mississippi"   
## [25] "Missouri"       "Montana"        "Nebraska"       "Nevada"        
## [29] "New Hampshire"  "New Jersey"     "New Mexico"     "New York"      
## [33] "North Carolina" "North Dakota"   "Ohio"           "Oklahoma"      
## [37] "Oregon"         "Pennsylvania"   "Rhode Island"   "South Carolina"
## [41] "South Dakota"   "Tennessee"      "Texas"          "Utah"          
## [45] "Vermont"        "Virginia"       "Washington"     "West Virginia" 
## [49] "Wisconsin"      "Wyoming"
colnames(USArrests)
## [1] "Murder"   "Assault"  "UrbanPop" "Rape"
head(USArrests, 6)
##            Murder Assault UrbanPop Rape
## Alabama      13.2     236       58 21.2
## Alaska       10.0     263       48 44.5
## Arizona       8.1     294       80 31.0
## Arkansas      8.8     190       50 19.5
## California    9.0     276       91 40.6
## Colorado      7.9     204       78 38.7

  • Sort the rows of the data frame in decreasing order of the percentage of urban population. To do so, look into the order() function and its parameters.

USArrests[order(USArrests$UrbanPop, decreasing = TRUE),]
##                Murder Assault UrbanPop Rape
## California        9.0     276       91 40.6
## New Jersey        7.4     159       89 18.8
## Rhode Island      3.4     174       87  8.3
## New York         11.1     254       86 26.1
## Massachusetts     4.4     149       85 16.3
## Hawaii            5.3      46       83 20.2
## Illinois         10.4     249       83 24.0
## Nevada           12.2     252       81 46.0
## Arizona           8.1     294       80 31.0
## Florida          15.4     335       80 31.9
## Texas            12.7     201       80 25.5
## Utah              3.2     120       80 22.9
## Colorado          7.9     204       78 38.7
## Connecticut       3.3     110       77 11.1
## Ohio              7.3     120       75 21.4
## Michigan         12.1     255       74 35.1
## Washington        4.0     145       73 26.2
## Delaware          5.9     238       72 15.8
## Pennsylvania      6.3     106       72 14.9
## Missouri          9.0     178       70 28.2
## New Mexico       11.4     285       70 32.1
## Oklahoma          6.6     151       68 20.0
## Maryland         11.3     300       67 27.8
## Oregon            4.9     159       67 29.3
## Kansas            6.0     115       66 18.0
## Louisiana        15.4     249       66 22.2
## Minnesota         2.7      72       66 14.9
## Wisconsin         2.6      53       66 10.8
## Indiana           7.2     113       65 21.0
## Virginia          8.5     156       63 20.7
## Nebraska          4.3     102       62 16.5
## Georgia          17.4     211       60 25.8
## Wyoming           6.8     161       60 15.6
## Tennessee        13.2     188       59 26.9
## Alabama          13.2     236       58 21.2
## Iowa              2.2      56       57 11.3
## New Hampshire     2.1      57       56  9.5
## Idaho             2.6     120       54 14.2
## Montana           6.0     109       53 16.4
## Kentucky          9.7     109       52 16.3
## Maine             2.1      83       51  7.8
## Arkansas          8.8     190       50 19.5
## Alaska           10.0     263       48 44.5
## South Carolina   14.4     279       48 22.5
## North Carolina   13.0     337       45 16.1
## South Dakota      3.8      86       45 12.8
## Mississippi      16.1     259       44 17.1
## North Dakota      0.8      45       44  7.3
## West Virginia     5.7      81       39  9.3
## Vermont           2.2      48       32 11.2
USArrests[order(USArrests$UrbanPop, USArrests$Assault, decreasing = TRUE),]
##                Murder Assault UrbanPop Rape
## California        9.0     276       91 40.6
## New Jersey        7.4     159       89 18.8
## Rhode Island      3.4     174       87  8.3
## New York         11.1     254       86 26.1
## Massachusetts     4.4     149       85 16.3
## Illinois         10.4     249       83 24.0
## Hawaii            5.3      46       83 20.2
## Nevada           12.2     252       81 46.0
## Florida          15.4     335       80 31.9
## Arizona           8.1     294       80 31.0
## Texas            12.7     201       80 25.5
## Utah              3.2     120       80 22.9
## Colorado          7.9     204       78 38.7
## Connecticut       3.3     110       77 11.1
## Ohio              7.3     120       75 21.4
## Michigan         12.1     255       74 35.1
## Washington        4.0     145       73 26.2
## Delaware          5.9     238       72 15.8
## Pennsylvania      6.3     106       72 14.9
## New Mexico       11.4     285       70 32.1
## Missouri          9.0     178       70 28.2
## Oklahoma          6.6     151       68 20.0
## Maryland         11.3     300       67 27.8
## Oregon            4.9     159       67 29.3
## Louisiana        15.4     249       66 22.2
## Kansas            6.0     115       66 18.0
## Minnesota         2.7      72       66 14.9
## Wisconsin         2.6      53       66 10.8
## Indiana           7.2     113       65 21.0
## Virginia          8.5     156       63 20.7
## Nebraska          4.3     102       62 16.5
## Georgia          17.4     211       60 25.8
## Wyoming           6.8     161       60 15.6
## Tennessee        13.2     188       59 26.9
## Alabama          13.2     236       58 21.2
## Iowa              2.2      56       57 11.3
## New Hampshire     2.1      57       56  9.5
## Idaho             2.6     120       54 14.2
## Montana           6.0     109       53 16.4
## Kentucky          9.7     109       52 16.3
## Maine             2.1      83       51  7.8
## Arkansas          8.8     190       50 19.5
## South Carolina   14.4     279       48 22.5
## Alaska           10.0     263       48 44.5
## North Carolina   13.0     337       45 16.1
## South Dakota      3.8      86       45 12.8
## Mississippi      16.1     259       44 17.1
## North Dakota      0.8      45       44  7.3
## West Virginia     5.7      81       39  9.3
## Vermont           2.2      48       32 11.2
USArrests$Murder
##  [1] 13.2 10.0  8.1  8.8  9.0  7.9  3.3  5.9 15.4 17.4  5.3  2.6 10.4  7.2
## [15]  2.2  6.0  9.7 15.4  2.1 11.3  4.4 12.1  2.7 16.1  9.0  6.0  4.3 12.2
## [29]  2.1  7.4 11.4 11.1 13.0  0.8  7.3  6.6  4.9  6.3  3.4 14.4  3.8 13.2
## [43] 12.7  3.2  2.2  8.5  4.0  5.7  2.6  6.8
USArrests$Murder[2:4]
## [1] 10.0  8.1  8.8
USArrests[1:5,]
##            Murder Assault UrbanPop Rape
## Alabama      13.2     236       58 21.2
## Alaska       10.0     263       48 44.5
## Arizona       8.1     294       80 31.0
## Arkansas      8.8     190       50 19.5
## California    9.0     276       91 40.6
USArrests[,1:2]
##                Murder Assault
## Alabama          13.2     236
## Alaska           10.0     263
## Arizona           8.1     294
## Arkansas          8.8     190
## California        9.0     276
## Colorado          7.9     204
## Connecticut       3.3     110
## Delaware          5.9     238
## Florida          15.4     335
## Georgia          17.4     211
## Hawaii            5.3      46
## Idaho             2.6     120
## Illinois         10.4     249
## Indiana           7.2     113
## Iowa              2.2      56
## Kansas            6.0     115
## Kentucky          9.7     109
## Louisiana        15.4     249
## Maine             2.1      83
## Maryland         11.3     300
## Massachusetts     4.4     149
## Michigan         12.1     255
## Minnesota         2.7      72
## Mississippi      16.1     259
## Missouri          9.0     178
## Montana           6.0     109
## Nebraska          4.3     102
## Nevada           12.2     252
## New Hampshire     2.1      57
## New Jersey        7.4     159
## New Mexico       11.4     285
## New York         11.1     254
## North Carolina   13.0     337
## North Dakota      0.8      45
## Ohio              7.3     120
## Oklahoma          6.6     151
## Oregon            4.9     159
## Pennsylvania      6.3     106
## Rhode Island      3.4     174
## South Carolina   14.4     279
## South Dakota      3.8      86
## Tennessee        13.2     188
## Texas            12.7     201
## Utah              3.2     120
## Vermont           2.2      48
## Virginia          8.5     156
## Washington        4.0     145
## West Virginia     5.7      81
## Wisconsin         2.6      53
## Wyoming           6.8     161
USArrests[,c(1,3)]
##                Murder UrbanPop
## Alabama          13.2       58
## Alaska           10.0       48
## Arizona           8.1       80
## Arkansas          8.8       50
## California        9.0       91
## Colorado          7.9       78
## Connecticut       3.3       77
## Delaware          5.9       72
## Florida          15.4       80
## Georgia          17.4       60
## Hawaii            5.3       83
## Idaho             2.6       54
## Illinois         10.4       83
## Indiana           7.2       65
## Iowa              2.2       57
## Kansas            6.0       66
## Kentucky          9.7       52
## Louisiana        15.4       66
## Maine             2.1       51
## Maryland         11.3       67
## Massachusetts     4.4       85
## Michigan         12.1       74
## Minnesota         2.7       66
## Mississippi      16.1       44
## Missouri          9.0       70
## Montana           6.0       53
## Nebraska          4.3       62
## Nevada           12.2       81
## New Hampshire     2.1       56
## New Jersey        7.4       89
## New Mexico       11.4       70
## New York         11.1       86
## North Carolina   13.0       45
## North Dakota      0.8       44
## Ohio              7.3       75
## Oklahoma          6.6       68
## Oregon            4.9       67
## Pennsylvania      6.3       72
## Rhode Island      3.4       87
## South Carolina   14.4       48
## South Dakota      3.8       45
## Tennessee        13.2       59
## Texas            12.7       80
## Utah              3.2       80
## Vermont           2.2       32
## Virginia          8.5       63
## Washington        4.0       73
## West Virginia     5.7       39
## Wisconsin         2.6       66
## Wyoming           6.8       60
USArrests[1:5,1:2]
##            Murder Assault
## Alabama      13.2     236
## Alaska       10.0     263
## Arizona       8.1     294
## Arkansas      8.8     190
## California    9.0     276
USArrests$Murder
##  [1] 13.2 10.0  8.1  8.8  9.0  7.9  3.3  5.9 15.4 17.4  5.3  2.6 10.4  7.2
## [15]  2.2  6.0  9.7 15.4  2.1 11.3  4.4 12.1  2.7 16.1  9.0  6.0  4.3 12.2
## [29]  2.1  7.4 11.4 11.1 13.0  0.8  7.3  6.6  4.9  6.3  3.4 14.4  3.8 13.2
## [43] 12.7  3.2  2.2  8.5  4.0  5.7  2.6  6.8
minorMurder = USArrests[order(USArrests$Murder, decreasing = TRUE),]
minorMurder[nrow(minorMurder),]
##              Murder Assault UrbanPop Rape
## North Dakota    0.8      45       44  7.3

  • Which states have a murder rate below 4 (per 100,000)? Retrieve that information.

minorMurder[minorMurder$Murder<4,]
##               Murder Assault UrbanPop Rape
## South Dakota     3.8      86       45 12.8
## Rhode Island     3.4     174       87  8.3
## Connecticut      3.3     110       77 11.1
## Utah             3.2     120       80 22.9
## Minnesota        2.7      72       66 14.9
## Idaho            2.6     120       54 14.2
## Wisconsin        2.6      53       66 10.8
## Iowa             2.2      56       57 11.3
## Vermont          2.2      48       32 11.2
## Maine            2.1      83       51  7.8
## New Hampshire    2.1      57       56  9.5
## North Dakota     0.8      45       44  7.3
rownames(USArrests[USArrests[,"UrbanPop"]>quantile(USArrests[,"UrbanPop"],.75),])
##  [1] "Arizona"       "California"    "Colorado"      "Florida"      
##  [5] "Hawaii"        "Illinois"      "Massachusetts" "Nevada"       
##  [9] "New Jersey"    "New York"      "Rhode Island"  "Texas"        
## [13] "Utah"
students<-read.table("./datasets/student.txt", header = TRUE)
students
##    height shoesize gender population
## 1     181       44   male     kuopio
## 2     160       38 female     kuopio
## 3     174       42 female     kuopio
## 4     170       43   male     kuopio
## 5     172       43   male     kuopio
## 6     165       39 female     kuopio
## 7     161       38 female     kuopio
## 8     167       38 female    tampere
## 9     164       39 female    tampere
## 10    166       38 female    tampere
## 11    162       37 female    tampere
## 12    158       36 female    tampere
## 13    175       42   male    tampere
## 14    181       44   male    tampere
## 15    180       43   male    tampere
## 16    177       43   male    tampere
## 17    173       41   male    tampere
colnames(students)
## [1] "height"     "shoesize"   "gender"     "population"
students$height
##  [1] 181 160 174 170 172 165 161 167 164 166 162 158 175 181 180 177 173
table(students)
## , , gender = female, population = kuopio
## 
##       shoesize
## height 36 37 38 39 41 42 43 44
##    158  0  0  0  0  0  0  0  0
##    160  0  0  1  0  0  0  0  0
##    161  0  0  1  0  0  0  0  0
##    162  0  0  0  0  0  0  0  0
##    164  0  0  0  0  0  0  0  0
##    165  0  0  0  1  0  0  0  0
##    166  0  0  0  0  0  0  0  0
##    167  0  0  0  0  0  0  0  0
##    170  0  0  0  0  0  0  0  0
##    172  0  0  0  0  0  0  0  0
##    173  0  0  0  0  0  0  0  0
##    174  0  0  0  0  0  1  0  0
##    175  0  0  0  0  0  0  0  0
##    177  0  0  0  0  0  0  0  0
##    180  0  0  0  0  0  0  0  0
##    181  0  0  0  0  0  0  0  0
## 
## , , gender = male, population = kuopio
## 
##       shoesize
## height 36 37 38 39 41 42 43 44
##    158  0  0  0  0  0  0  0  0
##    160  0  0  0  0  0  0  0  0
##    161  0  0  0  0  0  0  0  0
##    162  0  0  0  0  0  0  0  0
##    164  0  0  0  0  0  0  0  0
##    165  0  0  0  0  0  0  0  0
##    166  0  0  0  0  0  0  0  0
##    167  0  0  0  0  0  0  0  0
##    170  0  0  0  0  0  0  1  0
##    172  0  0  0  0  0  0  1  0
##    173  0  0  0  0  0  0  0  0
##    174  0  0  0  0  0  0  0  0
##    175  0  0  0  0  0  0  0  0
##    177  0  0  0  0  0  0  0  0
##    180  0  0  0  0  0  0  0  0
##    181  0  0  0  0  0  0  0  1
## 
## , , gender = female, population = tampere
## 
##       shoesize
## height 36 37 38 39 41 42 43 44
##    158  1  0  0  0  0  0  0  0
##    160  0  0  0  0  0  0  0  0
##    161  0  0  0  0  0  0  0  0
##    162  0  1  0  0  0  0  0  0
##    164  0  0  0  1  0  0  0  0
##    165  0  0  0  0  0  0  0  0
##    166  0  0  1  0  0  0  0  0
##    167  0  0  1  0  0  0  0  0
##    170  0  0  0  0  0  0  0  0
##    172  0  0  0  0  0  0  0  0
##    173  0  0  0  0  0  0  0  0
##    174  0  0  0  0  0  0  0  0
##    175  0  0  0  0  0  0  0  0
##    177  0  0  0  0  0  0  0  0
##    180  0  0  0  0  0  0  0  0
##    181  0  0  0  0  0  0  0  0
## 
## , , gender = male, population = tampere
## 
##       shoesize
## height 36 37 38 39 41 42 43 44
##    158  0  0  0  0  0  0  0  0
##    160  0  0  0  0  0  0  0  0
##    161  0  0  0  0  0  0  0  0
##    162  0  0  0  0  0  0  0  0
##    164  0  0  0  0  0  0  0  0
##    165  0  0  0  0  0  0  0  0
##    166  0  0  0  0  0  0  0  0
##    167  0  0  0  0  0  0  0  0
##    170  0  0  0  0  0  0  0  0
##    172  0  0  0  0  0  0  0  0
##    173  0  0  0  0  1  0  0  0
##    174  0  0  0  0  0  0  0  0
##    175  0  0  0  0  0  1  0  0
##    177  0  0  0  0  0  0  1  0
##    180  0  0  0  0  0  0  1  0
##    181  0  0  0  0  0  0  0  1
sym<-ifelse(students$gender=="male", "M", "F")
colours<-ifelse(students$population=="kuopio", "Blue", "Red")
students.new<-cbind(students,sym, colours)
str(students.new)
## 'data.frame':    17 obs. of  6 variables:
##  $ height    : int  181 160 174 170 172 165 161 167 164 166 ...
##  $ shoesize  : int  44 38 42 43 43 39 38 38 39 38 ...
##  $ gender    : Factor w/ 2 levels "female","male": 2 1 1 2 2 1 1 1 1 1 ...
##  $ population: Factor w/ 2 levels "kuopio","tampere": 1 1 1 1 1 1 1 2 2 2 ...
##  $ sym       : Factor w/ 2 levels "F","M": 2 1 1 2 2 1 1 1 1 1 ...
##  $ colours   : Factor w/ 2 levels "Blue","Red": 1 1 1 1 1 1 1 2 2 2 ...
students.male = students.new[which(students.new$sym=="M"),]
students.female = students.new[which(students.new$sym=="F"),]
write.table(students.new, "./datasets/studentsnew.txt", col.names = TRUE)