Before emailing for help, try the following:
If you have done your best to solve the problem and still need help, please email us. A good email will include:
Please do all you can to make it as easy as possible for us to help you. Answering emails takes a long time, and there are only two of us.
Import passwords.sav into R as a data.frame. Save the data.frame as an object named pw.
Use View(), head(), and str() to inspect pw.
## rank password category value time_unit offline_crack_sec rank_alt
## 1 1 password password-related 6.91 years 2.17e+00 1
## 2 2 123456 simple-alphanumeric 18.52 minutes 1.11e-05 2
## 3 3 12345678 simple-alphanumeric 1.29 days 1.11e-03 3
## 4 4 1234 simple-alphanumeric 11.11 seconds 1.11e-07 4
## 5 5 qwerty simple-alphanumeric 3.72 days 3.21e-03 5
## 6 6 12345 simple-alphanumeric 1.85 minutes 1.11e-06 6
## strength font_size
## 1 8 11
## 2 4 8
## 3 4 8
## 4 4 8
## 5 8 11
## 6 4 8
str(pw) shows the structure of pw:
## 'data.frame': 507 obs. of 9 variables:
## $ rank : num 1 2 3 4 5 6 7 8 9 10 ...
## $ password : chr "password " "123456 " "12345678 " "1234 " ...
## $ category : chr "password-related " "simple-alphanumeric" "simple-alphanumeric" "simple-alphanumeric" ...
## $ value : num 6.91 18.52 1.29 11.11 3.72 ...
## $ time_unit : chr "years " "minutes" "days " "seconds" ...
## $ offline_crack_sec: num 2.17 1.11e-05 1.11e-03 1.11e-07 3.21e-03 1.11e-06 3.21e-03 2.17 2.17 8.35e-02 ...
## $ rank_alt : num 1 2 3 4 5 6 7 8 9 10 ...
## $ strength : num 8 4 4 4 8 4 8 4 7 8 ...
## $ font_size : num 11 8 8 8 11 8 11 8 11 11 ...
## - attr(*, "codepage")= int 65001
password is a character vector. Use the function nchar() to count the length of each password.
## [1] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [38] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [75] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [112] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [149] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [186] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [223] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [260] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [297] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [334] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [371] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [408] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [445] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
## [482] 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9 9
Look at pw$password. Why is the result of nchar() the same for each password? This is explained in the “Details” section of the documentation for foreign::read.spss():
“Fixed length strings (including value labels) are padded on the right with spaces by SPSS, and so are read that way by R.”
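The padding can be reproduced on a small scale. This is only a sketch: `padded` is a made-up stand-in for pw$password (three strings padded on the right to 9 characters), not the real data.

```r
# Hypothetical stand-in for pw$password: each string is padded on the
# right with spaces to a fixed width of 9, as SPSS does for string variables.
padded <- sprintf("%-9s", c("password", "123456", "qwerty"))

# Every element reports the padded width, not the length of the password.
nchar(padded)

# trimws() strips leading and trailing whitespace, so nchar() now
# reflects the actual password lengths.
trimmed <- trimws(padded)
nchar(trimmed)
```

With the real data, the same idea would be applied to pw$password and the result assigned back to that column.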
Use trimws() to remove the leading and trailing whitespace from each character string of pw$password.
The objective of this section of the assignment is to determine the average strength of the different categories of passwords.
Use unique(pw$category) to print the unique values of category to the console.
## [1] "password-related " "simple-alphanumeric" "animal "
## [4] "sport " "cool-macho " "name "
## [7] "fluffy " "food " "nerdy-pop "
## [10] "rebellious-rude " " "
Use trimws() to remove leading and trailing whitespace from pw$category.
Convert pw$category to a factor with the following levels and labels:
Level Number | Level | Label |
---|---|---|
1 | name | Name |
2 | cool-macho | Cool/macho |
3 | simple-alphanumeric | Simple alphanumeric |
4 | fluffy | Fluffy |
5 | sport | Sport |
6 | nerdy-pop | Nerdy pop |
7 | animal | Animal |
8 | password-related | Password-related |
9 | food | Food |
10 | rebellious-rude | Rebellious/rude |
pw$category <- factor(pw$category,
  levels = c(
    "name",
    "cool-macho",
    "simple-alphanumeric",
    "fluffy",
    "sport",
    "nerdy-pop",
    "animal",
    "password-related",
    "food",
    "rebellious-rude"
  ),
  labels = c(
    "Name",
    "Cool/macho",
    "Simple alphanumeric",
    "Fluffy",
    "Sport",
    "Nerdy pop",
    "Animal",
    "Password-related",
    "Food",
    "Rebellious/rude"
  )
)
Use by(), aggregate(), or tapply() to compute the mean of offline_crack_sec for each category with just one line of code, instead of calling mean() once per category:
mean(pw$offline_crack_sec[pw$category == "Name"], na.rm = TRUE)
mean(pw$offline_crack_sec[pw$category == "Cool/macho"], na.rm = TRUE)
mean(pw$offline_crack_sec[pw$category == "Simple alphanumeric"], na.rm = TRUE)
mean(pw$offline_crack_sec[pw$category == "Fluffy"], na.rm = TRUE)
mean(pw$offline_crack_sec[pw$category == "Sport"], na.rm = TRUE)
mean(pw$offline_crack_sec[pw$category == "Nerdy pop"], na.rm = TRUE)
mean(pw$offline_crack_sec[pw$category == "Animal"], na.rm = TRUE)
mean(pw$offline_crack_sec[pw$category == "Password-related"], na.rm = TRUE)
mean(pw$offline_crack_sec[pw$category == "Food"], na.rm = TRUE)
mean(pw$offline_crack_sec[pw$category == "Rebellious/rude"], na.rm = TRUE)
by()
## pw$category: Name
## [1] 0.2794351
## ------------------------------------------------------------
## pw$category: Cool/macho
## [1] 0.3471083
## ------------------------------------------------------------
## pw$category: Simple alphanumeric
## [1] 0.6123755
## ------------------------------------------------------------
## pw$category: Fluffy
## [1] 0.1607586
## ------------------------------------------------------------
## pw$category: Sport
## [1] 1.106457
## ------------------------------------------------------------
## pw$category: Nerdy pop
## [1] 1.288615
## ------------------------------------------------------------
## pw$category: Animal
## [1] 0.2435385
## ------------------------------------------------------------
## pw$category: Password-related
## [1] 2.247346
## ------------------------------------------------------------
## pw$category: Food
## [1] 0.1999104
## ------------------------------------------------------------
## pw$category: Rebellious/rude
## [1] 0.4044709
aggregate()
## category offline_crack_sec
## 1 Name 0.2794351
## 2 Cool/macho 0.3471083
## 3 Simple alphanumeric 0.6123755
## 4 Fluffy 0.1607586
## 5 Sport 1.1064572
## 6 Nerdy pop 1.2886154
## 7 Animal 0.2435385
## 8 Password-related 2.2473456
## 9 Food 0.1999104
## 10 Rebellious/rude 0.4044709
tapply()
## Name Cool/macho Simple alphanumeric Fluffy
## 0.2794351 0.3471083 0.6123755 0.1607586
## Sport Nerdy pop Animal Password-related
## 1.1064572 1.2886154 0.2435385 2.2473456
## Food Rebellious/rude
## 0.1999104 0.4044709
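For reference, here is a sketch of what the three one-liners might look like. It runs on a small made-up data.frame called `toy` (an assumption for illustration), so the numbers will not match the output above. Swap in pw to apply it to the real data.

```r
# Hypothetical toy data standing in for pw.
toy <- data.frame(
  category = factor(c("Name", "Name", "Food", "Food", "Sport")),
  offline_crack_sec = c(0.2, 0.4, 0.1, 0.3, 1.0)
)

# by(): apply mean() to the values within each level of category.
means_by <- by(toy$offline_crack_sec, toy$category, mean, na.rm = TRUE)

# aggregate(): formula interface, returns a data.frame with one row per category.
means_agg <- aggregate(offline_crack_sec ~ category, data = toy, FUN = mean)

# tapply(): returns a named array with one mean per category.
means_tap <- tapply(toy$offline_crack_sec, toy$category, mean, na.rm = TRUE)

means_by
means_agg
means_tap
```

All three give the same means. They differ in the shape of the result: a "by" object, a data.frame, and a named array.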
strength for password strengths ≤ 10.
Don’t include these:
install.packages().
help() or ?.
ggplot2
Install ggplot2.
Load ggplot2.
Download “anscombe_long.csv” from Canvas and import it to R. Assign the resulting data.frame object the name anscombe_long.
Convert the column dataset to a factor with levels 1 = “I”, 2 = “II”, 3 = “III”, and 4 = “IV”.
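A minimal sketch of these two steps follows. Because the real file lives on Canvas, the code writes a tiny stand-in CSV to a temporary path. The stand-in's contents, and the assumption that dataset is stored as the integers 1 to 4, are guesses for illustration only.

```r
# Stand-in for the downloaded file: a four-row CSV in a temporary location.
# (With the real file, point read.csv() at wherever you saved it.)
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("id,dataset,x,y",
    "1,1,10,8.04",
    "1,2,10,9.14",
    "1,3,10,7.46",
    "1,4,8,6.58"),
  csv_path
)

# Import the CSV as a data.frame.
anscombe_long <- read.csv(csv_path)

# Convert the integer codes to a factor with Roman-numeral labels.
anscombe_long$dataset <- factor(
  anscombe_long$dataset,
  levels = 1:4,
  labels = c("I", "II", "III", "IV")
)

str(anscombe_long)
```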
“Four x-y datasets which have the same traditional statistical properties (mean, variance, correlation, regression line, etc.), yet are quite different.”
id | dataset | x | y |
---|---|---|---|
1 | I | 10 | 8.04 |
2 | I | 8 | 6.95 |
3 | I | 13 | 7.58 |
4 | I | 9 | 8.81 |
5 | I | 11 | 8.33 |
6 | I | 14 | 9.96 |
7 | I | 6 | 7.24 |
8 | I | 4 | 4.26 |
9 | I | 12 | 10.84 |
10 | I | 7 | 4.82 |
11 | I | 5 | 5.68 |
1 | II | 10 | 9.14 |
2 | II | 8 | 8.14 |
3 | II | 13 | 8.74 |
4 | II | 9 | 8.77 |
5 | II | 11 | 9.26 |
6 | II | 14 | 8.10 |
7 | II | 6 | 6.13 |
8 | II | 4 | 3.10 |
9 | II | 12 | 9.13 |
10 | II | 7 | 7.26 |
11 | II | 5 | 4.74 |
1 | III | 10 | 7.46 |
2 | III | 8 | 6.77 |
3 | III | 13 | 12.74 |
4 | III | 9 | 7.11 |
5 | III | 11 | 7.81 |
6 | III | 14 | 8.84 |
7 | III | 6 | 6.08 |
8 | III | 4 | 5.39 |
9 | III | 12 | 8.15 |
10 | III | 7 | 6.42 |
11 | III | 5 | 5.73 |
1 | IV | 8 | 6.58 |
2 | IV | 8 | 5.76 |
3 | IV | 8 | 7.71 |
4 | IV | 8 | 8.84 |
5 | IV | 8 | 8.47 |
6 | IV | 8 | 7.04 |
7 | IV | 8 | 5.25 |
8 | IV | 19 | 12.50 |
9 | IV | 8 | 5.56 |
10 | IV | 8 | 7.91 |
11 | IV | 8 | 6.89 |
## I II III IV
## 9 9 9 9
Table 1
Statistical Properties of x and y in Four Datasets
Dataset | Mx | SDx | My | SDy | cor(x, y) |
---|---|---|---|---|---|
I | 9 | 3.32 | 7.5 | 2.03 | 0.82 |
II | 9 | 3.32 | 7.5 | 2.03 | 0.82 |
III | 9 | 3.32 | 7.5 | 2.03 | 0.82 |
IV | 9 | 3.32 | 7.5 | 2.03 | 0.82 |
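The values in Table 1 can be checked with a few lines of base R. This sketch assumes that the built-in `anscombe` data.frame (from the datasets package, in wide format) holds the same values as anscombe_long.csv. The object names `long`, `groups`, and `stats` are made up for the example.

```r
# Reshape the built-in wide data (x1..x4, y1..y4) into a long layout.
long <- data.frame(
  dataset = factor(rep(c("I", "II", "III", "IV"), each = nrow(anscombe)),
                   levels = c("I", "II", "III", "IV")),
  x = unlist(anscombe[paste0("x", 1:4)], use.names = FALSE),
  y = unlist(anscombe[paste0("y", 1:4)], use.names = FALSE)
)

# One data.frame per dataset.
groups <- split(long, long$dataset)

# Mean, standard deviation, and correlation within each dataset.
stats <- data.frame(
  M_x  = sapply(groups, function(d) mean(d$x)),
  SD_x = sapply(groups, function(d) sd(d$x)),
  M_y  = sapply(groups, function(d) mean(d$y)),
  SD_y = sapply(groups, function(d) sd(d$y)),
  r    = sapply(groups, function(d) cor(d$x, d$y))
)

round(stats, 2)
```

Rounded to two decimals, every row comes out the same, which is the point of the table.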
So… how are they different?
From the documentation for ggplot2::aes():
“Aesthetic mappings describe how variables in the data are mapped to visual properties (aesthetics) of geoms.”
Argument | The Value is Mapped to: |
---|---|
x | Where the geom is placed along the x-axis. |
y | Where the geom is placed along the y-axis. |
colour | The line colour of the geom. |
fill | The fill of the geom. |
linetype | Different linetypes (e.g., solid, dashed, or dotted). |
shape | Different shapes (e.g., square, circle, diamond). |
size | The size of the geom. |
alpha | The transparency of the geom. |
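To make the table concrete, here is a small sketch that maps several aesthetics at once. The data.frame `toy` and its columns are invented for the example. It also shows the difference between mapping an aesthetic to a variable inside aes() and setting it to a constant outside aes().

```r
library(ggplot2)

# Hypothetical data: six points in two groups.
toy <- data.frame(
  x = 1:6,
  y = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.2),
  group = rep(c("a", "b"), each = 3)
)

# x, y, colour, and shape are mapped to variables inside aes().
# size and alpha are set to constants, so they apply to every point.
p_aes <- ggplot(toy, aes(x = x, y = y, colour = group, shape = group)) +
  geom_point(size = 3, alpha = 0.7)

p_aes
```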
There are many geoms, and we’ll learn more over the course of the lab. Today, we’ll learn about these:
Learn more about a geom by visiting the documentation for that geom. Let’s start with geom_histogram().
The documentation will tell you required and optional aesthetics for a geom.
Scour the documentation for geom_histogram(). Which aesthetics can be mapped to geom_histogram()?
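As a starting point, a histogram needs only one mapped aesthetic, x. The sketch below uses a made-up vector of password strengths (`toy_pw` is hypothetical, not the real pw data). Here fill and colour are set as constants rather than mapped.

```r
library(ggplot2)

# Hypothetical password strengths on a 1-10 scale.
toy_pw <- data.frame(strength = c(1, 2, 2, 4, 4, 4, 6, 7, 8, 8, 9, 10))

# Map strength to x; geom_histogram() counts the observations in each bin.
p_hist <- ggplot(toy_pw, aes(x = strength)) +
  geom_histogram(binwidth = 1, fill = "grey70", colour = "black")

p_hist
```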
From a conference talk given by William Chase.
“A tool that enables us to concisely describe the components of a graphic.”
“A tool that enables us to concisely describe the design of a graphic.”
Use ggplot themes to change design elements of your plot. There are many, and they all start with theme_. For example, our plot uses theme_minimal(), which is a good theme because it removes a lot of extraneous elements.
Plot 3 uses two new geoms: geom_smooth() and geom_point().
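One way to combine these two geoms is sketched below. It is not necessarily how Plot 3 was built. It assumes the built-in `anscombe` data matches anscombe_long.csv, and it adds facet_wrap() (not covered above) to give each dataset its own panel.

```r
library(ggplot2)

# Long version of the built-in Anscombe data (stand-in for anscombe_long).
long <- data.frame(
  dataset = factor(rep(c("I", "II", "III", "IV"), each = nrow(anscombe)),
                   levels = c("I", "II", "III", "IV")),
  x = unlist(anscombe[paste0("x", 1:4)], use.names = FALSE),
  y = unlist(anscombe[paste0("y", 1:4)], use.names = FALSE)
)

# Layer 1: the raw points. Layer 2: a straight-line fit per panel.
p_scatter <- ggplot(long, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_wrap(~ dataset) +
  theme_minimal()

p_scatter
```

The fitted lines look almost identical across panels, while the points show how different the four datasets really are.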
ggplot2 and layered grammar of graphics.
geom_histogram()
geom_boxplot()
geom_point()
geom_smooth()
ggplot2.
Timed quiz. You can access it in Canvas. Open book, open notes, open internet. Not open friends.