Case study: Exploring Occupations Using The International Social Survey Programme (ISSP)
library(DIGCLASS)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(haven)
The International Social Survey Programme (ISSP) is a cross-national collaboration programme conducting annual surveys on diverse topics relevant to social sciences. Since its foundation, over one million respondents have participated in the surveys of the ISSP. All collected data and documentation is available free of charge.
The ISSP contains data for the ISCO08 class schema as well as additional information on the working conditions of respondents. The DIGCLASS package contains a copy of the 2020 edition of the survey for these variables. I’ve deliberately left this data as it comes from the ISSP, meaning that we need to do a real-world cleanup of the data, including the ISCO08 variable. Let’s see what this data looks like:
issp
#> # A tibble: 21,718 × 4
#> isco08 emprel nsup wrksup
#> <dbl+lbl> <dbl+lb> <dbl+lb> <dbl+lb>
#> 1 2430 [2430. Sales, marketing and public relations… 1 [1. … -4 [-4.… 2 [2. …
#> 2 3115 [3115. Mechanical engineering technicians] 1 [1. … -4 [-4.… 2 [2. …
#> 3 6130 [6130. Mixed crop and animal producers] 2 [2. … -4 [-4.… 2 [2. …
#> 4 4120 [4120. Secretaries (general)] 1 [1. … -4 [-4.… 2 [2. …
#> 5 7531 [7531. Tailors, dressmakers, furriers and ha… 1 [1. … -4 [-4.… 2 [2. …
#> 6 4211 [4211. Bank tellers and related clerks] 1 [1. … -4 [-4.… 2 [2. …
#> 7 4211 [4211. Bank tellers and related clerks] 1 [1. … 8 1 [1. …
#> 8 -4 [-4. NAP (Code 3 in WORK)] -4 [-4.… -4 [-4.… -4 [-4.…
#> 9 9510 [9510. Street and related service workers] 1 [1. … -4 [-4.… 2 [2. …
#> 10 7126 [7126. Plumbers and pipe fitters] 3 [3. … 4 1 [1. …
#> # ℹ 21,708 more rows
It has around 22K rows and 4 columns. Here’s what each column means:
- isco08: The ISCO08 class schema. Currently in 4-digits.
- emprel: The employment relationship of the respondent. Could be self-employed, employee or working in a family business.
- nsup: Number of subordinates, if there are any.
- wrksup: Whether the respondent has subordinates, or in other words, whether the respondent is a supervisor.
Each of these columns has some values that are not valid, like “Did not respond” and so on. Let’s look at each of these columns to understand which values need to be excluded:
issp %>%
count(emprel)
#> # A tibble: 7 × 2
#> emprel n
#> <dbl+lbl> <int>
#> 1 -9 [-9. No answer] 904
#> 2 -4 [-4. NAP (Code 3 in WORK)] 1190
#> 3 1 [1. Employee] 15995
#> 4 2 [2. Self-employed without employees] 1969
#> 5 3 [3. Self-employed with 1 to 9 employees] 654
#> 6 4 [4. Self-employed with 10 employees or more] 190
#> 7 5 [5. Working for own family's business] 816
For the employment relationship, the values -9 and -4 are not valid categories: -9 means there was no answer and -4 means the question does not apply. Let’s look at nsup and wrksup:
issp %>%
count(nsup)
#> # A tibble: 134 × 2
#> nsup n
#> <dbl+lbl> <int>
#> 1 -9 [-9. No answer] 1196
#> 2 -4 [-4. NAP (code 2, -4 in WRKSUP)] 15678
#> 3 1 [1. 1 employee] 405
#> 4 2 501
#> 5 3 475
#> 6 4 397
#> 7 5 466
#> 8 6 286
#> 9 7 145
#> 10 8 203
#> # ℹ 124 more rows
issp %>%
count(wrksup)
#> # A tibble: 4 × 2
#> wrksup n
#> <dbl+lbl> <int>
#> 1 -9 [-9. No answer] 922
#> 2 -4 [-4. NAP (Code 3 in WORK)] 1190
#> 3 1 [1. Yes] 5118
#> 4 2 [2. No] 14488
It seems that all columns (including ISCO08) use the values -4 and -9 to flag respondents who didn’t answer or to whom the question doesn’t apply. Since we don’t need these for our case study, let’s convert them to NAs. Remember that these can be useful for other analyses, so don’t just remove them automatically.
issp <- issp %>% mutate_all(~ if_else(.x < 0, NA, .x))
With the previous code, we loop over all columns and apply the function if_else to replace every value below 0 with NA. Alright, let’s now focus on the isco08 column.
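As a side note, mutate_all is superseded in current dplyr in favour of across(). The following is a minimal sketch of the same cleanup on toy data. The data frame and the helper negative_to_na are hypothetical and only illustrate the idea. It is not how DIGCLASS or the ISSP data must be handled.

```r
library(dplyr)

# Toy data (hypothetical values) mimicking the ISSP coding of invalid answers
toy <- data.frame(
  emprel = c(1, -9, 2),
  wrksup = c(2, 1, -4)
)

# Hypothetical helper: replace negative codes with NA, leaving existing NAs alone
negative_to_na <- function(x) {
  x[!is.na(x) & x < 0] <- NA
  x
}

# Apply the helper to every column
toy_clean <- toy %>%
  mutate(across(everything(), negative_to_na))

toy_clean
```

If only some columns use negative codes for invalid answers, everything() can be swapped for an explicit column selection.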
Let’s say we want to convert isco08 to isco88. Let’s use the function isco08_to_isco88:
issp <- issp %>% mutate(isco88 = isco08_to_isco88(isco08))
#> ! ISCO variable is not a character. Beware that numeric ISCO variables possibly contain lost data. See https://cimentadaj.github.io/DIGCLASS/articles/repairing_isco_input.html for more details. Converting to a character vector.
#> ℹ ISCO variable has occupations with digits less than 4. Converting to 4 digits.
#> • Converted `110` to `0110`
#> • Converted `310` to `0310`
#> • Converted `210` to `0210`
Notice that there were several messages. First, it tells us that the ISCO variable was not a character vector. It warns about this because numeric columns turn ISCO codes such as 0110 into 110 and we lose information. This makes it impossible to tell the 3-digit ISCO code 310 apart from 0310 converted to 310. For that reason, all functions in DIGCLASS will raise a warning like this one if they find that ISCO does not come as a character vector.
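To see why numeric storage loses information, here is a small base R illustration. It uses no DIGCLASS functions, and the two codes are simply the ones mentioned in the paragraph above.

```r
# The same two codes stored as text and as numbers
codes_chr <- c("0310", "310")
codes_num <- as.numeric(codes_chr)

codes_num
#> [1] 310 310

# As text the codes differ; as numbers they are indistinguishable
codes_chr[1] == codes_chr[2]
#> [1] FALSE
codes_num[1] == codes_num[2]
#> [1] TRUE
```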
After this warning, it also mentions that certain occupations have fewer than 4 digits. It assumes these are always ISCO occupations that need a leading 0. It takes the liberty of converting these automatically by prepending the 0’s. For more details on how this works and how to fix these values yourself, the message points the user to the link https://cimentadaj.github.io/DIGCLASS/articles/repairing_isco_input.html.
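The padding step can be sketched in base R as follows. The function pad_isco is a hypothetical helper written for illustration. It only mirrors the behaviour described in the messages (110 becomes 0110) and is not the code DIGCLASS runs internally.

```r
# Hypothetical helper: left-pad codes shorter than `width` with zeros
pad_isco <- function(x, width = 4) {
  x <- as.character(x)
  # Only touch non-missing codes that are too short
  short <- !is.na(x) & nchar(x) < width
  x[short] <- paste0(strrep("0", width - nchar(x[short])), x[short])
  x
}

pad_isco(c(110, 2430, NA))
#> [1] "0110" "2430" NA
```

Note that this sketch carries the same assumption as the message: every short code is treated as a 4-digit code that lost its leading zeros, which may not hold if the data mixes digit levels.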
As you can see, these were all warnings and informational messages rather than errors. Everything was translated correctly to ISCO88:
issp %>% select(isco88)
#> # A tibble: 21,718 × 1
#> isco88
#> <chr>
#> 1 2410
#> 2 3115
#> 3 6130
#> 4 4115
#> 5 7434
#> 6 4211
#> 7 4211
#> 8 NA
#> 9 9120
#> 10 7136
#> # ℹ 21,708 more rows
The workflow above applies to the vast majority of the translations implemented in DIGCLASS: they do not need previous transformations or additional variables to make the translation.
However, there are cases where we have to translate these 4-digit schemas into broader groups. For example, to translate using isco08_to_msec, isco08 needs to be translated to the 3-digit equivalent. Moreover, to translate to msec we need other variables like the number of subordinates that the respondent has, as well as whether the respondent is self-employed or an employee. Let’s recode these into the needed values. The columns need to be recoded like this:
- is_supervisor: A numeric vector indicating whether each individual is a supervisor (1, e.g. responsible for other employees) or not (0).
- self_employed: A numeric vector indicating whether each individual is self-employed (1) or not (0).
- n_employees: A numeric vector indicating the number of employees under each respondent. If the respondent has 0 employees, it should say 0 and not NA.
Let’s recode each one:
issp <-
issp %>%
mutate(
is_supervisor = ifelse(wrksup == 2, 0, wrksup),
self_employed = case_when(
emprel %in% c(1, 5) ~ 0,
emprel %in% 2:4 ~ 1,
TRUE ~ NA
),
n_employees = ifelse(is_supervisor == 0, 0, nsup)
) %>%
select(isco08, is_supervisor, self_employed, n_employees)
issp
#> # A tibble: 21,718 × 4
#> isco08 is_supervisor self_employed n_employees
#> <dbl+lbl> <dbl> <dbl> <dbl>
#> 1 2430 [2430. Sales, marketing and pub… 0 0 0
#> 2 3115 [3115. Mechanical engineering t… 0 0 0
#> 3 6130 [6130. Mixed crop and animal pr… 0 1 0
#> 4 4120 [4120. Secretaries (general)] 0 0 0
#> 5 7531 [7531. Tailors, dressmakers, fu… 0 0 0
#> 6 4211 [4211. Bank tellers and related… 0 0 0
#> 7 4211 [4211. Bank tellers and related… 1 0 8
#> 8 NA NA NA NA
#> 9 9510 [9510. Street and related servi… 0 0 0
#> 10 7126 [7126. Plumbers and pipe fitter… 1 1 4
#> # ℹ 21,708 more rows
Let’s explain what’s happening here. If wrksup is 2, the person is not a supervisor, so I recoded it to 0 (we want supervisors to have a 1 and non-supervisors to have a 0). Otherwise the value is kept as it is, which leaves the 1 that is already set for supervisors.
For self_employed I recoded that if the respondent is an employee or works at a family business, the respondent is an employee (0). If the respondent is self-employed of any kind (with or without employees), the respondent is self-employed (1). Otherwise, all values should be missing.
Finally, for the number of employees we need to explicitly say if the respondent has 0 employees. It’s not enough to leave it as NA. We recode it such that if the respondent is not a supervisor (equal to 0 in is_supervisor), then it has 0 employees in n_employees.
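To check that the recoding rules behave as described, it can help to run them on a few hand-made rows where the expected result is obvious. The sketch below applies the same three rules to a toy data frame. The rows are hypothetical and chosen to cover a supervisor, a non-supervisor and a respondent with a missing wrksup.

```r
library(dplyr)

# Toy rows (hypothetical): supervisor, non-supervisor, missing supervisor status
toy <- data.frame(
  wrksup = c(1, 2, NA),
  emprel = c(1, 3, 5),
  nsup   = c(8, NA, NA)
)

recoded <- toy %>%
  mutate(
    # 2 ("No") becomes 0; 1 ("Yes") and NA are kept as they are
    is_supervisor = ifelse(wrksup == 2, 0, wrksup),
    # Employees and family workers are 0; any kind of self-employment is 1
    self_employed = case_when(
      emprel %in% c(1, 5) ~ 0,
      emprel %in% 2:4 ~ 1,
      TRUE ~ NA_real_
    ),
    # Non-supervisors get an explicit 0 instead of NA
    n_employees = ifelse(is_supervisor == 0, 0, nsup)
  )

recoded
```

The third row keeps NA in n_employees because is_supervisor itself is missing, so the rule for non-supervisors cannot be applied to it.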
With these columns recoded, let’s try to convert ISCO08 to MSEC using these columns:
issp <-
issp %>%
mutate(
msec = isco08_to_msec(isco08, is_supervisor, self_employed, n_employees)
)
#> ! ISCO variable is not a character. Beware that numeric ISCO variables possibly contain lost data. See https://cimentadaj.github.io/DIGCLASS/articles/repairing_isco_input.html for more details. Converting to a character vector.
#> ℹ ISCO variable has occupations with digits less than 4. Converting to 4 digits.
#> • Converted `110` to `0110`
#> • Converted `310` to `0310`
#> • Converted `210` to `0210`
The usual messages appear saying that some values are recoded to have 4 digits. Sometimes we want to work with a number of digits other than 4. For translating ISCO between different digits, we use the functions isco*_swap, where * represents the ISCO edition of interest (for example 08 or 88). For our case, we’re looking for isco08_swap. At the same time, we can remove all those warnings about transforming values like 110 to 0110 by running isco08 through repair_isco once and saving it. Let’s do both:
issp <-
issp %>%
mutate(
isco08 = repair_isco(isco08),
isco08_three = isco08_swap(isco08, from = 4, to = 3)
)
#> ! ISCO variable is not a character. Beware that numeric ISCO variables possibly contain lost data. See https://cimentadaj.github.io/DIGCLASS/articles/repairing_isco_input.html for more details. Converting to a character vector.
#> ℹ ISCO variable has occupations with digits less than 4. Converting to 4 digits.
#> • Converted `110` to `0110`
#> • Converted `310` to `0310`
#> • Converted `210` to `0210`
issp
#> # A tibble: 21,718 × 6
#> isco08 is_supervisor self_employed n_employees msec isco08_three
#> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 2430 0 0 0 23 2430
#> 2 3115 0 0 0 NA 3110
#> 3 6130 0 1 0 41 6130
#> 4 4120 0 0 0 51 4120
#> 5 7531 0 0 0 NA 7530
#> 6 4211 0 0 0 NA 4210
#> 7 4211 1 0 8 NA 4210
#> 8 NA NA NA NA NA NA
#> 9 9510 0 0 0 73 9510
#> 10 7126 1 1 4 NA 7120
#> # ℹ 21,708 more rows
From here on we shouldn’t see any more warning messages. We have now translated isco08 to isco08_three (you can see that isco08_three always ends with a 0, because we’ve translated it to 3-digit codes). Let’s use isco08_to_msec to translate it:
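For intuition about what the 3-digit codes look like, the printed output above (3115 becomes 3110, 2430 stays 2430) is consistent with keeping the first three digits and filling the last position with 0. The sketch below reproduces only that pattern in base R. The helper to_three_digits is hypothetical, and the assumption that isco08_swap works this way for every code is mine, so use isco08_swap for real data.

```r
# Hypothetical helper: keep the first three digits and fill the fourth with 0
to_three_digits <- function(x) {
  x <- as.character(x)
  ifelse(is.na(x), NA_character_, paste0(substr(x, 1, 3), "0"))
}

to_three_digits(c("3115", "2430", NA))
#> [1] "3110" "2430" NA
```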
issp <-
issp %>%
mutate(
msec = isco08_to_msec(isco08_three, is_supervisor, self_employed, n_employees)
)
issp
#> # A tibble: 21,718 × 6
#> isco08 is_supervisor self_employed n_employees msec isco08_three
#> <chr> <dbl> <dbl> <dbl> <chr> <chr>
#> 1 2430 0 0 0 23 2430
#> 2 3115 0 0 0 31 3110
#> 3 6130 0 1 0 41 6130
#> 4 4120 0 0 0 51 4120
#> 5 7531 0 0 0 62 7530
#> 6 4211 0 0 0 52 4210
#> 7 4211 1 0 8 59 4210
#> 8 NA NA NA NA NA NA
#> 9 9510 0 0 0 73 9510
#> 10 7126 1 1 4 43 7120
#> # ℹ 21,708 more rows
There we go. We now see an msec column that contains the translation. All translation functions in DIGCLASS that have labels contain an argument called label that, if set to TRUE, will return the labels instead of the numbers. Here’s an example with msec:
issp %>%
mutate(
msec = isco08_to_msec(isco08_three, is_supervisor, self_employed, n_employees, label = TRUE)
) %>%
count(msec)
#> # A tibble: 39 × 2
#> msec n
#> <chr> <int>
#> 1 Agricultural Employees 325
#> 2 Agricultural Self-Employed 406
#> 3 Associate Professionals 254
#> 4 Blue-Collar Employees 1159
#> 5 Building and Related Trades Employees 365
#> 6 Business Associate Professionals 314
#> 7 Business Professionals 758
#> 8 Cleaners and Helpers 325
#> 9 Craft and Related Trades Self-Employed 547
#> 10 Customer Service Clerks 414
#> # ℹ 29 more rows
One final example comes from what we call “chained” translations. In some cases you’ll need to translate to a new class schema that is then used as the input to another translation. For example, to translate from isco88 to msec, you’ll first need to translate to isco88com, because that’s the only translation available for isco88 on the way to msec. Here’s how we would do it:
issp %>%
mutate(
isco88 = isco08_to_isco88(isco08),
isco88com = isco88_to_isco88com(isco88),
isco88com_three = isco88_swap(isco88com, from = 4, to = 3),
msec = isco88com_to_msec(isco88com_three, is_supervisor, self_employed, n_employees, label = TRUE)
) %>%
count(msec)
#> # A tibble: 38 × 2
#> msec n
#> <chr> <int>
#> 1 Agricultural Employees 381
#> 2 Agricultural Self-Employed 345
#> 3 Associate Professionals 33
#> 4 Blue-Collar Employees 1214
#> 5 Building and Related Trades Employees 664
#> 6 Business Associate Professionals 157
#> 7 Business Professionals 397
#> 8 Cleaners and Helpers 947
#> 9 Craft and Related Trades Self-Employed 592
#> 10 Customer Service Clerks 464
#> # ℹ 28 more rows
As you can see, DIGCLASS offers all the pieces to make very complicated translations rather easily. Since ISSP only contains isco08, we did the following translations:
- Convert ISCO08 to ISCO88
- Convert ISCO88 to ISCO88COM
- Convert ISCO88COM from 4-digits to 3-digits
- Convert ISCO88COM 3-digits into MSEC
This flexibility allows the user to do any sort of arbitrary transformation if needed as well as build more complicated pipelines that transform certain class schemas automatically to other expressions, such as bigger digit groups and subsequent transformations.
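The idea of a chained translation can be illustrated without the package as a sequence of small steps, each feeding the next. In the sketch below, the lookup holds three ISCO08 to ISCO88 pairs copied from the printed output earlier in this case study. The helpers translate and truncate_to_three are hypothetical, and the truncation rule is an assumption about what a 4-digit to 3-digit swap produces.

```r
# Lookup copied from three rows of the printed ISCO08 -> ISCO88 output above
isco08_to_isco88_toy <- c("2430" = "2410", "7531" = "7434", "9510" = "9120")

# Hypothetical helper: translate codes through a named lookup (unmatched -> NA)
translate <- function(x, lookup) unname(lookup[x])

# Hypothetical helper: assumed 4-digit to 3-digit rule (last digit set to 0)
truncate_to_three <- function(x) {
  ifelse(is.na(x), NA_character_, paste0(substr(x, 1, 3), "0"))
}

codes <- c("2430", "7531", "0000")

# Each step takes the output of the previous one
step1 <- translate(codes, isco08_to_isco88_toy)
step2 <- truncate_to_three(step1)

data.frame(isco08 = codes, isco88 = step1, isco88_three = step2)
```

A code that has no match in the first step, such as the made-up "0000", stays NA through the rest of the chain, which is worth checking for after each step of a real pipeline.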
These examples paint a very clear picture of all the different types of transformations that are possible in this package. If you’ve worked out how to run all of these examples, then you’re ready to navigate DIGCLASS.