library(DIGCLASS)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(haven)

The International Social Survey Programme (ISSP) is a cross-national collaboration programme conducting annual surveys on diverse topics relevant to social sciences. Since its foundation, over one million respondents have participated in the surveys of the ISSP. All collected data and documentation is available free of charge.

The ISSP contains data for the ISCO08 class schema as well as additional information on the working conditions of respondents. The DIGCLASS package contains a copy of the 2020 edition of the survey for these variables. I’ve deliberately left this data as it comes from the ISSP, meaning that we need to do a real-world cleanup of the data, including the ISCO08 variable. Let’s see what this data looks like:

issp
#> # A tibble: 21,718 × 4
#>    isco08                                             emprel   nsup     wrksup  
#>    <dbl+lbl>                                          <dbl+lb> <dbl+lb> <dbl+lb>
#>  1 2430 [2430. Sales, marketing and public relations…  1 [1. … -4 [-4.…  2 [2. …
#>  2 3115 [3115. Mechanical engineering technicians]     1 [1. … -4 [-4.…  2 [2. …
#>  3 6130 [6130. Mixed crop and animal producers]        2 [2. … -4 [-4.…  2 [2. …
#>  4 4120 [4120. Secretaries (general)]                  1 [1. … -4 [-4.…  2 [2. …
#>  5 7531 [7531. Tailors, dressmakers, furriers and ha…  1 [1. … -4 [-4.…  2 [2. …
#>  6 4211 [4211. Bank tellers and related clerks]        1 [1. … -4 [-4.…  2 [2. …
#>  7 4211 [4211. Bank tellers and related clerks]        1 [1. …  8        1 [1. …
#>  8   -4 [-4. NAP (Code 3 in WORK)]                    -4 [-4.… -4 [-4.… -4 [-4.…
#>  9 9510 [9510. Street and related service workers]     1 [1. … -4 [-4.…  2 [2. …
#> 10 7126 [7126. Plumbers and pipe fitters]              3 [3. …  4        1 [1. …
#> # ℹ 21,708 more rows

It has around 22K rows and 4 columns. Here’s what each column means:

  • isco08: The ISCO08 class schema. Currently in 4-digits.
  • emprel: The employee relationship of the respondent. Could be self-employed, employee or in a family business.
  • nsup: Number of subordinates, if there are any.
  • wrksup: Whether the respondent has subordinates, or in other words, whether the respondent is a supervisor.

Each of these columns has some values that are not valid, like “Did not respond” and so on. Let’s look at each of these columns to understand which values need to be excluded:

issp %>%
  count(emprel)
#> # A tibble: 7 × 2
#>   emprel                                              n
#>   <dbl+lbl>                                       <int>
#> 1 -9 [-9. No answer]                                904
#> 2 -4 [-4. NAP (Code 3 in WORK)]                    1190
#> 3  1 [1. Employee]                                15995
#> 4  2 [2. Self-employed without employees]          1969
#> 5  3 [3. Self-employed with 1 to 9 employees]       654
#> 6  4 [4. Self-employed with 10 employees or more]   190
#> 7  5 [5. Working for own family's business]         816

For the employee relationship, we have values -9 and -4 which should not be there. -9 means there was no answer and -4 means the question doesn’t apply. Let’s look at nsup and wrksup:

issp %>%
  count(nsup)
#> # A tibble: 134 × 2
#>    nsup                                    n
#>    <dbl+lbl>                           <int>
#>  1 -9 [-9. No answer]                   1196
#>  2 -4 [-4. NAP (code 2, -4 in WRKSUP)] 15678
#>  3  1 [1. 1 employee]                    405
#>  4  2                                    501
#>  5  3                                    475
#>  6  4                                    397
#>  7  5                                    466
#>  8  6                                    286
#>  9  7                                    145
#> 10  8                                    203
#> # ℹ 124 more rows
issp %>%
  count(wrksup)
#> # A tibble: 4 × 2
#>   wrksup                            n
#>   <dbl+lbl>                     <int>
#> 1 -9 [-9. No answer]              922
#> 2 -4 [-4. NAP (Code 3 in WORK)]  1190
#> 3  1 [1. Yes]                    5118
#> 4  2 [2. No]                    14488

It seems that all columns (including ISCO08) use -9 for respondents who didn’t answer and -4 for questions that don’t apply. Since we don’t need these for our case study, let’s convert them to NAs. Remember that these values can be useful for other analyses, so don’t just remove them automatically.

issp <- issp %>% mutate_all(~ if_else(.x < 0, NA, .x))

With the previous code, we loop over all columns and apply the function if_else to make sure we remove all values that are below 0. Alright, let’s now focus on the isco08 column. Let’s say we want to convert isco08 to isco88. Let’s use the function isco08_to_isco88:

issp <- issp %>% mutate(isco88 = isco08_to_isco88(isco08))
#> ! ISCO variable is not a character. Beware that numeric ISCO variables possibly contain lost data. See https://cimentadaj.github.io/DIGCLASS/articles/repairing_isco_input.html for more details. Converting to a character vector.
#>  ISCO variable has occupations with digits less than 4. Converting to 4 digits.
#> • Converted `110` to `0110`
#> • Converted `310` to `0310`
#> • Converted `210` to `0210`

Notice that there were several messages. First, it tells us that the ISCO variable was not a character vector. It warns about this because numeric columns convert ISCO codes such as 0110 to 110 and we lose information. This makes it impossible to tell the 3-digit ISCO code 310 apart from a 0310 that was converted to 310. For that reason, all functions in DIGCLASS will raise a warning like this one if they find that ISCO does not come as a character vector.

After this warning, it also mentions that certain occupations have fewer than 4 digits. It assumes these are always ISCO occupations that need a leading 0. It takes the liberty of converting these automatically by prepending the 0’s. For more details on how this works and how to fix these values yourself, the message points the user to the link https://cimentadaj.github.io/DIGCLASS/articles/repairing_isco_input.html.
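The information loss described above can be reproduced in base R. This is only a minimal sketch of the general problem, not of how DIGCLASS or repair_isco work internally; the variable names are made up for illustration.

```r
# Codes stored as text keep their leading zero.
codes_chr <- c("0110", "0310", "2430")

# Reading them as numbers silently drops it.
codes_num <- as.numeric(codes_chr)
print(codes_num)

# Left-padding back to 4 digits recovers the original strings ...
padded <- sprintf("%04d", as.integer(codes_num))
print(padded)

# ... but a genuine 3-digit code "310" is padded to the very same "0310",
# so the two cases can no longer be told apart.
ambiguous <- sprintf("%04d", as.integer("310"))
print(ambiguous)
```

Storing ISCO codes as character vectors from the start avoids the ambiguity altogether.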

As you can see, these were all warnings and informational messages, not errors. Everything was translated correctly to ISCO88:

issp %>% select(isco88)
#> # A tibble: 21,718 × 1
#>    isco88
#>    <chr> 
#>  1 2410  
#>  2 3115  
#>  3 6130  
#>  4 4115  
#>  5 7434  
#>  6 4211  
#>  7 4211  
#>  8 NA    
#>  9 9120  
#> 10 7136  
#> # ℹ 21,708 more rows
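Conceptually, a translation like this behaves like a lookup in a crosswalk table. The base R sketch below uses only the six code pairs visible in the outputs above. It is not the package’s full table nor its implementation, and "9999" is just a placeholder for a code that is absent from this toy table.

```r
# Toy crosswalk: names are ISCO08 codes, values are the ISCO88 codes
# shown for the same rows in the outputs above.
crosswalk <- c(
  "2430" = "2410",
  "3115" = "3115",
  "4120" = "4115",
  "7531" = "7434",
  "9510" = "9120",
  "7126" = "7136"
)

# Two known codes, a missing value, and a code absent from the toy table.
isco08 <- c("2430", "4120", NA, "9999")

# Look up by name; missing and unmatched codes both come back as NA.
isco88 <- unname(crosswalk[isco08])
print(isco88)
```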

The explanation above works for the vast majority of the translations implemented in DIGCLASS: they do not need previous transformations or additional variables to make the translation.

However, there are cases where we have to translate these 4-digit schemas into broader groups. For example, to translate using isco08_to_msec, isco08 needs to be translated to its 3-digit equivalent. Moreover, to translate to msec we need other variables like the number of subordinates that the respondent has, as well as whether the respondent is self-employed or an employee. Let’s recode these into the needed values. The columns need to be recoded like this:

  • is_supervisor: A numeric vector indicating whether each individual is a supervisor (1, e.g. responsible for other employees) or not (0).

  • self_employed: A numeric vector indicating whether each individual is self-employed (1) or not (0).

  • n_employees: A numeric vector indicating the number of employees under each respondent. If the respondent has 0 employees, it should say 0 and not NA.

Let’s recode each one:

issp <-
  issp %>%
  mutate(
    is_supervisor = ifelse(wrksup == 2, 0, wrksup),
    self_employed = case_when(
      emprel %in% c(1, 5) ~ 0,
      emprel %in% 2:4 ~ 1,
      TRUE ~ NA
    ),
    n_employees = ifelse(is_supervisor == 0, 0, nsup)
  ) %>%
  select(isco08, is_supervisor, self_employed, n_employees)

issp
#> # A tibble: 21,718 × 4
#>    isco08                                is_supervisor self_employed n_employees
#>    <dbl+lbl>                                     <dbl>         <dbl>       <dbl>
#>  1 2430 [2430. Sales, marketing and pub…             0             0           0
#>  2 3115 [3115. Mechanical engineering t…             0             0           0
#>  3 6130 [6130. Mixed crop and animal pr…             0             1           0
#>  4 4120 [4120. Secretaries (general)]                0             0           0
#>  5 7531 [7531. Tailors, dressmakers, fu…             0             0           0
#>  6 4211 [4211. Bank tellers and related…             0             0           0
#>  7 4211 [4211. Bank tellers and related…             1             0           8
#>  8   NA                                             NA            NA          NA
#>  9 9510 [9510. Street and related servi…             0             0           0
#> 10 7126 [7126. Plumbers and pipe fitter…             1             1           4
#> # ℹ 21,708 more rows

Let’s explain what’s happening here. If wrksup is 2, the person is not a supervisor, so I recoded it to 0 (we want supervisors to have a 1 and non-supervisors to have a 0). Otherwise the value is kept, which leaves the 1 that is already set for supervisors.

For self_employed I recoded that if the respondent is an employee or works at a family business, the respondent is an employee (0). If it is self-employed of any kind (with or without employees), it is self-employed (1). Otherwise, all values should be missing.

Finally, for the number of employees we need to explicitly say if the respondent has 0 employees. It’s not enough to leave it as NA. We recode it such that if the respondent is not a supervisor (equal to 0 in is_supervisor), then it should have 0 employees in n_employees.
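To check the recoding rules described above, it can help to run them on a handful of made-up rows where the expected result is easy to reason about. The sketch below uses base R and a toy data frame (the values are invented for illustration and do not come from the ISSP), with nested ifelse standing in for case_when.

```r
# Toy rows: a supervising employee, a self-employed non-supervisor,
# a fully missing row, and a supervisor in a family business.
toy <- data.frame(
  wrksup = c(1, 2, NA, 1),
  emprel = c(1, 3, NA, 5),
  nsup   = c(8, NA, NA, 4)
)

# 2 ("No") becomes 0; everything else (1 or NA) is kept as is.
toy$is_supervisor <- ifelse(toy$wrksup == 2, 0, toy$wrksup)

# Employees and family workers are 0, any self-employed category is 1,
# anything else is NA.
toy$self_employed <- ifelse(
  toy$emprel %in% c(1, 5), 0,
  ifelse(toy$emprel %in% 2:4, 1, NA)
)

# Non-supervisors get an explicit 0 instead of NA.
toy$n_employees <- ifelse(toy$is_supervisor == 0, 0, toy$nsup)

print(toy)
```

The second row is the interesting one: nsup is missing, but because the respondent is not a supervisor, n_employees ends up as 0 rather than NA.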

With these columns recoded, let’s try to convert ISCO08 to MSEC using these columns:

issp <-
  issp %>%
  mutate(
    msec = isco08_to_msec(isco08, is_supervisor, self_employed, n_employees)
  )
#> ! ISCO variable is not a character. Beware that numeric ISCO variables possibly contain lost data. See https://cimentadaj.github.io/DIGCLASS/articles/repairing_isco_input.html for more details. Converting to a character vector.
#>  ISCO variable has occupations with digits less than 4. Converting to 4 digits.
#> • Converted `110` to `0110`
#> • Converted `310` to `0310`
#> • Converted `210` to `0210`

The usual warnings appear saying that some values are recoded to have 4 digits. However, isco08_to_msec expects 3-digit codes and we passed 4-digit ones. Sometimes we want to work with ISCO codes that have a number of digits other than 4. For translating ISCO between different digits, we use the functions isco*_swap where * represents the given year of interest. For our case, we’re looking for isco08_swap. At the same time, we can remove all those warnings about transforming values like 110 to 0110 by running isco08 through repair_isco once and saving it.

Let’s do both:

issp <-
  issp %>%
  mutate(
    isco08 = repair_isco(isco08),
    isco08_three = isco08_swap(isco08, from = 4, to = 3)
  )
#> ! ISCO variable is not a character. Beware that numeric ISCO variables possibly contain lost data. See https://cimentadaj.github.io/DIGCLASS/articles/repairing_isco_input.html for more details. Converting to a character vector.
#>  ISCO variable has occupations with digits less than 4. Converting to 4 digits.
#> • Converted `110` to `0110`
#> • Converted `310` to `0310`
#> • Converted `210` to `0210`

issp
#> # A tibble: 21,718 × 6
#>    isco08 is_supervisor self_employed n_employees msec  isco08_three
#>    <chr>          <dbl>         <dbl>       <dbl> <chr> <chr>       
#>  1 2430               0             0           0 23    2430        
#>  2 3115               0             0           0 NA    3110        
#>  3 6130               0             1           0 41    6130        
#>  4 4120               0             0           0 51    4120        
#>  5 7531               0             0           0 NA    7530        
#>  6 4211               0             0           0 NA    4210        
#>  7 4211               1             0           8 NA    4210        
#>  8 NA                NA            NA          NA NA    NA          
#>  9 9510               0             0           0 73    9510        
#> 10 7126               1             1           4 NA    7120        
#> # ℹ 21,708 more rows
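For the rows shown above, the isco08_three column is consistent with keeping the first three digits and putting a 0 in the fourth position. The base R sketch below mimics that pattern on a few codes taken from the output. It rests on that observation alone and is not how isco08_swap is implemented.

```r
# A few 4-digit codes from the output above, plus a missing value.
isco08 <- c("2430", "3115", "7531", "4211", NA)

# Keep the first three digits and fill the last position with 0,
# leaving missing values as NA instead of the string "NA0".
isco08_three <- ifelse(
  is.na(isco08),
  NA_character_,
  paste0(substr(isco08, 1, 3), "0")
)
print(isco08_three)
```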

From here on we shouldn’t see any more warning messages. We have now translated isco08 to isco08_three (you can see that isco08_three always ends with a 0, because we’ve translated it to 3-digit codes). Let’s use isco08_to_msec to translate it:

issp <-
  issp %>%
  mutate(
    msec = isco08_to_msec(isco08_three, is_supervisor, self_employed, n_employees)
  )

issp
#> # A tibble: 21,718 × 6
#>    isco08 is_supervisor self_employed n_employees msec  isco08_three
#>    <chr>          <dbl>         <dbl>       <dbl> <chr> <chr>       
#>  1 2430               0             0           0 23    2430        
#>  2 3115               0             0           0 31    3110        
#>  3 6130               0             1           0 41    6130        
#>  4 4120               0             0           0 51    4120        
#>  5 7531               0             0           0 62    7530        
#>  6 4211               0             0           0 52    4210        
#>  7 4211               1             0           8 59    4210        
#>  8 NA                NA            NA          NA NA    NA          
#>  9 9510               0             0           0 73    9510        
#> 10 7126               1             1           4 43    7120        
#> # ℹ 21,708 more rows

There we go. We now see an msec column that contains the translation. All translation functions in DIGCLASS that have labels contain an argument called label that if set to TRUE will return the labels instead of the numbers. Here’s an example with msec:

issp %>%
  mutate(
    msec = isco08_to_msec(isco08_three, is_supervisor, self_employed, n_employees, label = TRUE)
  ) %>%
  count(msec)
#> # A tibble: 39 × 2
#>    msec                                       n
#>    <chr>                                  <int>
#>  1 Agricultural Employees                   325
#>  2 Agricultural Self-Employed               406
#>  3 Associate Professionals                  254
#>  4 Blue-Collar Employees                   1159
#>  5 Building and Related Trades Employees    365
#>  6 Business Associate Professionals         314
#>  7 Business Professionals                   758
#>  8 Cleaners and Helpers                     325
#>  9 Craft and Related Trades Self-Employed   547
#> 10 Customer Service Clerks                  414
#> # ℹ 29 more rows

One final example comes from what we call “chained” translations. In some cases you’ll need to translate to a new class schema, which then needs to be used as the input to another translation. For example, to translate from isco88 to msec, you’ll first need to translate to isco88com, because that’s the only available route from isco88 to msec. Here’s how we would do it:

issp %>%
  mutate(
    isco88 = isco08_to_isco88(isco08),
    isco88com = isco88_to_isco88com(isco88),
    isco88com_three = isco88_swap(isco88com, from = 4, to = 3),
    msec = isco88com_to_msec(isco88com_three, is_supervisor, self_employed, n_employees, label = TRUE)
  ) %>%
  count(msec)
#> # A tibble: 38 × 2
#>    msec                                       n
#>    <chr>                                  <int>
#>  1 Agricultural Employees                   381
#>  2 Agricultural Self-Employed               345
#>  3 Associate Professionals                   33
#>  4 Blue-Collar Employees                   1214
#>  5 Building and Related Trades Employees    664
#>  6 Business Associate Professionals         157
#>  7 Business Professionals                   397
#>  8 Cleaners and Helpers                     947
#>  9 Craft and Related Trades Self-Employed   592
#> 10 Customer Service Clerks                  464
#> # ℹ 28 more rows

As you can see, DIGCLASS offers all the pieces to make very complicated translations rather easily. Since ISSP only contains isco08, we did the following translations:

  • Convert ISCO08 to ISCO88
  • Convert ISCO88 to ISCO88COM
  • Convert ISCO88COM from 4-digits to 3-digits
  • Convert ISCO88COM 3-digits into MSEC

This flexibility allows the user to do any sort of arbitrary transformation if needed as well as build more complicated pipelines that transform certain class schemas automatically to other expressions, such as bigger digit groups and subsequent transformations.

These examples paint a very clear picture of all the different types of transformations that are possible in this package. If you’ve worked out how to run all of these examples, then you’re ready to navigate DIGCLASS.