Translating between hierarchies

Most surveys that contain occupation-related variables record occupations as 4 digit ISCO codes. What does that mean? That you’re working with the most fine-grained definition of an occupation. In some cases, you want to work with aggregated groups. Instead of knowing something about a mathematician, you’d rather group all math-related occupations into a “Scientist” category. DIGCLASS implements this following the rules of each ISCO schema. Let’s load DIGCLASS:

library(DIGCLASS)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

In ISCO parlance, the most granular occupations have what are called 4 digits. This means that the occupation code has 4 non-zero digits. Occupation 2111 is a 4 digit occupation because it does not contain any zeroes. In contrast, 2110 is the “parent” category of 2111. To make it even simpler, think of 2111 as the occupation “Physicists and astronomers” and of 2110 as “Physicists, chemists and related professionals”. You can intuitively group physicists in that broader category. Similarly, the occupation 2110 or “Physicists, chemists and related professionals” is nested within the broader group 2100 or “Physical, mathematical and engineering science professionals”. Finally, the broadest group is 2000, for which the general group definition is “Professionals”.
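
The nesting described above can be sketched in a few lines of base R. This is only a naive illustration of the trailing-zero convention: `to_level()` is a hypothetical helper, not a DIGCLASS function, and it ignores the schema-specific rules that DIGCLASS applies.

```r
# Naive illustration only: keep the first `digits` characters of a
# 4 digit code and pad the rest with zeros.
# `to_level()` is a hypothetical helper, not part of DIGCLASS.
to_level <- function(code, digits) {
  paste0(substr(code, 1, digits), strrep("0", 4 - digits))
}

to_level("2111", 3)
#> [1] "2110"
to_level("2111", 2)
#> [1] "2100"
to_level("2111", 1)
#> [1] "2000"
```

Each step drops one digit of detail, which is why 2111 sits inside 2110, 2110 inside 2100 and 2100 inside 2000.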

This was just an intuitive explanation of how ISCO codes work. You don’t have to remember what each category is. You can always look up these values yourself for better understanding, but DIGCLASS will do the work of translating everything for you. The important thing to remember is that sometimes you’ll want to group fine-grained occupations into broader occupation categories. An example would be grouping together all categories within “Physicists, chemists and related professionals”. This means that we “convert” the column from 4 digits into 3 digits, in this example. In DIGCLASS you can do that with the function isco*_swap, where * is the ISCO version of your choice. Let’s look at the ISCO variables we have in the ESS data in DIGCLASS:

ess
#> # A tibble: 48,285 × 12
#>    isco68 isco88 isco88com isco08 emplno self_employed is_supervisor
#>    <chr>  <chr>  <chr>     <chr>   <dbl>         <dbl>         <dbl>
#>  1 5890   5169   5169      5414        0             1             0
#>  2 2120   1222   1222      1321        0             0             1
#>  3 7200   8120   8120      3135        0             0             0
#>  4 9310   7141   7141      7131        0             0             1
#>  5 6220   6111   6111      6111        0             0             0
#>  6 6220   6111   6111      6111        0             0             1
#>  7 9595   9313   9313      9313        0             0             1
#>  8 6000   1221   1221      1311        0             0             1
#>  9 6000   1221   1221      1311        2             1             1
#> 10 6220   6111   6111      6111        0             0             1
#> # ℹ 48,275 more rows
#> # ℹ 5 more variables: control_work <dbl>, control_daily <dbl>,
#> #   work_status <dbl>, main_activity <dbl>, agea <dbl>

All four ISCO variables are in four digits, but we can convert them to three digits:

ess %>%
  transmute(
    isco88,
    isco88_three = isco88_swap(isco88, from = 4, to = 3)
  )
#> # A tibble: 48,285 × 2
#>    isco88 isco88_three
#>    <chr>  <chr>       
#>  1 5169   5160        
#>  2 1222   1220        
#>  3 8120   8120        
#>  4 7141   7140        
#>  5 6111   6110        
#>  6 6111   6110        
#>  7 9313   9310        
#>  8 1221   1220        
#>  9 1221   1220        
#> 10 6111   6110        
#> # ℹ 48,275 more rows

As you can see, the three digit translation always ends in a zero, meaning that it was translated into a broader group. We can do the same for an even broader group, translating from 4 to 2 digits:

ess %>%
  transmute(
    isco08,
    isco08_two = isco08_swap(isco08, from = 4, to = 2)
  )
#> # A tibble: 48,285 × 2
#>    isco08 isco08_two
#>    <chr>  <chr>     
#>  1 5414   5400      
#>  2 1321   1300      
#>  3 3135   3100      
#>  4 7131   7100      
#>  5 6111   6100      
#>  6 6111   6100      
#>  7 9313   9300      
#>  8 1311   1300      
#>  9 1311   1300      
#> 10 6111   6100      
#> # ℹ 48,275 more rows

We can see that the two digit translation is a broader category than the original four digit occupation. Note that we can translate from 4 digits down to any broader level, as far as 1 digit, but not the other way around:

ess %>%
  transmute(
    isco08,
    isco08_two = isco08_swap(isco08, from = 2, to = 4)
  )
#> Error in `transmute()`:
#> ℹ In argument: `isco08_two = isco08_swap(isco08, from = 2, to = 4)`.
#> Caused by error in `isco08_swap()`:
#> ! `from` should always be a bigger digit group than `to`.
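
A small base R sketch of why the reverse direction is rejected. The codes are picked only for illustration, and the parent is built by naive string truncation rather than with DIGCLASS:

```r
# A handful of 4 digit codes, used only for illustration
codes <- c("2111", "2112", "2113", "2121", "2122")

# Naive 2 digit parent: first two characters padded with zeros
parent <- paste0(substr(codes, 1, 2), "00")

# Every code collapses into the same parent ...
unique(parent)
#> [1] "2100"

# ... so the parent alone cannot tell us which child it came from
split(codes, parent)
```

Going from 4 to 2 digits is a many-to-one mapping, so there is no unique way to undo it.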

The reverse translation fails because a broader group can contain many specific occupations, so a broad code cannot be translated into a finer one. Finally, note that ISCO68 is missing some 1 digit groups (0000 and 1000 don’t have a broader category), so when you translate from any digit level to 1 digit in ISCO68 you might get some missing values for occupations within the major groups 0000 and 1000:

ess %>%
  transmute(
    isco68,
    isco68_one = isco68_swap(isco68, from = 4, to = 1)
  )
#> # A tibble: 48,285 × 2
#>    isco68 isco68_one
#>    <chr>  <chr>     
#>  1 5890   5000      
#>  2 2120   2000      
#>  3 7200   7000      
#>  4 9310   9000      
#>  5 6220   6000      
#>  6 6220   6000      
#>  7 9595   9000      
#>  8 6000   6000      
#>  9 6000   6000      
#> 10 6220   6000      
#> # ℹ 48,275 more rows

Note that 1 digit groups such as 2000, 3000, 5000 and 8000 are translated correctly. Yet the 1 digit groups 0000 and 1000 are never translated because they don’t exist in ISCO68. DIGCLASS performs the translation either way, but occupations in those groups come out as NA, so you’ll lose that information when you translate to other schemas.
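
Before passing the 1 digit column on to another translation, it can help to count how many values ended up missing. The sketch below uses only the `ess` data and the `isco68_swap()` call shown above. The object names `isco68_one` and `n_missing` are made up for this example, and no output is shown because it was not run here.

```r
library(DIGCLASS)

# Translate ISCO68 from 4 digits to 1 digit, as in the example above
isco68_one <- isco68_swap(ess$isco68, from = 4, to = 1)

# Number of rows without a 1 digit group. This also counts rows
# where isco68 was already missing before the translation.
n_missing <- sum(is.na(isco68_one))
n_missing

# Original codes that ended up as NA, to inspect before moving on
head(unique(ess$isco68[is.na(isco68_one)]))
```

If the count is larger than expected, looking at the original codes shows whether they belong to the groups that ISCO68 does not define at the 1 digit level.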

Using translated hierarchies for translation between schemas

The isco*_swap functions are important because some translations require ISCO variables at a specific digit level. For example, to translate ISCO08 to the ESEC class schema, ISCO08 needs to be in 3 digits. What would that translation look like? Here’s an example:

library(dplyr)

# convert isco08 to three digits
ess$isco08_three <- isco08_swap(ess$isco08, from = 4, to = 3)

ess %>%
  transmute(
    isco08_three,
    esec = isco08_to_esec(
      isco08_three,
      is_supervisor,
      self_employed,
      emplno,
      label = FALSE
    )
  )
#> # A tibble: 48,285 × 2
#>    isco08_three esec 
#>    <chr>        <chr>
#>  1 5410         3    
#>  2 1320         2    
#>  3 3130         6    
#>  4 7130         6    
#>  5 6110         8    
#>  6 6110         6    
#>  7 9310         6    
#>  8 1310         2    
#>  9 1310         5    
#> 10 6110         6    
#> # ℹ 48,275 more rows

Similarly, ESEC has another translation that is based on 2 digit ISCO08 codes. Here’s an example:

# convert to two digits
ess$isco08_two <- isco08_swap(ess$isco08, from = 4, to = 2)

ess %>%
  transmute(
    isco08_two,
    esec = isco08_two_to_esec(
      isco08_two,
      is_supervisor,
      self_employed,
      emplno,
      label = FALSE
    )
  )
#> # A tibble: 48,285 × 2
#>    isco08_two esec 
#>    <chr>      <chr>
#>  1 5400       3    
#>  2 1300       4    
#>  3 3100       2    
#>  4 7100       4    
#>  5 6100       6    
#>  6 6100       5    
#>  7 9300       5    
#>  8 1300       4    
#>  9 1300       1    
#> 10 6100       5    
#> # ℹ 48,275 more rows

As you can see, the isco*_swap functions handle a common task in ISCO translations.