Translating between hierarchies for ISCO68, ISCO88 and ISCO08
Translating between hierarchies
Most surveys that contain occupation-related variables record ISCO
occupations with 4 digits. What does that mean? It means that you’re
working with the most fine-grained definition of an occupation. In some
cases, you want to work with aggregated groups. Instead of knowing
something about a mathematician, you’d rather group all math-related
occupations into a “Scientist” category. DIGCLASS implements this
following the rules of each ISCO schema. Let’s load DIGCLASS:
library(DIGCLASS)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
In ISCO parlance, the most granular occupations have what is called 4 digits. This means that the occupation code has 4 non-zero digits. Occupation 2111 is a 4-digit occupation because it does not contain any zeroes. In contrast, 2110 is the “parent” category of 2111. To make it simpler, think of 2111 as the occupation “Physicists and astronomers” and of 2110 as “Physicists, chemists and related professionals”. You can intuitively group physicists in that broader category. Similarly, the occupation 2110 or “Physicists, chemists and related professionals” is nested within the broader group 2100 or “Physical, mathematical and engineering science professionals”. Finally, the broadest group is 2000, for which the general group definition is “Professionals”.
This was just an intuitive explanation of how ISCO codes work. You
don’t have to remember what each category is. You can always look up
these values yourself for better understanding, but DIGCLASS will do
the work of translating everything for you. The important thing to
remember is that sometimes you’ll want to group fine-grained
occupations into broader occupation categories. An example would be
that all categories that are within “Physicists, chemists and related
professionals” are grouped together. This means that we “convert” the
column from 4 digits into 3 digits, for example. In DIGCLASS you can
do that with the function isco*_swap, where * is the ISCO version of
preference.
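To build intuition for what this conversion does to a single code, here is a minimal sketch in base R. The helper collapse_digits() is hypothetical and not part of DIGCLASS. It assumes codes are stored as 4-character strings and that a broader group is written by keeping the leading digits and padding with zeros, as in the 2111, 2110, 2100, 2000 example above. The real isco*_swap functions follow the rules of each ISCO schema, so treat this only as an illustration.

```r
# Hypothetical helper, NOT part of DIGCLASS: keep the first `to` digits of a
# code and pad the remainder with zeros so the result keeps the same width.
collapse_digits <- function(code, to, width = 4) {
  stopifnot(to >= 1, to <= width)
  collapsed <- paste0(substr(code, 1, to), strrep("0", width - to))
  # paste0() would turn a missing code into the string "NA0", so restore NA
  collapsed[is.na(code)] <- NA_character_
  collapsed
}

# Several codes collapsed to 3 digits
collapse_digits(c("2111", "5169", "1222"), to = 3)
#> [1] "2110" "5160" "1220"

# One code walked up the hierarchy: 3, 2 and 1 digits
sapply(3:1, function(d) collapse_digits("2111", to = d))
#> [1] "2110" "2100" "2000"
```

The sketch ignores schema-specific exceptions (for example, groups that do not exist in a given ISCO version), which is exactly the bookkeeping that isco*_swap takes care of.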
Let’s look at the ISCO variables we have in the ESS data in
DIGCLASS
:
ess
#> # A tibble: 48,285 × 12
#> isco68 isco88 isco88com isco08 emplno self_employed is_supervisor
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 5890 5169 5169 5414 0 1 0
#> 2 2120 1222 1222 1321 0 0 1
#> 3 7200 8120 8120 3135 0 0 0
#> 4 9310 7141 7141 7131 0 0 1
#> 5 6220 6111 6111 6111 0 0 0
#> 6 6220 6111 6111 6111 0 0 1
#> 7 9595 9313 9313 9313 0 0 1
#> 8 6000 1221 1221 1311 0 0 1
#> 9 6000 1221 1221 1311 2 1 1
#> 10 6220 6111 6111 6111 0 0 1
#> # ℹ 48,275 more rows
#> # ℹ 5 more variables: control_work <dbl>, control_daily <dbl>,
#> # work_status <dbl>, main_activity <dbl>, agea <dbl>
All of these ISCO variables are in four digits, but we can convert them to three digits:
ess %>%
transmute(
isco88,
isco88_three = isco88_swap(isco88, from = 4, to = 3)
)
#> # A tibble: 48,285 × 2
#> isco88 isco88_three
#> <chr> <chr>
#> 1 5169 5160
#> 2 1222 1220
#> 3 8120 8120
#> 4 7141 7140
#> 5 6111 6110
#> 6 6111 6110
#> 7 9313 9310
#> 8 1221 1220
#> 9 1221 1220
#> 10 6111 6110
#> # ℹ 48,275 more rows
As you can see, the three-digit translation always ends in a zero, meaning that it was translated into a broader group. We can do the same for an even broader group, translating from 4 to 2 digits:
ess %>%
transmute(
isco08,
isco08_two = isco08_swap(isco08, from = 4, to = 2)
)
#> # A tibble: 48,285 × 2
#> isco08 isco08_two
#> <chr> <chr>
#> 1 5414 5400
#> 2 1321 1300
#> 3 3135 3100
#> 4 7131 7100
#> 5 6111 6100
#> 6 6111 6100
#> 7 9313 9300
#> 8 1311 1300
#> 9 1311 1300
#> 10 6111 6100
#> # ℹ 48,275 more rows
We can see that the two-digit translation is a broader category than the original four-digit occupation. Note that we can translate from a finer level to a broader one (from 4 digits all the way down to 1), but not the other way around:
ess %>%
transmute(
isco08,
isco08_two = isco08_swap(isco08, from = 2, to = 4)
)
#> Error in `transmute()`:
#> ℹ In argument: `isco08_two = isco08_swap(isco08, from = 2, to = 4)`.
#> Caused by error in `isco08_swap()`:
#> ! `from` should always be a bigger digit group than `to`.
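One way to see why the direction matters, sketched in base R without DIGCLASS: several 4-digit codes share the same 3-digit parent, so the parent alone cannot identify the original occupation. The padding rule used here (keep the first three digits, add a zero) is an assumption made for illustration, not the package's implementation.

```r
# Four distinct 4-digit codes that sit under the same 3-digit group
codes <- c("2111", "2112", "2113", "2114")

# Illustrative collapsing rule (assumption): keep 3 digits, pad with a zero
parents <- paste0(substr(codes, 1, 3), "0")

length(codes)
#> [1] 4
unique(parents)
#> [1] "2110"
```

Going from "2110" back to a single 4-digit code would require picking one of the four originals arbitrarily, which is why only fine-to-broad translations are well defined.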
The error occurs because a broader group cannot be translated into a finer occupation: many specific occupations can sit within one broader group. Finally, do note that for ISCO68 some 1-digit groups are missing (0000 and 1000 don’t have a broader category), so when you translate from any digit level to 1 digit in ISCO68 you might get some missing values for occupations within the major groups 0000 and 1000:
ess %>%
transmute(
isco68,
isco68_one = isco68_swap(isco68, from = 4, to = 1)
)
#> # A tibble: 48,285 × 2
#> isco68 isco68_one
#> <chr> <chr>
#> 1 5890 5000
#> 2 2120 2000
#> 3 7200 7000
#> 4 9310 9000
#> 5 6220 6000
#> 6 6220 6000
#> 7 9595 9000
#> 8 6000 6000
#> 9 6000 6000
#> 10 6220 6000
#> # ℹ 48,275 more rows
Note that the 1-digit groups 2000, 3000, 5000 and 8000 are translated
correctly. Yet the 1-digit groups 1000 and 0000 are never translated
because they don’t exist in ISCO68. DIGCLASS makes the translation
either way, but note that you’ll lose that information when you
translate it to other schemas because it’s an NA.
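Before passing a translated column on to another schema, it can help to check how many codes were lost. The sketch below is base R only and uses hand-written toy vectors, not real DIGCLASS output: the `translated` vector simply mimics a 1-digit translation in which codes from major groups 0 and 1 come back as NA, and the example codes are illustrative.

```r
# Hand-written toy vectors (assumption): `translated` imitates a 1-digit
# translation where major groups 0 and 1 have no 1-digit equivalent.
original   <- c("0110", "1320", "5890", "6220")
translated <- c(NA, NA, "5000", "6000")

# Codes that were present originally but are missing after translation
lost <- is.na(translated) & !is.na(original)

sum(lost)
#> [1] 2
unique(original[lost])
#> [1] "0110" "1320"
```

With real data you would replace the two toy vectors with the original column and the result of the swap, then inspect the lost codes before deciding how to handle them.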
Using translated hierarchies for translation between schemas
The isco*_swap functions are important because some translations
require ISCO variables to be in a different number of digits. For
example, to translate ISCO08 to the ESEC class schema, ISCO08 needs to
be in 3 digits. What would that translation look like? Here’s an
example:
library(dplyr)
# convert isco08 to three digits
ess$isco08_three <- isco08_swap(ess$isco08, from = 4, to = 3)
ess %>%
transmute(
isco08_three,
esec = isco08_to_esec(
isco08_three,
is_supervisor,
self_employed,
emplno,
label = FALSE
)
)
#> # A tibble: 48,285 × 2
#> isco08_three esec
#> <chr> <chr>
#> 1 5410 3
#> 2 1320 2
#> 3 3130 6
#> 4 7130 6
#> 5 6110 8
#> 6 6110 6
#> 7 9310 6
#> 8 1310 2
#> 9 1310 5
#> 10 6110 6
#> # ℹ 48,275 more rows
Similarly, ESEC has another translation that is based on 2-digit ISCO08. Here’s an example:
# convert to two digits
ess$isco08_two <- isco08_swap(ess$isco08, from = 4, to = 2)
ess %>%
transmute(
isco08_two,
esec = isco08_two_to_esec(
isco08_two,
is_supervisor,
self_employed,
emplno,
label = FALSE
)
)
#> # A tibble: 48,285 × 2
#> isco08_two esec
#> <chr> <chr>
#> 1 5400 3
#> 2 1300 4
#> 3 3100 2
#> 4 7100 4
#> 5 6100 6
#> 6 6100 5
#> 7 9300 5
#> 8 1300 4
#> 9 1300 1
#> 10 6100 5
#> # ℹ 48,275 more rows
As you can see, the isco*_swap functions facilitate a common task in
ISCO translations.