SYLLS (SYnthetic data estimation for the UK LongitudinaL Studies)
Dr Adam Dennett, Dr Belinda Wu, Dr Nicola Shelton and Dr Ian Shuttleworth
University College London
The England and Wales Longitudinal Study (ONS LS), the Scottish Longitudinal Study (SLS) and the Northern Ireland Longitudinal Study (NILS) are exceptionally rich microdatasets. They link census records with other health and administrative data (births, deaths, marriages and cancer registrations) for individuals and their immediate families across several decades. Although they are unique and valuable resources, the information they contain is sensitive. Access to the microdata is therefore restricted to approved researchers and LS support staff, who may only view and work with the data in safe settings controlled by the national statistical agencies. Consequently, compared with other census data products, the three longitudinal studies are used by relatively few researchers, which limits their potential impact.
With other census data products, such as the aggregate statistics or interaction data, potential users can download the data to their own computers to explore it, test their ideas and experiment with analyses. As a result, these products are very widely used in social science research and teaching, whereas the user base of the national longitudinal studies remains comparatively small. Because confidentiality constraints rule out open access to the real microdata, other options are needed so that academics and other users can carry out their research more freely. The SYLLS project (Synthetic Data Estimation for UK Longitudinal Studies) has been set up to address this. SYLLS is developing techniques to produce synthetic data that mimic the real data, preserving the relationships between variables and the transitions of individuals over time, while remaining freely accessible without restriction.
This project, a collaboration between the three UK Longitudinal Study Research Support Units (CeLSIUS, LSCS and the NILS-RSU), will use two complementary methods to generate synthetic data products:
- Statistical modelling, similar to the techniques used for multiple imputation, will be used to generate bespoke synthetic datasets for individual research projects. After developing their methods on the synthetic data, users will have the option of having their analysis repeated and, we hope, confirmed on the actual LS datasets. These bespoke synthetic datasets will then be catalogued and made available to other interested users.
- Microsimulation will be used to generate synthetic longitudinal data ‘spines’ for each of the national longitudinal studies. These spines will synthesise the full sample but will include only the most frequently used variables and longitudinal transitions.