Dataset Details
Synthetic dataset - Hospitalised patients with Thromboembolic diagnosis
Published by PIONEER: HDR UK Health Data Hub in Acute Care
Description
Background: Each year in the UK, around 60,000 people develop a pulmonary embolism (PE) and 200,000 a deep vein thrombosis (DVT), and the number of emergency admissions for suspected PE and DVT is increasing. Diagnosing PE and DVT remains a challenge because the presenting symptoms are non-specific. Further tests are often required, and the number of CT pulmonary angiograms (CTPAs) and ultrasound scans (USS) performed for suspected venous thromboembolism (VTE) increases each year. There is great interest in finding better tools to identify those with the highest likelihood of a DVT or PE, so that limited screening services can be focused where they are needed most. A number of tools have been suggested, but few have been adopted in clinical practice. Methods such as age-adjusted D-dimer tests and the 4PEPS and 4D scores aim to predict PE and DVT more accurately. Implementing a more precise system could transform how these dangerous conditions are diagnosed and treated. This dataset enables an exploration of VTE to better understand the disease, identify patients at most risk of the poorest outcomes, and improve health services through the development of new prognostic tools.

PIONEER geography: The West Midlands (WM) has a population of 5.9 million and includes a diverse ethnic and socio-economic mix. University Hospitals Birmingham NHS Foundation Trust (UHB) is one of the largest NHS Trusts in England, providing direct acute services and specialist care across four hospital sites, with 2.2 million patient episodes per year and 2,750 beds. UHB runs a fully electronic healthcare record (EHR) (PICS; Birmingham Systems), a shared primary and secondary care record (Your Care Connected), and a patient portal, "My Health".

Methodology: A dedicated pipeline was designed to generate the synthetic version of the thromboembolic events dataset, comprising data pre-processing, synthesis, and post-processing steps. In brief, a generative adversarial network model (CTGAN) from the SDV package (N. Patki, 2016) was used to generate a synthetic dataset that is statistically equivalent to the real dataset. The pre-processing and post-processing steps were customised to improve the realism of the synthetic data.

Scope: Enabling data-driven research and machine learning models aimed at improving the diagnosis of thromboembolic events (PE/DVT). A linked real-world dataset is available. The dataset includes extensive patient demographics, clinical scores, and medical conditions for PE/DVT patients, alongside outcomes taken from ICD-10 and SNOMED-CT codes.

Available supplementary data: Real-world PE/DVT cohort.

Available supplementary support: Analytics, model build, validation and refinement; A.I.; data partner support for the ETL (extract, transform and load) process; clinical expertise; patient and end-user access; purchaser access; regulatory requirements; data-driven trials; "fast screen" services.
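The three-stage methodology described above (pre-processing, synthesis, post-processing) can be illustrated with a minimal Python sketch. This is not the PIONEER pipeline: the column names (`age`, `d_dimer`), the cleaning rule, the plausibility bounds, and all function names are hypothetical. The synthesis step is a deliberately simple stand-in that resamples each column independently. It is not CTGAN and only marks where a fitted generative model would plug in.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, Optional[float]]
Sampler = Callable[[int], List[Row]]


def preprocess(rows: List[Row], required: List[str]) -> List[Row]:
    """Drop rows that lack any required field (hypothetical cleaning rule)."""
    return [r for r in rows if all(r.get(c) is not None for c in required)]


def fit_stand_in(rows: List[Row], seed: int = 0) -> Sampler:
    """Stand-in for the generative model: resample each column independently.

    NOT CTGAN. It ignores correlations between columns and exists only to
    show where a fitted synthesiser sits in the pipeline.
    """
    rng = random.Random(seed)
    columns = {c: [r.get(c) for r in rows] for c in rows[0]}

    def sample(n: int) -> List[Row]:
        return [{c: rng.choice(v) for c, v in columns.items()} for _ in range(n)]

    return sample


def postprocess(rows: List[Row], bounds: Dict[str, Tuple[float, float]]) -> List[Row]:
    """Clip numeric values to plausible ranges (hypothetical realism rule)."""
    cleaned = []
    for r in rows:
        fixed = dict(r)
        for col, (low, high) in bounds.items():
            if fixed.get(col) is not None:
                fixed[col] = min(max(fixed[col], low), high)
        cleaned.append(fixed)
    return cleaned


def synthesise(
    rows: List[Row],
    n: int,
    required: List[str],
    bounds: Dict[str, Tuple[float, float]],
    fit: Callable[[List[Row]], Sampler] = fit_stand_in,
) -> List[Row]:
    """Run pre-processing, model fitting and sampling, then post-processing."""
    sampler = fit(preprocess(rows, required))
    return postprocess(sampler(n), bounds)


# Toy input with made-up values; not taken from the dataset.
toy_rows = [
    {"age": 70, "d_dimer": 900},
    {"age": 45, "d_dimer": None},   # removed by preprocess
    {"age": 130, "d_dimer": 400},   # age clipped by postprocess
]
synthetic = synthesise(toy_rows, 5, ["age", "d_dimer"], {"age": (18, 110)})
```

To use an actual CTGAN model, the `fit` argument would be replaced by a function that trains the model and returns its sampling method. In recent SDV 1.x releases this would typically wrap `CTGANSynthesizer` from `sdv.single_table`; the SDV version and settings used for this dataset are not stated in the listing.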
Dataset Information
Resource Type:
dataset
Geographic Coverage:
GB
Temporal Coverage:
2016-12-26T00:00:00.000Z/2021-12-28T00:00:00.000Z
Publisher
PIONEER: HDR UK Health Data Hub in Acute Care
Data Catalogs
Health Data Research Innovation Gateway