CREATE TRE Architecture Design

TRE key components:

  • Users log in to the e-Research portal using Multi-Factor Authentication (MFA)
  • Users must belong to one or more of the following four security groups to access the TRE:
    • Standard user:
      • Access to the TRE project Virtual Desktop machines (VMs)
      • Read-only access to the project dataset directories
    • Data manager user (in addition to the above):
      • Full access to the project dataset directories
    • Data ingress user:
      • Allowed to import data to the TRE project
    • Data egress user:
      • Allowed to export data from the TRE project
      • Data egress details are logged for audit purposes. Users with egress access must be trained to classify and approve data for egress appropriately
  • Connection to the TRE web portal is encrypted (HTTPS)
  • The portal web server then makes an encrypted RDP connection to the TRE project VMs
  • A private OpenStack cloud contains the TRE project elements:
    • Researcher VMs (with pre-installed software and access to the project dataset)
    • Data mover VMs (dedicated to moving data into and out of the TRE, used by the project’s nominated data ingress/egress users)
    • SMB server (stores the encrypted TRE project dataset and user/project data; backed up outside the environment)
    • Security Groups control egress and ingress traffic to the TRE environment
    • No Internet access from the project VMs
  • Proxy server (to control ingress/egress from TRE projects)
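
The security-group model above can be sketched as a small permission table. This is only an illustration: the group keys (`standard`, `data_manager`, `data_ingress`, `data_egress`) and the `Permission` names are hypothetical, and it assumes the ingress and egress groups grant only import and export rights on their own, which the list above does not state explicitly.

```python
from enum import Flag, auto


class Permission(Flag):
    """Capabilities named in the security-group list (names are illustrative)."""
    VM_ACCESS = auto()
    DATASET_READ = auto()
    DATASET_WRITE = auto()
    DATA_IMPORT = auto()
    DATA_EXPORT = auto()


# Hypothetical group keys; the real directory group names are not given.
STANDARD = Permission.VM_ACCESS | Permission.DATASET_READ
GROUP_PERMISSIONS = {
    "standard": STANDARD,
    # "in addition to the above": data managers keep the standard rights
    "data_manager": STANDARD | Permission.DATASET_WRITE,
    "data_ingress": Permission.DATA_IMPORT,
    "data_egress": Permission.DATA_EXPORT,
}


def effective_permissions(groups):
    """Union the permissions of every group a user belongs to."""
    result = Permission(0)
    for group in groups:
        result |= GROUP_PERMISSIONS[group]
    return result


def can(groups, permission):
    """Return True if membership in `groups` grants `permission`."""
    return permission in effective_permissions(groups)
```

For example, `can(["standard", "data_egress"], Permission.DATA_EXPORT)` is true, while `can(["standard"], Permission.DATASET_WRITE)` is false.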

TRE Environment Diagram

graph TB

A(TRE Researcher)
A1(TRE Data Ingress User)
A2(TRE Data Egress User)
A3(Web browser)
B(KCL Firewall)
E(Researcher VM)
E1(Researcher VM)
F(Researcher VM)
F1(Researcher VM)
G(Data movers VM)
G1(Data movers VM)
H(Guacamole Server)
H1(Guacamole Server)
I(Backup)
I1(Backup)
M(NGINX)
O(SMB Server)
O1(SMB Server)

subgraph WAN
A-->A3
A1-->A3
A2-->A3
end
subgraph KCL_perimeter_managed_by_KCL_IT
O-->I
O1-->I1
I1
A3-->|HTTPS|B-->|HTTPS|M
    subgraph PVE_e_Research
     M
    end
        subgraph OPENSTACK
    M-->|HTTPS|H-->|RDP|E-->O
    H-->|RDP|E1-->O
    H-->|RDP|G-->O
    M-->|HTTPS|H1-->|RDP|F-->O1
    H1-->|RDP|F1-->O1
    H1-->|RDP|G1-->O1
          subgraph "FIREWALL"
                subgraph TRE_Project_1
                E
                E1
                G
                O
                H
                end
                subgraph TRE_Project_2
                F
                F1
                G1
                O1
                H1
                end
            end
        end
end

style FIREWALL fill:#ff0000
style TRE_Project_1 fill:#ffffff
style TRE_Project_2 fill:#ffffff
style OPENSTACK fill:#f5f5f5
style KCL_perimeter_managed_by_KCL_IT fill:#f8f8ff
style PVE_e_Research fill:#eeffee
style WAN fill:#eafffe
style B fill:#ff0000
linkStyle 0,1,2,5,6,7,8,10,12,14,15,17,19 stroke-width:2px,stroke:green

Data Egress Application Process

Steps

  1. Researcher Phase:
    • The researcher selects files for egress.
    • Provides a description of the files.
    • Confirms training completion.
    • Submits the egress request.

  2. Egress Authority Decision:
    • The Data Egress Authority reviews the submitted request.
    • If the request is accepted, the system checks whether the Multi-Factor Authentication (MFA) session is valid.

  3. MFA Authentication:
    • If the MFA session is not valid, the user authenticates through MFA.
    • An OTP (One-Time Password) is sent to the Data Owner's email.

  4. API Processing:
    • The request to copy files is sent to the API.
    • The API uses rsync to copy the files.
    • The Egress Requestor is notified by email once the process is complete.

  5. Email Notification:
    • Emails are transferred to the Mail Transfer Agent (MTA).
    • The MTA sends the email to the destination.

  6. Database Logging:
    • Data related to the request is saved to a MySQL database.

graph TB

A(Researcher)
B1(Select Files)
B2(Provide Description)
B3(Training Completed?)
B4(Submit Request)
D(Authenticate through MFA)
D1(Request to Copy Files Sent to API)
D2(API Copies Files Using rsync)
D3(Egress Requestor Notified by Email)
E(Egress Requestor Notified by Email)
F(Transfer Emails to MTA)
G(Send Email to Destination)
H(MySQL Database)

A --> B1
B4 --> C{Decision by Data Egress Authority}
C -- Accept --> C1{Is MFA Session Valid?}
C1 -- Yes --> D1
C1 -- No --> D
D -->|Send OTP to Data Egress Authority Email| F
C -- Reject --> E
E --> F
B4 -->|Data Saved to DB| H
D --> D1

subgraph Egress Portal
    B1 --> B2
    B2 --> B3
    B3-->B4
end

subgraph API
    D1 --> D2
    D2 --> D3
end

subgraph Postfix Mail Relay
    B4 -->|Data Mover Notified by Email| F
end

subgraph Mail Transfer Agent
    F --> G
end

subgraph MySQL
    H
end

linkStyle 1,3,4,5,7,8,9,13,14,15,16 stroke-width:2px,stroke:red
linkStyle 0,2,6,10,11,12 stroke-width:2px,stroke:green
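
The egress steps above can be sketched as a single decision function. This is a minimal illustration under stated assumptions, not the portal's actual implementation: `EgressRequest`, `process_request`, the log messages, and the `destination` argument are all hypothetical names. The function only builds the `rsync` argument list and does not run it.

```python
from dataclasses import dataclass, field


@dataclass
class EgressRequest:
    """What the researcher submits through the egress portal (illustrative fields)."""
    files: list
    description: str
    training_confirmed: bool
    log: list = field(default_factory=list)


def process_request(request, approved, mfa_session_valid, destination):
    """Walk the documented steps; return the rsync argv, or None if rejected."""
    if not (request.files and request.description and request.training_confirmed):
        raise ValueError("request is incomplete")
    request.log.append("submitted; request saved to database")

    if not approved:
        request.log.append("rejected; requestor notified by email")
        return None

    if not mfa_session_valid:
        request.log.append("MFA required; OTP sent by email")

    request.log.append("copy request sent to API")
    # -a (archive mode) preserves permissions and timestamps; "--" ends option parsing
    argv = ["rsync", "-a", "--", *request.files, destination]
    request.log.append("requestor to be notified by email once the copy completes")
    return argv
```

Passing the file list as separate arguments after `--`, instead of formatting a shell string, keeps a file name that starts with `-` from being read as an rsync option.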

Data Ingress Process

Data Ingress Steps

  1. Access via Public Internet or Trusted Network:
    • TRE Data Ingress Users access the system using SFTP clients, either directly from the Public Internet or from a Trusted Network.

  2. Firewall and Server Interaction:
    • Connections are made through the KCL Firewall to the SFTP and OpenVPN servers.
    • Data is transferred to the TRE Directory and CephFS.

  3. VM and Backup Interaction:
    • Researcher VMs interact with the SMB Server.
    • Data on the SMB Server is backed up.

  4. Virus Scan and File Check:
    • A cron job checks for new files in CephFS and runs a ClamAV virus scan.

graph LR

A(TRE Researcher)
A1(TRE Data Ingress User)
A2(TRE Data Egress User)
A3(SFTP client)
A4(TRE Data Ingress User)
A5(External agency)
A6(SFTP client)
B(KCL Firewall)
B1(KCL Firewall)
E(Researcher VM)
E1(Researcher VM)
F(TRE Directory,CephFS)
G(Data movers VM)
K(SFTP Server)
H(Guacamole Server)
I(Backup)
O(SMB Server)
Q(Cron Job, check for new files in the CephFS, runs a ClamAV virus scan)
P(TRE Ingress User1 Directory)
P1(TRE Ingress User2 Directory)
P2(TRE Ingress User3 Directory)

subgraph Public_Internet
A-->A3
A1-->A3
A2-->A3
end

subgraph Trusted_Network
A4-->A6
A5-->A6
end

subgraph KCL_perimeter_managed_by_KCL_IT
K-->F<--->Q-->O
O-->I
F-->P
F-->P1
F-->P2
A6--->|SSH|B1-->|SSH|K
A3--->|SSH|B-->|SSH|K

    subgraph PVE_e_Research
    K
    end

    subgraph OPENSTACK
    H-->|RDP|E-->O
    H-->|RDP|E1-->O
    H-->|RDP|G-->O

        subgraph "FIREWALL"
            subgraph TRE_Project_1
            E
            E1
            G
            O
            H
            Q
            end
        end
    end

end

style FIREWALL fill:#ff0000
style TRE_Project_1 fill:#ffffff
style OPENSTACK fill:#f5f5f5
style KCL_perimeter_managed_by_KCL_IT fill:#f8f8ff
style PVE_e_Research fill:#eeffee
style Public_Internet fill:#eafffe
style Trusted_Network fill:#eafffe
style B fill:#ff0000
style B1 fill:#ff0000
linkStyle 1,3,4,5,6,7,8,9,10,12,13,14,15,16,17 stroke-width:2px,stroke:red
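
The cron-driven check for new files followed by a ClamAV scan could look roughly like the sketch below. The function names, the use of file modification time to detect new files, and the choice of `clamscan` options are assumptions; the source does not describe how the real job is written.

```python
import os
import subprocess


def find_new_files(root, since):
    """Return files under `root` modified after `since` (epoch seconds)."""
    new_files = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                new_files.append(path)
    return sorted(new_files)


def scan(paths):
    """Run clamscan on `paths`; return True only if it reports every file clean."""
    if not paths:
        return True
    # clamscan exits 0 when clean, 1 when a virus is found, 2 on error
    result = subprocess.run(["clamscan", "--no-summary", "--infected", *paths])
    return result.returncode == 0
```

A cron entry would call `find_new_files` with the timestamp of the previous run and pass the result to `scan`. Treating any non-zero exit code as a failure means a scanner error is never mistaken for a clean result.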

Access HTTPS Process

Access HTTPS Steps

  1. Access:
    • TRE Researchers, Data Ingress Users, and Data Egress Users access the system via a web browser.

  2. Firewall Interaction:
    • HTTPS requests are sent through the KCL Firewall/MFA to the NGINX server.

  3. Proxy and VM Access:
    • TRE_Squid_Proxy interacts with the KCL Firewall and NGINX.
    • Researchers access their VMs through the Guacamole Server via HTTPS and RDP.

  4. Data Movement:
    • The Data movers VM interacts with the SMB Server for data transfer.
    • HTTPS access requests are processed between the Data movers VM and TRE_Squid_Proxy.

graph LR

A(TRE Researcher)
A1(TRE Data Ingress User)
A2(TRE Data Egress User)
A3(Web browser)
A4(https://)
B(KCL Firewall)
B1(KCL Firewall/MFA)
C(TRE_Squid_Proxy)
E(Researcher VM)
E1(Researcher VM)
G(Data movers VM)
H(Guacamole Server)
I(Backup)
M(NGINX)
O(SMB Server)

subgraph WAN
A-->A3
A1-->A3
A2-->A3
end

subgraph Internet
A4
end

subgraph KCL_perimeter_managed_by_KCL_IT
O---->I
A3-->|HTTPS|B1-->|HTTPS|M
C<--->B<--->A4

    subgraph PVE_e_Research
     M
     C
    end

    subgraph OPENSTACK
    M-->|HTTPS|H-->|RDP|E-->O
    H-->|RDP|E1-->O
    H-->|RDP|G-->O
    G<-->|HTTPS access request|C

    subgraph "FIREWALL"
        subgraph TRE_Project_1
        E
        E1
        G
        O
        H
        end
    end
end
end

style FIREWALL fill:#ff0000
style TRE_Project_1 fill:#ffffff
style OPENSTACK fill:#f5f5f5
style KCL_perimeter_managed_by_KCL_IT fill:#f8f8ff
style PVE_e_Research fill:#eeffee
style WAN fill:#eafffe
style Internet fill:#eafffe
style B fill:#ff0000
style B1 fill:#ff0000
linkStyle 6,7,15 stroke-width:2px,stroke:red
linkStyle 1,2,4,5,8,13 stroke-width:2px,stroke:green

External Software Repository Access

External Repository Access Steps

  1. Accessing GitHub:
    • Researchers have read access to GitHub.com.

  2. NGINX and Proxy Interaction:
    • HTTPS requests are filtered and processed through NGINX and TRE_Squid_Proxy.
    • An /etc/hosts entry for GitHub.com facilitates access.

  3. Firewall and VM Interaction:
    • The system is protected by a firewall, and VMs are part of the TRE Project.

graph LR

A4(GitHub.com)
C(TRE_Squid_Proxy)
E(Researcher VM)
E1(Researcher VM)
G(Researcher VM)
H(TRE Researcher)
M(NGINX)
O(Web browser)

subgraph Internet
A4
end

subgraph KCL_perimeter_managed_by_KCL_IT
M<--->| HTTPS/ Filtering of HTTP verbs and URL paths|A4
    subgraph PVE_e_Research
     M
     C
    end
        subgraph OPENSTACK
    H-->|RDP|E-->O
    H-->|RDP|E1-->O
    H-->|RDP|G-->O
    O<-->|HTTPS|C-->|HTTPS /etc/hosts entry for Github.com|M

    subgraph "FIREWALL"
        subgraph TRE_Project_1
        E
        E1
        G
        O
        H
        end
    end
end
end

style FIREWALL fill:#ff0000
style TRE_Project_1 fill:#ffffff
style OPENSTACK fill:#f5f5f5
style KCL_perimeter_managed_by_KCL_IT fill:#f8f8ff
style PVE_e_Research fill:#eeffee
style Internet fill:#eafffe
linkStyle 0,7,8 stroke-width:2px,stroke:red
linkStyle 1,2,3,4,5,6 stroke-width:2px,stroke:green
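
The "filtering of HTTP verbs and URL paths" shown on the NGINX link can be illustrated with a small allowlist check. The actual rules are not documented here, so everything in this sketch is an assumption: the allowed verbs, the `/example-org/` path prefix, and the `is_allowed` function are hypothetical. A real rule set would likely differ, for instance because Git's smart HTTP protocol uses POST requests when fetching.

```python
from urllib.parse import urlsplit

# Assumed policy: read-only verbs and a hypothetical path allowlist.
ALLOWED_METHODS = {"GET", "HEAD"}
ALLOWED_PATH_PREFIXES = ("/example-org/",)  # hypothetical organisation path


def is_allowed(method, url):
    """Return True if the request matches the assumed read-only GitHub policy."""
    parts = urlsplit(url)
    # Only HTTPS requests to github.com are considered at all
    if parts.scheme != "https" or parts.hostname != "github.com":
        return False
    if method.upper() not in ALLOWED_METHODS:
        return False
    # str.startswith accepts a tuple and matches any of the prefixes
    return parts.path.startswith(ALLOWED_PATH_PREFIXES)
```

For example, `is_allowed("GET", "https://github.com/example-org/repo")` is true, while the same URL with `POST`, or any URL on another host, is rejected.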