Globus

Introduction

Globus is an infrastructure designed for managing data, potentially among multiple institutions. At Virginia Tech, ARC maintains an institutional Globus Subscription and hosts a Globus Connect Server (GCS) which provides a Globus endpoint for ARC’s /projects directories. Among other features, Globus provides fault tolerance for (large) data transfers.

VT’s Subscription Level

VT has subscribed to the High Assurance tier to help enable the transfer of Protected Health Information (PHI), Personally Identifiable Information (PII), and Controlled Unclassified Information (CUI) data. This allows VT entities to host Globus Connect Servers and designate collections on enabled servers as “HA”.

In addition, individuals may create personal Globus accounts and use Globus Connect Personal (GCP) on ARC systems or other platforms if they have their own Globus license.

Sharing Data Via Globus

Internal Visibility

An ARC /projects directory is “Shared via Globus” once the directory owner (the PI) enables sharing for it using the check box in the ColdFront interface. This causes the folder to appear in the Globus infrastructure as part of the “Virginia Tech ARC Globus Projects Directories” collection, but does not make your data visible to anyone who did not already have access to it.

Common uses of this level of sharing include:

  • moving data between endpoints you control, where one endpoint is an ARC /projects directory

  • downloading datasets from other institutions’ shared data collections

Guest Collections for Broader Sharing

Globus Guest Collections unlock even more potential for hosting and sharing data, including with other institutions and people outside of VT. When you create a Guest Collection, you define who can access it and what permissions they have on the data within it.

ARC must manually configure your /projects directory before Guest Collections can be created within it. This extra step is required because Guest Collections carry additional exposure and risk of data loss. You therefore cannot set up a Guest Collection on your own; contact ARC support to have one set up. Submit the ticket through the ARC Support page: on that page, click “Request this service” near the upper right. ARC will then hold a consultation with you to understand your needs.

Getting Started with Globus

  1. Log in to Globus: Confirm that you can log in to the Globus website, https://globus.org. If you do not already have a Globus account, you will need to create one and associate your VT credentials with it.

  2. Enable Globus for your ARC project: If you plan to use Virginia Tech ARC’s Globus license, you must enable Globus sharing for your /projects directory. The owner (usually the PI) of the /projects allocation can do this through ARC’s ColdFront allocation management system. Use the steps and screenshot below as guidance:

    • Open your project storage allocation in ColdFront.

    • Check the box for “Share via Globus” and click Update.

    • The change takes effect immediately.

  3. File and directory permissions for your ARC project: Enabling “Share via Globus” in ColdFront does not take control of your data away from you. The standard Unix permissions on the files and directories in your /projects directory still determine what Globus can access. For example, you can prevent a file from being copied out of your area using Globus by unsetting its read bit, and you can prevent a file from being overwritten by an incoming Globus transfer by unsetting its write bit. (These permission bits are set and unset using the Unix chmod command.)
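
The permission changes described in step 3 can be sketched in Python as follows. This is a minimal illustration, not an ARC-provided tool: the helper names (clear_mode_bits, block_reads, block_overwrites) are hypothetical, and the sketch assumes that clearing the bit for owner, group, and other alike is what you want. Which permission class actually governs a given Globus access depends on the account the transfer runs as.

```python
import os
import stat

# Read and write bits for owner, group, and other.
READ_BITS = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def clear_mode_bits(path, bits):
    """Remove the given permission bits from path and return the new mode."""
    current = stat.S_IMODE(os.stat(path).st_mode)
    new_mode = current & ~bits
    os.chmod(path, new_mode)
    return new_mode


def block_reads(path):
    """Unset every read bit (same effect as `chmod a-r path`)."""
    return clear_mode_bits(path, READ_BITS)


def block_overwrites(path):
    """Unset every write bit (same effect as `chmod a-w path`)."""
    return clear_mode_bits(path, WRITE_BITS)
```

For instance, calling block_overwrites on a reference dataset before a collaborator pushes new files into the same directory leaves the existing file read-only. Restore the bits with chmod when you need to edit the file again.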

When transferring data with Globus, at least one endpoint must be covered by an active Globus subscription (institutional or personal license).

  • If your institution already has a license, you can transfer directly to /home or /projects using Globus Connect Personal (GCP).

  • If you intend to use Virginia Tech ARC’s Globus license, the ARC endpoint must be /projects, and it must be enabled in ColdFront as described above.

Transferring to/from ARC using VT ARC’s Globus license (/projects)

If you plan to use VT ARC’s Globus license, transfers are supported only to and from a /projects directory with “Share via Globus” enabled (as described under Getting Started with Globus). Once sharing is enabled, any member of the associated project group can:

  1. Log in to https://globus.org

  2. In the File Manager, search for “Virginia Tech” to locate the “Virginia Tech ARC Globus Projects Directories” (GCS) collection. Select this collection, and the shared directory will be visible.

    • All /projects directories with “Share via Globus” enabled will appear.

    • Access permissions remain restricted to project group members, the same as on ARC clusters.

  3. Transfer files between your /projects directory endpoint and another endpoint. The other endpoint can be any GCS or GCP endpoint and does not require a license, since you are using VT ARC’s Globus license. (If you are not seeing two endpoints as in the graphic immediately below, go to the “Panels” area at the upper right and choose the middle of the three icons.)

Transferring to/from ARC using your own Globus license

If you plan to use your own Globus license, you can transfer data to both /home and /projects directories using Globus Connect Personal (GCP).

Globus Connect Personal (GCP)

GCP can be used to connect a device or storage location you own to the Globus network. For example, you can make your /home/<username> or /projects/<groupname> group-shared directory accessible to you when you log into the https://globus.org web application. When you do this, it shows up in your “Collections.” You can then browse, upload, download, and coordinate transfers among other collections in the Globus web application. Detailed information on using GCP is available on Globus’ website.

Optimizing file transfer performance

Filesets with “Lots of Small Files” (LoSF) are the worst-case scenario for most file systems and transfer tools. For stability and performance, it is best to package such LoSF filesets into archives using tools such as tar. Attempting transfers of LoSF filesets via Globus can cause very poor performance and faults such as ENDPOINT_TOO_BUSY.
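
As a minimal sketch of the packaging step, the following uses Python's standard tarfile module to bundle a directory of small files into a single archive before transfer. The function name pack_directory and the choice of gzip compression are assumptions made for illustration; the tar command-line tool achieves the same result.

```python
import tarfile
from pathlib import Path


def pack_directory(src_dir, archive_path, mode="w:gz"):
    """Bundle src_dir into one tar archive and return the number of files stored.

    Pass mode="w" for an uncompressed archive, which is faster to create
    when the data is already compressed.
    """
    src = Path(src_dir)
    with tarfile.open(archive_path, mode) as tar:
        # arcname keeps only the top-level directory name inside the archive,
        # so extracting does not recreate the full absolute path.
        tar.add(str(src), arcname=src.name)
    # Reopen the archive (compression auto-detected) and count regular files.
    with tarfile.open(archive_path, "r:*") as tar:
        return sum(1 for member in tar.getmembers() if member.isfile())
```

Transferring the resulting single archive through Globus and unpacking it at the destination avoids handling each small file individually. Write the archive outside the directory being packed so it is not confused with the data itself.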