This document outlines the process of integrating a custom data source in Glean. The process includes defining the source and permissions, syncing data, and validating results. Key prerequisites are API access to the source, an understanding of the objects to be indexed, and an understanding of the permissions model. The process culminates in testing and monitoring to ensure seamless integration.

Begin by identifying the data source, which is Gainsight in this scenario. Determine the list of objects and fields that need to be synchronized with Glean.

While identifying the objects and fields, ensure these are not already present in Glean groups or other data sources. Next, identify the list of users who should have access to this data. In Gainsight, two profiles have been identified: Full and Viewer Analytics.

Assign access permissions to individuals with these two profiles, ensuring they have the requisite access to Glean's Gainsight data. After completing these identifications, proceed to create the custom data source in Glean by navigating to the Admin Console and selecting Data Sources.

Add a new data source by choosing Custom Data Source and provide it with a unique name.

Use the same name for the display name and select the appropriate category. Provide the instance URL of the data source.

If required, enable the option to use email addresses for document access control.

Skip creating object definitions in the UI, as this can be complex.

Click on Publish to finalize the setup.

Upon publishing, the data source will appear under the All Data Sources tab. Next, create an API indexing token for use with Glean APIs to retrieve data from Gainsight and push it to Glean. Click on Add Token and provide a meaningful description associated with the data source.

An option is available to create an indexing API token for use across all data sources. In this scenario, specify instead that the token is to be used exclusively with the Gainsight data source. Provide the required data source details, choose an expiry date, and click Save.

Once saved, the API token will be generated. Store it securely and do not share it with others.
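
One common way to keep the token out of source code is to read it from an environment variable at runtime. The following is a minimal sketch, not a definitive setup: the variable name `GLEAN_INDEXING_TOKEN` is hypothetical, and the Bearer authorization scheme is an assumption that should be checked against Glean's Indexing API documentation.

```python
import os


def auth_headers(env_var: str = "GLEAN_INDEXING_TOKEN") -> dict:
    """Build request headers from a token held in an environment variable.

    The variable name is hypothetical; the Bearer scheme is an assumption.
    """
    token = os.environ.get(env_var)
    if not token:
        # Fail early rather than sending unauthenticated requests.
        raise RuntimeError(f"{env_var} is not set")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

The middleware can then pass these headers on every indexing request without the token ever appearing in the workflow definition.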

After obtaining the API token, proceed to create object and property definitions for the data source based on the identified objects and fields from Gainsight.

Refer to the Add Data Source endpoint for the payload format, which carries the object and property definitions. Each property definition includes a name, a display name, and a field type, along with an option to hide the UI facet.

Enable the field to be displayed as a filter in search results. Once enabled, it will appear as a filter option.
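
The shape of such a payload can be sketched as below. This is an illustration under stated assumptions: the field names (`displayLabel`, `propertyType`, `hideUiFacet`, `isUserReferencedByEmail`, `objectDefinitions`) reflect my understanding of Glean's Add Data Source request and should be verified against the current API reference. The `Company` object, its properties, and the instance URL are hypothetical stand-ins for whatever objects and fields were identified in Gainsight.

```python
import json


def property_definition(name, display_label, property_type, hide_ui_facet=False):
    """One property: name, display name, field type, and the facet toggle."""
    return {
        "name": name,
        "displayLabel": display_label,
        "propertyType": property_type,
        # False keeps the field available as a search filter.
        "hideUiFacet": hide_ui_facet,
    }


def datasource_config(name, home_url, object_definitions):
    """Data source payload; the display name reuses the name, as in the UI step."""
    return {
        "name": name,
        "displayName": name,
        "homeUrl": home_url,
        "isUserReferencedByEmail": True,  # email-based access control
        "objectDefinitions": object_definitions,
    }


# Hypothetical object and properties, for illustration only.
company = {
    "name": "Company",
    "displayLabel": "Company",
    "propertyDefinitions": [
        property_definition("lastEngagementDate", "Last Engagement Date", "DATE"),
        property_definition("internalNotes", "Internal Notes", "TEXT", hide_ui_facet=True),
    ],
}

config = datasource_config("gainsight", "https://example.gainsight.com", [company])
print(json.dumps(config, indent=2))
```

Building the payload in code rather than in the UI keeps the definitions in one reviewable place and makes them easy to resend when fields change.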

Having set up the data source, verify in Glean that all object definitions and properties have synced.

Check the Gainsight data source entry as well to confirm that all pushed objects and properties appear there.

At this stage, everything should be configured in Glean.

With the API indexing token in hand and the data source objects and properties defined, create middleware to sync data between Gainsight and Glean. Utilize Trailhead AI for automation.

Define four key workflows: two for full loads and two for delta loads.

Start with a full load workflow to index users into Glean by fetching user records from Gainsight matching the Full and Viewer Analytics profiles.

Create a payload, establish a production group in Glean, and index users from Gainsight into Glean using the bulk index users endpoint. Add all indexed users to the group.
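
The user full load described above can be sketched as follows. Treat it as a rough outline: the payload field names (`uploadId`, `isFirstPage`, `isLastPage`, `users`, `memberships`, `memberUserId`) are assumptions about the bulk index users and bulk index memberships requests, the group name is hypothetical, and the Gainsight record shape (`email`, `name`, `profile`) is invented for the example.

```python
ALLOWED_PROFILES = {"Full", "Viewer Analytics"}
GROUP_NAME = "gainsight-prod-users"  # hypothetical group name


def select_users(gainsight_users):
    """Keep only users whose Gainsight profile should grant access."""
    return [u for u in gainsight_users if u.get("profile") in ALLOWED_PROFILES]


def bulk_users_payload(users, datasource, upload_id):
    """Single-page payload for a full user load (assumed field names)."""
    return {
        "uploadId": upload_id,
        "isFirstPage": True,
        "isLastPage": True,
        "datasource": datasource,
        "users": [{"email": u["email"], "name": u["name"]} for u in users],
    }


def memberships_payload(users, datasource, group, upload_id):
    """Single-page payload adding every indexed user to the group."""
    return {
        "uploadId": upload_id,
        "isFirstPage": True,
        "isLastPage": True,
        "datasource": datasource,
        "group": group,
        # Users are referenced by email, matching the data source setting.
        "memberships": [{"memberUserId": u["email"]} for u in users],
    }


sample = [
    {"email": "a@example.com", "name": "A", "profile": "Full"},
    {"email": "b@example.com", "name": "B", "profile": "Viewer Analytics"},
    {"email": "c@example.com", "name": "C", "profile": "Other"},
]
selected = select_users(sample)
users_body = bulk_users_payload(selected, "gainsight", "users-full-1")
members_body = memberships_payload(selected, "gainsight", GROUP_NAME, "members-full-1")
```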

After completing the full load, utilize a delta load workflow that identifies user changes every hour and updates the group accordingly.
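
One simple way to identify user changes on each hourly run is to compare the set of emails currently matching the two profiles with the set indexed on the previous run. This is a sketch of that comparison only; how the previous set is stored and how additions and removals are then sent to Glean are left out and would depend on the middleware.

```python
def diff_members(previous, current):
    """Return (to_add, to_remove) between two collections of user emails."""
    previous, current = set(previous), set(current)
    to_add = sorted(current - previous)      # newly eligible users
    to_remove = sorted(previous - current)   # users who lost the profile
    return to_add, to_remove


to_add, to_remove = diff_members(
    previous=["a@example.com", "b@example.com"],
    current=["b@example.com", "d@example.com"],
)
```

Here `to_add` holds the users to index and add to the group, and `to_remove` holds those to take out of it.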

Once users and groups are set up, perform a full load with all identified records.

Create a full load workflow to retrieve all object records and create a payload.

Assign permissions to records based on the established group and index the documents using bulk index documents.
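
Mapping a Gainsight record to a document with group-based permissions might look like the sketch below. The document field names (`objectType`, `viewURL`, `body`, `permissions.allowedGroups`, `customProperties`) are assumptions about the Glean document schema to confirm in the API reference, and the record keys (`id`, `name`, `url`, `summary`, `properties`) are hypothetical.

```python
def to_document(record, datasource, object_type, group):
    """Convert one source record into a document visible only to the group."""
    return {
        "datasource": datasource,
        "objectType": object_type,
        "id": record["id"],
        "title": record["name"],
        "viewURL": record["url"],
        "body": {
            "mimeType": "text/plain",
            "textContent": record.get("summary", ""),
        },
        # Access is granted through the group populated by the user workflows.
        "permissions": {"allowedGroups": [group]},
        "customProperties": [
            {"name": key, "value": value}
            for key, value in record.get("properties", {}).items()
        ],
    }


doc = to_document(
    {
        "id": "001",
        "name": "Example Customer",
        "url": "https://example.gainsight.com/company/001",
        "properties": {"lastEngagementDate": "2024-01-15"},
    },
    datasource="gainsight",
    object_type="Company",
    group="gainsight-prod-users",
)
```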

Sync all records from Gainsight to Glean.

Enable a delta load workflow to identify changes every hour and index them accordingly.
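
A delta load needs some way to tell which records changed since the last run. A minimal sketch, assuming each record carries a hypothetical `modified_at` ISO 8601 timestamp with an explicit UTC offset; in practice the filter would more likely be pushed into the Gainsight query itself.

```python
from datetime import datetime, timezone


def changed_since(records, last_sync):
    """Return records modified strictly after the previous sync time."""
    return [
        r for r in records
        if datetime.fromisoformat(r["modified_at"]) > last_sync
    ]


last_sync = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
records = [
    {"id": "001", "modified_at": "2024-01-01T12:30:00+00:00"},
    {"id": "002", "modified_at": "2024-01-01T11:00:00+00:00"},
]
delta = changed_since(records, last_sync)
```

Only the records in `delta` are then converted to documents and sent through the incremental indexing call.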

Note the difference between the two indexing calls: bulk index documents replaces previous loads, while index documents is used for incremental updates. Implement batching for large payloads and adhere to rate limits.

Limit each batch to 500 documents, with a 30-second delay between batches, to avoid hitting rate limits.
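
The batching rule can be sketched as below. The `send` argument is a hypothetical callable standing in for the actual HTTP request, and the paging fields (`uploadId`, `isFirstPage`, `isLastPage`) are assumptions about how a bulk upload is split across several requests; the 500-document batch size and 30-second delay come from the guidance above.

```python
import time
import uuid


def batches(items, size=500):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_upload(documents, datasource, send,
                batch_size=500, delay_seconds=30, sleep=time.sleep):
    """Send documents page by page, pausing between pages.

    Returns the number of pages sent.
    """
    upload_id = str(uuid.uuid4())  # one id shared by every page of this load
    pages = list(batches(documents, batch_size))
    for index, page in enumerate(pages):
        send({
            "uploadId": upload_id,
            "isFirstPage": index == 0,
            "isLastPage": index == len(pages) - 1,
            "datasource": datasource,
            "documents": page,
        })
        if index < len(pages) - 1:
            sleep(delay_seconds)  # no pause needed after the final page
    return len(pages)
```

Passing `sleep` as a parameter lets the workflow be exercised in tests without actually waiting 30 seconds per batch.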

Test the results by querying the data source. For example, inquire about the last engagement date for 66degrees, and the system should retrieve the corresponding data.

Perform a broader query, asking for specific customer details, and ensure the system provides accurate information from the Gainsight documents.

The system should provide exact information and identify the documents from which the data was fetched, completing the integration process. Thank you.
