Cosmos Graph DB Automation with Gremlin

We’re getting started on a new project using a graph DB on Microsoft’s Cosmos DB, so we want to set up a process for Cosmos DB deployment automation. We use Azure Resource Manager (ARM) templates to automate our Azure resource deployments.

This post assumes you have previous experience with ARM templates, Gremlin, and Cosmos DB in general, so we’ll jump straight into some of the quirks that popped up in our setup. If you want some more basic info on Gremlin, check out our previous blog post or head over to the Gremlin documentation.

Graph DBs in Cosmos DB consist of three main components: a Cosmos DB database account, one or more databases, and one or more graphs (collections) in each database. Database accounts can be provisioned using ARM templates, but you’ll have to find another way to create databases or graphs automatically. We do this with some .NET code that runs on startup of our application, which we’ll show in a bit.

Provisioning the Cosmos Graph DB Account

You can find the ARM template format for Cosmos DB accounts in the ARM template reference for the Microsoft.DocumentDB/databaseAccounts resource type.

Most of the settings are straightforward, and the property descriptions in that reference explain what each one does.

However, graph databases need a few specific settings. When we first created our DB account, we got a lot of strange, nonspecific errors when trying to load a graph in the Azure portal because those graph-specific settings were missing. I was able to find them in the Azure quickstart templates on GitHub. The settings we’re concerned with are:

    "kind": "GlobalDocumentDB"
    "tags": { "defaultExperience": "Graph" }
    "properties": { "capabilities": [ { "name": "EnableGremlin" } ] }

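Put together, a database account resource with these settings might look like the sketch below. Only the kind, tags, and capabilities values come from the settings above. The apiVersion, the accountName parameter, and the single-region locations entry are assumptions for illustration, so check them against the template reference for the API version you deploy with.

```json
{
  "comments": "Illustrative sketch: apiVersion, the accountName parameter, and the locations entry are assumptions.",
  "type": "Microsoft.DocumentDB/databaseAccounts",
  "apiVersion": "2015-04-08",
  "name": "[parameters('accountName')]",
  "location": "[resourceGroup().location]",
  "kind": "GlobalDocumentDB",
  "tags": {
    "defaultExperience": "Graph"
  },
  "properties": {
    "databaseAccountOfferType": "Standard",
    "locations": [
      {
        "locationName": "[resourceGroup().location]",
        "failoverPriority": 0
      }
    ],
    "capabilities": [
      { "name": "EnableGremlin" }
    ]
  }
}
```
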
Creating the Database and Graph

Once we set up our database account, we’re ready to create our database and graph. You can do this manually, but we opt to do it automatically as part of our application startup using the DocumentClient class in the Microsoft.Azure.DocumentDB.Core NuGet package.

We need a couple of parameters to create our DocumentClient:  the .NET SDK URI and the primary auth key for the database account. You can find the URI in the “Overview” or “Keys” tab for your DB account (be sure to use the .NET SDK URI and not the Gremlin Endpoint) and the primary key in the “Keys” tab.

[Image: Cosmos DB settings]
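
The startup code in this post reads these values from a _gremlinDbOptions field. The post does not show that type, so the class below is only a hypothetical sketch: the four property names match the ones the code uses, while the class name GremlinDbOptions and the IsValid helper are assumptions added for illustration.

```csharp
using System;

// Hypothetical options class: only the four property names come from the
// code in this post; the type name and the IsValid helper are assumptions.
public class GremlinDbOptions
{
    // .NET SDK URI of the database account (not the Gremlin endpoint).
    public string Uri { get; set; }

    // Primary key from the "Keys" tab.
    public string AuthKey { get; set; }

    // ID of the database to create or reuse.
    public string Database { get; set; }

    // ID of the graph (collection) to create or reuse.
    public string Collection { get; set; }

    // True when every setting is present and Uri is an absolute HTTPS URI.
    public bool IsValid()
    {
        return System.Uri.TryCreate(Uri, UriKind.Absolute, out var parsed)
            && parsed.Scheme == System.Uri.UriSchemeHttps
            && !string.IsNullOrWhiteSpace(AuthKey)
            && !string.IsNullOrWhiteSpace(Database)
            && !string.IsNullOrWhiteSpace(Collection);
    }
}
```

Checking the options once at startup, before constructing the DocumentClient, is one way to catch a missing key or a malformed URI earlier than a failed request would.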

We call the following method in our application’s startup:

public async Task SetupGremlinDb()
{
    // Connect with the .NET SDK URI and primary key of the database account.
    DocumentClient docDbClient =
            new DocumentClient(new Uri(_gremlinDbOptions.Uri), _gremlinDbOptions.AuthKey);
    await CreateDatabaseIfNotExists(docDbClient);
    await CreateDocumentCollectionIfNotExists(docDbClient);
}

This method creates a DocumentClient, which it uses to create our database and our graph, if necessary. In the first helper method, we check to see if the database already exists, and if it doesn’t, we create it.

public async Task CreateDatabaseIfNotExists(DocumentClient client)
{
    // Look for an existing database with the configured ID
    var result = client.CreateDatabaseQuery()
            .Where(d => d.Id == _gremlinDbOptions.Database).AsEnumerable().FirstOrDefault();
    if (result == null)
    {
        // If the database does not exist, create a new database
        await client.CreateDatabaseAsync(new Database { Id = _gremlinDbOptions.Database });
    }
}

The only other value this method needs is the database ID, which is simply whatever we want to call the database. Next, we check to see if our graph already exists, and if it doesn’t, we create it.

public async Task CreateDocumentCollectionIfNotExists(DocumentClient client)
{
    // Look for an existing collection (graph) with the configured ID in our database
    var databaseUri = UriFactory.CreateDatabaseUri(_gremlinDbOptions.Database);
    var result = client.CreateDocumentCollectionQuery(databaseUri)
            .Where(c => c.Id == _gremlinDbOptions.Collection).AsEnumerable().FirstOrDefault();
    if (result == null)
    {
        // If the document collection does not exist, create a new collection
        var collectionInfo = new DocumentCollection
        {
            Id = _gremlinDbOptions.Collection
        };

        // Here we create a collection with 400 RU/s.
        await client.CreateDocumentCollectionAsync(
                databaseUri,
                collectionInfo,
                new RequestOptions { OfferThroughput = 400 });
    }
}

Running this code at startup creates the database and graph if they don’t already exist, after which they’re ready to use. The same code could be reused to set up multiple databases and graphs. Depending on the tooling or framework you use for your CI/CD process, you could likely streamline this further.
