May 1, 2023 • 13 min read

How we built logging as a service with ClickHouse

Eric Thomas
Software Engineer

At the end of last year, Highlight released a revamped version of our error monitoring product whereby you could go from a stacktrace to a session replay to see exactly what the user did to trigger an error. This insight gives developers a cheat code for figuring out how a bug occurred in the first place.

The next natural extension to solving errors was providing all the logs emitted from clicking a button, cycling down and up the stack, and today we are delighted that logging is available for all Highlight users 🥳

Like most logging applications, we give you the ability to find logs by whatever custom attributes you attach to them, and our SDKs auto-inject valuable debugging information.

Unlike monitoring tools that do one thing or are so disjointed that nothing works together, we go one step further and autolink everything in our product with zero config.

What follows is our journey building a logging product with ClickHouse, and all the mistakes we made along the way.

Figuring out our schema

At Highlight, we've gone all in on OpenTelemetry. A nice artifact of that is that it accelerates the design phase, since we can lean heavily on what they've built and adapt it to our needs. We also knew that we wanted to leverage ClickHouse, given those who have paved the way before us. Reaching into the internals of the OpenTelemetry code, we can see they have a schema design for when you're running your own collector.

CREATE TABLE IF NOT EXISTS logs (
    Timestamp DateTime64(9),
    TraceId String,
    SpanId String,
    TraceFlags UInt32,
    SeverityText LowCardinality(String),
    SeverityNumber Int32,
    ServiceName LowCardinality(String),
    Body String,
    ResourceAttributes Map(LowCardinality(String), String),
    LogAttributes Map(LowCardinality(String), String),
    INDEX idx_trace_id TraceId TYPE bloom_filter GRANULARITY 1,
    INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter GRANULARITY 1,
    INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter GRANULARITY 1,
    INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter GRANULARITY 1,
    INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter GRANULARITY 1,
    INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1
)
ENGINE MergeTree()
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SeverityText, toUnixTimestamp(Timestamp), TraceId)
TTL Timestamp + toIntervalDay(30)
SETTINGS ttl_only_drop_parts = 1;

This gave us enough of a starting point on design, and we modified the schema along the way as we understood our needs better.

Adding multi-tenancy

The above design works if you're housing your own logs, but we're building logging as a service. At Highlight, everything is owned by a project. We considered separate tables per project (even separate databases) so one noisy project can't degrade the experience for everyone else, but this video convinced us we just needed to add a single column (ProjectId).

In ClickHouse, unless explicitly stated otherwise, the primary key is derived from the ORDER BY clause. Since every one of our queries would include this column, we simplified our ORDER BY to:

ORDER BY (ProjectId, toUnixTimestamp(Timestamp))

As long as every query includes ProjectId and a Timestamp range in the WHERE clause, ClickHouse can efficiently query our data through the primary key.

Handling time

Of all the things to get right with logs, time is the most important.

OpenTelemetry's schema describes the Timestamp as type DateTime64(9), which means every timestamp has nanosecond precision (e.g. 2023-04-12 10:20:30.999999999). The decision to use this much precision has some precedent: it ensures that logs arriving within the same second can still be ordered properly.

However, once we populated our table with hundreds of millions of logs, we observed that the worst-case queries are the ones that search over a large time frame, even if the result size is small. A simple query to get the most recent 50 logs over the last 30 days:

SELECT *
FROM logs
WHERE ProjectId = 1
  AND Timestamp <= now()
  AND Timestamp >= (now() - toIntervalDay(30))
ORDER BY Timestamp DESC
LIMIT 50

would throw memory limit errors:

"Code: 241. DB::Exception: Memory limit (for user) exceeded: would use 8.00 GiB (attempt to allocate chunk of 4434098 bytes), maximum: 8.00 GiB. OvercommitTracker decision: Query was selected to stop by OvercommitTracker.: Cache info: Buffer path: ch-s3-faf28823-b036-4ef3-b703-750dad5ee0b9/mergetree/yga/qhwalotmlwzbleeyyrlienvykaxpy, hash key: a4c87d31cb1ecdb47bc88ff5cd84fd12, file_offset_of_buffer_end: 430543, internal buffer remaining read range: [430587:430586], read_type: REMOTE_FS_READ_AND_PUT_IN_CACHE, last caller: ef1cdf5e-fba0-470c-a27b-e2b3fa435d20:471, file segment info: None: (while reading column TraceFlags): (while reading from part /var/lib/clickhouse/disks/s3disk/store/48e/48e1599a-61fc-4783-91d0-cea46ab2400f/20230306_94087_95989_1467/ from mark 0 with max_rows_to_read = 3512): While executing MergeTreeReverse. (MEMORY_LIMIT_EXCEEDED) (version 22.13.1.24479 (official build))\\n"

That query would use 8 GB of memory to execute 🤯

After some searching on GitHub issues, asking the ClickHouse Slack community, and becoming adept at EXPLAIN, we discovered that a lot of performance issues can stem from using too much precision on a timestamp. We also reasoned that a DateTime (second precision) is just fine from a product perspective.

We created a new table (aptly named logs_new) with the column Timestamp DateTime and backfilled the data from the old table, logs:

INSERT INTO logs_new SELECT * FROM logs;

Running the same query now uses only 520.34 MB of memory. As previously stated, searching a large time frame is the worst-case scenario; performance improves greatly when the time frame is smaller or when additional filters are added.

Pagination and permalinking

There are three hard problems in computer science: naming things, cache invalidation, and stable cursor pagination.

When a user loads the logs page, we return the most recent 50 logs. As they scroll to the end of the window, we load the next 50 logs after the last log in the first request.

This isn't this author's first rodeo with respect to pagination. Inefficient solutions like OFFSET pagination, which says "give me the next page" of logs, are easy to set up but far less scalable than cursor pagination, which says "give me the next set of logs after this cursor". For most databases, the cursor is an auto-incrementing ID (presumably in sync with a record's creation time).

But ClickHouse has no concept of an auto-incrementing ID, and in a logging application, the time a record is created isn't the time that matters: a log's timestamp should be when it was emitted from a customer's code. Because we use batching under the hood to flush logs periodically, there will be some delay before a log is finally written to our ClickHouse database. To make the problem harder, it's entirely likely that two logs share the same timestamp if they are emitted at nearly the same time.

Relay to the rescue

A good forcing function for us was first figuring out what the API would look like. We use GraphQL at Highlight because we love taking advantage of its type safety. Relay is an opinionated way of designing your GraphQL API. We'd never tapped into it before, but it does provide a spec for structuring pagination responses.

{
  logs(after: "opaqueCursor") {
    edges {
      cursor
      node {
        timestamp
        message
        # ... other log data
      }
    }
    pageInfo {
      hasNextPage
    }
  }
}

The naming can feel overly verbose for the uninitiated, but the takeaway is that if you can satisfy the spec, then you've solved the problem. Relay intentionally types cursor as a String and not a GraphQL ID; it's up to the implementer to decide how it is defined.

The above example covers paging forward, and it's a great starting point before completing the full spec, which also covers paging backwards.
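
For illustration, here's a rough sketch of how those Relay connection types might look in Go (the type and field names below are ours for this example, not our exact generated GraphQL types):

import "time"

// a hedged sketch of Relay-style connection types;
// names are illustrative, not our generated GraphQL types
type LogEdge struct {
    Cursor string   `json:"cursor"`
    Node   *LogNode `json:"node"`
}

type LogNode struct {
    Timestamp time.Time `json:"timestamp"`
    Message   string    `json:"message"`
}

type PageInfo struct {
    HasNextPage bool `json:"hasNextPage"`
}

type LogConnection struct {
    Edges    []*LogEdge `json:"edges"`
    PageInfo *PageInfo  `json:"pageInfo"`
}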

UUID to the rescue

Now that we had our spec in place, we needed to fill in the details. For our cursor, we wanted something unique. Since a log's timestamp isn't guaranteed to be unique, we opted to join it with a new column containing a generated UUID to break ties when two logs share the same timestamp.
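
As a minimal sketch of what this could look like at write time (the LogRow struct and helper below are hypothetical and assume the google/uuid package):

import (
    "time"

    "github.com/google/uuid"
)

// hypothetical row type, trimmed down for illustration
type LogRow struct {
    Timestamp time.Time
    UUID      string
    Body      string
}

func newLogRow(timestamp time.Time, body string) *LogRow {
    return &LogRow{
        Timestamp: timestamp,
        UUID:      uuid.New().String(), // tie-breaker for logs sharing a timestamp
        Body:      body,
    }
}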

As mentioned before, a cursor is a String, so we use this function to generate one when passing a cursor to the frontend:

func encodeCursor(t time.Time, uuid string) string {
    key := fmt.Sprintf("%s,%s", t.Format(time.RFC3339), uuid)
    return base64.StdEncoding.EncodeToString([]byte(key))
}

and this one to decode the cursor passed from the frontend to the backend:

func decodeCursor(encodedCursor string) (timestamp time.Time, uuid string, err error) {
    byt, err := base64.StdEncoding.DecodeString(encodedCursor)
    if err != nil {
        return
    }
    arrStr := strings.Split(string(byt), ",")
    if len(arrStr) != 2 {
        err = e.New("cursor is invalid")
        return
    }
    timestamp, err = time.Parse(time.RFC3339, arrStr[0])
    if err != nil {
        return
    }
    uuid = arrStr[1]
    return timestamp, uuid, err
}

So when the frontend says "give me the next set of logs after this cursor", we can decode the timestamp and UUID and pass them into this WHERE clause:

WHERE Timestamp <= time
  AND (Timestamp < time OR UUID < uuid)
ORDER BY Timestamp DESC, UUID DESC

The UUID comparison handles the case where two logs share the same timestamp, ensuring that making the same request multiple times returns results in the same order.
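
Putting it together, here's a hedged sketch of how the decoded cursor might feed into that query, assuming the official clickhouse-go client we cover later in this post (the function, column selection, and parameter style are illustrative):

import (
    "context"

    "github.com/ClickHouse/clickhouse-go/v2/lib/driver"
)

// a sketch, not our exact query code: decode the cursor and use it as a keyset boundary
func logsAfterCursor(ctx context.Context, conn driver.Conn, projectID int, after string) (driver.Rows, error) {
    timestamp, uuid, err := decodeCursor(after)
    if err != nil {
        return nil, err
    }
    query := `
        SELECT Timestamp, UUID, Body
        FROM logs
        WHERE ProjectId = ?
          AND Timestamp <= ?
          AND (Timestamp < ? OR UUID < ?)
        ORDER BY Timestamp DESC, UUID DESC
        LIMIT 50`
    return conn.Query(ctx, query, projectID, timestamp, timestamp, uuid)
}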

Further reading

There's a lot more to pagination, and if you're doing something similar with ClickHouse, we encourage you to read the full PR, which goes into more detail. We also followed up with handling backwards pagination and permalinking (fun stuff with subselects and unions).

Filtering

Most logging apps have complex query languages where 99% of the functionality goes unused. Ours is intentionally simple. Given a log such as:

logger.info('Hello world', { table: 'custom' })

Hello world is the log message, info is the log level, and table: 'custom' is an arbitrary key/value pair.

Searching common fields like the log's level (e.g. level:info) maps to a top-level column in our ClickHouse table:

SELECT *
FROM logs
WHERE ProjectId = 1
  AND Timestamp <= now()
  AND Timestamp >= (now() - toIntervalDay(1))
  AND SeverityText = 'info'

Searching custom attributes (e.g. table:custom) queries against a ClickHouse Map column, which allows for flexible, schema-less storage.

SELECT *
FROM logs
WHERE ProjectId = 1
  AND Timestamp <= now()
  AND Timestamp >= (now() - toIntervalDay(1))
  AND LogAttributes['table'] = 'custom'

Log message filtering

Ideally, logs are filtered by some kind of key/value pair, but most people are just going to search for a log's message (particularly those with small apps).

We store log messages in a column called Body, which has a tokenbf_v1 skip index defined as:

INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1

This means the index stores a string such as "Hello world" as the tokens ["Hello", "world"]. Hence, to use the index effectively, we need to split up the search string:

SELECT *
FROM logs
WHERE ProjectId = 1
  AND Timestamp <= now()
  AND Timestamp >= (now() - toIntervalDay(1))
  AND hasToken(Body, 'Hello')
  AND hasToken(Body, 'world')
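
For illustration, here's a hedged sketch of how a message search could be turned into hasToken conditions on the backend (the helper below is hypothetical and simplified):

import (
    "strings"
    "unicode"
)

// a simplified sketch: real tokenization should mirror tokenbf_v1's rules
// (splitting on non-alphanumeric characters), which we approximate here
func bodyFilter(query string) (string, []interface{}) {
    tokens := strings.FieldsFunc(query, func(r rune) bool {
        return !unicode.IsLetter(r) && !unicode.IsDigit(r)
    })
    conditions := make([]string, 0, len(tokens))
    args := make([]interface{}, 0, len(tokens))
    for _, token := range tokens {
        conditions = append(conditions, "hasToken(Body, ?)")
        args = append(args, token)
    }
    return strings.Join(conditions, " AND "), args
}

Calling bodyFilter("Hello world") would produce hasToken(Body, ?) AND hasToken(Body, ?) with the two tokens as bound arguments.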

We've also noticed performance gains by manually overriding PREWHERE for these types of searches. Check out this pull request if you want a deeper dive with EXPLAIN.

Wiring this up in Go

Much of this blog post has focused on ClickHouse, but we also wanted to give some attention to how we built this with Go, our language of choice at Highlight. And while we love Go, we know it can be difficult to sift through the many libraries out there. With that said, here's a speed run of how we use Go with ClickHouse, along with pull request examples.

Connecting to ClickHouse

ClickHouse provides an official client for connecting to a ClickHouse database. We recommend this middle ground over lower-level tooling like ch-go and higher-level ORM tooling like go-clickhouse, which can just get in your way.

https://github.com/highlight/highlight/pull/3606/
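
As a rough sketch (not our exact setup), connecting with the official clickhouse-go v2 client looks something like this; the address and credentials below are placeholders:

package main

import (
    "context"
    "log"

    "github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
    // placeholder address and credentials, not our production config
    conn, err := clickhouse.Open(&clickhouse.Options{
        Addr: []string{"localhost:9000"},
        Auth: clickhouse.Auth{
            Database: "default",
            Username: "default",
            Password: "",
        },
    })
    if err != nil {
        log.Fatal(err)
    }
    // verify the connection is healthy before doing any work
    if err := conn.Ping(context.Background()); err != nil {
        log.Fatal(err)
    }
}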

Migrations

The migrate tool supports multiple databases (ClickHouse being one of them). Migrations run automatically when our backend spins up.

https://github.com/highlight/highlight/pull/4109/
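
Here's a minimal sketch of what that could look like with golang-migrate (the migrations path and DSN below are placeholders, not our actual configuration):

package main

import (
    "log"

    "github.com/golang-migrate/migrate/v4"
    _ "github.com/golang-migrate/migrate/v4/database/clickhouse"
    _ "github.com/golang-migrate/migrate/v4/source/file"
)

func main() {
    // placeholder source path and DSN for illustration
    m, err := migrate.New(
        "file://migrations",
        "clickhouse://localhost:9000?database=default&x-multi-statement=true",
    )
    if err != nil {
        log.Fatal(err)
    }
    // apply all pending migrations; ErrNoChange means we're already up to date
    if err := m.Up(); err != nil && err != migrate.ErrNoChange {
        log.Fatal(err)
    }
}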

Testing

We have a dedicated test database for ClickHouse that is spun up for our tests:

func TestMain(m *testing.M) {
    _, err := setupClickhouseTestDB()
    if err != nil {
        panic("Failed to setup clickhouse test database")
    }
    code := m.Run()
    os.Exit(code)
}

and each test handles setting up a test client along with cleaning up the data:

func setupTest(tb testing.TB) (*Client, func(tb testing.TB)) {
    client, _ := NewClient(TestDatabase)
    return client, func(tb testing.TB) {
        err := client.conn.Exec(context.Background(), "TRUNCATE TABLE logs")
        assert.NoError(tb, err)
    }
}

func TestBatchWriteLogRows(t *testing.T) {
    ctx := context.Background()
    client, teardown := setupTest(t)
    defer teardown(t)

    // some code that uses `ctx` and `client`
}

See logs_test.go for everything we found useful to test.

Query building

Much of what we've covered so far on the ClickHouse side is just SQL snippets. In a real application, that's only half the battle: the WHERE part of our SQL constantly changes depending on how a user wants to query their logs.

Fortunately, because ClickHouse uses familiar SQL syntax, there are plenty of SQL builders in Go. Initially we went with squirrel but ended up pivoting to the aptly named go-sqlbuilder because, unlike most SQL builders, it supports more complex querying (like subqueries) while still keeping code safe from SQL injection.
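
To give a flavor of it, here's a hedged sketch of building a dynamic query with go-sqlbuilder (column names match our schema; the helper itself is illustrative):

import (
    "time"

    "github.com/huandu/go-sqlbuilder"
)

// a sketch of dynamic query building; Build returns the SQL with
// placeholders plus the bound arguments, keeping us safe from SQL injection
func buildLogsQuery(projectID int, start, end time.Time) (string, []interface{}) {
    sb := sqlbuilder.NewSelectBuilder()
    sb.Select("Timestamp", "SeverityText", "Body").
        From("logs").
        Where(
            sb.Equal("ProjectId", projectID),
            sb.GreaterEqualThan("Timestamp", start),
            sb.LessEqualThan("Timestamp", end),
        ).
        OrderBy("Timestamp").Desc().
        Limit(50)

    return sb.Build()
}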

All of our logic for querying (and writing to) ClickHouse can be found in logs.go.

Summary

In closing, ClickHouse has been a pleasure to work with. Despite none of us having prior experience with it, we were able to pick it up quickly. That is in large part due to the great documentation and the ClickHouse community (the support team especially). We are already considering ClickHouse for new feature development and for replacing other parts of our stack (namely OpenSearch and InfluxDB).

There's still a lot more tuning we'd like to do with ClickHouse as we scale out further. If you'd like to learn more about how we use ClickHouse at Highlight, feel free to ask us anything in our Discord.
