At the end of last year, Highlight released aĀ revamped version of our error monitoring product whereby you could go from stacktrace to session replay to see exactly what the user did to trigger an error. Getting this insight gives developers the cheat code to figuring out how a bug occurred in the first place.
The next natural extension to solving errors was providing all the logs emitted from clicking a button cycling down and up the stack and today we are delighted that logging is available for all Highlight users š„³
Like most logging applications, we give you the ability to find logs given whatever custom attributes you attach to a log along with auto-injecting valuable debugging information via our SDKs.
Unlike monitoring tools that do one thing or are so disjointed that nothing works together, we go one step further and autolink everything in our product with zero config.
What follows is our journey to building a logging product with Clickhouse and all the mistakes we made along the way.
Figuring out our schema
At Highlight, weāve gone all in onĀ OpenTelemetry. A nice artifact of that is that it helps accelerate the design phase since we can lean heavily on what theyāve built and adapt it to our needs. We also knew that we wanted to leverage Clickhouse givenĀ thoseĀ whoĀ have paved the way before us. Reaching into the internals of OpenTelemetry code, we can see they have aĀ design if youāre running your ownĀ collector.
CREATE TABLE IF NOT EXISTS logs (
Timestamp DateTime64(9),
TraceId String,
SpanId String,
TraceFlags UInt32,
SeverityText LowCardinality(String),
SeverityNumber Int32,
ServiceName LowCardinality(String),
Body String,
ResourceAttributes Map(LowCardinality(String), String),
LogAttributes Map(LowCardinality(String), String),
INDEX idx_trace_id TraceId TYPE bloom_filter GRANULARITY 1,
INDEX idx_res_attr_key mapKeys(ResourceAttributes) TYPE bloom_filter GRANULARITY 1,
INDEX idx_res_attr_value mapValues(ResourceAttributes) TYPE bloom_filter GRANULARITY 1,
INDEX idx_log_attr_key mapKeys(LogAttributes) TYPE bloom_filter GRANULARITY 1,
INDEX idx_log_attr_value mapValues(LogAttributes) TYPE bloom_filter GRANULARITY 1,
INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1
)
ENGINE MergeTree()
PARTITION BY toDate(Timestamp)
ORDER BY (ServiceName, SeverityText, toUnixTimestamp(Timestamp), TraceId)
TTL Timestamp + toIntervalDay(30)
SETTINGS ttl_only_drop_parts = 1;
This gave us enough of a starting point on design and we modified the schema along the way as we understood our needs more.
Adding multi-tenancy
The above design works if youāre housing your own logs, but weāre building logging as a service. At Highlight, everything is owned by a project. We considered separate tables per project (even separate databases) so one noisy project canāt degrade the experience for everyone else but thisĀ video convinced us we just needed to add a single column (ProjectId).
In Clickhouse, unless explicitly stated, the primary key is derived from theĀ ORDER BY clause. Since each of our queries would include this column, we simplified ourĀ ORDER BY to:
ORDER BY (ProjectId, toUnixTimestamp(Timestamp)
As long as every query includes aĀ ProjectId andĀ Timestamp in aĀ WHERE clause, Clickhouse can efficiently query our data through theĀ primary key.
Handling time
Of all the things to get right with logs, time is the most important.
OpenTelemetryās schema describes theĀ Timestamp as type,Ā DateTime64(9), which implies every time will have nanosecond precision (e.g.Ā 2023-04-12 10:20:30.999999999 ). The decision to use this much precision has some precedence of ensuring logs that come at the exact same second can be ordered properly.
However, once we populated our table with hundreds of millions of logs, we observed that worse case scenario queries are the ones that search over a large time frame even if the result size is small. A simple query to get the most recent 50 logs over the last 30 days:
SELECT *
FROM logs
Where ProjectId = 1
AND Timestamp <= now() AND Timestamp >= (now() - toIntervalDay(30))
ORDER BY
Timestamp DESC
LIMIT 50
would throw memory limit errors:
"Code: 241. DB::Exception: Memory limit (for user) exceeded: would use 8.00 GiB (attempt to allocate chunk of 4434098 bytes), maximum: 8.00 GiB. OvercommitTracker decision: Query was selected to stop by OvercommitTracker.: Cache info: Buffer path: ch-s3-faf28823-b036-4ef3-b703-750dad5ee0b9/mergetree/yga/qhwalotmlwzbleeyyrlienvykaxpy, hash key: a4c87d31cb1ecdb47bc88ff5cd84fd12, file_offset_of_buffer_end: 430543, internal buffer remaining read range: [430587:430586], read_type: REMOTE_FS_READ_AND_PUT_IN_CACHE, last caller: ef1cdf5e-fba0-470c-a27b-e2b3fa435d20:471, file segment info: None: (while reading column TraceFlags): (while reading from part /var/lib/clickhouse/disks/s3disk/store/48e/48e1599a-61fc-4783-91d0-cea46ab2400f/20230306_94087_95989_1467/ from mark 0 with max_rows_to_read = 3512): While executing MergeTreeReverse. (MEMORY_LIMIT_EXCEEDED) (version 22.13.1.24479 (official build))\\n"
That query would use 8 GB of memory to execute š¤Æ
After some searching onĀ GitHub issues, asking theĀ Clickhouse Slack community, and becoming adept atĀ EXPLAIN, we discovered that a lot of performance issues can stem from using too much precision on a timestamp. We reasoned as well that using aĀ DateTime (second precision) is just fine from a product perspective.
We created a new table, (aptly namedĀ logs_new ) with the columnĀ Timetamp DateTime and backfilled the data from the old table,Ā logs:
INSERT INTO logs_new
SELECT * from logs;
Running the same query takes uses onlyĀ 520.34 MB of memory. As previously stated, searching a large time frame is the worse case scenario ā performance improves greatly when the time frame is smaller or when additional filters are added.
Pagination and permalinking
Thereās three hard problems in computer science: naming things, cache invalidation, and stable, cursor pagination.
When a user loads the logs page, we return back the most recent 50 logs. As they scroll to the end of the window, we load the next 50 logsĀ after the last log in the first request.
This isnāt this authorās first rodeo with respect to pagination. Inefficient solutions like usingĀ OFFSET pagination that say āgive me the next pageā of logs are easy to set up but far less scalable than solutions like cursor pagination that say āgive me the next set of logsĀ after this cursor*ā.* For most databases, the cursor is an auto-incrementing ID (presumably this is in sync with a recordās creation time).
But Clickhouse has no concept of an auto-incrementing IDs and when a record is created it is not a true definition of time in a logging application. A logās timestamp should be when it was emitted from a customerās code. Because we use batching techniques under the hood to flush logs periodically, there will be some delay before that log is finally written to our Clickhouse database. To make the problem harder, itās entirely likely that two logs could share the same timestamp if they are emitted at nearly the same time.
Relay to the rescue
A good forcing function for us was figuring out first what the API would look like. We use GraphQL at Highlight because we love taking advantage of its type safety.Ā Relay is an opinionated way of designing your GraphQL API. Weāve never tapped into it prior but they do provide aĀ spec on structuring pagination responses.
{
logs(after: "opaqueCursor") {
edges {
cursor
node {
timestamp
message
... // other log data
}
}
pageInfo {
hasNextPage
}
}
}
The naming can feel overly verbose for the uninitiated but the takeaway is that if you can achieve the spec, then youāve solved the problem. Relay uses the intentional naming ofĀ cursor which is typeĀ String and not aĀ GraphQLĀ ID. Itās up to the implementer to decide on how this is defined.
The above code example solves paging forward and itās a great starting point before trying to complete the full spec of also paging backwards.
UUID to the rescue
Now that we had our spec in place, we needed to fill in the details. For ourĀ cursor, we wanted something to be unique. Since a logās timestamp isnāt guaranteed to be unique, we opted to join it with a new column with a generated UUID to handle conflicts when two logs share the same timestamp.
As mentioned before, aĀ cursor is aĀ String so we used this function to generate one when we are passing aĀ cursor to the frontend.
func encodeCursor(t time.Time, uuid string) string {
key := fmt.Sprintf("%s,%s", t.Format(time.RFC3339), uuid)
return base64.StdEncoding.EncodeToString([]byte(key))
}
and this one to decode theĀ cursor passed from the frontend to the backend:
func decodeCursor(encodedCursor string) (timestamp time.Time, uuid string, err error) {
byt, err := base64.StdEncoding.DecodeString(encodedCursor)
if err != nil {
return
}
arrStr := strings.Split(string(byt), ",")
if len(arrStr) != 2 {
err = e.New("cursor is invalid")
return
}
timestamp, err = time.Parse(time.RFC3339, arrStr[0])
if err != nil {
return
}
uuid = arrStr[1]
return timestamp, uuid, err
}
So when the frontend says āgive me the next of logsĀ after this cursorā, we can decode the timestamp and UUID and pass it into thisĀ WHERE clause:
WHERE Timestamp <= time
AND (Timestamp < time OR UUID < uuid)
ORDER BY Timestamp DESC, UUID DESC
TheĀ AND handles when two logs share the same timestamp meaning that making the same request multiple times will result in the same order of results.
Further reading
Thereās a lot more to pagination and we encourage readers to see theĀ full PR that goes into more detail if you are doing something similar with Clickhouse and weĀ followed up with handling backwards pagination and permalinking (fun stuff with subselects and unions).
Filtering
MostĀ loggingĀ apps have complex languages where 99% of the functionality is unused.Ā Ours is intentionally simple. Given a log such as:
logger.info('Hello world', {
table: 'users',
}),
Hello world is the log message,Ā info is the log level, andĀ table:custom is an arbitrary key/value pair.
Searching common fields like the logās level (e.g.Ā level:info ) maps to a top level column in our Clickhouse table:
SELECT * FROM logs
WHERE ProjectId = 1
AND Timestamp <= now()
AND Timestamp >= (now() - toIntervalDay(1))
WHERE SeverityText = 'info'
Searching custom attributes (e.g.Ā table:custom) queries against a ClickhouseĀ map column which allows for a flexible schema-less storage.
SELECT * FROM logs
WHERE ProjectId = 1
AND Timestamp <= now()
AND Timestamp >= (now() - toIntervalDay(1))
WHERE LogAttributes['table'] = 'custom'
Log message filtering
Ideally logs are filtered by some type of key/value pair but most people are going to just search for a logās message (particularly those who have small apps).
We store log messages in a column calledĀ Body and it has aĀ tokenbf_v1Ā skip index defined as:
INDEX idx_body Body TYPE tokenbf_v1(32768, 3, 0) GRANULARITY 1
Which means that it stores a string such as āHello worldā as tokensĀ ["Hello", "world"]. Hence, to effectively use that index, searching against this requires us to split up the string passed in:
SELECT * FROM logs
WHERE ProjectId = 1
AND Timestamp <= now()
AND Timestamp >= (now() - toIntervalDay(1))
WHERE hasToken(Body, 'Hello') AND hasToken(Body, 'world')
Weāve also noticed performance gains by manually overridingĀ PREWHERE for these types of searches. Check out thisĀ pull request if you want a deeper dive withĀ EXPLAIN .
Wiring this up in Go
Much of this blog post has focused on Clickhouse, but we also wanted to give some attention to how we built this with Go, our language of choice at Highlight. And while we love Go, we do know it can be difficult to sift through the many libraries out there. With that being said, hereās a speed run of how we use Go with Clickhouse along with pull request examples.
Connecting to Clickhouse
Clickhouse provides anĀ official client for connecting to a Clickhouse database. We recommend this middle ground over low level tooling likeĀ ch-go and high level ORM tooling likeĀ go-clickhouse thatāll just get in your way.
https:/
Migrations
TheĀ migrate tool has multiple database support (Clickhouse being one of them). Migrations are automatically run when our backend spins up.
https:/
Testing
We have a dedicated test database for Clickhouse that is spun up in each of our tests
func TestMain(m *testing.M) {
_, err := setupClickhouseTestDB()
if err != nil {
panic("Failed to setup clickhouse test database")
}
code := m.Run()
os.Exit(code)
}
and each test handles setting up a test client along with cleaning up the data:
func setupTest(tb testing.TB) (*Client, func(tb testing.TB)) {
client, _ := NewClient(TestDatabase)
return client, func(tb testing.TB) {
err := client.conn.Exec(context.Background(), "TRUNCATE TABLE logs")
assert.NoError(tb, err)
}
}
func TestBatchWriteLogRows(t *testing.T) {
ctx := context.Background()
client, teardown := setupTest(t)
defer teardown(t)
// some code that uses `ctx` and `client`
}
SeeĀ logs_test.go for all things we found useful to test.
Query building
Much of what weāve talked about thatās Clickhouse related is just an SQL snippet. In a real application, this is only half the battle. TheĀ WHERE part of our SQL is going to constantly change given how a user wants to query their logs.
Fortunately, because Clickhouse uses familiar SQL syntax, there are plenty of SQL builders in Go. Initially we went withĀ squirrel but ended up pivoting to the aptly namedĀ go-sqlbuilder because unlike most SQL builders, it has support for when querying gets more complex (like subqueries) while still ensuring code is safe from SQL injections.
All of our logic for querying (and writing) to Clickhouse can be found inĀ logs.go.
Summary
In closing, Clickhouse has been a pleasure to work with. Despite no one having any prior knowledge, we were able to pick it up quickly. This is in large part to the great documentation and the Clickhouse community (support team especially). We are already considering using Clickhouse for new feature development and replacing other parts of our stack (namely OpenSearch and InfluxDB).
Thereās still a lot moreĀ tuning weād like to do with Clickhouse as we scale out more. If youād like to learn more about how we use Clickhouse at Highlight, feel free to ask us anything in ourĀ discord.
Other articles you may like
Feb 18, 2023 ā¢ 17 min. read