Woopra's Profile ID System

Woopra has an advanced ID mapping system that can use many different fields to uniquely refer to an individual profile. The Basic Concepts and Problems section introduces some conventional database concepts and the challenges Woopra faces. Then we dive into how Woopra's ID system works and how to make the best use of it.

Basic Concepts and Problems

Unique Identifiers

In database systems, a unique identifier is a field whose value is guaranteed to refer to exactly one entity in the database. If you think of a database as a table of rows and columns, each column is a field and each row is an entity. The unique ID is a column that holds a different value for each row, so that value can be used to refer unequivocally to that single row. In Woopra's profile database, you can think of each row as a single profile.

A unique identifier is special because no two database entities may share the same ID value, so a single value is enough to find one specific entity.

Primary Unique Identifiers

In SQL databases, a Primary Key is a unique identifier. Each primary key value refers to a single row, no two rows may share a value, and every row is guaranteed to have one.
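To make the primary-key guarantees concrete, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The `profiles` table and its columns are made up for illustration. A duplicate key is rejected, and a row inserted without a key still receives one.

```python
import sqlite3

# In-memory database; the profiles table is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profiles (profile_id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("INSERT INTO profiles (profile_id, email) VALUES (1, 'ann@example.com')")

# Reusing an existing primary key value violates uniqueness.
try:
    conn.execute("INSERT INTO profiles (profile_id, email) VALUES (1, 'bob@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A row inserted without a key is still assigned one (SQLite picks the next rowid).
new_id = conn.execute("INSERT INTO profiles (email) VALUES ('bob@example.com')").lastrowid

# The key value finds exactly one row.
row = conn.execute("SELECT email FROM profiles WHERE profile_id = ?", (1,)).fetchone()
print(duplicate_rejected, row, new_id)  # True ('ann@example.com',) 2
```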

The ID Field, identifiers, and cv_id: a quick glossary note

There is some necessary but unfortunate ambiguity in the term we use here: ID. First, there is the concept of an identifier, which could be a database ID field or some other identifying field, such as an email address or even a cookie. Second, there is the value of one of these identifiers, which might be referred to as a person's or profile's ID, email ID, device ID (i.e., cookie), etc.

Finally, there is the ID field itself, which is the highest-order identifier in the Woopra ID hierarchy. Because of how this field is sent in the tracking SDK code, we sometimes refer to it as cv_id.


Woopra's Challenges

The Woopra system faces a number of challenges that traditional SQL database concepts do not help solve, and often even actively stand in the way of resolving. We will limit this discussion to the problem of uniquely identifying tracked person profiles in Woopra.

The Woopra system needs to be able to tell which person profile performed an incoming tracked event (or property update). The problem is that people in Woopra can be identified to very different degrees. They could be a first-time anonymous visitor to a website or a long-time paying customer.

Sometimes a person will make a few anonymous visits to your site over a year before deciding to sign up for your newsletter, giving you their email address. This can even reveal that what Woopra previously considered two different people, perhaps originally on two devices, is actually a single person, which requires merging the two profiles.

In the traditional database world, merging two rows with different identifiers is a messy business: Which primary identifier is kept? What happens if a database user asks for the row whose ID was removed? Additionally, because every row must have an ID from the moment it is created, you cannot wait until you know who a person is, and have all of their events and traits in hand, before creating the row that represents them.
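The merge problem can be seen in miniature with a plain Python dictionary standing in for a table. The row contents and the `merge_rows` helper are hypothetical. The sketch arbitrarily keeps ID 1, and any caller still holding ID 2 now gets nothing back.

```python
# Hypothetical rows keyed by a primary-key ID (not Woopra's schema).
rows = {
    1: {"events": ["Visited pricing page"]},
    2: {"events": ["Signed up for newsletter"]},
}

def merge_rows(keep_id: int, drop_id: int) -> None:
    """Fold row drop_id into row keep_id, then delete drop_id."""
    rows[keep_id]["events"].extend(rows[drop_id]["events"])
    del rows[drop_id]

merge_rows(keep_id=1, drop_id=2)

# Anyone still holding ID 2 (a cached link, another system) now finds nothing.
print(rows.get(2))        # None
print(rows[1]["events"])  # ['Visited pricing page', 'Signed up for newsletter']
```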

Another issue: if you want to track anonymous behavior and later attribute it to known people as they identify themselves to you (a key value proposition of Woopra), then using a single ID value per person becomes more complicated.

Similarly, if you want to track behavior across channels (another key value proposition of Woopra), it is practically impossible to keep the same database ID for a profile across your website and, say, your email marketing automation service.

These and other, more nuanced issues make identifiers a significant problem in the Woopra system. Woopra solves it by dedicating an entire subsystem to managing identifiers.

The Goal

Woopra needs to be able to take whatever information is available about a person performing the event in an incoming track request and use it to determine, with the highest accuracy possible, which other events this person has performed and, thus, to which profile the incoming events belong.

If a user is coming anonymously to your website, all you have is a cookie, which is, conceptually, a device ID pointing to that browser on that machine. It will be the same next time the person visits your site from that machine and that browser, but if they visit from a different browser or on their phone, you will have a new cookie. So, Woopra needs to be able to use multiple cookies to eventually refer to one person, assuming that one day you find out who that person is and can associate all their devices with them.

Similarly, you may receive an "Email Sent" event from your email marketing tool that does not come from a browser and has no cookie. This event carries an email address, another major identifier. Later, when the person signs in with that email address in a browser carrying a cookie they had in the past, Woopra needs to be able to treat the events performed by cookie 1, cookie 2, and email 1 as belonging to the same profile.

Woopra's Multi-ID System

Woopra's profile ID system adds a dimension to traditional database identifiers by allowing multiple identifier fields. These fields form a hierarchy and are stored in their own database, which associates, or "maps", person profiles to the values of the identifiers in the ID hierarchy that have been given to each person.

(Aside) Mapping

In computer science, a map (also called a mapping; a hash map is one common implementation) is a data structure that associates keys with values. Again, you can think of a map as a two-column table in which each row holds a pair of related values. Just as a geographical map can help you get from point A to point B, Woopra's map of identifiers can help you get from one identifier, say a person's email address, to another, like a browser cookie.
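A Python dictionary is a hash map, so it can sketch the idea directly. Here, a hypothetical `id_map` maps (identifier type, value) pairs to made-up profile IDs, and the hypothetical helper `identifiers_for` walks back from a profile to all of its identifiers, which is how you get from an email address to a cookie. This illustrates the concept only and is not Woopra's storage format.

```python
# Hypothetical identifier map: (identifier type, value) -> profile ID.
id_map = {
    ("cookie", "abcd123"): "profile-1",
    ("email", "ann@example.com"): "profile-1",
    ("cookie", "zz9"): "profile-2",
}

def identifiers_for(profile_id: str) -> list[tuple[str, str]]:
    """List every (type, value) pair that maps to the given profile."""
    return sorted(key for key, pid in id_map.items() if pid == profile_id)

# Start from an email address, find its profile, then list that profile's cookies.
profile = id_map[("email", "ann@example.com")]
cookies = [value for (kind, value) in identifiers_for(profile) if kind == "cookie"]
print(profile, cookies)  # profile-1 ['abcd123']
```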

Identifiers in Woopra

At its core, an identifier is a user property with some special behavior. Several identifiers make sense in the context of customer data, and Woopra predefines a few of them in every Woopra instance. Here they are, from lowest to highest position in the ID hierarchy:

  • Browser cookie: the lowest-order identifier
  • Any custom identifiers
  • Email address
  • ID (cv_id, external database ID): the highest-order identifier

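Choosing the highest-order identifier amounts to ranking identifier types. The sketch below assumes a hypothetical `HIERARCHY` list ordered like the list above, with `"phone"` standing in for a custom identifier. The real ordering of custom identifiers in a given Woopra instance may differ.

```python
# Hypothetical ranking, lowest-order first; "phone" stands in for a custom identifier.
HIERARCHY = ["cookie", "phone", "email", "id"]

def highest_order(identifiers: dict[str, str]) -> tuple[str, str]:
    """Return the (type, value) pair whose type ranks highest in HIERARCHY.

    An identifier type missing from HIERARCHY raises ValueError (from list.index).
    """
    kind = max(identifiers, key=HIERARCHY.index)
    return kind, identifiers[kind]

print(highest_order({"cookie": "abcd123", "email": "ann@example.com"}))
# ('email', 'ann@example.com')
print(highest_order({"cookie": "abcd123", "phone": "239-4567"}))
# ('phone', '239-4567')
```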
While the values above are built into the Woopra system, Woopra is not limited to a predetermined set of identifiers: enterprise users can also define their own custom unique identifiers. This lets you, for instance, use identifiers from external contact management systems.

Also, some integrations, for example those that send phone events to your Woopra instance (Sonar, Bellgram, Ringostat, Routee), create custom identifiers for you, such as a phone number. This way, when a track request arrives saying that the person with phone number 239-4567 sent a text to your support team, Woopra can determine which profile the event belongs to.

The Hierarchy

As mentioned above, Woopra maintains a hierarchy of identifiers and dedicates an entire subsystem to maintaining the mappings between them. The hierarchy determines which identifier takes precedence when deciding which profile a request refers to.

Assigning Events

The first and most obvious job of the ID system is to determine which profile a newly tracked event belongs to. A tracking event must arrive with at least one user identifier; otherwise it is dropped, because Woopra cannot know who performed the event. The identifier is submitted to the ID system, which returns the profile to which the event logically belongs.
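The single-identifier rule can be sketched as follows. `assign_event`, the `id_map` contents, and the `new-profile-N` naming are hypothetical, and the sketch assumes that an identifier the system has never seen simply gets a fresh profile. The phone-number entry echoes the custom identifiers created by phone integrations.

```python
import itertools
from typing import Optional

# Hypothetical identifier map; "phone" plays the role of a custom identifier.
id_map: dict[tuple[str, str], str] = {("phone", "239-4567"): "profile-7"}
_new_ids = itertools.count(1)

def assign_event(identifier: Optional[tuple[str, str]]) -> Optional[str]:
    """Return the profile for an event's identifier, or None if the event is dropped."""
    if identifier is None:
        return None  # no identifier: there is no way to know who did this
    if identifier not in id_map:
        # Assumption: an unseen identifier starts a brand-new profile.
        id_map[identifier] = f"new-profile-{next(_new_ids)}"
    return id_map[identifier]

print(assign_event(None))                   # None
print(assign_event(("phone", "239-4567")))  # profile-7
print(assign_event(("cookie", "abcd123")))  # new-profile-1
```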

If a track request carries multiple identifiers, two things happen. First, Woopra selects the highest-order identifier in the ID hierarchy and submits it to the ID system to find the appropriate profile, as usual. Second, Woopra tells the ID system that these identifiers should all point to the same profile, which can trigger a cascade of updates in the system.

First, the identifiers are mapped to each other in the ID system. For instance, the system records that cookie abcd123 and the person's email address belong to the same user and should point to the same profile. From now on, a tracked event carrying either of these identifiers, with or without the other, will go to this profile.
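Here is a minimal sketch of these two steps for a request with several identifiers, under stated assumptions. `HIERARCHY`, `id_map`, and `track` are hypothetical. The profile comes from the highest-order identifier that is already mapped, or is a new profile if none is. The case where the identifiers already point to different profiles is left out, since that requires the merge described in the next paragraph.

```python
import itertools

# Hypothetical hierarchy (lowest rank first) and an existing mapping.
HIERARCHY = ["cookie", "email", "id"]
id_map: dict[tuple[str, str], str] = {("cookie", "abcd123"): "profile-1"}
_new_ids = itertools.count(2)  # profile-1 already exists

def track(identifiers: dict[str, str]) -> str:
    """Resolve the profile for a request and map all of its identifiers to it."""
    # (type, value) pairs, highest-order first.
    keys = sorted(identifiers.items(),
                  key=lambda kv: HIERARCHY.index(kv[0]), reverse=True)
    known = [id_map[k] for k in keys if k in id_map]
    if len(set(known)) > 1:
        raise NotImplementedError("identifiers point to different profiles; a merge is needed")
    # Profile of the highest-order mapped identifier, else a new profile.
    profile = known[0] if known else f"profile-{next(_new_ids)}"
    for key in keys:
        id_map[key] = profile  # each identifier alone now finds this profile
    return profile

print(track({"cookie": "abcd123", "email": "ann@example.com"}))  # profile-1
print(id_map[("email", "ann@example.com")])                      # profile-1
```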

Then, if this mapping is new, the system checks whether each identifier previously pointed to its own separate profile. If so, a profile merge occurs. Merging can be a very complex process, and it cannot easily be undone.
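Merges that leave a forwarding trail can be modeled with a union-find (disjoint-set) structure. This is a swapped-in textbook technique chosen for illustration, not a description of Woopra's implementation. The profile names and the rule that the first argument to `merge` survives are assumptions. The scenario follows the cookie 1, cookie 2, and email 1 example from earlier in this article.

```python
class ProfileMerger:
    """Union-find over profile IDs: a hypothetical model of profile merges."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, profile: str) -> str:
        """Follow merge links to the surviving profile (with path halving)."""
        self.parent.setdefault(profile, profile)
        while self.parent[profile] != profile:
            self.parent[profile] = self.parent[self.parent[profile]]
            profile = self.parent[profile]
        return profile

    def merge(self, a: str, b: str) -> str:
        """Merge two profiles; in this sketch the first argument survives."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a
        return root_a

# Two cookies (two devices) and an email start out on separate profiles.
id_map = {("cookie", "c1"): "p1", ("cookie", "c2"): "p2", ("email", "ann@example.com"): "p3"}
merger = ProfileMerger()

# The person signs in with their email address on both devices.
merger.merge(id_map[("email", "ann@example.com")], id_map[("cookie", "c1")])
merger.merge(id_map[("email", "ann@example.com")], id_map[("cookie", "c2")])

survivors = {merger.find(p) for p in id_map.values()}
print(survivors)  # {'p3'}
```

In this model, asking for a merged-away profile still works: `find` follows the merge links to the surviving profile. Union-find also has no cheap way to split a set apart again, which is one way to see why merges are hard to undo.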