Export raw daily visit data

The Export API exports a raw list of visits as line-separated JSON visit objects. You request the files by log day, where each day is bounded in the US Pacific time zone (America/Los_Angeles in the IANA database).

Line Types

Each line in the response is a single, self-contained JSON object. Several types of objects are available for export. The type of each object is given by the "type" key of the JSON object on that line.
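As a minimal sketch of reading such a response, the snippet below parses one JSON object per line and tallies the objects by their "type" key. The function names are hypothetical, and the input is assumed to be any iterable of text lines (for example, an open file).

```python
import json
from typing import Dict, Iterable, Iterator


def iter_export_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line."""
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines such as a trailing newline
        yield json.loads(raw)


def count_by_type(lines: Iterable[str]) -> Dict[str, int]:
    """Count exported objects by the value of their "type" key."""
    counts: Dict[str, int] = {}
    for obj in iter_export_lines(lines):
        kind = obj.get("type", "unknown")  # "unknown" is our own fallback label
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

In real use you would dispatch on the type (visit, delete) rather than count it; the count is just a quick sanity check on a downloaded file.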

Visit Lines

Visit lines represent a single session/visit for a single profile. Each contains visit-level metadata, a visitor object with any properties updated for that visitor during the session, and an array of the actions that occurred in the visit, each with its own event name, timestamp, properties, etc.

The visit also carries a profile id ("pid") identifying the profile to which this visit and all of its actions belong. The profile id is an ephemeral, randomly generated UUID for a profile representing one person. Do not rely on it to stay the same for the same person over time (see deletes below).

To process a visit line, find or create a profile in your own database with the given pid, set the keys and values from the visitor object onto this profile, then add all of the actions in the actions array to this profile.
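The steps above can be sketched as follows, using a plain dictionary as a stand-in for your own database. The key names "visitor" and "actions" are assumptions about the exported JSON; check them against a real export before relying on them. The function name and the action fields in the usage example are hypothetical.

```python
from typing import Any, Dict

# In-memory stand-in for "your own database": pid -> profile record.
Profiles = Dict[str, Dict[str, Any]]


def apply_visit(profiles: Profiles, line: Dict[str, Any]) -> None:
    """Apply one visit line to the profile store."""
    # Find or create the profile for this pid.
    profile = profiles.setdefault(line["pid"], {"properties": {}, "actions": []})
    # Copy visitor keys and values onto the profile ("visitor" is an assumed key).
    profile["properties"].update(line.get("visitor", {}))
    # Append every action from this visit ("actions" is an assumed key).
    profile["actions"].extend(line.get("actions", []))


profiles: Profiles = {}
apply_visit(profiles, {"type": "visit", "pid": "p1",
                       "visitor": {"plan": "free"},
                       "actions": [{"name": "pageview"}]})
apply_visit(profiles, {"type": "visit", "pid": "p1",
                       "visitor": {"plan": "pro"},
                       "actions": [{"name": "upgrade"}]})
```

After the second call, the profile for "p1" holds the latest value of each visitor property and the actions from both visits.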

Delete Lines

Probably the most important thing to note about this API is that, because it closely mirrors our own home-rolled, transactional database, it will include DELETE lines that are intended to nullify all previous visits for a given profile.

After a DELETE line, these visits, their actions, and all visitor data are re-logged (with the original timestamps) under a new profile id.

Delete lines are marked by type: "delete".

The most common reason for a delete is that two profiles have been identified as the same individual, and are now going to be deleted and re-logged under the same profile.

To process a delete line, find the profile in your database that you created for that pid, and delete it. It's scary, we know, but the coming lines will contain all the data you just deleted, and likely more. The delete line will also contain a new_id with the new pid that will be used to re-log these actions and any further ones for this profile, along with any actions from other merged profiles, as well as the new action that caused the merge.
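A minimal sketch of that delete step, again with a dictionary standing in for your database. The function name is hypothetical, and silently ignoring a pid you never stored is our own design choice, not documented behavior.

```python
from typing import Any, Dict, Optional

# In-memory stand-in for "your own database": pid -> profile record.
Profiles = Dict[str, Dict[str, Any]]


def apply_delete(profiles: Profiles, line: Dict[str, Any]) -> Optional[str]:
    """Drop the profile named by a delete line and return its new_id, if any."""
    # pop() with a default does nothing when the pid was never stored locally.
    profiles.pop(line["pid"], None)
    # The re-logged visits will arrive later under this pid.
    return line.get("new_id")


profiles: Profiles = {
    "old-pid": {"properties": {"plan": "free"}, "actions": [{"name": "pageview"}]},
}
new_pid = apply_delete(profiles, {"type": "delete", "pid": "old-pid", "new_id": "new-pid"})
```

Nothing needs to be created for the new pid at this point; the visit lines that follow will create it through the normal find-or-create step.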

Deletes? (yes, deletes).

Deleting in a transactional database is usually done by writing a delete directive. In practice, this should never result in data being erased. Rather, whenever the data processing system reads relevant data after the delete, it treats the delete line as a directive to ignore a particular piece of data that was logged in the past. There are many reasons for this design in a transactional database, and Wikipedia or Stack Overflow will surely help you figure them out.
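To illustrate the general idea (this is a generic sketch of an append-only log, not a description of Woopra's actual storage engine), the snippet below keeps every entry in the log and applies delete directives only at read time. The "op" and "key" field names are invented for the illustration.

```python
from typing import Any, Dict, List

Entry = Dict[str, Any]


def visible_records(log: List[Entry]) -> List[Entry]:
    """Return the records that no later delete directive has nullified."""
    visible: List[Entry] = []
    for entry in log:
        if entry["op"] == "delete":
            # Ignore everything logged so far for this key; the log is untouched.
            visible = [r for r in visible if r["key"] != entry["key"]]
        else:
            visible.append(entry)
    return visible


log = [
    {"op": "write", "key": "a", "value": 1},
    {"op": "write", "key": "b", "value": 2},
    {"op": "delete", "key": "a"},
    {"op": "write", "key": "a", "value": 3},
]
current = visible_records(log)
```

The log still contains all four entries afterwards; only the reader's view changes, and data written for the same key after the delete remains visible.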

So why are there so many delete actions in our particular case? Well, there are often anonymous users who come back multiple times to interact with your site or product, or to reach you in other ways, but who are never identified. Each of these sessions would then be attributed to its own pid.

Consider this example:

DELETES EXAMPLE:

Let's say a user has three anonymous profiles--one from her desktop two weeks ago when she first discovered your website, one from her mobile phone where she clicked an advertisement a few days ago, and one on her work laptop a few minutes ago. She decides to create an account and identify herself today.

On her work laptop, she submits her information on your CTA form and gives you her email address. You, of course, forward that email address to Woopra to put on her profile. The export logs would now show a DELETE for the old anonymous profile id, and a new profile id would be created that is mapped to this email address AND to the cookie on her work laptop.

So now let's say this evening, the user in question is on the train home and decides to sign in with her new email/account from her phone. When she does this, the profile that had pointed to the phone's cookie will get a DELETE, and all of those actions will be re-logged onto the new pid that was generated when she gave us her email earlier today.

This is called a profile merge. The same thing will happen again when she logs in from her desktop. You can learn more about these concepts as they pertain to visitor identifiers in Woopra's Profile ID System document.

The database sometimes seems to have a mind of its own when it comes to deleting. For instance, the first identify in the previous example may not be followed by a delete at all, only by a new visitor property added to the visitor object in a "type: visit" line. Similarly, a merge event will often lead to both profiles being deleted and re-logged under a third profile.

There is no simple way to predict exact behavior, but your data is guaranteed to be consistent if you follow the rules of deleting when the system tells you to and trust that you will get the data re-logged.
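As a worked illustration of that rule, the sketch below replays a small, entirely made-up stream loosely modeled on the laptop and phone steps of the example above. The pids, the email address, the action names, and the "visitor" and "actions" key names are all hypothetical; real exports use UUIDs and may order or shape lines differently.

```python
import json
from typing import Any, Dict, Iterable


def replay(lines: Iterable[str]) -> Dict[str, Dict[str, Any]]:
    """Rebuild profiles by applying visit and delete lines in order."""
    profiles: Dict[str, Dict[str, Any]] = {}
    for raw in lines:
        line = json.loads(raw)
        if line["type"] == "delete":
            # Delete when told to; the data is re-logged by later visit lines.
            profiles.pop(line["pid"], None)
        elif line["type"] == "visit":
            profile = profiles.setdefault(
                line["pid"], {"properties": {}, "actions": []})
            profile["properties"].update(line.get("visitor", {}))
            profile["actions"].extend(line.get("actions", []))
    return profiles


stream = [
    '{"type": "visit", "pid": "anon-phone", "visitor": {}, "actions": [{"name": "ad_click"}]}',
    '{"type": "visit", "pid": "anon-laptop", "visitor": {}, "actions": [{"name": "pageview"}]}',
    '{"type": "delete", "pid": "anon-laptop", "new_id": "known-1"}',
    '{"type": "visit", "pid": "known-1", "visitor": {"email": "user@example.com"}, "actions": [{"name": "pageview"}, {"name": "signup"}]}',
    '{"type": "delete", "pid": "anon-phone", "new_id": "known-1"}',
    '{"type": "visit", "pid": "known-1", "visitor": {}, "actions": [{"name": "ad_click"}, {"name": "login"}]}',
]
result = replay(stream)
```

Both anonymous profiles are gone at the end, and every action lives under the single merged pid. Because re-logged actions keep their original timestamps, you would typically sort a profile's actions by timestamp after a replay like this.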