...

Type

Examples

Notes

time

{"time_column_sec" : 1448933490}

{"time_column_milli" : 1448933490000}

{"time_column_micro" : 1448933490000000}

{"time_column_with_format" : "2015-11-30 08:09:12"}

Dates can be given either as epoch time (in seconds, milliseconds, or microseconds) or as date strings parsed with a custom strptime format.

For a given column, the date format must be consistent in every event.
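
As a rough illustration of the time representations above, the following Python sketch converts each epoch unit and a formatted string into a datetime. The helper name `epoch_to_utc` and the format string `%Y-%m-%d %H:%M:%S` are assumptions chosen to match the examples. They are not part of Scuba.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Microseconds per unit, for the three epoch resolutions listed above.
MICROS_PER_UNIT = {"sec": 1_000_000, "milli": 1_000, "micro": 1}


def epoch_to_utc(value, unit):
    """Convert an integer epoch value in the given unit to a UTC datetime."""
    return EPOCH + timedelta(microseconds=value * MICROS_PER_UNIT[unit])


# The three epoch examples describe the same instant.
from_sec = epoch_to_utc(1448933490, "sec")
from_milli = epoch_to_utc(1448933490000, "milli")
from_micro = epoch_to_utc(1448933490000000, "micro")

# Assumed custom format for the string example (hypothetical, for illustration).
ASSUMED_FORMAT = "%Y-%m-%d %H:%M:%S"
from_string = datetime.strptime("2015-11-30 08:09:12", ASSUMED_FORMAT)
```

Whichever representation you choose, every event must use the same one for a given column.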

int

{"int_column" : 12345}

You can filter an int column for exact match or use the standard compare operators.

You can aggregate int columns (min, max, avg, median, 50%, 75%, 90%, 95%, 99%).

You can group by int columns. Configure this in the Settings tab.

We support ints up to 53 bits.

Make sure that integers are unquoted (12345) and strings are quoted ("12345", or hex values like "ABC42"). Otherwise, Scuba may treat a column as an integer when a string would have been more appropriate. Conversely, an unquoted integer value in JSON should sometimes be treated as a string (a ZIP code, for example) because it is data to group by rather than a number on which to perform mathematical operations.
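
The quoting rule can be checked before events are sent. The Python sketch below is a minimal example. The column names `zip_code` and `hex_id` are hypothetical, and `fits_in_53_bits` assumes that "up to 53 bits" refers to the magnitude of the value.

```python
import json


def fits_in_53_bits(n):
    """Assumption: 'up to 53 bits' means the magnitude needs at most 53 bits."""
    return abs(n).bit_length() <= 53


event = {
    "int_column": 12345,   # unquoted in JSON: a number
    "zip_code": "02139",   # quoted in JSON: a string to group by
    "hex_id": "ABC42",     # hex values must be quoted
}

# json.dumps keeps Python ints unquoted and Python strings quoted.
encoded = json.dumps(event)
```

Keeping the ZIP code as a Python string is what preserves both the quotes and the leading zero in the encoded event.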

decimal

{"decimal_column" : 12345.98}

Decimal values have the same properties as int values.

We support decimals up to 47 bits.

dollars

{"dollar_column" : "$12,345.98"}

Scuba auto-detects numeric strings with dollar signs ($) as decimal columns. 

For example, {"purchase_dollars": "$13,424.35"} would generate a decimal column that you can use just like any other decimal (with Sum, Avg, etc.).
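
To show how a dollar string maps to a numeric value, here is a small Python sketch. It is not Scuba's actual detection logic. The pattern and the function name `parse_dollars` are assumptions for illustration only.

```python
import re
from decimal import Decimal

# Assumed pattern: a dollar sign, digits with optional thousands separators,
# and an optional fractional part.
DOLLAR_PATTERN = re.compile(r"^\$(\d{1,3}(,\d{3})*|\d+)(\.\d+)?$")


def parse_dollars(text):
    """Return a Decimal for strings like "$12,345.98", or None if no match."""
    if not DOLLAR_PATTERN.match(text):
        return None
    # Drop the leading "$" and the thousands separators before converting.
    return Decimal(text[1:].replace(",", ""))
```

Once parsed this way, a value such as "$13,424.35" behaves like any other decimal for sums and averages.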

string

{"string_column" : "hello"}

You can filter a string column for exact match, starts with, ends with, or contains.

You can group by string columns.

We autocomplete most string columns when you type them into the UI (in a filter, for example).
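
The four string filter modes correspond to familiar string operations. The Python sketch below mirrors them. The operator names are labels chosen for this example, not Scuba query syntax.

```python
# Hypothetical operator names mapped to the matching Python string test.
STRING_FILTERS = {
    "exact": lambda value, operand: value == operand,
    "starts_with": lambda value, operand: value.startswith(operand),
    "ends_with": lambda value, operand: value.endswith(operand),
    "contains": lambda value, operand: operand in value,
}


def matches(value, op, operand):
    """Apply one of the four filter modes to a string value."""
    return STRING_FILTERS[op](value, operand)
```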

int_set

{"int_set_column" : [12345, 245, 99834]}

Int set columns are loaded from JSON arrays, but they have unordered set semantics (contains, does not contain).
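
Set semantics can be sketched in Python as follows. The helper names are hypothetical, and the example only illustrates the two supported questions (contains, does not contain) and the fact that element order carries no meaning.

```python
def contains(event, column, value):
    """True if the set column of this event contains the value."""
    return value in set(event[column])


def does_not_contain(event, column, value):
    """True if the set column of this event lacks the value."""
    return not contains(event, column, value)


event = {"int_set_column": [12345, 245, 99834]}
reordered = {"int_set_column": [99834, 12345, 245]}

# As sets, the two arrays are the same value even though the order differs.
same_set = set(event["int_set_column"]) == set(reordered["int_set_column"])
```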

string_set

{"string_set_column" : ["hello", "goodbye", "nice", "to", "see", "you"]}

String set columns are loaded from JSON arrays, but they have unordered set semantics (contains, does not contain).

Scuba imports a list of strings into a column of type "set". At query time, you can ask two questions of this column: does the set contain a particular string, or does it not? Using this type of column is a best practice for A/B testing, where each row is tagged with the test groups it belongs to.

Because the column is imported as a set, its elements are not ordered (a list, by contrast, would be ordered).

For example:

"tags":["fun","sports","beach"]

JSON object

{"column": {"a": 1, "b": "xxx"}}

Scuba will "flatten" a nested object using a dot-separated notation. The output would be:

{"column.a": 1, "column.b": "xxx"}
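
The flattening step can be sketched in Python like this. It is a minimal illustration of the dot-separated naming described above, not Scuba's import code, and the function name `flatten` is an assumption.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict with dot-separated keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the dotted prefix along.
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


flattened = flatten({"column": {"a": 1, "b": "xxx"}})
```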

Array of JSON objects

{"column": [{"a": 1,"b": "zzz"}, {"a": 2,"b": "yyy"}]}

The data import process will "shred" a nested array to produce multiple arrays of simple values (as opposed to a single array of objects). The intermediate output would be:

{"column": {"a": [1,2], "b": ["zzz","yyy"]}}

Based on the "flatten" logic above, the final output would be:

{"column.a": [1,2], "column.b": ["zzz","yyy"]}
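
The two steps (shred, then flatten) can be sketched in Python as below. The function names are hypothetical, and the sketch assumes every object in the array has the same keys. How Scuba handles objects with missing keys is not covered here.

```python
def shred(objects):
    """Turn a list of objects into one object of parallel lists."""
    shredded = {}
    for obj in objects:
        for key, value in obj.items():
            shredded.setdefault(key, []).append(value)
    return shredded


def flatten_one_level(obj):
    """Flatten {"column": {"a": ...}} into {"column.a": ...}."""
    flat = {}
    for outer, inner in obj.items():
        for key, value in inner.items():
            flat[f"{outer}.{key}"] = value
    return flat


event = {"column": [{"a": 1, "b": "zzz"}, {"a": 2, "b": "yyy"}]}

# Step 1: shred the array of objects into parallel arrays.
intermediate = {name: shred(items) for name, items in event.items()}

# Step 2: flatten the resulting nested object with dotted names.
final = flatten_one_level(intermediate)
```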

Identifiers

{"hex32_column" : "A2439GF88EA12315A2487GF88EA12312"}

{"hex32_column" : "e41249ed-2398-4c29-a6fa-ee81116dd302"}

While Scuba does support identifiers, we typically apply special handling to them before importing into the system. 

Contact your Scuba representative before sending new identifiers into the system.

You can filter or group by hex32 columns.

URL

{"url_column" : "https://www.mywebsite.com/landing/blue/"}

URL columns are split into multiple columns (domain, path, filename, etc.), separated at each "/" character. These columns are easier to manipulate separately.
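
A rough Python sketch of this kind of split, using the standard library's `urlsplit`. The output keys (`scheme`, `domain`, `path`, `segments`) are hypothetical names for illustration. The actual derived column names in Scuba may differ.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Break a URL into parts that are easier to filter and group by."""
    parts = urlsplit(url)
    # Split the path at each "/" and drop the empty leading/trailing pieces.
    segments = [segment for segment in parts.path.split("/") if segment]
    return {
        "scheme": parts.scheme,
        "domain": parts.netloc,
        "path": parts.path,
        "segments": segments,
    }


url_parts = split_url("https://www.mywebsite.com/landing/blue/")
```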

IP address

{"ip_column" : "127.0.0.1"}

IP address columns are parsed via a geoIP lookup to generate additional geographic information columns such as country and city.

User agent

{"user_agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_0)"}

User agent columns are split into multiple columns.

For web browser user agent strings, Scuba will add user-friendly columns for browser, platform, etc.

Working with large string data

...