Functions

and

The and function evaluates its two parameters and returns true if both parameters return true.

Parameters:

Returns:

boolean

Signature:

and(boolean, boolean)

In context example:

<cube name="customer_hits" groupby="$cust_id" aggregate="sum(1)" where="and(notNull($cust_id), gt($cost, 0))" />

In the example above the and function is used in the where clause to filter records aggregated by a cube. It's first parameter, notNull($cust_id'), returns true if the cust_id column is not null. The second parameter, gt($cost, 0), returns true if the cost column is greater than 0.

See also:

or | not | gt | notNull

contains

The contains function returns true if its first parameter contains the bytes in it's second parameter.

Parameters:

Returns:

boolean

Signature:

contains(bytes, bytes)

In context example:

<cube name="hits" groupby="$cust_id" aggregate="sum(1)" where="contains($search_terms, 'speed reading')" />

In the example above the contains function is used in the where clause to filter results aggregated by a cube. The contains function above returns true if the bytes in the search_terms column contains the string 'speed reading'.

See also:

streq

counter

The counter function maintains the state of a numeric counter in a transaction property. It is one of the four transaction property tracking functions: val, list, set and counter.

Parameters:

Returns:

a transaction property.

Signature:

counter(long, boolean)

In context example:

<aggregators>

      <aggregator grain="YYYYMMDD()">

         <transaction  key="$session_id"  time="$rec_time" timeout="1200">

             <properties>
                <property name="cust_id" value="val($cust_id, false)"/>
                <property name="cost"    value="counter($cost, true)"/>
             </properties>

             <cube name="customer_session" groupby="@cust_id" aggregate="sum(1)" where="gt(@cost, 1000)"/>

         </transaction>

      </aggregator>

</aggregators>

In the example above, the counter function is maintaining a counter for the cost transaction property. The cube is using the cost transaction property as part of the where clause.

This example cube is counting the number of transactions for each customer where the cost of the transaction is greater than 1000.

See section 4.2.6 Sessions & Transactions of the User Guide for full details on how the counter function is used during transaction tracking.

See also:

val | list | set

data

The data function returns the data from an indexed column of a load unit. This function is used to provide access to data in the load units during the load.

Parameter:

Returns:

int, long, bytes depending on the context of the call.

Signature:

data(0)

In context example:

<load name="cust_id" value="data(1)" />

In the example above, the data function is being used in a load directive. In this example the data function is returning the data from column 2 of a load unit. It will automatically return the data as an int, long or bytes depending how it is requested.

dictionary

The dictionary function returns a Dynamic Dictionary used for on-the-fly surrogate key replacement during the load. Dynamic Dictionaries are built automatically as data is loaded into a column. When new text values are added to the dictionary, a key is automatically created for the text by incrementing the dictionaries internal counter. Dynamic Dictionaries persist their text values and integer keys in on-disk indexes, which are stored within the same partition as the fact table they were assigned to.

Parameters:

Returns:

Dynamic Dictionary

Signature:

dictionary(string)

In context example:

<column name="url" type="int" size="2" dictionary="cache(dictionary('urls'), 30)"/>

In the example above, a dictionary named urls is assigned to the url fact table column. This dictionary will automatically replace the text data loaded into the url column with an integer key. The dictionary is wrapped in a cache to provide high performance access.

See section 2.4.1 Working With Dictionaries of the User Guide for more information on Dynamic Dictionaries.

See also:

cache | resource

duo

The duo function creates a two dimensional grouping key for the groupby attribute of a cube.

Parameters:

Returns:

key, wrapping its two parameter keys.

Signature:

duo(key, key)

In context example:

<cube name="url_bytes" groupby="duo($cust_id, $url)" aggregate="sum(1)"/>

In the example above, the duo function returns a two dimensional grouping key by wrapping the $cust_id and $url fact table columns.

sum | trio | quad

eq

The eq function returns true if its first parameter is numerically equal to it's second parameter.

Parameters:

Returns:

boolean

Signature:

eq(long, long)

In context example:

<cube name="cust" groupby="$cust_id" aggregate="sum(1)" where="eq($cost, 1000)" />

In the example above the eq function is used in a where clause to filter results that are aggregated by a cube. The eq function above returns true if the cost column equals to 1000.

See also:

gt | lt | | streq

gt

The gt function returns true if it's first parameter is greater then it's second parameter.

Parameters:

Returns:

boolean

Signature:

gt(long, long)

In context example:

<cube name="higher_cost_urls" groupby="$url" aggregate="sum(1)" where="gt($cost, 200)" />

In the example above the gt function is used in the where clause to filter results that are aggregated by a cube. The gt function above returns true if the cost column is greater the 200.

See also:

lt | eq

hourly

The hourly function is used to for creating hourly aggregator time grains. .

Parameters:

Returns:

The hourly long key.

Signature:

hourly(int, int)

In context example:

<aggregator grain="hourly(YYYYMMDD(), $hour)">>

In the example above the hourly function fuses the result of YYYYMMDD() and $hour to create single long key. The fused key will create an hourly time grain.

ip

The ip function returns an Algorithmic Dictionary that converts an IP address to a 4 byte int and back again.

Parameters:None

Returns:

An ip dictionary

Signature:

ip()

In context example:

<column name="ip" type="int" size="3" dictionary="ip()"/>

In the example above, an ip dictionary is being assigned to the ip column in a fact table. This dictionary will automatically replace ip addresses with a 4 byte integer as data is loaded in this column.

See section 2.4.1 Working With Dictionaries of the User Guide for more information on the ip dictionary.

list

The list function mantains a list of values in a transaction property. It is one of the four transaction property tracking functions: val, list, set and counter.

Parameters:

Returns:

a transaction property.

Signature:

list(key, int, boolean, boolean)

In context example:

<aggregators>

      <aggregator grain="YYYYMMDD()">

         <transaction  key="$session_id"  time="$rec_time" timeout="1200">

             <properties>
                <property name="path"    value="list($url, 5, false, true)"/>
             </properties>

             <cube name="customer_session" groupby="@path" aggregate="sum(1)"/>

         </transaction>

      </aggregator>

</aggregators>

In the example able above, the list function is maintaining a list of the url's accessed in the path transaction property. The cube is using the path transaction property to group by. The group by key returned by the path property, contains the list of urls gathered in a transaction as a single key for grouping.

See section 4.2.6 Sessions & Transactions of the User Guide for full details on how the list function is used during transaction tracking.

See also:

val | counter | set

lt

The lt function returns true if its first parameter is less than its second parameter.

Parameters:

Returns:

boolean

Signature:

lt(long, long)

In context example:

<cube name="cust" groupby="$cust_id" aggregate="sum(1)" where="lt($cost, 1000)" />

In the example above the lt function is used in a where clause to filter results that are aggregated by a cube. In the example above the lt function returns true if the cost column is greater than 1000.

See also:

gt | eq

not

The not function wraps a function that returns a boolean and returns the opposite boolean value.

Parameters:

Returns:

boolean

Signature:

not(boolean)

In context example:

<cube name="hits" groupby="$search_terms" aggregate="sum(1)" where="not(contains($search_terms, 'test'))" />

In the example above, the not function is used in a where clause to filter records aggregated by a cube. The not function is negating the result of the contains function.

notNull

The notNull function returns true if its parameter is not null. It is used to check if a column of the data type bytes is null.

Parameters:

Returns:

true if it's parameter is not null. If it is null it returns false.

Signature:

notNull(key)

In context example:

<cube name="hits;" groupby="$cust_id" aggregate="sum(1)" where="notNull($search_term)" />

In the example above, the notNull function is used in the where clause of the cube to filter records. The notNull function above returns true if the search_term column is not null.

or

The or function returns true if either of its parameters returns true.

Parameters:

Returns:

boolean

Signature:

or(boolean, boolean)

In context example:

<cube name="hits;" groupby="$url" aggregate="sum(1)" where="or(notNull($cust_id), gt($cost, 0))" />

In the example above the or function is used in the where clause to filter records aggregated by a cube. In this example or will return true if the cust_id column is not null or the cost column is greater than 0.

See also:

and | not

quad

The quad function creates a 4 dimensional groupby key for a cube.

Parameters:

Returns:

key, wrapping it's four parameter keys.

Signature:

quad(key, key, key, key)

In context example:

<cube name="url_bytes" groupby="quad($cust_id, $url, $ip, $action)" aggregate="sum(), 1)"/>

In the example above the quad function returns a four dimensional groupby key by wrapping the cust_id, url, ip and action columns.

duo | trio

resource

The resource function returns a read only, pre-built Resource Dictionary used for on-the-fly surrogate key replacement during the load. Resource Dictionaries are built from key/value pair files located in $AMBERSTONE_HOME/resources. Resource Dictionaries are used for fact table columns where all the text values of a column are known in advance. As data is loaded into a fact table column, a Resource Dictionary will automatically replace text values with their corresponding integer key found in it's pre-built dictionary.

Parameters:

Returns:

resource dictionary

Signature:

resource(string)

In context example:

<column name="action" type="int" size="2" dictionary="cache(resource('actions'), 30)"/>

In the example above, a resource dictionary named actions is assigned to the action column in the fact table. This resource dictionary will automatically replace the text data loaded into the action column with an integer key. The resource dictionary is wrapped in a cache to provide high performance access.

See section 2.4.1 Working With Dictionaries of the User Guide for more information on Resource Dictionaries.

See also:

cache | dictionary

scale

The scale function is used in load directives to transform a floating point number to an int or long by moving the decimal a specified number of places to the right. Precision further to the right is dropped.

Parameters:

Returns:

Either an int or long depending on the context.

Signatures:

scale(float, int)
scale(float, int, boolean)

In context example:

<load name="cost" value="scale(data(2), 4)" />

In the example load directive above, the scale function is scaling the return value from the data(2) function by moving the decimal 4 places to the right. The value is being assigned to the cost column. If the cost column is an int, the scale function returns an int, if its a long, it will return a long.

See also:

data

select

The select function is used to select a set of columns to read or extract from a fact table.

Parameters:

Returns:

result set

Signature:

select(column...)

In context example:

read sample "select($cust_id, $url)" 19980605 19980607

In the example above, the select function returns a result set with the cust_id and url columns.

set

The set function maintains a set of values in a transaction property. It is similar to the list function except that it keeps a list of unique values. It is one of the four transaction property tracking functions: val, list, set and counter.

Parameters:

Returns:

a transaction property.

Signature:

list(key, int, boolean, boolean)

In context example:

<aggregators>

      <aggregator grain="YYYYMMDD()">

         <transaction  key="$session_id"  time="$rec_time" timeout="1200">

             <properties>
                <property name="path"    value="set($url, 5, false, true)"/>
             </properties>

             <cube name="customer_session" groupby="@path" aggregate="sum(1)"/>

         </transaction>

      </aggregator>

</aggregators>

In the example above, the set function is maintaining a set of the unique url's accessed in the path transaction property. The cube is using the path transaction property to group by. The @path variable refers to the path transaction property, which contains a unique set of urls in the transaction. The cube above is keeping a count of all the different paths in the transaction.

See section 4.2.6 Sessions & Transactions of the User Guide for full details on how the set function is used during transaction tracking.

See also:

val | counter | list

slice

The slice function creates a sub array of bytes from a byte array. The slice function returns the sub array as bytes, or a parsed int or long depending on the context that its called.

Parameters:

Returns:

bytes, int or long depending on the context. If it returns an int or long, the numbers are parsed from the ascii data in the slice.

Signature:

slice(bytes, int, int)

In context example:

<load name="hour" value="slice(data(3), 11, 2)" />

In the example above, the slice function creates a sub array from the bytes returned by the data(3) function call. The sub array starts at the offset index 11 and is 2 bytes long.

streq

The streq function performs a string comparison on its two parameters and returns true if they are equal.

Parameters:

Returns:

boolean

Signature:

string(bytes, bytes)

In context example:

<cube name="hits;" groupby="$cust_id" aggregate="sum(1)" where="streq($search_terms, 'speed reading')" />

In the example above, the streq function is used in a where clause to filter results aggregated by a cube. The streq function above returns true if the bytes in the search_terms column equal 'speed reading'.

See also:

contains | eq

sum

The sum function is an aggregate function used in the cube definition. It maintains up to three summations for a set of grouping keys.

Parameters:

Returns:

void

Signature:

sum(long, [long], [long])

In context example:

<cube name="hits;" groupby="$cust_id" aggregate="sum($cost)" />

In the example above, the sum function is summing the cost column of a fact table. This cube will maintain a separate sum for each of the unique values of the groupby column, cust_id.

trio

The trio function creates a three dimensional groupby key for a cube.

Parameters:

Returns:

key, wrapping its three parameter keys.

Signature:

trio(key, key, key)

In context example:

<cube name="url_bytes" groupby="trio($cust_id, $url, $ip))" aggregate="sum(1)"/>

In the example above, the trio function returns a three dimensional groupby key by wrapping cust_id, url and ip.

duo | quad

unixTime

The unixTime function parses a text timestamp and returns an int value representing Unix Time. It is used for filtering records based on time.

Parameters:

Returns:

int, representing Unix Time.

Signature:

unixTime(string)

In context example:

<cube name="time_filter" groupby="$cust_id" aggregate="sum(1)" where="eq(unixTime('06/12/2012:03:12:12'), $rec_time)"/>

In the example above, the unixTime function is used in the where clause to limit the aggregation to include only records where the rec_time column equals '06/12/2012:03:12:12'. The rec_time column must be a time data type for the comparison to work.

eq

val

The val function mantains the state of a single value in a transaction property. It is one of the four transaction property tracking functions: val, list, set and counter.

Parameters:

Returns:

a transaction property.

Signature:

val(key, boolean)

In context example:

<aggregators>

      <aggregator grain="YYYYMMDD()">

         <transaction  key="$session_id"  time="$rec_time" timeout="1200">

             <properties>
                <property name="cust_id" value="val($cust_id, false)"/>
             </properties>

             <cube name="customer_session" groupby="@cust_id" aggregate="sum(1)""/>

         </transaction>

      </aggregator>

</aggregators>

In the example above the val function is maintaining the value of the cust_id in the transaction property named cust_id. The cube is then grouping by the cust_id transaction property. This example is counting the number of transactions for each customer.

See section 4.2.6 Sessions & Transactions of the User Guide for full details on how the val function is used during transaction tracking.

See also:

counter | list | set

YYYYMMDD

The YYYYMMDD function returns a grouping key for an aggregator time grain. It returns a 4 byte int in the format YYYYMMDD, representing the year (4 digit), month (2 digit) and day_of_month (2 digit).

Parameters:None

Returns:

int in the format YYYYMMDD

Signature:

YYYYMMDD()

In context example:

<aggregator grain="YYYYMMDD()">

In the example above, the YYYYMMDD function returns the time grain grouping key.

See also:

YYYYWWD

YYYYWWD

The YYYYWWD function returns a grouping key for an aggregator time grain. It returns a 4 byte int in the format YYYYWWD, representing the year (4 digit), week (2 digit) and day_of_week (1 digit).

Parameters:None

Returns:

int in the format YYYYWWDD

Signature:

YYYYWWD()

In context example:

<aggregator grain="YYYYWWD()">

In the example above, the YYYYWWD function returns the time grain grouping key.

See also:

YYYYMMDD