Functions
- and
- contains
- counter
- data
- dictionary
- duo
- eq
- gt
- hourly
- ip
- list
- lt
- not
- notNull
- or
- quad
- resource
- scale
- select
- set
- slice
- streq
- sum
- trio
- unixTime
- val
- YYYYMMDD
- YYYYWWD
and
The and function evaluates its two parameters and returns true if both parameters return true. Parameters:Returns:boolean Signature: and(boolean, boolean)
- boolean: Parameter 1 must evaluate to a boolean.
- boolean: Parameter 2 must evaluate to a boolean.
In context example: <cube name="customer_hits" groupby="$cust_id" aggregate="sum(1)" where="and(notNull($cust_id), gt($cost, 0))" /> In the example above the and function is used in the where clause to filter records aggregated by a cube. It's first parameter, notNull($cust_id'), returns true if the cust_id column is not null. The second parameter, gt($cost, 0), returns true if the cost column is greater than 0. See also: or | not | gt | notNull
contains
The contains function returns true if its first parameter contains the bytes in it's second parameter. Parameters:Returns:boolean Signature: contains(bytes, bytes)
- bytes: Parameter 1 must evaluate to an array of bytes that represent a string. This is the string that is being searched.
- bytes: Parameter 2 must evaluate to an array of bytes that represent a string. This is the string that is being searched for.
In context example: <cube name="hits" groupby="$cust_id" aggregate="sum(1)" where="contains($search_terms, 'speed reading')" /> In the example above the contains function is used in the where clause to filter results aggregated by a cube. The contains function above returns true if the bytes in the search_terms column contains the string 'speed reading'. See also: streq
counter
The counter function maintains the state of a numeric counter in a transaction property. It is one of the four transaction property tracking functions: val, list, set and counter. Parameters:Returns: a transaction property. Signature: counter(long, boolean)
- long: Parameter 1 must evaluate to a long, which is to be added to the counter each time it is updated.
- boolean: Parameter 2 must evaluate to either true or false. If it evaluates to true the counter is updated.
In context example:<aggregators> <aggregator grain="YYYYMMDD()"> <transaction key="$session_id" time="$rec_time" timeout="1200"> <properties> <property name="cust_id" value="val($cust_id, false)"/> <property name="cost" value="counter($cost, true)"/> </properties> <cube name="customer_session" groupby="@cust_id" aggregate="sum(1)" where="gt(@cost, 1000)"/> </transaction> </aggregator> </aggregators>In the example above, the counter function is maintaining a counter for the cost transaction property. The cube is using the cost transaction property as part of the where clause.
This example cube is counting the number of transactions for each customer where the cost of the transaction is greater than 1000. See section 4.2.6 Sessions & Transactions of the User Guide for full details on how the counter function is used during transaction tracking. See also: val | list | set
data
The data function returns the data from an indexed column of a load unit. This function is used to provide access to data in the load units during the load. Parameter:Returns: int, long, bytes depending on the context of the call. Signature: data(0)
- int: A number literal that refers to the indexed column of a load unit. Columns are indexed starting at 0.
In context example: <load name="cust_id" value="data(1)" /> In the example above, the data function is being used in a load directive. In this example the data function is returning the data from column 2 of a load unit. It will automatically return the data as an int, long or bytes depending how it is requested.
dictionary
The dictionary function returns a Dynamic Dictionary used for on-the-fly surrogate key replacement during the load. Dynamic Dictionaries are built automatically as data is loaded into a column. When new text values are added to the dictionary, a key is automatically created for the text by incrementing the dictionaries internal counter. Dynamic Dictionaries persist their text values and integer keys in on-disk indexes, which are stored within the same partition as the fact table they were assigned to. Parameters:Returns:Dynamic Dictionary Signature: dictionary(string)
- string: A string literal representing the name of the dictionary.
In context example: <column name="url" type="int" size="2" dictionary="cache(dictionary('urls'), 30)"/> In the example above, a dictionary named urls is assigned to the url fact table column. This dictionary will automatically replace the text data loaded into the url column with an integer key. The dictionary is wrapped in a cache to provide high performance access. See section 2.4.1 Working With Dictionaries of the User Guide for more information on Dynamic Dictionaries. See also: cache | resource
duo
The duo function creates a two dimensional grouping key for the groupby attribute of a cube. Parameters:Returns: key, wrapping its two parameter keys. Signature: duo(key, key)
- key: Parameter 1 must evaluate to a grouping key.
- key: Parameter 2 must evaluate to a grouping key.
In context example: <cube name="url_bytes" groupby="duo($cust_id, $url)" aggregate="sum(1)"/> In the example above, the duo function returns a two dimensional grouping key by wrapping the $cust_id and $url fact table columns. sum | trio | quad
eq
The eq function returns true if its first parameter is numerically equal to it's second parameter. Parameters:Returns:boolean Signature: eq(long, long)
- long: Parameter 1 must evaluate to a long.
- long: Parameter 2 must evaluate to a long.
In context example: <cube name="cust" groupby="$cust_id" aggregate="sum(1)" where="eq($cost, 1000)" /> In the example above the eq function is used in a where clause to filter results that are aggregated by a cube. The eq function above returns true if the cost column equals to 1000. See also: gt | lt | | streq
gt
The gt function returns true if it's first parameter is greater then it's second parameter. Parameters:Returns:boolean Signature: gt(long, long)
- long: Parameter 1 must evaluate to a long.
- long: Parameter 2 must evaluate to a long.
In context example: <cube name="higher_cost_urls" groupby="$url" aggregate="sum(1)" where="gt($cost, 200)" /> In the example above the gt function is used in the where clause to filter results that are aggregated by a cube. The gt function above returns true if the cost column is greater the 200. See also: lt | eq
hourly
The hourly function is used to for creating hourly aggregator time grains. . Parameters:Returns: The hourly long key. Signature: hourly(int, int)
- int: Parameter 1 evaluates to an int. This is one of the daily time grain functions: YYYYMMDD() or YYYYWWD().
- int: Parameter 2 evaluates to an int. This is the column in the fact table where the hour is stored for each record.
In context example: <aggregator grain="hourly(YYYYMMDD(), $hour)">> In the example above the hourly function fuses the result of YYYYMMDD() and $hour to create single long key. The fused key will create an hourly time grain.
ip
The ip function returns an Algorithmic Dictionary that converts an IP address to a 4 byte int and back again. Parameters:None Returns:An ip dictionary Signature: ip()
In context example: <column name="ip" type="int" size="3" dictionary="ip()"/> In the example above, an ip dictionary is being assigned to the ip column in a fact table. This dictionary will automatically replace ip addresses with a 4 byte integer as data is loaded in this column. See section 2.4.1 Working With Dictionaries of the User Guide for more information on the ip dictionary.
list
The list function mantains a list of values in a transaction property. It is one of the four transaction property tracking functions: val, list, set and counter. Parameters:Returns: a transaction property. Signature: list(key, int, boolean, boolean)
- key: Parameter 1 evaluates to a key, which is added to the list each time the transaction is updated.
- int: Parameter 2 is an integer literal that caps the size of the list.
- boolean: Parameter 3 is a boolean literal which stipulates if the list is to be sorted. If this is false, the list is kept in insertion order. If it is true, the list is sorted based on the hash value of its elements. Sorting the list will keep a predictable order if this is needed for aggregation.
- boolean: Parameter 4 evaluates to either true or false. If it evaluates to true the key is added to the list.
In context example:<aggregators> <aggregator grain="YYYYMMDD()"> <transaction key="$session_id" time="$rec_time" timeout="1200"> <properties> <property name="path" value="list($url, 5, false, true)"/> </properties> <cube name="customer_session" groupby="@path" aggregate="sum(1)"/> </transaction> </aggregator> </aggregators>In the example able above, the list function is maintaining a list of the url's accessed in the path transaction property. The cube is using the path transaction property to group by. The group by key returned by the path property, contains the list of urls gathered in a transaction as a single key for grouping.
See section 4.2.6 Sessions & Transactions of the User Guide for full details on how the list function is used during transaction tracking. See also: val | counter | set
lt
The lt function returns true if its first parameter is less than its second parameter. Parameters:Returns:boolean Signature: lt(long, long)
- long: Parameter 1 must evaluate to a long.
- long: Parameter 2 must evaluate to a long.
In context example: <cube name="cust" groupby="$cust_id" aggregate="sum(1)" where="lt($cost, 1000)" /> In the example above the lt function is used in a where clause to filter results that are aggregated by a cube. In the example above the lt function returns true if the cost column is greater than 1000. See also: gt | eq
not
The not function wraps a function that returns a boolean and returns the opposite boolean value. Parameters:Returns:boolean Signature: not(boolean)
- boolean: Parameter 1 evaluates to a boolean.
In context example: <cube name="hits" groupby="$search_terms" aggregate="sum(1)" where="not(contains($search_terms, 'test'))" /> In the example above, the not function is used in a where clause to filter records aggregated by a cube. The not function is negating the result of the contains function.
notNull
The notNull function returns true if its parameter is not null. It is used to check if a column of the data type bytes is null. Parameters:Returns:true if it's parameter is not null. If it is null it returns false. Signature: notNull(key)
- key: Parameter 1 must evaluate to a key that is to be checked for null.
In context example: <cube name="hits;" groupby="$cust_id" aggregate="sum(1)" where="notNull($search_term)" /> In the example above, the notNull function is used in the where clause of the cube to filter records. The notNull function above returns true if the search_term column is not null.
or
The or function returns true if either of its parameters returns true. Parameters:Returns:boolean Signature: or(boolean, boolean)
- boolean: Parameter 1 must evaluate to a boolean.
- boolean: Parameter 2 must evaluate to a boolean.
In context example: <cube name="hits;" groupby="$url" aggregate="sum(1)" where="or(notNull($cust_id), gt($cost, 0))" /> In the example above the or function is used in the where clause to filter records aggregated by a cube. In this example or will return true if the cust_id column is not null or the cost column is greater than 0. See also: and | not
quad
The quad function creates a 4 dimensional groupby key for a cube. Parameters:Returns: key, wrapping it's four parameter keys. Signature: quad(key, key, key, key)
- key: Parameter 1 evaluates to a grouping key.
- key: Parameter 2 evaluates to a grouping key.
- key: Parameter 3 evaluates to a grouping key.
- key: Parameter 4 evaluates to a grouping key.
In context example: <cube name="url_bytes" groupby="quad($cust_id, $url, $ip, $action)" aggregate="sum(), 1)"/> In the example above the quad function returns a four dimensional groupby key by wrapping the cust_id, url, ip and action columns. duo | trio
resource
The resource function returns a read only, pre-built Resource Dictionary used for on-the-fly surrogate key replacement during the load. Resource Dictionaries are built from key/value pair files located in $AMBERSTONE_HOME/resources. Resource Dictionaries are used for fact table columns where all the text values of a column are known in advance. As data is loaded into a fact table column, a Resource Dictionary will automatically replace text values with their corresponding integer key found in it's pre-built dictionary. Parameters:Returns:resource dictionary Signature: resource(string)
- string: A string literal representing the name of the resource dictionary. The resource dictionary will be built from a text file with that name in the $AMBERSTONE_HOME/resources directory.
In context example: <column name="action" type="int" size="2" dictionary="cache(resource('actions'), 30)"/> In the example above, a resource dictionary named actions is assigned to the action column in the fact table. This resource dictionary will automatically replace the text data loaded into the action column with an integer key. The resource dictionary is wrapped in a cache to provide high performance access. See section 2.4.1 Working With Dictionaries of the User Guide for more information on Resource Dictionaries. See also: cache | dictionary
scale
The scale function is used in load directives to transform a floating point number to an int or long by moving the decimal a specified number of places to the right. Precision further to the right is dropped. Parameters:Returns:Either an int or long depending on the context. Signatures: scale(float, int)
- float: Parameter 1 must evaluate to a set of bytes that can be parsed into a float. This could be a string literal such as '100.34' or more likely a call to the data function, which will return the bytes from a column in the load unit.
- int: Parameter 2 is a number literal describing the number of places to move the decimal point right.
- boolean (optional): Parameter 3 is a boolean literal that stipulates whether to round the last digit before dropping precision. Default false.
scale(float, int, boolean) In context example: <load name="cost" value="scale(data(2), 4)" /> In the example load directive above, the scale function is scaling the return value from the data(2) function by moving the decimal 4 places to the right. The value is being assigned to the cost column. If the cost column is an int, the scale function returns an int, if its a long, it will return a long. See also: data
select
The select function is used to select a set of columns to read or extract from a fact table. Parameters:Returns:result set Signature: select(column...)
- column...: A list of columns to read or extract.
In context example: read sample "select($cust_id, $url)" 19980605 19980607 In the example above, the select function returns a result set with the cust_id and url columns.
set
The set function maintains a set of values in a transaction property. It is similar to the list function except that it keeps a list of unique values. It is one of the four transaction property tracking functions: val, list, set and counter. Parameters:Returns: a transaction property. Signature: list(key, int, boolean, boolean)
- key: Parameter 1 evaluates to key, which is added to the set each time the transaction is updated. Only items that are not already in the set are added.
- int: Parameter 2 is an integer literal that caps the size of the set.
- boolean: Parameter 3 is a boolean literal which stipulates if the set is to be sorted. If this is false, the list is kept in insertion order. If it is true, the set is sorted based on the hash value of it's elements. Sorting the set will keep a predictable order if this is needed for aggregation.
- boolean: Parameter 4 evaluates to either true or false. If it evaluates to true, the key is added to the set.
In context example:<aggregators> <aggregator grain="YYYYMMDD()"> <transaction key="$session_id" time="$rec_time" timeout="1200"> <properties> <property name="path" value="set($url, 5, false, true)"/> </properties> <cube name="customer_session" groupby="@path" aggregate="sum(1)"/> </transaction> </aggregator> </aggregators>In the example above, the set function is maintaining a set of the unique url's accessed in the path transaction property. The cube is using the path transaction property to group by. The @path variable refers to the path transaction property, which contains a unique set of urls in the transaction. The cube above is keeping a count of all the different paths in the transaction.
See section 4.2.6 Sessions & Transactions of the User Guide for full details on how the set function is used during transaction tracking. See also: val | counter | list
slice
The slice function creates a sub array of bytes from a byte array. The slice function returns the sub array as bytes, or a parsed int or long depending on the context that its called. Parameters:Returns:bytes, int or long depending on the context. If it returns an int or long, the numbers are parsed from the ascii data in the slice. Signature: slice(bytes, int, int)
- bytes: Parameter 1 evaluates to an array of bytes the sub-array will be taken from.
- int: Parameter 2 is an integer literal that specifies the offset of the sub array.
- int: Parameter 3 is an integer literal that specifies the number of bytes to include in the sub array.
In context example: <load name="hour" value="slice(data(3), 11, 2)" /> In the example above, the slice function creates a sub array from the bytes returned by the data(3) function call. The sub array starts at the offset index 11 and is 2 bytes long.
streq
The streq function performs a string comparison on its two parameters and returns true if they are equal. Parameters:Returns:boolean Signature: string(bytes, bytes)
- bytes: Parameter 1 evaluates to an array of bytes that represent a string.
- bytes: Parameter 2 evaluates to an array of bytes that represent a string.
In context example: <cube name="hits;" groupby="$cust_id" aggregate="sum(1)" where="streq($search_terms, 'speed reading')" /> In the example above, the streq function is used in a where clause to filter results aggregated by a cube. The streq function above returns true if the bytes in the search_terms column equal 'speed reading'. See also: contains | eq
sum
The sum function is an aggregate function used in the cube definition. It maintains up to three summations for a set of grouping keys. Parameters:Returns:void Signature: sum(long, [long], [long])
- long: Parameter 1 evaluates to a long value that is added to the first summation
- long (optional): Parameter 2 evaluates to a long value that is added to the second summation
- long (optional): Parameter 3 evaluates to a long value that is added to the third summation
In context example: <cube name="hits;" groupby="$cust_id" aggregate="sum($cost)" /> In the example above, the sum function is summing the cost column of a fact table. This cube will maintain a separate sum for each of the unique values of the groupby column, cust_id.
trio
The trio function creates a three dimensional groupby key for a cube. Parameters:Returns: key, wrapping its three parameter keys. Signature: trio(key, key, key)
- key: Parameter 1 evaluates to a grouping key.
- key: Parameter 2 evaluates to a grouping key.
- key: Parameter 2 evaluates to a grouping key.
In context example: <cube name="url_bytes" groupby="trio($cust_id, $url, $ip))" aggregate="sum(1)"/> In the example above, the trio function returns a three dimensional groupby key by wrapping cust_id, url and ip. duo | quad
unixTime
The unixTime function parses a text timestamp and returns an int value representing Unix Time. It is used for filtering records based on time. Parameters:Returns: int, representing Unix Time. Signature: unixTime(string)
- string: A string literal timestamp in the following format: MM/DD/YYYY:hh:mm:ss.
In context example: <cube name="time_filter" groupby="$cust_id" aggregate="sum(1)" where="eq(unixTime('06/12/2012:03:12:12'), $rec_time)"/> In the example above, the unixTime function is used in the where clause to limit the aggregation to include only records where the rec_time column equals '06/12/2012:03:12:12'. The rec_time column must be a time data type for the comparison to work. eq
val
The val function mantains the state of a single value in a transaction property. It is one of the four transaction property tracking functions: val, list, set and counter. Parameters:Returns: a transaction property. Signature: val(key, boolean)
- key: Parameter 1 evaluates to a key that the val function stores in the transaction property.
- boolean: Parameter 2 is a boolean literal that determines if the key is overwritten as records are added to the transaction. If the value is true, then the val function will overwrite it's stored key each time the transaction is updated. If it's false the first key that is stored will not be overwritten.
In context example:<aggregators> <aggregator grain="YYYYMMDD()"> <transaction key="$session_id" time="$rec_time" timeout="1200"> <properties> <property name="cust_id" value="val($cust_id, false)"/> </properties> <cube name="customer_session" groupby="@cust_id" aggregate="sum(1)""/> </transaction> </aggregator> </aggregators>In the example above the val function is maintaining the value of the cust_id in the transaction property named cust_id. The cube is then grouping by the cust_id transaction property. This example is counting the number of transactions for each customer.
See section 4.2.6 Sessions & Transactions of the User Guide for full details on how the val function is used during transaction tracking. See also: counter | list | set
YYYYMMDD
The YYYYMMDD function returns a grouping key for an aggregator time grain. It returns a 4 byte int in the format YYYYMMDD, representing the year (4 digit), month (2 digit) and day_of_month (2 digit). Parameters:None Returns:int in the format YYYYMMDD Signature: YYYYMMDD()
In context example: <aggregator grain="YYYYMMDD()"> In the example above, the YYYYMMDD function returns the time grain grouping key. See also: YYYYWWD
YYYYWWD
The YYYYWWD function returns a grouping key for an aggregator time grain. It returns a 4 byte int in the format YYYYWWD, representing the year (4 digit), week (2 digit) and day_of_week (1 digit). Parameters:None Returns:int in the format YYYYWWDD Signature: YYYYWWD()
In context example: <aggregator grain="YYYYWWD()"> In the example above, the YYYYWWD function returns the time grain grouping key. See also: YYYYMMDD