Using CluedIn Rules in Microsoft Fabric

youtu.be/ePNiXonel8k?si=DrbFQUAy1-j5jXmO

CluedIn provides a GraphQL API for reading data and metadata and for executing actions. In this article, you will learn how to read the metadata of CluedIn rules in Microsoft Fabric and use it to work with data in OneLake.

To use CluedIn rules in Microsoft Fabric, create a notebook, install the dependencies, and get a CluedIn access token:

!pip install cluedin
!pip install jqqb_evaluator

import cluedin
    
ctx = cluedin.Context.from_dict({
    "domain": "cluedin.demo",
    "org_name": "foobar",
    "user_email": "admin@cluedin.demo",
    "user_password": "yourStrong(!)Password"
})
ctx.get_token()

Get all data part rules from CluedIn and list their names:

# get all data part rules
rules = cluedin.rules.get_rules(ctx)
# output rule names
list(map(lambda x: x['name'], rules['data']['management']['rules']['data']))
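
The nested path in that response can be wrapped in a small helper. The following is a minimal sketch that runs without a CluedIn connection: `sample_response` is a hypothetical stand-in that only mimics the response shape used above, and the IDs and names in it are made up.

```python
# Hypothetical sample that mimics the shape of the get_rules response
# used above; the IDs and names are invented for illustration.
sample_response = {
    "data": {
        "management": {
            "rules": {
                "data": [
                    {"id": "rule-1", "name": "Normalize Accounting"},
                    {"id": "rule-2", "name": "Normalize Software Dev"},
                ]
            }
        }
    }
}


def rule_summaries(response):
    """Return (id, name) pairs from a get_rules-style response."""
    items = response["data"]["management"]["rules"]["data"]
    return [(item["id"], item["name"]) for item in items]


summaries = rule_summaries(sample_response)
print(summaries)
```

With a live context, the same helper could be called as `rule_summaries(cluedin.rules.get_rules(ctx))`.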

To get a particular rule by its ID, use the following method:

# rule_id is the ID of an existing rule, such as one of the IDs returned by get_rules
cluedin.rules.get_rule(ctx, rule_id)

From a rule's conditions, you can create an Evaluator that checks whether a given object matches those conditions. In the following example, we take all data part rules from CluedIn, create a list of evaluators, and then check whether a test object matches at least one evaluator in the list:

# get the IDs of all data part rules
rule_ids = list(map(lambda x: x['id'], cluedin.rules.get_rules(ctx)['data']['management']['rules']['data']))
# get full rule data
rules = list(map(lambda rule_id: cluedin.rules.get_rule(ctx, rule_id), rule_ids))
# get a list of evaluators for all rules
evaluators = list(map(lambda rule: cluedin.rules.Evaluator(rule['data']['management']['rule']['condition']), rules))

# test object
obj = {
    'employee.job': 'Akkounting'
}
 
# returns True if the object matches at least one evaluator in the list
any(map(lambda evaluator: evaluator.object_matches_rules(obj), evaluators))
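
To build an intuition for what an evaluator does, here is a minimal pure-Python sketch of matching an object against a condition tree. It assumes a jQuery QueryBuilder-style structure (groups with `condition` and nested `rules`, leaves with `field`, `operator`, and `value`). The operator names and the `matches` function are illustrative assumptions, not the CluedIn payload format or the library's implementation.

```python
# Hypothetical condition tree: a group has a "condition" (AND/OR) and nested
# "rules"; a leaf has field/operator/value. The operator names are
# illustrative and are not CluedIn's actual identifiers.
condition = {
    "condition": "OR",
    "rules": [
        {"field": "employee.job", "operator": "equal", "value": "Akkounting"},
        {"field": "employee.job", "operator": "equal", "value": "Acounting"},
    ],
}

OPERATORS = {
    "equal": lambda actual, expected: actual == expected,
    "not_equal": lambda actual, expected: actual != expected,
}


def matches(node, obj):
    """Recursively evaluate a condition tree against a flat dictionary."""
    if "rules" in node:  # group node: combine the children with AND or OR
        results = (matches(child, obj) for child in node["rules"])
        return all(results) if node["condition"] == "AND" else any(results)
    compare = OPERATORS[node["operator"]]  # leaf node: compare a single field
    return compare(obj.get(node["field"]), node["value"])


print(matches(condition, {"employee.job": "Akkounting"}))  # True
print(matches(condition, {"employee.job": "CEO"}))         # False
```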

The evaluator's explain() method shows the evaluator's conditions as a pandas DataFrame.query expression:

# explain all evaluators
list(map(lambda evaluator: evaluator.explain(), evaluators))

Output:

[
    'df.query(\'`employee.job` == "Ackounting" | `employee.job` == "Acounting" | `employee.job` == "Akkounting" | `employee.job` == "aCoUnTiNg" | `employee.job` == "account ing" | `employee.job` == "accounting"\')',
    'df.query(\'`employee.job` == "Software Dev"\')',
    'df.query(\'`employee.job` == "Softwear Dev"\')'
]

With a few helper functions, you can transform your data using CluedIn rules:

def set_value_action(obj, field, val):
    """
    Set Value action: takes an object (obj), and sets its property (field) to a value (val).
    """
    obj[field] = val
    return obj

def get_action(action_json):
    """
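    Takes a rule action JSON and returns a function that applies the action to an object.
    Unsupported action types produce a function that returns the object unchanged.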
    """
    if action_json['type'] == 'CluedIn.Rules.Actions.SetValue, CluedIn.Rules':
        field = None
        val = None
        for prop in action_json['properties']:
            if prop['name'] == 'FieldName':
                field = prop['value']
            elif prop['name'] == 'Value':
                val = prop['value']
        return lambda obj: set_value_action(obj, field, val)

    print(f'Action "{action_json["type"]}" is not supported.')
    return lambda obj: obj

def get_actions_with_evaluators(rule):
    """
    For a given rule, yields dictionaries that pair each action with the rule's evaluator:
    {
      'action': lambda x: ...,
      'evaluator': ...
    }
    """
    for r in rule['data']['management']['rule']['rules']:
        for a in r['actions']:
            yield {
                'evaluator': cluedin.rules.Evaluator(rule['data']['management']['rule']['condition']),
                'action': get_action(a)
            }

# test an action on its own, without checking its evaluator
result = list(get_actions_with_evaluators(rules[0]))
result[0]['action']({ 'employee.job': 'CEO' })

def apply_actions(actions_with_evaluators, obj):
    """
    Given a list of action/evaluator pairs and an object (obj),
    applies each action whose evaluator matches the object.
    """
    for action_with_evaluator in actions_with_evaluators:
        if action_with_evaluator['evaluator'].object_matches_rules(obj):
            obj = action_with_evaluator['action'](obj)
    return obj

# test
actions_with_evaluators = [action_with_evaluator for rule in rules for action_with_evaluator in get_actions_with_evaluators(rule)]
apply_actions(actions_with_evaluators, { 'employee.job': 'Akkounting' })
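
To try this flow without a CluedIn connection, the following sketch swaps in a stub for the evaluator and a hand-written action JSON. `StubEvaluator`, `make_set_value`, `apply_pairs`, and the sample field values are all hypothetical. Only the action type string and the `FieldName`/`Value` property names come from the code above.

```python
# Hypothetical action JSON in the shape get_action expects;
# the field name and value are invented for illustration.
action_json = {
    "type": "CluedIn.Rules.Actions.SetValue, CluedIn.Rules",
    "properties": [
        {"name": "FieldName", "value": "employee.job"},
        {"name": "Value", "value": "Accounting"},
    ],
}


class StubEvaluator:
    """Stand-in for cluedin.rules.Evaluator: matches a fixed set of values."""

    def __init__(self, field, accepted):
        self.field = field
        self.accepted = set(accepted)

    def object_matches_rules(self, obj):
        return obj.get(self.field) in self.accepted


def make_set_value(action):
    """Build a function that applies a SetValue action to a copy of the object."""
    props = {p["name"]: p["value"] for p in action["properties"]}
    return lambda obj: {**obj, props["FieldName"]: props["Value"]}


pairs = [{
    "evaluator": StubEvaluator("employee.job", ["Akkounting", "Acounting"]),
    "action": make_set_value(action_json),
}]


def apply_pairs(pairs, obj):
    """Apply each action whose evaluator matches the object."""
    for pair in pairs:
        if pair["evaluator"].object_matches_rules(obj):
            obj = pair["action"](obj)
    return obj


fixed = apply_pairs(pairs, {"employee.job": "Akkounting"})
untouched = apply_pairs(pairs, {"employee.job": "CEO"})
print(fixed)
print(untouched)
```

Unlike set_value_action, this variant returns a copy instead of modifying the input in place, which can make the effect of each rule easier to inspect during testing.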

Now you can load CluedIn data into a DataFrame:

import pandas as pd

query = """
query searchEntities($cursor: PagingCursor, $query: String, $pageSize: Int) {
  search(
    query: $query
    sort: FIELDS
    cursor: $cursor
    pageSize: $pageSize
    sortFields: {field: "id", direction: ASCENDING}
  ) {
    cursor
    entries {
      id
      entityType
      properties
    }
  }
}
"""

df = pd.DataFrame(cluedin.gql.entries(ctx, query, { 'query': 'entityType:/Employee', 'pageSize': 10_000 }, flat=True))

df.head(20)

Then apply the rule actions to the matching records:

# apply rule actions to each row; the result is returned as a new DataFrame
df.apply(lambda row: apply_actions(actions_with_evaluators, row), axis=1)

Alternatively, filter the DataFrame down to the rows that match at least one evaluator:

def evaluate(row):
    return any(map(lambda evaluator: evaluator.object_matches_rules(row), evaluators))

df_filtered = df[df.apply(evaluate, axis=1)]
display(df_filtered)