Roman Klimenko - Data and Solution Architect, Lead Software Developer

youtu.be/ePNiXonel8k?si=DrbFQUAy1-j5jXmO

CluedIn provides a powerful GraphQL API for fetching data and metadata and for performing actions. In this article, you will learn how to read CluedIn rules' metadata in Microsoft Fabric and use it to work with data in OneLake.

To use CluedIn rules in Microsoft Fabric, create a notebook in Microsoft Fabric, install the dependencies, and get a CluedIn access token:

!pip install cluedin
!pip install jqqb_evaluator

import cluedin

ctx = cluedin.Context.from_dict({
    "domain": "cluedin.demo",
    "org_name": "foobar",
    "user_email": "admin@cluedin.demo",
    "user_password": "yourStrong(!)Password"
})
ctx.get_token()

Fetch all rules from CluedIn:

# get all data part rules
rules = cluedin.rules.get_rules(ctx)
# show the rule names
list(map(lambda x: x['name'], rules['data']['management']['rules']['data']))
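The nested indexing above can be wrapped in a small helper. The following is only a sketch: the response shape is inferred from the keys indexed in the snippet above, and the helper name `rule_names`, the sample IDs, and the sample rule names are hypothetical.

```python
def rule_names(response):
    """Return the rule names from a get_rules-style response dict."""
    return [rule["name"] for rule in response["data"]["management"]["rules"]["data"]]


# Hypothetical response, trimmed to the keys that the snippet above indexes into.
sample_response = {"data": {"management": {"rules": {"data": [
    {"id": "rule-1", "name": "Normalize Accounting"},
    {"id": "rule-2", "name": "Software Dev"},
]}}}}

names = rule_names(sample_response)
print(names)
```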

To fetch a specific rule by its ID, use the following method:

cluedin.rules.get_rule(ctx, rule_id)

From a rule's conditions, you can create an Evaluator that helps evaluate whether a given object matches the rule's conditions. In the following example, we take all data part rules from CluedIn, create a list of evaluators, and then test whether a test object matches at least one evaluator in the list:

# get the IDs of all data part rules
rule_ids = list(map(lambda x: x['id'], cluedin.rules.get_rules(ctx)['data']['management']['rules']['data']))
# get the full rule data for each ID
rules = list(map(lambda rule_id: cluedin.rules.get_rule(ctx, rule_id), rule_ids))
# build a list of evaluators, one per rule
evaluators = list(map(lambda rule: cluedin.rules.Evaluator(rule['data']['management']['rule']['condition']), rules))

# test object
obj = {
    'employee.job': 'Akkounting'
}

# returns True if the object matches at least one evaluator in the list
any(map(lambda evaluator: evaluator.object_matches_rules(obj), evaluators))
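To make the idea of matching an object against a condition more concrete, here is a conceptual sketch of how a condition tree could be evaluated. It assumes a jQuery QueryBuilder-style structure (groups with `condition` and `rules`, leaves with `field`, `operator`, and `value`). The actual CluedIn condition JSON and the internals of `cluedin.rules.Evaluator` may differ, and the function name `matches` is hypothetical.

```python
def matches(node, obj):
    """Recursively evaluate a QueryBuilder-style node against a flat dict."""
    if "rules" in node:
        # Group node: combine the child results with AND or OR.
        results = (matches(child, obj) for child in node["rules"])
        if node.get("condition", "AND").upper() == "AND":
            return all(results)
        return any(results)
    # Leaf node: compare a single field against a value.
    actual = obj.get(node["field"])
    if node["operator"] == "equal":
        return actual == node["value"]
    if node["operator"] == "not_equal":
        return actual != node["value"]
    raise ValueError(f"Unsupported operator: {node['operator']}")


# Assumed condition shape, for illustration only.
condition = {
    "condition": "OR",
    "rules": [
        {"field": "employee.job", "operator": "equal", "value": "Akkounting"},
        {"field": "employee.job", "operator": "equal", "value": "Acounting"},
    ],
}

print(matches(condition, {"employee.job": "Akkounting"}))
print(matches(condition, {"employee.job": "CEO"}))
```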

The Evaluator's explain() method helps you understand the conditions of a given evaluator. It outputs code in the form of a pandas DataFrame.query call:

# explain all evaluators
list(map(lambda evaluator: evaluator.explain(), evaluators))

Output:

[
    'df.query(\'`employee.job` == "Ackounting" | `employee.job` == "Acounting" | `employee.job` == "Akkounting" | `employee.job` == "aCoUnTiNg" | `employee.job` == "account ing" | `employee.job` == "accounting"\')',
    'df.query(\'`employee.job` == "Software Dev"\')',
    'df.query(\'`employee.job` == "Softwear Dev"\')'
]
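Because each explanation is a string containing a `df.query('...')` call, the inner expression can be pulled out and reused. The sketch below assumes the string always has exactly the form shown in the output above. The helper name `extract_query_expression` is hypothetical and is not part of the cluedin library.

```python
def extract_query_expression(explained: str) -> str:
    """Strip the surrounding df.query('...') wrapper and return the inner expression."""
    prefix, suffix = "df.query('", "')"
    if not (explained.startswith(prefix) and explained.endswith(suffix)):
        raise ValueError("Unexpected explain() format")
    return explained[len(prefix):-len(suffix)]


# One of the strings from the output above.
explained = 'df.query(\'`employee.job` == "Software Dev"\')'
expr = extract_query_expression(explained)
print(expr)
```

The resulting expression could then be passed to `DataFrame.query` on a DataFrame whose column names match the field names, for example `df.query(expr)`.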

With a few helper functions, you can transform your data with CluedIn rules:

def set_value_action(obj, field, val):
    """
    Set Value action: takes an object (obj), and sets its property (field) to a value (val).
    """
    obj[field] = val
    return obj

def get_action(action_json):
    """
    Takes a Rule Action JSON and returns a function that applies it to an object.
    Unsupported action types produce a no-op function.
    """
    if action_json['type'] == 'CluedIn.Rules.Actions.SetValue, CluedIn.Rules':
        field = None
        val = None
        for prop in action_json['properties']:
            if prop['name'] == 'FieldName':
                field = prop['value']
            elif prop['name'] == 'Value':
                val = prop['value']
        return lambda obj: set_value_action(obj, field, val)

    print(f'Action "{action_json["type"]}" is not supported.')
    return lambda obj: obj

def get_actions_with_evaluators(rule):
    """
    For a given rule, returns an iterable of objects containing an action and a corresponding evaluator:
    {
      'action': lambda x: ...,
      'evaluator': ...
    }
    """
    for r in rule['data']['management']['rule']['rules']:
        for a in r['actions']:
            yield {
                'evaluator': cluedin.rules.Evaluator(rule['data']['management']['rule']['condition']),
                'action': get_action(a)
            }

# test the action only (not the evaluator)
result = list(get_actions_with_evaluators(rules[0]))
result[0]['action']({ 'employee.job': 'CEO' })

def apply_actions(actions_with_evaluators, obj):
    """
    Given a list of action/evaluator pairs and an object (obj),
    apply each action to the object if it passes the corresponding evaluator.
    """
    for action_with_evaluator in actions_with_evaluators:
        if action_with_evaluator['evaluator'].object_matches_rules(obj):
            obj = action_with_evaluator['action'](obj)
    return obj

# test
actions_with_evaluators = [action_with_evaluator for rule in rules for action_with_evaluator in get_actions_with_evaluators(rule)]
apply_actions(actions_with_evaluators, { 'employee.job': 'Akkounting' })
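To try this flow without a CluedIn instance, the evaluator can be replaced with a stub. The sketch below is self-contained, so it repeats condensed versions of the helpers. `StubEvaluator`, `make_set_value`, and the target value "Accounting" are hypothetical. Only the action JSON shape (`type`, `properties` with `FieldName` and `Value`) is taken from the code above.

```python
class StubEvaluator:
    """Hypothetical stand-in for cluedin.rules.Evaluator:
    matches when the field holds one of the given values."""

    def __init__(self, field, values):
        self.field = field
        self.values = set(values)

    def object_matches_rules(self, obj):
        return obj.get(self.field) in self.values


def make_set_value(action_json):
    """Condensed get_action that handles the SetValue action type only."""
    props = {p["name"]: p["value"] for p in action_json["properties"]}
    field, val = props["FieldName"], props["Value"]

    def action(obj):
        obj[field] = val
        return obj

    return action


def apply_actions(pairs, obj):
    """Apply each action whose evaluator matches the object."""
    for pair in pairs:
        if pair["evaluator"].object_matches_rules(obj):
            obj = pair["action"](obj)
    return obj


action_json = {
    "type": "CluedIn.Rules.Actions.SetValue, CluedIn.Rules",
    "properties": [
        {"name": "FieldName", "value": "employee.job"},
        {"name": "Value", "value": "Accounting"},  # hypothetical target value
    ],
}
pairs = [{
    "evaluator": StubEvaluator("employee.job", ["Akkounting", "Acounting"]),
    "action": make_set_value(action_json),
}]

fixed = apply_actions(pairs, {"employee.job": "Akkounting"})
untouched = apply_actions(pairs, {"employee.job": "CEO"})
print(fixed, untouched)
```

Note that the action mutates the object it receives. Pass a copy if the original must stay unchanged.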

Now we can load CluedIn data into a DataFrame:

import pandas as pd

query = """
query searchEntities($cursor: PagingCursor, $query: String, $pageSize: Int) {
  search(
    query: $query
    sort: FIELDS
    cursor: $cursor
    pageSize: $pageSize
    sortFields: {field: "id", direction: ASCENDING}
  ) {
    cursor
    entries {
      id
      entityType
      properties
    }
  }
}
"""

df = pd.DataFrame(cluedin.gql.entries(ctx, query, { 'query': 'entityType:/Employee', 'pageSize': 10_000 }, flat=True))

df.head(20)

And apply the rule actions to matching records:

# apply rule actions to every row of the data frame;
# apply() returns a new DataFrame, so assign the result if you want to keep it
df.apply(lambda row: apply_actions(actions_with_evaluators, row), axis=1)

Alternatively, you can filter your DataFrame with the evaluators:

def evaluate(row):
    return any(map(lambda evaluator: evaluator.object_matches_rules(row), evaluators))

df_filtered = df[df.apply(evaluate, axis=1)]
display(df_filtered)