Using CluedIn Rules in Microsoft Fabric
youtu.be/ePNiXonel8k?si=DrbFQUAy1-j5jXmO
CluedIn provides a powerful GraphQL API for retrieving data and metadata and for performing actions. In this article, you will learn how to read CluedIn rule metadata in Microsoft Fabric and use it to work with data in OneLake.
To use CluedIn rules in Microsoft Fabric, create a notebook in Microsoft Fabric, install the dependencies, and get a CluedIn access token:
!pip install cluedin
!pip install jqqb_evaluator
import cluedin
ctx = cluedin.Context.from_dict({
    "domain": "cluedin.demo",
    "org_name": "foobar",
    "user_email": "admin@cluedin.demo",
    "user_password": "yourStrong(!)Password"
})
ctx.get_token()
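A password typed directly into a notebook cell is easy to leak. As a minimal sketch, the dictionary passed to cluedin.Context.from_dict could instead be built from environment variables. The helper name build_context_config and the variable names (CLUEDIN_DOMAIN and so on) are hypothetical and chosen only for illustration; the dictionary keys are the ones used in the snippet above.

```python
import os


def build_context_config(env=os.environ):
    """Build the dict for cluedin.Context.from_dict from environment variables.

    The environment variable names are assumptions made for this sketch.
    """
    required = {
        "domain": "CLUEDIN_DOMAIN",
        "org_name": "CLUEDIN_ORG_NAME",
        "user_email": "CLUEDIN_USER_EMAIL",
        "user_password": "CLUEDIN_USER_PASSWORD",
    }
    # Fail early with a readable message if anything is missing
    missing = [var for var in required.values() if var not in env]
    if missing:
        raise KeyError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in required.items()}
```

With the variables set in the notebook environment, the context could then be created with cluedin.Context.from_dict(build_context_config()).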
Fetch all rules from CluedIn:
# fetch all data part rules
rules = cluedin.rules.get_rules(ctx)
# show the rule names
list(map(lambda x: x['name'], rules['data']['management']['rules']['data']))
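The lookup above walks a nested response to reach the list of rules. The sketch below shows the same navigation on a hand-written sample, using a list comprehension instead of map and lambda. Only the nesting (data, management, rules, data) and the id and name keys come from the snippets in this article; the sample values and the helper name rule_names are made up for illustration.

```python
def rule_names(response):
    """Return the rule names from a get_rules-style response dict."""
    return [rule["name"] for rule in response["data"]["management"]["rules"]["data"]]


# Hand-written sample; only the nesting mirrors the lookup in the article.
sample_response = {
    "data": {
        "management": {
            "rules": {
                "data": [
                    {"id": "rule-1", "name": "Normalize accounting"},
                    {"id": "rule-2", "name": "Normalize software dev"},
                ]
            }
        }
    }
}

print(rule_names(sample_response))
# ['Normalize accounting', 'Normalize software dev']
```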
To fetch a specific rule by its ID, use the following method:
cluedin.rules.get_rule(ctx, rule_id)
From a rule's conditions, you can create an Evaluator that checks whether a given object matches those conditions. The following example fetches all data part rules from CluedIn, creates a list of evaluators, and then tests whether a test object matches at least one evaluator in the list:
# fetch the IDs of all data part rules
rule_ids = list(map(lambda x: x['id'], cluedin.rules.get_rules(ctx)['data']['management']['rules']['data']))
# fetch the full rule data
rules = list(map(lambda rule_id: cluedin.rules.get_rule(ctx, rule_id), rule_ids))
# create a list of evaluators, one per rule
evaluators = list(map(lambda rule: cluedin.rules.Evaluator(rule['data']['management']['rule']['condition']), rules))
# test object
obj = {
    'employee.job': 'Akkounting'
}
# returns True if the object matches at least one evaluator in the list
any(map(lambda evaluator: evaluator.object_matches_rules(obj), evaluators))
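To build intuition for what an evaluator does, here is a deliberately simplified sketch that checks a dictionary against a small condition tree. The tree layout (a group with "condition" and "rules", and leaves with "field" and "value") and the function matches are assumptions made for illustration. They are neither the CluedIn condition schema nor the implementation of cluedin.rules.Evaluator, and only equality checks are supported here.

```python
def matches(condition, obj):
    """Recursively evaluate a simplified condition tree against a dict."""
    if "rules" in condition:
        # Group node: combine the child results with AND or OR
        results = (matches(child, obj) for child in condition["rules"])
        if condition["condition"] == "AND":
            return all(results)
        if condition["condition"] == "OR":
            return any(results)
        raise ValueError(f"Unsupported group condition: {condition['condition']}")
    # Leaf node: this sketch only supports equality
    return obj.get(condition["field"]) == condition["value"]


# Hypothetical condition: the job title is one of two spellings
sample_condition = {
    "condition": "OR",
    "rules": [
        {"field": "employee.job", "value": "Akkounting"},
        {"field": "employee.job", "value": "accounting"},
    ],
}

print(matches(sample_condition, {"employee.job": "Akkounting"}))  # True
print(matches(sample_condition, {"employee.job": "CEO"}))         # False
```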
The evaluator's explain() method helps you understand the conditions of a given evaluator. It outputs code in the form of a pandas DataFrame.query call:
# explain all evaluators
list(map(lambda evaluator: evaluator.explain(), evaluators))
Output:
[
    'df.query(\'`employee.job` == "Ackounting" | `employee.job` == "Acounting" | `employee.job` == "Akkounting" | `employee.job` == "aCoUnTiNg" | `employee.job` == "account ing" | `employee.job` == "accounting"\')',
    'df.query(\'`employee.job` == "Software Dev"\')',
    'df.query(\'`employee.job` == "Softwear Dev"\')'
]
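Each string in this output is a piece of Python source, not a bare query expression. If you want only the expression, for example to pass it to DataFrame.query yourself, one option is to parse the string with the standard ast module rather than slicing it by hand. The helper query_expression below is a hypothetical sketch that assumes the string has exactly the df.query('...') shape shown above.

```python
import ast


def query_expression(explained):
    """Extract the expression string from a "df.query('...')" snippet."""
    call = ast.parse(explained, mode="eval").body
    is_single_string_call = (
        isinstance(call, ast.Call)
        and len(call.args) == 1
        and isinstance(call.args[0], ast.Constant)
        and isinstance(call.args[0].value, str)
    )
    if not is_single_string_call:
        raise ValueError("expected a call with exactly one string argument")
    return call.args[0].value


explained = 'df.query(\'`employee.job` == "Software Dev"\')'
print(query_expression(explained))
# `employee.job` == "Software Dev"
```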
With a few helper functions, you can transform your data with CluedIn rules:
def set_value_action(obj, field, val):
    """
    Set Value action: takes an object (obj), and sets its property (field) to a value (val).
    """
    obj[field] = val
    return obj

def get_action(action_json):
    """
    Takes a Rule Action JSON, and returns a lambda
    """
    if action_json['type'] == 'CluedIn.Rules.Actions.SetValue, CluedIn.Rules':
        field = None
        val = None
        for prop in action_json['properties']:
            if prop['name'] == 'FieldName':
                field = prop['value']
            elif prop['name'] == 'Value':
                val = prop['value']
        return lambda obj: set_value_action(obj, field, val)
    # any other action type is reported and replaced by a no-op
    print(f'Action "{action_json["type"]}" is not supported.')
    return lambda obj: obj
def get_actions_with_evaluators(rule):
    """
    For a given rule, returns an iterable of objects containing an action and a corresponding evaluator:
    {
        'action': lambda x: ...,
        'evaluator': ...
    }
    """
    for r in rule['data']['management']['rule']['rules']:
        for a in r['actions']:
            yield {
                'evaluator': cluedin.rules.Evaluator(rule['data']['management']['rule']['condition']),
                'action': get_action(a)
            }

# test the action only (the evaluator is not checked here)
result = list(get_actions_with_evaluators(rules[0]))
result[0]['action']({ 'employee.job': 'CEO' })
def apply_actions(actions_with_evaluators, obj):
    """
    Given a list of actions with evaluators pairs and an object (obj),
    apply action to the object if it passes the corresponding evaluator.
    """
    for action_with_evaluator in actions_with_evaluators:
        if action_with_evaluator['evaluator'].object_matches_rules(obj):
            obj = action_with_evaluator['action'](obj)
    return obj

# test
actions_with_evaluators = [
    action_with_evaluator
    for rule in rules
    for action_with_evaluator in get_actions_with_evaluators(rule)
]
apply_actions(actions_with_evaluators, { 'employee.job': 'Akkounting' })
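The helpers above need a live CluedIn connection to produce evaluators. To try the apply logic offline, you could swap in a stand-in object that exposes the same object_matches_rules method. The StubEvaluator class below is hypothetical and exists only for this sketch. set_value_action and apply_actions are repeated from above so that the block runs on its own, and the target value "Accounting" is an invented example.

```python
class StubEvaluator:
    """Hypothetical stand-in that mimics the object_matches_rules method."""

    def __init__(self, field, accepted):
        self.field = field
        self.accepted = set(accepted)

    def object_matches_rules(self, obj):
        return obj.get(self.field) in self.accepted


def set_value_action(obj, field, val):
    obj[field] = val
    return obj


def apply_actions(actions_with_evaluators, obj):
    for pair in actions_with_evaluators:
        if pair["evaluator"].object_matches_rules(obj):
            obj = pair["action"](obj)
    return obj


# One evaluator/action pair: fix two misspellings of a job title
pairs = [
    {
        "evaluator": StubEvaluator("employee.job", ["Akkounting", "Acounting"]),
        "action": lambda obj: set_value_action(obj, "employee.job", "Accounting"),
    }
]

print(apply_actions(pairs, {"employee.job": "Akkounting"}))  # {'employee.job': 'Accounting'}
print(apply_actions(pairs, {"employee.job": "CEO"}))         # {'employee.job': 'CEO'}
```

Note that the pairs are processed in order and each evaluator sees the object as modified by earlier actions, so the order of the list can affect the result.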
Now we can load CluedIn data into a DataFrame:
import pandas as pd
query = """
    query searchEntities($cursor: PagingCursor, $query: String, $pageSize: Int) {
      search(
        query: $query
        sort: FIELDS
        cursor: $cursor
        pageSize: $pageSize
        sortFields: {field: "id", direction: ASCENDING}
      ) {
        cursor
        entries {
          id
          entityType
          properties
        }
      }
    }
"""
df = pd.DataFrame(cluedin.gql.entries(ctx, query, { 'query': 'entityType:/Employee', 'pageSize': 10_000 }, flat=True))
df.head(20)
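The query accepts a cursor and a pageSize and returns a cursor next to the entries, which is the usual shape of cursor-based paging. This article does not describe how cluedin.gql.entries handles paging internally, so the sketch below only illustrates the general pattern. The names iter_entries, fetch_page, and make_fake_fetch, and the convention that a missing cursor marks the last page, are assumptions for illustration.

```python
def iter_entries(fetch_page, page_size):
    """Yield entries page by page until no further cursor is returned."""
    cursor = None
    while True:
        page = fetch_page(cursor, page_size)
        yield from page["entries"]
        cursor = page.get("cursor")
        # Stop when there is no next cursor or the page came back empty
        if not cursor or not page["entries"]:
            break


def make_fake_fetch(records):
    """Build an in-memory stand-in for a paged API (hypothetical)."""
    def fetch_page(cursor, page_size):
        start = int(cursor) if cursor else 0
        end = start + page_size
        next_cursor = str(end) if end < len(records) else None
        return {"cursor": next_cursor, "entries": records[start:end]}
    return fetch_page


records = [{"id": i} for i in range(5)]
print([entry["id"] for entry in iter_entries(make_fake_fetch(records), 2)])
# [0, 1, 2, 3, 4]
```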
And apply the rule actions to matching records:
# apply the rule actions to a DataFrame
df.apply(lambda row: apply_actions(actions_with_evaluators, row), axis=1)
Alternatively, you can filter your DataFrame with the evaluators:
def evaluate(row):
    return any(map(lambda evaluator: evaluator.object_matches_rules(row), evaluators))
df_filtered = df[df.apply(evaluate, axis=1)]
display(df_filtered)