CluedIn Python SDK 3.0.0 - Roman Klimenko

CluedIn Python SDK 3.0.0 drops support for deprecated Python versions, so the minimum supported version is now Python 3.10.
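If you maintain scripts that may still run on older interpreters, a small guard can fail fast before the SDK is imported. This is a minimal sketch; the helper name `meets_minimum_python` is hypothetical and not part of the SDK.

```python
import sys

# Minimum Python version stated in these release notes.
MINIMUM_PYTHON = (3, 10)


def meets_minimum_python(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or current) interpreter is at least `minimum`.

    Hypothetical helper for illustration; not part of the cluedin package.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); tuples compare element by element.
    return tuple(version_info[:2]) >= tuple(minimum)


print(meets_minimum_python())
```

Call it at the top of an ingestion script and exit with a clear message when it returns `False`, instead of letting an import or syntax error surface later.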

The most interesting change, however, is the new support for CluedIn ingestion endpoints. It simplifies the most common data-ingestion pattern in CluedIn and applies best practices by default. As the snippet shows, you can send a list, a stream (generator), or a pandas DataFrame (converted with `df.to_dict('records')`); the SDK splits the data into batches for you and returns the response for each batch.

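import os

import cluedin

# Read the credentials from the environment (the variable names are
# placeholders; use whatever your deployment provides).
API_TOKEN = os.environ['CLUEDIN_API_TOKEN']
ENDPOINT_URL = os.environ['CLUEDIN_ENDPOINT_URL']

ctx = cluedin.Context.from_jwt(API_TOKEN)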

# 1. ingest a simple list
data = [
    {'id': 1, 'name': 'foo'},
    {'id': 2, 'name': 'bar'},
    # HINT: you can have millions of records here
]

for processed_batch in cluedin.ingestion.post(ctx, ENDPOINT_URL, data):
    # HINT: cluedin.ingestion.post splits the data into batches
    #   and posts them one batch at a time;
    #   if you break out of the loop, the next batch is not posted.
    # HINT: processed_batch contains the records posted in the batch
    #   and the ingestion endpoint's response, including the receiptId.
    print(processed_batch['response'])
    
# 2. stream data with generators
def stream_data():
    for i in range(1_000_000):
        yield {'id': i, 'email': f'user{i}@cluedin.com'}

for processed_batch in cluedin.ingestion.post(ctx, ENDPOINT_URL, stream_data()):
    print(processed_batch['response'])
    
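# 3. pandas
import pandas as pd

# Build a DataFrame; here we simply reuse the list from example 1.
df = pd.DataFrame(data)

# to_dict('records') turns the DataFrame into a list of dicts, one per row.
for processed_batch in cluedin.ingestion.post(ctx, ENDPOINT_URL, df.to_dict('records')):
    print(processed_batch['response'])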
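The batching itself happens inside `cluedin.ingestion.post`. The following is not the SDK's code; it is a minimal, standard-library-only illustration (with a hypothetical `batched_records` helper and an arbitrary batch size of 1,000) of how a list or a generator can be split into batches lazily, and why breaking out of the loop stops any further work.

```python
from itertools import islice


def batched_records(records, batch_size):
    """Yield lists of up to `batch_size` records from any iterable, lazily.

    Hypothetical helper: illustrates the batching idea, not the SDK's
    actual implementation.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        # islice pulls at most batch_size items and never reads ahead.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def stream_data(n):
    # Same record shape as the generator example; produced on demand.
    for i in range(n):
        yield {'id': i, 'email': f'user{i}@cluedin.com'}


# Lists and generators are handled the same way.
print([len(b) for b in batched_records([{'id': 1}, {'id': 2}, {'id': 3}], 2)])  # [2, 1]

# Record which ids were actually generated.
pulled = []


def tracked(n):
    for record in stream_data(n):
        pulled.append(record['id'])
        yield record


for batch in batched_records(tracked(1_000_000), 1_000):
    break  # stop after the first batch

# Only the first batch was ever generated; the rest of the stream was untouched.
print(len(pulled))  # 1000
```

On Python 3.12 and later, `itertools.batched(iterable, n)` offers the same lazy chunking (yielding tuples rather than lists); the hand-written loop keeps the sketch compatible with Python 3.10 and 3.11.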

What's Changed

Full Changelog: https://github.com/romaklimenko/cluedin/compare/2.6.0...3.0.0