Microsoft Fabric – Ingest XML into Lakehouse


Picture the scene: you're migrating to Microsoft Fabric and have a number of pipelines that need to ingest XML from an API. Your existing tools do this with no trouble, so for a technology that sells itself as an end-to-end analytics platform, this should be well within its wheelhouse.

You've created the pipeline and set up the source, but when you come to configure the Lakehouse sink, shock horror: XML is not a supported file format.

This is where it gets a bit fuzzy…

The Azure Data Factory documentation (Copy and transform data in Microsoft Fabric Lakehouse – Azure Data Factory & Azure Synapse | Microsoft Learn) states that XML is not a supported output format.

However, the Microsoft Fabric documentation at https://learn.microsoft.com/en-us/fabric/data-factory/connector-lakehouse-copy-activity currently lists it as an option.

Regardless of which page is correct, here is a workaround I found.

Workaround

The workaround is fairly straightforward: a pipeline with a Web Activity obtains the XML output, then passes it to a Notebook that writes the file to storage.

Web Activity

Setting up a Web Activity is straightforward; just make sure the output is what you're expecting.
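As a sanity check on "what you're expecting": the raw body comes back as a string in the `Response` property of the Web Activity's output. A minimal sketch of the shape the pipeline sees (the payload and the `StatusCode` field below are illustrative assumptions):

```python
# Hypothetical Web Activity output, as the pipeline sees it.
# The XML body is returned as a string in the "Response" property.
web_output = {
    "Response": '<catalog><item id="1">example</item></catalog>',
    "StatusCode": 200,  # assumption: shown for illustration only
}

# This is what @activity('Web1').output.Response resolves to.
xml_content = web_output["Response"]
```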

Notebook

Set up the notebook as follows.

Add the target Lakehouse to the Notebook.

The first cell is where we set up two parameters to receive the file name (assuming it's going to change between runs) and the XML content. These values are overridden when the pipeline runs. To turn a cell into a parameter cell, use the cell menu as follows.
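For illustration, the parameter cell might look like this; the default values are placeholders only used when running the notebook interactively, and are replaced by the pipeline's Base Parameters at run time:

```python
# Parameter cell (toggled via the notebook cell menu).
# Defaults apply only to interactive runs; the pipeline's
# Base Parameters override them.
FILE_NAME = "Files/OutputFromWebTask.xml"
XML_CONTENT = "<root/>"
```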

The second cell has a single command that writes the content to the Lakehouse (the final True overwrites any existing file):

mssparkutils.fs.put(FILE_NAME, XML_CONTENT, True)
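If the API can return error payloads or malformed XML, it may be worth validating the content before writing it. A minimal sketch using the Python standard library (the variable name mirrors the notebook parameter; the sample content is illustrative):

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(content: str) -> bool:
    """Return True if content parses as well-formed XML."""
    try:
        ET.fromstring(content)
        return True
    except ET.ParseError:
        return False

XML_CONTENT = "<catalog><item>example</item></catalog>"
if is_well_formed_xml(XML_CONTENT):
    # In the notebook, this is where mssparkutils.fs.put(...) would run.
    pass
```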

Add a Notebook Activity to your pipeline and set up the Base Parameters as follows.

In this example I've set up the dynamic content as

@activity('Web1').output.Response

and

@concat('Files/OutputFromWebTask.xml')
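Put together, the Base Parameters on the Notebook Activity map onto the two notebook parameters like this (the names must match the parameter cell exactly, including case):

```
XML_CONTENT : @activity('Web1').output.Response
FILE_NAME   : @concat('Files/OutputFromWebTask.xml')
```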

Once the pipeline is run, a single file is output.

Gotchas

A couple of gotchas I came across:

Parameter names in Notebooks and Notebook Activities are case sensitive.

Don't forget the Files/ folder prefix in the FILE_NAME parameter.

Final Thoughts

Finding ways around problems is what Data Engineers excel at, but dealing with documentation in this state is simply confusing. I hope both issues get fixed very soon.
