Seeking Best Practices for CSV to JSON Transformation in Fabric
Hi everyone! My name is Max, and I work as an analyst at a small company. I'm thrilled to be building Fabric solutions for my tasks instead of a pile of different tools that are hard to log and monitor. I'm new to Microsoft Fabric, and as a learning exercise I decided to migrate one of my existing projects to a Fabric architecture, but I'm facing some challenges with the design.

Here is a brief description of the old process:

1. The client sends a list of agencies in a .csv file.
2. Validation and mapping are done within a Logic App.
3. The agency table is transformed into a set of JSON objects (one object per row) using the Logic App and Azure Functions. The resulting JSON has a more complex, nested structure than the original table, so it cannot be converted 1:1 from table to JSON.
4. All JSON files are saved to Blob Storage (in the new architecture, I was thinking of switching to a Lakehouse to better orchestrate and log all these JSON files).

While the first two steps are straightforward to implement in Fabric (thanks to Will's videos), I'm struggling with the third: I can't figure out the best way to split the CSV rows into a set of JSON files. Here's what I've tried so far:

- PySpark: either I misunderstood something or it's not the best fit for this task; saving individual JSON files to the Lakehouse behaves oddly (see the sketch at the end of this post).
- Dataflows and Pipelines: I couldn't find a suitable solution (calling an Azure Function seems like a possible option, but I can't figure out the best way to organize it).
- SQL: it doesn't look helpful; the endpoint doesn't accept FOR JSON for some reason.

Or should I just use something outside Fabric for this specific operation? I would appreciate any ideas on how to make this transformation as efficient and effective as possible. Thanks! 💚
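For reference, here is roughly what I tried in a Fabric notebook. Everything below is a sketch: the column names (agency_id, name, city, country), the nested target shape, and the Files/incoming/agencies.csv path are placeholders for the real (more complex) structure, and it assumes the default Lakehouse is attached so the /lakehouse/default/Files/ mount is available. I suspect part of the oddness I ran into is that df.write.json() always produces a folder of part-files (one per partition) rather than one file per row, so this version writes through the local mount instead:

```python
import json
import os

from pyspark.sql import SparkSession

# In a Fabric notebook `spark` already exists; getOrCreate() just reuses it.
spark = SparkSession.builder.getOrCreate()

# Read the client's CSV from the attached Lakehouse (placeholder path).
df = spark.read.option("header", True).csv("Files/incoming/agencies.csv")

def to_nested(row: dict) -> dict:
    # Reshape a flat CSV row into the nested structure the client expects.
    # This shape is made up for illustration -- the real one is more complex.
    return {
        "agencyId": row["agency_id"],
        "details": {"name": row["name"]},
        "location": {"city": row["city"], "country": row["country"]},
    }

# Collecting to the driver seemed acceptable for my volumes (thousands of
# rows, not millions); each row is then written as its own JSON file
# through the local Lakehouse mount, avoiding Spark's part-file output.
out_dir = "/lakehouse/default/Files/output"
os.makedirs(out_dir, exist_ok=True)

for row in df.collect():
    payload = to_nested(row.asDict())
    with open(f"{out_dir}/{payload['agencyId']}.json", "w") as f:
        json.dump(payload, f, indent=2)
```

This works, but collecting everything to the driver and writing files one by one feels like fighting the platform rather than using it, which is why I'm asking whether there's a more idiomatic Fabric pattern for this step.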