Memberships

Learn Microsoft Fabric

Public • 5.7k • Free

14 contributions to Learn Microsoft Fabric
Notebook authentication
In our data pipelines we use pre-created gateway connections as our means of authenticating. This is true for our database and file sources, and in some cases API endpoints. I believe this is not possible in a notebook. What is the recommended/secure method for authenticating against items like those listed above from within a Fabric notebook? I guess you are all going to advise Key Vault? How-To Access Azure Key Vault Secrets from Fabric Notebook - Syntera
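For context on the Key Vault route, here is a minimal sketch of reading secrets from inside a Fabric notebook with the built-in notebookutils credential helper. The vault name, secret names and endpoint below are illustrative assumptions, and the identity running the notebook needs read access to the vault's secrets.

```python
# Minimal sketch, assuming secrets live in an Azure Key Vault the notebook
# identity can read. Vault name, secret names and the API URL are illustrative.
# notebookutils and spark are provided by the Fabric notebook runtime.
import requests

KV_URI = "https://my-keyvault.vault.azure.net/"          # hypothetical vault

# Pull secrets at runtime instead of embedding credentials in the notebook.
api_key = notebookutils.credentials.getSecret(KV_URI, "source-api-key")
sql_password = notebookutils.credentials.getSecret(KV_URI, "sql-password")

# Example use: authenticate an API call with the retrieved key.
resp = requests.get(
    "https://api.example.com/v1/data",                   # hypothetical endpoint
    headers={"Authorization": f"Bearer {api_key}"},
    timeout=30,
)
resp.raise_for_status()
```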
2
1
New comment Sep 12
HELP!!!!!
My production ETL stopped working last night. The stored procedure transferring data from the delta table in the lakehouse to the "silver" layer in the warehouse could not see the delta table. The upshot is I can (could!) visually see the tables in my SQL endpoint (via the web UI or SSMS), but if I query them I am told the table is not there (which is correct). When I view the lakehouse via the Lakehouse view, it correctly shows only the 2 tables that are there.

As I am typing, the situation has changed (after over an hour of trying to understand the problem whilst it persisted). However, there is definitely something wrong with the lakehouse. I have just triggered the ETL again (but for one table which I know will only generate a single-row delta), and it is continuing to show the same behaviour. Is there something I need to do to "tidy up" the lakehouse? The ETL literally drops all previous deltas, then copies the fresh delta, every 15 mins, and has been running fine for weeks. Anything anybody can point me to?
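One way to narrow this kind of issue down is to check what the Spark side of the lakehouse actually sees, separately from the SQL analytics endpoint's metadata (which can lag behind). A small diagnostic sketch, with an illustrative table name:

```python
# Diagnostic sketch (run in a notebook attached to the lakehouse): list the
# tables Spark sees and confirm the delta table the stored procedure expects
# is present and readable. "staging_delta" is an illustrative name; spark is
# provided by the notebook session.
for t in spark.catalog.listTables():
    print(t.name, t.tableType, t.isTemporary)

df = spark.read.table("staging_delta")   # hypothetical delta table name
print(df.count())                        # a failure here => the table really is missing
```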
2
6
New comment Sep 6
0 likes • Sep 5
Thanks Will. Surely changing the lakehouse table I was writing my delta tables to should have meant I was on a "clean page", and therefore the issue could not have been one of table maintenance? Additionally, the degradation seemed to happen all of a sudden: the ETL was running fine until 5.30pm last night, each run (on a 15 min cycle) executing in its usual 8 minute window, then the 5.45pm run failed with errors about not being able to find the table in the lakehouse (when I refer to "delta" I do not mean delta parquet, but the dataset that is the difference from the last load). This persisted for 3 more runs (the second step not finding the table created by the first step), and then it worked again, running in the usual 7-8 mins. It then failed once more (same issue), then succeeded again all the way until this morning, where it failed again and has continued to fail every time since.
1 like • Sep 6
Thanks Jerry. Yeah, I think I am going to have to investigate this. Notebooks are relatively new to me, so I will need to find a few examples that achieve the UPSERT pattern I have in my SP. For now I have overcome the problem (without understanding what is causing it): I have moved to a new workspace. I used a deployment pipeline to deploy my entire workspace content from the original prod workspace to the new one, then copied the data across. Now everything works exactly as it should have done in the original workspace. For me the issue is, if the SQL endpoint to the lakehouse is so unreliable, then MS need to say so. Although I accept that notebooks are probably the right way to go, what originally led me down this path was the issue that notebooks cannot connect to on-prem data sources.
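For anyone looking for the notebook equivalent of that stored-procedure UPSERT, a minimal sketch using Delta Lake's MERGE is below; the table names and join key are illustrative assumptions, not the poster's actual schema.

```python
# Minimal sketch of an UPSERT in a Fabric notebook via Delta Lake MERGE.
# Table names and the key column are illustrative; spark is provided by the
# notebook session.
from delta.tables import DeltaTable

silver = DeltaTable.forName(spark, "silver_customer")       # hypothetical target table
changes = spark.read.table("staging_customer_delta")        # hypothetical 15-min delta

(
    silver.alias("t")
    .merge(changes.alias("s"), "t.CustomerID = s.CustomerID")  # hypothetical business key
    .whenMatchedUpdateAll()     # update rows that already exist in silver
    .whenNotMatchedInsertAll()  # insert rows that are new
    .execute()
)
```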
Fabric tracing
We have an issue with CU utilisation going through the roof! This coincides with moving our infrastructure to AWS (including the VMs for the OPDG and a VM-hosted SQL Server). I am trying to get our "IT" guys to investigate. The challenge is they want to know the IP addresses of the Fabric processes going through the OPDG to get the data. Any ideas?
0
0
High Capacity Units
As part of the process of implementing a Fabric environment, and in particular for our data load and storage, we had an additional P1 capacity enabled. Initially, the ELT and model refresh was keeping the utilisation at around the 20-25% range. That is all that is occurring on this capacity: no other models, no other Fabric objects using it. Our ELT runs every 15 mins and currently loops over 15 tables. The data volumes are not very large; my biggest fact table is 90 million rows, but a 15 minute delta for this table is rarely over 20k rows (although it is a very wide table, full of text columns). The rest of the tables have significantly fewer rows/columns.

Our IT department have decided to move all IT resources to AWS, including our 3-node OPDG cluster. Since this move, I have noticed a significant increase in capacity utilisation, such that I am now around 90% all of the time. This includes the runs over the weekend, where the data volume is significantly smaller. Is it just a coincidence that since the move to AWS there has been a significant increase in my CUs? Or is there something else I need to look at/look out for? Within the space of less than 2 weeks I have gone from a very comfortable position to squeaky bum time re % utilisation. Any help/pointers/suggestions gratefully received.
0
2
New comment Aug 6
0 likes • Aug 6
Yes, Azure. I think the issue is bandwidth. I have noticed that the throughput is lower and the ETL takes longer too: 9 mins to 12 mins for a full ETL. IT are indicating they do not think the switch to AWS is the cause.
Fabric Data Pipeline Copy Data datetime bug?
Hi, I have noticed that the Fabric Copy Data activity seems to shift my datetimes from the ingested SQL source by 1 hour. For example, the record in my on-premises table has a datetime of '2022-07-01 09:11:58.5880000'. Once landed into Fabric it reports '2022-07-01 08:11:58.588000'. Has anybody else witnessed this? Have they resolved it? Thanks
0
7
New comment Aug 2
1 like • Aug 1
Hi @Will Needham. I get that the behaviour of the Fabric Copy Data activity is to automatically convert all datetime values into UTC. So if your source SQL has the datetime stored in GMT format, the copy activity automatically converts this datetime to UTC. There is no way to override this; there are only workarounds, and these workarounds all add additional overhead (if I understand the situation correctly). I do think this is worth acknowledging, and perhaps a topic to be discussed; I can see lots of people being caught out by this. *****IT IS NOT THE COPY DATA ACTIVITY***** I am still trying to work this out, so any help is appreciated. As part of the ELT process it copies the delta to a Fabric table (using the copy activity); this datetime value remains in its original format. It is the next UPSERT step which seems to convert the datetime value representing GMT to UTC. I cannot see why.
0 likes • Aug 2
Hi @Will Needham. 1. Yes. 2. No, I am using a SQL stored procedure to do the upsert. This SP dynamically creates the relevant CREATE TABLE (if the silver layer table does not exist), or inserts/updates rows in the silver table, based on the lakehouse delta table. The SP code is dynamic, using the schema table to get the list of columns and then build the relevant CREATE, INSERT or UPDATE statement. No datatypes were specified. It is this step that is setting a datetime column back by 1 hour, if that datetime value falls within the dates of BST.

I have resolved the issue by modifying my dynamic columns to check if the source column is type datetime2; if it is, it wraps this in a CONVERT statement, to a string of the same format. Then the upsert does not set the time part back an hour, and as it is in the correct datetime format, it lands correctly in the destination datetime column. The problem with this approach is the initial load of newly added tables to my ETL; this results in the datetime columns being datatype string, not datetime, so I have to manually recreate these.

Before I made this change, under the assumption that there was an issue with the version of the OPDG, I had it updated to the very latest version and carried out a full load. All of my BST columns were still set back by an hour. I have no doubt I have done something wrong in the first place, BUT, recognising that the entire Fabric environment does have bugs, I need to be sure whether my workaround is unnecessary, or whatever!
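As an alternative to wrapping datetime2 columns in CONVERT inside the dynamic SQL, the shift could also be handled on the Spark side before the upsert, converting the landed UTC values back to UK local time. A minimal sketch with illustrative table and column names:

```python
# Sketch: convert a datetime column that has been normalised to UTC back to
# UK local time (BST/GMT aware) before the upsert. Names are illustrative;
# spark is provided by the notebook session.
from pyspark.sql.functions import col, from_utc_timestamp

df = spark.read.table("staging_orders_delta")            # hypothetical delta table
df_local = df.withColumn(
    "OrderDateTime",                                     # hypothetical datetime2 column
    from_utc_timestamp(col("OrderDateTime"), "Europe/London"),
)
df_local.write.mode("overwrite").saveAsTable("staging_orders_delta_local")
```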
John OCallaghan
@john-ocallaghan-5245
One of those old school do it all data guys

Active 42d ago
Joined Jun 23, 2024