
Memberships

Data Innovators Exchange

Public • 181 • Free

Skool Community

Public • 141.1k • Paid

Data Alchemy

Public • 19.8k • Free

15 contributions to Data Innovators Exchange
The ultimate Data Breach nightmare scenario
In 2017, the third largest consumer credit agency in the world was the victim of one of the largest data breaches on record. In this week's Data Radio Show, Ignition's Julien Redmond talks with Equifax's Bob Sparshatt about what happened, what was learned, and how it shaped the wider public's understanding of what data is held on them.
5
1
New comment 1d ago
1 like • 1d
What an interesting episode. Data breaches are happening so frequently these days, and not every company can recover from one. I still see relatively little focus on data security in most projects, as it seems like an expensive thing to do without any short-term value for the business. There is also a huge lack of security knowledge in many data teams. The topic also suffers from the fact that a breach still rarely happens to you, so you don't feel the risk until it's too late. It's important to understand that everyone can be targeted, and basic things like not giving every developer access to basically everything plus admin rights can already make a big difference. In general, as data platforms are increasingly treated as products (which is great), they also need a security concept and control system, just like a good software product would.
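As a minimal sketch of that least-privilege idea, here is what it can look like as Snowflake-style role grants; the role, database and user names are made up for illustration:

-- Hypothetical example: give developers read access to the dev database only,
-- instead of blanket admin rights across the whole platform.
CREATE ROLE IF NOT EXISTS developer_read;
GRANT USAGE ON DATABASE dev_db TO ROLE developer_read;
GRANT USAGE ON ALL SCHEMAS IN DATABASE dev_db TO ROLE developer_read;
GRANT SELECT ON ALL TABLES IN DATABASE dev_db TO ROLE developer_read;
-- Production objects get no grant at all, so they stay invisible by default.
GRANT ROLE developer_read TO USER some_developer;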
Hi from Soumendu Dutta
Hi All - I have just joined after completing the Data Vault 2.0 Bootcamp.
10
9
New comment 3d ago
1 like • 9d
Welcome Soumendu 🙂
1 like • 9d
@Soumendu Dutta That's great to hear! Thanks for the positive feedback 🙂 If you want to dig deeper into the topic, feel free to check out the classrooms here or the Knowledge Hub on our website, and join the Data Vault Fridays with Michael 🙂 And all the best for the certification exam 😉
Taming the Wild West of Distributed Ownership (Data Mesh)
In principle, Data Mesh offers a brilliant approach to managing data at scale, decentralizing ownership while maintaining centralized governance. However, it requires a lot of change in the organization, and without a clear strategy, Data Mesh can easily lead to anarchy, data silos and many more meetings. I'm very much looking forward to taking the stage with @Marc Winkelmann at Data Dreamland to dive into this topic with our presentation: "Data Mesh Governance: Taming the Wild West of Distributed Ownership". I hope to see many of you in Hanover! Sign up here if you want to join: https://scalefr.ee/d7goui #DataGovernance #DataManagement #DataDreamland
8
3
New comment 8d ago
Relational Stage vs. Data Lake in Data Vault—Where Are the Differences?
Relational stages handle structured data with real-time processing and schema validation, while Data Lakes are built for unstructured data, offering flexibility and scalability for large datasets and analytics. Where do you see the biggest differences in how they’re used in your Data Vault setup?
3
3
New comment 9d ago
3 likes • 10d
I'm generally a fan of a Data Lake if it is well structured, because it gives you a more open architecture that handles semi-structured and unstructured data by default. One big drawback, however, is deleting individual records from a Data Lake, e.g. for privacy reasons. Of course that's still possible, but it is difficult. I would love to hear from you or anyone else here which you would prefer, or maybe a mix of both to handle deletions more easily for some data?
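To make the deletion point concrete, a minimal sketch, assuming the lake is fronted by a table format such as Delta Lake or Apache Iceberg; the table and column names are hypothetical:

-- A GDPR-style record deletion expressed against a lake table format.
-- The statement is short, but the engine has to rewrite the affected data
-- files underneath; on raw files without a table format you would have to
-- rewrite those files yourself.
DELETE FROM lake_db.raw.customer_events
WHERE customer_id = '42';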
My 5 Tips when working with Snowflake
Of course there are dozens of tips available for Snowflake, but let me share the ones that came to mind very quickly:

1) Understand how Snowflake stores the data! It uses micro-partitions, organized in a columnar way. Micro-partitions store statistics such as distinct values and value ranges for each column. Your goal should always be to prune as much as possible from both when querying data. For example: only select the columns you really need, and apply filters on columns whose values mostly do not overlap across many micro-partitions. Also think about re-clustering your data if necessary, or creating your own pattern-based value to cluster your data on (usually only necessary for huge amounts of data in one table).

2) When data is spilled to local storage while querying, that is a good indicator that a bigger warehouse makes sense. I assume here that the query itself is already optimized and we are just dealing with a lot of data and maybe complex logic. But keep in mind: increasing the size of the Snowflake virtual warehouse by one step (e.g. M -> L) doubles the cost for the same runtime (calculated per cluster). So if the query time drops to less than 50%, we achieve a win-win: a faster and cheaper result. If the runtime cannot be reduced by 50% or more, you have to decide whether the quicker response is worth the money you now spend.

3) Snowflake's zero-copy clones allow you to test features and fixes against your production data in a very easy and fast way. They should be part of your deployment pipelines.

4) Insert-only loading reduces the number of versions Snowflake has to create for its micro-partitions. Updates and deletes cause this versioning of already existing micro-partitions, which costs time and additional storage. That also means that Data Vault, with its insert-only approach, fits Snowflake's scalability characteristics!

5) The QUALIFY clause improved code writing a lot. It uses the result of a window function as a filter, which means you don't have to write nested sub-queries with self-joins. (A short SQL sketch of tips 1, 3 and 5 follows at the end of this thread.)
11
2
New comment 13d ago
3 likes • 17d
Amazing tips! Thanks for sharing 🙂
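As a minimal sketch of tips 1, 3 and 5 from the post above: the QUALIFY clause, zero-copy cloning and micro-partition pruning are real Snowflake features, but the table, columns and database names here are made up for illustration:

-- Tip 1: select only the columns you need and filter on a column that is
-- well clustered (e.g. a date), so Snowflake can prune micro-partitions.
-- Tip 5: QUALIFY filters on a window function without a nested sub-query.
SELECT customer_id, order_id, order_date, amount
FROM my_schema.orders
WHERE order_date >= '2024-01-01'
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) = 1;

-- Tip 3: a zero-copy clone to test features and fixes against production data.
CREATE DATABASE my_db_test CLONE my_db;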
Christof Wenzeritt
Level 3 • 18 points to level up
@christof-wenzeritt-9987
CEO at Scalefree

Active 23h ago
Joined Apr 11, 2024