
Memberships

Data Alchemy

Public • 19.8k • Free

3 contributions to Data Alchemy
unstructured PDF data
I have experimented a lot with several Python libraries for reading and extracting text from unstructured PDF files. My use case: as a preprocessing step, I need a library that understands complex tables as well as ordinary sections, such as a cover page with important data, all within the same PDF file. Table cells can contain lists or even nested tables; in short, unstructured data made by humans. In a next step I can create embeddings from the extracted content and store them in a vector database (to use with a model later), but that is not the problem of this post. After many failed attempts with "normal" Python libraries, I found the library from https://unstructured.io. It helps me a lot to keep the content of tables semantically together so that I can search it later. But I am quite unsure whether I am on the right track or whether there is some other, easier technique for working with such PDF files. Also, I am sure that a lot of AI apps have similar requirements. What do you think?
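To make the idea of keeping tables semantically together concrete, here is a minimal sketch of table-aware chunking before embedding. It does not call any PDF library: the `Element` class, its `category` values, and the `chunk_elements` helper are hypothetical stand-ins for whatever a PDF parser returns, and the `max_chars` limit is an arbitrary assumption. Treat it as an illustration, not as how any particular library works.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    """Hypothetical stand-in for one parsed PDF element."""
    category: str  # assumed values: "Title", "NarrativeText", "Table"
    text: str


def chunk_elements(elements: List[Element], max_chars: int = 500) -> List[str]:
    """Group elements into chunks without ever splitting a table."""
    chunks: List[str] = []
    buffer: List[str] = []  # narrative text collected for the current section
    heading = ""            # most recent title, prepended as context

    def flush() -> None:
        # Emit the buffered narrative text (if any) as one chunk.
        if buffer:
            parts = [heading] + buffer if heading else list(buffer)
            chunks.append("\n".join(parts))
            buffer.clear()

    for el in elements:
        if el.category == "Title":
            flush()
            heading = el.text  # a title with no following content is dropped
        elif el.category == "Table":
            flush()
            # A table always becomes exactly one chunk, however long it is.
            chunks.append(f"{heading}\n{el.text}" if heading else el.text)
        else:
            # Start a new chunk when the narrative buffer would grow too large.
            if buffer and sum(len(t) for t in buffer) + len(el.text) > max_chars:
                flush()
            buffer.append(el.text)
    flush()
    return chunks


sample = [
    Element("Title", "Cover"),
    Element("NarrativeText", "Contract 42, issued 2023."),
    Element("Title", "Prices"),
    Element("Table", "Item | Price\nA | 10\nB | 20"),
    Element("NarrativeText", "All prices in EUR."),
]
chunks = chunk_elements(sample)
for chunk in chunks:
    print(chunk, end="\n---\n")
```

Each resulting chunk could then be embedded and stored on its own, so a search hit on a table returns the whole table together with its section heading rather than a fragment of it.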
1 like • Nov '23
I know, Marco, but I am not allowed to share the documents, which makes it a little more complicated. I am asking for PDF content that I can show here. What I have learned is that a changing PDF layout is the crucial problem. For most libraries and tools it is very hard, if not impossible, to preserve the semantics of tables in PDF files so that an AI model can later understand the context at all.
1 like • Nov '23
@Brandon Phillips Thanks, I'll give it a try.
Bloop.. Find Code Fast
Bloop is an AI coding assistant that aims to revolutionize how developers work with codebases. It distinguishes itself by harnessing GPT-4 and other large language models to understand and navigate entire codebases. Key features include robust code search, a Code Studio for contextualized code generation, and the capacity to answer complex technical queries efficiently. It can improve code efficiency and understanding, particularly for intricate tasks in legacy codebases, and it can streamline coding workflows and raise developer productivity.
2 likes • Oct '23
Nice. So can you use Bloop to rewrite an old app from, let's say, C# .NET to a modern Node app in TypeScript?
2 likes • Oct '23
@Shivkumar Honnukai That would be an interesting use case, as it happens quite often that an old application has to be rewritten for various reasons.
Short Intro
Hi everyone, I am a full stack web developer from Germany with over 20 years of experience in this field. I am happy to join this AI forum and learn more about AI, its tools, and how to use them in product development. I need AI because the product I am planning requires the ability to read and understand documents and to run AI locally. I'm looking forward to sharing with other developers and learning from their experiences. By the way, is there anyone else from Germany?
Chris B
3
41 points to level up
@christian-buttner-7551
A Full Stack web developer, interested in adding AI into workflows and products

Active 136d ago
Joined Oct 11, 2023