Paul Miller

AI Developer Accelerator


Memberships

AI Developer Accelerator

Public • 3.8k • Free

Data Alchemy

Public • 22.2k • Free

Ecom Phenom Workshop

Private • 70 • Free

AI Developer Accelerator Pro

Private • 32 • $49/m

53 contributions to AI Developer Accelerator

Bastian Venegas

6d ago in CrewAI

CrewAI Docs in a single file.

Hey everyone! Now there’s a link that contains the complete CrewAI documentation in a .txt file, so you can feed it into any LLM or agent, or even add it using ‘@’ in Cursor. https://docs.crewai.com/llms-full.txt This should make things way easier. Same goes for Anthropic/Claude: http://docs.anthropic.com/llms-full.txt

New comment 3d ago

Paul Miller

1 like • 6d

Very useful thanks Bastian

Sam G

9d ago in LangChain

Looking for ideas on chunking reservation related conversations and context aware RAG

Hey guys, I briefly discussed the problem I am trying to solve a few meetings back. I am stuck on how best to create the vector store, as simply chunking all messages and using some kind of sentence splitting does not work well.

I manage a property portfolio on platforms like Airbnb, handling customer support through the entire guest journey (pre-booking to post-stay). I'm building a RAG system to help automate responses to guest inquiries. Here are my questions, with some business context below.

## Technical Questions

1. **Vector Database Strategy**
   - How to structure embeddings for different information types?
   - Chunking strategy challenges:
     - Single-message chunks: lose conversation context
     - Multi-message chunks: how many messages maintain coherence?
     - Entire-conversation chunks: may be too broad for specific queries
   - How to preserve booking context (guest state, property details) within chunks?
   - Should property-specific and global information be in separate vector spaces?
   - How to handle property hierarchies in vector search?
2. **Temporal Relevance**
   - How to weight conversation recency differently based on query type?
   - How to combine current property documents with historical conversations?
3. **Context-Aware Retrieval**
   - How to incorporate guest journey state into the retrieval process?
   - How to handle property relationships (e.g., similar apartments sharing info)?
   - How to balance property-specific vs. global policy information?
4. **Security and Policy Compliance**
   - How to ensure RAG responses respect security policies based on guest journey state?
   - How to handle platform-specific rules in responses?

## Data Sources and Unique Challenges

### 1. Historical Conversations (around 10,000 reservations over 7 years, each with 10-40 messages during the client journey)

- Stored in PostgreSQL
- Time relevance varies by query type:

```
Example A: "What's the WiFi password?" → Recent conversations only relevant (passwords change)
```
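One common way to approach the multi-message chunking question above is a sliding window over each reservation's messages, with the booking context repeated at the top of every chunk. The sketch below is a minimal, hypothetical illustration of that idea, not an answer given in the thread: the `Message` and `Chunk` classes, the booking dictionary keys (`property`, `stage`, `reservation_id`) and the default window of 6 messages with 2 of overlap are all assumptions to tune against your own data.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str  # e.g. "guest" or "host" (hypothetical field)
    text: str


@dataclass
class Chunk:
    text: str
    metadata: dict = field(default_factory=dict)


def chunk_conversation(messages: list[Message], booking: dict,
                       window: int = 6, overlap: int = 2) -> list[Chunk]:
    """Split one reservation's messages into overlapping windows of messages.

    Every chunk starts with a one-line booking header so its embedding carries
    property and journey-stage context; the same fields go into metadata so a
    vector store can filter on them at query time.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("need window > 0 and 0 <= overlap < window")
    # Hypothetical booking keys: "property" and "stage" are required here.
    header = f"Property: {booking['property']} | Stage: {booking['stage']}"
    step = window - overlap
    chunks = []
    for start in range(0, len(messages), step):
        group = messages[start:start + window]
        body = "\n".join(f"{m.sender}: {m.text}" for m in group)
        chunks.append(Chunk(
            text=f"{header}\n{body}",
            metadata={**booking, "first_msg": start,
                      "last_msg": start + len(group) - 1},
        ))
        # Stop once this window reaches the end, to avoid a tail-only chunk.
        if start + window >= len(messages):
            break
    return chunks


# Usage with made-up data: ten messages from one reservation.
booking = {"reservation_id": "R-001", "property": "Apartment 3", "stage": "pre-arrival"}
msgs = [Message("guest" if i % 2 == 0 else "host", f"message {i}") for i in range(10)]
for chunk in chunk_conversation(msgs, booking):
    print(chunk.metadata["first_msg"], chunk.metadata["last_msg"])  # prints "0 5", then "4 9"
```

Keeping the booking fields in `metadata` as well as in the text means a vector store that supports metadata filters could restrict retrieval to one property or journey stage, and a timestamp field could be added the same way for recency-sensitive cases such as the WiFi example.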

New comment 8d ago

Paul Miller

1 like • 9d

How long, in tokens, are the largest messages you are receiving? OpenAI's GPT-4o mini can take 128K tokens of input (roughly 100,000 words) and output about 16,000 tokens. Why do you need to chunk at all?
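A rough estimate is usually enough to answer the token question for a given conversation. This sketch assumes the common rule of thumb of about 4 characters of English text per token; a real tokenizer (for example OpenAI's `tiktoken` library) gives exact counts. The limits are the figures quoted in the comment, while the helper names and the 1,000-token allowance for instructions are hypothetical.

```python
# Rule-of-thumb assumption: about 4 characters of English text per token.
CHARS_PER_TOKEN = 4

CONTEXT_WINDOW = 128_000  # input figure quoted in the comment above
MAX_OUTPUT = 16_000       # output figure quoted in the comment above


def estimate_tokens(text: str) -> int:
    """Rough token count (ceiling of len / 4); use a real tokenizer for exact numbers."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(messages: list[str], prompt_overhead: int = 1_000,
                    reserve_for_output: int = MAX_OUTPUT) -> bool:
    """True if the whole conversation, plus instructions and the reply, fits.

    Conservatively reserves room for the reply inside the window; the
    1,000-token allowance for the system prompt is a hypothetical default.
    """
    total = sum(estimate_tokens(m) for m in messages)
    return total + prompt_overhead + reserve_for_output <= CONTEXT_WINDOW


# Usage: a 40-message conversation averaging 500 characters per message.
conversation = ["x" * 500] * 40
print(estimate_tokens("".join(conversation)))  # prints 5000
print(fits_in_context(conversation))           # prints True
```

At roughly 5,000 tokens, even the longest conversations described in the question (40 messages) would use only a small fraction of the window by this estimate.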

Paul Miller

11d ago in General discussion

D&D using LLM

I remember one of the community members mentioned he was doing something with D&D. Not sure if you have seen this: https://obie.medium.com/my-kids-and-i-just-played-d-d-with-chatgpt4-as-the-dm-43258e72b2c6

Paul Miller

15d ago in General discussion

Can't find your great code snippets - I wrote an app for that

I have 90+ VS Code project envs with hundreds of Python libraries. I will share the code, but for now here is a screenshot.
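The post doesn't say how the app works. As a hypothetical illustration of one way to find snippets across many project folders, the sketch below uses Python's standard `ast` module to index every function definition under a root directory and search names and docstrings by keyword. The function names and the skip list of virtual-env folder names are assumptions; the skip list is meant to keep installed libraries out of the index.

```python
import ast
from pathlib import Path

# Assumed folder names to skip, so libraries installed in virtual envs
# are not indexed alongside your own code.
SKIP_DIRS = {".venv", "venv", "env", "site-packages", "__pycache__", ".git"}


def index_functions(root: str) -> list[tuple[str, int, str, str]]:
    """Return (file, line, function name, docstring) for every def under root."""
    root_path = Path(root).expanduser()
    entries = []
    for path in root_path.rglob("*.py"):
        # Only check folders below root, so root itself may live anywhere.
        if SKIP_DIRS.intersection(path.relative_to(root_path).parts[:-1]):
            continue
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except (SyntaxError, ValueError, OSError):
            continue  # unreadable, undecodable or unparsable file: skip it
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                entries.append((str(path), node.lineno, node.name,
                                ast.get_docstring(node) or ""))
    return entries


def search(entries: list[tuple[str, int, str, str]], term: str):
    """Case-insensitive keyword match on function names and docstrings."""
    term = term.lower()
    return [e for e in entries if term in e[2].lower() or term in e[3].lower()]
```

For example, `search(index_functions("~/projects"), "pdf")` would list every function whose name or docstring mentions PDFs, with its file and line number. Parsing with `ast` rather than grepping means only real definitions are matched, not every line that happens to contain the word.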

New comment 14d ago


Paul Miller

15d ago in General discussion

Going from PDF to Chunks the smart way

I got asked on yesterday's call how to turn a PDF into chunks for RAG more consistently. The first challenge with converting any PDF file is dealing with the unique underlying way each PDF document may be formatted. Much of that formatting has no impact on the printed output, but it does matter when you extract the text in Python with LangChain: the output is often inconsistent, with sections wrongly aggregated for the chunking process.

A better approach that has worked consistently for me is to first convert the PDF into Markdown, then split the Markdown into chunks:

Step One:

```python
import pathlib

import pymupdf4llm

# Convert the PDF to Markdown
md_text = pymupdf4llm.to_markdown("input.pdf")

# Save the Markdown to a file (optional; you could just keep it as a string)
pathlib.Path("output.md").write_bytes(md_text.encode())
```

Step Two:

```python
from langchain_text_splitters import MarkdownHeaderTextSplitter

# Define the headers to split on and the metadata key for each
headers_to_split_on = [
    ("#", "Header 1"),
    ("##", "Header 2"),
    ("###", "Header 3"),
]

# Initialize the splitter
markdown_splitter = MarkdownHeaderTextSplitter(headers_to_split_on)

# Split the Markdown text from Step One; each result is a LangChain
# Document whose metadata records the headers it sits under
md_header_splits = markdown_splitter.split_text(md_text)
```
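To see what the header-based split in Step Two is doing, here is a stdlib-only sketch of the same idea. It is an illustration, not LangChain's implementation: `split_by_headers` and its `max_level` parameter are hypothetical, and the metadata keys simply mirror the `Header 1` to `Header 3` labels above. It drops header lines from the content, keeps headers deeper than `max_level` as ordinary text, and ignores `#` lines inside fenced code blocks, which a naive line-by-line splitter would mistake for headers.

```python
import re

FENCE = "`" * 3  # a Markdown code fence marker
HEADER_RE = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_headers(md_text: str, max_level: int = 3) -> list[dict]:
    """Split Markdown into sections tagged with the headers they sit under.

    Header lines up to max_level start a new section and are dropped from the
    content; deeper headers stay in the text. Lines inside fenced code blocks
    are never treated as headers.
    """
    sections = []
    in_scope = {}  # header level -> header text currently in effect
    buffer = []
    in_fence = False

    def flush():
        body = "\n".join(buffer).strip()
        if body:
            metadata = {f"Header {lvl}": text for lvl, text in sorted(in_scope.items())}
            sections.append({"metadata": metadata, "content": body})
        buffer.clear()

    for line in md_text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADER_RE.match(line)
        if match and len(match.group(1)) <= max_level:
            flush()
            level = len(match.group(1))
            # A new header closes any header at the same or a deeper level.
            for stale in [lvl for lvl in in_scope if lvl >= level]:
                del in_scope[stale]
            in_scope[level] = match.group(2)
        else:
            buffer.append(line)
    flush()
    return sections


sample = "# Intro\nHello\n## Setup\nInstall it\n# Usage\nRun it"
for section in split_by_headers(sample):
    print(section["metadata"], "->", section["content"])
# {'Header 1': 'Intro'} -> Hello
# {'Header 1': 'Intro', 'Header 2': 'Setup'} -> Install it
# {'Header 1': 'Usage'} -> Run it
```

The point of the sketch is that once the PDF is Markdown, section boundaries come from explicit `#` markers rather than from the PDF's internal layout, which is why the chunks come out more consistently.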

New comment 14d ago

Paul Miller

0 likes • 15d

@Tom Welsh there is a way; you'd need to talk to @Brandon Hancock about turning it on.


Level 4

43 points to level up

Paul Miller

@paul-miller-1511

Co-founder of two SaaS startups. Part-time amateur Python developer working with AI. Political advocate for public infrastructure in New Zealand.

Active 2d ago

Joined Apr 28, 2024

Auckland, New Zealand.
