Knowledge Graph 🤝 LLMs

*A knowledge graph generated with prettygraph, our recent open-source project*

At Untapped, we pride ourselves on spotting technology trends early, so we thought we’d share what we’ve recently learned about the intersection of knowledge graphs and LLMs.

For those not familiar, knowledge graphs (Wikipedia) are a data representation made up of nodes (objects) and edges (relationships). This structure makes relationship-centric queries efficient, such as finding a path between two objects and calculating its length, and it supports further reasoning over those connections.
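
To make the path-distance idea concrete, here is a minimal sketch that stores a tiny graph as (subject, relationship, object) triples and uses breadth-first search to find a shortest path between two nodes. The triples and function names are hypothetical examples, and edges are treated as undirected for distance purposes; a graph database would handle this with its own query language.

```python
from collections import deque

# A tiny, hypothetical knowledge graph: each edge is (subject, relationship, object).
EDGES = [
    ("Ada Lovelace", "collaborated_with", "Charles Babbage"),
    ("Charles Babbage", "designed", "Analytical Engine"),
    ("Analytical Engine", "influenced", "Modern Computers"),
    ("Ada Lovelace", "born_in", "London"),
]

def build_adjacency(edges):
    """Index edges by node so neighbors can be looked up quickly (treated as undirected)."""
    adjacency = {}
    for subject, relation, obj in edges:
        adjacency.setdefault(subject, []).append((relation, obj))
        adjacency.setdefault(obj, []).append((relation, subject))
    return adjacency

def shortest_path(adjacency, start, goal):
    """Breadth-first search: return the nodes on a shortest path, or None if unreachable."""
    if start not in adjacency or goal not in adjacency:
        return None
    previous = {start: None}  # node -> the node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through `previous` to reconstruct the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for _relation, neighbor in adjacency[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

graph = build_adjacency(EDGES)
path = shortest_path(graph, "London", "Modern Computers")
print(path)
print(len(path) - 1)  # 4
```

Breadth-first search explores nodes in order of hop count, so the first time it reaches the goal it has found a shortest path in an unweighted graph.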

We were introduced to the concept through our autonomous-agent-building community. As we dug in, we saw a lot of promise in using LLMs to build knowledge graphs, which led us to build and open-source Instagraph (Sept 2023, 3.1k stars), a tool that converts natural language input into structured graphs. You can also try a web-hosted version at instagraph.ai.
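
One common pattern for turning natural language into a graph is to prompt an LLM to return JSON describing nodes and edges, then parse and validate that output. The sketch below covers only the parsing step. The JSON schema (`nodes`/`edges`, `id`/`label`, `source`/`target`/`relation`) and the sample response are assumptions for illustration, not Instagraph's actual format, and no model is called.

```python
import json

# Hypothetical LLM response. The schema is an assumption for illustration,
# not Instagraph's actual output format.
LLM_RESPONSE = """
{
  "nodes": [
    {"id": "1", "label": "Marie Curie"},
    {"id": "2", "label": "Radioactivity"},
    {"id": "3", "label": "Nobel Prize in Physics"}
  ],
  "edges": [
    {"source": "1", "target": "2", "relation": "researched"},
    {"source": "1", "target": "3", "relation": "won"},
    {"source": "1", "target": "99", "relation": "mentored"}
  ]
}
"""

def parse_graph(raw_text):
    """Parse the model's JSON and keep only edges whose endpoints are declared nodes."""
    data = json.loads(raw_text)
    nodes = {node["id"]: node["label"] for node in data.get("nodes", [])}
    edges = []
    for edge in data.get("edges", []):
        # LLM output is not guaranteed to be consistent, so drop dangling references
        # (like the edge to the undeclared node "99" above).
        if edge.get("source") in nodes and edge.get("target") in nodes:
            edges.append(
                (nodes[edge["source"]], edge.get("relation", "related_to"), nodes[edge["target"]])
            )
    return nodes, edges

nodes, edges = parse_graph(LLM_RESPONSE)
for triple in edges:
    print(triple)
```

Validating the model's output before storing it matters because a single hallucinated ID would otherwise create an edge to a node that does not exist.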

Since then, we have learned about historical efforts to convert human knowledge into a massive knowledge graph (e.g., the Large Knowledge Collider) and continued down this path, building and open-sourcing MindGraph, which deduplicates nodes and edges as new input is added.
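
The core of incremental deduplication is mapping each incoming node and edge to a canonical key and merging on that key. Here is a minimal sketch that matches labels by normalized text only. The `DedupGraph` class and `normalize` helper are hypothetical; MindGraph's actual deduplication logic may differ and could use richer matching such as embeddings or an LLM.

```python
import re

def normalize(label):
    """Canonical key for a label: trimmed, single-spaced, case-insensitive."""
    return re.sub(r"\s+", " ", label).strip().casefold()

class DedupGraph:
    """Accumulates triples from successive inputs, merging duplicate nodes and edges.

    Hypothetical sketch: matching is by normalized label only.
    """

    def __init__(self):
        self.nodes = {}     # normalized key -> first-seen display label
        self.edges = set()  # (source key, normalized relation, target key)

    def _node_key(self, label):
        key = normalize(label)
        self.nodes.setdefault(key, label.strip())  # keep the first display form seen
        return key

    def add_triples(self, triples):
        """Add (subject, relation, object) triples; return how many edges were new."""
        added = 0
        for subject, relation, obj in triples:
            edge = (self._node_key(subject), normalize(relation), self._node_key(obj))
            if edge not in self.edges:
                self.edges.add(edge)
                added += 1
        return added

graph = DedupGraph()
print(graph.add_triples([("OpenAI", "created", "GPT-4"), ("GPT-4", "is a", "LLM")]))  # 2
print(graph.add_triples([("openai ", "Created", "gpt-4"), ("GPT-4", "used by", "Instagraph")]))  # 1
print(len(graph.nodes))  # 4
```

The second batch restates an existing fact with different casing and spacing, so only the genuinely new edge is added and the node count stays at four.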

Our hypothesis is that in the long run (as AI memory grows), it makes sense to structure data on input and then query and reason against the structured data, rather than following today’s standard of simply storing raw data and querying against chunks of it. This hypothesis was supported by a recent Microsoft Research report on GraphRAG (Feb 2024), whose early tests showed this approach outperforming traditional approaches (unfortunately, no code has been shared).
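
As a rough illustration of querying structured data instead of raw chunks, the sketch below pulls every fact within a few hops of an entity and serializes those facts as plain text that could be placed in an LLM prompt. The triples, function names, and text format are hypothetical, and this simple neighborhood lookup is not a reproduction of GraphRAG's method.

```python
from collections import deque

# Hypothetical triples, assumed to have been extracted from documents at ingestion time.
TRIPLES = [
    ("Acme Corp", "acquired", "Widget Inc"),
    ("Widget Inc", "headquartered_in", "Berlin"),
    ("Acme Corp", "ceo", "Jane Doe"),
    ("Berlin", "capital_of", "Germany"),
    ("Jane Doe", "alumna_of", "MIT"),
]

def neighborhood(triples, entity, hops=2):
    """Return triples within `hops` edges of `entity` (edges followed in both directions)."""
    frontier = deque([(entity, 0)])
    seen = {entity}
    selected = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand past the hop limit
        for triple in triples:
            subject, _relation, obj = triple
            if node in (subject, obj):
                if triple not in selected:
                    selected.append(triple)
                other = obj if node == subject else subject
                if other not in seen:
                    seen.add(other)
                    frontier.append((other, depth + 1))
    return selected

def to_context(triples):
    """Serialize triples as plain-text facts for an LLM prompt."""
    return "\n".join(f"{s} --{r}--> {o}" for s, r, o in triples)

print(to_context(neighborhood(TRIPLES, "Widget Inc", hops=2)))
```

Unlike a chunk retriever, which returns whatever text happens to be near the query in embedding space, this returns only explicitly stated relationships, and the hop limit controls how far the context spreads.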

While Neo4j, a leading graph database, kept coming up (partly because it also offers vector embedding capabilities), we have been delighted to come across a number of startups in this space. Three newer embeddable graph databases we found are FalkorDB (launched by the team behind RedisGraph), NexusDB (with a public data component), and Kuzu. Two tools similar to MindGraph are Cognee and WhyHow AI, both of which let you build and query knowledge graphs using natural language. Origin Trail is building decentralized knowledge graphs maintained on the blockchain.

Beyond using graph databases to store structured knowledge, we’re also seeing people explore graphs as a way to track, understand, and fix AI agents, such as the work on LangGraph (Jan 2024). The idea is to track the flow of tasks, which can be extended to try multiple paths and continuously find better ones for achieving various objectives. You can see some of our early experiments with this here. Taking it one step further, we’ve even displayed the function dependencies within a code base as a graph (here).
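
Extracting function dependencies as a graph can be sketched with Python's standard `ast` module: parse the source, then record which module-level functions each function calls. This is a hypothetical minimal version, not how our tool works; it resolves only plain-name calls to top-level, non-async functions and ignores methods, builtins, and imports.

```python
import ast

# Hypothetical sample module to analyze.
SOURCE = '''
def load(path):
    return open(path).read()

def clean(text):
    return text.strip()

def pipeline(path):
    return clean(load(path))
'''

def function_call_graph(source):
    """Map each top-level function to the set of module functions it calls directly."""
    tree = ast.parse(source)
    functions = {node.name: node for node in tree.body if isinstance(node, ast.FunctionDef)}
    graph = {}
    for name, func in functions.items():
        callees = set()
        for node in ast.walk(func):
            # Only plain-name calls like clean(...) are resolved; attribute calls such as
            # text.strip() and calls to builtins like open() are ignored.
            if (
                isinstance(node, ast.Call)
                and isinstance(node.func, ast.Name)
                and node.func.id in functions
            ):
                callees.add(node.func.id)
        graph[name] = callees
    return graph

for caller, callees in function_call_graph(SOURCE).items():
    print(caller, "->", sorted(callees))
```

The resulting dictionary is an adjacency list, so it can be fed straight into any graph visualizer or the path queries discussed earlier.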

Combining these ideas, you can imagine an autonomous AI agent whose skills, actions, objectives, and knowledge are all tied together in a massive graph (think of a web), which it can efficiently query in parts or as a whole, using embeddings to find semantically similar items and edge paths to determine known relationships.
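
A hybrid lookup of this kind can be sketched in two steps: use embedding similarity to find the node closest to a query, then read that node's explicit edges. The 3-dimensional vectors below are hand-made placeholders standing in for a real embedding model, and all node names and edges are hypothetical.

```python
import math

# Hand-made toy "embeddings"; a real system would get these from an embedding model.
# Node names, vectors, and edges are all hypothetical.
EMBEDDINGS = {
    "web_search": [0.9, 0.1, 0.0],
    "summarize_text": [0.1, 0.9, 0.1],
    "write_report": [0.2, 0.7, 0.6],
}
EDGES = {
    "write_report": [("requires", "summarize_text"), ("serves", "objective:quarterly_review")],
    "summarize_text": [("uses", "knowledge:style_guide")],
    "web_search": [],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def lookup(query_vector):
    """Semantic step: nearest node by cosine similarity. Graph step: its known relationships."""
    best = max(EMBEDDINGS, key=lambda name: cosine(query_vector, EMBEDDINGS[name]))
    return best, EDGES.get(best, [])

node, relations = lookup([0.15, 0.8, 0.5])
print(node, relations)
```

The embedding step tolerates fuzzy queries, while the edge step returns only relationships that were explicitly recorded, which is the division of labor the paragraph above describes.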

These are some of our current thoughts on the opportunity with knowledge graphs combined with LLMs and embeddings :)
