Revanth Reddy Tondapu

Part 24: Loading JSON Data from a Web URL into Neo4j: A Comprehensive Guide



In this blog post, we'll explore how to load JSON data from a web URL into a Neo4j database. We'll demonstrate how to fetch data from a public API, process it, and create nodes and relationships in the graph database. Specifically, we'll use data from the Stack Exchange API to import questions, answers, tags, and users into Neo4j.


Step 1: Cleaning the Database

First, let's ensure our database is clean by deleting any existing data. Open Neo4j Browser, connect to your active DBMS, and run the following Cypher query:

MATCH (n) DETACH DELETE n;

This command deletes all nodes and relationships (indexes and constraints are left in place), providing a fresh start for our demo.
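To confirm the delete worked, a quick count is enough. This is a minimal check, not part of the original walkthrough; the alias remaining_nodes is just an illustrative name.

```cypher
// Counts every node left in the database; an empty database returns 0.
MATCH (n)
RETURN count(n) AS remaining_nodes;
```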


Step 2: Understanding the JSON Data

Before we dive into the Cypher query, let's inspect the JSON data from the Stack Exchange API. You can view the data by opening the API URL (the same one used in the query in Step 3) in your web browser.

The URL returns a JSON response whose items array holds questions tagged "neo4j" on Stack Overflow. Each question includes details like tags, answers, owner information, and other metadata.
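If you would rather inspect the structure from Neo4j Browser, the following read-only sketch previews a few fields per question. It assumes the APOC plugin is installed and reuses the URL from the import query in Step 3; the column aliases are illustrative choices.

```cypher
// Read-only preview: fetches the JSON and returns a few fields; nothing is written.
WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.items AS q
RETURN q.question_id AS id,
       q.title AS title,
       q.tags AS tags,
       // coalesce guards against questions that have no answers field
       size(coalesce(q.answers, [])) AS answer_count,
       q.owner.display_name AS asked_by
LIMIT 5;
```

Because the query only reads from the API and returns rows, it is safe to run as often as needed before the actual import.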


Step 3: Crafting the Cypher Query

Now that we understand the structure of the JSON data, let's look at the Cypher query that will load this data into our Neo4j database.

Cypher Query Explanation

WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.items AS q
MERGE (question:Question {id:q.question_id})
ON CREATE SET question.title = q.title,
              question.share_link = q.share_link,
              question.favorite_count = q.favorite_count

FOREACH (tagName IN q.tags | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag))
FOREACH (a IN q.answers |
   MERGE (question)<-[:ANSWERS]-(answer:Answer {id:a.answer_id})
   MERGE (answerer:User {id:a.owner.user_id}) ON CREATE SET answerer.display_name = a.owner.display_name
   MERGE (answer)<-[:PROVIDED]-(answerer)
)

WITH * WHERE NOT q.owner.user_id IS NULL
MERGE (owner:User {id:q.owner.user_id}) ON CREATE SET owner.display_name = q.owner.display_name
MERGE (owner)-[:ASKED]->(question)

Detailed Breakdown

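1. Define the API Endpoint:

WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url

This line binds the URL of the API endpoint to the variable url, so the rest of the query can refer to it by name.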

2. Load JSON Data:

CALL apoc.load.json(url) YIELD value

This line uses the APOC library's apoc.load.json procedure to load JSON data from the specified URL.

3. Unwind the Items Array:

UNWIND value.items AS q

This clause iterates over the items array in the JSON response, assigning each question to the variable q.

4. Create or Match Question Nodes:

MERGE (question:Question {id:q.question_id}) ON CREATE SET question.title = q.title, question.share_link = q.share_link, question.favorite_count = q.favorite_count

This clause creates or matches a Question node with the question ID. If the node is newly created, it sets additional properties like title, share_link, and favorite_count.

5. Handle Tags:

FOREACH (tagName IN q.tags | MERGE (tag:Tag {name:tagName}) MERGE (question)-[:TAGGED]->(tag))

For each tag in the tags array, this clause creates or matches a Tag node (MERGE reuses an existing tag with the same name) and establishes a TAGGED relationship from the question to the tag.

6. Handle Answers:

FOREACH (a IN q.answers |
   MERGE (question)<-[:ANSWERS]-(answer:Answer {id:a.answer_id})
   MERGE (answerer:User {id:a.owner.user_id}) ON CREATE SET answerer.display_name = a.owner.display_name
   MERGE (answer)<-[:PROVIDED]-(answerer)
)

For each answer, this clause creates or matches an Answer node and links it to the question with an ANSWERS relationship. It also creates or matches a User node for the answerer and establishes a PROVIDED relationship from that user to the answer.

7. Handle Question Owners:

WITH * WHERE NOT q.owner.user_id IS NULL
MERGE (owner:User {id:q.owner.user_id}) ON CREATE SET owner.display_name = q.owner.display_name
MERGE (owner)-[:ASKED]->(question)

If the question has an owner, this clause creates or matches a User node for the owner and links it to the question with an ASKED relationship.
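Step 7 filters out questions without an owner ID, but step 6 applies no such check to answer owners, and MERGE raises an error when a property in its pattern is null. The sketch below shows one way to guard the answers step. It assumes some answers may arrive with an owner that has no user_id, and the variable name ignored is purely illustrative. It is reduced to questions and answers to keep it short.

```cypher
// Sketch: same import shape as above, reduced to questions and answers.
WITH "https://api.stackexchange.com/2.2/questions?pagesize=100&order=desc&sort=creation&tagged=neo4j&site=stackoverflow&filter=!5-i6Zw8Y)4W7vpy91PMYsKM-k9yzEsSC1_Uxlf" AS url
CALL apoc.load.json(url) YIELD value
UNWIND value.items AS q
MERGE (question:Question {id: q.question_id})
FOREACH (a IN coalesce(q.answers, []) |
  MERGE (question)<-[:ANSWERS]-(answer:Answer {id: a.answer_id})
  // Conditional FOREACH: the inner list has one element only when user_id is present,
  // so the User MERGE is skipped instead of failing on a null id.
  FOREACH (ignored IN CASE WHEN a.owner.user_id IS NOT NULL THEN [1] ELSE [] END |
    MERGE (answerer:User {id: a.owner.user_id})
      ON CREATE SET answerer.display_name = a.owner.display_name
    MERGE (answer)<-[:PROVIDED]-(answerer)
  )
);
```

With this variant, an answer whose owner lacks an ID is still imported; it simply has no PROVIDED relationship.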


Running the Query

Execute the above Cypher query in Neo4j Browser. It needs the APOC plugin installed on your DBMS and network access to api.stackexchange.com; with both in place, the query imports the JSON data into your Neo4j database.
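Two optional preparations can make the import smoother; neither is part of the original walkthrough. The first statement checks that APOC is available. The others add uniqueness constraints on the properties the MERGE clauses match on, which also creates the backing indexes. The constraint names are illustrative, and the FOR ... REQUIRE syntax assumes Neo4j 4.4 or later (older versions use ON ... ASSERT).

```cypher
// Returns the installed APOC version; fails with an unknown-function error if APOC is missing.
RETURN apoc.version() AS apoc_version;

// One uniqueness constraint per MERGE key used by the import query.
CREATE CONSTRAINT question_id IF NOT EXISTS FOR (q:Question) REQUIRE q.id IS UNIQUE;
CREATE CONSTRAINT answer_id IF NOT EXISTS FOR (a:Answer) REQUIRE a.id IS UNIQUE;
CREATE CONSTRAINT user_id IF NOT EXISTS FOR (u:User) REQUIRE u.id IS UNIQUE;
CREATE CONSTRAINT tag_name IF NOT EXISTS FOR (t:Tag) REQUIRE t.name IS UNIQUE;
```

Run these before the import so that MERGE can look up existing nodes through an index rather than scanning every node with that label.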


Step 4: Verifying the Data

To verify the data insertion, you can check the schema and run some match queries. For example:

MATCH (p) RETURN p LIMIT 10;

This query returns a sample of ten nodes from the database. Depending on which nodes come back, you should see labels such as Question, Tag, Answer, and User.
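A ten-node sample may show only one or two labels, so counts per label and per relationship type give a fuller picture. The following is a sketch of such checks; the column aliases are illustrative.

```cypher
// Visual overview of labels and relationship types present in the database.
CALL db.schema.visualization();

// Number of nodes per label combination.
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodes
ORDER BY nodes DESC;

// Number of relationships per type.
MATCH ()-[r]->()
RETURN type(r) AS relationship, count(*) AS total
ORDER BY total DESC;
```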


Exploring the Data

To explore the data further, you can click on any node and expand it to reveal its relationships. For example, clicking on a Question node might show its tags, the user who asked it, and the answers it received.
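The same exploration can be expressed as a query. This sketch lists a few questions together with their asker, tags, and number of answers, using only the labels and relationship types created by the import query; the aliases are illustrative.

```cypher
// OPTIONAL MATCH keeps questions that have no tags, no owner, or no answers.
MATCH (q:Question)
OPTIONAL MATCH (q)-[:TAGGED]->(t:Tag)
OPTIONAL MATCH (asker:User)-[:ASKED]->(q)
OPTIONAL MATCH (a:Answer)-[:ANSWERS]->(q)
RETURN q.title AS question,
       asker.display_name AS asked_by,
       // DISTINCT removes the duplicates produced by combining tags with answers
       collect(DISTINCT t.name) AS tags,
       count(DISTINCT a) AS answers
ORDER BY answers DESC
LIMIT 5;
```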


Conclusion

In this guide, we demonstrated how to load JSON data from a web URL into Neo4j with a single Cypher query built on apoc.load.json, UNWIND, MERGE, and FOREACH. By following these steps, you can import nested JSON data from APIs into your Neo4j database and establish meaningful relationships between entities. This approach is particularly useful for integrating data from various web services and APIs into your graph database for advanced analytics and insights.
