In this blog post, we will walk through the process of loading data from a CSV file into your Neo4j graph database. We'll cover the steps to clean the database, load the CSV data, create nodes and relationships, and verify the data insertion. Additionally, we'll demonstrate how to perform these steps in a Docker environment.
Deleting Existing Nodes and Relationships
First, let's start by clearing any existing data in our Neo4j database. Open the Neo4j browser connected to your active DBMS and run the following Cypher query to delete all nodes and their relationships:
MATCH (n) DETACH DELETE n;
This command deletes every node and, because of DETACH, every relationship attached to it, giving us a clean slate to work with. Indexes and constraints are not affected.
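As a quick sanity check, a count query along these lines can confirm that the database is empty after the delete. This is a minimal sketch; the alias nodeCount is just an illustrative name.

```cypher
// Count every node that is still in the database.
// After the DETACH DELETE above, this should return 0.
MATCH (n)
RETURN count(n) AS nodeCount;
```

Running the same query before the delete shows how many nodes are about to be removed.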
Preparing the CSV File
Next, we'll prepare our CSV file. Suppose we have a CSV file named movies.csv with the following content:
id,title,country,year
1,The Matrix,USA,1999
2,Inception,USA,2010
3,Parasite,South Korea,2019
This file contains columns for the movie ID, title, country, and year of release.
Copying the CSV File to the Import Directory
For Neo4j to read this CSV file, it must be placed in the import directory. By default, Neo4j resolves file:/// URLs relative to this location, but you can change it in the neo4j.conf file (the setting is server.directories.import in Neo4j 5 and dbms.directories.import in Neo4j 4.x). For now, let's copy the movies.csv file into the import directory.
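Before creating any nodes, it can help to confirm that Neo4j can actually see the file. The sketch below assumes movies.csv is already in the import directory; it only reads rows and writes nothing to the graph. The column aliases are illustrative.

```cypher
// Preview the first rows of the file without modifying the database.
// Each row comes back as a map keyed by the header names, with string values.
LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS csvLine
RETURN csvLine.id AS id,
       csvLine.title AS title,
       csvLine.country AS country,
       csvLine.year AS year
LIMIT 5;
```

If the file is not in the import directory, the query fails with an error instead of returning rows, which makes path problems easy to spot early.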
Loading Data Using Cypher
Now, let's open the Neo4j browser and use the following Cypher query to load data from our CSV file and create nodes and relationships:
LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS csvLine
MERGE (country:Country { name: csvLine.country })
CREATE (movie:Movie { id: toInteger(csvLine.id), title: csvLine.title, year: toInteger(csvLine.year) })
CREATE (movie)-[:MADE_IN]->(country);
Explanation of the Query
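LOAD CSV WITH HEADERS: This line tells Neo4j to load the CSV file from the specified location; a file:/// URL is resolved relative to the import directory. The WITH HEADERS option indicates that the first row of the CSV file contains column names, so each following row is exposed as a map (csvLine) keyed by those names.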
MERGE Clause: This ensures that a node with the label Country and a property name set to the value of csvLine.country exists. If such a node doesn't exist, it will be created; if it does exist, it will be reused.
CREATE Clause: A new node with the label Movie is created. The properties of this node (id, title, and year) are set based on the values from the CSV file. Because LOAD CSV reads every field as a string, id and year are converted with toInteger().
Relationship: A relationship of type MADE_IN is created between the movie node and the country node, indicating the country where the movie was made.
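Because the query relies on MERGE to look up countries by name, one optional addition is a uniqueness constraint on that property. The sketch below uses the constraint syntax of recent Neo4j versions (4.4 and 5.x); the constraint name country_name_unique is an arbitrary choice, not something the original setup requires.

```cypher
// Optional: guarantee at most one Country node per name.
// The constraint is backed by an index, which also speeds up the MERGE lookup.
CREATE CONSTRAINT country_name_unique IF NOT EXISTS
FOR (c:Country) REQUIRE c.name IS UNIQUE;
```

Run it once before loading the CSV; IF NOT EXISTS makes repeated runs harmless.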
Running the Query
Execute the above Cypher query in the Neo4j browser. If the query executes successfully, it will load the movie data into the database, create nodes for each movie and country, and establish the appropriate relationships.
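Note that CREATE adds a new Movie node every time it runs, so executing the load twice would duplicate the movies. If re-runs are likely, one possible variant (an assumption about the desired behavior, not part of the original query) is to MERGE on the movie id instead:

```cypher
// Re-runnable variant: match movies by id and only create what is missing.
LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS csvLine
MERGE (country:Country { name: csvLine.country })
// Title and year are set only when the Movie node is first created.
MERGE (movie:Movie { id: toInteger(csvLine.id) })
  ON CREATE SET movie.title = csvLine.title,
                movie.year = toInteger(csvLine.year)
MERGE (movie)-[:MADE_IN]->(country);
```

This treats id as the identity of a movie. Existing movies keep their current title and year; use ON MATCH SET as well if re-imports should overwrite them.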
Verifying the Inserted Data
To verify that the data has been inserted correctly, run the following match query:
MATCH (m:Movie)-[:MADE_IN]->(c:Country) RETURN m, c;
This query returns every movie node together with its country node. With the sample data, you should see three Movie nodes and two Country nodes: The Matrix and Inception both point to USA, and Parasite points to South Korea.
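For a more compact check than the graph view, an aggregate query can summarize the result per country. This is a sketch under the assumption that the sample file was loaded once into an empty database; the aliases are illustrative.

```cypher
// Count movies per country and list their titles.
MATCH (m:Movie)-[:MADE_IN]->(c:Country)
RETURN c.name AS country,
       count(m) AS movieCount,
       collect(m.title) AS titles
ORDER BY country;
```

With the sample file, USA should show a count of 2 and South Korea a count of 1; higher numbers suggest the load was run more than once.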
Loading Data in Docker
If you're running Neo4j in a Docker container, the Cypher stays the same; what changes is how the CSV file reaches the container's import directory. Below is a Docker Compose file that sets up Neo4j and mounts a local import folder for that purpose:
Docker Compose File
version: '3.8'
services:
  neo4j:
    image: neo4j:latest
    container_name: neo4j
    ports:
      - "7474:7474" # HTTP port for Neo4j Browser and API
      - "7687:7687" # Bolt port for database connections
    environment:
      - NEO4J_AUTH=neo4j/securepassword # Initial password for the built-in neo4j user
      # APOC file-import settings (plain LOAD CSV does not need them)
      - NEO4J_apoc_import_file_enabled=true
      - NEO4J_apoc_import_file_use_neo4j_config=true
    volumes:
      - ./data:/data
      - ./logs:/logs
      - ./import:/var/lib/neo4j/import # CSV files placed here are visible to LOAD CSV
      - ./plugins:/plugins
Running the Docker Container
Create a directory structure with data, logs, import, and plugins folders next to the Compose file (the volume paths above are relative to it).
Place your movies.csv file in the import folder.
Start the Neo4j container:
docker-compose up -d
Loading the CSV Data in Docker
Once the container is running, open the Neo4j browser at http://localhost:7474, log in with the credentials from NEO4J_AUTH, and execute the same Cypher query to load data from the CSV file:
LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS csvLine
MERGE (country:Country { name: csvLine.country })
CREATE (movie:Movie { id: toInteger(csvLine.id), title: csvLine.title, year: toInteger(csvLine.year) })
CREATE (movie)-[:MADE_IN]->(country);
Conclusion
In this guide, we learned how to load data from a CSV file into Neo4j using Cypher queries. We also covered how to perform these steps in a Docker environment. By following these steps, you can efficiently manage and analyze your data in Neo4j, leveraging its powerful graph database capabilities.