Revanth Reddy Tondapu

Part 22: Loading Data from a CSV File into Neo4j: A Step-by-Step Guide



In this blog post, we will walk through the process of loading data from a CSV file into your Neo4j graph database. We'll cover the steps to clear the database, load the CSV data, create nodes and relationships, and verify that the data was inserted. We'll also show how to perform these steps when Neo4j runs in Docker.


Deleting Existing Nodes and Relationships

First, let's start by clearing any existing data in our Neo4j database. Open the Neo4j browser connected to your active DBMS and run the following Cypher query to delete all nodes and their relationships:

MATCH (n) DETACH DELETE n;

This command deletes every node and relationship in the database, giving us a clean slate to work with. Indexes and constraints, if any were defined, are not removed by it.
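
To confirm that the delete worked, a quick count can be run afterwards. This is a minimal sketch; the alias remainingNodes is just a name chosen for this illustration.

```cypher
// Count the nodes that are left; 0 means no nodes (and therefore no relationships) remain.
MATCH (n)
RETURN count(n) AS remainingNodes;
```

On an empty database the query returns a single row with remainingNodes equal to 0.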


Preparing the CSV File

Next, we'll prepare our CSV file. Suppose we have a CSV file named movies.csv with the following content:

id,title,country,year
1,The Matrix,USA,1999
2,Inception,USA,2010
3,Parasite,South Korea,2019

This file contains columns for the movie ID, title, country, and year of release.


Copying the CSV File to the Import Directory

For Neo4j to read this CSV file, it must be placed in the import directory. By default, file:/// URLs in LOAD CSV are resolved relative to this directory. Its location can be changed in the neo4j.conf file (the setting is server.directories.import in Neo4j 5 and dbms.directories.import in 4.x). For now, let's copy the movies.csv file into the import directory.
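
Once the file has been copied, it can be worth checking that Neo4j can actually read it before creating any data. The query below is a small sketch that only reads the file and returns its rows without writing anything to the database.

```cypher
// Preview the first rows of the file; nothing is created or modified.
LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS csvLine
RETURN csvLine
LIMIT 3;
```

For the sample file, this returns three rows, each a map with the keys id, title, country, and year. An error at this point usually means the file is not in the import directory.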


Loading Data Using Cypher

Now, let's open the Neo4j browser and use the following Cypher query to load data from our CSV file and create nodes and relationships:

LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS csvLine
MERGE (country:Country { name: csvLine.country })
CREATE (movie:Movie { id: toInteger(csvLine.id), title: csvLine.title, year: toInteger(csvLine.year) })
CREATE (movie)-[:MADE_IN]->(country);

Explanation of the Query

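  1. LOAD CSV WITH HEADERS: This clause reads the CSV file from the specified location, one row at a time. The WITH HEADERS option indicates that the first row of the CSV file contains column names, so each following row is exposed as a map (here called csvLine) keyed by those names, for example csvLine.title.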

  2. MERGE Clause: This ensures that a node with the label Country and a property name set to the value of csvLine.country exists. If such a node doesn't exist, it will be created; if it does exist, it will be reused.

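  3. CREATE Clause: A new node with the label Movie is created for each row. The properties of this node (id, title, and year) are set based on the values from the CSV file. LOAD CSV reads every field as a string, which is why id and year are converted with toInteger().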

  4. Relationship: A relationship of type MADE_IN is created between the movie node and the country node, indicating the country where the movie was made.
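
The type conversion in step 3 can be checked directly. The following read-only sketch returns the raw year field next to its converted value; the aliases rawYear and yearAsInteger are names chosen for this illustration.

```cypher
// Read only the first data row and compare the raw field with its converted value.
LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS csvLine
RETURN csvLine.year AS rawYear, toInteger(csvLine.year) AS yearAsInteger
LIMIT 1;
```

For the sample file, rawYear comes back as the string "1999" and yearAsInteger as the integer 1999.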


Running the Query

Execute the above Cypher query in the Neo4j Browser. If the query executes successfully, it creates one Movie node per row, one Country node per distinct country, and one MADE_IN relationship from each movie to its country.
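
Because the query uses CREATE for movies, running it a second time would add a second copy of every movie. If the load may be repeated, one possible variant (an alternative sketch, not the query used in the rest of this post) is to MERGE on the movie id and set the remaining properties afterwards:

```cypher
// Re-runnable variant: MERGE matches an existing node or relationship before creating a new one.
LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS csvLine
MERGE (country:Country { name: csvLine.country })
MERGE (movie:Movie { id: toInteger(csvLine.id) })
SET movie.title = csvLine.title, movie.year = toInteger(csvLine.year)
MERGE (movie)-[:MADE_IN]->(country);
```

Merging on id alone, rather than on all properties, means a changed title or year in the CSV updates the existing movie instead of creating a new one.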


Verifying the Inserted Data

To verify that the data has been inserted correctly, run the following MATCH query:

MATCH (m:Movie)-[:MADE_IN]->(c:Country) RETURN m, c;

This query returns every movie node together with its associated country node. For the sample file, you should see three Movie nodes connected to two Country nodes: The Matrix and Inception point to USA, and Parasite points to South Korea.
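
For a larger file, inspecting the graph visually becomes impractical, so a count per country is a handy complement. This is a minimal sketch; the aliases country and movies are chosen for this illustration.

```cypher
// Count how many movies are linked to each country.
MATCH (m:Movie)-[:MADE_IN]->(c:Country)
RETURN c.name AS country, count(m) AS movies
ORDER BY country;
```

With the sample data loaded once, this returns two rows: South Korea with 1 movie and USA with 2.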


Loading Data in Docker

If you're running Neo4j in a Docker container, the Cypher stays the same, but the import directory lives inside the container, so it has to be mounted from the host as a volume. Below is a Docker Compose file that sets up Neo4j with the necessary configurations to load CSV data:


Docker Compose File

version: '3.8'

services:
  neo4j:
    image: neo4j:latest
    container_name: neo4j
    ports:
      - "7474:7474"   # HTTP port for Neo4j Browser and API
      - "7687:7687"   # Bolt port for database connections
    environment:
      - NEO4J_AUTH=neo4j/securepassword  # Initial password for the default neo4j user
      - NEO4J_apoc_import_file_enabled=true  # APOC import settings; LOAD CSV itself does not use APOC
      - NEO4J_apoc_import_file_use_neo4j_config=true
    volumes:
      - ./data:/data
      - ./logs:/logs
      - ./import:/var/lib/neo4j/import
      - ./plugins:/plugins

Running the Docker Container

  1. Next to the Compose file, create the data, logs, import, and plugins folders that the volumes section refers to.

  2. Place your movies.csv file in the import folder.

  3. Start the Neo4j container:

    docker-compose up -d


Loading the CSV Data in Docker

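Once the container is running, open the Neo4j Browser at http://localhost:7474 (the HTTP port mapped above), log in as neo4j with the password set in NEO4J_AUTH, and execute the same Cypher query to load data from the CSV file: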

LOAD CSV WITH HEADERS FROM 'file:///movies.csv' AS csvLine
MERGE (country:Country { name: csvLine.country })
CREATE (movie:Movie { id: toInteger(csvLine.id), title: csvLine.title, year: toInteger(csvLine.year) })
CREATE (movie)-[:MADE_IN]->(country);

Conclusion

In this guide, we cleared the database, loaded a CSV file with LOAD CSV, created Movie and Country nodes connected by MADE_IN relationships, and verified the result with a MATCH query. We also covered how to perform the same steps when Neo4j runs in a Docker container.
