Migration of content
IMPORTANT INFORMATION
This section describes the migration of data from the legacy website and its database to the new website. It assumes that a legacy Yii2-based website and its related database with the exact same structure exist in an Exabyte cPanel setup, the way the KL Hash House Harriettes website was hosted before.
Also note that the Python scripts used for the migration fix some migration issues either at a low level (search/replace of text in database fields) or outside the code (an Excel sheet to streamline runsite locations).
Before starting the Python scripts, make sure that the respective database connection parameters and URLs are properly set. This is done through the config.yml or config_docker.yml file respectively. The Python code can be run either natively at OS level or in a Docker container; in the .env file of the project folder, DOCKER can be set to True or False to select the mode (a sketch of this mechanism follows).
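A minimal sketch of this selection mechanism, assuming the python-dotenv and PyYAML packages; only the file names and the DOCKER flag come from this document, the keys inside the config files are illustrative:

```python
# Hypothetical sketch of the config selection; only config.yml,
# config_docker.yml and the DOCKER flag come from this document.
import os

import yaml
from dotenv import load_dotenv  # python-dotenv

load_dotenv()  # reads DOCKER=True/False from the project's .env file

config_file = (
    "config_docker.yml" if os.getenv("DOCKER", "False") == "True" else "config.yml"
)
with open(config_file) as f:
    config = yaml.safe_load(f)

# e.g. config["legacy_db"] holds the legacy MySQL parameters and
# config["strapi_base_url"] the API base URL (key names assumed)
```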
Pre-requisites to run the data migration:
- An existing database on Exabyte (connection parameters can be found in the management console of the cPanel environment)
- Exabyte enforces the use of dedicated IP addresses to access the database (there is a settings screen under database management in the cPanel management console). Hence, a dedicated server with a static IP address is required (e.g., a Tailscale exit node).
- Database parameters for the new database (migration target) are properly set
- The Python code uses the Strapi APIs; the base URL for the API endpoints must be properly set
- Make sure that the API permissions for all tables below are set for the public user; the Python code does not use authentication (see the sketch after this list)
- The cleansed location data in Excel table format (see below)
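Since the public role carries the permissions, the migration scripts can call the Strapi REST API without an authentication header. A minimal sketch of such a call, assuming a Strapi v4-style REST API that wraps payloads in a data object; the collection and field names are placeholders:

```python
# Unauthenticated call against a Strapi v4-style REST endpoint; assumes the
# public role has create permission on the target collection.
import requests

STRAPI_BASE_URL = "https://example.org"  # placeholder, normally set via config.yml

def create_entry(collection: str, attributes: dict) -> dict:
    """POST one record to a Strapi collection without authentication."""
    resp = requests.post(
        f"{STRAPI_BASE_URL}/api/{collection}",
        json={"data": attributes},  # Strapi v4 wraps payloads in "data"
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# create_entry("members", {"name": "Jane Hash"})  # collection/fields assumed
```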
Run Python migration scripts
- members.py: Extracts, transforms and loads the members into the new database. The table content for tshirt_map will be created automatically when required (a sketch of the scripts' common ETL pattern follows this list).
- hareline.py: Extracts the hareline table, transforms it and loads it into the runs table of the new database.
- locations.py: Loads the hareline and extracts the locations from it. Uses the migrated locations (see below), loads them into the new table and attaches a location to each run.
- runrecords.py: Extracts, transforms and loads the runrecords.
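All four scripts follow roughly the same extract-transform-load pattern. A hedged sketch, assuming the pymysql driver for the legacy MySQL database and the unauthenticated Strapi call shown earlier; table and field names are illustrative:

```python
# Hedged outline of the shared ETL pattern; pymysql and all table and
# field names are assumptions, not the scripts' actual code.
import pymysql
import requests

def migrate_members(db_params: dict, strapi_base_url: str) -> None:
    conn = pymysql.connect(**db_params, cursorclass=pymysql.cursors.DictCursor)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT * FROM members")  # legacy Yii2 table (name assumed)
            rows = cur.fetchall()
    finally:
        conn.close()

    for row in rows:
        # transform step: e.g. low-level search/replace fixes in text fields
        attributes = {k: v for k, v in row.items() if v is not None}
        requests.post(
            f"{strapi_base_url}/api/members",
            json={"data": attributes},
            timeout=30,
        ).raise_for_status()
```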
Manual work after migration
- Remove all public access rights on the APIs
- Check all runs after creation of migrated locations table and adjust manually
- Manually set the locations for run_no 2000-2003 (these are exceptions; they were created with a wrong run number in the legacy database)
- Create the content of the committee table
- Make the data changes identified in the Excel file manual_assignments.xlsx (these are only operational data changes after the migration)
Location migration
A hash run starts at a runsite. In the legacy database, the data was organized in a way that every run (stored in a table named hareline) had its individual location attached. The runs, however, often happen at the same runsite, hence the decision was taken to normalize the runsite data and move it, as locations, into a table of its own.
For this purpose, all location data (which was partially only available in files) was extracted into a single pandas DataFrame. Subsequently, the geo-coordinates were compared against each other based on proximity (exclusion parameter = 500 m). The output was written to a CSV file and loaded into an Excel file using Power Query. Technical and, finally, manual cleansing was applied to arrive at a duplicate-free but complete (runsites can legitimately be quite close together) location database of ~120 entries out of 400+ runs.
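A minimal sketch of such a pairwise proximity comparison; only the 500 m exclusion parameter and the CSV output are taken from the process above, the column names are assumed:

```python
# Flag runsites that lie within the exclusion distance of an earlier one;
# the 500 m parameter is from this document, "lat"/"lon" columns are assumed.
import math
import pandas as pd

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_duplicates(df: pd.DataFrame, threshold_m: float = 500.0) -> pd.DataFrame:
    """Mark every location that lies within threshold_m of an earlier row."""
    df = df.reset_index(drop=True).copy()
    df["duplicate_of"] = None
    for i in range(len(df)):
        for j in range(i):
            if haversine_m(df.at[i, "lat"], df.at[i, "lon"],
                           df.at[j, "lat"], df.at[j, "lon"]) < threshold_m:
                df.at[i, "duplicate_of"] = j
                break
    return df

# flag_duplicates(locations_df).to_csv("location_candidates.csv", index=False)
```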
In a reverse process, all locations from the now normalized locations table were re-assigned back to the runs.
Every runsite has so-called directions, which verbally describe the route to the respective runsite. In the legacy solution, these directions were also attached to a single run and are, even for the same location, not always identical. In order not to lose this potentially valuable information, all directions text blocks were extracted, and the newest one was attached to the newly created normalized location. All other text blocks were connected to their original locations and stored in a table called alternative directions in the Strapi database. This table will outdate naturally and can and should be removed after some time.
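A sketch of how the newest directions text per location could be separated from the historical alternatives, assuming a pandas DataFrame with location_id, run_date and text columns (all names illustrative):

```python
# Split directions into one canonical (newest) text per location plus
# historical alternatives; column names are assumptions.
import pandas as pd

def split_directions(directions: pd.DataFrame):
    """Return (canonical, alternatives) from columns location_id, run_date, text."""
    ordered = directions.sort_values("run_date")
    newest = ~ordered.duplicated("location_id", keep="last")
    canonical = ordered[newest]      # attached to the normalized locations
    alternatives = ordered[~newest]  # go to the alternative directions table
    return canonical, alternatives
```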
Synchronization script
To ease the cut-over process from the legacy website to the new website, the server installation and data migration for the new website were done about 2 months before the cut-over itself and both solutions ran (technically) in parallel. During this period data had to be synchronized between the legacy database and the Strapi backend.
For this purpose, a Python script (update_all.py) was written and run on a daily basis using a cron job. The script identifies changes in the members, hareline and runrecords tables and synchronizes them automatically. For changes to the location of a run in the legacy database, it maps all locations using their geo-coordinates and, if a location is found within a 100 m distance, assigns it automatically. If no close enough location is found, the script triggers an email to inform the admin about the action to be taken.
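A hedged sketch of the 100 m matching and the email fallback; only the threshold and the notification behaviour come from the description above, all other names and addresses are placeholders:

```python
# Location matching for the daily sync; reuses the haversine_m helper from
# the earlier sketch. The 100 m threshold and email fallback follow the text
# above; the data shapes, names and addresses are assumptions.
import smtplib
from email.message import EmailMessage

def match_location(lat: float, lon: float, locations: list, max_dist_m: float = 100.0):
    """Return the closest known location within max_dist_m, else None."""
    best, best_dist = None, max_dist_m
    for loc in locations:  # dicts with "id", "lat", "lon" (assumed shape)
        d = haversine_m(lat, lon, loc["lat"], loc["lon"])
        if d < best_dist:
            best, best_dist = loc, d
    return best

def notify_admin(run_no: int, lat: float, lon: float, smtp_host: str = "localhost"):
    """Email the admin when no location is close enough to assign."""
    msg = EmailMessage()
    msg["Subject"] = f"Run {run_no}: no location within 100 m, manual action needed"
    msg["From"] = "migration@example.org"  # placeholder addresses
    msg["To"] = "admin@example.org"
    msg.set_content(f"Unmatched coordinates: {lat}, {lon}")
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```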
