Troubleshooting
This is not a comprehensive list of the issues that can be encountered, and even less so of their possible solutions. It is based only on what I have experienced.
If you are a user of the application and face an issue you cannot sort out, either because it is not listed below or because you don’t have admin permissions, please contact Yannick Donnadieu (donnadieu@cerege.fr), Sebastien Nguyen (sebastien.nguyen@lsce.ipsl.fr) or report an issue on GitHub.
User
Modifications done in Panel steps are not validating upon save
When the save button is clicked for one of these steps (internal oceans or diffusive passages), the user is redirected to the main application page but the bathymetry/topography modifications are not saved. For unknown reasons, this issue is encountered with the Firefox browser.
Switch to Google Chrome or Safari.
Admin
Volume failure
After a stack redeployment you can get a Failure volume error.
Failure volume "instance_storage" ...
Keep in mind that the fix below will remove the database: all the users of the app will lose their data.
Navigate to Volumes.
Search for climsim_instance_storage. If it exists, check it and delete it.
Nginx error
After a stack redeployment you can get an nginx error when trying to reach the app.
It has been observed that containers can get into inconsistent states and need restarting. This has been observed for the nginx container (notably when containers are redeployed in different passes of Watchtower: nginx builds quicker than the other images, so not all images may be redeployed at the same time) and for the panel container (when an error occurs, the container remains usable for new websites but provides strange results).
Containers are not redeployed automatically
It might happen that some containers are not redeployed when new images are pushed to the Osupytheas registry (registry.osupytheas.fr). If this occurs, Watchtower is probably the cause. Here is a non-exhaustive list of possible problems:
- Watchtower container is not running: this container sometimes appears to stop at random.
Solving:
Redeploying the stack: the Watchtower container will be redeployed and, at the same time, all the other containers will be redeployed using the latest image.
- Watchtower is not looking at the right container names: over time, Portainer/Docker updates can alter the way the containers are named.
Solving:
Access the docker-compose (step 3 of Redeploying the Stack). Check the command line inside the watchtower-climsim section.
All the names provided there have to match the names of the containers deployed in the Containers section (step 6 of Get access to the containers). If this is not the case, modify the command line of the watchtower-climsim section in the docker-compose accordingly.
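The name comparison described above can also be done with a small script on the Docker host. This is only a minimal sketch: the function names are made up for illustration, and it assumes the docker command line is available where the script runs. The list of watched names has to be copied by hand from the command line of the watchtower-climsim section.

```python
import subprocess


def running_container_names() -> list[str]:
    """Return the names of the running containers, as reported by `docker ps`."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    # One container name per line of output.
    return result.stdout.split()


def unmatched_watchtower_names(watched: list[str], running: list[str]) -> list[str]:
    """Return the watched names that match no running container."""
    running_set = set(running)
    return [name for name in watched if name not in running_set]
```

Calling unmatched_watchtower_names(watched, running_container_names()) returns the names that Watchtower is told to watch but that no running container carries. An empty list means the command line matches the deployed containers.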
Regrid or Routing or MOSAIC or MOSAIX step is not validating upon save
It might happen that the status of one of these steps remains red even after saving it.
The web application page is refreshed every 10 seconds. When saving a step, the main app page is sometimes reached just before the process starts or completes. In this case the status indicator will remain red for up to 10 seconds before turning blue or green. Wait at least 10 seconds to make sure it updates.
If it doesn’t, check below.
If a file is not provided in the right format, or for various other reasons, a process can get stuck in the python or MOSAIC/MOSAIX queue. In this case, this specific process won't complete, and all the processes launched afterwards, for all users, will remain blocked and won't run either.
To solve this, access the console of the climsim_message_broker_1 container and check if there is any process showing (see Process queue management).
Processes should generally be treated quickly (a few seconds), so if you still see them in the queue after this amount of time, it is likely they are stuck. In this case, you can either:
- remove the first message of the queue:
rabbitmqadmin get queue=[QUEUE_NAME] ackmode=ack_requeue_false
- or delete the whole queue:
rabbitmqadmin delete queue name=[QUEUE_NAME]
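To decide whether a process is really stuck, it can help to compare two listings of the queues taken a few seconds apart. The sketch below is a hedged illustration, not part of the application: it assumes the listing was obtained in the broker container with rabbitmqadmin -f raw_json list queues name messages, and that this output is a JSON list of objects carrying "name" and "messages" keys (check this against your rabbitmqadmin version). The function names are hypothetical.

```python
import json


def queues_with_messages(raw_json: str) -> dict[str, int]:
    """Map each non-empty queue name to its message count."""
    queues = json.loads(raw_json)
    return {
        queue["name"]: queue.get("messages", 0)
        for queue in queues
        # A queue may lack the "messages" key; treat it as empty.
        if queue.get("messages", 0) > 0
    }


def still_waiting(first: dict[str, int], second: dict[str, int]) -> list[str]:
    """Return the queues that were non-empty in both snapshots."""
    return sorted(set(first) & set(second))
```

A queue returned by still_waiting for snapshots taken more than a few seconds apart is a candidate for the two commands above. A queue that is non-empty in only one snapshot is probably just being processed.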
The app isn’t working anymore and you want to deploy an old but functional version instead
Over time, the applications can stop working because a package version changes and breaks everything. In this case you may want to deploy an older, working version of the applications while you fix the issue.
You will first need to identify which image versions are functional on the Osupytheas registry. The version of an image is identified by its tag, which is the GitHub commit number. Generally you will want to take images with the same tag to avoid any issues.
Once you have the tag (GitHub commit number), connect to Portainer and reach the docker-compose (Step 3 of Redeploying the Stack) of the application you are interested in (either climsim = Multi Page or netcdf = Single Page).
Modify it by adding :$(tag) at the end of the image: line:
Normally you will add it only for the following images:
but not for:
Then you can Redeploy the stack (step 4 of Redeploying the Stack)
To return to the latest images later, remove the :$(tag) from the concerned image: lines in the docker-compose.
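To make the effect of the tag on an image reference concrete, here is a small sketch of the string manipulation involved. It is an illustration only: the function name, the image path and the tag value are all hypothetical, and image references pinned by digest (containing @sha256:) are not handled.

```python
def pin_image(image: str, tag: str) -> str:
    """Return the image reference with its tag set to `tag`."""
    # Only the part after the last "/" can carry a tag; a ":" earlier in the
    # reference would be a registry port, not a tag.
    prefix, slash, name = image.rpartition("/")
    # Drop an existing tag, if any, before appending the new one.
    name = name.split(":", 1)[0]
    return f"{prefix}{slash}{name}:{tag}"


# Hypothetical image path and commit tag, for illustration only.
example = pin_image("registry.osupytheas.fr/example/panel", "0a1b2c3")
```

Here example holds the reference that would follow image: in the docker-compose once the tag has been added.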