Troubleshooting


This is not a comprehensive list of the issues you may encounter, and even less so of their possible fixes. It is based only on what I have experienced.

The following issues are split into two sections according to the level of permission required to solve them:
  • Admin (you will need admin permission to access the Portainer interface)
  • User (you just need to have a user account on the application)

If you are a user of the application and face an issue you cannot sort out, either because it is not listed below or because you don’t have admin permission, please contact Yannick Donnadieu (donnadieu@cerege.fr), Sebastien Nguyen (sebastien.nguyen@lsce.ipsl.fr) or report an issue on GitHub.

    User

    Modifications made in Panel steps are not validated upon save

    When the save button is clicked for one of these steps (internal oceans or diffusive passages), the user is redirected to the main application page but the bathymetry/topography modifications are not saved. For unknown reasons, this issue occurs with the Firefox browser.

    Solving:
    Switch to Google Chrome or Safari.

    Admin

    Volume failure

    After a stack redeployment you can get a Failure volume error.

    Only do this if you get the following error:
    Failure volume "instance_storage" ...
    Keep in mind that this will remove the database: all users of the app will lose their data.
    Solving:
    Navigate to Volumes.
    Search for climsim_instance_storage. If it exists, tick its checkbox and delete it.
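
    If you have shell access to the Docker host rather than only the Portainer interface, the same check and deletion can be sketched with the Docker CLI. This is only a sketch under that assumption, not part of the procedure above: the helper name remove_volume_if_present is hypothetical, and docker volume rm will refuse to delete a volume that is still used by a container.

```shell
# Hypothetical helper: delete a named Docker volume only if it exists.
# WARNING: for climsim_instance_storage this erases the application database.
remove_volume_if_present() {
    volume_name="$1"
    # 'docker volume ls -q' prints one volume name per line;
    # grep -x -F only accepts an exact, literal, whole-line match.
    if docker volume ls -q | grep -qxF "$volume_name"; then
        docker volume rm "$volume_name"
    else
        echo "volume $volume_name not found, nothing to delete"
    fi
}

# Usage (only after seeing the 'Failure volume "instance_storage"' error):
#   remove_volume_if_present climsim_instance_storage
```

    The existence check mirrors the "if it exists" condition above, so running the helper when the volume is absent is harmless.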

    Nginx error

    After a stack redeployment you can get an nginx error when trying to reach the app.

    It has been observed that containers can get into inconsistent states and need restarting. This has been observed for the nginx container (notably when containers are redeployed in different Watchtower passes: nginx builds quicker than the other images, so not all images may be redeployed at the same time) and for the Panel container (when an error occurs, the container remains usable for new websites but gives strange results).

    Solving:
    Restart the nginx container first, then the Panel container (see Managing containers).
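
    For admins with shell access to the Docker host (an assumption, since this guide otherwise works through Portainer), the restart order can be sketched as a small helper. The function name restart_in_order is hypothetical, and the container names are deliberately left as placeholders: use the names shown in the Containers section of Portainer.

```shell
# Hypothetical helper: restart containers one at a time, in the order given.
restart_in_order() {
    for container in "$@"; do
        # Stop at the first failure so a later container is never
        # restarted before an earlier one came back.
        docker restart "$container" || return 1
    done
}

# Usage (placeholders, replace with the real container names):
#   restart_in_order NGINX_CONTAINER PANEL_CONTAINER
```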

    Containers are not redeployed automatically

    It might happen that some containers are not redeployed when new images are pushed to the Osupytheas registry (registry.osupytheas.fr). If this occurs, Watchtower is probably the cause. Here is a non-exhaustive list of possible problems:

    • Watchtower container is not running: this container appears to stop randomly from time to time.

      The Watchtower container will be redeployed and, at the same time, all the other containers will be redeployed using the latest image.

    • Watchtower is not looking at the right container names: over time, Portainer/Docker updates may alter the way the containers are named.
      Solving:
      Open the docker-compose (step 3 of Redeploying the Stack) and check the command line in the watchtower-climsim section.

      All the names listed there must match those of the containers shown in the Containers section (step 6 of Get access to the containers).

      If they do not, modify the command line of the watchtower-climsim section in the docker-compose accordingly.
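
    The name comparison described above can also be sketched from a shell on the Docker host, assuming you have such access. The helper name report_missing_containers is hypothetical. It takes the names copied from the watchtower-climsim command line and reports those that match no running container.

```shell
# Hypothetical helper: print each given name that matches no running container.
report_missing_containers() {
    # One running container name per line.
    running="$(docker ps --format '{{.Names}}')"
    for name in "$@"; do
        # -x: whole-line match, -F: treat the name literally (no regex).
        if ! printf '%s\n' "$running" | grep -qxF "$name"; then
            echo "no running container named: $name"
        fi
    done
}

# Usage (pass the names found in the watchtower-climsim command line):
#   report_missing_containers NAME_1 NAME_2 NAME_3
```

    No output means every name Watchtower is told to watch corresponds to a running container.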

    Regrid, Routing, MOSAIC or MOSAIX step is not validated upon save

    The status of one of these steps may remain red even after saving it.

    The web application page is refreshed every 10 seconds. When saving a step, the main app page is sometimes reached just before the process starts or completes. In this case the status indicator will remain red for 10 seconds before turning blue or green. Wait at least 10 seconds to make sure it updates.

    If it doesn’t, check below.

    Solving:
    If a file is not provided in the right format, or for various other reasons, a process can get stuck in the python or MOSAIC/MOSAIX queue. In this case, that process won't complete, and all the processes launched afterwards, for all users, will remain blocked as well.
    To solve this, access the console of the climsim_message_broker_1 container and check whether any process is showing (see Process queue management).
    Processes are generally handled quickly (within a few seconds), so if you still see them in the queue after that time, they are likely stuck. In this case, you can either:
  • delete the first pending process (which may be blocking the other processes):
    rabbitmqadmin get queue=[QUEUE_NAME] ackmode=ack_requeue_false
  • or delete all the processes by using:
    rabbitmqadmin delete queue name=[QUEUE_NAME]
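
    Before deleting anything, it can help to see which queues actually hold pending processes. The sketch below assumes the rabbitmqadmin available in the message broker console accepts the -f tsv and -q options and the list queues name messages subcommand. The helper name queues_with_pending is hypothetical.

```shell
# Hypothetical helper: read "name<TAB>messages" rows on stdin and
# print only the queues that still hold at least one message.
queues_with_pending() {
    # The numeric test on the second column also skips any header row.
    awk -F '\t' '$2 ~ /^[0-9]+$/ && $2 > 0 { print $1 ": " $2 " pending" }'
}

# Usage, from the console of the message broker container:
#   rabbitmqadmin -f tsv -q list queues name messages | queues_with_pending
```

    Running it twice a few seconds apart shows whether a queue is draining or stuck, and gives the name to use as [QUEUE_NAME] in the commands above.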

    The app isn’t working anymore and you want to deploy an older but functional version instead

    Over time, the applications may stop working because package versions change and break things. In this case you may want to deploy an older, working version of the applications while you fix the issue.

    Solving:
    First identify which image versions are functional on the Osupytheas registry. The version of an image is given by its tag, which is the GitHub commit number. You will generally want to use images with the same tag to avoid any issues.

    Once you have the tag (GitHub commit number), connect to Portainer and reach the docker-compose (Step 3 of Redeploying the Stack) of the application you are interested in (either climsim = Multi Page or netcdf = Single Page).

    Modify it by adding :$(tag) at the end of each relevant image: line.

    Normally you will add it only for the following images:
  • nginx
  • message_dispatcher
  • python_workerS (there are several of them)
  • flask_app
  • panel_app
  • mosaic_worker
  • mosaix_worker

    but not for:
  • message_broker
  • watchtower-climsim

    Then redeploy the stack (step 4 of Redeploying the Stack).
    If you later want Portainer to redeploy the latest available images, just remove :$(tag) from the corresponding image: lines in the docker-compose.
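
    The effect of adding and removing the tag on an image reference can be sketched with two small shell helpers. These are illustrative only: the names pin_image and unpin_image are hypothetical, as is the example reference registry.example/app/nginx, and digest references (name@sha256:...) are not handled.

```shell
# Hypothetical helper: strip a tag from an image reference, if it has one.
unpin_image() {
    image="$1"
    # Only a colon in the last path component is a tag separator;
    # a colon before the last "/" belongs to a registry port.
    case "${image##*/}" in
        *:*) echo "${image%:*}" ;;
        *)   echo "$image" ;;
    esac
}

# Hypothetical helper: pin an image reference to a given tag,
# replacing any tag already present.
pin_image() {
    echo "$(unpin_image "$1"):$2"
}

# Usage:
#   pin_image registry.example/app/nginx COMMIT_TAG
#   unpin_image registry.example/app/nginx:COMMIT_TAG
```

    Pinning every image listed above to the same commit tag keeps the deployed versions consistent with each other. Unpinning returns the reference to the form that follows the latest image.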
