Labii electronic lab notebook (ELN) and laboratory information management system (LIMS)

Saturday, May 23, 2020

Why did I move away from Zappa serverless?



There has been a lot of discussion about serverless, especially about Zappa. Based on my own experience, here is a summary of why I moved completely away from Zappa after two years of use.

It is slow

Generally, with a Lambda memory size of 1024 MB, an API served by Zappa is at least 10 times slower than one on a basic EC2 instance such as a t3a.large. Even a t3a.small is faster than Zappa.
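The post does not say how the timings were taken. One simple way to compare the two deployments yourself is to time repeated calls to the same endpoint on each. The sketch below is an assumption about method, not the author's benchmark: `time_calls` is a hypothetical helper that accepts any zero-argument callable (so it can wrap an HTTP request) and reports the first call separately, because on Lambda that call may include a cold start.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_calls(fetch: Callable[[], object], n: int = 20) -> Dict[str, float]:
    """Call `fetch` n times and return latency figures in milliseconds.

    The first call is reported on its own because on Lambda it may include a
    cold start; the median is taken over the remaining calls.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    samples: List[float] = []
    for _ in range(n):
        start = time.perf_counter()
        fetch()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "first_ms": samples[0],
        "median_ms": statistics.median(samples[1:]),
        "max_ms": max(samples),
    }
```

For example, pass `lambda: urlopen(url).read()` (with `urlopen` from `urllib.request`) once with your Zappa URL and once with your EC2 URL, keeping the region, payload and memory size the same for a fair comparison.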

It is difficult to work with SSO

If you want to integrate SSO with Zappa, you are out of luck. It takes a lot of tedious configuration, and there is no guarantee it will work. On an EC2 instance, however, SSO works out of the box.

Here is the process I used to get Zappa working with SSO:

  1. Build the binaries on an instance launched from an AMI.
  2. Copy `xmlsec1`, `libxmlsec1.so.1` and `libxmlsec1.so.1.2.20` to `/site-packages/lib`.
  3. Install libxmlsec1-openssl and copy `libxmlsec1-openssl.so`, `libxmlsec1-openssl.so.1` and `libxmlsec1-openssl.so.1.20` to `/site-packages/lib`.
  4. Set `xmlsec_binary` in accounts/views.py to `"/var/task/lib/xmlsec1"`.
  5. Add `if modname != "saml2.extension.__pycache__":` at line 90 of /site-packages/saml2/mdstore.py.
  6. If the metadata cannot be downloaded, remove the `public subnet` from zappa_settings. (A Lambda function attached to a VPC does not get a public IP address, so it needs a private subnet with a NAT gateway to reach the internet.)
  7. If you set `slim_handler` to `true`, add the code below to zappa/core.py at line 408 so that the `lib` folder is copied. This has to be redone every time Zappa is updated.
# code for step 5: load_extensions() in /site-packages/saml2/mdstore.py
def load_extensions():
    from saml2 import extension
    import pkgutil
    package = extension
    prefix = package.__name__ + "."
    ext_map = {}
    for importer, modname, ispkg in pkgutil.iter_modules(package.__path__,
                                                         prefix):
        module = __import__(modname, fromlist="dummy")
        # added line: skip the __pycache__ folder, which is not a real extension module
        if modname != "saml2.extension.__pycache__":
            ext_map[module.NAMESPACE] = module
    return ext_map
# code for step 7: the line to add
copytree(os.path.join(current_site_packages_dir, "lib"), os.path.join(venv_site_packages_dir, "lib"))

# where it goes: Zappa.create_handler_venv() in zappa/core.py (excerpt; the rest of the method is unchanged)
def create_handler_venv(self):
    """
    Takes the installed zappa and brings it into a fresh virtualenv-like folder. All dependencies are then downloaded.
    """
    import subprocess
    # We will need the current venv to pull Zappa from
    current_venv = self.get_current_venv()
    # Make a new folder for the handler packages
    ve_path = os.path.join(os.getcwd(), 'handler_venv')
    if os.sys.platform == 'win32':
        current_site_packages_dir = os.path.join(current_venv, 'Lib', 'site-packages')
        venv_site_packages_dir = os.path.join(ve_path, 'Lib', 'site-packages')
    else:
        current_site_packages_dir = os.path.join(current_venv, 'lib', get_venv_from_python_version(), 'site-packages')
        venv_site_packages_dir = os.path.join(ve_path, 'lib', get_venv_from_python_version(), 'site-packages')
    # added line: copy the native xmlsec files (steps 2 and 3) into the handler venv
    copytree(os.path.join(current_site_packages_dir, "lib"), os.path.join(venv_site_packages_dir, "lib"))
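Steps 2 to 4 can be scripted so they are easy to repeat after every rebuild. The sketch below is a hypothetical helper, not part of Zappa or pysaml2: `copy_native_libs` copies whatever file names you pass (for example those listed in steps 2 and 3) into `<site-packages>/lib`, and `xmlsec_binary_path` returns the step 4 path when the `AWS_LAMBDA_FUNCTION_NAME` environment variable (set by the Lambda runtime) is present, falling back to `xmlsec1` on `PATH` elsewhere.

```python
import os
import shutil
from pathlib import Path
from typing import Iterable, List, Optional

# Path used in step 4; /var/task is where Lambda unpacks the deployment package.
LAMBDA_XMLSEC_BINARY = "/var/task/lib/xmlsec1"


def copy_native_libs(src_dir: str, site_packages_dir: str, names: Iterable[str]) -> List[Path]:
    """Copy the named files from src_dir into <site_packages_dir>/lib (steps 2 and 3)."""
    dest = Path(site_packages_dir) / "lib"
    dest.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for name in names:
        source = Path(src_dir) / name
        if not source.exists():
            raise FileNotFoundError(f"missing {source}; build it on the AMI first")
        # copy2 keeps permission bits, so xmlsec1 stays executable
        copied.append(Path(shutil.copy2(source, dest / name)))
    return copied


def xmlsec_binary_path() -> Optional[str]:
    """Value for pysaml2's xmlsec_binary setting (step 4)."""
    # AWS_LAMBDA_FUNCTION_NAME is one of the variables the Lambda runtime sets.
    if "AWS_LAMBDA_FUNCTION_NAME" in os.environ:
        return LAMBDA_XMLSEC_BINARY
    # On EC2 or a laptop, use whatever xmlsec1 is on PATH (None if absent).
    return shutil.which("xmlsec1")
```

`shutil.copy2` follows symlinks by default, so a `libxmlsec1.so.1` that is a symlink on the build machine ends up as a regular file, which survives zipping into the deployment package.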

It cannot return large data

There is a limit on the size of the JSON an API can return when it is served by Zappa. The limit might be related to the Lambda memory size you set, but I have not tested this. If you need your API to work 100% of the time, even for queries that return a lot of data, Zappa is not for you.
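The post leaves the exact limit open. One likely candidate (an assumption, not something tested here) is the documented 6 MB quota on synchronous Lambda response payloads, which is a fixed quota rather than a function of the memory setting. If you do stay on Lambda, a sketch like the following measures the serialized size and splits a result list into pages that each fit; `LAMBDA_SYNC_RESPONSE_LIMIT`, `json_size` and `paginate` are hypothetical names.

```python
import json
from typing import Any, List

# Assumption: the 6 MB synchronous response quota from the AWS Lambda docs.
LAMBDA_SYNC_RESPONSE_LIMIT = 6 * 1024 * 1024


def json_size(obj: Any) -> int:
    """Size in bytes of obj serialized as compact UTF-8 JSON."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def paginate(records: List[Any], limit: int = LAMBDA_SYNC_RESPONSE_LIMIT) -> List[List[Any]]:
    """Greedily group records so each page, serialized as a JSON list, fits in `limit` bytes."""
    pages: List[List[Any]] = []
    current: List[Any] = []
    current_size = 2  # the surrounding "[" and "]"
    for record in records:
        size = json_size(record)
        if size + 2 > limit:
            raise ValueError("a single record is larger than the limit")
        # one extra byte for the comma before every record except the first
        extra = (size + 1) if current else size
        if current and current_size + extra > limit:
            pages.append(current)
            current, current_size = [], 2
            extra = size
        current.append(record)
        current_size += extra
    if current:
        pages.append(current)
    return pages
```

In practice, leave headroom below the limit: the body is escaped into the JSON response that Lambda hands back to API Gateway, together with status code and headers, so the bytes on the wire exceed `json_size` of the data alone.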

It is difficult to debug

Recently I had trouble reading from SSM (AWS Systems Manager) with Zappa. It works right after deployment, but fails about 4 minutes later, once a new session starts. There seems to be some consistency problem between sessions.
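The post does not identify the cause, so the following is only a debugging aid under assumptions, not a confirmed fix. It reads a parameter with a brand-new client on every attempt and logs each failure, which can help tell a stale-session problem apart from a permissions or network one. `get_parameter_with_retry` is a hypothetical helper; `make_client` is expected to return an object with boto3's `get_parameter(Name=..., WithDecryption=...)` method.

```python
import logging
import time
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def get_parameter_with_retry(make_client: Callable[[], Any], name: str,
                             attempts: int = 3, delay_s: float = 0.5) -> str:
    """Read one SSM parameter, building a fresh client for every attempt."""
    last_error: Optional[Exception] = None
    for attempt in range(1, attempts + 1):
        client = make_client()  # new client each time, so no session is reused
        try:
            response = client.get_parameter(Name=name, WithDecryption=True)
            return response["Parameter"]["Value"]
        except Exception as exc:
            last_error = exc
            # each failure is logged, so it shows up in the function's logs
            logger.warning("SSM read of %s failed (attempt %d/%d): %r",
                           name, attempt, attempts, exc)
            if attempt < attempts:
                time.sleep(delay_s * attempt)  # linear backoff between attempts
    raise RuntimeError(f"could not read SSM parameter {name}") from last_error
```

With boto3 installed, a call might look like `get_parameter_with_retry(lambda: boto3.session.Session().client("ssm"), "/myapp/example-key")`, where the parameter name is a placeholder. If the first attempt fails and a fresh client succeeds, that points at session state rather than IAM permissions.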

It is not cheap

By my calculation, running Zappa at 1024 MB costs about the same as a t3a.large EC2 instance running 24 hours a day at reserved instance (RI) pricing. Almost no money is saved.
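The comparison is easy to redo with your own traffic numbers. The sketch assumes Lambda's long-standing x86 list prices ($0.0000166667 per GB-second and $0.20 per million requests) and ignores the free tier, API Gateway and data transfer. The EC2 hourly rate is a parameter you supply, for example your own t3a.large RI rate; the function names are hypothetical.

```python
# Assumed Lambda list prices (x86, us-east-1); check the AWS pricing page,
# since they can change.
LAMBDA_PRICE_PER_GB_SECOND = 0.0000166667
LAMBDA_PRICE_PER_MILLION_REQUESTS = 0.20
HOURS_PER_MONTH = 730  # average number of hours in a month


def lambda_monthly_cost(requests: float, avg_duration_ms: float, memory_mb: int) -> float:
    """Compute plus request cost in USD, ignoring the free tier and API Gateway."""
    gb_seconds = requests * (avg_duration_ms / 1000.0) * (memory_mb / 1024.0)
    request_cost = requests / 1_000_000 * LAMBDA_PRICE_PER_MILLION_REQUESTS
    return gb_seconds * LAMBDA_PRICE_PER_GB_SECOND + request_cost


def break_even_requests(ec2_hourly_usd: float, avg_duration_ms: float, memory_mb: int) -> float:
    """Monthly request count at which Lambda costs as much as one always-on instance."""
    per_request = lambda_monthly_cost(1, avg_duration_ms, memory_mb)
    return ec2_hourly_usd * HOURS_PER_MONTH / per_request
```

For example, one million requests a month averaging 200 ms at 1024 MB is 200,000 GB-seconds, so `lambda_monthly_cost(1_000_000, 200, 1024)` comes to about $3.53. Above the value returned by `break_even_requests`, the always-on instance is the cheaper option.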

Yonggan Wu
