Continuing from part 1, a typical Cloud Foundry/Bluemix application would consist of a number of microservices implemented as CF Apps, composing functionality made available in several CF Services. The issues I saw with this were:
- Each CF Service (hosted in Bluemix) is still publicly routable and therefore must be secured against unintended use. By Service Broker contract each Service must implement basic auth, but because of SSL termination, credentials will be in the clear before reaching the application.
- Each of these “private” microservices has the additional burden of implementing the Service Broker REST API. This is so an instance of the Service can be provisioned by Cloud Foundry and bound to an application. It is through binding that service discovery is achieved.
- While the Service Broker API makes sense for provisioning “tangible” assets such as a database or queue and binding its use to an application, it makes little sense for an API that returns a stock quote or a weather forecast.
To address these problems I investigated whether a non-routable CF Application, also known as a Worker, could be used. A Worker is simply an Application without a public route (cf push --no-route). This means it isn’t exposed to the public and therefore isn’t forced to secure a public endpoint, since it does not have one.
In Cloud Foundry the intent of Workers is to accept input from channels other than public HTTP/S requests. For example this could be a Worker that subscribes to a message queue as input. Or a Worker that connects to a huge data repository to begin some map-reduce process. A Worker can really encompass any non-web app.
A Worker can certainly implement a REST interface and accept HTTP/S traffic without a public route assigned. The difficulty becomes one of service discovery. Apps are found by their public routes, Services are found via bindings, but Workers have no native CF discovery mechanism.
How can a CF App acting as a microservice aggregator find a Worker that implements a microservice?
My plan was to use the CF_INSTANCE_ADDR environment variable that each App/Worker can see. This variable is simply the internal IP address and port of the current process (for example 188.8.131.52:5678). As a Worker initializes, it would register its CF_INSTANCE_ADDR, along with its name, with a custom registry (to be provided). An App could then query the registry for the microservice name, obtain an internal IP address and port, and proceed with REST communications without its requests being routed via the security gateway and router. Of course, this gets a bit more complicated when each Worker is scaled to multiple instances.
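To make the registry idea concrete, here is a minimal sketch of what I had in mind. Everything in it is hypothetical: the WorkerRegistry class, the register_self helper and the service names are made up for illustration, and a real registry would be a network service rather than an in-process object. The only thing taken from Cloud Foundry is the CF_INSTANCE_ADDR variable and its ip:port form.

```python
import os
import random
from collections import defaultdict


class WorkerRegistry:
    """Hypothetical in-memory registry: microservice name -> set of "ip:port" addresses."""

    def __init__(self):
        self._instances = defaultdict(set)

    def register(self, name, addr):
        # addr is expected in "ip:port" form, which is how CF_INSTANCE_ADDR is formatted.
        host, sep, port = addr.rpartition(":")
        if not sep or not host or not port.isdigit():
            raise ValueError(f"expected ip:port, got {addr!r}")
        self._instances[name].add(addr)

    def deregister(self, name, addr):
        # Safe to call even if the address was never registered.
        self._instances[name].discard(addr)

    def lookup(self, name):
        # A scaled Worker registers one address per instance; pick one at random.
        instances = self._instances.get(name)
        if not instances:
            raise LookupError(f"no worker registered for {name!r}")
        return random.choice(sorted(instances))


def register_self(registry, name, environ=os.environ):
    # Intended to be called by a Worker at start-up, inside its CF container.
    registry.register(name, environ["CF_INSTANCE_ADDR"])
```

An aggregator App would then call something like registry.lookup("quotes") and talk to whichever address comes back. Picking an instance at random is just the simplest way to cope with scaling; it is an assumption of this sketch, not something CF provides.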
I set up a simple test where one App would initiate an HTTP request to a Worker when provided that Worker’s CF_INSTANCE_ADDR.
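The App side of such a test can be sketched in a few lines. This is not the code I ran, just an illustration under assumptions: the worker_url and call_worker names are hypothetical, and the Worker is assumed to serve plain HTTP on the address it reports.

```python
import urllib.request


def worker_url(instance_addr, path="/"):
    # instance_addr is the Worker's CF_INSTANCE_ADDR, in "ip:port" form.
    if not path.startswith("/"):
        path = "/" + path
    return f"http://{instance_addr}{path}"


def call_worker(instance_addr, path="/", timeout=5.0):
    """GET the given path on the Worker and return (status, body)."""
    with urllib.request.urlopen(worker_url(instance_addr, path), timeout=timeout) as resp:
        return resp.status, resp.read().decode("utf-8")
```

The timeout matters here: if the Worker cannot be reached, the call raises an exception after a few seconds instead of hanging.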
This worked. Most of the time. Annoyingly, there were times when, despite repeated attempts, an App simply could not connect to a particular Worker. Other Workers could be found, but sometimes a particular Worker was not reachable. Why was that?
After some investigation I realized that Security Groups were the culprit. A Cloud Foundry Application Security Group (ASG) is really a terrible name for what is essentially a set of network firewall egress rules. CF only allows Apps to make outbound requests to destinations that are permitted by one of the ASG rules applied to them.
cf security-group bluemix_cf_api_3
These system-defined ASGs each refer to an address range and a list of protocols. Any Worker that was instantiated with an address in one of these many allowable ranges could be talked to by other apps subject to the same rules. Occasionally, however, a Worker would be initialized with an internal IP address that was not covered by the ASGs. In such cases there was nothing wrong with the Worker; it’s just that the other application containers were prevented from communicating with it. Unfortunately only CF administrators may update or add Security Groups.
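The effect is easy to reproduce offline. The following is a rough sketch of the check, assuming the usual ASG rule shape (a protocol, a destination given as a single IP, a CIDR block or a start-end range, and a ports string for tcp/udp). The function names and the example addresses are mine and purely illustrative; they are not the actual Bluemix rules.

```python
import ipaddress


def destination_covers(destination, ip):
    """True if an ASG destination (single IP, CIDR, or "start-end" range) contains ip."""
    addr = ipaddress.ip_address(ip)
    if "-" in destination:
        start, end = (ipaddress.ip_address(p.strip()) for p in destination.split("-", 1))
        return start <= addr <= end
    # A bare IP is treated as a /32 network.
    return addr in ipaddress.ip_network(destination, strict=False)


def ports_cover(ports, port):
    """True if a ports string such as "80,443" or "8080-8090" contains port."""
    for part in ports.split(","):
        low, _, high = part.strip().partition("-")
        if low and int(low) <= port <= int(high or low):
            return True
    return False


def egress_allowed(rules, instance_addr, protocol="tcp"):
    """True if at least one rule permits traffic to instance_addr ("ip:port")."""
    ip, _, port = instance_addr.rpartition(":")
    for rule in rules:
        if rule["protocol"] not in ("all", protocol):
            continue
        if not destination_covers(rule["destination"], ip):
            continue
        # "all" rules carry no port list; tcp/udp rules must list the port.
        if rule["protocol"] == "all" or ports_cover(rule.get("ports", ""), int(port)):
            return True
    return False


# Illustrative rules only.
example_rules = [
    {"protocol": "tcp", "destination": "10.0.0.0/24", "ports": "1-65535"},
    {"protocol": "tcp", "destination": "10.0.2.10-10.0.2.20", "ports": "5678"},
]
```

With these example rules, a Worker at 10.0.0.7:5678 is reachable, while one that happens to start at 10.0.1.7:5678 is not, even though both are perfectly healthy.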
Because of this limitation, using Workers as private microservice implementations (at least via HTTP/S) was a bust [I wonder if AMQP could be used to route HTTP traffic to/from Workers, where, once unwrapped from its AMQP envelope, a message could be given standard HTTP request/response treatment?].
In part 3 I investigate whether it is possible to delegate all microservice security concerns to a gateway such as IBM API Management.