At aioneers, we create a lot of dashboards and do lots of machine learning tasks. Some of those analyses are done regularly. To deliver these dashboards and machine learning results, we need to do a lot of data transformation, like cleaning data and joining tables. To automate data transformation, we use Databricks clusters to run Python and, sometimes, R scripts. Databricks allows us to run data transformation scripts automatically on a schedule, but it is missing one essential feature: an email notification about how the data was loaded.
Currently, it can only send an email notification if a script fails. Alternatively, we could look in the logs to see how the data transformation was executed. But it is not possible to send specific information from the data transformation process via email. In search of a way to send our own custom email notifications, we looked at the different solutions available.
Different Solutions for Email Notifications
We use the Azure cloud and Office 365 Outlook, so naturally we looked at solutions provided in this environment. We also discussed some options outside of the environment that we use.
The options that we evaluated:
- MS Outlook using Azure Graph API
- MS Outlook using SMTP or IMAP
- Azure SendGrid
- Gmail
Factors that we took into account
We took into account the following factors:
- The emails have a “From” address in our domain (email@example.com)
- The emails do not disappear into Spam folders
- A general solution that also works outside of Azure/MS Office
- The risk that older APIs for sending emails will be removed in the future
- Data privacy
In the table below, we're providing an overview of how the individual solutions compare to our criteria. We will further elaborate on the requirements below the table.
From our domain
Sending emails from our domain is more about status and trust than about functionality. Here, Gmail performs the worst.
Disappear in Spam
If emails disappear into a spam folder, we cannot use the solution. In our experience, when we sent emails from SendGrid using our @aioneers.com addresses, Outlook recognized that it had not sent these emails and automatically deleted the incoming emails without a trace. Of course, we could use a different sender domain; Outlook would then show the email, but with a warning that it came from a different domain than the one stated in the “From” field. Again, this is a trust issue: a warning next to our emails does not make them look professional or trustworthy.
General solution
We wanted our solution to be general, that is, independent of the technologies we use (Azure/Office 365). Here, only the Outlook Graph API solution cannot be used in other environments or clouds. This would only matter if our startup were sold to a company with a different environment. In our experience, many German companies (Germany is where we mostly operate) use Azure/Office.
APIs will be removed in the future
We found out that some APIs are deprecated. Microsoft already tried to remove IMAP support for Office 365 in 2019. Microsoft is currently deprecating all services that use Basic Authentication because it considers Basic Authentication insecure. This includes the IMAP and SMTP protocols, and Microsoft recommends using other protocols to access email. All of those other protocols are specific to Microsoft and cannot be used in different environments/clouds.
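To make "Basic Authentication" concrete, here is a minimal Python sketch of the SMTP approach that is affected by the deprecation. It uses only the standard library (`smtplib` and `email.message`). The function names, addresses, and the default host `smtp.office365.com` with port 587 are assumptions for illustration, not our actual configuration.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body_text):
    """Compose a plain-text notification email (hypothetical helper)."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.set_content(body_text)
    return message


def send_with_basic_auth(message, username, password,
                         host="smtp.office365.com", port=587):
    """Send the message over SMTP, logging in with username and password.

    Host and port are assumed defaults. The login() call is the
    Basic Authentication step that Microsoft is phasing out.
    """
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()                  # upgrade the connection to TLS
        smtp.login(username, password)   # plain username/password login
        smtp.send_message(message)
```

Because the script has to hold the mailbox password itself, a script like this stops working as soon as Basic Authentication is switched off for the account.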
Data privacy (data location)
Our Outlook servers are located in the region that our company chose, and this complies with the data protection rules of the EU. If we send emails to companies in the EU, this works well. But if we want to change the region because of where our customers are located, it will be costly or complicated. SendGrid is more flexible and allows us to use different Azure regions to send emails. As a free service, Gmail only stores data in the USA, which does not comply with data privacy rules.
Costs
Gmail is an entirely free solution. With Office 365, we need a user account to send emails from, which means paying for one additional Office 365 license just to send email notifications. SendGrid was created to send emails en masse; therefore, the cost of sending emails in our case is very low.
Support
Gmail is a free service and, therefore, does not provide any support. All other services in the table provide support and can help us solve issues with the email service.
As the table shows, there is no perfect solution. Because of data privacy, we could not accept the Gmail solution. An API that will stop working means that the code will have to be rewritten, which adds future costs for changing our email notification system. Emails that are not trusted because Outlook shows a warning are not acceptable when working with customers. Even internally, receiving emails with a strange warning is not good.
Finally, we decided to use Outlook via Graph API because it satisfies all the other requirements even though the solution is not general.
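As an illustration of this choice, here is a minimal Python sketch of sending a notification through the Microsoft Graph `sendMail` endpoint. It is a sketch under stated assumptions, not our production code: the function names, the sender address, and the sample message are hypothetical, and the OAuth access token is assumed to have been obtained separately (for example, through an Azure AD app registration that has the `Mail.Send` permission).

```python
import json
import urllib.request

# Graph endpoint for sending mail on behalf of a specific mailbox
GRAPH_URL = "https://graph.microsoft.com/v1.0/users/{sender}/sendMail"


def build_send_mail_request(token, sender, recipients, subject, body_text):
    """Build (but do not send) a Graph sendMail request (hypothetical helper)."""
    payload = {
        "message": {
            "subject": subject,
            "body": {"contentType": "Text", "content": body_text},
            "toRecipients": [
                {"emailAddress": {"address": address}} for address in recipients
            ],
        },
        "saveToSentItems": False,
    }
    return urllib.request.Request(
        GRAPH_URL.format(sender=sender),
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # token acquired elsewhere (assumption)
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_notification(token, sender, recipients, subject, body_text):
    """Send the email and return the HTTP status code of the response."""
    request = build_send_mail_request(token, sender, recipients, subject, body_text)
    with urllib.request.urlopen(request) as response:
        return response.status
```

At the end of a Databricks job, a script could call something like `send_notification(token, "notify@example.com", ["team@example.com"], "Load finished", summary_text)`, where `summary_text` holds whatever details about the data load we want to report.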