Hi there, I'm Allen! I'm building intelligent systems to power a more sustainable world at Nectar. We apply AI advancements to disrupt the way companies buy energy for their operations.
I grew up in a quiet town in Massachusetts, where I studied math and chemistry. At MIT, I discovered a passion for building software. I owe many of my best experiences over the years to serendipitous friendships and whimsical side-quests. I'm grateful to have the rare opportunity to work on hard problems with super talented people.
Nectar
Nectar is an AI climate-tech company that's building the future of AI infrastructure sustainably. We believe that for AI to reach massive adoption across all industries globally, the following must be accomplished:
1. Easy access to cheap and clean energy to power datacenters and emissive companies
2. Agent-friendly web infrastructure that allows web agents to iterate at scale
3. Feature extraction from unstructured PDFs, images, and HTML with 99.999% accuracy
At Nectar, we're building (1) externally and (2) and (3) internally. Our main product monitors companies' energy usage and procures electricity at the best price for their needs. To solve this problem at scale, we built scrapers to download terabytes of data and data pipelines to pull it all into one organized schema. We aspire to one day open source our internal products to help the community build more sustainable AI.
Some examples of things we've built:
- A text-forwarding service on a Raspberry Pi that gives us access to MFA/2FA-protected websites. We tried Google Voice, Twilio, and a few other solutions, and realized that to do it right, we had to do it ourselves.
- Self-hosted infrastructure to scrape websites in headed mode, building on top of Browserbase and its competitors. Our scrapers run from Mac minis in our office proxied through residential IP addresses.
- 99% field-level accuracy on structured data extraction from unstructured documents, images, and HTML. We use Temporal for orchestration and workflow management and leverage Reducto for OCR (a minimal sketch of this orchestration pattern follows this list).
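
To make the orchestration bit concrete, here's a minimal sketch of a Temporal workflow that chains OCR and extraction. The workflow and activity names are invented for illustration, and the activity bodies are stubs, not our actual pipeline code:

```python
# Minimal sketch using the Temporal Python SDK. Activity names and bodies are
# hypothetical placeholders, not our production pipeline.
from datetime import timedelta

from temporalio import activity, workflow


@activity.defn
async def run_ocr(document_url: str) -> str:
    # Placeholder: in practice this would call an OCR provider (e.g. Reducto)
    # and return the parsed text/layout.
    raise NotImplementedError


@activity.defn
async def extract_fields(ocr_text: str) -> dict:
    # Placeholder: LLM-based structured extraction over the OCR output.
    raise NotImplementedError


@workflow.defn
class DocumentExtractionWorkflow:
    """Chains OCR -> extraction; Temporal handles retries, timeouts, and resumption."""

    @workflow.run
    async def run(self, document_url: str) -> dict:
        ocr_text = await workflow.execute_activity(
            run_ocr,
            document_url,
            start_to_close_timeout=timedelta(minutes=10),
        )
        return await workflow.execute_activity(
            extract_fields,
            ocr_text,
            start_to_close_timeout=timedelta(minutes=30),
        )
```

Temporal then gives us durable retries, timeouts, and resumability per activity, which matters when a single document involves many slow, expensive calls.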
However, we still have a long way to go. Most notably, 99% isn't good enough for our product: each document needs around 1,000 JSON fields extracted, so to reach 99% accuracy at the document level we need 99.999% field-level accuracy, assuming a uniform distribution of errors (0.99999^1000 ≈ 0.99, while 0.99^1000 ≈ 0.00004). Unfortunately, existing AI tools often prioritize latency over accuracy, because the most highly valued AI applications, like coding or customer-service agents, can tolerate occasional hallucinations and need immediate responses. Our product requires more 9's in the SLA, and we can afford to wait on more LLM calls. So most existing frameworks don't work for us, and we end up building a lot in-house. Some of the problems we're working on right now:
- Prompt design abstractions: We spend some time thinking about how to make prompt design more modular, prioritizing reusability and testability. Slightly different tasks sometimes require only small changes to the prompt, and we'd like to keep things DRY. Frameworks like LangChain end up abstracting away too much of the design process, making debugging and testing more difficult. Cursor's priompt is a strong move in the right direction, but our use case differs because we're much more accuracy-sensitive: when context is the limiting factor, priompt truncates the input to fit the 200k context window. Truncation risks losing critical context, so we have to either divide and conquer or compress the context losslessly. How can we iterate on priompt for 99.999% SLA applications? (A rough sketch of the direction we mean appears after this list.)
- Long list extraction: Imagine trying to extract a list of the unique people mentioned in a 200-page magazine. When the extracted list gets long, LLMs become lazy or duplicate data they've already seen. There are tricky logical duplicates that simple deduplication can't handle (e.g. "Barack Obama's daughter's mother", "the First Lady", and "Mrs. Obama" should all resolve to the same person, but if "Mrs. Obama" actually refers to Barack Obama's mother, then it's a different person). Even worse, when we split list extraction into multiple steps, we often lose the thoughts and reasoning from earlier steps and struggle to carry that context forward. What tools do we need to give LLMs to help with list extraction and deduplication? Will a scratchpad for writing thoughts suffice? (A sketch of the scratchpad idea appears after this list.)
- Scalable scraper infrastructure: We run a large number of scrapers to download data from the internet. We need them to scale to terabytes of data per day, avoid rate limiting, solve reCAPTCHAs, and more. Reworkd, Browserbase, and Brightdata have made good progress, but we still need to build a lot of infrastructure to scale to our needs.
- Schema design: One unique challenge of our product is that we track relationships between objects that are time-dependent. For example, a building in our database may be associated with customer number 123 in 2024 but with customer 456 in 2025. How can we track and query these relationships with time as a new dimension? For utility companies in particular, there are sometimes multiple IDs associated with an account (your account number vs. your meter number, etc.), and we need the flexibility to let our customers query by whichever ID they choose. How can we build a schema that scales and is flexible enough to serve our customers' needs while still following principles like 3NF? (One possible shape is sketched after this list.)
- Testing infrastructure: Most ML tools like Weights & Biases are designed for model testing, where training costs dominate testing costs. When building AI applications, testing cost ends up dominating (between $0.20 and $2.00 per document for us). Caching helps a little, but we still can't run our data pipeline on thousands of documents for every PR just to test a new approach. At the same time, we need to make sure that when we improve the pipeline for one type of document, performance doesn't regress on others. One interesting proposal was to build a large test set of documents and tag which functions and prompts in the codebase are responsible for the pipeline's performance on each document; then, on each push, we sample from the documents that would be affected. This still feels quite expensive to maintain, but it's a start. What tools should we build for efficient testing? (A toy version of the tagging idea is sketched after this list.)
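
On the prompt design question, here's a rough sketch (not our actual code) of the shape we're gesturing at: prompt fragments as plain, individually testable functions, an explicit token budget, and a divide-and-conquer path instead of silent truncation. The helper names (`count_tokens`, `call_llm`), the fragments, and the budget number are all stand-ins.

```python
# Hypothetical sketch: composable prompt fragments with an explicit token budget.
from typing import Callable, List

PromptFragment = Callable[[], str]


def count_tokens(text: str) -> int:
    # Stand-in for a real tokenizer; rough heuristic of ~4 characters per token.
    return len(text) // 4


def call_llm(prompt: str) -> str:
    # Stand-in for an actual model call.
    raise NotImplementedError


def system_rules() -> str:
    return "You are a careful extraction engine. Answer only from the document."


def field_instructions(field: str) -> str:
    return f"Extract the value of '{field}' and cite the supporting passage."


def build_prompt(fragments: List[PromptFragment], document: str) -> str:
    return "\n\n".join(f() for f in fragments) + "\n\nDOCUMENT:\n" + document


def extract(document: str, field: str, budget: int = 180_000) -> List[str]:
    """If the prompt fits the budget, make one call; otherwise split the document
    and recurse instead of truncating, so no context is silently dropped."""
    fragments = [system_rules, lambda: field_instructions(field)]
    prompt = build_prompt(fragments, document)
    if count_tokens(prompt) <= budget:
        return [call_llm(prompt)]
    mid = len(document) // 2
    return extract(document[:mid], field, budget) + extract(document[mid:], field, budget)
```

Each fragment can be unit-tested and reused on its own, which is the DRY property we want without a heavy framework in the way.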
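For long list extraction, the scratchpad idea could look roughly like this: process the document chunk by chunk and carry both the canonical list and the model's notes into the next call. The prompt wording, the `call_llm` stub, and the JSON shape are all illustrative, not our production code.

```python
# Hypothetical sketch: chunked list extraction with a running scratchpad so later
# chunks can see both the entities found so far and the reasoning behind them.
import json
from typing import Dict, List


def call_llm(prompt: str) -> str:
    # Stand-in for an actual model call that returns JSON text.
    raise NotImplementedError


def extract_people(chunks: List[str]) -> List[str]:
    people: List[str] = []  # canonical names found so far
    scratchpad: str = ""    # model's notes on aliases and tricky references

    for chunk in chunks:
        prompt = (
            "You are extracting the unique people mentioned in a long document.\n"
            f"People found so far: {json.dumps(people)}\n"
            f"Notes from earlier chunks: {scratchpad}\n\n"
            f"New chunk:\n{chunk}\n\n"
            'Reply as JSON: {"new_people": [...], "aliases": {"alias": "canonical"}, "notes": "..."}'
        )
        result: Dict = json.loads(call_llm(prompt))
        for name in result.get("new_people", []):
            # Only add names that aren't aliases of someone we already have.
            canonical = result.get("aliases", {}).get(name, name)
            if canonical not in people:
                people.append(canonical)
        scratchpad = result.get("notes", scratchpad)

    return people
```

The open question is whether free-form notes are enough, or whether the scratchpad needs structure (e.g. per-entity alias lists) to survive many chunks without drifting.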
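For the schema question, one possible shape (sketched with SQLAlchemy, with invented table and column names, not our actual schema) is a time-bounded association table between buildings and customers plus a separate identifier table, so an "as of" date can be part of every relationship query.

```python
# Hypothetical sketch: time-bounded relationships plus multiple external IDs.
from datetime import date

from sqlalchemy import Column, Date, ForeignKey, Integer, String
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Building(Base):
    __tablename__ = "buildings"
    id = Column(Integer, primary_key=True)
    address = Column(String, nullable=False)


class Customer(Base):
    __tablename__ = "customers"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)


class BuildingCustomer(Base):
    """Time-bounded association: a building's customer can change over time."""
    __tablename__ = "building_customers"
    id = Column(Integer, primary_key=True)
    building_id = Column(Integer, ForeignKey("buildings.id"), nullable=False)
    customer_id = Column(Integer, ForeignKey("customers.id"), nullable=False)
    valid_from = Column(Date, nullable=False)
    valid_to = Column(Date, nullable=True)  # NULL means still current


class CustomerIdentifier(Base):
    """Multiple external IDs (account number, meter number, ...) per customer."""
    __tablename__ = "customer_identifiers"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customers.id"), nullable=False)
    id_type = Column(String, nullable=False)  # e.g. "account_number", "meter_number"
    value = Column(String, nullable=False)


def customer_for_building(session: Session, building_id: int, as_of: date):
    """Resolve which customer a building belonged to on a given date."""
    return (
        session.query(Customer)
        .join(BuildingCustomer, BuildingCustomer.customer_id == Customer.id)
        .filter(
            BuildingCustomer.building_id == building_id,
            BuildingCustomer.valid_from <= as_of,
            (BuildingCustomer.valid_to.is_(None)) | (BuildingCustomer.valid_to >= as_of),
        )
        .one_or_none()
    )
```

Queries then become "who was the customer for building X as of date D", and the identifier table lets a customer look things up by whichever external ID they happen to hold.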
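And on testing, a toy version of the tagging proposal: a registry that maps each function or prompt to the benchmark documents it's known to affect, and a sampler that bounds the per-push cost. The registry contents, document IDs, and budget are made up for illustration.

```python
# Hypothetical sketch: map code/prompt identifiers to the benchmark documents they
# affect, then sample a bounded regression set for each push.
import random
from typing import Dict, List, Set

# Which benchmark documents each function / prompt is known to influence.
IMPACT_MAP: Dict[str, Set[str]] = {
    "prompts/usage_table": {"doc_001", "doc_017", "doc_233"},
    "parsers/pdf_layout": {"doc_001", "doc_099"},
    "prompts/account_ids": {"doc_042", "doc_233", "doc_500"},
}


def affected_documents(changed: List[str]) -> Set[str]:
    """Union of documents influenced by the functions/prompts touched in a PR."""
    docs: Set[str] = set()
    for name in changed:
        docs |= IMPACT_MAP.get(name, set())
    return docs


def regression_sample(changed: List[str], budget: int = 50, seed: int = 0) -> List[str]:
    """Cap the per-push test cost by sampling from the affected documents."""
    docs = sorted(affected_documents(changed))
    random.Random(seed).shuffle(docs)
    return docs[:budget]


if __name__ == "__main__":
    print(regression_sample(["prompts/usage_table", "parsers/pdf_layout"], budget=2))
```

The hard part, of course, is keeping the impact map honest as the codebase changes, which is exactly the maintenance cost mentioned above.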
If you have interesting ideas on these problems, or think you'd have a blast working on them, you should consider joining our team full-time. We're a talent-dense group of researchers, engineers, and hackers based in SF. If you'd like to chat, please reach out to me at allen [at] nectarclimate [dot] com.