Hi there, I'm Allen! I'm building intelligent systems to power a
more sustainable world at
Nectar. We apply advances in AI to disrupt the way
companies buy energy for their operations.
I grew up in a quiet town in Massachusetts, where I studied math and chemistry. At MIT, I discovered a passion for
building software. I owe many of my positive life experiences over the years to serendipitous friendships and whimsical
side-quests. I'm grateful to have the rare opportunity to work on hard problems with super talented people.
Nectar
Nectar is an AI climate-tech company that's building the
future of AI infrastructure sustainably. We believe that for AI to reach massive adoption across all industries
globally, the following must be accomplished:
1. Easy access to cheap and clean energy to power datacenters and emissive companies
2. Agent-friendly web infrastructure that allows web agents to iterate at scale
3. Feature extraction from unstructured PDFs, images, and HTML with 99.999% accuracy
At Nectar, we're building (1) externally and (2) + (3) internally. Our main product monitors companies' energy
usage and procures electricity at the best price for their needs. To solve this problem at scale, we built
scrapers to download terabytes of data. This data, like most data on the internet, is unstructured: think PDFs, websites, and emails. So we built data pipelines to pull it all into one organized schema. We aspire to one day open-source our internal products to help the community build more sustainable AI.
Examples of some things we've built:
- A text-forwarding service on a Raspberry Pi that lets our service maintain ongoing access to MFA/2FA-protected websites. We tried Google Voice, Twilio, and a few other solutions, and realized that to do it right, we had to do it ourselves.
- Self-hosted infrastructure to scrape websites in headed mode, building on top of Browserbase and its competitors. Our scrapers run from Mac minis in our office, proxied through residential IP addresses.
- 99% field-level accuracy on structured data extraction from unstructured documents, images, and HTML. We use Temporal for orchestration and workflow management and leverage Reducto for OCR.
However, we still have a long way to go. Most notably, 99% isn't good enough for our product: each document needs around 1,000 JSON fields extracted, so to parse a document perfectly 99% of the time (i.e. be correct on all 1,000 fields), we need 99.999% field-level accuracy (assuming a uniform distribution of errors). Unfortunately, existing AI tools often prioritize latency over accuracy, because highly valued AI applications like coding and customer-service agents can tolerate occasional hallucinations and need immediate responses. Our product requires more 9's in the SLA, and we can afford to wait on more LLM calls. So most existing frameworks don't work for us, and we end up building a lot in-house.
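To make the 99.999% target concrete, here is the back-of-the-envelope arithmetic (a minimal sketch in Python; the 1,000-field count is the rough figure from above):

```python
# Per-field accuracy needed so that a ~1,000-field document parses perfectly
# 99% of the time, assuming errors are independent and uniformly distributed.
fields_per_doc = 1_000
target_doc_accuracy = 0.99

required_field_accuracy = target_doc_accuracy ** (1 / fields_per_doc)
print(f"{required_field_accuracy:.6f}")  # ~0.999990, i.e. five 9's per field

# For contrast, 99% field-level accuracy almost never yields a perfect document:
print(f"{0.99 ** fields_per_doc:.6f}")   # ~0.000043
```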
Some of the problems we're working on right now:
- Prompt design abstractions: We're thinking about how to make prompt design more modular, prioritizing reusability and testability. Sometimes slightly different tasks require only small changes to the prompt, and we'd like to keep things DRY. Frameworks like LangChain end up abstracting away too much of the design process, making debugging and testing more difficult. Cursor's priompt is a strong move in the right direction. However, our use case differs from Cursor's because we're much more accuracy-sensitive: when the context window is the limiting factor, priompt truncates the input to fit into the 200k context window. Truncation risks losing critical context, so we have to either divide and conquer or compress the context losslessly. How can we iterate on priompt for 99.999%-SLA applications? (A sketch of the kind of composition we have in mind follows this list.)
- Long list extraction: Imagine trying to extract the list of unique people mentioned in a 200-page magazine. When the extracted list is long, LLMs become lazy or duplicate data they have already seen. There are tricky logical duplicates that we can't handle with simple deduplication (e.g. "Barack Obama's daughter's mother", "the First Lady", and "Mrs. Obama" are the same person, but if "Mrs. Obama" actually means Barack Obama's mother, then it's a different person). Even worse, when we divide list extraction into multiple steps, we often lose the thoughts and reasoning from the earlier steps and struggle to retain that context. What tools do we need to give LLMs to help with list extraction and deduplication? Will a scratchpad for writing thoughts suffice? (One possible shape of that scratchpad is sketched after this list.)
- Scalable scraper infrastructure: We have a large number of scrapers that we use to download data from the internet. We need to scale these scrapers to terabytes of data per day, avoid rate limiting, solve reCAPTCHA, and more. Reworkd, Browserbase, and Bright Data have made good progress, but we still need to build a lot of infrastructure to make sure we can scale to our needs. (A minimal headed-browser-plus-residential-proxy sketch follows this list.)
- Schema design: One unique challenge of our product is that we track relationships between objects that are time-dependent. E.g. a building in our database may be associated with customer number 123 in 2024 but become associated with customer 456 in 2025. How can we track and query these relationships with time as a new dimension? For utility companies in particular, there are sometimes multiple IDs associated with an account (your account number vs. your meter number, etc.), and we need to build in the flexibility to let our customers query by their ID of choice. How can we build a schema that scales and is flexible enough to serve our customers' needs while trying to follow principles like 3NF? (One possible time-bounded association schema is sketched after this list.)
- Testing infrastructure: Most ML tools, like Weights & Biases, are designed for model testing, where training costs dominate testing costs. For building AI applications, testing cost ends up dominating (between $0.20 and $2.00 per document). Caching helps a little, but we still can't run our data pipeline on thousands of documents for each PR just to test a new approach. At the same time, we need to make sure that when we improve the pipeline for one type of document, performance doesn't regress on other documents. One interesting proposal is to build a large test set of documents and tag which functions and prompts in the codebase are responsible for the pipeline's performance on each document; then, on each push, we sample from the documents that would be affected. Obviously, this still feels quite expensive to maintain, but it's a start. What tools should we build for efficient testing? (A toy version of that tagging-and-sampling idea is sketched after this list.)
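On the prompt-design question above: the shape we have in mind is roughly priompt-style composition of reusable pieces, but with an explicit policy when the pieces don't fit the budget (compress or split rather than silently truncate). This is a minimal sketch rather than our actual implementation; the PromptPiece and render_within_budget names and the token estimator are made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class PromptPiece:
    """A reusable prompt fragment with its rendered text and a priority."""
    name: str
    text: str
    priority: int  # higher = more important to keep verbatim

def token_estimate(text: str) -> int:
    # Crude stand-in for a real tokenizer.
    return max(1, len(text) // 4)

def render_within_budget(
    pieces: list[PromptPiece],
    budget: int,
    compress: Callable[[str], str],
) -> str:
    """Compose pieces into one prompt. If the budget is blown, compress the
    lowest-priority pieces first; if it is still blown, raise so the caller
    can divide and conquer instead of losing context to truncation."""
    def total() -> int:
        return sum(token_estimate(p.text) for p in pieces)

    if total() > budget:
        for piece in sorted(pieces, key=lambda p: p.priority):
            piece.text = compress(piece.text)
            if total() <= budget:
                break
    if total() > budget:
        raise ValueError("Prompt over budget; split the task rather than truncate.")
    return "\n\n".join(p.text for p in sorted(pieces, key=lambda p: -p.priority))
```

The point is only that the failure mode is explicit: nothing gets dropped without the pipeline knowing about it.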
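On long-list extraction: one concrete version of the scratchpad idea is to carry a running registry of canonical entities, along with short notes on each merge decision, across chunks, so later calls can see the reasoning from earlier ones. Everything below is hypothetical: call_llm is a placeholder for whatever model client is in use, and the prompt is only schematic.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; assumed to return JSON text."""
    raise NotImplementedError

def extract_people(chunks: list[str]) -> list[dict]:
    """Chunked extraction with a running scratchpad of canonical entities.

    The scratchpad carries both the deduplicated list so far and one-line notes
    on merge decisions (e.g. why "the First Lady" was or wasn't merged with
    "Mrs. Obama"), so earlier reasoning isn't lost in later steps."""
    scratchpad: dict = {"people": [], "notes": []}
    for chunk in chunks:
        prompt = (
            "You are maintaining a deduplicated list of unique people.\n"
            f"Current list and merge notes:\n{json.dumps(scratchpad, indent=2)}\n\n"
            f"New text:\n{chunk}\n\n"
            "Return JSON with keys 'people' (the updated deduplicated list) and "
            "'notes' (one line per merge or non-merge decision, with reasoning)."
        )
        scratchpad = json.loads(call_llm(prompt))
    return scratchpad["people"]
```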
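On scraper infrastructure: the basic building block is just a headed browser routed through a residential proxy. A minimal Playwright sketch with a placeholder proxy endpoint and credentials; the hard parts (scheduling, rate limiting, CAPTCHA handling) live outside this snippet.

```python
from playwright.sync_api import sync_playwright

RESIDENTIAL_PROXY = {  # placeholder endpoint and credentials
    "server": "http://proxy.example.net:8000",
    "username": "user",
    "password": "pass",
}

with sync_playwright() as p:
    # Headed mode: some sites behave differently (or block outright) when headless.
    browser = p.chromium.launch(headless=False, proxy=RESIDENTIAL_PROXY)
    page = browser.new_page()
    page.goto("https://example.com")
    html = page.content()  # hand off to the extraction pipeline from here
    browser.close()
```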
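On schema design: one way to keep the time dimension in a roughly 3NF shape is to push it into the association table as a validity range, and to keep external identifiers (account number, meter number, and so on) in their own table so customers can query by whichever ID they hold. A minimal SQLAlchemy sketch; the table and column names are illustrative, not our production schema.

```python
from datetime import date
from sqlalchemy import Column, Date, ForeignKey, Integer, String, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Building(Base):
    __tablename__ = "building"
    id = Column(Integer, primary_key=True)

class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)

class BuildingCustomer(Base):
    """Time-bounded association: which customer a building belongs to, and when."""
    __tablename__ = "building_customer"
    id = Column(Integer, primary_key=True)
    building_id = Column(Integer, ForeignKey("building.id"), nullable=False)
    customer_id = Column(Integer, ForeignKey("customer.id"), nullable=False)
    valid_from = Column(Date, nullable=False)
    valid_to = Column(Date)  # NULL means the association is still current

class CustomerIdentifier(Base):
    """External IDs (account number, meter number, ...) that map to a customer."""
    __tablename__ = "customer_identifier"
    id = Column(Integer, primary_key=True)
    customer_id = Column(Integer, ForeignKey("customer.id"), nullable=False)
    id_type = Column(String, nullable=False)  # e.g. "account_number", "meter_number"
    value = Column(String, nullable=False)

def customer_for(session: Session, building_id: int, as_of: date) -> Customer | None:
    """Which customer was a building associated with on a given date?"""
    stmt = (
        select(Customer)
        .join(BuildingCustomer, BuildingCustomer.customer_id == Customer.id)
        .where(
            BuildingCustomer.building_id == building_id,
            BuildingCustomer.valid_from <= as_of,
            (BuildingCustomer.valid_to.is_(None)) | (BuildingCustomer.valid_to >= as_of),
        )
    )
    return session.execute(stmt).scalars().first()
```

Answering "customer 123 in 2024 vs. customer 456 in 2025" then becomes an as_of query rather than a schema change.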
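On testing infrastructure: the tag-and-sample proposal above can be prototyped with nothing more than a mapping from test documents to the prompts and functions they exercise, plus a bounded, deterministic sampler per PR. A toy sketch; the document and symbol names are made up.

```python
import random

# Hypothetical registry: which prompts/functions each test document exercises.
DOC_TAGS: dict[str, set[str]] = {
    "acme_utility_bill.pdf": {"extract_account_ids", "tariff_prompt_v3"},
    "gridco_invoice.pdf": {"tariff_prompt_v3", "bill_total_prompt"},
    "meter_photo_042.jpg": {"extract_meter_reading"},
}

def affected_documents(changed_symbols: set[str]) -> list[str]:
    """Documents whose pipeline output could change when these symbols change."""
    return sorted(doc for doc, tags in DOC_TAGS.items() if tags & changed_symbols)

def sample_for_pr(changed_symbols: set[str], budget: int = 50, seed: int = 0) -> list[str]:
    """Bounded, deterministic sample of affected documents to re-run on a PR."""
    docs = affected_documents(changed_symbols)
    if len(docs) <= budget:
        return docs
    return random.Random(seed).sample(docs, budget)

# e.g. sample_for_pr({"tariff_prompt_v3"}) -> only the documents that use that prompt
```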
If you have interesting ideas for these problems, or think you'd have a blast working on them, you should
consider joining our team full-time. We're a talent-dense group of researchers, engineers, and hackers
based in SF. If you'd like to chat, please reach out to me at allen [at] nectarclimate [dot] com.
Thanks to Cathy Cai, Antonio Frigo, and Ethan Yeh for reading initial versions of the website.