Never write your own date parsing library

210 points by ulrischa 19 hours ago | 255 comments

Comments

quelsolaar 16 hours ago

Whenever I see "never implement your own...", I know I want to implement it myself. People say that about hard things, and I only want to do hard things. Nobody wants people who can do easy things; people want people who can do hard things. The only way to learn how to do hard things is to do hard things, so do the hardest things.

So go ahead: write your own date library, your own Unicode font rendering, compiler, OS, game engine or whatever else people tell you never to do because it's hard.

davidw 18 hours ago

It's like that joke someone posted on Twitter: "I was in favor of space exploration until I realized what it would mean for date/time libraries"

jillesvangurp 5 hours ago

IMHO, ISO 8601 as a standard is way too broad, unspecific and messy. Telling somebody that they need to parse an ISO 8601 date time is not enough information to do the job. Which variant is it? Does it include the time part? IMHO allowing the full range of ISO 8601 dates and times in a data format is usually a mistake. You want to be more specific.

There's a need for a standard that locks down the commonly used variants of it and gets rid of all the ambiguity.

For me, timestamps following the pattern 'YYYY-MM-DDThh:mm:ss.xxxxxZ' are all that I use and all that my APIs will accept or produce. It's nice that other legacy systems are available that don't normalize their timestamps to UTC for whatever reason, that consider seconds and fractions of a second optional, etc. But for unambiguous timestamps, all I want is this. It's fairly easy to write a parser for it; a simple regular expression will do the job. Of course, add unit tests. Technically the Z is redundant information if we can all agree to normalize to UTC, which IMHO we should. In the same way, the T and the separators are redundant too, but they are nice for human readability.
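A minimal Python sketch of that strict profile, assuming the fractional part is required and has 1 to 6 digits (the comment shows five), and that the formatter always emits six digits (microseconds). The names `parse_strict_utc` and `format_strict_utc` are hypothetical. The regular expression only checks the shape; passing the fields to `datetime` is what rejects impossible values such as February 30 or hour 25.

```python
import re
from datetime import datetime, timezone

# Hypothetical strict profile: YYYY-MM-DDThh:mm:ss.fffffZ, UTC only.
# [0-9] instead of \d, because \d also matches non-ASCII digits in Python.
STRICT_UTC = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"T([0-9]{2}):([0-9]{2}):([0-9]{2})\.([0-9]{1,6})Z"
)


def parse_strict_utc(text: str) -> datetime:
    """Parse the strict profile; raise ValueError for anything else."""
    m = STRICT_UTC.fullmatch(text)
    if m is None:
        raise ValueError(f"not in the strict UTC profile: {text!r}")
    year, month, day, hour, minute, second = (int(g) for g in m.groups()[:6])
    # Right-pad the fraction to microseconds: ".022" becomes 22000 µs.
    micros = int(m.group(7).ljust(6, "0"))
    # datetime() raises ValueError for impossible dates and times.
    return datetime(year, month, day, hour, minute, second, micros,
                    tzinfo=timezone.utc)


def format_strict_utc(dt: datetime) -> str:
    """Normalize an aware datetime to UTC and emit the strict profile."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: attach a timezone before formatting")
    utc = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return utc.isoformat(timespec="microseconds") + "Z"
```

Round-tripping "2025-07-25T18:47:26.022Z" through both functions yields "2025-07-25T18:47:26.022000Z": the instant is preserved and only the width of the fraction is normalized. Refusing naive datetimes in the formatter is a deliberate choice, since a timestamp without a known offset cannot be normalized to UTC.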

You can use datetime libraries that are widely available to localize timestamps as needed in whatever way is required locally. But timestamps should get stored and transmitted in a normalized and 100% unambiguous way.

It's only when you get into supporting all the pointless and highly ambiguous but valid variants of ISO 8601 that parsing becomes a problem. There's actually no such thing as a parser that can parse all valid variants with no knowledge of which variant is being used. There are lots of libraries with complex APIs that support some or all of the major and minor variants of course. But not with just one function called parse().

I think the main challenge with ISO 8601 is that it never called out this variant as a separate thing that you should be using. This really should be its own standard. And not using that would be a mistake. ISO 8601 is what happens when you do design by committee.

zelphirkalt 48 minutes ago

The table of listed date formats doesn't look too difficult to implement. A quick look at the RFC tells me that it even specifies a grammar, though a very incomplete one. It would be prudent to specify a complete grammar in the RFC, of course; then it would be even simpler to take that grammar and translate it for whatever library one uses to describe parsing grammars. I really hope all these libraries didn't do silly things with regexes...

QuadmasterXLII 19 hours ago

I ran into date heck recently in a medical setting, storing birthdates. Eventually I settled on the idea that a birthdate isn’t a physical time; it’s just a string. We can force the user to enter it in the format 02/18/1993, leading zeroes and all, and operations on it other than string equality are invalid. We’ll see if this survives contact with the enemy, but it’s already going better than storing and reasoning about it as a point or interval in time and having people’s birthdays change when they move timezones.
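One way to encode the "a birthdate is just a string" idea in Python, as a sketch: a frozen dataclass that validates the MM/DD/YYYY shape once and then offers only equality and hashing. The class name `BirthDate` and the exact pattern are assumptions. The pattern rejects "2/18/1993" but, in keeping with the string-only approach, still accepts a non-calendar value such as "02/31/1993"; a calendar check could be added if that matters.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: MM/DD/YYYY with leading zeroes, ASCII digits only.
_BIRTHDATE = re.compile(r"(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/[0-9]{4}")


@dataclass(frozen=True)
class BirthDate:
    """A birthdate as a validated string, not a point in time.

    frozen=True gives __eq__ and __hash__; order=False (the default)
    means <, > etc. raise TypeError, so no date arithmetic sneaks in.
    """
    text: str

    def __post_init__(self) -> None:
        if _BIRTHDATE.fullmatch(self.text) is None:
            raise ValueError(f"expected MM/DD/YYYY with leading zeroes: {self.text!r}")
```

Because the value is never converted to a timestamp, there is no timezone for it to shift in.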

takinola 17 hours ago

No other programming concept has caused me more grief than dealing with time and timezones. It starts to get really mind-bendingly complex once you start thinking about it deeply, and that is even before you encounter the quirks (some places have timezone changes that depend not only on the time of year but also on the actual year). Lesson learned: choose a library (moment is great) and never think about time again.

FigurativeVoid 18 hours ago

I used to work at a company that stored all dates as ints in a YYYYMMDD format. When I asked why, I was told it was so we could subtract 2 dates to get the difference.

I asked them why they couldn’t use DATEDIFF, since this was in a SQL database.

They said they hadn’t heard of it and that it must be new.
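The arithmetic behind that anecdote, sketched in Python: subtracting two YYYYMMDD integers only yields a day count when both dates fall in the same month, while converting to real dates first gives the calendar difference. The helper name `yyyymmdd_to_date` is hypothetical. (Sorting such integers does preserve chronological order; subtraction is where the format breaks.)

```python
from datetime import date


def yyyymmdd_to_date(n: int) -> date:
    """Decode an integer such as 20240301 into a calendar date."""
    return date(n // 10000, (n // 100) % 100, n % 100)


start, end = 20240228, 20240301                # Feb 28 and Mar 1, 2024
naive_gap = end - start                          # plain integer subtraction
real_gap = (yyyymmdd_to_date(end) - yyyymmdd_to_date(start)).days
print(naive_gap, real_gap)                       # 73 2  (2024 has a Feb 29)
```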

neilv 16 hours ago

This article doesn't get into some of the special fun of ISO 8601, including relative values, non-Gregorian values, durations...

Some of the things in the standard are surprising, as if they were special requests. At the time, I commented something like: "Somewhere in the French countryside, there is a person who runs an old family vineyard that is still stamping their barrels with the timepoint information [...]. And that person's lover was on the ISO 8601 committee."

(I once wrote a time library in Scheme that supported everything in ISO 8601. It did parsing, representation, printing, calendar conversion, and arithmetic, including arithmetic for mixed precision and for relative values. It was an exercise in really solving the problem the first time, for a core library, rather than cascading kludges and API breakage later. I don't recall offhand whether I tried to implement arithmetic between different calendar systems without converting them to the same system.)

bob1029 18 hours ago

I like to use the Japanese calendar as an example to scare the juniors away from DIY parsing:

https://learn.microsoft.com/en-us/dotnet/api/system.globaliz...

https://learn.microsoft.com/en-us/windows/apps/design/global...

fsckboy 7 hours ago

>Consider "200". Is this the year 200? Is this the 200th day of the current year? Surprise, in ISO 8601 it’s neither — it’s a decade, spanning from the year 2000 to the year 2010. And "20" is the century from the year 2000 to the year 2100.

there is so much wrong with this paragraph; it's a nest of people who shouldn't work on date parsing. there is no way 200 is any kind of date, but if you're going to insist it is, 2000 to 2010 is 11 years, unless "to" means "up to but not including", in which case it should say 2001 to 2011 if you want to refer to the 200th decade, since decade 1 was 1AD through 10AD...

there is no saving this post
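Much of the disagreement above goes away if the range is written as half-open, [start, end): the decade "200" is then [2000, 2010), i.e. the ten years 2000 through 2009. A small Python sketch, assuming the article's reading that three digits denote a decade and two digits a century, each starting at a year ending in 0 or 00 (the commenter's 1-based decade counting would be a different convention). The function name is hypothetical.

```python
def reduced_year_range(text: str) -> tuple[int, int]:
    """Map a reduced-precision year to a half-open [start, end) year range.

    Assumed reading: 4 digits = one year, 3 digits = a decade,
    2 digits = a century, each starting at a year ending in 0 / 00.
    """
    if not (text.isascii() and text.isdigit() and 2 <= len(text) <= 4):
        raise ValueError(f"unsupported reduced-precision year: {text!r}")
    span = 10 ** (4 - len(text))      # 1, 10 or 100 years
    start = int(text) * span
    return start, start + span        # end is exclusive


print(reduced_year_range("200"))  # (2000, 2010)
print(reduced_year_range("20"))   # (2000, 2100)
```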

Animats 6 hours ago

I requested an ISO 8601 date parser in the Python "datetime" library in 2012.[1] "datetime" could format into ISO 8601, but not parse strings. There were five ISO 8601 parsers available, all bad. After six years of bikeshedding, it was fixed in 2018.

That's what it took to not write my own date parsing library.

[1] https://github.com/python/cpython/issues/60077
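For reference, the standard-library parser that shipped in 2018 is `datetime.fromisoformat()`, added in Python 3.7. Before Python 3.11 it only accepted the formats that `datetime.isoformat()` emits (so, for example, no trailing "Z"); 3.11 widened it to most ISO 8601 formats. A short usage sketch that stays within the 3.7-compatible subset:

```python
from datetime import datetime, timezone

# A "+02:00" offset is within the subset accepted since Python 3.7.
stamp = datetime.fromisoformat("2018-06-27T12:30:00+02:00")
print(stamp.utcoffset())               # 2:00:00
print(stamp.astimezone(timezone.utc))  # 2018-06-27 10:30:00+00:00
```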

stevage 1 hour ago

They didn't say whether it would have been feasible to just extract that one function from the library - sort of manually tree-shaking, if you will.

jimmaswell 18 hours ago

moment is far smaller if you include it without locales you don't need.

I don't care how much they talk themselves down on their homepage, begging me to choose a different library - I like it and I'll continue using it.

> We now generally consider Moment to be a legacy project in maintenance mode. It is not dead, but it is indeed done.

> We will not be adding new features or capabilities.

> We will not be changing Moment's API to be immutable.

> We will not be addressing tree shaking or bundle size issues.

> We will not be making any major changes (no version 3).

> We may choose to not fix bugs or behavioral quirks, especially if they are long-standing known issues.

I consider this a strength, not a weakness. I love a library that's "done" so I can just learn it once and not deal with frivolous breaking changes later. Extra bonus that they plan to continue making appropriate maintenance:

> We will address critical security concerns as they arise.

> We will release data updates for Moment-Timezone following IANA time zone database releases.

xaer 6 hours ago

I wrote the ethlo ITU library because I was annoyed by the poor performance and the amount of ceremony needed to parse and format standard RFC 3339 timestamps in Java. It is somewhat more extensive now, and is used in other libraries. Ask me anything!

danneezhao 8 hours ago

Great writeup! Your journey perfectly captures the universal developer dilemma: "Never roll your own X... until you absolutely must."

The bundle size reductions are impressive (230kB client-side savings!), and your RFC 9557 alignment is a smart forward-looking move. Two questions:

Edge cases: How does your parser handle leap seconds or pre-1582 Julian dates (e.g., astronomical data)?

Temporal readiness: Will @11ty/parse-date-strings become a temporary polyfill until the Temporal API stabilizes, or is it a long-term solution?

Minor observation: Your comparison table shows Luxon supports YYYY-MM-DD HH (space separator) while RFC 9557 doesn’t; this might break existing Eleventy setups using space-delimited dates. Maybe worth an explicit migration note?

Regardless, fantastic work balancing pragmatism and standards. The web needs more focused libraries like this!

ashoeafoot 4 hours ago

Now I want to make a date format, combined with other data, that is the ultimate challenge for date parsing.

Introducing: the ip:port,date/time,longlat string. Oh yes, its format also depends on the language you encode it in and on which parts you leave out to be defaulted. ".:," is now a valid locationdateip.

ozgrakkurt 7 hours ago

Maybe the title should be “it is difficult to write a date parsing library”

“Never write your own x” kind of titles come off as arrogant and demotivating.

Maybe some other person will write an excellent date parsing library that will be better than the current ones? Maybe they think it is worth spending some time on it?

These kinds of hard things tend to have libraries that are extremely bloated because everyone uses one library, and that one library has to work for everyone’s use case.

You can see this in the post too: not everyone needs to be able to parse every single date format.

senfiaj 18 hours ago

In UIs, prefer date/time pickers over raw text inputs; pickers give you the date/time in a standard ISO format such as "2025-07-25" or "2025-07-25T18:47:26.022Z". Prefer ISO formats wherever possible.

userbinator 10 hours ago

You're right, you should never need a library, just a sscanf("%04d-%02d-%02d...") ;-)

IMHO, needing to handle multiple, possibly obscure, date formats simultaneously is almost never a problem in practice.

jedberg 18 hours ago

Things you should never do:

Make your own load balancer software

Make firewall software

Make a date parsing library

Attempt to verify an email with a regular expression.

the__alchemist 18 hours ago

Good general rule of thumb, but desperate scenarios call for desperate measures. I would never do this in Python or Rust, for example, but it's necessary in JavaScript: `Date` and `Moment` are so full of traps that the ends justify the means, especially if you need a `Date` or `Time` type.

xingwu 8 hours ago

Thank you for sharing.

I like such subtle branding. I will try 11ty when I need a static site generator.

All engineers please follow this example when you want to promote your product, even when you don't want to promote your product.

endoblast 15 hours ago

I'm not even a programmer, but I can tell that dates are ambiguous a lot of the time.

e.g. dd/mm/yyyy (British) and mm/dd/yyyy (USA) can be confused for the first twelve days of every month.

So, given the high volume of international communication, I think we should hand-write months in full, or at least as the first three letters (Jan, Feb, Mar, ..., Dec)
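The day/month overlap is easy to check mechanically. A Python sketch (the helper name `slash_date_readings` is hypothetical) that returns every valid reading of an "AA/BB/YYYY" string, trying day-first and month-first order; more than one result means the string is ambiguous on its own.

```python
from datetime import date


def slash_date_readings(text: str) -> set[date]:
    """Every valid reading of 'AA/BB/YYYY' as day-first or month-first."""
    a, b, year = (int(part) for part in text.split("/"))
    readings = set()
    for day, month in ((a, b), (b, a)):   # British order, then US order
        try:
            readings.add(date(year, month, day))
        except ValueError:                # e.g. there is no month 13
            pass
    return readings


print(len(slash_date_readings("04/07/2025")))  # 2: 4 July or 7 April
print(len(slash_date_readings("13/07/2025")))  # 1: only 13 July works
```

Because the result is a set, a date such as "05/05/2025", which reads the same either way, correctly counts as unambiguous.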

We should also abandon three-letter acronyms (but that's another story).

spankalee 12 hours ago

As for this aside:

> As an aside, this search has made me tempted to ask: do we need to keep Dual publishing packages? I prefer ESM over CJS but maybe just pick one?

Pick ESM. CJS doesn't work in browsers without being transformed to something else. Node can require(esm) now, so it's time to ditch CJS completely.

macintux 14 hours ago

I wrote one in Erlang years ago for Riak’s time series implementation. I don’t remember all of the motivations, but most of all I wanted the ability to encode incomplete date/time objects.

https://github.com/macintux/jam

I’d like to get back to it. If nothing else, I dearly miss using Erlang.

cultureulterior 4 hours ago

Pity they're dropping week numbers; they're used so much in Europe.
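For anyone who still needs week numbers, Python's standard library handles ISO 8601 week dates directly. A short sketch: weeks start on Monday and week 1 is the week containing the year's first Thursday, so some late-December days belong to week 1 of the following year.

```python
from datetime import date

for d in (date(2024, 12, 30), date(2025, 12, 29)):
    iso_year, week, weekday = d.isocalendar()
    print(d, "->", f"{iso_year}-W{week:02d}-{weekday}")
# 2024-12-30 -> 2025-W01-1
# 2025-12-29 -> 2026-W01-1

# Inverse mapping, available since Python 3.8.
print(date.fromisocalendar(2026, 1, 1))  # 2025-12-29
```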

x187463 18 hours ago

Relevant Computerphile: https://www.youtube.com/watch?v=-5wpm-gesOY

champtar 14 hours ago

I once had to maintain a CalDAV server that had been developed in-house. Computing "free/busy" with recurring events, exceptions, a timezone different from the organizer's, plus some DST, is a source of bugs that keeps on giving.

wood_spirit 18 hours ago

Yeah don’t do it!

But subtle plug of something I made long ago for when you find your data pipelines are running hot parsing timestamp strings etc: https://github.com/williame/TimeMillis

I’m still pumped by the performance of the thing! :)

deepsun 12 hours ago

Nothing came close in quality to Joda-Time from Java (later adopted with small fixes as the built-in "java.time"). Why not use the js-joda port?

thangalin 18 hours ago

On a slightly related note, here's an algorithm for parsing time from natural inputs into a normalized time:

https://stackoverflow.com/a/49185071/59087

bryanrasmussen 5 hours ago

so, anyway: never do this, which I did, and I have great technical reasons why it had to be done for my language and needs.

atoav 2 hours ago

Well, I did that, and it has worked flawlessly for a decade now. The thing is just that I know and control the context from which the dates are being parsed. If you're now thinking, "Yeah, OK, if you're the one who sends the data being parsed, it might be okay," then the claim "never do X" is proven wrong: there are specific situations where doing X is not only okay but might be the sensible option.

Which is why you should never use the word "never" unless you're really sure you can't come up with a situation that is an exception.

CurtHagenlocher 18 hours ago

In 2009 I made a note that Excel's main date parsing function was over 1000 lines of code -- not including helpers.

fitsumbelay 18 hours ago

before I even read the post lemme just say "too late, friend. faaaaar too late ..."

Hizonner 14 hours ago

Never mess with cryptography, times, Unicode, or floating point.

bdhcuidbebe 5 hours ago

Never say never

fHr 14 hours ago

Fucking daylight saving time: I had to fix its handling a few weeks ago, at the change back on the last Sunday of October, when the same hour occurs twice.

aussieguy1234 6 hours ago

Almost never...

There may be some obscure cases.

For example, let's say you are writing very performance-sensitive code where nanoseconds count, and all of the date parsing libraries available for your language are too slow for your requirements. Then you might roll your own lighter-weight, faster one.
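A sketch of what such a lighter-weight path can look like, in Python for illustration: for one fixed layout (assumed here to be "YYYY-MM-DDThh:mm:ss.sssZ"), slice the fields by position and compute Unix milliseconds arithmetically, with no intermediate datetime objects. The day count uses Howard Hinnant's published `days_from_civil` algorithm for the proleptic Gregorian calendar. The function names are hypothetical, and any actual speedup would have to be measured in the target language.

```python
def days_from_civil(y: int, m: int, d: int) -> int:
    """Days since 1970-01-01 for a proleptic Gregorian date (Hinnant's algorithm)."""
    y -= 1 if m <= 2 else 0               # count Jan/Feb as the end of the prior year
    era = y // 400                        # floor division also handles negative years
    yoe = y - era * 400                   # year of era, 0..399
    doy = (153 * (m - 3 if m > 2 else m + 9) + 2) // 5 + d - 1  # day of year, 0..365
    doe = yoe * 365 + yoe // 4 - yoe // 100 + doy               # day of era
    return era * 146097 + doe - 719468   # 719468 = days from 0000-03-01 to 1970-01-01


def epoch_millis(ts: str) -> int:
    """Unix milliseconds for the fixed layout 'YYYY-MM-DDThh:mm:ss.sssZ'.

    Only the layout is checked; field ranges are trusted. That trust is
    what makes a hand-rolled fast path cheap, and also what makes it risky.
    """
    if (len(ts) != 24 or ts[4] != "-" or ts[10] != "T"
            or ts[19] != "." or ts[23] != "Z"):
        raise ValueError(f"unexpected layout: {ts!r}")
    days = days_from_civil(int(ts[0:4]), int(ts[5:7]), int(ts[8:10]))
    secs = int(ts[11:13]) * 3600 + int(ts[14:16]) * 60 + int(ts[17:19])
    return (days * 86400 + secs) * 1000 + int(ts[20:23])
```

For example, epoch_millis("2000-01-01T00:00:00.000Z") gives 946684800000. The trade-off is the one the comment describes: speed is bought by supporting exactly one format and trusting the producer.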

ayaros 12 hours ago

I will now post the relevant Tom Scott video: https://www.youtube.com/watch?v=-5wpm-gesOY

I tried it. I will never try again unless I take, like, six months to plan out how the system will work before I even write a single line of code.

vivzkestrel 8 hours ago

what is your issue with dayjs?

micromacrofoot 18 hours ago

Multiple times in my career I've had a good laugh when a non-technical manager says something along the lines of "it's just the date, how hard can it be?"

deadbabe 12 hours ago

Of course I won’t write my own, AI will just write it for me.

danesparza 19 hours ago

I mean. Yes. Don't write your own date parsing library. Unless you want to go nuts.

https://gist.github.com/timvisee/fcda9bbdff88d45cc9061606b4b...

aaroninsf 16 hours ago

Funny: I wrote my own, so I had to click, but mine was for a very different use case: converting extremely varied date strings into date ranges, where a significant share of cases are human-entered, human-readable date and date-range specifiers, as used in periodicals and other material dating back a century or two.

That is, I had to correctly interpret not just ISO dates but also ambiguous dates and date ranges as accepted in (library catalog) MARC records, which allow uncertain dates such as "[19--]" and "19??", and natural-language descriptors such as "Winter/Spring 1917", "Third Quarter '43" and "Easter 2001", in as many languages as possible for the corpus being digitized.

The output was a date range, where the precision was converted into the range. I'd like to someday formalize the distinction between ambiguity and precision, but that's a someday.
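The idea of converting precision into a range can be sketched for the MARC-style uncertain years mentioned above. This Python sketch rests on loudly stated assumptions, not on the commenter's code: each '?', '-' or 'u' stands for one unknown digit, optional surrounding square brackets are ignored, and the result is an inclusive range of dates. The function name and rules are hypothetical.

```python
import re
from datetime import date

# Hypothetical rules: each '?', '-' or 'u' is one unknown digit
# ("19??", "[19--]", "19uu"); optional surrounding brackets are dropped.
_UNCERTAIN_YEAR = re.compile(r"\[?([0-9][0-9?u-]{3})\]?")
_UNKNOWN_DIGIT = re.compile(r"[?u-]")


def uncertain_year_range(text: str) -> tuple[date, date]:
    """Widen a 4-character uncertain year into an inclusive date range."""
    m = _UNCERTAIN_YEAR.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"not an uncertain year: {text!r}")
    digits = m.group(1)
    first_year = int(_UNKNOWN_DIGIT.sub("0", digits))  # earliest possibility
    last_year = int(_UNKNOWN_DIGIT.sub("9", digits))   # latest possibility
    return date(first_year, 1, 1), date(last_year, 12, 31)


print(uncertain_year_range("[19--]"))
# (datetime.date(1900, 1, 1), datetime.date(1999, 12, 31))
```

A fully known year such as "1917" simply collapses to that single year, which is the precision-as-range-width behaviour described in the comment.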

When the schema is totally uncontrolled, many cases are ambiguous without other context (e.g. XX-ZZ-YYYY could be month-day-year or day-month-year for a large overlap); some require fun heuristics (looking up Easter in a given year... but did they mean Orthodox or...), arbitrary standards (when do seasons start? what if a publication is from the southern hemisphere?), and policies for illegal dates (February 29 in non-leap years being a surprisingly common value)...
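For the Easter heuristic, Western (Gregorian) Easter Sunday can be computed with the anonymous Gregorian computus, sketched here in Python. It deliberately does not answer the Orthodox question; the third-party python-dateutil package's `dateutil.easter.easter(year, method)` offers Julian and Orthodox variants if those are needed.

```python
from datetime import date


def western_easter(year: int) -> date:
    """Western (Gregorian) Easter Sunday, via the anonymous Gregorian computus."""
    a = year % 19                        # year's place in the 19-year lunar cycle
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30   # lunar term
    i, k = divmod(c, 4)
    l_ = (32 + 2 * e + 2 * i - h - k) % 7  # weekday term, steers the result to a Sunday
    m = (a + 11 * h + 22 * l_) // 451    # correction for rare edge cases
    month, day = divmod(h + l_ - 7 * m + 114, 31)
    return date(year, month, day + 1)


print(western_easter(2001))  # 2001-04-15
```

Resolving "Easter 2001" to a single day this way still leaves the interpretive question of which tradition the source meant.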

In a dull moment I should clean up the project (in Python) and package it for general use...