Microsoft open-sources "the earliest DOS source code discovered to date" (arstechnica.com)

382 points by DamnInteresting 18 hours ago

jmward01 16 hours ago

It is rare that I say this but, thanks MS! Arguably just as important, if not more so, is the BASIC that they wrote. That was what they actually wanted to do. DOS just got them the contract with IBM. For decades MS was really a developer tools company with a side biz of writing operating systems and other misc software. They open sourced that BASIC code too [1].

[1] https://opensource.microsoft.com/blog/2025/09/03/microsoft-o...

ramon156 11 hours ago

I don't think I've ever seen a commit that says "49 years ago". Damn.

RobotToaster 2 hours ago

Not quite as old, but brl-cad is still in active development and has commits from 1983. https://github.com/BRL-CAD/brlcad/graphs/contributors?all=1

steve1977 11 hours ago

I remember when I realized I had been using Microsoft all along through my Commodore 64.

vee-kay 11 hours ago

What's interesting is that Microsoft BASIC was derived from BASIC-PLUS, which itself was derived from Dartmouth BASIC (which evolved into a structured programming language called SBASIC, or Structured BASIC). But the popularity of Microsoft BASIC actually halted the standardisation of SBASIC as an ANSI standard.

https://en.wikipedia.org/wiki/Microsoft_BASIC

The Altair BASIC interpreter was developed by Microsoft founders Paul Allen and Bill Gates using a self-written Intel 8080 emulator running on a PDP-10 minicomputer.[1] The MS dialect is patterned on Digital Equipment Corporation's BASIC-PLUS on the PDP-10, which Gates had used in high school.

https://en.wikipedia.org/wiki/Dartmouth_BASIC

Dartmouth BASIC is the original version of the BASIC programming language. It was designed by two professors at Dartmouth College, John G. Kemeny and Thomas E. Kurtz. With the underlying Dartmouth Time-Sharing System (DTSS), it offered an interactive programming environment to all undergraduates as well as the larger university community.

Dartmouth also introduced a dramatically updated version known as Structured BASIC (or SBASIC) in 1975, which added various structured programming concepts. SBASIC formed the basis of the American National Standards Institute (ANSI) "Standard BASIC" efforts in the early 1980s.

In contrast to the Dartmouth compilers, most other BASICs were written as interpreters. This decision allowed them to run in the limited main memory of early microcomputers. Microsoft's Altair BASIC is one example: it was designed to run in only 4 KB of memory (interestingly, it was delivered on paper tape).

Kemeny became involved in an effort to produce an ANSI standard BASIC in an attempt to bring together the many small variations of the language that had developed through the late 1960s and early 1970s. This effort initially focused on a system known as Minimal BASIC that was similar to earliest versions of Dartmouth BASIC, while later work was aimed at a Full BASIC that was essentially SBASIC with various extensions.

But by the late 1980s, tens of millions of home computers were running some variant of the MS BASIC interpreter. It had become the de facto standard for BASIC, which eventually led to the abandonment of the ANSI SBASIC efforts.

Kemeny and Kurtz, however, decided to continue their efforts to introduce the concepts from SBASIC and the ANSI Standard BASIC efforts. This became True BASIC.

https://en.wikipedia.org/wiki/True_BASIC

There are versions of the True BASIC compiler for MS-DOS, Microsoft Windows, and Classic Mac OS. At one time, versions for TRS-80 Color Computer, Amiga and Atari ST computers were offered, as well as a UNIX command-line compiler.

After several years of inactivity, as of February 2026, the TrueBASIC website is officially closed.

dboreham 2 hours ago

Nit: the PDP-10 is generally considered a mainframe, not a minicomputer.

BobbyTables2 4 hours ago

Ah, the good ol’ “Embrace, Extend, … Extinguish”

nananana9 9 hours ago

I cannot describe to you how jealous I am of the fact that back then writing a few thousand lines of assembly was what it took to launch a successful software company.

curiousObject 8 hours ago

>writing a few thousand lines of assembly was what it took to launch a successful software company.

Yes, but that assembly was not DOS, and it wasn’t easy.

Microsoft purchased the DOS code, they didn’t write it. Of course, they did develop and modify DOS. But that was a clever (and lucky) business deal, not a technological accomplishment.

The real beginning of Microsoft was earlier, with Allen, Gates and Davidoff writing the Altair BASIC interpreter. That was a serious achievement.

They had never seen the computer they were writing that assembly code for. They did not even own any computers. It took them 8 weeks on a university computer they were not supposed to be using for that.

“Altair agreed to meet them to possibly buy a BASIC interpreter… Gates and Allen had neither a BASIC interpreter nor even an Altair system on which to develop and test one. However, Allen had written an Intel 8008 emulator that ran on a PDP-10 time-sharing computer. Allen adapted this emulator based on the Altair programmer guide, and they developed and tested the interpreter on Harvard's PDP-10.

The finished interpreter, including its own I/O system and line editor, fit in only four kilobytes of memory, leaving plenty of room for the interpreted program. In preparation for the demo, they stored the finished interpreter on a punched tape that the Altair could read, and Paul Allen flew to Albuquerque to meet with Altair…

While on final approach into the Albuquerque airport, Allen realized that they had forgotten to write a bootloader to read the tape into memory. Writing in 8080 machine language, Allen finished the program before the plane landed. Only when they loaded the program onto an Altair and saw a prompt asking for the system's memory size did Gates and Allen know that their interpreter worked on the Altair hardware.”

https://en.wikipedia.org/wiki/Altair_BASIC

BobbyTables2 4 hours ago

Imagine if the University had sued for their share of the IP that was created using their resources…

It’s funny because I thought Jobs/Wozniak got their initial funding from selling phreaking boxes. And more recently, Anthropic engaged in criminal copyright violations with only a slap on the wrist.

Feels like a common theme of every “great” company having its origins from a “boost” resulting from criminal activity. (After all, that’s where the money is!)

Just imagine the criminal penalties possible for pirating and selling one copy of a movie or making one long distance phone call with phreaking.

yokoprime 9 hours ago

To be fair, I think you needed a cutthroat businessman leading the company. Which I guess is more or less the same today.

themafia 36 minutes ago

> a cutthroat businessman leading the company

I'm sure his family connections aided him significantly.

justsomehnguy 6 hours ago

This too, but to its employees early MS was closer to a hipster SV vibe, coding in a coffee shop a decade ago.

greenbit 9 hours ago

And for such simple processors and systems no less! No descriptor tables to deal with, no memory management to configure. These days it takes a little processor inside the main processor, just to get things started. Those were golden times.

embedding-shape 9 hours ago

Replace Assembly with TypeScript/Rust/Go/whatever and as long as the idea is good and useful, same thing applies today.

risyachka 8 hours ago

Except the competition was essentially nonexistent and no one would copy your product with an LLM in a day.

avadodin 9 hours ago

More than a few people would rather die in poverty than put in the effort today even if you offered to time-machine them back with their finished product.

gnabgib 18 hours ago

Discussion, on the source, at the time (79 points, 24 days ago, 19 comments) https://news.ycombinator.com/item?id=47957494

Or on the GitHub clone (162 points, 15 comments) https://news.ycombinator.com/item?id=47946813

locusofself 16 hours ago

wow, they had to OCR it back in from paper printouts

> This source code is old enough that it hadn’t been stored digitally. “A dedicated team of historians and preservationists led by Yufeng Gao and Rich Cini,” calling itself the “DOS Disassembly Group,” painstakingly transcribed and scanned in code from paper printouts provided by Paterson. This process was made even more difficult because modern OCR software struggled with the quality of the decades-old printout.

FarmerPotato 16 hours ago

I'd like to hear more about what works in OCR of dot-matrix fonts.

I've been able to OCR letter-quality printer output to 97% (mostly Os and Xs problems).

But it seems that machine-learning text-recognition is also now biased to reject computer code because it doesn't look like human language.

ndiddy 5 hours ago

There's a writeup here from one of the people on the team about the work it took to go from the listings to source code. http://cini.classiccmp.org/recoveryblog.htm

> With less-than-satisfactory OCR output, I resorted to a process I used many years ago when converting scans made of old Commodore ROM dumps printed on a Commodore 1515 dot-matrix printer. The process relies on the ASCII OCR output having the same repetitive errors. "B" and "8", "S" and "5" are good examples, as are "l" and "1", and "O" and "0". There are many other similar single-character errors and, when working with x86 code, there are similar errors with instructions like "MOV". This process naturally works better if the output file is monolithic rather than single-page OCR conversions because you can do substitutions across the entire converted printout and not 75 separate files.

> The next formatting hassle was the spacing. This required repetitive substitutions of a descending number of spaces to tabs (i.e., replace 8 spaces with a tab, 7, 6, etc.). Then if you want to return it to fixed spaces (which is likely how the original printer printed it -- spaces and not vertical tabs), you can. For pure re-creation work, spaces produce absolute column formatting while tabs can move around depending on the program displaying the file.

> Once you run through the 15 or so common global substitutions and tab conversion, it's a lot easier to work with the file to fix formatting and perform other cleanup. This is then followed by a line-by-line comparison against the original printouts. Overall I'd say the conversion output quality with this method is very good.
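
For anyone curious, here is a rough sketch of the two passes described in the quote: whole-file substitutions for recurring misreads, then replacing runs of spaces with tabs, longest run first. This is not the team's actual tooling. The table `MNEMONIC_FIXES` and the function names `fix_mnemonics` and `spaces_to_tabs` are made up for illustration, and a real table would come from inspecting the actual OCR output.

```python
import re

# Hypothetical table of recurring OCR misreads in x86 mnemonics
# (assumption: not taken from the writeup).
MNEMONIC_FIXES = {"M0V": "MOV", "PU5H": "PUSH", "P0P": "POP"}


def fix_mnemonics(text: str) -> str:
    """Apply each whole-word substitution across the entire listing."""
    for wrong, right in MNEMONIC_FIXES.items():
        # \b keeps the match from firing inside a longer token.
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text


def spaces_to_tabs(text: str, longest: int = 8, shortest: int = 2) -> str:
    """Replace runs of spaces with a tab, trying the longest run first."""
    for n in range(longest, shortest - 1, -1):
        text = text.replace(" " * n, "\t")
    return text


listing = "M0V     AX,BX\nPU5H    AX"
cleaned = spaces_to_tabs(fix_mnemonics(listing))
print(repr(cleaned))
```

Running the substitutions on one monolithic file, as the quote suggests, means each fix is applied once everywhere instead of once per scanned page. The result still needs the line-by-line comparison against the printout, since a blind substitution can also "fix" something that was correct.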

embedding-shape 9 hours ago

Boring reply perhaps, but I've had wild success adding even a tiny LLM afterwards to do "fixups" over OCR'd text. It works great for the typical O/0 issues and similar: just pass it the scrambled OCR'd text together with the text around it, and even dumb and tiny 7B models running on CPU do a pretty fine job.

bob778 8 hours ago

ABBYY has a specific module for dot matrix printouts, so I’m surprised it was a struggle for them, but every document is different.

WalterBright 11 hours ago

I've recovered some ancient software I wrote via scanning in listings I found among my dad's papers.

SoftTalker 16 hours ago

Yet another case where text printed on paper outlived any digital storage.

jshier 16 hours ago

Seems like it was never digitally stored in the first place, and the printed text was barely readable due to age. Not really a big win for paper.

petcat 16 hours ago

> struggled with the quality of the decades-old printout.

barely

It sounds like this printout has deteriorated badly and was barely readable.

acomjean 5 hours ago

Interesting story of how MS got into the operating system business. IBM wanted the CP/M operating system, but Digital Research wouldn’t sign IBM's NDA… really a pivot point in computing history.

From “Triumph of the Nerds” tv transcript:

https://www.pbs.org/nerds/part2.html

Jack Sams (IBM) was looking for a package from Microsoft containing both the BASIC computer language and an Operating System. But IBM hadn't done their homework.

Steve Ballmer: They thought we had an operating system. Because we had this Soft Card product that had CPM on it, they thought we could licence them CPM for this new personal computer they told us they wanted to do, and we said well, no, we're not in that business.

Jack Sams (IBM): When we discovered we didn't have - he didn't have the rights to do that and that it was not...he said but I think it's ready, I think that Gary's got it ready to go. So I said well, there's no time like the present, call up Gary.

Steve Ballmer: And so Bill right there with them in the room called Gary Kildall at Digital Research and said Gary, I'm sending some guys down…. Treat them right, they're important guys.

bragr 5 hours ago

Eh, basically all facts in this story are disputed by all sides, aside from the general gist that there was some meeting that didn't go well.

chuckadams 4 hours ago

Whether Kildall actually blew IBM off at that meeting or not, what was definitely the case was that CP/M didn't have a 16-bit version ready to meet IBM's schedule, and that's what ultimately took them out of the running.

danborn26 an hour ago

Looking through the source is a great reminder of how constrained early computing was. It's amazing how much of this architecture still influences modern systems.

userbinator 17 hours ago

I wonder how long it'll be before they release the source for the earliest Windows versions. The fact that they still have the source for this very old DOS at least gives hope that they also do for old Windows.

GaryBluto 15 hours ago

The day they make the Windows 2000 codebase open source (or source available) would be the day I could die happy (although I'd probably be long dead anyways by the time there's a glimmer of a chance of it happening). What a beautiful, smooth-running operating system it was.

ndiddy 5 hours ago

They will never release the code for anything that new because at that point, there's tons of licensed third-party code and the codebase is so large that going through everything to verify ownership would not be feasible. The code to NT 4 and XP has been leaked though.

optymizer 14 hours ago

Agreed. It's still my favorite Windows version.

greenbit 8 hours ago

Except for "the hive". Remember the hive? Sort of an alternate registry, in addition to the actual registry. Granted, it was pretty invisible, until it got corrupted.

I had a win2k machine that was my daily (at home) that was fine until idk about 2006, at which point something happened (muons?) and it would go into some kind of panic state just after bringing up the desktop. Hive corruption. I tried on and off for a couple of years to repair it, no luck. It wasn't just about the files on the HD, it was easy enough to transplant the drive and read/write anything, it was that I really liked the way I had the environment configured. Sure, it was all kind of moot, but it became a kind of personal windmill to resurrect this old thing. In the end, I booted an XP CD in it, and selected 'upgrade', and voila, it was Duncan Idaho, back from the dead.

Anyway.. loved win2k, but not a fan of the hive.

NitpickLawyer 13 hours ago

Wasn't there a 2000 source leak a while ago? I remember some exploits coming out after the leak.

londons_explore 14 hours ago

There is a mostly complete leak of it...

WalterBright 11 hours ago

It shouldn't be hard to disassemble it.

protocolture 14 hours ago

I imagine it's not far off. I get the impression they are almost done with Windows as a platform.

teamsolid 17 hours ago

I am sure there is a lot of good material to learn and take inspiration from, even in the early Windows 3.11.

mycall 16 hours ago

Do a deep dive into how OS/360 formalized to having DOS.

SoftTalker 16 hours ago

/s ?

throwaway27448 16 hours ago

They waited a couple decades too long for this to be of interest.

dang 18 hours ago

Recent and related:

Microsoft open sources DOS 1.00 on 45th anniversary - https://news.ycombinator.com/item?id=47957494 - April 2026 (19 comments)

jug 8 hours ago

While this is the oldest source, note that the 86-DOS v0.1-C binaries (and v0.34, which has also been found) are even earlier than this v1.00 source and can be downloaded and used in an emulator. :-)

https://arstechnica.com/gadgets/2024/01/the-oldest-known-ver...

teamsolid 17 hours ago

It is wonderful how brilliant the early years of modern computing were. We treated machines as what they really are: machines. Performance, creativity, science... whatever it took to make a 386 machine work. Nowadays it's all about libraries, virtualization, [bad] code over [bad] code over [bad] code... I don't like it.

dhosek 16 hours ago

I sometimes think that the fact that my mental model of a computer is still an Apple ][+ with 48K of RAM leads to my writing better code.

WalterBright 11 hours ago

While I did a few 10 line programs in BASIC in high school on punch cards, when things really started was a freshman class on semiconductors. The class started with diodes and quantum mechanics, then onto transistors, then flip flops, then registers, then ALUs. Then it was on to designing/building a digital clock (which never worked right), and later designing/building/programming single board computers (6802 chip).

It was fun knowing everything about a computer. That's long gone!

stevesimmons 12 hours ago

And mine is a Commodore Vic-20 circa 1981, with 3583 bytes of free RAM. Programmed in 6502 assembler. Can't get much closer to the CPU than that.

aenis 13 hours ago

For a very long while now, we have had programmers who never understood any low level concepts at all. They started with JS or Python, and never looked 'down'. There are no limits to the monstrosities they will consider normal.

Linus Torvalds, a few months ago, said something to this effect when discussing AI coding tools. That his (also, mine) generation was lucky to have started with low level stuff and managed to retain the understanding of the whole stack - and kids these days don't get that. Good luck acquiring this level of feel for computers, algorithms, data structures today, when a kid's first experience with coding will be a seemingly genius chatbot.

charcircuit 10 hours ago

>and managed to retain the understanding of the whole stack

No one understands the whole stack. There is too much specialized information.

goodpoint 5 hours ago

DOS and brilliant in the same sentence...

9dev 5 hours ago

At some point, we'll probably have a new field in history for digital archeology, and I'm really envious of those future historians! They'll get to sleuth around old datasets, trying to reconstruct the history of computing, understand long-forgotten file formats to preserve data, use statistical methods to analyse binary backups, and track down specific documentation versions to crack old encryption formats...

EvanAnderson an hour ago

The term "programmer-archaeologist" was coined by the author Vernor Vinge in his 1999 "A Deepness in the Sky"[0] (a pretty great read and definitely recommended) and the field is arguably a real thing now[1].

[0] https://en.wikipedia.org/wiki/A_Deepness_in_the_Sky

[1] https://en.wikipedia.org/wiki/Software_archaeology

giobox 3 hours ago

This field already is alive and well in the gaming community. Games companies are notorious for not spending money on keeping their old code around, which is why it's been at the forefront of digital archaeology efforts a lot of the time to preserve the industry's history.

I'd also throw the Wayback Machine and the Internet Archive into this bucket.

dang 3 hours ago

Related ongoing thread:

Microsoft's 6502 BASIC is now Open Source (2025) - https://news.ycombinator.com/item?id=48257058

danborn26 9 hours ago

Fascinating piece of computing history. Preserving early DOS source code gives a lot of context to the structural choices that stuck around in x86 architecture for decades.

imoverclocked 16 hours ago

Time to find vulnerabilities!

I remember in the aughts, coming across a DOS machine that was quite out of time… even for the university basement it was living in, next to a pile of lead bricks. Its only job was to run an instrument via a home-built ISA card and write data out to 5.25” floppies.

What uses would this code have in 2026?

yjftsjthsd-h 12 hours ago

It's a single user OS that runs everything in ring zero by design. I'm not sure, definitionally, that it can have security vulnerabilities. I... guess maybe code execution on exposure to an untrusted floppy disk filesystem?

greenbit 7 hours ago

Look closely, you'll notice there's no network interface. The only vulnerability in a system like that is physical access by malicious individuals.

About the worst malware it can have is a boot sector that installs a "terminate and stay resident" (TSR) program that copies itself onto any floppy that gets inserted.

FarmerPotato 16 hours ago

To see what decisions they made. Like any historical document. Aim to understand the people of the time.

gxd 2 hours ago

THANK YOU!

Can we have all the Infocom games owned by Activision (which is yours) now? Pretty please? I know the source is available, but we'd like them with an MIT license (including the manuals, artwork etc).

PS: a couple of them could be harder, like Shogun, but it's okay to skip these.

rvnx 6 hours ago

I’m sure this is better software than Windows Millennium Edition

okandship 10 hours ago

readable plain text plus boring metadata still ages better than most clever archival systems

xandrius 9 hours ago

In this case a paper printout.

hackerqwe 10 hours ago

More code that copilot can be trained on.

gnarlouse 11 hours ago

How about Microsoft fixes npm, github, and vscode

froyooh 17 hours ago

Back when it was all written by hand and optimized well.

dooosss 15 hours ago

Too little, too late.

signa11 17 hours ago

in the words of Mitch Hedberg: “here, you throw this away”

TedDoesntTalk 14 hours ago

He could have sold those printouts instead of giving them away.

theanonymousone 11 hours ago

I'm wondering whether ReactOS can exploit Claude et al. to their fullest and "recreate" Windows 2000/95. I may donate some tokens for that cause.

leobuskin 11 hours ago

I've used Claude to fix/reconstruct & build leaked Win2k3 on Linux with the original toolchain via Wine. This approach included full GDI sources reconstruction. I just don't know what to do with this, it's kinda difficult to "wash" on this scale.

CursedSilicon 11 hours ago

That sounds like a terrifying legal minefield that they would not want to tread

theanonymousone 11 hours ago

Is it not safe to assume Windows source code is not present in the LLM training data?

xandrius 9 hours ago

Slap a fair use on it and call it a day.

leni536 10 hours ago

But surely anything the LLM outputs is clear of licensing requirements /s

Or would Microsoft like to argue otherwise in court?