Hardlimit Museum

vreyes1981

Olé!! Little by little, yes sir.

And I like the new design, with the magazine covers really big. What a blast of nostalgia.

cobito

The first feature of the year is here, and it's for the museum. We're launching a new disc browser, which is the seed of the future file browser that will extend it. It uses the new front-end, so the views are optimized for both large screens and mobile devices. As designed, it should let you browse all kinds of media comfortably and intuitively: not only magazine discs but also operating systems, drivers, demos, compilers, etc. The content will arrive over time. The entire table structure is already in place, and in the next iteration the concept will be put into operation with two or three discs.

Due to space limitations, only the PCManía CD-ROMs are available for the moment. I will look for temporary solutions; I have a 2TB disk lying around that I might be able to use as a stopgap.

The disc card (medio.php), of which you have an example here, shows the cover along with information about the medium. The following information is shown (if available):

Name of the publication.
Size of the medium.
Format of the medium.
System with which it was originally created.
Volume
Publisher
Entity that prepared the data
Program that was originally used for creation
MD5 sum
SHA-256 sum

Download links are also shown for the medium's file (.iso, .zip) or files (.cue/.bin), and downloads are enabled, in principle, without limits. What comes next will depend mainly on how much storage space I manage to recover from the disks I have lying around.
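For anyone who wants to check a download against the MD5 and SHA-256 sums on the disc card, here is a minimal sketch. The function name `media_checksums` and the chunk size are illustrative assumptions, not anything taken from the museum's code.

```python
import hashlib


def media_checksums(path, chunk_size=1024 * 1024):
    """Return (md5_hex, sha256_hex) for a file, reading it in chunks.

    Reading in chunks keeps memory use flat even for a full CD image.
    """
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()
```

Comparing both returned strings with the values on the card tells you whether the image arrived intact.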

_Neptunno_

@cobito I love the hard work you're putting into the Museum and the new interface for the disk browser. As a professional in the digital preservation sector, I can only take my hat off to you.

Let me tell you something, because you'll like it: my company is dedicated precisely to digital preservation on a global scale. We work with clients ranging from the National Library of Spain (BNE) to international ones like Harvard University, the Holocaust Museum in the US, or HILA (Stanford University), among many other top museums and universities, setting up systems that cost hundreds of thousands of euros. And I'll tell you something: seeing what you're achieving with limited resources has an impressive value; the structure with the MD5, SHA-256 sums and the metadata cataloging is at a level of rigor that many institutions would envy.

That's why I'm so excited about it. In my humble opinion (keep in mind that I'm just a technician, but I've been through many projects, especially the one at the BNE, where I saw millions of pages published for people to access from home, thanks in part to my work over the years), your work is professional and necessary. I know how difficult it is to digitize these archives and give them visibility, and what you're doing deserves a tip of the hat.

It's vital to give visibility and access to this content before it's lost forever. In fact, if you want, I'll talk to my company so that you can negotiate a safebox with them and we'll take the Museum to the big leagues, hahaha.

Seriously, you're doing great work and the community will always be grateful. As soon as I get my hands on the 486 or the Pentium 166 I have around here, I'm sure I'll be using this material to the fullest!

Hugs!!

cobito

@_Neptunno_ You're going to make me blush

You have no idea how happy and motivated your words make me. On the technical side, I don't have as many doubts. It's the typical thing that can be done well in many ways; I'm probably not doing it perfectly, but not badly either. But when it comes to structuring and organization, I follow my intuition more than "technical" standards. My references are Archive.org and WinWorldPC. And, well, the experience of this being my third attempt (let's see if the third time is the charm).

@_Neptunno_ said in Hardlimit Museum:

We work with clients ranging from the National Library of Spain (BNE) to others internationally such as Harvard University, the Holocaust Museum in the USA or HILA (Stanford University) among many other top museums and universities, setting up systems that cost hundreds of thousands of euros.

Having someone with this track record come and tell you that you're not on the wrong track is very motivating. As you are probably the biggest expert on Hardlimit (and certainly the biggest expert in the field that I know), please share any criticism or suggestion you have, whatever it is.

This is something I want to do well. Since the economic cost of development is zero (the cost in time is another matter) and the hardware can be scavenged, it would be great if the formal side were done rigorously.

It's still very immature, both in functionality and content. So as it evolves, you will probably see things that can be improved (if you haven't already seen them).

Thank you!

_Neptunno_

@cobito I'll do everything I can to help you; as you know, it's a pleasure, though don't go thinking I'm at the level of a preservation engineer!

I think you would enjoy learning from my development colleagues; there is an incredible amount of work behind programming software that manages not just terabytes but petabytes of information (not to mention a thousand things I didn't know about, such as the "Transfer Connector": a critical function that acts as a bridge for ingesting data in a secure and structured way). I have less to say about that part, since my role is more system support, but I have also been in many digitization projects of all sizes and I can't help but see your museum with professional eyes. I find it admirable and, above all, very useful for the community!!

To give you an idea of the scale, the systems we set up are responsible for preserving "digital knowledge" for the next 200 years (or so I heard in a meeting a while ago, haha). We always say that if this had existed in the time of the Library of Alexandria, nothing would have been lost. The data is stored in redundant systems that constantly audit the files to ensure they are healthy and that the disks maintain their integrity. If something fails, there are several more copies in the pools ready to step in while the damaged disks are replaced, sometimes "with a lot of running around".

Measures are taken even in the face of catastrophes or wars. With the conflict in Ukraine, for example, critical copies were already being moved to safe locations (for example, from the UK to Ireland) to prevent the information from being lost in case of direct confrontation.

That being said, I must point out that I am only the "last guy" in my company! But these things are cool, and I'm telling you about them as a bit of industry gossip.

Best regards!!

cobito

@_Neptunno_ What a find. A while ago I read about a new Rosetta stone (new in the sense that it was created recently) where information is stored in a spiral, with the characters shrinking from visible to the naked eye down to microscopic. Its goal is similar to that of the original Rosetta stone (to provide a translation table), and the plan is to reproduce it and distribute it all over the world as a way of achieving that redundancy.

After some searching, I found it. It's called The Rosetta Project, and what they have done is create a disk that stores 13,000 pages of information in 1,500 languages in such a way that the text can be read with a microscope (no digital information).

You probably know about it, because looking at the page, it's run by Stanford University which your company has worked with. You may have even participated in some related development!

Anyway, what I meant to say was that this was the only example I knew of an attempt to preserve knowledge on that scale. I would never have imagined that there were so many resources put into preserving knowledge at the level you describe.

Very interesting, indeed.

_Neptunno_

@cobito the most important part of my company is preservation, which is where the most resources are invested and is the "soul" of the company. Then there is the digitization department, which was the one that started everything.

At first we were fully involved in massive digitization, something quite "mechanical" (scanning material and generating metadata and bibliographic records of the images). We worked with specific scanners to obtain TIF files (300dpi although over time they went up to 400 and 600 with very specific cameras for this) and then generate the derivatives (JPG, PDF). Depending on the project, some were simple and others required massive renaming and complex structuring, such as periodicals (newspapers and magazines). To give you an idea, we have digitized everything from the AS newspaper to press from the beginning of the 20th century for the BNE, passing through projects with the Prado Museum or the University of Granada (some project of the Royal Hospital of Granada), among others.
But smaller projects were also done, for village town halls and urban planning files, to give another example. Anyway, it has been many years of doing a bit of everything.

My job for a long time was precisely that image processing, although I also did system support. In parallel, the company grew exponentially developing digital preservation software. That software is the one that manages everything now. We no longer just digitize documents, but we guarantee that they remain intact and accessible.

If you go to the BNE Digital Library, you can see millions of pages to which I have contributed in a small way with my work. As a curious anecdote: I worked with collections of photos from the Civil War that were not open to the public due to copyright, but that the BNE had to preserve in order to release them in the future. Seeing those images was quite shocking.

I'm sorry to disappoint you, but the Rosetta Project at Stanford is something they develop on their own. That said, as a client we work with their infrastructure and, well... you would laugh if I told you that we have had to complain about them being so stingy. They gave us virtual machines with 127GB system disks (the default size in Hyper-V) and we had to protest to get them to give us "more juice", because the disks were constantly filling up with logs.

Best regards!!

cobito

I have finally been able to organize all the museum content. I rescued a 2TB drive from the storage room, which gives me plenty of room to continue with this. I have also been able to free up space on the backup drive, so the thing that bothered me most about using a drive this way is resolved. For now, all the magazine discs are already there. @vreyes1981, your Micromanía discs are there: 2002 (full year) and 2003 (until April), which is what you had uploaded. If you have more material lying around, don't hesitate to upload it (let me know if you do so I can add it). The database structure has also been finalized for the next step: the file explorer. The initial phase will let you navigate the directory tree of all available discs, and it should be operational in the Museum's next turn. I will see whether I dedicate time to the test bench or the museum next.

cobito

This week we launched the first phase of the file browser. All files from almost all discs are already indexed and browsable. We left out a couple of Micromanía MDF/MDS images that we can't extract, even though most MDF/MDS images extract fine. Here you have a disc whose page, in addition to the information already announced, now includes a browsable list of files and folders. Besides the file explorer, it is possible to view detailed information about each file along with its duplicates, that is, files that may have different names, dates, and other attributes but whose content is the same. The information that is shown is as follows:

Name and path of the file in the current medium.
Original creation date in the medium.
Size in ISO/IEC 80000-13 binary units (KiB, MiB, etc.).
The type of content, described in words (for now most files will show nothing here; it will be filled in over time).
MIME type (not 100% accurate, but close).
A more detailed description of the file's content; for example, for a self-extracting executable, details of the encapsulated content are given.
An MD5 signature
A SHA-256 signature
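Two of the fields above are easy to illustrate: sizes in binary units and duplicates found by content rather than by name. This is only a sketch under my own assumptions; `format_binary_size` and `find_duplicates` are hypothetical names, and the real index presumably lives in a database rather than in a dictionary in memory.

```python
import hashlib
from collections import defaultdict


def format_binary_size(n_bytes):
    """Format a byte count with ISO/IEC 80000-13 binary prefixes."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    size = float(n_bytes)
    for unit in units:
        if size < 1024 or unit == units[-1]:
            # Plain bytes are shown as an integer, larger units with one decimal.
            return f"{int(size)} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024


def find_duplicates(files):
    """Group paths whose content is identical.

    `files` maps a path to its raw bytes. Names, dates and other
    attributes are ignored: only the SHA-256 of the content counts.
    """
    by_digest = defaultdict(list)
    for path, content in files.items():
        by_digest[hashlib.sha256(content).hexdigest()].append(path)
    # Keep only digests seen under more than one path.
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}
```

With this approach, two copies of the same tool stored under different names on different discs end up in the same group.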

Here is an example with pkunzip.exe, which is a pretty popular file.

In this first phase, determining the character set of each file system has been a challenge. Sometimes UTF-8 is used, other times CP850, and there are a couple of images that look like they came out wrong from the start due to some failure in the creation software (apparently not uncommon in the 90s). In any case, file names are displayed correctly, with their ñs, accents, and so on, regardless of the original encoding.

We have 570,000 files. They come from the media of the five publications we have at the moment, which are being used as a reference for all development before much more is added.

The second phase of the browser has also begun. It consists of extracting all extractable files and then repeating the operation recursively on whatever is extracted. At the moment we support more than 70 compressed formats from all eras, which are identified by heuristics rather than by extension or magic number, so that no supported file slips through. Once this first version of the extractor is polished, it will go into production. More formats will be added over time, but for now we are staying with those 70-80 to prioritize other aspects.

On another note, a change in the v86 configuration makes the virtual machines respond much faster (example). I hadn't noticed the problem when working locally, but now that all traffic goes through a VPS, I am noticing latency-dependent performance details that can be improved.

Finally, the algorithm behind the new page translation system has been corrected, so pages now render much faster (this affects both the museum and the test bench). This change has also fixed certain functions that were breaking, such as the magnifying glass in the magazines and mouse pointer capture in the virtualizations.
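On the character-set problem mentioned in this post, one simple approach is to try strict UTF-8 first and fall back to CP850 when the bytes are not valid UTF-8. This is only a sketch of that idea, not necessarily how the museum does it; `decode_name` is a hypothetical helper.

```python
def decode_name(raw):
    """Decode a raw file name, preferring UTF-8 and falling back to CP850.

    Strict UTF-8 decoding fails on most CP850 names that contain
    accented letters, which makes it a cheap first test.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is treated as CP850.
        return raw.decode("cp850")
```

Names that are pure ASCII decode identically either way, so the order of the attempts only matters for names with ñ, accents and similar characters.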

cobito

It is now possible to view the contents of extractable files. These can be any type of file, and their contents can be other files or sections of binaries. The sections often contain unimportant binary data, but some embed relevant content such as images, sounds, animations, cursors, plain text and so on (mainly in DLLs). Here is an example of a .zip that in turn contains another .zip. In the description you can see the format and the algorithm used in each of them.

70% has been indexed, and I expect the process to finish over the weekend. So far, this adds 1.3 million files/sections to the ones we already had. With this, phase 2 is in principle finished (pending the indexing, which is an automatic process), and the driest part of the work is closed until new formats are added (there is already a list for the next iteration).

Phase 3 will possibly begin soon. It consists of being able to view files from the browser: images, videos, sounds, MIDIs, MODs, documents, etc. It is one of the coolest parts of the explorer; everything done so far was a prerequisite for it, and it can turn the explorer into a powerful tool for digital archaeology.

On the other hand, the front-end has been consolidated throughout the museum (except for hardware cards): the new one is now used in all sections, which improves its appearance on the desktop and fixes many things that were broken in the mobile version. In addition, the videos about each piece of hardware and software that I uploaded to PeerTube at the time have been added: example.
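The zip-inside-a-zip case can be illustrated with Python's standard `zipfile` module. This sketch only handles zip (the museum's extractor covers many more formats), and both `list_nested` and the `!/` separator for nested paths are my own assumptions.

```python
import io
import zipfile


def list_nested(data, prefix="", max_depth=5):
    """Yield (path, size) for every member of a zip held in memory,
    descending into members that are themselves zip archives."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            path = prefix + info.filename
            yield path, info.file_size
            if max_depth > 0:
                # The member is read fully into memory: fine for a sketch,
                # costly for very large archives.
                inner = zf.read(info)
                if zipfile.is_zipfile(io.BytesIO(inner)):
                    yield from list_nested(inner, path + "!/", max_depth - 1)
```

The depth limit is there to stop runaway recursion on maliciously nested archives.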

cobito

Phase 3A is in progress and has already gone into production (there is still a lot to process, but that is automatic). Standardized, free formats are being used for visualization. The idea is that, regardless of the original format, everything can be viewed in any browser, because one of the problems with old files is that they often stop being playable due to codec/format/algorithm issues. The chosen formats are VP9 for video, Opus for audio and WebP for images.
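As an illustration of targeting those three formats, here is how the conversions could be expressed as ffmpeg command lines. Using ffmpeg at all is an assumption on my part (the post does not say which tool is used), and `ffmpeg_command` is a hypothetical helper that only builds the argument list without running anything.

```python
def ffmpeg_command(kind, src, dst_stem):
    """Build an ffmpeg argument list for one of the three target formats.

    Assumes an ffmpeg build that includes the libvpx-vp9, libopus and
    libwebp encoders.
    """
    if kind == "video":
        # VP9 video with Opus audio in a WebM container.
        return ["ffmpeg", "-i", src, "-c:v", "libvpx-vp9",
                "-c:a", "libopus", dst_stem + ".webm"]
    if kind == "audio":
        # -vn drops any video stream (cover art, for example).
        return ["ffmpeg", "-i", src, "-vn", "-c:a", "libopus",
                dst_stem + ".opus"]
    if kind == "image":
        return ["ffmpeg", "-i", src, "-c:v", "libwebp", dst_stem + ".webp"]
    raise ValueError(f"unsupported media kind: {kind}")
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting problems with old file names.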

The following types of files can be visualized from the browser:

Images, videos and audio
These three types of "media" are being extracted by heuristics. That means a large number of images, videos and audio clips are turning up even in files that are not identifiable as such. For images, a lot of metadata is also shown. Over time, metadata will be added for videos and audio, and color histograms may be shown for images (this information is already being captured; it just needs to be displayed).

OCR is also being run on the images. If the text is sharp enough, the results are not bad. In the future it will be used to search for text in images; for now, the OCR result is shown when the image is viewed.

Example 1 of image
Example 2 of image (representative example of OCR)

MIDIs
The MIDIs come in six flavors (no more, no less). They have been rendered with:

OPL2 (synthesizer).
OPL3 (synthesizer).
Gravis UltraSound (official patches).
Roland MT-32 (official roms).
FluidR3 (modern soundfont).
ToH (modern soundfont).

In some cases, it has not been possible to extract MT-32, GUS and/or ToH. Many old MIDIs are malformed, do not follow the standard, etc.

Example of MIDI
Example 2 of MIDI

MODs
They are being rendered to match the Amiga's Paula chip as closely as possible, thanks to OpenMPT. Initially several versions were going to be offered, but here the landscape is bleaker and it seems that all efforts are focused on this one implementation. Here, too, files are being scanned by heuristics, and very interesting things are turning up, such as PSM files, a kind of MOD from Epic used in things like their Pinball or in Jazz Jackrabbit (I had to look up what that is because I had no idea it existed).

Example of PSM of Jazz JackRabbit
Another PSM of Silver Pinball (precursor of the Pinball of Epic Megagames)
Normal MOD
Another MOD

File browser
On another note, when you open any folder, a selection of these files is shown, drawn from the current folder and all of its subfolders (up to 6 files per type). As you navigate deeper into the folders, the "media" shown is narrowed down. If you click on a viewable file, all its information is shown next to the visualization. Items appear in order of "importance", which is the number of pixels for images and the duration for everything else.
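The selection rule described above (up to 6 per type, ordered by pixels or duration) can be sketched like this. The dictionary keys `kind`, `pixels` and `duration` and the function name `pick_previews` are hypothetical; the real data presumably comes from the database.

```python
from collections import defaultdict


def pick_previews(items, per_type=6):
    """Pick the most "important" items of each media type.

    Each item is a dict with 'kind' and 'path', plus 'pixels' for
    images or 'duration' (in seconds) for everything else.
    """
    def importance(item):
        return item["pixels"] if item["kind"] == "image" else item["duration"]

    by_kind = defaultdict(list)
    for item in items:
        by_kind[item["kind"]].append(item)
    # Sort each group from most to least important and keep the top ones.
    return {kind: sorted(group, key=importance, reverse=True)[:per_type]
            for kind, group in by_kind.items()}
```

Narrowing the selection while navigating is then just a matter of passing only the items under the current folder.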

Processing is slow (we are at 2%). We are still going through the first media on the list. Here is an example:
Root directory of PCMania 21

In addition, it is possible to see all the files of a specific type from the current directory. For example, here are all the images of PCMania 27.

To finish, icons are now shown next to files and folders to make them easier to identify. Many are still missing, but they will be added gradually. Unicode characters are being used for this, since I am getting into the normalization of formats and encodings.

One more thing: this is supposed to be a file search engine. But when I went looking for a few files to use as examples in this thread, it turned out that I had forgotten to implement the search engine, and I hadn't realized until now. So that's what I'll try to have ready next time.

Phase 3B consists of doing the same thing with documents: txt, rtf, wp5.1, pdf, doc, etc. But this will be left for much later.

cobito

The proof of concept for the fourth (and final) phase of the file browser is implemented. The idea here, rather than adding functionality as such, was to find out whether this was possible. And it is!

It consists of being able to run executables from the browser. The system automatically detects the type of executable and launches it in MS-DOS 6.22 or Windows 95/98. I also wanted to test something related, which for the moment is limited to some image and sound formats: viewing files in native software. For now, GIFs and JPEGs can be viewed in Internet Explorer 3/4, BMPs in Windows 95 Paint, and all three formats in Imaging. In addition, .wav files can be played in the Windows Sound Recorder.
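One common way to tell DOS and Windows executables apart is to look at the MZ header and the signature that the 32-bit offset at 0x3C points to. The following is a sketch of that technique, not the museum's actual detector; `classify_executable` and its return labels are hypothetical.

```python
import struct


def classify_executable(data):
    """Classify the start of an executable file by its header signatures.

    Returns 'windows-pe', 'windows-ne', 'dos-mz' or 'unknown'.
    """
    if data[:2] not in (b"MZ", b"ZM"):
        # No MZ header: could be a .COM file or not an executable at all.
        return "unknown"
    if len(data) >= 0x40:
        # Little-endian offset of the "new" header, stored at 0x3C.
        (new_header,) = struct.unpack_from("<I", data, 0x3C)
        signature = data[new_header:new_header + 4]
        if signature == b"PE\0\0":
            return "windows-pe"   # 32-bit Windows (95/98 and later)
        if signature[:2] == b"NE":
            return "windows-ne"   # 16-bit Windows
    return "dos-mz"               # plain MS-DOS executable
```

A launcher could then map `dos-mz` to the MS-DOS 6.22 machine and the two Windows labels to Windows 95/98.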

Some examples (to hear sound, you have to click on the emulation screen):

Example 0: Demo of Tomb Raider II Gold
Example 1: Demo of Theme Hospital for DOS
Example 2: Demo of Theme Hospital for Windows 95.
Example 3: Demo of International Rally Championship (- to brake, key to the right of the Ñ to accelerate, z/x to turn).
Example 4: Demo of Duke Nukem 3D (do not select a sound card: the audio files were not included and it fails)
Example 5: Demo of Epic MegaPinball.

The Windows 95 emulations also come in two flavors, with 32-bit and 8-bit color depth, so that you can experiment with palettes in images and improve software compatibility. That gives you Windows 95 at 8 and 32 bits and Windows 98 SE at 32 bits, so there are several options to try if you run into incompatibilities.

Example 6: Jpeg image from Internet Explorer 3/4
Example 7: Bmp image from MS Paint
Example 8: Sound from the Windows 95 sound recorder

In the previous version of the museum, it was possible to emulate some MS-DOS programs, but with a very strong constraint: the program had to be contained in a .zip, which meant preparing individual emulations, and that was extremely time-consuming. Now, thanks to HLFSv2 (Hardlimit File System!), the flexibility is total: each file can be handled individually, wherever it is, and at amazing speed. On top of that, it breaks the MS-DOS-only limit and extends, in theory, to any x86 system (there is still a long way to go here). With this, we have far surpassed the functionality of the previous version and, as far as I know, we are the only ones who can run programs this way.

This feature is still immature and will be polished little by little. For example, from MS-DOS and Windows 98 it is not possible to read from cylinder 1024 of the disk onward (you will get read errors in very large folders, over 500MB); if this happens to you with Windows 98, use 95. In addition, the virtual drive only supports file names in 8.3 format, among other issues.
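The figure of roughly 500MB fits the classic cylinder limit if we assume a disk geometry of 16 heads and 63 sectors per track with 512-byte sectors. That geometry is my assumption, since the post does not state it; the arithmetic is as follows.

```python
def chs_capacity_bytes(cylinders=1024, heads=16, sectors_per_track=63,
                       sector_size=512):
    """Bytes addressable with the given cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track * sector_size


# With the assumed geometry, cylinders 0..1023 cover 528,482,304 bytes,
# which is exactly 504 MiB.
limit = chs_capacity_bytes()
limit_mib = limit // 2**20
```

Under that assumption, anything stored past the first 504 MiB of the virtual disk falls on cylinder 1024 or beyond, which matches the read errors in very large folders.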

On another note, it is now possible to search for files from the file browser. Searches cover the complete file system until you visit a specific medium/disc; from there, they are narrowed down to the medium or directory, recursively. Unlike the multimedia selection, search results are ordered by number of repetitions, that is, the most popular files appear first. A selection of "media" is also included in the search results.
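The search behaviour described above (scope narrowing plus ordering by repetitions) can be sketched with a counter over content digests. The index layout of `(path, digest)` pairs and the function name `search` are assumptions for illustration only.

```python
from collections import Counter


def search(index, term, scope=""):
    """Search file names and rank results by how often the content repeats.

    `index` is an iterable of (path, digest) pairs. `scope` is a path
    prefix: empty for the whole file system, or a medium/directory to
    restrict the search to it and everything below it.
    """
    term = term.lower()
    hits = Counter()
    names = {}
    for path, digest in index:
        name = path.rsplit("/", 1)[-1]
        if path.startswith(scope) and term in name.lower():
            hits[digest] += 1
            # Remember the first name seen for each distinct content.
            names.setdefault(digest, name)
    # most_common() orders by count, most repeated first.
    return [(names[digest], count) for digest, count in hits.most_common()]
```

Counting by digest instead of by name means renamed copies of the same file still add to its popularity.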

With this I am satisfied, and this season of intense development of the museum comes to a close. From now until the end of the month, changes will be consolidated and documented, and there will be no major new features (beyond small fixes).

The museum, as a platform, is now defined.

From here on, the big changes will come from the content, even though there is still much room for improvement everywhere.

PS: Indexing of the media is at 33%, so in another two weeks practically everything will be indexed.
