JeanFred's comments

JeanFred · on Oct 25, 2021

Previous thread: https://news.ycombinator.com/item?id=26484080 (March 2021)

See also https://enterprise.wikimedia.com/

JeanFred · on Oct 20, 2020

askplatyp.us is another experimental front-end on top of WDQS − it worked with a slightly reworded question: https://askplatyp.us/?lang=en&q=Who%20is%20the%20aunt%20of%2...

thom · on Oct 20, 2020

That appears to still just be using 'relative' as a relation, not 'aunt'. You get the same results for 'cousin' etc.

JeanFred · on Oct 20, 2020

Check out Kiwix: https://www.kiwix.org/

“Kiwix is an offline reader for online content like Wikipedia, Project Gutenberg, or TED Talks. It makes knowledge available to people with no or limited internet access. The software as well as the content is free to use for anyone.”

oriesdan · on Oct 20, 2020

Indeed, Kiwix is the proper solution for anyone who cares about knowledge access in places where internet connection is spotty. Given the file format it uses (zim) is compressed, and meant to be accessed compressed to pluck a specific article, you can have all the content of English's wikipedia without images and videos in just 36GB. Plus, the specs of the file format are published and it's easy to build your own implementation, I did it for my needs.

It's also great for privacy maximalists : I have wikipedia, wikisource and wiktionary in two languages locally, which means that most of my searches never leave my computer.

Kiwix/openzim don't provide only mediawiki projects either, I've recently downloaded stackoverflow's content (although, I'll need to build a dedicated search engine for it to be really usable).

You can have a look at the incredible amount of available content there: https://wiki.kiwix.org/wiki/Content_in_all_languages

read_if_gay_ · on Oct 20, 2020

> all the content of English's wikipedia without images and videos in just 36GB

36GB seems like a really big number if it's just text. A cursory Google search says 1MB will hold about 500 pages of text (ignoring compression). So 36GB would be something like 18 million pages? Let's say a 1000 page book is 10cm wide, so 18M pages wind up as 1800 meters of books, or 180 meter-wide bookshelves with 10 shelves each, which is maybe a large library? It seems like a lot of that must be external sources. I wonder what percentage was actually written by Wikipedia editors?

oriesdan · on Oct 20, 2020

Not sure what you mean with external sources, but I have seen nothing but user generated content in there (but I haven't read all wikipedia articles, obviously).

A few things to note, though:

1/ it's not pure text content, it's html content, this has a significant overhead

2/ a zim file is not just compressed content, but also huge indexes referencing where is which content. You look for your article's title in the reference table, you find the position of your article in the file and you decompress just that part. This is what allows for selective decompression without decompressing the whole content.

londons_explore · on Oct 20, 2020

The zim file format is far from ideal for compression efficiency - all the best algorithms typically don't allow random access without decompressing everything.

Also, wikipedia has a lot of spam and orphan pages, insanely long lists, etc. Those are hard to algorithmically filter out.

bawolff · on Oct 20, 2020

Wikipedia (english) currently has about 6.2 million pages https://en.wikipedia.org/wiki/Special:Statistics

indy · on Oct 20, 2020

I'd assume that figure would also include the indexes required for searching

reitzensteinm · on Oct 20, 2020

I used this in Cuba. It was immensely useful, both to pass the time, and to look up many things of interest along the way without waiting to go to an internet zone.

I was a passenger in a car driving through central Cuba and thought I saw a sign towards Australia. Breaking out Kiwix I found the article and was relieved to see I wasn't going crazy.

Later on I was in an Uber in Australia driven by a Cuban man, and I thought I'd impress my friends with my worldliness by mentioning that there's a town in Cuba called Australia. The driver furrowed his brows and said flatly "no there's not", much to their delight. Can't win em all!

https://en.m.wikipedia.org/wiki/Australia,_Cuba

JeanFred · on June 16, 2020

> It was already a top visited site when its development team consisted of a single person, Brion Vibber. There have been very little significant development since then […].

The second member of the development team was hired in 2006 [0].

In these ~14 years, quite a few things happened.

Three projects joined the Wikimedia galaxy: Wikiversity in 2006, Wikivoyage adopted in 2012, and frickin’ Wikidata [1] in 2012 − which has deeply reshaped many aspects of the other projects, particularly Wikipedias and Commons.

On the multimedia side of things, we got InstantCommons in 2008 [3], thumbnailing infrastructure changes in 2013, various file format support (TIFF in 2010, FLAC and WAV in 2013, WebM, 3D formats in 2018 [4]) new upload wizard in 2011 [5]. The Graph extension [6] and Wikimedia Maps [7] in 2015. Structured Data on Commons in 2019 [8]. New default skin (Vector) in 2010 [9]. Unified login in 2008 [10]. 2013 brought OAuth [11], Echo notifications [12], Lua scripting [13], VisualEditor [14]. iOS and Android apps [15]. The Wikimedia Cloud Services starting 2012 [2].

(And in terms of size: article count went from ~5M to ~50M [16] ; Commons went from 1M files to 50M files [17])

And that’s just what I’m putting together in a few minutes (Besides my own memory, I’m indebted to [18], a curated timeline up until 2013).

Of course, these may or may not justify the staff size in your book ; but I’d say discounting all of that (and the rest) as “very little significant development” is a bit pushing it. :-)

(And fairly sure that “you’d probably notice” if Wikipedia was still using good’old Monobook skin ;-þ).

[0] https://foundation.wikimedia.org/wiki/Resolution:Additional_... [1] https://www.wikidata.org/ [2] https://wikitech.wikimedia.org/ [3] https://www.mediawiki.org/wiki/InstantCommons [4] https://wikimediafoundation.org/news/2018/02/20/three-dimens... [5] https://commons.wikimedia.org/wiki/Commons:Upload_Wizard [6] https://www.mediawiki.org/wiki/Extension:Graph [7] https://www.mediawiki.org/wiki/Maps [8] https://commons.wikimedia.org/wiki/Commons:Structured_data [9] https://meta.wikimedia.org/wiki/Vector [10] https://meta.wikimedia.org/wiki/Help:Unified_login [11] https://www.mediawiki.org/wiki/Help:OAuth [12] https://meta.wikimedia.org/wiki/Echo_(Notifications) [13] https://meta.wikimedia.org/wiki/Lua [14] https://www.mediawiki.org/wiki/VisualEditor [15] https://www.mediawiki.org/wiki/Wikimedia_Apps [16] https://meta.wikimedia.org/wiki/List_of_Wikipedias/Table [17] https://commons.wikimedia.org/wiki/Commons:Milestones [18] https://raw.githubusercontent.com/gpaumier/wikipedia-infogra...

JeanFred · on June 27, 2018

Gitlab itself ships with Mattermost <https://mattermost.com/> :)

aliljet · on June 27, 2018

This doesn't look half bad... while slack's down, seems like a pretty optimal chance to try https://github.com/mattermost/mattermost-server