---
config:
  layout: auto
---
flowchart LR
    A[🏠 my-r-project/] --> B[📄 .gitignore<br/>ROOT LEVEL]
    A --> C[📁 R/]
    A --> D[📁 data/]
    A --> G[📁 vignettes/]
    A --> J[📁 docs/]
    A --> N[📄 README.md]
    B --> B1[".Rproj.user/<br/>*.Rproj<br/>.Rhistory<br/>.RData<br/>.Ruserdata<br/>docs/<br/>Meta/<br/>doc/<br/>*.tar.gz"]
    C --> C1[📄 utils.R]
    C --> C2[📄 main_functions.R]
    C --> C3[📄 plot_functions.R]
    D --> D1[📄 .gitignore<br/>DATA SPECIFIC]
    D --> D2[📄 raw_data.csv]
    D --> D3[📄 processed_data.rda]
    D1 --> D1A["*.csv<br/>*.xlsx<br/>raw_*<br/>temp_*<br/>backup_*"]
    G --> G1[📄 introduction.Rmd]
    G --> G2[📄 .gitignore<br/>VIGNETTE SPECIFIC]
    G2 --> G2A["*.html<br/>*.pdf<br/>*_cache/<br/>*_files/<br/>figure-html/"]
    J --> J1[📄 .gitignore<br/>DOCS SPECIFIC]
    J --> J2[📁 _site/]
    J --> J3[📄 pkgdown.yml]
    J1 --> J1A["_site/<br/>*.html<br/>search.json<br/>sitemap.xml"]
    style A fill:#e1f5fe
    classDef gitignoreFile fill:#ff9800,stroke:#e65100,stroke-width:2px,color:#fff
    class B,D1,F3,G2,H2B,H3B,I2,J1,K2 gitignoreFile
If you are working with Git but find yourself dealing with unnecessary files cluttering your repository, .gitignore is a tool that can help. Let’s explore what it does in plain terms.
What is gitignore?
.gitignore is simply a text file that tells Git which files or folders to ignore in your project. It works like an instruction list for version control: when an untracked file matches a pattern in this file, Git acts as if that file doesn’t exist and leaves it out of change tracking.
Think of it as creating a “do not pack” list before a trip. .gitignore helps programmers avoid committing files they don’t want to track or that don’t need to be versioned.
The Problem It Solves
When working with code, especially in languages like SAS, R, or Python, we often generate temporary files:
- Log files showing execution results
- Temporary output files
- Large datasets created during processing
- Configuration files specific to the local machine
- Compiled binaries and dependencies
These files can clutter the repository, making it harder to see actual code changes. They also unnecessarily increase repository size, which can slow down operations.
.gitignore solves this by automatically excluding these unwanted files from version control, so nobody has to leave them out by hand on every commit.
How It Works
The .gitignore file uses simple patterns to match filenames:
- *.log: ignores all files ending with .log
- temp/: ignores any folder named temp
- /build/: ignores a build folder in the root directory
- *.tmp: ignores all temporary files with the extension .tmp
- !*.sas: does not ignore files ending with .sas, even if an earlier pattern excluded them
- # This is a comment: a line that starts with # is a comment and is ignored
For complete syntax please refer to the official documentation.
These rules are applied whenever Git checks for changes, so you only see relevant modifications.
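One way to see how the patterns above behave is to ask Git directly. The following is a minimal sketch, assuming Git is installed and a scratch directory is acceptable: it builds a throwaway repository and uses git check-ignore to report whether each path is ignored. The file names (run.log, build/out.bin, and so on) and the helper ignore_state are made up for illustration.

```shell
# Throwaway repository; every file name below is a hypothetical example.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# A few of the patterns from the list above, one per line.
printf '%s\n' '# This is a comment' '*.log' 'temp/' '/build/' '*.tmp' > .gitignore

mkdir -p temp build src/build
touch run.log temp/scratch.txt build/out.bin src/build/keep.txt

# Hypothetical helper: git check-ignore -q exits 0 when the path is ignored.
ignore_state() {
  if git check-ignore -q "$1"; then echo "ignored"; else echo "not ignored"; fi
}

log_state=$(ignore_state run.log)
temp_state=$(ignore_state temp/scratch.txt)
root_build_state=$(ignore_state build/out.bin)
nested_build_state=$(ignore_state src/build/keep.txt)

echo "run.log:            $log_state"
echo "temp/scratch.txt:   $temp_state"
echo "build/out.bin:      $root_build_state"
echo "src/build/keep.txt: $nested_build_state"
```

The last line is the interesting one: because /build/ starts with a slash, it only matches the build folder next to the .gitignore file, so src/build/ is left alone.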
About Multiple .gitignore Files
It’s worth noting that a project can have multiple .gitignore files; they are not limited to just one in the root directory. When we place a .gitignore file in a subfolder, Git applies those specific ignore patterns only within that folder and its subfolders.
This is particularly useful for tools like R’s renv package or Python’s virtual environments, which might have their own temporary files and configurations that should be ignored at different levels of project structure.
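The scoping of a nested .gitignore can be checked with a small experiment. This is a sketch under the assumption that Git is installed; the folder and file names (data/raw_data.csv, R/lookup.csv) are illustrative only. The -v flag of git check-ignore prints which ignore file and line supplied the matching pattern.

```shell
# Throwaway repository; folder and file names are hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

mkdir -p data R
touch data/raw_data.csv R/lookup.csv

# This rule lives in data/ only, so it should not reach R/.
printf '%s\n' '*.csv' > data/.gitignore

# -v output has the form <source>:<line>:<pattern><TAB><path>; keep the source.
match_source=$(git check-ignore -v data/raw_data.csv | cut -d: -f1)

if git check-ignore -q R/lookup.csv; then
  outside_state="ignored"
else
  outside_state="not ignored"
fi

echo "data/raw_data.csv matched by: $match_source"
echo "R/lookup.csv is: $outside_state"
```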
Different Ways to Use gitignore
Basic Setup
For most projects, create a .gitignore file in your project’s root directory with patterns specific to your language or tools:
# Ignore log files
*.log
*.tmp
# Ignore compiled output
/bin/
/dist/
# Ignore IDE configuration files
.idea/
.vscode/
Project-Specific Rules
Different programming languages often have different temporary files:
For SAS programs:
# To exclude SAS log, lst, and sas7bdat files
*.log
*.lst
*.sas7bdat
For R projects:
# To exclude R temporary files
.RData
.Rhistory
.Rproj.user
*.Rproj
For Python projects:
__pycache__/
*.pyc
.env
.pytest_cache
.venv/
Global Ignore Patterns
We can set up global ignore patterns that apply to all our repositories:
git config --global core.excludesfile ~/.gitignore_global
Then add the patterns that should be ignored across all projects to the .gitignore_global file in your home directory (~/).
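To try the global setup without touching your real configuration, the sketch below points HOME at a scratch directory first. That redirection, the example patterns (.DS_Store, *.swp), and the project and file names are assumptions made for illustration. Run it as a script rather than pasting it into an interactive shell, since it exports HOME.

```shell
# Scratch HOME so the real ~/.gitconfig is left untouched (an assumption of this sketch).
scratch=$(mktemp -d)
export HOME="$scratch"
export XDG_CONFIG_HOME="$scratch/.config"

# Same command as above; ~ now resolves inside the scratch directory.
git config --global core.excludesfile ~/.gitignore_global
printf '%s\n' '.DS_Store' '*.swp' > ~/.gitignore_global

# Any repository for this user now picks up the global patterns.
mkdir "$scratch/project"
cd "$scratch/project" || exit 1
git init -q
touch .DS_Store notes.txt

if git check-ignore -q .DS_Store; then
  global_state="ignored"
else
  global_state="not ignored"
fi
untracked=$(git status --porcelain)

echo ".DS_Store is: $global_state"
echo "git status reports: $untracked"
```

Only notes.txt should show up as untracked, even though the project itself has no .gitignore file.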
Important Note
While .gitignore is powerful, it has an important limitation: it won’t ignore files that Git already tracks (staged or committed). If a file has been added to the staging area with git add, Git will continue tracking it even if it matches a pattern in an existing .gitignore file.
If such a file needs to be excluded from version control, you need to explicitly remove it from tracking using:
git rm --cached filename
This removes the file from version control but leaves it on the local filesystem.
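The behaviour described above can be reproduced in a throwaway repository. This is a sketch, assuming Git is installed; the file name run.log and the commit identity are placeholders. It commits a log file first, adds the ignore rule afterwards, and shows that Git keeps reporting changes until the file is removed from the index.

```shell
# Throwaway repository; run.log and the commit identity are placeholders.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Commit a log file by mistake, then add the ignore rule afterwards.
echo "first run" > run.log
git add run.log
git -c user.name="Example" -c user.email="example@example.com" \
  commit -q -m "Add log by mistake"
echo '*.log' > .gitignore

# The file is already tracked, so edits still show up despite the rule.
echo "second run" >> run.log
before=$(git status --porcelain run.log)

# Stop tracking it; the working copy stays on disk.
git rm -q --cached run.log
tracked_after=$(git ls-files run.log)

echo "status before git rm --cached: '$before'"
echo "tracked after git rm --cached: '$tracked_after'"
ls run.log
```

The removal is staged like any other change, so it takes effect for collaborators once it is committed.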
Best Practices
- Create early: Add .gitignore at the beginning of a project
- Commit it: Make sure .gitignore itself is version controlled
- Share with team: Everyone working on a project should use the same rules
- Review occasionally: As your project evolves, update your ignore patterns
- Exclude all/include some: To prevent new file types from being tracked by accident, exclude everything and then include only what is expected
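The exclude all/include some practice could look like the sketch below. It assumes a project where only R scripts and the .gitignore file itself should be tracked; the file names are hypothetical. The pattern * ignores everything, !*/ lets Git descend into folders again, and the remaining ! lines re-include the expected files.

```shell
# Throwaway repository; file names are hypothetical.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Ignore everything, then re-include folders, the ignore file, and R scripts.
printf '%s\n' '*' '!*/' '!.gitignore' '!*.R' > .gitignore

mkdir R
touch R/utils.R R/scratch.tmp notes.xlsx

# Stage whatever Git is willing to track and list the result.
git add -A
staged=$(git ls-files)

echo "$staged"
```

Only .gitignore and R/utils.R end up staged; a new file type such as notes.xlsx stays out until a matching ! line is added.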
Conclusion
.gitignore is a simple but powerful tool that helps maintain clean repositories by excluding unnecessary files. It’s not magical - just practical configuration that saves time and reduces clutter in version control systems.
If you haven’t used .gitignore before, give it a try on your next project. You’ll likely find yourself wondering how you ever worked without it!